- What is IMPORTXML in Google Sheets?
- How to use IMPORTXML in Google Sheets?
- Google Sheets IMPORTXML is not working
If you use spreadsheets to store data from the web or other external sources, both Excel and Google Sheets offer built-in functions that allow you to import data without having to recur to copying and pasting from the web.
For example, the IMPORTHTML function in Google Sheets allows you to import data from actionable tables and lists on websites. Moreover, the Google Sheets IMPORTDATA function lets you import .csv and .tsv files from a URL into your spreadsheet.
In this article, you’ll learn what the IMPORTXML function is in Google Sheets, understand the syntax of the IMPORTXML formula, and how to use the IMPORTXML formula through examples. In case this function is not working, you can also find recommendations on how to solve this issue at the end.
What is IMPORTXML in Google Sheets?
The IMPORTXML function imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds. The IMPORTXML function allows you to import data from websites for data mining or web scraping.
Although this function works with most webs, there may be web owners that block these types of functions, as well as other bots, to access their data. In the last section, you can learn how to work around these issues.
The syntax for the IMPORTXML function is:
- url refers to the URL of the website. You can represent this value in the formula by including the protocol “http://” or “https://” and URL between double quotation marks or a reference to a cell containing the corresponding text.
- xpath_query is the parameter in charge of telling the function what data to import. For example, if you define this parameter as “//h1/@title”, you’ll get all the attributes from < h1 > tags on the website.
As you can see, XPath is a type of query language in itself that is used to retrieve data from the web. If you want to learn more about the XPath language, read this tutorial provided by W3Schools.
Google offers multiple options to import data from external sources. Here’s how to use IMPORTDATA to import .csv and .tsv files from a URL into Google Sheets.READ MORE
How to use IMPORTXML in Google Sheets?
The IMPORTXML function allows you to import data from websites directly into your Google Sheets. Now you’ll learn the different ways this function can be used through practical examples.
There are many ways to use the IMPORTXML function in Google Sheets. Here, you’ll learn the most widely used according to the type of XPath query.
Use the “//@href” to scrape all links
Web scraping is a common practice for businesses that want to grow with web data. For example, let’s say you’re looking into your competitor’s website and want to have a list of all the links used on their website. This is how you can use IMPORTXML to scrape all links on a website.
- 1. Go to your Google Sheets and type in “=IMPORTXML” to trigger the built-in function.
- 2. Type in the URL from the website you’d like to scrape all links. Include the URL between quotation marks.
- 3. Include the XPath query that will scrape all links, “//@href”, between quotations marks and close with parenthesis.
- 4. Press “Enter” to obtain all links from the website, as shown below.
Use the “//a[not(contains(@href, ‘example.com’))]/@href” to scrape external links
An external link is a hyperlink that targets a domain outside the source website; these usually have more impact on search engine rankings since they represent more of an objective stamp of approval for a website. This is how you can scrape an external link using the IMPORTXML function.
- 1. Repeat steps 1-3 from the previous section.
- 2. Include the XPath query that will scrape all links, “//a[not(contains(@href, ‘example.com’))]/@href”, between quotations marks and close with parenthesis.
- 3. Press “Enter” to obtain all external links from the website, as shown below.
How to Use IMPORTHTML in Google Sheets
Google offers multiple options to import data from external sources. Here’s how to use IMPORTHTML to import tables and lists from a website into Google Sheets.READ MORE
Use the “//link[@rel=’canonical’]/@href” to scrape the canonical link
A canonical link is the "preferred" version of a web page that webmasters specify to prevent duplicate content. This is how you can scrape a canonical link using the IMPORTXML function.
- 1. Repeat steps 1-3 from the previous section.
- 2. Include the XPath query that will scrape all links, “//link[@rel=’canonical’]/@href”, between quotations marks and close with parenthesis.
- 3. Press “Enter” to obtain the canonical link from the website, as shown below.
If you’re interested in importing financial data in your Google Sheets, you can combine the IMPORTXML function with either the GOOGLEFINANCE or YAHOOFINANCE functions. However, this will require using a split function that is a bit more complex.
Google Sheets IMPORTXML is not working
If you’ve encountered errors when using the IMPORTXML function, such as “Could not fetch the URL” or “Imported content is empty”, check the following:
- Is the URL correct? The data you’re trying to scrape may have been moved to another URL, so it’s worth double-checking.
- Is it the protocol “https” or “http”? The protocol may have changed as well without you noticing it. Sometimes, websites are not optimized to redirect from one to the other, so you’ll need to make sure to use the proper one.
- Does the website block scraping? Some web owners include blocks to stop others from reading or scraping their web content.
As you can see, the IMPORTXML function in Google Sheets can be a powerful function when it comes to importing data from external sources, especially websites. The example provided here can be applied to any domain. However, if you own a website or work with web data, this function is the fastest and easiest way to perform web scraping without having to leave your spreadsheet.
You’ve learned what the IMPORTXML function is and the syntax behind the IMPORTXML formula. Through a set of useful examples, you’ve also seen how the IMPORTXML function can be used and applied to your data. You should also feel comfortable fixing any issue that may occur in case the IMPORTXML doesn't seem to work properly.
If you want to learn more about how to import data, read the following articles below: