Tuesday, July 2, 2024

How To Use Google Sheets For Net Scraping With AI

Scraping information from webpages is a comparatively superior process that, till just lately, required a level of technical ability. The thought of diving into code or scripts for information extraction appeared overwhelming for a lot of, myself included.

Knowledge scraping can energy many search engine optimisation duties, equivalent to auditing, competitor evaluation, and inspecting web site and information construction.

Google sheets presents easy options to assist.

A type of options is the IMPORTXML perform that enables customers to scrape webpage information utilizing just some parameters. It makes information extraction accessible to a wider viewers, particularly to those that weren’t well-versed in programming languages.

Whereas this perform is spectacular, the true breakthrough got here with the adoption and integration of generative AI into the combination.

On this information, we’ll present you easy methods to use Google Sheets and AI, significantly ChatGPT, for internet scraping while not having superior coding abilities.

The Instruments: AI And Chatbots

We at the moment are all accustomed to AI, ChatGPT, and related chatbots.

Actually, many people use options like ChatGPT to put in writing our personal code, scripts, and applications with or without very restricted programming information.

It is so simple as offering detailed directions within the type of prompts and dealing with the chatbot to construct instruments that solely till just lately we believed have been means above us.

However most significantly, these are instruments which might be deeply altering the way in which we method our day-to-day work.

For instance, if we ask ChatGPT the next query, “What’s the IMPORTXML perform and the way can I exploit it in Google Sheets to scrape the title of an HTML webpage? Present the required code to try this in Google Sheets,” the response is extraordinarily correct. In a matter of seconds, we’ve got our formulation prepared to make use of in Google Sheets.

However to be trustworthy, that was a really primary and easy process that we may have simply accomplished with out ChatGPT.

The Job

So, how does this work if we wish to extract information that may be a bit much less commonplace in comparison with a web page title or description?

For instance, how does this work if we wish to extract the next information from the PPC entrance web page of Search Engine Journal?

Record all featured articles, their authors, the hyperlink URLs, and the article description for the columns listed on https://www.searchenginejournal.com/class/paid-media/pay-per-click/.

Can we try this straight with ChatGPT?

Executing With ChatGPT

When creating prompts, it took a number of makes an attempt to supply directions that have been detailed sufficient for the chatbot to completely perceive the target of the duty and return good outcomes.

In lots of circumstances, it felt just like the AI was underneath strain to return fast outcomes regardless of their accuracy.

However let me clarify.

The duty was to investigate the web page and record all featured articles, their authors, the hyperlink URLs, and the outline for every of the 30 articles listed on the web page. Then compile the information right into a desk and at last export it right into a CSV file.

Easy proper?

At first, ChatGPT returned only a pattern of seven articles and solely their titles and URLs; after a reworked immediate, it managed to record and export all 30 articles and their hyperlinks.

Now, that was good. So, to finish the duty, we simply wanted so as to add the authors and the article descriptions.

However right here is the place the bot stumbled and was not in a position to present an correct description of every article regardless of us offering examples of the web page aspect it wanted to seek out and duplicate.

ChatGPT stored ignoring the directions and offering its personal article descriptions time and time once more.

ChatGPT even failed once we tried with a special method and downloaded and uploaded a replica of the web page HTML.

ChatGPT extractScreenshot from ChatGPT, February 2024

This time, it was in a position to present correct information for seven articles however couldn’t go previous that. The problem reported:

“…the construction and content material of the web page current important challenges for complete information extraction in a single session.

The web page is kind of in depth and complicated, and it’s not possible to extract all 30 articles within the present format of interplay.”

ChatGPT extracting from 30 articlesScreenshot from ChatGPT, February 2024

ChatGPT + Google Sheets

So, going again to IMPORTXML and Google Sheets.

This time, getting ChatGPT to supply the formulation for every area was like a breeze.

 ChatGPT extracting instructionsScreenshot from ChatGPT, February 2024

Listed below are a number of the formulation, as prompt by the chatbot, that you could simply attempt your self in Google Sheets to extract:

Title

=IMPORTXML("https://www.searchenginejournal.com/class/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/h2/a")

Creator Title

=IMPORTXML("https://www.searchenginejournal.com/class/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/p[1]/a")

URL Hyperlink

=IMPORTXML("https://www.searchenginejournal.com/class/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/h2/a/@href")

Description

=IMPORTXML("https://www.searchenginejournal.com/class/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/p[2]")

Very quickly, we have been in a position to extract the information into the spreadsheet.

Google SheetsScreenshot from Google Sheets, February 2024

Moreover, by utilizing merely constructed nested formulation, we will rapidly pull the information from a number of pages on the similar time.

Within the instance under, I used to be in a position to extract the identical information associated to every article (title, writer, URL hyperlink, and outline) for the primary 10 pages of the PPC part.

The result’s a complete of 300 articles scraped in lower than a minute!

Google Sheets extract resultsScreenshot from Google Sheets, February 2024

Evaluating The Two

So, how do ChatGPT vs. ChatGPT + Google Sheets IMPORTXML evaluate?

In my expertise, I couldn’t discover a simple and fast means to make use of ChatGPT to scrape the information I used to be in search of – thoughts you, that doesn’t imply that this isn’t potential, and there could be a number of methods to do that, however I didn’t discover any.

What labored for me was a mixture of the totally different instruments, and that served me rather well for my supposed goal.

ChatGPT was extraordinarily helpful for writing the IMPORTXML formulation I wanted to make use of in Google Sheets, and people formulation did the remainder.

An extra bonus of the ChatGPT + Google Sheets choice is that you could simply use the free 3.5 model of ChatGPT and get the instrument to construct your IMPORTXML formulation, as an alternative of getting model 4 to scan the web page and extract the information.

Key Takeaway

This highlights a essential side of how AI has remodeled how we predict and work.

The most effective instrument for the job isn’t merely utilizing AI, Google Sheets, or any particular software program alone however fairly a mixture of instruments and abilities.

It’s on this built-in method that we develop workflows which might be environment friendly and efficient, thus enhancing our general productiveness.

Extra assets: 


Featured Picture: Visible Technology/Shutterstock

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles