Step-by-step guide

Step 1: Load your CSV or Excel file on Datablist

Create a free account and import your data file. Datablist is a powerful CSV editor, well suited to opening large CSV files or Excel files that contain a list of items.

Create a new collection and import your file.

Step 2: Select the "Smart Scraper" enrichment

Click the "Enrich" button and search for "Smart Scraper".

Smart Scraper

Step 3: Configure options and enable proxy if needed

The next step is to configure the scraper.

A useful feature is the option to automatically follow "About us" links. When it is enabled, the Smart Scraper scans the webpage for links pointing to an "About us" page. It recognizes these pages using a list of common "About us" paths, as well as an analysis of link anchor texts that match "About us" patterns.
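
The path-plus-anchor-text matching described above can be sketched in Python as follows. This is a minimal illustration, not Datablist's implementation: the path list `ABOUT_PATHS`, the keyword list `ABOUT_KEYWORDS`, and the helper names `LinkCollector` and `find_about_links` are hypothetical assumptions chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical list of common "About us" paths (assumption, not Datablist's list).
ABOUT_PATHS = {"/about", "/about-us", "/about_us", "/aboutus", "/company", "/who-we-are"}
# Hypothetical anchor texts treated as "About us" links (assumption).
ABOUT_KEYWORDS = ("about us", "about", "who we are", "our story")


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace in the anchor text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def find_about_links(html, base_url):
    """Return absolute URLs whose path or anchor text looks like an "About us" page."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for href, text in parser.links:
        url = urljoin(base_url, href)
        path = urlparse(url).path.rstrip("/").lower()
        # Match either a known path or an "About us"-like anchor text.
        if (path in ABOUT_PATHS or text.lower() in ABOUT_KEYWORDS) and url not in found:
            found.append(url)
    return found
```

For example, `find_about_links('<a href="/about-us/">Learn more</a><a href="/pricing">Pricing</a>', "https://example.com/")` returns `["https://example.com/about-us/"]`: the first link matches by path, while the pricing link matches neither rule.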

Another option controls the use of a proxy. Some websites are protected against scraping or enforce rate limits. You can configure the Smart Scraper to switch to a proxy automatically when it receives an error. The proxy option is useful for e-commerce websites or when you are scraping several pages of the same website.

Note: The cost per URL is 0.50 credits with the proxy and 0.10 credits without it. When the proxy is used as a fallback, you are not charged the proxy rate for URLs that return a valid response to a simple (non-proxy) request.
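
To budget a run, the pricing rule in the note can be turned into a small estimate. This sketch assumes hypothetical mode names ("off", "always", "fallback") and a helper called `estimate_credits`. It also assumes that a URL retried through the proxy is billed at the proxy rate only, not at 0.10 + 0.50; the note does not state this explicitly, so check the actual billing in Datablist.

```python
SIMPLE_COST = 0.10  # credits per URL without proxy (from the note above)
PROXY_COST = 0.50   # credits per URL with proxy (from the note above)


def estimate_credits(total_urls, proxied_urls=0, mode="fallback"):
    """Estimate the credits used by one Smart Scraper run (hypothetical helper).

    total_urls:   number of URLs scraped
    proxied_urls: URLs that failed a simple request and were fetched via the proxy
    mode:         "off" (no proxy), "always" (proxy for every URL) or "fallback"
    """
    if not 0 <= proxied_urls <= total_urls:
        raise ValueError("proxied_urls must be between 0 and total_urls")
    if mode == "off":
        cost = total_urls * SIMPLE_COST
    elif mode == "always":
        cost = total_urls * PROXY_COST
    elif mode == "fallback":
        # URLs that worked without the proxy are billed at the simple rate.
        # Assumption: a URL retried through the proxy is billed at the proxy rate only.
        cost = proxied_urls * PROXY_COST + (total_urls - proxied_urls) * SIMPLE_COST
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return round(cost, 2)
```

With 100 URLs of which 20 need the proxy, the fallback mode costs 20 × 0.50 + 80 × 0.10 = 18 credits, compared with 50 credits when the proxy is used for every URL.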

Connect WebPage URLs input

In the advanced settings, you can define blacklist terms to exclude matching emails or links from the results.
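
A blacklist filter of this kind can be sketched as a case-insensitive substring check. The substring rule and the `apply_blacklist` helper are assumptions for illustration; Datablist may match blacklist terms differently.

```python
def apply_blacklist(values, blacklist_terms):
    """Drop values that contain any blacklisted term (case-insensitive substring match).

    Hypothetical helper: the substring rule is an assumption, not Datablist's documented behavior.
    """
    terms = [term.lower() for term in blacklist_terms if term]
    return [value for value in values if not any(term in value.lower() for term in terms)]


emails = ["contact@acme.com", "noreply@acme.com", "jobs@acme.com"]
filtered = apply_blacklist(emails, ["noreply", "example"])
```

Here `filtered` keeps `contact@acme.com` and `jobs@acme.com`. A plain substring check is simple but broad: a term like "info" would also remove `sales@infotech.com`, so short terms should be chosen carefully.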

Step 4: Select the column with the webpages as inputs

Now, select the column in your collection that contains the website or webpage URLs to scrape.

Go to the "Input Property" section and select the property from the dropdown menu.

The enrichment returns the extracted texts, email addresses, phone numbers, and social links. For each output, create a new property or map it to an existing property to store the results.

When multiple phone numbers, links, or emails are found, they are returned as a comma-separated list.
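
If you export the results and need one value per row (for example, one email per row for an outreach tool), the comma-separated cells can be split back into lists. This is a minimal sketch with hypothetical helper names (`split_multi_value`, `explode_rows`) and assumes no individual value contains a comma itself.

```python
def split_multi_value(cell):
    """Turn a comma-separated cell into a de-duplicated list, preserving order."""
    if not cell:
        return []
    items = [part.strip() for part in cell.split(",")]
    # dict.fromkeys keeps the first occurrence of each value, in order.
    return list(dict.fromkeys(item for item in items if item))


def explode_rows(rows, column):
    """Yield one row per value of a comma-separated column; rows without values are kept once."""
    for row in rows:
        for value in split_multi_value(row.get(column, "")) or [""]:
            yield {**row, column: value}


rows = [
    {"domain": "acme.com", "emails": "a@acme.com, b@acme.com, a@acme.com"},
    {"domain": "beta.io", "emails": ""},
]
exploded = list(explode_rows(rows, "emails"))
```

`exploded` contains two rows for `acme.com` (the duplicate email is dropped) and one row with an empty email for `beta.io`.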

Connect Texts, Emails and Social links outputs

How to use the extracted texts with ChatGPT?

Besides phone numbers, emails, and social links, the Smart Scraper also returns an aggregation of the relevant texts found on the scraped page, including the "About us" page.

It automatically discards header texts, footer texts, and similar boilerplate, and tries to keep only texts that provide contextual information.
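
The general idea of discarding boilerplate can be approximated by skipping text inside header, footer, and navigation elements and dropping very short fragments. This is only an assumption-based approximation: the `SKIP_TAGS` set, the `min_words` threshold, and the helper names are hypothetical, and the Smart Scraper's actual filtering rules are not described here.

```python
from html.parser import HTMLParser

# Tags whose content is treated as boilerplate in this sketch
# (an assumption, not Datablist's actual rule set).
SKIP_TAGS = {"header", "footer", "nav", "script", "style", "noscript"}


class ContentTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if self.depth == 0 and text:
            self.chunks.append(text)


def extract_context_text(html, min_words=3):
    parser = ContentTextExtractor()
    parser.feed(html)
    # Keep only chunks long enough to carry context; very short strings
    # are usually titles, labels, or button texts.
    return "\n".join(chunk for chunk in parser.chunks if len(chunk.split()) >= min_words)
```

On a page with a navigation bar, a "Sign up" button, and a copyright footer, only the descriptive paragraph (for example "Acme builds accounting software for small businesses.") survives.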

The extracted text works well as input for a ChatGPT prompt.

For example, you can use the extracted texts to classify websites by whether they target B2B or B2C customers.
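
If you want to reproduce this classification outside Datablist, the prompt can be assembled from the extracted text and the model's answer normalized to a fixed label. The prompt wording, the 4,000-character limit, and the helpers `build_prompt` and `parse_label` are hypothetical. The sketch does not call any API; send the prompt with whichever ChatGPT client you use.

```python
# Hypothetical prompt wording; adapt it to your own segmentation.
PROMPT_TEMPLATE = (
    "Based on the following text extracted from a company website, "
    "does this company sell mainly to businesses (B2B) or to consumers (B2C)?\n"
    "Answer with exactly one word: B2B, B2C, or Unknown.\n\n"
    "Website text:\n{text}"
)


def build_prompt(extracted_text, max_chars=4000):
    # Truncate long texts to keep the prompt at a reasonable size.
    return PROMPT_TEMPLATE.format(text=extracted_text.strip()[:max_chars])


def parse_label(answer):
    """Normalize a model answer to B2B, B2C, or Unknown."""
    cleaned = answer.strip().upper()
    for label in ("B2B", "B2C"):
        if cleaned.startswith(label):
            return label
    return "Unknown"
```

Asking for a single-word answer and normalizing it afterwards keeps the output column consistent, which makes it easy to filter or group the collection by segment.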

Write ChatGPT Prompt using extracted texts

This gives the following results:

B2B/B2C classification results

Use Cases

Lead Generation

Gathering email addresses and LinkedIn profile links from webpages is an effective way to enrich company data. Sales teams can use this information to reach out to relevant individuals or businesses with targeted sales pitches or marketing campaigns.
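
Because the social links output mixes networks, you may want to separate LinkedIn personal profiles from company pages before outreach. This sketch relies on LinkedIn's common URL patterns (`/in/` for people, `/company/`, `/school/`, and `/showcase/` for organizations); the helper names are hypothetical, and the input is assumed to be the comma-separated social links cell.

```python
from urllib.parse import urlparse


def classify_linkedin(url):
    """Return 'profile', 'company', or None for a (possibly scheme-less) URL."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if not (host == "linkedin.com" or host.endswith(".linkedin.com")):
        return None
    first_segment = parsed.path.strip("/").split("/")[0].lower()
    if first_segment == "in":
        return "profile"
    if first_segment in ("company", "school", "showcase"):
        return "company"
    return None


def split_linkedin_links(social_links):
    """Split a comma-separated social links cell into LinkedIn profiles and company pages."""
    result = {"profile": [], "company": []}
    for link in social_links.split(","):
        link = link.strip()
        kind = classify_linkedin(link) if link else None
        if kind:
            result[kind].append(link)
    return result
```

Personal profiles are useful for direct outreach, while company pages are better suited to account-level research.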

Recruitment and Talent Sourcing

HR professionals and recruiters can use scraped email addresses and LinkedIn profile links to identify potential candidates for job openings. This enables them to proactively reach out to qualified candidates and build a talent pipeline.