Step-by-step guide

Step 1: Load your CSV or Excel file on Datablist

Create a free account and import your data file. Datablist is a powerful CSV editor, well suited to opening large CSV or Excel files that contain a list of items.

Create a new collection and import your file.

Step 2: Select the "Bulk Scraper" enrichment

Click the "Enrich" button and search for "Bulk Scraper".

Bulk Scraper

Configure CSS Selectors

The Bulk Scraper offers two ways to extract data from an HTML page: CSS Selectors and Regular Expressions.

CSS selectors allow you to target specific parts of an HTML document to extract information.

A CSS selector is configured with the following fields:

  • CSS Selector - The CSS path to the HTML element. Read this guide to learn how to write CSS Selectors.
  • CSS Selector Content - The data to extract from the HTML element:
    • InnerText - Extract the text inside the HTML element. If the element contains nested HTML elements, their text is also extracted.
    • HTML - Extract the outer HTML code of the HTML element.
    • Attribute - Extract the value of a specific attribute of the HTML element.
  • Selector Attribute - Available when CSS Selector Content is set to Attribute. Defines the attribute to extract, for example href, rel, or title.

Note: When several elements match a CSS selector, all results are returned concatenated with a semicolon (;).
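The three content modes and the semicolon join can be modeled in a few lines of JavaScript. This is a minimal sketch, not Datablist's implementation: extractSelectorContent is a hypothetical name, and it assumes each element exposes the standard DOM members textContent, outerHTML and getAttribute, as the elements returned by document.querySelectorAll do. Like the console tests later in this guide, it uses textContent for InnerText, which may differ slightly from what the Bulk Scraper returns for hidden or script text.

```javascript
// Hypothetical model of the three "CSS Selector Content" modes.
// `elements` is any iterable of DOM-like elements, e.g. the result of
// document.querySelectorAll(...) in a browser console.
function extractSelectorContent(elements, content, attribute) {
  const values = Array.from(elements, (elem) => {
    switch (content) {
      case 'InnerText':
        // Text of the element and all of its nested elements.
        return elem.textContent;
      case 'HTML':
        // Outer HTML: the element's own tag plus everything inside it.
        return elem.outerHTML;
      case 'Attribute':
        // Value of one attribute; getAttribute returns null when it is missing.
        return elem.getAttribute(attribute);
      default:
        throw new Error(`Unknown content type: ${content}`);
    }
  });
  // Several matches are concatenated with a semicolon (null becomes "").
  return values.join(';');
}
```

In a browser console on a page containing the first example below, extractSelectorContent(document.querySelectorAll('.section.product-data .product-name'), 'InnerText') would return "New Phone".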

CSS Selector

The Selector Attribute field is available when CSS Selector Content is set to Attribute.

CSS Selector Attribute

Examples of CSS Selectors

To learn how to write CSS Selector paths, please read this guide.


Getting the text of an HTML element.

<div class="section product-data">
    <div class="product-name">New Phone</div>
</div>

The CSS selector would be .section.product-data .product-name with CSS Selector Content set to InnerText.


Getting the text of the first child div of an element with a custom HTML attribute:

<div data-testid="block-content">
    <div>Info To Scrape</div>
    <div>Useless Info</div>
    <div>Useless Info</div>
</div>

The CSS selector would be [data-testid="block-content"] > div:first-child with CSS Selector Content set to InnerText.


Getting the URLs for links:

<div class="social-media">
    <a href="https://fr.linkedin.com/company/datablist">Linkedin</a>
    <a href="https://www.twitter.com/datablist">Twitter</a>
</div>

The CSS selector would be .social-media a with CSS Selector Content set to Attribute and Selector Attribute set to href. Both links match, so the two URLs are returned separated by a semicolon.
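If a selector matches several links, as the two a elements in the snippet above would with an href attribute selector, their values come back in a single semicolon-separated cell. Below is a minimal sketch of splitting such a cell back into URLs and picking one by domain. It assumes no individual value contains a semicolon; splitScrapedValues and findByDomain are hypothetical helpers, not Datablist features.

```javascript
// Hypothetical helper: split a semicolon-joined result cell into values.
// Assumes no individual value contains ";" itself.
function splitScrapedValues(cell) {
  if (cell == null || cell === '') return [];
  return cell.split(';').map((value) => value.trim());
}

// Hypothetical helper: first value whose hostname is `domain` or a subdomain of it.
function findByDomain(values, domain) {
  return values.find((value) => {
    try {
      const host = new URL(value).hostname;
      return host === domain || host.endsWith('.' + domain);
    } catch {
      return false; // skip values that are not absolute URLs
    }
  });
}

const hrefs = splitScrapedValues(
  'https://fr.linkedin.com/company/datablist;https://www.twitter.com/datablist'
);
console.log(findByDomain(hrefs, 'linkedin.com')); // https://fr.linkedin.com/company/datablist
```

Matching on the hostname rather than on a substring of the whole URL avoids false positives such as a tracking link that merely mentions "linkedin.com" in its query string.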


How to test CSS Selectors

An easy way to test your CSS Selectors before running them in bulk is to open one of the target pages and run them in your browser's developer console.

To test for InnerText:

Array.from(document.querySelectorAll('{css-selector-path}')).map(elem => elem.textContent).join(';')
Test CSS Selector InnerText

To test for HTML:

Array.from(document.querySelectorAll('{css-selector-path}')).map(elem => elem.outerHTML).join(';')
Test CSS Selector HTML

To test for the content of an Attribute:

Array.from(document.querySelectorAll('{css-selector-path}')).map(elem => elem.getAttribute('{attribute}')).join(';')
Test CSS Selector Attribute

If you need help writing your CSS Selectors, please contact us.

Configure Regular Expressions

The second way to scrape data from several URLs is to use regular expressions. The Bulk Scraper matches the RegEx against the page's HTML source code.

If the pattern contains capturing groups, the text captured by the groups is returned. If there are no groups, the scraper returns the strings matching the whole pattern.
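The group-or-whole-match rule can be sketched with JavaScript's String.prototype.matchAll. This is an illustration under assumptions, not the Bulk Scraper's code: scrapeWithRegex is a hypothetical name, and it returns an array of results rather than whatever format the scraper writes to your collection.

```javascript
// Hypothetical sketch: return capturing groups when the pattern has them,
// otherwise the text matching the whole pattern.
function scrapeWithRegex(html, pattern) {
  // matchAll requires the "g" flag; add it if the caller left it out.
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const regex = new RegExp(pattern.source, flags);
  const results = [];
  for (const match of html.matchAll(regex)) {
    if (match.length > 1) {
      // match[1..n] are the capturing groups; skip groups that did not participate.
      results.push(...match.slice(1).filter((group) => group !== undefined));
    } else {
      // No capturing groups: match[0] is the whole match.
      results.push(match[0]);
    }
  }
  return results;
}

const page = '<a href="/p/1">A</a> <a href="/p/2">B</a>';
console.log(scrapeWithRegex(page, /href="([^"]+)"/)); // [ '/p/1', '/p/2' ]
```

Non-capturing groups such as (?:...) do not add entries to the match array, so a pattern that only uses them falls into the whole-match branch.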

Bulk Scrape with Regular Expressions

Capturing groups or pattern-matching

When writing a RegEx, you can add a capturing group using parentheses. When a capturing group is defined, the Bulk Scraper returns only the text captured by the group.

For example, in the HTML code:

Example HTML Code

To capture only the "US" text from the Shopify.country line, you would write:

Shopify\.country\s=\s"(\w+)";

Notice the parentheses in (\w+).

To capture the whole line, you would write:

Shopify\.country\s=\s"\w+";

Notice that I removed the parentheses.
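The two patterns can be tried in JavaScript on a hypothetical excerpt standing in for the page source (the original example page is not reproduced here). With the group, element 1 of the match holds only "US"; without it, element 0 holds the whole line.

```javascript
// Hypothetical excerpt standing in for the example page's HTML source.
const html = `<script>
  Shopify.country = "US";
</script>`;

// With a capturing group: index 1 holds only the captured text.
const withGroup = html.match(/Shopify\.country\s=\s"(\w+)";/);
console.log(withGroup[1]); // US

// Without a group: index 0 holds the whole matched line.
const withoutGroup = html.match(/Shopify\.country\s=\s"\w+";/);
console.log(withoutGroup[0]); // Shopify.country = "US";
```

Note that \s matches exactly one whitespace character, so a page written as Shopify.country="US"; (no spaces) would not match either pattern; \s* would tolerate that variation.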

Use Cases

Lead Generation

Bulk scraping allows you to enrich URLs from various sources, such as directories, social media platforms, and forums. Using CSS Selectors, bulk scraping lets you extract structured information from HTML pages.

Price Monitoring

E-commerce businesses can use URL scraping to monitor competitor websites and track product prices, discounts, and promotions. This information can be used for competitive intelligence and pricing strategies to stay competitive in the market.

Job Board Scraping

Job boards often contain valuable job posting information. Scraping URLs from job boards allows businesses to aggregate job postings automatically, providing insights into hiring trends, job requirements, and competitor recruitment strategies.