Datablist Extractor: Extract domains, email addresses, mentions, etc.

With Datablist Extractor, you can now extract the domains from a list of email addresses, or find all the URLs in a text.

Domains, emails, URLs, mentions (@xx), tags (#xx), etc. are structured entities that you can use later to enrich a company, a contact, or a website.

This ranked high among the requested features, and it will play nicely with future enrichments (see "Notes on enrichments" below).

For the first release, the following extractors are available:

  • Extract the domain from an email address
  • Extract the domain from a URL
  • Extract URL(s) from a text
  • Extract mentions (ex: @name) from a text
  • Extract tags (ex: #string) from a text
  • Extract emails from a text
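To give an idea of what these extractors do, here is a minimal Python sketch. The regular expressions and function names are my own assumptions for illustration, not Datablist's actual implementation, and they only cover common cases.

```python
import re
from urllib.parse import urlparse

# Illustrative patterns only (assumptions, not Datablist's rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
# The lookbehind skips the "@" found inside an email address.
MENTION_RE = re.compile(r"(?<![\w.+%-])@\w+")
# The lookbehind skips "#" fragments glued to a word, such as "page#top".
TAG_RE = re.compile(r"(?<![\w#])#\w+")


def domain_from_email(email):
    """Return the lowercased part after the last '@', or '' if there is none."""
    if "@" not in email:
        return ""
    return email.strip().rpartition("@")[2].lower()


def domain_from_url(url):
    """Return the lowercased hostname of a URL (scheme is optional)."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def extract_emails(text):
    return EMAIL_RE.findall(text)


def extract_urls(text):
    # Drop punctuation that usually belongs to the sentence, not the URL.
    return [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]


def extract_mentions(text):
    return MENTION_RE.findall(text)


def extract_tags(text):
    return TAG_RE.findall(text)
```

For example, calling `extract_mentions` on a text that contains both an email address and `@john` returns only the mention, because the "@" inside the email address is preceded by a word character.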

Feel free to contact me if you need other extractors.

Datablist Extractor is available from the "Edit" button.

Deduplication with Fuzzy Matching

Datablist Duplicates Finder is getting better with fuzzy matching. Fuzzy comparisons work by calculating the similarity between two strings with a distance function, and a threshold lets you decide when two strings must be considered similar.

Fuzzy matching is perfect for finding duplicate leads with typos in people or company names, or for finding items with the same postal address written with variations.

Datablist implements two distance algorithms.

The threshold ranges from 20 to 100, where 100 requires an exact match. The default value is 80.
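Here is a rough Python sketch of how a similarity score and a threshold work together. It uses the ratio from Python's standard `difflib` module as a stand-in similarity measure. This is an assumption for illustration, not one of the algorithms Datablist runs, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity score between 0 and 100 for two strings.

    Stand-in measure (difflib ratio); not Datablist's distance function.
    """
    a, b = a.strip().lower(), b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() * 100


def is_similar(a, b, threshold=80):
    """Treat two strings as duplicates when their score reaches the threshold."""
    if not 20 <= threshold <= 100:
        raise ValueError("threshold must be between 20 and 100")
    return similarity(a, b) >= threshold
```

With the default threshold of 80, "Jon Smith" and "John Smith" are flagged as similar, while "Acme" and "Globex" are not. Setting the threshold to 100 only accepts strings that are identical after normalization.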

Apollo.io People and Company enrichments

This summer, I've added two enrichments connected to the Apollo.io API. One for people and the other for companies.

Apollo.io People Enrichment

The enrichment is connected to Apollo.io People Enrichment. With at least a name and a company domain (or email address), Apollo returns the business data it has for your contacts.

Among the returned values, you will find:

  • Email Address
  • Phone Number
  • Title
  • Seniority
  • LinkedIn Profile URL
  • Address (city, state, country)
  • Company name, website, LinkedIn URL

Apollo's free tier is generous for API calls: you get 600 enrichments per day using their API. Create an account on Apollo.io, and get an API Key at https://developer.apollo.io/keys/.

Apollo.io Company Enrichment

In addition to the Apollo.io People Enrichment, Datablist now has an enrichment for company data using the Apollo.io API.

It takes a company domain (or URL) and returns:

  • Company Name
  • Website
  • LinkedIn URL
  • Twitter URL
  • Facebook URL
  • Crunchbase URL
  • AngelList URL
  • Address/Country
  • Phone Number
  • Industry
  • Founded Year
  • Number of employees

Notes on enrichments

Datablist Enrichments will be my next focus. Now that the foundation for data cleaning and data consolidation is done, I can move to the next layer.

For enrichments, I first plan to revamp the "Enrichment Runner" to make it simpler to use and better at handling errors. Datablist will get connected to more third-party APIs to enrich people, email addresses, and companies, and will offer some native premium enrichments to be used with the Datablist Credits System.

Each data provider has its own specificities: some work with LinkedIn URLs, others with email addresses, and some are best suited for the USA or Europe. Costs add up when you have to subscribe to each provider. Datablist will help you save money with those integrations.

Contact me if you want to share ideas and/or suggest integrations.

Generate PDF for a list of URLs

This enrichment takes a URL, opens a headless Chrome browser, and triggers a print. The result is saved and a download link is returned for each URL.

You can specify the page orientation.

Improvements

New domain output for the Free Email Validator

Datablist's free email validation service now returns the domain of each email address.

The "Business Email" output returns True if the domain is not from a generic email provider (Gmail, Yahoo, etc.). Combined with it, the new domain output lets you get company data from your email list with the Apollo.io Company Enrichment.
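As a small illustration, here is a Python sketch of the domain and "Business Email" outputs. The provider list is a short, made-up sample and the function name is hypothetical. The real validator relies on its own list and checks.

```python
# Sample list for illustration only (assumption, far from exhaustive).
GENERIC_PROVIDERS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def email_domain_info(email):
    """Return (domain, is_business_email) for a single email address."""
    domain = email.strip().rpartition("@")[2].lower()
    is_business = bool(domain) and domain not in GENERIC_PROVIDERS
    return domain, is_business


# Keep only the domains worth sending to a company enrichment.
emails = ["jane@acme.com", "bob@gmail.com", "sam@acme.com"]
company_domains = sorted(
    {domain for domain, is_business in map(email_domain_info, emails) if is_business}
)
```

Deduplicating the domains before running a company enrichment avoids spending several API calls on the same company.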

Convert timestamp to Datetime

A new data type conversion is available to get a Datetime from a Unix timestamp. A timestamp represents a date as the number of seconds elapsed since the Unix Epoch (January 1st, 1970 at 00:00 UTC). Datablist detects whether a timestamp is in seconds or milliseconds and returns a formatted Datetime.
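The conversion can be sketched in a few lines of Python. The cutoff used to tell seconds from milliseconds is my own assumption (values of 100 billion or more are read as milliseconds). The exact rule Datablist applies is not described here.

```python
from datetime import datetime, timezone

# Assumed cutoff: 1e11 seconds is beyond the year 5000, so larger
# values are far more likely to be milliseconds.
MILLISECONDS_CUTOFF = 1e11


def timestamp_to_datetime(value):
    """Convert a Unix timestamp in seconds or milliseconds to a UTC datetime."""
    value = float(value)
    if abs(value) >= MILLISECONDS_CUTOFF:
        value /= 1000  # milliseconds to seconds
    return datetime.fromtimestamp(value, tz=timezone.utc)
```

With this heuristic, `timestamp_to_datetime(1700000000)` and `timestamp_to_datetime(1700000000000)` give the same moment on November 14th, 2023.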

Improvement with Copy-Pasting

In spreadsheet tools, pasting tabulated data overwrites the values of the existing cells. With Datablist, and its structured data and items, pasting data creates new items.

This is what users expect 90% of the time (I think). Still, copy-pasting to edit multiple cell values in bulk is great.

Datablist should be able to do both. A first iteration has been deployed to edit several cells after pasting tabulated data.

For now, it only works when the pasted data has one column. Datablist shows a confirmation dialog to ask whether it must create new items or edit the current cells.

Another change has been released to improve what text is placed on the clipboard on a "copy" action. If you copy to the clipboard (ctrl+c) and get something that doesn't feel right, please tell me.
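The single-column rule can be pictured with this short Python sketch. It assumes the clipboard holds tab-separated cells with one row per line. The function names and option labels are hypothetical, and the real logic lives in the Datablist web app.

```python
def parse_clipboard(text):
    """Split pasted text into rows of cells (tabs between cells, one row per line)."""
    lines = text.replace("\r\n", "\n").rstrip("\n").split("\n")
    return [line.split("\t") for line in lines]


def paste_options(rows):
    """Return the choices to offer the user after a paste."""
    single_column = bool(rows) and all(len(row) == 1 for row in rows)
    if single_column:
        # One column: ask whether to create items or edit the current cells.
        return ["create_items", "edit_cells"]
    # Several columns: pasting always creates new items.
    return ["create_items"]
```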

Other Improvements & Fixes

  • Show memory error notification. To get fast interactions, Datablist uses a local database that lives inside your web browser. When importing a CSV file, Datablist stores the data in this database and synchronizes it with Datablist servers (when Cloud Syncing is enabled). Web browsers may prevent Datablist from storing data. This happens during private browsing with some web browsers, or when your hard drive is full. Datablist now shows an error notification when it can't store data locally.
  • Improve value uniqueness processing. It now runs after cell edits and copy-pasting.
  • Fix import for CSV files with multiple similar headers
  • Import TXT files with a single line and only comma-separated values
  • Skip deleted properties during full-text search