Datablist Extractor: Extract domains, email addresses, mentions, etc.
With Datablist Extractor, you can now extract the domains from a list of email addresses, or find all URLs in texts.
Domains, Emails, URLs, mentions (@xx), tags (#xx), etc. are structured entities to use later to enrich a company, a contact, or websites.
This was one of the most requested features, and it will work well with future enrichments (see "Notes on enrichments" below).
For the first release, the following extractors are available:
- Extract the domain from an email address
- Extract the domain from a URL
- Extract URL(s) from a text
- Extract mentions (ex: @name) from a text
- Extract tags (ex: #string) from a text
- Extract emails from a text
Feel free to contact me if you need other extractors.
Datablist Extractor is available from the "Edit" button.
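To give an idea of what these extractors do, here is a minimal Python sketch. It is not Datablist's implementation: the regular expressions are simplified assumptions, and the function names (`domain_from_email`, `domain_from_url`, `extract_all`) are made up for this example.

```python
import re
from urllib.parse import urlparse

# Simplified patterns for illustration only (assumptions, not Datablist's rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")
# The lookbehind skips the "@" found inside an email address.
MENTION_RE = re.compile(r"(?<![\w@.])@(\w+)")
TAG_RE = re.compile(r"(?<!\w)#(\w+)")


def domain_from_email(email: str) -> str:
    """Return the lowercased part after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def domain_from_url(url: str) -> str:
    """Return the hostname of a URL, adding a scheme when it is missing."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def extract_all(text: str) -> dict:
    """Find emails, URLs, mentions and tags in a free text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "tags": TAG_RE.findall(text),
    }
```

For example, `extract_all("Ping @alice at alice@example.com, see https://example.com/docs #launch")` finds one email, one URL, one mention and one tag. Real-world text has more edge cases (trailing punctuation after URLs, international domains) than these patterns cover.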
Deduplication with Fuzzy Matching
Datablist Duplicates Finder is getting better with fuzzy matching. Fuzzy comparisons work by calculating the similarity between two strings with a distance function. A threshold then lets you decide when two strings must be considered similar.
Datablist implements two distance algorithms.
The threshold goes from 20 to 100, where 100 means an exact match. The default value is 80.
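Here is a small Python sketch of how a distance function and a threshold fit together. I use the Levenshtein edit distance as an example; that choice is an assumption for illustration and not necessarily one of the algorithms Datablist uses. The function names and the way the score is normalized to 0-100 are also hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score from 0 to 100, where 100 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def is_similar(a: str, b: str, threshold: float = 80) -> bool:
    """Two strings are considered similar when their score reaches the threshold."""
    return similarity(a, b) >= threshold
```

With the default threshold of 80, "Jonathan Smith" and "Jonathan Smyth" are considered similar (one substitution over 14 characters), while "Acme" and "Apple" are not. Lowering the threshold catches more typos but also returns more false positives.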
Apollo.io People and Company enrichments
This summer, I've added two enrichments connected to the Apollo.io API. One for people and the other for companies.
Apollo.io People Enrichment
The enrichment is connected to Apollo.io People Enrichment. With at least a name and a company domain (or email address), Apollo returns all the business data for your contacts.
The returned values include:
- Email Address
- Phone Number
- LinkedIn Profile URL
- Address (city, state, country)
- Company name, website, LinkedIn URL
Apollo's free tier is generous for API calls. You get 600 enrichments per day using their API. Create an account on Apollo.io, and get an API Key at https://developer.apollo.io/keys/.
Apollo.io Company Enrichment
In addition to the Apollo.io People Enrichment, Datablist now has an enrichment for company data using the Apollo.io API.
It takes a company domain (or URL) and returns:
- Company Name
- LinkedIn URL
- Twitter URL
- Facebook URL
- Crunchbase URL
- AngelList URL
- Phone Number
- Founded Year
- Number of employees
Notes on enrichments
For enrichments, the first step I see is a revamp of the "Enrichment Runner" to make it simpler to use and to handle errors better. Datablist will get connected to more third-party APIs to enrich people, email addresses, and companies, and will offer some native premium enrichments to be used with the Datablist Credits System.
Each data provider has its specificities: some work with LinkedIn URLs, others with email addresses, and some are best suited for the USA or Europe. Costs add up when you have to subscribe to each provider. Datablist will help you save money with those integrations.
Contact me if you want to share ideas and/or suggest integrations.
Generate PDF for a list of URLs
This enrichment takes a URL, opens a headless Chrome browser, and triggers a print. The result is saved and a download link is returned for each URL.
You can specify the page orientation.
New domain output for the Free Email Validator
Datablist's free email validation service now returns the domain of each email address in your list.
Combined with the "Business Email" output, which returns True if the domain is not from a generic email provider (Gmail, Yahoo, etc.), you can get company data from your email list with the Apollo.io Company Enrichment.
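As a rough sketch of that workflow in Python: keep only the business emails, then collect their domains to feed a company enrichment. The provider list below is a tiny hypothetical sample (a real list is much longer), and the function names are made up for this example.

```python
# Hypothetical sample of generic email providers (assumption, far from complete).
GENERIC_PROVIDERS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}


def email_domain(email: str) -> str:
    """Return the lowercased domain of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def is_business_email(email: str) -> bool:
    """True when the domain is not a known generic email provider."""
    return email_domain(email) not in GENERIC_PROVIDERS


def company_domains(emails: list) -> list:
    """Unique business domains, in order of first appearance."""
    seen = []
    for email in emails:
        domain = email_domain(email)
        if is_business_email(email) and domain not in seen:
            seen.append(domain)
    return seen
```

Calling `company_domains` on a contact list gives one domain per company, so each company is enriched only once.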
Convert timestamp to Datetime
A new data type conversion is available to get a Datetime from a Unix timestamp. A timestamp represents a date as the number of seconds elapsed since the Unix Epoch (January 1st, 1970 at 00:00 UTC). Datablist detects timestamps in seconds or milliseconds and returns a formatted Datetime.
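A minimal Python sketch of such a conversion is below. The rule used to tell seconds from milliseconds (a magnitude threshold of 1e11) is my assumption for this example and not necessarily how Datablist detects the unit. The name `timestamp_to_datetime` is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: values at or above 1e11 are milliseconds.
# 1e11 seconds is past the year 5000, while 1e11 milliseconds is in 1973.
MILLISECONDS_THRESHOLD = 100_000_000_000


def timestamp_to_datetime(value) -> datetime:
    """Convert a Unix timestamp (seconds or milliseconds) to a UTC datetime."""
    ts = float(value)
    if abs(ts) >= MILLISECONDS_THRESHOLD:
        ts /= 1000.0  # milliseconds to seconds
    return datetime.fromtimestamp(ts, tz=timezone.utc)


print(timestamp_to_datetime(1700000000).isoformat())
# 2023-11-14T22:13:20+00:00
print(timestamp_to_datetime("1700000000000").isoformat())
# 2023-11-14T22:13:20+00:00
```

The trade-off of a magnitude threshold is that millisecond timestamps from the first years after 1970 would be read as seconds, which is rarely a problem with real data.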
Improvement with Copy-Pasting
In spreadsheet tools, pasting tabulated data overwrites the cell's values. With Datablist, and its structured data and items, pasting data creates new items.
This is what users are expecting 90% of the time (I think). And still, copy-pasting to edit multiple cell values in bulk is great.
Datablist should be able to perform both. A first iteration has been deployed to edit several cells after pasting tabulated data.
For now, it only works when the pasted data has one column. Datablist shows a confirmation dialog to ask whether it must create new items or edit the current cells.
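The decision can be sketched in a few lines of Python. This is only an illustration of the rule described above, assuming the clipboard holds tab-separated text; `parse_tabulated`, `paste_options` and the option names are hypothetical.

```python
def parse_tabulated(text: str) -> list[list[str]]:
    """Split clipboard text into rows and tab-separated cells."""
    rows = text.rstrip("\r\n").splitlines()
    return [row.split("\t") for row in rows]


def paste_options(text: str) -> list[str]:
    """Return the actions to offer for the pasted text."""
    rows = parse_tabulated(text)
    single_column = bool(rows) and all(len(row) == 1 for row in rows)
    if single_column:
        # One column: ask whether to create items or edit the current cells.
        return ["create_items", "edit_cells"]
    # Several columns: pasting creates new items.
    return ["create_items"]
```

So `paste_options("a\nb\nc\n")` offers both actions, while `paste_options("a\tb\nc\td")` only offers item creation.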
Another change has been released to improve the text written to the clipboard on a "copy" action. If you copy to the clipboard (ctrl+c) and get something that doesn't feel right, please tell me.
Other Improvements & Fixes
- Show memory error notification. To get fast interactions, Datablist uses a local database that lives inside your web browser. When importing a CSV file, Datablist stores the data in this database and synchronizes it with Datablist servers (when Cloud Syncing is enabled). Web browsers may prevent Datablist from storing data. This happens during private browsing with some web browsers, or when your hard drive is full. Datablist now shows an error notification when it can't store data locally.
- Improve value uniqueness processing. It now runs after cell editing and copy-pasting.
- Fix import for CSV files with multiple similar headers
- Import TXT files with a single line and only comma-separated values
- Skip deleted properties during full-text search