Changelog

New features, improvements and fixes to Datablist.

April 2023

Deduplicate items across collections

I use Datablist to create lists of prospects. I have lists of companies from LinkedIn, a list from my user base, lists from scraping, company databases, etc.

All those lists have different properties. So, it doesn't make sense to create a single list to manage all my prospects. I like to keep them in different collections.

Until now, I couldn't check for duplicate leads across all of my prospect collections. Judging from the feedback I received, I was not the only one with this issue.

In April, I made big changes to the Duplicates Finder. I enabled deduplication across multiple collections, and I moved the Duplicates Finder from an exact-match algorithm to a probabilistic one.

I'm very confident this feature will help you deal with your lists of contacts the way it helps me. It's great to find engaged leads who appear in several communities, and to cross-check them against your user base.
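To make cross-collection deduplication concrete, here is a minimal sketch that groups items from several collections by a shared key, so a lead present in two lists lands in one group. The collection names, the `email` field and the lowercase-email key are hypothetical; the real Duplicates Finder uses a probabilistic similarity score rather than a single exact key.

```python
from collections import defaultdict


def find_cross_collection_duplicates(collections, key_field):
    """Group items from several collections by a normalized key value.

    `collections` maps a collection name to a list of item dicts.
    Only groups containing more than one item are returned.
    """
    groups = defaultdict(list)
    for name, items in collections.items():
        for item in items:
            value = (item.get(key_field) or "").strip().lower()
            if value:  # items without a key value cannot be grouped
                groups[value].append((name, item))
    return {key: members for key, members in groups.items() if len(members) > 1}


collections = {
    "linkedin_companies": [{"email": "Jane@Acme.com"}],
    "user_base": [{"email": "jane@acme.com"}, {"email": "bob@example.com"}],
}
dupes = find_cross_collection_duplicates(collections, "email")
print({k: [name for name, _ in v] for k, v in dupes.items()})
# {'jane@acme.com': ['linkedin_companies', 'user_base']}
```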

You can check our updated Duplicates Finder documentation to learn more.

Improved deduplication algorithm

Match duplicate items that have empty values

Building a deduplication algorithm is complex. A brute-force algorithm doesn't scale well. A list of 200 000 items generates 200 000*199 999/2 = 19 999 900 000 unique item pairs.
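The pair count above is n(n-1)/2, so the work of a brute-force comparison grows quadratically with the list size. A quick check of the arithmetic (the helper name is just for illustration):

```python
def unique_pairs(n: int) -> int:
    """Number of unordered item pairs a brute-force comparison must score."""
    return n * (n - 1) // 2


print(f"{unique_pairs(200_000):,}")  # 19,999,900,000
```

Doubling the list to 400 000 items roughly quadruples the number of pairs, which is why scoring every pair is not practical for large lists.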

The previous "Duplicates Finder" algorithm was fast but only worked for exact matches. If you ran it on the "names", "email addresses" and "company websites" of a lead collection, it only found duplicate items that had exactly the same values.

If a lead had an empty company website, or no email address, the lead was often ignored.

With the new deduplication algorithm, the Duplicates Finder finds duplicate items even when some values are empty. It computes a similarity score between items that works with incomplete data.
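One simple way to build a score that tolerates missing data is to compare only the fields where both items have a value. The sketch below assumes that rule and an equal weight per field; it is an illustration, not Datablist's actual scoring.

```python
from typing import Dict, List


def similarity(a: Dict[str, str], b: Dict[str, str], fields: List[str]) -> float:
    """Share of comparable fields whose values match (case-insensitive).

    A field is comparable only when both items have a non-empty value,
    so an empty company website neither helps nor hurts the score.
    """
    comparable = 0
    matches = 0
    for field in fields:
        va = (a.get(field) or "").strip().lower()
        vb = (b.get(field) or "").strip().lower()
        if not va or not vb:
            continue  # missing data: skip instead of counting a mismatch
        comparable += 1
        if va == vb:
            matches += 1
    return matches / comparable if comparable else 0.0


fields = ["name", "email", "website"]
lead_a = {"name": "James Bond", "email": "james@acme.com", "website": "acme.com"}
lead_b = {"name": "james bond", "email": "james@acme.com", "website": ""}
print(similarity(lead_a, lead_b, fields))  # 1.0
```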

You can check our updated Duplicates Finder documentation to learn more.

Probabilistic similarity score

As I said above, the Duplicates Finder now uses a similarity score to find duplicate items. Datablist takes two items and calculates the similarity between them.

It opens a lot of possibilities to compare items that are not 100% similar. I've released two new algorithms to find duplicate items with minor differences.

The first one is the "Smart Algorithm":

  • It removes all spaces and punctuation characters (before, after, between words)
  • It matches words in different orders
  • It removes the URL protocol for URL comparison

For example:

Item Id | Full Name | Company Website
00001 | James-Bond | https://www.acme.com
00002 | bond james | http://www.acme.com
00003 | james bond |

Would all pop up as duplicate items.
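The three normalization steps of the "Smart Algorithm" can be sketched as follows. The helper names are hypothetical, and sorting the words is one assumed way to make word order irrelevant. The empty website on item 00003 would be handled by the empty-value rule described earlier, not by these keys.

```python
import re


def smart_name_key(text: str) -> str:
    """Lowercase, split on anything that is not a letter or digit,
    sort the words so order does not matter, and join them without spaces."""
    words = re.findall(r"[^\W_]+", text.lower())
    return "".join(sorted(words))


def smart_url_key(url: str) -> str:
    """Remove the URL protocol (http://, https://, ...) before comparing."""
    return re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.strip().lower())


for name in ("James-Bond", "bond james", "james bond"):
    print(smart_name_key(name))  # bondjames (three times)
print(smart_url_key("https://www.acme.com"), smart_url_key("http://www.acme.com"))
# www.acme.com www.acme.com
```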

The second algorithm uses the "Metaphone" phonetic algorithm. It converts texts to codes to match similar-sounding words.

For example:

Item Id | Full Name | Company
00001 | Filip Dupon | google
00002 | Dupont-Philip | GOOGL
00003 | Dupond philippe | gogle

Would be flagged as duplicate items.
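Metaphone itself has dozens of rules. The sketch below is not Metaphone: it is a deliberately tiny, hypothetical phonetic key with three Metaphone-like rules ("ph" sounds like "f", doubled letters collapse, vowels after the first letter are dropped). It is enough to show why "Filip", "Philip" and "philippe", or "google", "GOOGL" and "gogle", can end up with the same code. "Dupon" and "Dupont" would still differ, which is where a similarity score on top of the codes comes in. For real use, a library implementation such as `metaphone` in the Python `jellyfish` package is a better starting point.

```python
import re

VOWELS = set("aeiouy")


def toy_phonetic_key(word: str) -> str:
    """A tiny Metaphone-inspired key. NOT the real Metaphone algorithm."""
    w = re.sub(r"[^a-z]", "", word.lower())  # keep ASCII letters only
    w = w.replace("ph", "f")                 # "ph" sounds like "f"
    collapsed = []
    for ch in w:                             # collapse doubled letters
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    if not collapsed:
        return ""
    first, rest = collapsed[0], collapsed[1:]
    # keep the first letter, drop the vowels that follow it
    return first + "".join(ch for ch in rest if ch not in VOWELS)


print([toy_phonetic_key(w) for w in ("Filip", "Philip", "philippe")])  # ['flp', 'flp', 'flp']
print([toy_phonetic_key(w) for w in ("google", "GOOGL", "gogle")])     # ['ggl', 'ggl', 'ggl']
```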

You can check our updated Duplicates Finder documentation to learn more.

Optimized duplicate group listing and merging for large lists

And one more thing: I've improved the Duplicates Finder results page to scale to thousands of duplicate groups. Previously, the page could freeze when a lot of items were flagged in duplicate groups.

The new page loads items on demand, so it scales to thousands of items.

A new "Don't process" action was added. It removes the duplicate group from the results listing. Skipped groups are ignored during the "Auto Merge" action.

New enrichments

Name Parser

Return the gender, country, and all name parts (First Name, Last Name, Title, etc.) from a person's full name.

Extract the name from an email address

Use probabilistic analysis to parse an email address and extract a first name and a last name.
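For the common "first.last@domain" pattern, a rule-based sketch already gets a long way. This is a simpler stand-in, not the probabilistic analysis described above: it cannot guess names from addresses like "jbond@acme.com", and the helper name is hypothetical.

```python
import re
from typing import Optional, Tuple


def name_from_email(email: str) -> Optional[Tuple[str, str]]:
    """Guess (first_name, last_name) from addresses like first.last@domain.

    Returns None unless the local part holds exactly two alphabetic
    chunks separated by '.', '_' or '-'.
    """
    local_part = email.split("@", 1)[0]
    local_part = local_part.split("+", 1)[0]  # drop "+tag" suffixes
    chunks = [c for c in re.split(r"[._-]+", local_part) if c]
    if len(chunks) != 2 or not all(c.isalpha() for c in chunks):
        return None
    first, last = chunks
    return first.capitalize(), last.capitalize()


print(name_from_email("james.bond@acme.com"))  # ('James', 'Bond')
print(name_from_email("jbond@acme.com"))       # None
```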

Location Lookup

Return the City, Country, Latitude, and Longitude for a location. Read our new guide to extract the City and Country from a list of addresses.

Improvements & Fixes

  • Fix auto-detection of the data type for numbers with more than 22 digits. They are now imported as Text.
  • Fix the issue with running enrichments before the credits balance is loaded
  • Fix the issue with running enrichments before the enrichment options are loaded
  • Change Payment Method and Password directly in your Datablist account
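The 22-digit rule in the first fix can be illustrated with a minimal type-detection sketch. `detect_type` is a hypothetical helper, not Datablist's detector: it only recognizes plain decimal numbers and applies the same "more than 22 digits means Text" cutoff.

```python
import re

MAX_NUMBER_DIGITS = 22  # threshold taken from the changelog entry above


def detect_type(value: str) -> str:
    """Return "Number" for plain decimal strings, "Text" otherwise.

    Digit strings longer than MAX_NUMBER_DIGITS are kept as Text.
    """
    v = value.strip()
    if not re.fullmatch(r"[+-]?\d+(\.\d+)?", v):
        return "Text"
    digit_count = sum(ch.isdigit() for ch in v)
    return "Text" if digit_count > MAX_NUMBER_DIGITS else "Number"


print(detect_type("42"), detect_type("1" * 23))  # Number Text
```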

March 2023

Move items between collections

In March, I released a new feature to move items between two collections. Moving items is useful to clean and segment your data. You can move items once they are enriched, or split your master collection into sub-collections.

Read our documentation to learn how to move items between collections.

JavaScript code

Save JavaScript code into your code library

Writing JavaScript code is both complex and powerful. You can write JavaScript code to fill a property using data from the other properties (for example to set a "valid" property based on the value of other properties). Or you can edit your data with complex operations that would be impossible with simple spreadsheet formulas.

But rewriting your JavaScript code every time is error-prone. With the new "Code Library" released in March, you can save your JavaScript code in your account and run it directly.

Read our documentation or contact me if you need help writing JavaScript.

Call APIs from your JavaScript code

I've lifted the restrictions on JavaScript code for standard users. You can now write JavaScript code that interacts with external APIs using the fetch interface.

Check our documentation or contact me to discuss your use case.

Datablist API for standard users

Another new release to help you build complex workflows on Datablist: the opening of the Datablist API. The Datablist API is restricted to standard users.

It works with "Personal API Keys" that let you get access tokens to interact with Datablist API.

Please check our Developers' Documentation and our Postman collection.

Enrichments improvements

Save enrichment configuration

Previously, you had to set the enrichment settings and configuration every time you opened the enrichment drawer.

This was not ideal for day-to-day use. And you could make mistakes during the mapping.

Your settings and property mappings are now saved in your browser. When you open an enrichment, the configuration is automatically filled in based on your previous run.

Settings with text values can be sensitive. Some enrichments use settings to pass an "API Key", for example. To prevent your setting values from being accessed, they are encrypted with a 256-bit key.

This feature is enabled by default. You can disable it by clicking the setting icon at the bottom of the enrichment drawer.

Overwrite items with enrichment results

Another improvement with Enrichments is the "Overwrite value" option. By default, Datablist doesn't edit your cell if it already contains data.

With this option enabled, the enrichment results will overwrite existing values.

New enrichments

Moz.com

If you are managing company leads, you will like the new "Moz.com" integration. It lets you process domains to get domain authority, the number of backlinks, etc. from Moz.

Entities Extractor

Extract company names, person names, or locations from any text. This action uses machine learning to process your data automatically.

The model is trained on Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese, and Chinese.

GPS Coordinates Finder

This enrichment uses Bing Maps API to get Latitude and Longitude coordinates from an address.

Improve export for large collections

Datablist has a 1.5-million-row limit for CSV files, but you can import bigger CSV files by splitting them and running multiple imports. There is no hard limit on the number of items a collection can store; it depends on your browser's database.

I improved the export mechanism to work with collections containing several million items. A progress notification now shows how many items have been collected for the export file.
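Splitting a large CSV file before importing it can be done with Python's standard `csv` module. This sketch repeats the header row in every part so each file can be imported on its own. The helper name, the UTF-8 assumption and the 1 000 000 rows-per-part default (an arbitrary value below the 1.5 million limit) are all illustrative choices.

```python
import csv
from pathlib import Path
from typing import List


def split_csv(path: str, rows_per_file: int = 1_000_000) -> List[Path]:
    """Split a CSV file into numbered parts, repeating the header in each part."""
    src = Path(path)
    parts: List[Path] = []
    out = None
    try:
        with src.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:
                return parts  # empty file: nothing to split
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:  # start a new part file
                    if out is not None:
                        out.close()
                    part = src.with_name(f"{src.stem}_part{len(parts) + 1}{src.suffix}")
                    out = part.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(part)
                writer.writerow(row)
    finally:
        if out is not None:
            out.close()
    return parts
```

Calling `split_csv("leads.csv")` on a 2.5-million-row file would produce `leads_part1.csv` to `leads_part3.csv`, each importable separately.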

And two options have been added to deal with exports of large collections. You can now set a count and an offset parameter to export your collection into several files.
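To export a large collection into several files, each export gets its own offset and count. This sketch computes those pairs for a given collection size; the function name and the example file size of 1 000 000 items are illustrative.

```python
from typing import List, Tuple


def export_batches(total_items: int, count: int) -> List[Tuple[int, int]]:
    """Return (offset, count) pairs that cover a collection in consecutive files."""
    if count <= 0:
        raise ValueError("count must be positive")
    return [
        (offset, min(count, total_items - offset))
        for offset in range(0, total_items, count)
    ]


print(export_batches(2_500_000, 1_000_000))
# [(0, 1000000), (1000000, 1000000), (2000000, 500000)]
```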

Improvements & Fixes

  • Improve LinkedInProfileFinder and fix throttling errors
  • Show how many items are currently processing during an action/enrichment run
  • Fix copy-pasting when the drawer is open
  • New Number to Text conversion in "Clean -> Text <=> Number"
  • Add "Line Break" delimiter for "Merge Properties"
  • New mathematical operations for numbers in Bulk Edit: Add, Subtract, Multiply, Divide.
  • Fix sorting on native collection properties "createdAt" and "updatedAt"
  • Prevent running JavaScript code if the preview raises an error
  • New Search engine in the documentation
  • New documentation page for "Run JavaScript"
  • Fix filtering on equal DateTime comparison

February 2023

Split property

During data cleaning and data normalization, you often need to split text from a property into multiple properties. This is useful to get the domain from email addresses, or to split a Full Name into First Name and Last Name parts.

The splitting algorithm takes a property, a delimiter, and the number of parts. Check the split property documentation.
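A split with a property value, a delimiter and a number of parts maps naturally onto `str.split` with a `maxsplit` argument. The behavior for edge cases is an assumption in this sketch: extra pieces stay in the last part, and missing parts are filled with empty strings.

```python
from typing import List


def split_property(value: str, delimiter: str, parts: int) -> List[str]:
    """Split `value` into exactly `parts` strings.

    Assumption: extra pieces stay in the last part; missing parts are "".
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    pieces = value.split(delimiter, parts - 1)  # at most `parts` pieces
    return pieces + [""] * (parts - len(pieces))  # pad missing parts


print(split_property("james.bond@acme.com", "@", 2))  # ['james.bond', 'acme.com']
print(split_property("John Ronald Tolkien", " ", 2))  # ['John', 'Ronald Tolkien']
```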

Merge properties

And the inverse of "Split property" is the "Merge Properties" tool. It takes multiple properties and one or more delimiter characters. Check the merge properties documentation.
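Merging is a join over the selected property values. In this sketch, skipping empty values (so no doubled delimiters appear) and trimming surrounding whitespace are assumptions; the real tool may keep them.

```python
from typing import Iterable


def merge_properties(values: Iterable[str], delimiter: str = " ") -> str:
    """Join property values with `delimiter`.

    Assumption: empty values are skipped and surrounding spaces trimmed.
    """
    return delimiter.join(v.strip() for v in values if v and v.strip())


print(merge_properties(["James", "Bond"]))                   # James Bond
print(merge_properties(["12 Main St", "", "Paris"], ", "))   # 12 Main St, Paris
```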

New Bulk Edit functions

Three new edit functions are available for bulk editing.

Miscellaneous

  • Custom API key for Deepl - Premium users can now use their own Deepl API key to run the Deepl action and translate CSV files.
  • Allow async functions in "Run JavaScript"

Fixes

  • Fix copy-pasting with multiline text and delimiters other than commas

January 2023

In December 2022, I published a blog article introducing Datablist for Lead Management. I shared a preview of the next features to be developed on Datablist.

In January, the first iteration was released with tons of new features!

New header

A new collection header was released in January. It is responsive and easier to read.

It is split into 3 parts: Collection Information, Search and Filters, and Actions. Filters can be saved with "Saved Filters" (more about that later).

Actions are grouped in 5 menus: Import, Export, Clean, Enrich, and Edit. Read our Datablist for Lead Management post to learn more about that.

The column headers have been improved. The items count is available on the top right, and clicking a column opens a menu to manage its property.

A "New Property" shortcut has been added to the right of the columns.

Clean and Edit features

Cleaning and editing data is a strong focus for 2023 (see our Datablist 2022 in review blog post).

In January, I added two long-awaited features: Find and Replace and Bulk Edit. You can read the Find and Replace documentation and the Bulk Edit documentation to learn more.

DataType conversion

A core concept with Datablist is to work with data types. In spreadsheet tools, data is mostly text with some "formatting". Working with date, boolean, number, etc. is a pain in Google Sheets or Microsoft Excel.

A new DataType conversion tool has been released in January to quickly create Datetime, Number, and Checkbox properties from a Text property. Check our Text to Datetime, Number, Checkbox documentation.
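As an illustration of Text to Number, Checkbox and Datetime conversion, here is a minimal sketch. The accepted checkbox words, the "comma is a thousands separator" rule and the date formats are assumptions chosen for the example; Datablist's converter may accept other inputs. Values that cannot be converted return `None` (an empty cell).

```python
from datetime import datetime
from typing import Optional, Sequence

TRUE_VALUES = {"true", "yes", "y", "1", "x"}   # assumed truthy words
FALSE_VALUES = {"false", "no", "n", "0", ""}   # assumed falsy words


def to_checkbox(text: str) -> Optional[bool]:
    v = text.strip().lower()
    if v in TRUE_VALUES:
        return True
    if v in FALSE_VALUES:
        return False
    return None  # unrecognized value: leave the cell empty


def to_number(text: str) -> Optional[float]:
    try:
        # assumption: "," is a thousands separator, "." the decimal point
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None


def to_datetime(
    text: str,
    formats: Sequence[str] = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y"),
) -> Optional[datetime]:
    for fmt in formats:  # try each assumed format in order
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


print(to_number("1,234.5"), to_checkbox("Yes"), to_datetime("2023-01-15"))
# 1234.5 True 2023-01-15 00:00:00
```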

This tool will be improved to convert data from and to any DataType.

Import from collection

Import data into your collection from another collection. With this feature, it is now easier to have segmented lead lists and build a master list.

Join on a property during the import

Consolidating data is an impossible task on spreadsheet tools. It was previously possible in Datablist with the "unique values" option in a property.

Joining CSV files or two collections to consolidate with an identifier is now accessible directly during the import process.

When importing data into an existing collection with data, a new "Join" toggle is available on the mapping properties.

Then, in the options, select LEFT OUTER JOIN to import only the items that match an existing item, or FULL OUTER JOIN to import all the items from the import file.
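The two join modes can be sketched in a few lines, with the existing collection on the "left" and the import file on the "right". The function name is hypothetical, and two behaviors are assumptions: join keys are unique in the existing collection, and imported values overwrite existing ones on matching items.

```python
from typing import Dict, List

Item = Dict[str, object]


def join_import(existing: List[Item], incoming: List[Item], key: str, mode: str = "left") -> List[Item]:
    """Combine imported rows with existing items on the `key` property.

    mode="left": imported rows are used only when they match an existing item.
    mode="full": unmatched imported rows are also added as new items.
    """
    if mode not in ("left", "full"):
        raise ValueError("mode must be 'left' or 'full'")
    result = [dict(item) for item in existing]  # do not mutate the input
    index = {item.get(key): i for i, item in enumerate(result)}
    for row in incoming:
        k = row.get(key)
        if k is not None and k in index:
            result[index[k]].update(row)  # assumption: imported values overwrite
        elif mode == "full":
            result.append(dict(row))  # unmatched row becomes a new item
    return result


existing = [{"email": "a@x.com", "name": "A"}]
incoming = [{"email": "a@x.com", "company": "Acme"}, {"email": "b@x.com", "company": "Beta"}]
print(len(join_import(existing, incoming, "email", "left")))  # 1
print(len(join_import(existing, incoming, "email", "full")))  # 2
```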

Saved Filters

Creating segments in a collection is now possible with "Saved Filters". This feature is accessible when you have at least one filter enabled on your collection. Click "Save Filters" to save them in your account. Saved Filters are shared with your team members.

Miscellaneous

  • Use the import file name to rename the collection after an initial import
  • Option to prevent duplicating data during a collection cloning
  • Duplicate property with an option to copy values
  • Show a tooltip with editing instructions after double-clicking a cell
  • Close the Delete Collection and Duplicate Property modals with the Escape key
  • Improve copy-pasting behavior
  • Disable horizontal over-scroll on the data table to avoid triggering the browser's "previous page" navigation

Fixes

  • Fix the deduplication algorithm on multi-property analysis. The bug caused some duplicates to be missed.

November 2022

Filtering with "or" operation

You can now change how multiple filters are combined.

Select the "or" operation to get items matching at least one filter.

Select the "and" operation to get items matching all the filters.

Currently, only one operation can be used for all filters.
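Combining filters with a single operation maps onto Python's `all` (for "and") and `any` (for "or"). In this sketch, filters are plain predicate functions, an illustrative stand-in for Datablist's filter definitions.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Filter = Callable[[Item], bool]


def apply_filters(items: List[Item], filters: List[Filter], operation: str = "and") -> List[Item]:
    """Keep items matching all filters ("and") or at least one filter ("or")."""
    if operation not in ("and", "or"):
        raise ValueError("operation must be 'and' or 'or'")
    if not filters:
        return list(items)  # no filter: keep everything
    combine = all if operation == "and" else any
    return [item for item in items if combine(f(item) for f in filters)]


companies = [
    {"name": "Acme", "country": "FR", "employees": 10},
    {"name": "Beta", "country": "US", "employees": 500},
    {"name": "Gamma", "country": "FR", "employees": 800},
]
filters = [lambda c: c["country"] == "FR", lambda c: c["employees"] > 100]
print([c["name"] for c in apply_filters(companies, filters, "and")])  # ['Gamma']
print([c["name"] for c in apply_filters(companies, filters, "or")])   # ['Acme', 'Beta', 'Gamma']
```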

Create a new collection from selected items

We added a shortcut to create a new collection from selected items. The action is available from the "selected" items actions.

It clones the current collection properties and copies the selected items into it.

Auto-Detect CSV file encoding

The CSV format is text-based, but its encoding is not standardized. Two CSV files generated by Microsoft Excel can use different encodings if one is generated on a system configured in French and the other on a system configured in English.

And the bad news: the encoding is not stored in the CSV file.

Until now, Datablist used UTF-8 encoding when loading a CSV file. This worked for English-language CSV files, or when the CSV file happened to be encoded in UTF-8.

For CSV files with accents or special characters, you ended up with weird characters.

Datablist now analyzes your CSV file to list potential encodings. A score is calculated for each encoding and the one with the highest score is used.

You can change the encoding if the imported data still contains weird characters.
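Here is a toy version of "score each candidate encoding, keep the best". The candidate list and the scoring rule (share of characters that are letters, digits, spaces or common punctuation) are assumptions for illustration; Datablist's scoring is not described here, and dedicated detectors such as charset-normalizer use richer statistics.

```python
from typing import Sequence, Tuple

CANDIDATES = ("utf-8", "cp1252", "latin-1")  # assumed candidate list, in priority order
PUNCTUATION = set(",;.:'\"-_()/@&!?%#")


def score_text(text: str) -> float:
    """Share of characters that look like ordinary text."""
    if not text:
        return 1.0
    good = sum(ch.isalnum() or ch.isspace() or ch in PUNCTUATION for ch in text)
    return good / len(text)


def detect_encoding(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Tuple[str, str]:
    """Return (encoding, decoded_text) for the best-scoring candidate.

    Ties go to the earlier candidate. latin-1 decodes any byte sequence,
    so with the default list there is always at least one candidate.
    """
    scored = []
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # these bytes are invalid in this encoding
        scored.append((score_text(text), encoding, text))
    _, encoding, text = max(scored, key=lambda s: s[0])
    return encoding, text


print(detect_encoding("Café".encode("utf-8")))   # ('utf-8', 'Café')
print(detect_encoding("Café".encode("cp1252")))  # ('cp1252', 'Café')
```

Decoding the UTF-8 bytes as cp1252 would give "CafÃ©", the kind of weird characters mentioned above; the "©" lowers that candidate's score.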

Fixes & Improvements

  • Fix cloud synchronization errors after "undoing" an item deletion
  • Fix Excel file import when header names are numbers
  • Fix cloning empty collection
  • New "is not" and "does not contain" filters