Lead deduplication is part of a good data hygiene routine. Duplicates degrade the quality of your lead data: sales productivity drops, and the problem keeps growing until you start deduplicating your leads regularly.

If you want to clean your leads database before moving it to a new CRM or if you want to clean your current CRM, this guide will help you merge your duplicate leads.

CRM systems like Salesforce or Hubspot have built-in deduplication features, but they are limited. They detect redundant contacts, but the merging process is time-consuming. Exact matches are merged automatically, while conflicting values have to be merged manually, one by one.

Datablist is well suited to data manipulation on large datasets. The Duplicates Finder detects duplicate records and has a powerful automatic merging feature to merge duplicate leads without losing data. Exact matches are removed and conflicting values are combined. Datablist's unique algorithm consolidates your conflicting Notes, Emails, or Phone Numbers into a single lead record.

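In this step-by-step guide, you will learn how to:

  • Find duplicate leads
  • Automatically dedupe leads
  • Manually merge remaining duplicate leads
  • Update your CRM with your updated leads list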

Note: This guide is about lead deduplication, but the process is similar for any list of records you want to dedupe: contacts, companies, products, etc.

Find duplicate leads

To begin, import your Leads Database into Datablist.

With Datablist, data is organized in collections. A collection stores a list of records sharing the same data model. You must import your leads using external files. Datablist supports CSV and Excel files. Click "Import CSV/Excel", then select the file with your lead list.

Click the + to create a new collection. Give it a name (and an icon 🚀). Or click "Start with a CSV/Excel file" from the home screen.

Create collection shortcut

Then, move to the "Properties" screen. This step lists the columns found when parsing the CSV file. Datablist checks each column to detect which data type should be used. For example, email addresses and URLs are automatically detected.

Select the data type manually when needed. Disable the import for any CSV column that must not be imported.

CSV Column Mapping

The next import step displays a preview of the content of your file. Click "Import {x} items" to launch the import process.

If your leads are spread across several files, import them all into a single collection. When your collection already has data, a mapping step will be shown during the import process to map your CSV columns with the existing collection properties.


Now that your leads database is loaded into a Datablist collection, click on "Duplicates Finder" in the header to run a duplicates analysis.

Start Duplicates Finder

Select how your leads should be compared to start the dedupe process. Two modes are available:

  • All Properties - Two records will be considered duplicates if they have an exact match on all of their property values.
  • Selected Properties - Records will be checked on specific properties.

Note - In Datablist, the term "Property" is a synonym for Field or Column in other systems.

For lead deduplication, select "Selected Properties".

Select Merging Mode

Now select which identifier(s) are unique for a lead. This can be the email address for people or the company URL for businesses.

Select a unique identifier for your lead records

Then click "Next". A review step is shown. Click "Run duplicates check" to run the analysis.

Important

  • The analysis is a read-only process. No data is modified until the next phase, when the leads are merged.
  • Datablist compares text using a case-insensitive algorithm. If two values differ only in letter case, they will be listed as duplicate leads.
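
To make the comparison rule concrete, here is a minimal sketch of how duplicates can be grouped on selected properties while ignoring letter case. It is an illustration only, not Datablist's actual implementation: the function name findDuplicateGroups, the record shape, and the choice to skip records with an empty identifier are all assumptions.

```javascript
// Hypothetical helper (not a Datablist API): groups records that share the
// same values on the selected properties, ignoring letter case.
function findDuplicateGroups(records, selectedKeys) {
  const groups = new Map();
  for (const record of records) {
    // Build a comparison key from the selected properties only.
    const parts = selectedKeys.map((key) => String(record[key] ?? '').toLowerCase());
    // Assumption: a record with an empty identifier is never matched.
    if (parts.some((part) => part === '')) continue;
    const groupKey = JSON.stringify(parts);
    if (!groups.has(groupKey)) groups.set(groupKey, []);
    groups.get(groupKey).push(record);
  }
  // Only groups with two or more records are duplicates.
  return [...groups.values()].filter((group) => group.length > 1);
}

const sampleLeads = [
  { email: 'james@gmail.com', firstName: 'James' },
  { email: 'James@Gmail.com', lastName: 'Bond' },
  { email: 'moneypenny@example.com', firstName: 'Eve' },
];
const duplicateGroups = findDuplicateGroups(sampleLeads, ['email']);
```

Like the analysis step, this function only reads the records and returns groups. Nothing is merged or deleted.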

Automatically dedupe leads

Datablist Duplicates Finder offers two mechanisms to remove duplicate leads: automatically and manually. To merge your duplicate leads, start with the Auto Merge and then deal with the remaining records manually.

Auto Merge works with 3 algorithms:

  • Merge non-conflicting leads - This algorithm runs a "smart merge". It works by merging records with similar or complementary values.
  • Combine conflicting values - This algorithm combines text values from conflicting properties using a delimiter.
  • Drop conflicting values - This algorithm keeps the value from a master item and deletes other conflicting values to merge leads into a single record.
Auto Merge Algorithms

The merging and combining algorithms are safe: data from all duplicate leads is kept during the merge. The drop conflicting values algorithm, however, deletes all but one value for a given property.

Here is an example of how each algorithm works:

Merging non-conflicting leads

email            |     First Name   |    Last Name
james@gmail.com  |     James
james@gmail.com  |                  |     Bond

Will be merged into:

email            |     First Name   |    Last Name
james@gmail.com  |     James        |     Bond
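
The merge above can be sketched in a few lines of JavaScript. This is a simplified illustration under stated assumptions, not Datablist's code: the function names are hypothetical, blank values are treated as "no value", and two values count as similar when they differ only in letter case.

```javascript
// Treat undefined, null, and blank strings as "no value".
function isEmpty(value) {
  return value === undefined || value === null || String(value).trim() === '';
}

// Hypothetical sketch: returns one merged record, or null when a property
// holds two different non-empty values (a conflict this strategy skips).
function mergeNonConflicting(records) {
  const merged = {};
  for (const record of records) {
    for (const [key, value] of Object.entries(record)) {
      if (isEmpty(value)) continue;
      if (isEmpty(merged[key])) {
        merged[key] = value; // complementary value: fill the gap
      } else if (String(merged[key]).toLowerCase() !== String(value).toLowerCase()) {
        return null; // conflicting values: leave the group untouched
      }
    }
  }
  return merged;
}

const mergedLead = mergeNonConflicting([
  { email: 'james@gmail.com', firstName: 'James', lastName: '' },
  { email: 'james@gmail.com', firstName: '', lastName: 'Bond' },
]);
```

With the two records from the example, mergedLead holds the email, "James", and "Bond" in a single record. A group with two different phone numbers would return null and be left for another algorithm.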

Combining the Phone property with a semi-colon

email            |       Phone       |     First Name   |    Last Name    
james@gmail.com  |  +33 1 34 65 23   |      James       |                 
james@gmail.com  |  06 13 42 78 23   |                  |     Bond        

Will be merged into:

email            |   Phone                         |     First Name   |    Last Name
james@gmail.com  |  +33 1 34 65 23;06 13 42 78 23  |     James        |     Bond    
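
As a rough sketch of the combining step, the function below joins the distinct values of one property with a delimiter. The name combineValues is hypothetical, and dropping repeated values (compared without letter case) is an assumption made for the illustration.

```javascript
// Hypothetical sketch: joins the distinct non-empty values of one property
// across duplicate records, using a delimiter.
function combineValues(records, key, delimiter = ';') {
  const seen = new Set();
  const values = [];
  for (const record of records) {
    const value = String(record[key] ?? '').trim();
    if (value === '') continue;
    // Assumption: the same value repeated in several records is kept once.
    const fingerprint = value.toLowerCase();
    if (seen.has(fingerprint)) continue;
    seen.add(fingerprint);
    values.push(value);
  }
  return values.join(delimiter);
}

const combinedPhone = combineValues(
  [{ phone: '+33 1 34 65 23' }, { phone: '06 13 42 78 23' }],
  'phone'
);
// combinedPhone is '+33 1 34 65 23;06 13 42 78 23'
```

Pick a delimiter that never appears inside the values themselves, otherwise the combined cell cannot be split back reliably.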

Drop conflicting values on AccountId

AccountId        |       email          |     First Name   |    Last Name    |  Job Title
934DSFG39FGDS    |     james@gmail.com  |      James       |                 |
ODFJSDK123aSD    |     james@gmail.com  |                  |     Bond        |    CEO

Will be merged into:

AccountId        | email            |     First Name   |    Last Name    |  Job Title
ODFJSDK123aSD    | james@gmail.com  |     James        |     Bond        |     CEO
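
The sketch below reproduces this example. It assumes the master item is the record with the most filled properties, which matches the result above but is not confirmed for the Auto Merge. The function names and the dropKeys parameter are hypothetical.

```javascript
function isBlank(value) {
  return value === undefined || value === null || String(value).trim() === '';
}

function countFilled(record) {
  return Object.values(record).filter((value) => !isBlank(value)).length;
}

// Hypothetical sketch: merges duplicates into the master record. On a
// conflict, the master's value wins for properties listed in dropKeys.
// A conflict on any other property returns null (the group is left as-is).
function mergeDroppingConflicts(records, dropKeys) {
  // Assumption: the master is the record with the most filled properties.
  const master = records.reduce((best, record) =>
    countFilled(record) > countFilled(best) ? record : best
  );
  const merged = { ...master };
  for (const record of records) {
    if (record === master) continue;
    for (const [key, value] of Object.entries(record)) {
      if (isBlank(value)) continue;
      if (isBlank(merged[key])) {
        merged[key] = value; // complementary value: fill the gap
      } else if (String(merged[key]).toLowerCase() !== String(value).toLowerCase()) {
        if (!dropKeys.includes(key)) return null;
        // Conflict on a drop property: the other value is discarded.
      }
    }
  }
  return merged;
}

const mergedAccount = mergeDroppingConflicts(
  [
    { accountId: '934DSFG39FGDS', email: 'james@gmail.com', firstName: 'James', lastName: '', jobTitle: '' },
    { accountId: 'ODFJSDK123aSD', email: 'james@gmail.com', firstName: '', lastName: 'Bond', jobTitle: 'CEO' },
  ],
  ['accountId']
);
```

Here the second record has more filled properties, so its AccountId is kept and the other AccountId is lost. This is why the option should be reserved for properties where a single value is required.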

How to configure the Auto Merge for leads merging?

The 3 algorithms cover most of the lead deduplication use cases.

To dedupe your leads listing:

  • Use the combining values option for text properties such as Notes, Phone Number, Email Address.
  • Use the drop conflicting values option for:
    • Technical properties such as Account Id that require a single value.
    • Properties that are "Relation". For example Lead owner, Account.
    • Non-text properties that can't be combined. For example datetime such as Last Activity, Contacted on, and checkboxes.
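
The rules above can be summarized as a small decision function. This is only a way to reason about your configuration, not something you run in Datablist: the property descriptor shape and the singleValue flag are hypothetical, and the list of text-based types comes from the data types named in this guide.

```javascript
// Data types this guide names as text-based, and therefore combinable.
const COMBINABLE_TYPES = new Set(['Text', 'LongText', 'Email', 'Url']);

// `property` is a hypothetical descriptor: { name, dataType, singleValue }.
function chooseStrategy(property) {
  // Technical properties that require a single value are never combined.
  if (property.singleValue) return 'drop';
  // Relations, datetimes, and checkboxes cannot be joined with a delimiter.
  return COMBINABLE_TYPES.has(property.dataType) ? 'combine' : 'drop';
}

const mergePlan = [
  { name: 'Notes', dataType: 'LongText' },
  { name: 'Phone', dataType: 'Text' },
  { name: 'Account Id', dataType: 'Text', singleValue: true },
  { name: 'Lead owner', dataType: 'Relation' },
  { name: 'Last Activity', dataType: 'Datetime' },
].map((property) => ({ name: property.name, strategy: chooseStrategy(property) }));
```

In this plan, Notes and Phone are combined, while Account Id, Lead owner, and Last Activity use the drop conflicting values option.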

Important: See the "Update your CRM with your updated leads list" section to deal with re-importing data into your CRM.

Please contact us if you have questions about the Auto Merge feature.

Manually merge remaining duplicate leads

Use the Datablist Merging Assistant to manually merge your remaining duplicate leads.

Scroll to the section "Or merge duplicate items manually" to see your remaining duplicate records.

On the left of each duplicate lead group, a "Merge Items" button opens the Merging Assistant.

Merge duplicates

It opens a merging tool. On the right, Datablist selects the record with the most data as "Primary item". And on the left, the remaining duplicate leads are called "Secondary Items".

Merging Assistant

When possible, property values from secondary items are auto-selected to be merged into the primary item. If several values conflict, you will have to select which value to keep.

If the resulting "Primary item" suits you, click the Merge button to confirm the merge process. All the secondary leads will be deleted to keep only one combined lead record.

You can also edit or delete your duplicate leads directly from this listing.

Update your CRM with your updated leads list

Manage multiple values in a single cell

Datablist combines values into a single cell. You can end up with a listing with several values merged with a delimiter.

For example, a merged Phone property:

email            |   Phone                         |     First Name   |    Last Name
james@gmail.com  |  +33 1 34 65 23;06 13 42 78 23  |     James        |     Bond    

If your CRM uses multiple fields to store phone numbers, you will want to process your leads to have your values split. A better record would be:

email            |   Phone 1          |   Phone 2          |     First Name   |    Last Name
james@gmail.com  |  +33 1 34 65 23    |   06 13 42 78 23   |     James        |     Bond    

To manage this transformation, you can:

  • Export your leads listing into an Excel file and post-process it with Excel or Google Sheets.
  • Or run a script directly in Datablist to split the values.

Split values on delimiter directly in Datablist

First, create extra properties to store your multiple values if they do not exist yet. For example, create Phone 2 and Phone 3 properties, or Email 2 and Email 3, each of which will store a single value after the split.

Create Properties - Step 1
Create Properties - Step 2

Then, click on "Run Javascript" in the header to open a simple script editor.

Run Javascript

Adapt the script below to fit your properties:

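function runOnItem(item){
  // Nothing to do when the phone property is empty
  if(!item.phone) return null;

  // Split the combined value on the delimiter used during the merge
  var parts = item.phone.split(';');

  // A single value needs no split
  if(parts.length===1) return null;

  // Store the first two values in their own properties
  // (values beyond the second are ignored by this script)
  return {
    phone1: parts[0].trim(),
    phone2: parts[1].trim()
  };
}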

Note: Process each combined property separately. If you have one property with phone numbers and another one with email addresses, first process the phone numbers with a script, then run a second script for your email addresses.
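
If a combined cell can hold more than two values, a more general variant may help. The sketch below is an assumption-laden example rather than an official script: it reuses the runOnItem contract from the script above (return an object with the properties to update, or null for no change), and the keys phone1, phone2, and phone3 are placeholders to adapt to your own properties.

```javascript
// Hypothetical helper: spreads the parts of a combined value over the given
// property keys. Parts beyond the last key stay joined in that last key.
function splitIntoProperties(value, delimiter, keys) {
  if (!value) return null;
  const parts = String(value)
    .split(delimiter)
    .map((part) => part.trim())
    .filter((part) => part !== '');
  if (parts.length <= 1) return null; // nothing to split
  const update = {};
  keys.forEach((key, index) => {
    if (index >= parts.length) return; // fewer parts than keys
    const isLastKey = index === keys.length - 1;
    update[key] = isLastKey ? parts.slice(index).join(delimiter) : parts[index];
  });
  return update;
}

// Property keys are placeholders: adapt them to your collection.
function runOnItem(item) {
  return splitIntoProperties(item.phone, ';', ['phone1', 'phone2', 'phone3']);
}
```

Keeping the overflow joined in the last property avoids silently losing a fourth or fifth value. To process email addresses, run a second script with the email property and its own target keys.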

Here is an example of code that splits the content of a property with the key phone1. The split is done on a semicolon, and the resulting phone numbers are stored in two properties: phone1 and extraphone.

Javascript code to split on semicolon

Please contact us if you have questions about how to write the script.

FAQ

What is Lead Deduplication?

Lead Deduplication, or lead deduping, is the process of finding and merging duplicate records to have a clean list of unique entries.

A lead represents a person or a company. A duplicate lead is when there are multiple records for the same person or company.

Duplicate leads appear when you have several lead sources that pour into a single lead list. Examples of lead sources are lead magnets, webinars, newsletters, or manual entries.

How is it different from Salesforce and Hubspot deduplication?

In Salesforce, you can set up matching rules to detect duplicate records. When duplicates are found, exact duplicate leads are merged automatically. But Salesforce and Hubspot cannot automatically merge conflicting leads: a manual merging assistant lets you merge your leads one by one. For big listings, automatic deduplication tools such as Datablist will save you time in your dedupe process.

How does Datablist compare values to find duplicates?

Before merging the duplicates, Datablist runs an algorithm to detect duplicate leads. This algorithm has two ways to compare records: a full comparison, which finds leads that have the same values for all of their fields, and a comparison on one or more selected fields.

Running the Duplicate Finder on a selected field is recommended for lead deduplication. Select only the field that identifies a lead. An email address or a company website are good identifiers.

The deduplication algorithm ignores the letter case. Uppercase or lowercase letters don't impact the algorithm.

Can all lead properties be combined?

You can only combine text-based properties. The combining algorithm uses a string delimiter to merge several values. All data types that inherit from a text are compatible: Text, LongText, Email, Url, etc. But Checkbox and Datetime are not compatible with the combining algorithm and are ignored during the combining process.

If you set a Checkbox property in the combine algorithm and two records have different values (checked and not checked), the records won't be merged. You will still be able to merge them manually.

How to deal with conflicting values I don't want to combine?

Use the Drop conflicting values setting to merge duplicate leads without combining values. This is useful for Checkbox and Datetime properties, which can't be combined.

How to move conflicting values to other properties?

To move conflicting values to another property, you must split your process into two steps:

  • Merge conflicting values into a single property using a delimiter
  • Then, with a script, you can split all values with a delimiter into two or more properties.

See managing multiple values in a single cell for a step-by-step guide.

How many leads can I process?

Datablist Duplicates Finder works with big listings containing hundreds of thousands of records. For better performance, be sure to run the dedupe algorithm on a laptop or desktop computer. Modern web browsers such as Google Chrome, Apple Safari, and Mozilla Firefox will be faster than Microsoft Edge.

How long is the deduplication process?

Deduplication is almost instant. For big lead lists with hundreds of thousands of records, the process will take a few seconds.

You don't have to wait several hours to see the results of your deduplication settings. A good practice is to clone your collection before running the merging algorithms. If you are not happy with the merging result, just start again using the cloned collection.

Do I need to resolve all duplicate conflicts?

No. Your collection items are updated on every deduplication operation. You can perform incremental deduplication. Start with auto-merge with only exact matching items, visualize the remaining duplicates, set up combining rules, etc. until you have removed all the duplicates.

Can Datablist connect directly to the CRM API to deduplicate leads?

Not yet. At the moment, leads must be imported into Datablist manually from external files. CSV and Excel files are the recommended formats for importing your leads into Datablist.