Dealing with duplicates is an exhausting task with spreadsheet tools. Datablist provides a built-in deduplication feature to find duplicate values in your collections. Duplicates can then be removed or merged (automatically or using a merging assistant).

To find the duplicates in a collection, click on the "Duplicates Finder" button in your collection header.

Duplicates Finder
Duplicates Finder

Select duplicate check method

The first step is to select how to compare items to find duplicate values. Two methods are available:

  • All Properties - Look for items with similar values for all properties. Two items would be considered similar when all of their properties match.
  • Selected properties - Select the properties to be used for similarity check. Two items would be considered similar when they have similar values on all the selected properties.

Most of the time, Selected Properties is the best option. This is perfect to deduplicate contacts based on their email addresses and phone numbers, or companies using their website URLs. Use Selected Properties when you have one or several properties that identify a product, person, company, etc.

Configure Properties

Next, if you click on the Selected Properties method, you have to select which properties will be checked.

For example, in a collection with people items, you would select the Email property.

Example for Selected Properties
Example for Selected Properties

Review and run duplicates check

Then, a review screen is shown with the number of items in your collection and the duplicate check mode selected. Click "Run duplicates check" to continue.

The duplicates analysis is a read-only operation. It will not perform any changes in your collection items until you decide to merge items.

Note:
The algorithm to compare item values is case insensitive. For example, "JOHN@GMAIL.COM" and "john@gmail.com" will be duplicates.

Example for Selected Properties
Example for Selected Properties

Duplicates Listing

After running through all your items for duplicate items, Datablist will list all the duplicates found.

From this page, two actions are available:

The recommended way to deal with duplicates is to run auto merge duplicates first (1 in the screenshot below) and then pursue with Datablist merging assistant for conflicting items (2 in the screenshot below).

Duplicate Item listing
Duplicate Item listing

Automatically merge duplicates

To save you time, Datablist provides an "Auto Merge" algorithm. With "Auto Merge", most of the merging cases are possible. It handles merging non-conflicting items, combining duplicate values into a single item, and dropping duplicate values during the merging. The auto merge is triggered by clicking on the "Auto Merge duplicates when possible" button.

The merging non-conflicting items algorithm is always activated. Other merging algorithms need to be switched on to be activated.

Merge non-conflicting items

Datablist is able to merge non-conflicting items automatically without losing any information. It works as follows:

  • If all the duplicate items have the same property values, only one item will be kept and the others will be deleted.
  • If the duplicate items are complementary, the item with the most information will be selected as the primary item and its property values will be filled using other item property values. Then all items except the primary item will be deleted.
  • If duplicate items have conflicted property values, items will be skipped for manual merging (or to be combined with the combining duplicates algorithm, or dropped)

For example, the following duplicates:

email            |     First Name   |    Last Name
james@gmail.com  |     James
james@gmail.com  |                  |     Bond

Will be merged into:

email            |     First Name   |    Last Name
james@gmail.com  |     James        |     Bond

Combine conflicting properties

The Auto Merge algorithm comes with a second option to combine values from duplicates into a single text property. It keeps the data from all the duplicate items and merges them with a delimiter into a single item.

One or more properties can be combined and 4 delimiters are available: Line Break, Semicolon, Comma and Space.

Combine Duplicates - Options
Combine Duplicates - Options

For example, combining the Phone property with a semi-colon:

email            |       Phone       |     First Name   |    Last Name    
james@gmail.com  |  +33 1 34 65 23   |      James       |                 
james@gmail.com  |  06 13 42 78 23   |                  |     Bond        

Will be merged into:

email            |   Phone                         |     First Name   |    Last Name
james@gmail.com  |  +33 1 34 65 23;06 13 42 78 23  |     James        |     Bond    

This is useful when merging a list of leads or companies. If you have two leads with the same email address but different "Notes" and "Phone Number" values, combining them will create a consolidated lead.

Note: The algorithm only combines text values. If the property you want to combine is a Checkbox, Number, Datetime, etc. (see full data types list), it will not be combined.

Drop conflicting values

The last option available when merging duplicate items is to drop conflicting values.

Use this option:

  • When the property must be a single value.
    • For example, if you are cleaning a CRM lead list. You will have an external CRM "record id". To re-import your cleaned data back into your CRM, the "record id" must be a single value.
  • When it doesn't make sense to keep several values.
    • For example, if you have a "last activity" property with conflicting values. Date, Checkbox, and Number can't be combined so you can drop conflicting data to keep only one value.

Important
Datablist selects one of the duplicate items to be a master item. When conflicts exist to merge items, the value from the master item is kept. The item with the most properties containing data is selected as the master.

For example, configuring the algorithm to drop conflicting values on AccountId:

AccountId        |       email          |     First Name   |    Last Name    |  Job Title
934DSFG39FGDS    |     james@gmail.com  |      James       |                 |
ODFJSDK123aSD    |     james@gmail.com  |                  |     Bond        |    CEO

Will be merged into:

AccountId        | email            |     First Name   |    Last Name    |  Job Title
ODFJSDK123aSD    | james@gmail.com  |     James        |     Bond        |     CEO

Algorithm Explanation

  • The second item has 4 properties with data (AccountId, email, Last Name, Job Title). It is the master item.
  • First Name, Last Name, and Job Title are merged without conflict.
  • AccountId has a conflict. So the value from the master item is kept.

Manually merge conflicting items

For some duplicate items, conflicting values prevent them to be merged without losing data. For example, imagine two items that store contact information with the following data:

# Contact A
Name: John Doe
Job Title: Marketing Manager
Email: johndoe@company.com
# Contact B
Name: John D.
Job Title: Business Development
Email: johndoe@company.com

We know they represent the same person because they have similar email addresses. Unfortunately, they have two different job titles, the name is not the same, and merging them into a single item means choosing which data to keep.

This is where to use Datablist Merging Assistant. A "merge" button is available on the left of every duplicates group.

Manual Merging
Manual Merging

It opens the merging assistant for the items. On this modal, you can select which value to keep for each conflicting property. After merging them, all except one (the master item) will be deleted.

Merging Assistant
Merging Assistant

Please check our Datablist Merging Assistant documentation page to learn more.

Edit or delete items

Another way to deal with duplicate values is to directly edit an item's data or delete unnecessary items. You can directly perform those actions with the buttons listed on each item.

Edit and delete buttons
Edit and delete buttons

To learn more

Read our guides for step by step tutorials on how to deal with duplicates for CSV and Excel files: