Dealing with duplicates is an exhausting task with spreadsheet tools. Datablist provides a built-in deduplication feature to find duplicate values in your collections. Duplicates can then be removed or merged (automatically or using a merging assistant).
To find the duplicates in a collection, click on the "Duplicates Finder" button in your collection header.
Select duplicate check method
The first step is to select how to compare items to find duplicate values. Two methods are available:
- All Properties - Look for items with similar values for all properties. Two items would be considered similar when all of their properties match.
- Selected properties - Select the properties to be used for similarity check. Two items would be considered similar when they have similar values on all the selected properties.
Most of the time, Selected Properties is the best option. This is perfect to deduplicate contacts based on their email addresses and phone numbers, or companies using their website URLs. Use Selected Properties when you have one or several properties that identify a product, person, company, etc.
Next, if you click on the Selected Properties method, you have to select which properties will be checked.
For example, in a collection with people items, you would select the Email property.
Review and run duplicates check
Then, a review screen is shown with the number of items in your collection and the duplicate check mode selected. Click "Run duplicates check" to continue.
The duplicates analysis is a read-only operation. It will not perform any changes in your collection items until you decide to merge items.
The algorithm to compare item values is case insensitive. For example, "JOHN@GMAIL.COM" and "firstname.lastname@example.org" will be duplicates.
After running through all your items for duplicate items, Datablist will list all the duplicates found.
From this page, two actions are available:
The recommended way to deal with duplicates is to run auto merge duplicates first (1 in the screenshot below) and then pursue with Datablist merging assistant for conflicting items (2 in the screenshot below).
Automatically merge duplicates
To save you time, Datablist provides an "Auto Merge" algorithm. With "Auto Merge", most of the merging cases are possible. It handles merging non-conflicting items, combining duplicate values into a single item, and dropping duplicate values during the merging. The auto merge is triggered by clicking on the "Auto Merge duplicates when possible" button.
The merging non-conflicting items algorithm is always activated. Other merging algorithms need to be switched on to be activated.
Merge non-conflicting items
Datablist is able to merge non-conflicting items automatically without losing any information. It works as follows:
- If all the duplicate items have the same property values, only one item will be kept and the others will be deleted.
- If the duplicate items are complementary, the item with the most information will be selected as the primary item and its property values will be filled using other item property values. Then all items except the primary item will be deleted.
- If duplicate items have conflicted property values, items will be skipped for manual merging (or to be combined with the combining duplicates algorithm, or dropped)
For example, the following duplicates:
email | First Name | Last Name email@example.com | James firstname.lastname@example.org | | Bond
Will be merged into:
email | First Name | Last Name email@example.com | James | Bond
Combine conflicting properties
The Auto Merge algorithm comes with a second option to combine values from duplicates into a single text property. It keeps the data from all the duplicate items and merges them with a delimiter into a single item.
One or more properties can be combined and 4 delimiters are available:
For example, combining the Phone property with a semi-colon:
email | Phone | First Name | Last Name firstname.lastname@example.org | +33 1 34 65 23 | James | email@example.com | 06 13 42 78 23 | | Bond
Will be merged into:
email | Phone | First Name | Last Name firstname.lastname@example.org | +33 1 34 65 23;06 13 42 78 23 | James | Bond
This is useful when merging a list of leads or companies. If you have two leads with the same email address but different "Notes" and "Phone Number" values, combining them will create a consolidated lead.
Note: The algorithm only combines text values. If the property you want to combine is a Checkbox, Number, Datetime, etc. (see full data types list), it will not be combined.
Drop conflicting values
The last option available when merging duplicate items is to drop conflicting values.
Use this option:
- When the property must be a single value.
- For example, if you are cleaning a CRM lead list. You will have an external CRM "record id". To re-import your cleaned data back into your CRM, the "record id" must be a single value.
- When it doesn't make sense to keep several values.
- For example, if you have a "last activity" property with conflicting values. Date, Checkbox, and Number can't be combined so you can drop conflicting data to keep only one value.
Datablist selects one of the duplicate items to be a master item. When conflicts exist to merge items, the value from the master item is kept. The item with the most properties containing data is selected as the master.
For example, configuring the algorithm to drop conflicting values on AccountId:
AccountId | email | First Name | Last Name | Job Title 934DSFG39FGDS | email@example.com | James | | ODFJSDK123aSD | firstname.lastname@example.org | | Bond | CEO
Will be merged into:
AccountId | email | First Name | Last Name | Job Title ODFJSDK123aSD | email@example.com | James | Bond | CEO
- The second item has 4 properties with data (
Job Title). It is the master item.
Last Name, and
Job Titleare merged without conflict.
AccountIdhas a conflict. So the value from the master item is kept.
Manually merge conflicting items
For some duplicate items, conflicting values prevent them to be merged without losing data. For example, imagine two items that store contact information with the following data:
# Contact A Name: John Doe Job Title: Marketing Manager Email: firstname.lastname@example.org
# Contact B Name: John D. Job Title: Business Development Email: email@example.com
We know they represent the same person because they have similar email addresses. Unfortunately, they have two different job titles, the name is not the same, and merging them into a single item means choosing which data to keep.
This is where to use Datablist Merging Assistant. A "merge" button is available on the left of every duplicates group.
It opens the merging assistant for the items. On this modal, you can select which value to keep for each conflicting property. After merging them, all except one (the master item) will be deleted.
Please check our Datablist Merging Assistant documentation page to learn more.
Edit or delete items
Another way to deal with duplicate values is to directly edit an item's data or delete unnecessary items. You can directly perform those actions with the buttons listed on each item.
To learn more
Read our guides for step by step tutorials on how to deal with duplicates for CSV and Excel files: