Excel files are widely used to store listing data. To manage lists of customers or companies, or to clean products catalogs, Microsoft Excel is an easy-to-use spreadsheet tool to manage data. But like any spreadsheet tool, it's not possible to add data constraints to forbid duplicate information. When several people edit the same Excel file or when data get added frequently, the likelihood of finding duplicate values increases.

Hopefully, it's possible to process the data afterward to remove or merge duplicate values.

Removing duplicates is easy using Microsoft Excel (Menu "Data" -> "Remove Duplicates") but merging duplicates without losing information gets tricky. When merging duplicate rows, you want to remove similar rows and merge complementary rows when possible. Finally, conflicting values should be flagged to be processed manually.

Datablist is an easy-to-use online tool to manipulate data. It's a perfect tool to find and merge duplicate values in an Excel file. Datablist Duplicates Finder automatically finds and merges complementary duplicates. And conflicting items are skipped for manual action. You can use Datablist directly online for free without even signing up.

Why not use Excel to merge duplicates?
Microsoft Excel is a very powerful application and has a "Remove Duplicates" feature. However, "Remove Duplicates" keeps the first "matching" row and deletes any other duplicate rows. If two rows are complementary, it leads to a loss of data. Merging duplicates should not be complicated. Spreadsheet tools are great but they are generic and they don't focus on data manipulation. Datablist targets structured data listing and provides easy-to-use data manipulation features.

In the following guide, you will learn to find and merge duplicates from an Excel file without losing information. To illustrate this guide, we'll use an Excel file containing Contacts data with 4 fields: First Name, Last Name, Email, Job Title and we will find and merge all contacts based on their email address.

Click here to download the tutorial Excel file.

The process can be summed up with the following steps:

  1. Load your Excel file
  2. Find Duplicates
  3. Automatically merge duplicates with no conflict
  4. Manually merge remaining duplicates

Step 1: Load Excel file

The first step is to import your Excel file into Datablist. To do so, create a collection that will hold your Excel listing data. Use the "Import CSV/Excel" assistant to load your file.

Create a collection to hold your data
Create a collection to hold your data

Create properties and map with Excel columns

A Datablist collection is like a spreadsheet but with a data structure. And spreadsheet columns are called "Properties" in Datablist. If a property is defined as Email, Datablist will ensure only valid email addresses are filled out. Same with dates, checkboxes, etc.

Once your Excel file is loaded, you need to select which columns will be imported and map them with an existing collection property or create new properties.

Datablist auto detects common data types by analyzing the first rows to find patterns.

Import Excel file

Important

  • Cells with formulas will be computed during import.
  • Your data must be in the file first worksheet

If your data is scattered across different worksheets or files, please load them all in one Datablist collection and map them with the same properties.

Step 2: Find duplicates in the Excel file

With your Excel file data now imported, it's time to use Datablist Duplicates Finder to find and merge duplicates.

In the collection data listing, click the "Duplicates" button in the header (top right).

Two modes are available:

  • All Properties - In this mode, Datablist looks for items with similar values for all properties. Two items would be considered similar when all of their properties match.
  • Selected Properties - For this second mode, you will select the properties to be used for similarity check. Two items would be considered similar when they have similar values on all the selected properties.

In this example, the email property is enough to identify a contact, so you can select the Selected Properties mode with the email property.

The duplicate analysis is a non-destructive action. Until you decide to merge automatically or manually your items, items are not impacted.

Find duplicate items on one or more property/ies

Step 3: Bulk merge duplicate items with no conflict

Once all duplicate rows are listed, it's time to merge them! The process is done in two steps:

  • First, use Datablist "smart merge" feature to automatically merge non-conflicting items.
  • Then, use the "Merging Assistant" for the remaining duplicate values.

Click "Merge non-conflicting duplicates" to run the "smart merge". It works by merging records with similar or complementary values.

For example:

email            |     First Name   |    Last Name
james@gmail.com  |     James
james@gmail.com  |                  |     Bond

Will be merged into:

email            |     First Name   |    Last Name
james@gmail.com  |     James        |     Bond
Bulk merge complementary duplicate values
Bulk merge complementary duplicate values

Step 4: Manually merge conflicting items

Automatic merging is great for heavy lifting. If after running it, duplicate items remain, it means they are conflicting. Conflicts occur when at least two items have different values for a property. To merge them, you need to select which value to keep.

To do so, you can directly edit the items from the listing or launch Datablist Merging assistant. The assistant selects a primary item by looking at the item with the most data. Then, you can select property values from any other item to be merged into the primary item. At the end of the process, all items are deleted except for the primary item.

Manually merge remaining duplicates

Export results in CSV or Excel format

Well done! Your data is deduped!

To export the collection to be used by another application, click "Export" in the collection header and select CSV or Microsoft Excel.

Export in Microsoft Excel format
Export in Microsoft Excel format