CSV files are everywhere to define structured data for software applications or data sets. Despite being ubiquitous, CSV manipulation is hard and often requires technical knowledge.

For simple manipulations, spreadsheets are enough. However, they are limited when it comes to:

In this guide, we will focus on the last two manipulations: removing and/or merging duplicates from CSV files.

If you have several CSV files that you want to join together using a unique column, go to the join csv files guide.

Remove duplicates from CSV files

You have exported your data from an application in a CSV file and want to get rid of the duplicates? You have concatenated several CSV files into a big one and now need to clean it? Or maybe you have several sources of data no longer in sync and want to merge all those versions into a single listing. Microsoft Excel or Google Sheets will not help you with your problem because they don't allow you to set up a unique constraint on a column.

On a small CSV file, this task can be done manually. On bigger files, it would take hours to process the entries one by one. And you risk human errors.

What you need is a tool to automatically detect CSV entries with similar values in a specific column. Then depending on your settings, when several entries with the same unique value are detected, the entries are either dropped or merged.

Datablist is great to perform simple data operations that are not possible with spreadsheets.

In this guide, we will work with a single CSV file that contains thousands of entries. Not only we will remove duplicate entries, but we will merge data from the matching entries and consolidate them.

You can download the tutorial CSV file here!

The CSV file contains 4 columns: First Name, Last Name, Email, Job Title. We want to merge entries that share the same email address. The process can be summed up with:

  • Load your CSV file into a Datablist collection
  • Create properties matching CSV columns
  • Define a unique constraint on the email property
  • Import the CSV file

Step 1: Load your CSV file into a Datablist collection

The first step in our process is to load the CSV file into Datablist. Log in to Datablist or join the beta to get started.

When creating a new collection, we can select data types with built-in properties (see also our article on common data models) or create an empty collection with no property.

For this example, we will select an empty collection so the properties will be created from the CSV columns.

Once the collection is created, click on the "Import CSV" button.

Create a new collection

Step 2: Create properties for your CSV columns

With the CSV file now loaded, we can create properties for our collection using the CSV column names. Datablist lists every column found in the CSV so we can create a property for each of them.

Create collection properties

Step 3: Define your unique constraint

Here is the important step! We want to remove duplicates entries from the CSV! In our example, duplicates are entries with the same email address. To add this constraint, we add the Unique values option on the email property.

With this Unique values constraint, Datablist will ensure two entries can't share the same email. When it happens, entries will be merged or skipped depending on the option we choose on the Import step.

Define a unique constraint on the email property

Step 4: Review and import

CSV Join Merging Options Merging Options

The last step is to select the merging mode. What should Datablist do when several entries have the same Email?

The default mode is "Soft Merge". When an entry from the CSV has a value (Email in our example) that already exists (in the CSV entries, or the collection), it will merge the two entries without overwriting existing data (see schema above).

Review and import!

We click the "Import items" button and we are done! 💪 Duplicate CSV entries are merged and we have a clean collection! If needed, we are now able to import other CSV files into this collection.

Export to CSV if needed

If you need to use the result in another tool, click on the "Export" button to export the collection as another CSV file.

CSV Export CSV Export

If you have any questions, please contact us. And if you want to join the beta, please fill in the form bellow 👇.