Cleaning duplicates involves more than hitting "delete"!

Some records are exact matches. Others contain conflicting values. Many are complementary and need merging instead of removal.

Depending on your workflow, you may need to merge records, update a master record, or simply flag duplicates for review.

Basic tools delete rows without understanding field priority or business rules. That approach destroys useful data.

Solid deduplication requires clear logic. Define how to select the master item, how to resolve conflicts, and what to do with secondary records.

This article explains the practical methods to merge, update, and remove duplicates in CSV files, Excel sheets, and CRMs.

Let’s go!

📌 Summary For Those In a Rush

This article covers everything you need to know about deduplicating your spreadsheets, including how to merge, update, and remove duplicates the right way.

Problem: Without understanding prioritization patterns and bulk actions, you'll either lose important data or keep the wrong records when dealing with duplicates.

Solution: Datablist offers three dedupe methods: simple merging and removal, AI-powered editing for complex rules, and multi-file deduplication.

The Deduplication Methods We Cover:

  1. Simple duplicates merging and removal on a single file
  2. AI editing for complex prioritization rules before removing
  3. Removing duplicates across multiple files


Why You Should Listen To Us

Datablist is a platform for building lead generation workflows. It currently lets 26,000 users find, enrich, and clean data using over 60 different tools, from AI Agents to Email Finders, AI processors, Technology enrichments, and more.

Additionally, Datablist features an extensive deduplication suite that allows you to merge, update, remove, or flag duplicates with just a few clicks, without coding.

Three Ways To Deduplicate Spreadsheet Files - Why You Can Trust Datablist

Understanding Deduplication Fundamentals

Before moving on to how to deduplicate your list, here are the principles behind the different deduplication techniques.


What You Need To Understand: Deduplication Fundamentals

The following points apply only to single-file deduplication. In multi-file deduplication, you can only delete duplicates from selected files, not merge or update them, so these principles are helpful rather than mandatory.

By default, Datablist tries to merge duplicate records automatically. In practice, this doesn’t always work since most users have conflicting duplicates.

Three Ways To Deduplicate Spreadsheet Files - Conflicting Duplicates

When conflicts exist, the process relies on two concepts:

  • Prioritization patterns to choose the master record in a duplicate group
  • Bulk actions to handle the secondary records in this duplicate group

Understanding Duplicate Types

We classify duplicates by how similar their fields are.

  1. Exact duplicates: all columns contain identical values. These usually come from double imports or accidental copy-paste.
  2. Conflicting duplicates: records represent the same entity but conflict on some fields like phone, job title, or revenue.
  3. Complementing duplicates: each record holds different useful data that should be combined. One record might have an email address while its duplicate has a phone number, making them complementary.
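
If you prefer code to definitions, here is a minimal Python sketch of how the three types could be told apart. It is an illustration under our own assumptions, not Datablist's internal logic: the names `is_blank` and `classify_group` are hypothetical, and we assume that an empty cell counts as "missing" rather than as a conflicting value.

```python
def is_blank(value):
    """Treat None and whitespace-only values as empty (assumption)."""
    return value is None or str(value).strip() == ""


def classify_group(records):
    """Classify records that represent the same entity.

    records: list of dicts mapping column name -> value.
    Returns "exact", "conflicting", or "complementary".
    """
    columns = set().union(*(r.keys() for r in records))
    has_gap_fill = False
    for col in columns:
        values = [r.get(col) for r in records]
        filled = {str(v).strip() for v in values if not is_blank(v)}
        if len(filled) > 1:
            # Two different non-empty values in the same column.
            return "conflicting"
        if filled and any(is_blank(v) for v in values):
            # One record has a value where another one is empty.
            has_gap_fill = True
    return "complementary" if has_gap_fill else "exact"


a = {"email": "john@example.com", "phone": "", "title": "CEO"}
b = {"email": "john@example.com", "phone": "555-0100", "title": "CEO"}
c = {"email": "john@example.com", "phone": "555-0100", "title": "CTO"}

print(classify_group([a, dict(a)]))  # exact
print(classify_group([a, b]))        # complementary
print(classify_group([b, c]))        # conflicting
```

Note that a group with both a gap and a conflict is reported as conflicting, since the conflict is what forces you to pick a master record.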
Three Ways To Deduplicate Spreadsheet Files - Duplicate Types

First: Determining a Prioritization Pattern

You must decide which record becomes the reference record. We call this the Master Item Rule. Remember this term; you’ll need it later.

Example Patterns/Master Item Rules:

  • Most complete: Keeps the record with the most filled-in fields
  • Last updated: Keeps the most recently modified record
  • First created: Keeps the oldest record
  • Lowest value: Keeps the record with the smallest number in a specific column
  • Highest value: Keeps the record with the largest number in a specific column
  • Matching value: Keeps the record that matches a specific value on a property you define
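
To make these rules concrete, here is a small Python sketch of how a master record could be picked from one duplicate group. This is our own simplified illustration, not Datablist's code: `pick_master`, `filled_count`, and the rule strings are hypothetical names, and falling back to the first record when nothing matches is an assumption. "Last updated" and "First created" are left out because they behave like "Highest value" and "Lowest value" on a timestamp column.

```python
def filled_count(record):
    """Number of non-empty fields in a record."""
    return sum(1 for v in record.values() if v not in (None, ""))


def pick_master(group, rule="most_complete", column=None, value=None):
    """Return the master record of a duplicate group (list of dicts)."""
    if rule == "most_complete":
        return max(group, key=filled_count)
    if rule == "lowest_value":
        return min(group, key=lambda r: r[column])
    if rule == "highest_value":
        return max(group, key=lambda r: r[column])
    if rule == "matching_value":
        for record in group:
            if record.get(column) == value:
                return record
        return group[0]  # assumption: fall back to the first record
    raise ValueError(f"Unknown rule: {rule}")


group = [
    {"email": "a@x.com", "phone": "", "revenue": 10, "source": "LinkedIn"},
    {"email": "a@x.com", "phone": "555-0100", "revenue": 5, "source": "Phone"},
]

print(pick_master(group)["source"])                                       # Phone
print(pick_master(group, "highest_value", column="revenue")["source"])    # LinkedIn
print(pick_master(group, "matching_value", "source", "Phone")["source"])  # Phone
```

When two records tie, `max` and `min` keep the first one they see, so the order of your rows decides the winner in this sketch.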

📘 Master Item Rules

Important: "Last updated" and "First created" are only relevant for data that's been actively managed in Datablist over time. If you've just uploaded your file, these options won't work because imported spreadsheets don't include this metadata.

We recommend choosing “Most complete” or using the technique explained in the second part of this section if you’re not sure which master item rule to choose.

For complex cases, Datablist allows you to use AI to create custom prioritization patterns, for example, a rule conditioned on column A containing “Hello people” and column B containing “of Germany”.

More on this in the second part of the step-by-step section.

Three Ways To Deduplicate Spreadsheet Files - Master Item Rules

Second: Choosing a Bulk Action

Once you’ve chosen your prioritization pattern, decide what to do with the records that don’t match it.

Example Bulk Actions To Process Duplicates:

  • Delete secondary items
  • Merge the Master Item and the secondary item into one record
  • Merge selected properties of the secondary item with the Master Item, and delete the rest
  • Update selected properties of the Master Item with the values of the secondary item
  • Flag duplicates without deleting them. This is especially valuable if you work in a large organization and the secondary items are needed for compliance purposes
  • … and anything else you can think of

📘 Understanding Merging Duplicates vs. Updating Duplicates

Merging combines the values of both records. It is especially useful for duplicated CRM contacts where both records contain notes.

Updating means replacing specific values with better data from another source. Use it when each duplicate has some correct information, like keeping contact A, but fixing its job title using the accurate one from contact B.
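
The difference is easier to see side by side. The Python sketch below is a simplified illustration, not Datablist's implementation: `merge_records`, `update_record`, and the `" | "` separator are hypothetical choices. We assume here that only text values can be combined, and that the master's value wins for anything else.

```python
def merge_records(master, secondary, separator=" | "):
    """Merge: keep the master and fold the secondary's values into it."""
    merged = dict(master)
    for col, sec_value in secondary.items():
        mas_value = merged.get(col)
        if sec_value in (None, ""):
            continue  # nothing to bring over
        if mas_value in (None, ""):
            merged[col] = sec_value  # fill a gap in the master
        elif (mas_value != sec_value
              and isinstance(mas_value, str)
              and isinstance(sec_value, str)):
            merged[col] = mas_value + separator + sec_value  # combine text
        # otherwise the master's value is kept as-is
    return merged


def update_record(master, secondary, columns):
    """Update: overwrite only the chosen columns of the master."""
    updated = dict(master)
    for col in columns:
        if secondary.get(col) not in (None, ""):
            updated[col] = secondary[col]
    return updated


contact_a = {"name": "Ana", "title": "Sales Rep", "notes": "Met at expo", "phone": ""}
contact_b = {"name": "Ana", "title": "Head of Sales", "notes": "Prefers email", "phone": "555-0100"}

print(merge_records(contact_a, contact_b)["notes"])              # Met at expo | Prefers email
print(update_record(contact_a, contact_b, ["title"])["title"])   # Head of Sales
print(update_record(contact_a, contact_b, ["title"])["notes"])   # Met at expo
```

Merging touches every column, while updating changes only the columns you name and leaves the rest of the master untouched.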

Three Ways To Deduplicate Spreadsheet Files - Deduplication Strategy

Questions To Ask Yourself Before Deduplicating a List

Now that you understand patterns and bulk actions, use the following questions to quickly determine your prioritization pattern and what to do with the rest.

Which Record Should Be Your Master Item?

This question helps you determine your prioritization pattern. Think about what makes one duplicate "better" than the other.

Ask yourself:

  • Is there one record that's more complete than the others?
  • Did one record come from a more reliable source?
  • Is one record more recent or freshly updated?
  • Does one record have a specific value that makes it the "correct" version?

Your answer determines your Master Item rule:

  • If completeness matters most → use "Most complete"
  • If recency matters most → use "Last updated" or "First created"
  • If a specific value determines the winner → use "Matching value"
  • If the logic is more complex → use AI Editing (Method 2)

What Should Happen to the Non-Master Records?

This question helps you determine your bulk action. Once you've picked a winner, what do you want to do with the losers?

Ask yourself:

  • Do the other records have any valuable data I want to keep?
  • Should I combine information from multiple records into one?
  • Do I just need to delete the extras and move on?
  • Do I need to flag duplicates for review instead of deleting them?

Your answer determines your bulk action:

  • If other records have no value → drop all conflicting values or delete the records
  • If other records have useful data → combine the conflicting values or update the master item
  • If you need compliance records → flag duplicates without deleting
  • If you need to cherry-pick specific values → use AI Editing (Method 2)
Three Ways To Deduplicate Spreadsheet Files - It’s Really Simple

Deduplication: Cleaning Duplicate Records From Your Data

Datablist has a deduplication suite that handles everything from simple duplicate removal to multi-file deduplication. This section walks through three workflows:

  1. Merging and removing duplicates on a single file based on simple rules
  2. Updating and removing duplicates on a single file with complex rules
  3. Removing duplicates across multiple files (no merging possible)

Let’s get started!

How Datablist Handles Duplicates - Quick Revisit

If you read the previous section, you can skip this part. If not, this short summary explains exactly what you will be doing.

  1. Datablist scans your data and finds rows that have matching information in the columns you specify.
  2. When it finds exact duplicates, it lets you auto-merge them.
  3. If you have conflicting duplicates, it asks you to choose a pattern by which to prioritize one record over the other (we call it the "Master Item Rule").
  4. When you’ve defined your Master Item Rule, it allows you to merge, update, flag, or delete the second duplicate record from the pair.

Simple Duplicates Merging & Removal On A Single File

This is the simplest way to remove duplicates. You have a list with some entries appearing more than once, and you want to keep only one copy of each record.

When it's useful:

  • You imported the same CSV file twice by accident
  • Your CRM export contains duplicate contacts
  • Scraped data has repeated entries from pagination errors

Step 1: Sign Up And Upload Your Data

  1. Sign up for Datablist
Three Ways To Deduplicate Spreadsheet Files - Datablist Homepage
  2. Upload your CSV or Excel
Three Ways To Deduplicate Spreadsheet Files - Datablist Start Page

Step 2: Navigate to the Duplicates Finder

Click on Clean in the top menu of the app and select Duplicates Finder.

Three Ways To Deduplicate Spreadsheet Files - Duplicated Contacts

Step 3: Choose Your Unique Identifier

In this step, you’ll have two options:

Option 1: Choose one or a few columns as a unique identifier - RECOMMENDED

Think of a unique identifier as the piece of information that makes each record special. For example:

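  • Using one column: If you choose "Email" as your unique identifier, two rows are duplicates only when they share the same email. A row with john@example.com is treated as unique unless another row has that exact address, even if everything else matches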
  • Using multiple columns: If you choose "First Name" + "Company" together, then "John" at "Microsoft" is different from "John" at "Google"

The more columns you select, the stricter the matching becomes. We recommend starting with just one or two columns that truly identify unique records in your data.
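
Here is a minimal Python sketch of the idea: records are grouped by the columns you pick, and any group with more than one record is a set of duplicates. It is an illustration only. `find_duplicate_groups` is a hypothetical name, and trimming and lowercasing values before comparing is our assumption, not a description of Datablist's matching.

```python
from collections import defaultdict


def find_duplicate_groups(records, key_columns):
    """Group records by the chosen identifier columns.

    Returns only the groups that contain more than one record.
    """
    groups = defaultdict(list)
    for record in records:
        # Assumption: compare values case-insensitively, ignoring outer spaces.
        key = tuple(str(record.get(c, "")).strip().lower() for c in key_columns)
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


rows = [
    {"first_name": "John", "company": "Microsoft", "email": "john@example.com"},
    {"first_name": "John", "company": "Google", "email": "john@example.com"},
    {"first_name": "john", "company": "microsoft ", "email": "j.doe@example.com"},
]

print(len(find_duplicate_groups(rows, ["email"])))                           # 1
print(len(find_duplicate_groups(rows, ["first_name", "company"])))           # 1
print(len(find_duplicate_groups(rows, ["first_name", "company", "email"])))  # 0
```

With all three columns selected, no two rows match anymore, which is what "stricter" means in practice.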

Three Ways To Deduplicate Spreadsheet Files - Choosing a Unique Identifier

Option 2: Deduplicate based on all properties - NOT RECOMMENDED

This option will check if every single column in a row matches exactly with another row. This means that two rows are only considered duplicates if all their data is identical.

Why we don't recommend this: In real-world data, duplicates rarely match perfectly across all columns. For example, the same person might have slightly different job titles, or the same company might have different employee counts from different sources. If you use this option, you'll miss most duplicates.

Three Ways To Deduplicate Spreadsheet Files - Exact Duplicates Are Rare

When the second option might be useful: Use this only if you're looking for exact duplicate rows that were imported twice by mistake, where literally every field is identical.

Once you’ve selected the properties you want to deduplicate on, scroll down and click on Next.

Step 4: Select Comparison Algorithm

In this step, you have to select a comparison algorithm and processor for each property you want to deduplicate on. We recommend keeping the default settings except for company names.

Three Ways To Deduplicate Spreadsheet Files - Comparison Algorithm And Processors

If you’re deduplicating based on company names: Choose the company names processor, since it’s the only one that Datablist cannot detect automatically.
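
To give a feel for why company names need their own processor, here is a hypothetical Python sketch of the kind of normalization such a step might perform. We don't know how Datablist's processor works internally, so treat `normalize_company` and the `LEGAL_SUFFIXES` list as assumptions made purely for illustration.

```python
import re

# Hypothetical, incomplete list of legal suffixes to ignore when comparing.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "corp", "co", "sa", "ag"}


def normalize_company(name):
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


print(normalize_company("Acme, Inc."))      # acme
print(normalize_company("ACME GmbH"))       # acme
print(normalize_company("Acme Co., Ltd."))  # acme
```

Without a step like this, "Acme, Inc." and "ACME GmbH" would be compared as different strings and the duplicate would be missed.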

Step 5: Select Master Item, Review, and Resolve Conflicts

  1. Choose master item rule: As explained in the first section, Datablist always asks you to specify a Master Item rule. The default rule is “Most Complete”, but you can also choose another one.
Three Ways To Deduplicate Spreadsheet Files - Master Item Rule Selection
  2. Review and resolve conflicts if necessary: Duplicates are often not identical on all properties. That’s also the reason we ask you to specify a master item.

    To resolve conflicts, you can either combine or drop the conflicting values. However, combining values works only for text-based properties, so if you have numbers, dates, and so on, you’ll need to use both rules: combine for text properties and drop for the rest.

Three Ways To Deduplicate Spreadsheet Files - Merge Settings
  3. Click on Refresh Merging Preview to see the changes that will be made
Three Ways To Deduplicate Spreadsheet Files - Dropping Conflicting Values

Step 6: Running and Reviewing

Now all you need to do is click Auto-merge when possible.

Three Ways To Deduplicate Spreadsheet Files - Merging Preview

Once you’ve merged your duplicates, Datablist will let you download the changes that have been made as a CSV. The file will include:

  • All duplicates you had in your file
  • The records those duplicates were merged into
  • The changes that have been made
  • The Datablist record ID

Downloading that file is optional.

Three Ways To Deduplicate Spreadsheet Files - Deduping Successful

💡 If You’ve Made Any Mistakes

You can revert your changes: go back to your spreadsheet view, click the history button, and undo the actions.

Editing Duplicates Before Removing Them

Sometimes the simple master item rules aren't enough. What if you want to keep the phone number from one record but the job title from another? This is where AI Editing comes in.

How it works: Instead of choosing a preset rule, you describe exactly what you want in plain English. Datablist's AI reads your instructions, generates a script, and applies your custom logic to every duplicate group.

When It's Useful:

  • You have contacts from multiple sources (CRM, LinkedIn, phone lists) and want to combine the best data from each
  • Your duplicates have different fields filled in, and you want to cherry-pick specific values
  • You need custom logic that doesn't fit the standard master item rules
  • You want to update records before deleting them, not just pick a winner
  • You want to flag the duplicates instead of deleting them for compliance reasons

Step 1: Sign Up And Upload Your Data

  1. Sign up for Datablist
Three Ways To Deduplicate Spreadsheet Files - Datablist Homepage
  2. Upload your CSV or Excel
Three Ways To Deduplicate Spreadsheet Files - Datablist Start Page

Step 2: Navigate to the Duplicates Finder

Click on Clean in the top menu of the app and select Duplicates Finder.

Three Ways To Deduplicate Spreadsheet Files - Duplicated Contacts

Step 3: Choose Your Unique Identifier

Select the column(s) you want to use for matching duplicates. Once selected, scroll down and click on Next.

Three Ways To Deduplicate Spreadsheet Files - Choosing a Unique Identifier

Step 4: Select Comparison Algorithm

Select a comparison algorithm and processor for each property you want to deduplicate on. We recommend keeping the default settings except for company names.

Three Ways To Deduplicate Spreadsheet Files - Comparison Algorithm And Processors

Step 5: Open AI Editing

Instead of selecting a master item rule, click on AI Editing in the deduplication panel.

Three Ways To Deduplicate Spreadsheet Files - AI Deduplication

Step 6: Write Your Prompt

Describe what you want in plain English. Here's a practical example:

Let's say you have contact data from two sources: phone verification and LinkedIn scraping. The phone records have verified numbers, but LinkedIn has updated job titles and company names. You want to keep the phone record as the master but update it with LinkedIn data.

Here's the prompt I used:

Select the records with "Phone" as source as master item and update them with the job title and company name from the record with the "LinkedIn" as source. 

The source: /source
The job title: /job title
The company name: /company

Delete the second item when finished

Note: Don’t forget to map your properties to the prompt using /

Click Generate and preview changes when ready.
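
To show what the prompt above asks for, here is the same rule written by hand in Python. This is not the script Datablist generates, which the article doesn't show. It is our own sketch under stated assumptions: `apply_phone_master_rule` is a hypothetical name, the column names mirror the properties mapped in the prompt, and groups missing either a "Phone" or a "LinkedIn" record are left untouched.

```python
def apply_phone_master_rule(group):
    """Keep the "Phone" record, update it from the "LinkedIn" record.

    group: list of dicts for one duplicate group.
    Returns the list of records that remain after processing.
    """
    master = next((r for r in group if r.get("source") == "Phone"), None)
    linkedin = next((r for r in group if r.get("source") == "LinkedIn"), None)
    if master is None or linkedin is None:
        return [dict(r) for r in group]  # assumption: leave the group as-is

    result = dict(master)
    for col in ("job title", "company"):
        if linkedin.get(col):  # only overwrite with non-empty values
            result[col] = linkedin[col]
    return [result]  # the secondary item is deleted


group = [
    {"source": "Phone", "phone": "555-0100", "job title": "Sales Rep", "company": "Acme"},
    {"source": "LinkedIn", "phone": "", "job title": "Head of Sales", "company": "Acme Corp"},
]

cleaned = apply_phone_master_rule(group)
print(len(cleaned))             # 1
print(cleaned[0]["phone"])      # 555-0100
print(cleaned[0]["job title"])  # Head of Sales
```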

Three Ways To Deduplicate Spreadsheet Files - AI Prompt

Step 7: Review and Apply the Changes

Datablist will show you exactly what changes the AI will make before applying them. Review the preview to make sure it matches your expectations.

Once you're happy with the preview, click Run AI Script to apply the changes to all duplicate groups. Then export your cleaned data.

Three Ways To Deduplicate Spreadsheet Files - AI Deduplication Preview

💡 Prompt Tips for Better Results

Be very specific about your expectations. The more precisely you can describe what you want it to do, the better your results will be.

With This You Can Also:

  • Flag duplicates instead of deleting them: Write a prompt like "Add 'DUPLICATE' to the status column for all non-master items instead of deleting them"
  • Combine text fields: "Merge all notes from duplicate records into the master item's notes field, separated by line breaks"
  • Prioritize by source quality: "Use Salesforce records as master when available, otherwise use HubSpot, then spreadsheet imports"
  • … or anything else you can think of.
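
As an example of the first idea, flagging comes down to writing a label on every non-master record instead of removing it. The Python sketch below is a hand-written illustration, not Datablist's generated script. `flag_secondaries` is a hypothetical name, and the "status" column and "DUPLICATE" label are taken from the sample prompt.

```python
def flag_secondaries(group, master, status_column="status", label="DUPLICATE"):
    """Return copies of all records, with non-master items flagged."""
    flagged = []
    for record in group:
        copy = dict(record)
        if record is not master:
            copy[status_column] = label
        flagged.append(copy)
    return flagged


group = [
    {"email": "a@x.com", "phone": "555-0100", "status": ""},
    {"email": "a@x.com", "phone": "", "status": ""},
]

result = flag_secondaries(group, master=group[0])
print(len(result))          # 2
print(result[0]["status"])  # (empty string)
print(result[1]["status"])  # DUPLICATE
```

Nothing is deleted, so the flagged rows can still be filtered, reviewed, or exported later.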

Removing Duplicates Across Two Sheets or More

If you have two different CSV files and you want to find records that appear in both or deduplicate a new lead list against your existing CRM export, Datablist makes it simple.

How it works: Unlike single-file deduplication, this workflow compares records across multiple files and removes duplicates that span different data sources. You can select two or more files, with no upper limit.

When It's Useful:

  • You're importing new leads and want to avoid duplicates with existing contacts
  • You're merging data from multiple vendors or sources
  • You need to find the overlap between two customer lists
  • You want to prevent contacting the same prospect twice
  • You need to consolidate customer data from various departments or branches
  • … and for many more data cleaning workflows

📘 Important Difference From Single-File Deduplication

When deduplicating across multiple files, Datablist removes duplicates entirely rather than merging them.

Step 1: Sign Up And Upload Your Files

  1. Sign up for Datablist
Three Ways To Deduplicate Spreadsheet Files - Datablist Homepage
  2. Import your first CSV or Excel file
Three Ways To Deduplicate Spreadsheet Files - Datablist Start Page
  3. Import your second file into another collection (and any additional files you want to deduplicate across)
Three Ways To Deduplicate Spreadsheet Files - Import Second File
  4. Make sure you have a unique identifier

Before proceeding, confirm that all your files share at least one common column that can be used as a unique identifier. This could be:

  • Email address
  • LinkedIn URL
  • Company domain
  • Phone number
  • Any other field that uniquely identifies a record

Step 2: Navigate to the Duplicates Finder

Click on Clean in the top menu of the app and select Duplicates Finder.

Three Ways To Deduplicate Spreadsheet Files - Duplicated Contacts

Step 3: Enable Multi-Collection Deduplication

  1. Tick the "Check Duplicate Items Across Several Collections" option
  2. Select the collection(s) / file(s) you just imported
Three Ways To Deduplicate Spreadsheet Files - Multi File Selection

Step 4: Choose Your Unique Identifier Property

Select the property you want to use for matching duplicates across files. You can select multiple properties, but make sure all files contain these properties to keep your deduplication process accurate.

Three Ways To Deduplicate Spreadsheet Files - Choosing a Unique Identifier

Step 5: Select Comparison Algorithm

Choose the comparison mechanism that fits your data:

  • Exact: Best for URLs, domains, or IDs where you need exact matches
  • Smart: Best for text-based properties where slight variations might exist
Three Ways To Deduplicate Spreadsheet Files - Comparison Algorithm And Processors

Click on Run duplicates check once you've chosen the comparison method.
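
If you're wondering how the two modes differ in practice, the Python sketch below contrasts a strict comparison with a tolerant one. It is only a stand-in: we don't know which algorithm Datablist's Smart mode uses, so the sketch borrows `difflib.SequenceMatcher` from Python's standard library. The function names and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher


def exact_match(a, b):
    """Exact: the two values must be identical."""
    return a == b


def smart_match(a, b, threshold=0.85):
    """Tolerant match: ignore case and extra spaces, allow small typos."""
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


print(exact_match("Jon Smith", "John Smith"))   # False
print(smart_match("Jon Smith", "John Smith"))   # True
print(smart_match("John  Smith", "john smith")) # True
print(smart_match("John Smith", "Mary Jones"))  # False
```

For URLs, domains, and IDs, a tolerant match is risky because two different IDs can differ by a single character, which is why Exact is the safer choice there.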

Step 6: Set Up Cleaning Rules

Choose how you want to handle the duplicates:

  • Remove duplicate items from collection X: Removes duplicates from your selected file
  • Keep duplicate items only in collection X: Only available when deduplicating across 3 or more collections
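
The first rule can be pictured with a short Python sketch: every record in the chosen collection is dropped if its identifier also appears in one of the other collections. This is a simplified illustration with exact matching, not Datablist's implementation. `remove_cross_file_duplicates` is a hypothetical name, and normalizing identifiers by trimming and lowercasing is our assumption.

```python
def remove_cross_file_duplicates(target, others, key_column):
    """Drop records from `target` whose key appears in any of `others`.

    target: list of dicts (the collection to clean).
    others: list of collections (each a list of dicts) to compare against.
    """
    def normalize(record):
        return str(record.get(key_column, "")).strip().lower()

    seen = set()
    for collection in others:
        for record in collection:
            key = normalize(record)
            if key:  # ignore records without an identifier
                seen.add(key)
    return [record for record in target if normalize(record) not in seen]


crm_export = [{"email": "a@x.com"}, {"email": "b@x.com"}]
new_leads = [{"email": "A@x.com "}, {"email": "c@x.com"}, {"email": ""}]

print(remove_cross_file_duplicates(new_leads, [crm_export], "email"))
# [{'email': 'c@x.com'}, {'email': ''}]
```

The CRM export is left untouched. Only the new lead list loses the contact you already have, and rows without an email are kept because there is nothing to match them on.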

Click on Process duplicate items to continue.

Three Ways To Deduplicate Spreadsheet Files - Auto Cleaning Rule

That’s it!

Conclusion

Congrats, you’ve reached the end and now know more about deduplication than most people ever will. Here’s a quick recap of today’s most important lessons:

  1. Duplicates aren't all the same, and knowing which type you’re dealing with makes a big difference
  2. Picking the right Master Item and bulk action can save you hours of manual cleanup
  3. Unlike other tools that lock you into their way of doing things, Datablist lets you handle duplicates exactly how you need to

So whether you're merging contacts from a messy CRM, applying custom logic with AI, or cleaning new leads against your existing database, you've got the tools and the knowledge to do it right. Happy deduplicating!

Frequently Asked Questions

How Does Datablist Decide Which Duplicate Record To Keep?

Datablist doesn't decide; you do. You choose a Master Item Rule (like "Most complete" or "Last updated") that tells Datablist which record to prioritize. If your logic is more complex, you can use AI Editing to define custom rules in plain English, and our AI assistant will handle the rest.

What Differentiates Datablist's Deduplication And Matching Suite From Other Products?

Three things: flexibility, AI-powered customization, and price. Most tools only let you delete duplicates. Datablist lets you merge, update, flag, or delete based on rules you define. The AI Editing feature handles complex logic that other tools simply can't. And the next comparable product costs multiple thousands of dollars per year (enterprise software).

What If I Don't Want To Delete My Duplicates?

You can flag them instead. Use AI Editing and write a prompt like: "Add 'DUPLICATE' to the status column for all non-master items instead of deleting them." This is especially useful for compliance purposes or when you need to review duplicates before removing them.

What If The Master Item Rules Don't Fit My Use Case?

Use AI Editing. Instead of choosing a preset rule, you describe your logic in plain English, and Datablist's AI creates a custom script for you. For example: "Keep the record from Salesforce, but use the job title from LinkedIn."

Can I Create Custom Master Item Rules?

Yes. Datablist's AI Editing feature lets you write any prioritization rule you can describe. Want to keep records where column A contains a specific value? Or prioritize based on multiple conditions? Just type what you need, and the AI handles the rest.

What Is A Unique Identifier In Deduplication?

A unique identifier is the column (or combination of columns) that makes each record distinct. For example, if you use "Email" as your unique identifier, two rows with the same email are considered duplicates, even if other fields differ. You can also combine columns like "First Name" + "Company" for stricter matching.

How Can I Deduplicate A List With Conflicting Values?

Conflicting duplicates happen when two records represent the same entity but have different values in some fields. To handle them: (1) Choose a Master Item Rule to pick which record wins, (2) Decide whether to combine, drop, or update the conflicting values, (3) Use Datablist's deduplication suite to apply your choices in bulk. For complex cases, AI Editing lets you cherry-pick specific values from different records.

How Can I Flag Duplicates Without Deleting Them?

You can use Datablist’s AI Editing feature inside its Deduplication and Matching Suite. Simply write a prompt like: "Add 'DUPLICATE' to the status column for all non-master items instead of deleting them." This marks your duplicates for review while keeping all your data intact, which is perfect for compliance or when you need manual approval before removal.

How Can I Update Duplicated Records Without Deleting Them?

Updating duplicates means replacing specific values in your master record with better data from another source. For this, you can use Datablist’s AI Editing feature inside its Deduplication and Matching Suite. All you need to do is describe what you want, for example: "Keep records from Source A, but update the job title and company name using values from Source B." The AI applies your logic to all duplicate groups, and you can then delete the extras or keep them flagged.