CSV encoding defines how characters are stored inside a CSV file.

If a file is read with the wrong encoding, names, accents, currency symbols, and other special characters can break. For example, é can become Ã©, and company names can become unreadable.
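The broken é can be reproduced in a few lines. This is a minimal sketch, assuming the file was saved as UTF-8 and then opened as Windows-1252 (called cp1252 in Python); the variable names are illustrative only.

```python
# Encode "é" as UTF-8, then decode the same bytes with the wrong encoding.
original = "é"
utf8_bytes = original.encode("utf-8")   # two bytes: 0xC3 0xA9
garbled = utf8_bytes.decode("cp1252")   # each byte is read as its own character
print(garbled)   # Ã©

# Reversing the same mistake recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # é
```

The round trip only works when no bytes were lost along the way. Once a character has been replaced by a question mark or a box, the original value is usually gone.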

Common CSV encodings

The most common encodings are:

  • UTF-8: the default for modern CSV files
  • UTF-8 with BOM: UTF-8 with a byte order mark at the start of the file, which Excel uses to recognize the file as UTF-8
  • ISO-8859-1: common in older Western European exports
  • Windows-1252: common in older Microsoft exports
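A rough way to tell these encodings apart is to look at the raw bytes. The helper below, guess_csv_encoding, is a hypothetical name and only a heuristic sketch: it checks for a BOM, then tries strict UTF-8, and otherwise assumes a legacy Western European export.

```python
import codecs

def guess_csv_encoding(data: bytes) -> str:
    """Return a best-guess encoding name for raw CSV bytes (heuristic only)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with BOM
    try:
        data.decode("utf-8")  # strict decoding fails on invalid byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8. Falling back to cp1252 is an assumption:
        # ISO-8859-1 uses the same bytes for most accented letters.
        return "cp1252"

print(guess_csv_encoding("name\nFrançois\n".encode("utf-8")))   # utf-8
print(guess_csv_encoding("name\nFrançois\n".encode("cp1252")))  # cp1252
```

A file that contains only ASCII characters is valid in all four encodings, so the guess matters only once accented letters or symbols appear.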

⚠️ Encoding problems are data quality problems

If names or addresses break during import, deduplication and enrichment can fail later. Fix encoding before cleaning or matching records.

Signs of a CSV encoding problem

You might have an encoding issue when you see:

  • Broken accents, such as FranÃ§ois instead of François
  • Question marks inside names
  • Strange symbols in addresses
  • Currency symbols replaced by boxes
  • Columns that split correctly but contain corrupted values
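Some of these signs can be searched for automatically. The sketch below is an assumption-heavy heuristic, not a complete detector: find_suspect_cells and SUSPECT_MARKERS are hypothetical names, and the markers can also occur in legitimate text (for example, Ã in Portuguese words written in capitals).

```python
import csv
import io

# Assumed markers: U+FFFD (the replacement character) and "Ã" / "Â", which
# often appear when UTF-8 bytes were decoded as Windows-1252 or ISO-8859-1.
SUSPECT_MARKERS = ("\ufffd", "Ã", "Â")

def find_suspect_cells(csv_text: str) -> list[tuple[int, int, str]]:
    """Return (row index, column index, value) for cells that look corrupted."""
    hits = []
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, value in enumerate(row):
            if any(marker in value for marker in SUSPECT_MARKERS):
                hits.append((r, c, value))
    return hits

sample = "name,city\nFranÃ§ois,Paris\nAnna,Lyon\n"
print(find_suspect_cells(sample))  # [(1, 0, 'FranÃ§ois')]
```

Treat the result as a list of cells to review by hand rather than as proof of an encoding problem.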

Encoding is different from a CSV delimiter. Encoding controls characters. Delimiters control columns.
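The two settings are applied in separate steps when a file is read. In this small sketch, the sample data, the cp1252 encoding, and the semicolon delimiter are all assumptions chosen for illustration.

```python
import csv
import io

# Sample bytes as they might sit on disk in an older export.
raw = "name;city\nFrançois;Besançon\n".encode("cp1252")

# Step 1: the encoding turns bytes into characters.
text = raw.decode("cp1252")

# Step 2: the delimiter splits each line into columns.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(rows)  # [['name', 'city'], ['François', 'Besançon']]

# A wrong delimiter leaves the characters intact but merges the columns.
one_column = list(csv.reader(io.StringIO(text), delimiter=","))
print(one_column)  # [['name;city'], ['François;Besançon']]
```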

How to avoid CSV encoding issues

Use UTF-8 when you export a CSV file. If you work with Excel, test the file in a CSV editor before importing it into a CRM or enrichment workflow.
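When the export is produced by a script, the encoding can be set explicitly. The following is a sketch using Python's standard csv module; write_utf8_csv, the bom flag, and the contacts_utf8.csv file name are hypothetical.

```python
import csv
import os
import tempfile

def write_utf8_csv(path, rows, bom=False):
    """Write rows as UTF-8; bom=True adds a BOM so Excel recognizes UTF-8."""
    encoding = "utf-8-sig" if bom else "utf-8"
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)

rows = [["name", "amount"], ["François", "€120"]]
path = os.path.join(tempfile.gettempdir(), "contacts_utf8.csv")
write_utf8_csv(path, rows, bom=True)

# Check the result: the first three bytes are the BOM, and reading the
# file back with "utf-8-sig" returns the original values.
with open(path, "rb") as f:
    first_bytes = f.read(3)
with open(path, encoding="utf-8-sig", newline="") as f:
    round_trip = list(csv.reader(f))
print(first_bytes == b"\xef\xbb\xbf", round_trip == rows)  # True True
```

Whether to write the BOM depends on the consumer: it helps Excel, but some import tools treat it as part of the first header name.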

Datablist can open large CSV files in the browser and helps you inspect values before you run data cleaning, deduplication, or enrichment.

Learn more about the CSV format in "What does CSV stand for?" and "What are CSV headers?"