CSV encoding defines how characters are stored inside a CSV file.
If the wrong encoding is used, names, accents, currencies, and symbols can break. For example, é can become Ã©, and company names can become unreadable.
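The breakage described above can be reproduced in a few lines. This is a minimal Python sketch, assuming the text was saved as UTF-8 and then read as Windows-1252, which is one common mismatch but not the only one. The variable names are illustrative.

```python
# Encode "é" as UTF-8, then decode the same bytes with the wrong encoding.
original = "é"
utf8_bytes = original.encode("utf-8")        # two bytes: 0xC3 0xA9
garbled = utf8_bytes.decode("windows-1252")  # each byte is read as one character
print(garbled)  # Ã©

# If nothing else has altered the text, reversing the two steps restores it.
repaired = garbled.encode("windows-1252").decode("utf-8")
print(repaired)  # é
```

The round trip only works when the garbled text has not been edited or re-saved in yet another encoding in between.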
Common CSV encodings
The most common encodings are:
- UTF-8: the default for modern CSV files
- UTF-8 with BOM: often used by Excel to detect UTF-8
- ISO-8859-1: common in older Western European exports
- Windows-1252: common in older Microsoft exports
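A rough way to tell these encodings apart is to look at the raw bytes of the file. The sketch below is a heuristic, not a definitive detector: `guess_csv_encoding` is a hypothetical helper, and the fallback to Windows-1252 is an assumption, since ISO-8859-1 and Windows-1252 cannot be reliably distinguished from the bytes alone.

```python
import codecs

def guess_csv_encoding(raw: bytes) -> str:
    """Return a best-guess encoding name for the raw bytes of a CSV file."""
    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # Valid UTF-8 is strict, so legacy files with accents usually fail here.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything else is a legacy Western European export.
        return "windows-1252"

print(guess_csv_encoding("name\nFrançois\n".encode("utf-8")))         # utf-8
print(guess_csv_encoding("name\nFrançois\n".encode("windows-1252")))  # windows-1252
```

A file that contains only ASCII characters is reported as UTF-8, which is harmless because ASCII decodes identically in all four encodings listed above.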
⚠️ Encoding problems are data quality problems
If names or addresses break during import, deduplication and enrichment can fail later. Fix encoding before cleaning or matching records.
Signs of a CSV encoding problem
You might have an encoding issue when you see:
- Broken accents, such as FranÃ§ois instead of François
- Question marks inside names
- Strange symbols in addresses
- Currency symbols replaced by boxes
- Columns that import correctly but contain corrupted values
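Some of these signs can be checked automatically before an import. The following is a minimal Python sketch, assuming the file has a header row and has already been read into a string. `find_suspicious_cells` and the marker list are hypothetical, and the check can produce false positives, since characters such as Ã also appear in legitimate text.

```python
import csv
import io

# Assumed markers: sequences typical of UTF-8 text decoded as Windows-1252,
# plus the Unicode replacement character (U+FFFD).
SUSPICIOUS_MARKERS = ("Ã", "Â", "â€", "\ufffd")

def find_suspicious_cells(csv_text: str):
    """Return (row_number, column_name, value) for cells that look garbled."""
    hits = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if value and any(marker in value for marker in SUSPICIOUS_MARKERS):
                hits.append((row_number, column, value))
    return hits

sample = "name,city\nFranÃ§ois,Paris\nAnna,KÃ¶ln\nJohn,London\n"
print(find_suspicious_cells(sample))
# [(2, 'name', 'FranÃ§ois'), (3, 'city', 'KÃ¶ln')]
```

Treat the result as a list of cells to review by hand rather than as proof of an encoding problem.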
Encoding is different from a CSV delimiter. Encoding controls characters. Delimiters control columns.
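The two settings are applied at different steps when a file is read. Here is a short Python sketch with made-up sample data, assuming a semicolon-delimited file saved as Windows-1252:

```python
import csv
import io

# Sample bytes standing in for a file on disk (illustrative data).
raw = "name;city\nFrançois;Besançon\n".encode("windows-1252")

# Step 1: the encoding turns bytes into characters.
text = raw.decode("windows-1252")

# Step 2: the delimiter splits those characters into columns.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(rows)  # [['name', 'city'], ['François', 'Besançon']]
```

Getting the delimiter wrong changes how many columns appear, while getting the encoding wrong changes which characters appear inside them.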
How to avoid CSV encoding issues
Use UTF-8 when you export a CSV file. If you work with Excel, test the file in a CSV editor before importing it into a CRM or enrichment workflow.
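When the export is scripted, the encoding can be set explicitly. Below is a minimal Python sketch using the standard `csv` module. The file name and sample rows are illustrative, and choosing `utf-8-sig` (UTF-8 with BOM) is an assumption that the file will be opened in Excel; plain `utf-8` writes the same content without the BOM.

```python
import csv
import tempfile
from pathlib import Path

rows = [["name", "city"], ["François", "Besançon"]]

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = Path(tempfile.mkdtemp()) / "contacts.csv"

# "utf-8-sig" writes a BOM at the start of the file; newline="" lets the
# csv module control line endings itself.
with path.open("w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

raw = path.read_bytes()
print(raw[:3])  # b'\xef\xbb\xbf'

# Reading with the same encoding strips the BOM and returns the original values.
with path.open("r", encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back == rows)  # True
```

Reading the file back right after writing it is a cheap check that accented characters survive the export.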
Datablist can open large CSV files in the browser and helps you inspect values before you run data cleaning, deduplication, or enrichment.
Learn more about the CSV format in What does CSV stand for? and What are CSV headers?.