Fix CSV Encoding Issues — UTF-8 BOM, Delimiters, and Character Errors
CSV looks simple until you open a file and see ï»¿ before the first header, find accented characters displayed as Ã©, or watch Excel put all the data in one column. Here's how to fix the most common CSV encoding issues.
Common Error Messages & Symptoms
- First column header starts with ï»¿ or \ufeffname → UTF-8 BOM
- Characters display as Ã© instead of é → UTF-8 read as Latin-1
- Characters display as Ã© or â€" → Mojibake (double encoding)
- All data in one column in Excel → Wrong delimiter
- Rows shifted / extra rows → Newline inside quoted field
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff → File is UTF-16
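To see where these symptoms come from, here is a minimal Python sketch that reproduces the garbled text on purpose and then reverses it. The helper name `repair_mojibake` is hypothetical, and the sketch assumes the text was wrongly decoded as cp1252 (Windows Latin-1) exactly once; other wrong encodings need a different round trip.

```python
# Reproduce the symptom: UTF-8 bytes (with BOM) decoded as cp1252.
original = "\ufeffname,café"
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # ï»¿name,cafÃ©


def repair_mojibake(text):
    """Undo one round of 'UTF-8 read as cp1252' (hypothetical helper).

    If the text does not round-trip cleanly it was probably not
    garbled this way, so it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("cafÃ©"))  # café
```

Repairing after the fact is a last resort. When you still have the original bytes, re-reading them with the right encoding is the more reliable route.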
Fix 1: UTF-8 BOM (Byte Order Mark)
The #1 CSV encoding issue. When Excel saves a file as "CSV UTF-8", it writes the bytes EF BB BF (the UTF-8 encoding of U+FEFF) at the start of the file. Parsers that are not BOM-aware treat those 3 bytes as part of the first field.
Detect:
xxd file.csv | head -1
# BOM present: efbb bf4e 616d 65 → "Name"
# No BOM: 4e61 6d65 → "Name"
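If xxd isn't available, the same check can be sketched in Python by reading the first few bytes and comparing them with the BOM constants in the standard `codecs` module. The function name `detect_bom` is an assumption made for this example.

```python
import codecs


def detect_bom(path):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, encoding in candidates:
        if head.startswith(bom):
            return encoding
    return None
```

The returned name can be passed straight to `open(path, encoding=...)`, since the `utf-8-sig`, `utf-16`, and `utf-32` codecs all consume the BOM while decoding.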
Fix (Node.js):
import { readFileSync } from 'fs';
let csv = readFileSync('data.csv', 'utf8');
// Strip BOM if present
if (csv.charCodeAt(0) === 0xFEFF) {
  csv = csv.slice(1);
}
Fix (Python):
import csv

# Use utf-8-sig encoding: it strips the BOM automatically if one is present
with open('data.csv', encoding='utf-8-sig', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)  # First header is clean
Fix (creating CSVs for Excel):
import { writeFileSync } from 'fs';

// Add BOM so Excel opens UTF-8 correctly
const BOM = '\uFEFF';
const csv = BOM + 'Name,Email\nJohn,john@example.com\n';
writeFileSync('export.csv', csv, 'utf8');
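The same export can be sketched in Python. Opening the file with the `utf-8-sig` encoding makes Python write the BOM for you, so nothing has to be prepended by hand. The function name `write_excel_csv` and the sample rows are assumptions for illustration.

```python
import csv


def write_excel_csv(path, rows):
    """Write rows as UTF-8 CSV with a leading BOM so Excel detects the encoding."""
    # utf-8-sig emits EF BB BF once at the start of the file.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


write_excel_csv("export.csv", [["Name", "City"], ["Zoë", "Köln"]])
```

Only add the BOM when Excel is the intended consumer. For files that other programs will parse, plain UTF-8 without a BOM avoids the first-header problem described above.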
Fix 2: Wrong Delimiter
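European CSVs often use ; instead of , because many European locales use , as the decimal separator. Excel follows the system's list-separator setting, which is why a comma-delimited file can end up in a single column.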
Detect:
head -2 file.csv
# Comma-separated: Name,Email,City
# Semicolon-separated: Name;Email;City
# Tab-separated: Name\tEmail\tCity
Fix (Python pandas):
# Auto-detect delimiter
import pandas as pd
df = pd.read_csv('data.csv', sep=None, engine='python')
# Or specify explicitly
df = pd.read_csv('data.csv', sep=';') # Semicolon
df = pd.read_csv('data.csv', sep='\t') # Tab (TSV)
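If pandas is more than you need, the standard library's `csv.Sniffer` can make the same guess. The sketch below assumes the file is UTF-8 and that the first few kilobytes are representative. The function name `sniff_delimiter` and the candidate list are choices made for this example.

```python
import csv


def sniff_delimiter(path, candidates=",;\t|", sample_size=4096):
    """Guess the delimiter from the start of the file.

    csv.Sniffer raises csv.Error when it cannot decide, so callers
    should be ready to fall back to an explicit delimiter.
    """
    with open(path, encoding="utf-8-sig", newline="") as f:
        sample = f.read(sample_size)
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
```

Sniffing is a heuristic. Files with very few rows or with free-text columns can fool it, so when you control the source, stating the delimiter explicitly is safer.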
Fix (Node.js with PapaParse):
const Papa = require('papaparse');
const result = Papa.parse(csvString, {
  delimiter: '', // Auto-detect
  // Or: delimiter: ';'
});
Test with: sample CSV files — includes comma, semicolon, and tab-delimited variants.
Fix 3: Newlines Inside Quoted Fields
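RFC 4180 allows line breaks inside a field as long as the field is enclosed in double quotes. Line-based parsing breaks on this: in the sample below, John's record spans two physical lines but is still a single row.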
Name,Bio
"John","Likes coding.
Also likes coffee."
"Jane","One-liner bio"
Fix: Use a proper CSV parser that handles quoted fields, not line.split(','):
// WRONG — breaks on quoted newlines
const rows = csv.split('\n').map(line => line.split(','));
// RIGHT — use PapaParse or csv-parse
const Papa = require('papaparse');
const { data } = Papa.parse(csv, { header: true });
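The same contrast can be sketched in Python with the sample from this section: splitting on line breaks yields four "rows", while the standard `csv` module yields the three real ones.

```python
import csv
import io

SAMPLE = (
    "Name,Bio\n"
    '"John","Likes coding.\nAlso likes coffee."\n'
    '"Jane","One-liner bio"\n'
)

# Naive approach: the quoted newline is treated as a row boundary.
naive = [line.split(",") for line in SAMPLE.splitlines()]

# csv.reader tracks quoting, so the embedded newline stays inside the field.
# newline="" keeps Python from translating line endings before the parser sees them.
parsed = list(csv.reader(io.StringIO(SAMPLE, newline="")))

print(len(naive))   # 4
print(len(parsed))  # 3
print(parsed[1])    # ['John', 'Likes coding.\nAlso likes coffee.']
```

When reading from disk, pass `newline=""` to `open()` for the same reason.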
Fix 4: Character Encoding Detection
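# Detect encoding with chardet
import chardet

with open('mystery.csv', 'rb') as f:
    raw = f.read(10000)  # A sample of the raw bytes is enough
result = chardet.detect(raw)
# The result is a statistical guess; check 'confidence' before trusting it
print(result)  # e.g. {'encoding': 'ISO-8859-1', 'confidence': 0.73, ...}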
# Then read with detected encoding
import pandas as pd
df = pd.read_csv('mystery.csv', encoding=result['encoding'])
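Because chardet's answer is a guess, a common complement is to try strict decoders in order: bytes that decode cleanly as UTF-8 are very unlikely to be anything else. The sketch below is one possible fallback chain, not a complete detector. The function name `read_text_with_fallback` and the default candidate list are assumptions, and BOM-less UTF-16 is deliberately left out because it decodes many byte sequences without raising an error.

```python
def read_text_with_fallback(path, encodings=("utf-8-sig", "cp1252")):
    """Decode a file with the first encoding that does not raise an error.

    Returns (text, encoding_used). latin-1 is the last resort because
    it maps every byte to a character and therefore never fails.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1"), "latin-1"
```

A clean decode only proves the bytes are valid in that encoding, not that the result is what the author wrote, so spot-check a few accented values after loading.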
Fix 5: Excel-Specific Issues
Excel has unique CSV handling quirks:
// Force Excel to recognize UTF-8
const BOM = '\uFEFF';
const csv = BOM + Papa.unparse(data);
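// Keep leading zeros by forcing the field to text (prevent 0001 → 1)
// Write the value as the formula ="0001"; inside the CSV the field itself
// must be quoted with doubled inner quotes: "=""0001"""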
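The quoting in that last trick is easy to get wrong by hand, so here is a minimal Python sketch that lets the `csv` module do the escaping. The helper names `excel_text` and `rows_to_csv` are hypothetical.

```python
import csv
import io


def excel_text(value):
    """Wrap a value as the formula ="value" so Excel keeps it as text."""
    return f'="{value}"'


def rows_to_csv(rows):
    """Serialize rows to a CSV string; the csv module handles the quoting."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


out = rows_to_csv([["ID", "Name"], [excel_text("0001"), "John"]])
print(out)
# ID,Name
# "=""0001""",John
```

This only helps when Excel is the consumer: any other parser will read the literal string `="0001"`, so keep a plain export for programmatic use.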
Test with Sample CSVs
| Test | File | What It Tests |
|------|------|--------------|
| Standard CSV | sample-users.csv | Basic parsing |
| Large CSV | sample-1M-rows.csv | Memory/streaming |
| Semicolon-delimited | sample-semicolon.csv | Delimiter detection |
| Unicode data | sample-unicode.csv | Encoding handling |
See our CSV sample files for all variants.
Quick Reference
| Issue | Symptom | Fix |
|-------|---------|-----|
| BOM | ï»¿ before header | encoding='utf-8-sig' (Python) or strip \uFEFF |
| Wrong delimiter | One column in Excel | Detect and specify: , ; \t |
| Mojibake | Ã© instead of é | Re-read with correct encoding |
| Newlines in fields | Rows shifted | Use proper CSV parser, not split |
| Leading zeros lost | 0001 becomes 1 | Write the field as ="0001" or use .xlsx |