Your production data pipeline breaks at 3 AM.
The error log: SyntaxError: Unexpected token in JSON at position 0. You check the CSV export from your CRM. Opens fine in Excel. Headers look correct. 50,000 customer records ready to import.
You run the import script again. Same error.
You open the file in a hex editor. First three bytes: EF BB BF.
That's the UTF-8 Byte Order Mark (BOM). Three invisible characters Excel added when it saved the file. Your import script doesn't expect them. Your JSON parser chokes on them. Your database loader rejects the file.
TL;DR
CSV import fails with "Unexpected token" or encoding errors? Excel likely added a UTF-8 BOM (three bytes: EF BB BF). Detect with hex editor or browser-based tools. Remove using text editor encoding options or client-side processing—fix locally in 30 seconds, never upload customer data to third-party BOM removers.
Quick 30-Second BOM Fix
Got a CSV failing with encoding errors right now? Start here:
- Detect BOM → Open file in VS Code, check status bar for "UTF-8 with BOM"
- Remove BOM → Click status bar → "Save with Encoding" → Select "UTF-8" (no BOM)
- Save and retry import → File now clean, import should succeed
Alternative for non-technical users: Drag CSV into browser-based detection tool → Tool detects BOM instantly → Click "Remove BOM" if present → Download clean file.
This fixes 95% of BOM-related import failures. For understanding why this happens and preventing it, continue below.
Table of Contents
- What is the Byte Order Mark (BOM)?
- Why BOM Breaks CSV Imports
- Common BOM Error Messages
- The Privacy Risk of Upload-Based BOM Removal
- How to Detect BOM in CSV Files
- How to Remove BOM (3 Methods)
- Why Excel Adds BOM
- Common BOM Scenarios
- What This Won't Do
- Additional Resources
- The Bottom Line
What is the Byte Order Mark (BOM)?
The Byte Order Mark (BOM) is a Unicode character (U+FEFF) placed at the start of a text file to indicate encoding type and byte order. In UTF-8 files, the BOM appears as three bytes: EF BB BF in hexadecimal.
While optional in UTF-8 (byte order is irrelevant in UTF-8 per the Unicode Standard), some programs—particularly Windows applications like Excel—add BOM by default when saving UTF-8 files.
What BOM looks like in different views:
Hex editor view:
00000000: EFBB BF6E 616D 652C ...name,
Text editor (if visible):
name,email,phone
What it should be:
name,email,phone
The first three bytes (EF BB BF) are the UTF-8 BOM. They're invisible in Excel but visible in hex editors and can break strict CSV parsers.
Why BOM Breaks CSV Imports
The Technical Problem
CSV parsers expect the first line to contain column headers. When a UTF-8 BOM is present, the first three bytes (EF BB BF) appear before the actual headers.
What the parser sees:
name,email,phone
John Doe,john@example.com,555-0100
What the parser expects:
name,email,phone
John Doe,john@example.com,555-0100
The result: The first column header is read as name (the three BOM bytes decoded as stray characters) instead of name. Column matching fails. Import rejected.
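The header mismatch can be reproduced in a few lines of Python (the sample data is illustrative):

```python
import csv
import io

# A CSV payload that starts with the UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfname,email,phone\nJohn Doe,john@example.com,555-0100\n"

# Plain "utf-8" keeps the BOM as U+FEFF glued to the first header.
header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
print(header[0] == "name")   # False: the header is "\ufeffname"

# "utf-8-sig" strips the BOM, so column matching works.
header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
print(header[0] == "name")   # True
```

The two decoded strings look identical on screen, which is exactly why this bug is so hard to spot by eye.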
Real-World Impact
Database imports fail: PostgreSQL's COPY command rejects BOM-prefixed CSVs with encoding errors. MySQL's LOAD DATA INFILE misinterprets the first column name.
JSON conversion breaks: Converting a BOM-prefixed CSV to JSON produces:
[
  {
    "name": "John Doe",
    "email": "john@example.com"
  }
]
Your application expects a name key. It gets name. Field mapping fails.
API imports rejected: REST APIs expecting clean CSV headers return validation errors. Salesforce imports fail on BOM-prefixed files. HubSpot bulk uploads reject the file.
Automated pipelines crash: ETL scripts that parse CSV headers break when column names don't match expectations. Data warehouses reject BOM files during scheduled loads.
The fix: Remove the BOM before processing. But how you remove it determines whether you expose customer data or maintain privacy.
Common BOM Error Messages
If you see these errors, BOM is likely the cause:
JavaScript/Node.js:
SyntaxError: Unexpected token in JSON at position 0
Python CSV module:
_csv.Error: field larger than field limit (131072)
(BOM causes the first field to be read incorrectly, sometimes triggering field size limits)
PostgreSQL:
ERROR: invalid byte sequence for encoding "UTF8": 0xef 0xbb 0xbf
MySQL:
Error 1300: Invalid utf8 character string
Salesforce Data Import:
Error: We can't find a column header row. Check your file format.
Java CSV Parser:
IllegalArgumentException: Header column 'id' not found
All of these stem from the same root cause: three invisible bytes at the start of the file.
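Python's json module actually diagnoses this one explicitly. A minimal sketch of the failure and the fix (the payload is illustrative):

```python
import json

# A JSON string that still carries the BOM after a plain UTF-8 decode.
text = "\ufeff" + '{"name": "John Doe"}'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    # CPython reports the BOM directly, e.g.
    # "Unexpected UTF-8 BOM (decode using utf-8-sig)"
    print(exc.msg)

# Decoding the raw bytes with utf-8-sig strips the BOM first.
data = json.loads(text.encode("utf-8").decode("utf-8-sig"))
print(data["name"])  # John Doe
```

If your stack raises a vaguer error than this, checking the first three bytes of the file is still the fastest way to confirm the cause.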
The Privacy Risk of Upload-Based BOM Removal
What Happens When You Upload
Popular BOM removal tools (ConvertCSV, Datablist, CSVLint) follow this workflow:
- Upload: Entire CSV transmitted to their servers
- Processing: Server reads first three bytes, removes BOM if present, generates new file
- Download: BOM-free file returned
- Deletion: File allegedly deleted from server
The issue: You're uploading an entire customer dataset to remove three bytes. The data exposed: names, emails, addresses, transaction history, health records, financial data—everything in your CSV.
GDPR Article 5(1)(c) — Data Minimization:
"Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed."
Question: Is uploading 50,000 customer records to a third party "necessary" to remove a 3-byte encoding marker?
Answer: No. The BOM can be detected and removed locally without transmitting data.
The Audit Trail Problem
Regulator question: "How do you verify the tool deleted customer data after processing?"
Your answer: "They said they delete files."
Problem: No audit trail. No deletion verification. No proof of compliance.
GDPR Article 28 requires Data Processing Agreements (DPAs) with third-party processors. Most free BOM removal tools don't offer DPAs, which can create compliance gaps for regulated industries.
IBM's research on data quality cites an average cost to organizations of $12.9 million per year from poor data quality.
How to Detect BOM in CSV Files
Method 1: Visual Inspection (Often Invisible)
Open the CSV in a plain text editor (Notepad++, Sublime Text, VS Code).
In Notepad++: View → Show Symbol → Show All Characters. BOM appears as  at the start.
In VS Code: Check the status bar at bottom-right. If it says "UTF-8 with BOM", BOM is present.
Problem: Not all editors make BOM visible. Excel hides it completely.
Method 2: Hex Editor (Definitive)
Open the file in a hex editor (HxD on Windows, xxd on Linux/Mac).
BOM present:
00000000: EFBB BF6E 616D 652C ...name,
No BOM:
00000000: 6E61 6D65 2C65 6D61 name,ema
If the first three bytes are EF BB BF, BOM is present.
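The same three-byte check a hex editor performs can be scripted. A minimal Python version (the function name and file path are illustrative):

```python
BOM = b"\xef\xbb\xbf"

def has_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == BOM

# Hypothetical usage:
# has_bom("yourfile.csv")  -> True when EF BB BF is present
```

Reading in binary mode ("rb") is essential here: a text-mode read would decode the bytes and could silently hide the very marker you are looking for.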
Method 3: Command Line
Linux/Mac:
file yourfile.csv
Output includes "(with BOM)" when the BOM is present, e.g. UTF-8 Unicode (with BOM) text versus plain UTF-8 Unicode text.
Windows PowerShell:
Get-Content yourfile.csv -Encoding Byte -TotalCount 3
(PowerShell 7+ removed -Encoding Byte; use -AsByteStream instead.)
If the output is 239 187 191 (decimal for EF BB BF), BOM is present.
Method 4: Browser-Based Detection (Recommended)
Use Format Checker, which detects BOM without uploading files. Drag your CSV into the browser—BOM detection happens entirely client-side.
- No upload required
- Instant detection
- Zero data exposure
How to Remove BOM (3 Methods)
Method 1: Text Editor (Simplest)
Notepad++ (Windows):
- Open CSV
- Encoding → Convert to UTF-8 (without BOM)
- Save
VS Code:
- Open CSV
- Click "UTF-8 with BOM" in status bar
- Select "Save with Encoding" → Choose "UTF-8"
- Save
Sublime Text:
- Open CSV
- File → Save with Encoding → UTF-8
- Ensure "UTF-8 with BOM" is NOT selected
Limitation: Manual process, doesn't scale for batch operations.
Method 2: Command Line Removal
Linux/Mac:
# Remove BOM using GNU sed (the \x escapes are a GNU extension)
sed -i '1s/^\xEF\xBB\xBF//' yourfile.csv
# On macOS (BSD sed), use perl instead:
perl -i -pe 's/^\xEF\xBB\xBF// if $. == 1' yourfile.csv
Windows PowerShell:
# PowerShell 7+ (its default UTF8 encoding writes no BOM):
Get-Content yourfile.csv | Set-Content -Encoding UTF8 yourfile_no_bom.csv
# Windows PowerShell 5.1 writes a BOM with -Encoding UTF8; use .NET instead:
[System.IO.File]::WriteAllLines("$PWD\yourfile_no_bom.csv", (Get-Content yourfile.csv))
Python script:
with open('input.csv', 'r', encoding='utf-8-sig') as f:
    content = f.read()
with open('output.csv', 'w', encoding='utf-8') as f:
    f.write(content)
Note: Python's utf-8-sig encoding automatically handles BOM removal when reading.
Advantage: Scriptable for batch processing
Limitation: Requires technical knowledge, command-line access
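For batch jobs, the removal step can be scripted directly on the bytes. A sketch in Python that strips the BOM in place from every CSV in a folder (the exports/ directory name is an assumption, and each file is read fully into memory, which is fine for typical CSV sizes):

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"

def strip_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    data = path.read_bytes()
    if data.startswith(BOM):
        path.write_bytes(data[len(BOM):])
        return True
    return False

# Hypothetical batch run over a directory of vendor exports:
for csv_file in Path("exports").glob("*.csv"):
    if strip_bom(csv_file):
        print(f"Removed BOM: {csv_file}")
```

Because strip_bom only rewrites files that actually start with EF BB BF, the script is safe to run repeatedly on the same directory.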
Method 3: Browser-Based Tool (Recommended for Privacy)
Use browser-based detection tools:
- Drag CSV into browser
- Tool detects BOM automatically
- Click "Remove BOM" (if present)
- Download BOM-free file
Processing speed: 500K rows in ~2 seconds
Data exposure: Zero (file never leaves your computer)
GDPR-friendly: Yes (local processing means no third-party data transfer)
This is the recommended approach for any file containing customer data.
Why Excel Adds BOM
Excel's Encoding Behavior
When you save a CSV in Excel, the format you choose determines the encoding:
- "CSV (Comma delimited)" format: Saves as ANSI (Windows-1252) on Windows, no BOM
- "CSV UTF-8 (Comma delimited)" format: Saves as UTF-8 with BOM by default
Why Excel adds BOM: Windows text editors (Notepad) historically used BOM to detect UTF-8 files. Excel follows this convention for compatibility.
The problem: Most modern CSV parsers, databases, and APIs expect UTF-8 without BOM. Excel's default behavior causes compatibility issues.
How to Save CSV from Excel Without BOM
There's no built-in option. Excel always adds BOM when saving as "CSV UTF-8."
Workarounds:
- Save as regular CSV (ANSI): File → Save As → CSV (Comma delimited). This avoids BOM but may lose special characters.
- Save as UTF-8 with BOM, then remove BOM: Use one of the methods above to strip BOM after Excel saves.
- Use Google Sheets: Exports CSV as UTF-8 without BOM by default.
- Export from source system directly: Skip Excel entirely. Export from database/CRM/analytics platform directly to CSV.
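If you control the export step, writing the CSV in code sidesteps Excel entirely. A sketch with Python's csv module (file name and rows are illustrative); note that encoding="utf-8" writes no BOM, while "utf-8-sig" would prepend one:

```python
import csv

rows = [
    ["name", "email", "phone"],
    ["John Doe", "john@example.com", "555-0100"],
]

# encoding="utf-8" writes no BOM; "utf-8-sig" would prepend EF BB BF.
with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Verify the first three bytes are the header, not a BOM.
with open("contacts.csv", "rb") as f:
    print(f.read(3) == b"\xef\xbb\xbf")   # False: no BOM
```

The newline="" argument is the csv module's documented way to avoid extra blank rows on Windows, so the output stays clean for downstream parsers.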
Common BOM Scenarios
Scenario 1: CRM Export → Database Import
Problem: HubSpot exports contacts to CSV. You open in Excel to check data. Save file. Import to PostgreSQL fails with encoding error.
Cause: Excel added BOM when you saved the file.
Solution: Use browser-based detection to detect BOM before import. Remove BOM if present. Import succeeds.
Scenario 2: API Data → CSV → JSON Conversion
Problem: You download CSV from a REST API, convert to JSON for frontend processing. JSON parser throws "Unexpected token" error.
Cause: API returned CSV with BOM. JSON converter doesn't strip BOM before parsing.
Solution: Remove BOM before JSON conversion using text editor or browser-based tool.
Scenario 3: Automated ETL Pipeline
Problem: Nightly ETL job pulls CSVs from SFTP server, loads to data warehouse. Pipeline runs successfully for months. Suddenly fails with column mismatch error.
Cause: Upstream vendor changed export process. New CSVs include BOM.
Solution: Add BOM detection/removal step to pipeline. Use command-line tools or integrate privacy-first processing.
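One defensive pattern for pipelines is to always decode upstream CSVs with Python's utf-8-sig codec, which strips a leading BOM when present and behaves exactly like utf-8 otherwise, so one reader handles both vendor formats (the function name is illustrative):

```python
import csv

def read_csv_rows(path):
    # "utf-8-sig" strips a leading UTF-8 BOM if present; for files
    # without one it decodes identically to "utf-8".
    with open(path, newline="", encoding="utf-8-sig") as f:
        yield from csv.reader(f)
```

With this in place, header validation works identically whether or not the vendor's next export format change adds a BOM.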
Scenario 4: Bulk Email Import
Problem: Marketing team exports email list from Excel, uploads to email platform (Mailchimp, SendGrid). First contact's name appears as John Doe in emails.
Cause: Excel's BOM export. Email platform didn't strip BOM during import.
Solution: Remove BOM before upload using text editor encoding conversion or browser-based detection tool. Verify headers are clean.
What This Won't Do
BOM removal solves encoding errors, but it's not a complete data processing solution. Here's what this guide doesn't cover:
Not a Replacement For:
- Data validation - Doesn't verify data integrity, check for null values, or validate field formats
- Character encoding conversion - Only removes BOM, doesn't convert between encodings (UTF-8 → ANSI, etc.)
- CSV structure repair - Won't fix broken delimiters, mismatched quotes, or malformed rows
- Data transformation - No column mapping, data cleaning, or format standardization
- Database migration tools - Doesn't replace ETL platforms or database import utilities
Technical Limitations:
- BOM detection only - Detects UTF-8 BOM (EF BB BF), not UTF-16 or UTF-32 BOM variants
- File-level operation - Removes BOM from file start only, doesn't fix encoding issues throughout file
- No automatic prevention - Can't prevent Excel from adding BOM during save
- Single encoding issue - Solves BOM specifically, not all Unicode or encoding problems
Best Use Cases: This workflow excels at detecting and removing UTF-8 BOM from CSV files before import. For comprehensive data cleaning, encoding conversion, or ETL workflows, combine BOM removal with dedicated data processing tools.
Dealing with other CSV import errors? See our complete guide: CSV Import Errors: Every Cause, Every Fix (2026)
Additional Resources
Unicode & BOM Standards:
- Unicode FAQ: UTF-8, UTF-16, UTF-32 & BOM - Official Unicode Consortium documentation on BOM usage and recommendations
- W3C: The Byte Order Mark in HTML - W3C guidance on BOM handling in web contexts
Database Documentation:
- PostgreSQL COPY Command - Official PostgreSQL documentation on CSV import and encoding
- MySQL LOAD DATA INFILE - MySQL documentation on bulk CSV loading
Excel & Encoding:
- Excel Specifications and Limits - Microsoft documentation on Excel's CSV export behavior
Privacy & Compliance:
- GDPR Official Website - General Data Protection Regulation compliance resources for data processing
- IBM Data Quality Assessment - Enterprise data governance practices
The Bottom Line
BOM is a three-byte encoding marker that breaks CSV imports, JSON parsing, and database loads. Most teams upload files to third-party tools to remove it—exposing customer data to fix a trivial encoding issue.
The privacy-first alternative: Client-side detection and removal that processes files entirely in your browser.
Common mistakes:
- Uploading customer CSVs to BOM removal tools without Data Processing Agreements
- Assuming Excel's "UTF-8" export is safe (it adds BOM by default)
- Debugging import errors for hours without checking for BOM
- Manually editing large files when automated tools process in seconds
The solution: Text editor encoding options or browser-based tools that detect and remove BOM locally, processing 500K rows in about two seconds.