Your production data pipeline breaks at 3 AM.
The error log: SyntaxError: Unexpected token in JSON at position 0. You check the CSV export from your CRM. Opens fine in Excel. Headers look correct. 50,000 customer records ready to import.
You run the import script again. Same error.
You open the file in a hex editor. First three bytes: EF BB BF.
That's the UTF-8 Byte Order Mark (BOM). Three invisible characters Excel added when it saved the file. Your import script doesn't expect them. Your JSON parser chokes on them. Your database loader rejects the file.
TL;DR
CSV import fails with "Unexpected token" or encoding errors? Excel likely added a UTF-8 BOM (three bytes: EF BB BF). Detect with hex editor or browser-based tools. Remove using text editor encoding options or client-side processing—fix locally in 30 seconds, never upload customer data to third-party BOM removers.
Quick 30-Second BOM Fix
Got a CSV failing with encoding errors right now? Start here:
- Detect BOM → Open file in VS Code, check status bar for "UTF-8 with BOM"
- Remove BOM → Click status bar → "Save with Encoding" → Select "UTF-8" (no BOM)
- Save and retry import → File now clean, import should succeed
Alternative for non-technical users: Drag CSV into browser-based detection tool → Tool detects BOM instantly → Click "Remove BOM" if present → Download clean file.
This fixes 95% of BOM-related import failures. For understanding why this happens and preventing it, continue below.
Table of Contents
- What is the Byte Order Mark (BOM)?
- Why BOM Breaks CSV Imports
- Common BOM Error Messages
- The Privacy Risk of Upload-Based BOM Removal
- How to Detect BOM in CSV Files
- How to Remove BOM (3 Methods)
- Why Excel Adds BOM
- Common BOM Scenarios
- What This Won't Do
- Additional Resources
- The Bottom Line
What is the Byte Order Mark (BOM)?
The Byte Order Mark (BOM) is a Unicode character (U+FEFF) placed at the start of a text file to indicate encoding type and byte order. In UTF-8 files, the BOM appears as three bytes: EF BB BF in hexadecimal.
While optional in UTF-8 (byte order is irrelevant in UTF-8 per the Unicode Standard), some programs—particularly Windows applications like Excel—add BOM by default when saving UTF-8 files.
What BOM looks like in different views:
Hex editor view:
00000000: EFBB BF6E 616D 652C ...name,
Text editor (if visible):
name,email,phone
What it should be:
name,email,phone
The first three bytes (EF BB BF) are the UTF-8 BOM. They're invisible in Excel but visible in hex editors and can break strict CSV parsers.
Why BOM Breaks CSV Imports
The Technical Problem
CSV parsers expect the first line to contain column headers. When a UTF-8 BOM is present, the first three bytes (EF BB BF) appear before the actual headers.
What the parser sees:
name,email,phone
John Doe,john@example.com,555-0100
What the parser expects:
name,email,phone
John Doe,john@example.com,555-0100
The result: The first column header is read as name (the three BOM bytes decoded as stray characters) instead of name. Column matching fails. Import rejected.
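The header mismatch can be reproduced in a few lines of Python (the sample data is illustrative):

```python
import csv
import io

# A CSV payload that starts with the UTF-8 BOM (EF BB BF).
raw = b"\xef\xbb\xbfname,email,phone\nJohn Doe,john@example.com,555-0100\n"

# Plain "utf-8" keeps the BOM as U+FEFF glued to the first header.
header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
print(header[0] == "name")   # False: the header is "\ufeffname"

# "utf-8-sig" strips the BOM, so column matching works.
header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
print(header[0] == "name")   # True
```

The two decoded strings look identical on screen, which is exactly why this bug is so hard to spot by eye.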
Real-World Impact
Database imports fail: PostgreSQL's COPY command rejects BOM-prefixed CSVs with encoding errors. MySQL's LOAD DATA INFILE misinterprets the first column name.
JSON conversion breaks: Converting a BOM-prefixed CSV to JSON produces:
[
  {
    "name": "John Doe",
    "email": "john@example.com"
  }
]
Your application expects a name key. It gets name. Field mapping fails.
API imports rejected: REST APIs expecting clean CSV headers return validation errors. Salesforce imports fail on BOM-prefixed files. HubSpot bulk uploads reject the file.
Automated pipelines crash: ETL scripts that parse CSV headers break when column names don't match expectations. Data warehouses reject BOM files during scheduled loads.
The fix: Remove the BOM before processing. But how you remove it determines whether you expose customer data or maintain privacy.
Common BOM Error Messages
If you see these errors, BOM is likely the cause:
JavaScript/Node.js:
SyntaxError: Unexpected token in JSON at position 0
Python CSV module:
_csv.Error: field larger than field limit (131072)
(BOM causes the first field to be read incorrectly, sometimes triggering field size limits)
PostgreSQL:
ERROR: invalid byte sequence for encoding "UTF8": 0xef 0xbb 0xbf
MySQL:
Error 1300: Invalid utf8 character string
Salesforce Data Import:
Error: We can't find a column header row. Check your file format.
Java CSV Parser:
IllegalArgumentException: Header column 'id' not found
All of these stem from the same root cause: three invisible bytes at the start of the file.
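Python's json module actually diagnoses this one explicitly. A minimal sketch of the failure and the fix (the payload is illustrative):

```python
import json

# A JSON string that still carries the BOM after a plain UTF-8 decode.
text = "\ufeff" + '{"name": "John Doe"}'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    # CPython reports the BOM directly, e.g.
    # "Unexpected UTF-8 BOM (decode using utf-8-sig)"
    print(exc.msg)

# Decoding the raw bytes with utf-8-sig strips the BOM first.
data = json.loads(text.encode("utf-8").decode("utf-8-sig"))
print(data["name"])  # John Doe
```

If your stack raises a vaguer error than this, checking the first three bytes of the file is still the fastest way to confirm the cause.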
The Privacy Risk of Upload-Based BOM Removal
What Happens When You Upload
Popular BOM removal tools (ConvertCSV, Datablist, CSVLint) follow this workflow:
- Upload: Entire CSV transmitted to their servers
- Processing: Server reads first three bytes, removes BOM if present, generates new file
- Download: BOM-free file returned
- Deletion: File allegedly deleted from server
The issue: You're uploading an entire customer dataset to remove three bytes. The data exposed: names, emails, addresses, transaction history, health records, financial data—everything in your CSV.
GDPR Article 5(1)(c) — Data Minimization:
"Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed."
Question: Is uploading 50,000 customer records to a third party "necessary" to remove a 3-byte encoding marker?
Answer: No. The BOM can be detected and removed locally without transmitting data.
The Audit Trail Problem
Regulator question: "How do you verify the tool deleted customer data after processing?"
Your answer: "They said they delete files."
Problem: No audit trail. No deletion verification. No proof of compliance.
GDPR Article 28 requires Data Processing Agreements (DPAs) with third-party processors. Most free BOM removal tools don't offer DPAs, which can create compliance gaps for regulated industries.
IBM's research on data quality cites an average cost to organizations of $12.9 million per year from poor data quality.
How to Detect BOM in CSV Files
Method 1: Visual Inspection (Often Invisible)
Open the CSV in a plain text editor (Notepad++, Sublime Text, VS Code).
In Notepad++: View → Show Symbol → Show All Characters. BOM appears as  at the start.
In VS Code: Check the status bar at bottom-right. If it says "UTF-8 with BOM", BOM is present.
Problem: Not all editors make BOM visible. Excel hides it completely.
Method 2: Hex Editor (Definitive)
Open the file in a hex editor (HxD on Windows, xxd on Linux/Mac).
BOM present:
00000000: EFBB BF6E 616D 652C ...name,
No BOM:
00000000: 6E61 6D65 2C65 6D61 name,ema
If the first three bytes are EF BB BF, BOM is present.
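The same three-byte check a hex editor performs can be scripted. A minimal Python version (the function name and file path are illustrative):

```python
BOM = b"\xef\xbb\xbf"

def has_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == BOM

# Hypothetical usage:
# has_bom("yourfile.csv")  -> True when EF BB BF is present
```

Reading in binary mode ("rb") is essential here: a text-mode read would decode the bytes and could silently hide the very marker you are looking for.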
Method 3: Command Line
Linux/Mac:
file yourfile.csv
Output includes "(with BOM)" when the BOM is present, e.g. UTF-8 Unicode (with BOM) text versus plain UTF-8 Unicode text.
Windows PowerShell:
Get-Content yourfile.csv -Encoding Byte -TotalCount 3
(PowerShell 7+ removed -Encoding Byte; use -AsByteStream instead.)
If the output is 239 187 191 (decimal for EF BB BF), BOM is present.
Method 4: Browser-Based Detection (Recommended)
Use Format Checker, which detects BOM without uploading files. Drag your CSV into the browser—BOM detection happens entirely client-side.
- No upload required
- Instant detection
- Zero data exposure
How to Remove BOM (3 Methods)
Method 1: Text Editor (Simplest)
Notepad++ (Windows):
- Open CSV
- Encoding → Convert to UTF-8 (without BOM)
- Save
VS Code:
- Open CSV
- Click "UTF-8 with BOM" in status bar
- Select "Save with Encoding" → Choose "UTF-8"
- Save
Sublime Text:
- Open CSV
- File → Save with Encoding → UTF-8
- Ensure "UTF-8 with BOM" is NOT selected
Limitation: Manual process, doesn't scale for batch operations.
Method 2: Command Line Removal
Linux/Mac:
# Remove BOM using GNU sed (the \x escapes are a GNU extension)
sed -i '1s/^\xEF\xBB\xBF//' yourfile.csv
# On macOS (BSD sed), use perl instead:
perl -i -pe 's/^\xEF\xBB\xBF// if $. == 1' yourfile.csv
Windows PowerShell:
# PowerShell 7+ (its default UTF8 encoding writes no BOM):
Get-Content yourfile.csv | Set-Content -Encoding UTF8 yourfile_no_bom.csv
# Windows PowerShell 5.1 writes a BOM with -Encoding UTF8; use .NET instead:
[System.IO.File]::WriteAllLines("$PWD\yourfile_no_bom.csv", (Get-Content yourfile.csv))
Python script:
with open('input.csv', 'r', encoding='utf-8-sig') as f:
    content = f.read()
with open('output.csv', 'w', encoding='utf-8') as f:
    f.write(content)
Note: Python's utf-8-sig encoding automatically handles BOM removal when reading.
Advantage: Scriptable for batch processing
Limitation: Requires technical knowledge, command-line access
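For batch jobs, the removal step can be scripted directly on the bytes. A sketch in Python that strips the BOM in place from every CSV in a folder (the exports/ directory name is an assumption, and each file is read fully into memory, which is fine for typical CSV sizes):

```python
from pathlib import Path

BOM = b"\xef\xbb\xbf"

def strip_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM in place; return True if one was removed."""
    data = path.read_bytes()
    if data.startswith(BOM):
        path.write_bytes(data[len(BOM):])
        return True
    return False

# Hypothetical batch run over a directory of vendor exports:
for csv_file in Path("exports").glob("*.csv"):
    if strip_bom(csv_file):
        print(f"Removed BOM: {csv_file}")
```

Because strip_bom only rewrites files that actually start with EF BB BF, the script is safe to run repeatedly on the same directory.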
Method 3: Browser-Based Tool (Recommended for Privacy)
Use browser-based detection tools:
- Drag CSV into browser
- Tool detects BOM automatically
- Click "Remove BOM" (if present)
- Download BOM-free file
Processing speed: 500K rows in ~2 seconds
Data exposure: Zero (file never leaves your computer)
GDPR-friendly: Yes (local processing means no third-party data transfer)
This is the recommended approach for any file containing customer data.
Why Excel Adds BOM
Excel's Encoding Behavior
When you save a CSV in Excel, the format you choose determines the encoding:
- "CSV (Comma delimited)" format: Saves as ANSI (Windows-1252) on Windows, no BOM
- "CSV UTF-8 (Comma delimited)" format: Saves as UTF-8 with BOM by default
Why Excel adds BOM: Windows text editors (Notepad) historically used BOM to detect UTF-8 files. Excel follows this convention for compatibility.
The problem: Most modern CSV parsers, databases, and APIs expect UTF-8 without BOM. Excel's default behavior causes compatibility issues.
How to Save CSV from Excel Without BOM
There's no built-in option. Excel always adds BOM when saving as "CSV UTF-8."
Workarounds:
- Save as regular CSV (ANSI): File → Save As → CSV (Comma delimited). This avoids BOM but may lose special characters.
- Save as UTF-8 with BOM, then remove BOM: Use one of the methods above to strip BOM after Excel saves.
- Use Google Sheets: Exports CSV as UTF-8 without BOM by default.
- Export from source system directly: Skip Excel entirely. Export from database/CRM/analytics platform directly to CSV.
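If you control the export step, writing the CSV in code sidesteps Excel entirely. A sketch with Python's csv module (file name and rows are illustrative); note that encoding="utf-8" writes no BOM, while "utf-8-sig" would prepend one:

```python
import csv

rows = [
    ["name", "email", "phone"],
    ["John Doe", "john@example.com", "555-0100"],
]

# encoding="utf-8" writes no BOM; "utf-8-sig" would prepend EF BB BF.
with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Verify the first three bytes are the header, not a BOM.
with open("contacts.csv", "rb") as f:
    print(f.read(3) == b"\xef\xbb\xbf")   # False: no BOM
```

The newline="" argument is the csv module's documented way to avoid extra blank rows on Windows, so the output stays clean for downstream parsers.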
Common BOM Scenarios
Scenario 1: CRM Export → Database Import
Problem: HubSpot exports contacts to CSV. You open in Excel to check data. Save file. Import to PostgreSQL fails with encoding error.
Cause: Excel added BOM when you saved the file.
Solution: Use browser-based detection to detect BOM before import. Remove BOM if present. Import succeeds.
Scenario 2: API Data → CSV → JSON Conversion
Problem: You download CSV from a REST API, convert to JSON for frontend processing. JSON parser throws "Unexpected token" error.
Cause: API returned CSV with BOM. JSON converter doesn't strip BOM before parsing.
Solution: Remove BOM before JSON conversion using text editor or browser-based tool.
Scenario 3: Automated ETL Pipeline
Problem: Nightly ETL job pulls CSVs from SFTP server, loads to data warehouse. Pipeline runs successfully for months. Suddenly fails with column mismatch error.
Cause: Upstream vendor changed export process. New CSVs include BOM.
Solution: Add BOM detection/removal step to pipeline. Use command-line tools or integrate privacy-first processing.
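One defensive pattern for pipelines is to always decode upstream CSVs with Python's utf-8-sig codec, which strips a leading BOM when present and behaves exactly like utf-8 otherwise, so one reader handles both vendor formats (the function name is illustrative):

```python
import csv

def read_csv_rows(path):
    # "utf-8-sig" strips a leading UTF-8 BOM if present; for files
    # without one it decodes identically to "utf-8".
    with open(path, newline="", encoding="utf-8-sig") as f:
        yield from csv.reader(f)
```

With this in place, header validation works identically whether or not the vendor's next export format change adds a BOM.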
Scenario 4: Bulk Email Import
Problem: Marketing team exports email list from Excel, uploads to email platform (Mailchimp, SendGrid). First contact's name appears as John Doe in emails.
Cause: Excel's BOM export. Email platform didn't strip BOM during import.
Solution: Remove BOM before upload using text editor encoding conversion or browser-based detection tool. Verify headers are clean.
What This Won't Do
BOM removal solves encoding errors, but it's not a complete data processing solution. Here's what this guide doesn't cover:
Not a Replacement For:
- Data validation - Doesn't verify data integrity, check for null values, or validate field formats
- Character encoding conversion - Only removes BOM, doesn't convert between encodings (UTF-8 → ANSI, etc.)
- CSV structure repair - Won't fix broken delimiters, mismatched quotes, or malformed rows
- Data transformation - No column mapping, data cleaning, or format standardization
- Database migration tools - Doesn't replace ETL platforms or database import utilities
Technical Limitations:
- BOM detection only - Detects UTF-8 BOM (EF BB BF), not UTF-16 or UTF-32 BOM variants
- File-level operation - Removes BOM from file start only, doesn't fix encoding issues throughout file
- No automatic prevention - Can't prevent Excel from adding BOM during save
- Single encoding issue - Solves BOM specifically, not all Unicode or encoding problems
Best Use Cases: This workflow excels at detecting and removing UTF-8 BOM from CSV files before import. For comprehensive data cleaning, encoding conversion, or ETL workflows, combine BOM removal with dedicated data processing tools.
Dealing with other CSV import errors? See our complete guide: CSV Import Errors: Every Cause, Every Fix (2026)
Additional Resources
Unicode & BOM Standards:
- Unicode FAQ: UTF-8, UTF-16, UTF-32 & BOM - Official Unicode Consortium documentation on BOM usage and recommendations
- W3C: The Byte Order Mark in HTML - W3C guidance on BOM handling in web contexts
Database Documentation:
- PostgreSQL COPY Command - Official PostgreSQL documentation on CSV import and encoding
- MySQL LOAD DATA INFILE - MySQL documentation on bulk CSV loading
Excel & Encoding:
- Excel Specifications and Limits - Microsoft documentation on Excel's CSV export behavior
Privacy & Compliance:
- GDPR Official Website - General Data Protection Regulation compliance resources for data processing
- IBM Data Quality Assessment - Enterprise data governance practices
The Bottom Line
BOM is a three-byte encoding marker that breaks CSV imports, JSON parsing, and database loads. Most teams upload files to third-party tools to remove it—exposing customer data to fix a trivial encoding issue.
The privacy-first alternative: Client-side detection and removal that processes files entirely in your browser.
Common mistakes:
- Uploading customer CSVs to BOM removal tools without Data Processing Agreements
- Assuming Excel's "UTF-8" export is safe (it adds BOM by default)
- Debugging import errors for hours without checking for BOM
- Manually editing large files when automated tools process in seconds
The solution: Text editor encoding options or browser-based tools that detect and remove BOM locally, processing 500K rows in about two seconds.