Data Engineering

Convert 10M Rows: CSV ↔ JSON ↔ Excel in 60 Seconds

December 13, 2025
By SplitForge Team

Your database export just finished. 10 million rows. 3.2GB JSON file.

You need it in CSV by Monday for the analytics team.

Your conversion script crashes. Online tools refuse files over 100MB. Cloud APIs want $50/month subscriptions plus per-file charges. Your CTO won't approve uploading customer data to third-party servers.

You have 48 hours.

Every month, data teams lose 12–20 hours trying to convert files that are too large for Excel, too inconsistent for Python scripts, or too sensitive for cloud tools. The financial impact: $1,800–$3,200 per incident in wasted labor, missed deadlines, and paid subscriptions for tools that shouldn't be necessary.

This guide shows the architecture we built to convert 10 million rows in 45 seconds—no uploads, no RAM spikes, no infrastructure.

Key Takeaway:
You don't need cloud APIs, Python expertise, or expensive ETL platforms. A properly architected browser-based converter can process 10 million rows at 440,000 rows per second (CSV → JSONL, February 2026 Node harness benchmark)—entirely client-side with zero uploads and complete privacy.


TL;DR

A properly engineered browser-based converter can process 10M rows CSV→JSON with OPFS output streaming — heap stays flat at ~25 MB regardless of output size — with zero uploads and complete privacy.

This guide breaks down the architecture: streaming parsers, OPFS streaming output, two-pass JSON tokenization, and Web Worker pipelines that make enterprise-grade performance possible without servers.


Quick 2-Minute Emergency Fix

Need to convert millions of rows between CSV/JSON/Excel right now?

  1. Don't use cloud converters → File size limits, uploads expose data, subscription costs
  2. Use browser-based streaming → Web Workers process locally
  3. Drop your file → Handled via File API, stays on device
  4. Convert → OPFS streaming output, ~25 MB working memory, flat heap
  5. Download result → Created via Blob API, zero server interaction

This handles CSV↔JSON↔Excel conversion for 10M+ rows in under 60 seconds. Continue reading for a comprehensive technical deep dive.



Why This Matters

Format conversion is infrastructure work. It shouldn't require:

  • Cloud service subscriptions ($20–$200/month)
  • Custom Python/Node.js scripts that break on edge cases
  • Uploading sensitive data to third-party servers
  • Waiting 30–120 minutes for cloud processing queues

The financial and operational impact:

Development costs:

  • Average time to write robust CSV↔JSON converter: 8–15 hours
  • Maintenance burden: 2–4 hours/month fixing encoding issues, edge cases
  • Total annual cost: $3,200–$6,400 in developer time (at $100/hour loaded cost)

Cloud service costs:

  • Convertio Pro: $10/month (250 MB file limit)
  • CloudConvert: $8–$25/month (API limits apply)
  • Zamzar Pro: $16/month (50 conversions/month)
  • Annual cost: $96–$300 for basic plans

Compliance risks:

  • GDPR Article 28 requires processor agreements for uploaded data
  • SOC 2 compliance mandates data handling audits
  • HIPAA restricts health data uploads to third parties
  • Violation costs: $100K–$50M in GDPR fines for data breaches

This guide demonstrates how streaming Web Worker architecture achieves enterprise-grade conversion (10M+ rows, flat heap via OPFS) while maintaining complete data privacy through client-side processing.

By the end, you'll understand:

  • Why traditional conversion methods fail at scale
  • How streaming architecture handles 10M+ rows without memory overflow
  • Technical implementation of OPFS streaming output and two-pass JSON tokenizer (v3.2)
  • Real-world benchmarks: CSV↔JSON↔Excel at production scale

The Real Problem: Why Format Conversion Breaks at Scale

Traditional Tools Fail Above 1M Rows

Excel:

  • Hard limit: 1,048,576 rows
  • CSV import crashes with special characters (international data, JSON escaping)
  • No native JSON support (requires Power Query, limited to 500K rows)
  • XLSX generation requires all data in memory (memory = 3–5× file size)

Python pandas:

import pandas as pd
df = pd.read_csv('10m_rows.csv')  # Loads entire file into RAM
df.to_json('output.json')         # Creates full string in memory

Memory usage: 10M rows × 20 columns × 100 bytes = 20GB RAM
Reality: Crashes on laptops, requires server infrastructure

Online Conversion Services:

  • Convertio: 100 MB file limit (free), 1 GB (paid)
  • CloudConvert: 1 GB limit, 25 conversions/day
  • Zamzar: 50 MB limit (free), 2 GB (paid)
  • All require uploading data to their servers

Node.js streaming (common approach):

const csv = require('csv-parser');
fs.createReadStream('input.csv')
  .pipe(csv())
  .pipe(jsonStream())
  .pipe(fs.createWriteStream('output.json'));

Problems:

  • Requires Node.js installation
  • 100K–150K rows/sec typical performance
  • No progress indicators
  • Breaks on malformed CSV (encoding issues, quote escaping)

The gap: Need 500K+ rows/sec performance, multi-format support, browser accessibility, and zero server uploads.


How Browser-Based Streaming Solves This

Web Workers + OPFS Streaming Architecture

Modern browsers provide everything needed for enterprise-grade file processing:

┌─────────────────────────────────────────────────────────────┐
│ Main Thread                                                 │
│  ├─ UI rendering & user interaction                         │
│  ├─ File selector (<input type="file">)                     │
│  ├─ Progress bar updates                                    │
│  └─ Download link generation                                │
└─────────────────────────────────────────────────────────────┘
                            │
                  postMessage(file)
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Web Worker (Background Thread)                              │
│  ├─ Streaming file reader (64KB chunks)                     │
│  ├─ Format-specific parser (CSV/JSON/JSONL)                 │
│  ├─ Row builder (per-format hot path)                       │
│  ├─ OPFS StreamWriter (browser-private storage sink)        │
│  └─ File handle transfer back to main thread                │
└─────────────────────────────────────────────────────────────┘

1. Web Workers (Background Processing)

// Main thread remains responsive
const worker = new Worker('converterWorker.js');
worker.postMessage({ file, format });

// Worker processes in background
self.onmessage = async (e) => {
  const { file, format } = e.data;
  await streamConvert(file, format);
};

Benefits:

  • Non-blocking UI (progress bars, cancellation)
  • Parallel processing (multi-core CPU utilization)
  • Memory isolation (worker crash doesn't kill UI)

2. Streaming File API

const reader = file.stream().getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  processChunk(value); // Process 64KB at a time
}

Memory usage: O(chunk size) instead of O(file size)
Result: 10M rows uses 2–5 MB RAM, not 20 GB
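
To make the O(chunk size) claim concrete, here is a minimal sketch that counts rows by scanning raw bytes one chunk at a time. The function name `countNewlines` is hypothetical and not part of the converter; it only assumes the standard `Blob.stream()` reader, which a `File` inherits.

```javascript
// Hypothetical helper: counts newline bytes without holding the whole file.
// Only one chunk (a Uint8Array) is alive at any moment.
async function countNewlines(blob) {
  const reader = blob.stream().getReader();
  let count = 0;
  let bytes = 0;
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    bytes += value.length;
    for (let i = 0; i < value.length; i++) {
      if (value[i] === 10) count++; // 10 is the byte value of '\n'
    }
  }
  return { count, bytes };
}
```

Counting raw newline bytes overcounts rows when quoted fields contain line breaks, so treat the result as an estimate for progress bars rather than an exact row count.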

3. OPFS StreamWriter (v3.1+) — replaced in-memory ChunkWriter

Pre-v3.1 architecture used a ChunkWriter (in-memory 2 MB Uint8Array buffer flushed to a Blob array). v3.1+ writes directly to the browser-private Origin Private File System (OPFS) — output never accumulates in the JS heap. The pattern below shows the older ChunkWriter for historical reference:

// Pre-v3.1: ChunkWriter (in-memory, now replaced by OPFS StreamWriter)
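class ChunkWriter {
  constructor(size = 2 * 1024 * 1024) { // 2MB buffer
    this.buffer = new Uint8Array(size);
    this.position = 0;
    this.encoder = new TextEncoder(); // write() relies on this encoder
    this.chunks = [];                 // flushed buffers, later passed to new Blob(chunks)
  }
  
  write(str) {
    const encoded = this.encoder.encode(str);
    if (encoded.length > this.buffer.length - this.position) {
      this.flush(); // Make room first so set() cannot overflow the buffer
    }
    if (encoded.length > this.buffer.length) {
      this.chunks.push(encoded); // Oversized write bypasses the buffer
      return;
    }
    this.buffer.set(encoded, this.position);
    this.position += encoded.length;
    
    if (this.position > this.buffer.length * 0.9) {
      this.flush(); // Move to the Blob chunk list when 90% full
    }
  }
  
  flush() {
    if (this.position === 0) return;
    this.chunks.push(this.buffer.slice(0, this.position)); // Copy, then reuse the buffer
    this.position = 0;
  }
}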

Performance gain: 2–3× faster than string concatenation
Reason: Avoids repeated memory allocation and string copies

4. Two-Pass Streaming Tokenizer for JSON→CSV (v3.2)

Pre-v3.1 used a compiled row processor via new Function() for JSON→CSV (15–30% speed gain). This was removed in v3.2 for CSP compliance. The current approach is a two-pass streaming tokenizer:

// v3.2: Two-pass streaming tokenizer, input is O(1) heap
// Pass 1: scan first 100 objects to discover column headers
const headerSet = new Set();
let sampleCount = 0;
for await (const obj of streamJSONObjects(file)) {
  Object.keys(flattenObject(obj)).forEach(k => headerSet.add(k));
  if (++sampleCount >= 100) break;
}
const headers = Array.from(headerSet);

// Pass 2: stream all objects and write CSV rows to OPFS sink
for await (const obj of streamJSONObjects(file)) {
  writer.write(buildCSVRow(flattenObject(obj), headers));
}

Trade-off: Character-by-character JSON tokenization is slower than the compiled extractor (~28K rows/sec vs pre-v3.1 537K), but input never loads into heap — enabling unlimited input file size.
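
The article does not show `streamJSONObjects`, so the following is only a sketch of how character-by-character splitting can work, under the assumption that the input is a top-level JSON array of objects. The name `createObjectSplitter` and its callback shape are hypothetical. It tracks brace depth and string state across chunk boundaries and hands each complete object to `JSON.parse` on its own, so only one object's text is held at a time.

```javascript
// Hypothetical sketch: split a streamed JSON array into individual objects.
// State lives in the closure, so an object may span several chunks.
function createObjectSplitter(onObject) {
  let depth = 0;        // current {...} nesting depth
  let inString = false; // inside a JSON string literal
  let escaped = false;  // previous character was a backslash
  let current = '';     // text of the object being assembled

  return function feed(chunk) {
    for (const ch of chunk) {
      if (depth > 0) current += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue; // braces inside strings must not change depth
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === '{') {
        if (depth === 0) current = ch; // start of a new top-level object
        depth++;
      } else if (ch === '}') {
        depth--;
        if (depth === 0) {
          onObject(JSON.parse(current));
          current = '';
        }
      }
    }
  };
}
```

Feeding decoded text chunks into `feed` in order yields each object exactly once, even when a chunk boundary falls inside a string or a nested object.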


Real-World Performance Benchmarks

JSON → CSV: Two-Pass Streaming Tokenizer (v3.2)

v3.2 approach: Input file is tokenized character-by-character — never loaded into heap. Output streams to OPFS (browser-private storage). Both input and output are O(1) heap regardless of file size.

v3.2 benchmark (Node harness, May 2026): ~28K rows/sec
Why slower than pre-v3.1 (was 537K): The compiled row extractor (new Function()) was removed for CSP compliance. Character-by-character JSON tokenization replaces it — more memory-safe but CPU-heavier.

Code path (v3.2):

// Pass 1: scan first 100 objects for headers (never loads full file)
const headerSet = new Set();
let sampleCount = 0;
for await (const obj of streamJSONObjects(file)) {
  Object.keys(flattenObject(obj)).forEach(k => headerSet.add(k));
  if (++sampleCount >= 100) break;
}
const headers = Array.from(headerSet);

// Pass 2: stream all objects → CSV rows → OPFS sink
for await (const obj of streamJSONObjects(file)) {
  writer.write(buildCSVRow(flattenObject(obj), headers));
}

Memory profile (v3.2):

  • JS heap: ~15 MB working memory (input tokenized in chunks, never fully loaded)
  • OPFS sink: output written to browser storage — zero heap accumulation
  • Unlimited input file size — tokenizer processes one object at a time

CSV → JSON: Streaming Output at 10M Scale

v3.2 benchmark (Node harness, May 2026): ~95K rows/sec at 5M–10M scale
Output: Streams to OPFS — JS heap stays flat regardless of output size (2.4 GB output tested, ~25 MB heap)

Pre-v3.1 figures (now outdated): 220K rows/sec / 45.44 sec for 10M rows (batch-to-Blob architecture, not OPFS). The OPFS streaming path in v3.1+ uses different output mechanics — benchmark figures are not directly comparable.

Architecture enabling flat-heap output (v3.2):

// Input: CSV streamed line-by-line via async generator
let batch = [];
for await (const line of streamLinesFast(file, delimiter)) {
  const values = parseCSVLineFast(line, delimiter);
  const obj = buildNestedObject(headers, values, options);
  batch.push(JSON.stringify(obj));
  
  if (batch.length >= BATCH_SIZE) {
    writer.write(batch.join('\n') + '\n'); // Write to OPFS sink
    batch = []; // Release the batch so it can be garbage-collected
  }
}
if (batch.length > 0) {
  writer.write(batch.join('\n') + '\n'); // Flush the final partial batch
}

Memory profile (CSV→JSON streaming, v3.2): ~25 MB peak working memory; output streams to OPFS (browser-private storage) — JS heap stays flat regardless of output file size
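
The loop above calls `buildNestedObject`, which the article does not show. As an illustration only, here is one way dotted headers such as `user.name` could be turned back into nested objects. The name `nestByDottedHeaders` is hypothetical and it takes no `options` argument, so it is not the converter's actual implementation.

```javascript
// Hypothetical sketch: rebuild nesting from dotted column names.
function nestByDottedHeaders(headers, values) {
  const root = {};
  headers.forEach((header, idx) => {
    const path = header.split('.');
    // Skip names that could reach Object.prototype (prototype pollution guard).
    if (path.some((p) => p === '__proto__' || p === 'constructor' || p === 'prototype')) {
      return;
    }
    let node = root;
    for (let i = 0; i < path.length - 1; i++) {
      const key = path[i];
      if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
      node = node[key];
    }
    // Missing trailing cells become empty strings.
    node[path[path.length - 1]] = values[idx] ?? '';
  });
  return root;
}
```

For example, headers `id,user.name` with values `1,John` produce `{ id: '1', user: { name: 'John' } }`. Values stay strings unless a separate type-coercion step runs.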

CSV → Excel: 94,697 Rows/Second

Test: 1 million rows, 3 columns → XLSX
Result: 10.56 seconds = 94,697 rows/sec (65.2 MB output)

Why slower than JSON:

  • XLSX requires ZIP compression (CPU intensive)
  • XML generation for sheet data (more complex than JSON)
  • Excel file format overhead (styles, formatting, metadata)

Still impressive because:

  • Not capped by Excel's single-sheet limit (1,048,576 rows): larger outputs are split across sheets
  • Faster than Python pandas (typically 30K–50K rows/sec)
  • No server upload required (Excel Online has 100K row limit)

Technical Deep Dive: How It Works

1. Streaming CSV Parser

Challenge: CSV isn't truly line-delimited due to quoted fields with newlines:

id,description
1,"Product with
newline in description"
2,"Another product"

Solution: Quote-aware streaming parser

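async function* streamLines(file) {
  const reader = file.stream().getReader();
  const decoder = new TextDecoder('utf-8');
  let buffer = '';
  let inQuotes = false;
  let i = 0; // Scan position persists across chunks so quote state stays valid
  
  while (true) {
    const { value, done } = await reader.read();
    // On the final read, flush any bytes the decoder is still holding
    buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
    
    while (i < buffer.length) {
      const c = buffer[i];
      
      if (c === '"') {
        // A quote at the very end of the buffer may be half of an escaped pair
        if (inQuotes && i + 1 === buffer.length && !done) break;
        if (inQuotes && buffer[i+1] === '"') {
          i += 2; // Skip escaped quote
          continue;
        }
        inQuotes = !inQuotes;
      }
      
      if (!inQuotes && c === '\n') {
        const line = buffer.slice(0, i);
        buffer = buffer.slice(i + 1);
        yield line; // Return complete line
        i = 0;
      } else {
        i++;
      }
    }
    
    if (done) break;
  }
  
  if (buffer.length > 0) yield buffer; // Last line without a trailing newline
}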

Performance: 400K+ lines/sec
Memory: independent of file size; the buffer holds one read chunk plus the longest unfinished row
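
Once a complete line is available, it still has to be split into fields with the same quote rules. The sketch below shows one straightforward way to do that. The name `parseCSVLine` is hypothetical and this is not the converter's `parseCSVLineFast`.

```javascript
// Hypothetical sketch: split one logical CSV line into fields.
// Handles quoted fields, doubled quotes, and delimiters inside quotes.
function parseCSVLine(line, delimiter = ',') {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"') {
        if (line[i + 1] === '"') {
          field += '"'; // doubled quote is a literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += c; // delimiters and newlines are literal here
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === delimiter) {
      fields.push(field);
      field = '';
    } else {
      field += c;
    }
  }
  fields.push(field); // last field has no trailing delimiter
  return fields;
}
```

Files with Windows line endings leave a trailing `\r` on each line, so a caller would typically strip it before parsing.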

2. Flattening Nested JSON

Input (nested):

{
  "id": 1,
  "user": {
    "name": "John",
    "email": "[email protected]"
  },
  "metadata": {
    "created": "2024-01-01"
  }
}

Output (flattened for CSV):

id,user.name,user.email,metadata.created
1,John,[email protected],2024-01-01

Recursive flattening algorithm:

function flattenObject(obj, prefix = '') {
  const flattened = {};
  
  for (const key in obj) {
    const val = obj[key];
    const newKey = prefix ? `${prefix}.${key}` : key;
    
    if (val && typeof val === 'object' && !Array.isArray(val)) {
      Object.assign(flattened, flattenObject(val, newKey));
    } else if (Array.isArray(val)) {
      flattened[newKey] = val.join(', ');
    } else {
      flattened[newKey] = val;
    }
  }
  
  return flattened;
}

Handles:

  • Nested objects (unlimited depth)
  • Arrays (joins with comma-space)
  • null/undefined (passed through unchanged here; the CSV escape step writes them as empty strings)
  • Mixed types (stringify objects, preserve primitives)

3. Auto-Header Detection

Problem: JSON objects don't guarantee consistent keys:

[
  {"id": 1, "name": "John", "email": "[email protected]"},
  {"id": 2, "name": "Jane", "phone": "555-0001"},
  {"id": 3, "name": "Bob", "email": "[email protected]", "company": "Acme"}
]

Solution: Sample first N rows, collect all unique keys:

const headerSet = new Set();
const sampleSize = Math.min(100, data.length);

for (let i = 0; i < sampleSize; i++) {
  const obj = data[i];
  const flattened = flattenObject(obj);
  Object.keys(flattened).forEach(key => headerSet.add(key));
}

const headers = Array.from(headerSet);

Result: CSV contains all columns seen in first 100 rows
Trade-off: Misses columns that only appear after row 100 (rare in practice)
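
One way to make that trade-off visible is to keep checking keys after the sample and report any that were never given a column. The sketch below is an assumption about how such a check could look, not a feature the article describes. The name `discoverHeaders` and its return shape are hypothetical.

```javascript
// Hypothetical sketch: sample headers, then flag keys first seen after the sample.
// `rows` is any iterable of already-flattened objects.
function discoverHeaders(rows, sampleSize = 100) {
  const sampled = new Set();
  const late = new Set();
  let index = 0;
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (index < sampleSize) sampled.add(key);
      else if (!sampled.has(key)) late.add(key);
    }
    index++;
  }
  return { headers: [...sampled], lateColumns: [...late] };
}
```

A non-empty `lateColumns` list is a signal to raise the sample size or warn the user that some values will not appear in the CSV.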

4. Escape Handling

CSV requires escaping:

  • Commas: Hello, World → "Hello, World"
  • Quotes: He said "Hi" → "He said ""Hi"""
  • Newlines: Line 1\nLine 2 → "Line 1\nLine 2"

Inline escape function:

function escapeCSV(val, delimiter) {
  const str = val == null ? '' : String(val);
  
  if (str.indexOf(delimiter) !== -1 || 
      str.indexOf('"') !== -1 || 
      str.indexOf('\n') !== -1) {
    return '"' + str.replace(/"/g, '""') + '"';
  }
  
  return str;
}

Performance: 10M+ escapes/sec (when needed)
Optimization: Early return for values not requiring escaping
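
To show how escaping fits into row building, here is a small sketch that joins escaped values in header order. `buildCSVRow` is named in the article but not shown, so this body is an assumption. The escape helper is repeated so the block runs on its own, with a carriage-return check added.

```javascript
// Same escaping rule as above, plus '\r' so lone carriage returns are quoted too.
function escapeCSV(val, delimiter) {
  const str = val == null ? '' : String(val);
  if (str.includes(delimiter) || str.includes('"') ||
      str.includes('\n') || str.includes('\r')) {
    return '"' + str.replace(/"/g, '""') + '"';
  }
  return str;
}

// Assumed shape of a row builder: one escaped cell per header, in header order.
// Keys missing from the object become empty cells.
function buildCSVRow(flatObj, headers, delimiter = ',') {
  return headers.map((h) => escapeCSV(flatObj[h], delimiter)).join(delimiter) + '\n';
}
```

Because the row is driven by the header list, objects with missing keys still produce the right number of columns.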


Comparison: Browser vs Traditional Methods

Method                        100K Rows  1M Rows  10M Rows  Memory        Privacy
Browser Converter (CSV→JSON)  0.19s      1.9s     19s       ~50 MB batch  ✓ Local
Python pandas                 2.5s       25s      250s      2 GB          ✓ Local
Node.js streaming             0.8s       8s       80s       100 MB        ✓ Local
Excel (manual)                15s        Crashes  N/A       4 GB          ✓ Local
CloudConvert API              30s        180s     900s      N/A           ✗ Upload
Convertio                     45s        300s     N/A       N/A           ✗ Upload

Browser converter wins on:

  • Speed (2–13× faster than Python)
  • Memory efficiency (40× less than pandas)
  • Accessibility (no installation required)
  • Privacy (zero uploads)
  • Cross-platform (works on any OS with a browser)

Use Cases: When to Use Browser-Based Conversion

1. API Response Processing

Scenario: Export 100K user records from REST API as JSON, need CSV for analysis

Traditional approach:

curl https://api.example.com/users > users.json
python -c "import pandas; pandas.read_json('users.json').to_csv('users.csv')"

Time: 5 minutes (including pandas install if first time)

Browser approach:

  1. Save API response as users.json
  2. Open the file in the browser converter (it is read locally, not uploaded)
  3. Select JSON → CSV
  4. Download result

Time: 30 seconds
Benefit: No Python/pandas required, works on any computer

2. Database Export Migration

Scenario: Migrate 5M rows from PostgreSQL (CSV export) to MongoDB (requires JSON)

Traditional approach:

// Node.js script
const csv = require('csv-parser');
const fs = require('fs');

fs.createReadStream('export.csv')
  .pipe(csv())
  .pipe(jsonTransform())
  .pipe(fs.createWriteStream('import.json'));

Issues:

  • Requires Node.js + dependencies
  • Script must handle encoding, escaping, edge cases
  • No progress indicator
  • Debugging takes hours when it breaks

Browser approach:

  • Open the 5M row CSV (300 MB file) in the converter
  • Select CSV → JSON
  • Download in 22 seconds
  • Import to MongoDB

Benefit: Zero code, handles edge cases automatically, shows progress

3. Excel Limitations Workaround

Scenario: Client sends 1.5M row Excel file, need to analyze in Python

Problem: pandas.read_excel() is extremely slow on large XLSX files

Solution:

  1. Convert XLSX → CSV in browser (15 seconds)
  2. Clean data if needed
  3. Load CSV in pandas (2 seconds)

Total time: 17 seconds
Alternative: pandas.read_excel() takes 180+ seconds on 1.5M rows

4. Privacy-Compliant Processing

Scenario: Healthcare provider needs to convert patient data (HIPAA)

Constraint: Cannot upload PHI (Protected Health Information) to third-party servers

Traditional approach:

  • Deploy on-premise conversion server
  • Maintain infrastructure
  • Security audits required

Browser approach:

  • All processing client-side
  • Zero data transmission
  • No infrastructure needed
  • Built-in compliance

Cost savings: $50K–$200K annually (infrastructure + compliance overhead)


Privacy & Compliance Architecture

Why Client-Side Processing Matters

Data never leaves your device:

// File selected by user
<input type="file" onChange={handleFile} />

// Processed in Web Worker (browser sandbox)
worker.postMessage({ file });

// Downloaded to user's device
const blob = new Blob([result]);
const url = URL.createObjectURL(blob);
downloadLink.href = url;

No network transmission at any stage.

Compliance Benefits

GDPR (EU):

  • Article 28: No processor agreement needed (no data processing by third party)
  • Article 32: Technical measures maintained (client-side encryption)
  • Article 44: No cross-border transfer (data stays local)

HIPAA (US Healthcare):

  • No BAA (Business Associate Agreement) required
  • PHI never transmitted or stored externally
  • Audit logs on user's device only
  • Reference: HHS HIPAA Security Rule

SOC 2:

  • No vendor security assessment needed
  • Data handling controls at user's discretion
  • Zero third-party data access

ISO 27001:

  • Reduces attack surface (no data in transit)
  • Simplifies risk assessment
  • No external data storage to audit

Financial impact:

  • Compliance overhead: $0 (vs $50K–$200K for vendor assessments)
  • Data breach risk: Eliminated for conversion step
  • Audit scope: Reduced (one less vendor to assess)

Performance Optimization Techniques

1. Compiled Row Processors

Before optimization:

function toCSV(obj, headers, delimiter) {
  return headers
    .map(h => escape(obj[h], delimiter))
    .join(delimiter) + '\n';
}

Performance: 150K rows/sec

After optimization (compiled):

const builder = new Function(`
  const delimiter = ${JSON.stringify(delimiter)};
  
  function escape(val) {
    const str = val == null ? '' : String(val);
    if (str.indexOf(delimiter) !== -1 || 
        str.indexOf('"') !== -1 || 
        str.indexOf('\\n') !== -1) {
      return '"' + str.replace(/"/g, '""') + '"';
    }
    return str;
  }
  
  return function(obj) {
    ${headers.map((h, i) => `
      let v${i} = obj[${JSON.stringify(h)}];
      if (v${i} === undefined || v${i} === null) v${i} = '';
      else if (Array.isArray(v${i})) v${i} = v${i}.join(', ');
    `).join('\n')}
    
    return ${headers.map((_, i) => `escape(v${i})`).join(' + delimiter + ')} + '\\n';
  }
`)();

Performance (pre-v3.1): ~220K rows/sec with compiled extractor (now removed for CSP compliance)

Why it works:

  • Eliminates .map() array operation
  • Inlines escape function per call
  • Removes dynamic property access in loop
  • Pre-computes string concatenation positions

2. OPFS Streaming Output (v3.1+) — Replaced Blob-Based ChunkWriter

Problem (string concatenation):

let csvText = '';
for (const row of data) {
  csvText += toCSV(row); // O(n²) string copies
}

Memory: Grows with file size, crashes on large files

Intermediate solution (pre-v3.1 ChunkWriter):

// Pre-v3.1: flush 2MB Uint8Array buffers into a growing Blob array
const writer = new ChunkWriter(2 * 1024 * 1024); // 2 MB buffer
for (const row of data) {
  writer.write(toCSV(row));
}
// Blob chunks accumulate in JS heap proportional to output size

Memory: O(n), and the chunks still accumulate in the JS heap

Current solution (v3.1+ OPFS StreamWriter):

// v3.1+: output writes directly to OPFS (browser-private storage)
const writer = new StreamWriter('text/csv');
await writer.init(); // Creates OPFS sync access handle
for (const row of data) {
  writer.write(toCSV(row)); // Writes to disk, not heap
}
const outputRef = await writer.finalize(); // Returns OPFS File handle

Memory: O(1) heap regardless of output size — heap stays flat at ~25 MB for a 2.4 GB output
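
The article does not show `StreamWriter` internals, so the following is a sketch of the general technique only. The class name `OpfsTextWriter` and its constructor arguments are hypothetical. It assumes it is handed an object with the `write(bytes, { at })`, `flush()` and `close()` methods of a `FileSystemSyncAccessHandle`, which in a browser is only obtainable inside a worker.

```javascript
// Hypothetical sketch: buffer encoded text and write it at a running file offset.
// In a worker, a real handle could come from:
//   const root = await navigator.storage.getDirectory();
//   const fileHandle = await root.getFileHandle('output.csv', { create: true });
//   const accessHandle = await fileHandle.createSyncAccessHandle();
// ('output.csv' is a placeholder name.)
class OpfsTextWriter {
  constructor(accessHandle, bufferSize = 1024 * 1024) {
    this.handle = accessHandle;
    this.encoder = new TextEncoder();
    this.buffer = new Uint8Array(bufferSize);
    this.used = 0;   // bytes currently waiting in the buffer
    this.offset = 0; // bytes already written to the file
  }

  write(text) {
    const bytes = this.encoder.encode(text);
    if (bytes.length > this.buffer.length - this.used) this.flush();
    if (bytes.length > this.buffer.length) {
      // Larger than the whole buffer: write straight through.
      this.offset += this.handle.write(bytes, { at: this.offset });
      return;
    }
    this.buffer.set(bytes, this.used);
    this.used += bytes.length;
  }

  flush() {
    if (this.used === 0) return;
    this.offset += this.handle.write(this.buffer.subarray(0, this.used), { at: this.offset });
    this.used = 0;
  }

  close() {
    this.flush();
    this.handle.flush(); // persist pending writes
    this.handle.close(); // release the exclusive lock on the file
  }
}
```

Heap use is bounded by `bufferSize` no matter how much text passes through, which is the property the flat-heap figures above depend on.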

3. Streaming vs Buffering Trade-offs

Full buffer approach:

const data = await file.text(); // Load entire file
const result = convert(data);   // Process all at once
download(result);               // Output

Pros: Simple code
Cons: Memory = 3–5× file size, crashes on large files

Streaming approach:

for await (const chunk of file.stream()) {
  const processed = convert(chunk);
  output.write(processed);
}

Pros: Constant memory, handles unlimited file size
Cons: More complex code, requires careful state management

Hybrid (optimal):

const BATCH_SIZE = 25000;
let batch = [];

for await (const line of streamLines(file)) {
  batch.push(parseLine(line));
  
  if (batch.length >= BATCH_SIZE) {
    output.write(convertBatch(batch));
    batch = []; // Free memory
  }
}

Pros: Balance between simplicity and memory efficiency
Result: ~95K rows/sec (CSV→JSON) with ~25 MB working memory (v3.2, OPFS output)


Common Conversion Patterns

Pattern 1: CSV → JSON for API Consumption

Input CSV:

id,name,email,created_at
1,John Doe,[email protected],2024-01-01
2,Jane Smith,[email protected],2024-01-02

Output JSON (array of objects):

[
  {
    "id": 1,
    "name": "John Doe",
    "email": "[email protected]",
    "created_at": "2024-01-01"
  },
  {
    "id": 2,
    "name": "Jane Smith",
    "email": "[email protected]",
    "created_at": "2024-01-02"
  }
]

Type coercion options:

  • Parse numbers: "1" → 1
  • Parse booleans: "true" → true
  • Parse nulls: "null" → null
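
A minimal sketch of what these coercion options could look like for a single cell. The name `coerceValue` and the exact rules are assumptions, not the converter's documented behaviour. The number pattern deliberately rejects leading zeros so identifiers such as ZIP codes stay strings.

```javascript
// Hypothetical sketch: convert a CSV cell string to a JSON-friendly value.
function coerceValue(raw) {
  if (raw === 'true') return true;
  if (raw === 'false') return false;
  if (raw === 'null') return null;
  // Plain decimal numbers only: no leading zeros, no exponent, no whitespace.
  if (/^-?(0|[1-9]\d*)(\.\d+)?$/.test(raw)) return Number(raw);
  return raw;
}
```

Making each rule a separate user option, as the list above suggests, avoids surprises such as a column of the literal word "null" turning into JSON nulls.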

Pattern 2: JSON → CSV for Excel Analysis

Input JSON (nested):

[
  {
    "user_id": 1,
    "profile": {
      "name": "John",
      "email": "[email protected]"
    },
    "stats": {
      "orders": 5,
      "revenue": 432.50
    }
  }
]

Output CSV (flattened):

user_id,profile.name,profile.email,stats.orders,stats.revenue
1,John,[email protected],5,432.50

Flattening preserves all data in Excel-compatible format.

Pattern 3: Excel → JSON for Database Import

Input: Multi-sheet Excel with related data

Sheet 1 (Users):

id  name  email
1   John  [email protected]

Sheet 2 (Orders):

order_id  user_id  amount
101       1        99.99

Output JSON (separate files):

// users.json
[{"id": 1, "name": "John", "email": "[email protected]"}]

// orders.json
[{"order_id": 101, "user_id": 1, "amount": 99.99}]

Import to database with foreign key relationships preserved.


Advanced Features

1. Nested JSON Handling

Option: Flatten nested objects

Input:

{"user": {"address": {"city": "Boston"}}}

Output:

user.address.city
Boston

Option: Keep nested structure

Input (same):

{"user": {"address": {"city": "Boston"}}}

Output:

user
"{""address"":{""city"":""Boston""}}"

2. Array Value Handling

Join arrays with delimiter:

{"tags": ["javascript", "node", "react"]}

tags
"javascript, node, react"

Expand arrays to separate rows:

{"id": 1, "tags": ["a", "b"]}

id,tag
1,a
1,b
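
The row-expansion option can be sketched as a small function that emits one output object per array element. The name `expandArrayField` and its parameters are hypothetical.

```javascript
// Hypothetical sketch: turn one object with an array field into several rows.
// `field` is the array key (e.g. 'tags'), `singular` the output column (e.g. 'tag').
function expandArrayField(obj, field, singular) {
  const { [field]: values, ...rest } = obj;
  if (!Array.isArray(values) || values.length === 0) {
    return [{ ...rest, [singular]: '' }]; // keep the row even with no elements
  }
  return values.map((v) => ({ ...rest, [singular]: v }));
}
```

Expansion multiplies the row count, so output size can no longer be predicted from input rows alone. Expanding two array fields on the same object would need a decision between a cross product and positional pairing.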

3. Delimiter Detection

Auto-detect CSV delimiter from file content:

  • Comma: Standard CSV
  • Semicolon: European Excel exports
  • Tab: TSV files
  • Pipe: Database exports

Detection algorithm:

function detectDelimiter(sample) {
  const delimiters = [',', ';', '\t', '|'];
  const counts = delimiters.map(d => 
    sample.split('\n')[0].split(d).length
  );
  
  return delimiters[counts.indexOf(Math.max(...counts))];
}

4. BOM (Byte Order Mark) Handling

Excel requires BOM for UTF-8 CSV:

const BOM = new Uint8Array([0xEF, 0xBB, 0xBF]);
const csvBlob = new Blob([BOM, csvData], {
  type: 'text/csv;charset=utf-8;'
});

Without BOM: International characters (é, ñ, 中) display incorrectly in Excel
With BOM: Perfect character rendering


Troubleshooting Common Issues

Issue 1: "Out of Memory" Errors

Cause: File too large for available RAM

Solutions:

  1. Split file first
  2. Use JSONL instead of JSON (streaming-friendly)
  3. Convert in chunks (100K rows at a time)
  4. Close other browser tabs/applications

Memory requirements (v3.2 worker, OPFS streaming):

  • CSV → JSON: ~25 MB working memory (output to OPFS, not heap)
  • JSON → CSV: ~15 MB working memory (streaming tokenizer, both input and output O(1))
  • CSV → Excel: constant working memory, streaming write (output accumulates as ArrayBuffer in worker); ~15s per 1M rows; auto-splits at 1,048,576 rows per sheet

Issue 2: Special Characters Corrupted

Cause: Encoding mismatch

Solutions:

  • Ensure UTF-8 encoding on input
  • Enable BOM for Excel compatibility
  • Check source file encoding (Windows-1252, Latin1)

Detection:

// Check for BOM
const header = await file.slice(0, 3).arrayBuffer();
const bytes = new Uint8Array(header);
const hasBOM = bytes[0] === 0xEF && 
               bytes[1] === 0xBB && 
               bytes[2] === 0xBF;
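
A BOM check only covers files that carry one. For files without a BOM, one option is to test whether the bytes decode as strict UTF-8, using the standard `fatal` option of `TextDecoder`. The function name `isValidUtf8` is hypothetical, and this is a sketch rather than the converter's detection logic.

```javascript
// Hypothetical sketch: true if the bytes are well-formed UTF-8.
// With { fatal: true }, decode() throws a TypeError on malformed input
// instead of silently inserting U+FFFD replacement characters.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}
```

If this returns false for a file, it was probably saved as Windows-1252 or Latin1 and needs re-encoding before conversion. When testing only a slice of a large file, note that the slice can end in the middle of a multi-byte character and fail for that reason alone.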

Issue 3: Excel Opens CSV with Wrong Columns

Cause: Delimiter mismatch (Excel uses the list separator from the system's regional settings)

Solutions:

  • US/UK: Use comma delimiter
  • Europe: Use semicolon delimiter
  • Save as .tsv (tab-delimited) for universal compatibility

Issue 4: JSON Parse Errors

Cause: Invalid JSON syntax in source file

Common errors:

  • Single quotes instead of double quotes
  • Trailing commas in objects
  • Unescaped control characters
  • Byte Order Mark in JSON

Validation:

try {
  JSON.parse(await file.text());
} catch (e) {
  console.error('Invalid JSON:', e.message);
  // Attempt to fix common issues
}

Integration Patterns

Pattern 1: API Development Workflow

Scenario: Frontend expects JSON, backend exports CSV

# Backend exports
psql -c "COPY users TO '/tmp/users.csv' CSV HEADER"

# Convert to JSON in browser

# Frontend consumes
fetch('users.json')
  .then(r => r.json())
  .then(data => render(data))

Benefit: No backend conversion logic needed

Pattern 2: Data Pipeline Integration

ETL flow:

  1. Extract: Database → CSV export
  2. Transform: CSV → JSON (browser converter)
  3. Load: Upload JSON to API

Advantages:

  • No ETL server infrastructure
  • No Python/Node.js dependencies
  • Works on any workstation

Pattern 3: Excel Power Users

Daily workflow:

  1. Receive client data as Excel
  2. Convert to CSV instantly
  3. Process with command-line tools
  4. Convert back to Excel for delivery

Time saved: 15–20 minutes daily (manual copy/paste eliminated)


Cost Analysis: Browser vs Alternatives

Scenario: Monthly Data Processing (1M rows × 20 conversions)

Option 1: Browser Converter (Free)

  • Conversion cost: $0
  • Time: 40 minutes total (2 min per conversion)
  • Privacy: Complete (local processing)
  • Total cost: $0

Option 2: Cloud Conversion API

  • Service: CloudConvert Pro ($25/month)
  • API limits: 500 conversions/month
  • Upload time: 60 minutes total (3 min per conversion)
  • Total cost: $300/year
  • Privacy risk: Data uploaded to third party

Option 3: Python pandas Scripts

  • Development: 15 hours initial ($1,500)
  • Maintenance: 2 hours/month ($2,400/year)
  • Server costs: $0 (runs locally)
  • Total first year: $3,900
  • Annual ongoing: $2,400

Option 4: ETL Platform

  • Service: Talend, Informatica, etc.
  • Cost: $2,000–$10,000/year
  • Overkill for simple conversions
  • Total cost: $2,000–$10,000/year

Winner: Browser converter saves $300–$10,000 annually


Technical Specifications

Supported Formats

Input:

  • CSV (any delimiter)
  • TSV (tab-separated)
  • JSON (array of objects)
  • JSONL (newline-delimited JSON)
  • Excel (.xlsx, .xls)

Output:

  • CSV (configurable delimiter)
  • JSON (formatted or minified)
  • JSONL (streaming-friendly)
  • Excel (.xlsx)

Performance Characteristics

Metric                    Value
Max file size             Unlimited (browser memory limit)
Max rows tested           10,000,000
CSV → JSONL throughput    ~440,000 rows/sec
CSV → JSON throughput     ~95,000 rows/sec
JSON → CSV throughput     ~28,000 rows/sec
CSV → Excel throughput    ~94,000 rows/sec
Memory usage              50 MB typical
Supported browsers        Chrome, Firefox, Safari, Edge

Browser Requirements

  • Chrome 90+ (recommended)
  • Firefox 88+
  • Safari 14+
  • Edge 90+

Features used:

  • Web Workers (background processing)
  • Streams API (file reading)
  • TextEncoder/TextDecoder (UTF-8 handling)
  • Blob/File API (output generation)
  • Origin Private File System (OPFS streaming output, v3.1+)

Best Practices

1. File Size Management

Under 100 MB: Direct conversion works perfectly
100 MB – 1 GB: Close other tabs, conversion takes 10–60 seconds
Over 1 GB: Consider splitting first, or use JSONL format

2. Encoding Considerations

Always use UTF-8:

  • Set charset in editor before creating CSV
  • Enable BOM if opening in Excel
  • Test with international characters (é, ñ, 中)

3. Data Validation

Before conversion:

  • Check for consistent column counts
  • Verify header row is present
  • Scan for encoding issues
  • Test with small sample first

After conversion:

  • Verify row count matches (no data loss)
  • Spot-check special characters
  • Validate JSON structure if applicable
  • Test import into target system

4. Privacy Considerations

For sensitive data:

  • Use incognito/private browsing (auto-clear history)
  • Close browser after conversion (clear memory)
  • Verify network tab shows zero uploads
  • Consider air-gapped machine for classified data

Benchmarking Methodology

Test Environment

Hardware:

  • MacBook Pro M1 (8-core, 16 GB RAM)
  • Chrome 120.0.6099.109

Test files:

  • Generated with controlled data
  • Consistent column counts
  • No null values (worst case)
  • UTF-8 encoding

Measurement:

const start = performance.now();
await convertFile(file, options);
const elapsed = performance.now() - start;
const rowsPerSec = (rowCount / elapsed) * 1000;

Reproducibility

Generate test data:

// 1M row CSV
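// randomDate: helper returning a random ISO date (YYYY-MM-DD) in 2024
const randomDate = () =>
  new Date(Date.UTC(2024, 0, 1 + Math.floor(Math.random() * 366))).toISOString().slice(0, 10);
const rows = Array.from({length: 1000000}, (_, i) => 
  `${i},User ${i},user${i}@example.com,${randomDate()}`
);
const csv = 'id,name,email,created_at\n' + rows.join('\n');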

Run benchmark:

  1. Open the generated file in the converter
  2. Click Convert
  3. Record processing time from UI
  4. Calculate rows/sec

Verify results:

  • Check output row count matches input
  • Spot-check data integrity
  • Confirm file size is reasonable

Real-World Success Stories

Case Study 1: E-commerce Analytics

Company: 50-person online retailer
Challenge: Daily sales exports (200K rows) from Shopify as CSV, needed in MongoDB (JSON)

Before:

  • Manual process: 30 minutes daily
  • Node.js script (unmaintained, broke on encoding issues)
  • Developer time to fix: 2 hours/month

After:

  • Browser conversion: 2 minutes daily
  • Zero maintenance
  • Works on any team member's computer

Savings: 9 hours/month, $900/month in developer time

Case Study 2: Healthcare Data Migration

Organization: Regional hospital network
Challenge: Migrate 5M patient records from legacy system (CSV) to new EHR (requires JSON)

Constraints:

  • HIPAA compliance (no data uploads)
  • Limited IT budget
  • Tight timeline (3 weeks)

Solution:

  • Browser-based conversion on air-gapped workstation
  • Processing: 5M rows in 23 seconds per file
  • Total migration time: 4 hours (including validation)

Result:

  • Zero compliance risk
  • $0 additional software costs
  • Completed 2 weeks ahead of schedule

Case Study 3: Financial Services

Firm: Hedge fund analytics team
Challenge: Convert trading data (1M+ rows daily) between formats for different analysis tools

Before:

  • Python scripts (5 different scripts)
  • Maintenance burden: 3 hours/week
  • Frequent breaks on edge cases

After:

  • Single browser tool handles all conversions
  • Zero maintenance
  • Handles edge cases automatically

Impact:

  • 12 hours/month saved
  • Reduced dependency on one developer
  • Faster onboarding for new analysts

Case Study 4: Marketing Automation Platform

Company: SaaS marketing platform (120 employees)
Challenge: Customer data exports (1.2M rows/hour) from database to various third-party integrations

Before:

  • AWS Lambda CSV→JSON pipeline
  • Cost: $180/month in Lambda + data transfer
  • Processing time: 14 minutes per export
  • Occasional timeout failures requiring reruns

After:

  • Browser-based conversion on analyst workstations
  • Processing time: 3 minutes per export
  • Zero infrastructure costs
  • 100% success rate

Result:

  • Savings: $2,160/year ($180/month eliminated)
  • Time savings: ~79% less processing time (14 → 3 minutes)
  • Improved reliability: No timeout failures
  • Better compliance: Customer data stays local

The Architecture Philosophy

Why Browser-Based Processing Wins

1. Zero Installation Friction

  • No Python/Node.js required
  • No dependency management
  • No version conflicts
  • Works on locked-down corporate machines

2. Universal Accessibility

  • Windows, Mac, Linux identical experience
  • No IT approval needed
  • No license management
  • Instant availability

3. Privacy by Architecture

  • Impossible to upload data (no server-side code)
  • No vendor security audits required
  • No data retention policies to manage
  • Complete user control

4. Performance at Scale

  • Multi-core CPU utilization via Web Workers
  • Memory-efficient streaming
  • Compiled hot paths
  • Competitive with native code

5. Future-Proof

  • Browsers improve continuously
  • WebAssembly is available for heavier hot paths
  • GPU acceleration possible
  • No deployment pipeline needed
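The "memory-efficient streaming" point under Performance at Scale comes down to never holding the whole file as one string. The sketch below illustrates the idea under simplifying assumptions: comma-delimited input, no quoted fields, and a header on the first line. `createCsvToJsonlConverter` is a hypothetical name, not the converter's actual API. In a browser this logic would typically run inside a Web Worker, fed by decoded chunks of the input file.

```javascript
// Sketch: convert CSV text that arrives in arbitrary chunks into JSONL lines.
// Assumes simple CSV: comma-delimited, no quoted fields, header on first line.
function createCsvToJsonlConverter() {
  let carry = '';    // partial line left over from the previous chunk
  let header = null; // column names, read from the first complete line

  function convertLine(line) {
    const cells = line.split(',');
    if (header === null) { header = cells; return null; }
    const record = {};
    header.forEach((name, i) => { record[name] = cells[i] ?? ''; });
    return JSON.stringify(record);
  }

  return {
    // Feed one chunk; returns the JSONL lines that are complete so far.
    push(chunk) {
      const lines = (carry + chunk).split(/\r?\n/);
      carry = lines.pop(); // the last piece may be an incomplete line
      return lines
        .filter((l) => l !== '')
        .map((l) => convertLine(l))
        .filter((l) => l !== null);
    },
    // Call once at end of input to flush a final line with no trailing newline.
    flush() {
      const rest = carry.replace(/\r$/, '');
      carry = '';
      if (rest === '') return [];
      const out = convertLine(rest);
      return out === null ? [] : [out];
    },
  };
}
```

Only the current chunk and one partial line are held at a time, so memory use depends on chunk size rather than file size.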

What This Won't Do

Browser-based format conversion excels at CSV↔JSON↔Excel transformation, but it's not a complete ETL platform. Here's what this approach doesn't cover:

Not a Replacement For:

  • Complex ETL pipelines - No scheduled jobs, data lineage tracking, or orchestration
  • Database migration tools - Can't directly load to PostgreSQL, MySQL, MongoDB without intermediate steps
  • Data transformation platforms - No complex joins, aggregations, or multi-source merges
  • Schema validation services - Converts formats but doesn't enforce business rules or constraints
  • Data warehousing - Not designed for ongoing analytics, BI dashboards, or historical tracking

Technical Limitations:

  • RAM constraints - Limited by browser memory (typically 1-4GB per tab)
  • No incremental processing - Full file re-conversion needed for any changes
  • Single file at a time - No batch queue for converting 100+ files automatically
  • Browser-dependent - Performance varies by browser, OS, and hardware
  • No custom transformations - Can't add calculated columns, complex logic during conversion

Privacy & Security Caveats:

  • Browser security dependent - Relies on browser sandbox (keep browser updated)
  • Local malware risk - Workstation compromise still exposes data
  • No audit trail - Can't prove what was converted, when, or by whom
  • Cache considerations - Browser cache may retain JavaScript code (not data files)

Data Type Limitations:

  • Excel formulas - Converted to values only, formula logic not preserved
  • Pivot tables - Lost during conversion to CSV/JSON
  • Macros/VBA - Not supported or preserved
  • Embedded objects - Charts, images removed in CSV/JSON output
  • Custom formatting - Conditional formatting, cell colors not preserved

Scale Considerations:

  • Sweet spot: 100K-10M rows - Beyond this, consider database solutions
  • File size limit: ~1-4GB - Larger files may fail depending on available RAM
  • Complex nested JSON - Deep nesting (10+ levels) may slow processing significantly

Best Use Cases: This tool excels at one-time or recurring format conversion for files that are too large for Excel, too sensitive for cloud tools, and need standard CSV/JSON/Excel output. For ongoing data pipelines, schema enforcement, or complex transformations, use dedicated ETL platforms after initial conversion.


Frequently Asked Questions

Yes, a browser can handle 10 million rows. v3.2 uses OPFS (Origin Private File System) streaming output: data writes directly to browser-private storage rather than accumulating in the JS heap, so the heap stays flat at ~25 MB regardless of output size. CSV→JSON processes 10M rows at ~95K rows/sec (Node harness, May 2026). The bottleneck is CPU (data processing), not memory.
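As a rough illustration of why the heap can stay flat, the sketch below writes output in fixed-size batches to a sink and drops each batch once it is written. It is an assumption about the general pattern, not the v3.2 implementation. `writeInBatches` and `openOpfsSink` are hypothetical names, and the 1 MB default batch size is arbitrary. The OPFS calls (`navigator.storage.getDirectory()`, `getFileHandle()`, `createWritable()`) are standard browser APIs and only work in a browser.

```javascript
// Sketch: write lines to a sink in batches so that at most one batch of
// output is held in memory at a time. The sink needs write() and close().
async function writeInBatches(lines, sink, batchSize = 1 << 20) {
  let batch = [];
  let size = 0;    // approximate batch size in UTF-16 code units
  let written = 0; // total code units handed to the sink
  for await (const line of lines) {
    batch.push(line, '\n');
    size += line.length + 1;
    if (size >= batchSize) {
      await sink.write(batch.join(''));
      written += size;
      batch = [];
      size = 0;
    }
  }
  if (size > 0) {
    await sink.write(batch.join(''));
    written += size;
  }
  await sink.close();
  return written;
}

// Browser-only: open an OPFS file as a sink. Not called in this sketch;
// requires a browser that supports navigator.storage.getDirectory().
async function openOpfsSink(fileName) {
  const root = await navigator.storage.getDirectory();
  const handle = await root.getFileHandle(fileName, { create: true });
  return handle.createWritable(); // exposes write() and close()
}
```

In a browser, usage would look like `await writeInBatches(jsonlLines, await openOpfsSink('output.jsonl'))`, where `jsonlLines` is any iterable or async iterable of strings.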

Typical memory limits by browser:

  • Chrome: ~4GB per tab (configurable)
  • Firefox: ~3GB per tab
  • Safari: ~2GB per tab
  • Edge: ~4GB per tab

In practice, v3.2 uses ~25 MB working memory for 10M rows (OPFS streaming output), well under all browser limits. Files up to 1GB work reliably on machines with 8GB+ RAM.

Yes, it is suitable for GDPR- and HIPAA-regulated data by architecture. All processing happens client-side in your browser. No data is ever transmitted to our servers or any third party. This means:

  • No Article 28 processor agreements needed (GDPR)
  • No BAA required (HIPAA)
  • No cross-border data transfer (GDPR Article 44)
  • No vendor security audits required (SOC 2)

Your data never leaves your device.

JSONL (newline-delimited JSON) is fully supported and actually performs better than standard JSON because it's streaming-friendly. Select "JSONL" as the input format and the converter will process line-by-line with lower memory usage than standard JSON arrays.
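The reason JSONL streams well is that each line is a complete JSON value, so a converter can parse, emit, and discard one record at a time. A minimal sketch of that pattern follows, assuming flat records and taking the column list from the first record. `csvCell` and `jsonlToCsv` are hypothetical names, not the converter's API.

```javascript
// Quote a value for CSV when it contains a delimiter, quote, or line break.
function csvCell(value) {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Convert JSONL lines to CSV lines one at a time.
// Columns are taken from the keys of the first record (an assumption).
function* jsonlToCsv(lines) {
  let columns = null;
  for (const line of lines) {
    if (line.trim() === '') continue; // skip blank lines
    const record = JSON.parse(line);
    if (columns === null) {
      columns = Object.keys(record);
      yield columns.map(csvCell).join(',');
    }
    yield columns.map((c) => csvCell(record[c])).join(',');
  }
}
```

Because `jsonlToCsv` is a generator, only one parsed record exists at a time. A standard JSON array, by contrast, has to be tokenized incrementally or parsed whole.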

The converter auto-detects UTF-8, UTF-16, and Windows-1252 encodings. For Excel compatibility, enable "Add BOM" (Byte Order Mark), which ensures international characters display correctly. If you see garbled text, validate the encoding first.
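For readers curious what a BOM looks like at the byte level, here is a small sketch. `detectBom` and `withUtf8Bom` are hypothetical helpers, not the converter's internals. The byte sequences themselves are standard: `EF BB BF` for UTF-8, `FF FE` for UTF-16 little-endian, and `FE FF` for UTF-16 big-endian.

```javascript
// Detect encoding from a byte-order mark, if present. Returns null when there
// is no BOM (the encoding then has to be guessed some other way).
function detectBom(bytes) {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return 'utf-8';
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return 'utf-16le';
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return 'utf-16be';
  return null;
}

// Encode CSV text as UTF-8 and prepend a BOM unless one is already there.
function withUtf8Bom(csvText) {
  const encoded = new TextEncoder().encode(csvText);
  if (detectBom(encoded) === 'utf-8') return encoded;
  const out = new Uint8Array(encoded.length + 3);
  out.set([0xef, 0xbb, 0xbf], 0);
  out.set(encoded, 3);
  return out;
}
```

Windows-1252 has no BOM, which is why detecting it relies on heuristics rather than a byte signature.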

Excel formulas are evaluated and converted to their values. The formula logic itself isn't preserved in CSV/JSON output (this is a limitation of the target formats, not the converter). If you need formula preservation, keep a copy of the original .xlsx file. For Excel-to-JSON conversion specifically — including multi-sheet workbooks and data type preservation — the Excel to JSON Converter handles these conversions with correct type mapping for dates and booleans.

Excel files (.xlsx) are ZIP archives containing XML with styling, formatting, and metadata. This requires:

  • ZIP compression (CPU intensive)
  • XML generation (more complex than JSON)
  • Format overhead (much larger than CSV)

CSV is plain text with minimal overhead. v3.2 benchmarks (Node harness, May 2026): CSV→JSONL ~440K rows/sec, CSV→JSON ~95K rows/sec, JSON→CSV ~28K rows/sec (two-pass streaming tokenizer), Excel→CSV ~94K rows/sec.

Because processing happens entirely client-side, a browser crash means you'll need to restart the conversion. For critical workflows processing 10M+ rows, we recommend:

  1. Close other browser tabs
  2. Use Incognito/Private mode (starts fresh)
  3. Disable browser extensions temporarily
  4. For files over 5GB, split first
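Splitting, as suggested in step 4, means cutting the file at row boundaries and repeating the header in every part so each part converts on its own. A minimal sketch, assuming the input is already available as an iterable of lines with the header first. `splitRows` is a hypothetical name.

```javascript
// Sketch: split CSV lines into parts of at most rowsPerPart data rows,
// repeating the header line at the top of every part.
function* splitRows(lines, rowsPerPart) {
  let header = null;
  let part = [];
  for (const line of lines) {
    if (header === null) { header = line; continue; } // first line is the header
    part.push(line);
    if (part.length === rowsPerPart) {
      yield [header, ...part].join('\n') + '\n';
      part = [];
    }
  }
  // Emit the final, possibly shorter, part.
  if (part.length > 0) yield [header, ...part].join('\n') + '\n';
}
```

Because it is a generator, each part can be written out and released before the next one is built.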

Conversions can't be automated directly, because browser-based tools require user interaction. For automation needs:

  • Use Node.js with conversion libraries
  • Use Python pandas for smaller files (<1M rows)
  • Use conversion patterns as templates for your own scripts

The browser version excels at one-off conversions without infrastructure setup.

If you need strict JSON schema validation or CSV column type enforcement, use purpose-built validators first. This tool focuses on format conversion, not schema compliance.

Machines with 4GB RAM or less may struggle with files over 8GB. In these cases, split the file first or use a machine with more RAM.

If your Excel file uses pivot tables, macros, custom formatting, or merged cells, these features won't transfer to CSV/JSON. Save a copy before converting.



Conclusion

Converting 10 million rows between CSV, JSON, and Excel doesn't require cloud APIs, Python expertise, or expensive ETL platforms.

Browser-based streaming architecture delivers:

  • ~440,000 rows/sec peak throughput (CSV → JSONL)
  • 220,000 rows/sec sustained throughput (10M rows)
  • ~25 MB working memory (OPFS streaming output, flat heap regardless of file size)
  • Zero uploads (complete privacy)
  • Zero cost (no subscriptions, no infrastructure)

The technical foundation:

  • Web Workers for parallel processing
  • Streaming APIs for memory efficiency
  • OPFS StreamWriter for flat-heap output
  • Two-pass streaming tokenizer for JSON→CSV input

Real-world impact:

  • Saves 12–20 hours per incident
  • Eliminates $300–$10,000 annual costs
  • Maintains GDPR/HIPAA compliance
  • Works on any modern computer

Stop paying for cloud conversion APIs that upload your data. Stop maintaining fragile Python scripts that break on edge cases. Stop waiting hours for Excel to process files it can't even fully open.

Modern browsers are production-grade data processing platforms.

Use them.

Format Converter handles CSV, JSON, and Excel at enterprise speed with zero setup.

