Performance Engineering

We Process 10 Million CSV Rows in 12 Seconds. In Your Browser.

December 12, 2025
By SplitForge Team

Excel crashes at 1,048,576 rows.

Most online CSV tools cap out at 500K rows — then force you to upload your data to their servers.

Our Split Column tool just processed 10 million rows in 12.21 seconds.

In your browser. Zero uploads. Zero servers. Zero crashes.

And here's the part that breaks conventional wisdom: performance improved as the dataset got larger.

TL;DR

Excel hits a hard 1M-row limit and crashes. Our browser-based streaming architecture processes 10M rows in 12 seconds using Web Workers and zero-copy Blob construction — 819K rows/sec throughput. Try Split Column → Upload → Process → Download. 100% client-side, no file uploads.


The Benchmarks That Shouldn't Be Possible

We achieved 819K rows/second throughput processing 10 million CSV rows entirely in-browser. That's nearly double the throughput we measured at 1M rows, defying conventional wisdom about browser-based processing limits.

For context, Excel crashes at 1,048,576 rows (Microsoft's hard limit), while our browser-based tool processes 10x that volume in 12 seconds with zero uploads or server infrastructure.

Dataset Size      Processing Time   Throughput
1,000 rows        0.03 seconds      33K rows/sec
100,000 rows      0.30 seconds      333K rows/sec
1,000,000 rows    2.13 seconds      469K rows/sec
10,000,000 rows   12.21 seconds     819K rows/sec

Performance IMPROVED as the dataset got larger.

That's not a typo. At 10M rows, we're processing at 819,001 rows per second — nearly double the throughput of 1M rows.

For reference, Python pandas processes 10M rows in ~8–12 seconds — our browser-based tool is in the same performance class.

This violates everything you've been told about browser-based data processing.


Key Innovations Behind 819K Rows/Second

Here's what makes this possible:

  • βœ… Two-pass architecture β€” analyze structure, then execute optimized plan
  • βœ… 60MB optimized batching β€” sweet spot for parallelism without memory pressure
  • βœ… Zero-copy Blob streaming β€” native C++ speed instead of JavaScript string concat
  • βœ… Web Worker isolation β€” processing never blocks main thread
  • βœ… RFC 4180-aware quote parsing β€” handles escaped quotes and edge cases correctly
  • βœ… CPU cache warm-zone design β€” hot code paths stay optimized at scale

Architecture Diagram

CSV File (10M rows, ~800MB)
         ↓
┌────────────────────────────────┐
│ Pass 1: Analysis (50-100ms)    │
│ • Detect delimiters            │
│ • Validate structure           │
│ • Calculate batch size         │
└────────────────────────────────┘
         ↓
┌────────────────────────────────┐
│ Pass 2: Processing (12.1s)     │
│ Web Worker (background thread) │
└────────────────────────────────┘
         ↓
┌────────────────────────────────┐
│ 60MB Batches                   │
│ • ~127K rows per batch         │
│ • Optimized for CPU cache      │
└────────────────────────────────┘
         ↓
┌────────────────────────────────┐
│ Zero-Copy Blob Construction    │
│ • Uint8Array operations        │
│ • C++ native performance       │
└────────────────────────────────┘
         ↓
Streaming Output → Download

Why Most CSV Tools Fail at Scale

Most browser-based CSV tools hit hard limits at 500K-1M rows due to fundamental architectural constraints. Memory fragmentation from string concatenation, blocking main-thread operations, and inefficient parsing create performance walls that force tools to either crash or offload processing to servers.

Before building the solution, we had to understand why existing tools hit walls at 500K-1M rows:

1. Browser Memory Fragmentation

Most tools build CSV output as strings in memory. At 10M rows, you're juggling gigabytes of fragmented string data. The browser chokes.

Every string concatenation allocates new memory. At scale, this creates garbage collection storms that bring processing to a halt.

2. String Concatenation Overhead

Concatenating millions of CSV rows creates memory pressure:

// This pattern kills performance at scale
let csvOutput = "";
for (let i = 0; i < rows.length; i++) {  // 10 million iterations
  csvOutput += rows[i] + "\n";  // new string allocation every iteration
}

At 10M rows, this approach fragments memory and triggers constant garbage collection.
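The standard fix is to accumulate rows in an array and allocate the final string once with a single join. A minimal sketch (the `buildCsv` helper and its inputs are illustrative, not SplitForge's actual code):

```javascript
// Join-based assembly: push row strings into an array, then allocate the
// final output once, instead of reallocating on every iteration.
function buildCsv(rows) {
  const parts = [];
  for (const row of rows) {
    parts.push(row); // no intermediate concatenated strings
  }
  return parts.join("\n") + "\n"; // one large allocation at the end
}

// buildCsv(["a,b", "c,d"]) returns "a,b\nc,d\n"
```

This alone doesn't get you to 10M rows, but it removes the worst of the garbage collection pressure; the Blob approach described later goes further.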

3. Blocking the Main Thread

Process CSV synchronously and your browser freezes. Users see the "page unresponsive" dialog and kill the tab.

Excel does this. That's why it crashes.

4. Upload/Download Bottlenecks

Server-based tools require uploading your 500MB file before processing even starts. Then you wait again for download.

Round-trip latency kills throughput. A 500MB upload at 10 Mbps takes 6+ minutes before processing even begins.
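The arithmetic behind that figure is simple: megabytes times eight gives megabits, divided by link speed gives seconds.

```javascript
// Back-of-envelope upload time: 500 MB over a 10 Mbps link.
// Illustrative only; real transfers add protocol overhead.
const fileMegabytes = 500;
const linkMbps = 10;
const uploadSeconds = (fileMegabytes * 8) / linkMbps; // MB -> megabits -> seconds
// 400 seconds, i.e. about 6.7 minutes before processing can even start
```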

5. CSV Quote Parsing Complexity

Proper CSV parsing with escaped quotes, delimiters inside quoted fields, and multi-line values is computationally expensive.

Most tools cut corners or crash on edge cases like:

"Smith, John","123 Main St, Apt 2","New York, NY"

6. Privacy & Security Risk

The moment you upload sensitive customer data, financial records, or PII to a third-party server, you've created compliance risk.

GDPR, HIPAA, and SOC2 auditors hate this. Every upload is a liability.

Most tools accept these limitations as inevitable.

We didn't.


The Architecture That Made 12 Seconds Possible

We built a two-pass streaming architecture combining Web Worker parallelism with zero-copy Blob construction to achieve server-class performance entirely client-side. This eliminates upload latency, infrastructure costs, and privacy risks while processing 10M rows at 819K rows/second.

We redesigned the CSV pipeline from scratch. Here's what we built:

1. Two-Pass Architecture

Instead of processing everything in one shot, we make two passes:

Pass 1: Analysis Phase (50-100ms for 10M rows)

  • Scan file to detect delimiters, quote characters, structural patterns
  • Build processing plan optimized for file structure
  • Determine optimal batch sizes
  • Validate consistency

Pass 2: Execution Phase (12.1s for 10M rows)

  • Execute plan with optimized batching
  • Zero-copy Blob construction
  • Streaming output generation
  • Progress tracking

This separation lets us optimize for the specific structure of your file.

2. 60MB Chunk Batching

We don't process row-by-row. We batch rows into 60MB chunks — the sweet spot where:

  • Chunks are large enough to amortize overhead
  • Small enough to avoid memory pressure
  • Perfectly sized for Web Worker parallelism
  • Minimize garbage collection frequency

At 10M rows, batching overhead becomes negligible. That's why throughput increases at scale.

3. Zero-Copy Uint8Array Blob Construction

Here's the code that makes it possible:

// Build CSV output without string concatenation
const chunks = [];
let currentSize = 0;
const CHUNK_THRESHOLD = 60 * 1024 * 1024; // 60MB

for (const batch of processedBatches) {
  // Serialize the batch array back to CSV text
  const csvText = Papa.unparse(batch, { 
    header: false,
    skipEmptyLines: true 
  });
  
  // Create binary blob instead of string concat
  const blob = new Blob([csvText], { type: 'text/csv' });
  chunks.push(blob);
  currentSize += blob.size;
  
  // Trigger download when threshold hit
  if (currentSize > CHUNK_THRESHOLD) {
    triggerStreamingDownload(new Blob(chunks));
    chunks.length = 0;
    currentSize = 0;
  }
}

// Handle remaining chunks
if (chunks.length > 0) {
  triggerStreamingDownload(new Blob(chunks));
}

Instead of building a giant string, we construct binary Blob chunks. The browser's native Blob API is optimized at the C++ level — orders of magnitude faster than JavaScript string operations.

4. Web Worker Parallelism

Processing runs in a background Web Worker. The main thread stays responsive.

// Main thread stays responsive
const worker = new Worker('/workers/splitColumnWorker.js');

worker.postMessage({
  file: csvFile,
  columnIndex: targetColumn,
  batchSize: 60 * 1024 * 1024
});

worker.onmessage = (event) => {
  if (event.data.type === 'progress') {
    updateProgressBar(event.data.percent);
  } else if (event.data.type === 'complete') {
    handleCompletion(event.data.stats);
  }
};

You can switch tabs, check email, or keep working while your 10M row file processes.

No "page unresponsive" dialogs. Ever.

5. Smart Delimiter Detection

We auto-detect:

  • Comma vs semicolon vs tab vs pipe delimiters
  • Quote character preferences
  • European CSV formats (; delimiter, , decimal separator)
  • Escaped quotes and edge cases
  • Mixed encoding scenarios

Your files just work.
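One common heuristic for delimiter detection: count each candidate delimiter across a handful of sample lines and pick the one that appears most consistently. A sketch of that heuristic (an assumption for illustration, not the exact production logic):

```javascript
// Pick the delimiter whose minimum per-line count is highest: a true
// delimiter appears the same number of times on every well-formed row.
function detectDelimiter(sampleLines) {
  const candidates = [",", ";", "\t", "|"];
  let best = ",";
  let bestScore = -1;
  for (const d of candidates) {
    const counts = sampleLines.map((line) => line.split(d).length - 1);
    const score = Math.min(...counts); // worst line sets the score
    if (score > bestScore) { best = d; bestScore = score; }
  }
  return best;
}

// detectDelimiter(["a;b;c", "1;2;3"]) returns ";"
```

A production detector also has to ignore delimiters inside quoted fields, which is why delimiter detection and quote parsing are handled together in Pass 1.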


The Privacy Advantage Nobody Talks About

Every byte of your CSV file stays in your browser.

Zero uploads. Zero servers. Zero third-party access.

This isn't just a privacy feature — it's a competitive moat.

While competitors build server infrastructure to handle large files, we've eliminated:

  • Upload latency (6+ minutes for 500MB files)
  • Download latency
  • Server costs ($1000s/month for processing infrastructure)
  • Data breach liability
  • GDPR compliance complexity
  • SOC2 audit requirements
  • PCI-DSS scope for payment data

This delivers enterprise-scale data processing with zero backend costs.


Real-World Performance Impact

For Data Analysts

Before: Excel crashes at 1M rows. Submit IT ticket. Wait 2 days for server-based solution.

After: Process a 5M-row customer export locally in about 6 seconds. No IT tickets. No uploads.

For Marketing Teams

Before: Splitting a 2M-row email campaign file. Upload to online tool (8 min). Wait for processing (4 min). Download results (5 min). Total: 17+ minutes.

After: Process locally in 2.4 seconds. Zero uploads. Instant results.

For Finance Teams

Before: Upload sensitive transaction logs (3M rows) to third-party tool. Compliance risk. 12+ minute round trip.

After: Process 3M rows in 3.6 seconds. Zero uploads. Zero compliance risk.

For Developers

Before: "Need backend infrastructure for CSV processing. Estimated: 2 weeks + $500/month AWS costs."

After: Client-side processing at 819K rows/sec. Zero backend. Zero costs.


Why Performance Improves at Scale

Batching overhead amortization and CPU cache optimization cause per-row processing cost to decrease as dataset size increases. At 1K rows, setup overhead dominates; at 10M rows, the JIT-optimized processing loop runs at peak efficiency with consistent 60MB batches keeping working data in L3 cache.

This seems impossible, but here's why it works:

Batching Overhead Amortization

At 1K rows, batching overhead is 30-40% of total processing time.

At 10M rows, batching overhead is <1% of total processing time.

The per-row cost decreases as dataset size increases.
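The effect can be shown with a toy cost model: one fixed setup cost plus a linear per-row cost. The numbers below are illustrative, not measured:

```javascript
// Toy amortization model: fixed setup (worker spin-up, analysis pass)
// plus a constant cost per row. Constants are made up for illustration.
const setupMs = 25;
const perRowMicroseconds = 1.2;

function effectiveThroughput(rows) {
  const totalMs = setupMs + (rows * perRowMicroseconds) / 1000;
  return rows / (totalMs / 1000); // rows per second
}

// As rows grow, the fixed setup cost becomes a vanishing share of the
// total, so effective throughput rises toward 1 / perRowCost.
```

Under this model a 1K-row job spends a large fraction of its time on setup, while a 10M-row job spends almost none, which is exactly the shape of the benchmark table above.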

Memory Management Efficiency

Smaller datasets trigger more frequent garbage collection cycles.

Larger datasets process in consistent 60MB chunks, maintaining steady memory pressure without fragmentation.

CPU Cache Optimization

Processing 60MB batches keeps working data in CPU cache longer, reducing memory access latency.

Browser JIT Compilation

V8's JIT compiler optimizes hot code paths after ~10K iterations.

At 10M rows, the processing loop is fully optimized. At 1K rows, it's still warming up.


Technical Deep-Dive: Two-Pass Architecture

Pass 1: Analysis (50-100ms for 10M rows)

// Scan first 50 rows to detect structure
const sample = rows.slice(0, 50);

const analysis = {
  delimiter: detectDelimiter(sample),
  quoteChar: detectQuoteChar(sample),
  columnCount: sample[0].length,
  hasHeader: detectHeader(sample),
  encoding: detectEncoding(sample),
  avgRowSize: calculateAvgRowSize(sample)
};

// Calculate optimal batch size
const batchSize = Math.min(
  MAX_BATCH_SIZE,
  Math.floor(60 * 1024 * 1024 / analysis.avgRowSize)
);

Pass 2: Execution (12.1s for 10M rows)

// Process in optimized batches
for (let i = 0; i < rows.length; i += batchSize) {
  const batch = rows.slice(i, i + batchSize);
  const processed = processBatch(batch, targetColumn);
  
  // Zero-copy blob construction
  const blob = new Blob([Papa.unparse(processed)]);
  chunks.push(blob);
  
  // Update progress (clamped so the final partial batch doesn't report >100%)
  postMessage({
    type: 'progress',
    percent: Math.min(100, ((i + batchSize) / rows.length) * 100)
  });
}

Comparison: Browser vs Server vs Desktop

Approach           10M Rows                  Privacy           Cost         Setup Time
Excel              Crashes                   ✅ Local          $0           0 min
Online CSV Tools   15-20 min (incl upload)   ❌ Server upload  $19-49/mo    2 min
Python pandas      8-12 sec                  ✅ Local          $0           30+ min setup
AWS Lambda         5-10 sec + upload         ❌ Server         $50-200/mo   2-4 hours
SplitForge         12.21 sec                 ✅ Local          $0           0 min

What This Means for Browser-Based Data Processing

For years, conventional wisdom said: "Heavy data processing needs servers."

Modern browser APIs changed that equation.

Web Workers, Blob construction, streaming parsers, and typed arrays can now deliver server-class performance — while keeping data local and eliminating infrastructure costs.

The bottleneck isn't the browser. It's the architecture.

SplitForge proves that privacy-first, client-side data processing can outperform traditional server-based tools — while eliminating upload latency, infrastructure costs, and compliance risk.


What This Tool Won't Do

Browser-based CSV processing is powerful, but it has limits. Here's what this tool doesn't replace:

Not a Replacement For:

  • Excel analysis features: No pivot tables, formulas, or chartsβ€”use Excel for analysis after processing
  • Database operations: No SQL queries, joins, or relational operationsβ€”export from your database first
  • Complex data transformations: No calculated columns, aggregations, or reshapingβ€”this processes existing columns
  • Real-time collaborative editing: No multi-user simultaneous editing like Google Sheets

Technical Limitations:

  • Memory ceiling: Browser memory limits mean files >5GB may fail on systems with <16GB RAM
  • Mobile performance: Processing 10M rows on a phone will be significantly slowerβ€”desktop recommended
  • Older browsers: Requires Chrome 90+, Firefox 88+, Safari 14+, Edge 90+ for Web Worker support

Best Use Cases: This tool excels at splitting columns in large CSV files quickly and privately. For complex data analysis, statistical modeling, or collaborative work, use dedicated tools after processing.



FAQ

Why does Excel crash at 1,048,576 rows?

Excel has a hard limit of 1,048,576 rows per worksheet per Microsoft's official specifications. This is a fundamental architectural constraint of how Excel loads entire worksheets into memory. Beyond this limit, Excel either crashes or refuses to open the file.

How can a browser match server performance?

Zero network latency eliminates the 6+ minutes needed to upload/download 500MB files. Modern JavaScript engines like V8 compile hot code paths to native machine code at runtime. The browser's Blob API operates at C++ speed, not JavaScript speed. Combined with Web Worker parallelism, this matches server-class performance without round-trip delays.

What's the maximum file size?

Tested successfully with 10M row files (~800MB). Theoretical limit is browser memory (typically 2-4GB for Chrome).

Does it work on mobile devices?

Yes, but slower. Mobile browsers have less memory and CPU. We recommend desktop for 5M+ row files.

Which browsers are supported?

Chrome 90+, Firefox 88+, Safari 14+, Edge 90+. Requires Web Worker and Blob API support.

Can I verify these benchmarks?

Run tests with your own files directly in the browser. All benchmarks are fully reproducible on hardware similar to a MacBook Pro M1 with 16GB RAM.

Is my data private?

All processing happens locally. Zero uploads. Zero external API calls. Your data never leaves your device.

How does this compare to Python pandas?

Python pandas processes 10M rows in 8-12 seconds. Our browser-based tool matches that performance without requiring local installation or programming knowledge.


Process 10 Million Rows Now

819K rows/second throughput in browser
Zero uploads - your data never leaves your computer
Matches Python pandas performance without installation
Web Workers keep UI responsive while processing
