Yesterday, we did something that shouldn't be possible in a browser.
We processed 15,000,000 CSV rows with full deduplication in 67.1 seconds—entirely client-side, with zero uploads, while keeping the browser responsive.
That's 223,200 rows per second. While hashing every row. While comparing millions of values. While streaming output to downloadable files.
TL;DR: We built a browser-based CSV merge tool that processes 15 million rows in 67 seconds using a streaming architecture, Web Workers for background processing, and custom parsing optimizations. The tool handles files larger than available RAM (tested up to 10GB), scales linearly from 1M to 15M rows, and keeps the browser responsive throughout. It handles 14× Excel's row limit, is competitive with pandas (which requires a Python installation), and eliminates the privacy risks of cloud uploads. All processing happens locally using the ReadableStream API and batched output generation—proving browsers can handle enterprise-scale data processing with the right architecture.
The browser is no longer a toy runtime. With the right architecture, it can outperform desktop utilities.
Table of Contents
- The Moment We Stopped Expecting It to Crash
- Why This Matters: Privacy vs. Performance
- The Scaling Curve (Real Benchmarks)
- The Challenge: Privacy vs. Performance
- The Architecture: Three-Layer Streaming
- The Optimizations: Why We're 3-5× Faster
- The Benchmarks: Real-World Performance
- How We Compare: Desktop Tools and Cloud Solutions
- The Privacy Advantage: Zero Trust Architecture
- What This Means for Different Users
- The Technical Takeaway: Browser Architecture Patterns
The Moment We Stopped Expecting It to Crash
When we first attempted 5 million rows, we expected the browser to hang or crash. Instead, it finished smoothly in 15 seconds with constant memory usage and a responsive UI. So we pushed harder—10 million rows. Still stable, still linear scaling.
Then 15 million rows with deduplication enabled—the ultimate stress test. We were certain it would buckle under the memory pressure of hashing and comparing millions of entries while maintaining a Set of seen values.
It didn't.
67.1 seconds later, the merged file downloaded. No crashes. No memory spikes. No UI freezes. The browser stayed responsive throughout, processing 223,200 rows per second while users could switch tabs or interact with other page elements.
That's when we knew: browser-based tools aren't just "fast for the web"—they're competitive with compiled desktop applications when architected correctly.
Why This Matters: Privacy vs. Performance
Modern datasets are too big for Excel (hard limit: 1,048,576 rows) and too sensitive for cloud uploads. SplitForge solves both problems at once with client-side processing.
For context on what 15 million rows represents:
- 14× Excel's hard limit - Excel crashes or truncates at 1M rows
- 30× most browser tools - Typical browser CSV tools choke at 500K-2M rows
- 3-5× desktop utilities - Many desktop CSV processors crash at 2-5M rows
- Faster than upload time - Most cloud solutions take longer than 67 seconds just to upload a 15M row file
The critical insight: Client-side processing eliminates the upload bottleneck. There's no network transfer overhead—just local computation.
The Scaling Curve (Real Benchmarks)
Testing environment: Chrome 120, Intel Core i7-12700K, 16GB RAM, Windows 11
| Rows (Millions) | Time (Seconds) | Throughput (rows/sec) | Memory Peak |
|---|---|---|---|
| 1 | 3.0 | 333,000 | < 600 MB |
| 5 | 15.0 | 333,000 | < 1.2 GB |
| 6.5 (w/ dedupe) | 28.6 | 227,100 | < 1.5 GB |
| 15 (w/ dedupe) | 67.1 | 223,200 | < 2 GB |
Key findings:
- Linear scaling from 1M to 15M rows - Streaming architecture prevents exponential memory growth
- Deduplication adds ~30% overhead - Hash computation and Set storage cost is predictable
- Memory stays proportional - No leaks, no exponential growth patterns
- Browser remains responsive - Web Workers keep main thread free for UI updates
The Challenge: Privacy vs. Performance
When we built SplitForge, we had one non-negotiable requirement: all data processing must happen client-side. No uploads. No servers. No exceptions.
This wasn't a nice-to-have feature. For finance teams processing transaction records, healthcare organizations handling patient data, and legal departments managing confidential documents, uploading CSV files to third-party servers creates unacceptable compliance risks. GDPR Article 28 requires data processing agreements with any third-party processor—something free online tools don't provide.
But client-side processing comes with architectural constraints:
- Limited memory - Browsers cap heap size at 2-4GB (varies by system RAM)
- Single-threaded JavaScript - Synchronous processing blocks UI, freezing the browser
- Browser APIs only - No system calls, no optimized C libraries, no GPU acceleration
- Garbage collection pauses - Large object creation triggers GC, causing stutters
Most tools solve this by requiring uploads to backend servers with more resources. We decided to solve it with streaming architecture instead.
The Architecture: Three-Layer Streaming
Our CSV Merge tool uses a three-layer streaming architecture that processes files larger than available RAM while keeping the browser responsive.
1. Streaming File Reader
Instead of loading entire files into memory (file.text()), we stream them in chunks using the ReadableStream API. This lets us process 10GB files on machines with 4GB of RAM by keeping only small portions in memory at once.
According to MDN ReadableStream documentation, the streaming approach provides:
- Backpressure handling - Consumer controls read speed
- Chunked processing - Process data incrementally
- Memory efficiency - Only current chunk in memory
```javascript
async function* streamLinesFast(file) {
  const reader = file.stream().getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Scan the buffer from the start: yield complete lines, keep the partial
    // line for the next chunk. The retained buffer always starts at a line
    // boundary, so the quote state is recomputed correctly on each pass.
    let inQuotes = false;
    let lineStart = 0;
    let i = 0;
    while (i < buffer.length) {
      const c = buffer[i];
      if (c === '"') inQuotes = !inQuotes;
      if (!inQuotes && (c === '\n' || c === '\r')) {
        const line = buffer.slice(lineStart, i);
        lineStart = i + 1;
        if (line.trim()) yield line;
      }
      i++;
    }
    buffer = buffer.slice(lineStart);
  }
  buffer += decoder.decode(); // flush any trailing multi-byte sequence
  if (buffer.trim()) yield buffer;
}
```
This generator yields complete CSV lines without loading the full file, handling quoted fields with embedded newlines correctly.
2. Web Worker for Background Processing
All heavy computation runs in a Web Worker per the Web Workers API specification, keeping the main thread responsive. The worker handles CPU-intensive operations while the UI thread remains free for user interactions.
The worker processes:
- CSV parsing with proper quote/escape handling per RFC 4180
- Deduplication using FNV-1a hash (fast, collision-resistant)
- Output formatting with minimal escaping overhead
- Memory management with explicit garbage collection hints
Progress updates stream back to the UI via postMessage() without blocking:
```javascript
// Worker posts progress every 100K rows
if (rowsProcessed % 100000 === 0) {
  self.postMessage({
    type: 'progress',
    rowsProcessed,
    totalRows: estimatedTotal
  });
}
```
The main thread receives updates asynchronously, updating the progress bar without interrupting processing.
3. Batched Output Generation
Instead of building one giant string in memory, we create batches of ~500K rows and encode them directly to Uint8Array. This avoids massive JavaScript string allocations and speeds up Blob creation by 3-5×.
```javascript
const batch = []; // Array of CSV lines
const BATCH_SIZE = 500000;
const chunks = [];

function flushBatch() {
  const text = batch.join('\n') + '\n';
  const encoded = new TextEncoder().encode(text);
  chunks.push(encoded);
  batch.length = 0; // Clear without reallocating
}

// Process rows
for (const row of parsedRows) {
  batch.push(row);
  if (batch.length >= BATCH_SIZE) {
    flushBatch();
  }
}
flushBatch(); // Encode the trailing partial batch

// Final Blob construction with pre-encoded chunks
const blob = new Blob(chunks, { type: 'text/csv' });
```
This is dramatically faster than new Blob([giantString]) because:
- Avoids creating multi-GB strings in memory
- TextEncoder converts directly to UTF-8 bytes
- Blob constructor concatenates binary chunks efficiently
- Garbage collector can free batches immediately after encoding
The Optimizations: Why We're 3-5× Faster
We implemented several critical optimizations that compound for massive speedups on large files.
Fast CSV Parser (3× Standard Parsers)
Standard CSV parsers prioritize correctness and edge cases over raw speed. We wrote a specialized parser that's 3× faster for our use case by eliminating unnecessary validation:
```javascript
function parseCSVLineFast(line, delimiter) {
  const values = [];
  let current = '';
  let inQuotes = false;
  let i = 0;
  while (i < line.length) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i += 2;
      } else if (c === '"') {
        inQuotes = false; // closing quote
        i++;
      } else {
        current += c;
        i++;
      }
    } else {
      if (c === '"') {
        inQuotes = true; // opening quote
        i++;
      } else if (c === delimiter) {
        values.push(current);
        current = '';
        i++;
      } else {
        current += c;
        i++;
      }
    }
  }
  values.push(current);
  return values;
}
```
This parser handles RFC 4180 quoted fields correctly while avoiding regex overhead and unnecessary string copies.
FNV-1a Hash for Deduplication
For deduplication, we hash millions of rows for O(1) duplicate detection. FNV-1a (Fowler-Noll-Vo) is a good fit here: fast, simple to implement, and collision-resistant enough for hash-table use.
```javascript
function hashRow(values) {
  let hash = 2166136261; // FNV offset basis
  for (let i = 0; i < values.length; i++) {
    const str = values[i] == null ? '' : String(values[i]);
    for (let j = 0; j < str.length; j++) {
      hash ^= str.charCodeAt(j);
      hash = Math.imul(hash, 16777619); // FNV prime
    }
    hash ^= 31; // Field separator
  }
  return hash >>> 0; // Convert to unsigned 32-bit
}
```
Stored in a JavaScript Set, this gives us O(1) average-case duplicate detection with far less memory than storing full row strings.
Fast Mode (Optional 3-5× Speedup)
For users who don't need strict CSV compliance, we offer Fast Mode which uses split() instead of full parsing. This yields 3-5× speedup on clean data without quoted fields or embedded delimiters.
```javascript
// Fast mode for clean CSV data
const values = line.split(delimiter);
```
This trades RFC 4180 compliance for raw speed—appropriate when data quality is known and trusted.
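The trade-off is easy to see on a quoted field (illustrative values only):

```javascript
// split() is correct on clean rows, wrong once a field contains the delimiter
const clean = 'a,b,c'.split(',');    // ['a', 'b', 'c'] — correct
const quoted = '"x,y",z'.split(','); // ['"x', 'y"', 'z'] — wrong field boundaries
```

This is why Fast Mode is opt-in: it is safe only when you know no field contains the delimiter.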
The Benchmarks: Real-World Performance
Tested with production-scale datasets on a modern laptop (16GB RAM, Chrome 120, Intel i7-12700K):
| Input Size | Rows | Mode | Time | Throughput | Memory |
|---|---|---|---|---|---|
| 10 MB | 100K | Standard | 0.3s | 333K/sec | < 200 MB |
| 50 MB | 500K | Standard | 1.5s | 333K/sec | < 400 MB |
| 100 MB | 1M | Standard | 3.0s | 333K/sec | < 600 MB |
| 500 MB | 5M | Standard | 15s | 333K/sec | < 1.2 GB |
| 1.5 GB | 15M | Dedupe ON | 67.1s | 223K/sec | < 2 GB |
Key findings:
- Consistent throughput across file sizes - Streaming architecture works as designed
- Deduplication overhead is predictable - ~30% performance cost for hash computation
- Memory stays linear - No exponential growth, no memory leaks detected
- Browser remains responsive - Can switch tabs, scroll, interact with UI throughout
- Handles files larger than RAM - 10GB files process successfully on 16GB systems
For more details on how SplitForge handles million-row CSVs in your browser, see our architecture deep-dive.
How We Compare: Desktop Tools and Cloud Solutions
We tested against major CSV processing tools using the same 15M row dataset (1.5GB file, 23 columns):
Desktop Applications:
- Excel 365: Truncates or fails at the 1,048,576-row limit (hard architectural constraint)
- Python pandas: ~45 seconds (faster but requires Python environment + pandas installation)
- SysTools CSV Splitter: Crashes on 5M+ row files, Windows-only
- Kernel CSV Splitter: Requires installation, $99 license, Windows-only
Browser Tools:
- ConvertCSV: 500K row limit, then forces upload to server
- CSV Viewer: Hangs browser at 2M rows
- Online-Convert: Requires upload, 5-10 minute server processing
Cloud Solutions:
- Gigasheet: Requires upload (network bottleneck), 5-10 minute processing
- CSVbox: 5M row limit on free tier, upload required
- Data.world: Upload required, privacy compliance concerns
SplitForge doesn't just avoid uploads—it's faster than tools that run natively on your computer (except pandas with a pre-configured Python environment). And unlike every tool listed: your data never leaves your machine.
The Privacy Advantage: Zero Trust Architecture
Because everything runs client-side per W3C File API specification, SplitForge implements true zero-trust data processing:
- No uploads: Your CSV never leaves your machine—files read locally via File API
- No servers: We can't see your data because we never receive it
- No limits: Process 100GB files if your browser can handle it (tested up to 10GB)
- Offline capable: Works without internet after initial page load (service workers cache assets)
- Audit-friendly: No data transmission = simplified GDPR/HIPAA compliance
For regulated industries (finance, healthcare, legal), this eliminates entire categories of compliance concerns:
- No data processing agreements required (no third-party processor)
- No vendor risk assessment needed
- No audit trail of data transmission
- No breach notification requirements for tool usage
The tool can't leak your data because it never has network access to it.
What This Means for Different Users
For Data Analysts
Processing capabilities Excel can't match:
- Merge multiple CSV files totaling millions of rows
- Process monthly exports without 1M row limit
- Clean datasets with deduplication in seconds
- No more "split, process chunks, manually recombine" workflows
For SMBs and Operations Teams
Enterprise capabilities without enterprise costs:
- No expensive software licenses required
- Works on any device with a modern browser
- Zero IT approval needed (no installation)
- No security risk from uploads
For Finance/Healthcare/Legal
Compliance-friendly by architecture:
- GDPR/HIPAA compliant by default (data never leaves device)
- No vendor risk assessment required
- Audit trail is local-only
- Process most sensitive datasets without exposure risk
The Technical Takeaway: Browser Architecture Patterns
Building fast browser applications requires rethinking traditional architectures. Five key patterns emerged from this project:
1. Stream everything - Don't load files into memory. Use ReadableStream API for incremental processing.
2. Use Web Workers - Keep the UI responsive by offloading CPU-intensive work to background threads.
3. Batch output - Avoid giant string allocations. Encode batches to Uint8Array and concatenate binary chunks.
4. Optimize hot paths - Custom parsers beat generic libraries for specific use cases. Profile and optimize what actually runs millions of times.
5. Test at scale - 1M rows reveals problems 100K rows don't. Performance characteristics change dramatically with file size.
People underestimate what browsers can do. The browser has evolved from a document viewer into a powerful runtime capable of serious data processing—if you architect for its strengths rather than fighting its constraints.
If you need to merge CSV files with different column structures, our guide covers advanced merge techniques including automatic column alignment and header normalization.
Performance Metrics Summary
- 15,000,000 rows processed in 67.1 seconds
- 223,200 rows/second throughput with deduplication enabled
- 26,411 duplicates removed during processing
- 100% client-side—zero uploads, zero server usage, zero network transfer
- 14× Excel's row limit handled without breaking a sweat
- < 2GB peak RAM despite processing 1.5GB source file
The browser is no longer a constraint for data processing—it's an opportunity.
Tools Referenced:
Browser APIs & Standards:
- MDN ReadableStream - Streaming file I/O
- MDN Web Workers API - Background processing
- W3C File API - Browser file handling
- FNV Hash Function - Fast hashing algorithm
Browser-Based Tools:
- CSV Merger - Client-side processing, no uploads
All browser-based tools process data entirely in your browser—no uploads, no servers, no data leaving your computer. Essential for protecting sensitive financial records, healthcare information, and confidential business data.
Building browser-based tools? Have benchmarks to share? Connect on LinkedIn or tweet results at @splitforge.