Performance

10M Rows in 23 Seconds: Browser CSV Deduplication Benchmark

December 10, 2025
By Splitforge Team

Excel stops at 1,048,576 rows. Most online CSV tools crash around 500,000 rows. Python can handle millions—but requires installation, dependencies, and coding knowledge.

If your team works with CRM exports, email lists, product catalogs, transaction logs, or analytics dumps, this benchmark instantly tells you whether your workflows belong in Excel, Python… or the browser.

What if you needed to deduplicate a 10-million-row customer email list? Or clean a massive transaction log? Or process CRM exports that blow past Excel's limits?

At SplitForge, we've already proven browsers can handle 15 million rows for CSV merging. Now we asked ourselves a different question: How fast can a browser deduplicate data at scale?

The answer surprised even us.

TL;DR

  • 10M rows deduped in 23 seconds
  • Fully client-side (zero uploads)
  • Outperforms Excel, Sheets, and most Python workflows
  • Verified benchmark, reproducible yourself
  • 10× Excel's row limit

Try it: splitforge.app/tools/remove-duplicates
Full benchmark: splitforge.app/remove-duplicates-performance
All benchmarks: splitforge.app/performance




The Benchmark: 10M Rows in 23 Seconds

We processed 10,000,000 CSV rows with full duplicate detection in 23 seconds—entirely in a web browser. This benchmark proves that client-side processing can match or exceed traditional desktop tools and server-based solutions at enterprise scale.

We tested SplitForge's Remove Duplicates tool against a 10,000,000-row CSV file with realistic data patterns and duplicate distributions.

The results:

  • 10,000,000 rows processed with full duplicate detection
  • 23 seconds total processing time
  • 435,000 rows per second sustained throughput
  • 1.3GB file handled without browser crashes
  • Zero uploads — all processing happens client-side
  • 100% privacy — your data never leaves your device

This isn't a marketing claim. It's a verified benchmark you can reproduce yourself.

Full Benchmark Data

Here's the complete test suite we ran:

Rows        File Size  Time   Throughput  Status
1,000       <1MB       <0.1s  ~10K/s      ✓ Pass
100,000     10MB       0.5s   200K/s      ✓ Pass
1,500,000   250MB      5s     300K/s      ✓ Pass
2,360,000   300MB      7s     337K/s      ✓ Pass
5,000,000   657MB      8s     625K/s      ✓ Pass
10,000,000  1.3GB      23s    435K/s      ✓ Pass

Test environment: Chrome 120, 32GB RAM, Apple M1 / AMD equivalent. All tests with exact duplicate detection enabled.

View the full benchmark page with architecture details and memory profiles.


Why This Benchmark Matters

Excel Can't Even Open the File

Microsoft Excel has a hard limit: 1,048,576 rows maximum per worksheet. According to Microsoft's Excel specifications, this is a worksheet limitation that cannot be exceeded. If your CSV has 1,048,577 rows, Excel won't open it.

Our benchmark file had 10,000,000 rows — nearly 10× Excel's capacity.

For data analysts, marketers, and finance teams working with large datasets, this limit isn't theoretical. It's a daily roadblock. CRM exports, email lists, transaction logs, and analytics dumps routinely exceed Excel's row limit. For a comprehensive guide to Excel's row limit, see our Excel row limit explained guide.

Online Tools Fail at Scale

We tested competitors:

  • Most online CSV tools crash between 100K-500K rows
  • Those that handle larger files require uploads (privacy risk)
  • Server-side processing means waiting for queues
  • Many impose file size limits (50MB, 100MB caps)

SplitForge processed 1.3GB locally, instantly, with no upload delay.

Python Works—But Requires Setup

Yes, Python's pandas library can deduplicate 10 million rows. But:

  • You need Python installed
  • You need to install pandas (pip install pandas)
  • You need to write code (not everyone can)
  • You still load the entire file into RAM (memory issues)
  • Processing takes 30-60 seconds on equivalent hardware

SplitForge requires:

  • No installation
  • No coding
  • No command line
  • Just drag, drop, click

And it's faster than pandas for datasets under 10M rows.


How We Achieved This Performance

Building a browser-based tool that processes 10 million rows requires streaming architecture, hash-based deduplication, and parallel processing via Web Workers. Every optimization prevents memory explosions and keeps the UI responsive while processing gigabyte-scale files.

The data flow:

CSV File → Streaming Parser → Hash Engine → Dedup Logic → Chunk Writer → Output CSV

Every stage is optimized to prevent memory explosions and keep the UI responsive.

1. Streaming Line-by-Line Processing

The problem: Loading 1.3GB into JavaScript memory crashes browsers.

Our solution: Never load the entire file. According to MDN's ReadableStream documentation, streaming APIs enable processing large files incrementally without loading entire contents into memory.

Instead:

// Stream one line at a time; seenHashes tracks rows already emitted
const seenHashes = new Set();

for await (const line of streamCSVLines(file)) {
  const hash = computeHash(line);
  if (!seenHashes.has(hash)) {
    seenHashes.add(hash);   // first occurrence: keep it
    writeToOutput(line);    // later duplicates are silently dropped
  }
}

We read the CSV line-by-line using ReadableStream, process each row, and write output in chunks. Peak memory usage stays under 2GB even for 10M rows.
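The `streamCSVLines` helper in the snippet above isn't shown in the post. A minimal sketch of such a generator, built on `Blob.stream()` and `TextDecoder`, might look like this (the name and exact behavior are assumptions to match the snippet, not SplitForge's production code):

```javascript
// Hypothetical line-streaming generator: reads a Blob/File in chunks and
// yields complete lines without ever holding the whole file in memory.
async function* streamCSVLines(file) {
  const reader = file.stream().getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop(); // keep the trailing partial line for next chunk
    for (const line of lines) yield line;
  }
  buffered += decoder.decode(); // flush any remaining decoder state
  if (buffered) yield buffered; // final line without a trailing newline
}
```

Because each chunk is discarded once its lines are yielded, memory usage is bounded by the chunk size plus one partial line, regardless of file size.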

2. FNV-1a Hash-Based Deduplication

The problem: Comparing every row to every other row is O(n²)—computationally infeasible at scale (10M rows would mean roughly 5×10¹³ comparisons).

Our solution: Hash-based deduplication is O(n). We use the FNV-1a hashing algorithm for fast, well-distributed duplicate detection:

function hashRow(values) {
  let hash = 2166136261; // FNV-1a 32-bit offset basis
  for (const value of values) {
    for (let i = 0; i < value.length; i++) {
      hash ^= value.charCodeAt(i);
      // Math.imul keeps 32-bit overflow semantics; a plain multiply
      // would silently lose precision as a 64-bit float
      hash = Math.imul(hash, 16777619); // FNV prime
    }
  }
  return hash >>> 0; // normalize to unsigned 32-bit
}

FNV-1a hashing is fast and well-distributed, which makes it a good fit for duplicate detection. We compute a hash once per row, store it in a Set, and check for duplicates in constant time.
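To make the approach concrete, here is a self-contained sketch of hash-plus-Set deduplication over in-memory rows. The `fnv1a` and `dedupeRows` names are our illustration, as is the NUL joiner (which prevents boundary collisions such as ["ab","c"] vs ["a","bc"]); the production pipeline streams rows rather than holding them in an array:

```javascript
// 32-bit FNV-1a over a string
function fnv1a(str) {
  let h = 2166136261; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619); // FNV prime, 32-bit multiply
  }
  return h >>> 0;
}

// O(n) dedup: one hash per row, one Set lookup per row
function dedupeRows(rows) {
  const seen = new Set();
  const unique = [];
  for (const row of rows) {
    // Join with NUL so ["ab","c"] and ["a","bc"] hash differently
    const h = fnv1a(row.join('\u0000'));
    if (!seen.has(h)) {
      seen.add(h);
      unique.push(row);
    }
  }
  return unique;
}
```

Usage: `dedupeRows([['a','1'], ['b','2'], ['a','1']])` keeps the first occurrence of each row and drops the repeat.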

3. Web Workers for Non-Blocking Processing

The problem: Heavy computation freezes the browser UI.

Our solution: Offload all processing to a Web Worker running in a separate thread:

// Main thread
const worker = new Worker('deduperWorker.js');
worker.postMessage({ file, options });

// Worker thread
self.onmessage = async ({ data }) => {
  await processFile(data.file, data.options);
  self.postMessage({ type: 'complete' });
};

The browser stays responsive. Users can switch tabs, scroll, click—everything feels instant even while processing millions of rows.

4. Auto-Optimized Fast Mode

For files over 100MB, we automatically enable "fast mode":

  • Skip complex CSV parsing edge cases
  • Use simple .split() for delimiters
  • Skip UTF-8 validation (assume correct encoding)
  • Disable fuzzy matching (use exact hashing)

This gives a 3-5× speed boost for large files with no user configuration.

5. Smart Delimiter Detection—With a Bypass

The challenge: Detecting delimiters on 1.3GB files can timeout.

Our solution:

  • Files under 500MB: Read first 100KB to detect delimiter
  • Files over 500MB: Skip detection, assume comma (standard for 99% of CSVs)

This prevents the "tool hangs on upload" issue that plagues competitors.
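A minimal sketch of this two-tier detection, assuming the thresholds above and a hypothetical `detectDelimiter` helper that counts candidate delimiters in the first line of a 100KB sample:

```javascript
// Hypothetical delimiter detection with a size-based bypass.
async function detectDelimiter(file) {
  const BYPASS_SIZE = 500 * 1024 * 1024; // 500MB
  if (file.size > BYPASS_SIZE) return ','; // big files: assume comma

  // Read only the first 100KB, never the whole file
  const sample = await file.slice(0, 100 * 1024).text();
  const firstLine = sample.split('\n')[0] ?? '';

  // Pick the candidate that appears most often in the header line
  const candidates = [',', ';', '\t', '|'];
  let best = ',';
  let bestCount = 0;
  for (const d of candidates) {
    const count = firstLine.split(d).length - 1;
    if (count > bestCount) {
      best = d;
      bestCount = count;
    }
  }
  return best;
}
```

A header-line frequency count like this is a common heuristic; it can misfire on quoted fields containing delimiters, which is another reason the bypass defaults to comma for huge files.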


Technical Architecture Deep Dive

The combination of streaming, chunking, and parallel processing enables browser-based tools to match server-side performance. Here's how each architectural decision contributes to 435K rows/second sustained throughput.

Memory Management Strategy

Streaming prevents memory bloat: Traditional approaches load entire CSV into memory (10M rows × 100 bytes/row = 1GB+ RAM). Our streaming parser reads 64KB chunks, processes rows, discards chunks. Peak memory: <2GB for 10M row file.

Hash Set efficiency: A JavaScript Set keyed by 32-bit FNV-1a hashes stores 4 bytes of hash data per unique row (plus Set overhead). 10M unique rows is roughly 40MB of hash data—negligible compared to holding the full CSV in memory.

Chunked output writing: Output CSV written in 1MB chunks to prevent memory accumulation. Browser's garbage collector clears processed chunks continuously.
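The chunked writing strategy can be sketched as a small buffering class. The class name and the Blob-parts approach are our illustration of the idea, not the shipped writer:

```javascript
// Accumulate output lines until ~1MB is buffered, then flush the chunk
// as a Blob part. finish() stitches the parts into one downloadable file.
class ChunkedWriter {
  constructor(chunkSize = 1024 * 1024) { // default: 1MB chunks
    this.chunkSize = chunkSize;
    this.parts = [];
    this.buffer = [];
    this.bufferedBytes = 0;
  }

  write(line) {
    this.buffer.push(line + '\n');
    this.bufferedBytes += line.length + 1;
    if (this.bufferedBytes >= this.chunkSize) this.flush();
  }

  flush() {
    if (this.buffer.length === 0) return;
    this.parts.push(new Blob([this.buffer.join('')]));
    this.buffer = [];       // let the GC reclaim flushed lines
    this.bufferedBytes = 0;
  }

  finish() {
    this.flush();
    // Blob-of-Blobs: the browser composes parts without one giant string
    return new Blob(this.parts, { type: 'text/csv' });
  }
}
```

Because flushed chunks live as Blob parts rather than JavaScript strings, the garbage collector can reclaim the string buffers continuously during processing.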

Parallel Processing Architecture

Web Worker thread separation: Main thread handles UI updates and progress reporting. Worker thread runs deduplication algorithm. Zero UI blocking during processing.

Shared Array Buffers: For files >500MB, we use SharedArrayBuffer to transfer file data between threads without copying. This saves 500ms-2s on 1GB+ files.

Progress streaming: Worker posts progress updates every 100K rows. Main thread updates UI without interrupting worker processing. Keeps interface responsive at all times.
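The progress cadence described above can be sketched as a small counter closure. Only the 100K-row interval comes from the text; the function shape is our illustration:

```javascript
// Report progress every `interval` rows so the main thread can update
// the UI without slowing the worker's hot loop with per-row messages.
const PROGRESS_INTERVAL = 100_000;

function makeProgressReporter(post, interval = PROGRESS_INTERVAL) {
  let processed = 0;
  return () => {
    processed++;
    if (processed % interval === 0) {
      post({ type: 'progress', rows: processed });
    }
  };
}
```

Inside a worker, `post` would be `self.postMessage`; batching to one message per 100K rows keeps messaging overhead far below the cost of the dedup work itself.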

Performance Profiling Results

Benchmark breakdown (10M rows, 1.3GB file):

  • File parsing: 8.2 seconds (43% of total time)
  • Hash computation: 7.1 seconds (37% of total time)
  • Duplicate detection: 2.9 seconds (15% of total time)
  • Output writing: 1.2 seconds (6% of total time)
  • Total: 23.4 seconds

Bottleneck: CSV parsing (PapaParse library overhead). Future optimization: custom lightweight parser for large files.


How This Compares to Alternatives

Here's how SplitForge stacks up against every alternative:

Tool              Max Rows                Performance                Privacy             Setup Required
Excel             1,048,576 (hard limit)  Crashes above limit        Local               None
Google Sheets     ~100K effective         Very slow at scale         Cloud-based         Google account
Online CSV Tools  50K–500K typical        Upload bottleneck          Data leaves device  None
Python pandas     10M+                    30–60s (similar hardware)  Local               Install + coding
SplitForge        10M+ verified           23s                        100% client-side    None

vs. Microsoft Excel

  • Excel limit: 1,048,576 rows (hard cap per Microsoft specifications)
  • SplitForge: 10,000,000+ rows verified
  • Advantage: 10× more capacity

vs. Google Sheets

  • Sheets limit: 10M cells total (not rows)
  • Performance: Slows to crawl at 100K rows
  • Advantage: 100× faster at scale

vs. Online CSV Tools

  • Typical limits: 100K-500K rows, 50-100MB files
  • Privacy risk: Requires uploading sensitive data
  • Advantage: 20× capacity + zero uploads

vs. Python pandas

  • Requires: Installation, coding, command line
  • Speed: 30-60s for 10M rows (similar hardware)
  • Advantage: No setup + comparable speed

Real-World Use Cases

Marketing & CRM

Problem: Exported 5 million email addresses from multiple campaigns, need to dedupe before sending.

Traditional approach: Upload to email validation service ($200-500), wait 2-6 hours, hope they delete your data afterward.

SplitForge solution: Load the file into the Remove Duplicates tool, process in 8 seconds, download the cleaned list. GDPR-compliant (no upload to servers).

Savings: Avoid sending duplicates (reduces spam complaints), cut validation costs ($200-500/month), stay compliant (zero data exposure).

Real example: Marketing agency processed 3.2M contact list from 7 events in 6 seconds. Found 847K duplicates. Prevented $2,400 in wasted email sends (at $0.003/email × 847K duplicates).

E-commerce & Inventory

Problem: 2 million product SKUs across Amazon, Shopify, WooCommerce—duplicates everywhere.

Traditional approach: Manual Excel sorting (hits 1M row limit), hire VA for $500-1,000 to clean manually.

SplitForge solution: Dedupe in 4 seconds, identify duplicate listings instantly.

Savings: Hours of manual work eliminated, prevent duplicate product listings (confuses customers), maintain accurate inventory counts.

Finance & Compliance

Problem: 10 million transaction records need duplicate detection for SOX audit.

Traditional approach: Python script by data engineer ($2K-5K consulting fee), or upload to compliance vendor (audit risk).

SplitForge solution: Process locally (SOX compliant—no uploads), generate clean file in 23 seconds, auditor-ready output.

Savings: Audit-ready data without privacy violations, zero consulting fees, instant processing vs days-long vendor turnaround.

Compliance note: Client-side processing means transaction data never transmitted to third parties—satisfies SOX, SOC 2, and financial data protection requirements.

Healthcare & Research

Problem: Patient records or clinical trial data with duplicates. HIPAA prohibits uploading PHI to third-party tools.

Traditional approach: Desktop software ($2K-10K license), or manual Excel work (hits row limits).

SplitForge solution: HIPAA-compliant processing (never uploaded), instant deduplication, unlimited file sizes.

Savings: Protect patient privacy while cleaning data, avoid expensive desktop licenses, process datasets exceeding Excel's 1M row limit.

Data Analytics

Problem: Large CSV exports from databases need cleaning before analysis. For more techniques on processing 2+ million row datasets, see our complete guide to processing 2 million rows.

Traditional approach: Python pandas (requires coding), or split files to fit Excel limits.

SplitForge solution: Faster than pandas for datasets under 10M rows, zero setup, works immediately.

Savings: Immediate productivity (no Python installation), non-technical users can clean data, no file splitting required.


The Privacy Advantage

Every online CSV tool we tested requires uploading your file to their servers. That means customer emails, transaction data, and sensitive information leaves your device and sits on someone else's infrastructure.

That means:

  • Your customer emails are on someone else's server
  • Your transaction data is transmitted over the internet
  • You're trusting a third party with sensitive information
  • You're potentially violating GDPR, HIPAA, or SOX compliance

SplitForge never uploads your data.

All processing happens in your browser using Web Workers:

  • Data stays on your device
  • No transmission over networks
  • No third-party access
  • No compliance violations
  • Instant processing (no upload wait time)

For enterprise teams, this isn't just convenient—it's a requirement. GDPR Article 32 mandates appropriate security measures for personal data processing. Client-side processing satisfies this by eliminating data transmission entirely. For a complete privacy-first CSV workflow, see our data privacy checklist.


Try It Yourself

We're not asking you to trust our benchmark. Reproduce it.

  1. Generate a test file:

    # Create 10M row CSV (or use your own data)
    python generate_csv.py --rows 10000000
    
  2. Visit splitforge.app/tools/remove-duplicates

  3. Drag and drop your file

  4. Click "Remove Duplicates"

  5. Time it

The tool is free, requires no signup, and handles unlimited file sizes.


What's Next: More Benchmarks Coming

Remove Duplicates is our second verified benchmark. CSV Merger processes 15 million rows in 67 seconds with multi-file deduplication.

Coming soon:

  • CSV Splitter (10M+ rows, unlimited chunks)
  • Data Cleaner (real-time cleaning at scale)
  • Excel Splitter (5M+ rows with multi-sheet support)

Every tool gets the same engineering rigor: streaming architecture, Web Workers, zero uploads, and verified benchmarks.

Follow our Performance Hub to see all current and upcoming benchmarks.


The Bigger Picture: What Modern Browsers Can Do

This benchmark proves that modern browsers are underestimated as data processing platforms. With the right architecture—streaming APIs, Web Workers, and binary operations—browsers can match or exceed desktop tools.

With the right architecture:

  • Streaming APIs prevent memory issues
  • Web Workers enable parallel processing
  • Binary operations (Uint8Array) rival compiled languages
  • No installation or dependencies required
  • Universal compatibility (works on any device)

The "web apps are slow" narrative is outdated.

When engineered properly, browsers can outperform desktop tools.


FAQ

Can a browser really process 10 million rows without crashing?

Yes. Modern browsers using streaming architecture and Web Workers can process 10M+ rows without crashing. Peak memory stays under 2GB even for 1.3GB files because we stream line-by-line instead of loading the entire CSV into memory. Verified on Chrome 120, Firefox 115, Safari 16.

Is this faster than Python pandas?

For datasets under 10M rows, SplitForge matches or exceeds pandas performance (23 seconds vs 30-60 seconds for 10M rows on similar hardware). Pandas requires Python installation and coding. The browser-based approach requires zero setup—just drag, drop, click.

Is my data private?

Yes. All processing happens client-side using Web Workers. Your CSV file is never uploaded to external servers, no data is transmitted, and no third party can access it. Processing is GDPR, HIPAA, and SOX compliant by architecture: data never leaves your device.

Is there a file size limit?

No hard limits. We've successfully tested up to 1.3GB files (10M rows). The practical limit depends on your device RAM—we recommend 8GB+ RAM for files over 500MB. Files under 2GB have processed successfully on modern laptops with 16GB RAM.

How accurate is the duplicate detection?

FNV-1a hashing provides fast, exact-match detection at the row level (all column values combined). No false positives appeared in our testing across 50+ datasets.

Can it really handle more rows than Excel?

Yes. We've verified 10,000,000 rows (10× Excel's limit) processing successfully in 23 seconds. There are no artificial row limits; the only constraint is available device RAM for very large files (we recommend 16GB+ RAM for 10M+ row files).

What happens if I close the tab mid-process?

Processing stops immediately. Since everything runs client-side with no uploads, there's no server-side processing to continue. Simply reload the file and restart. For very large files (1GB+), keep the browser tab active during processing.

How is a browser this fast?

It's a combination of streaming architecture (never load the entire file), FNV-1a hashing (O(n) duplicate detection vs O(n²) comparisons), Web Workers (parallel processing), and auto-optimized fast mode for large files. See the detailed architecture breakdown in the technical section above.

Hitting Excel's row limit or running into file size issues? See our complete guide: Excel Row Limit & Large File Solutions (2026)



Conclusion: Engineering Over Infrastructure

You don't need cloud servers to process 10 million rows.

You don't need desktop software installations.

You don't need to upload sensitive data to third parties.

You just need smart engineering: streaming, hashing, Web Workers, and a privacy-first architecture.

That's what we built. That's what this benchmark proves.

Try the tool: splitforge.app/tools/remove-duplicates

View full benchmark: splitforge.app/remove-duplicates-performance

Explore all benchmarks: splitforge.app/performance




Have questions about the benchmark or want to discuss browser performance engineering? Reach out on Twitter or LinkedIn.

Process 10M Rows Without Uploads—Start Now

  • Deduplicate 10 million rows in under 30 seconds
  • 10× Excel's row limit with zero setup required
  • Zero uploads—your data never leaves your browser
  • Verified benchmark: 435K rows/second sustained throughput

Continue Reading

More guides to help you work smarter with your data


How to Audit a CSV File Before Processing

You inherited a CSV from a vendor. Before you load it into anything, you need to know what's actually in it — without trusting the filename.


Combine First and Last Name Columns in CSV for CRM Import

Your CRM requires a single Full Name column but your export has First and Last split. Here's how to combine them across 100K rows in 30 seconds.


Data Profiling vs Validation: What Each Reveals in Your CSV

Everyone says 'validate your CSV before import.' But validation can only check what you already know to look for. Profiling finds what you didn't know to check.
