Verified Benchmark — June 2026

CSV Compare: 50M Rows at 141K Rows/Sec

Primary key matching, composite keys, OPFS streaming architecture. Line-by-line: 141K rows/sec (10GB Gate 3) — primary-key: 54K rows/sec — tested June 2026 on Chrome 132, Windows 11, Intel i5-12600KF, 64GB RAM. Results vary by hardware, browser, and file complexity.

Throughput (L-b-L): 141K rows/sec (verified)
50M Row Test: 360s, line-by-line streaming
Complexity: O(n), linear scaling
File Uploads: Never, zero transmission

Benchmark Performance

SplitForge 100K–5M times: Chrome 132, Windows 11, Intel i5-12600KF, 64GB RAM, Phase 1 (Feb 2026). 15M values: Phase 2 Sub-gate 2, same hardware, June 2026. 28M values: Phase 2 Gate 3, June 2026 — 217s / 132K rows/sec (verified). 50M values: 10GB Gate 3, June 2026 — 360s / 141K rows/sec (verified, 11.6GB combined). Results vary by hardware, browser, and file complexity. Excel times estimated from internal workflow testing.

Performance at Scale

Chrome 132 · Windows 11 · Intel i5-12600KF · 64GB RAM · February 2026

File Size | Primary Key Mode | Line-by-Line Mode | Test Notes
100K rows | 0.5 sec / 200K rows/sec | ~0.4 sec / 250K rows/sec | Startup overhead visible at small sizes; Map initialization ~50ms
500K rows | ~2.4 sec / 208K rows/sec | ~1.8 sec / 277K rows/sec | Mixed text + numeric columns, 8-column test file
1M rows | 4.8 sec / 208K rows/sec | ~3.6 sec / 277K rows/sec | Comma delimiter, string primary key (email), 10 columns
5M rows | 14.3 sec / 350K rows/sec (Phase 1) | ~11 sec / 454K rows/sec (Phase 1) | Phase 1 architecture (Feb 2026). Phase 2 adds OPFS index; see the 15M row below.
15M rows | 275s / 54K rows/sec | 121s / 123K rows/sec | Phase 2 Sub-gate 2, Jun 2026. Primary-key: OPFS-backed index. Line-by-line: O(1) heap.
28M rows | not tested | 217s / 132K rows/sec | Phase 2 Gate 3, Jun 2026. L-b-L O(1) heap: A=28M rows, B=28.56M rows, 6.1 GB combined.
50M rows (10GB Gate 3, verified) | not tested | 360s / 141K rows/sec | 10GB Gate 3, Jun 2026. A=5.77GB/50M rows, B=5.88GB/50.5M rows, 11.6GB combined. Added + deleted paths both verified. Throughput improves at scale due to pipeline saturation.

100K–5M rows: Phase 1 results (Feb 2026). 15M rows: Phase 2 Sub-gate 2 (Jun 2026). 28M rows: Phase 2 Gate 3 (Jun 2026). 50M rows: 10GB Gate 3 verified (Jun 2026) — A=5.77GB/50M rows, B=5.88GB/50.5M rows, 11.6GB combined; added + deleted paths both confirmed. Phase 2 uses OPFS-backed primary-key index — throughput at scale reflects OPFS I/O overhead vs pure-memory Phase 1. Line-by-line uses O(1) streaming without OPFS Map, achieving higher throughput. Results vary by hardware, browser, and file complexity.

How Phase 2 Streaming Architecture Works

OPFS-Backed Primary-Key Index

Phase 2 stores the primary key Map on the Origin Private File System (OPFS) rather than in heap memory. This means the index is written to disk as rows are processed — the worker heap stays bounded while supporting up to ~16.7M unique keys (the V8 Map architectural limit). OPFS I/O adds some overhead vs pure-memory, but unlocks file sizes that would previously OOM the browser.
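The idea of keeping the key index outside the heap can be sketched in Python. This is only an analogy under stated assumptions: SQLite stands in for OPFS, and `compare_by_key`, `row_digest`, the table layout, and the use of a SHA-1 digest per row are all hypothetical choices for illustration, not SplitForge's actual implementation. It also assumes unique keys in File A and no header row.

```python
import csv
import hashlib
import sqlite3


def row_digest(row):
    """Hash a row so the index stores a fixed-size value, not the row itself."""
    return hashlib.sha1("\x1f".join(row).encode("utf-8")).hexdigest()


def compare_by_key(file_a, file_b, db_path=":memory:", key_col=0):
    """Classify rows of B against A using a key -> digest index held in SQLite.

    Pass a real file path as db_path to keep the index on disk.
    Assumes unique keys in A and no header row (illustrative simplifications).
    """
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE idx (k TEXT PRIMARY KEY, h TEXT, seen INTEGER DEFAULT 0)")
    # Pass 1: stream File A into the index, one row at a time.
    db.executemany(
        "INSERT INTO idx (k, h) VALUES (?, ?)",
        ((row[key_col], row_digest(row)) for row in csv.reader(file_a)),
    )
    counts = {"added": 0, "deleted": 0, "modified": 0, "unchanged": 0}
    # Pass 2: stream File B and look each key up in the index.
    for row in csv.reader(file_b):
        hit = db.execute("SELECT h FROM idx WHERE k = ?", (row[key_col],)).fetchone()
        if hit is None:
            counts["added"] += 1
            continue
        counts["unchanged" if hit[0] == row_digest(row) else "modified"] += 1
        db.execute("UPDATE idx SET seen = 1 WHERE k = ?", (row[key_col],))
    # Keys in A that never appeared in B were deleted.
    counts["deleted"] = db.execute(
        "SELECT COUNT(*) FROM idx WHERE seen = 0"
    ).fetchone()[0]
    db.close()
    return counts
```

Both inputs can be any iterable of lines, such as open file handles, so neither file has to be loaded whole.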

O(1) Line-by-Line Streaming

Line-by-line mode uses streamLinesFastFromStream — both files are read as concurrent ReadableStreams. No rows are accumulated in memory: each row pair is compared and classified immediately. Worker heap stays ~10-20 MB regardless of file size. This mode achieves 141K rows/sec at 50M rows (10GB Gate 3) and has no practical ceiling. Requires both files to be in identical sort order; if order might differ, use Primary Key mode.
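A minimal Python sketch of positional comparison with constant memory, assuming both inputs are line iterators. The function name `compare_line_by_line` is hypothetical, and treating trailing rows that exist in only one file as added or deleted is an assumption made here; the text does not specify how `streamLinesFastFromStream` handles them.

```python
from itertools import zip_longest


def compare_line_by_line(file_a, file_b):
    """Compare two line iterators by position, holding one line pair at a time."""
    counts = {"added": 0, "deleted": 0, "modified": 0, "unchanged": 0}
    for line_a, line_b in zip_longest(file_a, file_b):
        if line_a is None:
            counts["added"] += 1      # B has extra trailing rows (assumed meaning)
        elif line_b is None:
            counts["deleted"] += 1    # A has extra trailing rows (assumed meaning)
        elif line_a.rstrip("\r\n") == line_b.rstrip("\r\n"):
            counts["unchanged"] += 1
        else:
            counts["modified"] += 1
    return counts
```

Because only the current pair is held, memory use does not grow with file length, which is the property the paragraph above describes.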

Web Worker Isolation + OPFS Result Buffers

All processing runs in a dedicated Web Worker thread. Result rows (added, deleted, modified, unchanged) are written to 4 separate OPFS output files as produced — not held in memory. The browser UI stays fully responsive during 15M+ row operations. Progress updates stream every 100ms.
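The pattern of writing results out as they are produced, with throttled progress updates, might look like the following in Python. Local files stand in for the four OPFS output files; `ResultSink`, its parameters, and the file names are hypothetical and only illustrate the idea.

```python
import os
import time

CATEGORIES = ("added", "deleted", "modified", "unchanged")


class ResultSink:
    """Append each classified row to its own file instead of a list in memory."""

    def __init__(self, out_dir, on_progress, interval_s=0.1, clock=time.monotonic):
        self._files = {
            c: open(os.path.join(out_dir, f"{c}.csv"), "w", encoding="utf-8", newline="")
            for c in CATEGORIES
        }
        self._on_progress = on_progress
        self._interval_s = interval_s   # 0.1 s mirrors the 100ms cadence in the text
        self._clock = clock
        self._last_report = clock()
        self.rows_written = 0

    def write(self, category, line):
        self._files[category].write(line + "\n")
        self.rows_written += 1
        now = self._clock()
        # Throttle: report at most once per interval, however fast rows arrive.
        if now - self._last_report >= self._interval_s:
            self._on_progress(self.rows_written)
            self._last_report = now

    def close(self):
        for f in self._files.values():
            f.close()
        self._on_progress(self.rows_written)  # final update after the last row
```

Injecting the clock keeps the throttle testable without real delays.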

Zero Network Transmission

File reading, parsing, indexing, comparison, and result generation all happen inside the browser sandbox. No data leaves the device at any point. This is not a proxy model or edge function — the JavaScript engine running in the browser tab is the only compute involved.

Memory Efficiency

Worker heap (L-b-L mode, any scale): ~10-20 MB
CDP main-thread heap at 50M rows (10GB Gate 3): ~3.8 MB
Max unique primary keys (V8 Map limit): ~16.7M
Linear time complexity: O(n)

ROI Calculator

Baseline: ~20 min per manual Excel VLOOKUP comparison session

Comparisons per year: weekly = 52, monthly = 12, daily = 250
Hourly rate: analyst average $35–65/hr, senior $75–120/hr
Time saved per year: 16.9h
Annual value saved: $845
SplitForge comparison time: ~30s

Estimate based on 20-min manual VLOOKUP workflow vs 30-sec SplitForge workflow. Individual results vary.
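The figures above can be reproduced with a few lines of arithmetic. The sketch assumes the weekly frequency (52 sessions per year) and a $50/hr rate; that rate is not stated in the text and is inferred here from $845 / 16.9h. The function name `annual_savings` is hypothetical.

```python
def annual_savings(sessions_per_year, manual_min=20.0, tool_sec=30.0, hourly_rate=50.0):
    """Return (hours saved per year, dollar value) for a given session count."""
    saved_min = manual_min - tool_sec / 60.0        # 19.5 minutes saved per session
    hours = sessions_per_year * saved_min / 60.0
    return hours, hours * hourly_rate


hours, value = annual_savings(52)                   # weekly comparisons
print(f"{hours:.1f}h, ${value:.0f}")                # prints "16.9h, $845"
```

Passing 12 or 250 as `sessions_per_year` gives the monthly and daily cases.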

Reproduce This Benchmark

These results are reproducible. Here's exactly how.

Generate test files

Create two CSV files (File A = baseline, File B = modified). Schema: columns id, name, email, value, status, updated_at — mixed string and numeric types.

For the 5M row benchmark: modify ~5% of rows in File B (change value and status columns), add 0.5% new rows, delete 0.5%.

Python generator (pandas):
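import pandas as pd
n = 5_000_000  # rows in File A
df = pd.DataFrame({'id': range(n)})  # add name, email, value, status, updated_at per the schema above
df.to_csv('file_a.csv', index=False)
# Modify ~5% of rows, add 0.5%, delete 0.5% → file_b.csv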
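A fuller generator sketch that writes both files in one pass, using the standard `csv` module rather than pandas so it runs without extra dependencies. The function names, the placeholder field values, and the default of 100,000 rows are illustrative assumptions; pass `n=5_000_000` for the benchmark size.

```python
import csv
import random

HEADER = ["id", "name", "email", "value", "status", "updated_at"]


def make_row(i, rng):
    """Build one synthetic row; the field values are placeholders."""
    return [i, f"user{i}", f"user{i}@example.com", rng.randint(0, 9999),
            rng.choice(["active", "inactive"]), "2026-01-01"]


def generate_pair(path_a, path_b, n=100_000, seed=0):
    """Write File A (n rows) and File B with ~5% modified, 0.5% deleted, 0.5% added."""
    rng = random.Random(seed)
    ids = range(n)
    modified = set(rng.sample(ids, n * 5 // 100))
    remaining = [i for i in ids if i not in modified]
    deleted = set(rng.sample(remaining, n * 5 // 1000))
    n_added = n * 5 // 1000
    with open(path_a, "w", newline="") as fa, open(path_b, "w", newline="") as fb:
        wa, wb = csv.writer(fa), csv.writer(fb)
        wa.writerow(HEADER)
        wb.writerow(HEADER)
        for i in ids:
            row = make_row(i, rng)
            wa.writerow(row)
            if i in deleted:
                continue                      # row exists only in File A
            if i in modified:
                row[3] += 1                   # change the value column
                row[4] = "inactive" if row[4] == "active" else "active"  # flip status
            wb.writerow(row)
        for i in range(n, n + n_added):
            wb.writerow(make_row(i, rng))     # new ids appear only in File B
    return {"modified": len(modified), "deleted": len(deleted), "added": n_added}
```

The returned counts give the expected totals to check against the comparison results.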

Run the comparison

1. Open CSV Compare in Chrome 132+
2. Load file_a.csv as File A, file_b.csv as File B
3. Select Primary Key mode → choose "id" column
4. Start a timer, then click Compare
5. Stop timer when results appear
6. Repeat 3× on a cleared browser cache, discard highest and lowest
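With three runs, discarding the highest and lowest leaves the median. A small sketch of turning the timings into a reported figure follows; the function name is hypothetical and two of the three sample timings (14.9 and 13.8) are made up for illustration.

```python
from statistics import median


def summarize_runs(rows, seconds_per_run):
    """Return (reported seconds, rows per second) for a set of timed runs."""
    reported = median(seconds_per_run)   # with 3 runs this drops the high and the low
    return reported, rows / reported


secs, rate = summarize_runs(5_000_000, [14.9, 14.3, 13.8])  # hypothetical timings
print(f"{secs} sec, {rate / 1000:.0f}K rows/sec")
```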

Verify zero uploads: Open DevTools → Network tab before clicking Compare. No requests containing file contents will appear — all processing is local.

Hardware note: Our test machine (Chrome 132, Windows 11, i5-12600KF, 64GB RAM) produced 14.3 sec for 5M rows. A 16GB laptop may be 20-30% slower — the comparison step still completes; absolute time varies. The structural advantage over Excel VLOOKUP holds regardless of hardware.

Honest Limitations: Where SplitForge Falls Short

No tool is perfect for every use case. Here's where another tool might be a better choice, and the real limitations of our browser-based architecture.

Browser-Based Processing

Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.

Workaround:
Close unnecessary browser tabs to free up memory. For files over 50M rows, consider database solutions.

No Offline Mode (Initial Load)

Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.

Workaround:
Once loaded, you can disconnect and continue processing. For true offline environments, desktop tools may be better.

Browser Tab Memory Limits

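Most browsers limit individual tabs to 2-4GB RAM. This is the practical ceiling for data held in memory at once, not for streamed input: the 11.6GB combined benchmark above ran with a bounded worker heap.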

Workaround:
Use 64-bit browsers with sufficient RAM. Chrome and Firefox handle large files best.

Primary Key Mode: V8 Map Ceiling at ~16.7M Unique Keys

Phase 2 primary-key mode uses an OPFS-backed index — not in-heap RAM — so it is no longer memory-bound in the traditional sense. The practical ceiling is the V8 Map architectural limit (~16.7M unique keys). Line-by-line mode has no such ceiling; worker heap stays ~10-20 MB regardless of file size. For datasets with more than ~16.7M unique rows, use Python pandas merge, DuckDB, or a database JOIN.
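For datasets past that ceiling, the pandas merge route mentioned above could look roughly like this. It is a sketch that assumes both frames share the same columns and a unique `id` key; the function name `diff_by_key` and the `status` column are hypothetical. Note that pandas loads both files into memory, so this suits machines with enough RAM.

```python
import pandas as pd


def diff_by_key(df_a, df_b, key="id"):
    """Label each key as added, deleted, modified, or unchanged via an outer merge."""
    merged = df_a.merge(df_b, on=key, how="outer",
                        suffixes=("_a", "_b"), indicator=True)
    value_cols = [c for c in df_a.columns if c != key]
    changed = pd.Series(False, index=merged.index)
    for c in value_cols:
        changed |= merged[f"{c}_a"].ne(merged[f"{c}_b"])
    status = pd.Series("unchanged", index=merged.index)
    status[(merged["_merge"] == "both") & changed] = "modified"
    status[merged["_merge"] == "left_only"] = "deleted"    # key only in File A
    status[merged["_merge"] == "right_only"] = "added"     # key only in File B
    return merged.assign(status=status)
```

Typical use would be `diff_by_key(pd.read_csv("file_a.csv"), pd.read_csv("file_b.csv"))`, then filtering on the `status` column.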

Line-by-Line Requires Identical Sort Order

Line-by-line mode compares rows by position — row 1 vs row 1, row 2 vs row 2. If the files are sorted differently, this produces incorrect results. If sort order might differ between files, always use Primary Key mode instead.

CSV and TSV Only

SplitForge CSV Compare accepts .csv and .tsv files. Excel .xlsx files must be converted to CSV first using the Excel to CSV Converter tool. JSON, Parquet, and database format comparisons are not supported.

No Persistent History

Comparison results are not saved between browser sessions. To preserve a diff report, export to CSV or JSON immediately after the comparison completes. Closing or refreshing the tab discards in-memory results.

No Automation or Scheduling

SplitForge is a manual browser tool — not a CLI, API, or pipeline component. It cannot be run on a schedule, triggered by webhooks, or integrated into CI/CD workflows. For automated comparisons, use Python pandas merge, dbt tests, or a database-level diff query.

Questions about limitations? Check our FAQ section below or contact us via the feedback button.

Performance FAQ

Ready to Compare?

Drop your two CSV files. 50M rows compared in 360 seconds at 141K rows/sec. Your data never leaves your browser.

No account required · No upload, ever · No file size limits under 1GB