Verified Benchmark — June 2026

CSV Compare: 50M Rows
at 141K Rows/Sec

Primary key matching, composite keys, OPFS streaming architecture. Line-by-line: 141K rows/sec (10GB Gate 3) — primary-key: 54K rows/sec — tested June 2026 on Chrome 132, Windows 11, Intel i5-12600KF, 64GB RAM. Results vary by hardware, browser, and file complexity.

141K

Throughput (L-b-L)

rows/sec (verified)

360s

50M Row Test

line-by-line streaming

O(n)

Complexity

linear scaling

Never

File Uploads

zero transmission

Benchmark Performance

SplitForge 100K–5M times: Chrome 132, Windows 11, Intel i5-12600KF, 64GB RAM, Phase 1 (Feb 2026). 15M values: Phase 2 Sub-gate 2, same hardware, June 2026. 28M values: Phase 2 Gate 3, June 2026 — 217s / 132K rows/sec (verified). 50M values: 10GB Gate 3, June 2026 — 360s / 141K rows/sec (verified, 11.6GB combined). Results vary by hardware, browser, and file complexity. Excel times estimated from internal workflow testing.

Performance at Scale

Chrome 132 · Windows 11 · Intel i5-12600KF · 64GB RAM · February 2026 (Phase 1 rows) and June 2026 (Phase 2 rows)

File Size	Primary Key Mode	Line-by-Line Mode	Test Notes
100K rows	0.5 sec / 200K rows/sec	~0.4 sec / 250K rows/sec	Startup overhead visible at small sizes; Map initialization ~50ms
500K rows	~2.4 sec / 208K rows/sec	~1.8 sec / 277K rows/sec	Mixed text + numeric columns, 8-column test file
1M rows	4.8 sec / 208K rows/sec	~3.6 sec / 277K rows/sec	Comma delimiter, string primary key (email), 10 columns
5M rows	14.3 sec / 350K rows/sec (Phase 1)	~11 sec / 454K rows/sec (Phase 1)	Phase 1 architecture (Feb 2026). Phase 2 adds OPFS index — see the 15M row below.
15M rows	275s / 54K rows/sec	121s / 123K rows/sec	Phase 2 Sub-gate 2 — Jun 2026. Primary-key: OPFS-backed index. Line-by-line: O(1) heap.
28M rows	not tested	217s / 132K rows/sec	Phase 2 Gate 3 — Jun 2026. L-b-L O(1) heap: A=28M rows, B=28.56M rows, 6.1 GB combined.
50M rows (10GB Gate 3, verified)	not tested	360s / 141K rows/sec	10GB Gate 3 — Jun 2026. A=5.77GB/50M rows, B=5.88GB/50.5M rows, 11.6GB combined. Added and deleted paths both verified. Throughput improves at scale due to pipeline saturation.

100K–5M rows: Phase 1 results (Feb 2026). 15M rows: Phase 2 Sub-gate 2 (Jun 2026). 28M rows: Phase 2 Gate 3 (Jun 2026). 50M rows: 10GB Gate 3 verified (Jun 2026) — A=5.77GB/50M rows, B=5.88GB/50.5M rows, 11.6GB combined; added + deleted paths both confirmed. Phase 2 uses OPFS-backed primary-key index — throughput at scale reflects OPFS I/O overhead vs pure-memory Phase 1. Line-by-line uses O(1) streaming without OPFS Map, achieving higher throughput. Results vary by hardware, browser, and file complexity.

Test Methodology

How Phase 2 Streaming Architecture Works

OPFS-Backed Primary-Key Index

Phase 2 stores the primary key Map on the Origin Private File System (OPFS) rather than in heap memory. This means the index is written to disk as rows are processed — the worker heap stays bounded while supporting up to ~16.7M unique keys (the V8 Map architectural limit). OPFS I/O adds some overhead vs pure-memory, but unlocks file sizes that would previously OOM the browser.
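The primary-key matching idea can be illustrated with a minimal Python sketch. This is not the tool's implementation: the index here is an ordinary in-memory dict rather than an OPFS-backed structure, and the function name compare_by_key and the sample data are assumptions made for illustration.

```python
import csv
import io

def compare_by_key(file_a, file_b, key):
    """Classify the rows of File B against File A using a primary-key index."""
    # Index File A by key. This sketch keeps the index in memory;
    # the architecture described above keeps it on OPFS instead.
    index = {row[key]: row for row in csv.DictReader(file_a)}

    result = {"added": [], "deleted": [], "modified": [], "unchanged": []}
    seen = set()
    for row in csv.DictReader(file_b):
        k = row[key]
        seen.add(k)
        if k not in index:
            result["added"].append(k)
        elif index[k] != row:
            result["modified"].append(k)
        else:
            result["unchanged"].append(k)
    # Keys that exist in A but never appeared in B were deleted.
    result["deleted"] = [k for k in index if k not in seen]
    return result

a = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
b = io.StringIO("id,value\n1,10\n2,25\n4,40\n")
diff = compare_by_key(a, b, "id")
print(diff)
```

Row order does not matter in this mode because every lookup goes through the key, which is why it is the fallback when sort order differs between files.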

O(1) Line-by-Line Streaming

Line-by-line mode uses streamLinesFastFromStream — both files are read as concurrent ReadableStreams. No rows are accumulated in memory: each row pair is compared and classified immediately. Worker heap stays ~10-20 MB regardless of file size. This mode achieves 141K rows/sec at 50M rows (10GB Gate 3) and has no practical ceiling. Requires both files to be in identical sort order; if order might differ, use Primary Key mode.
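As a rough illustration of positional streaming, the sketch below walks two line iterators in lockstep and classifies each pair without buffering rows. It is an assumption-laden Python analogue, not streamLinesFastFromStream itself; in particular, treating trailing extra lines as "added" or "deleted" is a guess at how the tool handles files of unequal length.

```python
import io
from collections import Counter
from itertools import zip_longest

def compare_line_by_line(lines_a, lines_b):
    """Yield (status, line_a, line_b) one pair at a time; nothing is accumulated."""
    for raw_a, raw_b in zip_longest(lines_a, lines_b):
        line_a = None if raw_a is None else raw_a.rstrip("\r\n")
        line_b = None if raw_b is None else raw_b.rstrip("\r\n")
        if line_a is None:
            yield ("added", None, line_b)        # B has extra trailing lines
        elif line_b is None:
            yield ("deleted", line_a, None)      # A has extra trailing lines
        elif line_a == line_b:
            yield ("unchanged", line_a, line_b)
        else:
            yield ("modified", line_a, line_b)

file_a = io.StringIO("id,v\n1,a\n2,b\n")
file_b = io.StringIO("id,v\n1,a\n2,c\n3,d\n")
counts = Counter(status for status, _, _ in compare_line_by_line(file_a, file_b))
print(dict(counts))
```

Because only the current pair of lines is alive at any moment, memory use stays flat as the inputs grow, which is the property the O(1) heap claim refers to.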

Web Worker Isolation + OPFS Result Buffers

All processing runs in a dedicated Web Worker thread. Result rows (added, deleted, modified, unchanged) are written to 4 separate OPFS output files as produced — not held in memory. The browser UI stays fully responsive during 15M+ row operations. Progress updates stream every 100ms.
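The "write results as produced" pattern can be sketched in Python with four ordinary files standing in for the OPFS output files. The function name write_results, the file names, and the callback shape are hypothetical; only the four categories and the 100ms progress interval come from the description above.

```python
import os
import tempfile
import time

STATUSES = ("added", "deleted", "modified", "unchanged")

def write_results(classified_rows, out_dir, on_progress=None, interval=0.1):
    """Append each (status, line) to its own file immediately instead of holding it."""
    files = {s: open(os.path.join(out_dir, f"{s}.csv"), "w", newline="") for s in STATUSES}
    counts = dict.fromkeys(STATUSES, 0)
    last_report = time.monotonic()
    try:
        for status, line in classified_rows:
            files[status].write(line + "\n")
            counts[status] += 1
            now = time.monotonic()
            # Throttle progress callbacks to roughly one per interval (100ms default).
            if on_progress and now - last_report >= interval:
                on_progress(dict(counts))
                last_report = now
    finally:
        for f in files.values():
            f.close()
    if on_progress:
        on_progress(dict(counts))  # final report once everything is flushed
    return counts

rows = [("added", "4,40"), ("modified", "2,25"), ("unchanged", "1,10"), ("deleted", "3,30")]
out_dir = tempfile.mkdtemp()
counts = write_results(iter(rows), out_dir)
print(counts)
```

Pairing this writer with a streaming classifier keeps both the input side and the output side out of memory.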

Zero Network Transmission

File reading, parsing, indexing, comparison, and result generation all happen inside the browser sandbox. No data leaves the device at any point. This is not a proxy model or edge function — the JavaScript engine running in the browser tab is the only compute involved.

Memory Efficiency

~10-20 MB

Worker heap (L-b-L mode, any scale)

~3.8 MB

CDP main-thread heap at 50M rows (10GB Gate 3)

~16.7M

Max unique primary keys (V8 Map limit)

O(n)

Linear time complexity

ROI Calculator

Baseline: ~20 min per manual Excel VLOOKUP comparison session

Comparison sessions per year

Weekly = 52 / Monthly = 12 / Daily = 250

Your hourly rate (USD)

Analyst avg: $35–65/hr / Senior: $75–120/hr

16.9h

Time saved per year

$845

Annual value saved

~30s

SplitForge comparison time

Estimate based on 20-min manual VLOOKUP workflow vs 30-sec SplitForge workflow. Individual results vary.
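The displayed figures are consistent with the following arithmetic, shown as a small Python sketch. The weekly cadence (52 sessions) is taken from the options above; the $50 hourly rate is inferred from $845 divided by 16.9 hours and is an assumption, not a stated default.

```python
def roi(sessions_per_year, hourly_rate, manual_min=20.0, tool_min=0.5):
    """Return (hours saved per year, dollar value) for the given cadence and rate."""
    saved_hours = sessions_per_year * (manual_min - tool_min) / 60
    return saved_hours, saved_hours * hourly_rate

# Weekly sessions at an assumed $50/hr.
hours, value = roi(52, 50)
print(f"{hours:.1f}h saved, ${value:.0f}")  # 16.9h saved, $845
```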

Reproduce This Benchmark

These results are reproducible. Here's exactly how.

Generate test files

Create two CSV files (File A = baseline, File B = modified). Schema: columns id, name, email, value, status, updated_at — mixed string and numeric types.

For the 5M row benchmark: modify ~5% of rows in File B (change value and status columns), add 0.5% new rows, delete 0.5%.

Python generator (pandas):
import pandas as pd
df = pd.DataFrame({'id': range(5_000_000)})  # ... plus name, email, value, status, updated_at
df.to_csv('file_a.csv', index=False)
# Modify ~5% of rows, add 0.5%, delete 0.5% → file_b.csv
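For readers who want both files produced in one pass, here is a standard-library sketch that streams rows to disk instead of building a DataFrame. It swaps pandas for the csv module, and the column values, the modulo-based selection of changed rows, and the helper names are all illustrative assumptions; only the schema and the 5% / 0.5% / 0.5% proportions come from the description above.

```python
import csv
import os
import tempfile

COLUMNS = ["id", "name", "email", "value", "status", "updated_at"]

def make_row(i):
    # Placeholder values; any mix of string and numeric columns would do.
    return [i, f"user{i}", f"user{i}@example.com", i % 1000, "active", "2026-01-01"]

def generate_pair(path_a, path_b, n_rows):
    """Write File A (baseline) and File B (5% modified, 0.5% deleted, 0.5% added)."""
    with open(path_a, "w", newline="") as fa, open(path_b, "w", newline="") as fb:
        wa, wb = csv.writer(fa), csv.writer(fb)
        wa.writerow(COLUMNS)
        wb.writerow(COLUMNS)
        for i in range(n_rows):
            row = make_row(i)
            wa.writerow(row)
            if i % 200 == 0:        # 0.5% of rows are deleted from File B
                continue
            if i % 20 == 1:         # 5% of rows get a changed value and status
                row[3] += 1
                row[4] = "inactive"
            wb.writerow(row)
        for i in range(n_rows, n_rows + n_rows // 200):  # 0.5% new rows appended
            wb.writerow(make_row(i))

out_dir = tempfile.mkdtemp()
path_a = os.path.join(out_dir, "file_a.csv")
path_b = os.path.join(out_dir, "file_b.csv")
generate_pair(path_a, path_b, 10_000)  # pass 5_000_000 for the benchmark size
```

Deleting rows from the middle of File B shifts every later row, so a pair generated this way is suited to Primary Key mode rather than positional comparison.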

Run the comparison

Open CSV Compare in Chrome 132+

Load file_a.csv as File A, file_b.csv as File B

Select Primary Key mode → choose "id" column

Start a timer — click Compare

Stop timer when results appear

Repeat 3× on a cleared browser cache, discard highest and lowest
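The last step amounts to a trimmed average: drop the fastest and slowest run and report what remains, which for three runs is simply the median. A minimal sketch follows; the timings in it are placeholder numbers for illustration, not measurements.

```python
def trimmed_result(timings):
    """Discard the highest and lowest timing and average the rest."""
    if len(timings) < 3:
        raise ValueError("need at least 3 runs to discard the highest and lowest")
    kept = sorted(timings)[1:-1]
    return sum(kept) / len(kept)

runs = [12.0, 10.0, 11.0]  # placeholder stopwatch readings in seconds
reported = trimmed_result(runs)
print(reported)  # 11.0
```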

Verify zero uploads: Open DevTools → Network tab before clicking Compare. No requests containing file contents will appear — all processing is local.

Hardware note: Our test machine (Chrome 132, Windows 11, i5-12600KF, 64GB RAM) produced 14.3 sec for 5M rows. A 16GB laptop may be 20-30% slower — the comparison step still completes; absolute time varies. The structural advantage over Excel VLOOKUP holds regardless of hardware.

Honest Limitations: Where SplitForge Falls Short

No tool is perfect for every use case. Here's where another tool might be a better choice, and the real limitations of our browser-based architecture.

Browser-Based Processing

Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.

Workaround:

Close unnecessary browser tabs to free up memory. For files over 50M rows, consider database solutions.

No Offline Mode (Initial Load)

Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.

Workaround:

Once loaded, you can disconnect and continue processing. For true offline environments, desktop tools may be better.

Browser Tab Memory Limits

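Most browsers limit individual tabs to 2-4GB RAM. This is the practical ceiling for data held in memory at once; the Phase 2 streaming modes keep the worker heap bounded, which is how the 11.6GB combined test above completed.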

Workaround:

Use 64-bit browsers with sufficient RAM. Chrome and Firefox handle large files best.

Primary Key Mode: V8 Map Ceiling at ~16.7M Unique Keys

Phase 2 primary-key mode uses an OPFS-backed index — not in-heap RAM — so it is no longer memory-bound in the traditional sense. The practical ceiling is the V8 Map architectural limit (~16.7M unique keys). Line-by-line mode has no such ceiling; worker heap stays ~10-20 MB regardless of file size. For datasets with more than ~16.7M unique rows, use Python pandas merge, DuckDB, or a database JOIN.
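One way to decide ahead of time whether a dataset fits under the key ceiling is to count unique keys before comparing. The sketch below is a hypothetical pre-check, not a feature of the tool; the constant 2**24 is an assumption chosen because it matches the "~16.7M" figure, and the function name and return labels are invented for illustration.

```python
import csv
import io

# Assumed value of the "~16.7M" ceiling (2**24 = 16,777,216).
MAX_UNIQUE_KEYS = 2**24

def recommend_mode(csv_file, key, limit=MAX_UNIQUE_KEYS):
    """Return 'primary-key' if unique keys fit under the limit, else 'external'."""
    keys = set()
    for row in csv.DictReader(csv_file):
        keys.add(row[key])
        if len(keys) > limit:
            # Too many unique keys: fall back to pandas merge, DuckDB, or a database JOIN.
            return "external"
    return "primary-key"

mode_default = recommend_mode(io.StringIO("id,value\n1,a\n2,b\n3,c\n"), "id")
# A tiny limit is used here only to exercise the fallback branch.
mode_over = recommend_mode(io.StringIO("id,value\n1,a\n2,b\n3,c\n"), "id", limit=2)
print(mode_default, mode_over)  # primary-key external
```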

Line-by-Line Requires Identical Sort Order

Line-by-line mode compares rows by position — row 1 vs row 1, row 2 vs row 2. If the files are sorted differently, this produces incorrect results. If sort order might differ between files, always use Primary Key mode instead.
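A quick pre-check can catch ordering problems before a positional comparison. The sketch below streams one file and confirms its key column never decreases; if both files pass with the same key, their relative order agrees. This is a hypothetical helper, and the cast parameter is an assumption added so numeric ids compare as numbers rather than as strings.

```python
import csv
import io

def is_sorted_by_key(csv_file, key, cast=str):
    """Return True if the key column is in non-decreasing order."""
    previous = None
    for row in csv.DictReader(csv_file):
        current = cast(row[key])
        if previous is not None and current < previous:
            return False
        previous = current
    return True

ordered = is_sorted_by_key(io.StringIO("id,v\n1,a\n2,b\n10,c\n"), "id", cast=int)
shuffled = is_sorted_by_key(io.StringIO("id,v\n2,b\n1,a\n10,c\n"), "id", cast=int)
print(ordered, shuffled)  # True False
```

Without cast=int the first file would fail, because the string "10" sorts before "2".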

CSV and TSV Only

SplitForge CSV Compare accepts .csv and .tsv files. Excel .xlsx files must be converted to CSV first using the Excel to CSV Converter tool. JSON, Parquet, and database format comparisons are not supported.

No Persistent History

Comparison results are not saved between browser sessions. To preserve a diff report, export to CSV or JSON immediately after the comparison completes. Closing or refreshing the tab discards in-memory results.

No Automation or Scheduling

SplitForge is a manual browser tool — not a CLI, API, or pipeline component. It cannot be run on a schedule, triggered by webhooks, or integrated into CI/CD workflows. For automated comparisons, use Python pandas merge, dbt tests, or a database-level diff query.
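As one example of a database-level diff query that can run unattended, the sketch below uses Python's built-in sqlite3 module. The table names, columns, and sample rows are made up for illustration; a real pipeline would load the two CSV files into the tables first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO b VALUES (1, 'x'), (2, 'changed'), (4, 'new');
""")

# Added: in b only. Deleted: in a only. Modified: in both with different values.
# IS NOT is SQLite's NULL-safe inequality, so NULL vs non-NULL counts as a change.
DIFF_SQL = """
    SELECT b.id, 'added' AS status FROM b LEFT JOIN a ON a.id = b.id WHERE a.id IS NULL
    UNION ALL
    SELECT a.id, 'deleted' FROM a LEFT JOIN b ON b.id = a.id WHERE b.id IS NULL
    UNION ALL
    SELECT a.id, 'modified' FROM a JOIN b ON b.id = a.id WHERE a.value IS NOT b.value
    ORDER BY 1
"""
diff = conn.execute(DIFF_SQL).fetchall()
print(diff)  # [(2, 'modified'), (3, 'deleted'), (4, 'added')]
```

A script like this can be scheduled or called from CI, which is the gap the browser tool does not cover.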

Questions about limitations? Check our FAQ section below or contact us via the feedback button.

Performance FAQ

How accurate are the Phase 2 benchmarks?

Why is primary key mode slower than line-by-line at 15M rows?

What changed between Phase 1 and Phase 2 architecture?

How does the comparison handle files with millions of differences?

What is the memory requirement for large file comparisons?

Can I compare files that use different delimiters?

Can I reproduce these benchmarks?

Ready to Compare?

Drop your two CSV files. 50M rows compared in 360 seconds at 141K rows/sec. Your data never leaves your browser.

No account required · No upload, ever · No file size limits under 1GB
