Benchmark Performance
Performance at Scale
Chrome 132 · Windows 11 · Intel i5-12600KF · 64GB RAM · February 2026
| Rows per File | Primary Key Mode (time / throughput) | Line-by-Line Mode (time / throughput) | Test Notes |
|---|---|---|---|
| 100K rows | 0.5 sec / 200K rows/sec | ~0.4 sec / 250K rows/sec | Startup overhead visible at small sizes; Map initialization ~50ms |
| 500K rows | ~2.4 sec / 208K rows/sec | ~1.8 sec / 277K rows/sec | Mixed text + numeric columns, 8-column test file |
| 1M rows | 4.8 sec / 208K rows/sec | ~3.6 sec / 277K rows/sec | Comma delimiter, string primary key (email), 10 columns |
| 5M rows | 14.3 sec / 350K rows/sec (Phase 1) | ~11 sec / 454K rows/sec (Phase 1) | Phase 1 architecture (Feb 2026). Phase 2 adds an OPFS-backed index; see the 15M row below. |
| 15M rows | 275 sec / 54K rows/sec | 121 sec / 123K rows/sec | Phase 2 Sub-gate 2 (Jun 2026). Primary key: OPFS-backed index. Line-by-line: O(1) heap. |
| 28M rows | not tested | 217 sec / 132K rows/sec | Phase 2 Gate 3 (Jun 2026). Line-by-line, O(1) heap: A = 28M rows, B = 28.56M rows, 6.1 GB combined. |
| 50M rows (10GB Gate 3, verified) | not tested | 360 sec / 141K rows/sec | 10GB Gate 3 (Jun 2026). A = 5.77 GB / 50M rows, B = 5.88 GB / 50.5M rows, 11.6 GB combined. Added and deleted paths both verified. Throughput improves at scale due to pipeline saturation. |
100K to 5M rows are Phase 1 results (Feb 2026); 15M rows is Phase 2 Sub-gate 2 (Jun 2026); 28M rows is Phase 2 Gate 3 (Jun 2026); 50M rows is the verified 10GB Gate 3 run (Jun 2026). Phase 2 primary-key mode uses an OPFS-backed index, so its throughput at scale reflects OPFS I/O overhead compared with the pure-memory Phase 1 design. Line-by-line mode streams in O(1) memory without the OPFS-backed Map, which is why it achieves higher throughput. Results vary by hardware, browser, and file complexity.
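The throughput figures are row count divided by elapsed seconds. A minimal sketch for recomputing them from the table follows; the function name `rows_per_sec` is hypothetical, and the published values are rounded, so expect small differences in the last digit. The 28M-row figure seems to match File B's 28.56M rows rather than File A's 28M.

```python
def rows_per_sec(rows: int, seconds: float) -> float:
    """Throughput = rows processed / elapsed wall-clock seconds."""
    return rows / seconds


# (label, rows, seconds) copied from the benchmark table
samples = [
    ("1M primary key", 1_000_000, 4.8),
    ("5M primary key", 5_000_000, 14.3),
    ("15M primary key", 15_000_000, 275),
    ("15M line-by-line", 15_000_000, 121),
]

for label, rows, seconds in samples:
    print(f"{label}: {rows_per_sec(rows, seconds) / 1000:.1f}K rows/sec")
```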
Test Methodology
How Phase 2 Streaming Architecture Works
OPFS-Backed Primary-Key Index
Phase 2 stores the primary-key Map on the Origin Private File System (OPFS) rather than in heap memory. The index is written to disk as rows are processed, so the worker heap stays bounded while supporting up to ~16.7M unique keys (the V8 Map architectural limit). OPFS I/O adds some overhead compared with a pure in-memory index, but it unlocks file sizes that previously caused the browser to run out of memory.
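To illustrate the idea of a key index that lives outside the heap, here is a rough Python sketch. It is not the SplitForge implementation: `sqlite3` stands in for the OPFS-backed index, and the function name `diff_by_key`, the table layout, and the JSON row encoding are all assumptions made for the example.

```python
import csv
import io
import json
import sqlite3


def diff_by_key(file_a, file_b, key, db_path=":memory:"):
    """Classify rows of file B against file A through a key index.

    sqlite3 is a stand-in for the OPFS-backed index (an assumption):
    pass a fresh file path as db_path to keep the index on disk.
    """
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE idx (k TEXT PRIMARY KEY, row TEXT, seen INTEGER DEFAULT 0)")

    # Pass 1: index every row of file A by its primary key.
    db.executemany(
        "INSERT OR REPLACE INTO idx (k, row) VALUES (?, ?)",
        ((r[key], json.dumps(r, sort_keys=True)) for r in csv.DictReader(file_a)),
    )

    counts = {"added": 0, "deleted": 0, "modified": 0, "unchanged": 0}

    # Pass 2: stream file B and look each key up in the index.
    for r in csv.DictReader(file_b):
        hit = db.execute("SELECT row FROM idx WHERE k = ?", (r[key],)).fetchone()
        if hit is None:
            counts["added"] += 1
            continue
        same = hit[0] == json.dumps(r, sort_keys=True)
        counts["unchanged" if same else "modified"] += 1
        db.execute("UPDATE idx SET seen = 1 WHERE k = ?", (r[key],))

    # Keys that never appeared in file B were deleted.
    counts["deleted"] = db.execute("SELECT COUNT(*) FROM idx WHERE seen = 0").fetchone()[0]
    db.close()
    return counts


a = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
b = io.StringIO("id,value\n1,10\n2,25\n4,40\n")
print(diff_by_key(a, b, key="id"))
# {'added': 1, 'deleted': 1, 'modified': 1, 'unchanged': 1}
```

Only file A is indexed; file B is streamed once, so memory use is driven by the index backend rather than by the size of either file.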
O(1) Line-by-Line Streaming
Line-by-line mode uses streamLinesFastFromStream — both files are read as concurrent ReadableStreams. No rows are accumulated in memory: each row pair is compared and classified immediately. Worker heap stays ~10-20 MB regardless of file size. This mode achieves 141K rows/sec at 50M rows (10GB Gate 3) and has no practical ceiling. Requires both files to be in identical sort order; if order might differ, use Primary Key mode.
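The positional comparison described above can be sketched in a few lines of Python. This is an illustration of the technique, not the `streamLinesFastFromStream` code: the function name `diff_line_by_line` is hypothetical, and treating extra trailing lines as added or deleted is an assumption about how length mismatches are classified.

```python
import io
from itertools import zip_longest


def diff_line_by_line(stream_a, stream_b):
    """Compare two line streams by position, holding one line pair at a time."""
    counts = {"added": 0, "deleted": 0, "modified": 0, "unchanged": 0}
    for line_a, line_b in zip_longest(stream_a, stream_b):
        if line_a is None:
            counts["added"] += 1      # assumption: extra trailing lines in B are additions
        elif line_b is None:
            counts["deleted"] += 1    # assumption: extra trailing lines in A are deletions
        elif line_a.rstrip("\r\n") == line_b.rstrip("\r\n"):
            counts["unchanged"] += 1
        else:
            counts["modified"] += 1
    return counts


a = io.StringIO("id,v\n1,a\n2,b\n")
b = io.StringIO("id,v\n1,a\n2,c\n3,d\n")
print(diff_line_by_line(a, b))
# {'added': 1, 'deleted': 0, 'modified': 1, 'unchanged': 2}
```

The header counts as a line here. Because nothing is accumulated between iterations, memory use does not grow with file length, which is the property the O(1) claim refers to.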
Web Worker Isolation + OPFS Result Buffers
All processing runs in a dedicated Web Worker thread. Result rows (added, deleted, modified, unchanged) are written to 4 separate OPFS output files as produced — not held in memory. The browser UI stays fully responsive during 15M+ row operations. Progress updates stream every 100ms.
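As a rough illustration of writing results out as they are produced, the sketch below routes each classified row to one of four files and reports progress at most once per interval. Ordinary files stand in for the OPFS output files, and the class name `ResultSink`, the file names, and the callback shape are assumptions made for the example.

```python
import os
import tempfile
import time

CATEGORIES = ("added", "deleted", "modified", "unchanged")


class ResultSink:
    """Append each classified row to its category file as soon as it is produced."""

    def __init__(self, out_dir, progress_interval=0.1, on_progress=print):
        self.files = {
            c: open(os.path.join(out_dir, f"{c}.csv"), "w", newline="")
            for c in CATEGORIES
        }
        self.counts = dict.fromkeys(CATEGORIES, 0)
        self.interval = progress_interval  # seconds; 0.1 mirrors the 100ms cadence
        self.on_progress = on_progress
        self.last_report = time.monotonic()

    def write(self, category, line):
        self.files[category].write(line + "\n")
        self.counts[category] += 1
        now = time.monotonic()
        if now - self.last_report >= self.interval:
            self.on_progress(dict(self.counts))  # throttled progress update
            self.last_report = now

    def close(self):
        for f in self.files.values():
            f.close()
        self.on_progress(dict(self.counts))  # final totals


out_dir = tempfile.mkdtemp()
sink = ResultSink(out_dir)
sink.write("added", "4,40")
sink.write("modified", "2,25")
sink.close()
```

Only running counts are kept in memory; the rows themselves go straight to the output files.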
Zero Network Transmission
File reading, parsing, indexing, comparison, and result generation all happen inside the browser sandbox. No data leaves the device at any point. This is not a proxy model or edge function — the JavaScript engine running in the browser tab is the only compute involved.
Memory Efficiency
ROI Calculator
Baseline: ~20 min per manual Excel VLOOKUP comparison session
Estimate based on 20-min manual VLOOKUP workflow vs 30-sec SplitForge workflow. Individual results vary.
Reproduce This Benchmark
These results are reproducible. Here's exactly how.
Generate test files
Create two CSV files (File A = baseline, File B = modified). Schema: columns id, name, email, value, status, updated_at — mixed string and numeric types.
For the 5M row benchmark: modify ~5% of rows in File B (change value and status columns), add 0.5% new rows, delete 0.5%.
Python generator (pandas):

    import pandas as pd

    df = pd.DataFrame({
        'id': range(5_000_000),
        # ... remaining columns
    })
    df.to_csv('file_a.csv', index=False)
    # Modify 5% of rows → file_b.csv
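The generator step is only outlined above, so here is one fuller sketch. It uses the standard-library `csv` module instead of pandas so that it streams rows and runs without dependencies. The function names, the cell values (names, emails, statuses, the fixed date), and the seed are all hypothetical; only the column names and the 5% / 0.5% / 0.5% proportions come from the description above, and because rows are picked at random the modified and deleted shares are approximate.

```python
import csv
import random

COLUMNS = ["id", "name", "email", "value", "status", "updated_at"]
STATUSES = ["active", "pending", "closed"]  # hypothetical status values


def make_row(i, rng):
    """Build one synthetic row; all cell values are illustrative."""
    return [i, f"user{i}", f"user{i}@example.com",
            rng.randint(0, 10_000), rng.choice(STATUSES), "2026-02-01"]


def generate_pair(path_a, path_b, n_rows, seed=0):
    """Write file A (baseline) and file B (~5% modified, ~0.5% deleted, 0.5% added)."""
    rng = random.Random(seed)
    stats = {"modified": 0, "deleted": 0, "added": 0}
    with open(path_a, "w", newline="") as fa, open(path_b, "w", newline="") as fb:
        wa, wb = csv.writer(fa), csv.writer(fb)
        wa.writerow(COLUMNS)
        wb.writerow(COLUMNS)
        for i in range(n_rows):
            row = make_row(i, rng)
            wa.writerow(row)
            roll = rng.random()
            if roll < 0.005:
                stats["deleted"] += 1          # row is omitted from file B
                continue
            if roll < 0.055:
                row[3] += 1                    # change the value column
                row[4] = STATUSES[(STATUSES.index(row[4]) + 1) % len(STATUSES)]
                stats["modified"] += 1
            wb.writerow(row)
        stats["added"] = round(n_rows * 0.005)
        for i in range(n_rows, n_rows + stats["added"]):
            wb.writerow(make_row(i, rng))      # new ids that exist only in file B
    return stats
```

Calling `generate_pair("file_a.csv", "file_b.csv", 5_000_000)` would produce a pair sized like the 5M-row benchmark; the returned counts give the expected diff totals to check the tool's output against.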
Run the comparison
Verify zero uploads: Open DevTools → Network tab before clicking Compare. No requests containing file contents will appear — all processing is local.
Hardware note: Our test machine (Chrome 132, Windows 11, i5-12600KF, 64GB RAM) completed the 5M-row primary-key comparison in 14.3 sec. A 16GB laptop may be 20-30% slower; the comparison still completes, but absolute time varies. The structural advantage over Excel VLOOKUP holds regardless of hardware.
Honest Limitations: Where SplitForge Falls Short
No tool is perfect for every use case. Here's where another tool might be a better choice, and the real limitations of our browser-based architecture.
Browser-Based Processing
Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.
No Offline Mode (Initial Load)
Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.
Browser Tab Memory Limits
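Most browsers limit individual tabs to 2-4GB RAM. This caps how much data can be held in memory at once, which is why the Phase 2 modes keep the worker heap bounded (OPFS-backed index, O(1) line-by-line streaming) rather than loading whole files.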
Primary Key Mode: V8 Map Ceiling at ~16.7M Unique Keys
Phase 2 primary-key mode uses an OPFS-backed index rather than in-heap RAM, so it is no longer memory-bound in the traditional sense. The practical ceiling is the V8 Map architectural limit (~16.7M unique keys). Line-by-line mode has no such ceiling; worker heap stays ~10-20 MB regardless of file size. For datasets with more than ~16.7M unique keys, use Python pandas merge, DuckDB, or a database JOIN.
Line-by-Line Requires Identical Sort Order
Line-by-line mode compares rows by position — row 1 vs row 1, row 2 vs row 2. If the files are sorted differently, this produces incorrect results. If sort order might differ between files, always use Primary Key mode instead.
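A small example shows why sort order matters. The rows below are made up for illustration: both lists hold the same three records, but in a different order.

```python
rows_a = ["1,alice", "2,bob", "3,carol"]
rows_b = ["3,carol", "1,alice", "2,bob"]  # same rows, different order

# Positional comparison: every pair differs, so every row looks modified.
positional_modified = sum(a != b for a, b in zip(rows_a, rows_b))

# Key-based comparison: index file A by its first column, then look up each B row.
index_a = {row.split(",")[0]: row for row in rows_a}
keyed_modified = sum(index_a.get(row.split(",")[0]) != row for row in rows_b)

print(positional_modified, keyed_modified)
# 3 0
```

The positional pass reports three modified rows even though nothing changed, while the key-based pass reports none.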
CSV and TSV Only
SplitForge CSV Compare accepts .csv and .tsv files. Excel .xlsx files must be converted to CSV first using the Excel to CSV Converter tool. JSON, Parquet, and database format comparisons are not supported.
No Persistent History
Comparison results are not saved between browser sessions. To preserve a diff report, export to CSV or JSON immediately after the comparison completes. Closing or refreshing the tab discards in-memory results.
No Automation or Scheduling
SplitForge is a manual browser tool — not a CLI, API, or pipeline component. It cannot be run on a schedule, triggered by webhooks, or integrated into CI/CD workflows. For automated comparisons, use Python pandas merge, dbt tests, or a database-level diff query.
Questions about limitations? Check our FAQ section below or contact us via the feedback button.