Benchmark Performance
Performance at Scale
Chrome 132 · Windows 11 · Intel i7-12700K · 32GB RAM · February 2026
| File Size | Primary Key Mode | Line-by-Line Mode | Test Notes |
|---|---|---|---|
| 100K rows | 0.5 sec / 200K rows/sec | ~0.4 sec / 250K rows/sec | Startup overhead visible at small sizes; Map initialization ~50ms |
| 500K rows | ~2.4 sec / 208K rows/sec | ~1.8 sec / 277K rows/sec | Mixed text + numeric columns, 8-column test file |
| 1M rows | 4.8 sec / 208K rows/sec | ~3.6 sec / 277K rows/sec | Comma delimiter, string primary key (email), 10 columns |
| 5M rows (verified) | 14.3 sec / 350K rows/sec | ~11 sec / 454K rows/sec | Peak verified result — chunk batching reaches full throughput |
| 10M rows (estimated) | ~28 sec / 357K rows/sec | ~21 sec / 476K rows/sec | Estimated from 5M trajectory. In testing, requires ~2-3GB browser RAM on 32GB machine. |
Results vary by hardware, browser, and file complexity. Line-by-line values are estimated. Throughput improves at scale because the 50K-row chunk batching pipeline only reaches full speed beyond 1M rows: per-row throughput holds near 208K rows/sec up to 1M rows, then climbs to 350K rows/sec at 5M.
Test Methodology
How SplitForge Achieves 350K Rows/Sec
Streaming 50K-Row Chunks
Neither file is ever fully loaded into memory. Both files are read in 50,000-row chunks — input data is processed and discarded per chunk. However, the primary key Map and diff result arrays do grow with row count (O(n)), so peak memory scales with dataset size, not raw file size on disk. See the Memory Efficiency section below for tested numbers.
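The chunked reading loop can be sketched as follows. The tool itself runs in browser JavaScript; this Python analogue (the function name `iter_chunks` and the 50,000-row default are illustrative, not the tool's actual API) shows the core idea: only one chunk of parsed rows is resident in memory at a time.

```python
import csv
from itertools import islice

def iter_chunks(path, chunk_size=50_000):
    """Yield lists of up to chunk_size parsed rows from a CSV file.

    Each chunk is processed and then discarded by the caller, so peak
    memory for raw input stays bounded by the chunk size, not file size.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk
```

As the section notes, this bounds only the raw-input side; any per-row results the caller accumulates (the key index, the diff arrays) still grow O(n).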
Map-Based O(n) Indexing
Primary key mode builds a JavaScript Map keyed by the primary key value. Map lookups are O(1) — each row in File B is looked up exactly once, regardless of dataset size. Total comparison is O(n) where n is the number of rows, not O(n²) like naive nested-loop approaches.
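A minimal sketch of the same keyed diff, using a Python dict as the analogue of the JavaScript Map (the function name and return shape are illustrative, not the tool's actual interface):

```python
def diff_by_key(rows_a, rows_b, key):
    """O(n) keyed diff: index File A once, probe each File B row once."""
    index = {row[key]: row for row in rows_a}  # analogue of the JS Map
    added, modified = [], []
    for row in rows_b:
        old = index.pop(row[key], None)  # O(1) lookup per B row
        if old is None:
            added.append(row)            # key exists only in B
        elif old != row:
            modified.append(row)         # key in both, contents differ
    deleted = list(index.values())       # keys never probed exist only in A
    return added, modified, deleted
```

Each File A row is inserted once and each File B row probes the index once, so total work is O(n) in rows rather than the O(n²) of a nested-loop compare.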
Web Worker Isolation
All processing runs in a dedicated Web Worker thread. The browser UI stays fully responsive during 10M+ row operations. Progress updates stream every 100ms — you see row counts and percent complete while the comparison runs, without any UI blocking.
Zero Network Transmission
File reading, parsing, indexing, comparison, and result generation all happen inside the browser sandbox. No data leaves the device at any point. This is not a proxy model or edge function — the JavaScript engine running in the browser tab is the only compute involved.
Memory Efficiency
ROI Calculator
Baseline: ~20 min per manual Excel VLOOKUP comparison session
Estimate based on 20-min manual VLOOKUP workflow vs 30-sec SplitForge workflow. Individual results vary.
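The arithmetic behind the estimate is straightforward; `sessions_per_week` below is a hypothetical input, not a measured figure:

```python
MANUAL_MIN = 20.0        # baseline: manual Excel VLOOKUP session
TOOL_MIN = 0.5           # ~30-second SplitForge workflow
sessions_per_week = 10   # hypothetical usage level

saved_hours_per_week = sessions_per_week * (MANUAL_MIN - TOOL_MIN) / 60
# at 10 sessions/week, that is 3.25 hours saved per week
```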
Reproduce This Benchmark
These results are reproducible. Here's exactly how.
Generate test files
Create two CSV files (File A = baseline, File B = modified). Schema: columns id, name, email, value, status, updated_at — mixed string and numeric types.
For the 5M row benchmark: modify ~5% of rows in File B (change value and status columns), add 0.5% new rows, delete 0.5%.
Python generator (pandas):

```python
df = pd.DataFrame({'id': range(5_000_000), ...})
df.to_csv('file_a.csv', index=False)
# Modify 5% of rows → file_b.csv
```
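The abbreviated pandas snippet above can be expanded into a complete generator. This sketch uses only the standard library and follows the recipe stated earlier (~5% modified, ~0.5% deleted, ~0.5% added); the row count, seed, and value ranges are illustrative, and `ROWS` should be scaled to 5,000,000 for the full benchmark.

```python
import csv
import random

random.seed(42)
ROWS = 5_000          # scale to 5_000_000 for the full benchmark
COLUMNS = ["id", "name", "email", "value", "status", "updated_at"]

def write_csv(path, rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

# File A: the baseline, mixed string and numeric columns
base = [
    [i, f"user{i}", f"user{i}@example.com", random.randint(0, 999),
     random.choice(["active", "inactive"]), "2026-01-01"]
    for i in range(ROWS)
]
write_csv("file_a.csv", base)

# File B: ~5% modified, ~0.5% deleted, ~0.5% added
rows_b = []
for row in base:
    r = random.random()
    if r < 0.005:
        continue                          # drop ~0.5% of rows
    row = list(row)
    if r < 0.055:
        row[3] = random.randint(0, 999)   # change the value column
        row[4] = "changed"                # change the status column
    rows_b.append(row)
rows_b += [
    [ROWS + i, f"new{i}", f"new{i}@example.com", 0, "active", "2026-02-01"]
    for i in range(ROWS // 200)           # append ~0.5% new rows
]
write_csv("file_b.csv", rows_b)
```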
Run the comparison
Verify zero uploads: Open DevTools → Network tab before clicking Compare. No requests containing file contents will appear — all processing is local.
Hardware note: Our test machine (Chrome 132, Windows 11, i7-12700K, 32GB RAM) produced 14.3 sec for 5M rows. A 16GB laptop may be 20-30% slower — the comparison step still completes; absolute time varies. The structural advantage over Excel VLOOKUP holds regardless of hardware.
Honest Limitations: Where SplitForge Falls Short
No tool is perfect for every use case. Here's where another tool might be a better choice, and the real limitations of our browser-based architecture.
Browser-Based Processing
Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.
No Offline Mode (Initial Load)
Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.
Browser Tab Memory Limits
Most browsers limit individual tabs to 2-4GB RAM. This is the practical ceiling for file size.
Memory-Bound at 10M+ Rows
In internal testing (32GB RAM, Chrome 132), 10M row comparisons required ~2-3GB browser RAM. Machines with less than 8GB total RAM (or browsers with restricted memory) may fail with an out-of-memory error. The 5M row benchmark is achievable on most modern laptops. For 10M+ rows, use a machine with 16GB+ RAM.
Line-by-Line Requires Identical Sort Order
Line-by-line mode compares rows by position — row 1 vs row 1, row 2 vs row 2. If the files are sorted differently, this produces incorrect results. If sort order might differ between files, always use Primary Key mode instead.
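The failure mode is easy to demonstrate. In this sketch, two files hold identical data in different orders: a positional compare flags every row, while a keyed compare correctly reports no differences.

```python
# Same two records, different sort order in each "file"
a = [("alice", 1), ("bob", 2)]
b = [("bob", 2), ("alice", 1)]

# Line-by-line: row i of A vs row i of B — every position mismatches
positional_diffs = [i for i, (ra, rb) in enumerate(zip(a, b)) if ra != rb]

# Primary key: match rows by key, order-independent — no differences
keyed_match = dict(a) == dict(b)
```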
CSV and TSV Only
SplitForge CSV Compare accepts .csv and .tsv files. Excel .xlsx files must be converted to CSV first using the Excel to CSV Converter tool. JSON, Parquet, and database format comparisons are not supported.
No Persistent History
Comparison results are not saved between browser sessions. To preserve a diff report, export to CSV or JSON immediately after the comparison completes. Closing or refreshing the tab discards in-memory results.
No Automation or Scheduling
SplitForge is a manual browser tool — not a CLI, API, or pipeline component. It cannot be run on a schedule, triggered by webhooks, or integrated into CI/CD workflows. For automated comparisons, use Python pandas merge, dbt tests, or a database-level diff query.
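For scripted pipelines, the keyed diff SplitForge performs maps naturally onto a pandas outer merge with the `indicator` flag; the column names here are illustrative.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
b = pd.DataFrame({"id": [2, 3, 4], "value": [20, 35, 40]})

# Full outer join on the key; _merge labels each row's origin
diff = a.merge(b, on="id", how="outer", indicator=True, suffixes=("_a", "_b"))

added = diff[diff["_merge"] == "right_only"]    # rows only in B
deleted = diff[diff["_merge"] == "left_only"]   # rows only in A
modified = diff[(diff["_merge"] == "both")
                & (diff["value_a"] != diff["value_b"])]
```

Unlike the browser tool, this runs headless, so it can be scheduled or wired into CI/CD.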
Questions about limitations? Check our FAQ section below or contact us via the feedback button.