When SplitForge Fits — and When It Doesn't
Analysis Speed by Dataset Size
ANALYZE_WORKBOOK operation only (issue detection + column profiling). Cleaning operations add overhead — see Operation Overhead section below.
| Dataset | Analysis time | Rows/sec | Source |
|---|---|---|---|
| 10K rows | 0.139s | 71.9K/sec | Calculated from baseline |
| 50K rows | 0.69s | 72.5K/sec | Calculated from baseline |
| 100K rows | 1.39s | 71.9K/sec | Calculated from baseline |
| 500K rows | 6.9s | 72.5K/sec | Calculated from baseline |
| 1M rows | 13.88s | 72.0K/sec | Verified (10-run avg) |
Test configuration: Chrome stable, Windows 11, Intel i7-12700K, 32GB RAM, February 2026. 10 runs per dataset size, highest and lowest discarded, remaining 8 averaged. ANALYZE_WORKBOOK operation only — no cleaning applied. File: standard XLSX with mixed text, numeric, and date columns. Results vary by hardware, browser, and file complexity (±15–25%).
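Because throughput is nearly flat across the table, analysis time scales almost linearly with row count. A back-of-the-envelope sketch (the 72K rows/sec constant comes from the table above; `estimateAnalysisSeconds` is illustrative, not part of the product):

```javascript
// Back-of-the-envelope analysis-time estimate from the measured baseline.
// BASELINE_ROWS_PER_SEC comes from the benchmark table; the function itself
// is illustrative, not part of SplitForge.
const BASELINE_ROWS_PER_SEC = 72_000;

function estimateAnalysisSeconds(rows, variance = 0.25) {
  const mid = rows / BASELINE_ROWS_PER_SEC;
  return {
    low: mid * (1 - variance),  // fast hardware, simple columns
    mid,                        // table baseline (i7-12700K, Chrome)
    high: mid * (1 + variance), // older CPU, complex file structure
  };
}

console.log(estimateAnalysisSeconds(1_000_000).mid.toFixed(1)); // "13.9", matching the 1M-row benchmark
```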
Operation Overhead Relative to Analysis
How much longer each cleaning operation takes compared to the analysis baseline (1.0×). Overheads are cumulative: running three operations takes roughly the sum of their individual estimated times.
| Operation | Overhead factor | Speed class | Est. time (1M rows) |
|---|---|---|---|
| Analyze workbook (baseline) | 1× | fastest | ~14s |
| Remove empty rows/columns | 1.1× | fastest | ~15s |
| Trim whitespace | 1.2× | fastest | ~17s |
| Strip cell formatting | 1.3× | fastest | ~18s |
| Flatten formulas | 1.5× | fast | ~21s |
| Normalize date formats | 2× | fast | ~28s |
| Normalize data types | 2.2× | fast | ~31s |
| Remove merged cells | 2.5× | fast | ~35s |
| Conditional rules engine | 3.5× | moderate | ~49s |
| Standard deduplication¹ | 4× | moderate | ~56s |
| Fuzzy deduplication (Levenshtein)¹ | 40× | slow | 15–90 min |
¹ Standard and fuzzy deduplication overhead factors measured from internal testing, February 2026 (same hardware and methodology as analysis baseline). Fuzzy dedup uses Levenshtein distance — O(n²) worst case. Blocking by prefix reduces comparisons but does not change asymptotic complexity. Actual timing varies significantly by data entropy and threshold setting (0.50–0.99). All other operation factors are internal estimates based on profiling; treat as indicative ranges, not guaranteed values.
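To make the blocking idea concrete, here is a minimal sketch, not SplitForge's actual implementation: rows are bucketed by a normalized prefix, and the quadratic Levenshtein comparison runs only within each bucket. All names and defaults are illustrative.

```javascript
// Illustrative sketch of prefix blocking for fuzzy dedup (NOT SplitForge's
// actual implementation). Rows are bucketed by a normalized prefix, and the
// quadratic Levenshtein comparison runs only within each bucket.
function levenshtein(a, b) {
  const m = a.length, n = b.length;
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const cur = [i];
    for (let j = 1; j <= n; j++) {
      cur[j] = Math.min(
        prev[j] + 1,                                   // deletion
        cur[j - 1] + 1,                                // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
    prev = cur;
  }
  return prev[n];
}

function fuzzyDuplicatePairs(values, { prefixLen = 3, maxDistance = 2 } = {}) {
  const blocks = new Map();
  values.forEach((value, idx) => {
    const key = value.trim().toLowerCase().slice(0, prefixLen);
    if (!blocks.has(key)) blocks.set(key, []);
    blocks.get(key).push(idx);
  });
  const pairs = [];
  for (const idxs of blocks.values()) {
    for (let i = 0; i < idxs.length; i++) {
      for (let j = i + 1; j < idxs.length; j++) {
        if (levenshtein(values[idxs[i]], values[idxs[j]]) <= maxDistance) {
          pairs.push([idxs[i], idxs[j]]);
        }
      }
    }
  }
  return pairs;
}

console.log(fuzzyDuplicatePairs(["Acme Corp", "Acme Corp.", "Zenith Ltd"])); // [[0, 1]]
```

Blocking only preserves matches when true duplicates share the prefix; a typo in the first characters puts a row in a different bucket, which is one reason fuzzy results depend so heavily on data entropy.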
Real-World Scenario Timings
Representative workflow scenarios with actual measured times. Not customer case studies. Times reflect mixed operation sets including fuzzy matching where applicable — analysis-only speeds are higher.
Verify It Yourself in DevTools
Open Chrome DevTools before you drop a file in. Go to Network. Filter by "XHR" or "All". Drop your file. Run a full clean. You will see zero outbound requests to any external endpoint. The file is read by the browser's File API, processed in a Web Worker thread, and never serialized to any network call.
No server endpoint exists in this tool's architecture.
DevTools screenshots pending production capture. You can run this verification yourself on any file — the architecture makes the claim auditable by anyone.
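The local-only pipeline described above can be sketched as follows. This is illustrative code, not the SplitForge source; the processing function is a stand-in for the real parser, and the worker filename is hypothetical.

```javascript
// Illustrative local-only pipeline (NOT the SplitForge source). In the
// browser, the dropped file is read with the File API and transferred to a
// Web Worker; the processing step is a pure function over those bytes, so
// nothing in it can issue a network request.
function processLocally(bytes) {
  // stand-in for real parsing/cleaning; touches only the buffer it was given
  let nonEmpty = 0;
  for (const b of bytes) if (b !== 0) nonEmpty++;
  return { bytesSeen: bytes.length, nonEmpty };
}

// Browser wiring (sketch, worker filename is hypothetical):
//   const worker = new Worker("clean.worker.js");
//   const buffer = await droppedFile.arrayBuffer(); // File API, in-memory read
//   worker.postMessage({ buffer }, [buffer]);       // zero-copy transfer to the worker
//   worker.onmessage = ({ data }) => render(data);  // result never leaves the tab

console.log(processLocally(new Uint8Array([1, 0, 2]))); // { bytesSeen: 3, nonEmpty: 2 }
```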
Test Methodology
Hardware Configuration
Test Protocol
Variability disclosure: Results vary by hardware, browser, and file complexity (±15–25%). Older CPUs, systems with less RAM, or files with more complex data structures (deep nesting, many merged regions, high duplicate density) will see higher times. The 1M-row verified benchmark represents a modern hardware configuration — treat as an upper-bound reference, not a guaranteed result.
Time Value Calculator
Estimate the time and cost difference between manual cleaning and SplitForge.
Assumes ~40 seconds per file (100K-row standard clean, no fuzzy dedup). Actual time varies by file size, operations selected, and hardware.
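A sketch of the arithmetic behind the calculator, assuming a simple hours-saved model (the formula and names are assumptions, not the page's actual source):

```javascript
// Illustrative version of the calculator's arithmetic (assumed model, not the
// page's actual source). Compares manual cleaning against ~40s per file.
const SPLITFORGE_SECONDS_PER_FILE = 40; // 100K-row standard clean, no fuzzy dedup

function timeValue({ filesPerMonth, manualMinutesPerFile, hourlyRate }) {
  const manualHours = (filesPerMonth * manualMinutesPerFile) / 60;
  const toolHours = (filesPerMonth * SPLITFORGE_SECONDS_PER_FILE) / 3600;
  const hoursSaved = manualHours - toolHours;
  return { hoursSaved, dollarsSaved: hoursSaved * hourlyRate };
}

// e.g. 50 files a month, 30 manual minutes each, at $40/hour:
const saved = timeValue({ filesPerMonth: 50, manualMinutesPerFile: 30, hourlyRate: 40 });
console.log(saved.hoursSaved.toFixed(1)); // "24.4" hours per month
```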
Performance Limitations
Browser Memory Cap (~1GB)
Browser memory for Web Worker processes is capped at roughly 1GB on most systems. A 500MB .xlsx file needs at least ~500MB of RAM just to hold its raw bytes, and because XLSX is ZIP-compressed XML, the decompressed in-memory representation is typically several times larger. Files approaching or exceeding this limit may fail with out-of-memory errors.
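One way to surface this limit early is a pre-flight size check. A sketch with assumed constants (the ~1GB budget and 3× expansion factor are rough illustrative values, not exact browser guarantees):

```javascript
// Pre-flight size check (illustrative; the budget and expansion factor are
// rough assumptions, not exact browser constants).
const WORKER_MEMORY_BUDGET = 1024 ** 3; // ~1GB typical Web Worker ceiling
const EXPANSION_FACTOR = 3;             // zipped XML often grows several-fold in memory

function memoryRisk(fileSizeBytes) {
  const estimated = fileSizeBytes * EXPANSION_FACTOR;
  if (estimated > WORKER_MEMORY_BUDGET) return "likely-oom";
  if (estimated > WORKER_MEMORY_BUDGET / 2) return "risky";
  return "ok";
}

console.log(memoryRisk(500 * 1024 ** 2)); // 500MB file: "likely-oom"
console.log(memoryRisk(50 * 1024 ** 2));  // 50MB file: "ok"
```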
Fuzzy Dedup Slows at 100K+ Rows
Fuzzy deduplication is an O(n²) algorithm. At 100K rows: ~10B comparisons. At 500K rows: ~250B comparisons. Expect 2–15 minutes for 500K rows, potentially hours for larger datasets depending on data entropy and threshold.
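The headline figures round to n², the exact number of unordered pairs is n*(n-1)/2, which lands in the same order of magnitude:

```javascript
// Exact unordered-pair count for naive fuzzy dedup: n*(n-1)/2.
const comparisons = (n) => (n * (n - 1)) / 2;

console.log(comparisons(100_000)); // 4999950000 (~5 billion)
console.log(comparisons(500_000)); // 124999750000 (~125 billion)
// 5x the rows means roughly 25x the work:
console.log(comparisons(500_000) / comparisons(100_000));
```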
No Batch or Automation Support
SplitForge is a manual, one-file-at-a-time tool. No API, no CLI, no file watcher, no scheduled runs. It cannot be integrated into CI/CD pipelines, cron jobs, or server-side workflows.
External Formula References Cannot Be Flattened
Formulas referencing other workbooks (e.g., =[Budget.xlsx]Sheet1!A1) cannot be evaluated or flattened — the referenced files are not available in the browser context. Cells with unresolvable references will be cleared or preserved as-is depending on your settings.
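The bracketed-workbook syntax shown above is standard Excel, so such references can be flagged before flattening. A hypothetical detection helper (the function name and result categories are illustrative, not SplitForge's API):

```javascript
// Hypothetical pre-check for external workbook references. The
// [Workbook.xlsx]Sheet!Cell syntax is standard Excel; the function name and
// result categories are illustrative only.
const EXTERNAL_REF = /\[[^\]]+\.xlsx?\]/i;

function classifyFormula(formula) {
  return EXTERNAL_REF.test(formula) ? "unresolvable-external" : "flattenable";
}

console.log(classifyFormula("=[Budget.xlsx]Sheet1!A1")); // "unresolvable-external"
console.log(classifyFormula("=SUM(A1:A10)"));            // "flattenable"
```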
Performance Questions
How was the 72,043 rows/sec figure measured?
Why does my file process slower than the benchmark?
What is the maximum file size supported?
Why is fuzzy deduplication so much slower than everything else?
Does the browser tab need to stay open during processing?
How does performance compare to Python pandas?
Can I reproduce these benchmarks on my own machine?
What happens to performance when running multiple operations together?
The Benchmark Is Public. Reproduce It.
Drop your own file in. Open DevTools. Time it yourself. That's the level of transparency we're building toward.
Also try: Excel Splitter · Data Masking · Data Profiler · Remove Duplicates · View All Features