When SplitForge Fits — and When It Doesn't
Analysis Speed by Dataset Size
ANALYZE_WORKBOOK operation only (issue detection + column profiling). Cleaning operations add overhead — see Operation Overhead section below.
| Dataset | Analysis time | Rows/sec | Source |
|---|---|---|---|
| 10K rows | 0.139s | 71.9K/sec | Calculated from baseline |
| 50K rows | 0.69s | 72.5K/sec | Calculated from baseline |
| 100K rows | 1.39s | 71.9K/sec | Calculated from baseline |
| 500K rows | 6.9s | 72.5K/sec | Calculated from baseline |
| 1M rows | 13.88s | 72.0K/sec | Verified (10-run avg) |
Test configuration: Chrome stable, Windows 11, Intel i7-12700K, 32GB RAM, February 2026. 10 runs per dataset size, highest and lowest discarded, remaining 8 averaged. ANALYZE_WORKBOOK operation only — no cleaning applied. File: standard XLSX with mixed text, numeric, and date columns. Results vary by hardware, browser, and file complexity (±15–25%).
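Because throughput is nearly flat across the table, analysis time scales almost linearly with row count. A back-of-the-envelope sketch (the 72K rows/sec constant comes from the table above; `estimateAnalysisSeconds` is illustrative, not part of the product):

```javascript
// Back-of-the-envelope analysis-time estimate from the measured baseline.
// BASELINE_ROWS_PER_SEC comes from the benchmark table; the function itself
// is illustrative, not part of SplitForge.
const BASELINE_ROWS_PER_SEC = 72_000;

function estimateAnalysisSeconds(rows, variance = 0.25) {
  const mid = rows / BASELINE_ROWS_PER_SEC;
  return {
    low: mid * (1 - variance),  // fast hardware, simple columns
    mid,                        // table baseline (i7-12700K, Chrome)
    high: mid * (1 + variance), // older CPU, complex file structure
  };
}

console.log(estimateAnalysisSeconds(1_000_000).mid.toFixed(1)); // "13.9", matching the 1M-row benchmark
```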
Operation Overhead Relative to Analysis
How much longer each cleaning operation takes compared to the analysis baseline (1.0×). Overheads are cumulative: running three operations takes roughly the sum of their individual estimated times.
| Operation | Overhead factor | Speed class | Est. time (1M rows) |
|---|---|---|---|
| Analyze workbook (baseline) | 1× | fastest | ~14s |
| Remove empty rows/columns | 1.1× | fastest | ~15s |
| Trim whitespace | 1.2× | fastest | ~17s |
| Strip cell formatting | 1.3× | fastest | ~18s |
| Flatten formulas | 1.5× | fast | ~21s |
| Normalize date formats | 2× | fast | ~28s |
| Normalize data types | 2.2× | fast | ~31s |
| Remove merged cells | 2.5× | fast | ~35s |
| Conditional rules engine | 3.5× | moderate | ~49s |
| Standard deduplication¹ | 4× | moderate | ~56s |
| Fuzzy deduplication (Levenshtein)¹ | 40× | slow | 15–90 min |
¹ Standard and fuzzy deduplication overhead factors measured from internal testing, February 2026 (same hardware and methodology as analysis baseline). Fuzzy dedup uses Levenshtein distance — O(n²) worst case. Blocking by prefix reduces comparisons but does not change asymptotic complexity. Actual timing varies significantly by data entropy and threshold setting (0.50–0.99). All other operation factors are internal estimates based on profiling; treat as indicative ranges, not guaranteed values.
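To make the blocking idea concrete, here is a minimal sketch, not SplitForge's actual implementation: rows are bucketed by a normalized prefix, and the quadratic Levenshtein comparison runs only within each bucket. All names and defaults are illustrative.

```javascript
// Illustrative sketch of prefix blocking for fuzzy dedup (NOT SplitForge's
// actual implementation). Rows are bucketed by a normalized prefix, and the
// quadratic Levenshtein comparison runs only within each bucket.
function levenshtein(a, b) {
  const m = a.length, n = b.length;
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const cur = [i];
    for (let j = 1; j <= n; j++) {
      cur[j] = Math.min(
        prev[j] + 1,                                   // deletion
        cur[j - 1] + 1,                                // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
    prev = cur;
  }
  return prev[n];
}

function fuzzyDuplicatePairs(values, { prefixLen = 3, maxDistance = 2 } = {}) {
  const blocks = new Map();
  values.forEach((value, idx) => {
    const key = value.trim().toLowerCase().slice(0, prefixLen);
    if (!blocks.has(key)) blocks.set(key, []);
    blocks.get(key).push(idx);
  });
  const pairs = [];
  for (const idxs of blocks.values()) {
    for (let i = 0; i < idxs.length; i++) {
      for (let j = i + 1; j < idxs.length; j++) {
        if (levenshtein(values[idxs[i]], values[idxs[j]]) <= maxDistance) {
          pairs.push([idxs[i], idxs[j]]);
        }
      }
    }
  }
  return pairs;
}

console.log(fuzzyDuplicatePairs(["Acme Corp", "Acme Corp.", "Zenith Ltd"])); // [[0, 1]]
```

Blocking only preserves matches when true duplicates share the prefix; a typo in the first characters puts a row in a different bucket, which is one reason fuzzy results depend so heavily on data entropy.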
Real-World Scenario Timings
Representative workflow scenarios with actual measured times. Not customer case studies. Times reflect mixed operation sets including fuzzy matching where applicable — analysis-only speeds are higher.
Verify It Yourself in DevTools
Open Chrome DevTools before you drop a file in. Go to Network. Filter by "XHR" or "All". Drop your file. Run a full clean. You will see zero outbound requests to any external endpoint. The file is read by the browser's File API, processed in a Web Worker thread, and never serialized to any network call.
No server endpoint exists in this tool's architecture.
DevTools screenshots pending production capture. You can run this verification yourself on any file — the architecture makes the claim auditable by anyone.
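The local-only pipeline described above can be sketched as follows. This is illustrative code, not the SplitForge source; the processing function is a stand-in for the real parser, and the worker filename is hypothetical.

```javascript
// Illustrative local-only pipeline (NOT the SplitForge source). In the
// browser, the dropped file is read with the File API and transferred to a
// Web Worker; the processing step is a pure function over those bytes, so
// nothing in it can issue a network request.
function processLocally(bytes) {
  // stand-in for real parsing/cleaning; touches only the buffer it was given
  let nonEmpty = 0;
  for (const b of bytes) if (b !== 0) nonEmpty++;
  return { bytesSeen: bytes.length, nonEmpty };
}

// Browser wiring (sketch, worker filename is hypothetical):
//   const worker = new Worker("clean.worker.js");
//   const buffer = await droppedFile.arrayBuffer(); // File API, in-memory read
//   worker.postMessage({ buffer }, [buffer]);       // zero-copy transfer to the worker
//   worker.onmessage = ({ data }) => render(data);  // result never leaves the tab

console.log(processLocally(new Uint8Array([1, 0, 2]))); // { bytesSeen: 3, nonEmpty: 2 }
```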
Test Methodology
Hardware Configuration
Test Protocol
Variability disclosure: Results vary by hardware, browser, and file complexity (±15–25%). Older CPUs, systems with less RAM, or files with more complex data structures (deep nesting, many merged regions, high duplicate density) will see higher times. The 1M-row verified benchmark represents a modern hardware configuration — treat as an upper-bound reference, not a guaranteed result.
Time Value Calculator
Estimate the time and cost difference between manual cleaning and SplitForge.
Assumes ~40 seconds per file (100K-row standard clean, no fuzzy dedup). Actual time varies by file size, operations selected, and hardware.
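A sketch of the arithmetic behind the calculator, assuming a simple hours-saved model (the formula and names are assumptions, not the page's actual source):

```javascript
// Illustrative version of the calculator's arithmetic (assumed model, not the
// page's actual source). Compares manual cleaning against ~40s per file.
const SPLITFORGE_SECONDS_PER_FILE = 40; // 100K-row standard clean, no fuzzy dedup

function timeValue({ filesPerMonth, manualMinutesPerFile, hourlyRate }) {
  const manualHours = (filesPerMonth * manualMinutesPerFile) / 60;
  const toolHours = (filesPerMonth * SPLITFORGE_SECONDS_PER_FILE) / 3600;
  const hoursSaved = manualHours - toolHours;
  return { hoursSaved, dollarsSaved: hoursSaved * hourlyRate };
}

// e.g. 50 files a month, 30 manual minutes each, at $40/hour:
const saved = timeValue({ filesPerMonth: 50, manualMinutesPerFile: 30, hourlyRate: 40 });
console.log(saved.hoursSaved.toFixed(1)); // "24.4" hours per month
```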
Performance Limitations
Browser Memory Cap (~1GB)
Browser memory for Web Worker processes is capped at roughly 1GB on most systems. A 500MB .xlsx file needs at least ~500MB of RAM just to hold its raw bytes, and because XLSX is ZIP-compressed XML, the decompressed in-memory representation is typically several times larger. Files approaching or exceeding this limit may fail with out-of-memory errors.
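One way to surface this limit early is a pre-flight size check. A sketch with assumed constants (the ~1GB budget and 3× expansion factor are rough illustrative values, not exact browser guarantees):

```javascript
// Pre-flight size check (illustrative; the budget and expansion factor are
// rough assumptions, not exact browser constants).
const WORKER_MEMORY_BUDGET = 1024 ** 3; // ~1GB typical Web Worker ceiling
const EXPANSION_FACTOR = 3;             // zipped XML often grows several-fold in memory

function memoryRisk(fileSizeBytes) {
  const estimated = fileSizeBytes * EXPANSION_FACTOR;
  if (estimated > WORKER_MEMORY_BUDGET) return "likely-oom";
  if (estimated > WORKER_MEMORY_BUDGET / 2) return "risky";
  return "ok";
}

console.log(memoryRisk(500 * 1024 ** 2)); // 500MB file: "likely-oom"
console.log(memoryRisk(50 * 1024 ** 2));  // 50MB file: "ok"
```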
Fuzzy Dedup Slows at 100K+ Rows
Fuzzy deduplication is an O(n²) algorithm. At 100K rows: ~10B comparisons. At 500K rows: ~250B comparisons. Expect 2–15 minutes for 500K rows, potentially hours for larger datasets depending on data entropy and threshold.
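The headline figures round to n², the exact number of unordered pairs is n*(n-1)/2, which lands in the same order of magnitude:

```javascript
// Exact unordered-pair count for naive fuzzy dedup: n*(n-1)/2.
const comparisons = (n) => (n * (n - 1)) / 2;

console.log(comparisons(100_000)); // 4999950000 (~5 billion)
console.log(comparisons(500_000)); // 124999750000 (~125 billion)
// 5x the rows means roughly 25x the work:
console.log(comparisons(500_000) / comparisons(100_000));
```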
No Batch or Automation Support
SplitForge is a manual, one-file-at-a-time tool. No API, no CLI, no file watcher, no scheduled runs. It cannot be integrated into CI/CD pipelines, cron jobs, or server-side workflows.
External Formula References Cannot Be Flattened
Formulas referencing other workbooks (e.g., =[Budget.xlsx]Sheet1!A1) cannot be evaluated or flattened — the referenced files are not available in the browser context. Cells with unresolvable references will be cleared or preserved as-is depending on your settings.
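The bracketed-workbook syntax shown above is standard Excel, so such references can be flagged before flattening. A hypothetical detection helper (the function name and result categories are illustrative, not SplitForge's API):

```javascript
// Hypothetical pre-check for external workbook references. The
// [Workbook.xlsx]Sheet!Cell syntax is standard Excel; the function name and
// result categories are illustrative only.
const EXTERNAL_REF = /\[[^\]]+\.xlsx?\]/i;

function classifyFormula(formula) {
  return EXTERNAL_REF.test(formula) ? "unresolvable-external" : "flattenable";
}

console.log(classifyFormula("=[Budget.xlsx]Sheet1!A1")); // "unresolvable-external"
console.log(classifyFormula("=SUM(A1:A10)"));            // "flattenable"
```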
Performance Questions
How was the 72,043 rows/sec figure measured?
Why does my file process slower than the benchmark?
What is the maximum file size supported?
Why is fuzzy deduplication so much slower than everything else?
Does the browser tab need to stay open during processing?
How does performance compare to Python pandas?
Can I reproduce these benchmarks on my own machine?
What happens to performance when running multiple operations together?
The Benchmark Is Public. Reproduce It.
Drop your own file in. Open DevTools. Time it yourself. That's the level of transparency we're building toward.
Also try: Excel Splitter · Data Masking · Data Profiler · Remove Duplicates · View All Features