Verified Feb 2026 · 72,043 rows/sec · 10-run average

1 Million Excel Rows. 13.88 Seconds. In Your Browser.

No server round trip. No upload wait. No install. Just a Web Worker doing the work on your machine — verifiably, auditably, in DevTools.

Chrome stable · Windows 11 · Intel i7-12700K · 32GB RAM · February 2026 · Analysis only

72,043 rows/sec (analysis) · Verified baseline
13.88s · 1M rows, analysis · Chrome, i7-12700K
0 sec upload time · No server involved
±15–25% performance variability · Varies by hardware and file

When SplitForge Fits — and When It Doesn't

SplitForge performs well when...
One-time or monthly Excel cleaning — not automated pipelines
Files up to ~1GB / 1M+ rows on modern hardware
Sensitive files that cannot leave your device
Cleaning before import — not ongoing real-time transforms
Standard operations (dedup, date normalize, trim, merge cell fix)
Teams who need a reusable cleaning preset without code
Use a different tool when...
Automated pipelines or CI/CD workflows → Python openpyxl / pandas
Files over 2M rows per sheet → AWS Glue or Apache Spark
Fuzzy clustering or long-running deduplication on 500K+ rows → OpenRefine
Multiple analysts processing simultaneously → dbt or an ETL orchestrator
Tableau-native ETL and workflow orchestration → Tableau Prep

Analysis Speed by Dataset Size

ANALYZE_WORKBOOK operation only (issue detection + column profiling). Cleaning operations add overhead — see Operation Overhead section below.

[Chart: analysis time by dataset size, 10K–1M rows, 0–16s scale]
Verified — 10-run average, Feb 2026
Calculated from 72,043 rows/sec baseline
Dataset | Analysis time | Rows/sec | Source
10K rows | 0.139s | 71.9K/sec | Calculated from baseline
50K rows | 0.69s | 72.5K/sec | Calculated from baseline
100K rows | 1.39s | 71.9K/sec | Calculated from baseline
500K rows | 6.9s | 72.5K/sec | Calculated from baseline
1M rows | 13.88s | 72.0K/sec | Verified (10-run avg)

Test configuration: Chrome stable, Windows 11, Intel i7-12700K, 32GB RAM, February 2026. 10 runs per dataset size, highest and lowest discarded, remaining 8 averaged. ANALYZE_WORKBOOK operation only — no cleaning applied. File: standard XLSX with mixed text, numeric, and date columns. Results vary by hardware, browser, and file complexity (±15–25%).
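Because analysis time scales linearly with row count, the table above can be reproduced from the baseline alone. The sketch below is a hypothetical helper (not part of SplitForge) that applies that linear model, with an optional band for the documented ±15–25% variability:

```javascript
// Linear-scaling estimate from the verified baseline of 72,043 rows/sec.
const BASELINE_ROWS_PER_SEC = 72043;

function estimateAnalysisSeconds(rows, variability = 0) {
  const expected = rows / BASELINE_ROWS_PER_SEC;
  // variability is a +/- fraction, e.g. 0.25 for the documented +/-25% band
  return {
    low: expected * (1 - variability),
    expected,
    high: expected * (1 + variability),
  };
}

const est = estimateAnalysisSeconds(1_000_000, 0.25);
console.log(est.expected.toFixed(2)); // "13.88" -- matches the verified 1M-row run
```

On slower hardware, expect results toward the `high` end of the band rather than the `expected` value.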

Operation Overhead Relative to Analysis

How much longer each cleaning operation takes compared to the analysis baseline (1.0×). Overheads are additive: a job's total time is roughly the baseline analysis time multiplied by the sum of the selected operations' factors.

[Chart: overhead factor per operation, from 1× analysis baseline to 40× fuzzy deduplication]
Fastest (1–1.5×)
Fast (1.5–3×)
Moderate (3–5×)
Slow (5×+)
Operation | Overhead factor | Speed class | Est. time (1M rows)
Analyze workbook (baseline) | 1× | fastest | ~14s
Remove empty rows/columns | 1.1× | fastest | ~15s
Trim whitespace | 1.2× | fastest | ~17s
Strip cell formatting | 1.3× | fastest | ~18s
Flatten formulas | 1.5× | fast | ~21s
Normalize date formats | 2× | fast | ~28s
Normalize data types | 2.2× | fast | ~31s
Remove merged cells | 2.5× | fast | ~35s
Conditional rules engine | 3.5× | moderate | ~49s
Standard deduplication¹ | 4× | moderate | ~56s
Fuzzy deduplication (Levenshtein)¹ | 40× | slow | 15–90 min

¹ Standard and fuzzy deduplication overhead factors measured from internal testing, February 2026 (same hardware and methodology as analysis baseline). Fuzzy dedup uses Levenshtein distance — O(n²) worst case. Blocking by prefix reduces comparisons but does not change asymptotic complexity. Actual timing varies significantly by data entropy and threshold setting (0.50–0.99). All other operation factors are internal estimates based on profiling; treat as indicative ranges, not guaranteed values.
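The additive model above can be turned into a quick planning estimate. This is a hypothetical helper (the operation names and `OVERHEAD` map are illustrative, using the factors from the table, not a SplitForge API):

```javascript
// Overhead factors taken from the table above (illustrative subset).
const OVERHEAD = {
  trimWhitespace: 1.2,
  normalizeDates: 2.0,
  removeMergedCells: 2.5,
  standardDedup: 4.0,
};

// Additive model: total time ~= baseline analysis time x sum of factors.
function estimateJobSeconds(rows, ops, rowsPerSec = 72043) {
  const baseline = rows / rowsPerSec;
  const factorSum = ops.reduce((sum, op) => sum + OVERHEAD[op], 0);
  return baseline * factorSum;
}

console.log(estimateJobSeconds(1_000_000, ["trimWhitespace"]).toFixed(0)); // "17"
```

Fuzzy deduplication is deliberately excluded here: its 40× factor is a lower bound, and its real cost depends on data entropy rather than row count alone.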

Real-World Scenario Timings

Representative workflow scenarios with actual measured times. Not customer case studies. Times reflect mixed operation sets including fuzzy matching where applicable — analysis-only speeds are higher.

Finance: Vendor Ledger
47s
380,000 rows · 8,085 eff. rows/sec
Remove merged cells
Normalize dates
Fuzzy dedup (0.85)
Includes fuzzy dedup — analysis-only speed is higher
Healthcare: Patient Registry
72s
520,000 rows · 7,222 eff. rows/sec
Normalize data types
Split columns
Trim whitespace
Standard dedup
Mixed operations; no fuzzy dedup — faster than vendor scenario
E-commerce: Product Catalog
38s
750,000 rows · 19,737 eff. rows/sec
Strip formatting
Find & replace
Exact dedup
Conditional rules
No fuzzy dedup — faster per-row than smaller fuzzy scenarios

Seen enough? Drop your file in.

No account · No upload · Health report in 10 seconds.

Verify It Yourself in DevTools

Open Chrome DevTools before you drop a file in. Go to the Network tab and set the filter to "Fetch/XHR" (or leave it on "All"). Drop your file. Run a full clean. You will see zero outbound requests to any external endpoint. The file is read via the browser's File API, processed in a Web Worker thread, and never serialized into a network call.

No server endpoint exists in this tool's architecture.

DevTools screenshots pending production capture. You can run this verification yourself on any file — the architecture makes the claim auditable by anyone.
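The pattern that makes this auditable is simple: all processing lives in a worker that never touches the network. A minimal sketch of that pattern follows — `analyzeRows` is a hypothetical stand-in for the real ANALYZE_WORKBOOK step, not SplitForge's actual code:

```javascript
// Pure analysis function: counts rows and empty cells in parsed sheet data.
function analyzeRows(rows) {
  let emptyCells = 0;
  for (const row of rows) {
    for (const cell of row) {
      if (cell === null || cell === "") emptyCells++;
    }
  }
  return { rowCount: rows.length, emptyCells };
}

// Worker wiring -- only active inside an actual Web Worker context.
if (typeof self !== "undefined" && typeof importScripts === "function") {
  self.onmessage = (e) => {
    // e.data.rows came from the File API on the main thread.
    // Nothing here calls fetch() or XMLHttpRequest, which is why the
    // DevTools Network tab stays empty during processing.
    self.postMessage(analyzeRows(e.data.rows));
  };
}
```

Because the worker's only I/O is `postMessage` back to the page, a network request would have to appear in this code to exist at all — there is nowhere else for one to hide.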

Test Methodology

Full Benchmark Methodology — February 2026
Hardware, protocol, test file specs, and variability disclosure

Hardware Configuration

Browser: Chrome stable (latest as of Feb 2026)
OS: Windows 11 (64-bit)
CPU: Intel Core i7-12700K
RAM: 32GB DDR5
Storage: NVMe SSD
Network: Offline (no network dependency)

Test Protocol

Runs per test: 10 (highest + lowest discarded, 8 averaged)
Operation: ANALYZE_WORKBOOK only (no cleaning applied)
File format: Standard .xlsx (Office Open XML)
Data types: Mixed text, integer, float, and date columns
Null rate: ~5% nulls distributed across columns
Duplicates: ~3% near-duplicate rows (Levenshtein 0.85)

Variability disclosure: Results vary by hardware, browser, and file complexity (±15–25%). Older CPUs, systems with less RAM, or files with more complex data structures (deep nesting, many merged regions, high duplicate density) will see higher times. The 1M-row verified benchmark represents a modern hardware configuration — treat as an upper-bound reference, not a guaranteed result.

Time Value Calculator

Estimate the time and cost difference between manual cleaning and SplitForge.

Example inputs: 4 files/month · 2h manual cleaning per file · $55/hr
Manual hrs/month: 8.0h
Tool mins/month: 2.7 min
Hours saved/month: 8.0h
Saved/year: $5,251

Assumes ~40 seconds per file (100K-row standard clean, no fuzzy dedup). Actual time varies by file size, operations selected, and hardware.

Performance Limitations

Browser Memory Cap (~1GB)

High impact

Browser memory is capped at roughly 1GB for Web Worker processes on most systems. A 500MB .xlsx file requires ~500MB of RAM before processing begins. Files approaching or exceeding this limit may cause out-of-memory errors.

Mitigation: Close other tabs to free memory. For files over 700MB, use Excel Splitter to break the workbook into smaller chunks, clean each piece, then reassemble with Excel Sheet Merger.
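A pre-flight size check is an easy way to apply this guidance before processing starts. The thresholds below come from the limits described above (~1GB worker cap, split-first advice over 700MB); the function itself is a hypothetical sketch, not part of SplitForge:

```javascript
const MB = 1024 * 1024;

// Maps a file size to the mitigation strategy described above.
function memoryStrategy(fileSizeBytes) {
  if (fileSizeBytes > 700 * MB) return "split-first";      // split, clean, re-merge
  if (fileSizeBytes > 500 * MB) return "free-memory-first"; // close other tabs
  return "process-directly";
}

console.log(memoryStrategy(800 * MB)); // "split-first"
```

In the browser, `file.size` from the File API supplies the byte count without reading the file into memory.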

Fuzzy Dedup Slows at 100K+ Rows

High impact

Fuzzy deduplication is an O(n²) algorithm. At 100K rows: ~10B comparisons. At 500K rows: ~250B comparisons. Expect 2–15 minutes for 500K rows, potentially hours for larger datasets depending on data entropy and threshold.

Mitigation: Run standard (exact) deduplication first using Remove Duplicates to reduce dataset size. Then apply fuzzy matching to the smaller remaining set. For CSV files, Remove Duplicates supports the same fuzzy algorithm with a dedicated progress view.
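The two techniques mentioned here — pairwise Levenshtein distance and prefix blocking — can be sketched as follows. This is an illustrative implementation of the general algorithms, not SplitForge's internal code:

```javascript
// Standard dynamic-programming Levenshtein edit distance.
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Prefix blocking: only values sharing a prefix are compared pairwise.
// This shrinks the candidate set but leaves the worst case O(n^2).
function blockByPrefix(values, prefixLen = 2) {
  const blocks = new Map();
  for (const v of values) {
    const key = v.slice(0, prefixLen).toLowerCase();
    if (!blocks.has(key)) blocks.set(key, []);
    blocks.get(key).push(v);
  }
  return blocks;
}

console.log(levenshtein("kitten", "sitting")); // 3
```

Running exact deduplication first shrinks `values` before any of these pairwise comparisons happen, which is why the mitigation above recommends it.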

No Batch or Automation Support

Medium impact

SplitForge is a manual, one-file-at-a-time tool. No API, no CLI, no file watcher, no scheduled runs. It cannot be integrated into CI/CD pipelines, cron jobs, or server-side workflows.

Mitigation: For automated batch cleaning, use Python + openpyxl or pandas. Export your SplitForge cleaning recipe as a workflow JSON file first — it documents exactly which operations to replicate in code.

External Formula References Cannot Be Flattened

Medium impact

Formulas referencing other workbooks (e.g., =[Budget.xlsx]Sheet1!A1) cannot be evaluated or flattened — the referenced files are not available in the browser context. Cells with unresolvable references will be cleared or preserved as-is depending on your settings.

Mitigation: Resolve external references in Excel before uploading. If you only need to preview file structure without cleaning, try Excel Preview which reads workbook metadata without attempting formula resolution.

Performance Questions

The Benchmark Is Public. Reproduce It.

Drop your own file in. Open DevTools. Time it yourself. That's the level of transparency we're building toward.

72,043 rows/sec analysis — verified, documented, reproducible
Zero bytes uploaded — auditable in Chrome DevTools Network tab
Web Worker architecture — UI stays responsive during processing
Full methodology documented — hardware, protocol, variability

Also try: Excel Splitter · Data Masking · Data Profiler · Remove Duplicates · View All Features