Engine v2.3 · Last updated: February 26, 2026 · Intel i7-12700K · Chrome 131 · Windows 11

Data Cleaner Performance Benchmarks

Smart Clean All (trim whitespace + remove empty rows/columns + deduplication) processes 10 million rows in 23 seconds (~435K rows/sec). Here's the full methodology, per-operation breakdown, and what affects speed on your machine.

Two performance modes: Trim Whitespace only runs at ~1.2M rows/sec (8.3s for 10M rows). Smart Clean All (trim + empty removal + deduplication) runs at ~435K rows/sec because it executes three operations in one pass. Results vary by hardware, browser, and file complexity.

Test Configuration

Hardware & Software

CPU: Intel Core i7-12700K (12-core, 3.6GHz base / 5.0GHz boost)
RAM: 32GB DDR4-3200 (dual-channel)
Storage: Samsung 970 EVO NVMe SSD (read: 3,500 MB/s)
OS: Windows 11 Pro (22H2)
Browser: Chrome 131 (stable), single tab, extensions disabled
DevTools: Closed during all tests (no observer overhead)

Test File Specifications

Row count: 100K · 1M · 10M (three separate files)
Columns: 15 columns (mix of text, numeric, date, email)
Data type split: 40% text, 30% numeric, 20% date, 10% email
File size: 100K rows: ~8MB · 1M rows: ~82MB · 10M rows: ~820MB
Encoding: UTF-8, comma-delimited, CRLF line endings
Duplicates: ~8% duplicate rows injected for dedup tests
NBSP injected: ~5% of text cells contain non-breaking spaces
Empty rows/cols: ~3% empty rows, 2 fully empty columns injected
Methodology: Each operation was run 10 times. The highest and lowest values were discarded and the remaining 8 runs averaged. The browser tab was opened fresh before each test session, with the system idle (no background processes). Time measured from file drop → "Download ready" notification via performance.now() in the Web Worker.
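The drop-high/low averaging described above can be sketched in a few lines (the helper name is illustrative, not taken from the tool's code):

```javascript
// Aggregate raw benchmark timings per the methodology above: sort the runs,
// discard the single highest and lowest, average the remaining eight, and
// round to one decimal place.
function aggregateRuns(runsSeconds) {
  const sorted = [...runsSeconds].sort((a, b) => a - b);
  const kept = sorted.slice(1, -1); // discard min and max
  const avg = kept.reduce((sum, t) => sum + t, 0) / kept.length;
  return Math.round(avg * 10) / 10; // report to the nearest 0.1s
}
```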

What the Results Look Like

[Screenshot: Smart Clean All progress bar — 10M row file loading in Chrome. Shows "Processing 10,000,000 rows..." progress overlay with percentage counter and animated spinner. Browser tab title updating live.]
Smart Clean All progress overlay — UI stays fully responsive during processing
[Screenshot: Smart Clean All completion toast — "Cleaned! Removed 823K duplicates, 31K empty rows, trimmed whitespace in 23.1 seconds. 9,146,012 rows remaining." Download button prominent.]
Completion summary: rows removed, time elapsed, download prompt
[Screenshot: Chrome DevTools Network panel during active cleaning — zero requests in-flight. Filter: All. No XHR/Fetch/WebSocket. Status bar reads "0 requests". Confirms client-side-only processing.]
DevTools Network tab during cleaning: zero network requests — data never leaves browser
[Screenshot: Chrome DevTools Performance timeline — Main Thread idle (green). Web Worker thread active (purple/yellow bursts). Worker handles all processing while main thread stays free for UI interaction.]
Web Worker architecture: heavy processing on background thread, UI thread stays free
[GIF: Smart Clean All — 10M row file. Drag and drop → progress bar fills in 23s → completion toast → "9.1M rows" result displayed → Download CSV button. Entire sequence. 30 seconds, Chrome, Windows 11.]
[GIF: Dedupe by Columns — Column picker opens, "Email" checkbox selected → Apply → "823,441 duplicate emails removed in 12.1 seconds" toast. Shows real-time chip display before/after.]
[GIF: Advanced Filter + AND/OR — Add "Status = Active" filter chip, toggle AND, add "Revenue > 50000" filter chip, row count updates in real-time from 1M → 127K rows. Export filtered result.]

Operation Benchmarks — 10M Row File

🧹 Cleaning Operations (10M rows)

Smart Clean All (trim + empty + dedupe): 23s
~435K rows/sec · 3 operations combined
Remove Duplicate Rows (full-row hash): 14.7s
~680K rows/sec · SHA-256 hash per row
Dedupe by Specific Columns: 12.1s
~827K rows/sec · hash on selected columns only
Trim Whitespace (incl. NBSP): 8.3s
~1.2M rows/sec · Unicode-aware regex
Standardize Text Case: 6.1s
~1.6M rows/sec · Title/UPPER/lower/Sentence
Replace Empty Values: 4.2s
~2.4M rows/sec · cell-by-cell null check
Remove Empty Rows: 3.8s
~2.6M rows/sec · all-empty row scan
Remove Empty Columns: 1.1s
~9.1M rows/sec · column-level scan (1 pass)

🔎 Filter Operations (10M rows)

Quick Search (all columns): 7.8s
~1.3M rows/sec · substring match all columns
Column filter (contains): 5.2s
~1.9M rows/sec · single column
Column filter (regex pattern): 9.4s
~1.1M rows/sec · compiled RegExp object
Column filter (date range): 5.8s
~1.7M rows/sec · Date.parse comparison
Column filter (numeric between): 4.6s
~2.2M rows/sec · float comparison
Detect column types (all cols): 1.2s
~8.3M rows/sec · first 1,000 rows sampled
Filter note: Filter operations run on parsed data already in memory. If data hasn't been parsed yet, add ~9–12 seconds for PapaParse streaming parse on a 10M row file. Subsequent filter operations on the same loaded dataset use in-memory arrays.
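Sampled type detection along the lines described above could look roughly like this (a sketch; the `detectColumnType` name, regexes, and the dominant-type heuristic are assumptions, not the tool's actual code):

```javascript
// Inspect the first 1,000 values of one column and pick the most common
// detected type. Heuristics here are illustrative only.
function detectColumnType(values, sampleSize = 1000) {
  const counts = { number: 0, date: 0, email: 0, text: 0 };
  for (const v of values.slice(0, sampleSize)) {
    const s = String(v).trim();
    if (s === '') continue; // empty cells don't vote
    if (!Number.isNaN(Number(s))) counts.number++;
    else if (/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(s)) counts.email++;
    else if (!Number.isNaN(Date.parse(s))) counts.date++;
    else counts.text++;
  }
  // Return the type with the highest count
  return Object.entries(counts).sort((a, b) => b[1] - a[1])[0][0];
}
```

Sampling only the first 1,000 rows is why this operation runs at ~8.3M rows/sec: its cost is effectively constant per column, not per row.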
All timings: Intel Core i7-12700K, 32GB DDR4-3200, Chrome 131 stable, Windows 11, February 2026. Average of 8 runs (top and bottom discarded). 10M row file, ~820MB CSV, 15 columns. Results vary by hardware, browser, and file complexity.

Scalability Across File Sizes

Operation: 100K rows (~8MB) · 1M rows (~82MB) · 10M rows (~820MB) · throughput at 10M
Smart Clean All: 0.26s · 2.4s · 23.0s · ~435K rows/sec
Trim Whitespace (+ NBSP): 0.09s · 0.82s · 8.3s · ~1.2M rows/sec
Remove Duplicate Rows: 0.16s · 1.5s · 14.7s · ~680K rows/sec
Dedupe by Columns: 0.13s · 1.2s · 12.1s · ~827K rows/sec
Standardize Case (all cols): 0.07s · 0.61s · 6.1s · ~1.6M rows/sec
Replace Empty Values: 0.05s · 0.42s · 4.2s · ~2.4M rows/sec
Remove Empty Rows: 0.04s · 0.38s · 3.8s · ~2.6M rows/sec
Remove Empty Columns: 0.01s · 0.11s · 1.1s · ~9.1M rows/sec
Quick Search (all cols): 0.09s · 0.78s · 7.8s · ~1.3M rows/sec
Column filter (contains): 0.06s · 0.52s · 5.2s · ~1.9M rows/sec
Column filter (regex): 0.11s · 0.94s · 9.4s · ~1.1M rows/sec
Export to CSV: 0.02s · 0.19s · 1.9s · ~5.3M rows/sec
Export to Excel (.xlsx): 0.8s · 7.2s · 72s* · ~139K rows/sec
* Excel export at 10M rows uses SheetJS (xlsx library) which has higher memory overhead than CSV export. For files over 5M rows, CSV export is recommended for speed. All timings are wall-clock time from operation start to result available. Parse time not included for operations other than "Smart Clean All" (which requires a fresh parse).
RAM ceiling: Processing 10M rows into a JavaScript array requires approximately 2–4GB of browser-accessible memory. Chrome on an 8GB RAM machine (with OS and other apps running) typically handles 5–8M rows before hitting memory pressure and slowing down. A 16GB machine handles 10M rows comfortably. 32GB+ handles 15–20M rows. If you hit a ceiling, use the CSV Splitter to process in chunks, then merge the cleaned results.

Data Cleaner vs Alternatives (10M Rows, Smart Clean All)

Context matters. These numbers show what alternatives take for a comparable "full clean" pass on a 10M row file.

Tool: 10M row clean (full pass) · setup / learning curve · upload required · NBSP detection · regex filtering
Data Cleaner: 23 seconds · zero (open browser, drop file) · never · auto-detected · visual builder + templates
Excel (formulas): crashes above ~1M rows · none (already installed) · never · TRIM() misses NBSP · VBA only
OpenRefine: 4–8 min (Java app, local) · 15–30 min install + learning · never (local Java app) · yes (GREL trim) · GREL expressions
Python (pandas): 45–90 sec (depends on script) · hours (Python + pandas + script) · never (local script) · yes (str.strip + regex) · full regex (str.match)
Cloud CSV tools: 30–180 sec + upload time · minutes (sign up, learn UI) · yes (file leaves device) · varies by tool · varies by tool
Excel and OpenRefine timings based on community benchmarks and internal testing. Python timing assumes a well-written script with vectorized pandas operations on 10M rows, 15 columns. Cloud tool timing includes file upload at 50 Mbps. All estimates for comparable "trim + dedupe + empty removal" workflow. Results vary by hardware, network speed, and file complexity.

Smart Clean All: Operation Overhead Breakdown (10M rows)

Time Breakdown (Total: 23.0 seconds)

PapaParse streaming (CSV parse): 9.1s (39.6%)
Reads 820MB file in chunks via streaming API. Time scales linearly with file size.
Trim whitespace (NBSP-aware regex): 5.8s (25.2%)
/^[\s\u00A0]+|[\s\u00A0]+$/g applied to every cell. 15 cols × 10M rows = 150M string operations.
Remove empty rows/columns: 3.4s (14.8%)
Row scan (O(n×cols)) + column scan (O(cols×n)). Low overhead, very fast.
Duplicate detection (hash-based): 4.7s (20.4%)
Concatenates all column values per row into a string key, stores in Set. O(n) time, O(n) memory.
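Putting the breakdown's last three stages together, a single-pass clean can be sketched as follows. This is a minimal illustration assuming rows arrive as arrays of strings; the real worker also removes empty columns and posts progress messages, which are omitted here.

```javascript
// Trim edge whitespace (incl. NBSP), drop all-empty rows, and dedupe on a
// concatenated row key, all in one pass over the data.
const EDGE_WS = /^[\s\u00A0]+|[\s\u00A0]+$/g;

function smartClean(rows) {
  const seen = new Set();
  const out = [];
  for (const row of rows) {
    const trimmed = row.map((cell) => String(cell).replace(EDGE_WS, ''));
    if (trimmed.every((cell) => cell === '')) continue; // skip empty row
    const key = trimmed.join('\u0000'); // NUL-joined key, unlikely in data
    if (seen.has(key)) continue; // skip duplicate row
    seen.add(key);
    out.push(trimmed);
  }
  return out;
}
```

Because the Set holds one key per unique row, memory grows O(n) with unique rows, matching the memory note above.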

Worker Architecture Details

Web Worker setup: Data Cleaner uses a dedicated Web Worker (dataCleanerWorker.worker.js) that loads PapaParse 5.3.2 via CDN importScripts and 5 modular operation handlers (parse, clean, filter, detectColumnTypes, export).

Message protocol: UI sends {id, operation, payload} envelope. Worker routes to handler via operation registry. Each handler posts progress messages every 50,000 rows and a final {type: 'complete', result} message.
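The envelope routing might look roughly like this (a sketch under the protocol described above; the handler body and `handleMessage` name are illustrative, and the progress messages are omitted):

```javascript
// Operation registry: map operation names to handler functions.
const handlers = {
  trim: (payload) => payload.rows.map((row) => row.map((cell) => cell.trim())),
};

// Route one {id, operation, payload} envelope and post the result back.
// postMessage is injected so the router is testable outside a worker.
function handleMessage({ id, operation, payload }, postMessage) {
  const handler = handlers[operation];
  if (!handler) {
    postMessage({ id, type: 'error', error: `Unknown operation: ${operation}` });
    return;
  }
  postMessage({ id, type: 'complete', result: handler(payload) });
}
```

Echoing the `id` back lets the UI match each response to the request that produced it, even when operations overlap.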

Memory model: PapaParse streaming parse (not bulk parse) — reads the file in chunks without loading the entire CSV string into memory first. The parsed rows array for 10M rows requires ~2–4GB RAM depending on column width. Browsers with less than 8GB available RAM may run slowly or crash on 10M row files.
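A chunked parse configuration along these lines avoids materializing the whole CSV string (a sketch assuming PapaParse 5.x is loaded in the worker; `streamParse` is an illustrative name, not the tool's actual function):

```javascript
// Stream-parse a File in chunks so the 820MB CSV never exists as one string.
// Only the parsed rows array accumulates in memory.
function streamParse(file, onDone) {
  const rows = [];
  Papa.parse(file, {
    chunk: (results) => {
      // results.data is the batch of rows parsed from this chunk
      for (const row of results.data) rows.push(row);
    },
    skipEmptyLines: false, // empty rows are removed by a later cleaning pass
    complete: () => onDone(rows),
  });
}
```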

Undo history: Each cleaning operation pushes the previous data array onto an undo stack. On a 10M row dataset, each undo step requires a full copy of the data (~1–2GB). Undo is limited by available RAM. Use Reset to return to the original without undo stack overhead.
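The undo stack can be sketched as below. Note the depth cap is an illustrative way to bound memory; per the note above, the actual limit is available RAM rather than a fixed count.

```javascript
// Bounded undo stack: each cleaning operation pushes a snapshot of the
// previous rows array; undo pops the most recent snapshot.
class UndoStack {
  constructor(maxDepth = 5) {
    this.stack = [];
    this.maxDepth = maxDepth;
  }
  push(rows) {
    if (this.stack.length === this.maxDepth) this.stack.shift(); // drop oldest
    this.stack.push(rows);
  }
  pop() {
    return this.stack.pop(); // undefined when there is nothing to undo
  }
}
```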

Keyboard shortcuts: Ctrl+Z triggers undo via the Worker without re-parsing the file. Ctrl+S triggers the export handler which serializes data to CSV/Excel format.

When Data Cleaner Is Slower Than Expected

Low-RAM machine (under 8GB available): 2–5× slower, or browser crash

Why: A 10M row dataset with 15 columns requires 2–4GB of working memory for the parsed array plus undo history. On machines with 8GB total RAM and other apps running, the OS may start swapping to disk. Symptom: progress bar stalls around 60–70%.

Fix: Close other browser tabs and apps. Try with a 1M row split first using the CSV Splitter tool.
Regex filter with backtracking patterns: 2–8× slower filter speed

Why: Poorly formed regex like /.*(a+)+b/ can cause catastrophic backtracking — exponential time on large text fields. Safe patterns like email validation or ^prefix are fast.

Fix: Use the built-in regex templates (email, phone, URL, ZIP) which are pre-optimized. Test regex on 10K rows first before applying to 10M.
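The v2.2 changelog notes that regex filtering became 3× faster once the RegExp is compiled once per filter rather than per row. A sketch of that shape (function name is illustrative):

```javascript
// Filter rows on one column with a regex compiled exactly once.
// Compiling inside the per-row callback would redo the work N times.
function regexFilter(rows, columnIndex, pattern, flags = '') {
  const re = new RegExp(pattern, flags); // compiled once, reused per row
  return rows.filter((row) => re.test(String(row[columnIndex])));
}
```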
Wide files (100+ columns): 1.5–3× slower per column

Why: Trim whitespace and case standardize iterate over all columns. A file with 150 columns requires 10× more string operations than one with 15 columns. Smart Clean All time scales roughly linearly with column count.

Fix: Use Column Operations to select/remove columns first, then clean the reduced-column file.
Safari (macOS / iOS): 1.5–2.5× slower than Chrome

Why: Safari's JavaScript engine (JavaScriptCore) has lower Web Worker throughput than Chrome's V8 on CPU-bound string operations. PapaParse streaming parse is also slower on Safari due to different FileReader implementation.

Fix: For best performance on macOS, use Chrome or Firefox. Results vary by Safari version — newer versions (17+) are faster.
Excel (.xlsx) export for large files: 30–40× slower than CSV export

Why: Excel export uses SheetJS (xlsx library) which constructs an XML-based .xlsx file in memory. Unlike CSV export (simple string concatenation), xlsx requires building a ZIP archive of multiple XML files. For 10M rows: CSV export takes ~1.9s, Excel export takes ~72s.

Fix: Export as CSV first for processing speed. Convert to Excel after using the Excel Converter if your workflow requires .xlsx.
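The "simple string concatenation" fast path for CSV mentioned above can be sketched like this (illustrative; the tool's actual serializer is not shown here):

```javascript
// Serialize rows to CSV. A cell is quoted only when it contains a comma,
// quote, or newline; embedded quotes are doubled per CSV convention.
function toCsv(rows) {
  return rows
    .map((row) =>
      row
        .map((cell) => {
          const s = String(cell);
          return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
        })
        .join(',')
    )
    .join('\r\n');
}
```

No ZIP archive, no XML tree: this is why CSV export stays near ~1.9s while .xlsx takes ~72s at 10M rows.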
Filtering with AND logic and 5+ active filters: 1.2–2× slower than a single filter

Why: AND filter applies each filter predicate in sequence — the dataset is scanned up to N times (once per filter). With 5 filters, this is 5 full passes over the dataset. OR logic is slightly faster because early exits are possible.

Fix: Reduce active filters to the minimum needed. Remove filters that do not meaningfully narrow results. Regex filters are the most expensive — place them last.
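One way to avoid scanning the dataset once per filter is to combine the predicates and make a single pass (a sketch; whether the tool itself fuses filters this way is not documented here):

```javascript
// Apply N AND-combined filter predicates in one pass over the rows.
// Array.every short-circuits on the first failing predicate per row.
function applyAndFilters(rows, predicates) {
  return rows.filter((row) => predicates.every((pred) => pred(row)));
}
```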

Calculate Your Annual Savings

ROI Calculator — Your Exact Savings

Example inputs: 3 files/week · 500K rows per file · 45 min manual cleaning per file · $60/hour
Manual hours/year: 117 hrs (7,020 total minutes)
With Data Cleaner: 1.3 hrs (~30s avg per file, Smart Clean All)
Annual savings: $6,942 (116 hours reclaimed)
Estimates based on ~30 seconds average per file with Data Cleaner (Smart Clean All at tested speeds). Actual savings depend on file size, complexity, and which operations you use. Methodology: tested on Intel i7-12700K, 32GB RAM, Chrome, Windows 11, February 2026.

Full Test Methodology

Test Procedure

  1. Generate test CSV files using Python script (reproducible random seed). Inject known % of duplicates, NBSP, and empty rows/columns.
  2. Open Chrome fresh instance, disable all extensions, close DevTools.
  3. Load splitforge.app/tools/data-cleaner in a single tab.
  4. Drop test file into the tool. Wait for "Parse complete" signal.
  5. Click the operation button. Note start time via performance.now() logged in Worker.
  6. Wait for "complete" message. Record wall-clock time from Worker log.
  7. Repeat 10 times for each operation. Discard highest and lowest values.
  8. Average the remaining 8 values. Round to 1 decimal.
  9. Verify result row count against expected (known % duplicates/empty rows).
  10. Re-test after Chrome update if version changes.

Reproducibility

This benchmark is independently reproducible. The test file composition, Python generator script (seed 42), and methodology are documented below. Drop any 820MB, 15-column CSV with approximately 8% duplicate rows and 5% NBSP injection into Data Cleaner and run Smart Clean All — you should see results in the 20–28 second range on comparable hardware. Deviations outside that range on similar hardware indicate a performance regression and should be reported.

Test file generation: Python script with fixed random seed (42) generates reproducible test files. The 10M row test CSV is available upon request — contact via the SplitForge site.

Timing precision: Times measured via performance.now() posted from the Web Worker at operation start and operation complete. Precision: sub-millisecond. Reported to nearest 0.1 second.

What's included in timing: For "Smart Clean All" — includes parse time. For individual operations — does not include parse time (data already in memory). Export times — from button click to download dialog appearing.

Machine state during tests: No other browser tabs open. No other applications using significant CPU. System idle for 30 seconds before each test session. Tests run at ambient temperature (not during thermal throttle).

Disclaimer: Results vary by hardware, browser version, OS, available RAM, and data complexity. Wide files (100+ columns), deeply nested data, or files with many formula-like values may be slower. Mobile results typically 3–5× slower than the test hardware.

Benchmark Changelog

v2.3 · February 2026
  • Added NBSP detection to Trim Whitespace — added ~0.4s overhead at 10M rows but catches 5–12% more whitespace issues
  • Smart Clean All re-tested after Dedupe algorithm updated to hash-based (was sort-based in v2.2) — 14.7s → now included in Smart Clean All path at 23s total
  • Added per-column case transform — no measurable performance change vs all-column transform
  • Column picker for Replace Empty Values: negligible additional overhead (<0.1s)
v2.2 · November 2025
  • Upgrade to PapaParse 5.3.2 from 5.3.0 — ~8% faster streaming parse on Chrome
  • Dedupe algorithm changed from Array.sort → Set-based hashing — 2.3× faster on 10M rows (was 34s, now 14.7s)
  • Regex filter: pre-compile RegExp objects on filter apply (was re-compiling per row) — 3× faster regex filtering

Known Limitations

Memory ceiling: ~10–15M rows

Parsing 10M rows into a JS array requires 2–4GB of browser-accessible RAM. Chrome's V8 heap limit is typically 4GB on 64-bit systems. Files above 10–15M rows (depending on column width) may cause an out-of-memory crash. Use the CSV Splitter to process in chunks.

Excel export slowdown above 5M rows

SheetJS .xlsx generation at 10M rows takes ~72 seconds due to XML/ZIP overhead. CSV export is always faster (~1.9s). For files over 5M rows, export as CSV and convert separately.

No fuzzy/phonetic deduplication

Duplicate detection uses exact string matching (after optional case normalization). "Jon Smith" and "John Smith" are treated as different records. For fuzzy deduplication, use the dedicated Remove Duplicates tool with fuzzy matching mode.

Mobile performance (tablets, phones)

Mobile CPUs process string operations 3–5× slower than desktop CPUs. Safari iOS has additional Web Worker limitations. For files over 100K rows, desktop is recommended. Mobile works well for files under 50K rows.


See These Speeds on Your Own Files

Drop your messy CSV into Data Cleaner and run Smart Clean All. No signup, no upload, no wait.

Smart Clean All: 23s @ 10M rows
Trim Whitespace: 8.3s @ 10M rows
Dedupe by Column: 12.1s @ 10M rows
Intel i7-12700K, 32GB RAM, Chrome 131, Windows 11, Feb 2026. Results vary by hardware, browser, and file complexity.