Chrome 131, Windows 11, 16GB RAM, Intel i7-12700K · Results vary by hardware, browser, and file complexity.
| Dataset | Description | Time | Rows/sec | File Size | Status |
|---|---|---|---|---|---|
| 94K rows | Small file baseline | 0.4s | 235K/s | ~9 MB | Tested |
| 1.5M rows | Mid-scale dataset | 14.85s | 101K/s | ~145 MB | Tested |
| 5M rows | Large-scale analysis | 53.45s | 94K/s | ~485 MB | Verified |
| 10M rows | Maximum verified capacity | 106.83s | 94K/s | 1.26 GB | Verified |
Scaling Visualized
Throughput stabilizes after ~1.5M rows as parsing and analysis overhead converges.
[Chart: Processing Time (seconds). Elapsed time grows near-linearly with row count.]
[Chart: Throughput (K rows/sec). Stabilizes at ~93–101K/s after initial parse overhead.]
When Performance Degrades
Our benchmarks use mixed-type files representing typical real-world exports. Specific data characteristics will push processing time above or below these numbers. Here's what to expect.
High Cardinality Columns · Moderate slowdown
Columns with millions of unique string values increase memory allocation for distinct-value counting. A 10M-row file where every row has a unique UUID in a string column will profile ~15% slower than a numeric-only file.
Wide Files (100+ Columns) · Significant slowdown
Each additional column adds type detection, statistics, and histogram passes. A 1M-row file with 200 columns will take roughly 3–4× longer than a 1M-row file with 15 columns. Column count matters more than row count at scale.
String-Heavy Datasets · Moderate slowdown
Numeric columns process faster than string columns because type inference and statistical calculations are cheaper. Free-text columns (product descriptions, notes fields) add overhead due to whitespace detection and cardinality analysis.
Available Browser Memory · Hard ceiling
Browser memory is the practical ceiling. The profiler streams data in chunks to stay within limits, but very large files on machines with <8GB RAM may trigger garbage collection pauses, increasing elapsed time by 20–40%.
Quote-Aware Parsing · Minor slowdown
Files with quoted fields containing embedded delimiters require RFC 4180 parsing (~15% slower than simple split). Most real-world CSVs export with some quoted fields, so our benchmark files include a representative mix.
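To make the quote-aware parsing cost concrete, here is a minimal sketch of an RFC 4180-style field splitter. It is illustrative only, not SplitForge's implementation; a production parser must also handle newlines inside quoted fields and fields split across chunk boundaries.

```javascript
// Minimal RFC 4180-style line parser: handles quoted fields with
// embedded delimiters and escaped quotes ("" inside a quoted field).
function parseCsvLine(line, delimiter = ",") {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                          // closing quote
      } else field += ch;
    } else if (ch === '"') {
      inQuotes = true;                                  // opening quote
    } else if (ch === delimiter) {
      fields.push(field);                               // field boundary
      field = "";
    } else field += ch;
  }
  fields.push(field);                                   // final field
  return fields;
}
```

The character-by-character state machine is what makes quote-aware parsing slower than a naive `line.split(",")`, which cannot distinguish a delimiter inside quotes from a real field boundary.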
All 11 Analysis Types
Every type runs in a single pass — no re-scanning the file.
Duplicate detection uses FNV-1a hashing. Foreign key candidates identified by value overlap percentage.
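FNV-1a is a standard fast non-cryptographic hash; the sketch below shows the 32-bit variant and how hashed rows can feed a duplicate check. The `findDuplicateRows` helper is hypothetical and simplified (a real implementation would confirm hash matches against row contents to rule out collisions).

```javascript
// FNV-1a 32-bit hash (standard constants: offset basis and prime).
function fnv1a(str) {
  let hash = 0x811c9dc5;                  // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);            // xor in the next byte
    hash = Math.imul(hash, 0x01000193);   // multiply by FNV prime, mod 2^32
  }
  return hash >>> 0;                      // as unsigned 32-bit
}

// Hypothetical duplicate check: hash each serialized row into a Set.
function findDuplicateRows(rows) {
  const seen = new Set();
  const dupes = [];
  for (const row of rows) {
    const h = fnv1a(row.join("\u0000"));  // NUL-join to avoid field merging
    if (seen.has(h)) dupes.push(row);     // candidate duplicate (hash match)
    else seen.add(h);
  }
  return dupes;
}
```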
Pearson Correlation
Pearson r coefficients across all numeric column pairs.
Classified by strength: weak (<0.5), moderate (0.5–0.7), strong (>0.7). Top 10 significant pairs shown.
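The Pearson coefficient for one column pair can be computed in a single pass over the data, which is consistent with the single-pass design above. A sketch, with the strength thresholds from this section:

```javascript
// Single-pass Pearson r for one pair of numeric columns.
function pearson(xs, ys) {
  const n = xs.length;
  let sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (let i = 0; i < n; i++) {
    sx += xs[i]; sy += ys[i];
    sxx += xs[i] * xs[i]; syy += ys[i] * ys[i];
    sxy += xs[i] * ys[i];
  }
  const num = n * sxy - sx * sy;
  const den = Math.sqrt(n * sxx - sx * sx) * Math.sqrt(n * syy - sy * sy);
  return den === 0 ? 0 : num / den;    // 0 for constant columns
}

// Strength classification matching the thresholds above.
function strength(r) {
  const a = Math.abs(r);
  return a > 0.7 ? "strong" : a >= 0.5 ? "moderate" : "weak";
}
```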
ML Anomaly Detection
Isolation Forest algorithm identifies statistically anomalous rows across all numeric columns.
Anomaly score 0–1. Moderate >0.65, High >0.7, Critical >0.8. No labelled training data required.
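The 0–1 score comes from the Isolation Forest scoring formula of Liu et al.: s(x, n) = 2^(−E[h(x)] / c(n)), where E[h(x)] is a point's mean path length across trees and c(n) normalizes for sample size. The sketch below shows only this scoring step (not tree construction) plus the severity thresholds from this section; it is illustrative, not SplitForge's code.

```javascript
const EULER_GAMMA = 0.5772156649; // Euler–Mascheroni constant

// c(n): expected path length of an unsuccessful BST search,
// used to normalize path lengths across sample sizes.
function avgPathLength(n) {
  if (n <= 1) return 0;
  const harmonic = Math.log(n - 1) + EULER_GAMMA; // H(n-1) approximation
  return 2 * harmonic - (2 * (n - 1)) / n;
}

// s(x, n) = 2^(-E[h(x)] / c(n)); short paths => score near 1 (anomalous).
function anomalyScore(meanPathLength, sampleSize) {
  return Math.pow(2, -meanPathLength / avgPathLength(sampleSize));
}

// Severity bands matching the thresholds above.
function severity(score) {
  if (score > 0.8) return "critical";
  if (score > 0.7) return "high";
  if (score > 0.65) return "moderate";
  return "normal";
}
```

A point whose mean path length equals c(n) scores exactly 0.5, the algorithm's "unremarkable" midpoint; scores approach 1 as paths shorten.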
Time Series Patterns
Trend detection (increasing / decreasing / stable), data frequency, and gap analysis for date columns.
Auto-detects daily, weekly, monthly patterns. Reports gaps with start/end dates and day counts.
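Gap reporting of this kind reduces to walking a sorted date column and emitting any run of missing periods. A minimal sketch, assuming sorted ISO date strings and a daily expected frequency (not SplitForge's implementation):

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Report gaps in a sorted daily series as {start, end, days},
// matching the start/end-date and day-count output described above.
function findDailyGaps(isoDates) {
  const gaps = [];
  for (let i = 1; i < isoDates.length; i++) {
    const prev = Date.parse(isoDates[i - 1]); // UTC midnight
    const curr = Date.parse(isoDates[i]);
    const missing = Math.round((curr - prev) / DAY_MS) - 1;
    if (missing > 0) {
      gaps.push({
        start: new Date(prev + DAY_MS).toISOString().slice(0, 10),
        end: new Date(curr - DAY_MS).toISOString().slice(0, 10),
        days: missing,
      });
    }
  }
  return gaps;
}
```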
Client-Side Architecture
All 11 analysis types run in Web Workers inside your browser. Zero server communication.
Designed to avoid server transmission of PHI or PII. No network requests made during file processing. Compliance with your organization's specific data handling policies remains your responsibility.
Mathematically Verified Accuracy
Statistics were verified against closed-form formulas. For a sequential ID column (1 to 10,000,000), the correct sum is n(n+1)/2. The profiler produces the exact value — no rounding, no approximation.
| Metric | Closed-Form Expected | Profiler Result |
|---|---|---|
| Sum, n(n+1)/2 | 50,000,005,000,000 | 50,000,005,000,000 |
| Mean, (n+1)/2 | 5,000,000.5 | 5,000,000.5 |
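This check is easy to reproduce: stream-sum a sequential ID column 1..n and compare against the closed form. Exactness is not luck here; every partial sum stays below Number.MAX_SAFE_INTEGER (~9.007e15), so IEEE 754 doubles represent each integer addition exactly.

```javascript
// Verify a streamed sum of 1..n against the closed form n(n+1)/2.
function verifySequentialSum(n) {
  let sum = 0;
  for (let i = 1; i <= n; i++) sum += i; // simulated streaming pass
  const expected = (n * (n + 1)) / 2;    // closed-form sum
  return { sum, expected, exact: sum === expected, mean: sum / n };
}
```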
Why Client-Side Processing Matters for Performance
Cloud profiling tools are bottlenecked by upload time before processing even starts: a 500MB file on a 100 Mbps connection takes 40 seconds just to upload. SplitForge reads directly from your local disk via the browser FileReader API. No network round-trip. The clock starts immediately.
Web Workers offload all analysis to a background thread, keeping the UI responsive during long profiling runs. The streaming architecture processes data in chunks — memory usage stays bounded regardless of file size, which is why 1.26 GB files complete without crashing the tab.
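The boundary-handling trick behind bounded-memory streaming can be shown in miniature. The sketch below carries the partial last line across chunk boundaries so no chunk is ever misparsed; in the browser the chunks would come from Blob.slice()/FileReader, but the same logic works on any sequence of string chunks. This is an illustration of the technique, not SplitForge's code.

```javascript
// Yield complete lines from a sequence of arbitrary-sized string
// chunks, carrying the trailing partial line between chunks so
// memory use stays proportional to one chunk, not the whole file.
function* linesFromChunks(chunks) {
  let carry = "";
  for (const chunk of chunks) {
    const parts = (carry + chunk).split("\n");
    carry = parts.pop();       // last piece may be an incomplete line
    yield* parts;              // emit only complete lines
  }
  if (carry) yield carry;      // final line without trailing newline
}
```

Each emitted line can then be handed to the per-row analysis passes, keeping the whole pipeline single-pass and chunk-bounded.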
Benchmark methodology: All tests run on Chrome 131 (stable), Windows 11, Intel i7-12700K, 16GB DDR4 RAM. Files contained mixed data types (integers, floats, strings, dates) to reflect real-world conditions. Each benchmark was run three times; times reported are the median. Results vary by hardware, browser version, available RAM, and file data complexity. Your results may differ.
Free · No account required · Designed to avoid server transmission of PHI · Files never leave your browser
Benchmarks run February 2026 · SplitForge v2.1 · Chrome 131, Windows 11, Intel i7-12700K, 16GB DDR4 RAM · Median of 3 runs per dataset size · Mixed data types (integers, floats, strings, dates) · Next scheduled re-benchmark: May 15, 2026