Verified Benchmark, February 2026

10 Million Rows Profiled in 107 Seconds

11 analysis types. 93,604 rows/sec at steady state. 100% mathematically accurate, with no sampling. Browser-only. File contents never uploaded.

Results measured on Chrome 131, Windows 11, Intel i5-12600KF, 16GB RAM. Your hardware may differ.

Peak Throughput: 235K rows/sec (94K-row dataset, 0.4s)
Steady-State Speed: 93K rows/sec (large files, 5M–10M rows)
Max File Tested: 1.26 GB CSV file (10M rows, 106.83s)
Statistical Accuracy: 100%, no sampling (mathematically verified)

10,000,000 rows profiled in a single session (verified benchmark)
Processing Time: 106.83s
Throughput: 93,604/s
File Size: 1.26 GB
Accuracy: 100%

Verified Performance Results

Chrome 131 · Windows 11 · Intel i5-12600KF (3.70GHz) · 16GB RAM · NVMe SSD

| Dataset | Time | Rows/sec | File Size | Status |
|---|---|---|---|---|
| 94,465 rows (typical CRM contact export) | 0.4s | 235K/s | ~9 MB | Tested |
| 1,500,000 rows (medium dataset: Salesforce full org export) | 14.85s | 101K/s | ~145 MB | Tested |
| 5,000,000 rows (large dataset: enterprise transaction log) | 53.45s | 94K/s | ~485 MB | Verified |
| 10,000,000 rows (maximum tested: 1.26 GB CSV file) | 106.83s | 94K/s | 1.26 GB | Verified |
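
The Rows/sec column is rows divided by elapsed seconds. A minimal Python sketch, using the row counts and times from the results table above, recomputes it. The function name is illustrative and not part of the profiler, and small differences from the published figures are to be expected if the displayed times were rounded.

```python
# Row counts and elapsed times copied from the results table above.
results = [
    (94_465, 0.4),
    (1_500_000, 14.85),
    (5_000_000, 53.45),
    (10_000_000, 106.83),
]


def rows_per_second(rows: int, seconds: float) -> float:
    """Average throughput over the whole run."""
    return rows / seconds


for rows, seconds in results:
    rate = rows_per_second(rows, seconds)
    print(f"{rows:>10,} rows in {seconds:>6.2f}s = {rate:,.0f} rows/sec")
```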

Performance Charts

Processing time and throughput across tested dataset sizes.

Processing Time vs Dataset Size

Seconds to complete full 11-type analysis

[Chart: x-axis Dataset Size (94K, 1.5M, 5M, 10M); y-axis processing time, 0s to 120s]

Throughput vs Dataset Size

Rows processed per second (K/s)

[Chart: x-axis Dataset Size (94K, 1.5M, 5M, 10M); y-axis throughput, 0K to 260K rows/sec; steady state ~93K/s]

Performance Variability Factors

Benchmarks were run on a controlled configuration. Real-world performance varies based on these factors:

High-Cardinality String Columns: 20–40% slower

Columns with many unique string values (e.g., free-text notes, UUID fields) require more memory for frequency counting and top-value tracking. A 10M-row file with 5 high-cardinality text columns may take 140–150 seconds instead of 107.

Very Wide Files (100+ Columns): 30–50% slower

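All 11 analysis types run on every column, so per-column work grows linearly with column count. The correlation matrix grows faster, with the number of numeric column pairs: 100 numeric columns form 4,950 pairs versus 45 pairs for 10 columns, roughly 110× the correlation work. The benchmark dataset had 11 columns; files with 50+ columns will be correspondingly slower.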

String-Heavy Datasets: 15–25% slower

Text parsing and pattern matching (for type detection on string values) is slower than numeric operations. Files with mostly text columns (names, addresses, descriptions) will be on the slower end of the performance range.

Available Browser RAM: up to 2× slower on low-RAM machines

The benchmark used 16GB RAM with Chrome as the primary process. On machines with 8GB RAM and other applications running, large files may trigger garbage collection, significantly slowing throughput. Recommended: 16GB+ RAM for files over 5M rows.

Quoted Fields / Embedded Newlines: 5–15% slower

CSV files where fields contain commas or newlines (requiring quote-wrapping) are slower to parse because each character must be checked against the quoting state machine. Simple comma-delimited files without quotes are fastest.
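
To illustrate why quoting adds per-character work, here is a minimal sketch of a quote-aware CSV splitter in Python. It is an assumption about the general technique, not the profiler's parser, and the function name `split_csv_records` is hypothetical. Every character is checked against a single `in_quotes` flag, which is the state the paragraph above refers to.

```python
def split_csv_records(text: str) -> list[list[str]]:
    """Split CSV text into rows of fields, honoring double-quoted fields."""
    records: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field.append('"')   # doubled quote is an escaped quote
                    i += 1
                else:
                    in_quotes = False   # closing quote
            else:
                field.append(ch)        # commas and newlines are literal here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))
            field = []
        elif ch == "\n":
            row.append("".join(field))
            field = []
            records.append(row)
            row = []
        elif ch != "\r":
            field.append(ch)
        i += 1
    if field or row:                    # last record without a trailing newline
        row.append("".join(field))
        records.append(row)
    return records


sample = 'id,note\n1,"contains, a comma"\n2,"line one\nline two"\n'
print(split_csv_records(sample))
```

For production parsing in Python, the standard library `csv` module handles the same cases; the hand-written loop is shown only to make the state machine visible.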

All 11 Analysis Types

Every type included in the benchmark. All run in a single profiling pass.

Automatic Type Detection
Classifies each column into one of 15 data types with confidence scoring.
Tests each column's values against: integer, float, boolean, date (8 formats), email, phone, URL, currency, NPI, ICD-10, CPT, SSN, and generic string. Confidence calculated as the ratio of successfully parsed values.
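
The confidence calculation can be sketched as follows. This is an illustrative Python sketch, not the profiler's code: it covers only five of the listed types, recognizes a single date format, and the detector order, the regular expressions, and the 0.9 acceptance threshold are all assumptions.

```python
import re
from datetime import datetime


def _parses(parser, value: str) -> bool:
    """True if parser(value) succeeds without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


# Hypothetical detectors, tried in order from most to least specific.
DETECTORS = [
    ("integer", lambda v: re.fullmatch(r"[+-]?\d+", v) is not None),
    ("float", lambda v: _parses(float, v)),
    ("boolean", lambda v: v.lower() in {"true", "false", "yes", "no"}),
    ("date", lambda v: _parses(lambda s: datetime.strptime(s, "%Y-%m-%d"), v)),
    ("email", lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None),
]


def detect_type(values, threshold: float = 0.9):
    """Return (type_name, confidence); confidence = parsed values / non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return ("string", 0.0)
    for name, test in DETECTORS:
        confidence = sum(test(v) for v in non_empty) / len(non_empty)
        if confidence >= threshold:
            return (name, confidence)
    return ("string", 1.0)  # every value is trivially a valid generic string


print(detect_type(["1", "2", "3"]))
print(detect_type(["a@b.com", "c@d.org"]))
```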
Descriptive Statistics
Mean, median, mode, min, max, range, std deviation, variance, Q1, Q3, IQR.
Calculated on the full dataset, with no sampling. For 10M rows, the sum of a sequential integer column matched the mathematical formula result exactly (50,000,005,000,000). Single-pass algorithm: statistics accumulate as rows stream in from the parser.
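
One common way to accumulate statistics in a single pass is Welford's online algorithm. The Python sketch below shows that technique as an assumption about how such a pass could look; the class name `RunningStats` is hypothetical and the profiler's actual implementation is not shown here. It covers count, sum, mean, population variance, min, and max. Median and quartiles need additional machinery and are omitted.

```python
import math


class RunningStats:
    """Welford's online algorithm: update statistics one value at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0

    @property
    def std_dev(self) -> float:
        return math.sqrt(self.variance)


stats = RunningStats()
for value in range(1, 100_001):     # stands in for rows streaming from a parser
    stats.add(value)
print(stats.total, stats.mean, stats.minimum, stats.maximum)
```

Memory use is constant regardless of row count, which is what makes this style of accumulation suitable for streaming input.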
Value Histograms
Distribution visualization using actual data values, not sampled approximations.
For numeric columns: Sturges' rule determines bin count; bins computed from full-pass min/max. For categorical columns: top 20 value frequencies. Histogram data is included in JSON export.
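
Sturges' rule sets the bin count to ceil(log2(n)) + 1 for n values. The Python sketch below applies it with equal-width bins between the observed min and max. The function names and the handling of the constant-column case are assumptions made for illustration.

```python
import math


def sturges_bin_count(n: int) -> int:
    """Sturges' rule: ceil(log2(n)) + 1 bins for n values."""
    return math.ceil(math.log2(n)) + 1 if n > 0 else 0


def histogram(values: list[float]) -> list[int]:
    """Count values into equal-width bins spanning [min, max]."""
    k = sturges_bin_count(len(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for x in values:
        # The maximum value would land at index k, so clamp it into the last bin.
        idx = 0 if width == 0 else min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    return counts


print(sturges_bin_count(10_000_000))
print(histogram(list(range(100))))
```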
Quality Issue Detection
Null rates, IQR outliers, whitespace issues, duplicates, constant columns.
Five issue types: (1) null rate per column, (2) statistical outliers via IQR fencing (< Q1 − 1.5×IQR or > Q3 + 1.5×IQR), (3) leading/trailing whitespace in string values, (4) duplicate values in high-cardinality columns, (5) constant columns (all values identical, zero variance).
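
The IQR fencing rule in item (2) can be sketched in Python with the standard library. The quartile interpolation method is an assumption (several conventions exist and can give slightly different fences), and `iqr_outliers` is a hypothetical name.

```python
import statistics


def iqr_outliers(values: list[float]) -> list[float]:
    """Return values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < lower or x > upper]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 10, 95]))
```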
Cardinality & Uniqueness
Unique value count, uniqueness ratio, primary key detection.
Uniqueness ratio = unique values / total non-null rows. Columns with ratio = 1.0 and zero nulls are flagged as candidate primary keys. Columns with ratio < 0.01 are flagged as low-cardinality (suitable for enum validation).
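
A minimal Python sketch of these rules follows. Treating `None` and the empty string as null is an assumption, as are the function and key names.

```python
def cardinality_profile(values: list) -> dict:
    """Uniqueness ratio = unique values / non-null rows, plus the two flags."""
    non_null = [v for v in values if v is not None and v != ""]
    null_count = len(values) - len(non_null)
    unique_count = len(set(non_null))
    ratio = unique_count / len(non_null) if non_null else 0.0
    return {
        "unique_count": unique_count,
        "uniqueness_ratio": ratio,
        "candidate_primary_key": ratio == 1.0 and null_count == 0,
        "low_cardinality": bool(non_null) and ratio < 0.01,
    }


print(cardinality_profile(list(range(1000))))
print(cardinality_profile(["active"] * 600 + ["inactive"] * 400))
```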
Top Values Frequency
Most frequent values with occurrence counts and percentages.
Top 20 values by frequency for categorical and low-cardinality columns. Suppressed for columns with uniqueness ratio > 95% (e.g., UUID/hash columns); those are identified as candidate primary keys instead.
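
As a sketch of the top-values rule, assuming Python's `collections.Counter` stands in for the profiler's frequency counting (the function name and return shape are hypothetical):

```python
from collections import Counter


def top_values(values: list, limit: int = 20, max_uniqueness: float = 0.95) -> list:
    """Return (value, count, percent) for the most frequent non-null values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return []
    counts = Counter(non_null)
    if len(counts) / len(non_null) > max_uniqueness:
        return []                   # near-unique column: frequency list suppressed
    total = len(non_null)
    return [(v, c, 100.0 * c / total) for v, c in counts.most_common(limit)]


print(top_values(["a", "a", "b", "a"]))
```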
Cross-Column Insights
Duplicate rows, correlated null patterns, candidate foreign keys.
Three sub-analyses: (1) exact duplicate row detection across all columns, (2) correlated null detection (columns that are null together significantly more than chance) using co-occurrence ratio, (3) candidate foreign key detection (columns in one dataset that match high-cardinality columns in another).
Pearson Correlation
Full correlation matrix for numeric column pairs.
Standard Pearson r coefficient for all numeric column pairs. Correlations with r > 0.7 are flagged as strong positive; those with r < -0.7 are flagged as strong negative. Useful for identifying redundant features before ML training or for detecting expected relationships (e.g., age and birth year).
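
For reference, Pearson r for one column pair can be computed as below. This Python sketch uses the textbook two-pass formula and the 0.7 threshold stated above; the function names and the handling of zero-variance columns are assumptions. The example uses the age and birth year relationship mentioned above, with made-up values.

```python
import math


def pearson_r(xs: list[float], ys: list[float]):
    """Pearson correlation coefficient, or None if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def classify(r, threshold: float = 0.7) -> str:
    if r is None:
        return "undefined"
    if r > threshold:
        return "strong positive"
    if r < -threshold:
        return "strong negative"
    return "not flagged"


ages = [20, 30, 40, 50]
birth_years = [2006, 1996, 1986, 1976]   # illustrative values only
print(classify(pearson_r(ages, birth_years)))
```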
ML Anomaly Detection
Isolation Forest algorithm applied to numeric columns.
Isolation Forest builds random decision trees that attempt to isolate individual points. Anomalies require fewer splits to isolate because they lie far from the bulk of the data. Applied to all numeric columns together, it identifies multivariate outliers not visible in single-column IQR analysis.
Time Series Pattern Detection
Date range, frequency, gaps, most recent data point.
Applied to date and timestamp columns. Detects: date range (min → max), inferred frequency (daily/weekly/monthly based on median gap), gaps larger than 2× median gap, and most recent data point. Useful for identifying data freshness issues and missing periods.
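
The median-gap logic can be sketched in Python as follows. The cutoffs that map a median gap to daily, weekly, or monthly are assumptions, since the exact boundaries are not stated, and `profile_dates` is a hypothetical name.

```python
from datetime import date
from statistics import median


def profile_dates(dates: list[date]):
    """Date range, inferred frequency, large gaps, and most recent point."""
    ordered = sorted(set(dates))
    if len(ordered) < 2:
        return None
    pairs = list(zip(ordered, ordered[1:]))
    median_gap = median((b - a).days for a, b in pairs)
    if median_gap <= 1:
        frequency = "daily"
    elif median_gap <= 7:
        frequency = "weekly"
    elif median_gap <= 31:
        frequency = "monthly"
    else:
        frequency = "irregular"
    return {
        "range": (ordered[0], ordered[-1]),
        "frequency": frequency,
        # Gaps larger than twice the median gap are reported as missing periods.
        "gaps": [(a, b) for a, b in pairs if (b - a).days > 2 * median_gap],
        "most_recent": ordered[-1],
    }


days = [date(2026, 1, d) for d in (1, 2, 3, 4, 5, 10, 11)]
print(profile_dates(days))
```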
Export (JSON + CSV)
Full machine-readable JSON profile and human-readable CSV summary.
JSON export: complete profile including all statistics, histogram data, quality issues, correlation matrix, anomaly flags, and time series results. CSV export: one row per column, key metrics only, suitable for importing into Excel or Google Sheets for further review.

Mathematical Accuracy Verification

We verified statistical accuracy against known mathematical results before publishing benchmark claims. The 10M-row test used sequential integers 1 through 10,000,000.

Formula Sum (n(n+1)/2): 50,000,005,000,000
Profiler Sum (verified): 50,000,005,000,000
Expected Mean ((1+n)/2): 5,000,000.5
Profiler Mean (verified): 5,000,000.5
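
The expected values can be reproduced independently. This short Python sketch compares the closed-form results with a brute-force pass over the same sequence; it checks the arithmetic only and does not exercise the profiler itself. The final comment matters if the sum is accumulated in IEEE 754 double precision, as JavaScript numbers are.

```python
n = 10_000_000

formula_sum = n * (n + 1) // 2          # closed form for 1 + 2 + ... + n
streamed_sum = sum(range(1, n + 1))     # brute-force accumulation over every value
expected_mean = (1 + n) / 2

print(formula_sum, streamed_sum, expected_mean)

# Every partial sum is an integer below 2**53, so a float64 accumulator
# would also represent each intermediate result exactly.
fits_in_float64 = formula_sum < 2**53
print(fits_in_float64)
```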

Why Client-Side Processing Doesn't Compromise Speed

The profiler runs in a Web Worker, a background thread separate from the browser's UI thread. This means the page stays responsive while profiling runs. The Web Worker has direct access to the File API, reading data from disk without network round-trips.

The bottleneck is CPU and RAM, not network. On the benchmark hardware (i5-12600KF), the profiler saturates a single core for the streaming parse + analysis phase. Multi-core parallelization is possible in future versions, so the single-threaded benchmark numbers are a conservative baseline.

Benchmark Disclaimer: All benchmarks measured on the hardware above using an 11-column synthetic dataset with mixed types (integers, strings, dates, floats). Results will vary by hardware, browser, column count, column types, and file encoding. The 10M-row test used a 1.26 GB UTF-8 CSV with comma delimiter.

No signup. No upload. Files never leave your browser.

Benchmarks last updated: February 2026 · Chrome 131 · Windows 11 · Intel i5-12600KF · 16GB RAM · Data Profiler v1.0 · SBPP-2026 Protocol