Internally Verified Benchmarks

10 Million Rows Profiled in 107 Seconds

Verified on Chrome 131, Windows 11, 16GB RAM, Intel i7-12700K. All 11 analysis types running simultaneously. 100% mathematical accuracy confirmed.

Results vary by hardware, browser, and file complexity.

235K rows/sec · Peak Throughput (94K row file)
93K rows/sec · Steady-State Speed (5M–10M row files)
1.26 GB · Max File Tested (10M row file)
100% · Statistical Accuracy (verified vs NumPy/SciPy)
10 Million Rows
106.83 seconds · 150M cells · 11 analysis types · 100% accurate
Throughput: 93K rows/sec
File Size: 1.26 GB
Statistical Accuracy: 100%
vs Excel Row Limit: 9.5×

Performance Across Dataset Sizes

Chrome 131, Windows 11, 16GB RAM, Intel i7-12700K · Results vary by hardware, browser, and file complexity.

Dataset | Description | Time | Rows/sec | File Size | Status
94K rows | Small file baseline | 0.4s | 235K/s | ~9 MB | Tested
1.5M rows | Mid-scale dataset | 14.85s | 101K/s | ~145 MB | Tested
5M rows | Large-scale analysis | 53.45s | 94K/s | ~485 MB | Verified
10M rows | Maximum verified capacity | 106.83s | 94K/s | 1.26 GB | Verified

Scaling Visualized

Chrome 131, Windows 11, 16GB RAM, Intel i7-12700K. Throughput stabilizes after ~1.5M rows as parsing and analysis overhead converges.

Processing Time (seconds)

Elapsed time grows near-linearly with row count

[Chart: dataset size (94K–10M rows) vs. processing time, 0–120s]

Throughput (K rows/sec)

Stabilizes ~93–101K/s after initial parse overhead

[Chart: dataset size (94K–10M rows) vs. throughput, 0–260K rows/sec; steady-state ~93K/s]

When Performance Degrades

Our benchmarks use mixed-type files representing typical real-world exports. Specific data characteristics will push processing time above or below these numbers. Here's what to expect.

High Cardinality Columns · Moderate slowdown

Columns with millions of unique string values increase memory allocation for distinct-value counting. A 10M-row file where every row has a unique UUID in a string column will profile ~15% slower than a numeric-only file.

Wide Files (100+ Columns) · Significant slowdown

Each additional column adds type detection, statistics, and histogram passes. A 1M-row file with 200 columns will take roughly 3–4× longer than a 1M-row file with 15 columns. Column count matters more than row count at scale.

String-Heavy Datasets · Moderate slowdown

Numeric columns process faster than string columns because type inference and statistical calculations are cheaper. Free-text columns (product descriptions, notes fields) add overhead due to whitespace detection and cardinality analysis.

Available Browser Memory · Hard ceiling

Browser memory is the practical ceiling. The profiler streams data in chunks to stay within limits, but very large files on machines with <8GB RAM may trigger garbage collection pauses, increasing elapsed time by 20–40%.
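The bounded-memory approach can be illustrated with Welford's online algorithm: running count, mean, and variance are updated chunk by chunk in O(1) memory per column, so the whole file never needs to be resident at once. This is a sketch, not the profiler's actual implementation; `RunningStats` and `profileChunks` are hypothetical names.

```typescript
// Running statistics via Welford's algorithm: one pass, O(1) state per column,
// so chunked streaming keeps memory bounded regardless of file size.
class RunningStats {
  private n = 0;
  private mean = 0;
  private m2 = 0; // running sum of squared deviations from the mean

  push(x: number): void {
    this.n++;
    const delta = x - this.mean;
    this.mean += delta / this.n;
    this.m2 += delta * (x - this.mean);
  }

  get count(): number { return this.n; }
  get average(): number { return this.mean; }
  get variance(): number { return this.n > 1 ? this.m2 / (this.n - 1) : 0; }
  get stdDev(): number { return Math.sqrt(this.variance); }
}

// Feed values chunk by chunk, as a streaming parser would deliver them.
function profileChunks(chunks: number[][]): RunningStats {
  const stats = new RunningStats();
  for (const chunk of chunks) for (const v of chunk) stats.push(v);
  return stats;
}
```

Because only three numbers of state survive between chunks, garbage-collection pressure comes from chunk parsing rather than from the statistics themselves.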

Quote-Aware Parsing · Minor slowdown

Files with quoted fields containing embedded delimiters require RFC 4180 parsing (~15% slower than simple split). Most real-world CSVs export with some quoted fields, so our benchmark files include a representative mix.
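A minimal quote-aware field splitter, for illustration, looks like this. It is a sketch of RFC 4180 handling for a single record; `parseCsvRecord` is a hypothetical name, and a real parser must also handle newlines inside quoted fields, which is part of why it costs more than a simple split.

```typescript
// RFC 4180-style field splitting for one record: handles quoted fields,
// embedded delimiters, and escaped quotes ("" inside a quoted field).
function parseCsvRecord(line: string, delimiter = ","): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                          // closing quote
      } else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === delimiter) { fields.push(field); field = ""; }
    else field += ch;
  }
  fields.push(field);
  return fields;
}
```

The per-character branching above is what a naive `line.split(",")` avoids, and why quote-aware parsing carries a measurable cost.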

All 11 Analysis Types

Every type runs in a single pass — no re-scanning the file.

Automatic Type Detection
15 data types inferred automatically — numbers, dates (7 formats), strings, booleans.
90–100% confidence score per column. No schema definition required.
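One common way to implement this kind of inference, sketched here with hypothetical names and only four matchers (the profiler's full 15-type detector is not shown), is to test sampled values against candidate parsers and use the match rate as the confidence score:

```typescript
type Inferred = { type: string; confidence: number };

// Illustrative matchers only; a full detector would cover many more types
// and multiple date formats.
const MATCHERS: [string, (v: string) => boolean][] = [
  ["boolean", v => /^(true|false)$/i.test(v)],
  ["integer", v => /^-?\d+$/.test(v)],
  ["float",   v => /^-?\d+\.\d+$/.test(v)],
  ["date",    v => /^\d{4}-\d{2}-\d{2}$/.test(v)], // one of several formats
];

function inferColumnType(values: string[]): Inferred {
  let best: Inferred = { type: "string", confidence: 1 }; // string always matches
  for (const [type, matches] of MATCHERS) {
    const hits = values.filter(matches).length / values.length;
    // Prefer a specific type only when nearly all sampled values match it.
    if (hits >= 0.9 && (best.type === "string" || hits > best.confidence)) {
      best = { type, confidence: hits };
    }
  }
  return best;
}
```

A 90% match threshold lets a mostly-numeric column with a few stray values still be detected as numeric, which is where sub-100% confidence scores come from.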
Descriptive Statistics
Mean, median, mode, Q1/Q3, IQR, variance, and standard deviation.
Full-precision arithmetic verified against NumPy/SciPy. No floating-point approximations.
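For illustration, these statistics can be computed as below. This is a sketch, not the profiler's API; it uses linear interpolation between closest ranks for quantiles, which matches NumPy's default method.

```typescript
// Summary statistics on a sorted copy of the column's numeric values.
function describe(values: number[]) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const quantile = (q: number): number => {
    // Linear interpolation between closest ranks (NumPy's default).
    const pos = (n - 1) * q;
    const lo = Math.floor(pos), hi = Math.ceil(pos);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  };
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const variance = sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1);
  const q1 = quantile(0.25), q3 = quantile(0.75);
  return {
    mean,
    median: quantile(0.5),
    q1,
    q3,
    iqr: q3 - q1,
    variance,                      // sample variance (n - 1 denominator)
    stdDev: Math.sqrt(variance),
  };
}
```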
Value Histograms
Real value distributions from actual data — not synthetic estimates.
Shows your data's true shape. Calculated from 100% of values on files under the sample threshold.
Quality Issue Detection
Six automated checks: high nulls, outliers (IQR method), whitespace, duplicates, constant columns, near-empty columns.
Each issue type is reported with counts, percentages, and sample values for review.
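The IQR outlier check flags values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. A minimal sketch (hypothetical `iqrOutliers` helper, using the same interpolated-quantile convention as NumPy's default):

```typescript
// Flag values outside the standard 1.5 * IQR fences.
function iqrOutliers(values: number[]): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const q = (p: number): number => {
    const pos = (sorted.length - 1) * p;
    const lo = Math.floor(pos);
    return sorted[lo] + (sorted[Math.ceil(pos)] - sorted[lo]) * (pos - lo);
  };
  const q1 = q(0.25), q3 = q(0.75), iqr = q3 - q1;
  const lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr;
  return values.filter(v => v < lower || v > upper);
}
```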
Cardinality Analysis
Distinct value counts, uniqueness percentages, and primary key identification.
Columns at 95%+ uniqueness are flagged as potential primary keys. Calculated in a single streaming pass.
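A single-pass uniqueness check can be sketched with a Set (hypothetical `cardinality` helper; a real streaming implementation on very large columns might use approximate distinct counting instead):

```typescript
// Distinct count, uniqueness ratio, and the 95% primary-key flag.
function cardinality(values: string[]) {
  const distinct = new Set(values).size;
  const uniqueness = distinct / values.length;
  return { distinct, uniqueness, likelyPrimaryKey: uniqueness >= 0.95 };
}
```

The threshold is below 100% deliberately, so ID columns with a handful of duplicated or null rows are still surfaced for review.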
Top Values Frequency
Most common values with counts and percentages across every column.
Top 10 values by frequency. Useful for spotting dominant categories, encoding issues, or suspicious uniformity.
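Frequency counting is a single Map pass followed by a sort (hypothetical `topValues` helper):

```typescript
// Count every value once, then keep the most frequent entries.
function topValues(values: string[], limit = 10) {
  const counts = new Map<string, number>();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([value, count]) => ({ value, count, pct: (100 * count) / values.length }));
}
```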
Cross-Column Insights
Three relationship checks: duplicate row detection, correlated null patterns, candidate foreign keys.
Duplicate detection uses FNV-1a hashing. Foreign key candidates identified by value overlap percentage.
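FNV-1a is a tiny XOR-then-multiply hash. A sketch of hash-based duplicate detection follows (helper names are hypothetical, and a production version would confirm hash matches against the actual rows to rule out collisions):

```typescript
// FNV-1a 32-bit hash over a string.
function fnv1a(s: string): number {
  let hash = 0x811c9dc5;                // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    hash ^= s.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0;
}

// Hash each serialized row; report indices whose hash was seen before.
function duplicateRows(rows: string[][]): number[] {
  const seen = new Map<number, number>(); // hash -> first row index
  const dups: number[] = [];
  rows.forEach((row, i) => {
    const h = fnv1a(row.join("\u0000")); // NUL delimiter, unlikely in data
    if (seen.has(h)) dups.push(i);
    else seen.set(h, i);
  });
  return dups;
}
```

Hashing keeps memory at one 32-bit value per distinct row instead of storing full row contents.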
Pearson Correlation
Pearson r coefficients across all numeric column pairs.
Classified by strength: weak (<0.5), moderate (0.5–0.7), strong (>0.7). Top 10 significant pairs shown.
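Pearson r and the strength buckets above can be sketched as follows (helper names are hypothetical):

```typescript
// Pearson correlation coefficient for two equal-length numeric columns.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) ** 2;
    dy += (y[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// Strength buckets as used in the report.
function strength(r: number): string {
  const a = Math.abs(r);
  return a > 0.7 ? "strong" : a >= 0.5 ? "moderate" : "weak";
}
```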
ML Anomaly Detection
Isolation Forest algorithm identifies statistically anomalous rows across all numeric columns.
Anomaly score 0–1. Moderate >0.65, High >0.7, Critical >0.8. No labelled training data required.
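A rough, deterministic sketch of Isolation Forest scoring follows, simplified from Liu et al.'s algorithm: random subsamples, random axis-aligned splits, and score 2^(−E[h(x)]/c(n)). All names are hypothetical, the seeded LCG exists only for reproducibility, and sampling is with replacement for simplicity.

```typescript
// Deterministic LCG so tree building is reproducible in this sketch.
function lcg(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
}

type Tree = { split?: number; attr?: number; left?: Tree; right?: Tree; size: number };

function buildTree(data: number[][], rand: () => number, depth: number, maxDepth: number): Tree {
  if (depth >= maxDepth || data.length <= 1) return { size: data.length };
  const attr = Math.floor(rand() * data[0].length);
  const vals = data.map(r => r[attr]);
  const min = Math.min(...vals), max = Math.max(...vals);
  if (min === max) return { size: data.length };
  const split = min + rand() * (max - min); // random axis-aligned cut
  return {
    split, attr,
    left: buildTree(data.filter(r => r[attr] < split), rand, depth + 1, maxDepth),
    right: buildTree(data.filter(r => r[attr] >= split), rand, depth + 1, maxDepth),
    size: data.length,
  };
}

// c(n): average path length of an unsuccessful BST search; normalizes scores.
function avgPath(n: number): number {
  if (n <= 1) return 0;
  return 2 * (Math.log(n - 1) + 0.5772156649) - (2 * (n - 1)) / n;
}

function pathLength(row: number[], tree: Tree, depth = 0): number {
  if (tree.split === undefined) return depth + avgPath(tree.size);
  return pathLength(row, row[tree.attr!] < tree.split ? tree.left! : tree.right!, depth + 1);
}

// Score each row: outliers isolate in few splits, so their paths are short
// and their scores approach 1.
function anomalyScores(data: number[][], nTrees = 100, sampleSize = 64): number[] {
  const rand = lcg(42);
  const m = Math.min(sampleSize, data.length);
  const trees: Tree[] = [];
  for (let t = 0; t < nTrees; t++) {
    const sample: number[][] = [];
    for (let i = 0; i < m; i++) sample.push(data[Math.floor(rand() * data.length)]);
    trees.push(buildTree(sample, rand, 0, Math.ceil(Math.log2(sampleSize))));
  }
  const c = avgPath(m);
  return data.map(row => {
    const meanPath = trees.reduce((s, tr) => s + pathLength(row, tr), 0) / trees.length;
    return Math.pow(2, -meanPath / c); // in (0, 1]; higher = more anomalous
  });
}
```

No labels are needed because isolation depth alone separates rare points from dense clusters.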
Time Series Patterns
Trend detection (increasing / decreasing / stable), data frequency, and gap analysis for date columns.
Auto-detects daily, weekly, monthly patterns. Reports gaps with start/end dates and day counts.
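Gap detection for a daily date column can be sketched as below (hypothetical `findDailyGaps` helper, assuming ISO `YYYY-MM-DD` strings, which `Date.parse` treats as UTC midnight):

```typescript
type Gap = { start: string; end: string; missingDays: number };

// Sort parsed timestamps, then report any stretch where consecutive
// dates are more than one day apart.
function findDailyGaps(isoDates: string[]): Gap[] {
  const DAY = 86_400_000; // ms per day
  const days = isoDates.map(d => Date.parse(d)).sort((a, b) => a - b);
  const gaps: Gap[] = [];
  for (let i = 1; i < days.length; i++) {
    const diff = Math.round((days[i] - days[i - 1]) / DAY);
    if (diff > 1) {
      gaps.push({
        start: new Date(days[i - 1] + DAY).toISOString().slice(0, 10),
        end: new Date(days[i] - DAY).toISOString().slice(0, 10),
        missingDays: diff - 1,
      });
    }
  }
  return gaps;
}
```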
Client-Side Architecture
All 11 analysis types run in Web Workers inside your browser. Zero server communication.
Designed to avoid server transmission of PHI or PII. No network requests made during file processing. Compliance with your organization's specific data handling policies remains your responsibility.

Mathematically Verified Accuracy

Statistics were verified against closed-form formulas. For a sequential ID column (1 to 10,000,000), the correct sum is n(n+1)/2. The profiler produces the exact value — no rounding, no approximation.

Expected Sum, n(n+1)/2: 50,000,005,000,000
Profiler Sum: 50,000,005,000,000
Expected Mean: 5,000,000.5
Profiler Mean: 5,000,000.5
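This check is straightforward to reproduce yourself: sum 1 through 10,000,000 by iteration and compare with the closed form. Every intermediate value stays below 2^53, so IEEE-754 doubles represent each result exactly and the comparison can use strict equality.

```typescript
// Sum 1..n iteratively and compare against the closed form n(n+1)/2.
// All values stay under 2^53, so float64 arithmetic here is exact.
const n = 10_000_000;
let sum = 0;
for (let i = 1; i <= n; i++) sum += i;
const closedForm = (n * (n + 1)) / 2;
const mean = sum / n;
```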

Why Client-Side Processing Matters for Performance

Cloud profiling tools are bottlenecked by upload time before processing even starts. A 500MB file at 100 Mbps takes 40 seconds to upload — before analysis begins. SplitForge reads directly from your local disk via the browser FileReader API. No network round-trip. The clock starts immediately.

Web Workers offload all analysis to a background thread, keeping the UI responsive during long profiling runs. The streaming architecture processes data in chunks — memory usage stays bounded regardless of file size, which is why 1.26 GB files complete without crashing the tab.

Benchmark methodology: All tests run on Chrome 131 (stable), Windows 11, Intel i7-12700K, 16GB DDR4 RAM. Files contained mixed data types (integers, floats, strings, dates) to reflect real-world conditions. Each benchmark was run three times; times reported are the median. Results vary by hardware, browser version, available RAM, and file data complexity. Your results may differ.
Try Data Profiler Now

Free · No account required · Designed to avoid server transmission of PHI · Files never leave your browser

Benchmarks run February 2026 · SplitForge v2.1 · Chrome 131, Windows 11, Intel i7-12700K, 16GB DDR4 RAM
Median of 3 runs per dataset size · Mixed data types (integers, floats, strings, dates)
Next scheduled re-benchmark: May 15, 2026