No Upload Required: Files Stay in Your Browser

Understand Any CSV File
In Under 2 Minutes

11 analysis types. Type detection, statistics, ML anomaly detection, correlation analysis, time series patterns: all run entirely in your browser on files up to 10M rows. No uploads. HIPAA/GDPR-safe by architecture.

PHI Never Leaves Your Browser
Up to 93K Rows/Sec
Exact Statistics, No Sampling
Export JSON + CSV Reports
Profile My CSV Free

No signup. No upload. Works on files up to 10M rows.

You opened the file. Now what?

What everyone else recommends:

  • Excel: caps out at 1,048,576 rows, no ML anomaly detection, and sampling a large file to fit distorts statistics
  • Google Sheets: 10M cell limit, no statistical analysis, no type detection
  • Python: requires setup, pandas expertise, writing EDA scripts per file
  • Cloud BI tools: upload your sensitive data to a server, monthly fees, overkill for quick EDA
  • Data consultants: $150–$300/hr to tell you what's in your own file

What SplitForge Data Profiler does:

  • Drop your CSV. Nothing is uploaded.
  • Automatic type detection on every column: 15 types including email, phone, date formats, currency.
  • Descriptive statistics: mean, median, quartiles, standard deviation, IQR.
  • Quality issues flagged: nulls, outliers, duplicates, whitespace, constant columns.
  • Cross-column insights, Pearson correlations, ML anomaly detection, time series patterns.

TL;DR: what you get in 30 seconds

  • Complete column inventory: types, null rates, unique counts, value ranges
  • Statistical summary: mean, median, std deviation, quartiles for every numeric column
  • Quality report: outliers (IQR method), duplicates, whitespace issues, constant columns
  • Cross-column patterns: correlated nulls, candidate foreign keys, duplicate rows

Built for the data analyst who needs answers, not another tool to configure. Every analysis runs in a Web Worker so the UI never freezes. The 10M-row benchmark (106 seconds) was measured on real hardware, not projected theoretically. Histograms use actual values, not sampled approximations.

SplitForge Engineering, February 2026

What is data profiling, exactly?

Data profiling is the process of examining a dataset to understand its structure, content, and quality before you use it. It answers: What columns exist? What types are they? How clean is the data? Are there patterns, anomalies, or relationships worth knowing about?

Practical example: You receive a 500K-row CRM export before a Salesforce import. Without profiling: you find out halfway through the import that 8% of email addresses are malformed, 3% of records have duplicate emails, and the phone column has 12 different formats. With profiling: you know all of this in 30 seconds and fix it before the first upload attempt.

Calculate Time Saved Per Month

Based on internal workflow analysis. Your mileage will vary.

Manual Time / Month: 3.0h
Hours Saved / Month: 3.0h
Value Saved / Month: $163

Assumes 30s per file with Data Profiler (tested on 500K-row files). Manual baseline: ~45 min/file in Excel or Python setup.
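
The arithmetic behind the estimate can be sketched as follows. The per-file times come from the note above; the file count of 4 per month is an assumption inferred from the 3.0h manual figure (3.0h divided by 45 minutes), and the function name is hypothetical. The dollar value is left out because the hourly rate behind it is not stated.

```python
def hours_saved_per_month(files_per_month, manual_minutes=45.0, profiler_seconds=30.0):
    """Return (manual_hours, hours_saved) for a month of profiling work.

    manual_minutes and profiler_seconds default to the figures quoted above;
    files_per_month is whatever your own workload looks like.
    """
    manual_hours = files_per_month * manual_minutes / 60.0
    profiler_hours = files_per_month * profiler_seconds / 3600.0
    return manual_hours, manual_hours - profiler_hours


# Assumed workload: 4 files per month.
manual, saved = hours_saved_per_month(4)
print(f"manual: {manual:.1f}h, saved: {saved:.1f}h")
```

With 4 files the profiler itself accounts for only 2 minutes, which is why manual time and hours saved both display as 3.0h after rounding.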

How to Profile a Large CSV File

Step-by-step for first-time users and experienced analysts alike.

01. Open the Data Profiler tool

No signup required. The tool loads entirely in your browser; the profiling engine is a Web Worker that runs locally.

02. Drop or select your CSV file

Drag and drop your file or click to browse. The file never leaves your machine; it is read directly from disk into browser memory.

03. Confirm delimiter detection

The profiler auto-detects comma, tab, semicolon, and pipe delimiters. Review the preview to confirm columns parsed correctly.
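
For readers curious how delimiter auto-detection can work, here is a minimal sketch. It is not SplitForge's implementation: the function name and the heuristic (pick the candidate that yields the most columns while every sampled line has the same column count) are assumptions chosen for illustration.

```python
import csv

# The four delimiters named above.
CANDIDATES = [",", "\t", ";", "|"]


def detect_delimiter(sample: str, max_lines: int = 20) -> str:
    """Guess the delimiter from the first few non-empty lines of a file.

    A candidate wins when it splits every sampled line into the same
    number of columns and that number is the largest seen so far.
    Falls back to a comma when nothing splits the lines consistently.
    """
    lines = [ln for ln in sample.splitlines() if ln.strip()][:max_lines]
    best, best_cols = ",", 1
    for delim in CANDIDATES:
        # csv.reader handles quoted fields that contain the delimiter.
        rows = list(csv.reader(lines, delimiter=delim))
        widths = {len(row) for row in rows}
        if len(widths) == 1:
            cols = widths.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best
```

A heuristic like this can still guess wrong on unusual files, which is why reviewing the preview remains worthwhile.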

Learn about CSV delimiter formats

04. Click 'Profile' and wait

Processing runs in the background. A 500K-row file completes in about 30 seconds. A 10M-row file takes under 2 minutes. The progress bar updates in real time.

05. Review the full analysis report

Explore type detection, statistics, quality issues, cross-column insights, correlations, anomalies, and time series results. Use Data Cleaner to fix quality issues identified in the report.

06. Export your results

Download the full profile as JSON (machine-readable, includes all statistics) or as a CSV summary (human-readable, import into Excel or Google Sheets).

11 Analysis Types Included

Every analysis runs client-side in a Web Worker. No sampling on files under 10M rows.

Automatic Type Detection

15 types

Classifies each column as one of 15 data types, including email, phone, URL, date (multiple formats), integer, float, boolean, currency, NPI, ICD-10, CPT, SSN, and more. Each detection includes a confidence score.
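
One common way to produce a type with a confidence score is to test every value against a set of patterns and report the share that match. The sketch below illustrates that idea only: the five patterns, the 0.5 fallback threshold, and the function name are assumptions, not the profiler's actual rules or its full list of 15 types.

```python
import re

# Illustrative patterns only; a real profiler would cover many more types.
PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "float": re.compile(r"[+-]?\d*\.\d+"),
    "boolean": re.compile(r"true|false|yes|no", re.IGNORECASE),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "date_iso": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def infer_type(values, min_conf=0.5):
    """Return (type_name, confidence) for one column of raw strings.

    Confidence is the fraction of non-empty values that fully match the
    winning pattern. Below min_conf the column is reported as "string",
    with the share of values that did not match the best pattern.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return ("empty", 0.0)
    best_type, best_conf = "string", 0.0
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        conf = hits / len(non_empty)
        if conf > best_conf:
            best_type, best_conf = name, conf
    if best_conf < min_conf:
        return ("string", 1.0 - best_conf)
    return (best_type, best_conf)
```

A column where three of four values look like email addresses would come back as email with a confidence of 0.75, which points straight at the malformed row.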

Descriptive Statistics

100% accurate

Mean, median, mode, min, max, range, standard deviation, variance, quartiles (Q1/Q3), and IQR, all calculated on the full dataset without sampling.

Value Histograms

No sampling

Real value distributions for numeric and categorical columns. No approximation: histograms reflect actual data values, not a sampled subset.

Quality Issue Detection

5 issue types

Null rates, outlier detection (IQR method: values > Q3 + 1.5×IQR or < Q1 − 1.5×IQR), leading/trailing whitespace, duplicate values, and constant columns (all same value).
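
The IQR rule above is short enough to sketch directly. This version uses Python's `statistics.quantiles` with the "inclusive" method; which quartile interpolation the profiler uses is not stated, so treat that choice (and the function name) as an assumption. Different quartile methods can move the fences slightly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 matches the rule quoted above. Needs at least two values.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

For the column `[10, 12, 11, 13, 12, 11, 10, 95]` the quartiles are 10.75 and 12.25, the fences are 8.5 and 14.5, and only 95 is flagged.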

Cardinality & Uniqueness

Primary key ID

Unique value count and uniqueness ratio per column. Automatically identifies candidate primary keys (100% unique, no nulls) and low-cardinality columns suitable for enums or dropdowns.
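
The candidate-primary-key rule (100% unique, no nulls) can be expressed in a few lines. This is a minimal sketch: the function name, the returned field names, and the set of strings treated as null are illustrative assumptions.

```python
def column_cardinality(values, null_tokens=("", "NULL", "null", "NA")):
    """Summarize uniqueness for one column.

    A column is a candidate primary key when it has no nulls and every
    value is distinct. null_tokens is an assumed list of null spellings.
    """
    total = len(values)
    nulls = sum(1 for v in values if v is None or v in null_tokens)
    distinct = {v for v in values if v is not None and v not in null_tokens}
    return {
        "unique_count": len(distinct),
        "uniqueness_ratio": len(distinct) / total if total else 0.0,
        "candidate_primary_key": total > 0 and nulls == 0 and len(distinct) == total,
    }
```

A low `uniqueness_ratio` on a string column is the signal for the enum or dropdown case mentioned above.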

Top Values Frequency

Top 20 values

Most frequent values for categorical columns with occurrence counts and percentages. Identifies dominant values that may indicate data entry patterns or enum validation candidates.
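
A top-values table is essentially a frequency count. As a rough sketch using the standard library (the function name is hypothetical; the default of 20 mirrors the figure above):

```python
from collections import Counter


def top_values(values, n=20):
    """Return (value, count, percent) for the n most frequent values."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common(n)
    ]
```

For example, `top_values(["US", "US", "CA", "US"], n=2)` gives `[("US", 3, 75.0), ("CA", 1, 25.0)]`.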

Cross-Column Insights

3 insight types

Detects duplicate rows across the full dataset, correlated null patterns (columns that are null together more than chance), and candidate foreign key relationships between columns.
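
Two of these insights can be sketched briefly. For correlated nulls, one reasonable reading of "more than chance" is lift: the observed rate at which two columns are null together, divided by the rate expected if their nulls were independent. That definition, the `min_lift` threshold of 2.0, and the function names are assumptions for illustration, not the profiler's documented method.

```python
from collections import Counter
from itertools import combinations


def duplicate_row_count(rows):
    """Count rows that repeat an earlier row exactly."""
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values() if c > 1)


def correlated_nulls(rows, header, min_lift=2.0):
    """Return (col_a, col_b, lift) for column pairs that are null together
    at least min_lift times more often than independence would predict."""
    n = len(rows)
    if n == 0:
        return []
    flags = {
        name: [row[i] in ("", None) for row in rows]
        for i, name in enumerate(header)
    }
    findings = []
    for a, b in combinations(header, 2):
        p_a = sum(flags[a]) / n
        p_b = sum(flags[b]) / n
        p_both = sum(x and y for x, y in zip(flags[a], flags[b])) / n
        expected = p_a * p_b
        if expected > 0 and p_both / expected >= min_lift:
            findings.append((a, b, p_both / expected))
    return findings
```

A pair such as phone and fax that are always blank on the same records will show a high lift, which usually means the two fields come from the same optional form section or source system.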

Pearson Correlation Analysis

Full matrix

Correlation matrix for all numeric column pairs. Flags strong correlations (|r| > 0.7) and negative correlations. Useful for identifying redundant features before ML model training.
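
As an illustration of what the matrix contains, the sketch below computes Pearson's r for every pair of numeric columns and keeps the pairs above the 0.7 threshold quoted above. The function names and the dict-of-lists input shape are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Returns None when either column is constant (r is undefined).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return None
    return cov / (spread_x * spread_y)


def strong_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| above the threshold.

    columns maps a column name to its list of numeric values.
    """
    names = list(columns)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if r is not None and abs(r) > threshold:
                found.append((a, b, r))
    return found
```

Two columns that move in lockstep, such as a price and the same price with tax applied, come back with r close to 1 and are candidates for dropping one before model training.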

ML Anomaly Detection

Isolation Forest

Isolation Forest algorithm applied to numeric columns. Flags statistical outliers that aren't caught by simple IQR rules, such as multivariate anomalies visible only when multiple columns are considered together.
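
To show the idea behind Isolation Forest, here is a simplified pure-Python sketch. It is not the profiler's implementation, and the parameter defaults (100 trees, subsample of 256, fixed seed) are assumptions borrowed from common practice. Each tree splits the data on a random column at a random value; points that get separated after very few splits are likely anomalies.

```python
import math
import random


def _avg_path(n):
    """Average path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Recursively split points on a random column at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification: stop if the chosen column is constant here.
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(point, tree, depth=0):
    """Depth at which the point lands, plus an estimate for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + _avg_path(tree[1])
    _, dim, split, left, right = tree
    branch = left if point[dim] < split else right
    return _path_length(point, branch, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one anomaly score per point in (0, 1); higher is more anomalous."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size)) if size > 1 else 0
    trees = [_build(rng.sample(points, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = _avg_path(size)
    scores = []
    for p in points:
        avg = sum(_path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-avg / norm) if norm > 0 else 0.5)
    return scores
```

Because the split column is chosen at random on every step, a row can be isolated quickly by an unusual combination of values even when each value on its own sits inside the IQR fences. In Python, scikit-learn's `IsolationForest` is the usual production choice.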

Time Series Pattern Detection

Gap detection

Detects date/timestamp columns and analyzes temporal patterns: date range, frequency (daily/weekly/monthly), gaps in the time series, and the most recent data point.
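
Gap detection can be sketched by finding the most common interval between consecutive dates and flagging anything longer. The function name and the returned field names are hypothetical, and treating the most common interval as the expected frequency is an assumption about how such a check might work.

```python
from collections import Counter
from datetime import date


def find_gaps(dates):
    """Summarize a date column: range, typical step, and gaps.

    The typical step is the most common difference between consecutive
    distinct dates; any longer jump is reported as a (before, after) gap.
    Returns None when there are fewer than two distinct dates.
    """
    ordered = sorted(set(dates))
    if len(ordered) < 2:
        return None
    pairs = list(zip(ordered, ordered[1:]))
    step = Counter(b - a for a, b in pairs).most_common(1)[0][0]
    return {
        "start": ordered[0],
        "end": ordered[-1],
        "step_days": step.days,
        "gaps": [(a, b) for a, b in pairs if b - a > step],
    }


# Example: a daily series with January 4 and 5 missing.
series = [date(2024, 1, d) for d in (1, 2, 3, 6, 7)]
report = find_gaps(series)
```

Here `report["step_days"]` is 1 and `report["gaps"]` holds the single jump from January 3 to January 6.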

100% Client-Side Processing

Zero uploads

All 11 analysis types run in a Web Worker in your browser. Files are read directly from disk, with no server transmission and no upload. PHI and PII stay local.

Data Profiler vs Excel, Python, Cloud BI

Honest comparison, not a cherry-picked feature matrix.

| Feature | Excel | Python / pandas | Cloud BI | SplitForge |
|---|---|---|---|---|
| File Size Limit | Hard limit: 1,048,576 rows | Limited by RAM (typically 8GB+) | Varies; upload limits common | Up to ~2GB / 10M rows |
| Processing Speed | Slow: formula recalc on large files | Fast but requires code | Varies by upload speed + queue | 93K rows/sec (10M rows tested) |
| Analysis Depth | Manual only: PivotTables, COUNTIF | Full EDA possible; requires coding | Varies; often limited to dashboards | 11 types: stats + ML + time series |
| Statistical Accuracy | Accurate but manual; no IQR outlier detection | Accurate: pandas exact calculations | Often sampled for large datasets | 100% accurate; no sampling |
| Data Privacy | Local; no upload required | Local; runs on your machine | Uploads your data to a server | Files never uploaded; browser only |
| Setup Time | Low: open file, build manually | High: install, write EDA script per file | High: account setup, connector config | Zero: open browser, drop file |
| ML Anomaly Detection | Not available | Available via scikit-learn; requires coding | Available in some tools; usually paid | Isolation Forest, built in |
| Cost | Microsoft 365 subscription required | Free (open source) | $50–$500+/month typical | Free |
| Best For | Small files, existing workflow | Programmers, reproducible pipelines | Ongoing BI dashboards | Instant EDA on any file, any size |

Why SplitForge Instead of the Alternatives

Not a knock on the other tools; they're good at different things.

Excel / Google Sheets

Great for small files and manual exploration. COUNTIF, PivotTables, and conditional formatting are all useful. But there is no IQR outlier detection, no ML anomaly detection, and no correlation matrix, and it locks up or refuses to open files over 1M rows.

Use for small files + manual EDA

Python / pandas

The most powerful option if you know how to use it. But writing a full EDA script per file (imports, dtype detection, null analysis, histogram generation, IQR outliers, Isolation Forest, correlation matrix) takes 30–60 minutes even for experienced users.

Use for reproducible pipelines + programmers

Cloud BI Tools (Tableau, PowerBI, Looker)

Purpose-built for ongoing dashboards and business intelligence. But they require uploading your data to a server, setting up connectors, and a monthly subscription. Severe overkill for ad-hoc CSV exploration.

Use for ongoing dashboards + BI teams

SplitForge Data Profiler

Zero setup. Drop a file, get 11 analysis types in under 2 minutes, no uploads. The sweet spot: files too large for Excel, situations where uploading to a SaaS is not acceptable, and anyone who doesn't want to write EDA scripts from scratch.

Best for: instant EDA, large files, privacy-sensitive data

When to Use Data Profiler

Pre-import validation

Profile your export before uploading to Salesforce, HubSpot, or any CRM. Find null rates, format issues, and duplicate keys before they cause import failures.

HIPAA-sensitive data analysis

Patient data, PHI, PII: none of it leaves your browser. Designed to avoid server transmission of protected health information.

Quick exploratory data analysis

Skip the pandas setup. Get mean, median, quartiles, histograms, correlations, and anomaly flags without writing a single line of code.

Data quality audit

Identify null columns, constant columns, whitespace issues, outliers, and duplicate rows before presenting data to stakeholders.

Ready to run all 11 analyses on your file? It takes under 2 minutes.

Profile My CSV Free

Edge Cases and How They're Handled

Honest documentation of known tricky inputs.

Mixed date formats in a single column

Columns with extreme outliers skewing statistics

High-cardinality string columns (UUID/hash fields)

Near-empty columns (95%+ null)

Constant columns (all same value)

Correlated null patterns

Perfect For

  • Data analysts profiling CRM exports before import
  • Healthcare teams with PHI/PII that must stay local
  • Anyone needing quick EDA without Python setup
  • Large files that crash Excel (1M+ rows)
  • Detecting outliers and anomalies before ML model training
  • Auditing data quality before stakeholder presentations
  • Understanding a new dataset's structure in under 2 minutes
  • Finding candidate primary keys and foreign key relationships
  • Time series gap analysis on date/timestamp columns
  • One-off file analysis where scripting is overkill

Not For

  • Files over ~2GB / 15M rows (browser memory limits)
  • Automated, scheduled, or pipeline-based profiling (no API)
  • Collaborative team environments needing shared reports
  • Real-time streaming data (batch file only)
  • Production-grade data quality monitoring
  • Non-CSV/Excel formats (JSON, Parquet, Avro, databases)
  • Advanced ML feature engineering (use Python scikit-learn)

For these scenarios, consider Python + pandas, Great Expectations, or dbt schema tests.

Performance Benchmarks

Chrome 131 · Windows 11 · 16GB RAM · Intel i5-12600KF

View full benchmark methodology

| Rows | Time | Throughput |
|---|---|---|
| 94K | 0.4s | 235K/s |
| 1.5M | 14.85s | 101K/s |
| 5M | 53.45s | 93K/s |
| 10M | 106.83s | 93K/s |

Benchmark conditions: Chrome 131, Windows 11, Intel i5-12600KF (3.70GHz), 64GB RAM. 11-column synthetic dataset with mixed types. Results vary by hardware, browser, column count, and file complexity.
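
The throughput figures follow directly from rows divided by seconds, which makes them easy to check. The snippet below is only a sanity check on the published numbers; the constant and function names are illustrative, and the row counts are the rounded values shown above.

```python
# (rows, seconds) pairs taken from the benchmark figures above.
BENCHMARKS = [
    (94_000, 0.4),
    (1_500_000, 14.85),
    (5_000_000, 53.45),
    (10_000_000, 106.83),
]


def rows_per_second(rows, seconds):
    """Throughput as plain rows divided by elapsed seconds."""
    return rows / seconds


for rows, seconds in BENCHMARKS:
    rate = rows_per_second(rows, seconds)
    print(f"{rows:>10,} rows in {seconds:>7.2f}s = {rate:,.0f} rows/sec")
```

The two largest runs both work out to roughly 93,500 rows/sec, so the published 93K figure is a truncation rather than a rounding. Throughput is highest on the smallest file and levels off from 5M rows upward.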

Frequently Asked Questions

How fast is the Data Profiler on large CSV files?

What types of analysis does the Data Profiler perform?

Is my data safe when using the Data Profiler?

What's the maximum file size / row count?

Are the statistics exact or approximated?

How does the Isolation Forest anomaly detection work?

What happens with mixed-type columns (numbers stored as text)?

What CSV delimiters are supported?

What export formats are available?

Ready to Profile Your CSV?

11 analysis types: type detection, stats, ML anomaly detection, correlations, time series
Tested to 10M rows: 106 seconds on the benchmark hardware
Files never uploaded: browser-only, PHI-safe
Export results as JSON or CSV
Profile My CSV Free