Benchmark Performance
Performance at Scale
Chrome (stable) · Windows 11 · Intel i7-12700K · 32GB RAM · February 2026
| Dataset Size | Simple Validation | Full Schema (5 rules + unique) | Test Notes |
|---|---|---|---|
| 1K rows | ~15K rows/sec | ~12K rows/sec | Startup overhead dominates at small sizes |
| 10K rows | ~120K rows/sec | ~90K rows/sec | Worker initialization amortizing |
| 100K rows | ~430K rows/sec | ~280K rows/sec | Typical CRM export batch |
| 1M rows | ~490K rows/sec | ~350K rows/sec | Salesforce Contacts import — 5 rules |
| 5M rows | ~495K rows/sec | ~310K rows/sec | Uniqueness hash table scaling |
| 10M rows | ~500K rows/sec | ~270K rows/sec | Verified benchmark — 37s full schema |
| ~2GB file | ~490K rows/sec | ~240K rows/sec | Near browser memory ceiling — results vary |
Results vary by hardware, browser, rule count, and file complexity. Simple validation = email format + required check (2 rules, no uniqueness). Full schema = 5 rules including one uniqueness check across all rows.
Feature Performance Overhead
When Data Validator Is Slower Than Expected
Conditions that can reduce throughput below the published benchmarks
The hash table for uniqueness checking grows with each unique value. At 10M rows, it can reach ~1.2GB peak allocation. On an 8GB machine with OS + browser overhead (~4–5GB used), the hash table competes for available RAM and triggers garbage collection cycles, significantly slowing throughput.
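The memory growth is inherent to how hash-based uniqueness works: every distinct value must be retained to compare against later rows. A minimal sketch (illustrative only, not the validator's actual code) using a JavaScript `Set`:

```javascript
// Illustrative sketch: Set-based uniqueness checking.
// Memory grows with the number of *unique* values retained, which is
// why a mostly-unique 10M-row email column can drive peak allocation
// toward the gigabyte range.
function findDuplicates(values) {
  const seen = new Set();
  const duplicates = [];
  for (const value of values) {
    if (seen.has(value)) {
      duplicates.push(value); // already retained -> duplicate
    } else {
      seen.add(value); // every distinct value stays in memory
    }
  }
  return duplicates;
}
```

A column with many repeated values keeps the `Set` small; a near-unique column forces it to hold almost every row's value.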
RegExp.test() cost scales with pattern complexity. Simple patterns like /^\d{10}$/ are fast. Complex lookaheads, alternation, or backtracking patterns (e.g., /^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9...]+)*|"...")$/) can be 5–10x slower per row than basic data type checks.
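One mitigation is compiling patterns once, outside the row loop, and preferring patterns without heavy backtracking. The patterns below are hypothetical examples, not the validator's built-in rules:

```javascript
// Compile patterns once, outside the row loop — constructing a new
// RegExp per row adds avoidable overhead on multi-million-row files.
const SIMPLE_PHONE = /^\d{10}$/;                   // cheap: fixed-length digit run
const LOOSE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;  // cheap: no nested quantifiers

function validateRow(row) {
  return {
    phoneOk: SIMPLE_PHONE.test(row.phone),
    emailOk: LOOSE_EMAIL.test(row.email),
  };
}
```

The loose email pattern trades RFC-level strictness for predictable per-row cost; the fully RFC-compliant alternation pattern quoted above is where the 5–10x slowdown comes from.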
Each uniqueness check builds its own hash table independently. Two uniqueness rules (e.g., Email unique + Account ID unique) means two full-file passes building two separate hash sets. Three uniqueness columns at 10M rows can add 20–30 seconds on top of base validation time.
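The per-rule cost can be sketched as one hash set per unique column. This sketch checks all columns in a single pass rather than one pass per rule as described above, but the memory cost is the same: one full set of distinct values held per rule.

```javascript
// Illustrative sketch: one Set per uniqueness rule.
// Two unique columns (e.g. Email + Account ID) means two full sets
// of distinct values held in memory simultaneously.
function checkUnique(rows, columns) {
  const sets = new Map(columns.map((col) => [col, new Set()]));
  const violations = [];
  rows.forEach((row, i) => {
    for (const col of columns) {
      const set = sets.get(col);
      const value = row[col];
      if (set.has(value)) {
        violations.push({ row: i, col, value });
      } else {
        set.add(value);
      }
    }
  });
  return violations;
}
```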
Excel's binary XLSX format requires an additional parsing step via SheetJS before validation can begin. A 10M row Excel file adds 6–8 seconds of XLSX parsing on top of validation time. CSV files with UTF-8 encoding and comma delimiters give the best performance.
Safari's JavaScriptCore engine is generally slower than Chrome's V8 on Web Worker-heavy workloads. Safari on Apple M-series chips (M1/M2/M3) is competitive. Safari on Intel Macs (pre-2021) shows the largest gap, particularly on hash table operations used in uniqueness checking.
Required field checks on columns with high null rates (50%+ empty) run slightly faster than columns with dense data — fewer type conversion checks needed. However, uniqueness hash tables can be slower if null handling creates excessive collision chains. Generally not a meaningful factor below 10M rows.
Calculate Your Time Savings
Estimate your savings from three inputs: the number of files that needed 2+ import attempts, your active months doing CRM imports, and your hourly rate (data analyst avg: $50–75/hr). Each failed import cycle typically includes:
- 15–25 minute upload wait times per failed attempt
- Hunting for validation errors one at a time (Salesforce only reports one error per upload)
- Manual Excel cleanup with no regex support and a 256-rule limit
- Re-uploading the same file 3–5 times before it imports cleanly
- Discovering errors for the first time in production data
SplitForge Browser Validation Standard (SVBP-2026)
10 runs per config · drop high/low · report avg · test datasets available on request · v1.0 — February 2026
Validation Engine Changelog
- Initial release — streaming Web Worker architecture
- Hash-based uniqueness checking (O(1) average lookup)
- PapaParse streaming for 1GB+ file support
- Short-circuit at 100 blocking errors for corrupt file performance
- SVBP-2026 benchmark protocol established
- Improved uniqueness hash allocation (target: -15% memory overhead)
- Multi-column uniqueness checking (compound keys)
- Regex precompilation for repeated pattern checks
- Extended healthcare code set updates (FY2027 ICD-10-CM)
Honest Limitations: Where SplitForge Data Validator Falls Short
No tool is perfect for every use case. Here's where server-side validation tools (Great Expectations / dbt tests / AWS Glue) might be a better choice, and the real limitations of our browser-based architecture.
Browser-Based Processing
Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.
No Offline Mode (Initial Load)
Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.
Browser Tab Memory Limits
Most browsers limit individual tabs to 2–4GB RAM. This is the practical ceiling for file size.
Browser Memory Ceiling (~2GB / 10–15M Rows)
Maximum practical file size is ~2GB (~10–15M rows, hardware-dependent). Very large files with many uniqueness checks risk running into browser memory limits as the hash table grows.
Short-Circuit After 100 Blocking Errors
Validation stops after finding 100 blocking errors to prevent overwhelming the UI on severely corrupt files. If your file has thousands of blocking errors, you'll need multiple validation passes.
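The short-circuit is a simple cap on the error collector: once the limit is hit, the remaining rows are skipped and the result is flagged as truncated. A minimal sketch of the pattern (the constant and shape are illustrative, not the validator's actual internals):

```javascript
// Stop collecting blocking errors once the cap is reached, so a
// severely corrupt file doesn't flood the UI or waste validation time.
const MAX_BLOCKING_ERRORS = 100;

function validate(rows, isValid) {
  const errors = [];
  for (let i = 0; i < rows.length; i++) {
    if (!isValid(rows[i])) {
      errors.push({ row: i });
      if (errors.length >= MAX_BLOCKING_ERRORS) {
        return { errors, truncated: true }; // short-circuit: skip the rest
      }
    }
  }
  return { errors, truncated: false };
}
```

A `truncated: true` result is the signal that another validation pass will be needed after fixing the first batch of errors.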
No API or Automation Support
Data Validator is a browser tool — no REST API, CLI, or pipeline integration. Cannot be embedded in ETL workflows, CI/CD pipelines, or scheduled quality checks.
Single File Per Session
Data Validator processes one file at a time. No batch validation across multiple files in a single operation.
When to Use Server-Side Validation Tools (Great Expectations / dbt tests / AWS Glue) Instead
You need to validate data in an automated CI/CD or ETL pipeline
Data Validator has no API. Browser-only workflow cannot run on a schedule or be triggered programmatically.
You need to validate 50M+ row files regularly
Browser memory limits cap the practical ceiling at ~10–15M rows, depending on hardware. Server-side tools scale horizontally.
You need team-shared validation schemas with version control
Data Validator schemas exist only in browser sessions — no sharing, no versioning, no team collaboration features.
You need statistical anomaly detection (outlier detection, distribution checks)
Data Validator handles rule-based validation only — required, format, range, regex, enum, uniqueness. No statistical profiling.
Questions about limitations? Check our FAQ section below or contact us via the feedback button.
Frequently Asked Questions
How accurate is the 37-second benchmark for 10M rows?
What's the difference between Simple Validation and Full Schema mode?
Why is uniqueness checking slower than other validation rules?
How does Excel Data Validation compare at scale?
What browser and hardware give the best validation performance?
Does CSV vs Excel format affect validation speed?
What happens when validation hits the 100 blocking-error short-circuit?
Can I reproduce these benchmarks?
Benchmarks last updated: February 2026 · Re-tested quarterly and after major algorithm changes · Validation Engine v1.0 · SVBP-2026 Protocol