Benchmark Performance
Performance at Scale
Chrome (stable) · Windows 11 · Intel i7-12700K · 32GB RAM · February 2026
| Dataset Size | Simple Validation | Full Schema (5 rules + unique) | Test Notes |
|---|---|---|---|
| 1K rows | ~15K rows/sec | ~12K rows/sec | Startup overhead dominates at small sizes |
| 10K rows | ~120K rows/sec | ~90K rows/sec | Worker initialization amortizing |
| 100K rows | ~430K rows/sec | ~280K rows/sec | Typical CRM export batch |
| 1M rows | ~490K rows/sec | ~350K rows/sec | Salesforce Contacts import — 5 rules |
| 5M rows | ~495K rows/sec | ~310K rows/sec | Uniqueness hash table scaling |
| 10M rows | ~500K rows/sec | ~270K rows/sec | Verified benchmark — 37s full schema |
| ~2GB file | ~490K rows/sec | ~240K rows/sec | Near browser memory ceiling — results vary |
Results vary by hardware, browser, rule count, and file complexity. Simple validation = email format + required check (2 rules, no uniqueness). Full schema = 5 rules including one uniqueness check across all rows.
Feature Performance Overhead
When Data Validator Is Slower Than Expected
Conditions that can reduce throughput below the published benchmarks
The hash table for uniqueness checking grows with each unique value. At 10M rows, it can reach ~1.2GB peak allocation. On an 8GB machine with OS + browser overhead (~4–5GB used), the hash table competes for available RAM and triggers garbage collection cycles, significantly slowing throughput.
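The memory growth is inherent to how hash-based uniqueness works: every distinct value must be retained to compare against later rows. A minimal sketch (illustrative only, not the validator's actual code) using a JavaScript `Set`:

```javascript
// Illustrative sketch: Set-based uniqueness checking.
// Memory grows with the number of *unique* values retained, which is
// why a mostly-unique 10M-row email column can drive peak allocation
// toward the gigabyte range.
function findDuplicates(values) {
  const seen = new Set();
  const duplicates = [];
  for (const value of values) {
    if (seen.has(value)) {
      duplicates.push(value); // already retained -> duplicate
    } else {
      seen.add(value); // every distinct value stays in memory
    }
  }
  return duplicates;
}
```

A column with many repeated values keeps the `Set` small; a near-unique column forces it to hold almost every row's value.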
RegExp.test() cost scales with pattern complexity. Simple patterns like /^\d{10}$/ are fast. Complex lookaheads, alternation, or backtracking patterns (e.g., /^(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9...]+)*|"...")$/) can be 5–10x slower per row than basic data type checks.
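One mitigation is compiling patterns once, outside the row loop, and preferring patterns without heavy backtracking. The patterns below are hypothetical examples, not the validator's built-in rules:

```javascript
// Compile patterns once, outside the row loop — constructing a new
// RegExp per row adds avoidable overhead on multi-million-row files.
const SIMPLE_PHONE = /^\d{10}$/;                   // cheap: fixed-length digit run
const LOOSE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;  // cheap: no nested quantifiers

function validateRow(row) {
  return {
    phoneOk: SIMPLE_PHONE.test(row.phone),
    emailOk: LOOSE_EMAIL.test(row.email),
  };
}
```

The loose email pattern trades RFC-level strictness for predictable per-row cost; the fully RFC-compliant alternation pattern quoted above is where the 5–10x slowdown comes from.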
Each uniqueness check builds its own hash table independently. Two uniqueness rules (e.g., Email unique + Account ID unique) means two full-file passes building two separate hash sets. Three uniqueness columns at 10M rows can add 20–30 seconds on top of base validation time.
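The per-rule cost can be sketched as one hash set per unique column. This sketch checks all columns in a single pass rather than one pass per rule as described above, but the memory cost is the same: one full set of distinct values held per rule.

```javascript
// Illustrative sketch: one Set per uniqueness rule.
// Two unique columns (e.g. Email + Account ID) means two full sets
// of distinct values held in memory simultaneously.
function checkUnique(rows, columns) {
  const sets = new Map(columns.map((col) => [col, new Set()]));
  const violations = [];
  rows.forEach((row, i) => {
    for (const col of columns) {
      const set = sets.get(col);
      const value = row[col];
      if (set.has(value)) {
        violations.push({ row: i, col, value });
      } else {
        set.add(value);
      }
    }
  });
  return violations;
}
```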
Excel's binary XLSX format requires an additional parsing step via SheetJS before validation can begin. A 10M row Excel file adds 6–8 seconds of XLSX parsing on top of validation time. CSV files with UTF-8 encoding and comma delimiters give the best performance.
Safari's JavaScriptCore engine is generally slower than Chrome's V8 on Web Worker-heavy workloads. Safari on Apple M-series chips (M1/M2/M3) is competitive. Safari on Intel Macs (pre-2021) shows the largest gap, particularly on hash table operations used in uniqueness checking.
Required field checks on columns with high null rates (50%+ empty) run slightly faster than columns with dense data — fewer type conversion checks needed. However, uniqueness hash tables can be slower if null handling creates excessive collision chains. Generally not a meaningful factor below 10M rows.
Calculate Your Time Savings
Estimate your savings from three inputs: the number of files that needed 2+ import attempts, your active months doing CRM imports, and your hourly rate (data analyst avg: $50–75/hr). Each failed import cycle typically includes:
- 15–25 minute upload wait times per failed attempt
- Hunting for validation errors one at a time (Salesforce only reports one error per upload)
- Manual Excel cleanup with no regex support and a 256-rule limit
- Re-uploading the same file 3–5 times before it imports cleanly
- Discovering errors for the first time in production data
SplitForge Browser Validation Standard (SVBP-2026)
10 runs per config · drop high/low · report avg · test datasets available on request · v1.0 — February 2026
Validation Engine Changelog
- Initial release — streaming Web Worker architecture
- Hash-based uniqueness checking (O(1) average lookup)
- PapaParse streaming for 1GB+ file support
- Short-circuit at 100 blocking errors for corrupt file performance
- SVBP-2026 benchmark protocol established
- Improved uniqueness hash allocation (target: -15% memory overhead)
- Multi-column uniqueness checking (compound keys)
- Regex precompilation for repeated pattern checks
- Extended healthcare code set updates (FY2027 ICD-10-CM)
Honest Limitations: Where SplitForge Data Validator Falls Short
No tool is perfect for every use case. Here's where server-side validation tools (Great Expectations / dbt tests / AWS Glue) might be a better choice, and the real limitations of our browser-based architecture.
Browser-Based Processing
Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.
No Offline Mode (Initial Load)
Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.
Browser Tab Memory Limits
Most browsers limit individual tabs to 2–4GB RAM. This is the practical ceiling for file size.
Browser Memory Ceiling (~2GB / 10–15M Rows)
Maximum practical file size is ~2GB (~10–15M rows, hardware-dependent). Very large files with many uniqueness checks risk running into browser memory limits as the hash table grows.
Short-Circuit After 100 Blocking Errors
Validation stops after finding 100 blocking errors to prevent overwhelming the UI on severely corrupt files. If your file has thousands of blocking errors, you'll need multiple validation passes.
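The short-circuit is a simple cap on the error collector: once the limit is hit, the remaining rows are skipped and the result is flagged as truncated. A minimal sketch of the pattern (the constant and shape are illustrative, not the validator's actual internals):

```javascript
// Stop collecting blocking errors once the cap is reached, so a
// severely corrupt file doesn't flood the UI or waste validation time.
const MAX_BLOCKING_ERRORS = 100;

function validate(rows, isValid) {
  const errors = [];
  for (let i = 0; i < rows.length; i++) {
    if (!isValid(rows[i])) {
      errors.push({ row: i });
      if (errors.length >= MAX_BLOCKING_ERRORS) {
        return { errors, truncated: true }; // short-circuit: skip the rest
      }
    }
  }
  return { errors, truncated: false };
}
```

A `truncated: true` result is the signal that another validation pass will be needed after fixing the first batch of errors.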
No API or Automation Support
Data Validator is a browser tool — no REST API, CLI, or pipeline integration. Cannot be embedded in ETL workflows, CI/CD pipelines, or scheduled quality checks.
Single File Per Session
Data Validator processes one file at a time. No batch validation across multiple files in a single operation.
When to Use Server-Side Validation Tools (Great Expectations / dbt tests / AWS Glue) Instead
You need to validate data in an automated CI/CD or ETL pipeline
Data Validator has no API. Browser-only workflow cannot run on a schedule or be triggered programmatically.
You need to validate 50M+ row files regularly
Browser memory limits cap the practical ceiling at ~10–15M rows, depending on hardware. Server-side tools scale horizontally.
You need team-shared validation schemas with version control
Data Validator schemas exist only in browser sessions — no sharing, no versioning, no team collaboration features.
You need statistical anomaly detection (outlier detection, distribution checks)
Data Validator handles rule-based validation only — required, format, range, regex, enum, uniqueness. No statistical profiling.
Questions about limitations? Check our FAQ section below or contact us via the feedback button.
Frequently Asked Questions
How accurate is the 37-second benchmark for 10M rows?
What's the difference between Simple Validation and Full Schema mode?
Why is uniqueness checking slower than other validation rules?
How does Excel Data Validation compare at scale?
What browser and hardware give the best validation performance?
Does CSV vs Excel format affect validation speed?
What happens when validation hits the 100 blocking-error short-circuit?
Can I reproduce these benchmarks?
Benchmarks last updated: February 2026 · Re-tested quarterly and after major algorithm changes · Validation Engine v1.0 · SVBP-2026 Protocol