Benchmark Performance
Performance at Scale
Chrome 131 (stable) · Windows 11 · Intel i7-12700K · 32GB RAM · February 2026
| Configuration | Time | Throughput | Test Notes |
|---|---|---|---|
| 100K rows · 4 types · balanced | ~0.8 sec | ~125K rows/sec | Startup overhead visible at small sizes |
| 1M rows · 4 types · balanced | ~8.5 sec | ~118K rows/sec | Comma-separated values, mixed data types |
| 5M rows · 6 types · balanced | ~45 sec | ~111K rows/sec | Peak benchmark — 640MB CSV, 551K patterns found |
| 5M rows · 8 types · strict | ~58 sec | ~86K rows/sec | All types + Luhn + SSA + area code validation |
| 5M rows · 2 types · permissive | ~28 sec | ~179K rows/sec | Email + phone only, no deep validation |
| ~1GB file · 4 types · balanced | ~75 sec | ~95K rows/sec | Maximum tested capacity, browser-dependent |
| 2.1M rows · 4 types · balanced | ~18.4 sec | ~114K rows/sec | Real-world CRM export — mixed free-text notes fields, anonymized dataset, Feb 2026 |
Results vary by hardware, browser, number of active pattern types, validation mode, and file data density. Throughput increases with permissive mode and fewer pattern types; it decreases with strict mode and all 8 types active.
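The throughput column is simply rows divided by seconds. A minimal sketch that re-derives it from the table and estimates a run time for your own file, assuming throughput stays roughly constant (the function names here are illustrative, not part of SplitForge):

```python
# (rows, seconds) pairs copied from the benchmark table above.
BENCHMARKS = {
    "100K rows, 4 types, balanced": (100_000, 0.8),
    "1M rows, 4 types, balanced": (1_000_000, 8.5),
    "5M rows, 6 types, balanced": (5_000_000, 45.0),
    "5M rows, 8 types, strict": (5_000_000, 58.0),
    "5M rows, 2 types, permissive": (5_000_000, 28.0),
    "2.1M rows, 4 types, balanced (CRM export)": (2_100_000, 18.4),
}


def throughput(rows: int, seconds: float) -> float:
    """Rows processed per second."""
    return rows / seconds


def estimate_seconds(rows: int, rows_per_sec: float) -> float:
    """Rough run time for a file, assuming constant throughput."""
    return rows / rows_per_sec


for name, (rows, secs) in BENCHMARKS.items():
    print(f"{name}: ~{throughput(rows, secs) / 1000:.0f}K rows/sec")
```

Treat any estimate as a ballpark only: your hardware, browser, and validation mode will shift the real number.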
SplitForge vs. Python: Which Tool Fits Your Workflow?
No single tool is right for every situation. Here's an honest breakdown.
Choose SplitForge when:
- Data contains PII, PHI, or regulated financial information that cannot leave your device
- You need Luhn, SSA, or NANP validation without writing or maintaining custom code
- You want normalization (E.164, ISO 8601) in the same step as extraction
- You need context view for audit trails or compliance review documentation
- File is under ~1GB and the task is a one-off or periodic workflow
- You don't write Python — or don't want to for this specific task
- Speed of ~111K rows/sec is sufficient for your dataset size
Choose Python when:
- You need to run extraction on a schedule or inside an automated data pipeline
- You need custom regex patterns beyond the 8 built-in types
- Files exceed 1GB or have 50M+ rows where browser memory limits apply
- Extraction is part of a larger ETL or transformation workflow
- You're comfortable writing and maintaining Python code
- You need output piped directly into another process or database
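For a sense of what the Python route involves, here is a minimal sketch of custom-pattern extraction with the standard `re` and `csv` modules. The `ACC-` account ID format and the sample data are hypothetical, chosen only to illustrate a proprietary identifier that falls outside the 8 built-in types:

```python
import csv
import io
import re

# Hypothetical pattern for an internal account ID such as "ACC-004217".
ACCOUNT_ID = re.compile(r"\bACC-\d{6}\b")


def extract_account_ids(csv_text: str) -> list[tuple[int, str]]:
    """Return (row_number, match) pairs for every account ID in any cell."""
    hits = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for cell in row:
            for match in ACCOUNT_ID.finditer(cell):
                hits.append((row_number, match.group()))
    return hits


# Illustrative data; row 1 is the header.
sample = (
    "id,notes\n"
    "1,Customer ACC-004217 called about ACC-991300\n"
    "2,No account mentioned\n"
)
print(extract_account_ids(sample))
```

A script like this can run on a schedule or feed a database, but validation, normalization, and false-positive handling are yours to write and maintain.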
Feature Performance Overhead
Input Format Performance
Calculate Your Time Savings
Typical: 2–4 types per task
Monthly = 12, Weekly = 52, Daily = 260
Analyst avg: $45–75/hr
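The arithmetic behind an estimate like this can be sketched as follows. The formula and the per-run hour figures are assumptions for illustration, not the calculator's actual implementation; only the runs-per-year values come from the hints above.

```python
# Runs per year for each frequency, as listed above.
RUNS_PER_YEAR = {"monthly": 12, "weekly": 52, "daily": 260}


def annual_savings(
    frequency: str,
    manual_hours_per_run: float,
    tool_hours_per_run: float,
    hourly_rate: float,
) -> float:
    """Estimated yearly saving in dollars (assumed formula, not SplitForge's)."""
    hours_saved_per_run = manual_hours_per_run - tool_hours_per_run
    return hours_saved_per_run * RUNS_PER_YEAR[frequency] * hourly_rate


# Hypothetical inputs: a weekly task that takes 2 hours by hand and
# 30 minutes with a tool, at $60/hr (within the $45-75 analyst range).
print(annual_savings("weekly", 2.0, 0.5, 60.0))  # 4680.0
```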
- Writing and maintaining regex for each pattern type
- Debugging false positives from regex-only matching
- Writing Luhn algorithm and SSA validation logic from scratch
- A separate normalization pass after extraction
- The compliance risk of using online tools for PII data
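To give a sense of the validation logic involved, here is a minimal sketch of the standard Luhn checksum used for card numbers. It is a generic textbook version, not SplitForge's implementation, and it covers only the checksum: card-type prefixes and length rules would be additional work.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk the digits right to left, doubling every second one.
    for index, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```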
Testing Methodology
10 runs per config · drop high/low · report avg · test datasets available on request · re-tested quarterly
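The "drop high/low, report avg" step is a simple trimmed mean. A minimal sketch, using made-up run times rather than actual benchmark data:

```python
def trimmed_average(run_times: list[float]) -> float:
    """Average the runs after discarding the single fastest and slowest."""
    if len(run_times) < 3:
        raise ValueError("need at least 3 runs to drop the high and low")
    kept = sorted(run_times)[1:-1]
    return sum(kept) / len(kept)


# Hypothetical timings in seconds for 10 runs of one configuration.
runs = [44.1, 45.3, 44.8, 46.0, 45.1, 52.7, 44.9, 45.2, 43.0, 45.0]
print(f"{trimmed_average(runs):.1f} sec")
```

Dropping the extremes keeps a single slow run (a background process, a GC pause) from skewing the reported figure.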
Honest Limitations: Where SplitForge Pattern Extraction Falls Short
No tool is perfect for every use case. Here's where Server-Side Entity Extraction (Python re / AWS Comprehend / spaCy) might be a better choice, and the real limitations of our browser-based architecture.
Browser-Based Processing
Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle multi-million-row files comfortably, but older devices may struggle with very large files.
No Offline Mode (Initial Load)
Requires an internet connection to load the tool initially. After loading, processing happens offline in your browser.
Browser Tab Memory Limits
Most browsers limit individual tabs to 2–4GB of RAM, which sets the practical ceiling for file size.
Browser Memory Ceiling (~1GB / 5–10M Rows)
Maximum practical file size is ~1GB (~5–10M rows, browser-dependent). Files larger than this risk hitting browser memory limits, especially with all 8 pattern types active.
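A rough pre-flight check can tell you whether a file is likely to fit. This sketch assumes your rows are about as wide as those in the peak benchmark (640MB for 5M rows, roughly 128 bytes per row) and treats ~1GB as a hard cutoff. Both are simplifications, and the function names are illustrative.

```python
# Figures from the peak benchmark row: a 640MB CSV with 5M rows.
BENCHMARK_BYTES = 640_000_000  # assumes 1MB = 1,000,000 bytes
BENCHMARK_ROWS = 5_000_000
CEILING_BYTES = 1_000_000_000  # the ~1GB ceiling, treated as a hard cutoff


def estimate_rows(file_bytes: int) -> int:
    """Estimate row count, assuming benchmark-like row width."""
    bytes_per_row = BENCHMARK_BYTES / BENCHMARK_ROWS
    return round(file_bytes / bytes_per_row)


def within_ceiling(file_bytes: int) -> bool:
    """True if the file is at or under the ~1GB practical ceiling."""
    return file_bytes <= CEILING_BYTES


size = 850_000_000  # hypothetical file size in bytes
print(estimate_rows(size), within_ceiling(size))
```

For a real file, `os.path.getsize(path)` supplies the byte count. Files with long free-text fields will hold far fewer rows per megabyte than this estimate suggests.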
No Custom Regex Patterns
Pattern Extraction supports only the 8 built-in types. User-defined regex patterns for custom schemas (account IDs, internal product codes, proprietary identifiers) are not currently supported.
No Automation or API
Pattern Extraction is a browser tool with no REST API, CLI, or pipeline integration. It cannot be embedded in ETL workflows or scheduled jobs, or triggered programmatically.
1M Pattern Result Cap
Results are capped at 1 million patterns per session. On very dense files where nearly every row contains multiple patterns, processing stops before the full scan completes.
Single File Per Session
Pattern Extraction scans one file at a time. No batch processing across multiple files in a single operation.
When to Use Server-Side Entity Extraction (Python re / AWS Comprehend / spaCy) Instead
You need pattern extraction in an automated pipeline or scheduled job
SplitForge has no API. A browser-only workflow cannot run on a schedule or be triggered programmatically.
You need custom regex patterns for proprietary data schemas
SplitForge supports only the 8 built-in types. Custom schemas require user-defined regex.
You need to scan 50M+ row files
Browser memory limits the practical ceiling to ~5–10M rows. Server-side tools scale horizontally.
You need named entity recognition beyond structured patterns
SplitForge extracts structured patterns (email, phone, etc.). Extracting person names, organization names, or location references requires NER models.
Questions about limitations? Check our FAQ section below or contact us via the feedback button.
Frequently Asked Questions
How accurate is the 111K rows/second benchmark?
Why does throughput vary so much by validation mode?
What is the difference between CSV streaming mode and text paste mode?
How does validation mode affect result quality vs. speed?
What is the deduplication overhead?
Can I reproduce these benchmarks?
What file formats are supported and does format affect speed?
What happens when the 1M pattern result cap is reached?
Benchmarks last updated: February 2026. Re-tested quarterly and after major algorithm changes. Last algorithm update: February 2026 — improved IANA TLD validation set and strict mode performance by ~8%.
Ready to Extract 551K Patterns in 45 Seconds?
No installation. File contents never uploaded. 8 validated pattern types, post-extraction normalization, and a results preview — all in your browser.