Benchmark Performance
Detailed Performance Metrics
| File Size | Exact Match | Fuzzy Matching | Notes |
|---|---|---|---|
| 100K rows | ~0.2 sec | ~0.3 sec | All columns, exact match |
| 500K rows | ~1.1 sec | ~1.4 sec | Multi-column key |
| 1M rows | ~2-3 sec | ~3-4 sec | Names (Moderate 85%) preset |
| 5M rows | ~11-13 sec | ~14-17 sec | CRM Contacts Smart preset |
| 10M rows | ~22-25 sec | ~27-32 sec | With audit trail generation |
| ~1.3GB file | ~25-30 sec | ~35-42 sec | Maximum tested capacity |
Feature Performance Breakdown
How each advanced feature affects throughput on 10M-row datasets
Calculate Your Time Savings
Estimate your savings from three inputs: dataset size (typical: 100K–5M rows), cleaning frequency (weekly = 52 runs/year, monthly = 12), and analyst hourly rate (average: $45–75/hr).
- Manual sort → filter → inspect cycles eliminated
- Audit trail auto-generated (no manual documentation needed)
- False positive risk dramatically reduced via column weighting + guardrails
- Data recovery time eliminated (duplicates file preserved for review)
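Column weighting, mentioned above as a false-positive guard, generally means letting strong identity columns count for more in the combined match score. A minimal illustrative sketch (the weights, columns, and formula here are hypothetical, not SplitForge's internal scoring):

```python
def weighted_score(similarities, weights):
    """Combine per-column similarity scores using per-column weights."""
    total = sum(weights.values())
    return sum(similarities[col] * w for col, w in weights.items()) / total

# Hypothetical weighting: email is a strong identity signal, name is noisier.
weights = {"email": 3.0, "name": 1.0}
sims = {"email": 1.0, "name": 0.7}   # per-column similarity of two records
print(round(weighted_score(sims, weights), 3))  # 0.925
```

With these weights, a perfect email match keeps the pair above a typical threshold even when the name column is only a partial match, which is how weighting reduces false negatives on noisy columns without admitting false positives.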
Testing Methodology
10 runs per config • drop high/low • report avg + range • test datasets available on request
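The methodology above (10 runs per configuration, highest and lowest dropped, average and range reported) can be sketched as a small timing harness. The workload below is a hypothetical stand-in for a dedup pass, not SplitForge itself:

```python
import time

def benchmark(fn, runs=10):
    """Time fn `runs` times, drop the fastest and slowest run,
    and return (average, min, max) over the remaining timings."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    trimmed = sorted(timings)[1:-1]  # drop high and low outliers
    return sum(trimmed) / len(trimmed), min(trimmed), max(trimmed)

# Hypothetical workload standing in for an exact-match dedup pass.
rows = [f"user{i % 80_000}@example.com" for i in range(100_000)]
avg, lo, hi = benchmark(lambda: set(rows))
print(f"avg {avg*1000:.2f} ms (range {lo*1000:.2f}-{hi*1000:.2f} ms)")
```

Trimming the extremes before averaging keeps one cold-cache or GC-heavy run from skewing the reported number.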
Honest Limitations: Where SplitForge Remove Duplicates Falls Short
No tool is perfect for every use case. Here's where server-side dedup tools (Datablist, Dedupe.io, or SQL) might be a better choice, along with the honest limitations of our browser-based architecture.
Browser-Based Processing
Performance depends on your device's RAM and CPU. Modern laptops (2022+) handle 10M+ rows easily, but older devices may struggle with very large files.
No Offline Mode (Initial Load)
Requires internet connection to load the tool initially. Processing happens offline in your browser after loading.
Browser Tab Memory Limits
Most browsers limit individual tabs to 2–4 GB of RAM, which sets the practical ceiling for file size.
Browser Memory Ceiling (~1.3GB / 10M Rows)
Maximum file size is ~1.3GB (~10M rows). Larger datasets require server-side deduplication tools or database-level DISTINCT operations.
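For datasets past that ceiling, a database-level DISTINCT is the usual fallback because the database engine can spill to disk instead of holding everything in a browser tab's memory. A minimal sketch using Python's built-in `sqlite3` (table and column names are illustrative):

```python
import sqlite3

# ":memory:" keeps the demo self-contained; point this at a file path
# so SQLite can spill to disk on datasets larger than RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("a@x.com", "Ann"), ("a@x.com", "Ann"), ("b@x.com", "Bob")],
)

# DISTINCT deduplicates on the full row (here: email + name).
unique = conn.execute("SELECT DISTINCT email, name FROM contacts").fetchall()
print(unique)  # the duplicate ("a@x.com", "Ann") row collapses to one
```

This only removes exact duplicates; fuzzy matching still needs a dedicated tool on the server side.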
Fuzzy Matching Speed on Very Large Files
Fuzzy matching on 10M rows with broad presets (Loose 75%) can take 40-55 seconds as more candidate pairs must be evaluated. Performance varies with duplicate density.
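The slowdown at broader presets follows from how thresholds work: lowering the cutoff lets more candidate pairs survive into the expensive similarity comparison. A small illustration using the standard library's `difflib` (SplitForge's actual matcher and presets are not specified here):

```python
from difflib import SequenceMatcher
from itertools import combinations

names = ["Jon Smith", "John Smith", "Jane Smyth", "Bob Jones"]

def matches(threshold):
    """Return all pairs whose similarity ratio meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]

# A looser threshold (0.75) can never accept fewer pairs than a
# stricter one (0.85), so broad presets do strictly more work.
print(len(matches(0.85)), len(matches(0.75)))
```

On real files the effect compounds with duplicate density: more near-matches in the data means more pairs clearing a loose threshold.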
No API or Automation Support
SplitForge is a browser tool — no REST API, CLI, or pipeline integration. Can't be embedded in automated ETL workflows or CI/CD pipelines.
Single-User Processing (No Shared Configs)
Fuzzy matching configurations and column weights can't be saved, versioned, or shared across team members. Each user configures independently.
When to Use Server-Side Dedup Tools (Datablist / Dedupe.io / SQL) Instead
You need to deduplicate 50M+ rows daily
Browser memory limits SplitForge to ~10M rows. Server-side tools scale horizontally without memory constraints.
You need automated deduplication in a CI/CD or ETL pipeline
SplitForge has no API. Manual browser workflow doesn't work for scheduled automation.
You need CRM-native deduplication (Salesforce/HubSpot built-in)
If your duplicates live in CRM records rather than CSV exports, native CRM tools operate on live records without export/import cycles.
You need real-time deduplication as data streams in
SplitForge processes static files, not data streams. Can't intercept records at write time.
Questions about limitations? Check our FAQ section below or contact us via the feedback button.
Frequently Asked Questions
How accurate are the 350–450K rows/second benchmarks?
How does fuzzy matching affect performance?
Why does Excel struggle with large-file deduplication but SplitForge doesn't?
What is Union-Find transitive matching and why does it matter?
What are guardrails and how do they affect performance?
How does column weighting work technically?
How does SplitForge compare to Datablist or Deduplify?
What file sizes have been successfully tested?
Does generating an audit trail slow down processing?
Can I reproduce these benchmarks myself?
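The Union-Find question above refers to a standard disjoint-set structure: if record A matches B and B matches C, all three land in one duplicate group even though A and C never matched directly. A generic sketch of the idea (not SplitForge's internal implementation):

```python
def find(parent, x):
    """Return the root of x's group, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the groups containing a and b."""
    parent[find(parent, a)] = find(parent, b)

records = ["A", "B", "C", "D"]
parent = {r: r for r in records}

# Pairwise fuzzy matches: A~B and B~C, but A and C never matched directly.
for a, b in [("A", "B"), ("B", "C")]:
    union(parent, a, b)

groups = {}
for r in records:
    groups.setdefault(find(parent, r), []).append(r)
print(list(groups.values()))  # A, B, C share one group; D stands alone
```

This matters for correctness: without transitive grouping, A and C would be reported as unrelated and one of them could survive deduplication.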
Benchmarks last updated: February 2026. Re-tested quarterly and after major algorithm changes.