Validate Any CSV or Excel File Before It Breaks Your CRM Import
Stop discovering data errors after a failed Salesforce import. Data Validator catches malformed emails, missing required fields, invalid formats, and duplicate records before you upload, saving hours of back-and-forth. Read why CRMs reject 30% of imports or how to validate CSV files automatically.
12 validation rule types. 20+ data types including NPI, ICD-10, and CPT codes. 15+ CRM/database presets. Blocking vs warning error levels. Export failed rows separately.
What is Data Validator?
Data Validator is a browser-based tool that checks your CSV or Excel file against validation rules before you import it anywhere. You define what "valid" means for each column (required fields, data formats, allowed values, uniqueness constraints), and Data Validator tells you exactly which rows fail and why. All processing runs locally in your browser. No files are uploaded.
The CRM Import Cycle of Pain
Every week, for millions of data teams:
- Export CRM contacts from old system or spreadsheet
- Open in Excel to "do a quick check"
- "Looks fine." Upload to Salesforce/HubSpot.
- Wait 15 minutes for the import job
- Salesforce returns: "Import failed - 847 errors"
- Error file shows: Row 2847: Email field exceeds maximum length (82 > 80 chars)
- Manually fix errors. Re-upload. Repeat 2-3 more times.
- Total time lost: 2.5+ hours per failed import cycle.
Same workflow, 11 minutes instead of 2.5 hours:
- Export CRM contacts from old system or spreadsheet
- Drop CSV into Data Validator. Select "Salesforce Contacts" preset.
- 8 seconds: validation complete. 847 errors found, all listed with row, column, rule, and value.
- Export failed rows CSV. Bulk fix in Excel or Data Cleaner.
- Re-validate. 0 blocking errors.
- Upload to Salesforce. Import succeeds on first attempt.
- The file never left your browser, so any PHI stayed local.
- Total time: ~11 minutes.
The Real Cost of Failed Imports
Without pre-validation:
- 3 upload attempts × 15 min wait = 45 min
- Excel cleanup per attempt = 30-45 min
- Re-formatting + re-uploading = 20 min
- Total per cycle: ~2.5 hours
- At $60/hr data analyst rate
- Cost per cycle: ~$150
With Data Validator:
- Validate CSV: ~8 seconds
- Export failed rows: 2 seconds
- Bulk fix in Excel/Data Cleaner: ~10 min
- Re-validate: 8 seconds
- Total per cycle: ~11 minutes
- Tool cost: $0
- Cost per cycle: ~$11
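The per-cycle costs above follow from the quoted $60/hr analyst rate. A quick sketch of the arithmetic, using only the cycle times stated in this section (the function name is just for illustration):

```python
HOURLY_RATE_USD = 60  # data analyst rate quoted above


def cycle_cost(minutes: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Labor cost in dollars for one import cycle of the given length."""
    return minutes / 60 * hourly_rate


manual_cost = cycle_cost(150)    # ~2.5 hours per failed-import cycle
validated_cost = cycle_cost(11)  # ~11 minutes with pre-validation

print(f"manual: ${manual_cost:.0f}, validated: ${validated_cost:.0f}")
```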
Quick Comparison
TL;DR: What Data Validator does
- Rule types: required, dataType, length, range, regex, enum, uniqueness (12 total)
- Data types: email, phone, URL, date (8 formats), NPI, ICD-10, CPT, SSN, integer, float, boolean, currency, and more (20+)
- Presets: Salesforce Contacts/Leads, HubSpot CRM, PostgreSQL, MySQL, and 10+ more
- Error levels: blocking (stops import) vs warning (flagged for review)
- Export: failed rows CSV, passed rows CSV, validation report JSON
- Privacy: 100% browser-based; file contents are never uploaded to any server
Stop the Import Failure Loop
Validate your CSV before the first upload attempt. Catch all blocking errors in seconds, not after a 15-minute upload wait and a cryptic error report.
How to Validate a CSV File Before a Salesforce or HubSpot Import
The most common cause of failed CRM imports is data that violates field-level rules the CRM only checks at upload time: malformed emails, fields that exceed max length, missing required values, duplicate records. Data Validator checks all of these rules locally in your browser before you attempt an upload. Upload once, import cleanly. See also: how to remove duplicate emails before CRM import for deduplication strategies.
What Data Validator Checks, and What Excel Can't
Where Excel Data Validation Fails You
12 Validation Rule Types
Required (non-empty), dataType (20+ types), length (min/max chars), range (numeric min/max), regex (custom pattern), enum (allowed values list), uniqueness (no duplicates across column), and more. Each rule can be set as blocking (stops import) or warning (flagged for review).
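To make the blocking/warning distinction concrete, here is a minimal Python sketch of how a few of these rule types could be expressed and applied. It is an illustration only, not Data Validator's implementation: the `Rule` class, the helper names, and the `Status` column are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    column: str
    name: str
    check: Callable[[str], bool]  # returns True when the value passes
    level: str = "blocking"       # "blocking" or "warning"


def required(value: str) -> bool:
    return value.strip() != ""


def max_length(limit: int) -> Callable[[str], bool]:
    return lambda value: len(value) <= limit


def one_of(*allowed: str) -> Callable[[str], bool]:
    return lambda value: value in allowed


def validate(rows, rules):
    """Return one (row, column, rule, value, level) tuple per failed check."""
    findings = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        for rule in rules:
            value = row.get(rule.column, "")
            if not rule.check(value):
                findings.append((row_number, rule.column, rule.name, value, rule.level))
    return findings


rules = [
    Rule("LastName", "required", required),
    Rule("Email", "length", max_length(80)),
    Rule("Status", "enum", one_of("Active", "Inactive"), level="warning"),  # hypothetical column
]
rows = [
    {"LastName": "Lee", "Email": "lee@example.com", "Status": "Active"},
    {"LastName": "", "Email": "kim@example.com", "Status": "Pending"},
]
findings = validate(rows, rules)

# Only blocking findings should stop an import; warnings are for review.
can_import = not any(level == "blocking" for *_, level in findings)
```

In this sample the second data row fails one blocking rule (missing LastName) and one warning rule (unexpected Status), so `can_import` is False until the blocking error is fixed.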
20+ Data Types
Email (RFC 5322 + common format checks), phone (US formats + international), URL, 8+ date formats (YYYY-MM-DD, MM/DD/YYYY, ISO 8601), integer, float, boolean, currency, NPI (10-digit + Luhn check), ICD-10-CM codes (70K+ valid codes), CPT codes (10K+ AMA codes), SSN, taxonomy codes, and more.
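As an illustration of the "10-digit + Luhn check" for NPI, the sketch below applies the standard Luhn algorithm to the NPI prefixed with 80840, which is how the NPI check digit is defined. The function names are assumptions for the example and this is not necessarily how Data Validator implements the check.

```python
def luhn_valid(digits: str) -> bool:
    """Standard Luhn check: double every second digit counting from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_valid_npi(value: str) -> bool:
    """True when value is 10 ASCII digits and passes Luhn with the 80840 prefix."""
    if len(value) != 10 or not (value.isascii() and value.isdigit()):
        return False
    return luhn_valid("80840" + value)


print(is_valid_npi("1234567893"))  # commonly cited sample NPI
print(is_valid_npi("1234567890"))  # same digits with a wrong check digit
```

A check like this only confirms that the number is well formed. It does not confirm that the NPI is actually assigned to a provider.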
Custom Regex Rules
Write your own regular expression for any column: product codes, internal IDs, custom date formats, zip code patterns. Regex rules are compiled once and applied across all rows without performance degradation.
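The compile-once idea can be sketched in a few lines of Python. The SKU pattern and the function name here are hypothetical, chosen only to show the shape of a custom regex rule:

```python
import re

# Hypothetical product-code format: three uppercase letters, a hyphen, four digits.
# Compiled a single time, then reused for every row.
SKU_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")


def failing_rows(values, pattern):
    """Return (row_number, value) for each value that does not fully match."""
    return [
        (row_number, value)
        for row_number, value in enumerate(values, start=2)  # row 1 is the header
        if pattern.fullmatch(value) is None
    ]


column = ["ABC-1234", "abc-1234", "ABC-12345"]
print(failing_rows(column, SKU_PATTERN))
```

Using `fullmatch` rather than `search` means the whole cell must match the pattern, so trailing characters such as the extra digit in the third value are rejected.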
Hash-Based Uniqueness Checking
Build a complete hash set of all values in a column and check each row against it, across 10M rows in ~37 seconds. No COUNTIF formula that slows down at 100K rows. Catches duplicate Email values before Salesforce rejects them.
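A rough Python sketch of hash-based duplicate detection follows. It uses a single pass with a dictionary of first-seen rows rather than a separate build-then-check phase, and it assumes emails should be compared case-insensitively after trimming whitespace. Both choices are assumptions for the example, not documented Data Validator behavior.

```python
def find_duplicates(values):
    """Return (row_number, value, first_row) for each repeated value."""
    first_seen = {}   # normalized value -> row number where it first appeared
    duplicates = []
    for row_number, raw in enumerate(values, start=2):  # row 1 is the header
        key = raw.strip().lower()  # assumption: compare case-insensitively
        if key == "":
            continue  # empty cells are a "required" problem, not a duplicate
        if key in first_seen:
            duplicates.append((row_number, raw, first_seen[key]))
        else:
            first_seen[key] = row_number
    return duplicates


emails = ["a@example.com", "b@example.com", "A@example.com ", ""]
print(find_duplicates(emails))
```

Each lookup is a constant-time hash probe on average, so total work grows linearly with row count instead of quadratically as with a per-row COUNTIF.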
Export Failed / Passed Rows
One-click export of failed rows (for bulk fixing) and passed rows (for immediate import). Each exported row includes the original data, which rules failed, and the invalid values. Ready to open in Excel, Google Sheets, or Data Cleaner.
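For a sense of what such a split looks like, here is a small Python sketch that separates rows into passed and failed CSV text and appends the failed rule names to each failed row. The `failed_rules` column name and the shape of the `errors` mapping are assumptions for illustration. The actual export layout may differ.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def split_by_result(rows, errors):
    """Split rows into (passed_csv, failed_csv).

    errors maps a 0-based row index to the list of rule names that failed.
    Assumes rows is non-empty and every row has the same keys.
    """
    fieldnames = list(rows[0].keys())
    passed, failed = [], []
    for index, row in enumerate(rows):
        if index in errors:
            failed.append({**row, "failed_rules": "; ".join(errors[index])})
        else:
            passed.append(row)
    return to_csv(passed, fieldnames), to_csv(failed, fieldnames + ["failed_rules"])


rows = [
    {"Email": "a@example.com", "LastName": "Lee"},
    {"Email": "", "LastName": "Kim"},
]
passed_csv, failed_csv = split_by_result(rows, {1: ["required"]})
print(failed_csv)
```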
15+ CRM/Database Presets
Pre-configured rule sets for Salesforce Contacts, Salesforce Leads, HubSpot CRM Contacts, PostgreSQL VARCHAR constraints, MySQL data types, and more. Each preset includes the standard field requirements for that platform, so you can select and validate in one step.
Data Validator vs Excel vs Manual Validation vs Python
Real-World Validation Scenarios
Salesforce Contacts Import: 87,000 Rows
A sales ops team exports 87,000 contacts from their old CRM. Without validation, they'd discover errors only after Salesforce rejects the import.
Without Data Validator:
- Upload to Salesforce, then wait 15 minutes
- Import fails: 1,859 errors
- Error file: row-by-row, one error per row shown
- Fix 200 rows manually, then re-upload
- Fails again: 800 more errors not shown in first attempt
- 3 upload cycles later: finally imports
- Total time: 2.5+ hours
With Data Validator:
- Drop CSV, then select the Salesforce Contacts preset
- 8 seconds: 1,859 errors identified, all at once
- Error breakdown: 1,247 email length > 80 chars, 412 duplicate emails, 200 missing LastName
- Export failed rows CSV
- Bulk fix in Data Cleaner (truncate emails, deduplicate, fill required fields)
- Re-validate: 0 blocking errors
- Upload to Salesforce: imports cleanly
- Total time: 11 minutes
A healthcare billing team needs to validate NPI, ICD-10, and CPT codes in a claims file before submission. All data must stay local; PHI cannot leave the browser.
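As a rough illustration of the first line of defense in a scenario like this, the Python sketch below checks only the shape of ICD-10-CM and CPT codes. It is an assumption-laden simplification: the ICD-10-CM pattern is a letter, a digit, an alphanumeric, then an optional dot and up to four alphanumerics, and the CPT pattern is assumed to be four digits followed by a digit or letter. Neither check confirms that a code exists in the official code set, which requires a lookup against the published lists.

```python
import re

# Shape-only patterns (assumptions for this sketch, not a code-set lookup).
ICD10_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")
CPT_SHAPE = re.compile(r"[0-9]{4}[0-9A-Z]")


def looks_like_icd10(code: str) -> bool:
    return ICD10_SHAPE.fullmatch(code.strip().upper()) is not None


def looks_like_cpt(code: str) -> bool:
    return CPT_SHAPE.fullmatch(code.strip().upper()) is not None


for sample in ["E11.9", "J45.909", "11.9"]:
    print(sample, looks_like_icd10(sample))
for sample in ["99213", "0001F", "9921"]:
    print(sample, looks_like_cpt(sample))
```

Everything here runs on plain strings in memory, which fits the requirement that claims data stays on the local machine.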
An e-commerce team migrates product data between platforms. Every SKU must have a unique identifier, valid price format, and required category field.
Technical Deep Dive: How Data Validator Handles Edge Cases
Honest documentation of validation behavior on tricky real-world data.
Email Validation: What "Valid" Actually Means
Uniqueness Checking at 10M Rows: Memory and Performance
NPI, ICD-10, and CPT Validation: Healthcare Code Accuracy
Excel / XLSX Validation: What Changes vs CSV
Blocking vs Warning: When to Use Each
Data Validator Is Perfect For
- Pre-import validation before Salesforce, HubSpot, or any CRM
- Healthcare data with NPI, ICD-10, CPT codes that must stay local
- Files too large for Excel Data Validation (1M+ rows)
- Uniqueness checking across millions of rows
- Teams where configuring Python validation is too slow
- Validation with custom regex rules
- Exporting failed rows for bulk fixing
- Multiple validation passes: validate, fix, re-validate
- Database migration data quality checks
- One-off file validation where setting up Python scripts is overkill
Not Ideal For
- Automated, scheduled, or CI/CD pipeline validation (no API)
- Files over ~2GB / 15M rows (browser memory limits)
- Streaming or real-time data (batch file only)
- Team-shared schemas with version control
- Statistical data quality (distributions, outliers); use Data Profiler instead
- Non-CSV/Excel formats (JSON, Parquet, Avro, databases)
- Validation that needs to run headlessly on a server
- More than 50M rows (use Python Great Expectations)
- Validation schemas shared across teams
- Integration with dbt, Airflow, or similar workflow orchestration
Performance: Up to 500K Rows/Sec
Two Validation Modes
- Test file: 10M rows, Email + FirstName + LastName + Phone + Company
- Rules: Email required + valid format + unique + max 80 chars + LastName required
- 50M individual validation checks total (5 rules × 10M rows)
- Uniqueness hash table for Email column: ~1.2GB peak at 10M unique emails
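The benchmark figures can be sanity-checked with a little arithmetic. The sketch below assumes 1GB = 10^9 bytes and reuses the ~37 second figure quoted earlier for the 10M-row uniqueness check. The derived per-email and per-second numbers are back-of-envelope estimates, not measured values.

```python
ROWS = 10_000_000
RULES = 5                  # Email: required, format, unique, max length; LastName: required
UNIQUENESS_SECONDS = 37    # ~37 s quoted for uniqueness across 10M rows
HASH_TABLE_BYTES = 1.2e9   # ~1.2GB peak, assuming 1GB = 10**9 bytes

total_checks = ROWS * RULES                  # individual validation checks
bytes_per_email = HASH_TABLE_BYTES / ROWS    # rough memory per unique email
rows_per_second = ROWS / UNIQUENESS_SECONDS  # rough uniqueness throughput

print(f"{total_checks:,} checks")
print(f"~{bytes_per_email:.0f} bytes per unique email")
print(f"~{rows_per_second:,.0f} rows/sec for the uniqueness pass")
```

On these assumptions the hash table costs on the order of a hundred bytes per unique email, which is why memory, not CPU, becomes the limit as row counts grow.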
Frequently Asked Questions
How does Data Validator prevent Salesforce import failures?
What validation rules does Data Validator support?
Can Data Validator handle healthcare data validation?
How fast can Data Validator process large files?
Is my data safe? What about HIPAA compliance?
What's the difference between a blocking error and a warning?
Does Data Validator work with Excel files?
What happens if my file has thousands of errors?
Can I save my validation schema to reuse later?
How does uniqueness checking work across 10M rows?
Why We Built This
Every data team has a story like this: a CRM migration that was supposed to take a day turned into a week because of import failures nobody could fully diagnose until after each upload attempt. The error files were cryptic. Excel's validation rules were inadequate. Python scripts took longer to write than the cleanup itself.
We built Data Validator because the existing options, namely Excel Data Validation (256 rules, no regex, no uniqueness), manual review (impractical at scale), and Python scripts (overkill for one-off files), all fail at the moment you need them most: when you have 87,000 contacts to import and a 3 PM deadline.
The principle is simple: validate locally, see all errors at once, export failed rows ready for bulk fixing, re-validate before upload. No failed import cycles. No data leaving your browser.
-- SplitForge Engineering, 2026
"But I Already Use..."
"I already use Excel Data Validation"
"I just do a manual review in Excel"
"I use Python to validate my data"
"I'll just fix errors after the import fails"
Related Tools
Data Cleaner
Fix the errors Data Validator finds. Standardize formats, fill missing values, trim whitespace, remove duplicates, all in bulk.
Data Profiler
Understand your data before you validate it. Type detection, statistics, null rates, anomaly detection, correlations: 11 analysis types in all.
Data Masking
Mask PII and PHI before sharing files. Anonymize emails, phones, SSNs, and more, preserving format while protecting sensitive data.
Ready to Validate Your Data?
Stop the import failure loop. Validate your CSV locally, catch all blocking errors at once, and import cleanly on the first attempt.