Quick Answer
Pre-upload CSV validation catches five categories of errors that platforms find expensively: structural errors (wrong column count, malformed quotes), encoding errors (garbled characters), data type errors (text in number fields), completeness errors (empty required fields), and uniqueness errors (duplicates). Finding these before upload takes minutes. Finding them after takes hours.
The fix: Run a validation pass before every import — structure first, then encoding, then types, then completeness, then uniqueness. Five checks, 15 minutes, zero surprises.
Why it matters: Your platform's error handling after a failed import is designed to protect data integrity, not to help you debug. It stops the import, rolls back changes, and gives you a cryptic error report. Pre-upload validation finds the same errors with actionable row-level feedback before any data moves.
Why Validate Before Your Platform Does
Platforms validate on import for their own protection. You validate before import for yours.
| Platform validation | Pre-upload validation | |
|---|---|---|
| When | During import — after your data is transmitted | Before upload — in your control |
| Errors shown | One at a time (PostgreSQL) or batched (HubSpot) | All at once, with row numbers |
| Data privacy | File already uploaded to their server | File never leaves your machine |
| Partial import | Possible — some rows imported, some rejected | None — you fix before anything uploads |
| Feedback quality | Cryptic error codes, minimal context | Plain-language descriptions with row numbers |
| Recovery time | Hours — find, fix, re-import | Minutes — fix file, import once |
Most CSV validation tools upload your file to a remote server to process it. For files containing personal data, financial records, or protected health information, that upload occurs before you've verified the data is even correct — creating unnecessary regulatory exposure. SplitForge validates your CSV in Web Worker threads in your browser. For raw file contents, nothing reaches any server until you choose to upload to your actual platform.
Diagnose Your Situation in 60 Seconds
Before running full validation, identify where you are in the workflow:
| Your situation | What you need | Time required |
|---|---|---|
| About to import a new CSV | Full 5-check validation | 15 minutes |
| Import just failed with errors | Targeted validation on failing rows | 5 minutes |
| Recurring import from same source | Checks 1 + 4 + 5 only | 5 minutes |
| Large file (1M+ rows) | Automated validation tool | 2 minutes |
| CRM-specific import | Full validation + platform-specific check | 20 minutes |
Fast Fix (90 Seconds)
Fastest pre-upload validation you can run right now:
- Open in text editor — confirm columns are comma-separated, not semicolons.
- Check row 1 is headers — not data.
- Search for
,,— consecutive empty commas = empty required fields. - Check file size — compare against your platform's limit.
- Run SplitForge Data Validator — automated checks 1–4 across the full file in seconds.
For a thorough validation that catches the most common error types, continue below.
TL;DR: Validating before upload costs 15 minutes and prevents the hours of cleanup that follow a bad import. The five validation checks in this guide are ordered by the damage each error causes: structural errors first (they corrupt all subsequent rows), then encoding, types, completeness, and uniqueness. Run SplitForge Data Validator to automate the first four checks locally before any data leaves your machine.
Your operations team has been building a 10,000-row supplier database for three months. Today is go-live. The import to your ERP system runs for 4 minutes, then returns: "847 rows failed to import. See error report." You download the error report. It lists 12 different error types across 847 rows. Some rows failed for missing fields. Some for wrong date formats. Some for duplicate IDs. You could have caught all of them in 15 minutes this morning.
Pre-upload validation is not extra work. It's moving 15 minutes of debugging from the worst possible moment — after the import fails — to the best possible moment — before anything is uploaded.
Each validation check was verified against current platform behavior in HubSpot, Salesforce, PostgreSQL, and monday.com, March 2026.
Table of Contents
- Check 1: Structure Validation
- Example Validation Report
- Check 2: Encoding Validation
- Check 3: Data Type Validation
- Check 4: Completeness Validation
- Check 5: Uniqueness Validation
- What Platforms Check vs What They Miss
- Validation by File Scale
- What Validation Doesn't Catch
- Additional Resources
- FAQ
This guide is for: Anyone who imports CSV data regularly and wants to stop spending time debugging failed imports.
Check 1: Structure Validation
Structure validation catches formatting problems that corrupt all rows they appear in — not just the row with the problem.
What to validate:
Column count consistency: Every row should have the same number of fields as the header row. A row with too many or too few fields shifts all subsequent values into the wrong columns — silently.
What a column count error looks like:
❌ BROKEN:
Name,Email,Company,Phone
Alice,[email protected],Acme Corp,555-1234
Bob,[email protected],Globex, Inc.,555-5678 ← comma in company name, unquoted
Carol,[email protected],Initech,555-9012
Row 3 (Bob): "Globex" maps to Company, " Inc." maps to Phone, "555-5678" maps to... column 5
which doesn't exist. Row is rejected or shifted.
FIXED:
Bob,[email protected],"Globex, Inc.",555-5678
Quoting consistency: Fields containing commas, newlines, or double quotes must be enclosed in double quotes. An unmatched quote causes the parser to read multiple rows as one field.
Header row position: Row 1 must be headers. No blank rows above headers, no data above headers.
How to run structure validation:
In a text editor: visually inspect a sample of rows. Count commas in a few rows — they should match.
For large files: use SplitForge Data Validator. It scans every row for column count consistency and quoting errors, reports exact row numbers, and processes locally. On a 1-million-row file, this takes under 10 seconds.
For more on column count errors, see CSV column count mismatch fix.
Example Validation Report
This is what a validation output actually looks like when you run your CSV through SplitForge Data Validator. Paste this into your team's SOP as an example of what to fix before importing.
| Row | Column | Error type | Current value | Suggested fix |
|---|---|---|---|---|
| 3 | Amount | Type mismatch (numeric) | $1,250.00 | Remove $ and , → 1250.00 |
| 7 | Email | Unclosed quoted field | "[email protected] | Close the quote → "[email protected]" |
| 12 | Company | Empty required field | (blank) | Add value or placeholder |
| 18 | Email | Duplicate key | [email protected] | Already in rows 4 and 11 — remove duplicates |
| 24 | Close Date | Type mismatch (date) | March 15, 2026 | Convert to 2026-03-15 |
| 31 | (row) | Column count mismatch | 6 fields, expected 5 | Unquoted comma in Globex, Inc. — add quotes |
Row 7 (unclosed quote) is the most dangerous error here. An unclosed quote causes the parser to read rows 7 through whatever-row-closes-the-quote as a single field. In a 10,000-row file, that could shift hundreds of rows into the wrong columns — silently.
What a real validation output looks like (raw, not formatted):
Validating: crm-contacts-march-2026.csv
Total rows: 4,847 | Columns: 12 | File size: 2.3MB
ERRORS FOUND: 23 issues across 19 rows
Row 7 | email | Unclosed quoted field
| value: "alice johnson@
| CRITICAL: parser read rows 7-14 as single field — 8 rows lost
Row 15 | amount | Type mismatch: expected numeric
| value: $1,250.00
| Fix: remove $ and comma
Row 47 | close_date | Type mismatch: expected date (YYYY-MM-DD)
| value: March 15, 2026
| Fix: convert to 2026-03-15
Row 128 | close_date | Type mismatch: expected date
| value: 15/03/2026
| Fix: ambiguous format — convert to 2026-03-15
Row 203 | company | Empty required field
| value: (empty)
Row 541 | email | Duplicate key
| value: [email protected]
| Matches rows: 12, 341
Row 892 | required_notes| Empty required field
| value: (empty)
[...16 more errors...]
SUMMARY: Fix Row 7 first — it's causing 8 rows to parse incorrectly.
After fixing Row 7, re-run validation. Row count will update.
This is messier than a formatted table — and that's the point. Real validation output doesn't arrive pre-organized. Row 7 needs to be fixed before the row count is even accurate, which means the 23 errors might become 15 or 31 after that single fix.
Fix all errors in the report, then re-validate before uploading. One clean validation pass = one upload.
Encoding validation catches character corruption before it reaches your platform and corrupts stored data.
What to validate:
Is the file UTF-8? Every major platform requires UTF-8. Non-UTF-8 files produce garbled characters — but only for rows containing special characters. Rows with only ASCII characters (standard English letters and numbers) look fine, masking the problem.
Does the file have a BOM? A UTF-8 BOM is invisible to humans but breaks some importers (PostgreSQL COPY, certain CRM systems). The BOM causes the first column header to appear garbled.
How to check:
Open in Notepad++ — status bar shows encoding. If it says "UTF-8-BOM", there's a BOM. If it says "ANSI", it's Windows-1252.
Or run SplitForge Delimiter & Encoding Fixer — it detects encoding and BOM automatically and shows you a preview of how the data will look after conversion.
How to fix:
If Windows-1252: re-save from Excel as "CSV UTF-8 (Comma delimited)" or convert via Google Sheets.
If UTF-8-BOM: open in Notepad++ → Encoding → Convert to UTF-8 (not BOM). Or run through SplitForge — BOM stripping is automatic.
For the complete encoding fix guide, see CSV file not opening correctly.
Check 3: Data Type Validation
Data type validation catches value-level mismatches that cause row-level rejections — or worse, silent data corruption in platforms that coerce types.
What to validate:
Numbers contain only numbers: No $, €, %, commas as thousands separators, or text null representations like N/A in numeric columns.
Dates are YYYY-MM-DD: No mixed formats within a date column. No ambiguous 05/03/2026 values.
Emails follow valid format: Standard email format validation ([email protected]). No extra spaces, no missing @, no invalid domains.
Booleans use consistent values: true/false or yes/no — not a mix. Platform-specific requirements vary.
A type validation reference table:
| Field type | Acceptable | Rejected/coerced |
|---|---|---|
| Integer | 1234, -56 | $1,234, 1,234, na |
| Decimal | 12.50, -3.14 | $12.50, 12,50 (European) |
| Date | 2026-03-18 | 03/18/2026, March 18, 18/03/2026 |
[email protected] | name@ domain.com, name@domain | |
| Boolean | true, false | True, TRUE, yes, 1 (varies by platform) |
| Phone | +15552345678 | (555) 234-5678 (works but inconsistent) |
How to run type validation:
Manual: scan each column visually in a spreadsheet. Use Excel's data validation feature to flag non-conforming values.
Automated: SplitForge Data Validator validates each column against specified types and flags every non-conforming value with row number and current value. On large files, this replaces hours of manual column scanning.
For data type mismatch fixes, see CSV data type mismatch fix.
Check 4: Completeness Validation
Completeness validation catches missing required values before they cause row rejections.
What to validate:
Required fields have values in every row: Every platform has required fields. Empty values in required fields produce row-level rejections that aren't obvious from the import summary.
No unexpected empty rows: A blank row mid-file terminates import in several platforms (QuickBooks, MySQL). Empty rows at the end add phantom records in platforms that don't ignore trailing blanks.
Key fields aren't just whitespace: A cell containing only spaces looks filled but isn't. Many platforms catch this; some don't. Strip whitespace from all fields before importing.
How to check:
In Excel: use =COUNTA(A2:A5000) on each required column. Compare against your total row count. Any discrepancy = missing values.
For the full approach to completeness validation, see our CSV import checklist.
Check 5: Uniqueness Validation
Uniqueness validation catches duplicates before they create duplicate records — which are harder to find and remove after the fact than before.
Two uniqueness checks:
Within-file duplicates: Does your CSV have two rows with the same email, ID, or other key field? This is a data quality problem, not a technical one — the import will succeed but you'll have duplicates.
Cross-database duplicates: Does your CSV contain records that already exist in the target platform? An email that's new in your spreadsheet might already be in your CRM from a previous import.
Deduplication by field type:
| File type | Key field to deduplicate on |
|---|---|
| Contact/lead list | Email address |
| Product catalog | SKU or Product ID |
| Transaction data | Transaction ID |
| Employee records | Employee ID |
| Address data | Address + postal code |
How to check within-file duplicates:
In Excel: select your key column → Home → Conditional Formatting → Duplicate Values. Highlighted cells are duplicates.
Automated: SplitForge Remove Duplicates finds and removes duplicates by any column, handles large files, and runs locally.
How to check cross-database duplicates:
Export your existing records from the target platform. Use SplitForge VLOOKUP Join to match email addresses between your import CSV and the export. Remove matches from your import file before uploading.
What Platforms Check vs What They Miss
Understanding what your platform validates helps you focus your pre-upload efforts.
| Validation type | PostgreSQL | HubSpot | Salesforce | Notion | monday.com |
|---|---|---|---|---|---|
| Column count | Strict | Flexible | Flexible | Flexible | Flexible |
| Encoding | Strict | Partial | Partial | Partial | Partial |
| Data types | Strict | Coerces | Enforces picklists | Silent drop | Silent drop |
| Required fields | NOT NULL | Email required | Last Name + Company | Silent drop | Silent drop |
| Duplicates | None | On Email | Configurable | None | None |
The dangerous column: Silent drop. Platforms that silently drop non-conforming values (Notion, monday.com for status fields) report import success even when data is wrong. Pre-upload validation is the only way to catch these before they happen.
Validation by File Scale
Different file sizes require different approaches:
Under 1,000 rows: Manual validation in Excel is feasible. Run through the 5 checks above by eye.
1,000–100,000 rows: Automated validation is faster and more accurate. SplitForge Data Validator scans the full file in seconds, reports every error by row number, and lets you download a report.
100,000–1,000,000 rows: Manual validation is impossible. Automated validation is mandatory. Large files require streaming validation — the file is processed in chunks without loading entirely into memory. SplitForge handles this architecture natively in your browser.
Over 1,000,000 rows: Consider splitting before validating. Use SplitForge CSV Splitter to divide into manageable batches, validate each batch, then import sequentially.
Browser memory note for very large files: Files above approximately 500–800MB may cause browser memory pressure even with streaming parsers, depending on your machine's available RAM and browser version. For files in that range, split first using SplitForge CSV Splitter, validate the chunks separately, then import sequentially. The validation results will be identical — streaming validation doesn't lose accuracy when applied to chunks.
For large-scale validation performance, see our guide on validating CSV before import at 10M rows.
What Validation Doesn't Catch
Pre-upload validation is powerful but not omniscient. Knowing its limits prevents false confidence.
Validation catches:
- Structural errors (column count, quoting, malformed rows)
- Encoding issues (wrong charset, BOM characters)
- Data type mismatches (text in numeric fields, wrong date formats)
- Empty required fields (by column name)
- Duplicate records (by key field)
Validation does NOT catch:
- Platform-specific picklist values — validation doesn't know your CRM's configured Lead Source or Status options. A value of
"Trade Show"is structurally valid even if your Salesforce org uses"Tradeshow"(no space). Only a test import reveals picklist mismatches. - Cross-database logic — validation can't check whether a referenced entity (customer ID, account name) already exists in your target system. NetSuite's reference key errors only appear at import time.
- Business logic rules — custom field validation rules, conditional required fields, and workflow triggers are platform-enforced, not file-level issues.
- API rate limits — large files that would exceed your platform's hourly import limit appear valid but fail mid-import.
- Post-import consequences — a structurally valid CSV that imports successfully can still trigger unwanted automations, fire notification emails, or create records in the wrong pipeline stage.
The correct sequence: Validate locally → import 5 rows as test → verify all field mapping, picklist values, and automation behavior on the test → import the full file. Validation eliminates structural failures. The test import catches platform-logic failures.
Additional Resources
Standards:
- RFC 4180: CSV Format Specification — Official CSV structure rules for quoting, escaping, and field separation
- Unicode UTF-8 Standard — UTF-8 encoding specification
- GDPR Article 5 — Data minimization — Principle of collecting only necessary data, relevant to validation before upload
Platform Documentation:
- HubSpot import troubleshooting — Official guide to post-import error diagnosis
- PostgreSQL COPY documentation — Complete COPY command reference and validation options
Related Guides:
- CSV import checklist — 10 things to verify before every upload
- Prepare CSV for CRM import — CRM-specific prep steps
- CSV file validation guide — How validation works technically
- CSV import errors complete guide — Full error taxonomy