Navigated to blog › match-customer-records-two-csv-files
Back to Blog
csv-guides

How to Match Customer Records Across Two CSV Files

March 15, 2026
13
By SplitForge Team

Quick Answer

Matching customer records across two CSV files means finding rows in File A that correspond to the same real-world customer in File B, then combining their data into a single row.

Why it's hard: Key columns rarely match cleanly across systems — email addresses differ (uppercase vs lowercase), IDs use different formats, and names have typos. Excel VLOOKUP requires exact matches and crashes on large files.

The fix: Use SplitForge VLOOKUP & Join to match on one or more key columns with case normalization and optional fuzzy matching — no formulas, no Python, no upload.


Fast Fix (90 Seconds)

If you need to match two CSV files right now:

  1. Open SplitForge VLOOKUP & Join — no account required
  2. Upload File A (your primary file) and File B (the lookup file)
  3. Select the matching key column(s) — email, customer ID, or name
  4. Choose join type: Inner (matched only), Left (all of File A), or Full Outer (everything)
  5. Download the merged output

Your files never leave your browser.


TL;DR: Customer record matching across two CSV files fails in Excel when files are large, keys are inconsistent, or fuzzy matching is required. SplitForge VLOOKUP & Join handles all three — exact and fuzzy key matching, files of any size, and configurable join types that control which records appear in the output.


Your billing team exports from Stripe. Your CRM team exports from Salesforce. You need to combine them to build a customer health report — MRR from billing, contact details and last activity from CRM. The customer IDs are different between systems. The only common field is email address, but Salesforce has them in lowercase and Stripe has them mixed-case. Excel VLOOKUP fails because "[email protected]" ≠ "[email protected]".

Each matching scenario was tested using SplitForge VLOOKUP & Join against real CRM, billing, and operations CSV exports ranging from 3,000 to 500,000 rows per file, March 2026. In one revenue reconciliation we processed, 8% of email mismatches were case differences alone — a silent failure that produces wrong match counts without any error message.


Excel VLOOKUP requires exact character-for-character matches. It has no case normalization, no fuzzy matching, and no multi-column key support without complex nested formulas. For customer data that spans systems — which is virtually always inconsistent — VLOOKUP can produce significant silent miss rates due to case differences, format variations, and trailing spaces. This guide covers the matching strategies that actually work on production data.

Each matching scenario was tested using SplitForge VLOOKUP & Join against real CRM, billing, and operations CSV exports ranging from 3,000 to 500,000 rows per file, March 2026. In one revenue reconciliation we processed, 8% of email mismatches were case differences alone — a silent failure that produces wrong match counts without any error message.

Benchmark environment: Chrome 122, Apple M2 / 16GB RAM, join keys: email and customer ID columns. Processing times scale with combined row count of both files.


Choosing Your Matching Strategy

Before running any join, identify the best available key column(s) and the likely failure modes:

Key ColumnReliabilityCommon FailureFix
Customer/Account IDHigh (if same system generated both)ID formats differ (C-1047 vs 1047)Strip prefix before matching
Email addressMedium-HighCase differences, typos, shared emailsCase-normalize; deduplicate first
Phone numberMediumFormat differences (+1-555 vs 5551234567)Normalize to E.164 before matching
Company nameLow-MediumAbbreviations, typos, legal vs common nameFuzzy match with threshold
Full nameLowFormatting, nicknames, middle namesLast resort; combine with secondary key

The best key is the most system-generated, least human-entered column you have. Email is usually more reliable than name. A customer ID that exists in both systems is most reliable of all.


Table of Contents


This guide is for: Data analysts, CRM administrators, and operations teams who need to combine data from two different CSV files that share a common customer identifier.


How to Join Two CSV Files in VLOOKUP & Join

Step 1: Prepare your files

Before uploading, confirm:

  • Both files have a header row
  • The key column exists in both files (it doesn't need to have the same name — you'll map them in the next step)
  • Files are saved as CSV (not Excel .xlsx format — if they're Excel files, convert them first with SplitForge Excel to CSV)

Step 2: Upload and configure

  1. Open SplitForge VLOOKUP & Join
  2. Upload File A (your primary/left file — the one whose rows you want to keep)
  3. Upload File B (your lookup/right file — the one you're pulling data from)
  4. Map the key columns: select which column in File A matches which column in File B

Step 3: Select columns to include in output

From the column selector, choose which columns from each file to include in the output. You don't need to include all columns — typically you want the key columns plus a handful of data columns from each file.

Step 4: Choose join type and download

Select your join type (see Understanding Join Types below), click Join, and download the merged output.

Worked Join Example

Here's what a left join looks like on a simple customer dataset.

File A — CRM export (Salesforce):

email,name,last_activity
[email protected],John Smith,2026-02-15
[email protected],Sarah Jones,2026-01-28
[email protected],Mike Chen,2026-03-01

File B — Billing export (Stripe):

email,plan,mrr
[email protected],Pro,299
[email protected],Starter,49

Join key: email (with case normalization enabled)

Result (Left Join — all CRM rows, billing data where matched):

email,name,last_activity,plan,mrr
[email protected],John Smith,2026-02-15,Pro,299
[email protected],Sarah Jones,2026-01-28,Starter,49
[email protected],Mike Chen,2026-03-01,,

Mike Chen appears in the output with blank plan and MRR columns — he exists in the CRM but not in billing. With an Inner Join, his row would be excluded entirely. The join type choice determines whether that blank row is useful to you.


Handling Case Differences in Key Columns

Email addresses are the most common join key and the most common source of case-related mismatches. "[email protected]" and "[email protected]" represent the same customer but fail an exact match.

Enable Case Normalization in the join configuration. This converts all key column values to lowercase before matching. The output retains the original case from File A — normalization is only applied during the match comparison, not in the output data.

The same option handles:

  • CRM systems that capitalize email domains (gmail.com vs Gmail.com)
  • Marketing platforms that uppercase all email addresses on import
  • Legacy systems that stored data in ALL CAPS before normalization was implemented

In one Stripe-to-Salesforce reconciliation we processed, enabling case normalization alone raised the match rate from 82% to 94%.


Multi-Column Key Matching

Single-column keys fail when they're not unique across files. A common name like "John Smith" might appear 15 times in one file. Matching on name alone produces multiple false positives.

Use multi-column key matching to combine two or more key columns for a more precise match. Common combinations:

Primary KeySecondary KeyPrecision
EmailCompany NameHigh — same email at same company
First NameLast NameMedium — better than full name alone
First Name + Last NameCompanyHigh — same person at same company
PhoneZip CodeMedium — same number in same area

To configure multi-column matching, add a second key column mapping in the join configuration. Both columns must match for a row to be considered a match. This significantly reduces false positives in name-based matching.


Fuzzy Matching for Name Columns

Fuzzy matching finds near-matches rather than exact matches. "Acme Corp" matches "ACME Corporation" with fuzzy matching. "Johnson" matches "Johnsen" at a lower threshold.

When to use fuzzy matching:

  • Company name matching across systems (legal name vs common name)
  • Name matching where typos or transliterations exist
  • Address matching where abbreviations differ

Setting the threshold: The fuzzy match threshold (0.0–1.0) controls how similar strings must be to count as a match. A threshold of 0.9 requires very high similarity. A threshold of 0.6 allows more variation but produces more false positives.

For company names, start with 0.85 — this catches most abbreviations and minor variations while excluding clearly different names. For personal names, use 0.9 or higher — name matching is more prone to false positives than company name matching.

Fuzzy threshold reference:

ThresholdWhat it matchesFalse positive riskBest for
0.95–1.0Near-identical stringsVery lowTypo correction only
0.85–0.94Abbreviations, minor variationsLowCompany names
0.75–0.84Significant variationsMediumInformal vs legal names
0.60–0.74Loose similarityHighLast resort only — always review

Always review the fuzzy match preview before downloading. Sort by match score to inspect the lowest-scoring matches — these are most likely to contain false positives.


Understanding Join Types

The join type controls which records appear in your output:

Join TypeWhat It ReturnsWhen to Use
Inner JoinOnly rows that matched in both filesYou only want complete records
Left JoinAll rows from File A, plus matched data from File BYou want all File A rows, matched or not
Right JoinAll rows from File B, plus matched data from File AYou want all File B rows, matched or not
Full Outer JoinAll rows from both filesYou want everything, matched and unmatched

Most common choice: Left Join. You want to keep all rows from your primary file (File A) and enrich the ones that have a match in File B. Rows that don't match get blank values in the File B columns — you can filter those out afterward.

For reconciliation: Full Outer Join. You want to find records that exist in one file but not the other. After the join, filter for rows where File A columns are blank (exists in B only) or File B columns are blank (exists in A only).


Diagnosing Low Match Rates

If your match rate is unexpectedly low, work through these in order:

  1. Check key column selection. The wrong column is the most common cause. Verify the selected columns actually contain the same customer identifier in both files.

  2. Check for leading/trailing spaces. A key value of "[email protected] " (trailing space) won't match "[email protected]". Run both files through Data Cleaner to trim whitespace before matching.

  3. Check encoding. UTF-8 and Windows-1252 produce different byte representations of accented characters. Keys containing names like "García" or "Müller" won't match if files use different encoding. Use Format Checker to normalize encoding.

  4. Check for format differences in IDs. "C-1047" and "1047" won't match. Use Find & Replace to strip prefixes before matching.

  5. Use profiling to compare key columns. Run Data Profiler on both files and compare the distinct value counts for the key column. If one file has 50,000 distinct emails and the other has 48,000, you'd expect ~96% match rate. If your actual match rate is 60%, the key column isn't the right identifier.

For more on CSV join workflows and data quality, see our VLOOKUP 1M+ rows guide and the complete CSV import errors reference.


Performance Benchmarks

Join time scales with the combined row count of both files and the join type. These benchmarks were measured on Chrome 122, Apple M2 / 16GB RAM, joining on a single email key column.

File A rowsFile B rowsJoin typeRuntime
3,0005,000Inner0.2s
50,00080,000Left1.1s
200,000150,000Left3.4s
500,000400,000Left4.8s
500,000400,000Full Outer6.2s

Full Outer joins take longer because the tool must track unmatched rows from both files. Inner joins are fastest because only matched rows are written to output.

How This Compares to SQL and Excel VLOOKUP

For context, the SQL equivalent of a Left Join on email is:

SELECT a.*, b.plan, b.mrr
FROM crm_export a
LEFT JOIN billing_export b ON LOWER(a.email) = LOWER(b.email)

SQL requires a running database, query knowledge, and both files loaded into tables. Excel VLOOKUP requires both files open simultaneously, uses exact matching only, and crashes on files above ~500MB.

SplitForge runs the same join in your browser — no database, no code, no file size limit, no upload.


Additional Resources

Join and Matching Concepts:

Fuzzy Matching:

Browser Processing:

Privacy:


FAQ

Excel VLOOKUP returns a value from a single column in a lookup table based on an exact key match. SplitForge VLOOKUP & Join performs a full relational join — it merges multiple columns from both files, supports case normalization, multi-column keys, fuzzy matching, and handles files that Excel cannot open. For large files or imperfect keys, it's a significant upgrade.

It depends on your join type. With a Left Join, unmatched File A rows appear in the output with blank values in the File B columns. With an Inner Join, unmatched rows are excluded entirely. Choose based on whether you need to keep unmatched records.

Yes. You can add multiple key column mappings. All specified key columns must match for two rows to be considered a match. This reduces false positives significantly on common key columns like names.

VLOOKUP & Join uses streaming join algorithms with no row limit. Files with 1 million rows per file process in under 30 seconds in Chrome on modern hardware. The join operation runs in dedicated Web Worker threads and does not block the browser UI.

The current tool supports two files per join operation. To join three files, perform two sequential joins: first join File A to File B, then join the result to File C.

Yes. The tool auto-detects the delimiter for each uploaded file independently. You can join a comma-delimited file to a semicolon-delimited file without any preprocessing.


Match Records Across Files Without Formulas

Exact and fuzzy key matching with case normalization
Multi-column key support for high-precision matching
All join types: Inner, Left, Right, and Full Outer
Files process entirely in your browser — customer data never leaves your machine

Continue Reading

More guides to help you work smarter with your data

ai-data-prep

AI-Ready Data Checklist: 10 Things to Verify Before Upload (2026)

Before uploading to ChatGPT, Claude, or a fine-tuning API, run through this 10-point checklist. UTF-8 encoding, clean headers, PII removed, size within limits.

Read More
ai-data-prep

Convert Excel to JSON for AI APIs and LLM Pipelines (2026)

AI APIs and LLM pipelines expect JSON, not spreadsheets. Fine-tuning needs JSONL; direct prompts take arrays. Convert locally — no upload, no conversion server.

Read More
ai-data-prep

Prepare Data for AI: The Complete Guide (Privacy-First, 2026)

How to prepare a CSV or Excel file for ChatGPT, Claude, or an AI API — encoding, PII, format, size, and privacy. The complete local-first prep workflow.

Read More