Fast Fix (90 Seconds)
If you have emails, phone numbers, or URLs mixed into a CSV column right now:
- Open SplitForge Pattern Extraction — no account required
- Upload your CSV or paste the column contents
- Select the pattern type: Email, Phone, or URL
- Click Extract — matches appear in a new column
- Download the clean result and import to your CRM or tool
Files process entirely in your browser. Nothing is uploaded.
TL;DR: Many CSV exports bury contact information inside notes fields, address columns, or mixed-format cells, and hunting through hundreds of thousands of rows by hand is not viable. SplitForge Pattern Extraction identifies emails, phone numbers, and URLs using browser-based matching (no Python, no upload, no code) and outputs clean, separate columns ready for import.
Your CRM export landed in your inbox. 80,000 rows, exported from a legacy system that crammed everything into a single "Notes" column — email addresses, phone numbers, website URLs, follow-up dates, all mixed together in free text.
Your task is to get the emails into Mailchimp, the phone numbers into your dialer, and the URLs into a prospecting sheet. By hand, that's weeks of copy-paste. With Python, it's a regex project that assumes you know what you're doing. With a cloud extraction tool, you're uploading 80,000 rows of customer contact data to a server you don't control.
There's a fourth option: browser-based pattern extraction that runs locally, touches nothing on a server, and takes about 30 seconds.
We tested each pattern type with SplitForge Pattern Extraction in March 2026 against real-world CRM exports ranging from 12,000 to 180,000 rows. In one Salesforce export we analyzed, 38% of email addresses were buried in the Notes field rather than the dedicated Email column, a common finding in legacy CRM migrations.
Most online email and phone extraction tools process your file on remote servers. Depending on the provider's terms of service, uploaded files may be retained for logging and debugging, often for 30–90 days. For files containing customer emails and phone numbers, that retention creates exposure under the data minimization principle of GDPR Article 5. SplitForge Pattern Extraction runs entirely in your browser: files are streamed and processed in dedicated Web Worker threads so the UI thread stays responsive, and your data is never transmitted to any server.
What Pattern Extraction Actually Does
Pattern extraction scans text within a CSV column and identifies values that match a known structure — the format of an email address, the format of a phone number, the format of a URL. It returns those matches as a new column, leaving everything else behind.
This is different from web scraping tools, which pull contact data from websites. Pattern extraction works on data you already have — buried in a column that mixes contact details with other text.
The three most common use cases break down like this:
| Pattern Type | What It Extracts | Common Source Column |
|---|---|---|
| Email addresses | Any valid email format | Notes, Comments, Contact, Free Text |
| Phone numbers | US, international, formatted variants | Notes, Address, Contact, Comments |
| URLs / domains | http, https, www links | Website, Notes, Description, Bio |
This is the table to bookmark: every time you inherit a messy CRM export, these are the patterns you're scanning for.
The traditional approach to this problem is a Python regex, re.findall(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', text), or Excel's TEXTSPLIT function for structured columns. Both work on small datasets with clean data. At 80,000 rows of mixed free text, the regex route requires scripting expertise. Excel formulas become unwieldy because TEXTSPLIT splits text on delimiters rather than matching patterns. The browser-based approach here requires neither.
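Here is roughly what the Python route looks like. This is a minimal standard-library sketch that applies the regex above to one column of a CSV held in memory. The column name Notes and the sample rows are made up for illustration, and this is not how SplitForge works internally.

```python
import csv
import io
import re

# The pattern quoted above: a practical subset of email syntax,
# not the full RFC 5322 grammar.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_in_column(csv_text: str, column: str) -> list[list[str]]:
    """Return every email match in `column`, one list per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # `or ""` guards against short rows where the cell is missing (None).
    return [EMAIL_RE.findall(row.get(column) or "") for row in reader]

sample = (
    "id,Notes\n"
    '1,"Call Jane, jane.doe@acme.com, re: renewal"\n'
    "2,No contact info here\n"
    '3,"Two contacts: a@x.io and b.smith@example.co.uk"\n'
)
print(emails_in_column(sample, "Notes"))
# [['jane.doe@acme.com'], [], ['a@x.io', 'b.smith@example.co.uk']]
```

Each row yields a list. Cells with several addresses keep all of them, and cells with none yield an empty list.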
Table of Contents
- What Pattern Extraction Actually Does
- How to Extract Emails from a CSV Column
- How to Extract Phone Numbers from a CSV Column
- How to Extract URLs and Domains
- Working with Mixed Columns
- Common Scenarios
- Avoiding False Positives
- Cleaning and Validating Extracted Results
- Additional Resources
- FAQ
This guide is for: CRM administrators, marketing operations teams, list managers, and anyone who has inherited a CSV file where contact information is buried in the wrong column.
Already know which pattern you need? Jump to Email, Phone, or URL.
How to Extract Emails from a CSV Column
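Email addresses follow a consistent structure defined by RFC 5322: a local part, an @ symbol, and a domain. The full RFC 5322 grammar also permits rare forms such as quoted local parts, so practical extraction targets the common subset. That predictable shape makes automated extraction reliable even in highly unstructured text.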
Step 1: Identify the source column
Open your CSV and locate the column containing the buried emails. Common column names include: Notes, Comments, Contact Info, Free Text, Description, or Bio. If the column has no obvious name, scan the first 10 rows visually for @ symbols.
Step 2: Load the file and select Email extraction
- Open SplitForge Pattern Extraction
- Upload your CSV file — processing begins immediately in your browser
- From the Pattern Type selector, choose Email
- Select the source column from the dropdown
Step 3: Review the preview and extract
The tool shows a preview of matched addresses before you commit. Check for obvious false positives, such as placeholder addresses in template rows (for example, addresses at example.com) or @mentions that aren't email addresses. Use the filter to exclude matches from specific domains if needed.
Step 4: Download the result
Click Extract and download the output. The result CSV contains your original data plus a new column — "Extracted Email" — populated wherever a match was found. Rows with no email match receive a blank cell, not an error.
What success looks like:
- Matched addresses appear in the new column with no surrounding text
- Non-email rows are blank, not removed
- Country-code domains (.co.uk, .de, .fr) are captured correctly
- If you see raw text in the extracted column, re-check that you selected the correct source column
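If you want to reproduce the output shape from Step 4 in a script, here is a hedged sketch. It copies every row and appends an Extracted Email column, leaving the cell blank when nothing matches. Joining multiple matches with "; " is an assumption of this sketch, not documented SplitForge behavior.

```python
import csv
import io
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def add_extracted_column(csv_text: str, source: str,
                         new_col: str = "Extracted Email") -> str:
    """Copy every row and append `new_col`; rows without a match get an empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [new_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        matches = EMAIL_RE.findall(row.get(source) or "")
        # Assumption: several matches in one cell are joined with "; ".
        row[new_col] = "; ".join(matches)
        writer.writerow(row)
    return out.getvalue()

sample = 'id,Notes\n1,"Email jane@acme.com"\n2,Left voicemail\n'
print(add_extracted_column(sample, "Notes"))
```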
How to Extract Phone Numbers from a CSV Column
Phone numbers are harder to extract than emails because there is no single standardized format. The ITU-T E.164 standard defines the canonical international format (+15551234567 — no spaces or separators), but real-world CRM data contains at least four common US variants: hyphenated (555-123-4567), parenthetical ((555) 123-4567), dot-separated (555.123.4567), and space-separated (555 123 4567). International numbers add country code prefixes (+44, +49, +61) with their own separator conventions.
The extraction engine handles this by normalizing separators during matching — it strips hyphens, dots, spaces, and parentheses before counting digits, then checks whether the digit count falls in the valid range (10 digits for US/Canada, 11–15 for international with country code). This means a hyphenated US number and a parenthetical US number both match the same pattern without requiring two separate extraction passes.
SplitForge Pattern Extraction handles all common North American and international formats in a single pass.
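The separator-normalizing approach described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SplitForge's engine. The candidate regex and the digit ranges (exactly 10 in US mode, 10 to 15 in international mode) are choices made for this example.

```python
import re

# A run that starts with an optional "+" or "(", then a digit, then at least
# eight digits/separators, and ends on a digit. Separators: space ( ) . -
CANDIDATE_RE = re.compile(r"\+?\(?\d[\d ().-]{8,}\d")

def extract_phones(text: str, international: bool = False) -> list[str]:
    """Return digit-only phone numbers whose digit count is in the allowed range."""
    # Assumed ranges: US/Canada mode wants exactly 10 digits; international
    # mode also accepts 11-15 digits (country code included, E.164 maximum).
    low, high = (10, 15) if international else (10, 10)
    found = []
    for match in CANDIDATE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())  # drop + ( ) . - and spaces
        if low <= len(digits) <= high:
            found.append(digits)
    return found

print(extract_phones("Cell (555) 123-4567, office 555.987.6543"))
# ['5551234567', '5559876543']
print(extract_phones("London office +44 20 7946 0958", international=True))
# ['442079460958']
```

Separators are stripped before counting, so the hyphenated, parenthetical, and dotted forms all reduce to the same kind of 10-digit string. One limitation of this sketch: two numbers separated only by a space merge into a single candidate, which the digit-count check then rejects.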
Step 1: Choose Phone as your pattern type
From the Pattern Type selector, choose Phone Number. Select whether you want to match US/Canada only or international formats. International mode is broader and will catch more variations — use it when your data includes contacts from multiple countries.
Step 2: Set the minimum digit threshold
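One common false positive in phone extraction is a numeric code that happens to be 10 digits long, such as an order number, an account ID, or a zip code concatenated with an extension. Set the minimum digit count to 10 (US) or 11 (international with country code) and enable the "must contain separator" option to reduce numeric-code matches. Requiring a separator may also skip numbers stored in compact E.164 form (+15551234567), so leave it off if your data uses that format.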
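The two settings in this step can be expressed as a small filter. This sketch is hypothetical: the function name, defaults, and separator set (space, parentheses, dot, hyphen) are assumptions chosen to mirror the options described above, not SplitForge's code.

```python
import re

SEPARATORS = re.compile(r"[ ().-]")

def passes_thresholds(candidate: str, min_digits: int = 10,
                      require_separator: bool = True) -> bool:
    """Hypothetical filter mirroring the minimum-digit and separator settings."""
    digit_count = sum(ch.isdigit() for ch in candidate)
    if digit_count < min_digits:
        return False
    # A bare digit run such as an order ID has no separator, so it is rejected.
    if require_separator and not SEPARATORS.search(candidate):
        return False
    return True

print(passes_thresholds("555-123-4567"))  # True
print(passes_thresholds("5551234567"))    # False: no separator
```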
Step 3: Extract and verify
The extraction preview shows matched numbers alongside their original source text. Spot-check 10–20 rows across different row ranges — not just the first 10, which may not represent the full data variety.
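In a script, you can follow that advice with stratified sampling: split the file into equal slices and pick one random row from each. This is a generic sketch. The function name and the fixed seed (used so repeated checks see the same rows) are illustrative choices.

```python
import random

def spot_check_indices(n_rows: int, k: int = 20, seed: int = 0) -> list[int]:
    """Pick up to k row indices, one random row from each of k equal slices."""
    k = min(k, n_rows)
    rng = random.Random(seed)  # fixed seed makes the sample repeatable
    indices = []
    for i in range(k):
        start = i * n_rows // k
        end = (i + 1) * n_rows // k  # each slice holds at least one row
        indices.append(rng.randrange(start, end))
    return indices

print(spot_check_indices(80_000))  # 20 indices, one per 4,000-row slice
```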
How to Extract URLs and Domains
URL extraction covers two specific scenarios. The first is when a "Website" column contains URLs mixed with other text ("Visit us at https://acme.com — Mon-Fri 9-5"). The second is when you want to extract the domain only from full URLs you've already captured.
RFC 3986 defines the URI structure that URL extraction is built on. The pattern matches http://, https://, and www. prefixes, capturing the full URL through to the next space or delimiter.
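A minimal sketch of that matching rule in Python is shown below. It assumes "delimiter" means whitespace plus common trailing punctuation, which is an interpretation rather than SplitForge's documented behavior. Note that stripping a trailing ")" will truncate the rare URL that legitimately ends in a parenthesis.

```python
import re

# http://, https://, or www. followed by everything up to the next whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
# Characters that usually belong to the surrounding sentence, not the URL.
TRAILING = ".,;:!?)'\""

def extract_urls(text: str) -> list[str]:
    return [m.group().rstrip(TRAILING) for m in URL_RE.finditer(text)]

print(extract_urls("Visit us at https://acme.com, Mon-Fri 9-5"))
# ['https://acme.com']
print(extract_urls("Docs: www.acme.com/help. Blog: HTTPS://blog.acme.com"))
# ['www.acme.com/help', 'HTTPS://blog.acme.com']
```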
Step 1: Choose URL as your pattern type
Select URL from the pattern type selector. If you only need the domain (e.g., "acme.com" rather than "https://www.acme.com/about-us"), enable the Domain Only option. It strips the protocol and path and returns just the registrable domain.
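For URLs you have already extracted, a domain-only reduction can be sketched with Python's urllib.parse. This sketch returns the hostname minus a leading "www.", not the registrable domain in the Public Suffix List sense, so "mail.acme.com" stays as is. It also assumes a bare domain should be treated as if it had an http:// prefix.

```python
from urllib.parse import urlsplit

def domain_only(url_or_domain: str) -> str:
    """Reduce a URL or bare domain to a lowercase hostname without a leading 'www.'."""
    value = url_or_domain.strip()
    # urlsplit only recognizes the host after "//", so add a scheme to bare domains.
    if "://" not in value:
        value = "http://" + value
    host = urlsplit(value).hostname or ""  # .hostname is already lowercased
    return host[len("www."):] if host.startswith("www.") else host

for raw in ["https://www.acme.com/about-us", "acme.com", "WWW.Acme.com/contact"]:
    print(domain_only(raw))  # acme.com each time
```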
Step 2: Handle social media URLs
LinkedIn, Twitter/X, and Instagram URLs are valid URLs and will be extracted alongside company websites. If you want to separate them, use the domain filter after extraction to split social domains from custom domains.
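Here is a hedged sketch of that post-extraction split, assuming you already have a list of domains. The social domain set is a hypothetical starter list. Matching on "." + domain keeps subdomains such as www.linkedin.com while avoiding false hits such as box.com being treated as x.com.

```python
# Hypothetical starter list; extend it for the networks in your data.
SOCIAL_DOMAINS = {"linkedin.com", "twitter.com", "x.com", "instagram.com"}

def is_social(domain: str) -> bool:
    d = domain.lower()
    # Exact match, or a subdomain of a social domain (the "." prevents box.com -> x.com).
    return any(d == s or d.endswith("." + s) for s in SOCIAL_DOMAINS)

def split_social(domains: list[str]) -> tuple[list[str], list[str]]:
    """Return (company_domains, social_domains), preserving input order."""
    company = [d for d in domains if not is_social(d)]
    social = [d for d in domains if is_social(d)]
    return company, social

print(split_social(["acme.com", "www.linkedin.com", "x.com", "box.com"]))
# (['acme.com', 'box.com'], ['www.linkedin.com', 'x.com'])
```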
Step 3: Extract and clean
Download the result. If you're using Domain Only mode, check how subdomains come through. If the output keeps the full hostname, "mail.acme.com" and "acme.com" appear as different values and may need consolidation depending on your use case.
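If you do need to consolidate subdomains, the robust approach is a lookup against the Public Suffix List. The sketch below is a deliberately simplified stand-in for illustration only. It keeps the last two labels of a hostname, or the last three when the final two form a known multi-part suffix from a tiny hypothetical list.

```python
# Hypothetical, deliberately tiny list; real code should use the Public Suffix List.
MULTI_PART_SUFFIXES = {"co.uk", "com.au", "co.nz"}

def consolidate(host: str) -> str:
    """Collapse a hostname to a naive base domain (illustration only)."""
    labels = host.lower().split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_PART_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])

print(consolidate("mail.acme.com"))    # acme.com
print(consolidate("shop.acme.co.uk"))  # acme.co.uk
```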
Working with Mixed Columns
Some columns contain multiple pattern types simultaneously — an email address AND a phone number AND a URL all in one cell. This is common in legacy CRM exports where a "Contact Info" field was used as a catch-all.
Run extraction three times on the same column, once per pattern type. Each run produces a separate output column. Then combine them:
- Extract emails → column: Extracted_Email
- Extract phones → column: Extracted_Phone
- Extract URLs → column: Extracted_URL
- Join all three outputs on the original row identifier
This gives you a clean, structured contact record from a single messy source column. The whole process takes under 5 minutes on files up to 500,000 rows. See our guide on merging CSV files with different column orders if you need help joining the outputs.
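In a script, the join step amounts to merging three lookups keyed by the row identifier. This minimal sketch assumes each extraction run has already been reduced to a dict mapping row id to extracted value. The column names follow the list above, and missing values become blank cells.

```python
def join_on_id(emails: dict[str, str], phones: dict[str, str],
               urls: dict[str, str]) -> list[dict[str, str]]:
    """Merge three id -> value mappings into one record per row id."""
    # dict.fromkeys keeps first-seen order while removing repeated ids.
    row_ids = list(dict.fromkeys([*emails, *phones, *urls]))
    return [
        {
            "id": row_id,
            "Extracted_Email": emails.get(row_id, ""),
            "Extracted_Phone": phones.get(row_id, ""),
            "Extracted_URL": urls.get(row_id, ""),
        }
        for row_id in row_ids
    ]

records = join_on_id({"1": "a@x.io"}, {"1": "5551234567", "2": "5559876543"}, {"2": "acme.com"})
for record in records:
    print(record)
```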
Common Scenarios
Scenario 1: Legacy CRM export with a Notes field
Situation: The old CRM stored everything in a Notes column. You need to migrate to HubSpot but HubSpot requires separate Email and Phone fields.
Approach: Run email extraction, then phone extraction. Map the resulting columns to HubSpot's import template. For rows where extraction returns blank, check whether the dedicated Email column is already populated. The Notes extraction supplements that column rather than replacing it.
Scenario 2: Website contact list with inconsistent URL formatting
Situation: A prospecting list has a Website column where some rows have full URLs, some have bare domains, and some have embedded text around the URL.
Approach: Run URL extraction in Domain Only mode. This normalizes all three variants — "https://acme.com/about", "acme.com", and "Visit acme.com for details" — to the same output: "acme.com".
Scenario 3: Event attendance CSV with phone numbers in the wrong column
Situation: An event sign-up CSV captured phone numbers in a field labeled "How did you hear about us?" because some respondents entered their number instead of a text answer.
Approach: Run phone extraction against that column. Legitimate text answers ("Google", "Referral") contain no phone patterns and return blank. Only rows with actual phone numbers produce output.
Avoiding False Positives
Extraction is reliable when the pattern is unambiguous. These are the false positives that trip up real-world CRM exports — and the fix for each:
| Pattern | Common False Positive | Fix |
|---|---|---|
| Phone numbers | 10-digit order IDs, account numbers | Require separator characters (hyphen, dot, space) |
| Email addresses | Placeholder addresses in template rows (e.g., at example.com) | Filter rows where domain is example.com, test.com, or your company domain |
| URLs | LinkedIn, Twitter/X, Instagram links | Use Domain Filter to exclude social domains after extraction |
Run a 50-row preview before extracting the full file. If the preview shows false positives, apply the relevant fix before committing the full run.
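The email fix from the table can be sketched as a domain blocklist applied to already-extracted addresses. The function name and the way your own company domain is passed in are illustrative assumptions.

```python
# Placeholder domains from the table; pass your own company domain as an extra.
PLACEHOLDER_DOMAINS = {"example.com", "test.com"}

def drop_placeholders(emails: list[str], extra_domains: tuple[str, ...] = ()) -> list[str]:
    blocked = PLACEHOLDER_DOMAINS | {d.lower() for d in extra_domains}
    # rsplit on the last "@" isolates the domain part of each address.
    return [e for e in emails if e.rsplit("@", 1)[-1].lower() not in blocked]

print(drop_placeholders(["jane@acme.com", "test@example.com", "ops@mycompany.com"],
                        ("mycompany.com",)))
# ['jane@acme.com']
```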
Cleaning and Validating Extracted Results
Extraction output typically needs two additional steps before import.
Deduplication: If the same contact appears in multiple rows, the extracted email column will contain duplicates. Run SplitForge Remove Duplicates on the extracted email column before importing to your CRM or email platform.
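If you are deduplicating in a script rather than with Remove Duplicates, a minimal order-preserving sketch looks like this. It lowercases the whole address for comparison, which is a common practical assumption. RFC 5321 technically allows the local part to be case-sensitive, even though most providers ignore case.

```python
def dedupe_emails(emails: list[str]) -> list[str]:
    """Keep the first occurrence of each address, ignoring case and stray spaces."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw in emails:
        address = raw.strip()
        key = address.lower()  # assumption: treat addresses case-insensitively
        if address and key not in seen:
            seen.add(key)
            unique.append(address)
    return unique

print(dedupe_emails(["Jane@Acme.com", " jane@acme.com ", "", "bob@x.io"]))
# ['Jane@Acme.com', 'bob@x.io']
```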
Post-extraction validation: A valid regex match is not always a valid, active address. For email, check the extracted domain for obvious typos (gmali.com instead of gmail.com) and, where possible, confirm that it resolves. For phone numbers, verify that the country code prefix matches your expected market. These are quick spot-checks. Reviewing 20 rows out of 80,000 takes about 3 minutes and catches the most common data entry errors.
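Both spot-checks can be partly automated. The sketch below flags near-miss domain typos with difflib from the Python standard library and checks a phone prefix. The provider list, the 0.8 similarity cutoff, and the default "+1" prefix are illustrative assumptions. The sketch does not perform DNS lookups, so it cannot confirm that a domain resolves.

```python
import difflib
from typing import Optional

# Hypothetical short list; extend it with the providers common in your data.
COMMON_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com"]

def suggest_domain_fix(email: str) -> Optional[str]:
    """Return a likely intended domain for a near-miss typo, else None."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in COMMON_DOMAINS:
        return None  # already a known-good spelling
    close = difflib.get_close_matches(domain, COMMON_DOMAINS, n=1, cutoff=0.8)
    return close[0] if close else None

def has_expected_prefix(phone: str, prefix: str = "+1") -> bool:
    """Check that a phone value starts with the country code for your market."""
    return phone.strip().startswith(prefix)

print(suggest_domain_fix("jane@gmali.com"))  # gmail.com
```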
For more on handling extracted contact data, see our guide on removing duplicate emails before CRM import.
Additional Resources
Email and Phone Format Standards:
- RFC 5322: Internet Message Format — Defines valid email address structure used by extraction matching
- ITU-T E.164: International Phone Numbering Plan — International telephone number format standard
URL Standards:
- RFC 3986: Uniform Resource Identifier — URI syntax specification underpinning URL extraction
Browser Processing:
- MDN Web Workers API — How browser-based background processing works without blocking the UI
Data Privacy:
- GDPR Article 5: Principles for Processing Personal Data — Data minimization requirement relevant to contact information processing