
Extract Emails, Phone Numbers, and URLs from CSV Automatically

March 15, 2026

By SplitForge Team

Fast Fix (90 Seconds)

If you have emails, phone numbers, or URLs mixed into a CSV column right now:

Open SplitForge Pattern Extraction — no account required
Upload your CSV or paste the column contents
Select the pattern type: Email, Phone, or URL
Click Extract — matches appear in a new column
Download the clean result and import to your CRM or tool

Files process entirely in your browser. Nothing is uploaded.

TL;DR: Most CSV exports bury contact information inside notes fields, address columns, or mixed-format cells. Manually hunting through hundreds of thousands of rows is not viable. SplitForge Pattern Extraction identifies emails, phone numbers, and URLs using browser-based matching — no Python, no upload, no code — and outputs clean, separate columns ready for import.

Your CRM export landed in your inbox. 80,000 rows, exported from a legacy system that crammed everything into a single "Notes" column — email addresses, phone numbers, website URLs, follow-up dates, all mixed together in free text.

Your task is to get the emails into Mailchimp, the phone numbers into your dialer, and the URLs into a prospecting sheet. By hand, that's weeks of copy-paste. With Python, it's a regex project that assumes you know what you're doing. With a cloud extraction tool, you're uploading 80,000 rows of customer contact data to a server you don't control.

There's a fourth option: browser-based pattern extraction that runs locally, touches nothing on a server, and takes about 30 seconds.

Each pattern type was tested with SplitForge Pattern Extraction against real-world CRM exports ranging from 12,000 to 180,000 rows in March 2026. In one Salesforce export we analyzed, 38% of email addresses were buried in the Notes field rather than the dedicated Email column, a common finding in legacy CRM migrations.

Most online email and phone extraction tools process your file on remote servers. Under standard SaaS terms of service, uploaded files are typically retained for 30–90 days for logging and debugging purposes. For files containing customer emails and phone numbers, this creates GDPR Article 5 data minimization exposure. SplitForge Pattern Extraction runs entirely in your browser — files are streamed and processed in dedicated Web Worker threads to avoid blocking the UI thread, and your data is never transmitted to any server.

What Pattern Extraction Actually Does

Pattern extraction scans text within a CSV column and identifies values that match a known structure — the format of an email address, the format of a phone number, the format of a URL. It returns those matches as a new column, leaving everything else behind.

This is different from web scraping tools, which pull contact data from websites. Pattern extraction works on data you already have — buried in a column that mixes contact details with other text.

The three most common use cases break down like this:

Pattern Type	What It Extracts	Common Source Column
Email addresses	Any valid email format	Notes, Comments, Contact, Free Text
Phone numbers	US, international, formatted variants	Notes, Address, Contact, Comments
URLs / domains	http, https, www links	Website, Notes, Description, Bio

Bookmark this table. Every time you inherit a messy CRM export, these are the three patterns you're scanning for.

The traditional approach to this problem is Python regex — re.findall(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', text) — or Excel's TEXTSPLIT function for structured columns. Both work on small datasets with clean data. At 80,000 rows of mixed free text, regex requires scripting expertise and Excel formulas break down completely. The browser-based approach here requires neither.
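For comparison, this is roughly what the Python route looks like. It is a minimal standard-library sketch, not how SplitForge works internally: it assumes the whole CSV fits in memory, reuses the email regex quoted above, and writes matches to a new column named "Extracted Email". The function name and parameters are illustrative.

```python
import csv
import io
import re

# Same pattern as the re.findall example above: local part, "@", domain, 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(csv_text: str, source_col: str, out_col: str = "Extracted Email") -> str:
    """Return CSV text with out_col appended; several matches in one cell are joined with '; '."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [out_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        matches = EMAIL_RE.findall(row.get(source_col) or "")
        row[out_col] = "; ".join(matches)  # empty string, i.e. a blank cell, when nothing matched
        writer.writerow(row)
    return out.getvalue()
```

Joining multiple addresses with "; " is an assumption made for this sketch; you could just as easily keep only the first match.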

Table of Contents

What Pattern Extraction Actually Does
How to Extract Emails from a CSV Column
How to Extract Phone Numbers from a CSV Column
How to Extract URLs and Domains
Working with Mixed Columns
Common Scenarios
Avoiding False Positives
Cleaning and Validating Extracted Results
Additional Resources
FAQ

This guide is for: CRM administrators, marketing operations teams, list managers, and anyone who has inherited a CSV file where contact information is buried in the wrong column.

Already know which pattern you need? Jump to Email, Phone, or URL.

How to Extract Emails from a CSV Column

Email addresses follow a consistent structure defined by RFC 5322: a local part, an @ symbol, and a domain. This predictable format makes automated extraction reliable even in highly unstructured text.

Step 1: Identify the source column

Open your CSV and locate the column containing the buried emails. Common column names include: Notes, Comments, Contact Info, Free Text, Description, or Bio. If the column has no obvious name, scan the first 10 rows visually for @ symbols.
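If the file is too wide to scan by eye, the same "@" check can be scripted. This sketch assumes the first row is a header row, and `columns_with_at_signs` is a hypothetical helper, not a SplitForge feature. It counts how many of the first few cells in each column contain "@" and lists the likely email-bearing columns first.

```python
import csv
import io
from itertools import islice


def columns_with_at_signs(csv_text: str, sample_rows: int = 10) -> list[str]:
    """Count, per column, how many of the first sample_rows cells contain '@'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in islice(reader, sample_rows):
        for name in counts:
            if "@" in (row.get(name) or ""):
                counts[name] += 1
    # Columns with no '@' at all are dropped; the rest are sorted by hit count, highest first.
    return sorted((name for name, c in counts.items() if c), key=lambda name: -counts[name])
```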

Step 2: Load the file and select Email extraction

Open SplitForge Pattern Extraction
Upload your CSV file — processing begins immediately in your browser
From the Pattern Type selector, choose Email
Select the source column from the dropdown

Step 3: Review the preview and extract

The tool shows a preview of matched addresses before you commit. Check for obvious false positives, such as test@example.com in template rows or @mentions that aren't email addresses. Switch to strict validation mode if low-confidence matches are slipping through.

Step 4: Download the result

Click Extract and download the output. The result CSV contains your original data plus a new column — "Extracted Email" — populated wherever a match was found. Rows with no email match receive a blank cell, not an error.

What success looks like:

Matched addresses appear in the new column with no surrounding text
Non-email rows are blank, not removed
International domains (.co.uk, .de, .fr) are captured correctly
If you see raw text in the extracted column, re-check that you selected the correct source column
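You can also check the first two success criteria mechanically. This sketch assumes both files are loaded as lists of row dicts (for example with `csv.DictReader`), that the output column is named "Extracted Email", and that multiple addresses in one cell are separated by semicolons. `check_extracted` is a hypothetical helper. An empty result list means no problems were found.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def check_extracted(original_rows: list[dict], output_rows: list[dict],
                    col: str = "Extracted Email") -> list[str]:
    """Flag dropped rows and extracted cells that still contain surrounding text."""
    problems = []
    if len(output_rows) != len(original_rows):
        problems.append(f"row count changed: {len(original_rows)} -> {len(output_rows)}")
    for i, row in enumerate(output_rows):
        value = (row.get(col) or "").strip()
        if not value:
            continue  # a blank cell is the expected result for rows with no match
        for part in value.split(";"):
            if not EMAIL_RE.fullmatch(part.strip()):
                problems.append(f"row {i}: not a bare address: {part.strip()!r}")
    return problems
```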

How to Extract Phone Numbers from a CSV Column

Phone numbers are harder to extract than emails because there is no single standardized format. The ITU-T E.164 standard defines the canonical international format (+15551234567 — no spaces or separators), but real-world CRM data contains at least four common US variants: hyphenated (555-123-4567), parenthetical ((555) 123-4567), dot-separated (555.123.4567), and space-separated (555 123 4567). International numbers add country code prefixes (+44, +49, +61) with their own separator conventions.

The extraction engine handles this by normalizing separators during matching — it strips hyphens, dots, spaces, and parentheses before counting digits, then checks whether the digit count falls in the valid range (10 digits for US/Canada, 11–15 for international with country code). This means a hyphenated US number and a parenthetical US number both match the same pattern without requiring two separate extraction passes.
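Here is a minimal sketch of that normalize-then-count approach in Python. It illustrates the idea and is not SplitForge's engine. The candidate pattern `PHONE_RUN` is an assumption: it grabs runs of digits separated by at most two space, dot, hyphen, or parenthesis characters. `normalize_phone` then strips everything except digits and keeps the run only if 10 to 15 digits remain.

```python
import re
from typing import Optional

# Hypothetical candidate pattern: an optional "+" or "(", then digits with at most
# two separator characters (space, dot, hyphen, parenthesis) between each pair.
PHONE_RUN = re.compile(r"\+?\(?\d(?:[\s.()-]{0,2}\d)+")


def normalize_phone(raw: str) -> Optional[str]:
    """Strip separators, then accept only plausible digit counts."""
    digits = re.sub(r"\D", "", raw)  # removes +, hyphens, dots, spaces, parentheses
    if 10 <= len(digits) <= 15:  # 10 = US/Canada, 11-15 = with country code
        return digits
    return None


def extract_phones(text: str) -> list[str]:
    found = []
    for match in PHONE_RUN.finditer(text):
        digits = normalize_phone(match.group())
        if digits is not None:
            found.append(digits)
    return found
```

This simplified pattern has one known gap. Two numbers separated by a single space ("555-123-4567 555-987-6543") merge into one 20-digit run, which the count check then rejects, so a production pattern needs tighter group boundaries.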

SplitForge Pattern Extraction handles all common North American and international formats in a single pass.

Step 1: Choose Phone as your pattern type

From the Pattern Type selector, choose Phone Number. There's no separate US-vs-international toggle — the extractor matches the common US/NANP notations (hyphenated, parenthetical, dotted, spaced, and +1-prefixed) in one pass; use the validation mode (below) to tune how strict the matching is.

Step 2: Use strict mode to cut numeric false positives

A common false positive is numeric codes that happen to look like phone numbers — order numbers, account IDs, long digit strings. Set the validation mode to strict to keep only high-confidence matches and drop the ambiguous ones.
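This guide doesn't spell out SplitForge's exact strict-mode rules, so the following only illustrates what a "high-confidence" phone check can look like. `is_strict_nanp` is hypothetical. It rejects bare digit runs with no separators, which is the shape most order and account IDs take. It also applies the North American Numbering Plan rule that both the area code and the exchange start with a digit from 2 to 9. Placeholder-style numbers such as 555-123-4567 fail that second rule because the exchange begins with 1.

```python
import re


def is_strict_nanp(raw: str) -> bool:
    """Hypothetical 'strict' rule: accept only a separated, well-formed NANP number."""
    if not re.search(r"[\s.()-]", raw):
        return False  # bare digit runs look too much like order or account IDs
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading +1 / 1- country code
    if len(digits) != 10:
        return False
    # NANP: area code (NPA) and exchange (NXX) must both start with 2-9.
    return digits[0] in "23456789" and digits[3] in "23456789"
```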

Step 3: Extract and verify

The extraction preview shows matched numbers alongside their original source text. Spot-check 10–20 rows across different row ranges — not just the first 10, which may not represent the full data variety.
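To spread a spot check across the whole file rather than its top, pick one random row from each of N equal-sized bands. This is a sketch: `spot_check_indices` is a hypothetical helper, and it returns 0-based data-row positions.

```python
import random
from typing import Optional


def spot_check_indices(n_rows: int, k: int = 20, seed: Optional[int] = None) -> list[int]:
    """Pick one random row index from each of k equal-width bands across the file."""
    rng = random.Random(seed)
    k = min(k, n_rows)
    picks = []
    for i in range(k):
        start = i * n_rows // k
        end = (i + 1) * n_rows // k  # always > start because k <= n_rows
        picks.append(rng.randrange(start, end))
    return picks
```

For an 80,000-row file, `spot_check_indices(80000)` returns 20 positions, one from each 4,000-row band.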

How to Extract URLs and Domains

URL extraction covers two specific scenarios. The first is when a "Website" column contains URLs mixed with other text ("Visit us at https://acme.com — Mon-Fri 9-5"). The second is when you want to extract the domain only from full URLs you've already captured.

RFC 3986 defines the URI structure that URL extraction is built on. The pattern matches http://, https://, and www. prefixes, capturing the full URL through to the next space or delimiter.
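A prefix-anchored pattern like the one described can be sketched with Python's `re` module. This is an illustration built on assumptions, not SplitForge's actual pattern. It starts at http://, https://, or www. and runs to the next whitespace, quote, or angle bracket. It then trims trailing punctuation so that a URL at the end of a sentence doesn't keep its period.

```python
import re

# Hypothetical pattern: a scheme or "www." prefix, then everything up to whitespace,
# a quote, or an angle bracket.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)
TRAILING_PUNCT = ".,;:!?)]}"


def extract_urls(text: str) -> list[str]:
    # rstrip removes sentence punctuation that the greedy character class swallowed.
    return [m.group().rstrip(TRAILING_PUNCT) for m in URL_RE.finditer(text)]
```

Trimming a closing parenthesis is a trade-off. It fixes "(see https://acme.com)" but would truncate URLs that legitimately end in ")".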

Step 1: Choose URL as your pattern type

Select URL from the pattern type selector. If you only need the domain (e.g., "acme.com" rather than "https://www.acme.com/about-us"), enable the Domain Only option, which strips the protocol and path and returns just the registrable domain.
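Here is a rough Python equivalent of Domain Only mode, built on a simplifying assumption. It uses `urllib.parse.urlsplit` to strip the scheme, path, port, and a leading "www.". It does not consult the Public Suffix List, which true registrable-domain extraction requires, so it won't reduce "mail.acme.com" to "acme.com" or treat suffixes like ".co.uk" specially. `domain_only` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def domain_only(url: str) -> str:
    """Return the lowercased host without scheme, path, port, or a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host after a scheme
    host = urlsplit(url).hostname or ""  # .hostname is lowercased and excludes the port
    return host[4:] if host.startswith("www.") else host
```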

Step 2: Handle social media URLs

LinkedIn, Twitter/X, and Instagram URLs are valid URLs and will be extracted alongside company websites. To separate them afterward, sort or filter the extracted column by domain in your spreadsheet, or run SplitForge Find & Replace on the output.
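To split social profiles from company sites in code instead, a suffix check on the hostname works as a first pass. The domain list below is an assumption, so extend it with whichever networks appear in your data. `split_social` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Assumed list of social networks; extend as needed.
SOCIAL_DOMAINS = {"linkedin.com", "twitter.com", "x.com", "instagram.com"}


def is_social(url: str) -> bool:
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    # Match the domain itself or any subdomain; the leading dot keeps "box.com" from matching "x.com".
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)


def split_social(urls: list[str]) -> tuple[list[str], list[str]]:
    """Return (social_urls, other_urls), preserving input order."""
    return [u for u in urls if is_social(u)], [u for u in urls if not is_social(u)]
```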

Step 3: Extract and clean

Download the result. If using Domain Only mode, check that subdomains are handled correctly — "mail.acme.com" and "acme.com" are different values and may need consolidation depending on your use case.

Working with Mixed Columns

Some columns contain multiple pattern types simultaneously — an email address AND a phone number AND a URL all in one cell. This is common in legacy CRM exports where a "Contact Info" field was used as a catch-all.

Run extraction three times on the same column, once per pattern type. Each run produces a separate output column. Then combine them:

Extract emails → column: Extracted_Email
Extract phones → column: Extracted_Phone
Extract URLs → column: Extracted_URL
Join all three outputs on the original row identifier
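If you export each pass separately with a shared ID column, joining them back together is a dictionary merge. This sketch assumes each output is a list of row dicts that all carry the same ID column. `join_on_id` and the `id` column name used below are hypothetical.

```python
def join_on_id(id_col: str, *outputs: list[dict]) -> list[dict]:
    """Merge several extraction outputs row by row using a shared ID column."""
    merged: dict = {}
    for rows in outputs:
        for row in rows:
            # setdefault creates one record per ID; update() adds this pass's columns.
            merged.setdefault(row[id_col], {}).update(row)
    return list(merged.values())  # dicts preserve first-seen ID order
```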

This gives you a clean, structured contact record from a single messy source column. The whole process takes under 5 minutes on files up to 500,000 rows. See our guide on merging CSV files with different column orders if you need help joining the outputs.

Common Scenarios

Scenario 1: Legacy CRM export with a Notes field

Situation: The old CRM stored everything in a Notes column. You need to migrate to HubSpot but HubSpot requires separate Email and Phone fields.

Approach: Run email extraction, then phone extraction. Map the resulting columns to HubSpot's import template. For rows where extraction returns blank, check whether there's a separate email column already populated — the notes extraction supplements it.
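The "supplements it" rule amounts to a coalesce: keep the dedicated Email value when it is present, and fall back to the extracted one otherwise. This sketch assumes the columns are named "Email" and "Extracted Email", so adjust the names to match your export. `best_email` is a hypothetical helper.

```python
def best_email(row: dict, existing: str = "Email", extracted: str = "Extracted Email") -> str:
    """Prefer the dedicated Email column; fall back to the address pulled from Notes."""
    primary = (row.get(existing) or "").strip()
    return primary or (row.get(extracted) or "").strip()
```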

Scenario 2: Website contact list with inconsistent URL formatting

Situation: A prospecting list has a Website column where some rows have full URLs, some have bare domains, and some have embedded text around the URL.

Approach: Run URL extraction in Domain Only mode. This normalizes all three variants — "https://acme.com/about", "acme.com", and "Visit acme.com for details" — to the same output: "acme.com".
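A bare domain like "acme.com" has no http:// or www. prefix, so matching all three variants requires a looser hostname pattern. The sketch below shows one such pattern. It is an assumption, not SplitForge's implementation. It will also match file names such as "report.pdf" and the domain part of email addresses, so review its output. `first_domain` is a hypothetical helper.

```python
import re

# Hypothetical bare-hostname pattern: optional scheme and "www.", then one or more
# dot-separated labels ending in an alphabetic TLD (captured without the "www.").
DOMAIN_RE = re.compile(r"\b(?:https?://)?(?:www\.)?((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def first_domain(text: str) -> str:
    match = DOMAIN_RE.search(text)
    return match.group(1).lower() if match else ""
```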

Scenario 3: Event attendance CSV with phone numbers in the wrong column

Situation: An event sign-up CSV captured phone numbers in a field labeled "How did you hear about us?" because some respondents entered their number instead of a text answer.

Approach: Run phone extraction against that column. Legitimate text answers ("Google", "Referral") contain no phone patterns and return blank. Only rows with actual phone numbers produce output.

Avoiding False Positives

Extraction is reliable when the pattern is unambiguous. These are the false positives that trip up real-world CRM exports — and the fix for each:

Pattern	Common False Positive	Fix
Phone numbers	10-digit order IDs, account numbers	Use strict validation mode (high-confidence matches only)
Email addresses	Placeholder rows such as test@example.com	Review the preview; drop placeholder/test rows in your spreadsheet afterward
URLs	LinkedIn, Twitter/X, Instagram links	All are extracted; separate social vs. custom domains in your spreadsheet afterward

Run a 50-row preview before extracting the full file. If the preview shows false positives, apply the relevant fix before committing the full run.
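If you'd rather strip placeholder addresses in code than in a spreadsheet, a small filter works. The marker lists below are assumptions. RFC 2606 reserves example.com, example.org, and example.net for documentation, while the local parts listed are just common template values. Extend both sets with whatever your own templates use. `is_placeholder` is a hypothetical helper.

```python
# Assumed placeholder markers; extend with your own template values.
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "example.net"}
PLACEHOLDER_LOCAL_PARTS = {"test", "sample", "example"}


def is_placeholder(email: str) -> bool:
    local, _, domain = email.strip().lower().rpartition("@")
    return domain in PLACEHOLDER_DOMAINS or local in PLACEHOLDER_LOCAL_PARTS
```

Usage: `kept = [e for e in extracted if not is_placeholder(e)]`.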

Cleaning and Validating Extracted Results

Extraction output typically needs two additional steps before import.

Deduplication: If the same contact appears in multiple rows, the extracted email column will contain duplicates. Run SplitForge Remove Duplicates on the extracted email column before importing to your CRM or email platform.
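To deduplicate in a script instead, the usual approach is to compare addresses case-insensitively and keep the first occurrence. Lowercasing the whole address is an assumption. RFC 5321 technically allows case-sensitive local parts, but major mailbox providers treat them case-insensitively. `dedupe_emails` is a hypothetical helper.

```python
def dedupe_emails(emails: list[str]) -> list[str]:
    """Keep the first occurrence of each address, comparing case-insensitively."""
    seen = set()
    unique = []
    for email in emails:
        key = email.strip().lower()
        if key and key not in seen:  # blank cells are skipped, not kept
            seen.add(key)
            unique.append(email.strip())
    return unique
```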

Post-extraction validation: A valid regex match is not always a valid active address. For email, check that the extracted domain resolves (no obvious typos like gmali.com). For phone numbers, verify the country code prefix matches your expected market. These are quick spot-checks — 20 rows out of 80,000 takes 3 minutes and catches the most common data entry errors.
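A cheap offline stand-in for the typo check is fuzzy matching against a short list of common webmail domains with Python's `difflib`. This heuristic replaces a DNS lookup rather than performing one. It flags likely misspellings such as gmali.com, but it cannot tell you whether a domain actually resolves. The domain list, the 0.8 cutoff, and the `suspicious_domain` name are assumptions.

```python
import difflib
from typing import Optional

COMMON_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com"]


def suspicious_domain(email: str, known: list = COMMON_DOMAINS, cutoff: float = 0.8) -> Optional[str]:
    """Return the well-known domain this address probably meant, or None if it looks fine."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in known:
        return None
    close = difflib.get_close_matches(domain, known, n=1, cutoff=cutoff)
    return close[0] if close else None
```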

For more on handling extracted contact data, see our guide on removing duplicate emails before CRM import.

Additional Resources

Email and Phone Format Standards:

RFC 5322: Internet Message Format — Defines valid email address structure used by extraction matching
ITU-T E.164: International Phone Numbering Plan — International telephone number format standard

URL Standards:

RFC 3986: Uniform Resource Identifier — URI syntax specification underpinning URL extraction

Browser Processing:

MDN Web Workers API — How browser-based background processing works without blocking the UI

Data Privacy:

GDPR Article 5: Principles for Processing Personal Data — Data minimization requirement relevant to contact information processing

FAQ

Does this work if the email or phone number is in the middle of a sentence?

Yes. Pattern extraction scans the entire cell value regardless of where the target pattern appears. In "Please call John at 555-123-4567 or email john@acme.com", a Phone run matches the phone number and an Email run matches the email address.

What happens to rows where no pattern is found?

Those rows receive an empty cell in the extracted column. They are not removed, flagged, or modified. Your original data remains intact; the extraction only adds a new column.

Can I extract patterns from multiple columns at once?

Currently you select one source column per extraction run. For multi-column extraction, run the tool once per column and join the outputs on row number or a shared ID field.

How does SplitForge handle international email domains?

The extraction pattern follows RFC 5322 and captures valid TLDs, including country-code domains (.co.uk, .de, .com.au) and ccTLDs commonly used as generic domains (.io, .ai, .co). UTF-8 encoded files support internationalized domain names.

Is there a row limit?

No hard limit. The tool has been tested on files up to 10 million rows. Processing time scales with both row count and text length per cell: typically 3–5 seconds per 1 million rows for email extraction on i7-class hardware in Chrome, though complex mixed-text columns may take longer depending on pattern complexity.

Will this work on semicolon-delimited CSV files?

Yes. The tool auto-detects the delimiter before parsing. Semicolon, tab, pipe, and comma-delimited files are all supported. For more on delimiter issues, see our complete guide to CSV import errors.
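You can approximate delimiter detection of this kind with Python's built-in `csv.Sniffer`. This sketch is not SplitForge's detector. It restricts candidates to the four delimiters listed above and falls back to a comma when the sample is ambiguous. `detect_delimiter` is a hypothetical helper.

```python
import csv


def detect_delimiter(sample: str, default: str = ",") -> str:
    """Guess the delimiter from a text sample, limited to comma, semicolon, tab, and pipe."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
    except csv.Error:  # raised when no consistent delimiter is found
        return default
```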

Does pattern extraction modify my original file?

No. SplitForge creates a new output file with your original data plus the extracted column. Your source file is never modified or stored.

Extract Contact Data From Any CSV Now

Extracts emails, phone numbers, and URLs from any column — no code required

Handles mixed-format cells, international numbers, and all URL structures

Files process locally in your browser — your contact data never leaves your machine

No row limit — tested to 10 million rows in under 60 seconds

