
Extract Emails, Phone Numbers, and URLs from CSV Automatically

March 15, 2026
By SplitForge Team

Fast Fix (90 Seconds)

If you have emails, phone numbers, or URLs mixed into a CSV column right now:

  1. Open SplitForge Pattern Extraction — no account required
  2. Upload your CSV or paste the column contents
  3. Select the pattern type: Email, Phone, or URL
  4. Click Extract — matches appear in a new column
  5. Download the clean result and import to your CRM or tool

Files process entirely in your browser. Nothing is uploaded.


TL;DR: Most CSV exports bury contact information inside notes fields, address columns, or mixed-format cells. Manually hunting through hundreds of thousands of rows is not viable. SplitForge Pattern Extraction identifies emails, phone numbers, and URLs using browser-based matching — no Python, no upload, no code — and outputs clean, separate columns ready for import.


Your CRM export landed in your inbox. 80,000 rows, exported from a legacy system that crammed everything into a single "Notes" column — email addresses, phone numbers, website URLs, follow-up dates, all mixed together in free text.

Your task is to get the emails into Mailchimp, the phone numbers into your dialer, and the URLs into a prospecting sheet. By hand, that's weeks of copy-paste. With Python, it's a regex project that assumes you know what you're doing. With a cloud extraction tool, you're uploading 80,000 rows of customer contact data to a server you don't control.

There's a fourth option: browser-based pattern extraction that runs locally, touches nothing on a server, and takes about 30 seconds.

Each pattern type was tested with SplitForge Pattern Extraction in March 2026 against real-world CRM exports ranging from 12,000 to 180,000 rows. In one Salesforce export we analyzed, 38% of email addresses were buried in the Notes field rather than the dedicated Email column, a common finding in legacy CRM migrations.


Most online email and phone extraction tools process your file on remote servers. Under typical SaaS terms of service, uploaded files may be retained for 30–90 days for logging and debugging purposes. For files containing customer emails and phone numbers, this can create exposure under the GDPR Article 5 data minimization principle. SplitForge Pattern Extraction runs entirely in your browser: files are streamed and processed in dedicated Web Worker threads to avoid blocking the UI thread, and your data is never transmitted to any server.


What Pattern Extraction Actually Does

Pattern extraction scans text within a CSV column and identifies values that match a known structure — the format of an email address, the format of a phone number, the format of a URL. It returns those matches as a new column, leaving everything else behind.

This is different from web scraping tools, which pull contact data from websites. Pattern extraction works on data you already have — buried in a column that mixes contact details with other text.

The three most common use cases break down like this:

Pattern Type | What It Extracts | Common Source Column
Email addresses | Any valid email format | Notes, Comments, Contact, Free Text
Phone numbers | US, international, formatted variants | Notes, Address, Contact, Comments
URLs / domains | http, https, www links | Website, Notes, Description, Bio

Bookmark this table. Every time you inherit a messy CRM export, these are the patterns you're scanning for.

The traditional approach to this problem is Python regex — re.findall(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', text) — or Excel's TEXTSPLIT function for structured columns. Both work on small datasets with clean data. At 80,000 rows of mixed free text, regex requires scripting expertise and Excel formulas break down completely. The browser-based approach here requires neither.
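
For comparison, here is a minimal sketch of the Python regex approach, using the same pattern quoted above. The function name `extract_emails` and the sample note are illustrative assumptions, and the pattern is a simplification rather than a full RFC 5322 parser.

```python
import re

# Same simplified pattern quoted in the text; it is not a full RFC 5322 parser.
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def extract_emails(text: str) -> list[str]:
    """Return every substring of `text` that looks like an email address."""
    return EMAIL_RE.findall(text)

# Hypothetical free-text cell from a Notes column.
note = "Spoke with Dana on 3/2, reach her at dana.lee@example.co.uk or 555-123-4567"
print(extract_emails(note))
```

Even this small version assumes you can load the CSV, loop over rows, and write the results back out, which is the scripting overhead the browser-based approach avoids.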




This guide is for: CRM administrators, marketing operations teams, list managers, and anyone who has inherited a CSV file where contact information is buried in the wrong column.

Already know which pattern you need? Jump to Email, Phone, or URL.


How to Extract Emails from a CSV Column

Email addresses follow a consistent structure defined by RFC 5322: a local part, an @ symbol, and a domain. This predictable format makes automated extraction reliable even in highly unstructured text.

Step 1: Identify the source column

Open your CSV and locate the column containing the buried emails. Common column names include: Notes, Comments, Contact Info, Free Text, Description, or Bio. If the column has no obvious name, scan the first 10 rows visually for @ symbols.

Step 2: Load the file and select Email extraction

  1. Open SplitForge Pattern Extraction
  2. Upload your CSV file — processing begins immediately in your browser
  3. From the Pattern Type selector, choose Email
  4. Select the source column from the dropdown

Step 3: Review the preview and extract

The tool shows a preview of matched addresses before you commit. Check for obvious false positives, such as placeholder addresses in template rows or @mentions that aren't email addresses. Use the filter to exclude patterns matching specific domains if needed.

Step 4: Download the result

Click Extract and download the output. The result CSV contains your original data plus a new column — "Extracted Email" — populated wherever a match was found. Rows with no email match receive a blank cell, not an error.

What success looks like:

  • Matched addresses appear in the new column with no surrounding text
  • Non-email rows are blank, not removed
  • International domains (.co.uk, .de, .fr) are captured correctly
  • If you see raw text in the extracted column, re-check that you selected the correct source column

How to Extract Phone Numbers from a CSV Column

Phone numbers are harder to extract than emails because there is no single standardized format. The ITU-T E.164 standard defines the canonical international format (+15551234567 — no spaces or separators), but real-world CRM data contains at least four common US variants: hyphenated (555-123-4567), parenthetical ((555) 123-4567), dot-separated (555.123.4567), and space-separated (555 123 4567). International numbers add country code prefixes (+44, +49, +61) with their own separator conventions.

The extraction engine handles this by normalizing separators during matching — it strips hyphens, dots, spaces, and parentheses before counting digits, then checks whether the digit count falls in the valid range (10 digits for US/Canada, 11–15 for international with country code). This means a hyphenated US number and a parenthetical US number both match the same pattern without requiring two separate extraction passes.
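
The normalization idea described above can be sketched in a few lines of Python. This illustrates the stated logic (strip separators, then count digits) and is not SplitForge's actual engine; the function name `looks_like_phone` and its `international` flag are hypothetical.

```python
import re

# Separators listed in the text: hyphens, dots, spaces, and parentheses.
SEPARATORS_RE = re.compile(r'[-.\s()]')

def looks_like_phone(candidate: str, international: bool = False) -> bool:
    """Strip separators, then check the digit count against the stated ranges."""
    stripped = SEPARATORS_RE.sub('', candidate)
    if stripped.startswith('+'):
        stripped = stripped[1:]  # a leading + marks a country code prefix
    if not re.fullmatch(r'[0-9]+', stripped):
        return False
    if international:
        return 11 <= len(stripped) <= 15
    return len(stripped) == 10

for variant in ["555-123-4567", "(555) 123-4567", "555.123.4567", "555 123 4567"]:
    print(variant, looks_like_phone(variant))
```

Because the separators are removed before counting, all four US variants reduce to the same ten digits and pass the same check.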

SplitForge Pattern Extraction handles all common North American and international formats in a single pass.

Step 1: Choose Phone as your pattern type

From the Pattern Type selector, choose Phone Number. Select whether you want to match US/Canada only or international formats. International mode is broader and will catch more variations — use it when your data includes contacts from multiple countries.

Step 2: Set the minimum digit threshold

One common false positive in phone extraction is numeric codes that happen to be 10 digits — order numbers, account IDs, zip codes concatenated with extensions. Set the minimum digit count to 10 (US) or 11 (international with country code) and enable the "must contain separator" option to reduce numeric code matches.
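
A rough sketch of how a "must contain separator" rule might work, assuming it simply requires at least one separator character in addition to the minimum digit count. The function name `passes_separator_rule` is hypothetical, and the tool's real option may behave differently.

```python
import re

def passes_separator_rule(candidate: str, min_digits: int = 10) -> bool:
    """Keep a candidate only if it has enough digits and at least one separator."""
    digits = re.sub(r'[^0-9]', '', candidate)
    has_separator = re.search(r'[-.\s()]', candidate) is not None
    return len(digits) >= min_digits and has_separator

print(passes_separator_rule("555-123-4567"))  # formatted phone number
print(passes_separator_rule("5551234567"))    # bare 10-digit order or account ID
```

The trade-off is that a genuine phone number stored without separators is rejected too, so this rule suits columns where numeric IDs are the bigger risk.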

Step 3: Extract and verify

The extraction preview shows matched numbers alongside their original source text. Spot-check 10–20 rows across different row ranges — not just the first 10, which may not represent the full data variety.
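
If you are checking the output outside the tool, one simple way to spread a spot-check across the whole file is to pick evenly spaced row indices. This is a minimal sketch; `spot_check_rows` is a hypothetical helper, not a SplitForge feature.

```python
def spot_check_rows(total_rows: int, sample_size: int = 20) -> list[int]:
    """Return evenly spaced row indices covering the whole file."""
    if total_rows <= 0 or sample_size <= 0:
        return []
    if sample_size >= total_rows:
        return list(range(total_rows))
    step = total_rows / sample_size
    return [int(i * step) for i in range(sample_size)]

print(spot_check_rows(80_000, 20))
```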


How to Extract URLs and Domains

URL extraction covers two specific scenarios. The first is when a "Website" column contains URLs mixed with other text ("Visit us at https://acme.com — Mon-Fri 9-5"). The second is when you want to extract the domain only from full URLs you've already captured.

RFC 3986 defines the URI structure that URL extraction is built on. The pattern matches http://, https://, and www. prefixes, capturing the full URL through to the next space or delimiter.
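
As a rough illustration of prefix-based matching, the Python sketch below looks for the three prefixes and captures text up to the next whitespace or common delimiter. The pattern and the name `extract_urls` are assumptions for illustration; a production pattern would need to handle more edge cases.

```python
import re

# Match http://, https://, or www. and run until whitespace or a common delimiter.
URL_RE = re.compile(r'(?:https?://|www\.)[^\s<>,;]+', re.IGNORECASE)

def extract_urls(text: str) -> list[str]:
    """Return URL-like substrings, trimming trailing sentence punctuation."""
    return [match.rstrip('.)!?') for match in URL_RE.findall(text)]

print(extract_urls("Visit us at https://acme.com (Mon-Fri 9-5) or www.acme.com/contact."))
```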

Step 1: Choose URL as your pattern type

Select URL from the pattern type selector. If you only need the domain (e.g., "acme.com" rather than "https://www.acme.com/about-us"), enable the Domain Only option, which strips the protocol and path and returns just the registrable domain.
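
Stripping the protocol and path can be approximated with Python's standard library. This is a simplified sketch: `domain_only` is a hypothetical helper that returns the hostname minus a leading "www.", which is not always the true registrable domain (that requires a lookup against the Public Suffix List).

```python
from urllib.parse import urlsplit

def domain_only(url: str) -> str:
    """Strip the protocol, path, and a leading 'www.' from a URL."""
    # urlsplit only fills in the host when a scheme separator is present.
    if '://' not in url:
        url = 'http://' + url
    host = urlsplit(url).hostname or ''
    return host[4:] if host.startswith('www.') else host

print(domain_only("https://www.acme.com/about-us"))
```

Note that this sketch keeps other subdomains as they are, so "https://mail.acme.com" comes back as "mail.acme.com".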

Step 2: Handle social media URLs

LinkedIn, Twitter/X, and Instagram URLs are valid URLs and will be extracted alongside company websites. If you want to separate them, use the domain filter after extraction to split social domains from custom domains.

Step 3: Extract and clean

Download the result. If using Domain Only mode, check that subdomains are handled correctly — "mail.acme.com" and "acme.com" are different values and may need consolidation depending on your use case.


Working with Mixed Columns

Some columns contain multiple pattern types simultaneously — an email address AND a phone number AND a URL all in one cell. This is common in legacy CRM exports where a "Contact Info" field was used as a catch-all.

Run extraction three times on the same column, once per pattern type. Each run produces a separate output column. Then combine them:

  1. Extract emails → column: Extracted_Email
  2. Extract phones → column: Extracted_Phone
  3. Extract URLs → column: Extracted_URL
  4. Join all three outputs on the original row identifier

This gives you a clean, structured contact record from a single messy source column. The whole process takes under 5 minutes on files up to 500,000 rows. See our guide on merging CSV files with different column orders if you need help joining the outputs.
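
For readers who want to see the same idea in code, here is a minimal Python sketch that applies one pattern per output column to a single mixed column. The patterns are simplified, the sample data is invented, and the function name `extract_contact_columns` is hypothetical. Because each pass writes onto the same row, no separate join step is needed in this version.

```python
import csv
import io
import re

# Simplified illustrative patterns, one per extraction pass.
PATTERNS = {
    "Extracted_Email": re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'),
    "Extracted_Phone": re.compile(r'\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}'),
    "Extracted_URL": re.compile(r'(?:https?://|www\.)[^\s,;]+'),
}

def extract_contact_columns(rows: list[dict], source_column: str) -> list[dict]:
    """Add one output column per pattern; rows with no match get a blank cell."""
    result = []
    for row in rows:
        enriched = dict(row)  # copy so the original row is left untouched
        text = row.get(source_column, "")
        for column, pattern in PATTERNS.items():
            match = pattern.search(text)  # first match only
            enriched[column] = match.group(0) if match else ""
        result.append(enriched)
    return result

# Invented sample data standing in for a legacy "Contact Info" column.
sample_csv = """id,Contact Info
1,"Dana Lee, dana@example.com, (555) 123-4567, https://acme.com"
2,Left voicemail on Tuesday
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in extract_contact_columns(rows, "Contact Info"):
    print(row)
```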


Common Scenarios

Scenario 1: Legacy CRM export with a Notes field

Situation: The old CRM stored everything in a Notes column. You need to migrate to HubSpot but HubSpot requires separate Email and Phone fields.

Approach: Run email extraction, then phone extraction. Map the resulting columns to HubSpot's import template. For rows where extraction returns blank, check whether there's a separate email column already populated — the notes extraction supplements it.

Scenario 2: Website contact list with inconsistent URL formatting

Situation: A prospecting list has a Website column where some rows have full URLs, some have bare domains, and some have embedded text around the URL.

Approach: Run URL extraction in Domain Only mode. This normalizes all three variants — "https://acme.com/about", "acme.com", and "Visit acme.com for details" — to the same output: "acme.com".

Scenario 3: Event attendance CSV with phone numbers in the wrong column

Situation: An event sign-up CSV captured phone numbers in a field labeled "How did you hear about us?" because some respondents entered their number instead of a text answer.

Approach: Run phone extraction against that column. Legitimate text answers ("Google", "Referral") contain no phone patterns and return blank. Only rows with actual phone numbers produce output.


Avoiding False Positives

Extraction is reliable when the pattern is unambiguous. These are the false positives that trip up real-world CRM exports — and the fix for each:

Pattern | Common False Positive | Fix
Phone numbers | 10-digit order IDs, account numbers | Require separator characters (hyphen, dot, space)
Email addresses | Placeholder addresses in template rows | Filter rows where domain is example.com, test.com, or your company domain
URLs | LinkedIn, Twitter/X, Instagram links | Use Domain Filter to exclude social domains after extraction

Run a 50-row preview before extracting the full file. If the preview shows false positives, apply the relevant fix before committing the full run.

Cleaning and Validating Extracted Results

Extraction output typically needs two additional steps before import.

Deduplication: If the same contact appears in multiple rows, the extracted email column will contain duplicates. Run SplitForge Remove Duplicates on the extracted email column before importing to your CRM or email platform.
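
To illustrate what deduplication involves, here is a minimal Python sketch. It assumes addresses that differ only in letter case or surrounding whitespace should count as the same contact, which is a common convention but still an assumption; `dedupe_emails` is a hypothetical helper and says nothing about how SplitForge Remove Duplicates is implemented.

```python
def dedupe_emails(emails: list[str]) -> list[str]:
    """Drop repeated addresses, ignoring case and surrounding whitespace."""
    seen = set()
    unique = []
    for email in emails:
        key = email.strip().lower()
        if key and key not in seen:  # skip blanks and anything already kept
            seen.add(key)
            unique.append(email.strip())
    return unique

print(dedupe_emails(["Dana@Example.com", "dana@example.com ", "", "sam@example.org"]))
```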

Post-extraction validation: A valid regex match is not always a valid active address. For email, check that the extracted domain resolves (no obvious typos like gmali.com). For phone numbers, verify the country code prefix matches your expected market. These are quick spot-checks — 20 rows out of 80,000 takes 3 minutes and catches the most common data entry errors.
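
One way to catch typos like gmali.com without looking at every row is to compare each extracted domain against a short list of common providers. The sketch below uses Python's standard difflib for fuzzy matching; the `KNOWN_DOMAINS` list, the 0.8 cutoff, and the name `suggest_domain` are all illustrative assumptions, and the check does not confirm that a domain actually resolves.

```python
import difflib
from typing import Optional

# Hypothetical reference list; extend it with the providers common in your data.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "outlook.com", "hotmail.com"]

def suggest_domain(email: str, cutoff: float = 0.8) -> Optional[str]:
    """Return a likely intended domain if the address looks like a typo, else None."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in KNOWN_DOMAINS:
        return None  # exact match, nothing to flag
    close = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return close[0] if close else None

print(suggest_domain("dana@gmali.com"))
```

Treat any suggestion as a flag for manual review rather than an automatic correction, since a near match can also be a legitimate company domain.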

For more on handling extracted contact data, see our guide on removing duplicate emails before CRM import.


FAQ

Patterns are matched anywhere in the cell. Pattern extraction scans the entire cell value regardless of where the target pattern appears, so a cell that contains "Please call John at 555-123-4567" followed by an email address yields both the phone number and the email address.

Rows with no match receive an empty cell in the extracted column. They are not removed, flagged, or modified. Your original data remains intact; the extraction only adds a new column.

Each extraction run works on one source column. For multi-column extraction, run the tool once per column and join the outputs on row number or a shared ID field.

International email domains are supported. The extraction pattern follows RFC 5322 and captures all valid TLDs, including country-code domains (.co.uk, .de, .com.au) and popular extensions such as .io, .ai, and .co (which are technically ccTLDs as well). UTF-8 encoded files support internationalized domain names.

There is no hard row limit. The tool has been tested on files up to 10 million rows. Processing time scales with both row count and text length per cell: typically 3–5 seconds per 1 million rows for email extraction on i7-class hardware in Chrome, though complex mixed-text columns may take longer depending on pattern complexity.

Non-comma delimiters are supported. The tool auto-detects the delimiter before parsing and handles semicolon, tab, pipe, and comma-delimited files. For more on delimiter issues see our complete guide to CSV import errors.

Your source file is never modified or stored. SplitForge creates a new output file with your original data plus the extracted column.


Extract Contact Data From Any CSV Now

  • Extracts emails, phone numbers, and URLs from any column, no code required
  • Handles mixed-format cells, international numbers, and all URL structures
  • Files process locally in your browser, so your contact data never leaves your machine
  • No row limit: tested to 10 million rows in under 60 seconds
