You're migrating 3,000 products from Shopify to a new platform. The CSV export looks perfect—product titles, prices, SKUs, descriptions all present. You map the fields, click Import.
Import failed: 1,847 products rejected. "Invalid characters in description field: row 487, 892, 1,203..."
You scroll to row 487. The description field contains:
<p><strong>Premium Leather Wallet</strong></p><ul><li>Genuine Italian leather</li><li>RFID protection</li><li>Holds 12 cards</li></ul><p>Perfect gift for professionals.</p>
The platform expects plain text. What you see is formatted HTML with paragraph tags, lists, and bold formatting—exactly how it displayed on your Shopify store, but unusable for systems expecting clean text.
TL;DR: CSV imports fail because e-commerce platforms export HTML markup (<p>, <strong>, <ul>) instead of plain text. Strip tags using Excel wildcards (<*>), regex patterns (<[^>]*>), or browser-based tools. Never manually edit 1,000+ rows—automate it in 2-10 minutes depending on file size and method chosen.
Quick Answer
CSV imports fail when exported data contains HTML formatting tags (<p>, <strong>, <div>, etc.) that target systems don't accept. E-commerce platforms export product descriptions as HTML for web display, but when reimporting to different systems or analyzing data, the markup breaks validation rules and corrupts plain text fields.
HTML tags appear because source systems store rich text with formatting—Shopify exports "Body (HTML)" field with full markup, CMS exports include <h2> headers and <ul> lists, CRM exports preserve email formatting with <br> tags and <a> links, and web scraping captures page HTML instead of displayed text.
The fix: Use Excel find/replace with wildcard pattern <*> to remove all tags (works for simple cases), apply regex pattern <[^>]*> in text editors for complex nested HTML, or use automated data cleaning tools that process files client-side without uploading sensitive product/customer data to third-party servers.
Fast Fix (90 Seconds)
If your CSV import just failed due to HTML tags, try this first:
- Open CSV in Excel - Load file in Excel or Google Sheets
- Identify HTML columns - Spot tags:
<p>,<strong>,<br>,<div>,<span> - Excel find/replace - Find:
<*>| Replace: (empty) | Search: Use wildcards - Verify results - Check that actual content remained:
<p>Great product</p>→Great product - Re-export as CSV - Save as CSV (UTF-8) and retry import
If you have 1,000+ rows, nested tags, or HTML entities ( , "), continue below for automated methods.
Table of Contents
- Quick Diagnosis: Identify Your HTML Problem
- Method 1: Excel Find & Replace
- Method 2: Regex Pattern Matching
- Method 3: Automated Data Cleaning Tools
- Common HTML Scenarios
- Troubleshooting Failed Cleanups
- FAQ
Quick Diagnosis: Identify Your HTML Problem
Before choosing a cleanup method, identify your specific HTML situation. Open your CSV in text editor and search for < to reveal tag patterns.
What type of HTML tags are present?
Basic formatting tags:
<p>text</p>(paragraphs)<strong>text</strong>or<b>text</b>(bold)<br>or<br />(line breaks)
Structural tags:
<div>,<span>(containers)<ul><li>item</li></ul>(lists)<h1>,<h2>,<h3>(headers)
Complex tags:
- Inline styles:
<p style="color: red">text</p> - Comments:
<!-- This is a comment --> - Script tags:
<script>code here</script>
Are HTML entities present?
Search your CSV for &:
(non-breaking space)"or"(quotation marks)&(ampersand)<and>(encoded < and >)
If present: You need HTML entity decoding in addition to tag removal.
Match your situation to method:
- Simple tags, <500 rows → Method 1 (Excel)
- Nested tags, any volume → Method 2 (Regex)
- Recurring imports, privacy concerns → Method 3 (Browser tools)
Method 1: Excel Find & Replace
Best for: Small files (<500 rows), simple HTML tags, one-time cleanup.
Excel's wildcard pattern removes basic HTML tags but fails on nested tags or malformed HTML. For simple product descriptions with <p>, <strong>, and <br> tags, this 2-minute solution works. For a comprehensive overview of CSV troubleshooting techniques beyond HTML cleanup, see our CSV troubleshooting guide.
Step-by-step:
- Open CSV in Excel - File → Open → Select CSV
- Enable wildcards - Press
Ctrl+H→ Click "Options >>" → Check "Use wildcards" - Remove tags - Find:
<*>| Replace: (empty) | Click "Replace All" - Clean spacing - Find:
(double space) | Replace:(single space) - Save as CSV - File → Save As → CSV (UTF-8)
Example transformation:
Before: <p><strong>Premium Wallet</strong></p><ul><li>Leather</li></ul>
After: Premium WalletLeather
Limitation: Spacing between elements gets lost. For better results with complex HTML, use Method 2.
When Excel fails:
Excel wildcards don't handle nested tags (<div><p>text</p></div>), tags spanning cells, or malformed HTML (<p>text without closing bracket). Use Method 2 for these cases.
Method 2: Regex Pattern Matching
Best for: Complex nested HTML, large files, users comfortable with text editors.
Regex pattern <[^>]*> matches any content between angle brackets regardless of tag name or nesting. This handles all HTML tags that basic Excel wildcards miss.
Core regex pattern:
<[^>]*>
What it means:
<- Match opening angle bracket[^>]*- Match any character except>, zero or more times>- Match closing angle bracket
Tools that support regex:
- Notepad++ (Windows, free)
- VS Code (Mac/Windows/Linux, free)
- Sublime Text (paid)
- Google Sheets REGEXREPLACE function
Notepad++ example:
- Open CSV in Notepad++
- Press
Ctrl+Hto open Find & Replace - Find:
<[^>]*>| Replace: (empty) - Search mode: Check "Regular expression"
- Click "Replace All"
Decode HTML entities:
After removing tags, replace entities:
→(space)"→"&→&<→<>→>
Google Sheets formula:
=REGEXREPLACE(A2,"<[^>]*>","")
Apply to column, copy results, paste as values, delete original column. See Google Sheets REGEXREPLACE documentation for syntax details.
Test regex patterns:
Use regex101.com to test patterns before applying:
- Select "ECMAScript (JavaScript)" flavor
- Paste sample HTML in test string
- Enter pattern
<[^>]*> - Verify it highlights all tags correctly
- Copy working pattern to your editor
Method 3: Automated Data Cleaning Tools
Best for: Large files (1,000+ rows), recurring imports, privacy requirements.
Browser-based tools process CSV files entirely on your device without uploading data to servers. Critical for GDPR compliance, e-commerce customer data, or any company that cannot upload product information to third parties. For a complete privacy-first CSV workflow, see our data privacy checklist.
Client-side processing advantages:
- File never leaves your computer
- No data transmission over networks
- No third-party access
- GDPR/HIPAA compliant by architecture
- Processing speed: 100K+ rows in seconds
Recommended tools:
SplitForge Data Cleaner (Browser-based, no upload)
- Navigate to Data Cleaner tool
- Drag/drop CSV file (processes in-browser only)
- Select column containing HTML
- Click "Remove HTML Tags" operation
- Preview before/after results
- Download cleaned file
- Processing: 1M rows in 15-25 seconds
OpenRefine (Desktop app, free)
- Download and install locally
- Import CSV file
- Apply transformation:
value.replace(/<[^>]*>/g, "") - Export cleaned data
- Best for: Data professionals, recurring workflows
How client-side processing works:
Processing happens in your browser using JavaScript and Web Workers API. File reads locally via FileReader API, processes in memory, cleaned file downloads directly. Zero data uploads.
Privacy-first workflow:
For companies with GDPR, HIPAA, PCI-DSS, or internal data security policies:
- Use browser-based tools only (no uploads)
- Verify processing happens client-side (check browser Network tab)
- Document compliance in audit logs
- Save workflow for recurring imports
Common HTML Scenarios
Shopify Product Export
Problem: "Body (HTML)" field exports with full markup according to Shopify's CSV format documentation.
Sample data:
<meta charset="utf-8"><p><strong>Product Features:</strong></p><ul><li>Feature 1</li><li>Feature 2</li></ul>
Solution: Use regex pattern <[^>]*> or browser tools. Result: Product Features:Feature 1Feature 2
Shopify tip: When exporting, Shopify includes both HTML and plain text fields. Use bulk editor to copy plain text versions before export. For comprehensive guidance on cleaning product catalogs for Shopify and WooCommerce, see our product catalog CSV cleaning guide.
WooCommerce/WordPress Export
Problem: Product descriptions contain paragraph tags, line breaks, and formatting.
Sample data:
<p>Handcrafted wooden desk organizer.</p><p>Dimensions: 12" x 6" x 4"</p><p><strong>Made from sustainable bamboo.</strong></p>
Solution: Excel wildcards or regex removes all tags. Result: Handcrafted wooden desk organizer. Dimensions: 12" x 6" x 4" Made from sustainable bamboo.
CRM Email Body Export
Problem: Exported email campaigns contain links, images, styling.
Sample data:
<p>Hi <span style="color: #0000ff;">John</span>,</p><p>Check out our new <a href="https://example.com">product line</a>!</p>
Solution: Regex strips all tags and styles. Result: Hi John, Check out our new product line!
Note: Link URLs are lost. If needed, extract URLs separately before stripping tags.
Troubleshooting Failed Cleanups
Problem 1: Tags remain after find/replace
Symptoms: Still see <p>, </div>, or other tags after cleanup.
Causes:
- Pattern didn't match (Excel wildcards vs regex syntax error)
- Tags split across cells/lines
- Malformed HTML:
<p text(missing closing bracket)
Solutions:
- Check pattern syntax - Excel uses
<*>(wildcard), regex uses<[^>]*> - Handle split tags - Combine cells first with
Concatenatefunction - Use robust parser - Browser tools handle malformed HTML automatically
Problem 2: Content disappeared with tags
Symptoms: Cells blank or missing content after tag removal.
Cause: Pattern matched legitimate content like <5kg weight> (not an HTML tag).
Solution: Use more specific pattern that requires tag names to start with letter: </?[a-z][a-z0-9]*[^>]*> excludes <5kg> but matches <p>, <strong>, etc.
Problem 3: HTML entities remain
Symptoms: After cleanup, see , ", '.
Cause: Tag removal doesn't decode entities automatically.
Solution: Manual find/replace for common entities or use tools with automatic entity decoding (SplitForge Data Cleaner, BeautifulSoup).
FAQ
Conclusion
CSV imports fail because e-commerce platforms export HTML markup instead of plain text. Shopify's "Body (HTML)" field, WooCommerce descriptions, and CRM email bodies all contain <p>, <strong>, <ul> tags that break import validation on systems expecting clean text.
The fastest fix depends on file size and complexity:
- Simple cases (<500 rows): Excel find/replace with
<*>wildcard pattern (2 minutes) - Complex nested HTML: Regex pattern
<[^>]*>in Notepad++ or VS Code (5 minutes) - Large files (1,000+ rows): Browser-based tools process in seconds without uploads (GDPR compliant)
Most common mistakes: Manually editing thousands of rows (wastes hours), uploading sensitive product data to third-party cleaners (GDPR violation), using Excel wildcards on complex nested HTML (tags remain), forgetting to decode HTML entities like after tag removal.
Strip once. Import successfully. Never manually remove HTML tags again.
For e-commerce businesses migrating platforms or marketing teams exporting campaign data, HTML tags in CSVs aren't just annoying—they're migration-blocking. Invest 5-10 minutes cleaning HTML from CSVs once instead of troubleshooting failed imports repeatedly.
Sources:
- Shopify CSV Export Documentation - Official Shopify CSV format specifications
- Google Sheets REGEXREPLACE - Google Sheets regex function documentation
- MDN Web Workers API - Client-side processing specification
Migrating e-commerce platforms or cleaning product data? Connect on LinkedIn or share your workflow at @splitforge.