Back to Blog
csv-troubleshooting

Remove HTML Tags from CSV: Complete Guide for Clean Imports

January 23, 2026
14
By SplitForge Team

You're migrating 3,000 products from Shopify to a new platform. The CSV export looks perfect—product titles, prices, SKUs, descriptions all present. You map the fields, click Import.

Import failed: 1,847 products rejected. "Invalid characters in description field: row 487, 892, 1,203..."

You scroll to row 487. The description field contains:

<p><strong>Premium Leather Wallet</strong></p><ul><li>Genuine Italian leather</li><li>RFID protection</li><li>Holds 12 cards</li></ul><p>Perfect gift for professionals.</p>

The platform expects plain text. What you see is formatted HTML with paragraph tags, lists, and bold formatting—exactly how it displayed on your Shopify store, but unusable for systems expecting clean text.

TL;DR: CSV imports fail because e-commerce platforms export HTML markup (<p>, <strong>, <ul>) instead of plain text. Strip tags using Excel wildcards (<*>), regex patterns (<[^>]*>), or browser-based tools. Never manually edit 1,000+ rows—automate it in 2-10 minutes depending on file size and method chosen.


Quick Answer

CSV imports fail when exported data contains HTML formatting tags (<p>, <strong>, <div>, etc.) that target systems don't accept. E-commerce platforms export product descriptions as HTML for web display, but when reimporting to different systems or analyzing data, the markup breaks validation rules and corrupts plain text fields.

HTML tags appear because source systems store rich text with formattingShopify exports "Body (HTML)" field with full markup, CMS exports include <h2> headers and <ul> lists, CRM exports preserve email formatting with <br> tags and <a> links, and web scraping captures page HTML instead of displayed text.

The fix: Use Excel find/replace with wildcard pattern <*> to remove all tags (works for simple cases), apply regex pattern <[^>]*> in text editors for complex nested HTML, or use automated data cleaning tools that process files client-side without uploading sensitive product/customer data to third-party servers.


Fast Fix (90 Seconds)

If your CSV import just failed due to HTML tags, try this first:

  1. Open CSV in Excel - Load file in Excel or Google Sheets
  2. Identify HTML columns - Spot tags: <p>, <strong>, <br>, <div>, <span>
  3. Excel find/replace - Find: <*> | Replace: (empty) | Search: Use wildcards
  4. Verify results - Check that actual content remained: <p>Great product</p>Great product
  5. Re-export as CSV - Save as CSV (UTF-8) and retry import

If you have 1,000+ rows, nested tags, or HTML entities (&nbsp;, &quot;), continue below for automated methods.


Table of Contents


Quick Diagnosis: Identify Your HTML Problem

Before choosing a cleanup method, identify your specific HTML situation. Open your CSV in text editor and search for < to reveal tag patterns.

What type of HTML tags are present?

Basic formatting tags:

  • <p>text</p> (paragraphs)
  • <strong>text</strong> or <b>text</b> (bold)
  • <br> or <br /> (line breaks)

Structural tags:

  • <div>, <span> (containers)
  • <ul><li>item</li></ul> (lists)
  • <h1>, <h2>, <h3> (headers)

Complex tags:

  • Inline styles: <p style="color: red">text</p>
  • Comments: <!-- This is a comment -->
  • Script tags: <script>code here</script>

Are HTML entities present?

Search your CSV for &:

  • &nbsp; (non-breaking space)
  • &quot; or &#34; (quotation marks)
  • &amp; (ampersand)
  • &lt; and &gt; (encoded < and >)

If present: You need HTML entity decoding in addition to tag removal.

Match your situation to method:

  • Simple tags, <500 rows → Method 1 (Excel)
  • Nested tags, any volume → Method 2 (Regex)
  • Recurring imports, privacy concerns → Method 3 (Browser tools)

Method 1: Excel Find & Replace

Best for: Small files (<500 rows), simple HTML tags, one-time cleanup.

Excel's wildcard pattern removes basic HTML tags but fails on nested tags or malformed HTML. For simple product descriptions with <p>, <strong>, and <br> tags, this 2-minute solution works. For a comprehensive overview of CSV troubleshooting techniques beyond HTML cleanup, see our CSV troubleshooting guide.

Step-by-step:

  1. Open CSV in Excel - File → Open → Select CSV
  2. Enable wildcards - Press Ctrl+H → Click "Options >>" → Check "Use wildcards"
  3. Remove tags - Find: <*> | Replace: (empty) | Click "Replace All"
  4. Clean spacing - Find: (double space) | Replace: (single space)
  5. Save as CSV - File → Save As → CSV (UTF-8)

Example transformation:

Before: <p><strong>Premium Wallet</strong></p><ul><li>Leather</li></ul>
After: Premium WalletLeather

Limitation: Spacing between elements gets lost. For better results with complex HTML, use Method 2.

When Excel fails:

Excel wildcards don't handle nested tags (<div><p>text</p></div>), tags spanning cells, or malformed HTML (<p>text without closing bracket). Use Method 2 for these cases.


Method 2: Regex Pattern Matching

Best for: Complex nested HTML, large files, users comfortable with text editors.

Regex pattern <[^>]*> matches any content between angle brackets regardless of tag name or nesting. This handles all HTML tags that basic Excel wildcards miss.

Core regex pattern:

<[^>]*>

What it means:

  • < - Match opening angle bracket
  • [^>]* - Match any character except >, zero or more times
  • > - Match closing angle bracket

Tools that support regex:

  • Notepad++ (Windows, free)
  • VS Code (Mac/Windows/Linux, free)
  • Sublime Text (paid)
  • Google Sheets REGEXREPLACE function

Notepad++ example:

  1. Open CSV in Notepad++
  2. Press Ctrl+H to open Find & Replace
  3. Find: <[^>]*> | Replace: (empty)
  4. Search mode: Check "Regular expression"
  5. Click "Replace All"

Decode HTML entities:

After removing tags, replace entities:

  • &nbsp; (space)
  • &quot;"
  • &amp;&
  • &lt;<
  • &gt;>

Google Sheets formula:

=REGEXREPLACE(A2,"<[^>]*>","")

Apply to column, copy results, paste as values, delete original column. See Google Sheets REGEXREPLACE documentation for syntax details.

Test regex patterns:

Use regex101.com to test patterns before applying:

  1. Select "ECMAScript (JavaScript)" flavor
  2. Paste sample HTML in test string
  3. Enter pattern <[^>]*>
  4. Verify it highlights all tags correctly
  5. Copy working pattern to your editor

Method 3: Automated Data Cleaning Tools

Best for: Large files (1,000+ rows), recurring imports, privacy requirements.

Browser-based tools process CSV files entirely on your device without uploading data to servers. Critical for GDPR compliance, e-commerce customer data, or any company that cannot upload product information to third parties. For a complete privacy-first CSV workflow, see our data privacy checklist.

Client-side processing advantages:

  • File never leaves your computer
  • No data transmission over networks
  • No third-party access
  • GDPR/HIPAA compliant by architecture
  • Processing speed: 100K+ rows in seconds

SplitForge Data Cleaner (Browser-based, no upload)

  1. Navigate to Data Cleaner tool
  2. Drag/drop CSV file (processes in-browser only)
  3. Select column containing HTML
  4. Click "Remove HTML Tags" operation
  5. Preview before/after results
  6. Download cleaned file
  7. Processing: 1M rows in 15-25 seconds

OpenRefine (Desktop app, free)

  • Download and install locally
  • Import CSV file
  • Apply transformation: value.replace(/<[^>]*>/g, "")
  • Export cleaned data
  • Best for: Data professionals, recurring workflows

How client-side processing works:

Processing happens in your browser using JavaScript and Web Workers API. File reads locally via FileReader API, processes in memory, cleaned file downloads directly. Zero data uploads.

Privacy-first workflow:

For companies with GDPR, HIPAA, PCI-DSS, or internal data security policies:

  1. Use browser-based tools only (no uploads)
  2. Verify processing happens client-side (check browser Network tab)
  3. Document compliance in audit logs
  4. Save workflow for recurring imports

Common HTML Scenarios

Shopify Product Export

Problem: "Body (HTML)" field exports with full markup according to Shopify's CSV format documentation.

Sample data:

<meta charset="utf-8"><p><strong>Product Features:</strong></p><ul><li>Feature 1</li><li>Feature 2</li></ul>

Solution: Use regex pattern <[^>]*> or browser tools. Result: Product Features:Feature 1Feature 2

Shopify tip: When exporting, Shopify includes both HTML and plain text fields. Use bulk editor to copy plain text versions before export. For comprehensive guidance on cleaning product catalogs for Shopify and WooCommerce, see our product catalog CSV cleaning guide.

WooCommerce/WordPress Export

Problem: Product descriptions contain paragraph tags, line breaks, and formatting.

Sample data:

<p>Handcrafted wooden desk organizer.</p><p>Dimensions: 12" x 6" x 4"</p><p><strong>Made from sustainable bamboo.</strong></p>

Solution: Excel wildcards or regex removes all tags. Result: Handcrafted wooden desk organizer. Dimensions: 12" x 6" x 4" Made from sustainable bamboo.

CRM Email Body Export

Problem: Exported email campaigns contain links, images, styling.

Sample data:

<p>Hi <span style="color: #0000ff;">John</span>,</p><p>Check out our new <a href="https://example.com">product line</a>!</p>

Solution: Regex strips all tags and styles. Result: Hi John, Check out our new product line!

Note: Link URLs are lost. If needed, extract URLs separately before stripping tags.


Troubleshooting Failed Cleanups

Problem 1: Tags remain after find/replace

Symptoms: Still see <p>, </div>, or other tags after cleanup.

Causes:

  • Pattern didn't match (Excel wildcards vs regex syntax error)
  • Tags split across cells/lines
  • Malformed HTML: <p text (missing closing bracket)

Solutions:

  1. Check pattern syntax - Excel uses <*> (wildcard), regex uses <[^>]*>
  2. Handle split tags - Combine cells first with Concatenate function
  3. Use robust parser - Browser tools handle malformed HTML automatically

Problem 2: Content disappeared with tags

Symptoms: Cells blank or missing content after tag removal.

Cause: Pattern matched legitimate content like <5kg weight> (not an HTML tag).

Solution: Use more specific pattern that requires tag names to start with letter: </?[a-z][a-z0-9]*[^>]*> excludes <5kg> but matches <p>, <strong>, etc.

Problem 3: HTML entities remain

Symptoms: After cleanup, see &nbsp;, &quot;, &#39;.

Cause: Tag removal doesn't decode entities automatically.

Solution: Manual find/replace for common entities or use tools with automatic entity decoding (SplitForge Data Cleaner, BeautifulSoup).


FAQ

Use text editors with regex support (Notepad++, VS Code), Google Sheets REGEXREPLACE function, or browser-based CSV cleaning tools like SplitForge Data Cleaner. Regex pattern <[^>]*> removes all HTML tags when replaced with empty string.

Shopify exports "Body (HTML)" field which contains full HTML markup used for web display. This includes <p>, <ul>, <li>, <strong>, and other formatting tags. To avoid, copy product descriptions to plain text fields before export or strip tags post-export.

Before removing tags, convert <br> or <p> to newline characters. Find <br\s*/?> and replace with \n (newline). Find </p> and replace with \n. Then remove remaining tags. This preserves paragraph structure in cleaned text.

Yes. Use local software (Excel, Notepad++, VS Code) or client-side browser tools that process files locally without server uploads. SplitForge processes CSVs entirely in-browser using JavaScript—file never leaves your computer. Critical for GDPR/HIPAA compliance.

Removing tags strips <p>, <strong>, etc. Decoding entities converts &nbsp; to space, &quot; to quotation marks, &amp; to ampersand. Both needed for complete cleanup. Most automated tools do both. Manual find/replace requires separate steps for each.

Search for < character in text editor or spreadsheet. If found, review context—could be legitimate (temperatures, comparisons) or remaining tags. Use regex search <[^>]*> to find any HTML patterns. Import small sample to target system and check for validation errors.

Dealing with other CSV import errors? See our complete guide: CSV Import Errors: Every Cause, Every Fix (2026)



Conclusion

CSV imports fail because e-commerce platforms export HTML markup instead of plain text. Shopify's "Body (HTML)" field, WooCommerce descriptions, and CRM email bodies all contain <p>, <strong>, <ul> tags that break import validation on systems expecting clean text.

The fastest fix depends on file size and complexity:

  • Simple cases (<500 rows): Excel find/replace with <*> wildcard pattern (2 minutes)
  • Complex nested HTML: Regex pattern <[^>]*> in Notepad++ or VS Code (5 minutes)
  • Large files (1,000+ rows): Browser-based tools process in seconds without uploads (GDPR compliant)

Most common mistakes: Manually editing thousands of rows (wastes hours), uploading sensitive product data to third-party cleaners (GDPR violation), using Excel wildcards on complex nested HTML (tags remain), forgetting to decode HTML entities like &nbsp; after tag removal.

Strip once. Import successfully. Never manually remove HTML tags again.

For e-commerce businesses migrating platforms or marketing teams exporting campaign data, HTML tags in CSVs aren't just annoying—they're migration-blocking. Invest 5-10 minutes cleaning HTML from CSVs once instead of troubleshooting failed imports repeatedly.


Sources:


Migrating e-commerce platforms or cleaning product data? Connect on LinkedIn or share your workflow at @splitforge.

Clean HTML from CSV Files—No Uploads Required

Remove all HTML tags in seconds with browser-based processing
Handle 10,000+ products without manual editing
Zero uploads—your product data never leaves your browser
Works with Shopify, WooCommerce, and all e-commerce exports

Continue Reading

More guides to help you work smarter with your data

csv-guides

How to Audit a CSV File Before Processing

You inherited a CSV from a vendor. Before you load it into anything, you need to know what's actually in it — without trusting the filename.

Read More
csv-guides

Combine First and Last Name Columns in CSV for CRM Import

Your CRM requires a single Full Name column but your export has First and Last split. Here's how to combine them across 100K rows in 30 seconds.

Read More
csv-guides

Data Profiling vs Validation: What Each Reveals in Your CSV

Everyone says 'validate your CSV before import.' But validation can only check what you already know to look for. Profiling finds what you didn't know to check.

Read More