Why do HTML tags appear in Shopify CSV exports?

Shopify exports "Body (HTML)" field which contains full HTML markup used for web display. This includes , , , , and other formatting tags. To avoid, copy product descriptions to plain text fields before export or strip tags post-export.

How do I preserve line breaks when removing HTML tags?

Before removing tags, convert or to newline characters. Find and replace with \n (newline). Find and replace with \n. Then remove remaining tags. This preserves paragraph structure in cleaned text.

csv-troubleshooting

Remove HTML Tags from CSV: Complete Guide for Clean Imports

January 23, 2026

By SplitForge Team

You're migrating 3,000 products from Shopify to a new platform. The CSV export looks perfect—product titles, prices, SKUs, descriptions all present. You map the fields, click Import.

Import failed: 1,847 products rejected. "Invalid characters in description field: row 487, 892, 1,203..."

You scroll to row 487. The description field contains:

<p><strong>Premium Leather Wallet</strong></p><ul><li>Genuine Italian leather</li><li>RFID protection</li><li>Holds 12 cards</li></ul><p>Perfect gift for professionals.</p>

The platform expects plain text. What you see is formatted HTML with paragraph tags, lists, and bold formatting—exactly how it displayed on your Shopify store, but unusable for systems expecting clean text.

TL;DR: CSV imports fail because e-commerce platforms export HTML markup (<p>, <strong>, <ul>) instead of plain text. Strip tags using Excel wildcards (<*>), regex patterns (<[^>]*>), or browser-based tools. Never manually edit 1,000+ rows—automate it in 2-10 minutes depending on file size and method chosen.

Quick Answer

CSV imports fail when exported data contains HTML formatting tags (<p>, <strong>, <div>, etc.) that target systems don't accept. E-commerce platforms export product descriptions as HTML for web display, but when reimporting to different systems or analyzing data, the markup breaks validation rules and corrupts plain text fields.

HTML tags appear because source systems store rich text with formatting—Shopify exports "Body (HTML)" field with full markup, CMS exports include <h2> headers and <ul> lists, CRM exports preserve email formatting with <br> tags and <a> links, and web scraping captures page HTML instead of displayed text.

The fix: Use Excel find/replace with wildcard pattern <*> to remove all tags (works for simple cases), apply regex pattern <[^>]*> in text editors for complex nested HTML, or use automated data cleaning tools that process files client-side without uploading sensitive product/customer data to third-party servers.

Fast Fix (90 Seconds)

If your CSV import just failed due to HTML tags, try this first:

Open CSV in Excel - Load file in Excel or Google Sheets
Identify HTML columns - Spot tags: <p>, <strong>, <br>, <div>, <span>
Excel find/replace - Find: <*> | Replace: (empty) | Search: Use wildcards
Verify results - Check that actual content remained: <p>Great product</p> → Great product
Re-export as CSV - Save as CSV (UTF-8) and retry import

If you have 1,000+ rows, nested tags, or HTML entities ( , "), continue below for automated methods.

Quick Diagnosis: Identify Your HTML Problem
Method 1: Excel Find & Replace
Method 2: Regex Pattern Matching
Method 3: Automated Data Cleaning Tools
Common HTML Scenarios
Troubleshooting Failed Cleanups
FAQ

Quick Diagnosis: Identify Your HTML Problem

Before choosing a cleanup method, identify your specific HTML situation. Open your CSV in text editor and search for < to reveal tag patterns.

What type of HTML tags are present?

Basic formatting tags:

<p>text</p> (paragraphs)
<strong>text</strong> or <b>text</b> (bold)
<br> or <br /> (line breaks)

Structural tags:

<div>, <span> (containers)
<ul><li>item</li></ul> (lists)
<h1>, <h2>, <h3> (headers)

Complex tags:

Inline styles: <p style="color: red">text</p>
Comments: 
Script tags: <script>code here</script>

Are HTML entities present?

Search your CSV for &:

  (non-breaking space)
" or " (quotation marks)
& (ampersand)
< and > (encoded < and >)

If present: You need HTML entity decoding in addition to tag removal.

Match your situation to method:

Simple tags, <500 rows → Method 1 (Excel)
Nested tags, any volume → Method 2 (Regex)
Recurring imports, privacy concerns → Method 3 (Browser tools)

Method 1: Excel Find & Replace

Best for: Small files (<500 rows), simple HTML tags, one-time cleanup.

Excel's wildcard pattern removes basic HTML tags but fails on nested tags or malformed HTML. For simple product descriptions with <p>, <strong>, and <br> tags, this 2-minute solution works. For a comprehensive overview of CSV troubleshooting techniques beyond HTML cleanup, see our CSV troubleshooting guide.

Step-by-step:

Open CSV in Excel - File → Open → Select CSV
Enable wildcards - Press Ctrl+H → Click "Options >>" → Check "Use wildcards"
Remove tags - Find: <*> | Replace: (empty) | Click "Replace All"
Clean spacing - Find: (double space) | Replace: (single space)
Save as CSV - File → Save As → CSV (UTF-8)

Example transformation:

Before: <p><strong>Premium Wallet</strong></p><ul><li>Leather</li></ul>
After: Premium WalletLeather

Limitation: Spacing between elements gets lost. For better results with complex HTML, use Method 2.

When Excel fails:

Excel wildcards don't handle nested tags (<div><p>text</p></div>), tags spanning cells, or malformed HTML (<p>text without closing bracket). Use Method 2 for these cases.

Method 2: Regex Pattern Matching

Best for: Complex nested HTML, large files, users comfortable with text editors.

Regex pattern <[^>]*> matches any content between angle brackets regardless of tag name or nesting. This handles all HTML tags that basic Excel wildcards miss.

Core regex pattern:

<[^>]*>

What it means:

< - Match opening angle bracket
[^>]* - Match any character except >, zero or more times
> - Match closing angle bracket

Tools that support regex:

Notepad++ (Windows, free)
VS Code (Mac/Windows/Linux, free)
Sublime Text (paid)
Google Sheets REGEXREPLACE function

Notepad++ example:

Open CSV in Notepad++
Press Ctrl+H to open Find & Replace
Find: <[^>]*> | Replace: (empty)
Search mode: Check "Regular expression"
Click "Replace All"

Decode HTML entities:

After removing tags, replace entities:

  → (space)
" → "
& → &
< → <
> → >

Google Sheets formula:

=REGEXREPLACE(A2,"<[^>]*>","")

Apply to column, copy results, paste as values, delete original column. See Google Sheets REGEXREPLACE documentation for syntax details.

Test regex patterns:

Use regex101.com to test patterns before applying:

Select "ECMAScript (JavaScript)" flavor
Paste sample HTML in test string
Enter pattern <[^>]*>
Verify it highlights all tags correctly
Copy working pattern to your editor

Method 3: Automated Data Cleaning Tools

Best for: Large files (1,000+ rows), recurring imports, privacy requirements.

Browser-based tools process CSV files entirely on your device without uploading data to servers. Critical for GDPR compliance, e-commerce customer data, or any company that cannot upload product information to third parties. For a complete privacy-first CSV workflow, see our data privacy checklist.

Client-side processing advantages:

File never leaves your computer
No data transmission over networks
No third-party access
GDPR-aligned and HIPAA-aligned by architecture
Processing speed: 100K+ rows in seconds

Recommended tools:

SplitForge Data Cleaner (Browser-based, no upload)

Navigate to Data Cleaner tool
Drag/drop CSV file (processes in-browser only)
Select column containing HTML
Click "Remove HTML Tags" operation
Preview before/after results
Download cleaned file
Processing: 1M rows in 15-25 seconds

OpenRefine (Desktop app, free)

Download and install locally
Import CSV file
Apply transformation: value.replace(/<[^>]*>/g, "")
Export cleaned data
Best for: Data professionals, recurring workflows

How client-side processing works:

Processing happens in your browser using JavaScript and Web Workers API. File reads locally via FileReader API, processes in memory, cleaned file downloads directly. Zero data uploads.

Privacy-first workflow:

For companies with GDPR, HIPAA, PCI-DSS, or internal data security policies:

Use browser-based tools only (no uploads)
Verify processing happens client-side (check browser Network tab)
Document compliance in audit logs
Save workflow for recurring imports

Common HTML Scenarios

Shopify Product Export

Problem: "Body (HTML)" field exports with full markup according to Shopify's CSV format documentation.

Sample data:

<meta charset="utf-8"><p><strong>Product Features:</strong></p><ul><li>Feature 1</li><li>Feature 2</li></ul>

Solution: Use regex pattern <[^>]*> or browser tools. Result: Product Features:Feature 1Feature 2

Shopify tip: When exporting, Shopify includes both HTML and plain text fields. Use bulk editor to copy plain text versions before export. For comprehensive guidance on cleaning product catalogs for Shopify and WooCommerce, see our product catalog CSV cleaning guide.

WooCommerce/WordPress Export

Problem: Product descriptions contain paragraph tags, line breaks, and formatting.

Sample data:

<p>Handcrafted wooden desk organizer.</p><p>Dimensions: 12" x 6" x 4"</p><p><strong>Made from sustainable bamboo.</strong></p>

Solution: Excel wildcards or regex removes all tags. Result: Handcrafted wooden desk organizer. Dimensions: 12" x 6" x 4" Made from sustainable bamboo.

CRM Email Body Export

Problem: Exported email campaigns contain links, images, styling.

Sample data:

<p>Hi <span style="color: #0000ff;">John</span>,</p><p>Check out our new <a href="https://example.com">product line</a>!</p>

Solution: Regex strips all tags and styles. Result: Hi John, Check out our new product line!

Note: Link URLs are lost. If needed, extract URLs separately before stripping tags.

Troubleshooting Failed Cleanups

Problem 1: Tags remain after find/replace

Symptoms: Still see <p>, </div>, or other tags after cleanup.

Causes:

Pattern didn't match (Excel wildcards vs regex syntax error)
Tags split across cells/lines
Malformed HTML: <p text (missing closing bracket)

Solutions:

Check pattern syntax - Excel uses <*> (wildcard), regex uses <[^>]*>
Handle split tags - Combine cells first with Concatenate function
Use robust parser - Browser tools handle malformed HTML automatically

Problem 2: Content disappeared with tags

Symptoms: Cells blank or missing content after tag removal.

Cause: Pattern matched legitimate content like <5kg weight> (not an HTML tag).

Solution: Use more specific pattern that requires tag names to start with letter: </?[a-z][a-z0-9]*[^>]*> excludes <5kg> but matches <p>, <strong>, etc.

Problem 3: HTML entities remain

Symptoms: After cleanup, see  , ", '.

Cause: Tag removal doesn't decode entities automatically.

Solution: Manual find/replace for common entities or use tools with automatic entity decoding (SplitForge Data Cleaner, BeautifulSoup).

FAQ

Use text editors with regex support (Notepad++, VS Code), Google Sheets REGEXREPLACE function, or browser-based CSV cleaning tools like SplitForge Data Cleaner. Regex pattern <[^>]*> removes all HTML tags when replaced with empty string.

Shopify exports "Body (HTML)" field which contains full HTML markup used for web display. This includes <p>, <ul>, <li>, <strong>, and other formatting tags. To avoid, copy product descriptions to plain text fields before export or strip tags post-export.

Before removing tags, convert <br> or <p> to newline characters. Find <br\s*/?> and replace with \n (newline). Find </p> and replace with \n. Then remove remaining tags. This preserves paragraph structure in cleaned text.

Yes. Use local software (Excel, Notepad++, VS Code) or client-side browser tools that process files locally without server uploads. SplitForge processes CSVs entirely in-browser using JavaScript—file never leaves your computer. Critical for GDPR/HIPAA compliance.

Removing tags strips <p>, <strong>, etc. Decoding entities converts   to space, " to quotation marks, & to ampersand. Both needed for complete cleanup. Most automated tools do both. Manual find/replace requires separate steps for each.

Search for < character in text editor or spreadsheet. If found, review context—could be legitimate (temperatures, comparisons) or remaining tags. Use regex search <[^>]*> to find any HTML patterns. Import small sample to target system and check for validation errors.

Dealing with other CSV import errors? See our complete guide: CSV Import Errors: Every Cause, Every Fix (2026)

Conclusion

CSV imports fail because e-commerce platforms export HTML markup instead of plain text. Shopify's "Body (HTML)" field, WooCommerce descriptions, and CRM email bodies all contain <p>, <strong>, <ul> tags that break import validation on systems expecting clean text.

The fastest fix depends on file size and complexity:

Simple cases (<500 rows): Excel find/replace with <*> wildcard pattern (2 minutes)
Complex nested HTML: Regex pattern <[^>]*> in Notepad++ or VS Code (5 minutes)
Large files (1,000+ rows): Browser-based tools process in seconds without uploads (GDPR compliant)

Most common mistakes: Manually editing thousands of rows (wastes hours), uploading sensitive product data to third-party cleaners (GDPR violation), using Excel wildcards on complex nested HTML (tags remain), forgetting to decode HTML entities like   after tag removal.

Strip once. Import successfully. Never manually remove HTML tags again.

For e-commerce businesses migrating platforms or marketing teams exporting campaign data, HTML tags in CSVs aren't just annoying—they're migration-blocking. Invest 5-10 minutes cleaning HTML from CSVs once instead of troubleshooting failed imports repeatedly.

Sources:

Shopify CSV Export Documentation - Official Shopify CSV format specifications
Google Sheets REGEXREPLACE - Google Sheets regex function documentation
MDN Web Workers API - Client-side processing specification

Migrating e-commerce platforms or cleaning product data? Connect on LinkedIn or share your workflow at @splitforge.

Clean HTML from CSV Files—No Uploads Required

Remove all HTML tags in seconds with browser-based processing

Handle 10,000+ products without manual editing

Zero uploads—your product data never leaves your browser

Works with Shopify, WooCommerce, and all e-commerce exports

Clean CSV Data Now →

Remove HTML Tags from CSV: Complete Guide for Clean Imports

Quick Answer

Fast Fix (90 Seconds)

Table of Contents

Quick Diagnosis: Identify Your HTML Problem

What type of HTML tags are present?

Are HTML entities present?

Match your situation to method:

Method 1: Excel Find & Replace

Step-by-step:

When Excel fails:

Method 2: Regex Pattern Matching

Core regex pattern:

Tools that support regex:

Notepad++ example:

Decode HTML entities:

Google Sheets formula:

Test regex patterns:

Method 3: Automated Data Cleaning Tools

Client-side processing advantages:

Recommended tools:

Privacy-first workflow:

Common HTML Scenarios

Shopify Product Export

WooCommerce/WordPress Export

CRM Email Body Export

Troubleshooting Failed Cleanups

Problem 1: Tags remain after find/replace

Problem 2: Content disappeared with tags

Problem 3: HTML entities remain

FAQ

How do I remove HTML tags from CSV without Excel?

Why do HTML tags appear in Shopify CSV exports?

How do I preserve line breaks when removing HTML tags?

Is there a way to remove HTML tags from CSV without uploading to third-party sites?

What's the difference between removing tags and decoding HTML entities?

How do I verify all HTML tags were removed?

Conclusion

Clean HTML from CSV Files—No Uploads Required

Quick Answer

Fast Fix (90 Seconds)

Table of Contents

Quick Diagnosis: Identify Your HTML Problem

What type of HTML tags are present?

Are HTML entities present?

Match your situation to method:

Method 1: Excel Find & Replace

Step-by-step:

When Excel fails:

Method 2: Regex Pattern Matching

Core regex pattern:

Tools that support regex:

Notepad++ example:

Decode HTML entities:

Google Sheets formula:

Test regex patterns:

Method 3: Automated Data Cleaning Tools

Client-side processing advantages:

Recommended tools:

Privacy-first workflow:

Common HTML Scenarios

Shopify Product Export

WooCommerce/WordPress Export

CRM Email Body Export

Troubleshooting Failed Cleanups

Problem 1: Tags remain after find/replace

Problem 2: Content disappeared with tags

Problem 3: HTML entities remain

FAQ

How do I remove HTML tags from CSV without Excel?

Why do HTML tags appear in Shopify CSV exports?

How do I preserve line breaks when removing HTML tags?

Is there a way to remove HTML tags from CSV without uploading to third-party sites?

What's the difference between removing tags and decoding HTML entities?

How do I verify all HTML tags were removed?

Conclusion

Clean HTML from CSV Files—No Uploads Required

Continue Reading

AI-Ready Data Checklist: 10 Things to Verify Before Upload (2026)

Convert Excel to JSON for AI APIs and LLM Pipelines (2026)

Prepare Data for AI: The Complete Guide (Privacy-First, 2026)