Fast Fix (60 Seconds)
If you have product codes buried in a CSV column right now:
- Open SplitForge Pattern Extraction — no account required
- Upload your product CSV
- Select the column containing the buried codes (Description, Title, Notes)
- Choose ID/Code mode and enter your code prefix (e.g., "SKU-" or "ASIN:")
- Download — extracted codes appear in a new column
No server upload. No Python. Runs in your browser.
TL;DR: Product catalog CSVs frequently bury SKUs, ASINs, GTINs, and internal codes inside description fields, product titles, or notes columns. Manually isolating these codes across thousands of rows causes errors and delays catalog launches. SplitForge Pattern Extraction identifies and extracts any product code format — with or without a known prefix — in a single browser-based pass.
Your supplier sent a 12,000-row product catalog. The SKU is embedded in the product title: "Blue Widget 500ml [SKU-8847] Case of 12." Your inventory system needs a clean SKU column to import. Your options are to write a formula for every row, pay someone to clean it manually, or spend two hours on a Python script.
There's a faster way.
Each extraction scenario was tested in March 2026 using SplitForge Pattern Extraction against real-world product catalog exports from e-commerce and inventory systems, ranging from 5,000 to 400,000 rows. In one wholesale catalog we analyzed, SKUs appeared in five different column locations across the file (title, description, notes, a dedicated SKU field, and a legacy reference column), all requiring extraction and consolidation before a clean import was possible.
Product code extraction has one failure mode that breaks basic tools: codes without consistent prefixes. An Amazon-assigned ASIN is 10 characters and typically starts with B0, which makes for a reliable pattern. A supplier's internal SKU might be 6 alphanumeric characters with no prefix at all. The strategy for each is different, and most guides only cover the easy prefix case. This guide covers all extraction strategies. Files are streamed and processed in dedicated Web Worker threads to avoid blocking the UI thread, and your catalog data never leaves your browser.
Product Code Formats and How to Extract Each
Product codes follow predictable formats once you know the standard. The challenge is that different marketplaces and systems use different conventions — and a single catalog often mixes several.
| Code Type | Format | Example | Extraction Strategy |
|---|---|---|---|
| ASIN (Amazon) | 10 chars, starts with B0 | B0CXYZ1234 | Pattern: B0 + 8 alphanumeric |
| SKU (internal) | Varies — often prefix + digits | SKU-88471 | Anchor to prefix |
| UPC-A (US barcode) | 12 digits exactly | 012345678905 | 12-digit numeric string |
| EAN-13 (EU barcode) | 13 digits exactly | 0123456789012 | 13-digit numeric string |
| GTIN-14 (logistics) | 14 digits exactly | 01234567890128 | 14-digit numeric string |
| ISBN-13 | 13 digits, starts with 978 or 979 | 9781234567897 | Anchor to 978/979 prefix |
| Custom internal code | Alphanumeric, often 6-10 chars | A1B2C3 | Define length and character class |
This is the reference table. Before running any extraction, identify your code type and select the matching strategy. Getting this right before configuring the tool saves multiple re-runs.
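For readers who want to prototype these strategies outside the tool, the formats in the table can be written as regular expressions. This is a minimal Python sketch, not SplitForge's implementation: `CODE_PATTERNS` and `classify` are hypothetical names, and the ASIN pattern assumes uppercase codes.

```python
import re

# Boundaries that stop a match from starting or ending inside a longer
# alphanumeric run (for example, 12 digits inside a 13-digit string).
_START, _END = r"(?<![A-Za-z0-9])", r"(?![A-Za-z0-9])"

# Hypothetical pattern table mirroring the fixed formats listed above.
CODE_PATTERNS = {
    "asin": re.compile(_START + r"B0[A-Z0-9]{8}" + _END),
    "upc_a": re.compile(_START + r"\d{12}" + _END),
    "ean_13": re.compile(_START + r"\d{13}" + _END),
    "gtin_14": re.compile(_START + r"\d{14}" + _END),
    "isbn_13": re.compile(_START + r"97[89]\d{10}" + _END),
}


def classify(text: str) -> dict[str, list[str]]:
    """Return each code type that matches somewhere in text, with its matches."""
    hits = {name: rx.findall(text) for name, rx in CODE_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}
```

An ISBN-13 is also a 13-digit string, so it matches both `ean_13` and `isbn_13` here. Decide which label wins before consolidating results.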
Which Extraction Strategy Should You Use?
Before opening the tool, answer these three questions. The answers determine your configuration:
Do you know the code prefix? (e.g., "SKU-", "ASIN:", "INV-")
│
├── YES → Use Prefix + Length mode
│         Enter the prefix. Set expected character count after prefix.
│         False positive risk: very low.
│
└── NO → Is the code a fixed numeric length? (UPC, EAN, GTIN, ISBN)
    │
    ├── YES → Use Exact Numeric Length mode
    │         Set exact digit count (12, 13, or 14).
    │         Enable check-digit validation if available.
    │         False positive risk: low with strict exact length.
    │
    └── NO → Do you know the character type and approximate length?
        │
        ├── YES → Use Character Class + Length Range mode
        │         Define alphanumeric vs digits-only.
        │         Set min/max length.
        │         False positive risk: medium. Always preview first.
        │
        └── NO → Sample 100–200 rows manually first.
                 Identify the most consistent structural feature.
                 Return to the first question with that anchor.
The decision point most people skip is the last one. If you can't describe the code format in one sentence, extraction will produce garbage. A 10-minute manual review of 100 rows is faster than re-running extraction 5 times trying to tune a pattern against nothing specific.
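A small script can speed up the manual sampling step by grouping tokens by structure. The sketch below is one possible approach, assuming the sample rows are already loaded as strings; `shape` and `profile_shapes` are illustrative names and not part of any tool.

```python
import re
from collections import Counter


def shape(token: str) -> str:
    """Collapse a token to its structure: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", token))


def profile_shapes(rows: list[str], top: int = 5) -> list[tuple[str, int]]:
    """Count the structural shapes of digit-bearing tokens in a sample."""
    counts: Counter = Counter()
    for row in rows:
        # Tokens are runs of letters, digits, hyphens, and underscores.
        for token in re.findall(r"[A-Za-z0-9_-]+", row):
            if any(ch.isdigit() for ch in token):
                counts[shape(token)] += 1
    return counts.most_common(top)
```

A shape that appears roughly once per row (such as `AAA-9999`) is a good candidate for the one-sentence description of the code format. Shapes that come from sizes or quantities (such as `999AA` for "500ml") are the false positives to watch for.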
Table of Contents
- Product Code Formats and How to Extract Each
- Which Extraction Strategy Should You Use?
- How to Extract SKUs with a Known Prefix
- How to Extract ASINs from Product Titles or Descriptions
- How to Extract Barcodes (UPC, EAN, GTIN)
- How to Extract Custom Internal Codes
- Cleaning Extracted Codes
- Common Catalog Scenarios
- Additional Resources
- FAQ
This guide is for: E-commerce managers, catalog teams, Amazon sellers, and inventory managers who need to isolate product identifiers from mixed-content CSV columns.
How to Extract SKUs with a Known Prefix
SKUs with consistent prefixes are the most straightforward extraction case. If your SKUs always start with "SKU-", "ITEM-", or another fixed string, the prefix anchors the pattern reliably and reduces false positives to near zero.
Step 1: Identify the prefix and code structure
Examine 10–20 rows in your source column and answer these questions:
- Does the prefix appear consistently? (SKU-XXXX vs some rows with SKU_XXXX — note the separator difference)
- What characters follow the prefix — digits only, alphanumeric, or mixed?
- Is the code length fixed or variable?
Step 2: Configure extraction
- Open SplitForge Pattern Extraction
- Upload your CSV and select the source column
- Choose ID/Code mode
- Enter your prefix in the Prefix field (e.g., "SKU-")
- Set the expected code length or range (e.g., 4–8 characters after the prefix)
Step 3: Handle prefix variations
If your data has inconsistent separators (SKU-8847 and SKU_8847 both appear), run two extraction passes — one for each variant — then use SplitForge Find & Replace to standardize the separator in the combined output.
What success looks like:
- Only the code portion appears in the extracted column (not the prefix + code)
- Rows without the prefix return blank cells
- No surrounding text appears in the extracted value
- If surrounding text appears, reduce the maximum match length in settings
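To spot-check results on a handful of rows, the same prefix-plus-length rule can be written as a regular expression. This Python sketch is an assumption about how such a rule might look, not the tool's internals; `extract_prefixed` and its parameters are hypothetical. It accepts both separator variants in one pass and returns only the code portion.

```python
import re


def extract_prefixed(text: str, prefix: str = "SKU", seps: str = "-_",
                     min_len: int = 4, max_len: int = 8) -> str:
    """Return the code after the first prefix match, or '' if none is found."""
    pattern = re.compile(
        # Prefix must not be glued to preceding letters or digits.
        rf"(?<![A-Za-z0-9]){re.escape(prefix)}[{re.escape(seps)}]"
        # Code must be min_len..max_len characters and end at a boundary.
        rf"([A-Za-z0-9]{{{min_len},{max_len}}})(?![A-Za-z0-9])"
    )
    match = pattern.search(text)
    return match.group(1) if match else ""
```

Codes longer than `max_len` are rejected rather than truncated, which keeps a too-tight length setting visible as blank cells instead of silently clipped values.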
How to Extract ASINs from Product Titles or Descriptions
Amazon Standard Identification Numbers follow a fixed 10-character format, and for most non-book products they begin with "B0". This makes ASINs reliably extractable even without a surrounding text prefix, because the format itself is the anchor.
GS1, the global supply chain standards body, maintains GTIN identifiers, which are distinct from ASINs. Don't conflate the two when setting up extraction for multi-marketplace catalogs.
Step 1: Use the ASIN format pattern
In ID/Code mode, enter "B0" as the prefix and set total length to exactly 10 characters (2-character prefix + 8 alphanumeric characters). Enable alphanumeric matching (letters and digits, case-insensitive).
Step 2: Verify the preview
ASINs appear in many contexts in product data — product titles, descriptions, bundle notes, and cross-reference fields. The preview will show every match. Check that all extracted values are exactly 10 characters and start with B0. Any other pattern is a false positive.
Step 3: Handle older B-format ASINs
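Older ASINs are often "B" followed by nine digits (e.g., B000123456). That example still starts with B0, so the Step 1 pattern catches it as long as digits are allowed after the prefix. Books are the main exception: a book's ASIN is normally its ISBN-10, which is ten characters with no "B" prefix. If your catalog includes books, run a second extraction pass for that format and merge it with the primary output.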
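The ASIN rule from Step 1 can be sketched as a single regular expression. This is a hedged Python illustration rather than the tool's own matcher; `ASIN_RE` and `extract_asins` are hypothetical names, and uppercasing the input is this sketch's way of approximating case-insensitive matching.

```python
import re

# "B0" plus 8 letters or digits, not embedded in a longer alphanumeric token.
ASIN_RE = re.compile(r"(?<![A-Za-z0-9])B0[A-Z0-9]{8}(?![A-Za-z0-9])")


def extract_asins(text: str) -> list[str]:
    """Return every ASIN-shaped value in text, normalized to uppercase."""
    return ASIN_RE.findall(text.upper())
```

Any 10-character token that happens to start with B0 will match, so the preview check in Step 2 still applies.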
How to Extract Barcodes (UPC, EAN, GTIN)
Barcodes are fixed-length numeric strings: UPC-A is 12 digits, EAN-13 is 13 digits, GTIN-14 is 14 digits. The extraction strategy is strict digit-count matching, with no prefix needed. The challenge is false positives from other numeric strings (phone numbers, order IDs, date-time values) that happen to contain similar digit sequences.
The fix is strict exact-length enforcement: match a run of exactly 12, 13, or 14 digits that is not part of a longer digit run. A typical US phone number is 10–11 digits and a UPC-A is exactly 12, so the length difference separates them. Length alone cannot rule out other values of the same length, such as a 12-digit order ID, which is where check-digit validation (Step 3) helps.
Step 1: Choose the barcode standard
Determine which standard your catalog uses. US retail uses UPC-A (12 digits). European and international products use EAN-13. Distribution and logistics often use GTIN-14. If your catalog mixes standards, extract each in a separate pass.
Step 2: Configure strict numeric matching
In ID/Code mode, set:
- Character class: Digits only
- Exact length: 12, 13, or 14 (strict exact match — not a range)
- No prefix
The strict exact length is what separates barcodes from similar-length numeric codes. Never use a length range for barcode extraction.
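A minimal Python sketch of strict exact-length matching follows. The function name `extract_exact_digits` is illustrative; the key idea is the pair of lookarounds that reject digit runs longer than the requested length.

```python
import re


def extract_exact_digits(text: str, length: int) -> list[str]:
    """Find digit runs of exactly `length` digits, skipping longer runs."""
    if length not in (12, 13, 14):
        raise ValueError("expected a barcode length of 12, 13, or 14")
    # (?<!\d) and (?!\d) ensure the run is not part of a longer number.
    return re.findall(rf"(?<!\d)\d{{{length}}}(?!\d)", text)
```

Without the lookarounds, a plain `\d{12}` would happily match the first 12 digits of a 17-digit order ID, which is the false positive this step is meant to prevent.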
Step 3: Validate with check digit awareness
UPC-A, EAN-13, and GTIN-14 all include a check digit: the final digit is calculated from the preceding digits using the GS1 check digit algorithm, a weighted modulo-10 calculation over the preceding 11, 12, or 13 digits. SplitForge Pattern Extraction flags potential check digit failures in the preview when barcode mode is enabled. A value that fails check digit validation is either a mistyped barcode or a numeric string that happens to be the right length, so review it before import.
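The check digit calculation can be sketched in a few lines of Python. This follows the publicly documented GS1 mod-10 scheme (weights of 3 and 1 alternating from the rightmost body digit); the function names are illustrative, and this is not the tool's own validator.

```python
def gs1_check_digit(body: str) -> int:
    """Compute the check digit for a GTIN body (every digit except the last)."""
    if not (body.isascii() and body.isdigit()):
        raise ValueError("body must contain ASCII digits only")
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """True if code is 8, 12, 13, or 14 digits and its last digit checks out."""
    return (code.isascii() and code.isdigit()
            and len(code) in (8, 12, 13, 14)
            and gs1_check_digit(code[:-1]) == int(code[-1]))
```

Counting weights from the right is what lets one function cover UPC-A, EAN-13, and GTIN-14 without separate cases per length.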
How to Extract Custom Internal Codes
Internal codes without consistent prefixes or fixed lengths are the hardest extraction case. The right approach depends on what you know about the code format.
If you know the exact length: Set exact length matching with the appropriate character class. A 6-character alphanumeric code can be reliably extracted if no other 6-character alphanumeric strings appear in the same column.
If you know a partial structural pattern: Even a partial anchor helps. "Codes always contain a hyphen after the third character" (e.g., A1B-234) narrows the match significantly. Use the structural pattern option to define this.
If the code has no distinguishing features: A manual review of the first 100 rows to identify consistent structure is faster than trying to tune extraction against nothing specific. Even "letters in positions 1-3, then digits" is a usable constraint.
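The two structural constraints mentioned above can each be written as a short pattern. This Python sketch uses made-up code formats (the A1B-234 example and a letters-then-digits layout) purely for illustration; `HYPHEN_CODE`, `LETTERS_THEN_DIGITS`, and `first_match` are hypothetical names.

```python
import re

# Assumed structure: 3 alphanumerics, a hyphen, then 3 digits (e.g. A1B-234).
HYPHEN_CODE = re.compile(
    r"(?<![A-Za-z0-9-])[A-Z0-9]{3}-\d{3}(?![A-Za-z0-9-])"
)
# Assumed structure: 3 uppercase letters followed by 3 to 5 digits.
LETTERS_THEN_DIGITS = re.compile(
    r"(?<![A-Za-z0-9])[A-Z]{3}\d{3,5}(?![A-Za-z0-9])"
)


def first_match(pattern: re.Pattern, text: str) -> str:
    """Return the first match of pattern in text, or '' if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else ""
```

The boundary checks matter more here than in the prefix case: without them, a year range such as "2019-2022" would yield a false hyphen-code match.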
Cleaning Extracted Codes
Extraction outputs often need minor cleanup before import. Three issues appear consistently:
Leading and trailing spaces: Cells where the code is surrounded by spaces in the source (" SKU-8847 "). Use SplitForge Data Cleaner to trim whitespace from the extracted column.
Inconsistent case: "SKU-8847" and "sku-8847" in the same column. Standardize to uppercase using the column transform option before import to avoid duplicate records caused by case differences.
Duplicate codes: If the same product appears in multiple rows with different descriptions, extraction produces duplicate code entries. Use SplitForge Remove Duplicates to deduplicate the extracted code column before importing to your catalog system.
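The three cleanup steps can be combined into one pass. This is a small Python sketch under the assumption that the extracted column has been loaded as a list of strings; `clean_codes` is an illustrative name.

```python
def clean_codes(codes: list[str]) -> list[str]:
    """Trim, uppercase, drop blanks, and de-duplicate in first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in codes:
        code = raw.strip().upper()
        if code and code not in seen:
            seen.add(code)
            cleaned.append(code)
    return cleaned
```

Trimming and uppercasing happen before the duplicate check on purpose, so " SKU-8847 " and "sku-8847" collapse into a single entry.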
Common Catalog Scenarios
Scenario 1: Supplier feed with SKU in the product title
- Source column value: "Blue Widget 500ml [SKU-8847] Case of 12"
- Configuration: bracket-enclosed SKU (prefix: "[SKU-", suffix anchor: "]")
- Extracted result: SKU-8847
Scenario 2: Amazon export with ASIN in the description
- Source column value: "Premium Headphones (ASIN: B0CXY12345) — Compatible with iOS and Android"
- Configuration: ASIN prefix with colon (prefix: "ASIN: ", total length: 10)
- Extracted result: B0CXY12345
Scenario 3: Wholesale catalog with EAN-13 at the start of each row
- Source column value: "0123456789012 — Blue Widget 500ml — Case of 12"
- Configuration: 13-digit numeric string, position: start of cell, exact length: 13
- Extracted result: 0123456789012
Scenario 4: Internal catalog with no consistent format
- Source: Mixed descriptions with alphanumeric codes scattered throughout, no visible pattern
- Approach: Export a 200-row sample first. Identify the most consistent structural feature: length, character set, or position. Define that as your extraction pattern and test on the sample before running on the full file.
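For a scripted cross-check of Scenarios 1 and 3, the sketch below applies a pattern to one CSV column and writes the result to a new column. It uses only Python's standard `csv`, `io`, and `re` modules; `add_code_column`, `BRACKET_SKU`, `LEADING_EAN`, and the `extracted_code` column name are assumptions made for this example.

```python
import csv
import io
import re

BRACKET_SKU = re.compile(r"\[(SKU-[A-Za-z0-9]+)\]")  # Scenario 1
LEADING_EAN = re.compile(r"^\s*(\d{13})(?!\d)")       # Scenario 3


def add_code_column(csv_text: str, source: str, pattern: re.Pattern,
                    target: str = "extracted_code") -> str:
    """Return CSV text with a new column holding group 1 of the first match."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames or []) + [target]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n",
                            extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        match = pattern.search(row.get(source) or "")
        # Rows without a match get a blank cell, not a guess.
        row[target] = match.group(1) if match else ""
        writer.writerow(row)
    return out.getvalue()
```

Writing to a new column instead of overwriting the source keeps the original text available for the preview comparison recommended throughout this guide.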
For more on working with product catalogs, see our guide on cleaning product catalog CSVs for Shopify and WooCommerce.
Additional Resources
Product Identifier Standards:
- GS1 General Specifications — Defines UPC, EAN, GTIN, and GLN barcode standards
- ISO/IEC 15459: Supply Chain Identifiers — International standard for item, transport unit, and location identifiers
Amazon Seller Reference:
- Amazon SP-API Catalog Items Reference — ASIN and catalog item structure documentation
Browser Processing:
- MDN Web Workers API — How browser-based background processing handles large files without blocking
Related SplitForge Guides:
- Extract emails, phone numbers, and URLs from CSV columns — Contact data extraction reference
- Fix 'Invalid SKU Format' in Amazon Seller Central CSV — Amazon-specific SKU validation guide