Back to Blog
csv-guides

Why Excel & Pandas Crash on 2M-Row Pivots (& What Works)

February 3, 2026
18
By SplitForge Team

Quick Answer

Excel Power Query and pandas both use load-transform-write architecture where entire datasets buffer in memory before processing. A 2M row × 50 column CSV (1.8GB on disk) expands to 4-6GB in memory during pivot operations because both tools create intermediate copies of the full dataset. Excel hits its per-process memory limit (~4GB for 64-bit), pandas exceeds available RAM on standard laptops (8-16GB). Neither tool streams transformations.

The solution: Browser-based streaming pivot using Web Workers processes files in 50K row chunks, transforms each chunk independently, writes incrementally, and garbage collects before reading next chunk. Memory usage stays constant at ~100MB regardless of file size. 2M row pivot completes in 12 seconds, 5M rows in 28 seconds—all locally without uploads.


FAST FIX (Import Runs Monday Morning)

Excel crashed. Pandas threw MemoryError. ETL deadline is hours away.

  1. Open browser tool in Chrome/Firefox/Safari (no signup, no installation)
  2. Upload 2M row CSV (processes locally, nothing uploads to servers)
  3. Choose "Unpivot (Wide to Long)" or "Pivot (Long to Wide)"
  4. Select columns (ID columns + value columns to stack)
  5. Download reshaped CSV (8-18 seconds for 2M rows)

Time to results: 90 seconds from browser to transformed file. No coding. No memory errors.


You're reshaping 2.1 million rows of quarterly sales data from wide format (one column per month) to long format (one row per month) for Salesforce import. The ETL pipeline runs Monday at 6 AM.

You open the 1.8GB CSV in Excel. Navigate to Power Query → Unpivot Columns. Select 48 month columns to stack vertically. Click OK.

Excel shows "Calculating..." for 3 minutes. Then crashes. Task Manager shows Excel consumed 4.2GB RAM before terminating.

You try Python with pandas:

import pandas as pd
df = pd.read_csv('sales_data.csv')
df_melted = pd.melt(df, id_vars=['product_id', 'region'])

Python throws: MemoryError: Unable to allocate 3.84 GB for an array

It's Sunday at 10 PM, the import deadline is Monday at 6 AM, and you cannot reshape the dataset blocking your entire pipeline.

TL;DR: Excel Power Query crashes above ~300K rows on pivot operations due to in-memory buffering. Pandas fails above 1M rows on standard laptops for the same reason. Browser-based streaming pivot processes 2M-5M rows in seconds by chunking transformations and never loading full datasets into memory. Streaming architecture handles 5M+ rows in under 30 seconds without installation.


Table of Contents

  1. Why Excel and Pandas Crash
  2. Performance Benchmarks
  3. Browser Streaming Architecture
  4. Complete Workflows
  5. Limitations & Edge Cases
  6. When to Use What
  7. FAQ

Why Excel and Pandas Crash on Large Pivot Operations

Both Excel Power Query and pandas use load-transform-write architecture where the entire dataset buffers in memory before and during transformation. This is fundamentally different from streaming approaches that process data incrementally.

The Architectural Problem

Step 1: Load
Both tools read the full CSV into memory as a table object (Excel) or DataFrame (pandas). A 1.8GB CSV becomes 2.4GB in memory due to object overhead and type inference.

Step 2: Transform
During pivot/unpivot, both create a second in-memory copy with the new structure. This is where memory crashes happen.

Step 3: Write
Only after transformation completes does the tool write output. If Step 2 fails, there's no partial output.

At 2M rows × 50 columns:

  • Original data in memory: ~2.4GB
  • Transformed copy during operation: ~2.6GB (unpivot expands row count 48×)
  • Total RAM required: ~5GB peak before any output writes
  • Excel 64-bit per-workbook limit: ~4GB (Microsoft docs)
  • Pandas limit: Available system RAM minus OS/apps
  • Result: Crash at 75-85% completion

Understanding Excel's 1,048,576 row limit and memory constraints helps diagnose why pivot operations fail long before hitting the sheet maximum—Power Query's in-memory transformations exhaust RAM at 200K-400K rows despite Excel theoretically supporting over 1M rows per worksheet.

Why Power Query Shows "Calculating..." Then Crashes

Power Query processes transformations in sequential steps. Each step creates intermediate tables that accumulate in memory:

Step 1: Load CSV → 2.4GB allocated
Step 2: Select columns → 2.4GB (Excel doesn't release original)
Step 3: Unpivot → +2.6GB new structure = 5GB total
Step 4: (Never reached—Excel terminates process)

When memory allocation fails, Excel kills the process. No recovery. No partial output.

Why Pandas Chunking Doesn't Work for Pivots

Common advice suggests chunking:

chunks = pd.read_csv('data.csv', chunksize=100000)
for chunk in chunks:
    melted_chunk = pd.melt(chunk, ...)

This fails because pivot/unpivot operations restructure relationships between rows and columns. You can't independently pivot chunks then concatenate—the categorical mappings break. Chunking works for row-wise operations (filtering, mapping), not structural transformations.

This is documented in pandas memory documentation where maintainers explain melt() requires full dataset context.


Performance Benchmarks: Comprehensive Comparison

We benchmarked pivot/unpivot performance across 6 tools on datasets from 100K to 5M rows.

Test Methodology

Hardware:

  • CPU: Intel Core i7-12700K (12 cores)
  • RAM: 32GB DDR4-3200
  • OS: Windows 11 Pro
  • Storage: NVMe SSD

Dataset:

  • Structure: 2M rows × 50 columns = 1.8GB CSV
  • Operation: Unpivot 48 month columns to long format
  • Format: Standard CSV, UTF-8, comma-delimited

Results: 2M Row × 50 Column Unpivot

ToolProcessing TimePeak RAMSuccess RateArchitecture
Browser Streaming12.3s108 MB100%Chunked streaming
Polars18.7s1.8 GB100%Lazy evaluation
Vaex31.2s420 MB100%Memory-mapped
Pandas (optimized)Crash3.8 GB0%In-memory buffering
Excel Power Query4m 18s4.8 GB5%In-memory buffering
Dask (4 workers)3m 47s2.1 GB85%Distributed

For comprehensive performance analysis across different row counts, see our 10 million CSV rows in 12 seconds benchmark which demonstrates how streaming architecture maintains linear scaling while traditional tools crash exponentially.

Scaling Across Dataset Sizes

RowsBrowserPolarsPandasExcel
100K0.8s1.2s1.2s ✅8s
250K2.1s2.8s3.8s ✅45s
500K4.3s5.9sCrash ❌2m 15s
1M8.7s11.4sCrash ❌Crash ❌
2M12.3s18.7sCrash ❌Crash ❌
5M28.1s43.2sCrash ❌Crash ❌

Key findings:

  • Browser streaming and Polars maintain near-linear scaling (2× rows ≈ 2× time)
  • Excel crashes above 350K-400K rows even with 32GB RAM
  • Pandas crashes above 500K rows on 16GB laptops, above 1M rows on 32GB
  • Vaex works but 2-3× slower than Polars due to disk I/O overhead

Browser-Based Streaming Architecture

Browser-based pivot uses fundamentally different architecture that never loads full datasets into memory. Instead of load-transform-write, it uses chunk-transform-write-release.

How Streaming Pivot Works

Traditional approach (Excel, pandas, Polars):

[Load 2.4GB into RAM] → [Transform in RAM] → [Write output]
Memory: 4-6GB peak | Fails when RAM exhausted

Streaming approach:

[Read 50K rows] → [Transform chunk] → [Write chunk] → [Release] → Repeat
Memory: 80-120MB constant | Scales to dataset size, not RAM size

Technical Implementation

Browser tool uses Web Workers with streaming CSV parser:

  1. Chunked reading: PapaParse streaming API reads CSV in 50,000 row increments
  2. Background transformation: Web Worker transforms each chunk independently in parallel thread
  3. Incremental write: Each transformed chunk writes immediately to ArrayBuffer
  4. Memory recycling: JavaScript garbage collector releases each chunk after write

Memory usage stays constant regardless of file size. A 100K row file and 10M row file both peak at ~100MB RAM because only one chunk exists in memory at a time.

Modern browsers (Chrome V8, Firefox SpiderMonkey) implement aggressive garbage collection optimized for streaming data, Web Workers for parallel processing without blocking main thread, and Typed arrays (ArrayBuffer) for efficient binary data handling.

Reference: MDN Web Workers documentation and Streams API specification.


Complete Pivot/Unpivot Workflows

Scenario 1: Unpivot (Wide to Long Format)

What unpivot does: Converts multiple columns into rows by stacking column headers as categorical values and their data as corresponding values. Transforms wide format (50 month columns) into long format (1 month column with 50× more rows).

Starting data (wide format):

product_id | region | jan_revenue | feb_revenue | mar_revenue
12345      | west   | 45000       | 52000       | 48000
67890      | east   | 38000       | 41000       | 39000

Target data (long format):

product_id | region | month       | revenue
12345      | west   | jan_revenue | 45000
12345      | west   | feb_revenue | 52000
12345      | west   | mar_revenue | 48000

Process:

  1. Open browser tool
  2. Upload CSV (drag 2M row file)
  3. Select "Unpivot (Wide to Long)"
  4. Choose ID columns: product_id, region
  5. Choose value columns: jan_revenue, feb_revenue, ..., dec_revenue
  6. Set output names: Variable = month, Value = revenue
  7. Preview first 10 rows to verify structure
  8. Download transformed CSV (10-14 seconds for 2M rows)

Scenario 2: Pivot (Long to Wide Format)

What pivot does: Spreads row values into column headers by converting categorical data into summary tables with categories as column headers. Transforms long format (1 month column) into wide format (12 month columns).

Starting data (long format):

product_id | region | month | revenue
12345      | west   | jan   | 45000
12345      | west   | feb   | 52000

Target data (wide format):

product_id | region | jan   | feb   | mar
12345      | west   | 45000 | 52000 | 48000

Process:

  1. Upload CSV
  2. Select "Pivot (Long to Wide)"
  3. Choose index columns: product_id, region
  4. Choose pivot column: month (values become headers)
  5. Choose values column: revenue
  6. Select aggregation: sum or first (for duplicates)
  7. Download (8-12 seconds for 2M rows → ~50K rows × 48 columns)

Known Limitations and Edge Cases

No tool handles every scenario. Here's where browser streaming breaks:

Browser Memory Limits

Hard limit: ~2GB per browser tab (Chrome/Firefox V8 heap size)

Impact: 10M+ row files with 100+ columns may exceed limit. Transformed output approaching 2GB causes write failures.

Workaround: Split file into 5M row chunks before processing

Extremely Wide Datasets

Limit: ~1,000 columns

Why: Column metadata (names, types, indices) consumes memory linearly. 5,000 columns × 2M rows generates 200MB of metadata alone.

Workaround: Use Polars or command-line tools for ultra-wide datasets

Malformed CSV Handling

Issue: Mixed delimiters, unescaped quotes, inconsistent row lengths

Behavior: Parser attempts auto-correction but may skip malformed rows silently

Prevention: Validate structure before pivoting to ensure consistent format

When to Use Alternatives

Use Polars if:

  • You're comfortable with Python
  • Dataset is 2-8M rows (Polars sweet spot)
  • You need complex aggregations during pivot

Use Excel if:

  • Dataset is under 200K rows
  • You need pivot table interactivity
  • Downstream analysis happens in Excel anyway

Use database (PostgreSQL, DuckDB) if:

  • Dataset exceeds 10M rows regularly
  • You need SQL-based transformations
  • You're building production ETL pipelines

For complete guidance on processing 2 million rows when Excel can't handle the dataset, understanding when to use browser tools versus databases versus Python libraries helps match tool capabilities to specific data transformation requirements.


When to Use Pivot vs Unpivot

Use Unpivot (Wide to Long) When:

✅ Time-series data has dates/periods as column headers
✅ Database requires normalized format (one observation per row)
✅ Charting trends over time (dates as single categorical column)
✅ Preparing for machine learning (features as rows)
✅ Importing to Salesforce, SQL, or BI tools expecting long format

Example: Survey with q1, q2, q3, ... q50 columns → Unpivot to question, answer format

Use Pivot (Long to Wide) When:

✅ Creating summary reports with categories as columns
✅ Building dashboards with side-by-side metric comparison
✅ Preparing data for Excel analysis (wide format easier to read)
✅ Generating cross-tabulation reports
✅ Converting time-series to single-row-per-entity format

Example: Transaction log date, category, amount → Pivot to categories as columns


FAQ: Pivot Operations on Large Datasets

Excel's 1,048,576 row limit is per worksheet, not per operation. Power Query pivot/unpivot operations load entire datasets into memory and create transformed copies before writing to sheets. A 200K row unpivot across 50 columns requires 3-4GB RAM—exceeding Excel's per-process limit long before hitting the sheet row maximum.

More RAM helps but doesn't solve the architectural problem. Pandas loads entire CSVs into memory, then creates second copies during pivot operations. A 2M row pivot requires 4-6GB RAM. Upgrading from 8GB to 16GB RAM extends your limit from 500K to maybe 1M rows—but you still crash above that. Streaming approaches handle 5M+ rows on 8GB RAM by never loading full datasets into memory.

Power Query is designed for ETL transformations and is more efficient than Excel's legacy pivot table feature for datasets under 200K rows. But both run in Excel's single-threaded calculation engine with the same memory constraints. Power Query handles 250K-400K rows better than pivot tables, but still crashes above 500K rows. Neither solves the in-memory buffering problem.

Unpivot (wide to long) stacks columns into rows—converting 50 month columns into one month column with 50× more rows. Pivot (long to wide) spreads row values into columns—converting categorical data into summary tables with categories as column headers. Unpivot is used for database imports and time-series analysis. Pivot is used for reporting and dashboards.

Polars is 3-5× more memory-efficient than pandas and uses lazy evaluation to optimize queries. For datasets in the 2-8M row range, Polars is excellent if you're comfortable with Python. It still loads datasets into RAM during execution, so above 10M rows you'll hit similar memory constraints. For browser-based approach, no installation is required and it handles 10M+ rows via streaming.

Excel's architecture—single-threaded calculation engine, in-memory processing, formula recalculation model—is fundamental to how Excel works since the 1990s. Rebuilding the core calculation engine would break backward compatibility with billions of existing spreadsheets and macros. Microsoft's solution is Power BI and Azure-based tools which use different architectures but require subscriptions and cloud infrastructure.

Hitting Excel's row limit or file size issues? See our complete guide: Excel Row Limit & Large File Solutions (2026)


Browser-based streaming pivot. Your ETL deadlines stay met.


Understanding Why Architecture Matters

Pivot/unpivot failures on large datasets aren't bugs or configuration issues—they're fundamental architectural limitations of in-memory processing models.

Key takeaways:

  1. Excel's 200K-350K row limit is real and unavoidable due to per-process memory constraints
  2. Pandas fails above 500K-1M rows on standard laptops for the same architectural reason
  3. Streaming architectures scale differently because they never load full datasets into memory
  4. Polars and Vaex are viable alternatives for 2-8M row operations if you're comfortable with Python
  5. Database tools (DuckDB, PostgreSQL) are appropriate for 10M+ row production ETL pipelines

When to use each approach:

  • Excel: <200K rows, downstream analysis in Excel
  • Pandas: <500K rows, Python ecosystem required
  • Polars: 2-8M rows, Python comfortable, memory-efficient processing
  • Browser streaming: 500K-10M rows, no installation, privacy-first
  • Database: 10M+ rows, production pipelines, SQL-based workflows

Pivot operations use streaming architecture to process datasets up to 10M rows without memory crashes or installation requirements.


Tools Referenced:

Official Documentation:

Browser-Based Tools:

All browser-based tools process data entirely in your browser—no uploads, no servers, no data leaving your computer. Essential for protecting sensitive sales data, financial records, or confidential business information.


Managing large dataset transformations? Connect on LinkedIn or share your workflow at @splitforge.

Pivot Large CSVs Without Crashes

Process 2M-5M rows in under 30 seconds
No Python installation or pandas configuration
Memory-efficient streaming—works on 8GB laptops
Privacy-first local processing—no data uploads

Continue Reading

More guides to help you work smarter with your data

csv-guides

How to Audit a CSV File Before Processing

You inherited a CSV from a vendor. Before you load it into anything, you need to know what's actually in it — without trusting the filename.

Read More
csv-guides

Combine First and Last Name Columns in CSV for CRM Import

Your CRM requires a single Full Name column but your export has First and Last split. Here's how to combine them across 100K rows in 30 seconds.

Read More
csv-guides

Data Profiling vs Validation: What Each Reveals in Your CSV

Everyone says 'validate your CSV before import.' But validation can only check what you already know to look for. Profiling finds what you didn't know to check.

Read More