Working with Large CSV Files
Last updated: March 2026 • 8 min read
When Excel crashes on your 2GB CSV file, you need better tools. Here's how to process massive datasets without running out of memory.
The Problem with Large Files
- Excel limit: 1,048,576 rows per worksheet (~1 million)
- Loading entire file into RAM fails for multi-GB files
- Most GUI tools freeze or crash
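The RAM problem is worse than the file size suggests: once parsed, a DataFrame often occupies more memory than the CSV does on disk, especially for string columns. A minimal sketch, using a small generated CSV as a stand-in for a real multi-GB file (the column names are made up for illustration):

```python
import io
import pandas as pd

# Build a small CSV in memory (stand-in for a real file on disk)
csv_text = "id,category,amount\n" + "\n".join(
    f"{i},cat_{i % 5},{i * 1.5}" for i in range(1000)
)
csv_bytes = len(csv_text.encode())

df = pd.read_csv(io.StringIO(csv_text))
# deep=True counts the actual bytes held by Python string objects
ram_bytes = df.memory_usage(deep=True).sum()

print(f"CSV size: {csv_bytes} bytes, in-RAM size: {ram_bytes} bytes")
```

On typical data the in-RAM size comes out several times larger than the file, which is why a "2GB CSV" can exhaust 8GB of RAM.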
Solution 1: Process in Chunks
```python
import pandas as pd

# Process 100k rows at a time
chunks = pd.read_csv('huge.csv', chunksize=100000)
for i, chunk in enumerate(chunks):
    # Process each chunk
    cleaned = chunk.dropna()
    cleaned.to_csv(f'output_{i}.csv', index=False)
```

Solution 2: Command Line Tools
```shell
# Count rows (fast)
wc -l huge.csv

# First 100 rows
head -100 huge.csv > sample.csv

# Filter rows matching pattern
grep "error" huge.csv > errors.csv

# Extract specific columns (1st and 3rd)
cut -d',' -f1,3 huge.csv > subset.csv
```
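If you'd rather stay in Python, the stdlib csv module streams a file row by row with flat memory use. This sketch mirrors the grep and cut steps above; the file contents and column positions are assumptions for illustration (it first writes a tiny stand-in for huge.csv so it runs end to end):

```python
import csv

# Tiny stand-in for huge.csv so the sketch runs end to end
with open('huge.csv', 'w', newline='') as f:
    f.write("id,status,amount\n1,ok,10\n2,error,20\n3,error,30\n")

# Stream rows one at a time; memory use stays flat regardless of file size
with open('huge.csv', newline='') as src, open('errors.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Keep rows mentioning "error" (like the grep step),
        # writing only the 1st and 3rd columns (like the cut step)
        if any('error' in field for field in row):
            writer.writerow([row[0], row[2]])
```

Unlike grep, this parses each row properly, so commas inside quoted fields won't break the filter.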
Solution 3: DuckDB (SQL for CSVs)
```python
import duckdb

# Query CSV directly without loading to memory
result = duckdb.query("""
    SELECT category, COUNT(*), AVG(amount)
    FROM 'huge.csv'
    GROUP BY category
""").df()
```

Clean Large Files Online
CleanCSV processes files in chunks server-side — no local memory limits.
Try CleanCSV Free →