Sample CSV Files for Pandas Testing
Working with pandas and need test data? This guide covers how to use sample CSV files for testing data analysis pipelines, benchmarking read performance, and validating data transformations in Python.
Loading sample CSV files in pandas
Download a sample CSV file and load it with a single read_csv call, either from disk or straight from a URL:
import pandas as pd
# Load from local file
df = pd.read_csv('users-sample.csv')
print(df.head())
print(f"Shape: {df.shape}") # (1000, 6)
# Load directly from URL
df = pd.read_csv(
'https://truefilesize.com/files/csv/users-sample.csv'
)
print(df.describe())
Available sample CSV datasets
- Users data — 1,000 rows: name, email, city, age
- Standard 1,000 rows — general test data
- Large dataset (100K rows) — performance testing
- 1 million rows — stress testing
Benchmarking pandas read performance
import time
import pandas as pd
sizes = ['sample-1000-rows.csv', 'sample-10000-rows.csv',
         'sample-100000-rows.csv']
for filename in sizes:
    start = time.perf_counter()  # monotonic, high-resolution timer
    df = pd.read_csv(filename)
    elapsed = time.perf_counter() - start
    print(f"{filename}: {elapsed:.3f}s ({len(df)} rows)")
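Typical results: roughly 5 ms for 1K rows, 20 ms for 10K, 150 ms for 100K, and 2-5 seconds for 1M. Exact timings depend on hardware, disk caching, and the number and types of columns, so treat these as ballpark figures.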
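If you want to try the benchmark before downloading anything, the sketch below generates its own synthetic files in a temporary directory and reports the fastest of several reads, which smooths out one-off cache effects. The helper names (make_csv, best_read_time), the column layout, and the file names are assumptions made for illustration; they are not part of the sample file collection.

```python
import os
import tempfile
import time

import pandas as pd


def make_csv(path, n_rows):
    """Write a synthetic CSV with n_rows rows (a stand-in for the sample files)."""
    frame = pd.DataFrame({
        "id": range(n_rows),
        "name": [f"user{i}" for i in range(n_rows)],
        "age": [18 + (i % 60) for i in range(n_rows)],
    })
    frame.to_csv(path, index=False)


def best_read_time(path, repeats=3):
    """Return (fastest read time in seconds, number of rows read)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        frame = pd.read_csv(path)
        timings.append(time.perf_counter() - start)
    return min(timings), len(frame)


results = {}
with tempfile.TemporaryDirectory() as tmp:
    for n_rows in (1_000, 10_000):
        # Hypothetical file names; swap in the downloaded sample files if you have them
        path = os.path.join(tmp, f"synthetic-{n_rows}-rows.csv")
        make_csv(path, n_rows)
        results[n_rows] = best_read_time(path)
        print(f"{n_rows} rows: {results[n_rows][0]:.4f}s")
```

Taking the minimum of a few runs is one common convention; averaging is equally reasonable if you care about typical rather than best-case reads.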
Testing data transformations
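# Reload the users data (the benchmark loop above reassigned df)
df = pd.read_csv('users-sample.csv')
# Test filtering
adults = df[df['age'] >= 18]
assert len(adults) == len(df)  # All our sample users are 18+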
# Test grouping
city_counts = df.groupby('city').size()
assert city_counts.sum() == len(df)
# Test handling of semicolon-separated CSV
df_euro = pd.read_csv('semicolon-separated.csv', sep=';')
assert 'name' in df_euro.columns
Testing with different delimiters
Our CSV collection includes semicolon-separated (European format) and tab-separated (TSV) variants. Test that your pipeline handles all delimiter types correctly.
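A delimiter check can be sketched without any files by building the same small table in comma, semicolon, and tab form and confirming they all parse to identical DataFrames. The inline rows below are made-up illustration data, not the contents of the downloadable files.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the comma, semicolon, and tab variants
rows = [["name", "city", "age"], ["Ana", "Lisbon", "31"], ["Ben", "Oslo", "45"]]
variants = {
    sep: "\n".join(sep.join(row) for row in rows) + "\n"
    for sep in (",", ";", "\t")
}

# Parse each variant with its matching separator
frames = {
    sep: pd.read_csv(io.StringIO(text), sep=sep)
    for sep, text in variants.items()
}

baseline = frames[","]
for sep, frame in frames.items():
    # Every delimiter variant should produce the same table
    pd.testing.assert_frame_equal(frame, baseline)

# Reading semicolon data with the default comma separator does not raise;
# it silently yields a single merged column, so assert on column names.
wrong = pd.read_csv(io.StringIO(variants[";"]))
assert list(wrong.columns) == ["name;city;age"]
```

The last check is the reason to assert on column names in pipeline tests: a wrong separator usually produces a one-column DataFrame rather than an exception.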
Converting between formats
Need JSON instead? Convert with pandas or download directly from our JSON files collection. Need Excel? Check our XLSX files.
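For the pandas route, a minimal sketch of CSV-to-JSON conversion might look like the following. It uses inline made-up data so it runs as-is; with a real file you would pass the path to read_csv instead.

```python
import io
import json

import pandas as pd

# Hypothetical inline data; replace with pd.read_csv('users-sample.csv') for a real file
csv_text = "name,city,age\nAna,Lisbon,31\nBen,Oslo,45\n"
df = pd.read_csv(io.StringIO(csv_text))

# orient="records" serializes the table as a JSON array with one object per row
json_text = df.to_json(orient="records")
records = json.loads(json_text)

assert records == [
    {"name": "Ana", "city": "Lisbon", "age": 31},
    {"name": "Ben", "city": "Oslo", "age": 45},
]
```

Passing a file path as the first argument to to_json writes the result to disk instead of returning a string. For Excel output, DataFrame.to_excel works similarly but needs an Excel engine such as openpyxl installed.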