>_ TrueFileSize.com

Sample CSV Files for Pandas Testing

Working with pandas and need test data? This guide covers how to use sample CSV files for testing data analysis pipelines, benchmarking read performance, and validating data transformations in Python.

Loading sample CSV files in pandas

Download a sample CSV file and load it with a single call to pd.read_csv, either from a local path or directly from a URL:

import pandas as pd

# Load from local file
df = pd.read_csv('users-sample.csv')
print(df.head())
print(f"Shape: {df.shape}")  # (1000, 6)

# Load directly from URL
df = pd.read_csv(
    'https://truefilesize.com/files/csv/users-sample.csv'
)
print(df.describe())
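For unit tests that should not depend on a download or on network access, read_csv also accepts any file-like object. Below is a minimal sketch that parses a tiny inline dataset through io.StringIO. The column names (id, name, age, city) and the rows are illustrative assumptions, not the actual contents of users-sample.csv:

```python
import io

import pandas as pd

# Hypothetical inline sample; the columns and rows are illustrative only
CSV_TEXT = """id,name,age,city
1,Alice,34,Berlin
2,Bob,27,Paris
3,Chen,45,Berlin
"""

# read_csv accepts any file-like object, so StringIO avoids touching disk
df_inline = pd.read_csv(io.StringIO(CSV_TEXT))

print(df_inline.dtypes)
print(df_inline.shape)  # (3, 4)
```

Keeping a small fixture like this inside the test file makes the test deterministic. The downloadable files are better suited to the size and performance checks.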

Available sample CSV datasets

Benchmarking pandas read performance

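import time

import pandas as pd

# Files are assumed to be downloaded into the working directory
files = ['sample-1000-rows.csv', 'sample-10000-rows.csv',
         'sample-100000-rows.csv']

for filename in files:
    # perf_counter is a monotonic, high-resolution clock meant for timing
    start = time.perf_counter()
    df = pd.read_csv(filename)
    elapsed = time.perf_counter() - start
    print(f"{filename}: {elapsed:.3f}s ({len(df)} rows)")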

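Typical results: 1K rows in about 5 ms, 10K in 20 ms, 100K in 150 ms, and a 1M-row file in 2-5 seconds. Exact timings depend on your hardware, the number of columns, and the column data types.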
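A single timing per file is noisy, especially for small files where OS file caching and first-call overhead dominate. The sketch below generates synthetic CSVs of a few sizes in a temporary directory and reports the best of three read_csv timings for each. It assumes you would rather create the files locally than download them. The helper names make_sample_csv and best_read_time and the three-column schema are hypothetical:

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_sample_csv(path, n_rows, seed=0):
    """Write a synthetic CSV with n_rows rows (hypothetical schema)."""
    rng = np.random.default_rng(seed)
    pd.DataFrame({
        "id": np.arange(n_rows),
        "age": rng.integers(18, 90, size=n_rows),
        "score": rng.random(n_rows),
    }).to_csv(path, index=False)


def best_read_time(path, repeats=3):
    """Return the fastest of several pd.read_csv timings, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        pd.read_csv(path)
        timings.append(time.perf_counter() - start)
    return min(timings)


results = {}
with tempfile.TemporaryDirectory() as tmp:
    for n_rows in (1_000, 10_000, 100_000):
        path = os.path.join(tmp, f"synthetic-{n_rows}-rows.csv")
        make_sample_csv(path, n_rows)
        results[n_rows] = best_read_time(path)

for n_rows, seconds in results.items():
    print(f"{n_rows:>7} rows: {seconds * 1000:.1f} ms")
```

Taking the minimum rather than the mean filters out runs that were slowed by unrelated background activity. Treat the printed numbers as relative comparisons on your own machine rather than absolute benchmarks.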

Testing data transformations

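import pandas as pd

# Reload the users sample; the benchmark loop above reassigns df
df = pd.read_csv('users-sample.csv')

# Test filtering
adults = df[df['age'] >= 18]
assert len(adults) == len(df)  # All our sample users are 18+

# Test grouping (dropna=False keeps rows with a missing city in the count)
city_counts = df.groupby('city', dropna=False).size()
assert city_counts.sum() == len(df)

# Test handling of semicolon-separated CSV
df_euro = pd.read_csv('semicolon-separated.csv', sep=';')
assert 'name' in df_euro.columns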
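When a transformation returns a DataFrame, comparing the whole result against an expected frame is stricter than checking lengths or sums. The following minimal pytest-style sketch uses pandas.testing.assert_frame_equal. The function adults_per_city is a hypothetical transformation, and the four-row inline CSV is made up for illustration:

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal


def adults_per_city(df):
    """Hypothetical transformation under test: count users aged 18+ per city."""
    return (
        df[df["age"] >= 18]
        .groupby("city", as_index=False)
        .size()
        .rename(columns={"size": "adults"})
    )


def test_adults_per_city():
    sample = pd.read_csv(io.StringIO(
        "name,age,city\n"
        "Ana,17,Lisbon\n"
        "Ben,30,Lisbon\n"
        "Cleo,41,Oslo\n"
        "Dan,22,Oslo\n"
    ))
    # groupby sorts keys by default, so the expected cities are alphabetical
    expected = pd.DataFrame({"city": ["Lisbon", "Oslo"], "adults": [1, 2]})
    assert_frame_equal(adults_per_city(sample), expected)


test_adults_per_city()
```

assert_frame_equal also checks column names, dtypes, and the index. This catches regressions, such as a column silently switching from integers to floats, that a row-count assertion would miss.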

Testing with different delimiters

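Our CSV collection includes semicolon-separated (European format, read with sep=';') and tab-separated (TSV, read with sep='\t') variants. Test that your pipeline handles every delimiter type correctly. Reading a semicolon file with the default comma separator usually does not raise an error; it produces a single combined column instead.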
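Sometimes a test needs to accept files whose delimiter is not known in advance. For this case, read_csv can detect the delimiter itself: with sep=None and engine="python", pandas sniffs it from the first line using the standard library's csv.Sniffer. The minimal sketch below checks that all three variants parse to the same DataFrame. The inline records are hypothetical stand-ins for the downloadable files:

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# The same hypothetical records in three delimiter styles
VARIANTS = {
    "comma": "name,city\nAna,Lisbon\nBen,Oslo\n",
    "semicolon": "name;city\nAna;Lisbon\nBen;Oslo\n",
    "tab": "name\tcity\nAna\tLisbon\nBen\tOslo\n",
}

frames = {}
for label, text in VARIANTS.items():
    # sep=None with engine="python" lets pandas detect the delimiter
    # from the first line via csv.Sniffer
    frames[label] = pd.read_csv(io.StringIO(text), sep=None, engine="python")

# Every variant should parse to an identical DataFrame
for label, frame in frames.items():
    assert_frame_equal(frame, frames["comma"], obj=label)
    print(f"{label}: columns={list(frame.columns)}")
```

Sniffing trades speed and predictability for flexibility. The Python engine is slower than the default C engine, and a header line that contains several candidate characters can be misdetected. Prefer an explicit sep whenever the format is known.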

Converting between formats

Need JSON instead? Convert with pandas or download directly from our JSON files collection. Need Excel? Check our XLSX files.
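Converting with pandas takes one method call per format. The minimal sketch below uses a small hypothetical inline CSV in place of a downloaded sample. It writes one JSON object per row and confirms that the output round-trips. The Excel call is left commented out because DataFrame.to_excel needs an optional writer engine such as openpyxl to be installed:

```python
import io
import json

import pandas as pd

# Hypothetical sample; in practice this could be pd.read_csv('users-sample.csv')
users = pd.read_csv(io.StringIO("name,age\nAna,34\nBen,27\n"))

# orient="records" produces a JSON array with one object per row
records_json = users.to_json(orient="records")
print(records_json)  # [{"name":"Ana","age":34},{"name":"Ben","age":27}]

# Round-trip check: parsing the JSON gives back the original rows
assert json.loads(records_json) == users.to_dict(orient="records")

# Writing Excel requires an optional engine such as openpyxl:
# users.to_excel("users-sample.xlsx", index=False)
```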