
Sample Parquet File Download — Free Apache Parquet for Testing

Download free Apache Parquet example files from 100 KB to 50 MB in Snappy, GZIP, and uncompressed variants. These Parquet test files are built for data engineers and analysts working with Spark, Pandas, BigQuery, Athena, DuckDB, and Snowflake. Use them to test data lake ingestion, ETL pipelines, and columnar query performance.

File                          Size       Rows      Codec
sample-100kb.parquet          101 KB     1,100     SNAPPY
sample-500kb.parquet          509 KB     3,300     SNAPPY
sample-1mb.parquet            1.05 MB    5,000     GZIP
sample-5mb.parquet            5.14 MB    22,000    SNAPPY
sample-10mb.parquet           10.28 MB   44,000    SNAPPY
sample-50mb.parquet           52.96 MB   150,000   SNAPPY
sample-uncompressed.parquet   199 KB     2,000     NONE
sample-gzip.parquet           298 KB     3,000     GZIP

Use cases for sample Parquet files

  • Testing Parquet readers (pyarrow, DuckDB, Spark, pandas)
  • Benchmarking Parquet vs CSV read performance (see the sketch after this list)
  • Testing data lake ingestion pipelines (S3, GCS, ADLS)
  • Verifying Parquet schema evolution and compatibility
  • Testing BI tool Parquet import (Tableau, Power BI, Metabase)
  • Validating Snappy vs GZIP compression handling
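
To make the benchmarking bullet concrete, here is a minimal sketch that times pandas reading one of the sample files above against a CSV copy of the same data. It assumes sample-1mb.parquet has been downloaded to the working directory; the temporary CSV path and the timing loop are illustrative choices, not part of the downloads.

# Rough Parquet vs CSV read benchmark (assumes sample-1mb.parquet is present)
import time
import pandas as pd

df = pd.read_parquet('sample-1mb.parquet')   # requires pyarrow or fastparquet
df.to_csv('sample-1mb.csv', index=False)     # CSV copy of the same rows

for path, reader in [('sample-1mb.parquet', pd.read_parquet),
                     ('sample-1mb.csv', pd.read_csv)]:
    start = time.perf_counter()
    reader(path)
    print(f'{path}: {time.perf_counter() - start:.3f}s')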

Parquet vs CSV vs JSON for analytics

Feature               Parquet                          CSV                      JSON
Storage layout        Columnar                         Row-based                Row-based
File size (1M rows)   ~50 MB                           ~200 MB                  ~400 MB
Column pruning        Yes (read only needed columns)   No (read all)            No (read all)
Schema enforcement    Yes (typed columns)              No (all strings)         Partial
Predicate pushdown    Yes (row group stats)            No                       No
Human readable        No (binary)                      Yes                      Yes
Best for              Analytics, data lakes, ML        Data exchange, imports   APIs, configs
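
Column pruning is the row in this table that most affects day-to-day query speed, so here is a minimal sketch of it using pyarrow against one of the sample files. The columns= argument is what limits the read to specific columns; picking the first column from the schema is just a way to keep the sketch runnable on any file.

# Column pruning: only the requested columns are decoded from disk
import pyarrow.parquet as pq

schema = pq.read_schema('sample-1mb.parquet')
first_col = schema.names[0]                    # pick a real column to prune down to
table = pq.read_table('sample-1mb.parquet', columns=[first_col])
print(table.num_rows, table.num_columns)       # all rows, one column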

How to read and write Parquet files

# Python (pandas + pyarrow — most common)
import pandas as pd
df = pd.read_parquet('data.parquet')
df.to_parquet('output.parquet', engine='pyarrow')

# Python (polars — faster alternative)
import polars as pl
df = pl.read_parquet('data.parquet')

# DuckDB (SQL directly on Parquet files, no import step)
import duckdb
duckdb.sql("SELECT * FROM 'data.parquet' WHERE age > 30")
duckdb.sql("COPY (SELECT * FROM my_table) TO 'out.parquet' (FORMAT PARQUET)")

# Apache Spark (assumes an active SparkSession named spark)
df = spark.read.parquet("s3://bucket/data.parquet")

# CLI inspection (parquet-tools / pqrs)
parquet-tools schema data.parquet
parquet-tools head data.parquet
pqrs schema data.parquet
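
The same inspection is available from Python without installing a CLI tool: pyarrow exposes the schema and row-group metadata directly. A minimal sketch, reusing the data.parquet name from the snippets above:

# Python (pyarrow): schema and row-group metadata without a CLI
import pyarrow.parquet as pq

pf = pq.ParquetFile('data.parquet')
print(pf.schema_arrow)                                   # column names and types
print(pf.metadata.num_rows, pf.metadata.num_row_groups)
print(pf.metadata.row_group(0).column(0).compression)    # codec, e.g. SNAPPY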

Parquet compression codecs

Codec    Ratio   Speed       When to use
Snappy   Good    Very fast   Default — best balance (Spark, DuckDB)
GZIP     Best    Slow        Long-term storage, bandwidth-limited
ZSTD     Best    Fast        Modern alternative to GZIP (Spark 3+)
None     1:1     Fastest     Testing, already-compressed data
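
The codec is chosen at write time, not read time, so readers handle any of these transparently. A minimal sketch with pandas and the pyarrow engine, which accepts the codec through the compression argument (the output file names are illustrative, and ZSTD support assumes a reasonably recent pyarrow):

# Write the same data with different codecs and compare file sizes
import os
import pandas as pd

df = pd.read_parquet('sample-1mb.parquet')
for codec in ['snappy', 'gzip', 'zstd', None]:
    path = f'out-{codec}.parquet'
    df.to_parquet(path, engine='pyarrow', compression=codec)
    print(codec, os.path.getsize(path), 'bytes')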

Technical specifications

Full name      Apache Parquet
Extension      .parquet
Type           Columnar binary storage format
Magic bytes    PAR1 (header and footer)
Compression    Snappy (default), GZIP, ZSTD, LZ4, Brotli, None
Encoding       Dictionary, RLE, Delta, Bit-packing
Nested types   Dremel-style repetition/definition levels
Developed by   Twitter + Cloudera (2013), Apache project
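
The PAR1 magic bytes make it cheap to sanity-check a download before handing it to a reader. A minimal sketch that checks both ends of the file (is_parquet is a hypothetical helper, not part of any library):

# Check the PAR1 magic bytes at the start and end of the file
def is_parquet(path):
    with open(path, 'rb') as f:
        header = f.read(4)
        f.seek(-4, 2)        # seek 4 bytes back from the end (whence=2)
        footer = f.read(4)
    return header == b'PAR1' and footer == b'PAR1'

print(is_parquet('sample-100kb.parquet'))   # True for a valid file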
