Database Seeding from SQL Dumps
Running tests against an empty database is slow and unrealistic. Seeding with realistic data catches edge cases — users with unicode names, orders with NULL fields, time-series with gaps. This guide covers seeding with SQL dumps for MySQL, PostgreSQL, and SQLite.
When to use SQL dumps vs fixtures
- SQL dumps — large datasets (100K+ rows), schema + data together; every major database can load a dump, though the SQL dialect ties each dump to one engine
- ORM fixtures / factories — small datasets, generated per-test, need full type safety
- CSV import — tabular data only, no schema, needs a loader
Seeding PostgreSQL
# Download sample dump
curl -o seed.sql https://example.com/postgres-sample.sql
# Apply to fresh database
createdb test_db
psql test_db < seed.sql
# Or inside Docker
docker exec -i postgres psql -U postgres test_db < seed.sql
Download sample SQL from our SQL files collection.
Seeding MySQL
# Create database, apply dump
mysql -u root -p -e 'CREATE DATABASE test_db'
mysql -u root -p test_db < seed.sql
# For large dumps, LOAD DATA is far faster than row-by-row INSERTs.
# Note: plain LOAD DATA INFILE reads a file on the *server* host and is
# restricted by secure_file_priv; use LOAD DATA LOCAL INFILE for client-side files.
mysql -u root -p test_db -e "
LOAD DATA INFILE '/tmp/users.csv'
INTO TABLE users
FIELDS TERMINATED BY ','
IGNORE 1 ROWS;
"
Seeding SQLite (for local dev)
sqlite3 test.db < seed.sql
# Or via Node.js
import Database from 'better-sqlite3';
import fs from 'fs';
const db = new Database('test.db');
db.exec(fs.readFileSync('seed.sql', 'utf8'));
Test setup pattern (Jest + Postgres)
import { execSync } from 'child_process';
beforeAll(() => {
execSync('dropdb --if-exists test_db');
execSync('createdb test_db');
execSync('psql test_db < ./fixtures/seed.sql');
});
afterEach(() => {
// Restore baseline between tests
execSync('psql test_db -c "TRUNCATE users, orders RESTART IDENTITY CASCADE"');
execSync('psql test_db < ./fixtures/seed.sql');
});
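As the dump grows, the table list inside that TRUNCATE can drift out of sync with the fixtures. One way to keep it in a single place is to derive the statement from an array — a sketch, with the table names assumed from the example above:

```javascript
// Seeded tables, maintained in one place; extend as the dump grows.
const SEEDED_TABLES = ['users', 'orders'];

// Build the reset statement used in afterEach. RESTART IDENTITY resets
// sequences; CASCADE follows foreign keys into dependent tables.
function truncateSql(tables) {
  return `TRUNCATE ${tables.join(', ')} RESTART IDENTITY CASCADE`;
}
```

The afterEach call then interpolates truncateSql(SEEDED_TABLES) into the `psql test_db -c "..."` command instead of hard-coding the list.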
Sample dump structure
-- schema
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- data
INSERT INTO users (email, name) VALUES
('alice@example.com', 'Alice'),
('bob@example.com', 'Bob Björk'), -- unicode
('carol@example.com', 'Carol');
Realistic data matters
- Include edge cases: empty strings, NULL, unicode, very long names
- Use realistic distributions: 80% common role, 20% edge role
- Time-series: include gaps, DST transitions, timezones
- Include orphaned records to test cleanup logic
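A hand-written slice of such rows can sit next to the generated bulk data — a sketch, with illustrative emails and the column set from the sample users table:

```javascript
// Hand-picked edge-case rows to append to the generated bulk data.
const edgeCases = [
  { email: 'bjork@example.com', name: 'Björk Guðmundsdóttir' }, // unicode
  { email: 'long@example.com', name: 'A'.repeat(255) },         // very long name
  { email: 'blank@example.com', name: ' ' },                    // whitespace-only
];
```

Rows like these surface truncation, collation, and trimming bugs that uniformly random data never hits.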
Generating your own dumps
Use sample CSV files or tools like faker to generate synthetic data:
import { faker } from '@faker-js/faker';
const rows = Array.from({ length: 10000 }, () => ({
email: faker.internet.email(),
name: faker.person.fullName(),
city: faker.location.city(),
}));
const esc = (s) => s.replace(/'/g, "''"); // escape single quotes in every field, not just name
const sql = rows.map((r) =>
  `INSERT INTO users (email, name, city) VALUES ('${esc(r.email)}', '${esc(r.name)}', '${esc(r.city)}');`
).join('\n');
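Ten thousand single-row INSERTs are slow to apply; batching them into multi-row VALUES statements cuts parsing and round-trip overhead. A sketch — the 500-row batch size is an arbitrary choice, and the escaping covers single quotes only:

```javascript
// Group rows into multi-row INSERT statements: fewer statements to parse
// when the dump is applied, same resulting data.
function batchInserts(rows, batchSize = 500) {
  const esc = (s) => String(s).replace(/'/g, "''"); // SQL single-quote escaping
  const stmts = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const values = rows
      .slice(i, i + batchSize)
      .map((r) => `('${esc(r.email)}', '${esc(r.name)}', '${esc(r.city)}')`)
      .join(',\n');
    stmts.push(`INSERT INTO users (email, name, city) VALUES\n${values};`);
  }
  return stmts.join('\n');
}
```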
Related
For mocking the HTTP layer instead, read mocking REST APIs with JSON fixtures. For CSV-first workflows, see sample CSV for pandas.