Database Seeding from SQL Dumps
Running tests against an empty database is slow and unrealistic. Seeding with realistic data catches edge cases — users with unicode names, orders with NULL fields, time-series with gaps. This guide covers seeding with SQL dumps for MySQL, PostgreSQL, and SQLite.
When to use SQL dumps vs fixtures
- SQL dumps — large datasets (100K+ rows), schema + data together; every major database can load a dump, though the SQL dialect ties each dump to one engine
- ORM fixtures / factories — small datasets, generated per-test, need full type safety
- CSV import — tabular data only, no schema, needs a loader
Seeding PostgreSQL
# Download sample dump
curl -o seed.sql https://example.com/postgres-sample.sql
# Apply to fresh database
createdb test_db
psql test_db < seed.sql
# Or inside Docker
docker exec -i postgres psql -U postgres test_db < seed.sql
Download sample SQL from our SQL files collection.
Seeding MySQL
# Create database, apply dump
mysql -u root -p -e 'CREATE DATABASE test_db'
mysql -u root -p test_db < seed.sql
# For large dumps, LOAD DATA is far faster than row-by-row INSERTs.
# Note: plain LOAD DATA INFILE reads a file on the *server* host and is
# restricted by secure_file_priv; use LOAD DATA LOCAL INFILE for client-side files.
mysql -u root -p test_db -e "
LOAD DATA INFILE '/tmp/users.csv'
INTO TABLE users
FIELDS TERMINATED BY ','
IGNORE 1 ROWS;
"
Seeding SQLite (for local dev)
sqlite3 test.db < seed.sql
# Or via Node.js
import Database from 'better-sqlite3';
import fs from 'fs';
const db = new Database('test.db');
db.exec(fs.readFileSync('seed.sql', 'utf8'));
Test setup pattern (Jest + Postgres)
import { execSync } from 'child_process';
beforeAll(() => {
execSync('dropdb --if-exists test_db');
execSync('createdb test_db');
execSync('psql test_db < ./fixtures/seed.sql');
});
afterEach(() => {
// Restore baseline between tests
execSync('psql test_db -c "TRUNCATE users, orders RESTART IDENTITY CASCADE"');
execSync('psql test_db < ./fixtures/seed.sql');
});
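As the dump grows, the table list inside that TRUNCATE can drift out of sync with the fixtures. One way to keep it in a single place is to derive the statement from an array — a sketch, with the table names assumed from the example above:

```javascript
// Seeded tables, maintained in one place; extend as the dump grows.
const SEEDED_TABLES = ['users', 'orders'];

// Build the reset statement used in afterEach. RESTART IDENTITY resets
// sequences; CASCADE follows foreign keys into dependent tables.
function truncateSql(tables) {
  return `TRUNCATE ${tables.join(', ')} RESTART IDENTITY CASCADE`;
}
```

The afterEach call then interpolates truncateSql(SEEDED_TABLES) into the `psql test_db -c "..."` command instead of hard-coding the list.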
Sample dump structure
-- schema
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- data
INSERT INTO users (email, name) VALUES
('alice@example.com', 'Alice'),
('bob@example.com', 'Bob Björk'), -- unicode
('carol@example.com', 'Carol');
Realistic data matters
- Include edge cases: empty strings, NULL, unicode, very long names
- Use realistic distributions: 80% common role, 20% edge role
- Time-series: include gaps, DST transitions, timezones
- Include orphaned records to test cleanup logic
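A hand-written slice of such rows can sit next to the generated bulk data — a sketch, with illustrative emails and the column set from the sample users table:

```javascript
// Hand-picked edge-case rows to append to the generated bulk data.
const edgeCases = [
  { email: 'bjork@example.com', name: 'Björk Guðmundsdóttir' }, // unicode
  { email: 'long@example.com', name: 'A'.repeat(255) },         // very long name
  { email: 'blank@example.com', name: ' ' },                    // whitespace-only
];
```

Rows like these surface truncation, collation, and trimming bugs that uniformly random data never hits.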
Generating your own dumps
Use sample CSV files or tools like faker to generate synthetic data:
import { faker } from '@faker-js/faker';
const rows = Array.from({ length: 10000 }, () => ({
email: faker.internet.email(),
name: faker.person.fullName(),
city: faker.location.city(),
}));
const esc = (s) => s.replace(/'/g, "''"); // escape single quotes in every field, not just name
const sql = rows.map((r) =>
  `INSERT INTO users (email, name, city) VALUES ('${esc(r.email)}', '${esc(r.name)}', '${esc(r.city)}');`
).join('\n');
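Ten thousand single-row INSERTs are slow to apply; batching them into multi-row VALUES statements cuts parsing and round-trip overhead. A sketch — the 500-row batch size is an arbitrary choice, and the escaping covers single quotes only:

```javascript
// Group rows into multi-row INSERT statements: fewer statements to parse
// when the dump is applied, same resulting data.
function batchInserts(rows, batchSize = 500) {
  const esc = (s) => String(s).replace(/'/g, "''"); // SQL single-quote escaping
  const stmts = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const values = rows
      .slice(i, i + batchSize)
      .map((r) => `('${esc(r.email)}', '${esc(r.name)}', '${esc(r.city)}')`)
      .join(',\n');
    stmts.push(`INSERT INTO users (email, name, city) VALUES\n${values};`);
  }
  return stmts.join('\n');
}
```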
Related
For mocking the HTTP layer instead, read mocking REST APIs with JSON fixtures. For CSV-first workflows, see sample CSV for pandas.