Audio Upload Validation and Server-Side Transcoding
Audio uploads come from many sources: microphone recordings, voice notes, podcast episodes. The format, bitrate, and sample rate vary wildly. This guide covers format validation, metadata probing, server-side transcoding with ffmpeg, and testing with sample files.
Step 1: validate the real format
A file named voice.mp3 might actually be Ogg. The file name and the client-supplied Content-Type are both untrusted, so always check the magic bytes:
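import { fileTypeFromBuffer } from 'file-type';
// `buffer` is assumed to hold the uploaded bytes (or at least the start of the file).
const type = await fileTypeFromBuffer(new Uint8Array(buffer));
const allowed = ['mp3', 'wav', 'ogg', 'aac', 'm4a', 'flac'];
// Reject when the type cannot be detected or is not on the allowlist.
if (!type || !allowed.includes(type.ext)) {
  throw new Error('Unsupported audio format: ' + (type?.ext ?? 'unknown'));
}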
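To make "magic bytes" concrete, here is a minimal sketch of what such a check does under the hood. The `sniffAudio` helper is hypothetical and deliberately incomplete (it ignores raw MP3 frames, ADTS AAC, and M4A); in practice a maintained library such as file-type is the safer choice.

```javascript
// Signatures that sit at offset 0 of the file.
const SIGNATURES = [
  { ext: 'ogg', magic: 'OggS' },  // Ogg container (Vorbis, Opus, ...)
  { ext: 'flac', magic: 'fLaC' },
  { ext: 'mp3', magic: 'ID3' },   // MP3 that begins with an ID3v2 tag
];

// Hypothetical helper: returns a short format label or null.
// `bytes` must be a Uint8Array containing the start of the file.
function sniffAudio(bytes) {
  const ascii = (start, length) =>
    String.fromCharCode(...bytes.subarray(start, start + length));

  // WAV is a RIFF file: "RIFF", a 4-byte size, then "WAVE".
  if (ascii(0, 4) === 'RIFF' && ascii(8, 4) === 'WAVE') return 'wav';

  for (const { ext, magic } of SIGNATURES) {
    if (ascii(0, magic.length) === magic) return ext;
  }
  return null;
}
```

Note that the check never looks at the file name, which is the point: a renamed file is still identified by its content.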
Step 2: probe metadata with ffprobe
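import { execFile } from 'child_process';
import { promisify } from 'util';
const exec = promisify(execFile);
// execFile passes arguments without a shell, so the path is not shell-interpreted.
const { stdout } = await exec('ffprobe', [
  '-v', 'quiet',
  '-print_format', 'json',
  '-show_format', '-show_streams',
  'input.mp3',
]);
const info = JSON.parse(stdout);
// streams[0] is not guaranteed to be audio: containers can also carry cover art or video.
const audio = info.streams.find((s) => s.codec_type === 'audio');
if (!audio) throw new Error('No audio stream found');
// ffprobe's JSON output reports duration, bit_rate, and sample_rate as strings.
console.log('Duration:', Number(info.format.duration), 's');
console.log('Bitrate:', Number(info.format.bit_rate));
console.log('Sample rate:', Number(audio.sample_rate));
console.log('Channels:', audio.channels);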
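Probing is mostly useful if the result feeds a decision. The following is a minimal sketch of rejecting uploads based on the parsed ffprobe JSON; the `checkAudioInfo` name and the limits are assumptions for illustration, not recommended values.

```javascript
// Hypothetical limits; tune these for your product.
const LIMITS = { maxDurationSec: 600, maxSampleRate: 48000, maxChannels: 2 };

// Takes the parsed ffprobe JSON and returns a list of problems (empty = OK).
function checkAudioInfo(info, limits = LIMITS) {
  const audio = (info.streams ?? []).find((s) => s.codec_type === 'audio');
  if (!audio) return ['no audio stream'];

  const problems = [];
  // ffprobe encodes duration and sample_rate as strings, so convert first.
  const duration = Number(info.format?.duration);
  if (!Number.isFinite(duration) || duration <= 0) {
    problems.push('unknown duration');
  } else if (duration > limits.maxDurationSec) {
    problems.push('too long');
  }
  if (Number(audio.sample_rate) > limits.maxSampleRate) {
    problems.push('sample rate too high');
  }
  if (audio.channels > limits.maxChannels) {
    problems.push('too many channels');
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets the API report everything wrong with a file in one response.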
Step 3: transcode to a canonical format
Accepting arbitrary formats is great for UX, but storing and serving every one of them is expensive and complicates playback. Transcode to one canonical format (commonly MP3 or AAC at 128 kbps):
ffmpeg -i input.wav \
-vn \
-ar 44100 \
-ac 2 \
-b:a 128k \
-f mp3 \
output.mp3
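The same command can be assembled in code so that the target format is a parameter. This is a sketch under assumptions: `buildTranscodeArgs` and the `TARGETS` table are hypothetical, `libmp3lame` is only available if your ffmpeg build includes it, and the AAC target writes a raw ADTS stream rather than an M4A container.

```javascript
// Encoder and muxer per canonical target (assumed choices, adjust as needed).
const TARGETS = {
  mp3: { codec: 'libmp3lame', format: 'mp3' },
  aac: { codec: 'aac', format: 'adts' },
};

// Returns the argument array for ffmpeg; nothing is executed here.
function buildTranscodeArgs(input, output, target = 'mp3') {
  const t = TARGETS[target];
  if (!t) throw new Error(`Unknown target: ${target}`);
  return [
    '-i', input,
    '-vn',            // drop video streams such as cover art
    '-ar', '44100',   // resample to 44.1 kHz
    '-ac', '2',       // force stereo
    '-c:a', t.codec,
    '-b:a', '128k',
    '-f', t.format,
    output,
  ];
}
```

Passing the result to `execFile('ffmpeg', args)` keeps each value a separate argument, so file names with spaces or shell metacharacters are not interpreted by a shell.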
Streaming transcode from an upload
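import { spawn } from 'child_process';
// uploadStream (readable) and s3Upload (writable) are assumed to exist in your handler.
// Some inputs (e.g. MP4/M4A with the index at the end) need a seekable file and fail from a pipe.
const ff = spawn('ffmpeg', [
  '-i', 'pipe:0',
  '-vn',
  '-ar', '44100', '-ac', '2', '-b:a', '128k',
  '-f', 'mp3',
  'pipe:1',
], { stdio: ['pipe', 'pipe', 'inherit'] }); // inherit stderr so ffmpeg's log output never blocks
ff.on('error', (err) => console.error('ffmpeg could not be started:', err));
ff.on('close', (code) => {
  // A non-zero exit means the output is incomplete: abort or delete the partial upload here.
  if (code !== 0) console.error(`ffmpeg exited with code ${code}`);
});
ff.stdin.on('error', () => {}); // EPIPE if ffmpeg exits before the upload ends
uploadStream.pipe(ff.stdin);
ff.stdout.pipe(s3Upload);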
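A request handler usually needs to know when the transcode has finished and whether it succeeded. One way to get that is to wrap the pipe chain in a promise. This is a minimal sketch: `transcodeStream` is a hypothetical helper, and the `command` parameter exists only so the helper can be exercised without ffmpeg installed.

```javascript
import { spawn } from 'node:child_process';
import { pipeline } from 'node:stream/promises';

// Pipes `input` through a child process into `output`.
// Rejects if the process cannot start, exits non-zero, or either pipe fails.
async function transcodeStream(input, output, args, command = 'ffmpeg') {
  const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });

  const exited = new Promise((resolve, reject) => {
    child.once('error', reject); // e.g. the binary was not found
    child.once('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`${command} exited with code ${code}`));
    });
  });

  await Promise.all([
    pipeline(input, child.stdin),   // upload -> process stdin
    pipeline(child.stdout, output), // process stdout -> destination
    exited,
  ]);
}
```

A call might look like `await transcodeStream(uploadStream, s3Upload, ['-i', 'pipe:0', '-f', 'mp3', 'pipe:1'])`, with cleanup of any partial upload in the surrounding `catch`.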
Sample files for testing
Cover every edge case:
- MP3 at 64/128/320 kbps — bitrate variants
- WAV 44.1kHz and 48kHz — sample rate variants
- OGG Vorbis — open-source format
- AAC — Apple/streaming default
- Mono and stereo variants to test channel normalization
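If you have no recordings at hand, the variants above can be synthesized from a test tone. Here is a sketch that only builds the ffmpeg argument lists; the `sampleFileCommands` name and the file names are made up, and it assumes an ffmpeg build with the `lavfi` input device plus the `libvorbis` and `aac` encoders.

```javascript
// Builds one ffmpeg argument list per synthetic test file.
function sampleFileCommands(seconds = 2) {
  // A 440 Hz sine tone generated by ffmpeg itself (mono at the source).
  const source = ['-f', 'lavfi', '-i', `sine=frequency=440:duration=${seconds}`];
  const cases = [];

  // MP3 bitrate variants.
  for (const kbps of [64, 128, 320]) {
    cases.push({ name: `tone-${kbps}k.mp3`, args: ['-b:a', `${kbps}k`] });
  }
  // WAV sample-rate and channel variants.
  for (const rate of [44100, 48000]) {
    for (const channels of [1, 2]) {
      cases.push({
        name: `tone-${rate}-${channels}ch.wav`,
        args: ['-ar', String(rate), '-ac', String(channels)],
      });
    }
  }
  cases.push({ name: 'tone.ogg', args: ['-c:a', 'libvorbis'] });
  cases.push({ name: 'tone.aac', args: ['-c:a', 'aac'] });

  // -y overwrites existing files so the script can be re-run.
  return cases.map(({ name, args }) => ({
    name,
    argv: ['-y', ...source, ...args, name],
  }));
}
```

Each `argv` can be passed to `execFile('ffmpeg', argv)`. Synthetic tones exercise the pipeline, but they do not replace a few real-world recordings with messy metadata.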
Things that break in production
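- Variable bitrate (VBR) MP3: the duration may be estimated from the bitrate and come out wrong when the file lacks a Xing/VBRI header; an exact value requires scanning the whole file
- 24-bit WAV: some web players can't play it; if you keep WAV, convert to 16-bit PCM (for example with -c:a pcm_s16le)
- 48kHz source, 44.1kHz target: the non-integer ratio can introduce resampling artifacts; use a high-quality resampler (for example -af aresample=resampler=soxr, if your ffmpeg build includes libsoxr)
- Corrupt ID3 tags: strip them with -map_metadata -1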
Related
For format trade-offs, see WAV vs MP3 vs OGG. For large file handling, read our large file upload guide.