>_ TrueFileSize.com

Audio Upload Validation and Server-Side Transcoding

Audio uploads come from many sources: microphone recordings, voice notes, podcast episodes, and files picked from disk. The format, bitrate, and sample rate vary wildly. This guide covers validating the real format, probing metadata with ffprobe, server-side transcoding with ffmpeg, and testing with sample files.

Step 1: validate the real format

A file named voice.mp3 might actually be OGG. Always check magic bytes:

import { fileTypeFromBuffer } from 'file-type';

// `buffer` holds the uploaded bytes (a Buffer or ArrayBuffer).
const type = await fileTypeFromBuffer(new Uint8Array(buffer));
// Compare against the detected extension, never the file name the client sent.
const allowed = ['mp3', 'wav', 'ogg', 'aac', 'm4a', 'flac'];
if (!type || !allowed.includes(type.ext)) {
  throw new Error('Unsupported audio format: ' + (type?.ext ?? 'unknown'));
}
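To make "magic bytes" concrete, here is a minimal hand-rolled sketch of the kind of signature check a detection library performs. It is an illustration only, not a replacement for file-type: the helper name `sniffAudioFormat` is hypothetical, it covers just a few signatures, and treating an ID3v2 tag as MP3 is a heuristic.

```javascript
// Illustrative signature check for a few audio formats.
// Returns a short label or null. `sniffAudioFormat` is a hypothetical helper.
function sniffAudioFormat(bytes) {
  if (bytes.length < 12) return null;
  const ascii = (start, end) =>
    String.fromCharCode(...bytes.subarray(start, end));

  // WAV: "RIFF" <4-byte size> "WAVE"
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';
  // Ogg container (Vorbis, Opus, ...): every page starts with "OggS"
  if (ascii(0, 4) === 'OggS') return 'ogg';
  // Native FLAC stream marker
  if (ascii(0, 4) === 'fLaC') return 'flac';
  // ID3v2 tag, most often found in front of MP3 frames (heuristic)
  if (ascii(0, 3) === 'ID3') return 'mp3';
  // Bare MPEG audio frame: 11 sync bits set and layer bits 01 (Layer III)
  if (
    bytes[0] === 0xff &&
    (bytes[1] & 0xe0) === 0xe0 &&
    (bytes[1] & 0x06) === 0x02
  ) {
    return 'mp3';
  }
  return null;
}

// A file uploaded as voice.mp3 whose first bytes say otherwise:
const header = new Uint8Array([0x4f, 0x67, 0x67, 0x53, 0, 2, 0, 0, 0, 0, 0, 0]);
console.log(sniffAudioFormat(header)); // ogg
```

In practice a maintained library is the better choice, since real files carry many more variants than this sketch handles.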

Step 2: probe metadata with ffprobe

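import { execFile } from 'child_process';
import { promisify } from 'util';
const exec = promisify(execFile);

// '-v error' keeps real failures visible on stderr; 'quiet' would hide them.
const { stdout } = await exec('ffprobe', [
  '-v', 'error',
  '-print_format', 'json',
  '-show_format', '-show_streams',
  'input.mp3',
]);
const info = JSON.parse(stdout);

// streams[0] is not always audio: embedded cover art appears as a video stream.
const audio = info.streams.find((s) => s.codec_type === 'audio');
if (!audio) throw new Error('No audio stream found');

// ffprobe prints duration, bit_rate and sample_rate as strings.
console.log('Duration:', Number(info.format.duration), 's');
console.log('Bitrate:', Number(info.format.bit_rate), 'bit/s');
console.log('Sample rate:', Number(audio.sample_rate), 'Hz');
console.log('Channels:', audio.channels);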
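Probing is only useful if the result is checked against limits. The sketch below shows one way to turn the parsed ffprobe JSON into an accept/reject decision. The function name `checkAudioLimits` and the limit values are assumptions for illustration; pick thresholds that match your product.

```javascript
// Hypothetical limits; tune them to your product.
const LIMITS = { maxDurationSec: 600, maxChannels: 2, minSampleRate: 8000 };

// `info` is the parsed JSON printed by ffprobe -show_format -show_streams.
function checkAudioLimits(info, limits = LIMITS) {
  const audio = (info.streams ?? []).find((s) => s.codec_type === 'audio');
  if (!audio) return { ok: false, reason: 'no audio stream' };

  // ffprobe prints duration and sample_rate as strings.
  const duration = Number(info.format?.duration);
  const sampleRate = Number(audio.sample_rate);

  if (!Number.isFinite(duration) || duration <= 0) {
    return { ok: false, reason: 'unknown duration' };
  }
  if (duration > limits.maxDurationSec) return { ok: false, reason: 'too long' };
  if (audio.channels > limits.maxChannels) {
    return { ok: false, reason: 'too many channels' };
  }
  if (!(sampleRate >= limits.minSampleRate)) {
    return { ok: false, reason: 'sample rate too low' };
  }
  return { ok: true, duration, sampleRate, channels: audio.channels };
}

// Trimmed-down ffprobe result: cover art first, audio second.
const sample = {
  format: { duration: '12.5', bit_rate: '128000' },
  streams: [
    { codec_type: 'video', codec_name: 'mjpeg' },
    { codec_type: 'audio', sample_rate: '44100', channels: 2 },
  ],
};
console.log(checkAudioLimits(sample));
// { ok: true, duration: 12.5, sampleRate: 44100, channels: 2 }
```

Returning a reason string rather than throwing keeps the decision easy to log and to map onto an HTTP error response.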

Step 3: transcode to a canonical format

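Accepting arbitrary formats is great for UX, but storing and serving them as-is is expensive: one minute of CD-quality WAV is about 10.6 MB, roughly 11 times the size of the same minute at 128 kbps. Transcode to one canonical format (usually MP3 or AAC at 128 kbps):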

ffmpeg -i input.wav \
  -vn \
  -ar 44100 \
  -ac 2 \
  -b:a 128k \
  -f mp3 \
  output.mp3
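To put numbers on the storage argument, the arithmetic can be sketched as follows. The helper names `encodedBytes` and `pcmBytes` are hypothetical, and the estimate ignores container overhead and assumes a constant bitrate.

```javascript
// Size of a constant-bitrate stream: bits per second * seconds / 8.
function encodedBytes(durationSec, bitrateKbps) {
  return Math.round((bitrateKbps * 1000 * durationSec) / 8);
}

// Size of raw PCM audio (the WAV payload, ignoring the small header).
function pcmBytes(durationSec, sampleRate = 44100, channels = 2, bitDepth = 16) {
  return Math.round(durationSec * sampleRate * channels * (bitDepth / 8));
}

const minute = 60;
console.log(encodedBytes(minute, 128)); // 960000
console.log(pcmBytes(minute));          // 10584000
console.log((pcmBytes(minute) / encodedBytes(minute, 128)).toFixed(1)); // 11.0
```

Multiplying the per-minute figures by your expected upload volume gives a rough storage budget for each candidate format.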

Streaming transcode from an upload

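import { spawn } from 'child_process';
import { pipeline } from 'stream/promises';

// Caveat: inputs that need seeking (e.g. M4A with the moov atom at the end) cannot be read from a pipe.
const ff = spawn('ffmpeg', [
  '-i', 'pipe:0',
  '-vn',
  '-ar', '44100', '-ac', '2', '-b:a', '128k',
  '-f', 'mp3',
  'pipe:1',
], { stdio: ['pipe', 'pipe', 'inherit'] }); // ffmpeg logs go to our stderr, so that pipe never fills up

// pipeline() propagates errors and cleans up both ends; a bare .pipe() does not.
const [code] = await Promise.all([
  new Promise((resolve, reject) => ff.once('error', reject).once('close', resolve)),
  pipeline(uploadStream, ff.stdin),
  pipeline(ff.stdout, s3Upload),
]);
if (code !== 0) throw new Error(`ffmpeg exited with code ${code}`);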

Sample files for testing

Test with sample files that cover every edge case, including the failure modes listed in the next section.
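When a particular sample is not at hand, a synthetic one can stand in. The following is a minimal sketch that builds an integer PCM WAV file in memory, which makes it easy to produce fixtures such as a 24-bit, 48 kHz file. The function name `makeSineWav` and its defaults are hypothetical, and it only handles 16-bit and 24-bit output.

```javascript
// Build a PCM WAV file in memory containing a sine tone (16- or 24-bit only).
// `makeSineWav` is a hypothetical helper intended for test fixtures.
function makeSineWav({
  seconds = 1, sampleRate = 44100, channels = 2, bitDepth = 16, freq = 440,
} = {}) {
  const bytesPerSample = bitDepth / 8; // 2 for 16-bit, 3 for 24-bit
  const frames = Math.floor(seconds * sampleRate);
  const dataSize = frames * channels * bytesPerSample;
  const buf = Buffer.alloc(44 + dataSize);

  // RIFF/WAVE header; all numeric fields are little-endian.
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataSize, 4);   // file size minus the first 8 bytes
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);             // fmt chunk size
  buf.writeUInt16LE(1, 20);              // format 1 = integer PCM
  buf.writeUInt16LE(channels, 22);
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  buf.writeUInt16LE(channels * bytesPerSample, 32);              // block align
  buf.writeUInt16LE(bitDepth, 34);
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataSize, 40);

  // Half-scale sine, the same value written to every channel.
  const peak = 2 ** (bitDepth - 1) - 1;
  let offset = 44;
  for (let i = 0; i < frames; i++) {
    const value = Math.round(
      Math.sin((2 * Math.PI * freq * i) / sampleRate) * peak * 0.5,
    );
    for (let c = 0; c < channels; c++) {
      buf.writeIntLE(value, offset, bytesPerSample);
      offset += bytesPerSample;
    }
  }
  return buf;
}

const wav24 = makeSineWav({ seconds: 0.5, sampleRate: 48000, bitDepth: 24 });
console.log(wav24.length); // 144044 (44-byte header + 24000 frames * 2 ch * 3 bytes)
```

The returned Buffer can be written to disk with `fs.writeFileSync` or fed straight into the validation and transcoding steps in a test.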

Things that break in production

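  • Variable bitrate (VBR) MP3: without a Xing or VBRI header, the duration is estimated from the bitrate and can be wrong unless you scan the whole file
  • 24-bit WAV: some web players can't play it; always convert to 16-bit (a bit-depth change, not downsampling)
  • 48 kHz source, 44.1 kHz target: a low-quality resampler adds audible artifacts; use a high-quality one, such as ffmpeg's soxr resampler (-af aresample=resampler=soxr) if your build includes libsoxr
  • Corrupt ID3 tags: strip them with -map_metadata -1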
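Two of these problems can be spotted from the probe result before transcoding starts. The sketch below flags them from a single ffprobe audio stream entry. The function name `flagRisks` and the 44100 Hz default target are assumptions for illustration, and VBR detection is left out because ffprobe's stream fields do not report it directly.

```javascript
// `audio` is one entry from ffprobe's `streams` array with codec_type "audio".
// `flagRisks` and the 44100 Hz default are assumptions for this sketch.
function flagRisks(audio, targetRate = 44100) {
  const risks = [];
  const rate = Number(audio.sample_rate);

  // 24-bit PCM is reported with codec_name pcm_s24le or pcm_s24be.
  if (String(audio.codec_name).startsWith('pcm_s24')) {
    risks.push('24-bit PCM: convert to 16-bit before serving');
  }
  // A rate mismatch means the transcode will resample.
  if (Number.isFinite(rate) && rate !== targetRate) {
    risks.push(`resample ${rate} Hz -> ${targetRate} Hz`);
  }
  return risks;
}

console.log(flagRisks({ codec_name: 'pcm_s24le', sample_rate: '48000' }));
```

Logging these flags next to each upload makes it easier to trace a playback complaint back to the source file.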

Related

For format trade-offs, see WAV vs MP3 vs OGG. For large file handling, read our large file upload guide.