File Type Validation Beyond Extensions — Magic Bytes, MIME Sniffing, and Polyglots
Checking file extensions is the most common — and most dangerous — approach to file type validation. An attacker simply renames shell.php to shell.jpg and your validation passes. Here's why extensions are unreliable and how to validate file types properly.
Why Extension Checking Fails
// This "validation" is trivially bypassed
if (!filename.endsWith('.jpg') && !filename.endsWith('.png')) {
  return res.status(400).json({ error: 'Only images allowed' });
}
// Attacker: renames malware.exe to malware.jpg → passes
Real Attack Scenario
- Attacker creates webshell.php (a PHP backdoor)
- Renames it to webshell.php.jpg (double extension)
- Uploads it; your extension check sees .jpg and allows it
- Apache with a misconfigured mod_php executes it as PHP
- Attacker has remote code execution on your server
The Three Layers of File Type Detection
Layer 1: File Extension (Weakest)
photo.jpg → Looks like JPEG
photo.php.jpg → Double extension — which does the server use?
photo.php%00.jpg → Null byte injection: the check sees .jpg, but older systems truncate the name at the null byte and save photo.php
Reliability: None. Anyone can rename a file.
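As a minimal sketch of the problem, the Python snippet below runs an extension allowlist against a few file names. The helper name `naive_extension_check` and the sample names are illustrative assumptions, not part of any library.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def naive_extension_check(filename: str) -> bool:
    """Return True if the final extension is on the allowlist (hypothetical helper)."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS


# Each name passes the check, yet nothing here has looked at the file content.
for name in ["photo.jpg", "photo.php.jpg", "malware.exe.JPG"]:
    print(name, naive_extension_check(name))
```

The check only ever inspects the text after the last dot, so it cannot distinguish a real photo from a renamed script.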
Layer 2: MIME Type (Client-Reported)
The browser sends Content-Type based on the OS file association:
Content-Type: image/jpeg
Reliability: Low. If the attacker renames .php to .jpg, the browser sends image/jpeg, and a non-browser client such as curl can set the header to any value.
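To illustrate, Python's standard `mimetypes` module guesses a type from the file name alone, which is roughly the kind of lookup a browser performs before filling in Content-Type. This is a stand-in for browser behavior, not a reproduction of it, and `guess_mime_from_name` is a hypothetical helper.

```python
import mimetypes
from typing import Optional


def guess_mime_from_name(filename: str) -> Optional[str]:
    """Guess a MIME type from the file name only; the content is never read."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


# A PHP script renamed to .jpg is reported as an image.
print(guess_mime_from_name("webshell.jpg"))
```

Because the guess depends only on the name, the client-reported type carries no more information than the extension does.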
Layer 3: Magic Bytes (File Signature — Strongest)
The first bytes of a file identify its true format:
| Format | Magic Bytes | Hex |
|--------|------------|-----|
| JPEG | ÿØÿ | FF D8 FF |
| PNG | .PNG | 89 50 4E 47 |
| PDF | %PDF | 25 50 44 46 |
| ZIP/DOCX/XLSX | PK.. | 50 4B 03 04 |
| GIF | GIF8 | 47 49 46 38 |
| MP4 | ftyp | 66 74 79 70 (at offset 4) |
| SQLite | SQLite format 3 | 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 |
| WASM | .asm | 00 61 73 6D |
Reliability: High. Attacker must modify the actual binary content.
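A signature check can be sketched in a few lines of Python. The snippet below uses only the signatures from the table above; the function name `sniff_format` is an assumption for illustration. A production system would more likely rely on a maintained library with a far larger signature database.

```python
from typing import Optional

# (offset, signature bytes, format label), taken from the table above.
SIGNATURES = [
    (0, bytes.fromhex("FFD8FF"), "JPEG"),
    (0, bytes.fromhex("89504E47"), "PNG"),
    (0, bytes.fromhex("25504446"), "PDF"),
    (0, bytes.fromhex("504B0304"), "ZIP"),
    (0, bytes.fromhex("47494638"), "GIF"),
    (4, bytes.fromhex("66747970"), "MP4"),  # "ftyp" starts at offset 4
    (0, b"SQLite format 3", "SQLite"),
    (0, bytes.fromhex("0061736D"), "WASM"),
]


def sniff_format(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None."""
    for offset, signature, label in SIGNATURES:
        if data[offset:offset + len(signature)] == signature:
            return label
    return None


print(sniff_format(bytes.fromhex("FFD8FFE0") + b"..."))  # JPEG
print(sniff_format(b"<?php system($_GET['c']); ?>"))      # None
```

Note that a ZIP signature also matches DOCX and XLSX files, so telling those apart requires looking inside the archive.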
Implementation: Three-Layer Validation
import { fileTypeFromBuffer } from 'file-type';
import path from 'path';

// Allowed MIME types mapped to the extensions each one may use
const ALLOWED = {
  'image/jpeg': ['.jpg', '.jpeg'],
  'image/png': ['.png'],
  'application/pdf': ['.pdf'],
};

async function validateFile(originalName, buffer) {
  // Layer 1: Extension allowlist
  const ext = path.extname(originalName).toLowerCase();
  const allowedExts = Object.values(ALLOWED).flat();
  if (!allowedExts.includes(ext)) {
    throw new Error(`Extension ${ext} not allowed`);
  }

  // Layer 2: Magic bytes detection (ignores any client-reported Content-Type)
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected) {
    throw new Error('Could not detect file type from content');
  }

  // Layer 3: Cross-check: is the detected type allowed, and does the extension match it?
  const allowedMimes = Object.keys(ALLOWED);
  if (!allowedMimes.includes(detected.mime)) {
    throw new Error(`Detected type ${detected.mime} not in allowlist`);
  }

  // Verify extension matches detected type
  const expectedExts = ALLOWED[detected.mime];
  if (!expectedExts.includes(ext)) {
    throw new Error(
      `Extension mismatch: ${ext} but content is ${detected.mime}`
    );
  }

  return { mime: detected.mime, ext: detected.ext };
}
What Are Polyglot Files?
A polyglot file is valid in multiple formats simultaneously. For example, a file that is both a valid JPEG and valid JavaScript:
FF D8 FF E0 ... [JPEG data] ... /* = */ alert('XSS'); //
This file:
- Opens in an image viewer as a normal JPEG
- Executes as JavaScript if served with a text/javascript MIME type
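The Python sketch below shows why a signature check alone does not catch this. It builds a stand-in byte string with the layout shown above (JPEG header, filler, script tail). It is not a decodable image and not a working exploit; the names `looks_like_jpeg` and `polyglot_like` are illustrative assumptions.

```python
JPEG_SIGNATURE = bytes.fromhex("FFD8FF")


def looks_like_jpeg(data: bytes) -> bool:
    """Signature-only check: inspects just the first three bytes."""
    return data.startswith(JPEG_SIGNATURE)


# JPEG header bytes, filler standing in for image data, then a script tail.
polyglot_like = (
    bytes.fromhex("FFD8FFE0")
    + b"\x00" * 16
    + b"/* = */ alert('XSS'); //"
)

print(looks_like_jpeg(polyglot_like))  # True: the signature matches
print(b"alert(" in polyglot_like)      # True: the script tail is still there
```

The check is satisfied by the first few bytes and never sees what follows them, which is why the defenses below focus on rewriting the content and controlling how it is served.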
Defense Against Polyglots
- Re-encode images (strips embedded payloads):
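import sharp from 'sharp';
// Re-encoding decodes the pixels and writes a fresh file,
// which drops appended or embedded non-image data
const cleanBuffer = await sharp(uploadedBuffer)
  .jpeg({ quality: 85 })
  .toBuffer();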
- Set X-Content-Type-Options: nosniff (prevents MIME sniffing):
X-Content-Type-Options: nosniff
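- Serve user uploads from a separate domain (origin isolation):
Main app: app.example.com
User files: uploads.example.com (different origin → scripts there cannot read the app's host-only cookies)
Cookies set with Domain=example.com are still sent to every subdomain, so a separate registrable domain gives stronger isolation.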
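The header-related defense can be sketched as a small framework-agnostic Python helper that builds response headers for a stored upload. The function name `upload_response_headers` and the allowlist are assumptions for illustration; the idea is that Content-Type comes from the type detected at upload time, never from the file name or the client.

```python
ALLOWED_MIMES = {"image/jpeg", "image/png", "application/pdf"}


def upload_response_headers(detected_mime: str) -> dict:
    """Build headers for serving a user upload (hypothetical helper)."""
    if detected_mime not in ALLOWED_MIMES:
        raise ValueError(f"Refusing to serve type {detected_mime}")
    return {
        # Use the type detected from content when the file was uploaded.
        "Content-Type": detected_mime,
        # Tell the browser not to second-guess the declared type.
        "X-Content-Type-Options": "nosniff",
    }


print(upload_response_headers("image/jpeg"))
```

With these headers a JPEG/JavaScript polyglot is served as image/jpeg only, so a browser that honors nosniff should not run it as a script.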
Python Implementation
import os

import magic  # python-magic, a wrapper around libmagic

ALLOWED_MIMES = {
    'image/jpeg': ['.jpg', '.jpeg'],
    'image/png': ['.png'],
    'application/pdf': ['.pdf'],
}

def validate_file(filepath, original_name):
    # Magic bytes detection
    detected_mime = magic.from_file(filepath, mime=True)
    if detected_mime not in ALLOWED_MIMES:
        raise ValueError(f"Type {detected_mime} not allowed")

    # Extension cross-check
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_MIMES[detected_mime]:
        raise ValueError(f"Extension {ext} doesn't match {detected_mime}")

    return detected_mime
Test Your Validation
Download these from TrueFileSize to test each layer:
| Test | File | Extension | Actual Content | Your Validator Should |
|------|------|-----------|---------------|----------------------|
| Normal JPEG | sample-500kb.jpg | .jpg | JPEG | Accept |
| Normal PDF | sample-1mb.pdf | .pdf | PDF | Accept |
| Wrong extension | sample-jpg-as-pdf.pdf | .pdf | JPEG data | Reject (MIME mismatch) |
| Corrupted | sample-corrupt.pdf | .pdf | Truncated PDF | Reject (invalid content) |
| Zero byte | sample-zero-byte.bin | .bin | Empty | Reject (no magic bytes) |
Use our MIME Type Lookup to find magic bytes for any format.
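If you prefer to generate rough equivalents of the mismatch cases locally, a Python sketch like the one below writes stub fixtures to a temporary directory. The file names and the helper `write_fixtures` are hypothetical, and the stubs contain only header bytes, so they are not real images or complete documents.

```python
import os
import tempfile


def write_fixtures(directory: str) -> dict:
    """Write stub test files and return a name-to-path mapping."""
    fixtures = {
        # JPEG signature under a .pdf name: extension and content disagree.
        "jpg-as-pdf.pdf": bytes.fromhex("FFD8FFE0") + b"\x00" * 32,
        # PDF header with nothing after it: correct signature, unusable body.
        "truncated.pdf": b"%PDF-1.7\n",
        # Empty file: no bytes to match any signature.
        "zero-byte.bin": b"",
    }
    paths = {}
    for name, content in fixtures.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as handle:
            handle.write(content)
        paths[name] = path
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name, path in write_fixtures(tmp).items():
        print(name, os.path.getsize(path), "bytes")
```

A signature check alone will likely classify the truncated stub as a PDF, so rejecting it requires actually parsing the file rather than only reading its first bytes.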
Summary
| Method | Reliability | Speed | Bypassed By |
|--------|-----------|-------|-------------|
| Extension check | Very low | Instant | Rename file |
| MIME type (header) | Low | Instant | Rename file (browser sends wrong MIME) or forge the header |
| Magic bytes | High | Fast (reads first KB) | Polyglot files |
| Re-encoding | Highest | Slower | Rarely; only payloads crafted to survive re-encoding |
Best practice: Use all four layers. Magic bytes for detection, re-encoding for images, X-Content-Type-Options: nosniff to prevent MIME sniffing, and a separate domain for user uploads.
OWASP References
- OWASP: Unrestricted File Upload
- OWASP: Content Type Validation
- A03:2021 Injection, A04:2021 Insecure Design