TrueFileSize.com

File Type Validation Beyond Extensions — Magic Bytes, MIME Sniffing, and Polyglots

Checking file extensions is the most common — and most dangerous — approach to file type validation. An attacker simply renames shell.php to shell.jpg and your validation passes. Here's why extensions are unreliable and how to validate file types properly.

Why Extension Checking Fails

// This "validation" is trivially bypassed
if (!filename.endsWith('.jpg') && !filename.endsWith('.png')) {
  return res.status(400).json({ error: 'Only images allowed' });
}
// Attacker: renames malware.exe to malware.jpg → passes

Real Attack Scenario

  1. Attacker creates webshell.php (PHP backdoor)
  2. Renames to webshell.php.jpg (double extension)
  3. Uploads — your extension check sees .jpg and allows it
  4. Apache, configured with AddHandler for .php (which matches .php anywhere in a multi-extension filename, not only at the end), executes it as PHP
  5. Attacker has remote code execution on your server
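Step 3 succeeds because a check built on the final suffix alone (path.extname in Node, os.path.splitext in Python) never sees the inner .php. A minimal sketch of a stricter filename check, assuming a hypothetical has_executable_segment helper and an illustrative deny list, inspects every dot-separated segment instead:

```python
import os

# Illustrative deny list (an assumption): extensions a web server might execute.
# Tailor it to your own stack.
EXECUTABLE_EXTS = {"php", "php3", "php5", "phtml", "phar", "asp", "aspx", "jsp", "cgi", "pl"}


def has_executable_segment(filename: str) -> bool:
    """Hypothetical helper: True if ANY segment after the first dot is a
    deny-listed extension, not just the final one."""
    segments = filename.lower().split(".")[1:]
    return any(seg in EXECUTABLE_EXTS for seg in segments)


name = "webshell.php.jpg"
print(os.path.splitext(name)[1])     # the naive check only sees '.jpg'
print(has_executable_segment(name))  # the stricter check also sees 'php'
```

A deny list is inherently incomplete, so treat this as a backstop behind content-based checks, not a replacement for them.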

The Three Layers of File Type Detection

Layer 1: File Extension (Weakest)

photo.jpg          → Looks like JPEG
photo.php.jpg      → Double extension — which does the server use?
photo.jpg%00.php   → Null byte injection (older systems)

Reliability: None. Anyone can rename a file.
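All three tricks target how the server interprets a name the attacker chose. A common mitigation, sketched here as an assumption rather than part of this article's code, is to never reuse that name: store the file under a random name whose extension comes from the type detected in the content. STORAGE_EXT and storage_name are hypothetical:

```python
import uuid

# Hypothetical mapping: type detected from the CONTENT -> extension to store under
STORAGE_EXT = {"image/jpeg": ".jpg", "image/png": ".png", "application/pdf": ".pdf"}


def storage_name(detected_mime: str) -> str:
    """Hypothetical helper: build a server-side filename that ignores the
    user-supplied name, so double extensions and null bytes never reach disk."""
    if detected_mime not in STORAGE_EXT:
        raise ValueError(f"Type {detected_mime} not allowed")
    return uuid.uuid4().hex + STORAGE_EXT[detected_mime]


# Whether the user called it photo.php.jpg or photo.jpg%00.php, the stored
# name is 32 random hex characters plus an extension the server chose.
print(storage_name("image/jpeg"))
```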

Layer 2: MIME Type (Client-Reported)

The browser sends Content-Type based on the OS file association:

Content-Type: image/jpeg

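Reliability: Low. If the attacker renames .php to .jpg, the browser sends image/jpeg. Worse, the header is entirely client-controlled: curl, a script, or an intercepting proxy can send any Content-Type regardless of what the file contains.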
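The part headers in an upload request are simply text the client writes. A minimal sketch, with a hypothetical build_multipart helper and made-up field names, hand-builds the multipart/form-data body any script could POST, labelling a PHP payload as image/jpeg:

```python
def build_multipart(field: str, filename: str, content_type: str, data: bytes,
                    boundary: str = "demo-boundary") -> bytes:
    """Hypothetical helper: assemble a multipart/form-data body by hand.
    Every header in it, including the part's Content-Type, is client-chosen."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


# A PHP payload declared as a JPEG: the server receives exactly these bytes
body = build_multipart("upload", "shell.jpg", "image/jpeg",
                       b"<?php system($_GET['c']); ?>")
```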

Layer 3: Magic Bytes (File Signature — Strongest)

The first bytes of a file identify its true format:

| Format | Magic Bytes | Hex |
|--------|------------|-----|
| JPEG | ÿØÿ | FF D8 FF |
| PNG | .PNG | 89 50 4E 47 |
| PDF | %PDF | 25 50 44 46 |
| ZIP/DOCX/XLSX | PK.. | 50 4B 03 04 |
| GIF | GIF8 | 47 49 46 38 |
| MP4 | ftyp | 66 74 79 70 (at offset 4) |
| SQLite | SQLite format 3 | 53 51 4C 69 74 65 |
| WASM | .asm | 00 61 73 6D |

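Reliability: High, but not absolute. The attacker must modify the actual binary content, yet prepending a valid signature (for example GIF89a in front of PHP code) is trivial, and polyglot files satisfy the check by design.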
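The table translates directly into a lookup. A minimal stdlib sketch, assuming the short labels and the hypothetical sniff helper shown here; libraries such as file-type or python-magic cover far more formats and edge cases:

```python
from typing import Optional

# (offset, signature, label) from the table above. Matching a prefix is enough;
# MP4's "ftyp" box type sits at offset 4, after a 4-byte box size.
SIGNATURES = [
    (0, b"\xff\xd8\xff", "jpeg"),
    (0, b"\x89PNG", "png"),
    (0, b"%PDF", "pdf"),
    (0, b"PK\x03\x04", "zip"),  # DOCX/XLSX are ZIP containers: same signature
    (0, b"GIF8", "gif"),
    (4, b"ftyp", "mp4"),        # other ISO base media files (e.g. HEIC) share it
    (0, b"SQLite format 3", "sqlite"),
    (0, b"\x00asm", "wasm"),
]


def sniff(data: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None
```

Note the two ambiguities the comments call out: a PK signature alone cannot tell ZIP from DOCX or XLSX, and ftyp alone does not pin down MP4.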

Implementation: Three-Layer Validation

import { fileTypeFromBuffer } from 'file-type';
import path from 'path';

// Allowlist: MIME type detected from content → extensions allowed for it
const ALLOWED = {
  'image/jpeg': ['.jpg', '.jpeg'],
  'image/png': ['.png'],
  'application/pdf': ['.pdf'],
};
const ALLOWED_MIMES = Object.keys(ALLOWED);

// clientMime is the Content-Type the upload request declared for this file
async function validateFile(originalName, clientMime, buffer) {
  // Layer 1: Extension allowlist (cheap early rejection; trivially spoofed)
  const ext = path.extname(originalName).toLowerCase();
  const allowedExts = Object.values(ALLOWED).flat();
  if (!allowedExts.includes(ext)) {
    throw new Error(`Extension ${ext} not allowed`);
  }

  // Layer 2: Client-reported MIME type (attacker-controlled; consistency check only)
  if (!ALLOWED_MIMES.includes(clientMime)) {
    throw new Error(`Declared type ${clientMime} not allowed`);
  }

  // Layer 3: Magic bytes detection from the actual content
  const detected = await fileTypeFromBuffer(buffer);
  if (!detected) {
    throw new Error('Could not detect file type from content');
  }
  if (!ALLOWED_MIMES.includes(detected.mime)) {
    throw new Error(`Detected type ${detected.mime} not in allowlist`);
  }

  // Cross-check: extension, declared MIME, and magic bytes must all agree
  if (!ALLOWED[detected.mime].includes(ext)) {
    throw new Error(
      `Extension mismatch: ${ext} but content is ${detected.mime}`
    );
  }
  if (clientMime !== detected.mime) {
    throw new Error(
      `MIME mismatch: declared ${clientMime} but content is ${detected.mime}`
    );
  }

  // detected.ext has no leading dot (e.g. 'jpg')
  return { mime: detected.mime, ext: detected.ext };
}

What Are Polyglot Files?

A polyglot file is valid in multiple formats simultaneously. For example, a file that is both a valid JPEG and valid JavaScript:

FF D8 FF E0 ... [JPEG data] ... /* = */ alert('XSS'); //

This file:

  • Opens in an image viewer as a normal JPEG
  • Executes as JavaScript if served with text/javascript MIME type
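A header-only check inspects nothing past the signature. The toy byte strings below are assumptions for illustration (neither decodes as a real image): one mirrors the JPEG/JavaScript layout above, the other is the classic GIF89a-prefixed PHP payload. Both pass a magic-bytes test:

```python
# Toy byte strings for illustration only: neither is a decodable image.
toy_jpeg_js = (
    b"\xff\xd8\xff\xe0"       # JPEG signature (SOI + APP0 marker)
    b"/*"                     # opens a JavaScript comment over the "image data"
    b"...image data..."
    b"*/=alert('XSS');//"     # payload meant to run if the file is loaded as script
)
gif_php = b"GIF89a;<?php system($_GET['c']); ?>"  # GIF signature, then PHP


def has_signature(data: bytes, magic: bytes) -> bool:
    """Header-only check: compares the first len(magic) bytes, nothing else."""
    return data[:len(magic)] == magic


print(has_signature(toy_jpeg_js, b"\xff\xd8\xff"))  # True
print(has_signature(gif_php, b"GIF8"))              # True
```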

Defense Against Polyglots

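  1. Re-encode images (decoding to pixels and encoding a fresh file drops metadata and any data that isn't part of the image):
import sharp from 'sharp';

// Decode, then encode a new JPEG. sharp drops metadata by default, and
// bytes appended after the image data are not carried over.
const sanitized = await sharp(uploadedBuffer)
  .jpeg({ quality: 85 })
  .toBuffer();
// Store `sanitized`, never the original upload
  2. Set X-Content-Type-Options: nosniff (stops browsers from MIME-sniffing a response into a different, executable type):
X-Content-Type-Options: nosniff
  3. Serve user uploads from a separate domain (origin isolation):
Main app:    app.example.com
User files:  uploads.example.com  (different origin → scripts there can't read the app's DOM or host-only cookies)
For stronger isolation, use a separate registrable domain (e.g. example-usercontent.com), so that cookies scoped to .example.com are not sent to it either.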
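Points 2 and 3 come down to where and how uploads are served. A minimal sketch, assuming a hypothetical serve_upload helper and the uploads.example.com origin from the layout above, builds the URL and response headers a web framework would then attach:

```python
from typing import Dict, Tuple
from urllib.parse import quote

# Assumed upload origin, taken from the example layout above
UPLOAD_ORIGIN = "https://uploads.example.com"


def serve_upload(stored_name: str, detected_mime: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: public URL plus response headers for a stored upload."""
    url = f"{UPLOAD_ORIGIN}/{quote(stored_name)}"
    headers = {
        # The type detected from the content at upload time, never the
        # Content-Type the client declared
        "Content-Type": detected_mime,
        # Tell the browser not to second-guess that type
        "X-Content-Type-Options": "nosniff",
    }
    return url, headers


url, headers = serve_upload("3f2a9c.jpg", "image/jpeg")
```

Because the URL lives on a different origin from app.example.com, a file that slips through every check still runs without access to the app's DOM or host-only cookies.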

Python Implementation

import os

import magic  # python-magic, a wrapper around libmagic

ALLOWED_MIMES = {
    'image/jpeg': ['.jpg', '.jpeg'],
    'image/png': ['.png'],
    'application/pdf': ['.pdf'],
}

def validate_file(filepath, original_name):
    # Magic bytes detection
    detected_mime = magic.from_file(filepath, mime=True)

    if detected_mime not in ALLOWED_MIMES:
        raise ValueError(f"Type {detected_mime} not allowed")

    # Extension cross-check
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_MIMES[detected_mime]:
        raise ValueError(f"Extension {ext} doesn't match {detected_mime}")

    return detected_mime

Test Your Validation

Download these from TrueFileSize to test each layer:

| Test | File | Extension | Actual Content | Your Validator Should |
|------|------|-----------|---------------|----------------------|
| Normal JPEG | sample-500kb.jpg | .jpg | JPEG | Accept |
| Normal PDF | sample-1mb.pdf | .pdf | PDF | Accept |
| Wrong extension | sample-jpg-as-pdf.pdf | .pdf | JPEG data | Reject (MIME mismatch) |
| Corrupted | sample-corrupt.pdf | .pdf | Truncated PDF | Reject (invalid content) |
| Zero byte | sample-zero-byte.bin | .bin | Empty | Reject (no magic bytes) |

Note that the truncated PDF still begins with %PDF, so a validator that only reads magic bytes will accept it; rejecting it requires checking the file's structure as well.
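Those rows translate naturally into a table-driven test. Everything in this sketch is hypothetical: a tiny JPEG/PDF validator (check) and synthetic byte strings standing in for the sample files. Its %%EOF test for truncated PDFs is a heuristic, not a full PDF parse:

```python
import os
from typing import Optional

# Hypothetical minimal validator covering just JPEG and PDF
SIGS = {b"\xff\xd8\xff": "image/jpeg", b"%PDF": "application/pdf"}
EXTS = {"image/jpeg": {".jpg", ".jpeg"}, "application/pdf": {".pdf"}}


def check(name: str, data: bytes) -> Optional[str]:
    """Return None if the file is accepted, otherwise a rejection reason."""
    mime = next((m for sig, m in SIGS.items() if data.startswith(sig)), None)
    if mime is None:
        return "no magic bytes"
    if os.path.splitext(name)[1].lower() not in EXTS[mime]:
        return "MIME mismatch"
    # Heuristic (an assumption): a complete PDF normally has an %%EOF marker
    # near the end of the file; a truncated one usually does not.
    if mime == "application/pdf" and b"%%EOF" not in data[-1024:]:
        return "truncated PDF"
    return None


# Synthetic stand-ins for the sample files, NOT their real contents
JPEG_HEAD = b"\xff\xd8\xff\xe0" + bytes(64)
CASES = [
    ("sample-500kb.jpg", JPEG_HEAD, None),
    ("sample-1mb.pdf", b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n%%EOF\n", None),
    ("sample-jpg-as-pdf.pdf", JPEG_HEAD, "MIME mismatch"),
    ("sample-corrupt.pdf", b"%PDF-1.7\n1 0 obj\n<<", "truncated PDF"),
    ("sample-zero-byte.bin", b"", "no magic bytes"),
]

for name, data, expected in CASES:
    result = check(name, data)
    print(name, "accept" if result is None else f"reject ({result})")
```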

Use our MIME Type Lookup to find magic bytes for any format.

Summary

| Method | Reliability | Speed | Bypassed By |
|--------|-----------|-------|-------------|
| Extension check | Very low | Instant | Rename file |
| MIME type (header) | Low | Instant | Rename file (browser sends wrong MIME) or forge the header |
| Magic bytes | High | Fast (reads first KB) | Prepended signatures, polyglot files |
| Re-encoding | Highest | Slower | Rarely: payloads crafted to survive re-encoding, or bugs in the image decoder itself |

Best practice: layer your defenses. Treat extension and MIME checks as cheap consistency checks only; use magic bytes for detection, re-encoding for images, X-Content-Type-Options: nosniff to prevent MIME sniffing, and a separate domain for user uploads.
