Testing Word and Excel Uploads in Production
Office documents are the most common upload type in B2B apps — and also where most upload bugs live. This guide covers validating DOCX and XLSX uploads, catching corrupted files, detecting macros, and rendering previews.
MIME type validation is not enough
Users rename malware.exe to resume.docx every day. Neither the Content-Type header nor the file extension can be trusted, because the client controls both. Check the magic bytes instead:
import { fileTypeFromBuffer } from 'file-type';
const buffer = await file.arrayBuffer();
const type = await fileTypeFromBuffer(new Uint8Array(buffer));
// Real DOCX: { ext: 'docx', mime: 'application/vnd.openxmlformats-...' }
// Real XLSX: { ext: 'xlsx', mime: 'application/vnd.openxmlformats-...' }
if (!type || !['docx', 'xlsx'].includes(type.ext)) {
  throw new Error('Invalid file type');
}
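Both DOCX and XLSX are ZIP containers, so a cheap pre-check on the first four bytes can reject obvious fakes before a full detection library runs. The sketch below is a hand-rolled illustration, not a replacement for file-type; `looksLikeZip` is a hypothetical helper name. It only shows that the file starts like a ZIP archive, not that it is a valid Office document.

```javascript
// ZIP local file header signature: "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical helper: true if the bytes start with the ZIP signature.
function looksLikeZip(bytes) {
  if (bytes.length < ZIP_SIGNATURE.length) return false;
  return ZIP_SIGNATURE.every((byte, i) => bytes[i] === byte);
}

// A Windows executable starts with "MZ", so a renamed .exe fails the check.
const fakeDocx = new Uint8Array([0x4d, 0x5a, 0x90, 0x00]);
const zipHeader = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
console.log(looksLikeZip(fakeDocx));  // false
console.log(looksLikeZip(zipHeader)); // true
```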
Parsing DOCX on the server
import mammoth from 'mammoth';
// For an in-memory upload, pass { buffer } (a Node.js Buffer) instead of a path.
const { value: html, messages } = await mammoth.convertToHtml({
  path: 'sample.docx'
});
if (messages.length > 0) {
  console.warn('Parse warnings:', messages);
}
console.log(html);
Download our sample DOCX test files to validate your pipeline.
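Parsers generally throw on corrupted input, and an unhandled throw tends to surface as a 500. The following is a minimal sketch of turning parse failures into a 400-style result. It assumes a hypothetical `parse` function is passed in (for example, a thin wrapper around `mammoth.convertToHtml`); `parseUpload` and the result shape are illustrative names, not part of any library.

```javascript
// Hypothetical wrapper: runs any async parser and converts a thrown error
// into a client-error result instead of letting it bubble up as a 500.
async function parseUpload(parse, buffer) {
  try {
    const result = await parse(buffer);
    return { status: 200, body: result };
  } catch (err) {
    // Log the parser's error server-side; return a generic message to the client.
    console.warn('Upload parse failed:', err);
    return {
      status: 400,
      body: { error: 'File is corrupted or not a valid document' },
    };
  }
}
```

Keeping the parser injectable also makes the corrupted-file case easy to unit test with a stub that throws.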
Parsing XLSX with SheetJS
import * as XLSX from 'xlsx';
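// type: 'buffer' expects a Node.js Buffer; use type: 'array' for an ArrayBuffer or Uint8Array.
const workbook = XLSX.read(buffer, { type: 'buffer' });
const firstSheet = workbook.Sheets[workbook.SheetNames[0]];
const rows = XLSX.utils.sheet_to_json(firstSheet);
console.log(rows); // Array of row objects keyed by the header row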
Test with our sample XLSX files — includes formulas, merged cells, and multiple sheets.
Detecting macros
Macros in Office files are a common malware vector. Any .docm or .xlsm extension is a red flag. A macro-enabled file can also simply be renamed, so for files named .docx or .xlsx, inspect the ZIP structure:
import JSZip from 'jszip';
const zip = await JSZip.loadAsync(buffer);
const hasMacros = Object.keys(zip.files).some(
  (name) => name.includes('vbaProject.bin')
);
if (hasMacros) {
  throw new Error('Macros not allowed');
}
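The extension check and the ZIP-entry check can be combined into one predicate. This is a sketch under assumptions: `isMacroEnabled` is a hypothetical helper, and `entryNames` is expected to be the list of archive entry names (for instance `Object.keys(zip.files)` from the JSZip snippet above). Matching on the file name is a heuristic, since part names inside the package are conventions; a stricter check might also look for the `application/vnd.ms-office.vbaProject` content type in `[Content_Types].xml`.

```javascript
// Macro-enabled extensions named in this guide.
const MACRO_EXTENSIONS = ['.docm', '.xlsm'];

// Hypothetical helper: flags an upload if its file name uses a macro-enabled
// extension or any archive entry looks like a VBA project.
function isMacroEnabled(filename, entryNames) {
  const lowerName = filename.toLowerCase();
  if (MACRO_EXTENSIONS.some((ext) => lowerName.endsWith(ext))) return true;
  // Compare case-insensitively in case the archive uses unusual casing.
  return entryNames.some((name) =>
    name.toLowerCase().endsWith('vbaproject.bin')
  );
}

console.log(isMacroEnabled('report.docx', ['word/document.xml']));    // false
console.log(isMacroEnabled('report.docx', ['word/vbaProject.bin']));  // true
console.log(isMacroEnabled('Budget.XLSM', ['xl/workbook.xml']));      // true
```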
Size and row limits
- Reject DOCX over 10MB unless you support long documents
- XLSX over 50K rows often means someone is uploading a database export
- Use a streaming parser for files over 50MB to avoid out-of-memory crashes
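The thresholds in the list above could be encoded as a single pre-parse check. This is one possible sketch: `checkLimits` and its return shape are hypothetical, and treating the 50K-row threshold as a hard rejection (rather than a warning) is an assumption you may want to relax.

```javascript
const MB = 1024 * 1024;

// Thresholds from the list above; adjust for your product.
const MAX_DOCX_BYTES = 10 * MB;
const MAX_XLSX_ROWS = 50000;
const STREAMING_THRESHOLD_BYTES = 50 * MB;

// Hypothetical helper: decides whether to reject an upload and whether
// to switch to a streaming parser. rowCount is optional (XLSX only).
function checkLimits({ ext, sizeBytes, rowCount = 0 }) {
  if (ext === 'docx' && sizeBytes > MAX_DOCX_BYTES) {
    return { ok: false, reason: 'DOCX exceeds 10MB' };
  }
  if (ext === 'xlsx' && rowCount > MAX_XLSX_ROWS) {
    return { ok: false, reason: 'XLSX exceeds 50,000 rows' };
  }
  return { ok: true, useStreaming: sizeBytes > STREAMING_THRESHOLD_BYTES };
}

console.log(checkLimits({ ext: 'docx', sizeBytes: 11 * MB }));
// { ok: false, reason: 'DOCX exceeds 10MB' }
console.log(checkLimits({ ext: 'xlsx', sizeBytes: 60 * MB, rowCount: 1000 }));
// { ok: true, useStreaming: true }
```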
Testing checklist
- Valid 1KB DOCX parses correctly
- Valid XLSX with 1000 rows extracts all rows
- Corrupted file returns clear 400 error, not 500
- Renamed .exe is rejected on magic byte check
- File with macros is rejected before parsing
- Password-protected DOCX returns helpful error
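For the password-protected case, one workable signal is the file signature: an Office document encrypted with a password to open is stored as an OLE Compound File rather than a ZIP, so it fails the ZIP check with a recognizable header. The sketch below relies on that, with the caveat that legacy .doc and .xls files share the same signature, so the message has to cover both. `looksLikeCompoundFile` and `describeRejectedUpload` are hypothetical names, and the wording of the messages is only a suggestion.

```javascript
// OLE Compound File signature, used by encrypted OOXML files
// and by legacy .doc/.xls files.
const CFB_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// Hypothetical helper: true if the bytes start with the Compound File signature.
function looksLikeCompoundFile(bytes) {
  if (bytes.length < CFB_SIGNATURE.length) return false;
  return CFB_SIGNATURE.every((byte, i) => bytes[i] === byte);
}

// Hypothetical helper: picks a user-facing message for an upload that
// already failed the DOCX/XLSX type check.
function describeRejectedUpload(bytes) {
  if (looksLikeCompoundFile(bytes)) {
    return 'This file appears to be password-protected or in a legacy Office format. ' +
      'Remove the password or save it as .docx/.xlsx and try again.';
  }
  return 'This file is not a valid DOCX or XLSX document.';
}
```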
Related
For PDF uploads, see our PDF parsing guide. For CSV-specific handling, see sample CSV files for pandas.