Metadata gone. File intact. Nothing uploaded.

Why PDF metadata is the most-overlooked privacy leak in your business.

Most people treat PDFs as the safe, finalized version of a document: the format you send when you want it to stay the way you made it. That intuition is right about the layout and wrong about privacy.

A PDF is a finished-looking thing. The text is locked into glyphs. The layout doesn't reflow. Selecting and copying text is sometimes deliberately difficult. So it feels like a sealed envelope.

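Underneath that surface, every PDF you send is broadcasting metadata about how it was made: typically the author, the application it was authored in, the application that wrote the file, and the creation and modification dates.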

The two metadata stores

Most PDF cleaners scrub the Info dictionary and call it a day. But PDFs actually maintain two parallel metadata systems:

  1. The Info dictionary — the original PDF 1.0 way to store metadata. Title, Author, Subject, Keywords, Creator (the app you authored in), Producer (the app that wrote the PDF), CreationDate, ModDate.
  2. The XMP packet — an XML metadata stream that mirrors and extends the Info dictionary. Added in PDF 1.4 and now the default in most modern PDF writers. It can contain everything in the Info dictionary plus additional fields like document/instance IDs, derivation history (when a PDF was generated from another document), and tool-specific metadata.

Wipe one, leave the other, and you've changed nothing about what an attacker reads.
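
To make the two-store point concrete, here is a minimal sketch of a presence check on the raw bytes of a PDF. The function name detectMetadataStores and its return shape are hypothetical, chosen for illustration. It assumes the trailer reference and the XMP stream are stored uncompressed and unencrypted, which is common but not guaranteed.

```typescript
// Rough presence check for the two metadata stores in raw PDF bytes.
// Assumption: the /Info reference and the XMP packet are stored
// uncompressed and unencrypted. Encrypted files will be missed.

interface MetadataStores {
  hasInfoDictionary: boolean;
  hasXmpPacket: boolean;
}

function detectMetadataStores(bytes: Uint8Array): MetadataStores {
  // Map each byte to one character so binary data survives the conversion.
  const text = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  return {
    // The trailer points at the Info dictionary with an indirect
    // reference such as "/Info 12 0 R".
    hasInfoDictionary: /\/Info\s+\d+\s+\d+\s+R/.test(text),
    // XMP packets open with an "<?xpacket begin=" processing instruction.
    hasXmpPacket: text.includes("<?xpacket begin="),
  };
}
```

Running a check like this before and after cleaning is a quick way to see whether a tool handled both stores: a file that still reports either flag as true after cleaning deserves a closer look.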

The famous failures

Government agencies regularly release "redacted" PDFs where the redactions cover visible text but leave metadata intact. Journalists then extract that metadata to identify who authored the document, which software produced it, and when it was last modified. Corporate legal departments do the same with contracts.

The pattern is always similar: the lawyers redact the visible content (block out names, dollar amounts, location names), the redaction tool draws black rectangles over those regions, and the PDF is exported. Nobody checks whether the document's Info dictionary still says Author: Jane Smith, Senior Counsel.
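
The check nobody runs can be sketched in a few lines. The helper below, readInfoField, is a hypothetical name rather than part of any library. It assumes the Info dictionary entry is a plain literal string in parentheses, with no nested parentheses, and that the dictionary is neither packed into a compressed object stream nor encrypted. Hex-encoded or UTF-16 values would need a real PDF parser.

```typescript
// Illustrative check: does an Info dictionary entry such as /Author
// survive in the raw bytes of an exported PDF?
// Assumptions: the value is a literal string "(...)" without nested
// parentheses, stored uncompressed and unencrypted.
function readInfoField(bytes: Uint8Array, key: string): string | null {
  // Map each byte to one character so binary data survives the conversion.
  const text = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  // Matches e.g. "/Author (Jane Smith)", allowing escapes such as "\)".
  const pattern = new RegExp("/" + key + "\\s*\\(((?:\\\\.|[^\\\\()])*)\\)");
  const match = pattern.exec(text);
  // The raw value is returned as stored; escape sequences are not decoded.
  return match?.[1] ?? null;
}
```

A null result here only means the field was not found in this simple form, so treat it as a hint rather than proof that the field is gone.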

What our PDF cleaner removes

For every PDF, we strip:

The cleaned PDF is then re-saved as a new file via pdf-lib, so the removed metadata cannot be recovered through incremental update inspection, a forensic technique that recovers old states of a PDF from the bytes preserved between revisions.
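
For readers who want to see what incremental update inspection looks for, here is a minimal heuristic sketch. Each incremental save appends a new cross-reference section that ends in its own %%EOF marker, so counting those markers gives a rough signal of how many revisions a file carries. The function name countEofMarkers is hypothetical. The count is only a hint: linearized PDFs can legitimately contain two markers, and the byte sequence could also appear inside stream data.

```typescript
// Heuristic: count "%%EOF" markers in the raw bytes of a PDF.
// More than one marker suggests (but does not prove) that earlier
// revisions are still present in the file.
function countEofMarkers(bytes: Uint8Array): number {
  // Map each byte to one character so binary data survives the conversion.
  const text = Array.from(bytes, (b) => String.fromCharCode(b)).join("");
  // Splitting on the marker yields one more piece than there are markers.
  return text.split("%%EOF").length - 1;
}
```

A file that reports a single marker after cleaning is consistent with a full rewrite. A higher count is a reason to inspect the file further before sending it.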

Limits to what cleaning fixes

Removing metadata does not remove visible content. If sensitive information is written into the body of the document, only redacting (or deleting and re-saving) that content will remove it. Some redaction tools draw black rectangles that look like they cover text but are only layered on top of it: the text underneath is still selectable, copyable, and searchable. Always do a final inspection in a different PDF viewer before relying on a redacted PDF for high-stakes purposes.

Use our PDF metadata removal tool to strip metadata from your PDFs entirely in your browser — no upload, no signup.