Metadata gone. File intact. Nothing uploaded.

Remove PDF metadata.

Strip the Author, Title, Producer, Creator, CreationDate, ModDate, and XMP packet from any PDF — without uploading the file. Drop your PDF below and we'll show you every field that's in it before we remove anything.

Drop your PDF. Nothing uploads.

Multiple files OK · up to 500 MB each

What PDF metadata actually contains

A PDF can carry two parallel metadata stores: the legacy Info dictionary (Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) and the modern XMP packet, an XML block that mirrors and extends the Info dictionary. Many PDF cleaners wipe only one. We wipe both.

The Author field in a PDF is usually filled in automatically by the application that created the file. If you created the PDF in Word, it is typically the user name configured in Word. If you exported from Photoshop or InDesign, it may be the name attached to your Adobe account or system login. This is one of the most common sources of accidental identity exposure in shared documents.

What our tool removes from PDFs

What our tool preserves

We do not touch the visible content of the PDF. Text, images, fonts, page layout, form fields, and hyperlinks all remain intact, and the PDF opens and renders exactly as before, just without the identifying metadata. Signature fields stay visible on the page, but an existing digital signature will no longer validate once the file is rewritten.

If you need to redact visible content (names, account numbers, locations written into the document), that requires a separate tool. Metadata removal only handles the hidden descriptive data.

How to remove PDF metadata with this tool

  1. Drop your PDF in the box above. It stays on your device — your browser parses it locally.
  2. Review the inspector. We list every Info dictionary field and XMP property currently in the file. You see exactly what's about to be removed.
  3. Pick a preset. "Privacy" wipes all personal/dating data. "Web Publishing" keeps copyright. "Legal Filing" preserves Title and Subject while wiping authorship. "Maximum Privacy" also strips embedded image EXIF.
  4. Download the cleaned PDF. The file is re-serialized cleanly so no metadata can be recovered via incremental update inspection.
  5. (Optional) Download the audit report. An HTML report listing every field that was removed, with SHA-256 hashes of the before and after files.

How a PDF stores metadata, structurally

To understand what we remove, it helps to understand how a PDF is built. A PDF file is a collection of numbered objects — dictionaries, streams, arrays, and primitives — tied together by a cross-reference table (the xref) that records the byte offset of every object. At the end of the file, a trailer points to two key objects: the document catalog (the /Root) and the information dictionary (the /Info).

The Info dictionary is the classic metadata store. In raw PDF syntax it looks like this:

1 0 obj
<< /Title (Q3 Financial Review)
   /Author (jane.doe)
   /Creator (Microsoft Word)
   /Producer (Acrobat Distiller 23.0)
   /CreationDate (D:20250914103205+01'00')
   /ModDate (D:20250914170412+01'00') >>
endobj
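
To make the layout above concrete, here is a minimal Python sketch that pulls the key/value pairs out of that raw object. The names `PAIR` and `parse_info_dict` are hypothetical, and the regular expression assumes simple literal strings with no nested or escaped parentheses and no hex strings, so treat it as an illustration rather than a real PDF parser.

```python
import re

# Matches "/Key (value)" pairs where the value is a simple literal string.
# Assumption: no nested or escaped parentheses and no hex strings, both of
# which real PDFs may contain; a full parser is needed for those.
PAIR = re.compile(rb"/([A-Za-z]+)\s*\(([^()]*)\)")


def parse_info_dict(obj_bytes: bytes) -> dict[str, str]:
    """Return the key/value pairs found in one raw Info dictionary object."""
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in PAIR.findall(obj_bytes)
    }


# The sample object shown above, as raw bytes.
SAMPLE = b"""1 0 obj
<< /Title (Q3 Financial Review)
   /Author (jane.doe)
   /Creator (Microsoft Word)
   /Producer (Acrobat Distiller 23.0)
   /CreationDate (D:20250914103205+01'00')
   /ModDate (D:20250914170412+01'00') >>
endobj
"""

if __name__ == "__main__":
    for key, value in parse_info_dict(SAMPLE).items():
        print(f"{key}: {value}")
```

Anyone with a text editor can do the same by eye, which is why these fields count as exposed.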

The modern parallel is the XMP metadata stream, an XML packet stored as its own object and referenced from the catalog under /Metadata. It is written in RDF/XML, draws on vocabularies such as Dublin Core, and can duplicate everything in the Info dictionary plus derivation history, document IDs, and tool-specific extensions. Because the two stores exist in parallel, a cleaner that wipes only one leaves the other fully readable. Our tool clears both: it empties every Info dictionary key and removes the /Metadata reference from the catalog so the XMP packet is no longer part of the document.
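
As a rough illustration of what an XMP packet looks like and how easily it can be read, the Python sketch below locates packets by their `<?xpacket ... ?>` wrapper and extracts the Dublin Core creator. The sample packet and the helper names (`find_xmp_packets`, `xmp_creators`) are made up for this example, and the byte scan assumes the packet is stored uncompressed and unencrypted.

```python
import re
import xml.etree.ElementTree as ET

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)
DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Return the XML body of every XMP packet visible in the raw bytes."""
    return [match.group(1).strip() for match in XPACKET.finditer(data)]


def xmp_creators(packet: bytes) -> list[str]:
    """Return the dc:creator entries from one XMP packet body."""
    root = ET.fromstring(packet)
    return [
        li.text or ""
        for creator in root.iter(DC + "creator")
        for li in creator.iter(RDF + "li")
    ]


# Hypothetical packet, shortened for illustration.
SAMPLE = b"""<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:creator><rdf:Seq><rdf:li>jane.doe</rdf:li></rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""

if __name__ == "__main__":
    for packet in find_xmp_packets(SAMPLE):
        print(xmp_creators(packet))
```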

The byte-accounting: why the file size changes

When metadata is removed, the file shrinks by roughly the combined size of the cleared fields plus the structural overhead that referenced them. You can express the cleaned size as:

cleaned_size = original_size − Σ(field_bytesᵢ) − xref_overhead + rebuild_padding

Here Σ(field_bytesᵢ) is the sum of the byte lengths of every removed metadata field, xref_overhead is the cross-reference entries that pointed to now-deleted objects, and rebuild_padding accounts for the fact that a re-serialized PDF re-numbers and re-packs its object table, which can add or remove a few bytes. Because the document is rewritten from a clean object graph rather than appended to, old metadata cannot be recovered by reading earlier file revisions — a recovery technique that works against tools which merely append an "update" that hides the old values without deleting them.
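
The formula can be worked through with concrete numbers. In the sketch below, the field values come from the sample Info dictionary shown earlier, while the original size, the single 20-byte cross-reference entry, and the 12 bytes of repacking change are hypothetical inputs chosen only to show the arithmetic.

```python
def cleaned_size(original_size: int, field_bytes: list[int],
                 xref_overhead: int, rebuild_padding: int) -> int:
    """cleaned = original - sum(field_bytes) - xref_overhead + rebuild_padding"""
    return original_size - sum(field_bytes) - xref_overhead + rebuild_padding


# Values copied from the sample Info dictionary earlier on this page.
REMOVED_VALUES = [
    "Q3 Financial Review",
    "jane.doe",
    "Microsoft Word",
    "Acrobat Distiller 23.0",
    "D:20250914103205+01'00'",
    "D:20250914170412+01'00'",
]
FIELD_BYTES = [len(value.encode("latin-1")) for value in REMOVED_VALUES]

# Hypothetical inputs: a 1,000,000-byte file, one dropped xref entry
# (20 bytes in a classic cross-reference table), and a net 12 bytes
# added by repacking. rebuild_padding may also be negative.
ESTIMATE = cleaned_size(1_000_000, FIELD_BYTES,
                        xref_overhead=20, rebuild_padding=12)

if __name__ == "__main__":
    print("metadata bytes removed:", sum(FIELD_BYTES))
    print("estimated cleaned size:", ESTIMATE)
```

The result is an estimate: the real difference also depends on how the writer packs and compresses the rebuilt object table.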

How the integrity check works (SHA-256)

Every cleaned file in our tool is fingerprinted with the SHA-256 cryptographic hash function, both before and after cleaning. SHA-256 maps an input of any length to a fixed 256-bit (32-byte) output, conventionally written as 64 hexadecimal characters. It has two properties that make it ideal for an audit trail:

Finding any two different files that share a SHA-256 hash (a collision) is expected to take about 2¹²⁸ hash evaluations, and finding a second file that matches one specific hash takes about 2²⁵⁶. Both figures are so large that these attacks are treated as computationally infeasible. That is why the hash functions as a tamper-evident seal: if the report's hash matches your file, the file is exactly the one that was cleaned.
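
If you want to reproduce a fingerprint yourself, the following Python sketch computes SHA-256 with the standard `hashlib` module. The function name `sha256_of_stream` and the 1 MiB chunk size are choices made for this example, not details of the tool.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()  # 64 hexadecimal characters


if __name__ == "__main__":
    # In-memory stand-ins for a file before and after cleaning.
    before = io.BytesIO(b"<< /Author (jane.doe) >>")
    after = io.BytesIO(b"<< /Author () >>")
    print("before:", sha256_of_stream(before))
    print("after: ", sha256_of_stream(after))
```

To hash a real file, pass `open("cleaned.pdf", "rb")` (a placeholder filename) instead of the `BytesIO` objects and compare the result with the value in the audit report.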

When metadata removal is not enough

Removing metadata addresses the hidden, descriptive layer of a document. It does not touch the visible content. Three common situations call for more than metadata removal:

A worked example: before and after

The clearest way to see what the tool does is to look at the raw Info dictionary of a real-world PDF (say, a quarterly report exported from Word) and the same object after cleaning. The first dictionary below is what an attacker reads with a text editor; the second is what remains after the Privacy preset runs.

Before — exposed

1 0 obj
<< /Title (Q3 Board Pack v7 FINAL)
   /Author (m.okafor)
   /Subject (Internal — do not circulate)
   /Keywords (layoffs, restructure, 2025)
   /Creator (Microsoft Word 2024)
   /Producer (Acrobat Distiller 23.0)
   /CreationDate (D:20250903081544+01'00')
   /ModDate (D:20250914170412+01'00') >>
endobj

After — cleaned

1 0 obj
<< /Title ()
   /Author ()
   /Subject ()
   /Keywords ()
   /Creator ()
   /Producer ()
   /CreationDate (D:19700101000000Z)
   /ModDate (D:19700101000000Z) >>
endobj

Notice how much that "before" block gives away that has nothing to do with the visible document: the author's username, an internal classification note, keywords naming a sensitive project, the exact authoring software, and a precise timeline of when the file was created and last touched. The /Metadata XMP stream (not shown) duplicates most of this in XML and is removed from the catalog at the same time. After cleaning, every field is emptied and the dates are zeroed to the Unix epoch, so nothing identifying remains.
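
The date strings in both blocks follow the PDF date format, `D:YYYYMMDDHHmmSS` followed by `Z` or a `+HH'mm'` style offset. The sketch below decodes them in Python to show how precise the exposed timeline is. The name `parse_pdf_date` is hypothetical, and the pattern only accepts full-length dates; shorter forms that the PDF format permits are not handled here.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by "Z" or an offset such as +01'00'.
# Assumption: only full-length dates are accepted in this sketch.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:(Z)|([+-])(\d{2})'(\d{2})'?)?"
)


def parse_pdf_date(text: str) -> datetime:
    """Convert a full-length PDF date string to an aware datetime."""
    match = PDF_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported PDF date: {text!r}")
    parts = [int(group) for group in match.groups()[:6]]
    if match.group(8):
        sign = 1 if match.group(8) == "+" else -1
        offset = sign * timedelta(hours=int(match.group(9)),
                                  minutes=int(match.group(10)))
        tz = timezone(offset)
    else:
        tz = timezone.utc  # "Z", or no offset given (treated as UTC here)
    return datetime(*parts, tzinfo=tz)


if __name__ == "__main__":
    created = parse_pdf_date("D:20250903081544+01'00'")
    modified = parse_pdf_date("D:20250914170412+01'00'")
    print("editing window:", modified - created)
    print("cleaned value: ", parse_pdf_date("D:19700101000000Z"))
```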

Complete PDF metadata field reference

This table lists every field the tool inspects in a PDF, what each one can reveal about you or your organization, and how each preset treats it. "Removed" means the field is emptied; "Kept" means it is preserved because removing it would break rendering or because a preset deliberately retains it.

Field | What it reveals | Privacy | Legal Filing | Max Privacy
/Author | The OS username or full name of whoever created the file | Removed | Removed | Removed
/Title | Document title, often an internal working name | Removed | Kept | Removed
/Subject | Description or classification note | Removed | Kept | Removed
/Keywords | Tags, often naming projects or clients | Removed | Removed | Removed
/Creator | The application that authored the content (e.g. Word) | Removed | Removed | Removed
/Producer | The library that wrote the PDF (e.g. Distiller) | Removed | Removed | Removed
/CreationDate | Exact timestamp the file was created | Removed | Removed | Removed
/ModDate | Exact timestamp of last modification | Removed | Removed | Removed
/Metadata (XMP) | XML packet duplicating the above plus edit history and document IDs | Removed | Removed | Removed
Document / Instance ID | Unique identifiers that link file revisions together | Removed | Removed | Removed
Embedded image EXIF | GPS and camera data inside pictures placed in the PDF | Kept | Kept | Removed
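
The preset columns above can also be read as data. The Python sketch below encodes the table as a lookup. The names `ALL_FIELDS`, `KEPT_BY_PRESET`, and `fields_removed` are hypothetical and do not describe the tool's internal configuration; only the three presets listed in the table are included.

```python
# Every field listed in the reference table, in table order.
ALL_FIELDS = [
    "/Author", "/Title", "/Subject", "/Keywords", "/Creator", "/Producer",
    "/CreationDate", "/ModDate", "/Metadata (XMP)",
    "Document / Instance ID", "Embedded image EXIF",
]

# Fields each preset keeps; everything else is removed.
KEPT_BY_PRESET = {
    "Privacy": {"Embedded image EXIF"},
    "Legal Filing": {"/Title", "/Subject", "Embedded image EXIF"},
    "Max Privacy": set(),
}


def fields_removed(preset: str) -> list[str]:
    """Return the fields a preset removes, in table order."""
    kept = KEPT_BY_PRESET[preset]
    return [field for field in ALL_FIELDS if field not in kept]


if __name__ == "__main__":
    for preset in KEPT_BY_PRESET:
        print(f"{preset}: removes {len(fields_removed(preset))} fields")
```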

How this has actually burned people

Case · Government redaction

The "redacted" report that wasn't

Government agencies have repeatedly published PDFs where the visible text was blacked out but the metadata was left intact. In several well-documented cases, journalists opened the file properties and found the author's name, the originating department, and revision timestamps that contradicted the official account of when a document existed.

The lesson is that redaction and metadata removal are two separate jobs. Covering text on the page does nothing to the Info dictionary or XMP packet sitting in the file's structure.

Case · Anonymous submission

The whistleblower unmasked by /Author

Someone submits a sensitive document anonymously, having carefully removed their name from the body text. But the PDF was exported from their personal copy of Word, so the /Author field still carries their account name and the /Producer field narrows down their software environment. A single glance at document properties undoes all the care taken with the visible content.

Anyone sharing a document where authorship must stay private should treat metadata removal as mandatory, not optional.

Case · Competitive intelligence

The proposal that revealed the whole timeline

A vendor sends a polished proposal PDF. The recipient checks the metadata and sees a /CreationDate from the morning of the deadline and a /ModDate fifteen minutes before sending — revealing the proposal was rushed. In other cases, keywords and titles have exposed that the same document was reused across multiple competing clients.

How to verify the file is clean

You do not have to take any tool's word for it. After cleaning, you can confirm the metadata is gone using software you already have:

The audit report the tool generates also records a SHA-256 hash of the cleaned file, so you can prove the file you are sharing is exactly the one that was cleaned.
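
For a scripted check, the Python sketch below scans the raw bytes of a file for leftover metadata. It is a heuristic under stated assumptions, not a PDF parser: it only sees uncompressed, unencrypted objects, it ignores hex strings, and because bookmarks also use a /Title key it can report false positives. The function name `residual_metadata` is made up for this example.

```python
import re

# Text fields that should be empty after cleaning. The date fields are
# skipped because a cleaned file still carries the zeroed epoch value.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer")
NONEMPTY = re.compile(rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()]+)\)")


def residual_metadata(pdf_bytes: bytes) -> list[str]:
    """Return human-readable findings; an empty list means nothing was seen."""
    findings = []
    for key, value in NONEMPTY.findall(pdf_bytes):
        findings.append(
            f"/{key.decode()} still holds {value.decode('latin-1')!r}"
        )
    if b"<?xpacket" in pdf_bytes:
        findings.append("an XMP packet marker is still present")
    # Several %%EOF markers can indicate appended revisions. Some layouts
    # (for example linearized files) also contain more than one, so this
    # is a hint rather than proof.
    if pdf_bytes.count(b"%%EOF") > 1:
        findings.append("more than one %%EOF marker: possible earlier revisions")
    return findings


if __name__ == "__main__":
    sample = b"1 0 obj\n<< /Title () /Author () >>\nendobj\n%%EOF\n"
    print(residual_metadata(sample) or "no residual metadata found")
```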

Frequently asked questions

Will removing metadata break my PDF or change how it looks?

No. The page content, fonts, images, form fields, and layout are untouched. Only the Info dictionary and XMP packet are cleared. The document opens and renders identically.

Does this work on password-protected or encrypted PDFs?

If a PDF is encrypted, you will generally need to supply the password (or remove the encryption) before metadata can be rewritten, because the metadata objects themselves are encrypted. For owner-password-protected files that still open without a password, results vary by how the file was secured.
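
An encrypted PDF declares its encryption with an /Encrypt entry in the trailer, so you can usually tell in advance whether a password will be needed. The Python sketch below looks for that entry in the raw bytes. The function name `looks_encrypted` is hypothetical, and a byte scan like this can be fooled by unusual files, so treat the result as a hint.

```python
import re

# The trailer's /Encrypt entry is either an indirect reference ("9 0 R")
# or an inline dictionary ("<< ... >>").
ENCRYPT_ENTRY = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if an /Encrypt entry appears in the raw bytes."""
    return ENCRYPT_ENTRY.search(pdf_bytes) is not None


if __name__ == "__main__":
    plain = b"trailer\n<< /Size 12 /Root 1 0 R /Info 2 0 R >>\n"
    locked = b"trailer\n<< /Size 12 /Root 1 0 R /Encrypt 9 0 R >>\n"
    print(looks_encrypted(plain), looks_encrypted(locked))
```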

Can the original author be recovered after cleaning?

Because the file is rebuilt from a clean object graph rather than appended to, the cleared Info and XMP values are not retained anywhere in the output. There is no earlier "revision" inside the file to recover them from.

Is there a file-size limit?

Processing happens in your browser using your device's memory, so the practical limit is your available RAM rather than a server cap. Files up to several hundred megabytes process comfortably on a typical laptop.

Does it remove metadata from images embedded inside the PDF?

On the Maximum Privacy preset, yes — EXIF and GPS data inside pictures placed in the PDF are stripped. On the default Privacy preset, embedded image data is left alone so the operation stays fast; switch presets if you need it removed.

What is the difference between the Info dictionary and XMP?

They are two parallel metadata stores. The Info dictionary is the original PDF mechanism (simple key-value pairs); XMP is a newer XML-based packet that can hold the same data plus edit history and identifiers. A cleaner that wipes only one leaves the other readable, which is why this tool removes both.

Does cleaning remove digital signatures?

If a PDF is digitally signed, rewriting the file invalidates the signature, because the signature is computed over the file's bytes and this tool re-serializes the whole document. If you need the signature intact, clean the document before it is signed, not after.
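
A signed PDF records which bytes the signature covers in a /ByteRange array of offset and length pairs, which is why moving or rewriting those bytes breaks validation. The Python sketch below reads that array from raw bytes so you can check for a signature before cleaning. The sample dictionary and the name `signed_ranges` are made up for illustration, and the scan assumes the signature dictionary is stored uncompressed.

```python
import re

# /ByteRange [offset1 length1 offset2 length2]
BYTE_RANGE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def signed_ranges(pdf_bytes: bytes) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return the (offset, length) pairs of every signature found."""
    ranges = []
    for match in BYTE_RANGE.finditer(pdf_bytes):
        off1, len1, off2, len2 = (int(group) for group in match.groups())
        ranges.append(((off1, len1), (off2, len2)))
    return ranges


if __name__ == "__main__":
    # Hypothetical signature dictionary fragment.
    sample = b"<< /Type /Sig /ByteRange [0 840 960 240] /Contents <00> >>"
    found = signed_ranges(sample)
    if found:
        print("signed; covered ranges:", found)
    else:
        print("no signature found")
```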

Will it strip text I can see on the page?

No. This tool only removes hidden metadata. Visible content — including any names, addresses, or numbers written into the document body — remains. Removing visible content is a separate task called redaction.

Is anything uploaded to a server?

No. The entire process runs in your browser using JavaScript and the pdf-lib library. Your file never leaves your device, which is the whole point of the tool.

Can I clean several PDFs at once?

Yes. Drop multiple files and each is processed locally, then returned individually or as a single ZIP archive accompanied by one audit report covering the whole batch.
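
The batch flow can be sketched with Python's standard library: bundle the cleaned files into one ZIP and add a single report that lists a SHA-256 hash per file. This is an illustration under assumptions, not the tool's code. The report here is JSON named `audit-report.json` and records only the hashes of the cleaned files, and `bundle_cleaned` is a hypothetical name.

```python
import hashlib
import io
import json
import zipfile


def bundle_cleaned(files: dict[str, bytes]) -> bytes:
    """Pack cleaned files plus one hash report into an in-memory ZIP."""
    report = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in files.items()
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
        archive.writestr("audit-report.json", json.dumps(report, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder contents standing in for two cleaned PDFs.
    batch = {"a.pdf": b"%PDF-1.7 first", "b.pdf": b"%PDF-1.7 second"}
    archive_bytes = bundle_cleaned(batch)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        print(archive.namelist())
```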