Metadata gone. File intact. Nothing uploaded.

Remove Word metadata.

Strip author, company, tracked changes, comments, revision history, and embedded image EXIF from .docx files. Everything Microsoft's Document Inspector misses — handled in your browser, with no upload.

Drop your Word document.


Why Word documents leak so much

A .docx file isn't really one file. It's a ZIP archive containing a collection of XML documents and any embedded images. Microsoft's "Inspect Document" feature operates at the Word application layer, which means it cleans the well-known fields but can leave others behind.

Our tool unzips the archive, parses the XML directly, and scrubs metadata at every layer where it hides — including the layers Word doesn't expose to its own inspector.

What our tool removes from DOCX files

What Document Inspector misses

Microsoft Word's built-in Document Inspector is a good first pass but has known gaps, including custom properties that it does not always remove and EXIF data inside embedded images.

How to use this tool

  1. Drop your .docx file in the box above
  2. Review the inspector — see every field that's about to be removed
  3. Pick a preset (Privacy is the default; Maximum Privacy also wipes embedded EXIF)
  4. Download the cleaned file
  5. Optionally download the audit report with SHA-256 verification

Important: This tool removes metadata. It does not redact visible text. If sensitive information is written into the body of the document, you'll need to remove or redact it separately before sharing.

The anatomy of a .docx file

A modern Word document is not a single binary blob — it is a ZIP archive (the Office Open XML, or OOXML, format) containing a small filesystem of XML parts. If you rename a .docx to .zip and open it, you will find a structure like this:

my-document.docx
├── [Content_Types].xml
├── _rels/
├── docProps/
│   ├── core.xml      ← author, dates, revision, title
│   ├── app.xml       ← company, template, editing time
│   └── custom.xml    ← CRM / DMS injected properties
└── word/
    ├── document.xml  ← the actual text + tracked changes
    ├── comments.xml  ← reviewer comments
    └── media/        ← embedded images (with their own EXIF)

Metadata is spread across several of these parts, which is exactly why a single "remove properties" action in Word does not catch all of it. Our cleaner unzips the archive in your browser, rewrites the relevant XML parts, scrubs the document body of revision markup, strips EXIF from any images in word/media/, then repacks the archive.
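To see this layout for yourself, the short Python sketch below lists the metadata-bearing parts of an archive. It is an illustration using only the standard library, not the cleaner's own code (which runs in the browser). The helper names and the tiny stand-in archive are made up for the example.

```python
import io
import zipfile

# Parts from the tree above that commonly carry metadata.
METADATA_PARTS = (
    "docProps/core.xml",
    "docProps/app.xml",
    "docProps/custom.xml",
    "word/comments.xml",
)


def metadata_parts(docx_bytes):
    """Return the names of metadata-bearing parts present in a .docx archive."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = archive.namelist()
    found = [name for name in names if name in METADATA_PARTS]
    # Embedded images live under word/media/ and may carry their own EXIF.
    found += [n for n in names if n.startswith("word/media/") and not n.endswith("/")]
    return found


def make_sample_docx():
    """Build a tiny stand-in archive (hypothetical, not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("docProps/core.xml", "<coreProperties/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.jpeg", b"\xff\xd8\xff\xd9")
    return buffer.getvalue()


sample = make_sample_docx()
found_parts = metadata_parts(sample)
print(found_parts)
```

To inspect a real file, pass the bytes of your own document (for example `open("my-document.docx", "rb").read()`) instead of the sample.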

How tracked changes are stored — and why "Accept All" is not enough

When change tracking is on, Word does not simply edit the text. It wraps every change in markup. An inserted phrase is stored inside a <w:ins> element and a deleted phrase inside a <w:del> element, each carrying an author name and a timestamp:

<w:ins w:author="jane.doe" w:date="2025-09-14T10:32:00Z">
  <w:r><w:t>confidential figure</w:t></w:r>
</w:ins>

Clicking "Accept All Changes" resolves the visible text, but depending on configuration the document can retain related revision metadata — formatting-change records (<w:rPrChange>, <w:pPrChange>), move operations, and the author/date attributes themselves. Our cleaner explicitly walks word/document.xml and removes every revision element: it keeps the content of insertions (treating them as accepted), discards deletions, and deletes the change-tracking attributes entirely so no author or timestamp survives.
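The keep-insertions, discard-deletions rule described above can be sketched in a few lines of Python. This is a simplified illustration rather than the cleaner's actual code. It assumes the standard WordprocessingML namespace, handles only the four element types named here (move markup and paragraph-mark edge cases are left out), and the sample XML is invented for the demo.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

UNWRAP = {f"{{{W}}}ins"}  # keep the children, drop the wrapper and its attributes
DROP = {f"{{{W}}}del", f"{{{W}}}rPrChange", f"{{{W}}}pPrChange"}  # remove entirely


def strip_revisions(element):
    """Recursively resolve revision markup beneath `element`, in place."""
    kept = []
    for child in list(element):
        if child.tag in DROP:
            continue  # deleted text and formatting history are discarded
        strip_revisions(child)
        if child.tag in UNWRAP:
            kept.extend(list(child))  # treat the insertion as accepted
        else:
            kept.append(child)
    element[:] = kept


# Invented sample: one formatting change, one deletion, one insertion.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body><w:p>
<w:r><w:rPr><w:b/><w:rPrChange w:author="jane.doe" w:date="2025-09-14T10:30:00Z"><w:rPr/></w:rPrChange></w:rPr><w:t>Offer: </w:t></w:r>
<w:del w:author="jane.doe" w:date="2025-09-14T10:31:00Z"><w:r><w:delText>old figure</w:delText></w:r></w:del>
<w:ins w:author="jane.doe" w:date="2025-09-14T10:32:00Z"><w:r><w:t>confidential figure</w:t></w:r></w:ins>
</w:p></w:body></w:document>"""

root = ET.fromstring(SAMPLE)
strip_revisions(root)
cleaned_xml = ET.tostring(root, encoding="unicode")
print(cleaned_xml)
```

Unwrapping rather than deleting `<w:ins>` is what keeps the inserted text in the final document while the author and date attributes disappear with the wrapper.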

The verification math behind the audit report

After cleaning, the tool computes a SHA-256 hash of both the original and the cleaned file. SHA-256 reduces a file of any size to a fixed 256-bit fingerprint, written as 64 hexadecimal characters. Because the function exhibits the avalanche effect (flipping one input bit flips about half the output bits), the before and after hashes look entirely unrelated, which is visible proof the file changed. The chance of two given distinct files producing the same hash is about 1 / 2²⁵⁶, and even a deliberate search for any colliding pair is expected to take on the order of 2¹²⁸ hash computations, which is treated as computationally infeasible. A matching hash therefore reliably identifies a specific file.
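The fingerprint length and the avalanche effect are easy to observe with Python's standard `hashlib` module. The sketch below is only an illustration of the property, not the audit report's code. The input bytes and the helper names are made up for the demo.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of `data` as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a, b):
    """Count the bit positions at which the SHA-256 digests of a and b differ."""
    x = int.from_bytes(hashlib.sha256(a).digest(), "big")
    y = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(x ^ y).count("1")


original = b"<dc:creator>Ama Mensah</dc:creator>"
# Flip a single bit in the first byte of the input.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

print(sha256_hex(original))
print(sha256_hex(flipped))
print(len(sha256_hex(original)), "hex characters")
print(differing_bits(original, flipped), "of 256 output bits differ")
```

The last line should print a count somewhere near 128, which is the "about half the output bits" behaviour described above.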

A worked example: before and after

A .docx stores its core metadata in docProps/core.xml. Here is what that file looks like in a typical document straight out of Word, and what remains after the Privacy preset runs. The "before" block is what anyone can read by unzipping your document; the "after" block is what survives.

Before — core.xml exposed

<cp:coreProperties>
  <dc:creator>Ama Mensah</dc:creator>
  <cp:lastModifiedBy>legal-review</cp:lastModifiedBy>
  <dcterms:created>2025-08-30T09:14:00Z</dcterms:created>
  <dcterms:modified>2025-09-14T17:04:00Z</dcterms:modified>
  <cp:revision>47</cp:revision>
  <dc:title>Settlement draft — confidential</dc:title>
</cp:coreProperties>

After — cleaned

<cp:coreProperties>
  <dc:creator></dc:creator>
  <cp:lastModifiedBy></cp:lastModifiedBy>
  <dcterms:created>2000-01-01T00:00:00Z</dcterms:created>
  <dcterms:modified>2000-01-01T00:00:00Z</dcterms:modified>
</cp:coreProperties>

The "before" version names the original author, the account that performed the legal review, the full creation and modification timeline, a revision count revealing the document went through 47 saves, and a title marking it confidential. The revision count alone can be telling: 47 revisions on a one-page letter signals it was heavily negotiated. After cleaning, the creator and editor are blank, the revision count and title are gone, and the timestamps are reset to a neutral placeholder date.
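The before-and-after transformation can be reproduced with a short Python sketch. It is a minimal illustration, assuming the standard OOXML core-properties namespaces (which the snippets above omit for brevity). The function and constant names are hypothetical, not the cleaner's real API.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

BLANK = ("dc:creator", "cp:lastModifiedBy")          # keep element, empty it
REMOVE = ("cp:revision", "dc:title", "dc:subject", "cp:keywords")
RESET = ("dcterms:created", "dcterms:modified")      # overwrite with placeholder
PLACEHOLDER = "2000-01-01T00:00:00Z"


def clean_core(xml_text):
    """Return core.xml with identifying fields blanked, removed, or reset."""
    root = ET.fromstring(xml_text)
    for name in BLANK:
        for element in root.findall(name, NS):
            element.text = None
    for name in REMOVE:
        for element in root.findall(name, NS):
            root.remove(element)
    for name in RESET:
        for element in root.findall(name, NS):
            element.text = PLACEHOLDER
    return ET.tostring(root, encoding="unicode")


BEFORE = f"""<cp:coreProperties xmlns:cp="{NS['cp']}" xmlns:dc="{NS['dc']}" xmlns:dcterms="{NS['dcterms']}">
  <dc:creator>Ama Mensah</dc:creator>
  <cp:lastModifiedBy>legal-review</cp:lastModifiedBy>
  <dcterms:created>2025-08-30T09:14:00Z</dcterms:created>
  <dcterms:modified>2025-09-14T17:04:00Z</dcterms:modified>
  <cp:revision>47</cp:revision>
  <dc:title>Settlement draft, confidential</dc:title>
</cp:coreProperties>"""

after = clean_core(BEFORE)
print(after)
```

ElementTree writes the emptied elements in self-closing form (for example `<dc:creator />`), which is equivalent XML to the empty open-and-close pairs shown above.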

Complete Word metadata field reference

Word documents scatter metadata across several parts inside the ZIP. This table covers each location, what it exposes, and how the cleaner treats it.

Location / field                      | What it reveals                                          | Action
core.xml · creator                    | Original author's name or username                       | Removed
core.xml · lastModifiedBy             | Who last saved the file                                  | Removed
core.xml · created / modified         | Exact creation and edit timestamps                       | Reset
core.xml · revision                   | Number of times the document was saved                   | Removed
core.xml · title / subject / keywords | Internal naming and tags                                 | Removed
app.xml · Company                     | Organization name from the Office license                | Removed
app.xml · Manager                     | Manager name if set in template                          | Removed
app.xml · Template                    | Path to the template, often a network share              | Removed
app.xml · TotalTime                   | Cumulative minutes spent editing                         | Removed
custom.xml                            | CRM / DMS injected properties (matter IDs, client codes) | Deleted
document.xml · w:ins / w:del          | Tracked insertions and deletions with author + date      | Removed
document.xml · rPrChange / pPrChange  | Formatting-change history                                | Removed
comments.xml                          | Reviewer comments with names and timestamps              | Deleted
word/media/                           | EXIF/GPS inside embedded photos                          | Stripped
Document body text                    | Visible content you typed                                | Kept

How this has actually burned people

Case · Legal disclosure

The settlement offer that revealed the floor

A law firm sends a counterparty a Word document with "Accept All Changes" applied to the visible text. But the file still contains tracked-change history showing earlier, lower settlement figures that were edited upward before sending. Opposing counsel recovers the deleted numbers from the XML and learns exactly how much room there is to negotiate.

This is the single most common way Word metadata causes real damage: "Accept All" makes the page look clean while leaving the negotiation history inside the file structure.

Case · Branding

The agency name in the client's report

A consultancy delivers a strategy document under the client's logo. The Company field in app.xml, populated from the consultancy's Office installation, still carries the consultancy's name. When the client forwards the document to their board, the board sees who really wrote it.

Case · Source protection

The leaked memo traced to one laptop

An internal memo is leaked to the press. Investigators unzip the document and read the creator and lastModifiedBy fields, plus the Template path pointing to a specific department's network folder. The metadata narrows the source to a handful of people even though the visible text gives nothing away.

How Word's own "Inspect Document" compares

Microsoft Word includes a built-in tool at File → Info → Check for Issues → Inspect Document. It is genuinely useful and worth running, but it works through the Word application rather than on the raw archive, so it has real gaps that catch people out.

Our cleaner works on the file directly rather than through Word, so it catches the parts the application-level inspector leaves behind, and it does so the same way every time.

How to verify the file is clean
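
One way to check a cleaned file by hand is to unzip it and search the parts described on this page for leftovers. The Python sketch below does that with the standard library. It is a rough check under stated assumptions, not a substitute for the audit report: it assumes the conventional `w:`, `dc:`, and `cp:` prefixes, looks only at the parts listed here, and its function names and sample archives are invented for the demo.

```python
import io
import re
import zipfile

# Revision elements and author attributes in the document body.
REVISION_PATTERN = re.compile(rb"<w:(ins|del|rPrChange|pPrChange)\b|\bw:author=")
# A creator or lastModifiedBy element that still has text in it.
FILLED_FIELD = re.compile(rb"<(dc:creator|cp:lastModifiedBy)>[^<]+<")


def find_leftovers(docx_bytes):
    """Return a list of human-readable findings; an empty list means none found."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = set(archive.namelist())
        for part in ("docProps/custom.xml", "word/comments.xml"):
            if part in names:
                findings.append(f"{part} is still present")
        if "word/document.xml" in names:
            if REVISION_PATTERN.search(archive.read("word/document.xml")):
                findings.append("word/document.xml still contains revision markup")
        if "docProps/core.xml" in names:
            if FILLED_FIELD.search(archive.read("docProps/core.xml")):
                findings.append("docProps/core.xml still names an author or editor")
    return findings


def build(parts):
    """Pack a dict of {part name: XML text} into an in-memory archive (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in parts.items():
            archive.writestr(name, text)
    return buffer.getvalue()


dirty = build({
    "docProps/core.xml": "<cp:coreProperties><dc:creator>Ama Mensah</dc:creator></cp:coreProperties>",
    "docProps/custom.xml": "<Properties/>",
    "word/document.xml": '<w:document><w:ins w:author="jane.doe"><w:r/></w:ins></w:document>',
})
clean = build({
    "docProps/core.xml": "<cp:coreProperties><dc:creator></dc:creator></cp:coreProperties>",
    "word/document.xml": "<w:document><w:r/></w:document>",
})

dirty_findings = find_leftovers(dirty)
clean_findings = find_leftovers(clean)
print(dirty_findings)
print(clean_findings)
```

An empty result only means none of these specific markers were found; it does not cover visible body text or hidden text, which the cleaner leaves in place.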

Frequently asked questions

Will cleaning change my document's formatting or content?

No. Text, styles, fonts, tables, and images are preserved. Only metadata, comments, and tracked-change records are removed. Accepted edits remain in the final text.

Does this remove comments as well as tracked changes?

Yes. The comment parts (comments.xml and the extended comment files that store reply threads) are deleted, and the comment reference markers in the document body are removed with them.

What is in custom.xml and why does it matter?

Enterprise systems — document management platforms, CRMs, contract tools — often inject custom properties into docProps/custom.xml. These can include internal matter numbers, client IDs, and workflow states. Word's own inspector does not always remove them; our cleaner deletes the part entirely.
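
Deleting a part cleanly means removing the file and the two places the package refers to it: the relationship in _rels/.rels and the entry in [Content_Types].xml. The Python sketch below shows one way to do that with the standard library. It is an assumption about how such a deletion can be done, not a description of the cleaner's internals. The sample package is a stripped-down stand-in, and its Type and ContentType values are placeholders.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
TYPES_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CUSTOM = "docProps/custom.xml"


def _drop_children(xml_bytes, namespace, attribute, value):
    """Remove direct children whose `attribute` points at `value`."""
    ET.register_namespace("", namespace)  # serialize without an ns0: prefix
    root = ET.fromstring(xml_bytes)
    for child in list(root):
        if (child.get(attribute) or "").lstrip("/") == value:
            root.remove(child)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def remove_custom_properties(docx_bytes):
    """Repack the archive without docProps/custom.xml or references to it."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == CUSTOM:
                continue  # the part itself is not copied
            data = src.read(item.filename)
            if item.filename == "_rels/.rels":
                data = _drop_children(data, RELS_NS, "Target", CUSTOM)
            elif item.filename == "[Content_Types].xml":
                data = _drop_children(data, TYPES_NS, "PartName", CUSTOM)
            dst.writestr(item.filename, data)
    return out.getvalue()


# Stand-in package; Type and ContentType values are placeholders.
RELS = (f'<Relationships xmlns="{RELS_NS}">'
        '<Relationship Id="rId1" Type="urn:example:main" Target="word/document.xml"/>'
        '<Relationship Id="rId4" Type="urn:example:custom" Target="docProps/custom.xml"/>'
        '</Relationships>')
TYPES = (f'<Types xmlns="{TYPES_NS}">'
         '<Override PartName="/word/document.xml" ContentType="application/xml"/>'
         '<Override PartName="/docProps/custom.xml" ContentType="application/xml"/>'
         '</Types>')

source = io.BytesIO()
with zipfile.ZipFile(source, "w") as archive:
    archive.writestr("[Content_Types].xml", TYPES)
    archive.writestr("_rels/.rels", RELS)
    archive.writestr("docProps/custom.xml", "<Properties>matter-1234</Properties>")
    archive.writestr("word/document.xml", "<document/>")

result = remove_custom_properties(source.getvalue())
with zipfile.ZipFile(io.BytesIO(result)) as archive:
    result_names = archive.namelist()
    result_rels = archive.read("_rels/.rels")
    result_types = archive.read("[Content_Types].xml")
print(result_names)
```

Leaving the relationship or content-type entry behind would point at a part that no longer exists, which is why all three are handled together.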

Are images inside the document cleaned too?

Yes. Photos pasted into a Word document keep their own EXIF, including any GPS data. The cleaner strips EXIF from JPEGs stored in word/media/.
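
For readers curious what stripping EXIF from a JPEG involves: the data sits in an APP1 segment near the start of the file, ahead of the compressed image data, so it can be cut out without re-encoding the picture. The Python sketch below illustrates the idea on a synthetic byte string. It is a simplified example, not the cleaner's implementation: it assumes a well-formed segment layout, ignores rarer cases such as padding bytes between segments, and all names in it are hypothetical.

```python
def strip_exif(jpeg):
    """Return the JPEG bytes with any Exif APP1 segments removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a segment marker should be")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: everything from here on is image data, copy as-is.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)


def make_segment(marker, payload):
    """Build one marker segment for the synthetic demo file."""
    return b"\xff" + bytes([marker]) + (len(payload) + 2).to_bytes(2, "big") + payload


# Synthetic stand-in, not a decodable image: SOI, JFIF, Exif, SOS, data, EOI.
exif_segment = make_segment(0xE1, b"Exif\x00\x00" + b"GPS-and-camera-data")
sample_jpeg = (
    b"\xff\xd8"
    + make_segment(0xE0, b"JFIF\x00" + bytes(9))
    + exif_segment
    + make_segment(0xDA, bytes(10))
    + b"\x12\x34"
    + b"\xff\xd9"
)

cleaned_jpeg = strip_exif(sample_jpeg)
print(len(sample_jpeg), "->", len(cleaned_jpeg), "bytes")
```

Because only the metadata segment is dropped, the compressed image data is copied through byte for byte.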

Does "Accept All Changes" in Word do the same thing?

No. Accepting changes resolves the visible text but can leave revision metadata, formatting-change records, and author/date attributes inside the document XML. Our cleaner removes the revision markup itself, not just its visible effect.

Will the document still open normally in Word and Google Docs?

Yes. The cleaner rewrites the metadata XML and repackages the archive using standard ZIP compression, producing a fully valid .docx that opens in Word, Google Docs, LibreOffice, and Pages.

Does it work on .doc (the old format) too?

The old binary .doc format is handled by the legacy Office path, which blanks the summary-information streams. For the best results, the modern .docx format is recommended.

Are hidden text or fields removed?

Hidden text is content rather than metadata, so it is preserved. If your document contains hidden text you do not want to share, reveal and delete it in Word before cleaning.

Is anything uploaded to a server?

No. The document is unzipped, scrubbed, and repackaged entirely in your browser using the JSZip library. It never leaves your device.

Can I clean a batch of documents at once?

Yes. Drop multiple files and each is processed locally, then returned individually or as a single ZIP with one audit report for the whole set.