Remove Word DOCX Metadata Online — Free, Browser-Based

Why Word documents leak so much

A .docx file isn't really one file: it's a ZIP archive containing a set of XML parts plus any embedded images. Microsoft's "Inspect Document" feature operates at the Word application layer, so it cleans the well-known property fields but can leave other data behind, such as EXIF inside embedded images and custom properties injected by third-party systems.

Our tool unzips the archive, parses the XML directly, and scrubs metadata at every layer where it hides — including the layers Word doesn't expose to its own inspector.

What our tool removes from DOCX files

Core properties (docProps/core.xml): creator, lastModifiedBy, lastPrinted, created, modified, revision, title, subject, description, keywords, category
App properties (docProps/app.xml): Company, Manager, Template, TotalTime, Application, AppVersion, page/word/character counts
Custom properties (docProps/custom.xml): custom document properties injected by CRMs, DMS platforms, and templates (the most overlooked metadata category in enterprise documents)
Tracked changes: w:ins (insertions accepted), w:del (deletions removed), w:moveFrom/w:moveTo, w:rPrChange, w:pPrChange
Comments: word/comments.xml, commentsExtended.xml, commentsIds.xml, and threaded discussion files
Embedded image EXIF: every JPEG inside word/media/ gets its EXIF, IPTC, and XMP stripped

What Document Inspector misses

Microsoft Word's built-in Document Inspector is a good first pass but has known gaps:

Embedded image metadata — Document Inspector does not strip EXIF from images inside the document. Our tool does.
Custom XML parts — many CRM systems and document management platforms inject custom XML that survives Inspector.
Template paths — the Template field in app.xml often contains a full filesystem path revealing your organization's network structure.
Comment threads — Inspector usually catches comments, but in some configurations the extended comment files (containing reply threads) are left behind.

How to use this tool

Drop your .docx file in the box above
Review the inspector — see every field that's about to be removed
Pick a preset (Privacy is the default; Maximum Privacy also wipes embedded EXIF)
Download the cleaned file
Optionally download the audit report with SHA-256 verification

Important: This tool removes metadata. It does not redact visible text. If sensitive information is written into the body of the document, you'll need to remove or redact it separately before sharing.

The anatomy of a .docx file

A modern Word document is not a single binary blob — it is a ZIP archive (the Office Open XML, or OOXML, format) containing a small filesystem of XML parts. If you rename a .docx to .zip and open it, you will find a structure like this:

my-document.docx
├── [Content_Types].xml
├── _rels/
├── docProps/
│   ├── core.xml      ← author, dates, revision, title
│   ├── app.xml       ← company, template, editing time
│   └── custom.xml    ← CRM / DMS injected properties
└── word/
    ├── document.xml  ← the actual text + tracked changes
    ├── comments.xml  ← reviewer comments
    └── media/        ← embedded images (with their own EXIF)

Metadata is spread across several of these parts, which is exactly why a single "remove properties" action in Word does not catch all of it. Our cleaner unzips the archive in your browser, rewrites the relevant XML parts, scrubs the document body of revision markup, strips EXIF from any images in word/media/, then repacks the archive.
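As a rough illustration of the "small filesystem" idea, the sketch below lists the archive members that commonly carry metadata. It uses Python's standard zipfile module rather than the browser code this tool runs, and the helper names (metadata_parts, make_sample_docx) and the stand-in archive are assumptions made for the example, not part of the tool.

```python
import io
import zipfile

# Part names taken from the tree above; a real document may contain more.
METADATA_PARTS = (
    "docProps/core.xml",
    "docProps/app.xml",
    "docProps/custom.xml",
    "word/comments.xml",
)


def metadata_parts(docx_bytes: bytes) -> list[str]:
    """Return the names of archive members that commonly carry metadata."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = archive.namelist()
    return [
        name for name in names
        if name in METADATA_PARTS or name.startswith("word/media/")
    ]


def make_sample_docx() -> bytes:
    """Build a tiny stand-in archive (not a valid Word file) for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("docProps/core.xml", "<coreProperties/>")
        archive.writestr("docProps/custom.xml", "<Properties/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.jpeg", b"\xff\xd8\xff\xd9")
    return buf.getvalue()


if __name__ == "__main__":
    for name in metadata_parts(make_sample_docx()):
        print(name)
```

To try it on a real file, pass the bytes of a .docx (for example, read with open(path, "rb")) to metadata_parts.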

How tracked changes are stored — and why "Accept All" is not enough

When change tracking is on, Word does not simply edit the text. It wraps every change in markup. An inserted phrase is stored inside a <w:ins> element and a deleted phrase inside a <w:del> element, each carrying an author name and a timestamp:

<w:ins w:author="jane.doe" w:date="2025-09-14T10:32:00Z">
  <w:r><w:t>confidential figure</w:t></w:r>
</w:ins>

Clicking "Accept All Changes" resolves the visible text, but depending on configuration the document can retain related revision metadata — formatting-change records (<w:rPrChange>, <w:pPrChange>), move operations, and the author/date attributes themselves. Our cleaner explicitly walks word/document.xml and removes every revision element: it keeps the content of insertions (treating them as accepted), discards deletions, and deletes the change-tracking attributes entirely so no author or timestamp survives.
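The walk described above can be sketched in a few lines. This is a simplified illustration using Python's standard ElementTree, not the tool's actual code: the function names are hypothetical, and it ignores edge cases a real cleaner has to handle (for example, revision marks on paragraph marks and move range markers).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# Wrappers whose content is kept (treated as accepted).
UNWRAP = {f"{{{W}}}ins", f"{{{W}}}moveTo"}
# Elements removed together with everything inside them.
DROP = {
    f"{{{W}}}del",
    f"{{{W}}}moveFrom",
    f"{{{W}}}rPrChange",
    f"{{{W}}}pPrChange",
}


def strip_revisions(element: ET.Element) -> None:
    """Rewrite the children of element in place, removing revision markup."""
    kept = []
    for child in list(element):
        if child.tag in DROP:
            continue
        strip_revisions(child)
        if child.tag in UNWRAP:
            # Promote the wrapper's children; the wrapper and its
            # author/date attributes disappear with it.
            kept.extend(list(child))
        else:
            kept.append(child)
    element[:] = kept


def accept_all(xml_text: str) -> str:
    """Return the XML with insertions kept and deletions discarded."""
    root = ET.fromstring(xml_text)
    strip_revisions(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        f'<w:p xmlns:w="{W}">'
        '<w:ins w:author="jane.doe" w:date="2025-09-14T10:32:00Z">'
        "<w:r><w:t>confidential figure</w:t></w:r></w:ins>"
        '<w:del w:author="jane.doe" w:date="2025-09-14T10:33:00Z">'
        "<w:r><w:delText>old figure</w:delText></w:r></w:del>"
        "</w:p>"
    )
    print(accept_all(sample))
```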

The verification math behind the audit report

After cleaning, the tool computes a SHA-256 hash of both the original and the cleaned file. SHA-256 reduces a file of any size to a fixed 256-bit fingerprint, written as 64 hexadecimal characters. Because the function exhibits the avalanche effect (flipping one input bit flips about half the output bits on average), the before and after hashes look entirely unrelated, which is visible proof the file changed. The chance that two specific distinct files produce the same hash is about 1 / 2²⁵⁶, and even a deliberate search for any colliding pair is expected to take on the order of 2¹²⁸ attempts, which is treated as computationally infeasible, so a matching hash reliably identifies a specific file.
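The fingerprint length and the avalanche effect are easy to observe with Python's standard hashlib. The sketch below is only an illustration: the helper names are made up for the example, and the input is a stand-in byte string rather than a real document.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the 64-character hexadecimal SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 digest bits differ between two inputs."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")


if __name__ == "__main__":
    original = b"stand-in for the original file bytes"
    # Flip the lowest bit of the first byte: a one-bit change to the input.
    flipped = bytes([original[0] ^ 0x01]) + original[1:]
    print(sha256_hex(original))
    print(sha256_hex(flipped))
    print(differing_bits(original, flipped), "of 256 bits differ")
```

Running it prints two digests with no visible relationship, and the bit count should land near 128.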

A worked example: before and after

A .docx stores its core metadata in docProps/core.xml. Here is what that file looks like in a typical document straight out of Word, and what remains after the Privacy preset runs. The first block is what anyone can read by unzipping your document; the second is what survives.

Before — core.xml exposed

<cp:coreProperties>
  <dc:creator>Ama Mensah</dc:creator>
  <cp:lastModifiedBy>legal-review</cp:lastModifiedBy>
  <dcterms:created>2025-08-30T09:14:00Z</dcterms:created>
  <dcterms:modified>2025-09-14T17:04:00Z</dcterms:modified>
  <cp:revision>47</cp:revision>
  <dc:title>Settlement draft — confidential</dc:title>
</cp:coreProperties>

After — cleaned

<cp:coreProperties>
  <dc:creator></dc:creator>
  <cp:lastModifiedBy></cp:lastModifiedBy>
  <dcterms:created>2000-01-01T00:00:00Z</dcterms:created>
  <dcterms:modified>2000-01-01T00:00:00Z</dcterms:modified>
</cp:coreProperties>

The "before" version names the original author, the account that performed the legal review, the full creation and modification timeline, a revision count revealing the document went through 47 saves, and a title marking it confidential. The revision count alone can be telling: 47 revisions on a one-page letter signals it was heavily negotiated. After cleaning, the creator and editor are blank, the revision count and title are gone, and the timestamps are reset to a neutral placeholder date.
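The before/after transformation can be approximated with Python's standard ElementTree. This is a sketch under assumptions, not the tool's implementation: the field groupings follow the example above, the function name clean_core is hypothetical, and the namespace declarations on the root element are added so the snippet parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

BLANK = ("dc:creator", "cp:lastModifiedBy")       # keep element, empty it
RESET = ("dcterms:created", "dcterms:modified")   # overwrite with placeholder
REMOVE = ("cp:revision", "dc:title", "dc:subject", "dc:description",
          "cp:keywords", "cp:category", "cp:lastPrinted")
PLACEHOLDER = "2000-01-01T00:00:00Z"

SAMPLE = (
    '<cp:coreProperties xmlns:cp="' + NS["cp"] + '" xmlns:dc="' + NS["dc"]
    + '" xmlns:dcterms="' + NS["dcterms"] + '">'
    "<dc:creator>Ama Mensah</dc:creator>"
    "<cp:lastModifiedBy>legal-review</cp:lastModifiedBy>"
    "<dcterms:created>2025-08-30T09:14:00Z</dcterms:created>"
    "<dcterms:modified>2025-09-14T17:04:00Z</dcterms:modified>"
    "<cp:revision>47</cp:revision>"
    "<dc:title>Settlement draft</dc:title>"
    "</cp:coreProperties>"
)


def clean_core(xml_text: str) -> str:
    """Blank, reset, or remove the core properties listed above."""
    root = ET.fromstring(xml_text)
    for name in BLANK:
        for element in root.findall(name, NS):
            element.text = ""
    for name in RESET:
        for element in root.findall(name, NS):
            element.text = PLACEHOLDER
    for name in REMOVE:
        for element in root.findall(name, NS):
            root.remove(element)
    # short_empty_elements=False keeps <dc:creator></dc:creator> as in the example.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


if __name__ == "__main__":
    print(clean_core(SAMPLE))
```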

Complete Word metadata field reference

Word documents scatter metadata across several parts inside the ZIP. This table covers each location, what it exposes, and how the cleaner treats it.

Location / field	What it reveals	Action
`core.xml` · creator	Original author's name or username	Removed
`core.xml` · lastModifiedBy	Who last saved the file	Removed
`core.xml` · created / modified	Exact creation and edit timestamps	Reset
`core.xml` · revision	Number of times the document was saved	Removed
`core.xml` · title / subject / keywords	Internal naming and tags	Removed
`app.xml` · Company	Organization name from the Office installation	Removed
`app.xml` · Manager	Manager name if set in template	Removed
`app.xml` · Template	Path to the template, often a network share	Removed
`app.xml` · TotalTime	Cumulative minutes spent editing	Removed
`custom.xml`	CRM / DMS injected properties (matter IDs, client codes)	Deleted
`document.xml` · w:ins / w:del	Tracked insertions and deletions with author + date	Removed
`document.xml` · rPrChange / pPrChange	Formatting-change history	Removed
`comments.xml`	Reviewer comments with names and timestamps	Deleted
`word/media/`	EXIF/GPS inside embedded photos	Stripped
Document body text	Visible content you typed	Kept

How this has actually burned people

Case · Legal disclosure

The settlement offer that revealed the floor

A law firm sends a counterparty a Word document with "Accept All Changes" applied to the visible text. The file still contains tracked-change history showing earlier, lower settlement figures that were edited upward before sending. Opposing counsel recovers the deleted numbers from the XML and learns exactly how much room there is to negotiate.

This is the single most common way Word metadata causes real damage: "Accept All" makes the page look clean while leaving the negotiation history inside the file structure.

Case · Branding

The agency name in the client's report

A consultancy delivers a strategy document under the client's logo. The Company field in app.xml, populated from the consultancy's Office installation, still carries the consultancy's name. When the client forwards the document to their board, the board sees who really wrote it.

Case · Source protection

The leaked memo traced to one laptop

An internal memo is leaked to the press. Investigators unzip the document and read the creator and lastModifiedBy fields, plus the Template path pointing to a specific department's network folder. The metadata narrows the source to a handful of people even though the visible text gives nothing away.

How Word's own "Inspect Document" compares

Microsoft Word includes a built-in tool at File → Info → Check for Issues → Inspect Document. It is genuinely useful and worth running, but it has real gaps that catch people out:

It operates at the application layer, so it removes what Word's interface knows about — but it does not reliably strip EXIF from images embedded in word/media/.
It can miss custom XML parts injected by third-party systems, which is exactly where enterprise tools store client and matter identifiers.
It requires you to remember to run it every time, on every document, before every send — a manual step that is easy to skip under deadline pressure.
The Template path in app.xml, which can expose your network structure, is not always cleared.

Our cleaner works on the file directly rather than through Word, so it catches the parts the application-level inspector leaves behind, and it does so the same way every time.

How to verify the file is clean

In Word: open the cleaned file, go to File → Info, and confirm the Author, Last Modified By, and Company fields under Properties are blank.
By unzipping: rename a copy of the file to .zip, open it, and inspect docProps/core.xml and docProps/app.xml in a text editor — the creator and company elements should be empty.
Check for custom.xml: confirm docProps/custom.xml is no longer present in the archive.
Check the hash: the audit report records a SHA-256 hash of the cleaned file as tamper-evident proof, so you can later confirm that a copy matches the file the tool produced.
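The manual checks above can also be scripted. The following is a minimal sketch using Python's standard library; the function name find_leftovers and the exact set of fields checked are assumptions chosen to mirror the list, not an official verification procedure.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Local element names that should be empty or absent after cleaning.
CHECKS = {
    "docProps/core.xml": ("creator", "lastModifiedBy"),
    "docProps/app.xml": ("Company", "Manager", "Template"),
}


def find_leftovers(docx_bytes: bytes) -> list[str]:
    """Return a list of human-readable problems; empty means nothing found."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        names = set(archive.namelist())
        if "docProps/custom.xml" in names:
            problems.append("docProps/custom.xml is still present")
        for part, fields in CHECKS.items():
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for element in root.iter():
                # Strip the "{namespace}" prefix ElementTree adds to tags.
                local = element.tag.rsplit("}", 1)[-1]
                text = (element.text or "").strip()
                if local in fields and text:
                    problems.append(f"{part}: {local} = {text!r}")
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for problem in find_leftovers(handle.read()):
            print(problem)
```

Saved as a script and run with the path to a cleaned .docx, it prints nothing when none of the checked fields survive.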

Frequently asked questions

Will cleaning change my document's formatting or content?

No. Text, styles, fonts, tables, and images are preserved. Only metadata, comments, and tracked-change records are removed. Accepted edits remain in the final text.

Does this remove comments as well as tracked changes?

Yes. The comment parts (comments.xml and the extended comment files that store reply threads) are deleted, and the comment reference markers in the document body are removed with them.

What is in custom.xml and why does it matter?

Enterprise systems — document management platforms, CRMs, contract tools — often inject custom properties into docProps/custom.xml. These can include internal matter numbers, client IDs, and workflow states. Word's own inspector does not always remove them; our cleaner deletes the part entirely.

Are images inside the document cleaned too?

Yes. Photos pasted into a Word document keep their own EXIF, including any GPS data. The cleaner strips EXIF from JPEGs stored in word/media/.
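For readers curious what "stripping" means at the byte level, here is a simplified sketch in Python. It assumes a well-formed baseline JPEG and drops the APP1 segments (where EXIF and XMP live) and APP13 segments (where Photoshop/IPTC data lives) while copying everything else unchanged. The function name is hypothetical and this is not the tool's own code; a production cleaner would need to handle more edge cases, such as padding bytes between segments.

```python
import struct

# Segment markers to drop: APP1 (EXIF, XMP) and APP13 (Photoshop / IPTC).
DROP_MARKERS = {0xE1, 0xED}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG bytes without APP1 and APP13 segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should start")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: the rest is compressed image data, copy verbatim.
            out += data[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker not in DROP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    out += data[pos:]
    return bytes(out)
```

Because only whole metadata segments are skipped, the compressed pixel data is copied byte for byte and the image is not re-encoded.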

Does "Accept All Changes" in Word do the same thing?

No. Accepting changes resolves the visible text but can leave revision metadata, formatting-change records, and author/date attributes inside the document XML. Our cleaner removes the revision markup itself, not just its visible effect.

Will the document still open normally in Word and Google Docs?

Yes. The cleaner rewrites the metadata XML and repackages the archive using standard ZIP compression, producing a fully valid .docx that opens in Word, Google Docs, LibreOffice, and Pages.
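The repackaging step can be pictured as copying every archive member into a fresh ZIP while skipping the parts being deleted. The sketch below uses Python's standard zipfile module instead of the browser library the tool runs on, and the function name repack_without is an assumption for illustration. It only handles the archive level: a complete cleaner would presumably also update [Content_Types].xml and the relationship files so nothing points at a removed part.

```python
import io
import zipfile


def repack_without(docx_bytes: bytes, drop: set[str]) -> bytes:
    """Copy the archive into a new ZIP, omitting the member names in drop."""
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    buf = io.BytesIO()
    with source, zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            if info.filename in drop:
                continue
            # Re-compress each kept member with standard Deflate.
            target.writestr(info.filename, source.read(info.filename))
    return buf.getvalue()
```

For example, repack_without(data, {"docProps/custom.xml"}) returns a new archive with every other member intact.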

Does it work on .doc (the old format) too?

The old binary .doc format is handled by the legacy Office path, which blanks the summary-information streams. For the best results, the modern .docx format is recommended.

Are hidden text or fields removed?

Hidden text is content rather than metadata, so it is preserved. If your document contains hidden text you do not want to share, reveal and delete it in Word before cleaning.

Is anything uploaded to a server?

No. The document is unzipped, scrubbed, and repackaged entirely in your browser using the JSZip library. It never leaves your device.

Can I clean a batch of documents at once?

Yes. Drop multiple files and each is processed locally, then returned individually or as a single ZIP with one audit report for the whole set.
