What metadata actually is
Almost every digital file carries hidden information about itself. A photo from your phone can contain the GPS coordinates of where you took it, the make and model of your camera, the timestamp down to the second, the orientation, the exposure settings, and often the operating system version. A PDF can contain your name, your computer's user account, the software that created it, its creation and modification dates, and sometimes the paths of files you embedded.
This data is invisible inside the application that displays the file — you don't see GPS coordinates when you look at a JPEG in Preview. But it is trivially extractable by anyone who downloads the file. Researchers, journalists, lawyers, HR teams, and stalkers all know this. Most users don't.
The standard metadata containers
- EXIF (Exchangeable Image File Format) — embedded by cameras and phones. Stores GPS, timestamps, camera info, exposure data, and thumbnail previews.
- IPTC — the journalism and stock-photography standard. Stores captions, keywords, copyright, creator, location names, and rights data.
- XMP — Adobe's Extensible Metadata Platform. An XML-based wrapper used by Photoshop, Lightroom, and most professional tools.
- OOXML core/app properties — for DOCX, XLSX, and PPTX. Stores author, last-modified-by, company, revision count, total editing time, and template paths.
- PDF Info dictionary & XMP — PDFs use both an old-style Info dictionary and a modern XMP packet. Removing one without the other leaves data behind.
- C2PA Content Credentials — cryptographically signed provenance data added by Adobe Firefly, Photoshop's Generative Fill, DALL·E, and several other AI image generators.
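To make the list above concrete, here is a minimal sketch of how several of these containers can be told apart inside a JPEG, where each one sits in an application segment that starts with a recognizable signature. The function names (`classifyJpegSegment`, `listJpegContainers`) are hypothetical and this is not the engine's actual code. It also assumes a well-formed file with no padding bytes between segments.

```javascript
// Conventional signatures for metadata containers in JPEG application segments.
// Each entry: marker byte, ASCII prefix of the segment payload, and a label.
const JPEG_SIGNATURES = [
  { marker: 0xe1, prefix: "Exif\0\0", kind: "EXIF" },
  { marker: 0xe1, prefix: "http://ns.adobe.com/xap/1.0/\0", kind: "XMP" },
  { marker: 0xed, prefix: "Photoshop 3.0\0", kind: "IPTC (Photoshop IRB)" },
  { marker: 0xeb, prefix: "JP", kind: "JUMBF (may hold C2PA)" },
];

// Return a label for a segment, or null if it matches no known signature.
function classifyJpegSegment(marker, payload) {
  for (const sig of JPEG_SIGNATURES) {
    if (marker !== sig.marker || payload.length < sig.prefix.length) continue;
    let match = true;
    for (let i = 0; i < sig.prefix.length; i++) {
      if (payload[i] !== sig.prefix.charCodeAt(i)) {
        match = false;
        break;
      }
    }
    if (match) return sig.kind;
  }
  return null;
}

// Walk the segments that precede the image data and list the containers found.
function listJpegContainers(bytes) {
  if (bytes[0] !== 0xff || bytes[1] !== 0xd8) throw new Error("not a JPEG");
  const found = [];
  let pos = 2;
  while (pos + 4 <= bytes.length && bytes[pos] === 0xff) {
    const marker = bytes[pos + 1];
    if (marker === 0xda) break; // start of scan: compressed image data follows
    // The 2-byte big-endian length counts itself but not the marker.
    const length = (bytes[pos + 2] << 8) | bytes[pos + 3];
    const payload = bytes.subarray(pos + 4, pos + 2 + length);
    const kind = classifyJpegSegment(marker, payload);
    if (kind) found.push(kind);
    pos += 2 + length;
  }
  return found;
}
```

Calling `listJpegContainers` on a `Uint8Array` of the file's bytes returns an array of labels in the order the containers appear. One file can carry several containers at once, which is why removing only one of them leaves data behind.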
Why browser-based processing matters
The single most important fact about a metadata removal tool is whether your file leaves your device. Most online tools — including the popular Metadata2Go, Pics.io, PDF24, and Online-Metadata.com — process files on their servers. You upload, they strip, they return a "cleaned" file.
This is a privacy contradiction. The thing you're trying to keep private (your location, your name, your edit history) gets uploaded to a server you don't control, processed by code you can't audit, and possibly stored (officially or unofficially) for some period of time. You're trading one privacy risk for another, in the name of fixing the first one.
A privacy tool that uploads your file is not a privacy tool. It's an honor system.
removemetadata.tools is built on the opposite philosophy. Every file is parsed, inspected, scrubbed, and re-saved inside your browser tab. The file never travels over a network. We literally cannot see what you cleaned because the data never reaches us.
The technical stack, in plain terms
JavaScript handles the simple stuff natively
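For JPEG, PNG, WebP, and SVG, the file structure is well documented and not too complex. Our engine parses these formats directly in JavaScript. For the binary formats, we walk the byte stream, identify metadata segments by their marker bytes (JPEG) or chunk type codes (PNG, WebP), and rewrite the file without them. SVG is XML text, so there the metadata is found in the markup rather than by marker bytes. No external library is needed, and the work amounts to a linear scan over the file.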
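The walk-and-rewrite approach can be sketched for JPEG as follows. This is a simplified illustration, not the engine's actual code: the name `stripJpegMetadata` and the choice of dropped markers are assumptions. It drops whole APP1 segments, so unlike the default behavior described later it does not keep the orientation tag; preserving that would mean writing back a minimal EXIF segment.

```javascript
// Markers whose segments this sketch drops (an assumption for illustration):
// APP1 (EXIF, XMP), APP11 (JUMBF / C2PA), APP13 (Photoshop / IPTC), COM (comments).
const DROPPED_JPEG_MARKERS = new Set([0xe1, 0xeb, 0xed, 0xfe]);

function stripJpegMetadata(bytes) {
  if (bytes.length < 2 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    throw new Error("not a JPEG");
  }
  const kept = [bytes.subarray(0, 2)]; // start-of-image marker
  let pos = 2;
  while (pos + 4 <= bytes.length) {
    if (bytes[pos] !== 0xff) throw new Error("expected a marker at offset " + pos);
    const marker = bytes[pos + 1];
    if (marker === 0xda) break; // start of scan: stop parsing segments
    // The 2-byte big-endian length counts itself but not the marker.
    const length = (bytes[pos + 2] << 8) | bytes[pos + 3];
    if (length < 2 || pos + 2 + length > bytes.length) {
      throw new Error("bad segment length at offset " + pos);
    }
    if (!DROPPED_JPEG_MARKERS.has(marker)) {
      kept.push(bytes.subarray(pos, pos + 2 + length));
    }
    pos += 2 + length;
  }
  // Everything from the scan header onward is copied verbatim,
  // so the compressed pixel data is never touched.
  kept.push(bytes.subarray(pos));

  const out = new Uint8Array(kept.reduce((n, part) => n + part.length, 0));
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

Because segments are copied or skipped whole and the image data is passed through unchanged, the output decodes to the same pixels as the input.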
pdf-lib handles PDF
PDF files have a complex internal structure: a cross-reference table, an object graph, streams, and metadata in both an Info dictionary and an XMP packet. We use pdf-lib, a pure-JavaScript PDF library, to parse the document, clear the Info dictionary, remove the XMP stream from the catalog, and re-serialize the file cleanly.
JSZip handles Office files
DOCX, XLSX, and PPTX files are ZIP archives. We use JSZip to unpack the archive in memory, rewrite the metadata XML files (docProps/core.xml, docProps/app.xml), strip tracked changes from the document XML, scrub EXIF from any embedded JPEGs, and repackage the archive.
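The rewrite step for docProps/core.xml might look roughly like the sketch below. It works on the XML text after it has been read out of the archive, so the JSZip unpack and repack steps are left out. The function name `scrubCoreXml` and the list of fields are assumptions for illustration. It also assumes the conventional `dc:`, `cp:`, and `dcterms:` prefixes; a robust implementation would parse the XML with namespace awareness rather than use regular expressions.

```javascript
// Identifying elements in docProps/core.xml that this sketch removes outright.
// All of them are optional in the file, so removing them leaves it valid.
const CORE_XML_FIELDS = [
  "dc:creator",
  "cp:lastModifiedBy",
  "cp:lastPrinted",
  "cp:revision",
  "dcterms:created",
  "dcterms:modified",
];

function scrubCoreXml(xml) {
  let out = xml;
  for (const tag of CORE_XML_FIELDS) {
    // Matches <tag ...>text</tag> as well as the self-closing <tag .../> form.
    const pattern = new RegExp(
      `<${tag}(\\s[^>]*?)?(/>|>[\\s\\S]*?</${tag}>)`,
      "g"
    );
    out = out.replace(pattern, "");
  }
  return out;
}
```

Elements not in the list, such as `dc:title`, pass through untouched. Which fields to drop is a policy decision and is not fixed by the file format.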
What we strip, field by field
From images (JPG, PNG, WebP)
- GPS coordinates, altitude, GPS timestamp, GPS direction
- Camera make, model, serial number, lens info, firmware
- Date and time taken, date digitized, date modified
- Owner name, copyright, artist, software used
- Embedded thumbnails (which sometimes still show content that was cropped out of the main image)
- Editing history (Photoshop, Lightroom, mobile editor traces)
- C2PA content credentials and AI generation signatures
- IPTC keywords, captions, location names (unless the selected preset preserves them)
From PDFs
- Info dictionary: Title, Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate
- XMP metadata packet
- Embedded document IDs and instance IDs
From Office files (DOCX, XLSX, PPTX)
- Author, last modified by, company, manager
- Creation date, modification date, last printed date
- Revision number, total editing time
- Template path and template name
- Comments and threaded discussions
- Tracked-changes history, including rejected edits
- EXIF data inside embedded images
- Custom XML parts that store CRM, DMS, and template-generated metadata
What we preserve, by default
Total annihilation is rarely what users want. Strip the orientation flag from a JPEG and it might display rotated. Strip the color profile and it may display with wrong colors. Strip ICC data from a print-bound PDF and the printer might choke. By default, we preserve:
- Orientation tag — so images display the right way up
- ICC color profile — so colors render correctly
- DPI / resolution — so print sizing is preserved
- Format-specific structural data — anything required for the file to open and display correctly
If you want a total wipe, the Maximum Privacy preset will strip even these.
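The difference between the default behavior and a total wipe can be sketched with PNG, where every piece of data sits in a typed chunk. The chunk groupings, the `maximumPrivacy` option name, and the function `filterPngChunks` are assumptions for illustration, not the tool's actual configuration. The allow-list is deliberately small: a fuller version would also keep animation chunks, for example.

```javascript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Chunks needed to decode the image at all.
const PNG_STRUCTURAL = new Set(["IHDR", "PLTE", "IDAT", "IEND", "tRNS"]);
// Chunks that affect color rendering or print size; kept by default.
const PNG_RENDERING = new Set(["iCCP", "sRGB", "gAMA", "cHRM", "pHYs"]);

function filterPngChunks(bytes, { maximumPrivacy = false } = {}) {
  for (let i = 0; i < 8; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) throw new Error("not a PNG");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const kept = [bytes.subarray(0, 8)];
  const removed = [];
  let pos = 8;
  // Chunk layout: 4-byte big-endian data length, 4-byte type, data, 4-byte CRC.
  while (pos + 12 <= bytes.length) {
    const dataLength = view.getUint32(pos);
    const type = String.fromCharCode(
      bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7]
    );
    const end = pos + 12 + dataLength;
    if (end > bytes.length) throw new Error("truncated chunk " + type);
    const keep =
      PNG_STRUCTURAL.has(type) || (!maximumPrivacy && PNG_RENDERING.has(type));
    if (keep) kept.push(bytes.subarray(pos, end));
    else removed.push(type); // e.g. tEXt, iTXt, zTXt, eXIf, tIME
    pos = end;
  }
  const out = new Uint8Array(kept.reduce((n, part) => n + part.length, 0));
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return { bytes: out, removed };
}
```

Each chunk carries its own CRC over its type and data, so copying or skipping whole chunks leaves the remaining checksums valid. The `removed` list is the kind of record an audit report could be built from.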
Verification you can trust
After cleaning, the tool generates an audit report listing every field that was removed, with SHA-256 checksums of both the original and cleaned files, so you have a tamper-evident record of exactly what changed.
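A report of this kind could be assembled along the lines of the sketch below. The report shape and the names `sha256Hex` and `buildAuditReport` are hypothetical. The example uses Node's built-in `crypto` module so it can be run from a command line; inside a browser tab, the equivalent hashing call is the asynchronous `crypto.subtle.digest("SHA-256", bytes)`.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 digest of a Buffer or Uint8Array.
function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

// Assemble a plain-object report; the field names here are illustrative only.
function buildAuditReport(originalBytes, cleanedBytes, removedFields) {
  return {
    generatedAt: new Date().toISOString(),
    original: { size: originalBytes.length, sha256: sha256Hex(originalBytes) },
    cleaned: { size: cleanedBytes.length, sha256: sha256Hex(cleanedBytes) },
    removedFields: [...removedFields],
  };
}
```

Anyone holding the cleaned file can recompute its SHA-256 and compare it with the value in the report. A mismatch means the file has changed since it was cleaned.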
For users in regulated industries — legal e-discovery, journalistic source protection, HR compliance — this audit trail is essential.
If you want the full technical detail, our guide "How metadata removal works: the algorithm for each file format" walks through the exact cleaning logic, with pseudocode, for every format the engine supports.
Frequently asked questions
Does removing metadata change the look or content of my file?
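With the default settings, no. Only the hidden descriptive data is stripped. The pixels in your image, the text in your document, the audio in your file: all are preserved exactly as they were. The exceptions are deliberate. In Office files, comments and tracked changes are removed, and the Maximum Privacy preset also strips the orientation tag and color profile, which can change how an image is displayed.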
Is the cleaning reversible?
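No. The cleaned file no longer contains the removed data, so it cannot be restored from that file; keep your original if you may need the metadata later. For images, the metadata segments are left out when the file is re-serialized. For PDFs, pdf-lib writes a fresh serialization that does not carry the old metadata. For OOXML files, we rewrite the metadata XML and recompress the archive.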
Can metadata removal protect me from all tracking?
No tool can. We remove the metadata inside the file. Things that can still identify a file include visible watermarks, content-derived fingerprints (perceptual hashing), invisible steganographic marks, the IP address you used to send it, and the email metadata of any message that carries it. Treat metadata removal as one important layer of privacy, not the whole stack.
Why does Microsoft Office's Document Inspector miss things?
Document Inspector operates at the application layer — it removes data Word knows about. But DOCX is a ZIP archive of XML files, and the XML can carry data that survives an "Inspect Document" pass: EXIF inside embedded images, custom XML parts inserted by enterprise CRMs, and template paths.
What about C2PA Content Credentials on AI-generated images?
C2PA is a cryptographically signed manifest embedded in images by Adobe Firefly, Photoshop's Generative Fill, DALL·E, and several other AI tools. Our tool strips the C2PA manifest completely. Note that this is a one-way operation: once removed, the signature cannot be reattached.
How big a file can I process?
Up to 500 MB per file, with no limit on the number of files in a batch. Because processing happens in your browser tab, the practical ceiling for very large files and large batches is your device's available RAM.