Metadata gone. File intact. Nothing uploaded.
Technical · 14 min read

How metadata removal works: the algorithm for each file format

A format-by-format walkthrough of the exact cleaning algorithm — JPEG segments, PNG chunks, the HEIC item-location surgery, MP4 atom trees, ID3 tags, and OOXML XML — with pseudocode for each.

Read article →
PDF · 8 min read

Why PDF metadata is the most-overlooked privacy leak in your business

How the Info dictionary and XMP packet together expose your real name, your software stack, and often your company directory paths to anyone who downloads the file.

Read article →
Images · 6 min read

What your photos tell strangers: the GPS, the serial, the silent thumbnail

A walk through what's inside a default iPhone photo — and what an attacker can do with each piece of metadata if you don't strip it before sharing.

Read article →
Word · 7 min read

Tracked changes: the metadata that has cost careers

Many of the famous "redaction failures" in US government documents and Fortune 500 contracts come down to tracked changes that "Accept All" doesn't fully remove.

Read article →
Images · 8 min read

HEIC and iPhone photos: what your camera roll reveals

A default iPhone photo is a HEIC file that can carry GPS coordinates, your device serial, depth maps, and timestamps, and HEIC is technically harder to clean than JPEG. Here's why, and how to do it losslessly.

Read article →
Audio & Video · 9 min read

The hidden metadata in your audio and video files

Phone videos carry GPS in the container; MP3s can name the account that bought them; pro WAV files log the studio. A tour of ID3 tags, QuickTime atoms, and Vorbis comments.

Read article →
Provenance · 7 min read

C2PA Content Credentials: the new metadata in AI and camera images

A signed provenance trail is spreading through cameras, editors, and AI generators. What it records, where it lives, and the genuine trade-off between keeping and removing it.

Read article →
Excel · 8 min read

What spreadsheets leak: authors, comments, and hidden network paths

Excel carries a leak no other format does — defined names that embed the full path to files on your internal server, mapping your network to anyone who opens the workbook.

Read article →
Practical · 7 min read

A metadata checklist before you publish anything online

A repeatable, format-by-format routine for stripping metadata before you post a photo, upload a document, or share a file — plus how to verify it actually worked.

Read article →

Why we write these guides

Metadata is the part of a file that most people never see and rarely think about — and that is precisely what makes it dangerous. A photographer can crop a sensitive detail out of an image and still ship the GPS coordinates of where it was taken. A lawyer can redact a contract's visible text and still hand over the tracked-change history showing what was negotiated away. A consultant can deliver a deck under a client's logo while the file quietly names the agency that built it. None of this is exotic; it happens constantly, because the tools people use every day hide this data by default and rarely surface it.

These guides exist to close that knowledge gap. Each one takes a single format or scenario and explains, in plain language, three things: what metadata the format carries, why that specific metadata can cause real harm, and what you can actually do about it. We avoid fear-mongering and we avoid hand-waving — where a technical detail matters, we show you the structure or the math behind it, so you can verify the claim rather than take it on faith.

How to think about file privacy

The single most useful mental model is that file privacy is layered, not binary. Removing metadata is one layer. It is an important one, and for most everyday sharing it is the layer people are missing. But it sits alongside others: redacting visible content, controlling who receives the file, stripping identifying information from the channel you send it over, and understanding that some platforms re-add their own tracking after upload.

A helpful way to reason about any file you are about to share is to ask three questions in order. First: what can someone read on the surface? That is the visible content, handled by redaction. Second: what can someone read underneath? That is metadata, handled by tools like this one. Third: what does the act of sending reveal? That is the channel — the email headers, the IP address, the upload logs — handled by how and where you transmit the file. A guide that only addresses one layer while ignoring the others gives a false sense of safety, which is why every article here is explicit about where metadata removal helps and where it does not.
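To make the second question concrete, here is a minimal Python sketch of what "reading underneath" can look like for one format. It walks the header segments of a JPEG and reports the ones that commonly hold metadata. The function name, the marker table, and the simplified parsing (no padding bytes, no standalone markers before the scan data) are assumptions made for illustration; this is not the algorithm any particular tool uses.

```python
import struct

# Segment markers that commonly hold metadata in a JPEG file.
# This table is illustrative, not exhaustive.
METADATA_MARKERS = {
    0xE1: "APP1 (Exif or XMP)",
    0xE2: "APP2 (ICC profile)",
    0xED: "APP13 (Photoshop/IPTC)",
    0xFE: "COM (comment)",
}

SOS = 0xDA  # start of scan: compressed image data follows
EOI = 0xD9  # end of image


def list_metadata_segments(data: bytes) -> list[tuple[str, int]]:
    """Return (segment name, payload length) for metadata segments in the header.

    Simplified sketch: assumes every marker before the scan data is
    followed by a two-byte big-endian length, as in typical camera output.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker in (SOS, EOI):
            break  # stop before the image data; nothing is decoded
        (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
        if marker in METADATA_MARKERS:
            # The stored length includes its own two bytes.
            found.append((METADATA_MARKERS[marker], length - 2))
        pos += 2 + length
    return found
```

Calling `list_metadata_segments(open("photo.jpg", "rb").read())` on a file of your own (the filename here is hypothetical) lists the metadata segments without touching the image data, which is also a quick way to check whether a cleaning step removed what it claimed to.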

Who these are for

The guides are written for a general audience first — anyone who shares photos, sends documents, or posts files online — with enough technical depth that professionals in law, journalism, human resources, and security will find them accurate rather than simplified to the point of being wrong. You do not need to understand file formats to act on the advice, but if you want to understand why the advice is correct, the detail is there.

New guides are added as we cover more formats and scenarios. If there is a topic you would find useful — a specific format, a specific risk, a specific workflow — the contact page is the fastest way to suggest it.