What spreadsheets reveal
Excel workbooks carry many of the same metadata categories as Word documents — author, company, last-modified-by — but spreadsheets have additional categories that are particularly leaky:
- Named ranges with external file paths: if you ever pulled data from another workbook, the path to that file can be stored as a defined name. That path can reveal server names and the folder structure of your network drives.
- External data connections: URLs and credentials for database connections, web queries, and Power Query sources.
- Cell comments and threaded comments: contain author names, timestamps, and discussion threads — sometimes containing internal decision-making that shouldn't leave the organization.
- Hidden sheets and hidden columns: not removed by this tool (those are content, not metadata) but worth checking manually.
- Embedded images: photos and logos pasted into workbooks keep their EXIF data, which can include device and GPS details.
What our tool removes
- Core properties (creator, lastModifiedBy, created, modified, revision, title, subject, keywords)
- App properties (Company, Manager, Template, TotalTime, Application)
- Custom properties (custom.xml — often CRM/DMS metadata)
- Threaded comments and comment files
- EXIF from JPEGs embedded in xl/media/
What our tool does not touch
- Cell values, formulas, formatting
- Chart data and chart styling
- PivotTables and PivotCaches
- Named ranges that are part of formulas (we don't break your workbook)
- Hidden sheets or columns (those are content; you must un-hide and delete manually)
How to use
- Drop your .xlsx file above
- Review the field list in the inspector
- Pick a preset (Privacy or Maximum Privacy)
- Download the cleaned workbook
- Optionally download the audit report
Inside an .xlsx file
Like Word documents, Excel workbooks are ZIP archives of XML parts. Rename an
.xlsx to .zip and you will find the workbook's metadata in the
same docProps/ folder, alongside the sheets themselves:
budget.xlsx
├── docProps/
│   ├── core.xml            ← author, lastModifiedBy, dates
│   ├── app.xml             ← company, manager, application
│   └── custom.xml          ← injected enterprise properties
└── xl/
    ├── workbook.xml        ← sheet structure, defined names
    ├── sharedStrings.xml   ← all text values
    ├── comments1.xml       ← cell comments (author + text)
    └── media/              ← embedded images / logos
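You can also look at this layout without renaming anything. The following is a minimal sketch using Python's standard zipfile module; the function name list_parts is ours for illustration and is not part of the tool.

```python
import zipfile


def list_parts(xlsx_path):
    """Return (part_name, uncompressed_size) for every part in the archive.

    xlsx_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


# Example usage (the path is hypothetical):
# for name, size in list_parts("budget.xlsx"):
#     print(f"{size:>10}  {name}")
```

Parts under docProps/ and any comment parts under xl/ are the ones worth opening first.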
The named-range leak that's unique to spreadsheets
Spreadsheets carry a category of metadata that documents and presentations do not:
defined names with external references. When you pull data from another
workbook — even once, even years ago — Excel can store a defined name that records the full
path to that source file. In xl/workbook.xml it looks like this:
<definedName name="LastYear">
'[\\fileserver\finance\2024\Q4-actuals.xlsx]Sheet1'!$A$1:$M$60
</definedName>
That single line exposes your internal server name, the folder hierarchy, and the naming convention of confidential files — none of which is visible anywhere on the spreadsheet grid. It is one of the most overlooked disclosure routes in shared financial models.
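If you would rather check for this yourself than hunt through the XML by eye, a short script can flag suspicious names. This is a sketch under assumptions: it only covers the inline form shown above (a path inside square brackets in the defined name's text), the bracket pattern is a heuristic, and the function names are illustrative rather than part of the tool.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by xl/workbook.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Heuristic (an assumption): a bracketed reference containing a path separator
PATH_IN_BRACKETS = re.compile(r"\[[^\]]*[\\/][^\]]*\]")


def path_bearing_names(workbook_xml: bytes):
    """Return (name, refers_to) for defined names that embed a file path."""
    root = ET.fromstring(workbook_xml)
    hits = []
    for dn in root.findall("m:definedNames/m:definedName", NS):
        refers_to = (dn.text or "").strip()
        if PATH_IN_BRACKETS.search(refers_to):
            hits.append((dn.get("name"), refers_to))
    return hits


def scan_workbook(xlsx_path):
    """Read xl/workbook.xml from the archive and scan its defined names."""
    with zipfile.ZipFile(xlsx_path) as zf:
        return path_bearing_names(zf.read("xl/workbook.xml"))
```

Anything this returns is a candidate for review in Name Manager. An empty result does not prove the workbook is free of external paths, since paths stored elsewhere in the archive are not examined here.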
What the cleaner removes, and what it deliberately keeps
The cleaner empties the author, company, and date fields in core.xml and
app.xml, deletes custom.xml, removes comment parts, and strips
EXIF from embedded images. It deliberately does not touch defined names that are
part of working formulas, cell values, or formatting, because altering those would change
your calculations. If a defined name points to an external path you want gone, remove that
reference in Excel first (Formulas → Name Manager), then run the cleaner.
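To make the EXIF step above more concrete, here is a rough sketch of how an Exif block can be dropped from a JPEG's byte stream. It is one workable approach under simplifying assumptions, not a description of the cleaner's actual code: strip_exif is a hypothetical helper, and it does not handle rarer layouts such as padding bytes between segments.

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Return the JPEG with APP1 segments that carry Exif data removed."""
    if jpeg[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG")
    out = bytearray(jpeg[:2])
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("malformed segment header")
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: the rest is compressed image data, copy as-is
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)
```

Because the Exif block sits in its own segment ahead of the image data, removing it leaves the pixels untouched.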
The size relationship
Because metadata is a small fraction of a workbook's bytes (the bulk is in
sharedStrings.xml and the sheet data), cleaning changes the file size only
slightly. After unzipping, scrubbing, and re-zipping, the cleaned size is approximately:
cleaned_size ≈ original_size − metadata_bytes ± recompression_delta
The recompression_delta is small and can be positive or negative, because the
archive is re-compressed with standard DEFLATE settings that may differ slightly from how
the original was packed. Your data, formulas, and formatting are unchanged.
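To get a feel for how small metadata_bytes is in one of your own files, you can add up the compressed size of the metadata-bearing parts. The sketch below is an approximation under assumptions: it treats everything under docProps/ and any part with "comment" in its name as metadata, and the function names are illustrative. Since the cleaner empties core.xml and app.xml rather than deleting them, the real saving will be somewhat smaller than this figure.

```python
import zipfile


def is_metadata_part(name: str) -> bool:
    """Assumed rule of thumb: docProps/ parts and comment parts."""
    lower = name.lower()
    return lower.startswith("docprops/") or "comment" in lower


def metadata_share(xlsx_path):
    """Return (metadata_compressed_bytes, total_compressed_bytes)."""
    with zipfile.ZipFile(xlsx_path) as zf:
        infos = zf.infolist()
        total = sum(i.compress_size for i in infos)
        meta = sum(i.compress_size for i in infos if is_metadata_part(i.filename))
    return meta, total


# Example usage (the path is hypothetical):
# meta, total = metadata_share("budget.xlsx")
# print(f"{meta} of {total} compressed bytes are metadata parts")
```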
A worked example: before and after
Excel keeps its document properties in docProps/core.xml and
docProps/app.xml, exactly like Word. But the more interesting leak is in
xl/workbook.xml, where defined names can carry full external file paths. Here
is a workbook's metadata before and after cleaning, including a named range that exposes an
internal server.
Before — exposed
<!-- docProps/core.xml -->
<dc:creator>r.thompson</dc:creator>
<cp:lastModifiedBy>cfo-office</cp:lastModifiedBy>
<!-- docProps/app.xml -->
<Company>Northwind Capital</Company>
<!-- xl/workbook.xml -->
<definedName name="LY">
'[\\fs01\finance\2024\actuals.xlsx]P&amp;L'!$A$1
</definedName>
After — cleaned
<!-- docProps/core.xml -->
<dc:creator></dc:creator>
<cp:lastModifiedBy></cp:lastModifiedBy>
<!-- docProps/app.xml -->
<Company></Company>
<!-- xl/workbook.xml (defined name kept;
remove in Excel if path-bearing) -->
The author username, the CFO-office account, and the company name are all cleared from
the property files. The defined name is shown here because it illustrates the most
spreadsheet-specific risk: that single line exposes a server name (fs01), a
folder hierarchy, and the existence of a confidential file. The cleaner does not
automatically delete defined names because doing so can break formulas — see the note below
on how to remove path-bearing names safely in Excel.
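The property change in the before-and-after above can be reproduced in a few lines. This is a minimal sketch of blanking two fields in docProps/core.xml with Python's standard library; it illustrates the idea and is not the tool's own implementation. The function name and the choice of fields are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}
# Fields blanked in this example; a fuller cleaner would cover more
FIELDS = ["dc:creator", "cp:lastModifiedBy"]


def blank_core_fields(core_xml: bytes) -> bytes:
    """Empty the listed fields and return the re-serialized XML."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)  # keep dc:/cp: prefixes on output
    root = ET.fromstring(core_xml)
    for field in FIELDS:
        for el in root.findall(field, NS):
            el.text = None
    return ET.tostring(
        root,
        encoding="utf-8",
        xml_declaration=True,
        short_empty_elements=False,  # write <dc:creator></dc:creator>
    )
```

The elements are emptied rather than removed, which matches the cleaned listing above: the tags remain, the values are gone.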
Complete Excel metadata field reference
| Location / field | What it reveals | Action |
|---|---|---|
| core.xml · creator | Original author's name or username | Removed |
| core.xml · lastModifiedBy | Who last saved the workbook | Removed |
| core.xml · created / modified | Creation and edit timestamps | Reset |
| core.xml · title / subject / keywords | Internal naming and tags | Removed |
| app.xml · Company | Organization from the Office license | Removed |
| app.xml · Manager | Manager name if set | Removed |
| app.xml · Template | Template path (often a network share) | Removed |
| custom.xml | CRM / DMS injected properties | Deleted |
| Threaded comments | Reviewer notes with author names | Deleted |
| xl/media/ | EXIF/GPS in embedded images and logos | Stripped |
| Defined names w/ external paths | Server names and confidential file paths | Kept * |
| Cell values & formulas | Your actual data | Kept |
* Defined names are kept because deleting one that a formula depends on would break your workbook. Remove path-bearing names yourself via Formulas → Name Manager, then run the cleaner.
How this has actually burned people
The budget file that mapped the whole network
A finance team shares a budget workbook with an external auditor. Years earlier, someone
had linked a cell to last year's actuals on a shared drive. That link survived as a defined
name carrying the full path \\fs01\finance\2024\actuals.xlsx — handing the
auditor the internal server name, the folder structure, and the naming convention of files
they were never meant to know existed.
The "independent" analysis with a giveaway author
A model is presented as the independent work of one department, but the
creator field names an analyst from a different team, and the
Company field names an outside consultancy. The metadata quietly contradicts
the story the spreadsheet was meant to tell.
The cell comment that should have been deleted
A workbook circulated to investors still contained a cell comment reading "use the conservative number here, the real figure is 20% lower." The visible cells looked polished; the comment, attached with the author's name and a timestamp, told the opposite story.
How to remove a path-bearing defined name in Excel
Because the cleaner deliberately preserves defined names to avoid breaking formulas, here is how to remove a risky one yourself before cleaning:
- Open the workbook and go to Formulas → Name Manager.
- Look for any name whose "Refers To" value contains a path in square brackets, like [\\server\folder\file.xlsx].
- If the name is not used by any live formula, delete it. If it is in use, replace the external reference with a local value or a copy of the data, then delete the link.
- Save, then run this cleaner to remove the remaining document properties.
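Before deleting a name, it helps to know whether any formula still depends on it. The sketch below scans the worksheet parts for formulas that mention a given name. It is a rough check under assumptions: it only looks at cell formulas in xl/worksheets/, so a name used by charts, data validation, conditional formatting, or other defined names would be missed, and a name that happens to appear inside a text string in a formula would be reported as used. The function names are illustrative.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def formula_uses_name(formula: str, name: str) -> bool:
    """True if `name` appears in the formula as a whole token."""
    pattern = r"(?<![A-Za-z0-9_.])" + re.escape(name) + r"(?![A-Za-z0-9_.])"
    return re.search(pattern, formula, flags=re.IGNORECASE) is not None


def sheets_using_name(xlsx_path, name: str):
    """Return the worksheet parts whose cell formulas mention `name`."""
    hits = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for part in zf.namelist():
            if not (part.startswith("xl/worksheets/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(part))
            formulas = (f.text or "" for f in root.iter(MAIN + "f"))
            if any(formula_uses_name(text, name) for text in formulas):
                hits.append(part)
    return hits
```

An empty list suggests the name is probably safe to delete; treat it as a hint and confirm in Excel before saving.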
How to verify the file is clean
- In Excel: File → Info → Properties shows Author, Last Modified By, and Company; confirm they are blank.
- By unzipping: rename a copy to .zip and inspect docProps/core.xml and docProps/app.xml in a text editor.
- Confirm docProps/custom.xml is gone from the archive.
- The audit report records a SHA-256 hash of the cleaned workbook.
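The manual checks above can also be scripted. This is a minimal sketch, assuming you only want to confirm that creator, lastModifiedBy, and Company are blank and that custom.xml is absent; verify_clean is a hypothetical helper name. The hash it returns is a SHA-256 of the file bytes you pass in; we have not specified here how the audit report computes its own hash, so treat any comparison between the two as an assumption to confirm.

```python
import hashlib
import io
import zipfile
import xml.etree.ElementTree as ET

# Element names (without namespace) that should be empty after cleaning
CHECKED = {"creator", "lastModifiedBy", "Company"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def verify_clean(xlsx_bytes: bytes):
    """Return (list_of_problems, sha256_hex) for a workbook given as bytes."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        names = set(zf.namelist())
        if "docProps/custom.xml" in names:
            problems.append("docProps/custom.xml still present")
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue
            for el in ET.fromstring(zf.read(part)).iter():
                value = (el.text or "").strip()
                if local_name(el.tag) in CHECKED and value:
                    problems.append(f"{part}: {local_name(el.tag)} = {value!r}")
    return problems, hashlib.sha256(xlsx_bytes).hexdigest()


# Example usage (the path is hypothetical):
# with open("budget-cleaned.xlsx", "rb") as fh:
#     problems, digest = verify_clean(fh.read())
```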
Frequently asked questions
Will cleaning break my formulas or PivotTables?
No. Cell values, formulas, PivotTables, charts, and formatting are preserved. Only metadata, comments, and embedded-image EXIF are removed.
Does this remove cell comments?
Yes. Both classic comments and threaded comments are removed, including the author names and timestamps attached to them.
Will it remove external data connections?
No. The cleaner removes metadata properties and comment parts only. Data connections and defined names are left intact so your workbook keeps functioning; remove them manually in Excel if they reference paths or credentials you want gone.
Are hidden sheets or columns removed?
No — hidden sheets and columns are content, not metadata, so the cleaner leaves them in place. Unhide and delete them in Excel if they contain data you do not want to share.
Why doesn't it automatically delete risky defined names?
Because a defined name can be referenced by formulas across the workbook. Deleting one
blindly could turn cells into #NAME? errors. We flag the risk and explain how
to remove path-bearing names safely in Excel, rather than silently breaking your file.
Does it work on .xlsm files with macros?
The metadata cleaning applies to the document properties and comments the same way. Note that the macro code itself is content and is preserved; if a macro contains identifying information, review it separately.
Will charts and conditional formatting survive?
Yes. Charts, conditional formatting, data validation, and named styles are all part of the workbook content and are untouched by cleaning.
Are logos and images in the workbook cleaned?
Yes. EXIF is stripped from JPEGs stored in xl/media/, so a phone photo
pasted into a sheet does not carry its original GPS or device data.
Is anything uploaded to a server?
No. The workbook is unzipped, scrubbed, and repackaged entirely in your browser. It never leaves your device.
Can I clean many workbooks at once?
Yes. Drop a batch and each file is processed locally, then returned individually or as a single ZIP with one audit report covering all of them.
Want the deeper background? Our guide on what spreadsheets leak — authors, comments, and hidden network paths explains the defined-name network-path risk in detail, and the metadata removal algorithms guide covers exactly how the OOXML cleaning works.