We had a follow-up discussion this morning on our back-channel email list about metadata and the post on metadata I made yesterday.
We thought we'd let you see our discussion and invite you to join in.
Marty started it by saying:
Dennis:
Maybe you could recommend a specific meta-cleanser and provide a link?
Meta-cleanser - sounds like something an existential cleaning service might use.
Marty
p.s. The WSJ ran a piece on the UN report with the title "Will Bill Gates Topple Assad?"
+++++
Denise replied:
How do you reconcile the tension between the fact that on the one hand metadata can convey information you might not have wanted to convey, and it also enhances searchability and the richness of one's internal database? Is the solution just to ensure metadata is purged before things break free of the firewall?
+++++
Dennis replied:
Aye, there's the rub.
In the article on my blog and when I speak about this issue in more detail, I try to reconcile the two faces of metadata and emphasize that it can be quite a good thing. (e.g., "Metadata is not inherently evil. It is often quite useful.")
I'd almost think of metadata with the idea of DRM in mind. Internally and with clients, metadata, especially the collaborative info, is incredibly useful. However, in someone else's hands, it can be quite damaging. The "scrubbing" is almost like sending the doc out with limited rights and access to the "full" document, at least conceptually.
So, metadata can't be handled appropriately without considering who the audience is (or might be). You then start to think of the document not just as a document, but as a published product.
In the classic approach, you would "scrub" the document (or create a new clean version or create a PDF) just before you sent it to an opposing party or someone who should not see the metadata. I.e., handle it as a separate process before the doc leaves the firewall.
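To make the "separate process before the doc leaves the firewall" idea concrete, here is a minimal sketch in Python. It only decides whether an outgoing message needs a scrub step, based on whether any recipient is outside the firm. The function name, the example domains, and the idea of keying the decision on email domains are all assumptions for illustration, not a description of any particular product.

```python
def needs_scrub(recipients, internal_domains):
    """Return True if any recipient address is outside the internal domains.

    Hypothetical helper: `recipients` is a list of email addresses and
    `internal_domains` is the list of domains considered "inside the firewall".
    """
    internal = {d.lower() for d in internal_domains}
    for address in recipients:
        # Take everything after the last "@" as the domain.
        domain = address.rsplit("@", 1)[-1].strip().lower()
        if domain not in internal:
            return True
    return False


if __name__ == "__main__":
    # Example addresses are made up.
    print(needs_scrub(["partner@firm.example"], ["firm.example"]))
    print(needs_scrub(["partner@firm.example", "counsel@other.example"], ["firm.example"]))
```

A check like this only flags when scrubbing should happen; the scrubbing itself, and the judgment about who the audience really is, still has to be handled separately.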
BTW, I think the good of metadata far outweighs the bad, and, frankly, it's not that difficult to deal with metadata in most cases, if you take the time to learn about it.
I'd mention a metadata scrubber, but since I'm not getting any royalties on the gigantic amount of business I'd send over to them by mentioning a product on this blog, I probably won't do that on the blog. ;-)
Microsoft has a free downloadable Remove Hidden Data tool for Office 2003/XP that some experts turn up their noses at because it doesn't clean EVERYTHING, but, if you are aware of what it doesn't do, most of the time you'll be fine with it. Note that it's for Office 2003 and XP, but, seriously, why are law firms still using older versions of Office as we near the start of 2006?
Donna Payne has a cleaner called Metadata Assistant that's more or less become the de facto standard tool in legal. It's $79. There are a couple of others (e.g., EZClean or Workshare Protect).
Even with scrubbers, you still have the possibility of user error problems.
Both Tom and Ernie are very knowledgeable on these issues and probably have a few other pointers.
My main recommendation is to go into MS Word's properties and turn on what I think is called the "Show Hidden Data" setting (that's the one that will automatically show you the stuff in docs people send to you). Also very helpful is a setting that will pop up the properties window when you first save a document, so you can delete some of the standard automatically-generated metadata.
Here's an article George Socha and I wrote on metadata that's pretty good:
- http://www.discoveryresources.org/04_om_electronic_discoverers_0405.html
+++++
Ernie replied:
I think that most explanations of metadata are laden with fear-mongering. Of course, this is probably called for since the greatest danger of metadata is not knowing that it is created in the first place. It's a very binary problem. If you know about metadata and know about the threat then the odds are you aren't going to make a mistake (note I said 'the odds are' and not 'you won't make a mistake').
I think many people don't want to understand the problem; they just want to avoid it. And for those people I would say this: make your document into a PDF using some tool that lets you 'print to PDF.'
Make sure that you have chosen to print only the document and not 'the tracked changes' or any similar thing.
Then, after you 'print to PDF,' open the document and see if the metadata is visible in the PDF document. If it is, go back to step #1. If it's not, then feel free to send it.
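As a rough supplement to eyeballing the PDF, the check step can be sketched in Python. This is only a crude scan, assuming the PDF stores its document information (Author, Title, and so on) as plain, uncompressed entries. It will miss metadata held in compressed streams or XMP packets, and it says nothing about tracked changes that were printed into the page itself. The function name and key list are chosen for illustration.

```python
import re

# Standard key names of a PDF document information dictionary.
INFO_KEYS = (b"Author", b"Title", b"Subject", b"Keywords", b"Creator", b"Producer")


def find_info_keys(pdf_bytes):
    """Return the info-dictionary key names found as PDF names in the raw bytes."""
    found = []
    for key in INFO_KEYS:
        # A PDF name is a slash followed by the key; the lookahead avoids
        # matching longer names that merely start with the same letters.
        if re.search(rb"/" + key + rb"(?![A-Za-z])", pdf_bytes):
            found.append(key.decode("ascii"))
    return found


if __name__ == "__main__":
    # A made-up fragment standing in for a real file's contents.
    sample = b"%PDF-1.4\n1 0 obj\n<< /Author (Jane Doe) /Producer (Some Printer) >>\nendobj\n"
    print(find_info_keys(sample))
```

For a real file you would pass in the bytes read from disk, for example the contents of a hypothetical `out.pdf` opened in binary mode. An empty result is not proof the file is clean, so the visual check still applies.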
If you are doing anything more complicated than that (e.g., redacting using an advanced PDF function in Acrobat, etc.) then read everything that Dennis has written on the subject of metadata and be afraid. Be VERY afraid....
+++++
Dennis replied:
I really agree that the basics of metadata aren't that hard to learn, if you just invest a little time. Once you start sending Word docs around, though, you really should know what you are doing or you're asking for trouble, just like the UN group.
+++++
Marty replied:
pls blog this thread
+++++
One in our occasional series of looks into our behind-the-scenes discussions.
1. MT on October 26, 2005 10:33 PM writes...
I suppose you could also print to your fax modem instead of to your printer and just fax the document, taking advantage of the automatic preview that fax software seems typically to give you. I've never used the "print to file" option and I don't know what kind of file this produces (printer-model specific?) or where it goes, but maybe that would be something safe to send as an attachment.
2. Vanessa J. on November 1, 2005 3:40 PM writes...
In my position as computer consultant, I deal mainly with attorneys & their firms and have an arsenal of anti-metadata tools, depending on the size of firm and type of data you send out (and to whom). If anyone wants to discuss, I can be reached at (516) 681-2266.
3. Dave on November 30, 2007 3:05 AM writes...
I know this comment is a bit late; and it's also a bit of a plug.
But we think we might have a better solution which is less prone to error. The solution is http://www.sendshield.com/
It is built into Outlook. As you attach a document, it immediately scans the attachment in the background while you type. This avoids delaying the user, and it is very subtle. Hopefully it makes addressing this problem loads easier.
Cheers,
Dave