Monday, March 5, 2007

Why Vista file tagging has to suck

Windows Vista lets you add tags (labels) to files, much like Gmail and nearly all photo management programs do. Gina Trapani on Lifehacker has written a nice article on how to use this feature.

However, you cannot tag each and every file: the file type has to support metadata. So you can tag Office documents, JPEG files and MP3s, but forget about tagging, say, txt files, TeX files, source code, or even file types that do support metadata but that Vista does not handle (I suppose it is possible to add extra types through plugins, though).

I agree that this is a severe limitation, but I understand the decision and think Microsoft chose the best way to do it. There are basically five ways to implement file tagging:

  1. Store metadata in the file system. WinFS would have supported this, but this new filesystem (oh, sorry, “future storage”) was eventually dropped from Longhorn. NTFS also supports tagging, however, and Windows XP already has an interface for it, although many people seem not to know this.
    However, you would only be able to tag files residing on certain filesystems, and the tags would be lost as soon as the file is copied to a USB drive (FAT32), burned to a CD or DVD, uploaded, e-mailed, zipped, stored in Subversion, placed on a Novell or Samba network share, backed up, etcetera, etcetera. Vista could issue a warning whenever a copy or move action would destroy the metadata, but these warnings would be annoying, and also confusing to users who have never used tagging to begin with.
  2. Store metadata in a database per filesystem. You would need one database per volume, let's say c:\metadata.dat. This would of course be a hidden and system file, so it would not get in the way.
    This approach has the advantage of working on every filesystem, including USB drives and network shares, but it still wouldn't work if you zip or e-mail a file or burn it to a DVD. As long as the web, e-mail, CDs and DVDs don't support external metadata, we cannot expect this ever to be possible. By “we” I mean “we software developers”: the average user will expect his metadata to be retained!
    Also, this option has some implementation issues that need a lot of thought: the central database will get big, so you'll probably want some sort of caching, but in these mobile and Plug'n'Play days the OS cannot rely on a filesystem being available all the time. They could probably work something out, but it wouldn't be perfect in all situations.
    Moreover, what if the file is moved, changed or removed by a system that does not understand the metadata file? The database would get out of sync with the actual contents of the files and the filesystem, and things would basically become a mess.
  3. Store metadata in a database per directory. Like the desktop.ini files used to store folder settings, a metadata.dat file could be added to every folder that contains tagged files.
    This approach has the same advantages and drawbacks as the previous one.
  4. Store metadata in a file per file. When you save a complete web page as index.html from Internet Explorer, it creates a directory index_files containing all dependencies of the HTML file (images, style sheets, JavaScript files etc.). Windows treats the HTML file and its associated directory as a unit, based on their filenames. Something similar could be done for metadata: for every file.ext, add a hidden and system file named file.ext_metadata that is always copied and moved along with the file.
    As long as you use Vista's Explorer to copy and move files, this will be fine. But again, even when working only under Vista, some programs will still drop metadata without notice: think of CD burning tools, backup tools or compression tools. All these applications would need to be updated for Vista, which will take time.
  5. Store metadata in the files themselves. This is the most localized approach, and it is the one that Microsoft decided to use.
    Its one big advantage over the previous methods is that metadata will never, ever get lost in a file transfer. If you tag a file, it will remain tagged for the rest of its life, no matter if you zip it, export it to punch cards, or send it to Jupiter and back. But... only some types of files can be tagged.
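To make the weakness of option 4 concrete, here is a minimal Python sketch of the sidecar idea. It is not how Vista or Explorer works; the `_metadata` suffix follows the file.ext_metadata naming suggested above, and the plain-text tag format and all function names are assumptions made up for illustration. It compares a copy routine that knows about the sidecar with a plain copy, which is what any tool that has not been updated would do.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical naming convention, following the file.ext_metadata idea above.
SIDECAR_SUFFIX = "_metadata"


def sidecar_for(path: Path) -> Path:
    """Return the path of the sidecar file that belongs to `path`."""
    return path.with_name(path.name + SIDECAR_SUFFIX)


def write_tags(path: Path, tags: list[str]) -> None:
    """Store one tag per line in the sidecar (an assumed format)."""
    sidecar_for(path).write_text("\n".join(tags), encoding="utf-8")


def read_tags(path: Path) -> list[str]:
    """Read the tags back; a missing sidecar simply means no tags."""
    sidecar = sidecar_for(path)
    if not sidecar.exists():
        return []
    return sidecar.read_text(encoding="utf-8").splitlines()


def tag_aware_copy(src: Path, dst: Path) -> None:
    """Copy the file and, if present, its sidecar along with it."""
    shutil.copy2(src, dst)
    if sidecar_for(src).exists():
        shutil.copy2(sidecar_for(src), sidecar_for(dst))


def demo() -> tuple[list[str], list[str]]:
    """Return the tags seen after a tag-aware copy and after a plain copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "notes.txt"
        original.write_text("plain text, no embedded metadata", encoding="utf-8")
        write_tags(original, ["vista", "tagging"])

        aware = root / "aware.txt"
        tag_aware_copy(original, aware)

        naive = root / "naive.txt"
        shutil.copy2(original, naive)  # what a tool unaware of sidecars does

        return read_tags(aware), read_tags(naive)
```

The tag-aware copy keeps both tags, while the plain copy ends up with none, and nothing tells the user that anything was dropped.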

All solutions above will confuse users at a certain point. People will inevitably start to rely on tagging features, so we shouldn't treat tags as an ‘extra’ which can be discarded lightly. The first four options will allow you to tag anything (including directories, incidentally), or at least anything on certain filesystems. But with these approaches, the tags may get lost in mysterious ways that the average user won't understand. Tags getting lost will lead to a lot of very unhappy people. And I haven't even started on the lock-in that results if the tag database format is not open.
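One of those mysterious ways is easy to reproduce for the per-volume database of option 2. The sketch below is only an illustration under stated assumptions: the database is a JSON file named metadata.dat (the name comes from the example above, the JSON format and every function name are invented here), keyed by path relative to the volume root. A file is tagged, then moved by something that knows nothing about the database.

```python
import json
import os
import tempfile
from pathlib import Path

DB_NAME = "metadata.dat"  # name taken from the example above; JSON format is assumed


def load_db(volume: Path) -> dict[str, list[str]]:
    """Load the tag database of a volume, or an empty one if none exists."""
    db_file = volume / DB_NAME
    if not db_file.exists():
        return {}
    return json.loads(db_file.read_text(encoding="utf-8"))


def tag(volume: Path, relpath: str, tags: list[str]) -> None:
    """Record tags for a file, keyed by its path relative to the volume."""
    db = load_db(volume)
    db[relpath] = tags
    (volume / DB_NAME).write_text(json.dumps(db), encoding="utf-8")


def tags_for(volume: Path, relpath: str) -> list[str]:
    """Look up the tags for a file; unknown paths have no tags."""
    return load_db(volume).get(relpath, [])


def stale_entries(volume: Path) -> list[str]:
    """List database entries whose file no longer exists at that path."""
    return [rel for rel in load_db(volume) if not (volume / rel).exists()]


def demo() -> tuple[list[str], list[str]]:
    """Tag a file, move it behind the database's back, and inspect the result."""
    with tempfile.TemporaryDirectory() as tmp:
        volume = Path(tmp)
        (volume / "report.txt").write_text("quarterly numbers", encoding="utf-8")
        tag(volume, "report.txt", ["work", "2007"])

        # A tool that does not understand metadata.dat moves the file.
        (volume / "archive").mkdir()
        os.rename(volume / "report.txt", volume / "archive" / "report.txt")

        return tags_for(volume, "archive/report.txt"), stale_entries(volume)
```

After the move, the file in its new location has no tags, and the database still holds an entry for a path that no longer exists. A real implementation could try to repair this by scanning the volume, but it could only guess which file an orphaned entry used to describe.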

On the other hand, some users (myself included) will not be very happy if they are only allowed to tag certain files but not others. Many people will not understand the technical reasons for this. But if we consider that the “average user” uses her computer for office work, digital photography or a music collection, we see that these kinds of files are all taggable, and she may never even notice this limitation.

In short, I'd rather be safe but a bit limited than randomly and unexpectedly lose my tags. Software should treat the user's data as the most precious thing on Earth.


Anonymous said...

Thank you for a lucid account of the situation. The greatest error in this, in my opinion, is not including support for some important file types that have become de facto standards, such as PDF files.

Joe Dubois said...

You forgot alternative streams.

Thomas ten Cate said...

Using alternate data streams would fall under option 1, storing metadata in the filesystem. It would therefore only be supported on NTFS.

Andrea D'Intino said...

We're doing something similar to WinFS, storing metadata inside a db, and you can tag any kind of file, plus you can auto-tag stuff based on rules... I hope I'm not too spammy: Tabbles.

Anonymous said...

Also, check out for both home and server solutions.