Sunday, August 26, 2007

OOXML: defective, but don't exaggerate

In a recent article titled Microsoft Office XML Formats? Defective by design, self-proclaimed file format expert Stéphane Rodriguez explains 13 reasons why Microsoft's Office Open XML (OOXML) format should not become an ISO standard. Although I completely agree with his conclusion, Rodriguez seemingly got carried away by his rant, uttering nonsense in some places. Still, most arguments make sense, so I'll cover only those that don't, below.

1) Self-exploding spreadsheets. Here, he modifies an Excel file manually, and is surprised that even this “simple” change breaks the file. He does not once refer to the specification to see whether the thing he changes may depend on things elsewhere in the file. So, probably, the file he created does not conform to the spec at all. Is it strange, then, that Excel goes boom? An Office document is a lot more complex than it looks at first sight, and the storage format is bound to reflect this.

2) Entered versus stored values. Clearly, the values get processed internally as (binary) floating-point numbers, which explains numbers like 1234.1233999999999 cropping up when they are converted back to the decimal notation needed for XML. If an implementation simply reads them back into the IEEE floating-point format, like pretty much any CPU uses, it ends up with the same binary value again: no problem. Besides, the fact that Excel writes out (very slightly) inaccurate numbers does not mean that the OOXML standard is flawed, only Excel's implementation thereof.
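To make the point above concrete, here is a small Python sketch. It is only an illustration: the entered value 1234.1234 is my own assumption, and the code says nothing about how Excel actually serialises numbers. It shows that a decimal input is stored as the nearest binary double, that writing that double out with 17 significant digits can produce a long tail of digits, and that reading the long form back yields exactly the same double.

```python
from decimal import Decimal

entered = "1234.1234"      # what the user typed (assumed example value)
stored = float(entered)    # nearest IEEE 754 double to the entered value

# The binary double is not exactly the decimal number that was typed.
exact = Decimal(stored)    # exact decimal expansion of the stored double

# 17 significant digits are always enough to identify a double uniquely.
written = format(stored, ".17g")

# Python's repr picks the shortest string that still round-trips.
shortest = repr(stored)

# Reading the long form back gives the very same binary value.
round_trips = float(written) == stored

print(exact)
print(written)
print(shortest)
print(round_trips)
```

So the odd-looking digits are an artefact of how the number is written out, not a loss of information. A writer could emit the shortest round-tripping form instead, as Python's repr does.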

6) International, but US English first and foremost. Another complaint of Rodriguez is that the numbers get stored in US English locale format (1,234.56), and not the localised format (e.g. 1 234,56). Also, formulas always use English function names, like SUM. Rodriguez claims that this canonicalisation makes processing more complex. Wait, what? You want us to go store things differently depending on how some people would like to view them on the screen? You think a file will be easier to process when we introduce diversity in locales?
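A minimal sketch of why one canonical storage form keeps things simple, under my own assumptions: the helper format_for_display is hypothetical, and the stored form used here (a plain 1234.56) is chosen for illustration only, not taken from the spec. The file holds a single unambiguous representation, and the locale only comes into play when the number is shown on screen.

```python
def format_for_display(canonical: str, group_sep: str, decimal_sep: str) -> str:
    """Render a canonically stored number with locale-specific separators.

    Hypothetical helper for illustration; not part of any OOXML tooling.
    """
    value = float(canonical)       # parsing never depends on the locale
    us_style = f"{value:,.2f}"     # e.g. '1,234.56'
    # Swap both separators in one pass so they cannot clobber each other.
    return us_style.translate(str.maketrans({",": group_sep, ".": decimal_sep}))


stored = "1234.56"  # the one form that lives in the file (assumed)

print(format_for_display(stored, ",", "."))  # US English style
print(format_for_display(stored, " ", ","))  # e.g. French style
```

A reader of the file needs only one parser, whatever locale it was written in. Storing the localised form would instead force every reader to know every locale's conventions.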

12) Document backwards compatibility subject to neutrino radioactivity. So, Excel 2007 cannot properly import graphs from earlier versions. What else is new? Anyway, I do not see how that can be regarded as a flaw in OOXML; the quality of the Microsoft Office suite is something quite different from the standardisation of the OOXML format.

Unrelated to Rodriguez's rant, there's one gem I would not want to keep from you. The folks at Google discovered [PDF link], buried in the over 6000 pages of the OOXML spec, 51 pages of stuff like the following:


If anyone dares to say these specs aren't bloated, point them to part 4, section 2.18.4, pages 1632–1682. I honestly don't know whether to laugh or cry.

If you like, you can have a look at the standard yourself. It is known as ECMA-376. The specs can be downloaded in either zipped PDF format or in a mysterious format called DOCX. For the latter, alas, no fully functional implementation appears to exist.

Wednesday, August 8, 2007

Blink

Shortly after my previous post, Trust your gut, I read about the book Blink: The Power of Thinking Without Thinking by Malcolm Gladwell. Both my conscious and my subconscious mind told me to purchase it. It took me only two days to finish the book.

Gladwell makes essentially the same point as I did in my previous post. In his words, much of our thinking happens behind a “locked door”: you can only sense the outcome, but not where it came from. If you try to figure out why you took that particular decision, only nonsense will come out. If you try to follow the reasoning while your subconscious is deciding, your conscious thoughts appear to interfere with it, and the subconscious is essentially disabled.

This subconscious reasoning is not magic. It relies on subtle clues that are too numerous for your conscious mind to process, and on tacit knowledge that comes from previous experience. For example, I am quite good at troubleshooting computer problems, but only if I'm there to witness them. If somebody tells me “my computer is doing this-and-this, do you know how to fix it?” I can give some general pointers, but it's only at the keyboard that I really get the insights. Not only is the text of an error message meaningful to me, but so are its looks, how it responds and the precise thing I was doing at the moment it popped up. It's not magic, it's simply lots of experience. And no, I will not fix your computer.

Of course Gladwell took much more time for his research than I did for my blog post, and he also gives examples of situations where you shouldn't trust what your subconscious tells you. For example, when you are in a dangerous situation, your mind goes into a state that psychologists call “arousal”. It shuts down parts of the brain that are deemed non-essential at that moment, including the one that recognises emotions on human faces. This explains why police officers shot a man because they thought he had a gun, while in fact he thought he was being robbed and reached for his wallet in mortal fear.

Whether you like it or not, a lot of our thinking happens subconsciously. I think that we are not aware of over 99% of our own thinking, our consciousness being just a flimsy layer on top of that. I pulled that number straight out of my arse, of course. But my arse may well be a lot smarter than you'd think …