Tuesday, February 26, 2008

Visual Studio/C++: Requested the Runtime to terminate?

The Visual Studio/Visual C++ woes continue. This time, my C++ application died with the message “This application has requested the Runtime to terminate it in an unusual way.” This didn't happen on my development machine, but on a different machine where I ran a Release build.

Googling turned up scary stuff, suggesting that I'd need a hotfix that was not publicly available: you have to contact Microsoft to get it. So I did, and received the hotfix within the hour, but by then I no longer needed to install it.

It turned out that the problem was something quite different. My application threw an exception because a file was not found on the other machine, and nothing caught it. The error message actually meant to say: “This application threw an unhandled exception.” Thank you, Microsoft.

So, if you ever receive this message, before you look any further: make sure you catch every exception you throw.

Friday, February 22, 2008

Cloudo

I just heard of Cloudo, an “internet operating system”. (Although this name is, strictly speaking, not incorrect, I'll call it a “web operating system” for clarity's sake.) There is no public beta of Cloudo yet, but have a look at this article to get the idea. It looks very much like a desktop OS (OS X, actually), except that it runs in your browser.

Admittedly, a decent web OS would be wonderful: to have your own environment, your own documents, your own applications, wherever you are. But Cloudo cannot possibly be that OS. Why? Because it consists, like every other web application, of a steaming, smelly pile of HTML and JavaScript.

Oh, doubtless it is possible to write an OS in HTML and JavaScript. People wrote Lemmings in this kind of stuff. But just as it is possible to build a house out of matchsticks and duct tape, that doesn't necessarily make it a good idea.

HTML was never intended for interactivity. JavaScript originated as a gimmick. Both are now being stretched, and stretched, and stretched even further… until they eventually break. Already it seems that even Moore's law, combined with ever faster JavaScript interpreters, has not been able to keep up with the growing complexity of JavaScript in the wild: web sites are getting slower, not faster[citation needed]. And using a markup language, instead of a proper UI language or toolkit, makes webapps feel more like interactive web pages than like real applications. I don't want to select the text in the title bar, people.

I'll be the first to admit that web applications have many advantages. They don't need installation, are available from everywhere, keep your data accessible from everywhere, don't need any maintenance, et cetera. But we are in desperate need of new technology for this. JavaScript is the C of the web: ubiquitous, portable, but a huge and ever-growing pain in the ass. Should we continue to build on that?

Also, Cloudo currently lacks a web browser. Nuff said.

Friday, February 15, 2008

Fun with aspell word lists

The GPL spell checking program aspell has support for many languages, including my native language, Dutch. Since it's open source, so are the dictionaries, which means that we should be able to extract a word list. And complete word lists for a language are simply fun to play with.

Unfortunately aspell is too advanced to store its dictionaries as plain text word lists. But there is a way to dump them:

aspell dump master

This will print the entire word list for your default language. You can specify the language used with -l:

aspell -l nl dump master

The argument to -l is the ISO 639 language code (see man aspell for details). The argument master tells aspell to use the systemwide dictionary, not your personal wordlist. The dictionary must be installed on your system; on Ubuntu the Dutch language package is called aspell-nl.

When we run aspell dump master for Dutch we get something unexpected:

blaat/MWPG
bloeit/KU
bloot/G
blootte
blote/N

There are strange tags attached to the end of many words. These are affix flags, and they represent variations of that word. (Although there is an English affix file, no affix tags are printed if we dump an English dictionary.) We can expand the affix tags into all possible variations by sending them through aspell expand:

aspell -l nl dump master | aspell -l nl expand

That's better:

blaat geblaat blaatten blaatten blaatte blaten
bloeit opbloeit uitbloeit
bloot gebloot
blootte
blote bloten

If we now pipe this through tr, every variation ends up on a line of its own. Thus the final command to get a word list for any aspell-supported language becomes:

aspell -l nl dump master | aspell -l nl expand | tr ' ' '\n'

(Note that this breaks for words that originally contained spaces. The Dutch word list does not have these, though.)

Then I got interested in how these affixes work. Take blaat (to bleat) for example; it is followed by M, W, P and G. The affix definition file /usr/lib/aspell/nl_affix.dat contains some lines that define the meaning of these characters:

SFX M N 13
SFX M 0 ben b
SFX M 0 den d
...
SFX M 0 ten t
SFX M z zen z

SFX W N 7
SFX W 0 t [^t]
SFX W 0 te [kfstp]
SFX W 0 ten [kfstp]
SFX W 0 te ch
SFX W 0 ten ch
SFX W 0 de [^kfstp]
SFX W 0 den [^kfstp]

SFX P N 34
SFX P ad den aad
SFX P af fen aaf
...
SFX P at ten aat
...

PFX G Y 1
PFX G 0 ge .

So what does this all mean? SFX and PFX stand for suffix (ending) and prefix (beginning). The first line of each block gives the number of lines in the rest of the block; the Y or N before that number indicates whether the affix may be combined with affixes of the other kind (a suffix with prefixes, or vice versa).

The line SFX M 0 ten t simply creates the plural past form blaatten, by appending -ten if the original word ends in t. We also see suffixes for other cases in which consonant doubling is required.

The next suffix SFX W is more interesting. This one takes care of various conjugations, including the past tense and the past participle (‘voltooid deelwoord’). We see that -te is appended when the word ends in k, f, s, t or p, and -de otherwise. Any Dutch person will immediately recognize this as the dreaded ‘kofschipregel’ that is the cause of so many spelling errors. In this case it gives rise to the word forms blaatte and blaatten.

The suffix SFX P at ten aat takes care of the infinitive. Note that the double a has been replaced by a single one; the pronunciation remains identical. (Dutch works in mysterious ways…) This replacement is done by the third field on the line, which indicates the text to strip off the end of the word; so far it has been 0, which means to strip off nothing. (SFX M z is the exception; as far as I can tell, it is not used anywhere in the aspell dictionary.) We take blaat, which matches -aat, so we strip off -at and stick -ten in its place, resulting in blaten.

Finally, we have a prefix rule PFX G, which sticks ge- before anything, leading to the form geblaat (bleating, as in “the bleating of the sheep”).

The file nl_affix.dat also contains a list of general replace rules (for example, replacing g by ch and vice versa) and specific ones (kado by cadeau). These rules are used when suggesting possible corrections for a misspelled word.

So where was I? Oh yeah, building a word list. To play with. For fun.

Saturday, February 9, 2008

I ate a skunk for lunch

Faithful xkcd readers will know about the sudden change in Google results caused by the Dangers comic. The number of hits for "died in a blogging accident" has since risen from 2 to over 32,000.

The results were acquired simply by entering the corresponding Google query. This has one disadvantage: you have to know in advance which type of accident you're looking for.

I wrote a little Perl script to overcome this. It Googles for "died in a" "accident" and parses the first 1000 results (unfortunately, Google refuses to give more than that). The number of occurrences of “died in a _ accident”, with exactly one word in place of the _, is counted, and the results are charted using the Google Chart API.

So now we can see which accidents are really most common:

It appears that blogging is far more dangerous than I always thought!

The number of words we want in the results can be modified (e.g. “give me all results consisting of one, two or three words”), and they do not have to be sandwiched between two known phrases, but can also come before or after a certain phrase, as in “_ is an idiot” (which obviously requires that we specify a fixed number of words to take). There is also an ignore list to get rid of meaningless matches like “he” and “bush”.

The script can also tell us what people eat:

Fair enough. But:

A rat? A skunk? A hippopotamus?!

Unfortunately the script does not work as well as I hoped. The results often do contain both parts of the query, but on different parts of the page. These hits get in the way of the useful results, and because we're limited to 1000 results we cannot dig any deeper to find them.

The script is full of known and unknown bugs, not least because it's my first nontrivial Perl script, but I put it online for your enjoyment anyway. It is called accident.pl and requires Perl (obviously), WWW::Mechanize and URI. Run it without arguments to get a brief help text. Let me know what results you come up with!

Friday, February 8, 2008

Add three inches in a week!

For over a year, I've been the happy owner of a Dell 2407WFP 24" widescreen monitor:

Dell 2407WFP

But as of late, there has been a little problem with it:

It's as if there is a strip loose between the backlight and the LCD. Not a big problem, but slightly annoying.

So last Monday I mailed Dell customer support. I got a prompt response, recommending that I try resetting the monitor by holding down the power button for 20 seconds. Tried that; didn't help; mailed back. On Tuesday I got the reply: they were sending a replacement through UPS, which would arrive the next day. Wednesday morning: a phone call from UPS. From Lithuania, of all places. Telling me in fairly decent English that the replacement was not in stock, and that I should contact Dell for an alternative, so I mailed them again. On Thursday I got a call from Germany. Broken English, not entirely clear on the purpose of the call, but apparently verifying my address. Oh well.

Now this Friday morning, I woke up to the sound of the doorbell. A man from UPS with a replacement monitor. I was surprised, since I hadn't heard anything from Dell in the meantime. Being careful, I wanted to see the replacement in action before the UPS guy left with my old one. Unpacked the monitor.

Dell 2707WFP

First thing I noticed: brushed metal, not black like my old one. Checked the model number on the back: 2707WFP. Wait… 27?! And indeed, holding my old panel up against it, the new one was three inches larger. It worked right out of the box and had no dead pixels or other artifacts, so I let the UPS guy take my old screen away (sniff). He didn't even want the booklets, cables or CD-ROM, which was lucky for me, since I'd have had a hell of a time finding all that stuff again.

Thinking that this replacement was too good to be true, I e-mailed Dell once more to verify that it was indeed correct and intended to be permanent. Got a confirmation within the hour.

Now that is what I call Customer Care.