The Typethinker: 2008

Sunday, December 14, 2008

XMonad with Ubuntu, dvorak, Pidgin and Skype

The last two days I have been playing around with XMonad, a tiling window manager for X, written in my favourite language Haskell. I now have a setup that is to my liking. It is intended for Ubuntu, using the dvorak keyboard layout, with key bindings that are a mixture of Gnome and XMonad default bindings.

The thing that I'm most proud of, however, is that I made a customized version of XMonad.Layout.IM that allows for multiple buddy lists. The buddy lists from Skype and Pidgin are placed on the right side of the F10 workspace, while chat windows are automatically placed in the remaining area.

I realize that the target audience (Ubuntu users, typing on dvorak, interested in alternative window managers, and using Pidgin and Skype) may be quite small, but in case my configuration file, or parts of it, are useful to somebody, I put it on the Haskell wiki.

This requires XMonad 0.8 or above. This version is in Ubuntu Intrepid. Ubuntu Hardy ships with an old version, so there you will need to install XMonad by hand.

Sunday, November 23, 2008

Full screen forms in .NET

Today I tried to make a form take up the whole screen (including the taskbar) in C# on the .NET platform. .NET WinForms does not offer functionality to do this, so I nosed around the web. All results that I found used platform-dependent P/Invoke calls to accomplish this goal. However, I wanted to keep my code platform-independent, so I created a pure, managed .NET solution.

First, we need some member variables to remember the window state, so we can come back out of full screen mode:

private FormWindowState m_previousWindowState;
private Rectangle m_previousBounds;
private FormBorderStyle m_previousBorderStyle;

The code to switch into full screen mode then becomes:

m_previousBorderStyle = FormBorderStyle;
m_previousWindowState = WindowState;
m_previousBounds = Bounds;
// Stay on top of everything else.
TopMost = true;
// Remove the window border.
FormBorderStyle = FormBorderStyle.None;
// We cannot change the Bounds of a maximized form.
WindowState = FormWindowState.Normal;
// Set the size of the form to the size of its screen.
Bounds = Screen.FromControl(this).Bounds;

The code to switch back simply restores all this:

TopMost = false;
FormBorderStyle = m_previousBorderStyle;
WindowState = m_previousWindowState;
Bounds = m_previousBounds;

It's really quite simple, and I don't know why people make it more complex than it should be.

Tuesday, November 18, 2008

gluLookAt documentation is wrong

Today I noticed a strange omission in the documentation for the GLU function gluLookAt. As the specification says, the function is designed to place the camera at a certain point in the scene, point it at a certain other point, and roll it such that a certain given vector points upward in the view.

First, the function computes a front vector, F, by subtracting the eye point from the centre point. The normalized version of this front vector is called f and is the vector that should be mapped to the −z axis. To find the side vector, the one that maps to the x axis, the cross product between the normalized front vector and the normalized up vector UP' is computed. Both vectors have unit length, but the user may have specified an up vector that is not perpendicular to the front vector, so the result s might not have unit length. If we used it like this, then the resulting matrix would include a scale component in the x direction, resulting in a scaled scene. Hence, the side vector s has to be normalized as well, which is exactly what the Mesa source code for gluLookAt does in line 134:

/* Side = forward x up */
cross(forward, up, side);
normalize(side);

However, the gluLookAt documentation does not mention this! It says “s = f × UP'”, and follows by plugging this s straight into the resulting matrix M. If you implement the algorithm precisely as stated in that manual page, like I did, you will end up with an incorrect matrix.

Note that, after computing the side vector, the ‘official’ up vector is recomputed as the cross product between the side vector and the front vector. If you did it correctly and did not follow the documentation literally, both have unit length. Since they are also guaranteed to be perpendicular, you end up with a unit-length up vector that does not have to be normalized afterwards; and indeed the Mesa code does not do this. But if you did follow the documentation, then your up vector will also be wrong, resulting in a scene that is scaled in the y direction as well.
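To make the difference concrete, here is a minimal Python sketch of the rotation part of the matrix only (no translation). The function name look_at_rotation and the normalize_side switch are my own, purely for illustration; they are not part of GLU or Mesa. With an up vector that is deliberately not perpendicular to the view direction, skipping the normalization gives side and up rows that are too short.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def look_at_rotation(eye, centre, up, normalize_side=True):
    """Return the rows (s, u, -f) of the rotation part of the view matrix."""
    f = normalize(sub(centre, eye))
    s = cross(f, normalize(up))
    if normalize_side:
        s = normalize(s)  # the step the manual page leaves out
    u = cross(s, f)       # recomputed up vector; unit length only if s is
    return s, u, [-x for x in f]

eye, centre = [0.0, 0.0, 5.0], [0.0, 0.0, 0.0]
up = [0.0, 1.0, 1.0]  # deliberately not perpendicular to the view direction

good = look_at_rotation(eye, centre, up)
bad = look_at_rotation(eye, centre, up, normalize_side=False)
print([round(norm(row), 3) for row in good])  # all rows have length 1
print([round(norm(row), 3) for row in bad])   # side and up rows are shorter
```

In this example the angle between the front vector and the given up vector is 45 degrees, so the unnormalized side vector has length sin 45° ≈ 0.707, and the recomputed up vector inherits the same error.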

I am trying to get in touch with the OpenGL people about the problem in the documentation. I'm curious to see what will come of it.

Saturday, September 6, 2008

Blogger clients for Linux

Since Blogger is so annoying, I gave a couple of dedicated blogging clients a try. Here is a short summary of my findings. An important requirement for me is that the client runs on Linux (Windows is optional, but not required). WYSIWYG editing is not important to me, as long as there's some kind of preview feature. And there must be a way to post images.

BloGTK: A GTK program, written in Python. Last updated in 2005. Couldn't get it to work, obviously.
Drivel: Another Gnome program. It is not in the Ubuntu repositories and the most recent Ubuntu package is for Dapper, which is ancient and fails to work. Then I noticed that image upload is not supported, so I didn't bother trying to build from source.
Bleezer: A Java program, runnable from the web via Java Web Start. The password that you type into the program is shown in plain text on the screen so at that point I gave up. I do not want to use a program with such disregard for basic security and user interface conventions.
Gnome Blog: Sits in the way in your system tray. Not what I was looking for.
ScribeFire: A Firefox plugin. Ironically much better than all the standalone applications. For now this is my program of choice. The image upload does not work for me, however, so I filed a bug about that. ScribeFire seems to be pretty well maintained so I expect that they'll get back to me soon.

News flash: Blogger still sucks

Just need to rant again. Sorry. Some more things that are wrong with Blogger and have been around for ages:

When inserting an image, the <img> code is inserted at the top of the post, not in the place where the cursor is. You need to scroll all the way back up, cut the code, then find the place where you were typing in the annoyingly small box, and paste it there.
The following crap is Blogger's idea of good image code: <a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://…"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://…" border="0" alt=""id="…" /></a> The part where the cursor is specified and then immediately overridden is my favourite. I haven't looked at the “deselect gracefully” bit but I suspect it is equally awful. And don't these guys know that you can specify CSS in a separate file instead of repeating it on each and every element?
The preview does not use the actual CSS from the blog, so the formatting will come out all different from the preview.
When toggling to preview mode and back, you are thrown back to the beginning of the post, and you'll have to scroll back to the place where you were typing. Did I mention that that box is annoyingly small?

Tuesday, August 5, 2008

A picture on the title page in LaTeX

In LaTeX, there is by default no way to put a picture on the title page or cover page that is produced by \maketitle. Surprisingly, no package seemed to exist for this either. Until now, because I wrote it.

The package titlepic.sty, which can be downloaded from CTAN, is very simple and easy to use. Install it by putting it in your texmf tree and rehashing, or simply drop titlepic.sty in the same directory as your .tex source document. It works with the default document classes article, report and book.

Include it as normal, with \usepackage{titlepic}. Then, along with the usual \title, \author and \date, put a command like the following:
\titlepic{\includegraphics[width=\textwidth]{cover.jpg}}
The argument to \titlepic will usually be an \includegraphics command, but it can actually be pretty much anything. The output produced by this argument will be typeset centered on the title page when you invoke \maketitle. (When you use the article document class, be sure to pass it the titlepage option, because articles do not have a title page by default.)

There are three package options that control the vertical layout of the title page:

tt: Put both the title (and author, and date) and the picture at the top of the page, separated by a fixed amount of space.
tc: Put the title at the top of the page as with tt, but center the picture vertically on the page.
cc: Separate the title and the picture by a fixed amount of space, and center both together vertically on the page.

A full manual is also available. I hope this is of some use to someone. Enjoy!

Note: titlepic only works with the “standard” document classes article, report and book. You may have some luck with other classes such as AMS, but no promises.

Sunday, June 29, 2008

LaTeX clever references

When you're referencing a section in LaTeX, you'd usually write something like
… as we saw in section \ref{sec:cake}.
But this is somewhat inconvenient. LaTeX knows that it's a section, right? So why the need to specify this? Worse, if you ever change it into a subsection, your reference will be wrong.

Luckily there's the command \autoref from the hyperref package. However, this too has some drawbacks, mainly that it does not provide a capitalised version. A better alternative is to use the package cleveref. It is not in the Ubuntu repositories, but you can simply download and extract the archive, then run latex cleveref.ins to obtain cleveref.sty and dump it in the directory along with your document.

Load the package with
\usepackage{cleveref}
and make sure it's the last package to load; that is, even later than hyperref.

Using \cref the previous example becomes:
… as we saw in \cref{sec:cake}.
This will produce the text “… as we saw in section 3.”. At the start of a sentence you'd use the capitalised version \Cref:
\Cref{sec:cake} gives the recipe …
You can even write:
See also \cref{sec:cake,sec:lie,eq:recipe,thm:delicious}.
This produces “See also sections 2 and 3, eq. 5 and theorem 1.” Although I doubt that anyone would use this very often, it's still pretty cool.

You can customize the word that is printed before the number. For example, some people like them to be always capitalised. (I don't, but my supervisor does, and who am I to argue?)
\crefname{chapter}{Chapter}{Chapters}
\crefname{section}{Section}{Sections}
\crefname{subsection}{Section}{Sections}
\crefname{subsubsection}{Section}{Sections}
\crefname{figure}{Figure}{Figures}
\crefname{table}{Table}{Tables}
Etcetera. Note that subsections and subsubsections are usually all referred to as sections.

Finally, as a bonus, here's how you make it work with references to \subfloats from the subfig package:
\crefname{subfigure}{Figure}{Figures}
(Capitalise according to taste, or even write “subfigure” if you like.)

For more options, see the cleveref documentation.

Thursday, June 12, 2008

mplayer tricks

mplayer is my favourite movie player for Linux by far, but the thousands-of-lines long manual page can be a little daunting. Here I've compiled a list of some options that you may or may not know, that I find useful in day-to-day movie playing.

-cache 8192: Sets the cache (buffer) size to 8192 kilobytes. Useful when playing over a network connection that occasionally hiccups, like mine.
-fs: Start fullscreen playback right away. Especially useful when playing multiple movies in a single command, to prevent dropping back to windowed mode at each new movie.
-monitoraspect 16:10: If your monitor has non-square pixels (e.g. 1280 by 1024 on a 4:3 monitor), you can specify the physical aspect ratio of your monitor in this way.
-ao alsa: Use ALSA for sound playback. Should be the default in any decent distro, but in case it doesn't work, try this (or, heaven forbid, -ao oss).
-nolirc: Disable LIRC support. Gets rid of the annoying “Failed to open LIRC support.” message.
-subfont-text-scale 3: The default subtitle size is 5, but I prefer mine to be a little smaller.
-aspect 4:3: Overrides the aspect ratio of the movie. Some useful values are 4:3, 16:9 and 2.35.
-vf cropdetect: For movies with black bars around them. This detects the black bars in the image and prints out the correct argument to crop them off, such as -vf crop=656:288:0:0. One of mplayer's coolest features.

Options can also be put into a file ~/.mplayer/config if you leave out the initial - and replace spaces by = characters (e.g. monitoraspect=16:9). Saves a lot of typing!
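The conversion rule is mechanical enough to sketch in a few lines of Python. The helper to_config_lines is hypothetical (it is not part of mplayer), and writing value-less flags such as -fs as fs=yes is my reading of the config file convention, so treat that part as an assumption. The sketch also assumes that no option value starts with a dash.

```python
def to_config_lines(args):
    """Convert mplayer command-line options into config-file lines.

    Example: ['-cache', '8192', '-fs'] becomes ['cache=8192', 'fs=yes'].
    """
    lines = []
    i = 0
    while i < len(args):
        name = args[i][1:]  # drop the initial '-'
        # The option has a value if the next token is not another option.
        if i + 1 < len(args) and not args[i + 1].startswith("-"):
            lines.append(name + "=" + args[i + 1])
            i += 2
        else:
            # Assumption: flags without a value are written as name=yes.
            lines.append(name + "=yes")
            i += 1
    return lines

options = "-cache 8192 -fs -monitoraspect 16:10 -ao alsa".split()
print("\n".join(to_config_lines(options)))
```

The printed lines can then be pasted into ~/.mplayer/config.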

Wednesday, June 11, 2008

Integrating Inkscape graphics in LaTeX

Getting good-looking diagrams and figures into a LaTeX document can be tricky. My favourite software (and I think it ought to be anyone's favourite) for drawing such figures is Inkscape. This post explains how to get text in the proper font into Inkscape, how to put equations into Inkscape drawings, and how to get those drawings out of Inkscape and into your LaTeX document.

It is a good idea to use the latest version of Inkscape, because the program is rapidly being improved all the time.

LaTeX font in Inkscape

To make your figure look good in its environment, you can use the same font family that the surrounding body text uses. In LaTeX's case: Computer Modern.

Installing fonts

As Computer Modern is written in the METAFONT format, it cannot be directly used in Inkscape. For that, we need the font in OpenType (OTF) format, preferred for Linux (and MacOS?) systems, or in TrueType (TTF) on Windows. (Side note: here's an interesting article on the differences between all the font formats.)

The BaKoMa font bundle provides the Computer Modern font in these and some more formats. Download it here, then extract to a temporary directory. Installation is as follows:

Windows: Open up Fonts in the Control Panel and drag-and-drop all files from the ttf directory into here.
Ubuntu Linux: Open a file browser (Nautilus) and navigate to fonts://, then drop the fonts from the otf directory here. You may need to run Nautilus as root using the command gksudo nautilus.
Non-Ubuntu Linux (and Ubuntu Hardy, because they broke it): Copy the files (as root) from the otf directory to anywhere you like inside /usr/share/fonts, then run sudo fc-cache -fv. (For a single-user installation, ~/.fonts might work, but no guarantees!)

Creating the figure

If you start Inkscape, new fonts with names like BKM-cmr10 should be available. Here, cm stands for Computer Modern, r means roman (normal body-text font) and 10 is the point size. Simply use this font for all the text in your illustrations to make them integrate seamlessly with the text in LaTeX.

Or, almost seamlessly. It seems that Inkscape (version 0.46) does something strange with the font size, or the BaKoMa fonts are too small to begin with. In any case, I find that using BKM-cmr10 at 12 points in Inkscape provides the best match to the default 10-point LaTeX body font. I personally prefer BKM-cmss10, the sans-serif version of Computer Modern, because it integrates nicely with abstract line drawings and with the surrounding serif body text, but if you use mathematics in your figures this is probably not an option.

LaTeX equations into Inkscape

Yes, it is possible to add mathematical symbols and equations to your Inkscape drawing! You can also use this for normal text, but it is more cumbersome than the font approach detailed above.

Installing textext

First, you need textext. Simply extract the two files from the archive into /usr/share/inkscape/extensions for a systemwide installation, and fix the permissions:
sudo chmod 644 textext.inx
sudo chmod 755 textext.py
(For a single-user installation, ~/.inkscape/extensions should work.)

You'll need some extra packages for the script to work. On Ubuntu:
sudo apt-get install python-lxml
sudo apt-get install pstoedit
On non-Debian based Linux distributions, install pstoedit and the Python lxml package in some other way. On Windows, see the textext web page for details.

Using textext

Now (re)start Inkscape, click Effects, Tex Text and type your LaTeX code! You can even load a preamble from a file to include additional packages such as amsmath. (Unfortunately the file must contain only the preamble, not an entire document.) The same problem occurs as with the previous approach: you need to set a scale factor of 1.25 to (approximately) match the font size of the LaTeX document. Close the dialog with the OK button or with Ctrl+Enter.

The equation (or other LaTeX text) is then placed as a group of shapes into Inkscape. To edit it (yes, that is possible!), select it and click Effects, Tex Text again. This feature is a little feeble, however: do not ungroup the text object, or else it will become uneditable.

From Inkscape to LaTeX

Exporting from Inkscape

When the figure is done, deselect all objects, then go to Document Properties and click Fit page to selection. This will adjust the page boundaries to fit exactly around all objects. Then save the figure to Inkscape SVG format (for later editing), but also save a copy as “PDF via Cairo”. Check the box to Convert text to paths, because otherwise the kerning seems to be messed up in the export.

Including in LaTeX

This is a simple matter of
\usepackage{graphicx}
in the preamble, and then placing the figure using
\includegraphics{diagram.pdf}
as usual. Do not use any of the scaling options of \includegraphics, since they will cause the text in the figure to scale as well, and it will no longer match the size of the surrounding body text.

Compile your LaTeX document using pdflatex (not normal latex, since that only handles inclusion of EPS files), and there you go!

Thursday, May 29, 2008

How to choose secure passwords

In response to the Debian ssh fiasco, I've decided to take a closer look at all my passwords and keys. There are six machines on which I regularly log in, probably about a dozen accounts on these in total, and of course around fifty web services that I have at some point registered with. All this has become quite a mess, and I'll try to clear it up as well as I can. This post may be the first in a series of hands-on security-related posts; but it may not.

Here's a recommendation on passwords that you hear often: “take a sentence that you remember easily, like a song lyric, then take the first letter of each word, and there's your password.” Works pretty well, as long as your sentence is long enough and you mix in some digits (like ‘4’ instead of ‘for’). However, if someone happens to know your taste in music (from, for example, Last.fm), a little patience and some brute force can still recover your password.

I think I can do better. I, too, start with a sentence, sometimes even a song lyric. But instead of replacing each word with its first letter, I replace it with something that, in my mind, is connected to that word, and is the first thing that comes to mind. If nothing comes to mind, I just take the first letter of the word.

For example, let's say my sentence is ‘Shall I compare thee to a summer's day’. ‘Shall’ reminds me of ‘shallow’ and therefore becomes ‘_’. ‘I’ remains ‘I’. ‘Compare’ reminds me of Perl's spaceship operator and thus becomes ‘<=>’. ‘Thee’ is the Dutch word for tea (although pronounced differently) and becomes ‘cU’ since that looks somewhat like a cup of tea. ‘to’ becomes ‘2’. ‘a’ remains ‘a’. ‘Summer's’ could be ‘^o's’ because in summer the sun (‘o’) is high (‘^’) in the sky. At ‘day’ I ran out of inspiration, so this becomes ‘d’, and I add a trailing ‘,’ for good measure. Hence a secure password, that I could remember (and, if necessary, reconstruct): ‘_I<=>cU2a^o'sd,’.
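Written out as a procedure, the scheme looks roughly like the Python sketch below. The names associations and encode are made up for this illustration. Obviously the whole point is that the association table lives only in your head, so do not actually store yours in a script.

```python
# Personal associations from the example above; anything not listed
# falls back to the first letter of the word.
associations = {
    "shall": "_",
    "compare": "<=>",
    "thee": "cU",
    "to": "2",
    "summer's": "^o's",
}

def encode(sentence, associations, suffix=","):
    """Replace each word by its association, or else by its first letter."""
    parts = []
    for word in sentence.split():
        parts.append(associations.get(word.lower(), word[0]))
    return "".join(parts) + suffix

print(encode("Shall I compare thee to a summer's day", associations))
```

One-letter words such as ‘I’ and ‘a’ survive unchanged simply because their first letter is the whole word.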

The reason why this is (hopefully) a little bit more secure than the naïve version is that it uses information that is only in my strange, illogical, twisted brain, and nowhere else. That, combined with a broad taste in music, should make brute force over song lyrics a lot tougher.

Sunday, April 6, 2008

Getting images out of Word

If you've ever tried to extract a picture from a Word document, you'll know what I mean. Copying and pasting into an image editor does not work: images come out scaled, depending on their size on the page in the Word document, and the colours are horribly mutilated.

But since Word is able to scale and display the picture properly, the data must be there. And it is possible to extract it. How? The following steps work in Word 2003, and allegedly in Word 2000 as well.

Click File, Save as Web Page….
Choose Web Page for the Save as type.
Save it anywhere you like.
Look in the folder DocumentName_files, and voila! Next to the JPGs that are actually used in the HTML, you'll find the original images (PNG, JPG or whatever), at the original resolution!

Credits go to the people in this forum thread that I found through Google. If you can't get it to work as stated above, the thread mentions some more things you can try.

Tuesday, April 1, 2008

New accessibility feature added to Windows

In response to alternative operating systems like OS X and Linux becoming increasingly user-friendly, Microsoft has announced a patch for Windows to greatly improve the usability of Windows. Yesterday, a screenshot was published that shows some small but significant changes to the Windows login screen:

The new Windows feature, available as a patch for Windows 2000, XP and Vista, was developed in cooperation with the Microsoft Office team. A stripped-down version of the Office spell checker is included, so Office itself is not required to install or use the patch.

By default, the spell checker's dictionary only contains the password of the specified user account, but the list can be expanded to include others' passwords as well as dictionary words of your local Windows language. An optional feature (enabled by default) is to correct automatically for incorrect capitalization, thereby avoiding the annoying “Caps Lock is on” message. Some more features can be configured through the Group Policy editor, most notably password auto-completion.

“This feature is a great step forward in the accessibility of Microsoft Windows,” a Microsoft spokesman said. “Imagine users with a physical handicap trying to type in their well-chosen seventeen-character passwords, containing eight special characters, two of which are in the Supplementary Multilingual Plane of Unicode. This new feature will get them started with Microsoft Windows faster than ever before!”

In response to security concerns, Microsoft said: “The new feature actually makes Windows more secure. In the past, people used to write down their passwords on Post-it notes stuck to their monitor, where anyone could read them. Now they can just rely on the spell checker to get their passwords right.” Passwords are never shown on the screen, and the dictionary is stored in a file only readable by Administrator users.

All Windows 2000, XP and Vista machines with Windows Update enabled will receive the patch automatically this Tuesday.

Tuesday, February 26, 2008

Visual Studio/C++: Requested the Runtime to terminate?

The Visual Studio/Visual C++ woes continue. This time, my C++ application gave the message “This application has requested the Runtime to terminate it in an unusual way.” and my application died. This didn't happen on my development machine, but on a different machine on which I ran a Release build.

Googling turned up some scary stuff, including the claim that I'd need a hotfix that was not publicly available: you have to contact Microsoft to get it. So I did, and received the hotfix within the hour, but in the end I didn't need to install it.

Turned out that the problem was something quite different. My application threw an exception, because a file was not found on the other machine. The error message actually meant to say: “This application threw an unhandled exception.” Thank you, Microsoft.

So, if you ever receive this message, before you look any further: make sure you catch every exception you throw.

Friday, February 22, 2008

Cloudo

I just heard of Cloudo, an “internet operating system”. (Although this name is, strictly speaking, not incorrect, I'll call it a “web operating system” for clarity's sake.) There is no public beta of Cloudo yet, but have a look at this article to get the idea. It really looks much like a desktop OS (OS X, actually), except that it runs in your browser.

Admittedly, a decent web OS would be wonderful: to have your own environment, your own documents, your own applications, wherever you are. But Cloudo cannot possibly be that OS. Why? Because it consists, like every other web application, of a steaming, smelly pile of HTML and JavaScript.

Oh, doubtless it is possible to write an OS in HTML and JavaScript. People wrote Lemmings in this kind of stuff. But it is also possible to build a house out of matchsticks and duct tape; that doesn't necessarily make it a good idea.

HTML was never intended for interactivity. JavaScript originated as a gimmick. Both are now being stretched, and stretched, and stretched even further… until they eventually break. Already it seems that even Moore's law has not been able to keep up with the growing complexity of JavaScript in the wild, even combined with ever faster JavaScript interpreters: web sites are getting slower, not faster [citation needed]. And using a markup language, instead of a proper UI language or toolkit, makes webapps feel more like interactive web pages than like real applications. I don't want to select the text in the title bar, people.

I'll be the first to admit that web applications have many advantages. They don't need installation, are available from everywhere, keep your data accessible from everywhere, don't need any maintenance, etcetera. But we are in desperate need of new technology for this. JavaScript is the C of the web: ubiquitous, portable, but a huge and ever growing pain in the ass. Should we continue to build on that?

Also, Cloudo currently lacks a web browser. Nuff said.

Friday, February 15, 2008

Fun with aspell word lists

The GPL spell checking program aspell has support for many languages, including my native language, Dutch. Since it's open source, so are the dictionaries, which means that we should be able to extract a word list. And complete word lists for a language are simply fun to play with.

Unfortunately aspell is too advanced to use a plain text word list. But there is a way to dump it:

aspell dump master

This will print the entire word list for your default language. You can specify the language used with -l:

aspell -l nl dump master

The argument to -l is the ISO 639 language code (see man aspell for details). The argument master tells aspell to use the systemwide dictionary, not your personal wordlist. The dictionary must be installed on your system; on Ubuntu the Dutch language package is called aspell-nl.

When we run aspell dump master for Dutch we get something unexpected:

blaat/MWPG
bloeit/KU
bloot/G
blootte
blote/N

There are strange tags attached to the end of many words. These are affixes and they represent variations of that word. (Although there is an English affix file, no affix tags are printed if we dump an English dictionary.) We can expand the affix tags into all possible variations by sending them through aspell expand:

aspell -l nl dump master | aspell -l nl expand

That's better:

blaat geblaat blaatten blaatten blaatte blaten
bloeit opbloeit uitbloeit
bloot gebloot
blootte
blote bloten

If we now pipe this through tr we get all variations on separate lines as well. Thus the final command to get a word list for any aspell-supported language becomes:

aspell -l nl dump master | aspell -l nl expand | tr ' ' '\n'

(Note that this breaks for words that originally contained spaces. The Dutch word list does not have these, though.)
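Once the expanded output is in hand, the same splitting is easy to do in Python too, which also makes it simple to drop duplicates (note that blaatten shows up twice above). This is only a sketch working on lines you have already captured; the function name words_from_expanded is my own.

```python
def words_from_expanded(lines):
    """Split expanded aspell lines on whitespace; return sorted unique words."""
    words = set()
    for line in lines:
        words.update(line.split())
    return sorted(words)

sample = [
    "blaat geblaat blaatten blaatten blaatte blaten",
    "bloeit opbloeit uitbloeit",
]
for word in words_from_expanded(sample):
    print(word)
```

Like the tr version, this breaks for dictionary entries that contain spaces.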

Then I got interested in how these affixes work. Take blaat (to bleat) for example; it is followed by M, W, P and G. Looking in the affix definition file /usr/lib/aspell/nl_affix.dat there are some lines that define the meaning of these characters:

SFX M N 13
SFX M 0 ben b
SFX M 0 den d
...
SFX M 0 ten t
SFX M z zen z

SFX W N 7
SFX W 0 t [^t]
SFX W 0 te [kfstp]
SFX W 0 ten [kfstp]
SFX W 0 te ch
SFX W 0 ten ch
SFX W 0 de [^kfstp]
SFX W 0 den [^kfstp]

SFX P N 34
SFX P ad den aad
SFX P af fen aaf
...
SFX P at ten aat
...

PFX G Y 1
PFX G 0 ge .

So what does this all mean? SFX and PFX stand for suffix (ending) and prefix (beginning). The first line of each block gives the number of lines in the rest of the block; the Y or N before the number indicates whether the suffix may be combined with prefixes or vice versa.

The line SFX M 0 ten t simply creates the plural past form blaatten, by appending -ten if the original word ends in t. We also see suffixes for other cases in which consonant doubling is required.

The next suffix SFX W is more interesting. This one takes care of various conjugations, including the past tense and the past participle (‘voltooid deelwoord’). We see that -te is appended when the word ends in k, f, s, t or p, and -de otherwise. Any Dutch person will immediately recognize this as the dreaded ‘kofschipregel’ that is the cause of so many spelling errors. In this case it gives rise to the word forms blaatte and blaatten.

The suffix SFX P at ten aat takes care of the infinitive. Note that the double a has been replaced by a single one; the pronunciation remains identical. (Dutch works in mysterious ways…) This replacement is done by the third field on the line, that indicates the text to strip off the end of the word; so far, it has been 0, which means to strip off nothing. (The SFX M z is the exception; as far as I can tell, it is not used anywhere in the aspell dictionary.) We take blaat, which matches -aat, so we strip off -at and stick -ten in its place, resulting in blaten.

Finally, we have a prefix rule PFX G, which sticks ge- before anything, leading to the form geblaat (bleating, as in “the bleating of the sheep”).
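To check my understanding of the rules, here is a small Python sketch that applies them to blaat. It is a simplification under a few assumptions: the functions apply_suffix and apply_prefix are my own, the condition field is treated as an ordinary regular expression anchored at the end (or start) of the word, and the Y/N combination flag is ignored. It is not how aspell itself is implemented.

```python
import re

def apply_suffix(word, strip, add, condition):
    """Apply one SFX rule; return None if the rule does not apply."""
    if not re.search("(?:" + condition + ")$", word):
        return None
    if strip != "0":  # '0' means: strip nothing
        if not word.endswith(strip):
            return None
        word = word[:len(word) - len(strip)]
    return word + add

def apply_prefix(word, strip, add, condition):
    """Apply one PFX rule; return None if the rule does not apply."""
    if not re.match(condition, word):
        return None
    if strip != "0":
        word = word[len(strip):]
    return add + word

# (strip, add, condition) triples copied from the affix file excerpt above.
suffix_rules = [
    ("0", "ten", "t"),                                   # SFX M
    ("0", "t", "[^t]"), ("0", "te", "[kfstp]"),          # SFX W
    ("0", "ten", "[kfstp]"), ("0", "te", "ch"),
    ("0", "ten", "ch"), ("0", "de", "[^kfstp]"),
    ("0", "den", "[^kfstp]"),
    ("at", "ten", "aat"),                                # SFX P
]

forms = [apply_suffix("blaat", *rule) for rule in suffix_rules]
forms = [form for form in forms if form is not None]
forms.append(apply_prefix("blaat", "0", "ge", "."))      # PFX G
print(forms)
```

The forms that come out are the same ones aspell expand printed for blaat, including the duplicate blaatten that is produced once by M and once by W.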

The file nl_affix.dat also contains a list of general replace rules (for example, replacing g by ch and vice versa) and specific ones (kado by cadeau). These rules are used when suggesting possible corrections for a misspelled word.

So where was I? Oh yeah, building a word list. To play with. For fun.

Saturday, February 9, 2008

I ate a skunk for lunch

Faithful xkcd readers will know about the sudden change in Google results caused by the Dangers comic. The number of hits for "died in a blogging accident" has since risen from 2 to over 32,000.

The results were acquired simply by entering the corresponding Google query. This has one disadvantage: you have to know in advance which type of accident you're looking for.

I wrote a little Perl script to overcome this. It Googles for "died in a" "accident" and parses the first 1000 results (unfortunately, Google refuses to give more than that). The number of occurrences of “died in a _ accident”, with exactly one word in place of the _, is counted, and the results are charted using the Google Chart API.

So now we can see which accidents are really most common:

It appears that blogging is far more dangerous than I always thought!

The number of words we want in the results can be modified (e.g. “give me all results consisting of one, two or three words”), and they do not have to be sandwiched between two known phrases, but can also come before or after a certain phrase, as in “_ is an idiot” (which obviously requires that we specify a fixed number of words to take). There is also an ignore list to get rid of meaningless matches like “he” and “bush”.
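The counting step itself boils down to a regular expression and a tally. Below is a minimal sketch in Python rather than Perl, and not the code from accident.pl: it works on text snippets you already have and leaves out the Googling and the charting entirely. The function count_fillers and its parameters are invented for this illustration.

```python
import re
from collections import Counter

def count_fillers(texts, before, after, min_words=1, max_words=1, ignore=()):
    """Count what appears in the gap of 'before _ after' across all texts."""
    word = r"[\w'-]+"
    # Between min_words and max_words whitespace-separated words.
    gap = r"(%s(?:\s+%s){%d,%d})" % (word, word, min_words - 1, max_words - 1)
    parts = [r"\b"]
    if before:
        parts.append(re.escape(before) + r"\s+")
    parts.append(gap)
    if after:
        parts.append(r"\s+" + re.escape(after))
    pattern = re.compile("".join(parts), re.IGNORECASE)

    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            filler = " ".join(match.group(1).lower().split())
            if filler not in ignore:
                counts[filler] += 1
    return counts

snippets = [
    "He died in a blogging accident last year.",
    "She died in a car accident.",
    "They died in a tragic car accident.",
    "Another one died in a Blogging accident",
]
print(count_fillers(snippets, "died in a", "accident").most_common())
```

With the default of exactly one word, "tragic car" is not counted; passing max_words=2 would pick it up as well. Leaving before empty gives the “_ is an idiot” style of query.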

The script can also tell us what people eat:

Fair enough. But:

A rat? A skunk? A hippopotamus?!

Unfortunately the script does not work as well as I hoped. The results often do contain both parts of the query, but on different parts of the page. These hits get in the way of the useful results, and because we're limited to 1000 results we cannot dig any deeper to find them.

The script is full of known and unknown bugs, not least because it's my first nontrivial Perl script, but I put it online for your enjoyment anyway. It is called accident.pl and requires Perl (obviously), WWW::Mechanize and URI. Run it without arguments to get a brief help text. Let me know what results you come up with!

Friday, February 8, 2008

Add three inches in a week!

For over a year, I've been the happy owner of a Dell 2407WFP 24" widescreen monitor:

But as of late, there has been a little problem with it:

It's as if there is a strip loose between the backlight and the LCD. Not a big problem, but slightly annoying.

So last Monday I mailed Dell customer support. I got a prompt response, recommending that I try resetting the monitor by holding down the power button for 20 seconds. Tried that; didn't help; mailed back. Tuesday, I got the reply: they were sending a replacement through UPS, to be arriving the next day. Wednesday morning: phone call from UPS. From Lithuania, of all places. Telling me in fairly decent English that the replacement was not in stock, and that I should contact Dell for an alternative, so I mailed them again. On Thursday I got a call from Germany. Broken English, not entirely clear on the purpose of the call, but apparently verifying my address. Oh well.

Now this Friday morning, I woke up to the sound of the doorbell. A man from UPS with a replacement monitor. I was surprised, since I hadn't heard anything from Dell in the meantime. Being careful, I wanted to see the replacement in action before the UPS guy left with my old one. Unpacked the monitor.

First thing I noticed: brushed metal, not black like my old one. Checked the model number on the back: 2707WFP. — Wait… 27?! And indeed, holding my old panel up against it, the new one was three inches larger. It worked right out of the box and had no dead pixels or other artifacts, so I let the UPS guy take my old screen away (sniff). He didn't even want the booklets or cables or cd-rom, which was lucky for me since I'd have a hell of a time finding all that stuff again.

Thinking that this replacement was too good to be true, I e-mailed Dell once more to verify that it was indeed correct and intended to be permanent. Got a confirmation within the hour.

Now that is what I call Customer Care.