The Typethinker

Monday, March 16, 2015

QR codes on the BBC Micro, Epilogue: KryoFlux to the rescue

In which I once more revel in nostalgia and still fiddle with bits, but do not talk about QR codes at all.

So with the BBC Micro quite dead, but my floppy disks and possibly the disk drive still intact, I looked around on the internet to see what other options I had. I found a few products which promised to control old floppy drives and convert the result to something more modern. In the end, I decided to buy the KryoFlux package, mostly because of its extensive documentation and active support forums. The KryoFlux is a small circuit board with a floppy drive cable connector on one side and a USB connector on the other.

But I also needed another disk drive, because the Beeb's drive didn't have the right plug. Or so I thought. So I found a list of drives known to work with the KryoFlux and searched for them on eBay. These turned out to be really easy to find, and on the second auction I was the highest bidder at the shocking price of €16,50 (including shipping).

A few weeks later, both the KryoFlux and the disk drive were sitting on my desk, ready to be hooked up. I followed the instructions to connect everything up, and installed the somewhat arcane command line tool. Then I typed dtc -c2 to detect how many tracks the drive can access. The disk drive made a noise, its head moved around, and eventually the tool printed maxtrack=83. That's quite a common number, so everything seemed to work so far.

Fortunately, when my dad brought the BBC Micro over, he also included all of his old floppy disks. They mostly contained teaching material that nobody cared about anymore, so I had plenty of disks to experiment with and possibly ruin. I popped one of these into the drive and figured out the command to make a raw image.

The KryoFlux was explicitly designed for software archival and forensics, and is able to read floppy disks at a much lower level than most hardware. What you get out of it isn't bytes, it isn't even bits; it's the raw magnetic flux transitions coming directly from the disk surface. This is great, because even if you can't salvage all of your data right away, you can still save off everything that's on the disk before it degrades even more, and do your rescue operation offline. It's also possible to decode the data on the fly, so that unreadable sectors can be retried automatically.

And it comes with a nice GUI tool to visualize the spectrum coming off the disk:

In the process, I learned a thing or two about how floppy disks work. You probably know that the disk surface contains magnetic patterns. The data is arranged in 80 (or 84) tracks, which are laid out as concentric rings, track 0 on the outside, track 79 (or 83) on the inside. The disk drive positions the head above the track it wants to read, spins the disk at 300 rpm, and the resulting fluctuations in the magnetic field induce an electrical current in the drive's head. Essentially, the output of the floppy drive is just an analog waveform, and it's up to the controller to make sense of it.

There are different ways to encode data in such a wave. The disks I was looking at use FM, which – indeed – is Frequency Modulation. It's really simple: one frequency is used to encode a zero, another frequency is used to encode a one. To allow the drive to figure out where one bit ends and the next starts, clock bits are inserted into the stream: every other bit is a clock bit, and they are all set to 1. So, for example, the bit sequence 0110 would be encoded as 10111110. (The problem with this is that half of the bits on the disk are spent on clock bits rather than data. That's why MFM was invented.)
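
The clock-bit interleaving is easy to try out in a few lines of Python. This is only a sketch of the scheme as described here, not what the KryoFlux software does internally; the names fm_encode and fm_decode are made up for the illustration.

```python
def fm_encode(data_bits: str) -> str:
    """Put a clock bit (always 1) in front of every data bit."""
    return "".join("1" + bit for bit in data_bits)


def fm_decode(cells: str) -> str:
    """Strip the clock bits again, checking that each one really is a 1."""
    if len(cells) % 2 != 0:
        raise ValueError("FM stream must consist of clock/data pairs")
    clocks, data = cells[0::2], cells[1::2]
    if "0" in clocks:
        raise ValueError("missing clock bit: stream is out of sync")
    return data


print(fm_encode("0110"))  # prints 10111110
```

Real disks deliberately break the clock pattern in a few special marker bytes so the controller can find the start of a sector; this sketch ignores that and treats any missing clock bit as an error.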

Now you can probably make sense of the spectrum shown in the screenshot above. The 4.0 µs band shows the ones, where a data pulse follows the clock pulse after half a bit cell; the 8.0 µs band shows the zeros, where only the clock pulses remain. When you zoom in, you can even somewhat make out the individual bits:

Fun fact: The magnetic flux was sometimes also used to write special copy protection patterns to the disk, which couldn't be reproduced by normal byte-based disk copy tools. This article shows a gorgeous example near the end.

So with the help of the KryoFlux tool, I was able to create a series of files (one per track) that contained these raw waves straight off the disk. But the tool doesn't stop there; it also lets you decode the FM (or other) format into bytes, in a proper disk image. And it gives you some indication of how well it succeeded.

As it turned out, not very well. Decoding the FM stream would consistently get me errors for the first dozen or so tracks on the disk. From there on out, it got slightly better, with the occasional track showing OK*, and near the end of the disk all tracks were supposedly OK*. Except they weren't. When running strings on the resulting disk image, I got stuff like this:

This test consists`of two perts. Part One is about the works you have read
in class with yo}r teakher. Part Two is about the books you have read on
your own. You only have to answer questionw0

It got better near the end of the disk, but still not good enough. To preserve actually working software, we need every last bit to be correct.

I tried with two more disks, including some that did work in the BBC Micro only weeks before, but both showed the same pattern. I suspected that the drive I bought on eBay might be faulty. Before buying yet another drive, I asked on the support forums if anyone had a clue what might be wrong, but didn't get much useful info.

I left the project alone for a few weeks, and eventually decided to modify the BBC Micro's original drive so I could connect it to the KryoFlux. The data cable looked like a regular old flat cable, but the power connector was all weird. I opened up the drive case so I could see how it was wired up, but lo and behold! On the inside, the weird power cable ended in a standard Molex plug that connected to the drive!

Encouraged, I went through the setup procedure again with this drive. Would it still work, or did it break along with the Beeb itself? I connected the data cable, and nothing interesting happened. I connected the power cable, and no smoke came out. I sniffed the drive, but it didn't emit that typical fried-electronics smell. So far so good.

Then I ran the KryoFlux calibration tool again. The drive head moved! It seemed to bump up against something near the end, but the tool still reported maxtrack=83. Oh well, it's probably fine. Let's read a disk.

And yes! Yes! The tool printed a slew of OKs! Let's look at the image content:

This test consists of two parts. Part One is about the works you have read
in class with your teacher. Part Two is about the books you have read on
your own. You only have to answer questions about the books you have read.

Now that is proper English!

Would the entire disk be okay? In particular, would we be able to read the catalogue that stores where files reside on the disk? To check that, we have DFS Explorer. And it looks like this:

Oooh yeah. Actual files! Readable ones, too. Just to make sure that everything was working properly, I made another image of the same disk and compared the two. Some bad sectors seemed to throw it off, but the good sectors did not seem to have any differences.
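
Comparing two images of the same disk is simple enough to script. The sketch below is not the tool I used at the time; it assumes a plain sector-by-sector image with 256-byte sectors, and the function name differing_sectors is invented for this example.

```python
SECTOR_SIZE = 256  # assumed sector size of the disk image, in bytes


def differing_sectors(image_a: bytes, image_b: bytes, sector_size: int = SECTOR_SIZE):
    """Return the numbers of all sectors whose contents differ between two images."""
    longest = max(len(image_a), len(image_b))
    sector_count = (longest + sector_size - 1) // sector_size  # round up
    return [
        n
        for n in range(sector_count)
        if image_a[n * sector_size:(n + 1) * sector_size]
        != image_b[n * sector_size:(n + 1) * sector_size]
    ]


# Usage with two hypothetical file names:
# with open("first.ssd", "rb") as a, open("second.ssd", "rb") as b:
#     print(differing_sectors(a.read(), b.read()))
```

If the returned list only contains sectors that the imaging tool already flagged as bad, the good sectors were read identically both times.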

I could also load the disk into the emulator BeebEm, but that seemed to lack the Wordwise word processing ROM, so I couldn't view any of the files. Time to try with a disk containing actual programs; mostly games I used to play (but didn't write).

Track 0: OK. Track 1: OK. Track 2: OK. And so on… this disk seemed absolutely fine! Excited, I loaded it into BeebEm, and got a scary message: "WARNING - Incorrect disc type selected? This disc file looks like a double sided disc image. Check files before copying them." Indeed, I could get a disk catalogue but everything I tried to load gave me Bad program.

Eventually, I figured out that naming it spel.dsd (for double sided disk, as opposed to spel.ssd or spel.img) fixed things. I was in business! Look at all the cool stuff on this disk!
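
The file extension matters because the two formats lay out the data differently. As far as I understand it (this is an assumption about the emulator image formats, not something I verified against a specification), a .dsd file interleaves the two sides track by track, with 10 sectors of 256 bytes per track, whereas an .ssd file holds a single side. Under that assumption, splitting a double-sided image into its two sides could be sketched like this; split_dsd is a made-up name.

```python
TRACK_SIZE = 10 * 256  # assumed: 10 sectors of 256 bytes per track


def split_dsd(dsd: bytes):
    """Split a track-interleaved double-sided image into (side0, side1)."""
    side0 = bytearray()
    side1 = bytearray()
    # Each step covers one track of side 0 followed by the same track of side 1.
    for offset in range(0, len(dsd), 2 * TRACK_SIZE):
        side0 += dsd[offset:offset + TRACK_SIZE]
        side1 += dsd[offset + TRACK_SIZE:offset + 2 * TRACK_SIZE]
    return bytes(side0), bytes(side1)


# Usage with the image from this post:
# with open("spel.dsd", "rb") as f:
#     side0, side1 = split_dsd(f.read())
```

Reading an interleaved image as if it were single-sided would explain why the catalogue (which lives at the very start) still showed up while the programs themselves came out scrambled.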

The CATALOG:

The exciting game MUNCHING MOUSE, where you run from left to right in the shortest time possible, collecting the cheese:

The even more exciting game BAT'N'BALL, which… well, you guessed it:

The actually somewhat interesting YELLOW RIVER KINGDOM, where you assign your villagers to (a) defend the dyke to prevent flooding, (b) work in the fields to grow rice, or (c) protect the villages from thieves:

And on the other side of the disk, I rediscovered one of my childhood classics: CASTLE OF NIGHTMARES. Damn, this game is hard. I can't seem to be able to record uncorrupted video with BeebEm, but here's the music:

Hmm, maybe a bit repetitive? How about this great tune instead? The soundtrack of my childhood!

Another great program I recovered was "Speech!", a wonderfully retro speech synthesizer that sounds like this:

Now I really want to use that in a game some day.

Speaking of games, what about this '80s version of Kerbal Space Program?

I once wrote a PC clone of that in QBasic when I was 13. And a sequel to that clone. And a mouse-driven GUI level editor for the sequel.

There were also a few files I couldn't readily identify. They were neither text files nor BASIC programs. Looking at the hex dump, I noticed there were some repeating patterns: a streak of 0xff bytes, then some other bytes, then another streak of 0xff bytes, and so on. Could these files be… images? And how would you store an image on the BBC Micro, if not by writing the contents of the video memory directly to disk? We can reverse that with a simple *LOAD command!

Copyright © yours truly, year unknown. Notice how IKON, the drawing program, didn't even bother to remove its own UI elements from the saved file.

So far, I'd only converted compilation disks, with programs probably typed in from magazines and such. Most of the old disks were still 100% readable, and although some had bad sectors, even they were largely still usable. Eventually, this gave me the courage to put in a disk with my own programs. Apparently, I once started work on a Breakout clone, imaginatively named BREAKOUT. The ball bounces and the paddle moves, but blocks you hit don't disappear and the game crashes when you die.

There was also a Mastermind game that I was sure I couldn't have written, until I encountered my name in the comments. Go me for commenting stuff! But it appeared to give me contradictory clues, so maybe that wasn't finished either.

I even dabbled in cryptography! Can you figure out the algorithm?

And apparently I had some database program cunningly disguised as MARIO, which was password-protected; look at how I obfuscated the password (didImentionthatthingaboutspacesbeingoptional?):

170IFLEFT$(T$,9)=CHR$(74)+CHR$(5*13)+CHR$(79-1)+CHR$(68)+CHR$(80-1)+CHR$(59+10)+CHR$(70-2)+CHR$(70-1)+CHR$(76)THENPRINTTAB(12,13)"Wachtwoord correct":T=TIME:REPEATUNTILTIME-T>200:GOTO240

Never mind that if somebody could read this code, they could also just replace it by GOTO240 and have instant access – but it might have kept out non-programmers who could still type LIST (i.e. my brother). I don't think I ever even put any data into this…
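
If you'd rather not do the arithmetic in your head, the nine CHR$ arguments from line 170 can be evaluated mechanically. This is just a convenience sketch in Python; the expressions are copied from the BASIC line as written.

```python
# The nine CHR$() arguments from line 170, in order.
codes = [74, 5 * 13, 79 - 1, 68, 80 - 1, 59 + 10, 70 - 2, 70 - 1, 76]

# CHR$ maps a character code to a character, just like Python's chr().
password = "".join(chr(code) for code in codes)
print(password)
```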

I did have some trouble with bad sectors during the salvage operation, but eventually, bumping it up to a hundred retries or so, every last bit came off of the disks I cared about. Even then, one disk gave me trouble: ADFS Explorer would read it just fine, but the BeebEm emulator reported Bad directory instead of giving me a listing. Diving into the internals of the ADFS filesystem showed why: the directory's sequence counter is written to disk twice, once in the directory header and once in the footer, and the two must match. A one-byte change in a hex editor later, and this last disk image too was perfectly usable. I don't know if it was bitrot, or if the disk had been corrupted previously during a write operation.

I'd like to put all my old stuff up on GitHub at some point, but the disk images contain some private information (e.g. address stickers, letters I typed up for other people) so I'll have to curate them first. But the bits are safe now, which is what counts!

Friday, January 23, 2015

QR codes on the BBC Micro, Part 6 of 6: An unexpected ending

In which I convince myself that this plan would have worked.

So, with all the code written, and a pixel-perfect QR code on the screen, the question was once again: will it scan?

Disappointment. The ZXing Android app wouldn't recognize the code, even though it had little trouble with its twin on the flat TFT screen of my PC.

Since the pixels were, as far as I could tell, identical, I figured this must be due to image quality. The code was displayed on an old fish-bowl style CRT monitor, which means it's bulging a little. Perhaps this was throwing off alignment of the reader. Another hypothesis was blooming: white pixels tend to “glow” around the edges, possibly bleeding into black pixels and making them be registered as white by the reader.

I tried to remedy both of these by doing some postprocessing. I started by taking a photo of the screen using my Canon PowerShot S90 digital camera, which is a tad better than the camera in my Nexus 5 phone. Then I opened that in Gimp, sharpened it, and applied the Lens Distortion filter to make a mostly straight-looking copy.

Then I tried reading from this image using Zbar, a command line barcode reader. It wouldn't recognize it. I also tried libdecodeqr following this AskUbuntu question, but no dice. Time to hack up my own.

Based on previous success with the app, I grabbed the ZXing source code from GitHub and wrote a Java program to load the PNG file and feed it to the library. The idea was that this would let me debug things and figure out why it wasn't recognizing the QR code.

The first problem was that ZXing was throwing a NotFoundException without a stack trace. It turns out that the library is optimized for mobile, memory-constrained devices; there is only one instance of the exception that gets reused every time, and stack traces are suppressed. This was easily fixed by hacking up the library code. Now at least I got an answer: it was failing to detect the finder patterns, the three big squares on the corners.

Diving into the code, I added some print statements to output the location of potential finder patterns detected in the image. It found four: the three correct ones, and a spurious one in the bottom right corner. This candidate was chosen as the correct one for reasons that eluded me, but I worked around it by removing any candidate found in the bottom right quadrant of the image. Remember, I wasn't trying to build a general-purpose QR code reader; I just wanted to see if this particular code could be scanned before I put all my eggs into its basket.

With the correct finder patterns, the library got a lot further: it correctly read the version and format info, and subsequently threw a ChecksumException. I printed the data bytes that it had read, expecting mostly ff bytes with the occasional error. Instead, I got this sort of garbage:

99 99 00 66 85 00 00 00 60 00 7f fd 00 00 9d 1e 40 ff 1b 00 49 00 02 ff ff 00 00 02 60 40 ff 00 f7 ff 00 4f 00 00 00 ff 99 00 f6 00 00 00 01 00 00 9a 00 00 00 00 9d 00 ff 37 ff 00 ff ff 00 ff 00 ff 00 ff ff 00 ff 40 ff 40 ff ff 00 ff ..

Clearly, no amount of error correction could have fixed that mess. I figured that, in order to make this thing robust and reliable, I would have to resort to larger pixels: representing one module in the QR code as 2×2 pixels on the screen of the Beeb. Doing the math, I worked out that I could use at most a Version 24 QR code, able to contain slightly over 1 kB of data. Not as much as the 3 kB of the Version 40, but still good enough.
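
To get a feel for the sizes involved: a QR code of version v measures 17 + 4v modules along each side. The sketch below turns that into on-screen pixels; it does not try to reproduce my exact screen budget, and the function names and the optional quiet-zone parameter are made up for the illustration.

```python
def modules_per_side(version: int) -> int:
    """Side length of a QR code in modules, for versions 1 to 40."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


def pixels_per_side(version: int, pixels_per_module: int, quiet_zone_modules: int = 0) -> int:
    """Side length in screen pixels, optionally including a quiet zone on both sides."""
    return (modules_per_side(version) + 2 * quiet_zone_modules) * pixels_per_module


print(modules_per_side(40))    # prints 177
print(pixels_per_side(24, 2))  # prints 226
```

So a Version 40 code at one pixel per module and a Version 24 code at 2×2 pixels per module end up in the same ballpark on screen, before any white border is added.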

Unfortunately, I had been writing much of my code under the assumption that the version would not need to be changed, so there were hard-coded numbers all over the place. I turned back to the BBC Micro and started replacing them by variables. First, with the correct values for Version 40 so I could verify that the output didn't change; then I would replace them by the appropriate values for a Version 24 code, and change the pixel-plotting routine to output 2×2 pixels instead. I was halfway through this process when disaster struck.

When saving the program to tape as a matter of routine, the computer unexpectedly printed:

>SAVE"QR"
RECORD then RETURN
QR 0A
Index
>

“Index”? Excuse me? I rewound the tape, wanting to try again, but found that the machine was no longer responding to my keystrokes. Somewhat nervous, I pressed the BREAK key, essentially rebooting the OS, and typed OLD to recover the program from its original memory location. Phew, that worked, it was still intact. I tried saving again, but this time, the counter didn't even start running. I really didn't want to hard reset the machine, because I assumed that the failed save would have corrupted the tape, so the computer's RAM was the only place where the QR program still lived.

A few more tries, a few more reboots, nothing helped. Until, suddenly during a SAVE command, the screen got garbled and seemingly random pixels scrolled across the screen. I pressed BREAK again, expecting to end up at the BASIC prompt again, but instead was dumped into the ViewSheet spreadsheet program. I had no idea that this machine even had that ROM chip installed. The fact that I booted into it seemed like a severe case of memory corruption. I guessed, correctly, that *BASIC would get me back to BASIC mode, but by now OLD didn't work anymore, and my program was definitively gone from the computer's memory.

Now what? Let's cold-boot the machine and see what can be recovered from tape. But it seemed that the poor old thing was too far gone for that… instead of booting, it gave a continuous tone from its speaker, didn't show anything on the screen, and didn't respond to the BREAK key. I tried powercycling it literally hundreds of times, each time getting a slightly different tone and sometimes getting the keyboard LEDs to light up, but I couldn't get the machine to boot anymore. It seemed that my trusty old pal had finally given up the ghost.

I had hoped to make the QR code program send itself across to the PC as a maiden voyage. I'm quite confident that (perhaps with even larger pixels) I could have made it work. Then I could post a YouTube video of a data transfer in progress, to help explain to my family what I've been up to in the past few weeks. I could publish the code on GitHub, annotate it, explain it in detail on this blog. Maybe it would even be useful to someone, although I didn't expect it to. I don't think any of that is going to happen anymore.

But of course, this QR code program was just a means to an end. The point of this project was to transfer all the old programs I wrote as a kid. Those programs are still there, sitting on a stack of 5 1/4" floppy disks on my desk, so close and yet so far away. If the machine had worked again, I would have typed a simpler program, too small to need saving, to try and transfer them via sound signals. That, too, isn't going to happen anymore. Even if I could fix the computer itself, the disk drive also seems to be broken since the machine won't even boot when it's plugged in.

Fortunately, there is another way: a controller board like the KryoFlux can be used in conjunction with a PC style 5 1/4" drive to rescue data off of virtually any type of disk. I just ordered a KryoFlux board, and placed a bid on eBay on a compatible floppy drive. Fortunately, working drives for PCs seem to grow on trees still. The game is not over.

But for my old BBC Microcomputer, it is. The very device that launched my career as a software engineer will never run again. When I'm done mourning it, I'll probably take it to an electronics recycling facility. May it rest in peace.

Thursday, January 22, 2015

QR codes on the BBC Micro, Part 5 of 6: Developing on the Beeb

In which I marvel at how programming has changed in the past 30 years and how it has remained the same, and fret over spurious hardware failures.

After having written a program on the PC to generate QR codes, it was time to do the same thing on the BBC Micro. During a long train journey, I had already written most of the assembly code on paper, so it would be a simple matter of typing it in, and writing the surrounding glue code in BASIC.

I switched on the machine and… nothing happened. Just a slight “pop” from its speaker. A few more tries got me the “boooo” tone normally followed by a cheerful “beep!”. Another few tries, and it went “boooo-beep!” as normal, showed me a prompt and responded to my keystrokes, then froze after a few seconds. Another reboot, and it worked happily for hours, until I turned it off and went to bed.

This was a repeating pattern over the next few days, as I wrote and debugged my program. Every time I wanted to turn the machine on, it took more attempts to get it to work. But every time, it eventually worked. On the last day, it took so long to get the machine working that I just left it on overnight.

But when it worked, it worked well. I found that I loved the predictability of this machine. There is 32 kB of RAM in a single address space, no virtual memory, no processes, no threads. Just your program and the OS – which mostly gets out of the way while your program is running, so you are in full control of the machine.

Compared to modern machine architectures, this is wonderfully simple. There is no distinction between the interpreter, the shell and the editor; the BASIC prompt serves all three purposes just fine. It acts as a REPL, so you can quickly try out oneliners and see if they have the desired effect. If you have broken out of your program with the Escape key or the END statement, all variables remain intact for your inspection, essentially giving you non-resumable breakpoints. The prompt also serves as a shell, so you can save and load programs, list and manage files, and so on.

But the most interesting thing compared to present-day systems is that there is no standalone editor; the prompt is your editor. To make this work, every line in a BASIC program has a number. When you type a line that starts with a number, it gets added to the program in the appropriate place. When you use a line number that already exists, it replaces that line. The LIST command (abbreviated L.) is used to show the current program, or a range of line numbers that you pass to the command.

But what if you don't want to add or replace an entire line, but just make some change to an existing line? This is what the COPY key is for. Using the arrow keys, you can control a secondary “copy source” cursor, which can be positioned anywhere on the screen. When you press COPY, the character under the copy cursor is copied as if you typed it on the keyboard, and both the regular and the copy cursor are moved to the right. This enables you to copy lines from a listing while changing them. It also enables interactive development: you try running a line of code, and once you get it to work, you prefix it with a line number and COPY it into your program. Your screen essentially doubles as the equivalent of the clipboard in modern operating systems. But when something scrolls off the top of the screen, it's gone forever, so this mechanism isn't all-powerful.

So far, I've talked about BASIC. But I couldn't write the entire program in BASIC, because it's too slow. It takes about 15 seconds to simply invert all screen pixels, something a machine code program can do in much less than a second – say, 100 times faster. My back-of-the-envelope calculations made it very clear that a 100× slowdown would be unacceptable. But how do we write and edit machine code?

It turns out that BASIC fulfils yet another purpose on this machine: it acts as an assembler. When you write assembly instructions between the characters [ and ], upon interpreting these, BASIC generates the appropriate machine code instructions at the address indicated by the variable P%. The great thing is that it's still “just BASIC”, and you retain the ability to use BASIC expressions to compute values inside the assembly code. This makes for a really powerful system where you get the best of both worlds: the speed of assembly, but the expressive power of BASIC as a kind of preprocessor.

All in all, I was pleasantly surprised by how easy and fast it was to develop directly on the BBC Microcomputer. Of course the development cycle isn't as fast as on a modern system, but it comes surprisingly close. It took me just a few evenings to type in and debug most of my hand-written assembly routines. After writing each routine, I wrote a few simple unit tests to make sure it was working, but I threw them away after they passed to save space.

I started by drawing the structural elements of the QR code using BASIC MOVE and DRAW commands, making the parts white where no data would go. Then I turned to assembly to “snake” the data through it. Then some more BASIC to cut out the black pixels of the structural elements, and finally a simple assembly routine to invert all pixels on the screen, because when drawing a black-on-white QR code on the white-on-black screen, I found it easier to think “inverted”. Then some more assembly code to prepare the data codewords by shifting each data byte down by half a byte.
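
The half-byte shift comes from the header that precedes the data in byte mode: a 4-bit mode indicator (0100) followed, for versions 10 and up, by a 16-bit length. Four plus sixteen is twenty bits, so every data byte straddles two codewords. The following Python sketch shows the packing; it is not a port of my assembly routine, and the function name and the capacity parameter are invented for the illustration.

```python
def byte_mode_codewords(data: bytes, capacity: int) -> list:
    """Pack data into QR byte-mode data codewords (versions 10 to 40)."""
    bits = "0100"                      # mode indicator: byte mode
    bits += format(len(data), "016b")  # 16-bit character count
    bits += "".join(format(byte, "08b") for byte in data)
    if len(bits) > capacity * 8:
        raise ValueError("data does not fit in the given number of codewords")
    bits += "0" * min(4, capacity * 8 - len(bits))  # terminator, at most 4 zero bits
    bits += "0" * (-len(bits) % 8)                  # pad to a whole number of bytes
    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    # Fill any remaining space with the alternating pad bytes from the standard.
    for i in range(capacity - len(codewords)):
        codewords.append((0xEC, 0x11)[i % 2])
    return codewords


# 2953 bytes of 0xff exactly fill the 2956 data codewords of a 40-L code.
codewords = byte_mode_codewords(b"\xff" * 2953, 2956)
print([hex(c) for c in codewords[:4]], hex(codewords[-1]))
```

The first codeword holds the mode indicator plus the top half of the length's high byte, and the very last one holds the final four data bits followed by the terminator.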

Then, one evening, when I just needed to implement the error correction calculations to complete the program, the computer wouldn't turn on at all anymore. No “boooo” sound, no “pop”, not even the usual slight hiss, no keyboard LEDs, nothing. Retrying did not help. I started eliminating possible causes. First I unplugged the monitor. No dice. Then I unplugged the tape deck. No dice.

Then I unplugged the disk drive. “Pop”, the machine said. A few more tries, and it was running again. Turned it off, plugged the disk drive's power cable (but not its data cable) back in, turned it on, nothing. So it seemed that the power cable was somehow messing with the boot process. Maybe a short circuit in the cable, or in the drive?

I decided to worry about this later. First I wanted to see if I could make a scannable QR code containing some useful data. I put the final pieces of code in place and rendered a QR code containing the byte 255, repeated 2953 times. I did the same on the PC and checked by hand that the pixels looked about right.

A bunch of 0xff bytes in a QR code on the Beeb's screen

Once more, the question was: will it scan?

Wednesday, January 21, 2015

QR codes on the BBC Micro, Part 4 of 6: Prototyping in C

In which I wrap my head around algebra and write code for myself, not the machine.

The preliminary feasibility study turned out positive, the hardware was sufficiently cooperating, so I was cautiously optimistic that this thing was possible. Time to learn what QR codes are all about.

The QR code standard is behind a paywall at ISO, but it's trivial to find online elsewhere, and before long I had a 124-page document on my screen. I began to read. For an introduction that's easier to digest, I recommend this guide at Thonky.com.

So, there are various sizes of QR code. I already knew that. The standard talked about various alphabets, including numeric, alphanumeric and Kanji, but fortunately, raw bytes are also among the options. Then it describes in detail where all the standard parts go: the big corner squares, the smaller inset squares, and the barely noticeable timing patterns along the top and left. So far so good, I can code that.

Then it starts getting complicated. The stream of codeword bits has to be snaked through the matrix and in between the other parts in a very particular way: left-up-right-up until you hit the top, then left-down-right-down until you get back to the bottom. Some of these modules are already taken by the timing or alignment patterns, and you need to skip them. At the leftmost timing pattern, we skip the entire column. A lot of fiddly edge cases, but doable.
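
The snaking order can be written down compactly as a generator. This is a Python sketch of the traversal, not the code I ended up with; is_reserved is a hypothetical callback that tells whether a module already belongs to one of the fixed patterns.

```python
def snake_order(size, is_reserved):
    """Yield (row, col) of every free module, in the order the data bits are placed."""
    col = size - 1   # start in the rightmost column pair
    upward = True    # the first pair of columns is traversed bottom to top
    while col > 0:
        if col == 6:  # the vertical timing pattern lives here: skip the whole column
            col -= 1
        rows = range(size - 1, -1, -1) if upward else range(size)
        for row in rows:
            for c in (col, col - 1):  # right module first, then its left neighbour
                if not is_reserved(row, c):
                    yield row, c
        upward = not upward
        col -= 2


# Example: a 21x21 matrix (Version 1) with nothing reserved.
order = list(snake_order(21, lambda row, col: False))
print(order[:4])  # prints [(20, 20), (20, 19), (19, 20), (19, 19)]
```

In a real encoder the callback would consult a second bitmap of the fixed patterns, and each yielded position would receive the next bit of the codeword stream.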

As a final step, we have to apply a XOR mask to all data bits, to make sure no patterns appear that look like the timing and alignment patterns and might throw off the QR code reader. The standard specifies eight different masks, and a QR encoder is supposed to try them all. Then it should choose the one that minimizes some penalty computed from patterns that appear in the output. I decided to just go with a single type of mask first, the one that was easiest to compute, and hope for the best.
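
As an illustration, here is what applying one mask looks like in Python. It uses mask pattern 000 from the standard, which flips every module where row plus column is even; treat that choice as an assumption for the example rather than a record of which mask I picked. The name apply_mask0 and the is_data callback are made up.

```python
def apply_mask0(matrix, is_data):
    """XOR mask pattern 000 onto the data modules of matrix, in place."""
    for row, line in enumerate(matrix):
        for col in range(len(line)):
            # Only data modules are masked; the fixed patterns stay untouched.
            if is_data(row, col) and (row + col) % 2 == 0:
                line[col] ^= 1
    return matrix


grid = [[0, 0], [0, 0]]
print(apply_mask0(grid, lambda row, col: True))  # prints [[1, 0], [0, 1]]
```

Because the mask is a plain XOR, applying it a second time restores the original bits, which is exactly what the reader does when decoding.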

But the really hard part is actually generating the codeword stream. The data is broken up into blocks, and for each block we have to compute a set of error correction codewords. In the case of the 40-L QR code, there are 19 blocks of 118 codewords, and 6 blocks of 119 codewords each; for each of these blocks, we have to compute 30 error correction codewords. The error correction codewords are simply the coefficients of the remainder polynomial when we divide the data polynomial by a particular 30th-degree generator polynomial with coefficients in the GF(256) Galois field. Wait, what? The words reminded me of a class on error correction codes I took in university, but I had all but forgotten the details. How do you do polynomial long division? What is a generator polynomial? What is a Galois field?
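
The block bookkeeping for the 40-L code, at least, is plain arithmetic. The Python sketch below only splits the data codewords into the blocks mentioned above; the constant and function names are invented for the illustration.

```python
# (number of blocks, data codewords per block) for a 40-L code
BLOCK_LAYOUT_40L = [(19, 118), (6, 119)]
EC_PER_BLOCK = 30


def split_into_blocks(codewords, layout=BLOCK_LAYOUT_40L):
    """Cut the data codeword stream into consecutive blocks of the given sizes."""
    expected = sum(count * size for count, size in layout)
    if len(codewords) != expected:
        raise ValueError(f"expected {expected} codewords, got {len(codewords)}")
    blocks = []
    position = 0
    for count, size in layout:
        for _ in range(count):
            blocks.append(codewords[position:position + size])
            position += size
    return blocks


blocks = split_into_blocks(list(range(2956)))
print(len(blocks), len(blocks) * EC_PER_BLOCK)  # prints 25 750
```

So there are 19 × 118 + 6 × 119 = 2956 data codewords in total, and 25 × 30 = 750 error correction codewords to compute.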

I won't explain it all in detail here, but again, Thonky.com has you covered. To summarize: polynomial long division is just like regular long division; if you divide by a polynomial of degree n, you get a remainder polynomial of degree n – 1, at most. The generator polynomial can be calculated somehow, but I didn't need to know, because it was listed in an appendix. This particular Galois field is a field with 256 elements, and if we represent those by 8-bit numbers, addition can be done by the XOR (exclusive or) operation. For multiplication, we have a log and antilog table, which let us do multiplication as addition, preceded and followed by table lookups. Nice!
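
Put together, the whole error correction calculation fits in a screenful of Python. This is a sketch, not my C prototype: it builds the log and antilog tables from the field polynomial 0x11d that QR codes use, computes the generator polynomial instead of reading it from the appendix, and does the long division. All names are made up for the illustration.

```python
# Antilog (EXP) and log tables for GF(256) with field polynomial 0x11d.
EXP = [0] * 512
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1            # multiply by alpha
    if value & 0x100:
        value ^= 0x11D     # reduce modulo the field polynomial
for power in range(255, 512):
    EXP[power] = EXP[power - 255]  # doubled table, so gf_mul needs no modulo


def gf_mul(a, b):
    """Multiply in GF(256): look up logs, add them, look up the antilog."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n):
    """Coefficients (highest degree first) of (x + a^0)(x + a^1)...(x + a^(n-1))."""
    g = [1]
    for i in range(n):
        nxt = [0] * (len(g) + 1)
        for j, coef in enumerate(g):
            nxt[j] ^= coef                      # coef * x
            nxt[j + 1] ^= gf_mul(coef, EXP[i])  # coef * a^i
        g = nxt
    return g


def ec_codewords(data, n):
    """Remainder of data(x) * x^n divided by the degree-n generator polynomial."""
    g = generator_poly(n)
    rem = list(data) + [0] * n
    for i in range(len(data)):
        factor = rem[i]
        if factor:
            for j in range(1, len(g)):
                rem[i + j] ^= gf_mul(g[j], factor)  # subtraction is XOR too
    return rem[len(data):]


print(generator_poly(2))                  # prints [1, 3, 2]
print(len(ec_codewords([255] * 118, 30)))  # prints 30
```

On the Beeb, the two tables alone cost 512 bytes, which is why it pays to think about memory before writing the first instruction.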

So, time for some code. I decided that this stuff was complicated enough that I wouldn't be able to learn it, re-learn 6502 assembly, wrestle with memory limits, and debug the code at the same time, so I wanted to build a prototype on the PC in C99 instead. It was a simple matter of programming (ahem) that I spent some free evenings and weekends on. I wrote it very carefully, in the same way I'd have to do on the Beeb, checking output manually at every step. This would have been perfect for TDD (test-driven development), but I wouldn't have that luxury on the Beeb due to memory constraints, so I refrained from adding too many tests.

A few weeks later I had about 350 lines of C code that took on its standard input the ASCII text of the book I'm currently reading, and produced a PNG file that looked like this:

Output of the C program

The code is up on GitHub if you want to take a look. You can see some repeating patterns; this is because it's ASCII text, where not all bits are equally well-trodden. But now again that big question… will it scan? Let's see!

ZXing Barcode Scanner recognizing the above QR code

Oooh yeah! Hole in one! Victory!

So, how big is this code? Remember that the BBC Micro has only 32 kB of RAM, about 16 of which we can actually use in the screen mode we're targeting. I estimated that I'd need about 5 kB of scratch memory to generate the QR code. The C source weighs in at 11.5 kB, almost exactly the space we'd have left on the Beeb, but of course that doesn't mean much. The binary program on the Beeb might be larger because of the limited instruction set and lack of standard library, or smaller because it doesn't contain semicolons and whitespace.

A better comparison would be the compiled C code; the .o file is only 8.7 kB. But it would also contain headers and such, a symbol table, maybe even debugging symbols; how much executable code is actually in there? We can find out using objdump -d, which gives a nice disassembly of the executable code, next to the raw bytes so I could count them with a simple shell pipeline:

$ objdump -d qr.o|egrep '^ *[a-f0-9]*:'|cut -c11-32|wc -w
4762

So that's 4.7 kB of x86-64 code, with its weird variable-length instruction set. On average, these instructions were 3.7 bytes long, whereas the 6502 instructions on the Beeb are 1-3 bytes each (instruction plus up to 16 bits of operand), so on average probably about half the size. That was the good news. The other good news was that gcc probably didn't produce the smallest code possible. The bad news was that an x86 processor can do 32-bit arithmetic, whereas on the Beeb you have to hand-roll everything that's more complicated than 8-bit addition. Even 16-bit addition has to be built from 8-bit steps, and the 6502 has no multiply instruction at all. So the comparison could go either way.
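To put a number on "about half the size", here is a rough sketch of that estimate in Python. The 2-byte average for 6502 instructions is an assumption (the midpoint of the 1-3 byte range), and so is the idea that one x86-64 instruction maps to one 6502 instruction, which the lack of wide arithmetic makes optimistic.

```python
# Figures from the objdump measurement above.
X86_CODE_BYTES = 4762
X86_AVG_INSTRUCTION_BYTES = 3.7

# Assumption: 6502 instructions average 2 bytes (they are 1 to 3 bytes long).
ASSUMED_6502_AVG_INSTRUCTION_BYTES = 2.0


def estimate_6502_bytes(x86_bytes, x86_avg, m6502_avg):
    """Naive estimate: same instruction count, different average length."""
    instruction_count = x86_bytes / x86_avg
    return instruction_count * m6502_avg


estimate = estimate_6502_bytes(
    X86_CODE_BYTES, X86_AVG_INSTRUCTION_BYTES, ASSUMED_6502_AVG_INSTRUCTION_BYTES
)
print(f"roughly {estimate / 1000:.1f} kB of 6502 code")
```

Under those assumptions the estimate lands well below the 11 kB budget, which leaves room for all the hand-rolled arithmetic.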

So why not cross-compile this C code to target the 6502 processor, and type it into the Beeb? Maybe I could have. But I felt it would be more trouble than it's worth; one typo and the program is ruined, but if I hadn't written the assembly myself, there'd be no way I'd be able to debug it. Also, I wasn't sure that a compiler targeting such an old CPU would be any good; the only one I tried didn't even get past the parsing phase, and choked on my C99-style variable declarations. I also had some experience with writing tiny code and felt up to the challenge.

Tuesday, January 20, 2015

QR codes on the BBC Micro, Part 3 of 6: Storing progress

In which I fight an uphill battle with barely functioning hardware so I don't lose my precious bytes.

As I started looking into the standard for QR codes, and browsed some tutorials (this one is excellent), it dawned on me that generating QR codes is by no means simple. Putting in all the fixed elements (like the three corner squares and smaller internal squares) is easy enough, but the data stream needs to be snaked through this in a very particular (and peculiar) manner, and even coming up with the data stream requires a fair amount of algebra. This wasn't going to be a Sunday afternoon project.

That meant I needed some way to save my code on the Beeb while I was working on it. Remember, it has no internal hard drive; typical storage options are 5¼" floppies and cassette tapes. Tapes are a bit of a pain to work with: your programs are stored much in the same way as songs, so you need to remember or write down where on the tape they are stored, and fast forward/rewind to the right location. It's also pretty easy to accidentally overwrite (part of) an existing program.

So let's look at the disk drive first. I knew that it worked for reading floppies, otherwise this whole project wouldn't have gotten off the ground, but I realized I hadn't actually tested writing. I dug up a disk with some unused data on it, and tried to format it. Formatting will also check whether the disk is still okay, so I ran less risk of losing my work.

This turned out to be a no-go. The *F80 and *F40 commands found in some old manual both resulted in the error “Bad command”. The DFS Manual I found online offered *FORM, but it didn't work for me either. Maybe the disk ROM installed in this particular machine is of a different breed, I don't know. (Yep, that's how you installed “device drivers” on a machine without an internal disk drive: you would use your fat fingers to open up your £1000 computer and poke an extra chip into it. Those were the days.)

Anyway, lacking a format command, I figured I'd just delete the existing files. How hard can it be? *DELETE oldfile should work, right? Nope, Disk read only. Is it? These old floppies have a small notch on the side, and you can cover it with a sticker to make the disk write-protected. There was no such sticker on this one, but I tried another one just to be sure. Disk read only. Another? Same problem. I tried adding and removing stickers to make sure I hadn't gotten the logic backwards, but that did no good either. According to the computer, every disk I threw at it was unwriteable.

The write-protect indentation in a 5¼" floppy disk.

At this point, I wasn't sure where the problem was, but I had a few hypotheses. How would you build a device to detect whether the notch is covered? I'd either have a little spring-loaded lever that moves into the indentation when the disk gets inserted, or I'd have an optical system with a light and a photoresistor to detect whether there is line of sight between the top and the bottom. In the former case, the spring might have become too weak over the years to overcome the friction. In the latter case, there might be dust inside the drive, blocking the light's path.

Carefully, because this drive was critical to the entire project, I unscrewed the cover. There was a little black block of plastic around the notch, but without any mechanical parts. At the back, I could see tiny slits where I presumed the light would go through. I couldn't see any obvious dust, but I blew into them anyway. It made no difference.

The optical sensor that detects write-protected stickers on the disk.

The drive identified itself as a Canon MDD-221. Online, I found a couple of maintenance manuals and even circuit diagrams. There were three wires on the ribbon cable that went to the sensor, labelled G, 5 and WA. I guessed that G would be the ground and 5 would be the +5V line to power the LED. That left WA for detecting the photoresistor's resistance. This was on the PCB on the bottom of the drive, whereas most action takes place on the top one; a green wire also marked WA went from the bottom to the top. Armed with this knowledge, I found the right lines in the circuit diagram, and confirmed that there were indeed an LED and photoresistor inside the black box. The WA line went into pin 9 of an IC labelled J4, and from another pin of that IC came a line marked W. PROTECT. It took a while to realize that the other end of that line just went straight to the ribbon cable between the disk drive and the computer. In other words, if I could wire the right pin to be either high or low voltage, I could override the write protection mechanism.

I'm not an electrical engineer. I can unscrew things and look at them, maybe even poke at them with a multimeter, but deep understanding of how such circuits work is beyond me. Besides, I didn't want to experiment and risk breaking my last and only disk drive. So at this point I chickened out, put the drive back together, and dug up the old tape drive and an unused cassette tape. I had used the tape drive before, and it worked fine for loading in most of my old games, like Hopper and Asteroids.

Attempt to make the tape drive read back what it just wrote.

But somehow, the tape drive also thwarted me: it was as though the data was never written. Neither LOAD nor *CAT (abbreviated *.) would show it to me, even after waiting long enough for three copies of my tiny test program to have passed by on the tape.

I flipped the tape around to try the other side. It contained my Elite save game (I think I made it to Dangerous level back in my youth), but fast-forwarding beyond that, I managed to successfully save my test program and load it back in again. Hooray!

I also noticed something else: this time, during saving, the RECORD light of the tape drive lit up. I didn't recall seeing that before. Maybe I didn't press the RECORD button hard enough together with PLAY? I flipped the tape again to keep my Elite savegame safe, rewound, and tried saving again. This time, it worked like a charm, so I decided to use this side exclusively from now on.

So, armed with a read-only disk drive and a single side of a tape to store my code in progress, I set out to discover the secrets of the QR code.

Monday, January 19, 2015

QR codes on the BBC Micro, Part 2 of 6: Proof of concept

In which I learn about weird video memory conventions and run a small-scale experiment.

Before diving in and writing assembly code to generate QR codes on the BBC Microcomputer, it seemed wise to check whether those could be read from the screen at all. After all, it's an old monochrome 50 Hz interlaced CRT monitor, and although the pixels are quite large by modern standards, they are far from stable, crisp squares. The picture flickers, and bright pixels tend to bleed into darker areas. Would my phone's camera and QR app even be able to recognize the code?

So I took the example QR code from this excellent guide, and started to convert it to a bunch of bytes. This is a 1-Q code, the smallest variant, easy enough to type in by hand once I had the raw bytes printed on my PC screen. It contains the text “HELLO WORLD” in alphanumeric encoding.

Using the Gimp to downscale the image, remove the border and convert it to a PGM file was easy enough. PGM is an image format that supports an ASCII-based variant, so it's trivial to read such a file in a C++ program.

The idea was to have the C++ program pack the pixels into bytes, as a monochrome image at 1 bit per pixel. The Beeb's display memory is a part of its main memory, so we can put things onto the screen simply by writing the bytes into this memory.
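A minimal sketch of that packing step, in Python rather than the original C++ (which isn't shown here). The function name `pack_row` is made up for illustration, and it assumes the leftmost pixel goes into the most significant bit, which is how the Beeb's two-colour modes lay out a byte.

```python
def pack_row(pixels):
    """Pack a row of 0/1 pixels into bytes, leftmost pixel in bit 7.

    A row whose length is not a multiple of 8 is padded with 0s on the right.
    """
    packed = bytearray()
    for start in range(0, len(pixels), 8):
        chunk = list(pixels[start:start + 8])
        chunk += [0] * (8 - len(chunk))  # pad the final partial byte
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# Seven lit pixels followed by one dark pixel, like the top edge of a
# finder pattern padded out to a full byte.
print(pack_row([1, 1, 1, 1, 1, 1, 1, 0]).hex())
```

A 21-pixel row of a version 1 code packs into three bytes this way, with the last three bits left as padding.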

So where exactly is this display memory? This depends on the screen mode. The Beeb has 8 screen modes, 0 through 7, each with its own characteristics: resolution, colour depth, and memory usage. The largest possible QR code is 177×177 modules (“pixels”), so I'd need at least that much resolution, but I didn't care about colours and would be fine with monochrome (1 bit). Moreover, the pixels must be more or less square; mode 2 in particular has very squashed pixels. It seemed like mode 4 would be my best bet: 2 colours (black and white), at 320×256 pixels, taking up 10 kB of the computer's 32 kB of RAM. These 10 kB are mapped into the addresses &5800…&7FFF. (On the Beeb, & is used to denote hexadecimal numbers).

But here the fun begins. You'd expect the pixels to be stored linearly in memory: the first byte contains the 8 leftmost pixels in the top row, the next byte contains the 8 pixels to the right of those, and so on. But on the BBC Micro, things work differently: the first 8 bytes go down to form the first character cell, then we jump back up and to the right to the second character cell, and so on:

The memory layout for graphics mode 4.

I guess it works this way to make displaying characters faster, because they can be written into 8 contiguous bytes of memory. This mapping will probably cause me headaches down the road, but once you know the system it's easy enough to code for. After a while I had a BASIC program to poke the right bytes into the right place in memory:

10MODE4
20!&5A90=&BABA82FE
30!&5A94=&00FE82BA
40!&5A98=&FA5ACA13
50!&5A9C=&D8AB4AD2
60!&5AA0=&E8E808F8
70!&5AA4=&00F808E8
80!&5BD0=&B42BBDDE
90!&5BD4=&82FE00DF
100!&5BD8=&58130FCE
110!&5BDC=&A46689EE
120!&5BE0=&C00070D0
130!&5BE4=&B87840F8
140!&5D10=&82BABABA
150!&5D14=&000000FE
160!&5D18=&E742B8D2
170!&5D1C=&00000050
180!&5D20=&3018A038
190!&5D24=&00000010
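The addresses in this listing follow from the character-cell layout. As a sketch in Python (the helper names `mode4_address` and `word_to_bytes` are invented for illustration), here is how a pixel coordinate maps to a byte address in mode 4, and how one `!` assignment, which stores a 32-bit word in little-endian order, turns into four screen bytes:

```python
SCREEN_BASE = 0x5800      # start of mode 4 display memory
BYTES_PER_CELL = 8        # one byte per pixel row of a character cell
CELLS_PER_ROW = 320 // 8  # 40 character cells across the screen


def mode4_address(x, y):
    """Address of the byte holding pixel (x, y), with (0, 0) at top left."""
    cell_row, line_in_cell = divmod(y, 8)
    cell_column = x // 8
    return (SCREEN_BASE
            + cell_row * CELLS_PER_ROW * BYTES_PER_CELL
            + cell_column * BYTES_PER_CELL
            + line_in_cell)


def word_to_bytes(word):
    """The four bytes that BASIC's ! operator stores, lowest address first."""
    return word.to_bytes(4, "little")


print(hex(mode4_address(16, 16)))       # where line 20 starts writing
print(word_to_bytes(0xBABA82FE).hex())  # the bytes stored by line 20
```

Read this way, the listing places the code with its top left corner at pixel (16, 16), and the first bytes stored by line 20 (&FE, &82, &BA, &BA) are the top rows of the upper-left finder pattern.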

I typed it in, fixed the typos, and tada! A QR code! On 30-year-old hardware!

A QR code displayed on the Beeb's monitor. Maybe the first ever?

Now the big question… will it scan? I launched the ZXing Barcode Scanner app, pointed it at the screen, waited for the camera to focus, and… nothing. I fiddled with the brightness and contrast knobs on the CRT, but to no avail. The scanner showed brief blips of partial recognition everywhere but in the actual code.

Would this project be doomed to fail before it even got properly started? Let's try one more thing. The QR code specification says that it's legal to invert the colours on a code, like we've done here: what is normally dark is here shown as light, and vice versa. However, this is an addition to the 2005 edition of the spec. Perhaps it's not implemented? So let's invert the screen and find out:

200FORI%=&5800TO&7FFF:?I%=?I% EOR &FF:NEXTI%

A quick explanation for the uninitiated is probably in order. It's just a for loop from &5800 to &7FFF, inclusive. The loop counter is I%, one of 26 resident integer variables, presumably a bit faster than a plain I. Spaces are optional in most of BBC BASIC, which means you can write stuff like FORI and it will recognize the FOR keyword just fine. (This also means that you are in for a surprise if you name your variable ANDY, DIVA or COST.) For each of these addresses, the ? operator is used to query the byte (“peek”), EOR is used to invert it by xor'ing it with &FF, and ? is used again to store it back.
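For readers more at home in a modern language, the same loop can be sketched in Python. The `bytearray` standing in for the Beeb's 32 kB of RAM is purely an illustration.

```python
SCREEN_START = 0x5800
SCREEN_END = 0x7FFF  # inclusive, like the BASIC FOR loop


def invert_screen(memory):
    """Flip every bit of every byte in the mode 4 display memory."""
    for address in range(SCREEN_START, SCREEN_END + 1):
        memory[address] ^= 0xFF  # peek, xor with &FF, poke back


memory = bytearray(0x8000)  # stand-in for 32 kB of RAM, all zeroes
memory[0x5A90] = 0xFE       # first byte written by the BASIC listing
invert_screen(memory)
print(hex(memory[0x5A90]))
```

The loop touches 10240 bytes, which is the whole 10 kB screen.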

So I waited for some 15 seconds while this loop ran and inverted the screen, and ended up with a properly uninverted (outverted?) QR code.

An uninverted QR code displayed on the Beeb's monitor.

The big question again… will it scan? And after lowering the brightness on the monitor to reduce blooming, the answer is a resounding YES! The code was recognized almost instantly, and the text HELLO WORLD appeared on my phone's screen.

ZXing Barcode Scanner correctly recognizing the above code.

Even at a distance sufficient to scan a screen-size QR code, it still worked quickly and reliably. So far, so good!

Sunday, January 18, 2015

QR codes on the BBC Micro, Part 1 of 6: The beginning

In which I revel in nostalgia and explain what this series is all about.

On a shelf in my dad's study, there is an old photograph, made some time in the late '80s. In it is my dad, with 3-year-old me sitting in his lap, looking at this very device:

My first computer, the BBC Micro.

Some of you might recognize this beauty as a BBC Microcomputer. One of the best things to come out of the '80s, the BBC Micro (or “Beeb” for short) became hugely popular in the UK, and to a lesser extent in the rest of the world. They were used at the school where my dad worked, and that's how this specimen ended up in our house. I'm not exaggerating if I say it has been a defining influence on the rest of my life.

But at some point, the PC took over the world, and our house. At first the Beeb was moved to my bedroom, but it eventually ended up in a box in the attic when I got a PC of my own. The reason it's now sitting on my desk again is that my parents are moving house, and would rather have thrown it away than move it to a different, much smaller attic. I couldn't let that happen. Having moved all my possessions between four countries, I'm not one to get attached to inanimate objects, but this little machine is special.

On the other hand, in the digital world, I'm a bit of a hoarder. My fileserver still contains my university coursework, the Word documents that I wrote for school assignments at the age of 14, and the source code of the games I made on my first PC and tried to sell to people on floppy disks. But the record becomes harder to access beyond that: my very first programs were written on the BBC Micro, and are stored on these old 5¼" floppy disks, which don't fit into my PC's USB 3.0 port or its Blu-ray drive.

Sadly, these old floppy disks don't have eternal life. Nor does the Beeb. Nor its disk drive; these drives are notoriously fragile, and a replacement might be hard to find. So I set out to salvage the earliest work in my programming career and back it up onto more modern media. It would be the completion of the earliest records of my digital life.

From the low-level software point of view, a floppy disk contains just a series of bytes. If we transfer all the bytes and put them into a file on the PC, we have created an image of the floppy: an exact replica, which can be loaded into an emulator like BeebEm and used as if it were the real thing. And since the image is just a file, it can be backed up on any of the many storage media available today and in the future.

But as I mentioned, my PC does not have a floppy drive. The PC I had before this one didn't either. The one before that might have had a drive for the “modern” 3½" not-so-floppy disks, not the 5¼" that the BBC Micro uses. Old IBM hardware, like 286 and 386 PCs, typically did use these floppies, but I think they were in an incompatible format. Even if not, the ribbon cable that connects the computer to the drive is definitely different.

So my best bet seemed to be to let the BBC Micro itself read the disks into memory, and somehow transfer the data to the PC. Easier said than done. The Beeb predates the Wi-Fi era and even the Ethernet era. It does come with its own type of networking interface, called Econet, but of course no modern machine has any clue how it works.

Another option would be the 5V analogue I/O port. I once used this to transfer data between the Beeb and my TI-83 graphing calculator, so I knew it was possible. Most PCs built in the '90s came with an RS-232 serial port, but the voltage levels are different. I would need to build some kind of adapter, but I'm not an electrical engineer, and I was terrified of blowing up my only remaining Beeb by accident. Moreover, my PC lacks such a serial port to begin with.

So I started considering more arcane options. The first thing I thought of was sound. The BBC Micro comes with a pretty cool 4-channel audio chip, and even from BASIC you can make it play 256 different pitches. That's one tone to transfer one byte – great! On the other end, we could do an FFT to detect the pitch, and map it back into a byte. However, the minimum duration of such a tone is 1/20th of a second, so the maximum transfer rate would be 20 bytes/s. At that rate, copying a single 180 kB floppy disk would take over 2.5 hours of listening to noisy bleeping and blooping. Not fun, but definitely doable! I was encouraged.
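The arithmetic behind that estimate, as a quick Python sketch. It assumes 1 kB means 1024 bytes and ignores any framing or synchronisation overhead.

```python
FLOPPY_BYTES = 180 * 1024   # one 180 kB floppy, assuming 1 kB = 1024 bytes
TONES_PER_SECOND = 20       # minimum tone duration of 1/20 s
BYTES_PER_TONE = 1          # 256 pitches encode exactly one byte

transfer_seconds = FLOPPY_BYTES / (TONES_PER_SECOND * BYTES_PER_TONE)
transfer_hours = transfer_seconds / 3600

print(f"{transfer_seconds:.0f} s, or {transfer_hours:.2f} hours per floppy")
```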

However, there is a drawback. We need an exact copy of the signal. Misinterpret a single byte, and the program ceases to work. Worse, if we lose or gain a byte halfway through the stream, the entire alignment of the data is thrown off and the floppy image becomes unreadable.

So at the very least, we need a checksum. CRC32 is relatively easy to implement and might work well enough to detect errors. But checksums are just a class of error-detecting codes, so while they can tell me that the transfer went wrong, they cannot tell me where or how, and that would be 2.5 hours of bleeping and blooping down the drain.
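To make the limitation concrete, here is a small Python sketch using the standard library's `zlib.crc32`. The test data and the position of the flipped bit are arbitrary choices for illustration.

```python
import zlib

# Stand-in for data read from a floppy: 1 kB of arbitrary bytes.
original = bytes(range(256)) * 4

# Simulate a transfer error by flipping a single bit somewhere in the middle.
received = bytearray(original)
received[500] ^= 0x01

sent_checksum = zlib.crc32(original)
received_checksum = zlib.crc32(bytes(received))

# The mismatch reveals that something went wrong...
transfer_ok = sent_checksum == received_checksum
print("transfer ok" if transfer_ok else "transfer corrupted")
# ...but the two 32-bit values give no hint that byte 500 is the culprit.
```

All the receiver can do with this information is ask for the whole transfer again.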

What we really need here is an error-correcting code. The most well-known is the Reed-Solomon code, used for data transfers from the Voyager spacecraft, and more recently also on CDs, DVDs and Blu-ray discs. But there's another place where RS codes are used.

A QR code containing the text “HELLO WORLD”.

The ubiquitous QR code, used on posters, milk cartons, websites and even tattoos, is little more than a bunch of 1s and 0s encoded as black and white squares, with Reed-Solomon error correction codes on top, and some markers thrown in to help the scanner. Would it be possible to display a floppy's contents on the screen as a series of QR codes, scan each of them with a smartphone, and assemble them at the other end into a disk image?

Let's see. Most QR codes are fairly small, as they contain just a web address or some such. But they can actually grow to the “Version 40” monster code of 177×177 squares, which (at the lowest error correction setting) can hold 2953 bytes of arbitrary binary data. That's about 60 QR codes to transfer the contents of one floppy. At a conservative rate of one QR code per minute, I'd be able to reliably transfer an entire floppy in an hour.

A version 40 QR code from Wikipedia. I haven't checked whether it contains anything NSFW.
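The "about 60" figure can be checked with a few lines of Python. This again assumes 1 kB is 1024 bytes and that every code is filled to capacity with raw disk data, with no room reserved for sequence numbers or other bookkeeping.

```python
import math

FLOPPY_BYTES = 180 * 1024     # assuming 1 kB = 1024 bytes
BYTES_PER_QR_CODE = 2953      # largest code at the lowest error correction
MINUTES_PER_QR_CODE = 1       # the conservative scanning rate from the text

codes_needed = math.ceil(FLOPPY_BYTES / BYTES_PER_QR_CODE)
total_minutes = codes_needed * MINUTES_PER_QR_CODE

print(f"{codes_needed} codes, about {total_minutes} minutes per floppy")
```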

Encouraged, I did some back-of-the-envelope time and memory calculations. About 4 kB for the QR code data bits, and maybe another kB for scratch data. In screen mode 4, we have about 16 kB to play with, so that leaves 11 kB for the BASIC program and its assembly output. A tight squeeze, perhaps, but not impossible.

On to calculating time. Maybe it takes 1000 clock cycles to generate one pixel, so at the 2 MHz that the Beeb runs at, we should be able to generate a QR code in 16 seconds. Even if I'm off by an order of magnitude, that's still fine.
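Both back-of-the-envelope estimates, memory and time, fit in a short Python sketch. The 1000 cycles per pixel and the 1 kB of scratch space are the guesses from the text, not measurements.

```python
import math

MODULES_PER_SIDE = 177             # largest QR code
GUESSED_CYCLES_PER_MODULE = 1000   # guess from the text, not measured
CLOCK_HZ = 2_000_000               # the Beeb's 2 MHz clock

AVAILABLE_KB = 16                  # usable RAM in screen mode 4
GUESSED_SCRATCH_KB = 1             # guess from the text

modules = MODULES_PER_SIDE ** 2
data_bytes = math.ceil(modules / 8)        # one bit per module
data_kb = math.ceil(data_bytes / 1024)     # rounded up to whole kB

program_budget_kb = AVAILABLE_KB - data_kb - GUESSED_SCRATCH_KB
generation_seconds = modules * GUESSED_CYCLES_PER_MODULE / CLOCK_HZ

print(f"{data_bytes} bytes of module data, {program_budget_kb} kB left for code")
print(f"about {generation_seconds:.0f} seconds per QR code")
```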

So let's do this!