DD9 Kaypro Edition

Deep Dish Nine goes Retro!

Sometime in the early 2000s I acquired a Kaypro 2/84 computer (side note: the picture in the Wikipedia article with the mis-matched floppy drives is actually my Kaypro!), and I’ve been meaning to do something ‘interesting’ with it ever since. It’s a nice Z80-based ‘luggable’ computer running the CP/M operating system. General stats:

  • Z80 Processor @4 MHz
  • 64KB of RAM
  • Upgraded to use the ‘Advent TurboROM’
  • Upgraded to dual DS/DD 5.25″ Floppy drives (360KB/disk)
  • 9″ green CRT supporting 80 columns x 25 rows (or 160×100 pixels in ‘graphics mode’!)
  • Two RS-232 serial ports supporting speeds up to 19.2Kbps (one of which is tricked out with a sweet Wifi232 module, the other unused)
  • One Parallel Port (currently unused)
  • Unpopulated internal areas to support a real-time clock, and a 300 baud modem!

The TurboROM also adds support for the following (if you can find them, and/or figure out how to wire them in):

  • Up to 2 RAM Disks of 256KB, 512KB or 1MB each
  • Support for up to 4 floppy disk drives (180KB, 360KB or 720KB)
  • Support for up to 2 Hard Drives of up to 56MB each

A Kaypro 2/84 in all its green-phosphor glory

All in all it’s a pretty neat machine, and it’s an interesting example of the early computer world before Apple and IBM-compatible machines killed off all the competition. One of the nice things about a Z80 CP/M machine like this is that there’s actually a semi-viable software ecosystem lovingly archived and available for free on the internet. You can find lots of ‘productivity’ software (spreadsheets, text editors, etc.), terminal programs, text-based adventure games like Zork and even development software like compilers. My recent forays (1, 2) back into game development got me thinking – how hard would it be to write a ‘graphical’ game for my Kaypro? I stumbled across one or two drawing programs for the Kaypro, but almost nothing else that actually took advantage of its limited graphics capabilities.

I finally decided I was going to make an attempt to fill this niche and see what this old dinosaur can do. Unfortunately my free time is a bit more limited these days, so rather than start from scratch I decided to see if I could port the “Deep Dish 9” game I developed for my Arduboy. On paper the Arduboy and the Kaypro aren’t actually *that* far off in terms of specs, so it seemed like a natural fit. I also wanted to actually develop my game *on* the Kaypro, so after stumbling across this article (and this one), I decided that TurboPascal 3.0 would be my weapon-of-choice.

A surprisingly modern language

My Wifi232 (along with a copy of Kermit on both sides!) makes moving files between my Linux laptop and my Kaypro a snap, so I whipped up a new floppy with a copy of Kermit and TurboPascal on it, and got to work. The fantastic thing about TP is that not only is it a perfectly reasonable programming language, the software is a full IDE that fits in 26KB! It includes a great text editor as well as a compiler with nice debugging features – seriously, programmers these days could learn a thing or two about writing compact software. Imagine a full IDE that fits comfortably in your CPU’s L1 cache! That was actually one of the biggest surprises of this experience – the edit-compile-debug loop that anyone writing software is so familiar with is really nice on this machine. It’s probably better than my ‘modern’ programming experience when I’m working on less-retro things. Sure, compiling can take a few seconds (probably <15s or so), although this would be mostly alleviated by a RAM Disk, but the 80-column screen is nice and spacious, the mechanical keyboard is nice and clicky, and the mono-tasking nature of working on a computer with 64KB of RAM means no distractions. I had always assumed that writing software on the Kaypro would be extremely unpleasant at best, so this was a nice surprise.

The Turbo Pascal startup screen

Getting back to the actual porting process – I started by literally translating the source code from C++ to TurboPascal. It turns out very little re-factoring was necessary to make this happen, although I had to stop frequently to look up the right syntax (some of which feels a bit odd by modern standards). TurboPascal feels like a decently ‘modern’ language, though, and it didn’t take very long to get comfortable with it.

Working on the DD9 code

I started working on this a few weeks after my daughter was born, so it worked well as a quick hobby project that could be worked on during nap times (one more advantage of the Kaypro – it boots up in about 4 seconds!).

Me trying to type quietly on a mechanical keyboard

Once I had the core of the game in place and began testing, it immediately became apparent why there aren’t more graphical Kaypro games. The straight port of DD9, which runs comfortably at 30 FPS on my Arduboy (re-drawing the screen from scratch between frames!), ran at an achingly slow 0.25 FPS. Eek! The Z80 runs at 4 MHz, vs the 16 MHz of the Arduboy, and the CRT is controlled by a 6545A-1 “CRT Controller” chip vs the 128×64 SPI display used by the Arduboy, but I didn’t expect the performance discrepancy to be that bad. How do these machines actually compare?

  • CPU: I’m not super familiar with the Z80, but it appears that it takes somewhere between 3 and 18 clock cycles to execute a single instruction. The ATmega32U4 in the Arduboy, on the other hand, is both clocked 4X faster and executes about 1 instruction per cycle, giving it an advantage of somewhere between 12X and 72X. The core of the DD9 engine started out as the N-body simulator used by HANS, and was originally designed for accuracy and simplicity, not speed. The code makes liberal use of floating point and makes no effort to reduce the number of calculations needed per iteration.
  • Display: The Arduboy has an OLED display connected via SPI with an SSD1306 controller (like this one), that can probably push pixels at close to the 4 MHz SPI clock rate. The Kaypro has a 6545-1 CRT Controller, which is really designed to control a text-based terminal, with a few fancy upgrades. For a variety of reasons, which I’ll explore next, using the Kaypro’s graphics is extraordinarily slow. Whereas the Arduboy could probably push a million+ pixels a second, the Kaypro is probably closer to a few 10s of pixels a second. That makes creating an interactive, graphical game significantly more challenging.
  • RAM: The Kaypro actually has a bit of an advantage here – it’s got 64KB for both code and data, whereas the Arduboy has 32KB for code and 2.5KB for data. There are ways this could be used to improve the performance (some of which I mention later), but I mostly neglected them for a ‘quick & dirty’ approach to optimization.

To understand why the performance is so terrible, it’s important to understand how the Kaypro actually managed to create its ‘graphics mode.’ The CRT controller is really designed to drive an 80-column x 25 row character display. The CRT controller has 2KB of dedicated character SRAM used for the display, as well as an additional 2KB of ‘attribute’ SRAM that corresponds 1:1 with the characters. Each character supports an extra ‘attribute’ like underlining, half-brightness, blinking, etc. The CRT controller indexes into the character and attribute RAMs as it drives the CRT, and the output of those RAMs indexes into a character ROM that effectively holds the font for the display. The Kaypro’s display is actually pretty great for the early 1980s – the effective display output when driving text is something like 640×400 pixels – the text is crisp and legible on the 9″ monochrome display. To create the ‘graphics’ mode, the Kaypro can treat each ‘character’ as a 2×4 block of pixels. It uses the upper 128 character values (since all of ASCII lives in the lower 128 values) to control 7 of the 8 pixels directly in a 1:1 manner, and then covers the rest of the possibilities by using the ‘inverse video’ attribute, which inverts the light/dark values for the whole character block. Voila! 160×100 graphics bolted onto something originally designed to emulate an ADM-3A terminal.
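
To make the block encoding concrete, here is a minimal Python sketch of how an arbitrary 2×4 pixel pattern could map onto a character code plus the inverse-video attribute. The bit ordering (which pixel lands on which bit, and which pixel is the "8th" one handled by inversion) is my assumption for illustration, not something taken from the Kaypro documentation, and the function names are hypothetical.

```python
def encode_block(pixels):
    """Map an 8-bit pixel pattern to (char_code, inverse).

    Assumption: bits 0-6 are the seven pixels the character code controls
    directly, and bit 7 is the pixel that can only be lit via inverse video.
    """
    if not 0 <= pixels <= 0xFF:
        raise ValueError("pattern must fit in 8 bits")
    inverse = bool(pixels & 0x80)
    # With inverse video on, every pixel is flipped, so store the complement.
    low7 = (~pixels if inverse else pixels) & 0x7F
    return 0x80 | low7, inverse   # graphics characters live in 0x80-0xFF


def decode_block(char_code, inverse):
    """Recover the displayed 8-bit pattern from a code and attribute."""
    shown = char_code & 0x7F      # the 8th pixel is dark before inversion
    return (~shown & 0xFF) if inverse else shown


if __name__ == "__main__":
    for pattern in (0x00, 0x15, 0x80, 0xFF):
        code, inv = encode_block(pattern)
        print(f"pattern {pattern:#04x} -> char {code:#04x}, inverse={inv}")
```

Under these assumptions all 256 patterns are reachable: 128 directly, and the other 128 by storing the complement and turning on inverse video.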

The Fenton Heavy Industries logo – artisanally crafted with ‘draw_line()’ calls

This would actually be pretty reasonable if it weren’t for the very, uh, cost-sensitive way in which it was designed. The main issue is that both the CRT controller and the CPU need to access the character and attribute SRAMs. Additionally, the CRT controller runs at a max of 2 MHz vs 4 MHz for the CPU. The best way to do this would be to use dual-ported SRAMs, and allow both devices to access the RAMs whenever needed, but those were expensive. The 6545 supports other high-ish performance modes, like using alternate phases of the clock to access the RAM (but requires fast RAMs or a slow clock). Kaypro went with the cheapest method, which was to not even allow the CPU to have direct access at all. To write into a location, the Kaypro writes the data to a latch, then writes the address to the 6545, and then polls until an ACK comes back. The 6545 sneaks in CPU-requested accesses during the horizontal/vertical blanking periods, which right away limits you to something like 20,000/second. To make the graphics convenient for programmers, Kaypro added ROM routines to set/clear pixels and draw/clear lines. Unfortunately, each of these routines needs to do a read-modify-write on both the character and attribute RAMs for every pixel it touches! I ported the ‘drawSlowXYBitmap()’ function from the Arduboy library to the Kaypro, as that’s what I used to display the logo graphics in DD9, and it is slow indeed! The function is a pretty tight loop of calling the setPixel() routine, and you can watch the logo materialize. I didn’t benchmark it carefully, but I basically took that as a strong sign that I should minimize how many pixels I touch each frame.
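
As a rough back-of-the-envelope check, the access limit can be turned into a best-case pixel budget. This Python sketch assumes each read-modify-write costs two blanking-period accesses (one read, one write); that split is my assumption, and the real ROM routines add their own overhead on top, so treat the result as a ceiling rather than a measurement.

```python
ACCESSES_PER_SECOND = 20_000  # blanking-period access rate quoted in the text
ACCESSES_PER_RMW = 2          # assumption: one read plus one write
RAMS_PER_PIXEL = 2            # character RAM and attribute RAM


def accesses_per_pixel():
    """Video RAM accesses needed to set or clear one pixel via the ROM."""
    return ACCESSES_PER_RMW * RAMS_PER_PIXEL


def max_pixels_per_second():
    """Best-case pixel rate if the access limit were the only cost."""
    return ACCESSES_PER_SECOND / accesses_per_pixel()


def min_seconds_to_draw(pixel_count):
    """Lower bound on the time to touch pixel_count pixels one by one."""
    return pixel_count / max_pixels_per_second()


if __name__ == "__main__":
    print(max_pixels_per_second())          # 5000.0
    print(min_seconds_to_draw(160 * 100))   # 3.2
```

Even with zero software overhead, touching every pixel of the 160×100 screen once would take at least 3.2 seconds, which fits with being able to watch the logo draw itself.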

Optimizing!

Now that we’ve framed what a miserably slow computer this is, how did I go about making it semi-playable? First, I decided that I needed to minimize the number of floating point operations. I’m both lazy and pressed for time, so I didn’t move to full fixed-point math, but there were a number of optimizations available. The main one was moving from a full n-body simulation to just computing the forces between the individual planets and the sun, and leaving the sun’s position fixed at the center of the screen. The inter-planet forces are pretty much rounding errors on the timescale of the game (90 days), so it effectively made no difference to the game mechanics and reduced my floating point requirements for calculating forces from O(N^2) to O(N).
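
To illustrate the saving, this small Python sketch just counts force evaluations per timestep for the two approaches. The body count of eight planets is a made-up example, not necessarily the number in the game, and the function names are hypothetical.

```python
def full_nbody_pairs(bodies):
    """Every unordered pair of bodies interacts: N*(N-1)/2 evaluations."""
    return [(a, b) for i, a in enumerate(bodies) for b in bodies[i + 1:]]


def sun_only_pairs(sun, planets):
    """Only sun-planet forces are kept: one evaluation per planet."""
    return [(sun, planet) for planet in planets]


if __name__ == "__main__":
    planets = [f"planet{i}" for i in range(8)]   # illustrative count only
    print(len(full_nbody_pairs(["sun"] + planets)))  # 36
    print(len(sun_only_pairs("sun", planets)))       # 8
```

Each of those evaluations involves several floating point multiplies and a divide, so on a Z80 without floating point hardware the difference adds up quickly.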

Next, since all of the planets have circular orbits, they effectively maintain a constant distance from the sun. Since we already reduced our problem to just computing the forces between individual planets and the sun, we can go a step further and just compute the magnitude of that force once for each planet. Now, for each timestep all we need to do is compute the directional components of the force based on the position of planets, which is much less work than computing the distance and magnitude of the force as well.
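
A minimal Python sketch of this trick, with the sun fixed at the origin: the gravitational parameter `GM` and the function names are placeholders of mine, and folding the 1/r factor into the precomputed constant is a small extra step beyond what is described above.

```python
import math

GM = 1.0  # hypothetical gravitational parameter of the sun, arbitrary units


def full_accel(x, y):
    """Acceleration toward the sun, recomputing distance every timestep."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    magnitude = GM / r2
    return -magnitude * x / r, -magnitude * y / r


def make_circular_accel(orbit_radius):
    """Precompute the per-planet constant once; valid only while r is fixed."""
    k = GM / orbit_radius ** 3      # magnitude / r, computed a single time

    def accel(x, y):
        # Only the direction depends on position: two multiplies per step.
        return -k * x, -k * y

    return accel


if __name__ == "__main__":
    fast = make_circular_accel(2.0)
    x, y = 2.0 * math.cos(0.7), 2.0 * math.sin(0.7)
    print(full_accel(x, y))
    print(fast(x, y))   # agrees with the line above up to rounding
```

The shortcut drops the square root and the division from the per-timestep work, which is where a software floating point library spends most of its time.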

The main game screen – optimized for the Kaypro!

Finally, since the screen updates are so ridiculously slow, we need to reduce those to a bare minimum. On the Arduboy, everything is so fast that I actually blank the screen and re-draw it from scratch on every frame. On the Kaypro, I divide everything into ‘static’ and ‘dynamic’ elements – the sun and ‘frame’ elements are just drawn at the start of the game and never updated again. For ‘status’ elements that dynamically update, I converted them to text (which is both quite fast to update and very high resolution). For the actual dynamic pixel-drawn elements (the ship, planets and direction indicator), every update involves clearing the previously drawn pixels and then re-drawing whatever new pixels are needed. The movement of the planets is sufficiently slow that I calculate the new location after every timestep, and only update the screen if the actual pixel location changes. As a further simplification, the triangle-shaped ‘ship’ from the Arduboy game was replaced with a single pixel and a separate ‘direction-indicator’ in the console. The BIOS has draw and clear line routines which are significantly faster than repeatedly calling the set / clear pixel routines, so the indicator line is implemented using these. The direction indicator also changes in 45 degree increments rather than the ~1 degree increments used on the Arduboy, in order to make things sufficiently ‘interactive.’
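
The "only redraw when the pixel actually moves" idea can be sketched like this in Python. The `set_pixel` and `clear_pixel` callbacks stand in for the ROM routines, and the class name is hypothetical; the real game is written in TurboPascal.

```python
import math


class PixelSprite:
    """Tracks one on-screen pixel and touches the display only on change."""

    def __init__(self, set_pixel, clear_pixel):
        self.set_pixel = set_pixel
        self.clear_pixel = clear_pixel
        self.last = None            # last drawn (x, y), or None if not drawn

    def move_to(self, x, y):
        """Update from a floating point position; return True if redrawn."""
        pos = (math.floor(x), math.floor(y))
        if pos == self.last:
            return False            # same pixel: no slow display access
        if self.last is not None:
            self.clear_pixel(*self.last)
        self.set_pixel(*pos)
        self.last = pos
        return True


if __name__ == "__main__":
    calls = []
    planet = PixelSprite(lambda x, y: calls.append(("set", x, y)),
                         lambda x, y: calls.append(("clear", x, y)))
    planet.move_to(10.2, 5.1)   # first draw
    planet.move_to(10.7, 5.9)   # still pixel (10, 5): nothing happens
    planet.move_to(11.0, 5.0)   # crosses a pixel boundary: clear + set
    print(calls)
```

With slow-moving planets most timesteps fall into the "nothing happens" case, so the expensive display path runs only occasionally.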

What did all of this effort get us? As I mentioned earlier, the direct port of the Arduboy code updated at about 0.25 frames/sec. A few rounds of optimization got us to the 3-4 frames/sec range, which is actually playable for a slow game like DD9.

Optimizations Not Taken

The above modifications got me sufficiently close to declare victory for the purposes of this project, but there were a number of optimizations I considered along the way. Among those:

  • Hardware upgrade #1 – This would have aided development more than the actual game, but SD-card floppy emulators exist, and they sound like a huge improvement over the DS/DD 360KB 5.25″ floppy drives I was using. I actually lost a floppy drive during development and had to limp across the finish line with only one!
  • Hardware upgrade #2 – There were modifications to let the Kaypro use an 8 MHz Z80 CPU (vs the 4 MHz stock speed), which probably would have made a significant difference for the math bits, if not the display. I actually considered taking this to an extreme and replacing the stock CPU with an FPGA containing a very fast Z80 implementation along with all on-die RAM. I’ve seen people running Z80 cores at ~80 MHz, which would have certainly sped things up, but I decided that this would probably have violated the spirit of the project too much.
  • Hardware upgrade #3 – In the same vein as #2, I considered using an FPGA to build an ‘accelerator’ for the Kaypro. Conveniently, the Kaypro has a couple of empty DIP sockets on the motherboard that wouldn’t actually make this terribly painful (I actually wired an FPGA implementing a simple SRAM into the socket where the real-time clock belonged on a rainy afternoon a few years ago, just to test this concept). I’ve got all of the RTL for my Cray functional units lying around, so building some kind of unholy “Kraypro” hybrid crossed my mind a few times. Taking it in a different direction, building an 8-bit vector subsystem for the Kaypro also sounds like a super fun project (that, alas, will probably never see daylight).
  • Software optimization #1 – The obvious solution to my bad floating point performance is to move to lower-precision fixed-point math. Being pre-IEEE 754, Turbo Pascal uses a 6-byte ‘real’ format, which consists of a 39-bit mantissa + 8-bit exponent + sign bit. Knowing that the Z80 can only operate on 8 bits at a time, and that each operation takes multiple cycles, it’s not hard to imagine why doing division with real numbers is super slow. Moving to 32-bit fixed-point math would probably have sped things up enormously and not been too difficult.
  • Software optimization #2 – Write my own display-update code. By burning 4KB to keep a shadow copy of the character and attribute memory in main RAM (less than that, actually, since I only do dynamic pixel updates to part of the screen), I could enormously reduce the number of required CRT accesses. As mentioned earlier, the BIOS routines need to do two read-modify-write operations per pixel, and each operation can only take place during a CRT blanking period. If I had shadow copies of the character and attribute memories in RAM, I could potentially modify up to 8 pixels with a single access (a 32X improvement!). If I were more serious about my Kaypro graphical gaming experience, this is probably the right way to go, particularly because it would be portable to other people’s machines (unlike hardware upgrades). This could probably be written as a TurboPascal library that could be re-used by different programs, and would open up a lot more possibilities for this elderly computing device. Anyone reading this want to take a stab at it?
  • General hardware upgrades – As I mentioned earlier, the TurboROM has support for RAM Disks and hard drives, as well as a real-time clock, and there is an unused serial port that could accommodate a serial mouse. Significant improvements could also be had by designing an improved drop-in replacement for the CRT controller. This quickly devolves into the Ship of Theseus argument frequent among vintage computing enthusiasts, but as a general computing platform, there is a lot of low-hanging fruit in terms of upgrades if you’re not a purist. Having a fully tricked-out Kaypro would be a really fun project, particularly for eventually writing a full GUI-based OS for it.
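
For anyone tempted by the shadow-copy idea from software optimization #2, here is a rough Python sketch of the bookkeeping. It is only a model: the `write_char` and `write_attr` callbacks stand in for the latch-and-poll hardware access, the pixel-to-bit layout inside each 2×4 cell is my assumption, and it assumes the screen starts out cleared to blank graphics characters with inverse video off. A real version would be a TurboPascal library.

```python
class ShadowScreen:
    """Keeps the 160x100 pixel state in main RAM and batches hardware writes."""

    COLS, ROWS = 80, 25

    def __init__(self, write_char, write_attr):
        self.write_char = write_char
        self.write_attr = write_attr
        self.cells = [[0] * self.COLS for _ in range(self.ROWS)]
        self.inverse = [[False] * self.COLS for _ in range(self.ROWS)]
        self.dirty = set()

    def set_pixel(self, x, y, lit=True):
        """Change one pixel in the shadow copy only; no hardware access."""
        col, row = x // 2, y // 4
        bit = 1 << ((y % 4) * 2 + (x % 2))   # assumed bit layout
        old = self.cells[row][col]
        new = (old | bit) if lit else (old & ~bit)
        if new != old:
            self.cells[row][col] = new
            self.dirty.add((row, col))

    def flush(self):
        """Write each touched cell once; return the hardware access count."""
        accesses = 0
        for row, col in sorted(self.dirty):
            pattern = self.cells[row][col]
            inv = bool(pattern & 0x80)
            code = 0x80 | ((~pattern if inv else pattern) & 0x7F)
            self.write_char(row, col, code)
            accesses += 1
            if inv != self.inverse[row][col]:   # attribute write only if needed
                self.write_attr(row, col, inv)
                self.inverse[row][col] = inv
                accesses += 1
        self.dirty.clear()
        return accesses


if __name__ == "__main__":
    log = []
    screen = ShadowScreen(lambda r, c, v: log.append(("char", r, c, v)),
                          lambda r, c, v: log.append(("attr", r, c, v)))
    for y in range(4):
        for x in range(2):
            screen.set_pixel(x, y)
    print(screen.flush(), log)
```

Because the shadow copy already knows the current contents, no reads from video RAM are needed at all, and a cell costs one write, or two when the inverse attribute has to flip, no matter how many of its eight pixels changed.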

Conclusions

I definitely found the answer to my question about why so few graphical Kaypro programs exist. The Kaypro’s graphics are awful – it’s a text-mode machine with graphics bolted on as a box-checking exercise. That being said, the development experience was surprisingly nice and it was a lot of fun to go through the exercise of actually making a functional game for a machine slightly older than me. Have you got an ’84-series Kaypro collecting dust somewhere? Want to test your skill with pizza delivery and orbital mechanics? Bring it out of retirement and grab a copy of Deep Dish Nine! Source and executable are included.