Exploring Kaypro Video Performance

My Crowning Achievement

For no particularly good reason, I’ve mostly thrown my retro-computing lot in with my humble Kaypro 2/84. It was a fairly common Z80-based machine for running CP/M, which means it’s designed for text, not graphics. It has a crisp 80-column, 25-line CRT, and a nice 8×16 font stored in a ROM. It even sports an interrupt-capable serial port that can run up to 19.2 kbps, making it surprisingly pleasant to use as a terminal. Its most comfortable role is in ‘business’ applications (spreadsheet, database, word processing), along with the occasional game of Zork or other Infocom text adventures. There are a handful of ‘arcade-style’ games that mostly stick to text mode (sometimes using the misc. characters stored in ROM beyond the normal ASCII set), but nothing like the libraries of nostalgia-inducing games for machines like the C64 or Atari 800.

As I explored in my “Deep Dish 9” port, the Kaypro did support what they optimistically referred to as “medium-resolution” graphics. In what can mostly be viewed as a marketing exercise to check the ‘graphics’ box when the industry started moving in that direction, the Kaypro can (ab)use its character mode display to provide decent-ish 160×100, 3-color graphics (black, green and . . . less green). There is a nice short write-up on the Kaypro’s graphics support (sadly, now only available on the Internet Archive), as well as another very fun write-up by someone who wrote a commercial drawing program called SCS-Draw for the Kaypro in the 1980s. SCS-Draw made heavy use of the Kaypro graphics, and the write-up includes quite a bit of detail about how it works.

A screenshot from SCS-Draw stolen from the above website

The Kaypro’s graphics support manages to be both incredibly slow and awkward to access, with the latter mostly being the cause of the former (thanks to helpful BIOS routines). The Kaypro’s display is generated by a SY6545A-1 “CRT Controller” chip with 2KB of character RAM, 2KB of attribute RAM, a 4KB font ROM and a custom logic array to tie everything together. The SY6545A-1 is driven by an 18MHz pixel clock, and the 2048 bytes of character RAM are organized as an 80×25=2000 character array (with the attribute RAM lookup done in parallel). The font ROM maps the 8-bit value in the character RAM to an 8×16 bitmap (256 values x 16 bytes/character = 4096 bytes). ASCII only uses the lower 128 values, so the ‘upper’ 128 values are used to rather inefficiently treat each character cell as a 2×4 block of pixels, one bit per pixel: bit 0 in the upper right corner, bit 1 in the upper left corner, and so on down to bit 7 in the lower left corner (see the layout below). The observant reader may have noticed that we can’t clear bit 7 without selecting from the ‘lower’ 128 values in our character set, which map to our ASCII letters (and some other misc. symbols). Attribute RAM to the rescue! Each character can be ‘inverted’ (by flipping which pixels are green and which are black), so we can access the full 2×4 pixel power of each character by optionally inverting the character (which flips that otherwise stuck lower-left pixel) and then setting the rest of the pixels accordingly. Did I mention that this is slow? It’s slow.

                          -----------
                          |  2 |  1 |
                          -----------
                          |  8 |  4 |
                          -----------
                          | 32 | 16 |
                          -----------
                          |128 | 64 |
                          -----------
                      Kaypro Pixel Layout
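
To make the layout concrete, here is a small Python model of the encoding (Python rather than anything that runs on the Kaypro). The bit weights are copied from the diagram above. Everything else is my assumption for illustration: that a set bit means a lit pixel, that the inverse attribute flips all eight pixels at once, and that cells are numbered row by row from 0. The function names are made up.

```python
# Bit weights copied from the layout diagram: rows top to bottom, columns left to right.
WEIGHTS = [
    [2, 1],
    [8, 4],
    [32, 16],
    [128, 64],
]


def locate(x, y):
    """Map a pixel on the 160x100 grid to (cell index, bit weight)."""
    col, sub_x = divmod(x, 2)
    row, sub_y = divmod(y, 4)
    return row * 80 + col, WEIGHTS[sub_y][sub_x]


def encode(pattern):
    """Turn a desired 8-pixel pattern into (character byte, invert flag).

    Graphics characters live in the upper 128 values, so bit 7 must be set.
    If the wanted pattern has bit 7 clear, store its complement and invert.
    """
    if pattern & 0x80:
        return pattern, False
    return (~pattern) & 0xFF, True


def decode(char, invert):
    """Pixels actually shown for a character byte and invert flag."""
    return (~char) & 0xFF if invert else char


if __name__ == "__main__":
    print(locate(159, 99))
    print(encode(0x00))
```

The point of `encode` is that every one of the 256 patterns is reachable, but half of them need the attribute byte touched as well as the character byte.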

Why is this so slow? To set an arbitrary pixel, one must read out not just the current character value but the attribute value as well, compute the new values, and then write them back as necessary. As a simplification to the system design, the only way to access the character and attribute RAM is via the SY6545A-1 chip. To make things even better, the only way to reach a given byte through the SY6545A-1 is to first load its pointer register with the address you want. So, to recap, if we want to set a pixel, we must do the following:

1) Point the SY6545A-1 at the low byte of its VRAM pointer.
2) Write the low bits of our address.
3) Point the SY6545A-1 at the high byte of its VRAM pointer.
4) Write the high bits of our address.
5) Poll the SY6545A-1 status bit until a video blanking period has occurred.
6) Read the value from the latch.
7) Repeat steps 1-6 for the character RAM.
8) Calculate the new values for both bytes.
9) Update the bytes as needed.
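
To get a feel for how much bus traffic the steps above imply, here is a toy Python model that does nothing but count operations. It is not the real register interface: the class, the register names, and the idea of modelling attribute RAM as a second bank at offset 0x800 are all placeholders I made up, and the auto-incrementing pointer is left out. It also assumes the worst case, where both bytes are written back and each one needs the pointer set again.

```python
LOW, HIGH = "pointer-low", "pointer-high"  # hypothetical register names
ATTR_OFFSET = 0x800  # assumption: attribute RAM modelled as a second 2KB bank


class MockCrtc:
    """Toy stand-in for the CRT controller that only counts bus operations."""

    def __init__(self):
        self.vram = bytearray(4096)
        self.pointer = {LOW: 0, HIGH: 0}
        self.selected = LOW
        self.bus_ops = 0

    def select(self, register):        # steps 1 and 3
        self.bus_ops += 1
        self.selected = register

    def write_pointer(self, value):    # steps 2 and 4
        self.bus_ops += 1
        self.pointer[self.selected] = value & 0xFF

    def wait_for_blanking(self):       # step 5; real hardware may poll many times
        self.bus_ops += 1

    def _address(self):
        return ((self.pointer[HIGH] << 8) | self.pointer[LOW]) % len(self.vram)

    def read_latch(self):              # step 6
        self.bus_ops += 1
        return self.vram[self._address()]

    def write_latch(self, value):      # step 9
        self.bus_ops += 1
        self.vram[self._address()] = value & 0xFF


def point_at(crtc, address):
    """Steps 1-5: load both pointer bytes, then wait for a blanking period."""
    crtc.select(LOW)
    crtc.write_pointer(address & 0xFF)
    crtc.select(HIGH)
    crtc.write_pointer(address >> 8)
    crtc.wait_for_blanking()


def update_cell(crtc, cell, compute):
    """Read attribute and character, apply compute(), write both back."""
    point_at(crtc, cell + ATTR_OFFSET)
    attr = crtc.read_latch()
    point_at(crtc, cell)               # step 7
    char = crtc.read_latch()
    new_char, new_attr = compute(char, attr)  # step 8
    point_at(crtc, cell)
    crtc.write_latch(new_char)
    point_at(crtc, cell + ATTR_OFFSET)
    crtc.write_latch(new_attr)


crtc = MockCrtc()
update_cell(crtc, 1999, lambda char, attr: (char | 0x82, attr))
print(crtc.bus_ops)
```

Even in this best-case model, where every status poll succeeds on the first try, one pixel costs a couple of dozen port accesses plus four waits for a blanking period.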

BIOS Blues

The Kaypro BIOS has helpful routines for setPixel(), clearPixel(), setLine() and clearLine(), which are very easy to use, but every bit as slow as you would think. Using the BIOS routines, I wrote a simple program in Turbo Pascal to clear the screen and then set every pixel on it (16,000 in all), which took 53 seconds to execute. That corresponds to a fill rate of about 300 pixels/sec, or 0.02 frames/sec if you were to update the entire screen that way, which is pretty much unusable for games. Using setLine() to fill the screen instead of setPixel(), I reduced the fill time to a mere 7 seconds, giving me a fill rate of 2,285 pixels/sec, or 0.14 frames/sec. My last option is ‘video mode’, which can be enabled with an escape sequence; blocks of 8 pixels can then be set by writing a 2-byte sequence to the terminal. This doubles my performance vs using setLine() to fill the screen, getting my fill rate up to 4,570 pixels/sec, or nearly 0.3 frames/sec! If I want to do something more interesting with the video, clearly I’m going to have to get more intimate with the actual hardware.
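
For anyone who wants to check the arithmetic, the rates fall straight out of the measured fill times. This is a quick Python sketch; the 53 and 7 second timings are the measurements above, while the 3.5 seconds for video mode is not a measurement but my back-calculation from "twice as fast as setLine()".

```python
SCREEN_PIXELS = 160 * 100  # 16,000 pixels per full screen


def fill_stats(seconds_per_screen):
    """Return (pixels per second, frames per second) for one full-screen fill."""
    pixels_per_second = SCREEN_PIXELS / seconds_per_screen
    frames_per_second = 1 / seconds_per_screen
    return pixels_per_second, frames_per_second


timings = [
    ("setPixel()", 53),
    ("setLine()", 7),
    ("video mode", 3.5),  # derived from the 2x claim, not timed directly
]

for name, seconds in timings:
    pixels, frames = fill_stats(seconds)
    print(f"{name:12s} {pixels:8.0f} pixels/sec {frames:5.2f} frames/sec")
```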

With the Kaypro viewed firmly as a CP/M machine, I’ve found surprisingly little software written ‘to the metal’ that bypasses the BIOS routines for graphics, but it’s possible to just dig into the SY6545A-1 datasheet once you understand how things are laid out. So what can we do with a little effort?

Down to the Metal

Datasheet in hand, I wrote some Turbo Pascal routines to see how quickly I could just fill video RAM with a given character. A nice feature of the SY6545A-1 is that the VRAM pointer automatically increments when you access it (to eliminate the need to update it for sequential accesses). If we’re just filling a single character, we don’t need to do a read-modify-write, so we can (in theory) just reset the pointer and write one character per blanking interval. To update 2000 characters, we would need, at minimum, 2000 blanking intervals. Surprisingly, the SY6545A-1 updates the screen at 50 Hz (not 60 Hz, as one might expect for a ‘regular’ CRT in North America), and with 25 character rows * 16 lines/character, that gives us 20,000 horizontal blanking periods + 50 vertical blanking periods per second, so we should be able to hit at least 10 frames/second this way. Turbo Pascal itself starts to be the bottleneck if we go with a ‘safe’ implementation, but we can indeed hit close to 10 frames/second this way.
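
The 10 frames/second ceiling is just a budget of blanking intervals. Here it is as a few lines of Python, assuming (as above) exactly one byte written per blanking interval and nothing else slowing things down; the constant and function names are mine.

```python
REFRESH_HZ = 50
CHAR_ROWS = 25
LINES_PER_CHAR = 16
CELLS = 80 * 25  # bytes of character RAM shown on screen


def blanking_intervals_per_second():
    """Horizontal blanking after every scan line, plus one vertical blank per frame."""
    horizontal = REFRESH_HZ * CHAR_ROWS * LINES_PER_CHAR
    vertical = REFRESH_HZ
    return horizontal + vertical


def max_fill_fps(cells=CELLS):
    """Upper bound on full-screen fills per second at one byte per interval."""
    return blanking_intervals_per_second() / cells


print(blanking_intervals_per_second(), max_fill_fps())
```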

Turbo Pascal has a nice facility for including in-line machine code (although there is no assembler, so you need to assemble it yourself) – I used an online Z80 assembler/simulator to code a tight machine-code loop that does my character-fill routine, and it is indeed much faster. A modified version is also able to copy an array in memory to the VRAM at a blistering ~25 frames/second – half the refresh rate of the display!

Now that I was confident that the machine wasn’t utterly useless when it came to graphics, I started to think about what else I might be able to coax the SY6545A-1 into doing. The datasheet shows you that you have quite a bit of control over the display, although we can’t change our 18MHz pixel clock without breaking out a soldering iron. One of the more annoying things about the lackluster video performance is that you can’t avoid the ‘tearing’ effect that occurs if you’re updating the video RAM as the screen itself updates. Pushing the limits of 8-bit video capabilities was often about ‘racing the beam’ – trying to generate your image before the electron beam lights up the pixels you’re working on. You get lots of unpleasant glitching as the screen is updated, and you frequently see a left-to-right, top-to-bottom ‘fill’ effect as the screen updates.

Unfortunately, the frame rates above are a bit optimistic: they only cover a block update of the character RAM (not the attribute RAM), so they only hold if we aren’t modifying the lower-left pixel in each block. Updating the attribute RAM as well would roughly cut the performance in half again. Is there any way around this?

Reformatting!

One way of addressing the performance bottleneck might be to explore some alternative video modes. What different modes might be available to use? We can’t change our pixel clock, and we can’t change our VRAM amount, but we *can* ‘reformat’ our display if we’re willing to throw out some information. To really make use of bitmapped graphics, avoiding the double read-modify-write imposed by the Kaypro scheme is essential. Conveniently, the SY6545A-1 lets you choose how many ‘lines per character’ it draws – in our case 16, which matches our Font ROM. We treat each character as a 2×4 block of pixels, which means each pair of pixels gets mapped to 4 lines, with the physically lowest 4 lines (lines 12-15 of each character) mapped to the upper 2 bits of each VRAM byte. Using those bits (the physically lowest 2 pixels on the screen) requires keeping the ‘inverse video’ bit in our attribute RAM in sync with our character RAM – so what happens if we just tell our SY6545A-1 that our font is only 12 lines per character? Voila! Each character is now a 2×3 block of pixels rather than 2×4 – our screen is now 160×75 pixels – *but* we can update all six pixels with only a single write. Even better, we can now update the entire display buffer using the auto-incrementing pointer feature, rather than having to bounce back and forth between character and attribute RAM. We still can’t avoid the tearing effect when we update, but we have much better performance if we’re really trying to push pixels on this thing.
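
Here is what the single-write encoding looks like as a Python sketch. It reuses the top three rows of bit weights from the pixel layout diagram and keeps bit 7 permanently set so the byte always selects a graphics character; since the rows that bits 6 and 7 would light are no longer drawn, their values should not matter. The names are made up for illustration.

```python
# Top three rows of the layout diagram; the fourth row (bits 6 and 7) is never drawn.
WEIGHTS_2X3 = [
    [2, 1],
    [8, 4],
    [32, 16],
]
GRAPHICS_BASE = 0x80  # keeps the byte in the upper (graphics) half of the font


def pack_cell(pixels):
    """pixels: 3 rows of 2 truthy/falsy values -> one character byte."""
    value = GRAPHICS_BASE
    for row in range(3):
        for col in range(2):
            if pixels[row][col]:
                value |= WEIGHTS_2X3[row][col]
    return value


print(hex(pack_cell([[1, 0], [0, 0], [0, 1]])))
```

No attribute byte appears anywhere in this version, which is the whole win: one pixel block, one byte, one write.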

What else could we do? The SY6545A-1 also lets you re-organize how many rows of characters and how many columns per row. Our Kaypro nominally has an 80×25 character display, but anything is fair game as long as it doesn’t require more video RAM or a different pixel clock. When we moved to our 6-pixel format in the previous paragraph, we also ‘shortened’ the drawn image on the display by 25% (since each character was now only 75% as many lines as it had been). The 80×25 format actually wastes 48 bytes of VRAM (since our VRAM is 2048 bytes and 80×25=2000). Our CRT can draw up to 25*16=400 lines, so we can safely reorganize our VRAM into 32 rows (32 x 12 lines = 384 lines total) of 64 columns, fully utilizing our 2048 bytes of VRAM and transforming our 160×75 pixel display into a more pleasant 128×96 display, which also happens to more nicely match the 128×64 display on an Arduboy.
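
A quick way to sanity-check candidate screen formats is to compute the VRAM and scan-line budget for each. This Python sketch encodes the limits described above (2048 bytes of VRAM, 400 scan lines); the 80-column cap is my own stand-in for "don't change the pixel clock", and the function name is hypothetical.

```python
VRAM_BYTES = 2048
MAX_SCAN_LINES = 25 * 16  # what the stock 80x25 format already draws
MAX_COLS = 80             # assumption: stand-in for the fixed pixel clock


def describe(cols, rows, lines_per_char):
    """Summarise a character format; each pixel row is 4 scan lines tall."""
    cells = cols * rows
    scan_lines = rows * lines_per_char
    return {
        "cells": cells,
        "scan_lines": scan_lines,
        "fits": cells <= VRAM_BYTES
        and scan_lines <= MAX_SCAN_LINES
        and cols <= MAX_COLS,
        "resolution": (cols * 2, rows * (lines_per_char // 4)),
    }


for fmt in [(80, 25, 16), (80, 25, 12), (64, 32, 12)]:
    print(fmt, describe(*fmt))
```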

So where are we at? Our original 160×100 pixel, very slow display is now reconfigured into a much faster 128×96 pixels, which also has more convenient striding for manipulating data. We lost 1/4 of our resolution, and gained some of it back with our reorganized screen, but we unfortunately still have the tearing problem due to slow-ish screen updates.

Double the Buffering = Double the Fun!

Is there some magical way we could double-buffer, so that we get crisp screen transitions? There is! Just throw out more data. If we tell the SY6545A-1 that we’re happy with only 16 rows of 64 characters, we now get a probably-still-usable 128×48 pixel display, and half of our VRAM is free *and not displayed.* This means we can update it at our leisure, and then wait for a vertical blanking period to change the ‘start of frame’ pointer. As a ‘bonus’, updating the video display now just means doing a block copy of 1KB of data, which takes half as long.
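
The page-flipping idea is easy to model away from the hardware. In this Python sketch the 2KB of VRAM is a bytearray and `display_start` stands in for the ‘start of frame’ pointer; the class and method names are invented, and the wait for vertical blanking is only a comment, since there is no beam to race here.

```python
PAGE_BYTES = 64 * 16  # one 128x48 frame = 1KB


class DoubleBufferedVram:
    """Two 1KB pages in 2KB of VRAM; only one is displayed at a time."""

    def __init__(self):
        self.vram = bytearray(2 * PAGE_BYTES)
        self.display_start = 0  # stands in for the 'start of frame' pointer

    @property
    def back_start(self):
        """Start of the hidden page (whichever one is not being displayed)."""
        return self.display_start ^ PAGE_BYTES

    def draw(self, frame):
        """Copy a full frame into the hidden page; the visible page is untouched."""
        if len(frame) != PAGE_BYTES:
            raise ValueError("frame must be exactly one page")
        start = self.back_start
        self.vram[start:start + PAGE_BYTES] = frame

    def flip(self):
        """Show the hidden page. On real hardware, do this during vertical blanking."""
        self.display_start = self.back_start

    def visible(self):
        start = self.display_start
        return bytes(self.vram[start:start + PAGE_BYTES])


buf = DoubleBufferedVram()
buf.draw(bytes([0xBF]) * PAGE_BYTES)  # fill the hidden page
buf.flip()                            # then swap it in all at once
print(buf.display_start)
```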

Scrolling the Kaypro Logo at 50 frames per second!

With my optimized assembly routines, I can now update my display from an array in memory at the CRT’s native rate of 50 frames/sec! This corresponds to a much more pleasing 307,200 pixels/sec – over 1000x faster than filling the screen with the SetPixel() BIOS routine, and a 67x speedup vs using “Video Mode.”
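
The speedup figures can be reproduced from the fill rates quoted earlier (roughly 300 pixels/sec for the BIOS pixel routine and 4,570 pixels/sec for video mode). A short Python check, using those rounded rates as inputs:

```python
NEW_PIXELS_PER_SECOND = 128 * 48 * 50  # full 128x48 frame at 50 frames/sec

# Rounded rates quoted earlier in the article.
BIOS_SET_PIXEL_RATE = 300
VIDEO_MODE_RATE = 4570


def speedup(new_rate, old_rate):
    """How many times faster new_rate is than old_rate."""
    return new_rate / old_rate


print(NEW_PIXELS_PER_SECOND)
print(round(speedup(NEW_PIXELS_PER_SECOND, BIOS_SET_PIXEL_RATE)))
print(round(speedup(NEW_PIXELS_PER_SECOND, VIDEO_MODE_RATE)))
```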

Pixel Fill Mode                           Frames/Second
160×100 SetPixel()                        0.02
160×100 SetLine()                         0.14
160×100 Video Mode                        0.30
160×100 Turbo Pascal Block Copy (est.)    5
160×100 ASM Block Copy (est.)             12.5
128×96 ASM Block Copy                     25
128×48 ASM Block Copy                     50

Kaypro video performance across different VRAM access modes

The Kaypro’s CRT actually uses a relatively slow-decaying phosphor, so at 50Hz, there is a fair amount of ghosting between frames that one might be able to utilize for temporal dithering. This definitely feels like it opens up some interesting possibilities for the Kaypro that I hadn’t really thought possible. The TurboROM in the Kaypro natively supports a 2MB RAM Disk (which I was actually able to test with an FPGA, although my setup was a bit flaky). Doing full motion video is probably possible: 2MB would naively give me ~41 seconds of 50 fps video with 1KB frames, which would have been an incredible demo to advertise a Kaypro with in 1984!
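
The naive video-length estimate is just storage divided by frame size and frame rate. As a Python one-liner with names of my own choosing, and ignoring any file system overhead or the time needed to pull frames off the RAM disk:

```python
RAM_DISK_BYTES = 2 * 1024 * 1024
FRAME_BYTES = 64 * 16  # one 128x48 frame
FRAMES_PER_SECOND = 50


def video_seconds(storage=RAM_DISK_BYTES, frame=FRAME_BYTES, fps=FRAMES_PER_SECOND):
    """Seconds of uncompressed video that fit, counting whole frames only."""
    return (storage // frame) / fps


print(video_seconds())
```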