As part two (see previous attempt) of my ongoing series in ‘computational necromancy,’ I’ve spent the last year and a half or so constructing my own 1/10-scale, binary-compatible, cycle-accurate Cray-1. This project falls purely into the “because I can!” category – I was poking around the internet one day looking for a Cray emulator and came up dry, so I decided to do something about it. Luckily, the Cray-1 hardware reference manual turned out to be useful enough that implementing most of this was pretty straightforward. The Cray-1 is one of those iconic machines that just makes you say “Now that’s a supercomputer!” Sure, your iPhone is 10X faster, and owning a Cray-1 is completely useless, but admit it... you really want one, don’t you?
The Cray-1A Architecture
Now, let’s get down to specs – What is this bad boy running? The original machine ran at a blistering 80 MHz, and could use from 256-4096 kilowords (32 megabytes!) of memory. It has 12 independent, fully-pipelined execution units, and with the help of clever programming, can peak at 3 floating-point operations per cycle. Here’s a diagram of the overall architecture:
It’s a fairly RISC-y design, with 8 64-bit scalar (S) registers, 8 vector (V) registers of 64 64-bit words each, and 8 24-bit address (A) registers. Rather than a traditional cache, it uses a ‘software-managed’ cache made up of an additional 64 64-bit words (T registers) and 64 24-bit words (B registers). There are instructions to transfer data between memory and registers, and then register-to-register ‘compute’ instructions.
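To make the register layout concrete, here is a minimal Python sketch of the register sets described above. The class and method names (`CrayRegisters`, `write_s`, `spill_s_to_t` and so on) are made up for illustration and have nothing to do with the actual RTL; only the register counts and widths come from the description.

```python
from dataclasses import dataclass, field

MASK64 = (1 << 64) - 1  # S, T and V elements are 64 bits wide
MASK24 = (1 << 24) - 1  # A and B registers are 24 bits wide


def _zeros(count):
    """Helper: dataclass field holding `count` zero-initialised registers."""
    return field(default_factory=lambda: [0] * count)


@dataclass
class CrayRegisters:
    """Hypothetical model of the register sets (names are illustrative)."""
    A: list = _zeros(8)    # 8 x 24-bit address registers
    S: list = _zeros(8)    # 8 x 64-bit scalar registers
    B: list = _zeros(64)   # 64 x 24-bit backing store for A
    T: list = _zeros(64)   # 64 x 64-bit backing store for S
    V: list = field(default_factory=lambda: [[0] * 64 for _ in range(8)])

    def write_a(self, i, value):
        self.A[i] = value & MASK24  # truncate to 24 bits

    def write_s(self, i, value):
        self.S[i] = value & MASK64  # truncate to 64 bits

    def spill_s_to_t(self, t_index, s_index):
        """'Software-managed cache': park a scalar in a T register."""
        self.T[t_index] = self.S[s_index]

    def fill_s_from_t(self, s_index, t_index):
        """Bring a parked scalar back from a T register."""
        self.S[s_index] = self.T[t_index]
```

The point of the B and T registers is that the program, not the hardware, decides what stays close to the functional units, which is why the spill and fill steps appear as explicit operations here.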
One of the coolest aspects of this machine is that everything is fully pipelined. This machine was designed to be fast, so if you’re careful, you can actually get one (or more) instruction every cycle. This has some interesting implications – there’s no ‘divide’ instruction, for instance, because it can take a variable amount of time to finish. To perform a divide, you need to first compute the ‘reciprocal approximation’ (something we *can* do in exactly 13 cycles, it turns out) of the denominator value, and then perform a separate multiply of that result with the numerator.
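The divide-by-reciprocal idea can be sketched in a few lines of Python. This is only an illustration: `recip_approx` is a hypothetical stand-in that fakes a limited-precision hardware approximation (the 30-bit precision is an assumption), and the single Newton-Raphson refinement step is a standard way to sharpen such an estimate, not a cycle-exact model of what the Cray does.

```python
import math


def recip_approx(d, bits=30):
    """Stand-in for a hardware reciprocal approximation.

    Returns 1/d with the mantissa truncated to roughly `bits` bits.
    The precision is an assumption made for this sketch.
    """
    mantissa, exponent = math.frexp(1.0 / d)
    scale = 1 << bits
    truncated = math.floor(mantissa * scale) / scale
    return math.ldexp(truncated, exponent)


def divide(numerator, denominator):
    """Divide without a divide instruction: approximate, refine, multiply."""
    x = recip_approx(denominator)          # coarse estimate of 1/d
    x = x * (2.0 - denominator * x)        # one Newton-Raphson refinement
    return numerator * x                   # separate multiply by numerator


if __name__ == "__main__":
    print(divide(10.0, 4.0))
    print(divide(1.0, 3.0))
```

Every step here is a fixed-latency multiply or subtract, which is the property that lets the whole sequence stay in the pipeline.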
The vector instructions are particularly cool. A vector Add operation might take only 5 cycles to start producing results (remember, each vector can hold 64 values, so it takes 5 + 64 cycles to finish adding). Why wait for it to finish though? We can take the result output from the adder, and “chain” it straight into another vector unit (say a multiplier). And *that* only takes another 10 cycles or so, so we can chain that result into yet another unit (say, reciprocal approximation). Now, rather than waiting for the first operation to finish, we’re computing up to 3 floating-point operations per cycle. Clever programmers could sustain about 2 floating-point operations per cycle, which at 80 MHz works out to 160 million floating-point operations per second.
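A back-of-the-envelope model shows why chaining matters. The start-up latencies below are the rough figures quoted in this post (5 cycles for the add, about 10 for the multiply, 13 for the reciprocal approximation), and the model is deliberately simplistic: it ignores memory loads, instruction issue and the exact timing rules for catching a chain, so treat the output as an estimate only.

```python
CLOCK_MHZ = 80          # original Cray-1 clock rate
VECTOR_LENGTH = 64      # elements per vector register

# Approximate start-up latencies in cycles, taken from the text above.
STARTUP = {"add": 5, "multiply": 10, "reciprocal": 13}


def unchained_cycles(units, n=VECTOR_LENGTH):
    """Each operation waits for the previous one to finish completely."""
    return sum(STARTUP[u] + n for u in units)


def chained_cycles(units, n=VECTOR_LENGTH):
    """Results stream from unit to unit; only the start-ups add up."""
    return sum(STARTUP[u] for u in units) + n


def flops_per_cycle(units, cycles, n=VECTOR_LENGTH):
    """One floating-point result per unit per element."""
    return len(units) * n / cycles


if __name__ == "__main__":
    chain = ["add", "multiply", "reciprocal"]
    for label, fn in (("unchained", unchained_cycles),
                      ("chained", chained_cycles)):
        cycles = fn(chain)
        rate = flops_per_cycle(chain, cycles)
        print(f"{label}: {cycles} cycles, {rate:.2f} flops/cycle, "
              f"{rate * CLOCK_MHZ:.0f} MFLOPS")
```

Under these assumptions the three-operation chain takes 92 cycles instead of 220, which lands close to the "about 2 floating-point operations per cycle" figure once the start-up overhead is paid.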
The Hardware
The actual design was implemented on a Xilinx Spartan-3E 1600 development board. This is basically the biggest FPGA you can buy that doesn’t cost thousands of dollars for a devkit. The Cray occupies about 75% of the logic resources, and all of the block RAM.
This gives us a spiffy Cray-1A running at about 33 MHz, with about 4 kilowords of RAM. The only features currently missing are:
-Interrupts
-Exchange Packages (this is how the Cray does ‘context-switching’ – it was intended as a batch-processing machine)
-I/O Channels (I just memory-mapped the UART I added to it).
If I ever find some software for this thing (or just get bored), I’ll probably go ahead and add the missing features. For now, though, everything else works sufficiently well to execute small test programs and such.
The Software
When I started building this, I thought “Oh, I’ll just swing by the ol’ Internet and find some groovy ’70s-era software to run on it.” It turns out I was wrong. One of the sad things about pre-internet machines (especially ones that were primarily purchased by 3-letter Government agencies) is that practically no software exists for them.
***** If anyone has any Cray-1 software, please contact me!! If you work at one of the National Labs, please take a look! *****
After searching the internet exhaustively, I contacted the Computer History Museum and they didn’t have any either. They also informed me that apparently SGI destroyed Cray’s old software archives before spinning them off again in the late ’90s. I filed a couple of FOIA requests with scary government agencies that also came up dry. I wound up e-mailing back and forth with a bunch of former Cray employees and also came up *mostly* dry. My current best hope is a guy I was able to track down who happens to own an 80 MB ‘disk pack’ from a Cray-1 Maintenance Control Unit (the Cray-1 was so complicated, it required a dedicated mini-computer just to boot it!), although it remains to be seen if I’ll actually get a chance to try to recover it.
Without a real software stack (compilers, operating systems, etc.), the machine isn’t terribly useful (not that it would be all that useful if I did have software for it). All of the opcodes and registers for the Cray-1 are described in base 8 (octal), so I did at least write a little script to translate octal machine code into the hexadecimal format that Xilinx’s tools require. All of my programming so far has been in straight octal machine code, assembled in my head. I have started work on re-writing the CAL Assembler, but that may take a while, as it employs some tricky parsing that I’m having to teach myself.
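For a flavour of what such an octal-to-hex translation involves, here is a minimal Python sketch. It is not the script from this project: the function name is made up, the output is simply one 64-bit hex word per line rather than any particular Xilinx file format, and it assumes 16-bit instruction parcels packed four to a word with the first parcel in the most significant position and zero padding at the end.

```python
PARCEL_BITS = 16        # assumed instruction parcel width
PARCELS_PER_WORD = 4    # four 16-bit parcels per 64-bit word


def parcels_to_hex_words(octal_parcels):
    """Pack octal parcel strings into 64-bit words, returned as hex strings.

    Hypothetical helper: the first parcel of each group lands in the
    most significant 16 bits, and a short final group is zero-padded.
    """
    parcels = [int(text, 8) for text in octal_parcels]
    for value in parcels:
        if value >= (1 << PARCEL_BITS):
            raise ValueError(f"parcel {value:o} does not fit in 16 bits")

    # Pad with zero parcels so the program fills whole words.
    while len(parcels) % PARCELS_PER_WORD:
        parcels.append(0)

    words = []
    for start in range(0, len(parcels), PARCELS_PER_WORD):
        word = 0
        for value in parcels[start:start + PARCELS_PER_WORD]:
            word = (word << PARCEL_BITS) | value
        words.append(f"{word:016X}")
    return words


if __name__ == "__main__":
    # Arbitrary parcel values, not a meaningful Cray-1 program.
    for line in parcels_to_hex_words(["000001", "000002", "000003",
                                      "000004", "177777"]):
        print(line)
```

Octal suits the instruction set because the register fields are 3 bits wide, so each octal digit lines up with a field, while the FPGA tools generally expect hex, hence the conversion step.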
Makin’ it look pretty
What’s the point of owning a Cray-1 if it doesn’t *look* like a Cray-1?? Unfortunately, the square-shaped FPGA board isn’t conducive to actually making it the traditional “C” shape, but I think it turned out pretty cool anyway. My friend Pat was nice enough to let me use his CNC milling machine to cut out the base pieces (and help with assembly). It’s a combination of MDF, balsa wood and pine. There was also a healthy dose of blood, sweat and tears (and Gorilla Glue) involved.
Some random photos from the build process:
The base after being glued together
The top C-section being glued together
The pieces before painting
The wife assisting with some detail work
The pleather couch-seat really ties the whole thing together
Finally, Computer Engineer Barbie has an appropriate place to sit down!
This is awesome! How can I build my own?
This is very much a work in progress, but if you’d like to join in the fun, feel free! All you need is a copy of the RTL (almost all Verilog-2001) and a Spartan-3E 1600 or equivalent FPGA board. The code is likely riddled with bugs and questionable implementation choices at this point, so on the off-chance anyone actually downloads this, feel free to lend a hand and send me any bug fixes you might make!
*UPDATE*
I finally had some more time to work on this. The updated code includes faster implementations of the multiplier units (it runs at up to 50 MHz on my Spartan-3E!), as well as support for context-switching (“Exchange Packages”). There is still no support for I/O channels, but the 8-bit memory-mapped UART was replaced with a full 64-bit UART.
As well as improved hardware, this release also includes a lot of progress on the software front. It includes a more-or-less complete implementation of the CAL assembler, re-written in Python, as well as a utility for generating Xilinx-friendly memory initialization files. Writing in CAL is *way* easier than writing in octal machine code. Code is also included for a simple BASIC-like language I’ve started playing around with (basically useless, as it doesn’t output valid Cray-1 assembly yet, but possibly interesting to look at).
Finally, an archaic DOS-compatible Cray X-MP simulator surfaced! It’s a single-processor simulator (the X-MP models were essentially 2-4 processor Cray-1s), so it is effectively just a Cray-1 sim, but it does work pretty well if you just want something to play with.
Get it here! Cray1_r2.zip
**UPDATE**
The project is now hosted on Google Code: The Cray-1X Project

Hi Chris,
I found another bug in the cray code!
In float_recip.v, cur_j is never assigned.
I haven’t studied the code very carefully, but I suppose that the
unit also needs the cip as input in order to figure out its operands.
Jonas
Awesome build! It’s inspired me to get off my lazy butt and finish some projects
That s**t Cray
i scrolled to the comments immediately just to see if this comment was here.
Epic project is EPIC! Well done, absolutely brilliant.
CDC and Honeywell used disk drives made by their joint venture Magnetic Peripherals, Inc. If Cray used drives from the same source, you may be able to read the disk pack with an easier-to-find Honeywell minicomputer drive. We had 20 surface packs that looked like the one in your photo.
I don’t know if this would be any help, but
“The Last Starfighter (1984)
The first movie to do all special effects (except makeup and explosions) on a computer. All shots of spacecraft, space, etc were generated on a Cray X-MP computer.”
With an IMDB pro membership, someone can get the contact information for the special effects and optical effects companies.
They might know where code is.
Still love this build,
Good luck.
Have you tried getting in touch with the British History Museum in London? They have a Cray in their computing section. Whether they also have manuals or software is unknown.
Quite possibly the coolest hack ever. For an encore performance, consider immersing the FPGA board in Fluorinert coolant and overclocking that bïtch!
Hi Chris. Very nice project. Don’t know whether this may help you, but you could also try the people (and machines) at http://www.cray-cyber.org, and maybe at rus.uni-stuttgart.de (Computing/Computer Science? Center of the University of Stuttgart, Rechenzentrum der Universität Stuttgart) who (if I remember right) were owners of a Cray-1 and/or Cray-2 back in my student days. Kind regards, Jörg