Cloning NES Tetris In x86 16 Bit Assembly Part 1

Well, the title says it all. Today I’ve something really special for you: a Tetris clone in x86 assembly! Okay, ‘clone’ might be a bit exaggerated, but we will get close to the original. Anyway, I’m certain you’re aware we’re going to work at a very low level, so this is surely not for the faint-hearted. Nowadays we’re pretty spoiled with all those high-level programming languages & compilers doing the dirty work for us. But just imagine: almost any console game from the 80s to the mid-90s was coded in the assembly language of its specific hardware.

Don’t be afraid though – we won’t get too deep. As indicated, this is not meant to be an exact replica of the game nor some sort of emulator. We’re just going to replicate the look, the music and the gameplay, to keep things simpler.

Prerequisites:

  • At least some basic knowledge of assembly. A mid-level knowledge of (x86) assembly and DOS would be even better.
  • Borland’s Turbo Assembler & Linker
  • An actual computer running MS-DOS, equipped with a VGA & Adlib card, or a modern PC running the DOSBox emulator. If you decide on the latter, there’s a pretty comfortable solution available: Visual Studio Code using the Masm-Tasm extension. Give it a try – it’s awesome!

As I just mentioned the look, let’s have a look at what the actual game looks like:

tetrisScreenshot

Isn’t it beautiful? It might not be too obvious in this case but – again – like almost any game of its era it’s tile-based. That means that, for example, the background composed of those individual grey tetromino blocks isn’t just a single plain bitmap stored in the ROM – no! It’s made up of repeating smaller tiles, which are 8 × 8 pixels in this case. The reason for this is simple: memory usage. Yeah, back then CPUs were slow and memory was limited. So hard- & software developers alike had to come up with ways to make things more efficient. As we want to feel the pain of those developers, we’re going to remake it using tiles too.

So the first step is getting the actual tiles. The easiest solution is running the game in the famous FCEUX NES emulator and using its PPU viewer. PPU is short for Picture Processing Unit, the NES’s piece of hardware responsible for graphics.
If we do this, the PPU viewer will present us with something like this:

tetrisPPUViewer

Don’t be irritated by the colours. The NES handles colours based on a colour palette. If you right-click on either the left or the right view, you can cycle through the game’s available colour palettes. In this case it’s already correct, though. We’re just using the background tiles in the middle of the left view, plus the chars and three blocks from the right view.
You can see those highlighted here:

tetrisPPU2

Now we need to grab the needed tiles and arrange them side-by-side as a single 600 × 8 pixel image.

nesTetrisTiles

That wasn’t too hard, was it? The next thing we need to take care of is the display resolution of the NES. Depending on the region it’s either 256 × 224 or 256 × 240. The famous DOS VGA mode 13h we’re going to use has a resolution of 320 × 200 though. Either way, we have more vertical pixels than fit into 200. So we need to crop & pan the in-game screen like this:

tetrisCropPane

But – hmhm – wait – this is a complete bitmap again! How do we turn it into an individual tile composition? Right! We need a tile editor and do it by hand. The most powerful utility you can find out there is called Tiled.
Though placing the actual tiles is a bit tedious the general process is quite easy:

  • Click File -> New
  • Map Size (Width: 40, Height: 25)
  • Tile Size (Width: 8px, Height: 8px)
  • Click OK
  • In the Layers panel, add an Object Layer
  • In the Object Layer’s properties panel add the previous image
  • In the Tilesets panel add the 600 × 8 tiles strip above
  • start placing tiles!

If everything went well – and you didn’t go insane – the finished project should look like this:

nesTetrisTiled1

At this point we might have been satisfied but I wasn’t. The empty black area to the left and right hurt my eyes so I decided to fill it up with some more tetrominos – free style.

nesTetrisTiled2

You might have noticed that it’s missing the numbers and the complete statistics section. The reason for the latter is simple: we’re not going to replicate this part of the game. The numbers – score, level, lines – however will be dynamically added during the game and don’t need to be part of the ‘base’ game screen.

One final step is missing: we need to export the actual map data. To do this:

  • go to File -> Export as…
  • give it a filename and choose CSV files for the filetype

Now we have a nifty little file containing nothing but numbers and commas.

Those numbers will later be used to directly reference an individual tile from the tile strip – which in turn is just another very long (600 x 8 = 4800) sequence of bytes stored in the Data section of the assembly source file.

As you’re looking at all those numbers that should go into the Data section, you might be asking yourself: is it really worth the hassle? Can’t we just store the background as a byte sequence itself? Yes, we could, but as I said, tiles are a way to save memory. Well, let’s check whether we actually save any.
The screen dimensions are 320 x 200 and each pixel would need a single byte.
320 x 200 = 64000 bytes
The pixel data of the tiles is 600 x 8 = 4800 bytes and the map itself, at one byte per tile, is 40 x 25 = 1000 bytes.
So 4800 + 1000 = 5800 bytes
64000 – 5800 = 58200
Cool! We indeed saved 58200 bytes!

We’ve almost reached the point where we can actually start coding! Unfortunately we have to look into something extremely important first.
If you remember, I said we’re going to use VGA mode 13h. In the 90s this was a popular choice for DOS games, both for compatibility reasons and because it lets you access the video RAM directly and – even more importantly – in a linear way: one byte per pixel, one pixel after the other. This wasn’t the case with most other video modes.

The video RAM starts at memory location A000:0000 and goes up to A000:F9FF. This gives a range of FA00 bytes which is 64000 in decimal. This number is no coincidence of course. If you remember the display resolution – 320 × 200 – and actually multiply those two numbers, what do you think do we get? Yes! It’s 64000!
So every offset inside the FA00 range directly maps to an on-screen pixel. Now I bet you’re used to referencing a single pixel by its horizontal and vertical screen position.

screenPixel

As we’re at a very low level with assembly, there is no such thing as an x and y position. We’re going from left to right and line by line all the way from the top to the bottom.
So the pixels from x:0 y:0 to x:319 y:0 map to the memory region A000:0000 to A000:013F (319 is 13Fh).
If we want to get the memory location of the pixel in the illustration above, we need to add its y position multiplied by 320 to x.

x + y * 320 == 90 + 60 * 320 == 19290 == A000:4B5A

Keep this in mind – we’ll need to use it a lot.
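To make the formula concrete, here is a minimal sketch of a helper that computes the offset. The procedure name pixelOffset and the register assignment (x in BX, y in AX, result in DI) are my own assumptions for illustration; it is meant to be pasted into a TASM program's .CODE section.

```asm
; Hypothetical helper: DI = y * 320 + x
; In:  BX = x (0..319), AX = y (0..199)
; Out: DI = offset of the pixel inside the A000h segment
; Clobbers AX and DX.
pixelOffset PROC
    MOV DX,320
    MUL DX              ; DX:AX = y * 320 (at most 63680, so it fits into AX)
    ADD AX,BX           ; add the x position
    MOV DI,AX
    RET
pixelOffset ENDP
```

With BX = 90 and AX = 60 this leaves 4B5Ah in DI, the offset from the calculation above.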

The other important thing to talk about is the way we actually need to give a pixel a colour. If you’re familiar with newer high-level programming / scripting languages or maybe even your favourite painting application, you’re used to directly giving a single pixel a red, green, blue and perhaps an alpha value. For example 0xff0000 for a red colour. VGA mode 13h offers 262144 different colours but those can’t be on-screen at the same time. Instead it’s organized into a palette of 256 colours.
There’s a way to alter the colours of course but the standard palette looks like this:

mode13hPalette

This gives us colours ranging from 0 to 255 and those are the actual numbers we need to write to the video RAM. Wanna have a yellow pixel at x:0 y:0? Look it up in the palette above: yellow is near the end of the first row, so its value is 14 and we need to write 0eh to A000:0000.
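As a quick experiment, the yellow pixel can be plotted with a tiny stand-alone program. This is only a sketch under a few assumptions of mine: the small memory model, a 256-byte stack, the label Start, and a key press (BIOS interrupt 16h) before returning to text mode.

```asm
.MODEL SMALL
.STACK 100h

.CODE
Start:
    MOV AX,0013h            ; BIOS: set video mode 13h (320 x 200, 256 colours)
    INT 10h

    MOV AX,0A000h           ; ES = segment of the video RAM
    MOV ES,AX
    XOR DI,DI               ; offset 0 = pixel at x:0 y:0
    MOV BYTE PTR ES:[DI],0Eh ; palette index 14 = yellow

    XOR AH,AH               ; BIOS: wait for a key press
    INT 16h

    MOV AX,0003h            ; BIOS: back to 80 x 25 text mode
    INT 10h
    MOV AX,4C00h            ; DOS: terminate program
    INT 21h
END Start
```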

As a side note: the individual colours inside this palette are of course also based on red, green and blue values – they just don’t range from 0h to ffh – it’s 0h to 3fh, which gives 40h (64) levels per channel. If you calculate 40h^3, you get the magic number of 262144.
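For the curious, altering a palette entry comes down to a few port writes to the VGA DAC: the colour index goes to port 3C8h, followed by the red, green and blue levels to port 3C9h. The fragment below is a sketch, not something this article's game needs; the choice of entry 14 and of pure red is arbitrary, and it assumes mode 13h is already active.

```asm
    ; Redefine palette entry 14 as pure red (each channel is 0..3Fh)
    MOV DX,03C8h        ; DAC write index port
    MOV AL,14           ; palette entry to change
    OUT DX,AL
    INC DX              ; DX = 03C9h, DAC data port
    MOV AL,3Fh          ; red   = maximum
    OUT DX,AL
    XOR AL,AL
    OUT DX,AL           ; green = 0
    OUT DX,AL           ; blue  = 0
```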

There’s another reason I’m mentioning this. If you remember, the tile strip we created a bit earlier is actually a PNG file. Now we could of course convert this to a GIF image with 256 colours or fewer and write an assembly GIF loader to get our tiles, but that’s not what we’re going to do, for two reasons:

  • the GIF’s colour palette won’t match the standard VGA palette so we would have to modify either the GIF or the colour palette
  • we’re not going to write a GIF loader – we directly embed the pixel data that makes up the tile strip into the source code

If we go back and take a look at the tile strip, we can see there aren’t too many different colours – it’s just six. Luckily we have exact or nearly matching equivalents in our standard VGA palette.

tetrisColours

The first value in parentheses is the RGB value while the second corresponds to the colour in the palette. Don’t be irritated by the 0x prefix and the h suffix. Both are ways to mark a hexadecimal number and mean the same thing. It’s just that in the colour world the 0x notation is more popular while the other is used in almost any assembler – that’s why I use both.

There’s a bigger problem though. How do we get the palette values for each pixel of the map? It’s 600 x 8 pixels! Well, we could sit down and indeed write down all the values by hand – but even for my taste this is a bit too much. We need some kind of alien technology – JavaScript to the rescue!

Yes, I wrote a little JavaScript tool that loads the PNG version of our tile map, checks each pixel and writes the corresponding colour table value to the console. A looong array of numbers.

Here’s the tool for you to check out:

That’s it for the dirty work – we can finally start coding!
The bare-bones assembly source file we’re about to fill looks like this initially.
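The original listing isn't reproduced here, so the following is only a minimal sketch of what such a skeleton could look like. The names VIDEOMEMORY, bitmapOffset, vramOffset, map, videoBuffer and switchVGA come from this article; the small memory model, the stack size, the Start label, the one-byte map placeholder and the key-press-then-exit ending are assumptions of mine so that the sketch assembles and runs on its own.

```asm
.MODEL SMALL
.STACK 100h
LOCALS @@                       ; enable @@-prefixed local labels (TASM)

VIDEOMEMORY EQU 0a000h          ; segment of the mode 13h video RAM

.DATA?
bitmapOffset DW ?               ; offset of the current tile's pixel data
vramOffset   DW ?               ; target offset of the current tile on screen

.DATA
map          DB 0               ; placeholder; the real map has 40 x 25 entries

.FARDATA?
videoBuffer  DB 64000 DUP (?)   ; offscreen buffer in a segment of its own

.CODE
Start:
    MOV AX,@DATA                ; let DS point to our data
    MOV DS,AX

    CALL switchVGA              ; further CALLs (paintMap, ...) would follow here

    XOR AH,AH                   ; BIOS: wait for a key press
    INT 16h
    MOV AX,0003h                ; BIOS: back to text mode
    INT 10h
    MOV AX,4C00h                ; DOS: terminate program
    INT 21h

switchVGA PROC
    MOV AX,0013h                ; BIOS: set video mode 13h
    INT 10h
    RET
switchVGA ENDP

END Start
```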

As we’re dealing with hex numbers a lot in assembly language, e.g. for referencing memory locations, it’s easy to get lost sooner or later. So it’s best to give yourself a reminder of what a specific value is good for. This is where the EQU directive comes into play. It’s pretty similar to #define in C – it lets you define a constant.
VIDEOMEMORY EQU 0a000h ;assign the value 0a000h to the symbol VIDEOMEMORY

The LOCALS @@ directive gives us the power to define ‘scoped’ labels. Without it, we wouldn’t be able to have a label name e.g. MyLoop: in more than one procedure.
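As an illustration, here is a sketch with two made-up procedures (delayShort and delayLong are hypothetical names, not part of the game) that both use a label called @@MyLoop. With LOCALS @@ in effect, each @@ label is only visible inside its own procedure, so the two don't clash.

```asm
LOCALS @@

delayShort PROC
    MOV CX,10
@@MyLoop:                   ; local to delayShort
    NOP
    LOOP @@MyLoop
    RET
delayShort ENDP

delayLong PROC
    MOV CX,1000
@@MyLoop:                   ; same name, but local to delayLong
    NOP
    LOOP @@MyLoop
    RET
delayLong ENDP
```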

In the .DATA? section we define uninitialized variables – that is, variables that don’t have a value yet.

The .DATA section however lists initialized variables. This data gets written directly to the resulting .EXE file. We’ll use it for e.g. the map and tiles data.

Now there’s the .FARDATA? section. Uninitialized variables go there too, but it’s not the same as the .DATA? section. What?

Time for a nifty side-story.
Programming something for an 8086 – 80286 wasn’t that easy. In real mode, which is what DOS uses, these CPUs form 20-bit addresses. With 20 bits, it’s possible to directly address 1048576 (2^20) bytes of memory, at addresses 0 to 1048575. Until the 80386 arrived, though, the CPU just had 16-bit registers – so a single register can only reach 65536 (2^16) bytes, at offsets 0 to 65535. This led to the invention of memory segmentation. Instead of using a single register to refer to a memory location, two are used: one that holds the segment and another that holds the offset inside the segment. The CPU combines them as segment x 16 + offset.
If you remember as we talked about the video RAM you’ve seen this address:
A000:0000 – A000 is the segment, 0000 the offset.
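To see how a segment and an offset combine into one 20-bit address (segment x 16 + offset), here is a small sketch that does the CPU's calculation by hand for the pixel address A000:4B5A from earlier. The label ShiftLoop and the choice of registers are arbitrary; the fragment is meant for the .CODE section of a TASM program.

```asm
    ; Compute the 20-bit physical address of A000:4B5A into DX:AX
    MOV AX,0A000h       ; segment
    MOV BX,4B5Ah        ; offset
    XOR DX,DX           ; DX collects the bits shifted out of AX
    MOV CX,4            ; multiplying by 16 = shifting left 4 times
ShiftLoop:
    SHL AX,1            ; top bit of AX goes into the carry flag ...
    RCL DX,1            ; ... and from there into DX
    LOOP ShiftLoop      ; now DX:AX = 000A:0000, i.e. 0A0000h
    ADD AX,BX           ; add the offset
    ADC DX,0            ; propagate a possible carry
    ; DX:AX = 000A:4B5A, the physical address 0A4B5Ah
```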

So: .DATA, .FARDATA? and .CODE end up in separate segments. .DATA? is different: it is grouped together with .DATA, so both can be reached through the same segment register value.

Logically, a single segment can store at most 65536 bytes (16-bit offsets, 0h-0ffffh), so things start to get tricky as soon as your data grows beyond that – as is the case with our game! For performance reasons we need to write the on-screen pixels to an offscreen buffer first. As the screen is 320 x 200 pixels, we need 64000 bytes just for the buffer. That’s almost a whole segment, so the buffer can’t share a segment with the map and tile data. The whole purpose of the .FARDATA? section is to reserve those bytes for the videoBuffer in a segment of its own.

…end of nifty side-story.

The last important section is .CODE. Now guess what this is meant for.

As you can see, there are a lot of CALL instructions. Those are equivalent to function calls in any other language, though the thing being called is named a procedure in assembly. We’ll define those procedures in a bit – let’s look at these two lines first:
MOV AX,@DATA
MOV DS,AX

The DS register (Data Segment) should point to the starting address of a DATA segment. As we could have multiple of those, we tell it to use @DATA – a reference to the .DATA & .DATA? segments.

Let’s define our first procedure switchVGA. You surely noticed the descriptive names I always try to give.
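The listing itself isn't reproduced here; a minimal sketch of such a procedure, assuming it does nothing but the mode switch, could look like this:

```asm
switchVGA PROC
    MOV AX,0013h        ; AH = 00h (set video mode), AL = 13h (320 x 200, 256 colours)
    INT 10h             ; call the BIOS video services
    RET
switchVGA ENDP
```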

Nothing magic in there – it just switches from the standard DOS text mode to VGA mode 13h using the BIOS video interrupt 10h.

Time for the first really complicated looking procedure paintMap:
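As the original listing isn't shown here, the following is only a sketch of how such a procedure might be written, not the author's code. The names map, videoBuffer, bitmapOffset, vramOffset, TILEWIDTH, drawTile, ForLoopA and ForLoopB come from this article; the constants MAPWIDTH, MAPHEIGHT, TILEHEIGHT and SCREENWIDTH are assumptions of mine, as is the requirement that drawTile preserves SI and CX. It expects LOCALS @@ to be enabled and DS to point to the data segment.

```asm
TILEWIDTH   EQU 8
TILEHEIGHT  EQU 8
MAPWIDTH    EQU 40              ; tiles per row
MAPHEIGHT   EQU 25              ; tile rows
SCREENWIDTH EQU 320

paintMap PROC
    MOV BX,SEG videoBuffer      ; ES = segment of the offscreen buffer
    MOV ES,BX
    LEA SI,map                  ; SI walks over the tile numbers
    MOV vramOffset,0            ; start at the top-left corner
    MOV CX,MAPHEIGHT            ; outer loop: one pass per tile row
@@ForLoopA:
    PUSH CX
    MOV CX,MAPWIDTH             ; inner loop: one pass per tile in the row
@@ForLoopB:
    MOV AL,BYTE PTR DS:[SI]     ; fetch the tile number
    XOR AH,AH                   ; AX = tile number
    MOV BX,TILEWIDTH
    MUL BX                      ; AX = tile number * 8
    MOV bitmapOffset,AX
    CALL drawTile               ; assumed to preserve SI and CX
    ADD vramOffset,TILEWIDTH    ; next tile: 8 pixels to the right
    INC SI
    LOOP @@ForLoopB
    ; 40 tiles x 8 pixels moved us down exactly one pixel row;
    ; skip the remaining 7 pixel rows to reach the next tile row
    ADD vramOffset,SCREENWIDTH*(TILEHEIGHT-1)
    POP CX
    LOOP @@ForLoopA
    RET
paintMap ENDP
```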

As I indicated in my nifty side-story, the videoBuffer and the .DATA section are inside two different segments. Since we want to move pixel data from .DATA to videoBuffer, we need to store the buffer’s segment address in the ES register (Extra Segment). The DS register still points to the .DATA section.
MOV BX,SEG videoBuffer
MOV ES,BX

The rest of the procedure – though it looks really confusing – is nothing more than the assembly equivalent of a nested for-loop. For convenience I even named the labels ForLoopA and ForLoopB.
Essentially this procedure goes over the bytes stored at map. This is the data output of Tiled.
Let’s make this more concrete. Here’s the second line of the map data:

The fact that it’s the second line matters. As each tile number represents an 8 × 8 pixel screen area and we count rows from zero, the second line has row index 1, so it needs to be put 8 pixels from the top (1 x 8). Likewise, the third line would land at 2 x 8 = 16. Anyway, let’s pick the element at column index 7, again counting from zero – 49. A single tile’s width is equal to its height: 8 pixels. So 7 x 8 = 56.
Cool, the target tile has to go to x=56 and y=8, but which tile? Now the 49 comes into play. If you remember, we arranged the tiles from the original NES Tetris side-by-side as a single 600 x 8 PNG image, which will be stored as a byte sequence inside the code. Since each tile is exactly 8 pixels wide, we can obtain its offset by simply multiplying the tile number by 8 – 392 – which will be stored in the bitmapOffset variable.

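LEA SI,map                  ; SI = address of the first tile number
..
MOV AL,BYTE PTR DS:[SI]     ; fetch the current tile number, e.g. 49
XOR AH,AH                   ; MUL BX multiplies all of AX, so make sure AH is 0
MOV BX,TILEWIDTH
MUL BX                      ; AX = tile number * 8, e.g. 392
MOV bitmapOffset,AX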

This variable, alongside vramOffset which holds the target screen position is then consumed by the drawTile procedure.
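Since the listing is missing here, this is a minimal sketch of what drawTile could look like, not the author's code. It assumes that the tile strip is stored row by row under a label I call tiles (a hypothetical name), that DS points to the data segment and ES to the segment of videoBuffer, and that LOCALS @@ is enabled. The constants are assumptions too.

```asm
TILEWIDTH   EQU 8
TILEHEIGHT  EQU 8
STRIPWIDTH  EQU 600             ; width of the tile strip in pixels
SCREENWIDTH EQU 320

drawTile PROC
    PUSH SI                     ; keep the caller's registers intact
    PUSH DI
    PUSH CX
    PUSH DX
    CLD                         ; MOVSW counts upwards
    LEA SI,tiles                ; DS:SI = first pixel of the wanted tile
    ADD SI,bitmapOffset
    MOV DI,OFFSET videoBuffer   ; ES:DI = target position in the buffer
    ADD DI,vramOffset
    MOV DX,TILEHEIGHT           ; 8 pixel rows per tile
@@Row:
    MOV CX,TILEWIDTH/2          ; 8 pixels = 4 words
    REP MOVSW                   ; copy one pixel row of the tile
    ADD SI,STRIPWIDTH-TILEWIDTH ; same tile, next row of the strip
    ADD DI,SCREENWIDTH-TILEWIDTH ; same x position, next screen line
    DEC DX
    JNZ @@Row
    POP DX
    POP CX
    POP DI
    POP SI
    RET
drawTile ENDP
```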

This procedure doesn’t do anything too magical either. The DS and SI registers point to the location of a tile’s pixel data and ES & DI to the target screen position. So all we need to do is copy the data from one to the other. As we want to do this as efficiently as possible, uhm, at least as efficiently as we can with 16-bit x86 assembly, we don’t do it byte by byte – we’re utilizing the MOVSW instruction, which copies two bytes at a time – 16 bits.

Awesome! Now we have the complete in-game screen stored inside videoBuffer but this won’t display anything yet. Patience young padawan (yes, I know this is not an actual quote from some well-known movie)!

There are two more procedures to look at, first of all waitRetrace:
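The listing isn't reproduced here, so here is a minimal sketch of a common way to write such a procedure, not necessarily the author's. It polls the VGA Input Status #1 register at port 3DAh, whose bit 3 is set during vertical retrace. It assumes LOCALS @@ is enabled; the label names are mine.

```asm
waitRetrace PROC
    MOV DX,03DAh        ; VGA Input Status #1 register
@@WaitEnd:
    IN AL,DX
    TEST AL,08h         ; bit 3 set = retrace in progress
    JNZ @@WaitEnd       ; if we're mid-retrace, let it finish first
@@WaitStart:
    IN AL,DX
    TEST AL,08h
    JZ @@WaitStart      ; wait for the next retrace to begin
    RET
waitRetrace ENDP
```

Waiting for a running retrace to end first makes sure we always catch the very start of the next one and get the full retrace period for copying.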

Wait…what? Retrace? Huh? This is something most of you surely never ever heard of, as it involves hardware which is almost obsolete nowadays. A few of you might remember though. Before those fancy, slick LCD panels we have today, there was the Cathode Ray Tube, which served as a PC monitor or television set. While today’s panels address each pixel individually, back then there was a single electron beam going from the top-left of the screen to the bottom-right – line by line. As soon as it reached the bottom-right, it had to move back to the top-left to draw the next frame. This doesn’t happen instantly, so the movement and the time it takes to travel back is the vertical retrace.
You’re never too old to learn, but why do we have to care? While we can directly put pixel data into the video memory, this also doesn’t happen instantly. For our 320 × 200 screen area there are 64000 bytes to fill. So it might happen that we’re writing to the video memory while the monitor is still busy painting the previous frame. This will result in heavy flickering. To make sure we are in sync with the refresh rate of the monitor, we have to wait until it enters the vertical retrace phase and write to the video memory right then.
As I said, writing to the video memory is kind of slow; that’s why we wrote the bytes to the offscreen buffer videoBuffer using the paintMap procedure.
Now that we know the monitor is in vertical retrace, it’s time to copy the bytes from the buffer to the video memory. This is taken care of using moveToVideoRam:
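Again the listing is missing here, so this is just a sketch of how such a copy could be done, assuming the names videoBuffer and VIDEOMEMORY as introduced in this article and nothing else about the author's version.

```asm
VIDEOMEMORY EQU 0a000h

moveToVideoRam PROC
    PUSH DS                     ; we're about to repoint DS and ES
    PUSH ES
    MOV AX,SEG videoBuffer      ; DS:SI = start of the offscreen buffer
    MOV DS,AX
    MOV SI,OFFSET videoBuffer
    MOV AX,VIDEOMEMORY          ; ES:DI = start of the video RAM
    MOV ES,AX
    XOR DI,DI
    MOV CX,32000                ; 64000 bytes = 32000 words
    CLD                         ; copy upwards
    REP MOVSW
    POP ES
    POP DS                      ; DS points to our data again
    RET
moveToVideoRam ENDP
```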

Just like the drawTile procedure, it copies two bytes at a time from one memory location to another: the VIDEOMEMORY.
So finally we have all the procedures we need to draw something on screen.

I’m sure you’re keen on seeing something now but don’t want to glue all those pieces together by hand. Don’t be afraid, I’ll give you the source file shortly.
This is a good time to introduce you to Visual Studio Code and the Masm-Tasm extension. It has been a life-saver while writing this article. Visual Studio Code is a great code editor and the extension also redirects messages from the assembler to it. Hell, there’s even syntax highlighting! Just imagine having that back in the 90s!
If you’ve installed both and opened the source file, you just have to right-click in the editor and select: Run ASM code

vsCodeASM

Young padawan – now is the time – here is the source file: nesTetris1.asm

I don’t expect anything to go wrong, so you should see the following screen:
dosBoxNesTetris1

Be prepared for the next article in this series! We will look into replicating the game’s background music using the Adlib sound card. This might be the journey of your life!
