Cloning NES Tetris In x86 16 Bit Assembly Part 2: The Music

In the last part of this series, we created the impressive looking static background of NES’s Tetris with ‘just’ a little bit over 200 lines of assembly code – sarcasm added.

Today we’re looking into another important part of the game: the background music.
The game featured three selectable tunes. Unfortunately those weren’t as catchy as that particular tune composed for the Game Boy port of Tetris. Nevertheless, Hirokazu Tanaka did a good job! We aren’t going to work on all three tunes, one is enough. I decided on ‘Music 2’ and here’s a first impression:


With a total of five channels, including two squarewave, one triangle, one noise and one DPCM, the NES was able to output some decent music.
In contrast, a standard IBM compatible PC of that time lacked all of this. Okay, to be fair, a PC was meant to be a workhorse and not an entertainment system. So all it had to offer was a built-in speaker which was able to play a single tone at a time. To give you a better picture, here’s the title tune of Prince of Persia:


Lovely, isn’t it? Believe it or not, it can get even worse. A popular trick in the good old days to compensate for the lack of additional playback channels was faking chords by playing arpeggios instead. The Commodore C64 made extensive use of this – and it sounded quite good there. On the PC however things didn’t get better with this technique. So here’s part of the intro music to LHX Attack Chopper:


I bet you now have the feeling I had back then. I felt like I had just two options:

  • curse all your friends with their game consoles or Amigas looking down on you
  • turn off the sound

Luckily a remedy was on its way: the Adlib soundcard, released in 1987. Admittedly it wasn’t the first attempt to improve on the primitive PC sound. A few years earlier the Tandy Sound Chip saw the light of day. With a total of 4 simultaneous voices – 3 squarewave and a noise channel – its sound was close to that of a NES. Its soundchip, the TI SN76489, deserves a special note. It originated from the TI-99/4 homecomputer, whose follow-up, the 4A, was my first computer, and I must say I really loved its sound. For PCs though the Tandy sound wasn’t a big success. Nevertheless its soundchip found use in some 8bit game consoles. Let’s get back to the Adlib. With its whopping total of 9 channels and FM synthesis capability, it set the standard for what PC games sounded like – for many years. I must confess I never had an Adlib. I entered the PC sound world with a Soundblaster (thanks dad), which was equipped with the same Yamaha OPL2 soundchip as the Adlib – thus fully Adlib compatible.
If you rigged your PC with either of those, things started to sound quite different – not to say like a whole new world.
Here’s the title track to Prince of Persia once more, played back by an Adlib this time:

Can you hear the difference? Yeah, the Adlib was definitely a great invention and took PC music to the next level. I remember the words of one of my friends as I introduced him to Prince of Persia: ‘Wow – that sounds great!’.
Unfortunately – again like almost anything at that time – the Adlib was not that easy to program. We’ll get into that shortly, there’s other stuff we need to look into first, most importantly:

Where do we get the notes to the Tetris music?

Of course it’s stored in the game ROM – in its own proprietary format. There are ways to rip the bytes but it’s very cumbersome as the music is not stored in a general way and varies from game to game. This led to the development of the Nintendo Sound Format .nsf but getting the notes out of it ain’t that much easier. Since our goal is not to create a 1:1 replica of the game I decided on something different: a plain Midi .mid file re-creation of the music. The Adlib can’t handle it directly but we can easily read out the notes.
The folks over at have a version of music b available for download.

Generally speaking and simply put, a Midi file is just a sequential collection of Midi Events – commands to control another device like a synthesizer.
In a human-readable way it would be something like this:
-at time 0.01 I want a Piano sound with a toneheight of A4 at the highest possible volume
-at time 0.50 I want to stop the Piano sound
-at time 2.00 I want a Piano sound with a toneheight of C4 at the highest possible volume

Essentially this is what is written inside a Midi file, of course in a more abstract way. I won’t get into all the details. It’s important though that the start & end times are not absolute times; instead each one is a delta time relative to the previous Midi event.
The things we are interested in are the toneheight, the start time and the duration, thus in pseudo code we need to do something like this:

open midi file
get midi events
create notes array
while(midi events available)
    get midi event’s delta time and fill notes array with 0h up to this time
    if midi event is Note On -> put toneheight in notes array
    if midi event is Note Off -> put 080h in notes array

The toneheight we read out from the Midi file is not an actual toneheight from a musical point of view. It’s a number ranging from 0h to 7fh, 0-127 decimal. Each value maps to a specific toneheight.
Here are some examples:

Hex value   Note name   Frequency
18h         C1          32.70 Hz
48h         C5          523.25 Hz
54h         C6          1046.50 Hz
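The table rows follow from the usual equal-temperament rule. Here's a minimal Python sketch, assuming the common convention that Midi note 69 is A4 at 440 Hz and that octave numbers start at C (so note 60 is C4). The function names are made up for illustration and aren't part of the game's tooling.

```python
# Note names within one octave, starting at C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_note_name(note):
    """Return a name like 'C5' for a Midi note number (assumes note 60 == C4)."""
    octave = note // 12 - 1
    return f"{NOTE_NAMES[note % 12]}{octave}"

def midi_note_frequency(note):
    """Equal temperament: every semitone multiplies the frequency by 2^(1/12)."""
    return 440.0 * 2 ** ((note - 69) / 12)

for value in (0x18, 0x48, 0x54):
    print(f"{value:02x}h  {midi_note_name(value)}  {midi_note_frequency(value):.2f} Hz")
```

Twelve semitones up always doubles the frequency, which is why 54h lands at exactly twice the frequency of 48h.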

As you can see, a hex value of 54h represents a frequency of 1046.50 Hz. Easy you might think, we just need to transfer that frequency to the Adlib somehow, right? Wrong!
As I hinted, the Adlib was a torture to program. For some reason the engineers decided to split up the frequency into a frequency number and a block number. The frequency number is a 10bit number up to 1023, which does not actually represent a frequency in a musical sense, and the block number is a 3bit number up to 7.
Furthermore the lower 8 bits of the frequency number and the 2 remaining upper bits have to go into two different registers. The block number is sometimes referred to as the octave bit. Though that’s not entirely true, we can use that to our advantage. If you look at the table above, you notice that C6 is exactly double the frequency of C5. This is true for any 12 semitone interval. We can do something similar with the octave bit. If we determine 12 frequency numbers as some sort of base frequencies for the notes C to B of one octave, we just need to increment or decrement the octave bit by 1 to get one octave up or down.

Note              C    C#   D    D#   E    F    F#   G    G#   A    A#   B
Frequency number  342  363  385  408  432  458  485  514  544  577  611  647

Say we want to have an F5. From the above we can see that we need a frequency number of 458 and block number 5. Cool, we have two numbers now but what are we going to do with these?
458 in hex is 1cah. The lower 8 bits, 0cah, go into the 0a0h register and the upper 2 bits, 01h, go into the 0b0h register. Of the remaining 6 bits inside the 0b0h register, 3 are reserved for the octave number and one bit is used to activate a note.
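To make the lookup concrete, here's a small Python sketch. It assumes the block number is simply the octave of the Midi note (note 48h is C5, so block 5), which matches the F5 example but isn't necessarily how the game's frequencyTable is laid out. The function names are hypothetical.

```python
# Frequency numbers for C..B, taken from the table above.
FREQUENCY_NUMBERS = [342, 363, 385, 408, 432, 458, 485, 514, 544, 577, 611, 647]

def note_to_fnum_block(midi_note):
    """Map a Midi note number to (frequency number, block number)."""
    block = midi_note // 12 - 1          # assumption: block == octave
    if not 0 <= block <= 7:
        raise ValueError("note does not fit into the 3bit block number")
    return FREQUENCY_NUMBERS[midi_note % 12], block

def split_fnum(fnum):
    """Split the 10bit frequency number into its low 8 bits and upper 2 bits."""
    return fnum & 0xFF, fnum >> 8

fnum, block = note_to_fnum_block(0x4D)   # 4dh is F5
low, high = split_fnum(fnum)
print(fnum, block, f"{low:02x}h", f"{high:02x}h")
```

The low part is what ends up in the 0a0h register, the high part is combined with the block number and the note-on bit for 0b0h.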
So the 0a0h and 0b0h registers have to look like this:


We need to understand how to stuff individual bits into the 0b0h register and where those 20h and 14h numbers come from!
Let’s start with the easier one. To set bit 5 of the register to 1, we need to write 20h to the register. Why 20h? The binary representation of that number is 00100000, and as you can see bit 5 is 1!
The remaining upper bits of our frequency number are 1h – in binary that’s 00000001.

So if we take both numbers and do a bitwise OR operation on them we get 00100001.
Each result bit of a bitwise OR operation is 1 if at least one of its source bits is 1. Only two 0 bits give a 0.

You might have guessed, it’s almost the same for the octave bit. We want to set it to decimal 5, which is 5h. Its binary form is 00000101 – uhm – wait – it needs to be 00010100! Yeah, indeed the bits need to be shifted to the left two times. Mathematically this equals a multiplication by 4 – hence you know where the 14h value comes from.
Again we need to do a bitwise OR and finally have the complete value of 00110101, 35h, which goes into the 0b0h register.
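The whole bit juggling fits into a few lines. This is a Python sketch of the arithmetic only, not the game's assembly code. The helper name pack_b0 is invented for this example.

```python
KEY_ON = 0x20  # bit 5: activates the note

def pack_b0(fnum, block, key_on=True):
    """Build the byte for the 0b0h register from its three parts."""
    value = (fnum >> 8) & 0x03        # upper 2 bits of the frequency number
    value |= (block & 0x07) << 2      # 3bit block number, shifted left twice
    if key_on:
        value |= KEY_ON
    return value

a0 = 458 & 0xFF                       # lower 8 bits go to 0a0h
b0 = pack_b0(458, 5)
print(f"{a0:02x}h {b0:02x}h {b0:08b}")
```

Leaving out the note-on bit gives 15h, which is the same frequency and block without actually sounding the note.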

Like in the first part of the tutorial I had to write my own tools to generate the frequency table and the actual note data. It’s kind of special so I won’t post the code this time – you’ll have to be satisfied with the results.

And here’s the notes data for the first of the three channels the Tetris music actually uses.

While you’re looking at the notes data, you might be wondering why we waste that many bytes on zero values. Indeed that’s not very efficient but it makes things a bit easier. In the original Midi file each Note On/Off event had a time value attached. In our bytes ‘array’ above, the position of an element is the actual time. We just need to use a timer to look at a position, play it if it’s a note, and increment the position. Next time the timer fires we look at the next position and so on.
The elements aren’t arbitrarily spaced of course. The Midi file had a resolution of 96 ticks per beat – think of it as the relation between minutes and seconds: 1 minute equals 60 seconds -> 1 beat equals 96 ticks. If we carry over that logic to our music, to go from one beat to the next, we need 96 numbers. That’s a bit too much and we don’t need that precision, so I divided it by 8, which gives us 12 ticks per beat. The whole song is 88 beats at 150 Beats Per Minute, roughly 35 seconds. At 12 ticks per beat we need 1056 values for each of its three melody channels.
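Here's roughly how such a position-based array can be built. It's a Python sketch, not the tool I actually used, and it skips real Midi parsing: the events are assumed to be already decoded into hypothetical (delta ticks, "on"/"off", note) tuples.

```python
NOTE_OFF_MARK = 0x80   # value stored for a Note Off event

def build_notes(events, ticks_per_beat=96, divisor=8, total_beats=88):
    """Turn delta-timed events into an array where the index is the time."""
    slots = total_beats * ticks_per_beat // divisor
    notes = [0] * slots                # 0h means: nothing happens at this tick
    absolute = 0
    for delta, kind, note in events:
        absolute += delta              # delta times add up to absolute time
        position = absolute // divisor # reduce 96 ticks per beat to 12
        if position >= slots:
            break
        notes[position] = note if kind == "on" else NOTE_OFF_MARK
    return notes

# Made-up events: the same note played twice.
events = [(0, "on", 0x56), (16, "off", 0x56), (32, "on", 0x56), (16, "off", 0x56)]
notes = build_notes(events)
print(len(notes), [f"{n:x}h" for n in notes[:9]])
```

One thing this simple approach glosses over: if a Note Off and the next Note On fall into the same slot after dividing by 8, the later event overwrites the earlier one.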

Speaking of timers – this brings us to the next important thing: how to set up a periodic timer!

Intel introduced the Programmable Interval Timer as a companion chip back in the days of the 8080, and every x86 PC since the original IBM PC has been equipped with one. Back then it was an extra chip soldered to the mainboard while today it’s embedded into the southbridge.
This PIT runs at a frequency of 1193182 Hz. Now you might come to the conclusion that, based on this frequency, it counts down from 1193182 to zero and sends a signal every time it gets there, meaning a second has passed. That’s only partly true. The PIT, just like the original 16bit 8086 processor, is equipped with – yes – 16bit registers. The maximum number that fits into a single 16bit register is 65535, and that’s also the maximum we can count down from.
So the frequency of the PIT is divided by 65535, which gives us roughly 18.2 ticks per second. A single tick therefore takes around 54.92ms.
Now we could set up a timer that fires every 54ms but is this fast enough?
Let’s examine the Tetris music. Its BPM value is 150 and a single beat is subdivided into 12 ticks.

1000ms / (150bpm / 60s * 12ticksPerBeat) = 33.33ms

Uh-oh, that’s a difference of 21.59ms, our song would play way too slow! What should we do?
Surely you spotted the word Programmable in Programmable Interval Timer. Yes, we can re-program it to run at a different rate.
Doing so is actually quite easy. The 65535 we divided the chip’s frequency by needs to be replaced by a lower number. If you look carefully at the equation above, we’re almost there. We just need to replace the 1000ms at the beginning with the chip’s frequency.

1193182 / (150 / 60 * 12) = 39772.7 = 9b5ch
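For the sake of checking the numbers, here's the same calculation as a short Python sketch. The function name is invented. Splitting the result into a low and a high byte assumes the timer is set to the usual access mode where the divider is written to port 40h as two bytes, low byte first.

```python
PIT_HZ = 1193182   # input frequency of the timer chip

def pit_divisor(bpm, ticks_per_beat):
    """Divider that makes the timer fire once per song tick."""
    ticks_per_second = bpm / 60 * ticks_per_beat
    return int(PIT_HZ / ticks_per_second)   # fractional part is dropped

divisor = pit_divisor(150, 12)
low, high = divisor & 0xFF, divisor >> 8
period_ms = 1000 * divisor / PIT_HZ
print(f"{divisor} = {divisor:04x}h, low {low:02x}h, high {high:02x}h, {period_ms:.2f}ms")
```

Dropping the fraction makes the timer a tiny bit too fast, but the error is far below anything you could hear.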

To re-program the timer chip, we need to utilize its ports 43h and 40h: the command goes to port 43h, the new divider to port 40h. So something like this would work out of the box:

There’s a catch though. The PC’s system clock is chained to the PIT too. If we simply speed up the timer, the clock will speed up too. The usual workaround is setting up our own interrupt handler, which keeps track of the time passed and calls the original interrupt handler every 54ms. That’s what we’re going to do too – even though a bit simplified.
The original interrupt should fire every 54.9ms, while our speedy devil fires every 33.3ms. If we divide 54.9 by 33.3 we get roughly 1.65 -> and there’s the problem: we would need to call the original interrupt every 1.65 invocations but of course we can’t call it in-between!
Let’s set up a new variable timerCount. The first time the new interrupt handler is called, we know the timer chip has counted 39772 ticks (9b5ch), and we add that to timerCount. The second time it’s called another 39772 ticks have passed, but if we add them to timerCount it will overflow, as it can only store values up to 65535 (0ffffh). The CPU tells us about this unsigned overflow by setting the carry flag to 1. So as soon as the carry flag is set, it’s time to call the original interrupt handler. Conveniently the remainder (14008 == 2 * 39772 – 65536) is kept inside timerCount, so the next time the new interrupt handler is called it will add 39772 to 14008 and so on.
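If you want to convince yourself that this works out, here's a little Python simulation of the idea. It's only a model of the 16bit addition, not the interrupt handler itself, and the function name is invented.

```python
RELOAD = 0x9B5C   # 39772 timer ticks between two of our interrupts

def simulate(invocations, reload=RELOAD):
    """Model timerCount as a 16bit variable; report when the add wraps around."""
    timer_count = 0
    history = []
    for _ in range(invocations):
        total = timer_count + reload
        wrapped = total > 0xFFFF       # this is when the original handler is due
        timer_count = total & 0xFFFF   # a 16bit register keeps only the remainder
        history.append((timer_count, wrapped))
    return history

history = simulate(30)                 # 30 invocations are roughly one second
print(history[:4])
print(sum(wrapped for _, wrapped in history), "calls to the original handler")
```

Over 30 invocations the original handler is due 18 times, which is the 18.2 calls per second the system clock expects, give or take the fraction that carries over into the next second.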

The OUT instruction sends an End Of Interrupt to the Programmable Interrupt Controller.
What immediately catches the eye is pushRegisters and popRegisters. These are macro calls. As we’re limiting ourselves to 8086 code, there is no PUSHA or POPA instruction for saving the processor’s registers. Those were only added to the instruction set with the 80186 and 80286. We might need to save all registers in many places so I’ve wrapped it inside a macro.

Let’s recall, what did we actually want to do with this? Yeah, we wanted to feed the Adlib card with two numbers. If you scroll back a little bit, you’ll discover it’s 0cah to the 0a0h and 35h to the 0b0h register.
To access the Adlib, we need to utilize its I/O ports at 388h and 389h. As we constantly need to send something, it’s best kept in its own procedure.

With assembly there’s no such thing as a parameter you can send to a procedure. Instead you’re either using registers or variables, populated just before the call. I decided for the AX register. AL holds the Adlib’s target register and AH the actual value.
If we want to send 0cah to 0a0h, we need to do it like this:

Pretty easy, isn’t it? I split it into two MOVs for readability. Sometimes it’s a bit hard to keep track of what a single register is supposed to do.
The above could also be written like this:

If we take a closer look at the sendAdlib procedure, we realize there are two labels called Delay. If we examine what the code there does, it appears it’s just reading something from the Adlib’s 389h port without actually using it.
I’m not sure if I’ve already mentioned it (okay, I’m fully aware I did): the Adlib was a bit hard to program. The problem is that at the time the Adlib entered the market, slow 8086-class processors were all there was. Later on, with the 80286 and 80386+, CPUs got much faster, and it might happen that you send something to the Adlib while something previously sent hasn’t been processed yet. So these two loops of nothing but waiting make sure the Adlib has enough time to process the previous data.
As a side note: We’re using DOSBox, which emulates the OPL2 chip of the Adlib completely in software. I’m not sure if they’ve replicated this behaviour too. You surely need the wait states on the real hardware though.

Anyway, all we have done so far accomplishes just a single task: selecting a frequency for a single channel. The Adlib offers 9 of those, which essentially means that we can have 9 different sounds playing at the same time. There’s another mode of operation which offers 6 sound channels and 5 percussion instruments but we won’t use it.
Let’s think about something. By sending 0cah to the Adlib’s 0a0h register, what channel did we actually set? The channel is selected by the lower 4 bits of the register number.

0a0h  Channel 1
0a1h  Channel 2
0a2h  Channel 3
0a3h  Channel 4
0a4h  Channel 5
0a5h  Channel 6
0a6h  Channel 7
0a7h  Channel 8
0a8h  Channel 9

Wanna give channel 5 a frequency instead of channel 1? Send 0cah to 0a4h.

To make the Adlib output something incredible it’s not enough to set a frequency for a channel though. The actual sound is generated using two operators. In FM synthesis the output of a modulator is sent to the input of a carrier, thus the carrier is modulated by the modulator.

Consequently each channel has two operators. To access those operators we need to utilize the Adlib’s registers again.
Altogether it would be the registers 20h-35h, 40h-55h, 60h-75h, 80h-95h, 0e0h-0f5h

Don’t be afraid though. We aren’t going to laboriously play with numbers to define a certain sound. I’ll introduce you to a powerful tool in just a bit. I just want to make sure you understand the deal with operators.
Let’s take a look at the first range of registers: 20h-35h. Two of those are used per channel – and this register set among other things controls if a particular channel’s modulator or carrier operator should use vibrato and tremolo.
The way which register maps to what channel is a bit, uhm, confusing.

  • the modulator for channel 1 is register 20h, the carrier 23h
  • the modulator for channel 2 is register 21h, the carrier 24h
  • the modulator for channel 3 is register 22h, the carrier 25h

This logic is only valid up to channel 3. Why? As you can see we can’t use register 23h as the modulator for channel 4, as it’s already the carrier of channel 1. So for channel 4 we need a higher register number for the modulator. Unfortunately it ain’t the last carrier register (25h) +1, which would be 26h and easy to grasp. I don’t know the reason but Yamaha’s engineers decided to leave a gap in between. The modulator register for channel 4 is 28h. It’s always true though that the carrier is the modulator’s register +3.
Here’s the complete register layout:

Channel #   Modulator Register   Carrier Register
1           20h                  23h
2           21h                  24h
3           22h                  25h
4           28h                  2bh
5           29h                  2ch
6           2ah                  2dh
7           30h                  33h
8           31h                  34h
9           32h                  35h

Yeah, definitely confusing. Luckily the logic above is valid for the registers 40h-55h, 60h-75h, 80h-95h, 0e0h-0f5h as well.
But there are 3 register ranges left that need to be filled: 0a0h, 0b0h and 0c0h. We already discussed registers 0a0h and 0b0h. All three directly control parameters for a specific channel, and the lower 4 bits again select the channel, so 0c0h – 0c8h refer to channels 1-9.
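The gaps in the operator layout look odd but follow a simple pattern: channels come in groups of three, and each group starts 8 registers after the previous one. Here's a Python sketch of that rule, with made-up function names and channels counted from 0 instead of 1.

```python
def operator_registers(channel, base=0x20):
    """Return (modulator, carrier) register for a 0-based channel number."""
    offset = (channel // 3) * 8 + channel % 3   # groups of 3, spaced 8 apart
    return base + offset, base + offset + 3     # carrier is always modulator +3

def channel_register(channel, base=0xA0):
    """Per-channel registers (0a0h, 0b0h, 0c0h) simply count upwards."""
    return base + channel

for channel in range(9):
    modulator, carrier = operator_registers(channel)
    print(channel + 1, f"{modulator:02x}h", f"{carrier:02x}h")
```

Passing 40h, 60h, 80h or 0e0h as the base gives the matching registers of the other ranges.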

Let’s sum up – to setup an instrument for channel 1, we must:

  • set modulator and carrier parameters using registers 20h and 23h
  • set modulator and carrier parameters using registers 40h and 43h
  • set modulator and carrier parameters using registers 60h and 63h
  • set modulator and carrier parameters using registers 80h and 83h
  • set a modulator and carrier waveform using registers 0e0h and 0e3h
  • set channel properties using register 0a0h
  • set channel properties using register 0b0h
  • set channel properties using register 0c0h

There is a reason that except for the 0a0h and 0b0h registers, I didn’t tell you what all the other registers are supposed to do. It’s simply because we aren’t going to mess with those parameters ‘by hand’. Instead we’re utilizing a third-party DOS tool called: SBTimbre.
Here’s a screenshot:


It’s a powerful tool which lets you play around with all those parameters and ultimately create an instrument preset which can be used with an Adlib card.
To use this instrument, we first need to click on File -> Export to SBI, which will save our preset as a .SBI file – short for SoundBlaster Instrument. Unless we’re going to write an assembly .SBI parser for the game this file would be pretty useless. Its usefulness will become more obvious if we open the .SBI file inside a hex editor.
Take a look at this screenshot:


We can see the file header from byte 0 to byte 3 [SBI.], followed by 32 bytes which are reserved for the instrument name [square2   Created by SBTimbre]. In the screenshot above it’s the blue section. What we are interested in is the green section, which always starts at offset 24h and is 11 bytes long in our case. [f2 50 14 80 e3 e1 67 36 04 02 0e].
Those values map directly to specific Adlib registers!
If we want to use the 11 bytes sequence as an instrument for channel 1 of the Tetris tune, we need to write it to the following Adlib registers:

Register  20h   23h  40h  43h  60h   63h   80h  83h  0e0h  0e3h  0c0h
Value     0f2h  50h  14h  80h  0e3h  0e1h  67h  36h  04h   02h   0eh

That’s it! Of course we could have made things more comfortable by writing another JavaScript helper tool that parses the .SBI file and prints out the proper values but I was too lazy this time. Come on, it’s just 11 values. Should I? Okay, I’m persuaded, here it is:
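The idea behind such a helper is simple enough to sketch in a few lines of Python. To be clear, this is not the JavaScript tool mentioned above, just a rough stand-in under a few assumptions: it only checks the first three header bytes, it expects the 11 values at offset 24h in the order described, and the function name and the 0-based channel parameter are made up.

```python
OPERATOR_BASES = [0x20, 0x40, 0x60, 0x80, 0xE0]

def parse_sbi(data, channel=0):
    """Return (instrument name, list of (register, value)) for one channel."""
    if data[:3] != b"SBI":
        raise ValueError("not an SBI file")
    name = data[4:36].rstrip(b"\x00 ").decode("ascii", "replace")
    values = data[0x24:0x24 + 11]
    if len(values) < 11:
        raise ValueError("file too short")
    offset = (channel // 3) * 8 + channel % 3
    registers = []
    for base in OPERATOR_BASES:
        registers += [base + offset, base + offset + 3]   # modulator, carrier
    registers.append(0xC0 + channel)                      # channel register
    return name, list(zip(registers, values))

# Hand-built sample data; the 4th header byte (shown as '.' above) is assumed to be 1ah.
sample = (b"SBI\x1a" + b"square2".ljust(32, b"\x00")
          + bytes.fromhex("f2 50 14 80 e3 e1 67 36 04 02 0e"))
name, writes = parse_sbi(sample)
print(name, [(f"{reg:02x}h", f"{val:02x}h") for reg, val in writes])
```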

If you don’t have your own .SBI file yet, try this:

This will give us the following:

Now we need a procedure that reads a byte from the variable squareLead and writes it to the Adlib register stored at the same position in instrumentRegisters. That’s the purpose of setInstrument:

Prior to calling the setInstrument procedure, the SI register needs to be populated with the offset to the instrument’s variable and the CL register with the channel number. As we’re going to utilize three channels and two different instruments, it will have to look like this:

Just as a side note: The two instruments have misleading names. Initially I talked about the NES’s sound capabilities, which include a square and a triangle sound generator. The Adlib has neither of those. It’s just capable of a sinewave and three variants of it. I tried to mimic the sound and wanted to remind myself what tone it’s supposed to produce – thus the name.
Furthermore, by default the Adlib only uses the sinewave. In fact, to be able to use one of the others at all – which is what our instruments actually do – we must enable that feature beforehand. To do this, bit 5 in register 1h needs to be one: 00100000 == 20h

With all this in place, there’s just one last thing left: a procedure that reads out the note data for the three channels and sends it to the Adlib!

Just like the setInstrument procedure, the playChannel procedure expects certain registers to be populated before calling:

  • DI – the song’s position, a value between 0 and 1056 == the song’s length. It’s the same for all three channels.
  • CL – the channel (0-2)
  • BX – the offset to a channel’s variable inside the .DATA segment (channel1, channel2, channel3)

Let’s have a look at the first few bytes of channel1, to understand what’s going on.
channel1 DB 56h,0h,80h,0h,0h,0h,56h,0h,80h …

Suppose the song’s position (held in DI) is 0 – channel1 at DI has a value of 56h. That’s neither 80h nor 0h, so the first two comparisons aren’t true and we arrive at the NoteOn label. Here it looks up the frequency for 56h inside frequencyTable and stores the value in register DX. Afterwards we’re finally indeed feeding the 0a0h and the 0b0h registers the lower and upper bits of DX respectively! Everything I told you about the frequency and block number seems to be true.
Another example: say DI would have been 2. Byte 2 in channel1 is 80h. We determined that 80h equals a Note Off Midi event, thus the second comparison is true and we land at the NoteOff label. No need to look up a frequency there, we just want to stop a previous note. This would be done by setting bit 5 to zero while keeping the other bits and sending the result to register 0b0h. We don’t need a block number though, so sending 0h has the same effect.
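Here's the same decision logic as a Python sketch, so you can follow it without reading assembly. It isn't the playChannel procedure itself: the function returns the register writes instead of talking to the card, and it assumes the frequency lookup works by splitting the note value into a semitone and an octave, which may differ from how frequencyTable is really organised.

```python
FREQUENCY_NUMBERS = [342, 363, 385, 408, 432, 458, 485, 514, 544, 577, 611, 647]
NOTE_OFF = 0x80

def channel_writes(value, channel):
    """Return the (register, value) pairs to send for one byte of note data."""
    if value == 0x00:
        return []                                  # nothing to do at this tick
    if value == NOTE_OFF:
        return [(0xB0 + channel, 0x00)]            # clears bit 5, note stops
    fnum = FREQUENCY_NUMBERS[value % 12]
    block = value // 12 - 1                        # assumption: block == octave
    return [
        (0xA0 + channel, fnum & 0xFF),             # lower 8 bits
        (0xB0 + channel, 0x20 | (block << 2) | (fnum >> 8)),
    ]

for value in (0x56, 0x00, 0x80):
    print(f"{value:02x}h ->", [(f"{r:02x}h", f"{v:02x}h") for r, v in channel_writes(value, 0)])
```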

That’s incredible! But what actually triggers the playChannel procedure? Nothing, as of yet. That’s the job of the updateMusic procedure:

It simply calls playChannel for each of the three channels, increments the songPosition and, in case it has reached the end, resets it to 0. That’s as simple as it can be!
No one is calling updateMusic yet though. For simplicity I moved the call inside the timerInterrupt procedure. Remember, it’s supposed to fire an interrupt every 33.33ms – the rate at which our song should update.
For the real game I might change that logic eventually. For the moment it’s just in there so we finally have something to listen to!
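Stripped of all the hardware details, the update step boils down to the following Python sketch. The play callback stands in for playChannel and is purely hypothetical, as is the idea of passing the channels as lists.

```python
SONG_LENGTH = 1056   # 88 beats * 12 ticks per beat

def update_music(position, channels, play, song_length=SONG_LENGTH):
    """Play the current tick on every channel and return the next position."""
    for number, data in enumerate(channels):
        play(data[position], number)
    position += 1
    if position >= song_length:
        position = 0                 # start the song over
    return position

# Tiny demo with silent channels, printing nothing but the position.
channels = [[0] * SONG_LENGTH for _ in range(3)]
position = 1054
for _ in range(3):
    position = update_music(position, channels, lambda value, channel: None)
    print(position)
```

Called once per timer interrupt, this walks through the note data at exactly the 33.33ms rate we set up earlier and loops the song forever.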
Yes, we do have something now! We’ve talked about all the important things, so here’s the finished source file: nesTetris2.asm

For instant joy here’s a .mp3 preview:


In the next part of this series we’ll look into the actual gameplay.
