Assembly Language Tricks for the HACK computer

(Last Mod: 17 December 2013 13:19:14 )

Overview of the Hack instruction set

Instructions for the HACK computer are partitioned into two types, the A-type and the C-type. In addition there is a pseudoinstruction for defining line labels, plus end-of-line comments are supported.

The Hack CPU contains two registers, D and A. The D register is a general purpose data register whose contents are always presented to one of the two ALU data ports. The value in the D register can only come from the output of the ALU. The A register holds the address anytime that memory is to be accessed, but when not used for memory it is a general purpose register. The second data input to the ALU is either the contents of the A register or the value in memory, referred to as M, that is at the address contained in the A register, so M=Memory[A]. Consequently, the ALU cannot perform operations on both the A and the M values at the same time. The value stored in the A register can either come from the output of the ALU or it can be loaded with a zero-extended 15-bit constant using the A-type instruction.

Spaces are ignored and, hence can be inserted anywhere in any of the fields for readability without impacting the instruction represented.

A-type instructions

@value

There is only a single A-type instruction and its format is the @ symbol followed by a 15-bit value. The A-type instruction is used to load a zero=extended 15-bit value into the A register.

C-type instructions

[dest =] comp [; jump]

dest = A|M|D (more than one allowed, but must be in order)

comp = one of the 28 supported operations

jump = JMP|JEQ|JNE|JLT|JGE|JGT|JLE

The C type instruction performs all tasks other than loading constants into the A register. It has three fields, the destination (dest), the computation (comp) and the jump (jump) fields. The comp field is required, but the other two are optional; it configures the CPU to perform a specific computation. The dest field, if present, precedes the comp field and is separated from it by an equals sign; it controls which, if any, of the three available locations the result of the computation is to be stored in. Finally, the jump field, if present, follows the comp field and is separated from it by a semicolon; it determines, based on the result of the computation, whether a branch to the address contained in the A register occurs.

The destination field of a C-type instruction can be any subset of the A. M, and D registers but they must appear in that order. Hence, the possible destinations are {A, AM, AMD, M, MD, D}. As stated above, if no destination is to be written to, the field and the subsequent equals sign are omitted entirely.

The jump field of a C-type instruction can be any one of {JMP, JEQ, JNE, JLT, JGT, JLE, JGE}. Similar to the destination field, if no jump is to be made at all, the field and the preceding semicolon are omitted entirely.

The computation field of a C-type instruction must be present and must one of the 28 operations in the following table. The following table shows the possible computations that can be performed by the Hack CPU.

OPERATION	a=?	a=0	a=1
ARITHMETIC
Contstants	0	1	-1
Echo	D	A	M
Negative	-D	-A	-M
Increment	D+1	A+1	M+1
Decrement	D-1	A-1	M-1
Addition		D+A	D+M
Subtraction		D-A	D-M
Inverse Subtraction		A-D	M-D
LOGICAL (bitwise)
NOT	!D	!A	!M
AND		D&A	D&M
OR		D\|A	D\|M

While this is a pretty restricted instruction set, it is not restricting, meaning that you can do just about anything you could do with a much richer instruction set, it just may require a bit more cleverness and a few (perhaps quite a few) more instructions.

Symbols

A symbol is a text string that maps to a memory location. To create a symbol that maps to an address in the instruction memory, place the symbol name in parentheses on the line prior to the instruction it should point to. There can be a comment on this line, but nothing else. To create a symbol that points to an address in data memory, simply use the symbol in an A-type instruction. The assembler will automatically assign memory locations to these symbols beginning at memory location 16 (0x0010).

There are a handful of predefined symbols that serve as pointers into the data memory. They are

Symbol	Address	Purpose
`R0` tp `R15`	0 to 15	convenience
`SP`	0	Stack Pointer
`LCL`	1
`ARG`	2	Function Argument
`THIS`	3
`THAT`	4
`SCREEN`	16384 (0x4000)	Start of Screen Buffer
`KBD`	24576 (0x6000)	Keyboard buffer

Comments

The only comment type that is supported is the end-of-line comment that begins with a double forward slash (//) and ends at the end of the same line.

Blank lines are allowed and are ignored. Spaces are also ignored.

Bit Banging

The term "bit banging" refers to manipulating individual bits with a data word, especially when you can only interact with entire words at one time. In order to manipulate individual bits, you need to be able to do three things to a given bit within a word without affecting, or being affected by, any other other bits in the word. You must be able to query the status of the bit, and you must be able to set the bit to a HI state and clear the bit to a LO state. Before proceeding, you may want to look at the topic of bit banging in general.

Provided we can generate a mask having a HI in the desired bit position and a LO in all other positions, the three bitwise logical operations supported by the hack CPU are sufficient to let us perform the three required tasks.

For the following examples, let's assume that the mask is in the D register and the word with the bit we want to work with is in M (hence the address of the data value is in the A register).

D&M; JEQ // Branch if clear

D&M; JNE // Branch if set

M=D|M // Set the bit

D=!D // Invert the mask in prep for clearing

M=D&M // Clear the bit

It turns out that while the Hack assembly language does not include instructions for inverting the inputs and/or outputs when carrying out logical operations, the CPU does. Hence, it is possible to write machine code to perform the two actions shown to clear a bit in a single instruction.

The next hurdle is being able to generate the necessary mask. The easiest way to do this is to precompute the mask and load it directly using the A-type instruction. The problem with this approach is that we can only load a mask for the lower 15 bits. This is fine provided we do not need to interact with the msb. If we do, then we need to be able to fully determine all 16 bits in the value. Being able to do this is useful in general, not just for bit banging operations, so we take a slight aside and discuss how this can be done. Besides, it uses another trick that we will need shortly.

In order to load a full 16-bit value into a 16-bit register when our load instruction only allows us to load the lower 15 bits, we do the follow: Load the upper most fifteen bits, shift the contents to the left one place, then add the lsb. But since we don't have an instruction to shift the contents of a register to the left one bit, we take advantage of the fact that shifting a binary value to the left one bit is the same as multiplying it by two, which is the same as doubling it, which is the same as adding it to itself. Since we do have an instruction that can add two numbers, we simply have to make both numbers the same and we will have performed a left-shift-by-one operation. We can then increment the result, if needed, to add in the ;lsb.

Let's do this by way of example. Suppose we want to create a bit mask in the D register that has the upper bit and the lower bit set (all others are clear). The decimal value for this pattern is 32769.

// Load 1000 0000 0000 0001 (0x8001) into D

@16384 // 0100 0000 0000 0000 (0x4000)

D=A // Load lower 15-bitsBranch if clear

D=D+A // Left shift by 1

D=D+1 // Add in lsb (not needed, or use D=D, if lsb is 0)

So now we have a means of loading a bit mask that we can compute at the time we write our program. But what about bit masks that we need to generate at run time? This is actually little different than how it must be done in many high level languages, such as C. Let's assume that the number of the bit (0 to 15) is stored in the D register and we want to replace it with a mask that has that bit set To accomplish this, we start with a mask that has a the lsb set and then left shift it by the amount in the D register.

// Create a mask with bit n set where n is in D register

@mask

M=1 // Initial mask with b0 set

@bit

M=D // Store bit position in memory

@TEST

0; JMP // Jump to test at end of loop

(LOOP)

@mask

D=M // get copy of mask in D and M

MD=D+M // Double mask to left shift by 1

@bit

MD=M-1 // decrement bit position (i.e., shift amount)

(TEST)

D; JGT // Stay in loop if shift amount is still >0

@mask

D=M // put finished mask in D

@END

(END)

0; JMP // Infinite loop to stall simulator

We've seen how we can affect a left shift by adding the contents of a register by itself. But how about a left rotate? In this case, the only difference is that we want to capture the value that is going to roll off the msb and bring that into the lsb once the left shift is complete. There are a number of ways to do this, but one way to initialize another register to 0 on 1 depending on whether the msb of the value to be rotated is 0 or 1. We can detect the status of the msb by noting that it is the sign bit. Hence something like

// Rotate a value in 'data' left by 1

@lsb

M=0 // Initialize lsb mask to 0

@data

D=M // Load data value into memory

@SKIP

D; JGE // Skip the next step if msb is zero

@lsb

M=1 // Set lsb to 1 to match msb

(SKIP)

@data

MD=D+M // Shift data left by 1 (D still has data in it)

@lsb

D=D+M // Add in the lsb

@data

M=D // Store rotated data back in memory

@END

(END)

0; JMP // Infinite loop to stall simulator

So now that we can rotate left by one, we can rotate left by an arbitrary amount by putting the above code into a loop the cycles for the correct number of times. But if we use that approach, then we would have to include a copy of the above code, modified so as not to stomp on memory used by other parts of the code, in each place were we want to rotate a value by one place. So let's see if we can't make it so that we can call this code multiple times from different places in memory.

Our first approach will be to reserve a portion of memory as a "scratchpad" area in which we are free to overwrite data because it is understood that the code that called this routine is responsible for saving elsewhere anything in the scratch pad that can't be lost. For simplicity, let's say that our scratchpad consists of the lowest eight memory locations, R0 through R7. We will see that this approach has some issues as our code gets more complex, but it's a good place to start.

Since our current routine only has a single argument, namely the value to be rotated left by one place, and a single return value, namely the rotated result, we can put the input value in the D register, call the routine, and expect the routine to place the result in the D register before returning. This approach won't hold up once we have more than a single argument or return value, but for now we will go down this road for simplicity.

The next issue that we have to address is how we transfer control to our routine and, more importantly, how we transfer control back. The first task is very straightforward because we know the address at which the routine is located since we can place a suitable label there.; this works because the routine is located at a single, fixed location in memory. But if our routine can be called from more than one place in the code, we do not have a fixed location to jump back to. We will deal with this by requiring that the calling code place the address that the function should jump back to, known as the return address, in the first memory cell in the scratchpad. So our skeleton code will look something like:

// Main code that calls the routine FRED

... // Get desired argument value in D

@RET_A

D=A // Put Return Address in D

@R0

M=D // Store Return Address stored in R0

@FRED

0; JMP // Call FRED

(RET_A)

... // Return value is in D

@END

(END)

0; JMP // Infinite loop to stall simulator

Now we just have to make some modifications to our routine to be consistent with using the scratchpad.

// ==========================================================

// ROTL1

// ----------------------------------------------------------

// Description:

// Rotates the value in the D register left 1 bit

// ----------------------------------------------------------

// Arguments:

// D - value to be rotated

// R0 - Return address

// Return values:

// D - rotated value

// Local variables:

// R1 - data being worked on

// R2 - lsb value to be added to shift to finish rotate

// ----------------------------------------------------------

(ROTL1)

@R2

M=0 // Initialize lsb mask to 0

@R1

D=M // Load data value into memory

@ROTL1_SKIP

D; JGE // Skip the next step if msb is zero

@R2

M=1 // Set lsb to 1 to match msb

(ROTL1_SKIP)

@R1

MD=D+M // Shift data left by 1 (D still has data in it)

@R2

D=D+M // Add in the lsb

@R0 // R0 is NOT the return address

A=M // It is were the return address is stored

0; JMP // Return to calling code

// ==========================================================

Now let's flesh out the "main" code to rotate a value more than one position. We'll assume that the top level code is using the scratchpad to store its data, which will mean that we will have to save those values to higher memory before calling our routine and then restore them afterwards.

// ==========================================================

// MAIN

// ----------------------------------------------------------

// Description:

// Main code that uses the ROTL1 to rotate a value left

// by an arbitrary amount

// ----------------------------------------------------------

// Arguments:

// R0 - value to be rotated

// R1 - amount to rotate by

// Return values:

// D - rotated value

// ----------------------------------------------------------

(MAIN)

// Set up the memory with some test data

@100 // Value to be rotated

D=A

@R0 // R0 holds the value to be rotated

M=D

@7 // Number of positions to rotate the value

D=A

@R1 // R1 holds the rotate amount

M=D

// Loop to rotate value in R1 the amount in R0

(MAIN_TEST)

@R1

D=M

@MAIN_LOOP_END

D;JLE // Exit loop when shift amount is <=0

// --------------------------------------------------

// Use ROTL1 to rotate value in D left 1 bit

// --------------------------------------------------

// Transfer scratchpad registers to safe area

@R0 // Transfer R0 to R8

D=M

@R8

M=D

@R1 // Transfer R1 to R9

D=M

@R9

M=D

// CALL ROTL1

@RET_A

D=A // Put Return Address in D

@R0

M=D // Store Return Address stored in R0

@R8

D=M // Get desired value in D (from safe store)

@ROTL1

0; JMP // Call ROTL1

(RET_A)

// Plase return value into scratchpad

@R0 // Return value in D goes into R0

M=D

// Restore rest of scratchpad

@R9 // Transfer R9 to R1

D=M

@R1

M=D

// --------------------------------------------------

@R1

MD=M-1 // Decrement rotate amount

@MAIN_TEST

0; JMP

(MAIN_LOOP_END)

@R1

D=M // Final rotated amount is in D

@END

(END)

0; JMP // Infinite loop to stall simulator

Using a scratchpad this way is common in small assembly language programs of the type you find in many microcontrollers. It works very will when the subroutines all perform a specific task and there is no need for one subroutine to call another subroutine. Because many small embedded microcontrollers use cycle counting for timing events, this situation is more common than it might appear at first glance because having nested subroutines can quickly make accurate cycle counting virtually impossible. But as programs grow larger -- and even today's mainstream embedded microcontrollers are sophisticated enough to permit pretty extensive code -- and especially as anything resembling an operating system enters the picture, it becomes necessary to permit subroutines to call each other or even to call themselves recursively. The use of a scratchpad to manage nested calls becomes a nightmare almost immediately and it simply won't work in the case of recursion.

The basic problem is that each subroutine needs a "safe" area to store whatever values on the scratchpad it needs before calling another function. If only one subroutine is called at a time, then the solution is simple: You simply have the main body of the code store its data someplace other than the scratchpad and each of the subroutines only use the scratchpad. Since the calls aren't nested, each subroutine knows that it can do anything with the scratchpad it wants (as long as it doesn't mess up the return address). The only reasonable way around this is to give each function it's own dedicated area of memory -- it's essence it's very own scratchpad -- but this can very quickly run a small embedded microcontroller out of memory. Furthermore, it still doesn't permit recursion since any time that a function can be called while another instance of that same function has already been called and is still active, each instance of the subroutine needs its own scratchpad.

The solution is a remarkably clever and simple one. We will give each instance of each subroutine its very own scratchpad, but we will do so automatically in such a way that the subroutine only gets to use it until it finishes execution and then we will make that memory available for other subroutines to use as their scratchpad. We will do this by using a "stack" and each time a subroutine is called, whether by the main body of the code or another subroutine or even itself, a new scratchpad will be added to the stack and a pointer to the start of that scratchpad will be made available to the subroutine. All of the subroutine's use of the scratchpad will be to memory locations that are relative to the stack pointer it was provided.