Assembly
Basics Primer
Machine Code
Machine code is the native instruction format of a CPU. It is made of numerical instruction values that the hardware can execute directly. Each CPU architecture has its own machine-code format, because different CPUs can have different instruction sets, register layouts, operand formats, and control behavior.
Because raw machine code is difficult for humans to read and write, programmers usually use a more readable language first. High-level languages such as Python or C hide most CPU-specific details and let the programmer describe the program more abstractly. Assembly, on the other hand, is still human-readable, but it stays much closer to the CPU’s actual instructions. In either case, the program must eventually be translated into machine code before the CPU can execute it.
What is Assembly Language?
Assembly language is a low-level programming language designed around a specific CPU or family of CPUs. It represents machine instructions with readable mnemonics instead of raw binary or hexadecimal values. For example, a programmer can write an instruction such as ADD, MOV, or JMP instead of writing the numeric opcode directly.
Assembly language gives the programmer close control over the CPU instruction flow. The tradeoff is that assembly code is less portable than high-level code. Assembly written for one CPU architecture usually cannot run directly on another architecture, because the instruction set and machine-code encoding are different.
What is an Assembler?
An assembler is a program that translates assembly language into machine code. It converts mnemonics into numerical opcodes, replaces labels with actual addresses, and produces the binary values that the CPU can execute. For example, a programmer may write a label such as LOOP instead of manually calculating the memory address of that instruction. The assembler resolves that label during translation and inserts the correct address into the final machine code.
Assemblers often work in two main passes.
During the first pass, the assembler reads the assembly source and builds the information it needs for translation, typically a symbol table that records the address of each label.
During the second pass, the assembler uses that information to produce the final machine code. At that point, mnemonics are converted into numerical opcodes and data bytes are placed at their intended locations in the output.
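The two passes can be sketched in a few lines of Python. This is a toy illustration only, not how CustomASM works internally. The opcodes 0x03, 0x02, and 0xFF match the rules shown later in this article; the JMP opcode (0x40), the big-endian address order, and the `assemble` and `parse` helpers are assumptions made up for the example.

```python
# Toy two-pass assembler. JMP (0x40) is a hypothetical opcode; the other
# opcodes follow the ruledef shown later in this article.
OPCODES = {
    "MOV $A,": (0x03, "imm8"),    # opcode + 8-bit immediate
    "JMP":     (0x40, "addr16"),  # hypothetical opcode + 16-bit address
    "CLC":     (0x02, None),
    "HLT":     (0xFF, None),
}
SIZES = {None: 1, "imm8": 2, "addr16": 3}


def parse(line):
    """Split an instruction line into (mnemonic, operand-or-None)."""
    for mnemonic in OPCODES:
        if line.startswith(mnemonic):
            return mnemonic, line[len(mnemonic):].strip() or None
    raise ValueError(f"unknown instruction: {line!r}")


def assemble(source):
    # Drop comments and blank lines.
    lines = [l.split(";")[0].strip() for l in source.splitlines()]
    lines = [l for l in lines if l]

    # Pass 1: track the current address and record where each label lands.
    symbols, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            mnemonic, _ = parse(line)
            address += SIZES[OPCODES[mnemonic][1]]

    # Pass 2: emit opcodes and operands, resolving labels via the table.
    output = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, operand = parse(line)
        opcode, kind = OPCODES[mnemonic]
        output.append(opcode)
        if kind == "imm8":
            output.append(int(operand, 0) & 0xFF)
        elif kind == "addr16":
            # Byte order is an assumption for this sketch.
            output += symbols[operand].to_bytes(2, "big")
    return bytes(output), symbols


PROGRAM = """
    MOV $A, 2
    JMP END     ; forward reference: END is not known yet on first read
    CLC
END:
    HLT
"""
code, symbols = assemble(PROGRAM)
print(symbols, code.hex(" "))
```

The forward jump to END is the reason a single pass is not enough here: the address of END is only known after every instruction before it has been sized.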
In the examples from the Microcode ROM post, I assumed that the program bytes were conveniently loaded into RAM. That can be done manually, but it quickly becomes impractical as programs grow. A more practical approach is to write the program in assembly language and let an assembler produce the corresponding machine-code bytes.
For the assembler to produce code for a specific CPU, it must know that CPU’s instruction set and encoding rules. In other words, it needs a description of how each assembly instruction maps to the binary values that the CPU expects.
While programming your own assembler can offer deeper insights into computer architecture, I found using CustomASM to be extremely effective for my needs.
CustomASM
CustomASM is an assembler designed by hlorenzi. It works with custom, user-defined instruction sets. I use it to assemble my source files.
The assembler is available in both an online and a local version. The online version is convenient because it runs directly in a web browser without any installation, but it is generally less flexible than the local version, which offers more advanced features and customization options.
CustomASM is written in Rust, so to build and install it locally with Cargo, you need to have Rust installed on your machine.
Installing Rust
Note: The installer automatically installs Cargo, the package manager for Rust, much like pip is for Python.
On Windows
1- Go to the Getting Started page and download rustup-init.exe.
After running the installer, you’ll see a brief breakdown of what it will do; press Enter.
2- Restart your computer.
To check if Rust is installed, you can follow these steps:
- Open the Command Line:
  - Press Win + R to open the Run dialog.
  - Type cmd and press Enter to open the Command Prompt.
- Check Rust Version:
  In the Command Prompt, type the following command and press Enter:
  rustc --version
  This command checks the version of the Rust compiler (rustc) installed on your system.
- Check Cargo Version:
  Additionally, you can check if Cargo, Rust’s package manager, is installed by typing:
  cargo --version
  Press Enter to execute the command.
If Rust and Cargo are correctly installed, these commands will return the version numbers of rustc and cargo, respectively.
On Linux and macOS
1- Go to the Getting Started page and copy the curl command-line installer.
2- Execute it in your terminal.
Note to Linux users: I initially got an error running the installer because my version of curl was installed using snap.
So I first uninstalled curl with sudo snap remove curl and then reinstalled it with sudo apt install curl.
After running the installer, you’ll see a brief breakdown of what it will do; press Enter.
3- Restart your shell, or source the Cargo environment file so that you can use the Rust tools from the current shell:
source $HOME/.cargo/env
You can then check the versions of Rust and Cargo with rustc --version and cargo --version, respectively.
Installing CustomASM
For in-depth references, go to the official CustomASM GitHub repository.
1- Have Rust installed on your machine.
2- Run cargo install customasm. The customasm application should then be available in your command-line environment.
Writing Assembly With CustomASM
The micro-step examples from the Microcode ROM post should make more sense by the end of this article. In that post, I showed how the CPU executes instruction bytes after they are already present in memory. Here, the focus is on how assembly text becomes those instruction bytes in the first place.
#Ruledef
CustomASM needs a rule definition that tells it how assembly instructions map to machine code. That rule definition is written inside a #ruledef block.
Each line inside the #ruledef block defines how one instruction, or one family of similar instructions, should be translated. A rule usually starts with the assembly syntax, such as MOV $A, {im: i8} or ADD $A, $B, then maps that syntax to the byte or bytes that should appear in the assembled output.
The code I use to generate my instruction definitions, microcode_generator.py, includes a generate_ruledef() function. That function creates a ruledef.asm file with the encoding rules for the instructions supported by my CPU.
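The real microcode_generator.py is not reproduced here, so the following Python sketch only shows the general shape such a generator could take. The `INSTRUCTIONS` table and the `generate_ruledef_text` function are hypothetical names invented for this example; the opcode values are the ones used in the rules below.

```python
# Hypothetical instruction table: (assembly syntax, opcode, has 8-bit immediate).
INSTRUCTIONS = [
    ("MOV $A, {im: i8}", 0x03, True),
    ("MOV $B, {im: i8}", 0x0A, True),
    ("CLC",              0x02, False),
    ("ADD $A, $B",       0x32, False),
    ("HLT",              0xFF, False),
]


def generate_ruledef_text(instructions):
    """Return the text of a #ruledef block for the given instructions."""
    lines = ["#ruledef", "{"]
    for syntax, opcode, has_immediate in instructions:
        encoding = f"0x{opcode:02X}"
        if has_immediate:
            # '@' concatenates the operand bits after the opcode.
            encoding += " @ im"
        lines.append(f"    {syntax} => {encoding}")
    lines.append("}")
    return "\n".join(lines) + "\n"


text = generate_ruledef_text(INSTRUCTIONS)
print(text)
```

Writing the returned text to ruledef.asm would then be a single call such as `pathlib.Path("ruledef.asm").write_text(text)`. Keeping the opcodes in one table means the microcode and the assembler rules cannot drift apart.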
Let’s consider a program that adds two numbers and stores the result in a register:
; The involved instructions are defined as follows:
#ruledef
{
MOV $A, {im: i8} => 0x03 @ im
MOV $B, {im: i8} => 0x0A @ im
CLC => 0x02
ADD $A, $B => 0x32
HLT => 0xFF
}
#addr 0x0000
add_2_plus_8:
MOV $A, 2 ; Load 2 into Register A
MOV $B, 8 ; Load 8 into Register B
CLC ; Clear carry before ordinary addition
ADD $A, $B ; Add the contents of Register B to Register A
; The result (10) is stored in Register A
HLT ; Halt the program execution
A ruledef.asm file in the same directory as the source program can be included with #include "ruledef.asm". That keeps the instruction definitions separate from the program itself.
In my CPU, the instruction byte itself does not contain the operand value. For immediate instructions such as MOV $A, {im: i8}, CustomASM writes the opcode first, then writes the 8-bit immediate value into the next memory location. Address-based instructions follow the same idea, but the operand uses two bytes because the CPU has a 16-bit address space.
Instructions such as HLT, CLC, and ADD $A, $B do not need extra operand bytes, so each one occupies one memory location.
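The memory layout implied by these rules can be checked with a short Python sketch. It hard-codes the encodings from the #ruledef block above rather than parsing assembly text, and the `layout` helper is a name chosen for this example.

```python
# Each entry: (source text, encoded bytes). Encodings follow the #ruledef
# above: opcode first, then the 8-bit immediate if the instruction has one.
PROGRAM = [
    ("MOV $A, 2",  bytes([0x03, 2])),
    ("MOV $B, 8",  bytes([0x0A, 8])),
    ("CLC",        bytes([0x02])),
    ("ADD $A, $B", bytes([0x32])),
    ("HLT",        bytes([0xFF])),
]


def layout(program, origin=0x0000):
    """Return (address, source, encoded) rows, advancing by instruction size."""
    rows, address = [], origin
    for source, encoded in program:
        rows.append((address, source, encoded))
        address += len(encoded)
    return rows


rows = layout(PROGRAM)
for address, source, encoded in rows:
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"0x{address:04X}  {bits:<17}  ; {source}")
```

The two MOV instructions advance the address by two, while CLC, ADD, and HLT advance it by one, so the program occupies seven bytes in total.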
After the program is assembled, the output is:
outp| addr | data (base 2)
0:0 | 0 | ; add_2_plus_8:
0:0 | 0 | 00000011 00000010 ; MOV $A, 2
2:0 | 2 | 00001010 00001000 ; MOV $B, 8
4:0 | 4 | 00000010 ; CLC
5:0 | 5 | 00110010 ; ADD $A, $B
6:0 | 6 | 11111111 ; HLT
Assume the generated bytes have been loaded into RAM from 0x0000 through 0x0006, and the Program Counter currently points to 0x0000.
Note: t represents the current micro-step.
Every instruction starts with the same fetch step:
_PCE | ME | IR_in | PCC
That fetch step places the Program Counter on the memory/system bus, reads the selected byte from memory, loads that byte into the Instruction Register, and increments the Program Counter.
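As a rough model, the fetch step can be simulated in Python by treating the control word as a set of signal names. The signal names come from this article, but the `CPU` class, the `micro_step` function, and the exact split of work between the LOW and HIGH clock phases are simplifications assumed for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CPU:
    ram: bytearray = field(default_factory=lambda: bytearray(16))
    pc: int = 0
    ir: int = 0


FETCH = {"_PCE", "ME", "IR_in", "PCC"}


def micro_step(cpu, signals):
    """Apply one control word: drive the buses, then latch and count."""
    # LOW phase: the PC drives the address bus, memory drives the data bus.
    address = cpu.pc if "_PCE" in signals else None
    data = cpu.ram[address] if "ME" in signals and address is not None else None
    # HIGH phase: the Instruction Register latches, the PC increments.
    if "IR_in" in signals and data is not None:
        cpu.ir = data
    if "PCC" in signals:
        cpu.pc = (cpu.pc + 1) & 0xFFFF
    return data


cpu = CPU()
cpu.ram[0:7] = bytes([0x03, 0x02, 0x0A, 0x08, 0x02, 0x32, 0xFF])
micro_step(cpu, FETCH)
print(f"IR={cpu.ir:#010b} PC={cpu.pc:#06x}")  # IR=0b00000011 PC=0x0001
```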
; Load 2 into Register A
MOV $A, 2
At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0000) onto the memory bus.
b - The RAM places the byte at 0x0000 (0b00000011) on the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR gets loaded with the content of the RAM at 0x0000 ==> 0b00000011.
d - The PC increments to 0x0001.
At t_1:
IR now contains 0b00000011 (MOV $A, #).
Step 2 ($t_1$) of MOV $A, # is: _PCE | ME | write_to_reg[A] | PCC.
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0001) onto the memory bus.
b - The RAM places the byte at 0x0001 (0b00000010) on the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - Register A gets loaded with the content of the RAM at 0x0001 ==> 0b00000010.
d - The PC increments to 0x0002.
At t_2:
IR still contains 0b00000011 (MOV $A, #).
Step 3 ($t_2$) of MOV $A, # is the last step, _ScR.
At the falling edge of the clock:
The step counter resets.
Register A now contains 2 (0b00000010).
; Load 8 into Register B
MOV $B, 8
At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0002) onto the memory bus.
b - RAM outputs the content at 0x0002 (0b00001010) to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR is loaded with 0b00001010.
d - The PC increments to 0x0003.
At t_1:
IR now contains 0b00001010 (MOV $B, #).
Step 2 ($t_1$) of MOV $B, # is: _PCE | ME | write_to_reg[B] | PCC.
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0003) onto the memory bus.
b - RAM outputs the content at 0x0003 (0b00001000) to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - Register B is loaded with 0b00001000.
d - The PC increments to 0x0004.
At t_2:
IR still contains 0b00001010 (MOV $B, #).
Step 3 ($t_2$) of MOV $B, # is the last step, _ScR.
At the falling edge of the clock:
The step counter resets.
Register B now contains 8 (0b00001000).
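The three micro-steps of the two MOV instructions can be expressed as a small table-driven loop in Python. The control words are the ones listed above; the `MICROCODE` dictionary layout and the `run_instruction` function are assumptions for this sketch and do not mirror the actual microcode ROM format.

```python
FETCH = {"_PCE", "ME", "IR_in", "PCC"}

# Control words per opcode, indexed by micro-step (t_0, t_1, t_2).
MICROCODE = {
    0x03: [FETCH, {"_PCE", "ME", "write_to_reg[A]", "PCC"}, {"_ScR"}],
    0x0A: [FETCH, {"_PCE", "ME", "write_to_reg[B]", "PCC"}, {"_ScR"}],
}


def run_instruction(state, ram):
    """Execute micro-steps until _ScR resets the step counter."""
    step = 0
    while True:
        # t_0 is always the fetch; later steps are looked up through the IR.
        signals = FETCH if step == 0 else MICROCODE[state["IR"]][step]
        # With _PCE and ME asserted, RAM outputs the byte addressed by the PC.
        data = ram[state["PC"]] if {"_PCE", "ME"} <= signals else None
        if "IR_in" in signals:
            state["IR"] = data
        for reg in ("A", "B"):
            if f"write_to_reg[{reg}]" in signals:
                state[reg] = data
        if "PCC" in signals:
            state["PC"] += 1
        if "_ScR" in signals:
            return step + 1  # number of micro-steps used
        step += 1


ram = bytes([0x03, 0x02, 0x0A, 0x08])
state = {"PC": 0, "IR": 0, "A": 0, "B": 0}
steps_a = run_instruction(state, ram)  # MOV $A, 2
steps_b = run_instruction(state, ram)  # MOV $B, 8
print(state, steps_a, steps_b)
```

The only difference between the two instructions is which register's write signal appears in the t_1 control word; the fetch and the step-counter reset are shared.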
; Clear carry before ordinary addition
CLC
At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0004) onto the memory bus.
b - RAM outputs the content at 0x0004 (0b00000010) to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR is loaded with 0b00000010.
d - The PC increments to 0x0005.
At t_1:
IR now contains 0b00000010 (CLC).
The control unit asserts the signal that clears the carry flag. No memory operand is needed, so the instruction only changes the flag state.
The carry flag is cleared.
At t_2:
The step counter resets.
; Add the content of Register B to the content of
; Register A and store the result(10) in Register A.
ADD $A, $B
At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0005) onto the memory bus.
b - RAM outputs the content at 0x0005 (0b00110010) to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR is loaded with 0b00110010.
d - The PC increments to 0x0006.
At t_1:
IR now contains 0b00110010 (ADD $A, $B).
Register A outputs its current content (0b00000010) onto the data bus, and the shift-register path is configured to mirror that value.
At t_2:
The shift register outputs the copied value, and the accumulator is loaded with 0b00000010.
At t_3:
Register B outputs its current content (0b00001000) onto the data bus. The ALU adds the accumulator value (0b00000010) and the bus value (0b00001000). The accumulator is loaded with the result, and the flags are updated.
At t_4:
The accumulator outputs 0b00001010 onto the data bus, and Register A is loaded with that value.
At t_5:
The step counter resets.
Register A now contains 10 (0b00001010).
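The data movement in steps t_1 through t_4 can be traced with a few lines of Python. This is a simplified model: the set of flags (carry and zero) is an assumption, the carry flag is not fed back into the sum, and `add_a_b` is a name chosen for this example.

```python
def add_a_b(a, b):
    """Trace ADD $A, $B through its data-path steps on 8-bit values."""
    trace = []

    shift_reg = a              # t_1: A drives the bus, the shift register mirrors it
    trace.append(("t_1", f"bus={a:#010b}"))

    acc = shift_reg            # t_2: the accumulator is loaded from the shift register
    trace.append(("t_2", f"acc={acc:#010b}"))

    total = acc + b            # t_3: B drives the bus, the ALU adds acc + bus
    acc = total & 0xFF         # the accumulator keeps the low 8 bits
    flags = {"carry": total > 0xFF, "zero": acc == 0}  # assumed flag set
    trace.append(("t_3", f"acc={acc:#010b}"))

    a = acc                    # t_4: the accumulator drives the bus, A is loaded
    trace.append(("t_4", f"A={a:#010b}"))
    return a, flags, trace


result, flags, trace = add_a_b(0b00000010, 0b00001000)
for step, note in trace:
    print(step, note)
print(result, flags)
```

Calling `add_a_b(200, 100)` shows the overflow case in this model: the result wraps to 44 and the carry flag is set.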
; Halt the program execution
HLT
At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content (0x0006) onto the memory bus.
b - The RAM places the byte at 0x0006 (0b11111111) on the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR gets loaded with the content of the RAM at 0x0006 ==> 0b11111111.
d - The PC increments to 0x0007.
At t_1:
IR now contains 0b11111111 (HLT).
Step 2 ($t_1$) of HLT is: HLT (the halt control line).
From the falling edge of the clock to the end of the LOW phase:
a - HLT is asserted.
At the next active clock point, the halt signal stops the clock, so the CPU no longer advances to the next instruction.
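To tie the walkthrough together, the whole program can be run through an instruction-level model in Python. This sketch skips the individual micro-steps and clock phases and only models the five opcodes used in this article; the `run` function and its register dictionary are assumptions for illustration, not a description of the hardware.

```python
def run(ram, max_instructions=100):
    """Instruction-level model of the five opcodes used in this article."""
    regs = {"A": 0, "B": 0, "PC": 0, "carry": False, "halted": False}
    for _ in range(max_instructions):
        opcode = ram[regs["PC"]]        # fetch the opcode...
        regs["PC"] += 1                 # ...and advance the PC
        if opcode in (0x03, 0x0A):      # MOV $A / MOV $B, immediate
            target = "A" if opcode == 0x03 else "B"
            regs[target] = ram[regs["PC"]]
            regs["PC"] += 1
        elif opcode == 0x02:            # CLC
            regs["carry"] = False
        elif opcode == 0x32:            # ADD $A, $B
            total = regs["A"] + regs["B"]
            regs["A"] = total & 0xFF
            regs["carry"] = total > 0xFF
        elif opcode == 0xFF:            # HLT: stop advancing
            regs["halted"] = True
            break
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
    return regs


regs = run(bytes([0x03, 0x02, 0x0A, 0x08, 0x02, 0x32, 0xFF]))
print(regs)
```

In this model the PC ends at 0x0007 because the fetch of HLT increments it before the halt takes effect, which matches step d of the HLT walkthrough.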