Assembly

Basics Primer

Machine Code

Machine code is the native language of a CPU, consisting of numerical instructions that are directly executed by the hardware. Specific to each type of CPU architecture, it represents the absolute lowest level programming language. Due to its complexity, programmers typically use higher-level languages or Assembly language, which are then translated into machine code for execution.

What is Assembly Language?

Assembly languages are low-level programming languages uniquely designed for specific types of processors. They serve as an intermediary between high-level programming languages and machine code. Assembly language employs mnemonic codes, which are symbolic representations of machine instructions, allowing programmers to write in a format that's more comprehensible(to humans) than binary or hexadecimal machine code.

Despite its efficiency and control, assembly language is also characterized by its complexity and lack of portability. Code written in assembly language for one type of CPU architecture generally cannot be executed on another, as different architectures often have different instruction sets.

What is an Assembler?

An assembler is a software that translates assembly language into machine code. This conversion is necessary because while assembly language is more readable for humans, with its use of mnemonics and labels, computers only understand instructions in a numerical(binary) format.

The main role of an assembler is to parse the assembly language code and convert it into a binary format that can be executed by the computer's hardware. This process involves translating mnemonic operation codes into their numerical equivalents and resolving symbolic names for memory locations into actual addresses.

The operations of an assembler can typically be divided into two phases:

The first phase involves analyzing the assembly language code and preparing for the translation. This includes processing directives, macros, and defining symbols or labels used in the code.
The second phase is the actual conversion of the processed code into machine language.

So far in the examples I’ve mentioned in the microcode ROM post, I just assumed that programs were conveniently loaded in RAM. Though this can be done manually, the more practical approach is through the use of an assembler. As a programmer, you write your assembly code, and the assembler takes care of generating the machine code data, placing it at the appropriate memory locations for execution. For the assembler to generate code tailored for your CPU, it must be configured with your specific instruction set.

While programming your own assembler can offer deeper insights into computer architecture, I found using CustomASM to be extremely effective for my needs.

CustomASM

Customasm is an assembler designed by hlorenzi. It works with custom, user-defined instruction sets. I use it to assemble my source files.

The assembler is available in both an online and a local version. The online version is convenient as it can be used directly in a web browser without any installation, but it’s generally less flexible compared to the local version; which offers more advanced features and customization options.

Customasm is a Rust-based program, therefore, to run it locally, you need to have Rust installed on your machine.

Installing Rust

Note: The installer automatically installs Cargo, which is a package manager for Rust; just like pip is for Python.

On Windows

1- Go to the Getting Started page and Download rustup-init.exe.

After running the installer, you’ll see a brief breakdown of what it’d do; press Enter.

2- Restart your computer.

To check if Rust is installed, you can follow these steps:

Open the Command Line:
- Press Win + R to open the Run dialog.
- Type cmd and press Enter to open the Command Prompt.
Check Rust Version:
In the Command Prompt, type the following command and press Enter:
rustc --version
This command checks the version of the Rust compiler (rustc) installed on your system.
Check Cargo Version:
Additionally, you can check if Cargo, Rust’s package manager, is installed by typing:
cargo --version

Press Enter to execute the command.

If Rust and Cargo are correctly installed, these commands will return the version numbers of rustc and cargo, respectively.

On Linux and MacOS

1- Go to the Getting Started page and copy the curl command-line installer.

2- Execute it in your terminal.

Note to Linux users: I initially got an error running the installer because my version of curl was installed using snap.

So first I uninstalled curl with sudo snap remove curl and reinstalled it with sudo apt install curl

After running the installer, you’ll see a brief breakdown of what it’d do; press Enter

3- Restart your shell.

4- After installation, source your environment so that you can use the Rust tools from the current shell. This can be done using:

source $HOME/.cargo/env

You can then check the version of rust and cargo with rustc --version and cargo --version respectively.

Installing CustomASM

For in-depth references go to the official Customasm GitHub repository.

1- Have Rust Installed on your machine

2- Run cargo install customasm, then the customasm application should automatically become available in your command-line environment.

Writing Code With CustomASM

Hopefully; All the steps mentioned in the micro code ROM post will make sense by the end of this one.

#Ruledef

Every CustomASM code starts with a #ruledef directive that defines how assembly instructions are translated into machine code.

Within the #ruledef block, each line specifies a rule for a particular assembly instruction or a set of instructions. The syntax usually includes the instruction’s mnemonic, such as load, add, or sub, possibly accompanied by operands. It then maps these to the corresponding machine code format.

The code I use to generate my instructions (microcode_generator.py) includes a generate_ruledef() function, which creates a ruledef.asm file containing definitions for each instruction supported by my CPU.

Let’s consider a program that adds two numbers and stores the result in a register:

#ruledef ; The involved instructions are defined as follows:

MOV  $A,  {im: i8} => 0x03 @ im
ADD  $A,  {im: i8} => 0x04 @ im
HLT   => 0xff

add_2_plus_8:
    MOV $A, 2  ; Load 2 into Register A
    MOV $B, 8  ; Load 8 into Register B
    ADD $A, $B ; Add the contents of Register B to Register A
               ; The result (10) is stored in Register A
    HLT        ; Halt the program execution

A “ruledef.asm” file in the same directory as the code, containing the instructions mnemonics can be imported as #include "ruledef.asm” instead of defining “ruledef” in the main program.

Recall that none of my instructions contains operands. The operands are fetched from memory, therefore the assembler uses two memory addresses to write the MOV $A, {im: i8} instruction. The first address is for the MOV $A instruction itself; and the second is for the 8-bit immediate operand {im: i8}. Same for ADD $A, {im: i8}.

HLT does not require an operand so it is written at one memory location.

After running the program, the assembler generates the following:

outp| addr | data (base 2)

0 |    0 |                   ; add2_plus_8:
0 |    0 | 00000111 00000010 ; MOV $A, 2
0 |    2 | 00001110 00001000 ; MOV $B, 8
0 |    4 | 00100100          ; ADD $A, $B
0 |    5 | 11111111          ; HLT

Let’s say that the data above was loaded into RAM at the mentioned addresses(From address 0 to 5), and that the Program Counter is currently pointing to address 0x0000.

Note:

t denotes the micro-steps ticks/time.

Every instruction starts with a fetch cycle: _PCE | ME | IR_in | PCC.

; Load 2 into Register A
MOV $A, 2

At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content(0x0000) onto the memory bus.
b - The RAM points to the content at 0x0000(0b00000111) and outputs it to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR gets loaded with the content of the RAM at 0x0000 ==> 0b00000111.
d - The PC increments to 0x0001.
At t_1: Now, IR contains 0b00000111(MOV $A, #).
Step 2($t_1$) of MOV $A, # is:_PCE | ME | write_to_reg[A] | PCC.
From the falling edge of the clock to the end of the LOW phase: a - The PC outputs its current content(0x0001) onto the memory bus. b - The RAM points to the content at 0x0001(0b00000010) and outputs it to the data bus.
From the rising edge of the clock to the end of the HIGH phase: c - Register A gets loaded with the content of the RAM at 0x0001 ==> 0b00000010. d - The PC increments to 0x0002.
At t_2:
IR still contains 0b00000111 (MOV $A, #).
Step 3($t_2$) of MOV $A, # is the last step _ScR.
At the falling edge of the clock: The step counter resets.

Register A now contains 2 (0b00000010).

; Load 8 into Register B
MOV $B, 8 

At t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content(0x0002) onto the memory bus.
b - The RAM points to the content at 0x0002(0b00001110) and outputs it to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR gets loaded with the content of the RAM at 0x0002 ==> 0b00001110.
d - The PC increments to 0x0003.
At t_1: Now IR contains 0b00001110 (MOV $B, #).
Step 2($t_1$) of MOV $B, # is:_PCE | ME | write_to_reg[B] | PCC.
From the falling edge of the clock to the end of the LOW phase: a - The PC outputs its current content(0x0003) onto the memory bus. b - The RAM points to the content at 0x0003(0b00001000) and outputs it to the data bus.
From the rising edge of the clock to the end of the HIGH phase: c - Register B gets loaded with the content of the RAM at 0x0003 ==> 0b00001000. d - The PC increments to 0x0004.
At t_2:
IR still contains 0b00001110(MOV $B, #).
Step 3($t_2$) of MOV $B, # is the last step _ScR.
At the falling edge of the clock: The step counter resets.

Register B now contains 8 (0b00001000).

; Add the content of Register B to the content of
; Register A and store the result(10) in Register A.
ADD $A, $B

at t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content(0x0004) onto the memory bus.
b - The RAM points to the content at 0x0004(0b00100100) and outputs it to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR gets loaded with the content of the RAM at 0x0004 ==> 0b00100100.
d - The PC increments to 0x0005.
At t_1: Now IR contains 0b00100100 (ADD $A, $B).
Step 2($t_1$) of ADD $A, $B is: enable_reg[A] | ALU_MIRROR_BUS | ZW.
From the falling edge of the clock to the end of the LOW phase: a - Register A outputs its current content(0b00000010) onto the data bus.
From the rising edge of the clock to the end of the HIGH phase: b- The accumulator is loaded with the content of the Register A (0b00000010).
At t_2: IR still contains 0b00100100 (ADD $A, $B).
Step 3($t_2$) of ADD $A, $B is: enable_reg[B] | ALU_ADD |_FW | ZW.
From the falling edge of the clock to the end of the LOW phase:
a - Register B outputs its current content(0b00001000) onto the data bus.
From the rising edge of the clock to the end of the HIGH phase:
b- The accumulator adds its content(0b00000010) to the content of Register B(0b00001000).
c - The flags generated by the operation are latched into the Flags Register.
At t_3 :
IR still contains 0b00100100 (ADD $A, $B).
Step 4($t_3$) of ADD $A, $B is: write_to_reg[A] | ZE.
From the falling edge of the clock to the end of the LOW phase:
a - The accumulator outputs its current content(0b00000010 + 0b00001000) onto the data bus.
From the rising edge of the clock to the end of the HIGH phase:
b- Register A is loaded with the content of the accumulator(0b00001010)

Register A now contains 10 (0b00001010).

At t_4 :
IR still contains 0b00100100 (ADD $A, $B).
Step 5($t_4$) of ADD $A, $B is the last step _ScR.

At the falling edge of the clock: The step counter resets.

; Halt the program execution
HLT

at t_0:
From the falling edge of the clock to the end of the LOW phase:
a - The PC outputs its current content(0x0005) onto the memory bus.
b - The RAM points to the content at 0x0005(0b11111111) and outputs it to the data bus.
From the rising edge of the clock to the end of the HIGH phase:
c - IR gets loaded with the content of the RAM at 0x0005 ==> 0b11111111.
d - The PC increments to 0x0006.
At t_1: Now IR contains 0b11111111(HLT).
Step 2($t_1$) of HLT is:HLT(The halt control line).
From the falling edge of the clock to the end of the LOW phase:
a - HLT is asserted.
At the rising edge of the clock: The clock gets halted.

«««««««««« Previous Post: Control Unit & Instruction Register

Microcode Generator »»»»»»»»»»

Updated: May 21, 2024