note: This post relies heavily on the basics explained in Part 1, Part 2, Part 3, Part 4, Part 5, and Part 6.
With an understanding of how ROM, RAM and the CPU work, we can now create programs that use these resources.
As mentioned in the previous post, a CPU should implement circuits that fetch, decode and execute instructions. These circuits take their input and produce their output in binary form; the CPU literally “speaks” binary. To “talk” to the CPU, we would either need to become fluent at expressing complex ideas as 1’s and 0’s, or come up with something more comfortable.
In Part 3 we talked about the interesting connection between the binary and hexadecimal numeral systems: 16 is 2^4, so every hex digit stands for exactly four bits. If you are wondering why the decimal system doesn’t fit here, remember that 10 is not a power of 2; its roots probably lie in the ancient fact that humans have 10 fingers (five on each hand), which we often use for counting. Hex’s roots, on the other hand, lie in the modern need for a more comfortable way to talk to computers.
While hex is a comfortable way to speak binary, it’s still a long way from being a comfortable way to encode instructions for the CPU. Here’s an example of some 4004 CPU instructions:
D3 20 50 81
These instructions make sense to a 4004, but not to a human (unless they spent a few months memorizing the 4004’s instruction set in hex form).
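To make the binary/hex connection concrete, here is a small Python sketch (Python is only a convenient notation here; the helper name hex_to_binary is made up) that expands the four bytes above into bits. Because 16 = 2^4, each hex digit maps to exactly one 4-bit group. On the 4004, the upper nibble of an instruction’s first byte holds the op-code (D for LDM and 8 for ADD in this example) and the lower nibble holds its modifier.

```python
# Each hex digit stands for exactly one 4-bit group (a nibble), because 16 == 2**4.
def hex_to_binary(hex_bytes: str) -> list[str]:
    """Expand space-separated hex bytes into 8-bit binary strings."""
    return [format(int(b, 16), "08b") for b in hex_bytes.split()]


program_hex = "D3 20 50 81"
for h, bits in zip(program_hex.split(), hex_to_binary(program_hex)):
    # Print the two nibbles separately so each hex digit lines up with its 4 bits.
    print(h, bits[:4], bits[4:])
```

For the bytes above it prints one line per byte: D3 1101 0011, 20 0010 0000, 50 0101 0000 and 81 1000 0001.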
Programming languages are the interface through which humans speak to computers, and the first thing we need in order to create one is an assembler. The assembler takes human-readable instructions, also known as assembly code, and converts them into the binary object code that the CPU decodes (usually written as hex digits, which can be seen as a compressed form of binary). How is an assembler created? For the simplest example, think of a typewriter: the programmer types in the human-readable assembly code, and the typewriter prints out hex digits representing the same instructions on paper. This is the concept behind punched cards, and the way punched cards are created (video here). After a set of punched cards has been “programmed”, it is fed into a machine that reads the cards and outputs binary into the system.
Let’s look at the assembly code that produced the hex digits above:
LDM $3
FIM R0R1, $50
ADD R1
This looks a bit more understandable. We have the symbols R0 and R1, which probably represent registers, and an ADD symbol, which probably represents an arithmetic addition instruction that takes the value of R1 as a parameter. Naturally, to fully understand this assembly code we should look into chapter VIII of the MCS-4 (Micro Computer Set) manual. A more readable version can be found at this site.
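At its core, an assembler is a lookup from mnemonics to op-code bits plus some operand parsing. The following Python sketch handles only the handful of instructions used in this post (LDM, ADD, FIM, SRC, WRM); the function names are hypothetical, and a real assembler would also handle labels, the rest of the instruction set and proper error reporting. Fed the three lines above, it reproduces the object code D3 20 50 81.

```python
import re

# Op-code templates for a small subset of 4004 instructions (upper nibble = op-code):
#   LDM d      -> 0xD<d>
#   ADD Rr     -> 0x8<r>
#   FIM Pp, dd -> 0x2<p>, 0x<dd>   (p is the even register of the pair)
#   SRC Pp     -> 0x2<p+1>
#   WRM        -> 0xE0


def parse_imm(tok: str, bits: int) -> int:
    """'$50' -> 0x50; the '$' prefix marks a hex literal in these listings."""
    value = int(tok.lstrip("$"), 16)
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{tok} does not fit in {bits} bits")
    return value


def parse_reg(tok: str) -> int:
    """'R1' -> 1, 'RF' -> 15."""
    m = re.fullmatch(r"R([0-9A-F])", tok.upper())
    if not m:
        raise ValueError(f"bad register: {tok}")
    return int(m.group(1), 16)


def parse_pair(tok: str) -> int:
    """'R0R1' -> 0; a pair must be an even register followed by the next odd one."""
    m = re.fullmatch(r"R([0-9A-F])R([0-9A-F])", tok.upper())
    if not m or int(m.group(1), 16) % 2 or int(m.group(2), 16) != int(m.group(1), 16) + 1:
        raise ValueError(f"bad register pair: {tok}")
    return int(m.group(1), 16)


def assemble(source: str) -> list[int]:
    """Translate assembly lines into object-code bytes (text after ';' is ignored)."""
    out: list[int] = []
    for line in source.splitlines():
        code = line.split(";", 1)[0].strip()
        if not code:
            continue
        parts = code.split(None, 1)
        mnemonic = parts[0].upper()
        args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
        if mnemonic == "LDM":
            out.append(0xD0 | parse_imm(args[0], 4))
        elif mnemonic == "ADD":
            out.append(0x80 | parse_reg(args[0]))
        elif mnemonic == "FIM":
            out += [0x20 | parse_pair(args[0]), parse_imm(args[1], 8)]
        elif mnemonic == "SRC":
            out.append(0x21 | parse_pair(args[0]))
        elif mnemonic == "WRM":
            out.append(0xE0)
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return out


print(" ".join(f"{b:02X}" for b in assemble("LDM $3\nFIM R0R1, $50\nADD R1")))
```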
The assembler turns assembly instructions into object code that the CPU can decode, so object code and the assembly source it comes from are specific to a computer architecture. This means that assembly code written for the 4004 will assemble for the 4004 only. In this post we’ll focus on 4004 programming, but the programming principles carry over easily to other architectures.
Let’s begin by creating a small program that adds 5 and 7 and stores the result in memory outside the CPU. Since CPUs and RAM chips work with registers and communicate over the data bus, we can sketch this pseudo-code:
- Store 5 in first register.
- Store 7 in second register.
- Add the values of the first and second register, and store the result in a third register.
- Send the address at which we want to store the result to a register in the memory chip over the data bus.
- Send the value of the third register over the data bus to the memory chip so it can store it.
Before we can translate this pseudo-code to 4004 assembly code, we must know what CPU resources we have at our disposal. The 4004 has seventeen 4 bit registers:
- 16 general purpose registers named R0-RF (F as in 0xF, or 15 decimal).
- 1 accumulator register.
And there are four 12 bit registers:
- PC (Program Counter) register, which holds the ROM row address of the current instruction to fetch and execute from memory.
- 3 stack registers; their functionality will be explained later in the post.
To create a program that’ll add the values of 5 and 7 and store the result in RAM address 0x10, we will use the following instructions:
LDM – Load data to accumulator – stores a given 4 bit value (0-0xF) to the accumulator register.
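FIM – Fetch immediate (data) from ROM – Loads the 8 bits of data held in the instruction’s second byte (the immediate value stored in ROM right after the op-code) into a register pair (R0R1, or R6R7 for example). The even register receives the upper 4 bits and the odd register the lower 4 bits.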
ADD – Adds the value of a designated register, plus the carry bit, to the accumulator register (the result is stored in the accumulator, and an overflow past 4 bits sets the carry).
SRC – Send register control – The 8 bit value of a register pair is sent to the RAM’s address registers during instruction cycles X2 and X3. The addressing scheme works like this: the two most significant bits of the address select 1 out of 4 chips in the current bank, the next two bits select 1 out of 4 registers in that chip, and the last 4 bits select the offset within the register (0-0xF) to which the 4 bit data is to be written.
WRM – Write accumulator to memory – The 4 bit value of the accumulator is sent to the RAM chip during the X2 cycle. The RAM chip then stores it at the address set by the previous SRC instruction.
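The SRC addressing scheme is just bit slicing. A minimal Python sketch (decode_src_address is a hypothetical helper name) that splits the 8 bit register-pair value into chip, register and offset:

```python
def decode_src_address(value: int) -> tuple[int, int, int]:
    """Split the 8 bit SRC value into (chip, register, offset) for a 4002 RAM access."""
    if not 0 <= value <= 0xFF:
        raise ValueError("SRC sends an 8 bit value")
    chip = (value >> 6) & 0b11      # top 2 bits: 1 of 4 chips in the current bank
    register = (value >> 4) & 0b11  # next 2 bits: 1 of 4 registers in that chip
    offset = value & 0xF            # low 4 bits: character 0-0xF inside the register
    return chip, register, offset


print(decode_src_address(0x10))  # (0, 1, 0)
```

The value 0x10 used in the program that follows therefore selects chip 0, register 1, offset 0.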
Let’s take a look at the code:
LDM $5 ; Load the value 5 in the accumulator register
FIM R0R1, $70 ; Load the value 7 to R0, and 0 to R1
ADD R0 ; Add the value of R0 to the accumulator register
FIM R0R1, $10 ; Load the value 1 to R0, and 0 to R1
; translates to binary 00010000b
SRC R0R1 ; Select RAM chip 0, register 1, offset 0
WRM ; Store accumulator to RAM
It’s good practice to comment as many lines of code as possible. Assembly code usually makes perfect sense to the programmer who wrote it at the time of writing, but trying to understand someone else’s code (or your own after a month or two) without comments can be difficult. While it is easy to understand what an individual line of code does, understanding the combined purpose of all the lines in a program is much harder. It can be compared to understanding what a huge wall painting portrays by looking at it through a microscope, one fraction of a millimeter at a time.
Now that we have our program’s assembly code, we can assemble it, burn the object code to a 4001 ROM chip, place it on a PCB with a 4004 CPU and a 4002 RAM chip, and run our program! Or we could copy-paste our code into an online 4004 assembler, copy the object code into the online 4004 emulator and step through the code instruction by instruction. By the time you’ve reached the end of the program at PC 08, the value 0xC (5 + 7 = 12) will be written to RAM address 0x10 (RAM bank 0, chip 0, register 1, offset 0). If you are wondering how we could write to another RAM bank, that requires the DCL instruction, which controls the CM-RAM pins to activate different banks. This completes the answer to the question left open in the previous post on how RAM chip selection works.
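To see what stepping through the object code does, here is a deliberately tiny Python model of a 4004 that understands only LDM, FIM, SRC, ADD and WRM. It is a sketch, not the online emulator: it ignores RAM banks (DCL), instruction cycles and every other op-code, and it keeps RAM as a dictionary keyed by (chip, register, offset). The 4004’s ADD also adds the carry bit, which is 0 here. The ROM contents are the object code of the program above: D5 20 70 80 20 10 21 E0.

```python
from dataclasses import dataclass, field


@dataclass
class Mini4004:
    """Toy model of a 4004: only LDM, FIM, SRC, ADD and WRM are implemented."""
    rom: list[int]
    regs: list[int] = field(default_factory=lambda: [0] * 16)  # R0-RF, 4 bits each
    acc: int = 0
    carry: int = 0
    pc: int = 0
    src: int = 0  # last 8 bit address sent by SRC
    ram: dict[tuple[int, int, int], int] = field(default_factory=dict)  # (chip, register, offset) -> nibble

    def fetch(self) -> int:
        byte = self.rom[self.pc]
        self.pc += 1
        return byte

    def step(self) -> None:
        op = self.fetch()
        opr, opa = op >> 4, op & 0xF          # op-code nibble, modifier nibble
        if opr == 0xD:                        # LDM d: acc <- d
            self.acc = opa
        elif opr == 0x2 and opa % 2 == 0:     # FIM Pp, dd: second byte -> register pair
            data = self.fetch()
            self.regs[opa], self.regs[opa + 1] = data >> 4, data & 0xF
        elif opr == 0x2:                      # SRC Pp: latch the pair's value as the RAM address
            pair = opa - 1
            self.src = (self.regs[pair] << 4) | self.regs[pair + 1]
        elif opr == 0x8:                      # ADD Rr: acc <- acc + Rr + carry
            total = self.acc + self.regs[opa] + self.carry
            self.acc, self.carry = total & 0xF, total >> 4
        elif op == 0xE0:                      # WRM: acc -> selected RAM character
            address = ((self.src >> 6) & 0b11, (self.src >> 4) & 0b11, self.src & 0xF)
            self.ram[address] = self.acc
        else:
            raise NotImplementedError(f"op-code {op:#04x} is outside this sketch")


cpu = Mini4004(rom=[0xD5, 0x20, 0x70, 0x80, 0x20, 0x10, 0x21, 0xE0])
while cpu.pc < len(cpu.rom):
    cpu.step()
print(hex(cpu.ram[(0, 1, 0)]), cpu.pc)  # 0xc 8
```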
There are more basic features that can be used to make a program’s code more understandable, compact and effective:
Routines – Routines are individual pieces of code that have a certain functionality. A 4004 routine, for example, might perform a series of calculations based on a 32 bit value stored in registers R0-R7 (eight 4 bit registers) and store the 32 bit output value in R8-RF. Let’s assume that this routine spans 2 ROM chips and can now be integrated into different 4004 programs. All a program has to do is set up the 32 bit argument in R0-R7, execute the routine’s code, and use the output value in registers R8-RF.
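Passing a 32 bit value through 4 bit registers means splitting it into 8 nibbles. The Python sketch below assumes that R0 holds the most significant nibble (the post doesn’t specify an order) and uses a hypothetical routine that adds 1, rippling the carry from R7 up to R0 the way multi-nibble arithmetic has to be done on a 4 bit CPU.

```python
def to_registers(value: int) -> list[int]:
    """Split a 32 bit value into 8 nibbles; index 0 plays R0 (most significant, by assumption)."""
    if not 0 <= value < 1 << 32:
        raise ValueError("value must fit in 32 bits")
    return [(value >> (4 * (7 - i))) & 0xF for i in range(8)]


def from_registers(nibbles: list[int]) -> int:
    """Reassemble 8 nibbles (most significant first) into a 32 bit value."""
    value = 0
    for n in nibbles:
        value = (value << 4) | (n & 0xF)
    return value


def increment_routine(regs: list[int]) -> None:
    """Hypothetical routine: R8-RF = (R0-R7) + 1, rippling a carry nibble by nibble."""
    carry = 1
    for i in range(7, -1, -1):      # least significant nibble (R7) first
        total = regs[i] + carry
        regs[8 + i] = total & 0xF   # output goes to the matching register in R8-RF
        carry = total >> 4          # carry out of the top nibble is discarded


regs = [0] * 16
regs[0:8] = to_registers(0x0000FFFF)    # caller sets up the argument in R0-R7
increment_routine(regs)                 # stands in for executing the routine's code
print(hex(from_registers(regs[8:16])))  # 0x10000
```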
Code branching – In the above example, a program executed a routine whose code sat in 2 separate ROM chips. But what if this routine needs to be used 10 times while the main program runs? Does that require 20 extra ROM chips? The solution to this problem is code branching. The main program’s code can be stored in ROMs 0-5, and the routine’s code in ROMs 6-7. Now, whenever the routine’s code needs to run, the main program can “jump” to the sub-routine’s code (the main program that runs when the 4004 starts is itself a routine, so any routine it branches to is considered a sub-routine). In the 4004, this is implemented by the JMS instruction, a 16 bit instruction made of 4 op-code bits and 12 address bits holding the ROM row address of the beginning of the sub-routine to be executed.
The JMS instruction also “pushes” the address of the instruction that follows the JMS onto the stack. It’s considered a “push” because JMS stores that address in the level 1 stack register, while the level 1 register’s previous value moves to the level 2 register, and the level 2 register’s value moves to the level 3 stack register (level 3’s old value is lost).
Once the sub-routine is done, it executes the BBL (Branch Back and Load) instruction, which “pops” the value of the level 1 register into the PC register and also loads a 4 bit value into the accumulator. A JMS instruction causes a push PC->lvl1->lvl2->lvl3 (PC now holds the address given by the JMS instruction, and level 3’s old value is overwritten), and a BBL instruction causes a pop lvl3->lvl2->lvl1->PC (PC’s value is overwritten, and level 3’s value is now 0). The stack is controlled by the JMS and BBL instructions and works as a LIFO (last in, first out).
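The push/pop behaviour can be modelled in a few lines of Python. This sketch follows the description above, including zeroing level 3 on a pop; it assumes the return address is the JMS address plus 2 (JMS is a two-byte instruction) and leaves out the accumulator load that BBL also performs. The class name is hypothetical. Four nested calls show what happens when the 3 level stack overflows:

```python
class AddressStack:
    """Sketch of the 4004's PC plus three 12 bit stack levels, following the description above."""

    def __init__(self) -> None:
        self.pc = 0
        self.levels = [0, 0, 0]  # levels[0] = level 1 ... levels[2] = level 3

    def jms(self, target: int) -> None:
        """Push the return address (JMS is two bytes long) and jump to target."""
        return_addr = (self.pc + 2) & 0xFFF
        self.levels = [return_addr] + self.levels[:2]  # the old level 3 value falls off
        self.pc = target & 0xFFF

    def bbl(self) -> None:
        """Pop level 1 into the PC; lower levels move up and level 3 is zeroed."""
        self.pc = self.levels[0]
        self.levels = self.levels[1:] + [0]


stack = AddressStack()
for call_site, target in [(0x000, 0x100), (0x104, 0x200), (0x206, 0x300), (0x30A, 0x400)]:
    stack.pc = call_site  # pretend execution reached a JMS at this address
    stack.jms(target)
print([hex(a) for a in stack.levels])  # ['0x30c', '0x208', '0x106']
```

After the fourth call the first return address (0x002) has been pushed out, so a program nested this deeply can’t return all the way back to its original caller.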
Conditional and unconditional jumps – Programs usually jump conditionally to different code addresses depending on user input or the results of sub-routines. The main difference between jumping and branching is that jumping doesn’t use the stack. The 4004 implements conditional jumping with the JCN instruction, a 16 bit instruction made of 4 bits for the op-code, 4 bits for the condition, and 8 bits for the address, which is loaded into the lower 8 bits of the PC register. Because only those lower 8 bits change, the 4004’s conditional jumps are limited to the current ROM chip, while the JMS instruction can jump to a full 12 bit address (code in a different ROM chip). The 4004 also features a 16 bit unconditional jump, JUN, which likewise takes a full 12 bit address.
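The difference between a short and a long jump comes down to which PC bits the instruction supplies. In this Python sketch (the helper names are hypothetical), a JCN target keeps the upper 4 bits of the PC, i.e. the current 256 byte ROM page held by one 4001 chip, while JUN (0x4_) and JMS (0x5_) carry all 12 bits: the low nibble of the first byte gives the top 4 address bits and the second byte gives the rest. The sketch ignores the condition bits and the edge case of a JCN sitting at the very end of a page.

```python
def jcn_target(pc: int, addr8: int) -> int:
    """Short jump (JCN): keep the current page (upper 4 PC bits), replace the lower 8 bits."""
    return (pc & 0xF00) | (addr8 & 0xFF)


def long_target(first_byte: int, second_byte: int) -> int:
    """Long jump (JUN/JMS): the first byte's low nibble supplies the top 4 address bits."""
    return ((first_byte & 0xF) << 8) | (second_byte & 0xFF)


print(hex(jcn_target(0x123, 0x45)))  # 0x145: stays in ROM page 1
print(hex(long_target(0x47, 0x45)))  # 0x745: a JUN can reach ROM page 7
```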
You can now check out the P1 program on the 4004 emulator’s page. The routine fills RAM chip 0 in bank 0 with a pattern and also writes to the RAM chip’s status registers. It is basic, and uses the ISZ (Increment and Skip if Zero) instruction as a simple looping mechanism.
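ISZ increments a 4 bit register and jumps to the given address unless the result has wrapped around to 0, in which case execution falls through to the next instruction. A loop closed by ISZ therefore runs 16 - n times when the counter starts at n. A Python sketch of that counting (using R2 as the counter is an arbitrary, hypothetical choice, and this is not the P1 program’s code):

```python
def isz(regs: list[int], r: int) -> bool:
    """ISZ semantics: increment 4 bit register r; True means the jump back is taken."""
    regs[r] = (regs[r] + 1) & 0xF
    return regs[r] != 0


def run_loop(start: int) -> int:
    """Count how often a loop body runs when it is closed by 'ISZ R2, body'."""
    regs = [0] * 16
    regs[2] = start
    iterations = 0
    while True:
        iterations += 1        # the loop body would go here
        if not isz(regs, 2):   # counter wrapped to 0: fall through and leave the loop
            break
    return iterations


print(run_loop(0xC))  # 4
```

To run a body N times, the counter is preloaded with 16 - N (or 0 for the full 16 passes).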
For something much more complex, check out the reverse-engineered firmware of the Busicom 141-PF calculator. It uses some very interesting techniques to compensate for the very basic set of native instructions, such as an engine that parses “pseudo-instructions”, implemented in code at ROM address 0x4b. In my opinion the term “pseudo-instruction” is a bit misleading, and “virtual instruction” might be more appropriate here: the engine fetches op-codes that the 4004 can’t understand and translates them into native 4004 instructions, so each virtual instruction can stand for several native ones. This technique is very interesting, and I recommend reading the descriptions in the link to get a basic understanding of how the engine works.
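The idea of a virtual instruction can be shown with a small dispatch table. In this Python sketch, CLC, LD, ADD and XCH mimic real 4004 instructions (clear carry, load register into accumulator, add with carry, exchange register and accumulator), but the ADD8 virtual instruction and its operand format are invented for illustration and have nothing to do with the Busicom engine’s actual op-codes. One virtual instruction expands into seven native steps that add two 8 bit register pairs a nibble at a time.

```python
from typing import Callable

State = dict  # {"acc": nibble, "carry": bit, "regs": sixteen nibbles}


# Native-style operations, named after real 4004 instructions.
def clc(s: State) -> None:
    s["carry"] = 0


def ld(s: State, r: int) -> None:
    s["acc"] = s["regs"][r]


def add(s: State, r: int) -> None:
    total = s["acc"] + s["regs"][r] + s["carry"]
    s["acc"], s["carry"] = total & 0xF, total >> 4


def xch(s: State, r: int) -> None:
    s["acc"], s["regs"][r] = s["regs"][r], s["acc"]


def expand_add8(dst: int, src: int) -> list[tuple[Callable, tuple]]:
    """Hypothetical virtual op 'ADD8 dst, src': add pair src into pair dst (even reg = high nibble)."""
    return [(clc, ()),
            (ld, (dst + 1,)), (add, (src + 1,)), (xch, (dst + 1,)),  # low nibbles first
            (ld, (dst,)), (add, (src,)), (xch, (dst,))]              # then high nibbles plus carry


VIRTUAL_OPS = {"ADD8": expand_add8}


def run_virtual(program: list[tuple], state: State) -> None:
    for name, *operands in program:                        # fetch a virtual instruction
        for native, args in VIRTUAL_OPS[name](*operands):  # translate it
            native(state, *args)                           # execute the native steps


state = {"acc": 0, "carry": 0, "regs": [0] * 16}
state["regs"][0:4] = [0x3, 0xC, 0x1, 0x5]  # R0R1 = 0x3C, R2R3 = 0x15
run_virtual([("ADD8", 0, 2)], state)
print(hex(state["regs"][0] << 4 | state["regs"][1]))  # 0x51
```

The interpreter loop is the part that costs extra time at run time; in exchange, the program stored in ROM gets shorter because one virtual op-code replaces several native instructions.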
I won’t be going over the details of how the 4004 gets the user’s input, because in my opinion the subject isn’t interesting enough to justify going into excruciating detail. I might write a post in the future explaining how modern keyboards work, though.
Let’s conclude this post:
- CPUs execute instructions coded in binary.
- This binary code is called “object code”, and is usually represented in hex digits.
- The object code is assembled from human readable assembly code.
- Assembly code is CPU architecture specific, and makes use of the CPU’s resources.
- Assembly is the de-facto lowest level computer programming language (unless you consider writing object code as a programming language).
Although this post focused on 4004 assembly, the principles discussed are relevant to programming whatever architecture you might choose; the main differences will be syntax, supported instructions and available resources.
In the next post (which will be the last for the introduction series) we’ll take a higher-level look at modern computer architecture, and discuss the various controllers (including the interrupt controller) and how everything connects to the CPU.
Hope you found this post informative. Feel free to leave comments, and ask questions.