By this part of the introduction series, it’s important to get a sense of how transistors (logical gates) can be connected to create an actual processing unit. So before you continue reading, watch this video.
The CPU (Central Processing Unit) is a great net of interconnected circuits that can carry out instructions. It’s important to understand how these “instructions” are implemented in hardware.
A CPU that can execute 3 different instructions probably has 3 different circuits, one for each instruction. An instruction doesn’t have to be complex:
This is a circuit that can execute 4 different instructions:
00b (0x0) – Light up red LED.
01b (0x1) – Light up green LED.
10b (0x2) – Light up blue LED.
11b (0x3) – Light up all LEDs.
All you have to do is supply the circuit with the instruction’s “op-code”. Some instructions can take an argument input. The above circuit could be upgraded with a few flip-flops that’ll blink the LED that is activated. So the instructions would consist of 3 bits, two for the opcode, and one for the blink parameter.
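As a rough sketch of the 3 bit version described above, the decoding could be modeled in Python like this. The bit layout (blink flag in the top bit, opcode in the lower two bits) and the names `LEDS_BY_OPCODE` and `decode_led_instruction` are assumptions made for illustration; the circuit itself doesn’t dictate them.

```python
# Assumed layout: bit 2 = blink flag, bits 1..0 = opcode (not fixed by the circuit).
LEDS_BY_OPCODE = {
    0b00: ("red",),
    0b01: ("green",),
    0b10: ("blue",),
    0b11: ("red", "green", "blue"),
}


def decode_led_instruction(instruction: int) -> tuple[tuple[str, ...], bool]:
    """Return (LEDs to light, blink flag) for a 3 bit instruction."""
    if not 0 <= instruction <= 0b111:
        raise ValueError("instruction must fit in 3 bits")
    opcode = instruction & 0b11               # lower two bits select the LEDs
    blink = bool((instruction >> 2) & 0b1)    # top bit is the blink parameter
    return LEDS_BY_OPCODE[opcode], blink


print(decode_led_instruction(0b101))  # (('green',), True)
```

The lookup table plays the role of the wiring: each opcode value simply selects which outputs are driven.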
These instructions can be fed to the CPU from memory. For example, let’s combine this circuit with the DRAM circuit from the previous post:
Since the RAM circuit contains 2 bits in each row, it can hold 4 different instructions in memory addresses 0x0 – 0x3 (basic circuit without the blink argument).
A CPU by definition is much more complex than the above circuit. A CPU has a much bigger instruction set, which could perform basic arithmetic functions (add, subtract, multiply, divide), basic logical functions (xor, or, and, not), and basic control and input/output (I/O) functions (store to RAM, load from RAM).
A CPU must also be able to execute these instructions without being dependent on the type, size, or location of the memory circuit. This is why CPUs have their own memory units, called registers, which are basically highly integrated SRAM memory cells. Instructions that are carried out by the CPU affect the values in the registers. A CPU might use two registers to execute an “add” instruction, the result of which can be stored in a third register, or in one of the two registers that held the values that were added. The bit size of the registers determines the native bit width of the CPU. If the CPU’s registers are 2 bits wide, the CPU speaks in 2 bit “words”. In comparison, most of the desktop processors you’ll find today speak in 64 bit words.
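To get a feel for what a 2 bit word means in practice, here is a small Python sketch of an “add” instruction working on registers. The register names (`r0`, `r1`, `r2`) and the function `add_registers` are made up for this example; the point is only that the result is cut down to the register width and anything beyond it becomes a carry.

```python
WORD_BITS = 2                       # register width of the imaginary CPU
WORD_MASK = (1 << WORD_BITS) - 1    # 0b11


def add_registers(registers: dict[str, int], dst: str, a: str, b: str) -> bool:
    """Store registers[a] + registers[b] in registers[dst]; return the carry-out."""
    total = registers[a] + registers[b]
    registers[dst] = total & WORD_MASK   # only WORD_BITS bits fit in a register
    return total > WORD_MASK             # True when the sum overflowed the word


regs = {"r0": 0b10, "r1": 0b11, "r2": 0}
carry = add_registers(regs, "r2", "r0", "r1")
print(regs["r2"], carry)  # 1 True  (2 + 3 = 5 = 101b, which doesn't fit in 2 bits)
```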
A CPU also contains special circuits that “fetch” and “decode” instructions. The basic instruction fetch-and-execute cycle usually looks like this:
1. Fetch the instruction from memory (and store it in a CPU register).
2. “Decode” the instruction.
3. Execute the instruction.
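The three steps above can be sketched as a loop. This is only a toy model, reusing the 2 bit LED opcodes from earlier; `MEMORY`, `HANDLERS` and `run` are hypothetical names, and a real CPU does all of this in hardware rather than with a dictionary lookup.

```python
# Four 2 bit rows at addresses 0x0-0x3, as in the LED example.
MEMORY = [0b00, 0b01, 0b10, 0b11]

# Stand-ins for the "instruction circuits": one handler per opcode.
HANDLERS = {
    0b00: lambda: ["red"],
    0b01: lambda: ["green"],
    0b10: lambda: ["blue"],
    0b11: lambda: ["red", "green", "blue"],
}


def run(memory: list[int]) -> list[list[str]]:
    """Fetch, decode and execute every instruction in memory, in address order."""
    lit = []
    for address in range(len(memory)):
        instruction_register = memory[address]     # 1. fetch into a register
        handler = HANDLERS[instruction_register]   # 2. decode: pick the circuit
        lit.append(handler())                      # 3. execute
    return lit


print(run(MEMORY))  # [['red'], ['green'], ['blue'], ['red', 'green', 'blue']]
```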
Let’s see how an imaginary CPU might fetch, decode, and execute instructions:
Since the instructions are sitting in a distant memory circuit, there must be an agreed-upon interface by which the CPU and memory chip connect. For this example let’s assume a DRAM chip is used as main memory. If an imaginary CPU would like to access the circuitjs DRAM from the previous post, it would need a connection with the Row select inputs (2 bits), the Data inputs (2 bits), Read/Write inputs (2 bits) and the data outputs (2 bits). This means that a 2 bit CPU with 8 Input/Output (I/O) pins connected to the DRAM chip could easily work with that specific DRAM chip.
The instruction opcode could then be fetched in two clock cycles (assuming the time it takes to read a bit from DRAM is much shorter than one clock period). In the first cycle the data, row address, and command lines will be set, and in the second cycle the values of the output will be fed into the CPU’s register.
Once the instruction opcode (which also contains any parameters needed for the instruction’s execution) is loaded into a register, another clock cycle is needed to set the instruction handling circuit’s inputs according to the parameters fetched from memory along with the instruction. In the next clock cycle (or cycles), the instruction circuit is activated, and the result bits can be read from the output bit-lines into a register that will hold the output value.
Now let’s examine how a real CPU works: the Intel 4004 (mentioned in the previous post), which was introduced in the early 70’s. Despite the fact that it wasn’t a real general purpose CPU, but rather one designed to be used by calculators, there are a few interesting things to learn from its design. Let’s examine the CPU’s packaging and pinout:
The first thing that might look odd is the small number of pins used. The 4004 is a 4 bit CPU that supports over 40 instructions. In order to tell more than 16 (0x10) instructions apart, an opcode needs at least 5 bits, because the 17th instruction, number 0x10, translates to binary 10000b. In order to support over 32 instructions, opcodes of at least 6 bits are needed. Fetching such an opcode in a single transfer would take a data bus at least that wide, but the 4004 features a 4 bit data bus. It was still possible to fetch a large number of instructions along with their parameters; however, this required the instructions to be broken up into pieces and sent over the data bus over several clock cycles (slower performance).
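The arithmetic behind this can be checked with a few lines of Python. The helper names `bits_needed` and `bus_transfers` are invented for this sketch, which only counts bits and says nothing about how the 4004 actually encodes its opcodes.

```python
def bits_needed(instruction_count: int) -> int:
    """Smallest number of bits that can tell `instruction_count` opcodes apart."""
    if instruction_count < 1:
        raise ValueError("need at least one instruction")
    return max(1, (instruction_count - 1).bit_length())


def bus_transfers(total_bits: int, bus_width: int = 4) -> int:
    """How many bus transfers it takes to move `total_bits` over the data bus."""
    return -(-total_bits // bus_width)   # ceiling division


print(bits_needed(16), bits_needed(17), bits_needed(33))  # 4 5 6
print(bus_transfers(8), bus_transfers(16))                # 2 4
```

So an 8 bit instruction needs two trips over a 4 bit bus, and a 16 bit instruction needs four.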
If you’re wondering what the reason behind the narrow data bus is, the answer is “obscure reasons”. This answer was taken from an interview with one of the 4004’s designers. Often compromises in design are made because of financial considerations. A 4 bit data bus and 4 bit registers mean fewer transistors (less money spent) and more clock cycles to compensate (lower performance). It’s all a matter of price-performance ratio, and performance might not be an issue when the 4004 is used to power a simple calculator.
So we have the CPU package in the 4004 chip, but where does it fetch instructions from? According to the MCS-4 (Micro Computer Set) manual, page 3 (just at the bottom of the page), the minimum system is a CPU and a ROM chip that holds the CPU’s instructions. Using the MCS-4’s datasheet as reference, let’s look at the way the 4004 connects to the 4001 ROM (which holds instructions) and the 4002 RAM (DRAM chip which serves as main RAM for the CPU):
You can see that the 4004 communicates with the 4002 and 4001 via the 4 bit bus, to which all the chips are connected in parallel. This means that all the chips can sample the data on each bit-line at all times, and all chips can drive the data bit lines high or low at will. Naturally a protocol is implemented to make sure all chips communicate over the data bus in an organized fashion. All a chip has to do in order to “send” bits over the data bus is to pull the voltage high or low in sync with the clock shared by all chips on the bus. Usually chips sample the bit line during a “rising edge” of the clock (remember the flip-flops from Part 4).
A natural question at this point would be “how does the CPU communicate with specific chips?” – First of all, the 4004 features several CM (Command Control) pins which can be used for chip activation and selection. The single CM-ROM (Command Control) pin, which is connected to all 4001 chips, is always active so the ROM chips are always standing by to receive commands from the CPU. This means that ROM chip selection is implemented in a different manner. In order to understand how ROM chip selection works, we need to understand the 4004’s basic operation.
Since the 4004 is a CPU, its basic operation is to fetch and execute instructions (instructions are held in the ROM). This is described on pages 5 and 6 of the MCS-4 manual:
The 4004 has a special register named the PC (Program Counter) register, which holds the address of the instruction that needs to be fetched and executed from ROM. This register is special since it is a 12 bit register, meaning that it can hold a maximum value of 0xFFF (4,095 decimal). Since the 4001 holds 2,048 bits of instruction data this seems like too little at first, because if the PC addressed individual bits it could cover only two 4001 chips, despite Intel’s advertisement of 16 ROM chip support. However, by looking closely at the datasheet and manual, one can see that each basic 4004 instruction is 8 bits wide, and each row in the 4001 chip contains an 8 bit word. The PC register holds a row index, not a bit index, and it can point to 4,096 different rows, which hold a total of 32,768 bits worth of instructions (exactly 16 4001 chips).
When the 4004 is powered up (or reset), the PC register’s value is 0, so the CPU begins the instruction fetch, which is the first part of the “instruction cycle”:
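The numbers in this paragraph are easy to verify. The following Python snippet just redoes the arithmetic; the constant names are chosen for this example.

```python
PC_BITS = 12           # width of the Program Counter register
ROM_CHIP_BITS = 2048   # capacity of a single 4001 ROM chip
WORD_BITS = 8          # width of one ROM row (one basic instruction)

addresses = 1 << PC_BITS                       # 4096 distinct PC values
rows_per_chip = ROM_CHIP_BITS // WORD_BITS     # 256 rows in each 4001

# If the PC were a bit index, it would cover only two chips' worth of bits...
chips_if_bit_index = addresses // ROM_CHIP_BITS
# ...but as a row index it covers 4096 rows of 8 bits each.
total_bits = addresses * WORD_BITS
chips_if_row_index = total_bits // ROM_CHIP_BITS

print(addresses, rows_per_chip)                            # 4096 256
print(chips_if_bit_index, total_bits, chips_if_row_index)  # 2 32768 16
```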
At the beginning of an instruction cycle, the sync line is pulsed to sync all ROM chips that are waiting for commands. During each cycle after the sync pulse, 4 bits of data are “sent” through the data bus in parallel and read by all ROM chips.
Cycles A1 and A2 carry the 8 bit address of the 8 bit instruction in the ROM chip, and A3 is the 4 bit chip select code (it is a bit strange that chip selection happens at the end and not the beginning). 4 bits translate to a maximum value of 0xF (15), meaning that ROM chips numbered 0-15 can be addressed (this answers the ROM chip select question). The ROM chip’s number is hard wired during the programming and manufacturing process, so when the CPU asks for data from ROM chip number 3 on the bus, there should be no confusion as to which chip was selected.
The 8 bit wide instruction is then sent over the data bus by the ROM chip to the CPU over the two cycles M1 and M2, and it is stored in an 8 bit register for “decoding” (which is a fancy way of saying it will be fed into a multiplexer that’ll activate the relevant circuit in the CPU, and feed it with the command’s parameters as inputs). This is followed by the X1 cycle in which the CPU processes the instruction internally, and the X2 and X3 cycles in which the CPU might communicate with the RAM or ROM chips in order to prepare for a read or write (RAM only) operation.
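Here is a small Python sketch of how a 12 bit PC value could be split into the three 4 bit transfers. It assumes the lowest 4 bits go out first in A1 and the middle 4 bits in A2; the text above only fixes that A3 carries the chip select, so treat the order of A1 and A2 as an assumption. The function name `address_nibbles` is made up for the example.

```python
def address_nibbles(pc: int) -> tuple[int, int, int]:
    """Split a 12 bit program counter into the (A1, A2, A3) bus transfers."""
    if not 0 <= pc <= 0xFFF:
        raise ValueError("the PC register is only 12 bits wide")
    a1 = pc & 0xF           # assumed: low 4 bits of the row address
    a2 = (pc >> 4) & 0xF    # assumed: high 4 bits of the row address
    a3 = (pc >> 8) & 0xF    # chip select: which of the 16 ROM chips answers
    return a1, a2, a3


a1, a2, a3 = address_nibbles(0x3A7)
print(a1, a2, a3)  # 7 10 3  -> row 0xA7 of ROM chip number 3
```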
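The way the two 4 bit transfers become one 8 bit instruction can be sketched in Python as follows. The sketch assumes that M1 carries the upper half of the instruction and M2 the lower half; `CYCLES` and `assemble_instruction` are names invented for this example.

```python
# The eight cycles of one instruction cycle, in order.
CYCLES = ("A1", "A2", "A3", "M1", "M2", "X1", "X2", "X3")


def assemble_instruction(m1: int, m2: int) -> int:
    """Combine the two 4 bit bus transfers into one 8 bit instruction."""
    for nibble in (m1, m2):
        if not 0 <= nibble <= 0xF:
            raise ValueError("each bus transfer carries exactly 4 bits")
    return (m1 << 4) | m2   # assumption: M1 holds the upper half


instruction = assemble_instruction(0xD, 0x5)
print(len(CYCLES), hex(instruction))  # 8 0xd5
```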
This sums up the 8 cycles needed for the 4004 to fetch and execute a basic instruction in all of its narrow-bandwidth-data-bus-hack glory. Keep in mind that some instructions are 16 bits wide, and they require more cycles to fetch and decode. Take a look at the 4004’s instruction set. The JUN (Jump UNconditional) instruction for example (unconditionally begin executing instructions from a given address) is 16 bits wide – 4 bits for the instruction “op-code”, and 12 bits for the target address to be fed into the PC register in order to begin execution.
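To make the JUN layout concrete, here is a hedged Python sketch that pulls the 12 bit target address out of the two instruction bytes. It assumes the opcode value for JUN is 4 (0100b) and that the opcode sits in the top 4 bits of the first byte; check both against the instruction set table before relying on them. `JUN_OPCODE` and `decode_jun` are names made up for the example.

```python
JUN_OPCODE = 0x4   # assumed opcode value for JUN; verify in the instruction set


def decode_jun(first_byte: int, second_byte: int) -> int:
    """Return the 12 bit jump target encoded in a 16 bit JUN instruction."""
    if not (0 <= first_byte <= 0xFF and 0 <= second_byte <= 0xFF):
        raise ValueError("each instruction byte must fit in 8 bits")
    if first_byte >> 4 != JUN_OPCODE:
        raise ValueError("not a JUN instruction")
    # 4 address bits from the first byte + 8 address bits from the second byte.
    return ((first_byte & 0xF) << 8) | second_byte


pc = decode_jun(0x42, 0x5C)
print(hex(pc))  # 0x25c -> the value loaded into the PC register
```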
Chip selection for the 4002 chips is a bit different. First of all there are 4 CM pins for RAM (CM-RAM0 to 3), which activate different RAM banks (each bank contains 4 DRAM chips). The rest will be explained in detail in the next post.
If you’re wondering what the process of programming a ROM chip and running your program on the 4004 looked like, a simple answer would be – not like anything we are used to today. Back in the 70’s you had to write all your programs in human readable “Assembly” language, which could then be fed into another computer program that reads the lines of code and translates them to binary (usually represented to humans in hex digits). This binary code was then burned into the 4001 ROM chips by Intel, and sent to the client. One client of the MCS-4 was Busicom, a Japanese company that used the MCS-4 in its Busicom 141-PF calculator. Here’s a photo of the 141’s main board:
Take a look at this photo:
You can actually see the 4 bit data bus embedded in the PCB (follow the dark lines), connecting all 4002 and 4001 chips to the 4004!
Another important thing to note is the way a 4 bit processor handles numbers bigger than 0xF. It does so by using shift registers. The idea is that the CPU carries out an addition of two big numbers by adding one 4 bit digit from each number at a time using an adder circuit (similar to the one used in Part 3), and gradually pushing the result into a shift register chip to form a bigger and bigger number. The 4003 is the MCS-4’s shift register chip.
To conclude this post, let’s go over some important points:
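The digit-by-digit idea can be sketched in Python like this. It only illustrates the general technique of adding 4 bits at a time while passing a carry along; it is not a model of the 4003 or of the 4004’s actual instructions, and the function name `add_nibbles` is made up.

```python
def add_nibbles(a: list[int], b: list[int]) -> list[int]:
    """Add two numbers given as 4 bit digits, least significant digit first."""
    if len(a) != len(b):
        raise ValueError("both numbers must have the same number of digits")
    result = []
    carry = 0
    for digit_a, digit_b in zip(a, b):
        total = digit_a + digit_b + carry   # what a 4 bit adder with carry-in sees
        result.append(total & 0xF)          # the 4 bits that fit in the word
        carry = total >> 4                  # overflow moves on to the next digit
    result.append(carry)                    # final carry becomes the top digit
    return result


# 0x2F + 0x13 = 0x42, written least significant digit first.
print(add_nibbles([0xF, 0x2], [0x3, 0x1]))  # [2, 4, 0]
```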
- The CPU contains circuits that can handle different instructions.
- The CPU contains a special decoding circuit that decides which instruction circuit should be activated depending on the bits in (i.e. the value of) the register that holds the fetched instruction. It also feeds the decoded input parameters to the instruction circuit.
- The instruction fetch circuit sends a series of bits over the data bus. This series of bits is actually a command for a specific ROM chip to send data from a specific address back to the CPU.
- When an instruction is fetched and executed, the Program Counter register is incremented to point to the address of the next instruction to be fetched and executed.
- The communication over the data bus is synchronized by the two-phase clock that reaches the chips over the clk phase 1 and 2 pins, together with the sync pulse the CPU sends at the start of each instruction cycle.
In the next post, we’ll create a simple assembly program for the 4004, and further discuss the use of registers and memory in computer programs.
Hope you found this post informative. Feel free to leave comments, and ask questions.