0xdemo's low level stuff

Month: September 2017

An introduction to modern computers, Part 8B – I/O (Input/Output), peripherals and DMA (Direct Memory Access)

Posted on September 29, 2017 by demo

note: This post relies heavily on the basics explained in Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8A.

In the previous part we’ve discussed the purpose of driver/buffer chips, how chip selection is implemented through the signals on the address bus, how interrupts work and how the stack works. In this part we’ll begin by taking a closer look at the interrupt mechanism, while introducing some interesting hacks that allow system and software designers to extend the basic features of the 8080 without changing the 8080 itself.

The 8080 in its basic configuration supports up to 8 different interrupt routines. This is implemented through the INT signal, with the interrupting chip sending the interrupt vector encoded in a single-byte RST instruction on the data bus. Here's a diagram of an implementation of the 8214 for multi-interrupt-vector support:

The 8212 is used to buffer the 8214's output. Notice that all but 3 of the 8212's input data pins are hardwired to Vcc, meaning they will always output 1 whenever the 8212 is selected. Speaking of selection, the 8212 will be activated when the INTA signal on the control bus goes low. The 8228 system controller drives the control bus's INTA signal low based on a combination of the data pins when the 8080 is in output mode (DBIN is low): D0, D1 and D5 are high while the rest are low. The different data pins are decoded using this table:

This means the INTA signal on the control bus will be generated during a fetch cycle (the RST instruction is about to be fetched from the data bus), while an INTA is sent on the data bus (D0 is high), and while the operation of the current machine cycle (several clock states make up a machine cycle, and several machine cycles make up an instruction cycle) is a read cycle in general (again, an instruction fetch). The INTA signal will activate the 8212 just in time (T2) for it to transmit the buffered RST instruction during the T3 state as described in the previous post.

Also, notice the INT signal has its polarity reversed when it comes out of the 8212 so it matches the 8080's specification. Another interesting thing to notice when examining the 8214 logic diagram on page 5-153 in the manual is that the INT signal, when activated, will remain active for one clock cycle only (it is connected to the "set" pin of a D flip-flop). The 8212 latches this INT signal so that it is transmitted continuously to the 8080.

As mentioned, 8214 chips can be cascaded in order to support more interrupt vectors. But since there's still a single interrupt pin on the 8080, and since the RST instruction is still a one-byte instruction with 3 bits to indicate the interrupt vector, how can we tell the 8080 where the interrupt handler's code is located? The intuitive solution would involve removing the hard wiring from the D0-D2 pins on the 8212 and connecting them to multiplexing circuitry connected to multiple 8214s. The problem is that if D0-D2 are not hardwired to 1, an RST instruction won't be transmitted on the data bus, since an RST instruction's opcode is constructed in the following way:
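1 1 V V V 1 1 1   (bits D7 down to D0, where the three V bits hold the interrupt vector)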

meaning that we only have 3 bits to work with and that's it. To work this problem out, we need a little outside help that comes in the form of the RST 7 instruction feature of the 8228 system controller. Just to make things clear, an RST 7 instruction is a regular RST instruction with the reset vector 0x38. The RST 7 opcode is 0xFF, or an instruction byte of all binary 1's.

To activate this feature, a system designer needs to connect the INTA output from the 8228 to a voltage source through a resistor as specified on page 5-8, and the 8228 will be ready to send an RST 7 instruction to the 8080 through the data bus as soon as the 8080 acknowledges an interrupt. Remember that the 8228 (if used in the system) sits between the 8080 and the data bus, so it can intercept signals before they go from the 8080 to the other chips through the data bus, and vice versa.

After the RST 7 instruction is executed, the 8080 will begin executing code from address 0x38. Now it's the programmer's job to write the code that "talks" with the 8214 array in order to read the extended interrupt vector and figure out the address of the real interrupt handler. But again, how can the 8214 array point to more than 8 different interrupts? Since the 8214 was relieved of the responsibility of transmitting the RST instruction to the 8080 (which is now done by the 8228), outputs from multiple 8214s can now be multiplexed into the 8212 input pins. This method can support up to 40 different interrupt vectors. Here's an example of an array that can support up to 16 interrupts:

Notice that in this implementation, input pins D0-D2 on the 8212 are hardwired to 0, meaning that whatever vector is sent down the data bus is already a multiple of 8 (if you aren't sure why, check out what multiples of 8 look like in binary), so the programmer skips multiplying the interrupt vector by 8. In this implementation the top 8214 is the higher priority controller, because the ENLG output pin of the top 8214 goes into the bottom chip's ETLG input. This means that the bottom chip's interrupts will be serviced only when there are no interrupts waiting to be serviced in the top 8214.

Since this interrupt controller array isn’t autonomous anymore (it doesn’t send the interrupt vector through the RST instruction), a programmer must somehow be able to read the interrupt vector from the array. This can be done by activating the array (chip selection) and then reading the array’s output on the data bus.

In the previous post we've covered memory chip selection through the address bus, and general chip selection can work in a similar way. There are many ways to implement chip selection through the address and control bus. The only thing a system designer needs to be careful about is collisions. The 8080 has different memory read and write instructions, and their execution causes the 8228 to send the corresponding MEMR or MEMW signals down the control bus. This means that if a system designer wants to implement an interrupt controller array using an 8205 that is connected to the address bus, he must make sure that the address the programmer uses to read the data from the array isn't also a functioning memory address. In case such a collision happens, both the interrupt array and the memory array will try to send their data down the data bus, and the result will be undefined.

Since the communication with the interrupt array is not memory communication, it is considered chip I/O (Input/Output). Chip selection using the address bus is called Memory Mapped I/O (MMIO). In a system that uses MMIO, if pin A15 is used to activate the interrupt array, a programmer can read the interrupt vector with a single memory-read instruction targeting address 0x8000. While this method allows for very simple and intuitive programming, it "eats" a chunk of the address space, meaning that the amount of addressable memory is decreased.
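Here's a sketch of such a read in 8080 assembly, assuming nothing else responds to addresses with A15 set:

        LDA  8000H      ;  An ordinary memory read; A15 = 1 selects the interrupt array
                        ;  The interrupt vector is now in the accumulator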

The 8080 features an alternative to MMIO in the form of isolated I/O (or just I/O). The I/O address space is made of “ports”, and reading/writing from/to them requires special instructions. The main advantage of using isolated I/O is that it doesn’t diminish the amount of addressable memory.

Physically, isolated I/O is implemented through the IOR and IOW signal outputs on the 8228 chip:

and as far as the programmer is concerned, this is how the address spaces look using the different methods:

In the above example the A15 pin is used for general I/O chip selection, meaning that the programmer is left with only 32KiB of addressable memory (which might be fine for some systems).

To support the isolated I/O model, specific instructions were implemented: IN and OUT, which correspond to read and write respectively. A port number must be specified with the IN/OUT instruction, and the content of the data bus is read into the accumulator or written from the accumulator to the data bus respectively. The reason only 256 ports are addressable through isolated I/O probably has something to do with instruction length (this way the instruction size can be reduced to two bytes, one for the instruction and one for the port address), and the thought that 256 different ports seemed like a large number of chips that can be selected with a single instruction.
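For example (the port numbers here are arbitrary):

        IN   20H        ;  Read the byte presented on port 0x20 into the accumulator
        OUT  21H        ;  Write the accumulator to the chip selected by port 0x21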

Let's get back to the interrupt array example.

When an interrupt is accepted by the 8080, the 8228 sends an RST 7 instruction back to the 8080, which begins executing the single interrupt handler. The interrupt handler's code will probably look like this:
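(A sketch in 8080 assembly; the port number 0x40 and the routine table base 0x2000 are arbitrary choices for illustration.)

        PUSH PSW        ;  Save processor status (accumulator and flags)
        PUSH B          ;  Save the remaining register pairs
        PUSH D
        PUSH H
        IN   40H        ;  Read the extended interrupt vector from the 8214 array
        MVI  H, 20H     ;  H = high byte of the interrupt routine array's base address
        MOV  L, A       ;  L = the vector, which is already a multiple of 8
        PCHL            ;  Jump to the routine at address H:L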

The first thing is to save the processor status (as discussed in the previous post). Then, the interrupt vector is read from the interrupt array into the accumulator register using the IN instruction. Now a 16-bit address is set using registers L and H, representing the low and high address bits respectively. Notice that the base address of the interrupt handling routine array is loaded into register H, while the specific interrupt routine's offset from the beginning of the array is loaded into the L register. The PCHL instruction loads H and L into the PC register, which amounts to "jump to the address formed by combining the values in the H and L registers".

This is how I/O works in terms of chip selection and programming. We can now continue to the next subject, peripheral I/O.

For reference, here’s a diagram of a 8080 based system:

  • I/O peripheral interface – Let's consider a common input device – a fully decoded keyboard. "Fully decoded" means that the keyboard has its own processor that handles polling for key presses, and outputs bits that represent them (using an ASCII table for example) through a data bus that can be connected to our system. While it sounds surprising that keyboards might be anything but "fully decoded", remember the Busicom calculator's firmware from the previous post – the 4004 actually handled polling and decoding key presses!

This keyboard however can't (more correctly – shouldn't) just jam a byte (representing a character) down the data bus whenever a key is pressed. For synchronization and control we need a chip that can fulfill the specifications required by the 8080's interrupt system. The 8255 is the right chip for this job. It is a programmable peripheral interface chip that comes in a 40-pin package. The reason for the large number of pins is versatility. The 8255 can be implemented in many ways and reprogrammed on the fly by software – but in this post we'll focus on the 8255's mode 1, which can handle keyboard and display I/O (the chip's functions are explained in detail starting from page 5-113 in the manual).

Let's look at the 8255's pin-out and internals:

We can see that the chip has its own internal data bus connecting 3 different ports (A, B and C, with C split into lower and upper 4-bit halves). The internal bus is connected to the host's data bus through a data bus buffer. The chip also features control logic circuitry that connects to the host's control, data and address buses. In theory, each port can handle I/O to a single device, and port selection is implemented by using 2 bits from the address bus (A0 and A1).

But before the 8080 can select a specific port, it needs to be able to select the 8255 for communication first. Here's an example of how 8255 chip selection can be implemented through MMIO without additional 8205 decoders:

Address pins A0-A1 are used for 8255 port selection, and A2-A14 are used for direct chip selection (each address pin is connected to an individual chip select pin on an 8255). This way 13 different 8255s can be selected. The memory address space gets cut in half, but this might be a very good solution for a system with a small amount of memory, while also reducing its cost (because extra 8205 chips aren't needed for chip select decoding).

An isolated I/O chip select scheme similar to the above would limit the possible number of addressable chips to 6 (since an I/O port address is 8 bits in size, 2 of which are used for 8255 port selection). Again, this is with direct chip select using the address lines – adding 8205 decoders allows many more chips to be selected using I/O ports.

Now that we have chip selection figured out, let's see how we can control the chip to set it up for proper communication with our keyboard.

When the system boots, or when the 8255 is reset, it will automatically go into mode 0. For our implementation, we need to put the chip in mode 1. This is done by sending a control word to the 8255 down the data bus. Since the 8255 has 3 ports, when the A0 and A1 pins are both 1, it means the word on the data bus is addressed to the 8255 itself (and not to a peripheral connected to port A, B or C) and is a control word. Here's the chip's full control table:

Remember that all the signals with the line above them are “active low”. The control word on the data bus programs the 8255’s mode for each group as explained by this diagram:
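For example, assuming the 8255 is wired for isolated I/O and its control register answers at port 0x43 (an arbitrary choice, with A0 and A1 both 1), putting port A in mode 1 input and port B in mode 1 output could look like this:

        MVI  A, 0B4H    ;  10110100b: mode set flag, group A mode 1, port A input,
                        ;  group B mode 1, port B output
        OUT  43H        ;  Send the control word to the 8255's control register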

There are many possible combinations that can be programmed by mixing groups and modes, and they are described starting from page 5-116. As mentioned, we'll be focusing on mode 1, which is strobed I/O mode. With strobed I/O, the peripherals and the 8080 communicate indirectly through the 8255. Let's see what happens when a key is pressed on a keyboard connected to port A, acting as a mode 1 input:

  • The keyboard's internal circuitry decodes the keypress and translates it to 6 bits of data.
  • The keyboard uses pin PA6 (pin 6 of port A) to strobe (signal) that there's keypress data ready on the PA0-5 pins.
  • The 8255 latches the data from the keyboard into an internal buffer.
  • The 8255 sends an INT signal through the PC3 pin (pin 3 of port C). This pin should be connected to the 8080 through an 8212, or through a combination of an 8212 and an 8214 priority interrupt chip.
  • The 8080 eventually executes the interrupt handling routine, which will select the interrupting 8255, read the latched decoded keypress from the data bus, and save it to memory.
  • An ACK signal is sent from the 8080 to the keyboard to notify its circuitry that the decoded key was read, meaning that the keyboard is now ready to decode the next keypress.

Here’s a diagram of the connection:

You can see that we have another peripheral device connected to the 8255 – a Burroughs self-scan display which is a small system by itself. You can watch a video on how it looks and works here (I/O explanation starts at the 5:45 mark).

So let's see how a cycle of reading a character from the keyboard and printing it on the display works:

  • Keyboard input as described above. In the end a byte representing a character will be saved to RAM at a known address – let's say it's saved to a "ready to print" buffer. A variable that represents the number of characters sitting in the "ready to print" buffer is incremented by this routine.
  • Before the keyboard input interrupt routine finishes, it reads the ACK signal from the display. If the ACK signal is high, it means the display is ready to print a character, so a display print routine will be called before the keyboard input interrupt handler finishes. If the ACK signal from the display is low, the keyboard input interrupt handler simply finishes.
  • The display print routine translates the bytes stored in RAM to data the display "understands", and sends it to the display by writing the data to the 8255 chip. The 8255 latches the data onto pins PB0-7 and then sends a signal down the DATA READY pin. When the display receives the DATA READY signal, it reads the byte from pins PB0-7 and sends it to its internal circuitry. Once the data is sent to the display, the display print routine finishes.
  • When the display is done printing the character, it sends an ACK signal to the 8255, which triggers another (different) interrupt.
  • The interrupt handler that deals with the display's ACK will check if there are any more characters waiting to be printed in the "ready to print" buffer (located in RAM). If there are characters to be printed, the routine will call the display print routine. Before it finishes, the interrupt handler will decrement the number of characters in the "ready to print" buffer (because one just got printed).

A ring buffer can be implemented in software to handle this scenario. You can see that the software programmer needs to be very accurate in order for this system to work properly. The program described above is the "software driver" that drives the hardware. Without the software driver, this sophisticated system would be useless.
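As a minimal sketch, here's what the "ready to print" buffer's enqueue routine might look like in 8080 assembly. The buffer's location (0x2100), its 16-byte size and the BUFIDX/BUFCNT variable locations are all assumptions made for illustration:

BUFIDX  EQU  2110H      ;  RAM location of the write index (assumed)
BUFCNT  EQU  2111H      ;  RAM location of the pending-character count (assumed)

                        ;  Enqueue the character held in register B
ENQ:    LDA  BUFIDX     ;  Load the current write index
        MOV  E, A
        MVI  D, 21H     ;  DE = 0x2100 + index (the buffer is page aligned)
        MOV  A, B
        STAX D          ;  Store the character into the buffer
        LDA  BUFIDX
        INR  A
        ANI  0FH        ;  Wrap the index at 16 (a power-of-two size makes this cheap)
        STA  BUFIDX
        LDA  BUFCNT
        INR  A
        STA  BUFCNT     ;  One more character is now waiting to be printed
        RET

The display print routine would read characters the same way using its own read index, and "buffer empty" is simply BUFCNT being zero.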

Now that we have a good understanding of how memory and I/O reads/writes work, we can discuss one last feature supported by the 8080 – DMA (Direct Memory Access). 

With DMA, peripheral I/O chips can be designed to write directly to memory. While the 8255 doesn’t support this feature, other chips can be designed to support it. The way DMA is implemented is simple: A device with DMA capabilities will prepare the target address (and data in case of a memory write), and when it is ready, it will drive the signal on the 8080’s HOLD pin high. The 8080 will sample the HOLD pin during T2, and if it accepts the request (depends on conditions described on page 2-13), it’ll drive the HLDA pin high, signalling to the requesting device that the CPU is now suspended.

The device then executes the memory read/write, and drives the HOLD pin low once it’s done. The 8080 then resumes normal operation. Simple.

DMA saves a considerable amount of time. Instead of interrupting the CPU, causing it to execute routines that read the data from the device into internal registers and then write the data from the registers to memory – a device can get everything ready and execute the read/write to memory in just one cycle! Of course, the device won't just decide on its own where and what to write to memory; that's where the software driver that is in charge of the DMA operation comes in.

DMA is used extensively in modern computers, and the chip arrays that handle communications with the device and DMA operations are called controllers (USB controller, SATA controller, etc..), or adapters (Display adapters, etc..).

* * * * *

This post concludes the introduction series. The purpose here was to answer the simple question "how do computers work" without leaving anything in the dark or referring to something as "magic" or "it just works". This series also sets up the basic knowledge base, so I won't have to repeat things or include a low-level intro in future posts.

There are many more interesting subjects to discuss, and from this point it would be impossible to set them up in a linear fashion like this series was. That's why from here on I'll make posts on different subjects, going as low-level as needed to understand exactly how things work.

I’ll also be accepting post requests on any hardware or software subjects. So if you have any, write them down in the comments, or send them to the email in the side-panel.

Hope you found this series informative and enjoyable to read. Feel free to leave comments, and ask questions.

 

demo

Posted in An introduction to modern computers

An introduction to modern computers, Part 8A – Introducing Intel’s 8080, interrupts and the new stack

Posted on September 26, 2017 by demo

note: This post relies heavily on the basics explained in Part 1, Part 2, Part 3, Part 4, Part 5, Part 6 and Part 7.

By this point of the introduction series, we’ve established a good understanding of how the CPU, RAM and ROM work individually and together when connected by a data bus. While the CPU and memory are the heart (or brain) of the computer, they can’t do much by themselves. Have you ever tried using a computer without a keyboard/mouse/screen? It basically makes no sense. In this post we’ll see how these input/output devices are integrated into the system.

This time we'll be looking at Intel's 8080 processor, which was introduced in 1974. The 8080 is an 8-bit CPU, and it improved on the design of Intel's first 8-bit CPU, the 8008 (introduced in 1972). 8 bits are considered to be a byte – a basic unit of data still used to this day (the less common 4-bit data unit is called a nibble). The 8008 differs from the 4004 in the fact that it is designed to work with a variety of different memory components, while the 4004 was designed to work with other MCS-4 components. This "we build CPUs and you combine them with whatever you want" approach is the standard today – the CPU and RAM you are using right now are manufactured by different companies.

While the 8008 featured a wider 8-bit data bus, it still wasn't wide enough to fit the 14-bit addresses held in the 8008's PC and 7 stack registers. Let's look at the different packaging:

The 8008 has just 2 pins more than the 4004, while the 8080 has 40 pins. On closer inspection we can see that the 8008 and 8080 discarded the CM-RAM pins, and that the 8080 features 16 address pins (meaning up to 64KiB of memory is addressable).

By this point you are probably wondering how RAM and ROM chip selection works in the 8008/8080, and what the instruction fetch/decode/execute cycle looks like. Let's start by taking a look at this image from the MCS-80 User's manual:

This is a high-level view of how different devices are connected by a single bus. Notice that the only part that has its part number printed on it is the 8080 (as opposed to the MCS-4 architecture). This is due to the fact that the 8080 can work with different chips, produced by different manufacturers, as long as they fit Intel's published specifications. Here's a more detailed view of the bus:

We can now see that the bus is actually 3 different buses that connect many more chips than the MCS-4. There are many interesting old and new elements to examine. Let's begin with an element which was first introduced in Part 4 of the series:

  • Clock generator and driver – The 8224 chip provides the clock for the system:

When the 8224 is connected to a crystal (pins 14 and 15) it outputs an oscillating square wave on pin 12 (which can be used as clock input for other chips), and uses a "divide by nine" counter to generate two slower non-overlapping square waves (as the 8080 requires) on pins 10 and 11. The 8224 also features RESET and READY signals for the CPU which are in sync with the CPU's clock. The importance of these signals being in sync with the clock will be explained later in the post.

Also, notice the line above the STSTB and RESIN signals (pins 2 and 7). This means that these signals are "active low".

  • System controller – The 8228 chip sits between the data bus and the 8080. On output (to the bus) the chip’s job is to decode the signals from the 8080’s pins to a combination of outputs to the data and control bus. On input (from the bus), the chip passes the signals from the data bus to the D0-D7 pins on the 8080:

Another important feature of the 8228 is that it acts as a buffer/driver. This means that it has the needed circuitry to make sure there's enough current to properly activate the transistors on all the chips that are connected to the data bus, and that a minimum voltage level is maintained on input (coming from the chips):

Notice the control bus pins (which are all active-low pins); they play an important role in chip selection (chip selection will be gradually explained through the post). Interestingly, Intel decided to add an example of an alternative setup using a combination of 8212 and 8216 chips instead of a single 8228 chip on page 3-4 in the manual (figure 3-5).

  • Address buffers and decoders – Sitting between the address bus and the 8080 (optionally) are the 8212/8216 buffers and the 8205 decoder. The reason the buffers are optional is that they are only needed to drive the address bus with more current when a large number of chips are connected to it. The 8205 1-out-of-8 decoder is used for chip selection by being selectively connected to the address bus:

In the above configuration, address 0x300 will select chip 0, since it translates to binary 1100000000b (bits 10-14 are 0). Address 0x400 will select chip 1. This way RAM chip selection is implemented using just the address, freeing the programmer from manual RAM chip selection (remember the 4004's DCL instruction).

  • ROM and RAM chips – The reason there are so many chip numbers in the ROM and RAM blocks is that they represent different possible combinations. For RAM, a system designer can choose a more expensive SRAM chip like the 8102, or a less expensive DRAM chip like the 8107B-4. If a DRAM chip is chosen, an 8210 TTL-to-MOS chip must be added, since the address bus transports a TTL signal while the RAM is a MOS device. More about TTL vs MOS can be read here (the listed SRAM chips can handle direct TTL input). It is also recommended to add an 8222 memory refresh controller along with DRAM chips. It is possible to replace the 8222 with software that reads all bits from RAM every few milliseconds, but that's like using a Ferrari as a farm tractor.

Chip selection is implemented by decoding the signal on the address bus with the 8205. To better understand how it's done, let's examine an image from the user's manual which shows an implementation of a 16KiB DRAM system based on an array of 8107B-4 chips (4096 x 1-bit words):

Two "service" chips are implemented here: the 8210 TTL-to-MOS chips, which handle the address signals that go to the DRAM chips, and the 8212 buffer, which drives the data input to the DRAM chips and buffers their output for the data bus.

Address lines A12-A13 combined with the 8205 chip act as row selectors. Each row contains eight 8107B-4 DRAM chips which read a 12 bit address from A0-A11, and output a single bit of stored data. This means that when a 14 bit address is fed into this system, 8 bits (one byte) will be read from the chips and stored in the 8212 buffer. Read/Write selection is implemented through the MEMW and MEMR signals on the control bus.
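The arithmetic checks out: each row stores 4096 bytes (eight 1-bit chips in parallel), and 4 rows x 4096 bytes = 16KiB.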

Chip selection for ROM chips works in the exact same way. A system can have a mix of ROM and RAM memory, and with a combination of 8205s it is possible to select ROM and RAM simply by logically assigning them their own address spaces. For example, memory reads from addresses 0-8KB are reads from ROM chips (writes to these addresses won't have any effect), and memory reads and writes to addresses 8-16KB read and write to RAM chips. This is done by using A13 as the main ROM/RAM selector. If A13 is 0, all RAM chips are deactivated, and if A13 is 1, all ROM chips are deactivated.

Another important difference from the MCS-4 is that the MCS-8/80 can fetch instructions from anywhere in the unified address space – meaning that instructions can be fetched from both ROM and RAM.

  • Interrupts – The 8080 features a single INT (interrupt) pin. What’s an interrupt? It’s a signal coming to the CPU from an external source that causes the CPU to stop what it’s doing, and activate a predefined set of instructions to handle the interrupt. Interrupts are asynchronous by nature, and can come at any time.

Why use interrupts? Well, interrupts are probably the most sane way for a general purpose CPU to handle peripheral I/O (Input/Output). A CPU that doesn't work with interrupts (like the 4004, for example) would have to constantly poll devices as part of its programming, meaning it would waste cycles just to check if there's any input from a device like a keyboard. Polling too often would harm performance, and polling less often might cause input lag, or miss the input completely.

Here’s an example of a simple instruction loop circuit with an interrupt circuit:

As usual, you can get the TXT file for the circuit and test it out in the circuit simulator (link in the side panel). The main loop lights up each LED in order, starting from the top and going down. At any moment the main loop can be interrupted, and an interrupt handler will blink the blue LED 3 times. Once the interrupt handler is done blinking the blue LED, it'll signal the system by toggling the flip-flop, and the main loop will continue executing. An important thing to notice in the above circuit is the clock synchronization feature, which makes sure the interrupt handler begins execution at a rising edge of the clock. This makes sure that the blue LED gets 3 consistent blinks, and that each red LED has an equal activation time (not including the time it takes to execute the interrupt handler) even if it's interrupted.

Interrupt handling synchronization is even more critical when it comes to CPUs. What happens if an interrupt occurs half way through the instruction decode cycle? Intel takes care of that by sampling the INT line during the T1 instruction cycle (instruction cycles are discussed in detail starting from page 2-3 in the manual):

Also, an internal interrupt enable (or INTE) flip-flop is implemented along with an INTE output pin. Here’s a snip from the manual:

As we can see, the 8080 automatically disables interrupts during T1 if an interrupt was accepted (to prevent another interrupt from interrupting the interrupt handling).

So what happens when the 8080 accepts an interrupt? Let's break it down by cycle:

T1 – The 8080 drives INTE low (disabling interrupts).
T2 – The interrupting chip's circuitry drives the INT signal low after seeing the low INTE signal. The 8080 sends an interrupt acknowledge signal (by placing a 1 on D0).
T3 – The 8080 drives the DBIN (Data Bus IN) pin high, and it's the interrupting chip's job to send a single-byte RST instruction down the data bus during this cycle.
T4… – The RST instruction is decoded: the address of the interrupted program's next instruction is pushed on the stack, the vector's address is loaded into the PC register, and from there we have the usual fetch, decode and execute.

The RST instruction, which is a single-byte instruction transmitted during T3, has 3 variable bits that indicate the interrupt handling routine's address (the routine's instructions will be fetched and executed during the following cycles). The value of the 3 bits gets multiplied by 8 to get the routine's address. This means that in the basic configuration (cascading interrupt controllers is discussed in the manual), the 8080 can have 8 different interrupt handling routines at the following addresses: 0x0, 0x8, 0x10, 0x18, 0x20, 0x28, 0x30 and 0x38.
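For example, RST 5's opcode is 0xEF, or 11101111b: the vector bits are 101b = 5, so the corresponding routine sits at 5 x 8 = 0x28.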

This means that if the system designer wants to implement interrupt handling, he’ll need to introduce chips that can handle this interrupt protocol, and also to make sure that the relevant memory addresses contain the needed code to handle the interrupts.

An important question should be asked at this moment: if there's only one INT pin on the 8080, what happens when there are multiple interrupting chips in the system? The answer is the 8214 priority interrupt controller:

It is possible to connect multiple interrupting chips to the 8214. The 8214 receives the interrupts from the chips and pitches them to the 8080 based on priority. The chip whose interrupts are considered highest priority will be connected to pin R0 on the 8214, and its interrupt handling code will start at address 0. The chip with the least important interrupts will be connected to pin R7, and its interrupt handling code will start at address 0x38. It is possible to cascade 8214 chips in order to deal with more than 8 different interrupts. This technique is pretty straightforward and is explained on page 5-159 in the manual (and is discussed in part B of this post).

If the space given to each interrupt handler looks too small to hold enough instructions to actually handle the interrupt, remember that it is possible to use the JMP instruction, or the CALL instruction (which is the improved version of the 4004's JMS instruction), to continue executing code that sits at another memory address. The 8080's instruction set is detailed in the user's manual.

On the software side of interrupt handling there are three important things to note. First, the execution of the RST instruction during T3 doesn't have an effect on the PC register. Second, the programmer can disable interrupts by using the DI instruction [this is usually done in performance-critical parts of the software, or when the programmer knows that the code in the interrupt handler might break the program's logic if the program is interrupted at that moment]. Third, the address of the instruction that was supposed to be executed before the code flow was interrupted is stored on the stack.

  • The stack – Starting with the 8008, the stack took a new form (a form used to this day in modern Intel CPUs). It is now located in memory (and not in registers like the 4004's), meaning that an infinite (as long as you have enough memory) stack depth can now be reached. An RST or CALL instruction causes the next instruction's address to be stored in the memory pointed to by the stack pointer (SP, a 16-bit register) in an action called a push. The push action also decrements the SP register's value by 2 (representing the 2 bytes stored in memory).

That's not a typo: the SP is decremented whenever data is pushed onto the stack (into memory). The reason for this strange behavior lies in the fact that an entity that grows upward autonomously in memory would force software developers to constantly think about stack placement – how to prevent it from destroying data in memory, and how not to corrupt the stack with memory writes (since the "return" addresses are usually stored on the stack, corrupting the stack causes undefined behavior). With a stack that "grows down" in memory, it is possible to place the stack pointer at the top of RAM. This way, normal memory store instructions use memory while moving up the RAM (controlled directly by the programmer), while the stack moves down the RAM autonomously. This gives a clear indication of when (and if) the two memory segments are about to collide (if a memory write address is greater than or equal to the SP address, we have a problem). Memory could also be logically divided into segments – a code/data segment and a stack segment, each with a predefined size.

A RET instruction will pop the address in memory pointed to by the stack pointer into the PC register (and also increment the SP register's value by 2). It is also possible to store general data on the stack by using the PUSH instruction, and read data from the stack by using the POP instruction. Both instructions affect the SP register's value (POP increments, PUSH decrements). This gives programmers a general purpose LIFO (Last In, First Out) memory mechanism to work with when needed.
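Here's a small sketch, assuming RAM occupies addresses 0x0000-0x3FFF so the stack can be placed at its top:

        LXI  SP, 4000H  ;  Place the stack pointer just above the top of RAM
        PUSH H          ;  H is stored at 0x3FFF, L at 0x3FFE, and SP becomes 0x3FFE
        POP  H          ;  L and H are restored, and SP is back at 0x4000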

Since interrupts usually disrupt normal program flow, it's the programmer's responsibility to first save all current CPU registers in memory before executing code that actually handles the interrupt. A comfortable way (and de facto the standard way) of doing this is to push all registers on the stack:

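Here's a sketch of such a prologue and its matching epilogue; the order of the pairs is the programmer's choice, as long as the pops mirror the pushes:

        PUSH PSW        ;  Accumulator and flags
        PUSH B          ;  Register pairs B-C, D-E and H-L
        PUSH D
        PUSH H

        ...             ;  The code that actually handles the interrupt

        POP  H          ;  Restore everything in reverse order
        POP  D
        POP  B
        POP  PSW
        EI              ;  Re-enable interrupts (they were disabled when the
                        ;  interrupt was accepted)
        RET             ;  Pop the return address into PC and resume the program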

Once the interrupt handling routine is done, the programmer must add instructions that pop the saved processor status from the stack in reverse order before the RET instruction.

* * * * * * *

This is a lot of information to digest. By this point you should have a good understanding of how memory chip selection works, and how interrupts are implemented in software. The way the 8080 handles interrupts, memory and the stack is very similar to how modern Intel processors work. Those of you who know their way around x86 assembly will probably feel at home with the 8080, despite it being an "ancient" 8-bit CPU.

I've decided to split this post in two parts. In the next part, which will conclude the introduction series, we'll examine how peripheral I/O (input/output) works, and what DMA (Direct Memory Access) means and how it's implemented.

Hope you found this post informative. Feel free to leave comments, and ask questions.

demo

Posted in An introduction to modern computers

An introduction to modern computers, Part 7 – Computer programming with assembly

Posted on September 16, 2017 by demo

note: This post relies heavily on the basics explained in Part 1, Part 2, Part 3, Part 4, Part 5, and Part 6.

With an understanding of how ROM, RAM and the CPU work, we can now create programs that'll use these resources.

As mentioned in the previous post, a CPU should implement circuits that fetch, decode and execute instructions. These circuits get their input and produce output in binary form. The CPU literally "speaks" in binary. In order to "talk" to the CPU, we would either need to get fluent in representing complex ideas in the form of 1's and 0's, or we can come up with something more comfortable.

In Part 3 we talked about the interesting connection between the binary and hexadecimal numeral systems. If you are wondering why the decimal system doesn't fit here, remember that 10 is not a power of 2, and its roots probably lie in the ancient fact that humans have 10 fingers (which we often use for counting). Hex's roots, on the other hand, lie in the modern need for a more comfortable way to talk to computers.

While hex is a comfortable way to speak binary, it’s still a long way from being a comfortable way to encode instructions for the CPU. Here’s an example of some 4004 CPU instructions:

D3 20 50 81

These are instructions that would make sense to a 4004, but not to a human (unless he spent a few months memorizing the 4004's instruction set in hex form).

Programming languages are the interface through which humans can speak to computers. In order to create a programming language, the first thing we need is an assembler. The assembler takes human-readable instructions, also known as assembly code, and converts them to hex digits (which can be seen as a compressed form of binary) which make sense to the CPU that decodes them. How is an assembler created? Well, for the most simple example, think about a typewriter. The programmer types in the human-readable assembly code, and the typewriter prints out hex digits representing the same instructions on paper. This is the concept behind punched cards, and the way punched cards are created (video here). After "programming" a set of punched cards, they are fed into a machine that reads them and outputs binary into the system.

Let's look at the assembly code that produced the above hex digits:

LDM     $3
FIM     R0R1, $50
ADD     R1

This looks a bit more understandable. We have the symbols R0 and R1, which probably represent registers, and we have an ADD symbol which probably represents an arithmetic addition instruction, and takes the value of R1 as a parameter. Naturally, to fully understand this assembly code we should look into the MCS-4 (Micro Computer Set) manual, chapter VIII. A more readable version can be found at this site.

The assembler turns assembly instructions into object code that can be decoded by the CPU, so object code and its assembly code source are computer architecture specific. This means that the assembly code you wrote for the 4004 will assemble for the 4004 only. In this post we'll be focusing on 4004 programming, but the programming principles can be easily carried over to other architectures.

Let's begin by creating a small program that adds the values 5 and 7, and stores the result to memory outside the CPU. From our understanding that CPUs and RAM chips work with registers and communicate over the data bus, we can think of this pseudo-code:

  1. Store 5 in first register.
  2. Store 7 in second register.
  3. Add the values of the first and second register, and store the result in a third register.
  4. Send the address in which we want to store the result to a register in the memory chip over the data bus.
  5. Send the value of the third register over the data bus to the memory chip so it could store it.

Before we can translate this pseudo-code to 4004 assembly code, we must know what CPU resources we have at our disposal. The 4004 has seventeen 4-bit registers:

  • 16 general purpose registers named R0-RF (F as in 0xF, or 15 decimal).
  • 1 accumulator register.

And there are four 12-bit registers:

  • The PC (Program Counter) register, which holds the ROM row address of the current instruction to fetch and execute from memory.
  • 3 stack registers, whose functionality will be explained later in the post.

To create a program that’ll add the values of 5 and 7 and store the result in RAM address 0x10,  we will use the following instructions:

LDM – Load data to accumulator – stores a given 4 bit value (0-0xF) to the accumulator register.
FIM – Fetch immediate (data) from ROM – Fetches 8 bits of data (the instruction's second byte) from ROM and stores them into a register pair (R0R1, or R6R7 for example).
ADD – Adds the value of a designated register to the accumulator register (result is stored in accumulator).
SRC – Send register control – The 8 bit value of a register pair is sent to the RAM’s address registers during instruction cycles X2 and X3. The addressing scheme works like this: The first two bits of the address designate 1 out of 4 chips in the current bank, the next two bits designate 1 out of 4 registers in each chip, and the next 4 bits designate the offset within the register (0-0xF) to which the 4 bit data is to be written.
WRM – Write accumulator to memory – The 4 bit value of the accumulator will be sent to the RAM chip during X2 cycle. The RAM chip would then store the 4 bit value to the address set during the previous SRC instruction.

Let's take a look at the code:

LDM      $5          ;  Load the value 5 in the accumulator register
FIM      R0R1,  $70  ;  Load the value 7 to R0, and 0 to R1
ADD      R0          ;  Add the value of R0 to the accumulator register

FIM      R0R1,  $10  ;  Load the value 1 to R0, and 0 to R1
                     ;  translates to binary 00010000b
SRC      R0R1        ;  Select RAM chip 0, register 1, offset 0
WRM                  ;  Store accumulator to RAM

It's good practice to comment as many lines of code as possible. Assembly code usually makes much sense to the programmer who wrote it at the time of writing, but trying to understand someone else's code (or your own after a month or two) without comments can be a difficult task. While it is easy to understand what an individual line of code does, understanding the combined purpose of all the lines in a program is much more difficult. It can be compared to understanding what is portrayed in a huge wall painting by looking at it through a microscope, one fraction of a millimeter at a time.

Now that we have our program's assembly code, we can assemble it, burn the object code to a 4001 ROM chip, place it on a PCB with a 4004 CPU and a 4002 RAM chip, and run our program! Or we could copy-paste our code into an online 4004 assembler, copy the object code to the online 4004 emulator and step through the code instruction by instruction. By the time you've reached the end of the program at PC 08, the value 0xC will have been written to RAM address 0x10 (or RAM bank 0, chip 0, register 1, offset 0). If you are wondering how we could write to another RAM bank – that would require the use of the DCL instruction, which controls the CM-RAM pins to activate different banks. This completes the answer to the question that was left open in the previous post on how RAM chip selection works.

There are more basic features that can be used in order to make a program's code more understandable, compact and effective:

Routines – Routines are individual pieces of code that have a certain functionality. A 4004 routine, for example, might perform a series of calculations based on a 32-bit value stored in registers R0-R7 and store the 32-bit output value to R8-RF. Let's assume that this routine spans 2 ROM chips; it can now be integrated into different 4004 programs. All the program has to do is set up the 32-bit number argument in R0-R7, execute the routine's code, and use the output value in registers R8-RF.

Code branching – In the above example, a program executed a routine whose code was sitting in 2 separate ROM chips. But what if this routine needs to be used 10 times during the course of the main program? Does it require 20 extra ROM chips? The solution to this problem is code branching. The main program's code can be stored in ROM 0-5, and the routine's code can be stored in ROM 6-7. Now whenever the routine's code needs to run, the main program can "jump" to the sub-routine's code (the main program that runs when the 4004 starts is a routine, so any branched-to routine is considered a sub-routine). In the 4004, this is implemented by the JMS instruction, which is a 16-bit instruction – 4 bits for the instruction op-code, and 12 address bits representing the ROM row address of the beginning of the sub-routine to be executed.

The JMS instruction also "pushes" the address of the next instruction after the JMS instruction onto the stack. The reason it's considered a "push" is that a JMS instruction stores the next instruction's address in the level 1 stack register, while the level 1 register's value gets pushed to the level 2 register, and the level 2 register's value gets pushed to the level 3 stack register.

Once the sub-routine is done, it will execute the BBL instruction, which "pops" the value of the level 1 register into the PC register. A JMS instruction causes a push PC->lvl1->lvl2->lvl3 (PC now holds the address pointed to by the JMS instruction, and level 3's value is overwritten), and a BBL instruction causes a pop lvl3->lvl2->lvl1->PC (PC's value is overwritten, and level 3's value is now 0). The stack is controlled by the JMS and BBL instructions, and works as a LIFO (Last In, First Out).
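For example (the sub-routine's address is arbitrary, and the syntax follows the earlier examples):

        JMS  $200   ;  Push the next instruction's address on the stack, and
                    ;  jump to the sub-routine at ROM address 0x200
        NOP         ;  Execution resumes here after the BBL

And at ROM address 0x200:

        BBL  $0     ;  Pop the saved address back into PC; BBL's operand (0 here)
                    ;  is loaded into the accumulator as a return value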

Conditional and unconditional jumps – Programs usually implement conditional jumps to different code addresses depending on user input, or on the results of sub-routines. The main difference between jumping and branching is that the stack isn't used while jumping. The 4004 implements conditional jumping with the JCN instruction, which is a 16-bit instruction – 4 bits for the opcode, 4 bits for the condition, and 8 bits for the address, which will be loaded into the lower 8 bits of the PC register. The 4004's conditional jumps are therefore limited to the current ROM chip, while the JMS instruction can jump to a full 12-bit address (i.e. to code in a different ROM chip). The 4004 also features a 16-bit unconditional jump that can be used by the programmer.

You can now check out the P1 program on the 4004 emulator's page. The routine fills up RAM chip 0 in bank 0 with a pattern, and also writes to the RAM chip's status registers. The routine is simple, and features the ISZ instruction as a basic looping mechanism.
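As an illustration, the heart of such a loop might look like this (the register choice is arbitrary, and label syntax can vary between assemblers):

        LDM  $0
        XCH  R2          ;  R2 = 0, our loop counter
LOOP:   NOP              ;  The loop body would go here; it executes 16 times
        ISZ  R2, LOOP    ;  Increment R2 and jump to LOOP until R2 wraps back to 0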

For something much more complex, you can check out the reverse-engineered Busicom 141-PF calculator's firmware. It features some very interesting techniques to compensate for the very basic set of native instructions, using an engine that can parse "pseudo-instructions", implemented in code at ROM address 0x4b. In my opinion the term "pseudo-instruction" might be a bit misleading, and the term "virtual instruction" might be more appropriate here, as the engine fetches op-codes that the 4004 can't understand and translates them to native 4004 instructions. This way, each virtual instruction can be translated into several native instructions. This technique is very interesting, and I recommend reading the descriptions in the link and getting a basic understanding of how this engine works.

I won’t be going over the details of how the 4004 gets the user’s input, because in my opinion the subject isn’t interesting enough to justify going into the excruciating detail. I might make a post in the future explaining how modern keyboards work though.

Let's conclude this post:

  1. CPUs can execute instructions coded in binary.
  2. This binary code is called “object code”, and is usually represented in hex digits.
  3. The object code is assembled from human readable assembly code.
  4. Assembly code is CPU architecture specific, and makes use of the CPU’s resources.
  5. Assembly is the de-facto lowest level computer programming language (unless you consider writing object code as a programming language).

Despite the fact that this post focused on 4004 assembly, the principles discussed are relevant for whatever architecture you might choose; the main differences will be syntax, supported instructions and available resources.

In the next post (which will be the last for the introduction series) we’ll take a higher-level look at modern computer architecture, and discuss the various controllers (including the interrupt controller) and how everything connects to the CPU.

Hope you found this post informative. Feel free to leave comments, and ask questions.

demo

Posted in An introduction to modern computers


Useful Links

Circuit Simulator

Intel 4004 emulator

Intel 8080 User’s Manual

An introduction to modern computers

1 – Electricity basics

2 – Basic electric circuits

3 – Transistors, Boolean algebra and Hex

4 – Clocks and Flip-Flops

5 – Memory and how DRAM works

6 – How the CPU works

7 – Computer Programming with assembly

8A – Intel’s 8080, interrupts and the new stack

8B – I/O, Peripherals and DMA
