0xdemo's low level stuff

How Operating Systems work, Part 2 – Privilege control and memory protection

Posted on October 21, 2017 by demo

Note: I recommend reading the ITMC series and the previous part (Part 1) of this series first.

The GM-NAA I/O System (which was discussed in Part 1) introduced the basic concepts of a “supervisor” program, or operating system (OS), that allows operators to run their programs sequentially and efficiently. This new supervisor program was great, except that from time to time honest mistakes made by programmers caused data to be written to the wrong memory address, resulting in “memory corruption”. The memory corruption did not necessarily bring the computer to a grinding halt – imagine a single byte corrupted in the binary-to-decimal conversion subroutine that is part of the OS’s code – such corruption could have severe effects on every program that later runs on that system (causing them to produce incorrect results), without giving the operators any clue about what exactly went wrong.

It very quickly became clear that the OS code must be protected from memory corruption by rogue program code.

The question is – how? The OS’s code consists of the same bits and bytes that make up regular program code, and the CPU is entirely indifferent to the names we humans give to different pieces of software. In other words, the OS’s code is not inherently special in any way. Those who read the ITMC might be thinking about using ROM, but while ROM can protect the OS’s code, the OS still makes use of variables and other mutable data that must be stored in RAM (leaving it exposed to corruption).

Protection cannot be provided without privilege and access restriction. As long as programs can access code and data belonging to the OS, there is no real way to prevent memory corruption. Protection mechanisms must be provided and supported by hardware. Hardware can provide the means to prevent memory reads and/or writes on a physical level if certain conditions are met. As an example, think about an imaginary modified version of the 8228 system controller (discussed in Part 8A of the ITMC) that blocks I/O or memory R/W control signals if the address lines contain a specific address range. But simply blocking out a fixed address range makes no sense in terms of protection, meaning there must be a way to program the hardware-based protection mechanisms. It seems like we are back to square one – we need software to manage the hardware protection mechanisms, and software can be corrupted. Back in the mid-60s, IBM’s engineers created a protection model that works by providing hardware privilege-enforcement facilities that software can use, but only in the very specific manner that will be described next.

 

One of the first (if not the first) computer systems to support these concepts of protection was IBM’s System/360. At the heart of the system there’s the Program Status Word (PSW), as described on page 16 of this document:

The PSW contains all relevant controls and flags for a running program. Some fields, like the 24-bit instruction address (which serves the same purpose as the PC/IP register), change automatically depending on instructions that were executed and events that occurred while the system is running. In order for software to actively change the PSW, a LOAD PSW instruction must be used – and here’s where things get interesting – LOAD PSW is a privileged instruction that will be executed by the CPU only if the problem state bit in the currently active PSW is set to ‘0’, meaning that the program currently executing is the supervisor program (i.e. the OS). But how does the system get to that state in the first place? Since the OS is the first software that is executed after the computer system is powered on, it gets the highest level of privilege. Let’s see how it works step by step – to load the OS, an operator selects the load unit by turning a few knobs, powers on the system, and depresses the “load” key:

System/360’s control panel (Source: VAXBARN)

Depressing the load key makes the system perform a predefined set of actions (called the Initial Program Load – the IPL phase). First, the system reads 24 bytes from the load unit (starting from index 0) to address 0 in RAM. Addresses 0-7 will contain a PSW that is not yet loaded, followed by two Channel Command Words (CCWs) that describe an I/O operation:

Bits 8-31 of a CCW specify the location in RAM in which the data will be stored in case of a read I/O operation. The system then executes the input operation as described by the CCWs to read the OS “nucleus” from the load unit into RAM. Once the input operation is complete, the system loads the PSW from address 0 into the CPU, the CPU kicks in and starts executing instructions from the given address with the problem state bit set to ‘0’ (supervisor mode). The IPL is analogous to the “boot” operation of PCs today, with the nucleus being analogous to today’s “bootloader“.
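
To make the layout concrete, here is a rough C sketch of those first 24 bytes, assuming the 8-byte PSW and CCW doublewords described above; the struct and field names are mine, not IBM’s:

#include <stdint.h>

/* Rough sketch of the 24 bytes the IPL reads to RAM address 0.
 * Sizes follow the text (8-byte PSW, two 8-byte CCWs); the field
 * names are mine, not IBM's. */
struct ipl_area {
    uint64_t initial_psw;   /* bytes 0-7: PSW loaded once the nucleus is in RAM */
    uint64_t ccw1;          /* bytes 8-15: first Channel Command Word           */
    uint64_t ccw2;          /* bytes 16-23: second Channel Command Word         */
};

/* Bits 8-31 of a CCW hold the RAM address the read data goes to
 * (IBM numbers bits from 0 = most significant bit). */
static uint32_t ccw_data_address(uint64_t ccw)
{
    return (uint32_t)((ccw >> (63 - 31)) & 0xFFFFFF);  /* 24-bit address */
}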

Since the nucleus is the first piece of code that runs under the constraints of IBM’s protection model, it must conform to the memory protection rules. The protection key is System/360’s method of memory protection – RAM is divided into blocks of 2,048 (0x800) bytes, each protected by a 4-bit protection key that is unreadable and unchangeable by memory read and write operations. The block’s protection key is compared to the PSW’s protection key on every memory access made by a program. A key mismatch not only prevents the memory access, but also raises a specific interrupt (interrupts and interrupt handling are discussed in Part 8A and Part 8B of the ITMC). The IPL sets the key of the RAM block to which the nucleus is loaded to ‘0’, while bits 8-11 in the relevant PSW are also set to ‘0’. A RAM block’s key can be set by executing the SET STORAGE KEY command, which is a privileged command that can be executed only when the PSW’s bit 15 (the problem state bit) is set to ‘0’ (supervisor mode).
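
As a mental model only (this is not the actual circuitry, just the rule described above expressed in C), the check performed on every access made by a program looks roughly like this:

#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE  0x800u                      /* 2,048-byte protection blocks */
#define NUM_BLOCKS  (0x1000000u / BLOCK_SIZE)   /* 24-bit address space         */

static uint8_t storage_key[NUM_BLOCKS];         /* one 4-bit key per block,
                                                   invisible to loads/stores    */

/* Model of the rule from the text: on every access made by a program, the
 * accessed block's key is compared with the key in the PSW (bits 8-11);
 * a mismatch blocks the access and raises a protection interrupt. */
static bool access_allowed(uint32_t address, uint8_t psw_key)
{
    uint8_t block_key = storage_key[address / BLOCK_SIZE] & 0x0F;
    return block_key == (psw_key & 0x0F);
}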

The nucleus’ code then proceeds to set up the system:

  • Collect information about available resources and peripherals and set the OS’s parameters accordingly.
  • Copy (from disk/tape) the interrupt handling routines to RAM.
  • Copy “supervisor call” handling routines to RAM (will be discussed next).
  • Copy service routines (that will be used by programs) to RAM.
  • etc…

When the OS is running, it is basically interrupt-driven. Input from the user is handled by the relevant interrupt handling subroutines (discussed in Part 8B of the ITMC), which can in turn call other subroutines to initiate different functionalities. The most basic usage of the OS is loading and executing a program from a given input device (punched-card reader/disk/tape). Loading and executing a job (a job might contain several programs) is done in the following steps:

  • Read job data from selected input device.
  • Mark a data block as used (data block usage is managed by OS subroutines), assign it a protection key and copy the program’s code to it.
  • Run the assembler to generate the program’s object code (copy the object code to another block if needed).
  • Execute a LOAD PSW instruction to begin executing the program’s code (bits 8-11 are set with the key, bit 15 is set to ‘1’, and bits 40-63 point to the first instruction in the code). A sketch of building such a PSW value follows this list.
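
Here is that sketch: a minimal C helper that assembles such a PSW value, assuming IBM’s bit numbering (bit 0 is the most significant bit of the 64-bit doubleword); the helper name is mine:

#include <stdint.h>

/* Sketch of assembling the 64-bit PSW described in the last step.
 * IBM numbers bits from 0 at the most significant end, so bit n of the
 * doubleword sits at shift (63 - n). */
static uint64_t make_problem_state_psw(uint8_t key, uint32_t entry_address)
{
    uint64_t psw = 0;
    psw |= (uint64_t)(key & 0x0F) << (63 - 11);   /* bits 8-11: protection key  */
    psw |= (uint64_t)1            << (63 - 15);   /* bit 15: problem state = 1  */
    psw |= (uint64_t)(entry_address & 0xFFFFFF);  /* bits 40-63: 24-bit address */
    return psw;
    /* the OS would then issue LOAD PSW with this value to start the program */
}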

Once the program’s code begins executing, the program is in control of the system, but with limited privileges. The program cannot access other RAM memory blocks and cannot execute privileged instructions, like LOAD PSW, SET STORAGE KEY or any instruction that handles device I/O. Basically, an unprivileged program is free to make changes to data and code that reside in memory blocks that match the program’s protection key (which sits in the PSW during execution). If an unprivileged program needs to perform device I/O, it can do so by “asking” the supervisor program via a software interrupt – this is achieved by executing a SUPERVISOR CALL (SVC) instruction. System/360 handles 5 different interrupt classes with handlers that are set by writing their addresses to permanently assigned locations in low RAM addresses:

Memory addresses 0-24 are used by the IPL and were discussed earlier in this post. Addresses 24-63 hold the old PSWs saved at the moment an interruption occurred, while addresses 88-127 hold the new PSWs that will be loaded for each interruption class respectively. When a supervisor call interruption occurs, the system automatically saves the interrupting program’s PSW to address 32, and loads the new PSW from address 96 into the CPU’s circuitry. This way, the interrupt handler’s code will begin executing with supervisor privileges (since the supervisor call’s new PSW will have bit 15 set to ‘0’), and it’s up to the OS programmer to read the interruption code from bits 16-31 of the old PSW and branch to the subroutine that can service the request coming from the unprivileged program.
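
The following C sketch models that sequence conceptually – it is not actual hardware or OS/360 code, and the ram[] array and dispatch function are stand-ins of my own – but it ties together the fixed addresses and the PSW bit fields mentioned above:

#include <stdint.h>

/* Conceptual model of a supervisor-call interruption, using the fixed
 * low-memory addresses from the text. ram[] stands in for physical memory. */
static uint8_t ram[1 << 24];                      /* 24-bit address space       */

static uint64_t read_dword(uint32_t a) {          /* big-endian 8-byte fetch    */
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v = (v << 8) | ram[a + i];
    return v;
}
static void store_dword(uint32_t a, uint64_t v) { /* big-endian 8-byte store    */
    for (int i = 7; i >= 0; i--) { ram[a + i] = (uint8_t)v; v >>= 8; }
}

static void dispatch_supervisor_service(uint16_t code) { (void)code; } /* placeholder */

#define SVC_OLD_PSW 32   /* CPU saves the interrupted program's PSW here        */
#define SVC_NEW_PSW 96   /* CPU loads the supervisor's PSW from here            */

void on_supervisor_call(uint64_t current_psw)
{
    store_dword(SVC_OLD_PSW, current_psw);        /* done by hardware           */
    uint64_t new_psw = read_dword(SVC_NEW_PSW);   /* done by hardware           */
    (void)new_psw;  /* bit 15 of new_psw is 0, so the handler runs privileged   */

    /* OS side: the interruption code sits in bits 16-31 of the old PSW
     * (bit 0 = most significant bit). */
    uint16_t svc_code = (uint16_t)((current_psw >> (63 - 31)) & 0xFFFF);
    dispatch_supervisor_service(svc_code);
}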

It is now completely up to the OS developers to handle the request coming from the program. Naturally, there are standards set by the OS developers that the programmers follow. One of these standards is the calling convention that’s agreed upon when a program makes a supervisor call. In case a program wants access to more memory, it must pass a size parameter (the size of the extra memory requested) to the OS, and the OS in turn must return the address of the allocated memory back to the program (or some agreed-upon error value). The DOS (IBM’s Disk Operating System, which is a slightly less powerful version of OS/360) calling convention makes use of specific registers:

It’s again the OS developers’ job to make sure the SVC (supervisor call) subroutines perform parameter checking, and return an error value to the requesting program if the parameters are invalid. If proper checks aren’t made by the supervisor, rogue parameters might cause the SVC subroutine (which is running with supervisor permissions) to corrupt data or crash the system, rendering all the hardware-enforced protection features useless.
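
As a hypothetical illustration (the names, the size limit and the error value below are all mine, not IBM’s), a memory-request SVC routine with the kind of parameter checking described above might look like this:

#include <stdint.h>

/* Hypothetical sketch of the checking an SVC service routine has to do.
 * The point is only that the supervisor must validate whatever the
 * unprivileged program hands it before acting on it. */
#define ALLOC_ERROR  0xFFFFFFFFu      /* agreed-upon "allocation failed" value */
#define MAX_REQUEST  (64u * 1024u)    /* arbitrary per-request limit           */

/* Placeholder for the OS's real allocator, which would mark free blocks as
 * used and set their storage keys. Returns 0 when nothing is available. */
static uint32_t reserve_blocks(uint32_t size) { (void)size; return 0; }

uint32_t svc_get_memory(uint32_t requested_size)
{
    /* Reject obviously bad parameters before touching any supervisor state. */
    if (requested_size == 0 || requested_size > MAX_REQUEST)
        return ALLOC_ERROR;

    uint32_t block_address = reserve_blocks(requested_size);
    if (block_address == 0)           /* no free blocks found                  */
        return ALLOC_ERROR;

    return block_address;             /* handed back to the calling program    */
}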

Let’s review:

  • During system boot, the OS is loaded from the disk/tape to RAM and executes at supervisor privilege level.
  • Before a program is executed, the OS assigns unused blocks of memory for the program’s usage.
  • To begin program code execution, the OS executes the LOAD PSW instruction.
  • The PSW that was loaded by the OS has bit 15 set to ‘1’, meaning that the program’s code will execute as unprivileged code.
  • If the program tries to execute a privileged instruction (like an I/O operation instruction), or access a memory block with a key that doesn’t match, it’ll trigger a program exception interrupt which will be handled by the supervisor (which will probably terminate the program that caused the interruption).
  • If a program needs to perform a privileged action, it must ask the supervisor to do it by setting up the requested parameters, and making a supervisor call.

This model of privilege and memory access control is implemented in a very similar way in modern operating systems running on modern hardware. Take the following C language code for example:

printf("low-level stuff");

A program that was compiled with this line of code will at some point have to print a string to the selected output device, i.e. it will have to perform a privileged operation. If this program is run on an x64 CPU (which uses 64-bit words) under the Windows 10 OS, the user program eventually executes the SYSCALL instruction (which is the supervisor call equivalent) to ask the OS to print the string:

Before the SYSCALL instruction is called, the relevant parameters are set in memory. Here are two of them:

At address 0xaa33b7bdd48 there are 8 bytes representing the address of the string to be printed, followed by 4 bytes representing the length of the string. The 8-byte address might look strange because it is displayed exactly the way it’s stored in memory – in little-endian form (which is a common way to store data in desktop computers). If we go to that address, we’ll find our 0xf-byte-long character string:
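
A tiny C program can demonstrate both points on any little-endian machine – the 0xf length and the byte-reversed look of a stored 8-byte address:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* The string is 0xf bytes long, and on a little-endian machine the 8-byte
 * pointer to it is stored lowest byte first, which is why it looks
 * "reversed" in a memory dump. */
int main(void)
{
    const char *s = "low-level stuff";
    printf("length = 0x%zx\n", strlen(s));        /* prints 0xf               */

    uint64_t address = (uint64_t)(uintptr_t)s;
    const unsigned char *bytes = (const unsigned char *)&address;
    for (int i = 0; i < 8; i++)                   /* bytes as they sit in RAM */
        printf("%02x ", bytes[i]);
    printf("\n");
    return 0;
}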

In modern operating systems like Linux and Windows, the different privilege level execution states are dubbed “kernel mode” (supervisor) and “user mode” (unprivileged). While the OS’s privilege control model has remained basically similar to what it was half a century ago, memory management and protection by the OS changed significantly, along with the addition of the virtual memory model, which will be discussed in the next parts of the series.

 

Hope you found this post informative. Feel free to leave comments, and ask questions.

 

demo

 

Posted in How Operating Systems work

How Operating Systems work, Part 1 – The forgotten history of Operating Systems

Posted on October 14, 2017 by demo

This series complements the “Introduction to modern computers” (ITMC) series and focuses on software rather than hardware. I recommend reading the introduction series first, as it goes in depth and low-level, discussing elements like memory (and the use of memory) and assembly programming, while this series makes free use of these elements. The idea of this series is to give a complete, up-to-date and high/low-level picture of how operating systems work (along with a historical review). Let’s begin.

 

“The history of software in the United States has been somewhat under documented.” – This is the first line of the abstract of a short RAND document (“the document” for this post), and it’s even more true today than it was back in 1987. The birth and early evolution of operating systems is one of those scarcely documented subjects which, in my opinion, is important to know in order to understand why operating systems today look and behave like they do.

The document was written by Robert L. Patrick, and gives an interesting view of how computers were used back in the 40s and 50s. Try to put yourself in the shoes of a student or a scientist back in those days (from the dawn of history till the 1960s) – you couldn’t just pick up a calculator to calculate the result of multiplying two real numbers (like π and e), or get the result of sin(3.4). There weren’t any handheld calculators during those times, and humanity was on the brink of harnessing electricity to build circuits that could perform these kinds of calculations. In Part 3 of the ITMC, for example, we saw how a combination of transistors and some simple logic can create a circuit that adds two small numbers. After figuring out how to add numbers, engineers and mathematicians (and any other curious scientist who got involved with computers back then) worked out and began to standardize circuits that could perform subtraction (along with the representation of signed numbers), multiplication and division (along with the representation of floating-point numbers – here’s a nice video showing how to manually represent real numbers in binary).

Programmers back then had to figure out how to do complex calculations with only addition, subtraction, multiplication and division. There were mathematical methods that could solve these types of problems – using Taylor series, for example, a programmer could write code that approximates the result of a trigonometric function with good precision using a combination of multiplication, addition and division. Since the result of a calculation using a Taylor series gets more precise with each iteration, but more iterations mean more time to calculate, the programmer was in charge of the balance between precision and time. Questions like “How can I solve this problem using a computer?” and “How can I get these calculations to run faster on the computer?” gave birth to the computer science field.
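
Here is a small sketch of that trade-off (written in C for readability – the real thing would have been hand-written assembly): a Taylor-series approximation of sin(x), where each added term costs more arithmetic but improves precision:

#include <stdio.h>

/* Approximate sin(x) with the Taylor series x - x^3/3! + x^5/5! - ... ;
 * more terms mean a more precise result, but also more work. */
static double taylor_sin(double x, int terms)
{
    double term = x;      /* first term of the series */
    double sum  = x;
    for (int n = 1; n < terms; n++) {
        /* next term = previous term * (-x^2) / ((2n)(2n+1)) */
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum  += term;
    }
    return sum;
}

int main(void)
{
    for (int terms = 1; terms <= 6; terms++)
        printf("sin(3.4) with %d term(s): %f\n", terms, taylor_sin(3.4, terms));
    return 0;
}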

Now back to the subject. During the 40s and early 50s computers were used without operating systems at all. This means that they could run a single program at a time, and a considerable amount of hardware setup was needed in between programs. Consider the fact that these early computers were far from being a common sight back then, and that they were far from “personal”. A computer would fill an entire room (remember the vacuum tubes), cost a small fortune, and be shared by a large number of workers. This snip from the above-mentioned document will help you imagine what it looked like:

An IBM-704 circa 1950s (Source: Computer History Museum)

To use this computer (and all of its resources and peripherals, like the punched-card reader, the magnetic-core memory, and the magnetic-tape storage media) you’d have to get in a FIFO queue and wait till your turn came up. Once in the computer room, you had a limited time quantum to run your program through the computer. Part 7 of the ITMC briefly mentioned the punched cards that were used back then. Imagine a programmer sitting at his desk, punching holes in a bunch of cards, running back and forth to the computer room to check if it’s his time to use the computer, and carrying around a stack of punched cards that are his computer program:

A program on punched cards. (Source: Wikipedia)

This was of course a mess (those of you who work as programmers probably understand why). As Patrick described in the document:

So the very expensive computer sat idle most of the time, and its usage was inefficient (to say the least). The lack of efficiency mainly came from the fact that each programmer had to manually set up the entire computer system before he could run his program. Programmers also had to “reinvent the wheel” and write their own I/O handling code and service routines (like a binary-to-decimal converter for printer output). Patrick mentions that around mid-1955 an IBM computer users’ group (SHARE) was formed to tackle these subjects – code and knowledge sharing for mutual benefit. The highlights of the programmers’ proposals, which were derived during these meetings, are described on pages 7-10 of Patrick’s document. The main thought was that in order to make the usage of the computer system more efficient, a main program should reside in memory to handle user input and output, following these guidelines:

  • Input and output will be implemented using magnetic tapes (not punched cards) containing files known as SYSIN and SYSOUT – operators now came with their tapes, ran the program (paper card jam free), and took their tapes back to a machine that would print out the results (allowing for someone else to use the computer).
  • SYSIN contained a batch of independent jobs (programs), along with meta-data describing each job.
  • I/O peripherals and external memory modules were standardized. This removed the need to set the hardware up, and programmers made use of different memory modules and I/O peripherals by programmatic means (described in Part 8A and Part 8B of the ITMC)
  • Programmers were kicked out of the computer room. Computer operators handled the machinery while the programmers wrote the code.
  • Standard decimal-to-binary/binary-to-decimal routines were available for usage. The programmer only needed to know how to call these routines for his service.

These ideas marked the birth of the first operating system, the GM-NAA I/O System (General Motors and North American Aviation Input/Output system), in 1957. The first OS was born from the frustration of programmers, and the interest of corporations in maximizing the usage of a computer that was very expensive to rent. The operating system contained code which provided the infrastructure to assemble and run programs. SYSIN contained the job’s assembly code, and with a flick of a few switches, the OS read the code from the tape, assembled it to object code (which was kept in the much faster magnetic-core memory), sampled the system’s hardware clock and finally ran the program (object code) while keeping track of the time and resources used. Once the program ended (completed or failed somewhere along the way), the OS would add an advisory invoice to the trailing page of each printout so that the programmer would be aware of the resources used and the cost of these resources each time a run was made. To quote Patrick’s document: “We found the resulting self-discipline of great benefit since programmers naturally do more desk checking when a wasted shot at the machine costs more than a day’s wage”. Fun times. When one program finished, the OS was ready to run another program (tapes still needed to be manually switched in and out by the computer operators).

Besides compiling and running user programs, the OS offered standard, ready-to-use routines that the programmer could use for his (and everyone’s) benefit – an idea that stands at the base of modern operating systems. This will be discussed in the following parts of the series.

The OS’s code did not sit in ROM (it probably sat on magnetic tape and was loaded into the magnetic-core memory at run-time), meaning that rogue programs could corrupt the OS’s code. Furthermore, programs had unrestricted access to all of the computer’s resources. Thinking in more malicious terms, a programmer could theoretically modify the records of resource usage to get a nice “discount” on the usage of equipment. Here’s a snip concerning that subject:

The first thing operating-system developers (and OS code programmers) are taught today is to treat the user as an entity that will always try to destroy the OS and burn the hardware on which it runs (intentionally or not). This way of thinking has its roots in the very first use of operating systems, and it is not baseless.

Soon after the GM-NAA I/O System was implemented, it was upgraded to include a FORTRAN compiler (which compiled FORTRAN code to object code). FORTRAN was a high-level programming language, which means it provided an abstraction in terms of hardware usage. For example, programmers writing FORTRAN programs could now use a “print” function which handled the peripheral I/O code for them. Instead of writing god knows how many lines of IBM-704 assembly code, here’s a program that prints (using the standard output device, whichever was set at the time) the “Hello, World!” string:

program Hello
print *, "Hello, World!"
end program Hello

Again, the OS came to the programmer’s rescue by including the code for the compiler. If by this point you are thinking “generating the object code for a ‘print’ function for each program over and over again is a horrible waste of time”, then you are starting to think like an OS developer, and these are exactly the thoughts that turned these basic I/O systems into the OSs that we know today.

 

This is what the beginning of operating systems looked like. OSs were created out of a need to provide the user with an infrastructure to run his programs as quickly and efficiently as possible (because there are 30 angry programmers with stacks of punched cards/tapes in their hands waiting in line). In the next post we’ll continue going over the historical development of operating systems with new concepts of standardization, portability and protection. Hope you found this post informative. Feel free to leave comments, and ask questions.

 

demo

 

Posted in How Operating Systems work

An introduction to modern computers, Part 8B – I/O (Input/Output), peripherals and DMA (Direct Memory Access)

Posted on September 29, 2017 by demo

Note: This post relies heavily on the basics explained in Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8A.

In the previous part we’ve discussed the purpose of driver/buffer chips, how chip selection is implemented through the signals on the address bus, how interrupts work and how the stack works. In this part we’ll begin by taking a closer look at the interrupt mechanism, while introducing some interesting hacks that allow system and software designers to extend the basic features of the 8080 without changing the 8080 itself.

The 8080 in its basic configuration supports up to 8 different interrupt routines. This is implemented through the INT signal, with the interrupting chip sending the interrupt vector through the single-byte RST instruction on the data bus. Here’s a diagram of an implementation of the 8214 for multi-interrupt-vector support:

The 8212 is used to buffer the 8214’s output. Notice all but 3 of the 8212’s input data pins are hardwired to Vcc, meaning they will always output 1 whenever the 8212 is selected. Speaking of selection, the 8212 will be activated when the INTA signal on the control bus goes low. The 8228 system controller will drive the control bus’s INTA signal low based on a combination of the data pins when the 8080 is in output mode (DBIN is low): D0, D3 and D5 are high while the rest are low. The different data pins are decoded using this table:

This means the INTA signal on the control bus will be generated during a fetch cycle (the RST instruction is about to be fetched from the data bus), while an INTA is sent on the data bus (D0 is high), and that the operation of the current machine cycle (an instruction cycle is made up of one or more machine cycles) is a read cycle in general (again, an instruction fetch). The INTA signal will activate the 8212 just in time (T2) for it to transmit the buffered RST instruction during the T3 cycle, as described in the previous post.

Also, notice the INT signal has its polarity reversed when it comes out of the 8212 so it’ll match the 8080’s specification. Another interesting thing to notice when examining the 8214 logic diagram on page 5-153 of the manual is that the INT signal, when activated, will remain active for one clock cycle only (it is connected to the “set” pin of a D flip-flop). The 8212 latches this INT signal so that it is continuously transmitted to the 8080.

As mentioned, the 8214 chips can be cascaded in order to support more interrupt vectors. But since there’s still a single interrupt pin on the 8080, and since the RST instruction is still a one-byte instruction with 3 bits to indicate the interrupt vector, how can we tell the 8080 where the interrupt handler’s code is located? The intuitive solution would involve removing the hard wiring from the D0-D2 pins on the 8212 and connecting them to multiplexing circuitry that would be connected to multiple 8214s. The problem is that if D0-D2 are not hardwired to 1, an RST instruction won’t be transmitted on the data bus, since an RST instruction’s opcode is constructed in the following way:

meaning that we only have 3 bits to work with, and that’s it. To work this problem out, we need a little outside help that comes in the form of the RST 7 instruction feature of the 8228 system controller. Just to make things clear, an RST 7 instruction is a regular RST instruction with a reset vector of 0x38. An RST 7 opcode is 0xFF, or an instruction byte of all binary 1’s.

To activate this feature, a system designer needs to connect the INTA output from the 8228 to a voltage source through a resistor as specified on page 5-8, and the 8228 will be ready to send an RST 7 instruction to the 8080 through the data bus as soon as the 8080 acknowledges an interrupt. Remember that the 8228 (if used in the system) sits between the 8080 and the data bus, so it can intercept signals before they go from the 8080 to the other chips through the data bus, and vice versa.

After the RST 7 instruction is executed, the 8080 will begin executing code from address 0x38. Now it’s the programmer’s job to write the code that “talks” with the 8214 array in order to read the extended interrupt vector and figure out the address of the real interrupt handler. But again, how can the 8214 array point to more than 8 different interrupts? Since the 8214 was relieved of the responsibility of transmitting the RST instruction to the 8080 (which is now done by the 8228), outputs from multiple 8214s can now be multiplexed into the 8212’s input pins. This method can support up to 40 different interrupt vectors. Here’s an example of an array that can support up to 16 interrupts:

Notice that in this implementation, input pins D0-D2 on the 8212 are hardwired to 0, meaning that whatever vector is sent down the data bus is already a multiple of 8 (if you aren’t sure why, check out what multiples of 8 look like in binary), so the programmer skips multiplying the interrupt vector by 8. In this implementation the top 8214 is the higher-priority controller, because the ENLG output pin of the top 8214 goes into the bottom chip’s ETLG input. This means that the bottom chip’s interrupts will be serviced only when there are no interrupts waiting to be serviced in the top 8214.

Since this interrupt controller array isn’t autonomous anymore (it doesn’t send the interrupt vector through the RST instruction), a programmer must somehow be able to read the interrupt vector from the array. This can be done by activating the array (chip selection) and then reading the array’s output on the data bus.

In the previous post we’ve covered memory chip selection through the address bus, and general chip selection can work in a similar way. There are many ways to implement chip selection through the address and control bus. The only thing a system designer needs to be careful about is collisions. The 8080 has different memory read and write instructions, and their execution causes the 8228 to send the corresponding MEMR or MEMW signals down the control bus. This means that if a system designer wants to implement an interrupt controller array using an 8205 that is connected to the address bus, he must make sure that the address the programmer uses to read the data from the array isn’t also a functioning memory address. In case such a collision happens, both the interrupt array and the memory array will try to send their data down the data bus, and the result will be undefined.

Since the communication with the interrupt array is not memory communication, it is considered chip I/O (Input/Output). Chip selection using the address bus is called Memory-Mapped I/O (MMIO). In a system that uses MMIO, if pin A15 is used to activate the interrupt array, a programmer can read the interrupt vector by using a single MOV instruction targeting address 0x8000. While this method allows for very simple and intuitive programming, it “eats” a chunk of the address space, meaning that the amount of addressable memory is decreased.
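
In modern C terms (the 8080 programmer would of course write this as a single assembly instruction), a memory-mapped read of the interrupt vector is just an ordinary load from a fixed address:

#include <stdint.h>

/* Sketch of what memory-mapped I/O looks like to the programmer: with A15
 * wired as the interrupt array's chip select, reading "address" 0x8000 is an
 * ordinary load that actually activates the array. The volatile qualifier
 * tells a modern C compiler not to optimise the access away. */
#define INT_ARRAY_ADDR ((volatile uint8_t *)0x8000u)

static uint8_t read_interrupt_vector(void)
{
    return *INT_ARRAY_ADDR;   /* a plain memory read, routed to the array */
}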

The 8080 features an alternative to MMIO in the form of isolated I/O (or just I/O). The I/O address space is made of “ports”, and reading/writing from/to them requires special instructions. The main advantage of using isolated I/O is that it doesn’t diminish the amount of addressable memory.

Physically, isolated I/O is implemented through the IOR and IOW signal outputs on the 8228 chip:

and as far as the programmer is concerned, this is how the address spaces look using the different methods:

In the above example A15 pin is used for general I/O chip selection, meaning that the programmer is left with only 32KiB of addressable memory (which might be fine for some systems).

To support the isolated I/O model, specific instructions were implemented: IN and OUT, which correspond to read and write respectively. A port number must be specified with the IN/OUT instruction, and the content of the data bus is read into the accumulator or written from the accumulator to the data bus respectively. The reason only 256 ports are addressable through isolated I/O probably has something to do with instruction length (this way the instruction size can be reduced to two bytes, one for the instruction and one for the port address), and the thought that 256 different ports seems like a large number of chips that can be selected with a single instruction.

Let’s get back to the interrupt array example.

When an interrupt is accepted by the 8080, the 8228 sends an RST 7 instruction back to the 8080, which begins executing the single interrupt handler. The interrupt handler’s code will probably look like this:

The first thing is to save the processor status (as discussed in the previous post). Then, the interrupt vector is read from the interrupt array into the accumulator register using the IN instruction. Now a 16-bit address is set using registers L and H, representing the low and high address bits respectively. Notice the base address of the interrupt handling routine array is loaded into register H, while the specific interrupt routine’s offset from the beginning of the array is loaded into the L register. The PCHL instruction means “load L and H into the PC register”, which is equivalent to “jump to the address formed by combining the values in the H and L registers”.
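
For readers more comfortable with C, here is a rough model of the same dispatch logic; in_port() and the port number 0x40 are made-up stand-ins for the IN instruction, and the function-pointer table plays the role of the interrupt routine array reached through H, L and PCHL:

#include <stdint.h>

typedef void (*int_handler_t)(void);

/* Stand-in for the IN instruction: on the real system this would read the
 * 8212's latched output from an I/O port; the port number 0x40 is made up. */
static uint8_t in_port(uint8_t port) { (void)port; return 0; }

static int_handler_t handlers[16];   /* one routine per vector (vector / 8) */

void rst7_dispatch(void)
{
    /* processor status has already been saved, as described above */
    uint8_t vector = in_port(0x40);  /* vector arrives as a multiple of 8   */
    if (handlers[vector / 8])        /* same effect as loading H:L and PCHL */
        handlers[vector / 8]();
}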

This is how I/O works in terms of chip selection and programming. We can now continue to the next subject, peripheral I/O.

For reference, here’s a diagram of a 8080 based system:

  • I/O peripheral interface – Let’s consider a common input device – a fully decoded keyboard. “Fully decoded” means that the keyboard has its own processor that handles polling for key presses, and outputs bits that represent them (using an ASCII table, for example) through a data bus that can be connected to our system. While it sounds surprising that keyboards might be anything but “fully decoded”, remember the Busicom calculator’s firmware from the previous post – the 4004 actually handled polling and decoding key presses!

This keyboard however can’t (more correctly – shouldn’t) just jam a byte (representing a character) down the data bus whenever a key is pressed. For synchronization and control we need a chip that can fulfill the specifications required by the 8080’s interrupt system. The 8255 is the right chip for this job. It is a programmable peripheral interface chip that comes in a 40-pin package. The reason for the large number of pins is versatility. The 8255 can be implemented in many ways and reprogrammed on the fly by software – but in this post we’ll focus on the 8255’s mode 1, which can handle keyboard and display I/O (the chip’s functions are explained in detail starting from page 5-113 of the manual).

Let’s look at the 8255’s pin-out and internals:

We can see that the chip has its own internal data bus connecting 3 different ports (A, B and C, the latter split into lower and upper 4 bits). The internal bus is connected to the host’s data bus through a data bus buffer. The chip also features control logic circuitry that connects to the host’s control, data and address buses. In theory, each port can handle I/O to a single device, and port selection is implemented by using 2 bits from the address bus (A0 and A1).

But before the 8080 can select a specific port, it needs to be able to select the 8255 for communication first. Here’s an example of how 8255 chip selection can be implemented through MMIO without additional 8205 decoders:

Address pins A0-A1 are used for 8255 port selection, and A2-A14 are used for direct chip selection (each address pin is connected to an individual chip select pin on an 8255). This way 13 different 8255s can be selected, while the memory address space gets cut in half – but this might be a very good solution for systems with a small amount of memory, while also reducing the cost of the system (because extra 8205 chips aren’t needed for chip select decoding).

An isolated I/O chip select similar to the above would limit the possible number of addressable chips to 6 (since an I/O port address is 8 bits in size, with 2 of them used for 8255 port selection). Again, this is with direct chip select using the address lines – adding 8205 decoders can allow many more chips to be selected using I/O ports.

Now that we have chip selection figured out, let’s see how we can control the chip to set it up for proper communication with our keyboard.

When the system boots, or when the 8255 is reset, it will automatically go into mode 0. For our implementation, we need to put the chip in mode 1. This is done by sending a control word to the 8255 down the data bus. Since the 8255 has 3 ports, when the A0 and A1 pins are both 1, it means the word on the data bus is addressed to the 8255 itself (and not to a peripheral connected to port A, B or C) and is a control word. Here’s the chip’s full control table:

Remember that all the signals with the line above them are “active low”. The control word on the data bus programs the 8255’s mode for each group as explained by this diagram:

There are many possible combinations that can be programmed, mixing between groups and modes, and they are described starting from page 5-116. As mentioned, we’ll be focusing on mode 1, which is strobed I/O mode. With strobed I/O, the peripherals and the 8080 communicate indirectly through the 8255. Let’s see what happens when a key is pressed on a keyboard connected to port A, which acts as an input in mode 1:

  • The keyboard’s internal circuitry decodes the keypress and translates it to 6 bits of data.
  • The keyboard uses pin PA6 (pin 6 of port A) to strobe (signal) that there’s keypress data ready on the PA0-5 pins.
  • The 8255 latches the data from the keyboard to an internal buffer.
  • The 8255 sends an INT signal through the PC3 pin (pin 3 of port C). This pin should be connected to the 8080 through an 8212, or through a combination of an 8212 and an 8214 priority interrupt chip.
  • The 8080 eventually executes the interrupt handling routine, which will select the interrupting 8255, read the latched decoded keypress off the data bus, and save it to memory.
  • An ACK signal is sent from the 8080 to the keyboard to notify its circuitry that the decoded key was read, meaning that the keyboard is now ready to decode the next keypress.

Here’s a diagram of the connection:

You can see that we have another peripheral device connected to the 8255 – a Burroughs self-scan display which is a small system by itself. You can watch a video on how it looks and works here (I/O explanation starts at the 5:45 mark).

So let’s see how a cycle of reading a character from the keyboard and printing it on the display works:

  • Keyboard input as described above. In the end a byte representing a character will be saved to RAM at a known address – let’s say it’s saved to a “ready to print” buffer. A variable that represents the number of characters sitting in the “ready to print” buffer is incremented by this routine.
  • Before the keyboard input interrupt routine finishes, it reads the ACK signal from the display. If the ACK signal is high, it means the display is ready to print a character, so a display print routine will be called before the keyboard input interrupt handler finishes. If the ACK signal from the display is low, the keyboard input interrupt handler simply finishes.
  • The display print routine translates the bytes stored in RAM to data the display “understands”, and sends it to the display by writing the data to the 8255 chip. The 8255 latches the data to pins PB0-7 and then sends a signal down the DATA READY pin. When the display receives the DATA READY signal, it’ll read the byte from pins PB0-7 and send it to its internal circuitry. Once the data is sent to the display, the display print routine finishes.
  • When the display is done printing the character, it’ll send an ACK signal to the 8255, which will trigger another (different) interrupt.
  • The interrupt handler that deals with the display’s ACK will check if there are any more characters waiting to be printed in the “ready to print” buffer (that’s located in RAM). If there are characters to be printed, the routine will call the display print routine. Before it finishes, the interrupt handler will decrement the number of characters in the “ready to print” buffer (because one just got printed).

A ring buffer can be implemented in software to handle this scenario. You can see that the software programmer needs to be very precise in order for this system to work properly. The program described above is the “software driver” that drives the hardware. Without the software driver, this sophisticated system would be useless.
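
For illustration, here is a minimal ring buffer sketch in C (sizes and names are mine); the keyboard interrupt handler would act as the producer and the display print routine as the consumer:

#include <stdbool.h>
#include <stdint.h>

/* A minimal ring ("circular") buffer for the "ready to print" data: the
 * keyboard handler pushes characters at one end, the display print routine
 * pops them from the other, and nothing is lost as long as the buffer
 * doesn't fill up. */
#define RING_SIZE 32u                 /* power of two keeps the wrap cheap */

static volatile uint8_t ring[RING_SIZE];
static volatile uint8_t head;         /* next slot to write (producer)     */
static volatile uint8_t tail;         /* next slot to read  (consumer)     */

bool ring_put(uint8_t ch)             /* called from the keyboard handler  */
{
    uint8_t next = (head + 1) % RING_SIZE;
    if (next == tail)                 /* buffer full, drop or signal error */
        return false;
    ring[head] = ch;
    head = next;
    return true;
}

bool ring_get(uint8_t *ch)            /* called from the display routine   */
{
    if (head == tail)                 /* buffer empty, nothing to print    */
        return false;
    *ch = ring[tail];
    tail = (tail + 1) % RING_SIZE;
    return true;
}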

Now that we have a good understanding of how memory and I/O reads/writes work, we can discuss one last feature supported by the 8080 – DMA (Direct Memory Access). 

With DMA, peripheral I/O chips can be designed to write directly to memory. While the 8255 doesn’t support this feature, other chips can be designed to support it. The way DMA is implemented is simple: A device with DMA capabilities will prepare the target address (and data in case of a memory write), and when it is ready, it will drive the signal on the 8080’s HOLD pin high. The 8080 will sample the HOLD pin during T2, and if it accepts the request (depends on conditions described on page 2-13), it’ll drive the HLDA pin high, signalling to the requesting device that the CPU is now suspended.

The device then executes the memory read/write, and drives the HOLD pin low once it’s done. The 8080 then resumes normal operation. Simple.

DMA saves a considerable amount of time. Instead of interrupting the CPU, causing it to execute routines to read the data from the device into internal registers, and then write the data from the registers to memory – a device can get everything ready, and execute the read/write to memory in just one cycle! Of course the device won’t just decide on its own where and what to write to memory – that’s where the software driver that is in charge of the DMA operation comes in.

DMA is used extensively in modern computers, and the chip arrays that handle communications with the device and DMA operations are called controllers (USB controller, SATA controller, etc..), or adapters (Display adapters, etc..).

* * * * *

This post concludes the introduction series. The purpose here was to answer the simple question “How do computers work?” without leaving anything in the dark or referring to something as “magic” or “it just works”. This series also sets the basic knowledge base so I won’t have to repeat things, or include a low-level intro in future posts.

There are many more interesting subjects to discuss, and from this point it would be impossible to set them up in a linear fashion like this series was. That’s why from here on I’ll make posts on different subjects, going as low-level as needed to understand exactly how things work.

I’ll also be accepting post requests on any hardware or software subjects. So if you have any, write them down in the comments, or send them to the email in the side-panel.

Hope you found this series informative and enjoyable to read. Feel free to leave comments, and ask questions.

 

demo

Posted in An introduction to modern computers



Useful Links

Circuit Simulator

Intel 4004 emulator

Intel 8080 User’s Manual

An introduction to modern computers

1 – Electricity basics

2 – Basic electric circuits

3 – Transistors, Boolean algebra and Hex

4 – Clocks and Flip-Flops

5 – Memory and how DRAM works

6 – How the CPU works

7 – Computer Programming with assembly

8A – Intel’s 8080, interrupts and the new stack

8B – I/O, Peripherals and DMA

How Operating Systems Work

Part 1 – The forgotten history of Operating Systems

Part 2 – Privilege control and memory protection
