In the previous posts we didn't discuss performance at all, despite performance being a major factor in computing. So let's talk about performance. We use computers as tools to assist us in doing whatever needs to be done, and we expect them to do it fast. Would you find it acceptable if a handheld calculator took one whole minute to calculate and display the result of a simple multiplication?
So what can cause a performance bottleneck in a well designed logical circuit? The current flows (the propagation of the electromagnetic wave mentioned in Part 1) through the wires at nearly the speed of light, so it seems like the only thing that stands between us and a really fast computer is a super fast clock generator to drive our switching logic. One problem though – physics.
Since transistors control the flow of electrons by saturating a “gate” with electrons (or draining electrons from said gate), there's a delay in flow control (a delay in how fast a transistor can turn “on” and “off”). This means that if you keep cranking the clock speed up, eventually one of the transistors in your circuit won't get enough time to saturate and allow current to pass through it. This means that somewhere in your logic, instead of getting a 1 you get a 0. This is how a “bug” is born. A possible fix for this bug is to crank up the voltage as you increase the clock speed (this allows faster saturation of the transistors) – however, this added voltage can destroy the fragile components making up your circuit. Over the years, materials and fabrication techniques improved, allowing for faster-switching transistors, which in turn allowed an increase in clock speed.
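To get a feel for how switching delay caps the clock, here's a back-of-the-envelope sketch in Python. Both numbers in it are made up for illustration; the point is only that the slowest chain of gates between two clock ticks dictates the fastest usable clock:

```python
# Back-of-the-envelope: the slowest chain of gates between two clock ticks
# dictates the fastest usable clock. Both numbers are made up for illustration.
gate_delay_ns = 0.5        # assumed switching delay of one transistor stage
critical_path_gates = 20   # assumed number of gates on the longest path

critical_path_ns = gate_delay_ns * critical_path_gates
max_clock_hz = 1.0 / (critical_path_ns * 1e-9)
print(f"critical path: {critical_path_ns:.1f} ns -> max clock ~{max_clock_hz / 1e6:.0f} MHz")
```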
So we've got the speed issues with the switching logic covered. But what about the memory component? If you want a routine or a program to run, you need memory. A CPU without memory is just a bunch (a big bunch, though) of switches flicking on and off following the ticks and the tocks provided by the clock. Could memory be a bottleneck in our system?
To answer that question, we need to make some things clear first. In this part of the post, the term “memory” is used to describe a technology that stores bits (1's and 0's). One example of such a technology is the hard disk, on which large amounts of bits (data) are usually stored. During the mid-1970's the average seek time for a hard disk was 25ms, and that's not including the overhead of reading/writing the actual data and moving it all the way back to the CPU.
Let's assume that with all overheads combined, reading a randomly located bit from the disk would take 30ms on average. Now let's imagine a system in which an Intel 4004 (clocked at 740kHz), which could execute 46,300 to 92,600 instructions per second (the meaning of an instruction will be discussed in the next post), is paired with a mid-70's hard disk used as the CPU's main memory. In this scenario, each random access to memory (needed for a single bit read/write) would stall the CPU (while it waits for the data to be read or written) for a period of time in which it could have executed roughly 1,400 to 2,800 instructions!
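The arithmetic behind that claim is simple enough to spell out:

```python
# The 4004/hard-disk thought experiment above, as arithmetic.
stall_s = 0.030                     # assumed 30ms per random bit access
ips_low, ips_high = 46_300, 92_600  # Intel 4004 instructions per second

print(f"instructions lost per stall: {stall_s * ips_low:.0f} to {stall_s * ips_high:.0f}")
# -> 1389 to 2778 instructions for every single memory access
```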
Notice that the calculations above were based on random access to memory. When we're talking about memory to be used by a CPU, we're talking about RAM (Random Access Memory), since the CPU must be able to access memory in a non-sequential pattern.
So how do we deal with this memory bottleneck? A natural solution would involve discarding all mechanical components. Over the years, several interesting memory technologies appeared (and some disappeared). One example is magnetic-core memory:
This technology, however, was expensive, and the size of the toroids limited the amount of storage available (each toroid holds a single bit, so you can actually see how much RAM is installed in your system).
SRAM (Static RAM) was introduced in the late 1960’s, and was based on a combination of transistors that would create a circuit (similar to a latch) that can “hold” a single bit:
SRAM is great. The only problem is that SRAM is expensive: a typical SRAM cell uses six transistors to hold a single bit. The previously mentioned Intel 4004 was built from approximately 2,300 transistors, meaning that an SRAM store of just 1,000 bits (just over 100 bytes) would require almost 3 times more transistors than the CPU itself.
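Here's the math, assuming the classic six-transistor cell:

```python
# Why SRAM main memory was a non-starter: each bit is its own little circuit.
TRANSISTORS_PER_SRAM_CELL = 6  # the classic six-transistor (6T) cell
CPU_TRANSISTORS = 2_300        # the whole Intel 4004

bits = 1_000
total = bits * TRANSISTORS_PER_SRAM_CELL
print(f"{bits} bits of SRAM: {total} transistors, "
      f"{total / CPU_TRANSISTORS:.1f}x the entire CPU")
# -> 6000 transistors, about 2.6x the 4004
```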
DRAM (Dynamic RAM) was introduced during the late 1960's to deal with the cost issue. Nowadays, DRAM is the most common type of RAM, and you can find it just about anywhere. The commonly used DRAM cell is a combination of a transistor and a capacitor:
Here’s a TXT file for it.
You’ve probably noticed all the extra components around the SRAM and DRAM cell. They are necessary for writing and reading the bit from the cell.
If you’re wondering “why use the more expensive and complex SRAM when DRAM works just fine?”, and that’s a great question, the answer to which is -performance. In order to change the state of DRAM cell, a capacitor must be charged or discharged. This takes time. Changing the SRAM state is just a matter of activating a few transistors.
Another reason for the performance gap is the fact that the read cycle in DRAM is destructive, and the cell must be rewritten after it is read (this will be explained later in the post). This read overhead doesn’t exist in SRAM.
CPU caches use SRAM memory cells, which are fast but expensive. So expensive, in fact, that the amount of bits you can store in cache is extremely small compared to main memory (a modern CPU's L1 cache is tens of KBs, compared to GBs of main memory). The main memory modules (installed in your motherboard) are DRAM based and significantly slower.
Since DRAM is the most common RAM in use by CPUs today, let's go in depth on how a DRAM circuit actually works.
One important thing to keep in mind while dealing with DRAM is that DRAM leaks. Because there is current leakage between the transistor's gate, source and drain, the capacitor might discharge, or gain charge from neighboring cells. SRAM doesn't have this problem because it doesn't use capacitors to store bits. You can try it with the DRAM TXT file above: you'll notice that over time the capacitor loses its charge. When the voltage drops below 2.5v, the op-amp (which figures out whether the capacitor holds a 0 or a 1 by comparing the voltages on the two lines) will sense a 0 where there was previously a 1.
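Here's a minimal sketch of that decay, assuming the charge leaks away exponentially. The time constant is a made-up value chosen just to produce numbers on a human scale:

```python
import math

# Leakage sketch: a charged cell discharging through leakage paths roughly
# follows V(t) = V0 * e^(-t/tau). The time constant below is a made-up value.
V0 = 5.0         # volts: freshly written 1
THRESHOLD = 2.5  # volts: below this, the sensing circuit reads a 0
tau = 0.050      # seconds: assumed leakage time constant

t_flip = -tau * math.log(THRESHOLD / V0)
print(f"a stored 1 decays into a 0 after ~{t_flip * 1000:.0f} ms")
# -> ~35 ms for these numbers, which is why DRAM needs constant refreshing
```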
The solution to DRAM leakage is an independent refresh circuit that periodically refreshes all cells. This requires the cells to be read (destructively) and rewritten. During the refresh cycle the DRAM cells can't be used, which widens the performance gap between DRAM and SRAM even further (even though the refresh cycle takes only about 1% of operating time on modern DRAM chips).
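To see why the refresh overhead stays so small, here's a rough estimate with assumed, modern-ish numbers (not taken from any datasheet):

```python
# A rough refresh-overhead estimate. All three values are assumptions in the
# ballpark of modern DRAM, not datasheet numbers.
ROWS = 8_192              # rows per bank
REFRESH_WINDOW_S = 0.064  # every row must be refreshed within 64 ms
ROW_REFRESH_S = 50e-9     # ~50 ns to read and rewrite one row

busy_fraction = ROWS * ROW_REFRESH_S / REFRESH_WINDOW_S
print(f"memory is busy refreshing {busy_fraction:.1%} of the time")
# -> about 0.6%, consistent with the ~1% figure above
```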
Let's look at an example of a DRAM array. First of all, there's the sample DRAM circuit in circuitjs (under the Circuits tab -> Sequential Logic -> Dynamic RAM):
This circuit is a good example for understanding the basic usage of RAM in general. Bits are stored in rows (also called “word-lines”) and columns (also called “bit-lines”). In this case, there's a single column connecting 4 rows, so the output data will always be 1 bit wide. Each column in a DRAM array has its own charge sensing unit (or “sense amplifier”). Basically, the row select in the above picture is the address of the data we want to read/write.
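Stripped of all electronics, the behavior of this 4-row, 1-column array can be sketched in a few lines of Python:

```python
# A toy software model of the circuitjs example: 4 rows sharing one bit-line,
# so every access is 1 bit wide and the row select is effectively the address.
cells = [1, 0, 1, 1]  # one stored bit per row

def read(row: int) -> int:
    """Activating a row puts that row's bit on the shared bit-line."""
    return cells[row]

def write(row: int, bit: int) -> None:
    """Activating a row while driving the bit-line overwrites the cell."""
    cells[row] = bit

write(2, 0)
print([read(r) for r in range(4)])  # -> [1, 0, 0, 1]
```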
The operation of a real DRAM cell is a bit different from the circuitjs version. While the circuitjs version uses an op-amp for a direct voltage comparison between the capacitor and the +2.5v line, a real DRAM circuit makes use of a physical attribute of the bit-line: the bit-line (column) is a relatively long piece of conductive material, which means it can be charged with electrons. A bit-line can actually hold around 10 times the charge of a single DRAM cell's capacitor (whose capacitance measures just a few femtofarads). Using this physical attribute, the read cycle is performed following these steps (let's assume for this example that a fully charged capacitor is at 5v; a small software model of the steps appears right after the list):
Pre-charge phase – The bit-line is pre-charged to half the voltage of a fully charged capacitor (if a charged capacitor is 5v, the bit-line will be pre-charged to 2.5v).
Row activation phase – When the row is “activated”, the transistor allows charge to flow between the capacitor and the bit-line. If the capacitor was empty, some charge will flow into it from the bit-line and the bit-line's voltage will fall below 2.5v. If the capacitor was charged, some charge will flow from it to the bit-line, and the bit-line's voltage will rise above 2.5v. Either way, during this phase the capacitor's own voltage ends up very close to 2.5v – a fully charged capacitor loses most of its charge, and an empty one gains some. For this reason, the read cycle is considered destructive – the former state of the capacitor (representing the stored bit) is lost.
Sensing phase – The voltage on the bit-line is compared with the voltage of another pre-charged bit-line to figure out if the cell’s value is 1 or 0. In order for this to work the DRAM array is organized in pairs:
You can see that when a word-line is activated, each cell is connected to a bit-line on its left, while on its right there’s a bit-line that will not be connected to a capacitor. The result of the comparison is stored in a buffer. Since the entire row is activated, this buffer is called the “row buffer”.
Recharge phase – The cells are recharged according to their sensed values.
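To make the charge-sharing mechanics concrete, here's a toy Python model of these four steps. It reduces all the electronics to a single charge-averaging equation and models the reference bit-line of the sensing phase as a constant 2.5v, so it's an illustration of the idea, not of a real circuit:

```python
# A toy charge-sharing model of the four steps above. All the electronics are
# reduced to one equation: when the access transistor opens, the cell and the
# bit-line settle at their charge-weighted average voltage.
C_CELL = 1.0      # cell capacitance (arbitrary units)
C_BITLINE = 10.0  # the bit-line holds ~10x the cell's charge, as noted above
V_FULL = 5.0      # a fully charged capacitor sits at 5v

def read_cycle(cell_v: float) -> tuple[int, float]:
    bitline_v = V_FULL / 2                   # 1. pre-charge to 2.5v
    shared_v = (cell_v * C_CELL + bitline_v * C_BITLINE) / (C_CELL + C_BITLINE)
    #                                          2. row activation: charge sharing
    bit = 1 if shared_v > V_FULL / 2 else 0  # 3. sense against the 2.5v reference
    return bit, (V_FULL if bit else 0.0)     # 4. recharge the cell fully

for stored_v in (5.0, 0.0):
    bit, restored_v = read_cycle(stored_v)
    print(f"cell at {stored_v}v reads as {bit}, rewritten to {restored_v}v")
# A 5v cell only nudges the bit-line to ~2.73v and a 0v cell pulls it to ~2.27v,
# while the cell itself ends up near 2.5v either way - hence the rewrite step.
```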
While the circuitjs sample is nice for getting an impression of how DRAM works, it lacks an independent refresh circuit, doesn’t have separate controls for reading and writing, doesn’t have a row buffer, and so on…
I’ve created a two bit 4 cell DRAM array which contains the above mentioned features. Get the TXT file and check it out. Lets go over the main components:
First of all, the refresh circuit is automated and performs a refresh every 40ms. While a refresh is in progress, the read and write controls are disconnected. The capacitor array has now doubled in size, and each read and write operation affects both cells in a row. Read results are now stored in a row buffer.
While this configuration looks over-the-top for just 8 memory cells, keep in mind that in order to increase the amount of memory in the circuit, it is possible to add more columns (along with their sense amplifiers, write drivers and buffers) without any need to change (add transistors to) the DRAM control, row select and refresh circuits.
Much more information on how memory works can be found in this great book.
Here’s a real world implementation of the above as seen in this diagram of the Intel 4002’s (RAM chip) internals (holds 80*4 bits of data):
Notice the sensing and row buffer components for each column.
Another type of memory that should be mentioned is ROM (Read Only Memory). Besides the grid of rows and columns that holds the bits (each hard-wired to ground or to a voltage source), a ROM chip holds a control circuit that allows fetching the data. There are several ways to implement a PROM (Programmable ROM). One of them involves UV light: an EPROM is programmed electrically and erased by shining UV light onto its die. A manufacturer could program such a chip and then encase it in opaque epoxy, creating a one-time programmable ROM (unless someone goes through the trouble of decapping the chip and erasing it with UV so it can be reprogrammed). The Intel 4001 is the ROM counterpart of the 4002.
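Functionally, a ROM behaves like a lookup table whose contents were fixed at the factory. A tiny sketch (with arbitrary contents) makes the point:

```python
# A ROM is conceptually a hard-wired lookup table: the address selects a row,
# and that row's bits are whatever was wired to voltage (1) or ground (0) at
# manufacturing time. The contents below are arbitrary example data.
ROM = (0b1010, 0b0001, 0b1111, 0b0110)  # four 4-bit words, fixed forever

def rom_read(address: int) -> int:
    return ROM[address]

print(f"{rom_read(2):04b}")  # -> 1111
# Note there's no rom_write(): the "data" is the wiring itself.
```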
By this point we've covered the basics of how memory, ROM and RAM work. In the next post we'll talk about how a CPU works.
Hope you found this post informative. Feel free to leave comments, and ask questions.