CPU Design: Part 3 - Power

Power efficiency has increasingly become an important issue in CPU design. Performing calculations is not free - you have to supply power in order to enable the CPU to move electrical charge carriers around in such a manner as to perform computations. There are many techniques that CPU designers use to try and minimize the necessary power, but before we get to that, it’s useful to look at why CPUs consume power in the first place.

Why CPUs need power

At the most fundamental level, CPUs operate by moving electrical charge through circuits to charge and discharge capacitors. An archetypical capacitor consists of two parallel electrically-conducting plates, with a “dielectric” (an electrical insulator) between them. Placing electrical charge of one polarity on one of the plates induces charge of the opposite polarity on the other plate.


Capacitors can be thought of as charge storage devices. Much like a bucket holds water, a capacitor holds electrical charge. The amount of electrical charge stored by a capacitor tells us the voltage difference between its two plates, as will be discussed later.

Note: voltage refers to electric “potential.” I like to think of it like water on a hill. The top of the hill has higher potential than the bottom of the hill. Electrons like to roll downhill, from higher potential to lower potential. (“Holes,” the positive charge carriers in semiconductors, which are really the absence of an electron where there could be one, roll the opposite way, from low potential to high potential.) Many people prefer to think of voltage like pressure; higher voltage means higher pressure on the electrons. What’s important is that voltage always refers to a difference in potential. Engineers are always thinking about voltage differences. The top of a hill may be at 1000 ft and the bottom at 100 ft, in which case the difference is 900 ft. But if the top of the hill is at 1800 ft and the bottom at 900 ft, the difference is the same. And it’s the difference that matters. When we refer to a wire being at, say, 1.2 V, what we mean is that the voltage difference between the wire and “ground” is 1.2 V. But ground is arbitrary. It could be higher or lower than ground in some other circuit. You can thus have a wire at a negative voltage, if it is at a lower potential than an arbitrarily-defined ground.

When we refer to the voltage on a capacitor, we are referring to the voltage difference between its two plates.


The “capacitance” - essentially how much charge the capacitor can hold at a given voltage - is determined by this equation:

C = εA / d

In this equation, A is the surface area of each plate (we assume that they have the same surface area), and d is the distance between the plates. ε is the permittivity, a physical constant determined by the material used for the dielectric. For example, pure SiO2, or glass, which was often used as a dielectric on CPUs for many years, has a permittivity of around 3.9 × ε₀, where ε₀ is a fundamental natural constant corresponding to vacuum. (That ratio, 3.9 here, is called the “dielectric constant.”) In other words, when you use glass instead of vacuum between the plates, your capacitance will be 3.9 times as high.
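To put rough numbers on that equation, here is a small back-of-the-envelope sketch in Python. The wire dimensions are invented for illustration - they are not real process geometry - but they show how the dielectric scales the capacitance:

```python
# Parallel-plate capacitance: C = eps * A / d, with eps = eps_r * eps_0.
# All dimensions below are invented for illustration.

EPS_0 = 8.854e-12  # permittivity of free space, in farads per meter

def capacitance(eps_r, area_m2, dist_m):
    """Capacitance of a parallel-plate capacitor."""
    return eps_r * EPS_0 * area_m2 / dist_m

# Treat two neighboring wire segments as the plates: 100 nm tall,
# coincident for 1 micron, separated by 50 nm of dielectric.
area = 100e-9 * 1e-6  # A = height x coincident length, in square meters
dist = 50e-9          # d, in meters

print(capacitance(1.0, area, dist))  # vacuum: ~1.8e-17 F
print(capacitance(3.9, area, dist))  # SiO2:   ~6.9e-17 F -- 3.9x higher
```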

For this reason, one active area of research and development in CPU design involves replacing SiO2, at least for unwanted capacitors, with other dielectric materials that have lower dielectric constants. It turns out that the power consumed is proportional to the capacitance, so if you can halve the dielectric constant you can halve the power spent charging and discharging those capacitors. For example, adding fluorine into the SiO2 can reduce the dielectric constant to the low 3’s. Other dielectrics, like polyimide or parylene, have dielectric constants in the 2’s. One company in the 1990s tried to use air bridges - in other words, replacing the dielectric with air, which has a dielectric constant of close to 1. It didn’t turn out well for them.

Of course one can also reduce the surface area of the conducting parts of the capacitor, or increase the distances between conducting parts, to reduce capacitance. This can require a lot of work.


Two sources of capacitors

There are two primary sources of capacitance in CPUs. One is inevitable and required for the CPU to operate. The other is a parasitic effect, the unfortunate consequence of needing wires in the CPU. We’ll discuss the latter first.

A CPU design consists primarily of transistors and wires that connect them. These wires are usually referred to as “interconnect.” There are multiple levels of wires, stacked on top of each other, with dielectric between them. The wires at the bottom of the stack, closest to the transistors, are usually the smallest. That is, they usually have a smaller width and length, and typically a smaller height as well. Many wires exist at each level, and there is dielectric material between the wires on a given level, as well as between the levels. In the figure below, the blueish material is dielectric, and the brownish features are wires.




These wires, each of which is a conductor separated from its neighbors by dielectric material, form millions or billions of parasitic capacitors. The height of the wires and the length of the coincident portions of neighboring wires determine the cross-sectional area, A, of each capacitor. The distance between the wires determines d.

[Figure: two neighboring wire segments of height h, coincident over a length L1, separated by a distance d of dielectric.]

For example, in the above figure, the cross-sectional area, A, is h x L1, because L1 corresponds to the coincident parts of the left and right wires.

The same analysis applies upward and downward. Wires may be found above a given wire, creating more capacitors where the wires cross. Wires may also be found below a given wire. And for the wires closest to the transistors (frequently referred to as the M0 or M1 layer), there may be capacitance between the wires and the semiconducting substrate (i.e. the silicon).

Physical designers work hard to try to minimize all these capacitances. They try to keep wires as short as possible, and spaced as widely as possible, to avoid creating big capacitors. This effort starts at the floorplanning stage, and continues all the way through “placement” (deciding where the transistors go) and “routing” (deciding where the wires go). In my experience, we would often manually position wire segments or place temporary blockages to prevent automated tools from putting wires in certain places, in order to minimize capacitances.

The second source of capacitors is the transistors. The term “CMOS” refers to “complementary metal oxide semiconductor.” “Metal oxide semiconductor” refers to the transistor structure. The transistor “gate,” which can be thought of as the switch that turns the transistor on or off, is constructed by layering metal on top of oxide (i.e. SiO2) on top of semiconductor. The metal portion of the conducting gate acts as one plate of a capacitor, and the semiconductor acts as the other. The oxide between the metal and the semiconductor is a dielectric. This forms a capacitor.


This is most easily seen in the above image of an old-fashioned MOSFET, where the gate is formed from a red conductor over a gray insulator over the P-type semiconductor substrate. (The SiO2 arrow is a little misplaced).

The capacitance of the gate is determined by its dimensions - its width and length. The strength of the transistor - how much capacitance it can drive and how fast it will turn on and off - is also a function of the width and length of the gate (specifically, width divided by length). The bigger W/L is, the more powerful the transistor is, but the more capacitance its gate presents to the transistors that drive it. For this reason, CPU designers try to use the smallest W/L possible that still meets timing and other requirements.
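As a toy sketch of that trade-off (the oxide capacitance below is a rough ballpark for a ~2 nm gate oxide, and the widths and lengths are invented for illustration; real device models are far more complicated):

```python
# Toy gate-sizing trade-off: drive strength grows roughly with W / L,
# while the load the gate presents to its driver grows with W * L.

C_OX = 0.017  # gate capacitance per unit area, F/m^2 (rough ballpark)

def gate_capacitance(w_m, l_m):
    """Capacitance the gate presents to whatever drives it: ~ C_OX * W * L."""
    return C_OX * w_m * l_m

def relative_drive(w_m, l_m):
    """Drive strength scales roughly with W / L."""
    return w_m / l_m

l = 20e-9  # fixed gate length
for w in (100e-9, 200e-9, 400e-9):
    print(f"W = {w*1e9:.0f} nm: drive ~ {relative_drive(w, l):.0f}, "
          f"gate cap ~ {gate_capacitance(w, l)*1e15:.3f} fF")
# Doubling W doubles the drive -- and doubles the load on the previous stage.
```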

Unfortunately, unlike with the “field dielectric” between wires, you don’t want to decrease the dielectric constant of the gate oxide. Reducing the gate’s dielectric constant reduces its capacitance, which decreases the performance of the transistor, and may even make it difficult to fully turn the transistor off (which, as we’ll see, is another source of unwanted power consumption).

This all creates a difficult optimization problem for CPU designers. For example, spreading wires apart may reduce wire capacitances, but can require increasing the size of the driving transistors to compensate, thus increasing the capacitance seen by other transistors.

Types of power consumption

When we think about power consumption in a CPU, we generally think of there being two types: static and dynamic. At least for CMOS, for many years we ignored static power because it was negligible, approaching zero.

Static Power Consumption


Static power is the power that is consumed by the CPU regardless of whether it is doing anything. This is different from “idle power.” Idle power generally refers to a situation in which the CPU is not busy computing anything. But even when idle, transistors on the CPU could be, and typically are, turning on and off. Static power, on the other hand, is power consumed when no transistors are switching. The advantage of CMOS, as opposed to earlier MOS technologies such as NMOS, is that in CMOS the static power is greatly reduced.

In CMOS, transistors can be thought of as being stacked (conceptually) between a power supply voltage rail and electrical ground. Current flows from the power rail, through the transistors, to ground. Current is, by definition, the moving of charge. The equation is I=dq/dt. Here I is the current, and dq/dt is mathematical notation for “the change in charge over time.” The change of charge generally requires that charge has moved. And the moving of charge requires power.

In CMOS there are two types of transistors - NMOS and PMOS. A FET has three logical terminals: source, gate, and drain. Current flows from drain to source (or source to drain, depending on your viewpoint) if the gate voltage is set such as to allow it. The gate voltage can be changed so as to allow or disallow the current to flow between the source and drain. NMOS and PMOS transistors require opposite gate voltages to enable current to flow. For NMOS, the gate must be set to a high voltage to allow current flow. For PMOS, the gate must be set to zero volts. The idea behind CMOS is that there is never a path from the power rail to ground, because when the NMOS transistors are on, the PMOS transistors are off, and vice versa.

The simplest CMOS logic gate - the inverter - illustrates this nicely. An inverter takes its input and “inverts” it. So if the input is a 0, the output is a 1, or vice versa.


The circuit schematic for a CMOS inverter is above. There is a P-channel device (a PFET) and an N-channel device (an NFET). Vcc is the power supply voltage rail, and the symbol for the ground rail is at the bottom of the diagram. (Note: we usually use Vdd, not Vcc, when referring to the CMOS supply rail. Since this diagram, oddly enough, uses Vcc, I’ll continue to use that in this article.)

When the input is a logical 1, then the NFET is turned on, because “IN” is connected to its gate. This allows charge that is above the N-channel transistor to flow through the N-channel transistor to ground. So, for example, if any charge is accumulated on the node marked “OUT,” that charge will be removed, which will tend to reduce the voltage on OUT toward 0 (so long as more charge isn’t being injected onto OUT from someplace else, like from Vcc).

It’s important to note the relationship between voltage and charge on a capacitor. The voltage difference between the two plates of a capacitor is proportional to the charge stored in the capacitor: less charge means a lower voltage. The equation is simply V=q/C, where C is the capacitance of the capacitor (essentially how big the “bucket” is), q is the charge stored on the capacitor, and V is the voltage. So, when the OUT wire, which acts like a capacitor and stores charge, loses charge, its voltage decreases. A voltage of 0V is typically a logic 0, and a voltage somewhere around 1V (it’s been declining over the years) is typically a logic 1.
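To make that concrete, here is a tiny worked example of V = q/C; the 1 fF node capacitance and 1 V supply are assumed round numbers, not values from any particular process:

```python
# V = q / C: the charge needed to swing a node between logic levels.
# The node capacitance and supply voltage are assumed for illustration.

C_OUT = 1e-15                # capacitance of the OUT wire: 1 femtofarad
V_SUPPLY = 1.0               # the "Vcc" rail, in volts
ELECTRON_CHARGE = 1.602e-19  # coulombs per electron

q = C_OUT * V_SUPPLY         # charge stored on OUT at a logic 1
print(q)                     # 1e-15 coulombs
print(q / ELECTRON_CHARGE)   # ~6,200 electrons moved per full swing
```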

The P-channel device is turned off because it behaves in the opposite manner from an N-channel device. When the IN wire is a logical 1, the PFET sees a 1 at its gate, which turns it off. As a result, no charge can move from Vcc down to the OUT node. The only charge that can be moved in this scenario is thus from OUT down to ground, reducing OUT to a logic 0.

If, on the other hand, IN is a logic 0, then the PFET is on and the NFET is off. This prevents charge on OUT from falling down to ground, but allows charge to fall from Vcc down to OUT. Since OUT then accumulates charge, its voltage corresponds to a logic 1.

The idea is that the PFET and the NFET should never be on at the same time.

If, however, both transistors are on for some length of time, or if one of the transistors can’t be shut all the way off, then charge can move from Vcc down to ground - this results in undesirable static power consumption.

Note: The above discussion refers to “static CMOS” circuits, which are by far the most common type of CMOS circuits. For a long time, Intel used “dynamic CMOS” circuitry in its ALUs and in other parts of its chips, and sometimes dynamic CMOS may still be used. In a typical dynamic CMOS circuit, the output wire is ”precharged” at the start of each clock cycle. If the output is supposed to be a 0, then NFETs connected between the output wire and ground will turn on (because of appropriate inputs), causing the output node to discharge. This is very fast. One problem is that the inputs must be 1’s in order to accomplish this, because only NFETs are involved in the “pulling down” of the output. Often there is a requirement that there be a static CMOS inverter between every dynamic CMOS logic gate. The speed of dynamic CMOS circuits is typically faster than static CMOS. But the power consumption is also typically much higher.


[Graph: an inverter input transitioning between logic levels; during the transition time t, the input voltage is in a range where both the NFET and the PFET conduct.]


This is illustrated in the graph above. During the time t, both the NFET and PFET in the inverter are turned on, allowing some current to flow between Vcc and ground, consuming power.

Unfortunately, unlike in, say, the 1980s, static power consumption has become a problem. There are two primary reasons for this. First, as transistor gate lengths have become smaller and smaller, it has become more difficult to turn the transistors off. I like to imagine a garden hose with water flowing through it. If I step on it with my foot, the water stops flowing. But if I try to squeeze it with a single finger, it becomes much more difficult to shut off the flow through the hose.

This effect is sometimes referred to as a quantum effect. The idea is that CPU designers have to think in terms of quantum physics. When we set voltages on nodes, we are determining the probability that charge carriers (electrons or holes) will be where we want them. But the more that quantum physics dominates (because of shrinking dimensions), the lower those probabilities become, and the probability that charge carriers will be in unexpected places increases. Schrödinger’s equation tells us the probability, for example, of trapping an electron by setting a voltage. Given shrinking dimensions, it becomes more and more difficult to trap the electron.

This is why planar MOSFETs were abandoned for high-end processors, and replaced with FinFETs and increasingly complicated structures. Going back to the garden hose, if I use a single finger to smush down on it, it’s difficult to shut off the flow. But if I squeeze all sides of the hose with my fingers, it becomes a lot easier. The increasingly exotic transistor structures we are seeing at 3 nm and below are intended to apply the gate’s electric field from as many sides of the channel as possible, to improve the ability to shut off the transistor.

The second issue is slow edge rates. The input to our inverter may switch from a 0 to a 1 or a 1 to a 0. While it is in the middle of that switching, both transistors may be on for some period of time. This is because the transistor doesn’t fully shut off until the voltage has gotten very close to its endpoint (either 0 volts or Vcc volts, corresponding to a logical 0 or 1).

As we increase the relative capacitances of things, for example because transistors are getting smaller faster than wires are, it gets harder to make the wires switch quickly. Switching quickly requires increasing the size of the transistors that are driving the signal. But that, in turn, means that the transistors driving those transistors need to get bigger too. Additionally, electrical resistance in the wires has become a bigger effect as we go to smaller and smaller nodes. This also makes it more difficult to rapidly swing the voltage on wires. For this reason, in the mid-to-late 1990s, CPUs started moving from aluminum to copper interconnect. Copper is less resistive than aluminum (though it oxidizes very easily, which is important to avoid).

One important way to reduce static power consumption is to turn off the voltage supply to unused portions of the chip. For example, if a multiplier is not being used during a given clock cycle, the portion of the Vcc rail connected to the multiplier can have its voltage reduced to 0 volts (ground). If both the Vcc and ground rails are at the same voltage (0 volts, for example), then no current will flow between them. (This is Ohm’s Law at work.) By the same principle, instead of reducing Vcc to 0, the local portion of the ground rail can be raised to Vcc volts. This requires that the power grid, which is typically a literal grid of metal on various metal layers, be broken into separate sections, and that there be special circuitry to control the voltage on one or the other power rail in a dynamic fashion.

Dynamic Power Consumption


Dynamic power consumption is the power consumed by switching transistors. This is power that is required to charge and discharge the gates of transistors and to charge and discharge wires (to switch their values from 0 to 1 or vice versa).

We know that, in CMOS, dynamic power consumption is equal to C x V^2 x f. C is the capacitance being charged or discharged, V is the voltage difference between Vcc and ground, and f is the frequency - how many times per second - the capacitance charges and discharges. Depending on how you determine f, you may need to divide this equation by 2.
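Here is that equation with made-up but representative numbers (they do not describe any specific chip):

```python
# Dynamic power: P = C * V^2 * f. Depending on how f is counted, a factor
# of 1/2 (or an "activity factor") may also appear. Numbers are made up.

def dynamic_power(cap_f, volts, freq_hz):
    return cap_f * volts**2 * freq_hz

C = 100e-9  # 100 nF of total switched capacitance per cycle (illustrative)
f = 3e9     # 3 GHz clock

print(dynamic_power(C, 1.0, f))  # 300 W at 1.0 V
print(dynamic_power(C, 0.8, f))  # ~192 W at 0.8 V -- a 20% voltage cut
                                 # saves 36% of dynamic power
```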

This gives us several avenues to try and reduce power consumption.

First, we can try to reduce the capacitance. As discussed above, this is usually accomplished by reducing the lengths of wires, spreading wires apart, and properly sizing transistors. It can also be accomplished by using lower-dielectric-constant materials between the wires.

Next, we can reduce the voltage. Voltage is squared in the power equation, so a little reduction goes a long way. Unfortunately, reducing the voltage also slows the transistors, and makes it more difficult to rapidly switch wires from 0 to 1 or vice versa. Reducing voltage also reduces the signal-to-noise ratio - just because you reduced your CPU’s voltage doesn’t mean that energy injected by external noise sources, such as fluctuations in power supply voltage due to activity by other chips on the motherboard, will decrease. This means errors may be more likely. And there is an absolute minimum voltage, determined by the semiconductor materials used to make the transistors, below which the transistor won’t work.

However, one can also effectively reduce voltage to 0, at least for small portions of the chip that aren’t being used at a given time, as discussed above with respect to one way to reduce static power consumption.

Finally, we can reduce how often transistors switch. We have various ways to do this. One way would be to reduce the clock frequency. This is why most CPUs now do frequency scaling, turning the clock speed up and down based on how much work needs to be done. Another way is to turn off the clock where it is not needed, using “clock gating.” If a particular subunit doesn’t need to be used during a particular clock cycle, we can turn off the clocks that connect to its input and output flip flops. This prevents them from changing state, even temporarily, during the clock cycle, and thus prevents signal transitions on transistors and wires, reducing power consumption. The circuitry for gating clocks is simpler than the circuitry for turning off power supplies, generally, but gating clocks doesn’t help so much with static power consumption.
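As a toy model of those two knobs - frequency scaling and clock gating - with invented numbers:

```python
# Toy model: dynamic power scales linearly with f, and clock gating removes
# switching in cycles where a unit isn't used. Static power is unaffected
# by either. All numbers are invented for illustration.

def dynamic_power_watts(full_speed_watts, freq_fraction, active_fraction):
    return full_speed_watts * freq_fraction * active_fraction

mult_watts = 2.0  # a multiplier's dynamic power, clocked every cycle

# Halve the clock frequency: half the dynamic power.
print(dynamic_power_watts(mult_watts, 0.5, 1.0))   # 1.0 W

# Full speed, but clock-gated: used on only 15% of cycles.
print(dynamic_power_watts(mult_watts, 1.0, 0.15))  # 0.3 W
```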

We can also be very careful about how we choose which inputs to assign to which wires in various gates. Consider a CMOS NAND gate, the circuit diagram of which follows:


This performs the opposite of an AND function. If both A and B are logical 1, then the output can discharge to ground, and the output becomes a logical 0. A and B are connected to two different wires, coming from prior gates on logic paths, or from prior flip flops. Imagine that B reaches its intended value, 1, first. If A is still a 0, then transistor A is off, and the output can’t discharge at all yet. Once A becomes a logic 1, the output can finally discharge. This has implications for how long it takes the circuit’s output to switch from 1 to 0. For speed, you would want A to arrive first, allowing the output to start discharging across transistor A before B arrives.

But for power, there are other considerations. Imagine that A temporarily switches from 0 to 1 before switching back to 0 during a clock cycle. It is very common for signals to switch back and forth during a clock cycle, because whatever is generating the signal has multiple inputs that, themselves, arrive at various times and cause the output to flip back and forth before settling on a final value. If A switches on for a bit, the output will discharge at least a little through transistor A. If B also flips back and forth, then the output could discharge quite a bit, perhaps all the way down to ground. If the final value of the output should be a 1 (because the final value of A or B is a 0), then the output will need to charge again. All of this takes power. For that reason, if an input is likely to switch back and forth a lot, you may want to connect it to input B. If transistor B turns on and off but transistor A stays off, then the output cannot discharge to ground at all.

When choosing logic gates and how to connect them, a logic designer thus tries to make such choices in a way that minimizes this sort of “transient” logic circuit switching.
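As a toy illustration of how much such transient switching can cost, the following sketch counts wasted output transitions for a 2-input NAND when one input glitches. It works purely at the logic level, so it cannot capture the internal-node discharge effect described above, and the waveforms are invented:

```python
# Count output transitions of a NAND gate as its inputs settle during one
# clock cycle. Each output transition costs roughly C * V^2 in energy.
# Waveforms are invented; a real analysis would use analog simulation.

def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

def wasted_transitions(a_wave, b_wave):
    outs = [nand(a, b) for a, b in zip(a_wave, b_wave)]
    return sum(1 for prev, cur in zip(outs, outs[1:]) if prev != cur)

steady  = [1, 1, 1, 1, 1, 1]  # an input that settles immediately at 1
glitchy = [0, 1, 0, 1, 0, 0]  # an input that bounces before settling at 0

print(wasted_transitions(glitchy, steady))  # 4 output transitions, all wasted
```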

Conclusions

The above discussion deals with things that can be done in the “physical” or “logic” design of a CPU in order to minimize power consumption. Of course there are also architectural considerations. For example, it is important to avoid unnecessary work. Bad branch prediction, as an example, will result in more wasted work, because instructions will execute speculatively and the results will never be used. Similarly, bad cache design can increase power consumption, because it takes more power to read RAM than it does to read the caches. An instruction set that doesn’t have enough registers can raise power consumption, because it takes less power to read and write a register than it does to read and write cache memory. And complicated CISC instructions can require very complicated instruction decoders, with lots of capacitance that needs to be switched, and lots of speculative execution that doesn’t get used, in order to align and decode instructions.

Power consumption is a very complicated issue, touching everything from floorplanning to ISA to the shape of the polygons in transistors, but it has become incredibly important and is a key differentiator between, for example, Apple-style and Intel-style microprocessor designs.
About author
Cmaier
Cliff obtained his PhD in electrical engineering with concentrations in solid state physics and computer engineering from Rensselaer Polytechnic Institute. Cliff helped design some of the world’s fastest CPUs, including Exponential Technology‘s x704, Sun’s UltraSparc V, and many CPUs at AMD, including the original Opteron and Athlon 64.

Cliff’s CPU design experience ranges from instruction set architecture, including contributions to x86-64, to microarchitecture (especially memory hierarchy design), to logic and physical design (including ownership of floating point and integer execution units, instruction schedulers, and caches). Cliff was also a member of AMD’s circuit design team, and was responsible for electronic design automation at AMD for a number of years in the Opteron era.

Cliff has designed both RISC and CISC microprocessors, using both GaAs and silicon, and helped design two different bipolar microprocessors before shifting to FET technology.

Comments

This was quite informative. As an enthusiast, I've spent years hearing terms like CMOS, MOSFET, and FINFET tossed around, and I vaguely knew what they were, but not in this detail. Also, Parasitic Capacitors would make a great band name. Thanks for the new article, Cliff, much appreciated.
 
I think my next one will be “introduction to semiconductor physics” and I will explain how a FET works. Real crowd pleasing topic.
 
Next article will be about “Memory,” which will be a prelude to an article on caches.
 
