
ISSCC 94 / SESSION 1 / PLENARY SESSION / PAPER WA 1.1

WA 1.1* Low-Power Design: Ways to Approach the Limits
Eric A. Vittoz
Swiss Center for Electronics and Microtechnology Inc., Neuchâtel, Switzerland

Introduction

Low-voltage, very-low-current integrated circuit techniques were originally developed more than 20 years ago for the watch, one of the first very-large-scale exploitations of CMOS technology. Modern watch circuits have a complexity ranging from a few thousand to several tens of thousands of transistors, combine microcontroller architectures with some high-performance analog circuits, and are routinely produced in tens of millions per year with a power consumption below 0.5 µW at 1.5 V. These techniques have progressively found numerous other applications in small portable battery-operated instruments in the fields of instrumentation, medical devices and short-range low-frequency radio communication. Other industrial realizations are found in data acquisition systems powered by light or by RF fields, and new openings will come in the analog VLSI implementation of massively parallel processing systems based on the neural paradigm. In all these systems, low power consumption (ranging from sub-microwatt to sub-milliwatt levels) is the first constraint, for which speed and/or dynamic range have to be sacrificed.

During this period, in the large majority of VLSI-based electronic systems, including most computer and telecommunication products, low power consumption has been close to last in the list of specifications. Technology improvements and design cleverness have been devoted mostly to reaching higher speed and higher precision. In recent years, there has been a sudden change driven mainly by an urgent need for portability, but also by concerns about the growing relative cost of power supplies and of heat-removal systems, and even about limits of power available from the local network. It has now become necessary to build new products with at least the same performance in speed and dynamic range, but with stringent requirements on low power. This raises questions about the limits and how they can be approached.

ISSCC 94 / FEBRUARY 16, 1994 / YERBA BUENA BALLROOM / 9:15 AM

Eric A. Vittoz received the MS and PhD degrees in electrical engineering from the Swiss Federal Institute of Technology at Lausanne (EPFL) in 1961 and 1969, respectively. He joined the Centre Electronique Horloger S.A. (CEH), Neuchâtel, in 1962, where he participated in the development of the first electronic watches. In 1971, he became Vice-Director of CEH, supervising advanced development in micropower circuits and systems. In 1984, he took responsibility for the Circuits and Systems Research Division of the newly created Swiss Center for Electronics and Microtechnology (CSEM) in Neuchâtel. In 1991, he was appointed Executive Vice-President, responsible for the Integrated Circuits and Systems Department, which specializes in the development of micropower ASICs. He is directly responsible for the advanced research program of CSEM in the field of biology-inspired analog perceptive computation. Dr. Vittoz is an IEEE Fellow, has published more than 90 papers, and holds 26 patents. He is also a Professor at EPFL.

Limits of Low Power

In analog circuits, the absolute limit comes from the need to maintain the energy of the signal much larger than the thermal energy, to achieve the required signal-to-noise ratio S/N. This condition can be expressed as a minimum power per functional pole $P_{min} = 8 f kT \cdot S/N$, where $f$ is the signal-frequency bandwidth [1]. This limit does not depend on the technology. It is reached in a simple passive RC filter, whereas the best existing active filters are still 2 orders of magnitude above. It applies to amplifier stages with an additional margin proportional to their voltage gain. As shown in Figure 1, this limit is steep since it corresponds to a 10-fold increase of power consumption for each 10 dB increase of S/N. In analog circuits, power is proportional to frequency and to S/N.

In digital systems, each elementary operation requires a certain number $m$ of binary-gate transition cycles, each of which dissipates an amount of energy $E_{tr}$. The number $m$ of transitions is only proportional to some power of the number of bits $N_{bits}$, and therefore power consumption is only weakly dependent on S/N (essentially logarithmically). Comparison with analog is obtained by estimating the number of gate transitions required to compute each period of the signal, for example $m = 50 N_{bits}^2$. Immunity to thermal noise imposes an absolute minimum energy per transition $E_{tr,min}$ estimated at $8kT$, which provides the absolute minimum power limit. However, in practice $E_{tr} = C V_B^2$ is forced to a much higher value ($10^{-15}$ to $10^{-12}$ joules) by the need to recharge the equivalent capacitance $C$ of each gate to the supply voltage $V_B$. Power is then much higher, as shown in Figure 1. $C$ is strongly dependent on the process feature size, and $V_B$ is imposed by the need to achieve the required delay time $T_d$ and by established standards. Furthermore, if the activation rate of the circuit is very low (a very small percentage of the $N_g$ available gates in transition on average), then the standby current $I_0$ of each of the $N_g$ gates may contribute a non-negligible additional static power consumption $P_{stat}$.

There are also several practical technical obstacles on the approach to the limits for analog circuits: the poor power efficiency of some basic blocks such as amplifiers, their non-minimum noise factors due to inadequate architecture, losses in bias circuitry, and limitations on peak-to-peak amplitude. Peaking in high-Q active filters contributes to the reduction of their signal-to-noise ratio, contrary to what occurs in their passive LC counterparts. Another important limit is the poor transconductance-to-current ratio $g_m/I_D$ of MOS transistors. Noise produced on chip by high-level analog or by digital blocks may be orders of magnitude above thermal noise and may require a proportional increase in power to achieve the required dynamic range.

Other obstacles are historical or even psychological. Analog blocks must often be taken from existing libraries with bias currents at the milliampere level and with architectures that are not compatible with low voltage or low current. The use of very low bias currents is often discarded for lack of adequate transistor models or for fear of breaking the psychological microampere barrier. The requirements on PSRR are often exaggerated and mistaken for insensitivity to noise generated on-chip. Digital library blocks are designed for speed only, with no concern about power. Buffers are exaggeratedly self-loaded. Circuits are operated at the maximum possible voltage, and a lot of effort is spent in developing scaled-down processes that accept a minimum reduction in supply voltage. Digital architectures are completely synchronous, with a fixed clock frequency adapted to the highest speed required by each function. The overall system split is often made without any consideration for limiting power. Quite generally, companies lack the basic power-conscious culture and designers need to be educated in this respect.
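The analog and digital limits stated in this section can be compared numerically. The following is a minimal sketch, not the author's method: it assumes $m = 50 N_{bits}^2$ for the digital case, a conversion of about 6.02 dB of S/N per bit (an assumption not stated in the text), a temperature of 300 K, and a hypothetical 1 kHz bandwidth. The function names are illustrative only.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def analog_min_power(f_hz, snr_db, temp_k=300.0):
    """Analog limit per pole: P_min = 8 * f * k * T * (S/N), S/N as a power ratio."""
    snr_ratio = 10.0 ** (snr_db / 10.0)
    return 8.0 * f_hz * K_BOLTZMANN * temp_k * snr_ratio


def digital_power(f_hz, snr_db, energy_per_transition_j):
    """Digital estimate: P = m * f * E_tr with m = 50 * N_bits**2 (assumed form)."""
    n_bits = snr_db / 6.02  # assumption: about 6.02 dB of S/N per bit
    transitions_per_period = 50.0 * n_bits ** 2
    return transitions_per_period * f_hz * energy_per_transition_j


if __name__ == "__main__":
    f_hz = 1.0e3                               # hypothetical signal bandwidth
    e_tr_thermal = 8.0 * K_BOLTZMANN * 300.0   # E_tr,min = 8kT
    e_tr_practical = 1.0e-13                   # inside the 1e-15 to 1e-12 J range
    print("S/N[dB]  analog[W]  digital@8kT[W]  digital@CV^2[W]")
    for snr_db in (30, 60, 90, 120):
        print(snr_db,
              analog_min_power(f_hz, snr_db),
              digital_power(f_hz, snr_db, e_tr_thermal),
              digital_power(f_hz, snr_db, e_tr_practical))
```

Under these assumptions the analog limit grows tenfold per 10 dB, while the digital estimate grows only with the square of the number of bits, which is why the two curves cross at some S/N that depends on the energy per transition.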

Many of the problems and solutions encountered in analog micropower circuits are directly related to the properties of the MOS transistor itself, which must therefore be properly understood and modelled down to very low currents. The essential features of a transistor can be captured in a symmetrical model where the drain current $I_D$ is the superposition of a forward component $I_F$ and a reverse component $I_R$ that are proportional to the same function of source voltage $V_S$ and drain voltage $V_D$, respectively, as shown in Figure 2. This core model includes only the 3 most fundamental parameters: the threshold voltage $V_T$, the slope factor $n$ and the specific current $I_S$ (typically 10 to 100 nA for minimum size). A channel pinch-off voltage $V_P$ is defined that depends directly on the gate voltage $V_G$. If $I_{F(R)} \ll I_S$ ($V_{S(D)} > V_P$), this component is in weak inversion and can be approximated by an exponential function instead of the usual quadratic function for strong inversion. Forward saturation is reached for $I_R \ll I_F$. When both $I_F$ and $I_R$ are in weak inversion, the whole transistor is said to operate in the weak inversion mode and behaves, in many aspects, similarly to a bipolar transistor. This mode provides the minimum possible drain saturation voltage ($V_{DSsat}$ = 4 to 6 $U_T$), which helps reduce $V_B$. The saturation current is exponentially dependent on both $V_G$ and $V_S$.

The transconductance-to-current ratio increases when the inversion coefficient $I_F/I_S$ is decreased, and reaches a maximum value in weak inversion. Thus, when the current is limited, weak inversion provides the maximum bandwidth for a given load $C$ or for a given $kT/C$ sampled noise, and results in a minimum equivalent input noise resistance. Weak inversion also provides maximum voltage gain per device and minimum input offset in a pair.
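The dependence of the transconductance-to-current ratio on the inversion coefficient can be sketched numerically. The paper does not spell out the interpolation function, so this example assumes the commonly used form $I_F = I_S \ln^2(1 + \exp((V_P - V_S)/(2U_T)))$, from which $g_m/I_D = (1 - e^{-\sqrt{IC}}) / (n U_T \sqrt{IC})$ follows in saturation, with $IC = I_F/I_S$. The value $n = 1.5$ and the function names are illustrative assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23         # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """U_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def gm_over_id(inversion_coeff, n=1.5, temp_k=300.0):
    """g_m/I_D in 1/V for a saturated transistor, under the assumed interpolation.

    inversion_coeff is IC = I_F / I_S and must be positive.
    """
    root = math.sqrt(inversion_coeff)
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x
    return -math.expm1(-root) / (root * n * thermal_voltage(temp_k))


if __name__ == "__main__":
    weak_limit = 1.0 / (1.5 * thermal_voltage())
    for ic in (1e-3, 1e-1, 1.0, 10.0, 100.0):
        ratio = gm_over_id(ic)
        print(f"IC={ic:g}  gm/ID={ratio:.2f} 1/V  ({ratio / weak_limit:.3f} of weak-inversion maximum)")
```

The ratio approaches its maximum $1/(n U_T)$ as the inversion coefficient falls well below 1 and decreases roughly as $1/\sqrt{IC}$ in strong inversion, consistent with the statement that weak inversion gives the most transconductance for a limited current.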

Existing Micropower Techniques [2]

CMOS has been used almost exclusively for micropower circuits for its ability to consume very little power in the absence of logic transitions. The sole particular feature of present-day micropower processes is the low threshold voltage required by the low supply voltage $V_B$. However, if the threshold is too low, typically below 0.4 V, the static component of power due to subthreshold current is unacceptable. The optimum nominal value thus depends on the spread, which can be kept below ±100 mV. Junction leakage poses no particular problem at low voltage, and all standard approaches for minimization of $C$ to increase speed are applicable to micropower.
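The sensitivity of static power to the threshold spread can be illustrated with a short calculation. This sketch assumes the usual weak-inversion dependence of the standby current, proportional to $\exp(-V_T/(n U_T))$, with an illustrative slope factor $n = 1.5$ at 300 K; the function names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23           # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """U_T = kT/q in volts."""
    return K_BOLTZMANN * temp_k / ELEMENTARY_CHARGE


def standby_current_ratio(delta_vt_v, n=1.5, temp_k=300.0):
    """Factor applied to the subthreshold standby current when the threshold
    shifts by delta_vt_v volts (negative shift means a lower threshold)."""
    return math.exp(-delta_vt_v / (n * thermal_voltage(temp_k)))


if __name__ == "__main__":
    for shift_mv in (-100, -50, 0, 50, 100):
        factor = standby_current_ratio(shift_mv / 1000.0)
        print(f"threshold shift {shift_mv:+d} mV -> standby current x {factor:.3g}")
```

With these assumed values, a threshold that sits 100 mV below nominal raises the standby current by roughly an order of magnitude, which is why the nominal threshold has to be chosen with the spread in mind.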

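Figure 1 (plot): minimum power versus S/N [dB], from 0 to 120 dB. Analog: $P_{min} = 8 f kT \cdot S/N$. Digital: $P = m f E_{tr} + P_{stat}$, with $E_{tr,min} = 8kT$ and $P_{stat} = N_g V_B I_0$, which becomes large when the activation rate is much smaller than 1.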

Other limitations to low voltage are the implementation of analog switches, which may require on-chip clock voltage multiplication, and the impossibility of stacking devices, which led to the development of special low-voltage cascode schemes. The supply voltage is often adapted to the needs of analog blocks by high-efficiency up-converters.


On the other hand, the maximum value of $g_m/I_D$ results in a maximum mismatch of current mirrors due to small differences in threshold voltages, as well as a maximum value of drain current noise at a given current. Low-voltage operation of current sources must thus be traded off against higher noise and worse precision. Much better results may be obtained by using bipolar-operated MOS transistors.
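The link between a high $g_m/I_D$ and current-mirror mismatch can be made concrete with the first-order relation $\Delta I / I \approx (g_m/I_D) \, \Delta V_T$. The sketch below is an illustration only: the 5 mV threshold difference, the strong-inversion value of 5 1/V, and $n = 1.5$ are assumed numbers, not figures from the paper.

```python
K_BOLTZMANN = 1.380649e-23           # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def weak_inversion_gm_over_id(n=1.5, temp_k=300.0):
    """Maximum g_m/I_D = 1/(n * U_T), reached in weak inversion (1/V)."""
    u_t = K_BOLTZMANN * temp_k / ELEMENTARY_CHARGE
    return 1.0 / (n * u_t)


def mirror_current_mismatch(delta_vt_v, gm_over_id_per_v):
    """First-order relative current error of a mirror for a threshold
    difference delta_vt_v between its two transistors."""
    return gm_over_id_per_v * delta_vt_v


if __name__ == "__main__":
    delta_vt = 0.005  # assumed 5 mV threshold difference
    weak = mirror_current_mismatch(delta_vt, weak_inversion_gm_over_id())
    strong = mirror_current_mismatch(delta_vt, 5.0)  # assumed strong-inversion g_m/I_D
    print(f"weak inversion:   {100 * weak:.1f} % current error")
    print(f"strong inversion: {100 * strong:.1f} % current error")
```

The same threshold difference therefore costs several times more current accuracy in weak inversion, which is the trade-off against low saturation voltage described above.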

Figure 1: Minimum power for analog and digital circuits.

Figure 2: Transistor continuous and symmetrical model. Specific current: $I_S = 2 n \mu C_{ox} (W/L) U_T^2$, with $U_T = kT/q$ and slope factor $1 < n < 2$. In weak inversion: $I_{F(R)} = I_S \exp((V_P - V_{S(D)})/U_T)$, where $V_P$ is the pinch-off voltage.


New Strategies for Low Power

Figure 3: Low-power OTA (2-phase operation).

Micropower analog blocks exploit the features of weak inversion in architectures that are as simple as possible. As an illustrative example, Figure 3 shows an OTA for SC circuits based on a CMOS inverter. Operated in weak inversion, this circuit provides the maximum possible transconductance and the minimum noise that can be achieved with bias current $I_0$. It operates in class AB with no slew rate limitation and no offset except that due to charge injection. It provides the maximum dc gain possible with a single stage. This gain can be increased beyond 100 dB by cascoding the inverter, while keeping the output swing just 400 to 600 mV below supply voltage. The auto-zeroing property of the switching scheme results in good low-frequency PSRR and eliminates the 1/f noise. A single additional capacitor is needed to obtain an SC integrator that can operate close to the theoretical limits of Figure 1.

Solutions for reducing the power are also found at the system level. Examples are automatic range control to extend the total dynamic range of filters or converters much beyond its instantaneous value, and direct conversion in micropower receivers to minimize their high-frequency part.

Present micropower digital circuits mostly operate at a voltage $V_B$ of 1 to 2 V obtained from a single cell, and usually compatible with the required speed. RAMs are powered down while they are not accessed to reduce their $P_{stat}$. Voltage is occasionally reduced locally under control of a critical delay time [3].

The energy per transition $E_{tr}$ is further limited by limiting $C$ by various means. Minimum-size devices are used everywhere, except on some critical delay paths. Logic gates based solely on series branches result in regular layouts with reduced $C$. The number of transistors per branch is limited to 3. Larger fan-ins are decomposed into several gates, actually reducing the total capacitance. The simplest schematics are systematically selected for a given function. In particular, dynamic circuits are used whenever possible, especially for high frequencies. ROMs and static RAMs use pre-charge logic and are split into blocks. Only the addressed block is precharged. Additional contributions to $E_{tr}$ due to the simultaneous conduction of p- and n-channel transistors are kept negligible by ensuring homogeneous transition times.

Minimization of the number $m$ of transitions per operation is obtained by a variety of complementary means. RISC architectures are used, with single-word instructions that are executed in 1 or 2 cycles. Idling blocks can be put in sleep mode by a halt instruction, and restarted by external events. Different clock frequencies are used in different parts of the circuit. Counters and frequency scalers are asynchronous, limiting the total number of node transitions in a long chain to twice those of the first stage. Special race-free sequential blocks have been developed [4]. They are insensitive to the spread of gate delay time that may occur at low $V_B$, and they eliminate the need for on-site clock regeneration or for large clock buffers. Two illustrative examples are given in Figure 4.
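The remark that an asynchronous divider chain keeps the total number of node transitions below twice that of its first stage can be checked with a small simulation. This is a behavioural sketch only: it models each stage of a ripple divider as one bit of a binary counter and counts output toggles, ignoring internal nodes and any particular flip-flop circuit.

```python
def ripple_counter_toggles(n_stages, n_input_cycles):
    """Count output toggles of each stage of an n-stage ripple divider.

    Stage 0 toggles on every input cycle; stage k toggles whenever
    stage k-1 completes a full period (modelled as binary counting).
    """
    mask = (1 << n_stages) - 1
    state = 0
    toggles = [0] * n_stages
    for _ in range(n_input_cycles):
        next_state = (state + 1) & mask
        changed = state ^ next_state  # bits that flipped on this cycle
        for bit in range(n_stages):
            if (changed >> bit) & 1:
                toggles[bit] += 1
        state = next_state
    return toggles


if __name__ == "__main__":
    toggles = ripple_counter_toggles(n_stages=16, n_input_cycles=1 << 16)
    total = sum(toggles)
    print("first stage toggles:", toggles[0])
    print("whole chain toggles:", total)
    print("ratio:", total / toggles[0])
```

Because each stage toggles half as often as the one before it, the total forms a geometric series bounded by twice the first-stage count, however long the chain is.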

Many of the techniques developed for micropower applications can be exploited to reach the goal of portability in new communication and computer products. However, new challenges are posed by the requirements on high speed for digital and by those on high frequency and high precision for analog.

New approaches are being explored to minimize the number $m$ of transitions in digital circuits. Optimum coding in signal processing circuits can exploit signal statistics to reduce average activity [5]. Resource sharing by time multiplexing should absolutely be avoided, since it increases activity by eliminating correlations. Glitches in combinatorial blocks can be eliminated by proper logic structuring. Clock transitions carry no information; therefore their number should be reduced by using the lowest possible clock frequency for each block. Clocks should be locally suppressed whenever they are not needed, and their frequency could even be locally adapted to the rate of information to be processed.

The power consumption of large systems is often dominated by clock power, since the clock must be distributed throughout the whole chip with short delays and small skew. In this case, a promising approach is to eliminate the master clock and to use self-timed modules that communicate with each other by means of a handshake procedure [6]. Substantial power savings can be expected from these schemes, but they will not be sufficient to achieve the drastic reduction required for portability.

The same is true for the various ways to reduce the energy per transition $E_{tr} = C V_B^2$ by only reducing capacitance $C$. Capacitances per unit area are being reduced with the feature size in the regular but slow trend towards scaled-down processes. The resulting reduction of $C$ can be assisted by a variety of design options. For example, large clock buffers can be avoided by using clock-skew-tolerant circuits. Interconnections can be shortened by optimum cell placement or even by abutment.
Long wires should be used only for low-activity signals.

A lot more can be expected from reduction of the supply voltage $V_B$. Traditionally, the benefits of scaled-down processes have been totally invested in reaching higher speed and smaller area, with no consideration for power consumption. For this purpose, the supply voltage has been maintained as high as possible without deterioration of device operation. As a matter of fact, complicated processes have been architectured to limit the required reduction of voltage to only 40% (5 to 3 V) for a scaling factor of 10 (6 to 0.6 µm). This trend will have to be reversed to meet the new demand for low power. As a first step, substantial power reduction can be achieved by reducing the voltage swing on busses or on memory bit lines [7]. However, a reduction of $E_{tr}$ by orders of magnitude is possible only with a drastic reduction of power supply voltage $V_B$ [8]. The problems faced are then the resulting reduction in local speed and the practical impossibility of reducing temperature $T$ (and $U_T = kT/q$) proportionally to $V_B$.

Figure 4: Illustrative example. Panel label: divider-by-4 (input CK, output A, B or C).
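The weight of the supply voltage in the energy per transition $E_{tr} = C V_B^2$ can be seen from a short calculation. The sketch below uses the 5 V and 3 V supplies mentioned in the text; the gate capacitance, the number of transitions per operation and the operating frequency are hypothetical values chosen only for illustration.

```python
def energy_per_transition(c_farad, v_supply):
    """E_tr = C * V_B**2 in joules."""
    return c_farad * v_supply ** 2


def dynamic_power(m_transitions, f_hz, c_farad, v_supply):
    """Dynamic power P = m * f * E_tr in watts."""
    return m_transitions * f_hz * energy_per_transition(c_farad, v_supply)


if __name__ == "__main__":
    c_gate = 1.0e-13   # hypothetical 100 fF equivalent gate capacitance
    m = 1.0e4          # hypothetical transitions per operation
    f = 1.0e6          # hypothetical 1 MHz operation rate
    for v in (5.0, 3.0, 1.5):
        print(f"V_B = {v} V: E_tr = {energy_per_transition(c_gate, v):.3g} J, "
              f"P = {dynamic_power(m, f, c_gate, v):.3g} W")
    saving = 1.0 - energy_per_transition(c_gate, 3.0) / energy_per_transition(c_gate, 5.0)
    print(f"5 V -> 3 V saves {100 * saving:.0f} % of the energy per transition")
```

A 40% voltage reduction thus removes about two thirds of the energy per transition, but gains of several orders of magnitude require a far larger reduction of the supply.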

As shown in Figure 5, the delay time $T_d$ of a CMOS gate is approximately inversely proportional to voltage. Therefore, to maintain the same overall frequency of operation $f$, a reduction of supply voltage $V_B$ (and of threshold $V_T$) by a factor $R$ must be compensated for by splitting the task among $R$ parallel systems, each of them operating $R$ times slower. The energy per transition $E_{tr}$ is then reduced by $R^2$, and so is the total dynamic power $P_{dyn}$, since the overall frequency of transitions $m f$ remains the same. The price is an $R$-fold increase in area.
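The bookkeeping behind this trade can be written out as relative factors. This is a sketch under the paper's stated approximations (gate delay inversely proportional to voltage, energy per transition proportional to the square of the voltage); the dictionary keys are illustrative names. As an example, going from 3 V to about 0.1 V corresponds to $R = 30$.

```python
def parallel_scaling(r):
    """Relative factors when the supply voltage is divided by r and the task
    is split among r parallel units (all values relative to the original)."""
    supply_voltage = 1.0 / r
    gate_delay = r                        # delay assumed inversely proportional to voltage
    rate_per_unit = 1.0 / gate_delay      # each unit runs r times slower
    return {
        "supply_voltage": supply_voltage,
        "gate_delay": gate_delay,
        "parallel_units": r,
        "throughput": r * rate_per_unit,  # overall transition rate m*f is preserved
        "energy_per_transition": supply_voltage ** 2,
        "dynamic_power": supply_voltage ** 2,  # same m*f, lower E_tr
        "area": r,
    }


if __name__ == "__main__":
    for r in (2.0, 10.0, 30.0):
        s = parallel_scaling(r)
        print(f"R = {r:g}: power x {s['dynamic_power']:.4g}, area x {s['area']:g}, "
              f"throughput x {s['throughput']:.3g}")
```

For $R = 30$ the dynamic power factor is 1/900, in line with the reduction the text quotes for operation near the minimum supply voltage relative to 3 V, at the cost of a 30-fold area increase.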

Figure 5: The effects of parallelism ($m f$ transitions/s in area $A$; with $R$-parallelism, area $R A$).

Parallelism is inherent in some tasks such as image processing. Otherwise, various styles and degrees of parallelism can be introduced in each functional block of a system to increase the minimum delay time $T_d$ required per gate. Bit-parallelism is the most immediate. Pipeline, systolic and cellular arrays are excellent approaches for signal processing, since they do not require much overhead circuitry for control and communication. They are, however, not applicable when increasing the latency is not acceptable, in particular when loops are required by the task. Multiprocessors must be used for more general computation, and ways must be found to limit the overhead required to distribute the task.

The minimum value of $V_B$ necessary to reach the required delay time should be applied independently to each block. An absolute minimum $V_B \approx 4 U_T$ is given by the need to maintain regeneration of the logic states [9]. Even with the most advanced processes, the corresponding value of $E_{tr}$ is much larger than the minimum needed to maintain immunity to thermal noise. However, for a given feature size, it corresponds to a possible reduction of power by a factor of 900 with respect to 3 V operation.

Drastic voltage reduction is possible only if the threshold $V_T$ is also reduced. However, the standby current $I_0$ of each gate increases exponentially when $V_T/(n U_T)$ is reduced, and $U_T$ is usually not scalable. Therefore a very low value of $V_T$ may result in a dominant contribution of static power $P_{stat}$. This static power is negligible if the activation rate $\alpha$ (defined in Figure 1) is close to 1, that is, if at any time each of the $N_g$ available gates is in transition. Circuits should thus be architectured for a maximum value of $\alpha$ to permit a low value of $V_T$. Now $V_T$ is given by the process and cannot be adapted to each block. Furthermore, it varies by more than 0.2 V with temperature, may differ by as much from wafer to wafer, and depends on $W$ and $L$ for small transistors. Therefore, the only acceptable way to control small threshold values is to use a very low $V_T$ and to electrically control the effective threshold by substrate modulation [10]. True twin-well processes (also called triple-well) will thus be needed to permit local voltage optimization. Threshold shift by source modulation can also be used to reduce $P_{stat}$ during periods of inactivity, in particular in SRAMs.

Most of the techniques developed for micropower can be applied to high-frequency analog circuits. The most noticeable exception is the exploitation of the attractive features of MOS transistors in weak inversion, since the transition frequency $f_T$ in this mode does not exceed a few hundred MHz. These features are lost if $f_T$ is pushed to several GHz by operating the transistors deep into strong inversion, but they can be obtained from high-frequency bipolar transistors. BiCMOS thus appears to be the best technology for low-power high-frequency analog circuits. The high-frequency parts of receivers can be reduced by using passive SAW filters and direct frequency conversion by undersampling [11]. Since the power consumed by analog circuits is proportional to their signal-to-noise ratio S/N, the latter should be reduced to the minimum required by the system. In particular, the full dynamic range should be distinguished from the required S/N and should be achieved by "floating point analog" approaches using automatic instantaneous range selection.

Analog circuits will benefit from scaling, provided they are adequately redesigned [1]. Reducing their supply voltage is not an advantage, but can be accepted down to 1 V. There is no evidence that current-mode analog provides any basic advantage for low power. Creative, innovative design based on simplifying the active part of circuits is still needed to approach the limits.

Reducing power will have several impacts on systems. Power management will be needed to provide the optimum supply voltage to each block and to control sleep modes and power-down when inactivity is detected. Blocks will be interfaced by means of level converters, and on-chip high-efficiency voltage converters will be required. Power estimation tools will be needed to facilitate the choice of optimum chip partition and architecture. It may be necessary to accept reduced input and output swings and frequencies to limit off-chip power. Analog processing should be avoided when high precision is needed, but massively parallel analog processing is expected to be much more power efficient for low-precision perceptive tasks such as vision and audition.

Conclusion

Awareness alone that power is limited is expected to provide substantial power savings at very low cost in a first phase. Meeting the goals will, however, require acceptance of the fact that limiting the power has its price. If it cannot be paid for by trading off performance, the price will be longer design time, larger chip area, and a growing investment of the benefits of scaled-down processes into power reduction instead of speed increase. Design methodologies, tools, libraries and models will have to be adapted, and a low-power culture based on improvement by simplification should pervade all levels from system to process.

References

[1] Vittoz, E., "Future of Analog in the VLSI Environment", Proc. ISCAS'90, pp. 1372-1375, 1990.
[2] Vittoz, E., "Micropower Techniques", in Design of VLSI Circuits for Telecommunication and Signal Processing, Ed. J. Franca and Y. Tsividis, Prentice Hall, 1993.
[3] von Kaenel, V., et al., "A Voltage Reduction Technique for Battery-Operated Systems", IEEE J. Solid-State Circuits, vol. SC-25, pp. 1136-1140, 1990.
[4] Piguet, C., "Logic Synthesis of Race-Free Asynchronous CMOS Circuits", IEEE J. Solid-State Circuits, vol. SC-26, pp. 371-380, 1991.
[5] Chandrakasan, A., et al., "A Low-Power Chipset for Portable Multimedia Applications", ISSCC Digest of Technical Papers, pp. 82-83, Feb. 1994.
[6] van Berkel, K., "Handshake Circuits: An Intermediary Between Communication Process and VLSI", PhD Thesis, Eindhoven, 1992.
[7] Nakagome, Y., et al., "Sub-1V Swing Internal Bus Architecture for Low-Power VLSIs", IEEE J. Solid-State Circuits, vol. 28, pp. 414-419, 1993.
[8] Chandrakasan, A., et al., "Low-Power CMOS Digital Design", IEEE J. Solid-State Circuits, vol. 27, pp. 473-484, 1992.
[9] Swanson, R. M., J. D. Meindl, "Ion-Implanted Complementary MOS Transistors in Low-Voltage Circuits", IEEE J. Solid-State Circuits, vol. SC-7, pp. 146-153, 1972.
[10] Burr, J., J. Shott, "A 200mV Self-Testing Encoder/Decoder Using Stanford Ultra Low Power CMOS", ISSCC Digest of Technical Papers, pp. 84-85, Feb. 1994.
[11] Yan, P. Y., et al., "Highly Linear 1-GHz CMOS Downconversion Mixer", Proc. ESSCIRC'93, pp. 210-213, 1993.