

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 6, JUNE 2014

Simplifying Clock Gating Logic by Matching Factored Forms

Inhak Han and Youngsoo Shin, Senior Member, IEEE

Abstract— Gate-level clock gating starts with a netlist, with partial or no gating applied; some flip-flops are then selected for further gating to reduce the circuit’s power consumption, and a gating logic of the smallest possible size must then be synthesized. We show how to do this by factored form matching, in which gating functions in factored forms are matched, as far as possible, with factored forms of the Boolean functions of existing combinational nodes in the circuit; additional gates are then introduced, but only for the portion of gating functions that are not matched. Strong matching identifies matches that are explicitly present in the factored forms, and weak matching seeks matches that are implicit in the logic and thus are more difficult to discover. Factored form matching reduces gating logic by an average of 24%, over a few test circuits, for which Boolean division only achieves an average reduction of 8%. Index Terms— Clock gating, factored form, factoring tree, gating logic.

I. INTRODUCTION

The clock distribution network and registers typically use 40% of the power drawn by a processor [2]. Techniques proposed for reducing clock power include blocking unnecessary clocking of registers by clock gating, using dual edge-triggered flip-flops to halve the clock frequency [3], and sharing clock inverters by merging several flip-flops into a multibit flip-flop [4]. Clock gating is the most popular of these techniques and has become a standard design practice. It involves the conversion of load-enable registers, as shown in Fig. 1(a), to registers using clock gating, as shown in Fig. 1(b).

Clock gating is usually applied within RTL design. During this process, a clock gating cell (CGC) is inserted, which contains an internal latch that filters out potential glitches from a block called the gating function. This block, specified by the circuit designers, determines whether it is necessary (g = 0) or unnecessary (g = 1) to load data into each register. RTL clock gating has two significant limitations: 1) the designer has to provide a gating function and 2) registers whose gating functions are not specified are left ungated. One way to resolve these problems is to take each ungated register, connect its input and output to an exclusive-NOR gate, and use the output of that gate as the gating function of the register [5].

Manuscript received October 24, 2012; revised April 10, 2013; accepted June 11, 2013. Date of publication July 16, 2013; date of current version May 20, 2014. This work was supported in part by the Mid-Career Researcher Program through NRF Grant funded by the MEST under Grant 2011-0029087, and by Samsung Electronics. The authors are with the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TVLSI.2013.2271054

Fig. 1. Load-enable registers (a) before and (b) after clock gating.

However, this technique can only be applied when the register input arrives sufficiently early so that a delay through the exclusive-NOR gate and the CGC can be tolerated.

A. Motivation

Another potential approach is to synthesize gating functions automatically from a gate-level netlist, which is often called gate-level clock gating. The key to this approach is to minimize the number of additional gates introduced to implement the gating functions, because these extra gates can represent a substantial overhead. Fig. 2 shows the proportion of circuit area occupied by gating functions for some test circuits; the area overhead is as large as 18%. A few techniques have been suggested to simplify gating functions, which we review in the following section.

In this paper, we propose a new technique to simplify gating functions. The idea is to use the existing logic as far as possible while the gating functions are synthesized. This is achieved by matching factored forms of the gating functions with those of existing logic nodes; we therefore call this technique factored form matching. We present two matching methods: 1) strong matching (SM), which looks for matches that are explicitly apparent in an expression of the circuit logic and 2) weak matching (WM), which seeks matches that are logically present but cannot be found by inspection. Experiments on test circuits show reductions in the area of gating functions between 11% and 39%, with an average of 24%. We contrast these results with those from Boolean division [6], [7], in which an average reduction of 8% is observed.

The rest of this paper is organized as follows. In Section II, we describe the basic steps in gate-level clock gating, to lay a foundation for factored form matching; related works on simplifying gating functions and on Boolean matching [8], to which our proposed factored form matching is very similar, are also reviewed. Factored form matching is introduced in Section III, where the matching algorithm is presented and analyzed. In Section IV, we present our experimental results, and a conclusion is given in Section V.

1063-8210 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

HAN AND SHIN: SIMPLIFYING CLOCK GATING LOGIC

Fig. 2. Percentage of total circuit area used by gating functions for several test circuits (circuits shown: s1423, s9234, b04, b07, b12, i2c, mem_ctrl, sasc, usbf; vertical axis: area in µm2).

Fig. 3. Gating function $f_i$ of a flip-flop i.

II. PRELIMINARIES

Given a gate-level netlist, the first step in gate-level clock gating is to identify the gating function for each flip-flop i. Fig. 3 shows an intuitive view of this process. If the input $d_i$ in the current clock cycle has the same value as the output $q_i$ in the previous clock cycle, then there is no need to load $d_i$ at the next clock edge and the clock can be gated by setting $f_i$ to one, where

$$f_i = d_i \odot q_i. \tag{1}$$

The symbol $\odot$ is an exclusive-NOR operation, $d_i$ is a function of the circuit inputs I and flip-flop outputs Q, and $q_i \in Q$ is a variable. Therefore, $f_i$ is a Boolean function of I and Q. Implementing every $f_i$ individually as extra logic is not practical, but the terms $f_i$ can be merged into a single aggregate gating function F as follows:

$$F = f_1 f_2 \cdots f_n. \tag{2}$$

This can be expected to reduce the requirement for extra logic. The probability that F evaluates to one, denoted by P(F), is called the gating probability and satisfies

$$P(F) \le \min_i P(f_i) \tag{3}$$

because the onset of F is the intersection of the onsets of all the terms $f_i$. Thus, the clock is now gated (F = 1) only when all n flip-flops can be gated ($f_i = 1$). The problem is then to merge similar gating functions, expressing further intersections of onsets, in a way that keeps P(F) as high as possible. There are some heuristic approaches [9]–[11] to this problem, but they are beyond the scope of this paper.
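Equations (1)–(3) can be illustrated with a small simulation. The following is a minimal sketch, not the flow used in this paper: the interface `next_state_fns`, the two example flip-flops, and the uniform sampling of inputs and states are all assumptions made for illustration.

```python
import random

def xnor(a, b):
    """Exclusive-NOR of two bits: 1 when the bits agree."""
    return 1 if a == b else 0

def estimate_gating_probabilities(next_state_fns, num_inputs, trials=2000, seed=0):
    """Estimate P(f_i) for each flip-flop and P(F) for the merged function.

    next_state_fns[i](inputs, state) returns d_i as 0 or 1 (hypothetical interface).
    Inputs and states are sampled uniformly, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n = len(next_state_fns)
    hits = [0] * n      # samples in which f_i = 1
    hits_all = 0        # samples in which F = f_1 f_2 ... f_n = 1
    for _ in range(trials):
        inputs = [rng.randint(0, 1) for _ in range(num_inputs)]
        state = [rng.randint(0, 1) for _ in range(n)]
        # Equation (1): f_i = d_i XNOR q_i
        f = [xnor(d(inputs, state), state[i]) for i, d in enumerate(next_state_fns)]
        for i, bit in enumerate(f):
            hits[i] += bit
        # Equation (2): F is the AND of all f_i
        hits_all += int(all(f))
    return [h / trials for h in hits], hits_all / trials

# Two hypothetical flip-flops: a load-enable register and a toggle register.
example_fns = [
    lambda i, s: (i[0] & i[1]) | ((1 - i[0]) & s[0]),  # d_0 = en ? x : q_0
    lambda i, s: s[1] ^ i[0],                          # d_1 = q_1 XOR en
]
p_each, p_merged = estimate_gating_probabilities(example_fns, num_inputs=2)
print(p_each, p_merged)
```

Because F = 1 implies every $f_i = 1$ in each sample, the estimate of P(F) can never exceed the smallest estimated $P(f_i)$, which is the bound in (3).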

A. Related Work on Clock Gating Logic Simplification

Even when some gating functions are merged, the extra logic required to implement every F may still be too costly, as Fig. 2 suggests. Two different ways of simplifying the logic have been investigated: 1) approximation and 2) division.

1) Approximation: The onset of F can be considered as a don't-care set, because the functionality of a circuit is not affected by whether the clock is actually gated when gating is possible (F = 1). Therefore, F can be approximated by any function $\tilde{F}$ whose onset is a subset of the onset of F. A well-chosen $\tilde{F}$ can be implemented with less logic, but the gating probability will be reduced.

Assuming F can be expressed in a sum-of-products form, we can obtain the probability that each product term will be evaluated to one. Terms associated with low probabilities are unlikely to be useful, and can therefore form a don't-care set. The sum of the remaining product terms constitutes $\tilde{F}$. This function and the don't-care set can then be submitted to two-level minimization, and subsequently to multilevel synthesis. We compared this method [12], with the low-probability threshold empirically set to 0.001, with our method in the experiments.

A circuit that implements $\tilde{F}$ may be constructed by ORing some of the internal nodes of a combinational logic circuit, each node of which corresponds to the onset of a Boolean function, which is a subset of the onset of F [13]. It is essential to choose good internal nodes, or the gating probability of $\tilde{F}$ can become much lower than that of F.

The timing constraint on the gating function is tight because of the presence of the CGC [see Fig. 1(b)]. Thus, it may be appropriate to control the construction of $\tilde{F}$, so that the depth of its logic is less than some user-specified value [14]. This can be achieved by representing $\tilde{F}$ as a binary decision diagram [15], and then adjusting its depth.

2) Division: The method of using Boolean division [6], [7] to simplify gating functions is shown in Fig. 4. Let D be a Boolean expression corresponding to some internal node of a combinational logic circuit. If we perform Boolean division on F using D as a divisor, we can express F as follows:

$$F = DQ + R \tag{4}$$

where Q and R, respectively, are the quotient and remainder. Then, it is only necessary to implement extra logic for Q and R in addition to the AND and OR gates needed to form (4). This can require considerably less logic than a direct implementation of F. The success of this approach is highly dependent on the discovery of a D (which usually requires many literals) that reduces the complexity of Q and R effectively. Runtime is also an issue because Boolean division must be performed many times.

Fig. 4. Using Boolean division to simplify gating functions.
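The decomposition in (4) can be sketched with the simpler algebraic (weak) division on sum-of-products cubes; this is a swapped-in technique for illustration, not the Boolean division of [6], [7]. The function name `algebraic_divide` and the cube encoding (each cube is a frozenset of single-letter literals) are assumptions of this sketch. The example divides the gating function used later in Fig. 10 by the sum-of-products form of the node n = d + f(b + c).

```python
def algebraic_divide(F, D):
    """Return (Q, R) such that F = D*Q + R, using algebraic (weak) division.

    F and D are sets of cubes; each cube is a frozenset of literal names.
    """
    quotient = None
    for d in D:
        # Cubes of F that contain the divisor cube d, with d removed.
        partial = {c - d for c in F if d <= c}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    # Everything in F that is not produced by D*Q stays in the remainder.
    product = {q | d for q in quotient for d in D}
    remainder = F - product
    return quotient, remainder

def cube(literals):
    """Build a cube from a string of single-letter literals, e.g. 'aeg'."""
    return frozenset(literals)

# F = adeg + abefg + acefg + abh + defg, and D = d + bf + cf (that is, d + f(b + c)).
F_cubes = {cube("adeg"), cube("abefg"), cube("acefg"), cube("abh"), cube("defg")}
D_cubes = {cube("d"), cube("bf"), cube("cf")}
Q_cubes, R_cubes = algebraic_divide(F_cubes, D_cubes)
print(sorted("".join(sorted(c)) for c in Q_cubes))  # quotient cubes
print(sorted("".join(sorted(c)) for c in R_cubes))  # remainder cubes
```

In this example the quotient is the single cube aeg and the remainder is abh + defg, so only Q, R, one AND gate, and one OR gate would need to be added on top of the existing node.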


B. Related Work on Boolean Matching

The proposed factored form matching, described in Section III, consists of SM and WM; the former is similar to conventional Boolean matching, which we review here, but the latter has not been studied before and thus constitutes a key contribution.

Boolean matching refers to the problem of determining whether two Boolean functions are equivalent when some inputs may be permuted and some inputs may be complemented. Two techniques have been proposed: 1) using a signature and 2) resorting to a canonical form.

A signature of a Boolean function is a quantity that represents some property of the function. The number of minterms of cofactors is used to determine matches under input negation and permutation [16], but the method cannot be applied when the numbers of minterms for two or more cofactors are the same. The numbers of unate (binate) [17] variables and interchangeable variables are also used [18], but the method suffers from long runtime.

If some canonical form of a Boolean function is defined, matching can be recast as transforming two Boolean functions into canonical forms, which are then compared. Computing canonical forms under input negations can be achieved by performing a recursive Shannon expansion [19].

III. FACTORED FORM MATCHING

Suppose that we have a set of gating functions $\{F_1, F_2, \ldots\}$, such that each $F_i$ enables ($F_i = 0$) and disables ($F_i = 1$) the clock to a set of flip-flops. Each gating function can be represented in a factored form, or equivalently as a factoring tree, which we will assume to be binary. Our task is to find some existing logic of a combinational circuit, which, together with a few extra gates, implements a gating function. Simultaneously, we have to maximize the proportion of the gating function that is provided by existing logic, so as to minimize the number of extra gates.

Because the existing logic can also be represented as a factoring tree, the problem can be recast as that of finding the parts of a factoring tree of a gating function that can be replaced by factoring trees taken from the existing logic; we call this process factored form matching, which consists of SM and WM. An overview of factored form matching is shown in Fig. 5, in which it is performed for each individual gating function $F_i$ separately; we write F for $F_i$ to simplify the notation. A factoring tree of F, denoted by $T_F$, is first obtained (L1) by factoring [20], which is performed by recursive algebraic division using a kernel of F as a divisor.¹ Matching is then performed in two steps: 1) SM (L3–L9) and 2) WM (L10–L15).

Fig. 5. Core of the algorithm for factored form matching: T is the factoring tree, $M_n$ is the set of subtrees (in SM) or sets of vertices (in WM) of $T_F$ that are to be matched with $T_n$, and L is the list of the resulting pairs $(n, M_n)$.

A. Strong Matching

Consider the factoring tree of a gating function F shown in Fig. 6(a), and the factoring trees of three of the existing internal nodes shown in Fig. 6(b)–(d). The subtree of F marked $N_1$ and the tree $n_1$ are structured in different ways, but they actually represent the same Boolean expression abc + acd + bd; we say that $N_1$ and $n_1$ are equivalent [21]. $N_2$ and $n_2$ are also equivalent; in fact, these trees have the same structure, except for the ordering of children; we say that such trees are syntactically equivalent [21]. Finally, $N_3$ and $n_3$ are syntactically equivalent, and their children are ordered in the same way; rather obviously, we call these identical factored forms. Any of these three types of equivalence is said to provide a strong match in this paper.

Finding the most general strong match involves the detection of equivalent factored forms. This, however, is a problem of combinational equivalence checking, which is proven [22] to be co-NP-complete.² Thus, we focus on finding syntactically equivalent and identical factored forms in this paper.

1) Detection of Syntactically Equivalent and Identical Factored Forms: Pseudocode for the algorithm that we use to detect syntactically equivalent factored forms is shown in Fig. 7. T is a large factoring tree (similar to F in Fig. 6), which may contain one or more subtrees that are syntactically equivalent to a smaller tree t (such as $n_1$ in Fig. 6). For each vertex v in T, we check whether the size of the subtree rooted at v, denoted by $T_v$, is the same as that of t (L4); note that we only consider a vertex v that has the same operator as the root of t (L2), by the definition of syntactic equivalence. Then, we label [23] both $T_v$ and t, level by level, starting at the deepest level.

¹ If the expression D is one of the kernels of F, then it can be used to derive F = QD + R by algebraic division. Then, Factor(F) takes the value of Factor(Q)Factor(D) + Factor(R).

² A problem X is called co-NP-complete if its complement $\bar{X}$ is NP-complete. It is believed, but is not formally proven, that polynomial-time algorithms do not exist for problems in this class.

Fig. 6. Factoring trees of (a) a gating function F, and of internal nodes (b) $n_1 = abc + d(ac + b)$, which is equivalent to subtree $N_1$, (c) $n_2 = (b + a)c + ba$, which is syntactically equivalent to subtree $N_2$, and (d) $n_3 = de + (a + b)(c + d)$, which is identical to subtree $N_3$.

Fig. 8. Checking whether two trees are syntactically equivalent by labeling.

Fig. 9. Algorithm to detect identical factored forms.
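The two strong-match checks of Section III-A can be sketched as follows. This is a minimal illustration under stated assumptions, not the labeling algorithm of Fig. 7: trees are encoded as nested tuples `(op, left, right)` with string leaves, identical forms are compared through a preorder string as in Fig. 9, and syntactic equivalence is tested by sorting the children into a canonical order before serializing. All function names are hypothetical, and literal names are assumed to contain no spaces or parentheses.

```python
def preorder(t):
    """Serialize a binary factoring tree by preorder traversal."""
    if isinstance(t, str):
        return t
    op, left, right = t
    return f"({op} {preorder(left)} {preorder(right)})"

def canonical(t):
    """Serialize a tree after sorting the children of every vertex."""
    if isinstance(t, str):
        return t
    op, left, right = t
    first, second = sorted((canonical(left), canonical(right)))
    return f"({op} {first} {second})"

def identical(t1, t2):
    """Same structure and same child ordering."""
    return preorder(t1) == preorder(t2)

def syntactically_equivalent(t1, t2):
    """Same structure up to the ordering of children."""
    return canonical(t1) == canonical(t2)

def matching_subtrees(T, t, same):
    """Collect every subtree of T that matches t under the predicate `same`."""
    found = []
    def visit(v):
        if same(v, t):
            found.append(v)
        if not isinstance(v, str):
            visit(v[1])
            visit(v[2])
    visit(T)
    return found

# n2 = (b + a)c + ba from Fig. 6(c), and a reordered copy standing in for N2.
n2 = ('+', ('*', ('+', 'b', 'a'), 'c'), ('*', 'b', 'a'))
N2 = ('+', ('*', 'a', 'b'), ('*', 'c', ('+', 'a', 'b')))
T = ('+', ('*', N2, 'h'), 'f')   # hypothetical larger tree containing N2
print(identical(n2, N2), syntactically_equivalent(n2, N2))
print(matching_subtrees(T, n2, syntactically_equivalent) == [N2])
```

Sorting serialized children makes the comparison order-insensitive at the cost of the sort, whereas the plain preorder string only detects identical forms.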

Fig. 7. Algorithm to detect syntactically equivalent factored forms.

Fig. 8 shows this process on example trees of depth 2. At depth 2, vertices a and b of $T_v$ are, respectively, labeled 0 and 1; now we look at t and find that a and b are already labeled, so these vertices inherit the same labels. Rising to depth 1, vertex c is labeled zero and + is labeled one in $T_v$; because + is not a leaf, it also acquires the labels of its children. Labeling again at t, the two vertices at depth 1 are + and c, which match $T_v$, and so these vertices are also labeled one and zero. The roots of the two trees are then assigned the same label of zero, and we can finally declare that $T_v$ and t are syntactically equivalent.

Detecting identical factored forms is much simpler. Fig. 9 shows how a tree is converted to a string by a preorder (or other uniquely specified) traversal of t (L1) and $T_v$ (L5). The two strings are then easily compared (L6).

2) SM Algorithm: The procedure Strong_Match called by the algorithm in Fig. 5 itself calls either Syntactically_Equivalent or Identical, depending on how the factoring trees of F and the existing internal nodes $n_i$ are obtained. We might apply quick factoring to the $n_i$ terms to reduce runtime, and good factoring [20] to F to obtain a smaller tree.³ Because different factoring methods may use different divisors, they can yield syntactically equivalent as well as identical factored forms. On the other hand, if quick factoring is applied to both F and the $n_i$ terms, we only need to look for identical factored forms, which takes less time. It can readily be shown that the complexity of Syntactically_Equivalent is O(V log V) and that of Identical is O(V), where V is the number of vertices. The choice of factoring methods will be explored experimentally in Section IV-B.

B. Weak Matching

In searching for a strong match between a subtree of a gating function and the factoring tree corresponding to an existing logic node, we only look for syntactically equivalent and identical factored forms, and so the two trees must have the same structure. This is a rigid requirement, especially for large trees. Consider Fig. 10, in which no subtree of F strongly matches n. The vertices of F within the dotted circles, however, collectively constitute the same expression as n, because

$$F = aeg(d + f(b + c)) + abh + defg = n(aeg) + abh + defg.$$

All the nodes n that are not skipped are checked for a weak match (L10 of Fig. 5).

Fig. 10. Weak match of a gating function F = (ged + bh)a + fge(d + a(b + c)) with an internal node n = d + f(b + c). Vertices of F are labeled $v_1$ to $v_8$; vertices of n are labeled $v'_1$ to $v'_3$.

³ In quick factoring, the level-zero kernel that yields the minimum number of literals is picked for division. The best of all levels of kernels is used in good factoring. Good factoring therefore takes significantly more time than quick factoring.

Example 1: Consider Fig. 10 again, and let the tree n be traversed in postorder. When $v'_1$ is reached, the first subtree of n is identified and we can write the corresponding subexpression as $E_1 = b + c$. Now, we look at F and search for b + c, checking $v_1$, $v_2$, and $v_3$, which are candidate vertices because they contain the literals b or c. Suppose that vertices $v_1$ and $v_3$ are examined first. Their lowest common ancestor (LCA) is $v_8$, which is a + operator; therefore, they must have a common factor, so that b + c can be identified after that factor is extracted. We first find two factors $\mathcal{F}(v_1)$ and $\mathcal{F}(v_3)$ by following the path from $v_1$ and $v_3$ to the LCA and taking the product of the children of the × vertices that are not on the path. We find that $\mathcal{F}(v_1) = ah$ and $\mathcal{F}(v_3) = aefg$: because $\mathcal{F}(v_1) \ne \mathcal{F}(v_3)$, the LCA of $v_1$ and $v_3$ cannot produce b + c. Now, we move on to examine $v_2$ and $v_3$, and find that $\mathcal{F}(v_2) = \mathcal{F}(v_3) = 1$: thus their LCA, $v_4$, corresponds to b + c and is labeled $E_1$.

In the tree n, the next vertex that is not a leaf is $v'_2$; the corresponding subexpression is denoted by $E_2$. Now, we must search for $E_2 = fE_1$ in F. The LCA of $v_5$ (f) and $v_4$ ($E_1$) is $v_6$. Because this LCA is a × operator, $v_6$ can simply be labeled $E_2$ without comparing the factors of $v_5$ and $v_4$. $\mathcal{F}(v_5) = g$ and $\mathcal{F}(v_4) = ae$, however, will be required later to initialize $\mathcal{F}(v_6)$.

In the tree n, the expression corresponding to $v'_3$, which is $d + E_2$, is identified and labeled $E_3$. Because the LCA of $v_7$ and $v_6$, which is $v_8$, is a + operator, we need to compare their factors. It is easy to see that $\mathcal{F}(v_7) = aeg$. $v_6$ is, however, not a literal, but corresponds to the expression $E_2 = fE_1$; so $\mathcal{F}(v_6)$ must be initialized to the product of $\mathcal{F}(v_5)$ and $\mathcal{F}(v_4)$, which is also aeg. Because $\mathcal{F}(v_7) = \mathcal{F}(v_6)$, we finally label $v_8$ as $E_3$, and declare that a weak match is discovered.

1) WM Algorithm: Example 1 gives a good idea of the WM algorithm (denoted by Weak_Match), and its pseudocode is shown in Fig. 11. T is a large tree (like F in Fig. 10), which may contain one or more sets of vertices that are weakly matched to a smaller tree t (similar to n in Fig. 10). Initially, for each vertex v in T and t, we set its subexpression $E_v$ to v and its factor $\mathcal{F}(v)$ to 1 (L2); that is, the subexpression of each leaf is the leaf itself (such as $E_{v_5} = f$) and the factors of all vertices are initialized to one.
Now, we traverse the tree t in postorder. For each vertex $v'$ visited during the traversal, the expression formed from the subexpressions of its children and its own operation is assigned to $E_{v'}$ (like $E_{v'_2} = E_2 = fE_1$); note that we only consider $v'$ that is not a leaf (L3–L6). $T_{v'}$ is the set of three-tuples $(v_l, v_r, v_A)$, all vertices of which are in T; in each tuple, $v_A$ is the LCA of $v_l$ and $v_r$ and is the same operation as $v'$, and $v_l$ and $v_r$ have the same subexpressions as the left and right children of $v'$, respectively (L23–L26).

For each three-tuple $(v_l, v_r, v_A)$ in $T_{v'}$, we find two factors $\mathcal{F}(v_l)$ and $\mathcal{F}(v_r)$ by ANDing their initial factors and the expression obtained from Factor (L27–L30), a process illustrated in Example 1, and then check whether $v_A$ can be matched to $v'$; in other words, we check whether $v_A$ is a × operation, or whether $v_A$ is a + operation and simultaneously $\mathcal{F}(v_l)$ is equal to $\mathcal{F}(v_r)$. If the check passes, $E_{v_A}$ is set to $E_{v'}$ and $\mathcal{F}(v_A)$ is initialized either to $\mathcal{F}(v_l) \times \mathcal{F}(v_r)$ when $v_A$ is a × operation or to $\mathcal{F}(v_l)$ when $v_A$ is a + operation, and the set of $v_l$ and $v_r$ is added to $M'$, which is a set of sets of vertices that are matched to the children of $v'$ (L8–L14). If $M'$ is empty, because no vertex in T is matched to $v'$, we return the empty set (L15). Otherwise, we pick pairs $(V, V')$, extracted from $M$ and $M'$, respectively, one by one; $M$ is a set of sets of vertices in T matched to the vertices that are already traversed in t. Then, for each vertex v in $V'$, we check whether v is a leaf and not contained in V, or whether v is not a leaf and contained in V; if all v satisfy this condition, we unify V, $V'$, and the LCA of $V'$,⁴ and add the result to $M_{new}$, by which $M$ will be replaced (L16–L21).

⁴ $V'$ is a set of two vertices, $v_l$ and $v_r$. The LCA of $V'$ is thus the same as the LCA of $v_l$ and $v_r$.

Fig. 11. Algorithm to detect weak matches.

Now, we address a few issues that are not apparent in Fig. 11. Suppose that we wish to find out whether the tree in Fig. 12(a) contains the expression bc. The LCA of $v_1$ and $v_3$ is $v_5$, which is a × operator, and so these three vertices correspond to bc. Vertices $v_2$ and $v_3$, with their LCA $v_4$, also form the same expression. Because $v_3$ belongs to both sets of vertices, only one set can represent the expression bc, which is a conflict. This, however, does not occur in an algebraic expression, in which no cube contains another; in fact, the expression bc(a + b) corresponding to Fig. 12(a) is not an algebraic one. Similarly, suppose that we wish to determine whether the tree in Fig. 12(b) contains b + c. If we examine $v_6$ and $v_8$, their factors are the same (a); thus the two vertices, together with their LCA $v_{10}$, make b + c. The same expression is formed by $v_7$, $v_8$, and $v_9$, which creates a conflict. However, this does not occur in an algebraic expression (note that the tree contains ab twice).

Fig. 12. Conflicting expressions found during search: (a) bc and (b) b + c.

2) Analysis of the WM Problem: Now, we formally define the WM problem.

Problem 1: Given two binary trees F and n, the WM problem is to determine whether F contains all the vertices of n, given that the LCA of any pair of vertices in n and the LCA of the corresponding vertices in F are the same. If the LCA of two vertices i, j ∈ F is an OR operator, then $\mathcal{F}(i) = \mathcal{F}(j)$ must be satisfied.

This definition leads to the following.

Corollary 1: The Weak_Match algorithm is optimal in the sense that it finds a solution to Problem 1 if one exists.

Optimality is guaranteed by the method used to find matches. Consider Fig. 10 again. The LCA of b and c and the LCA of f and $E_1$ are the same in both F and n when matches are discovered. The LCA of f and b is equal to the LCA of f and the LCA of b and c, which is $E_1$; thus the LCA is implicitly made equal.

Fig. 13. Weak match problem, in which the solution is not discovered by Weak_Match (F = a + b + c + d, n = a + b + c).

The definition of Problem 1, however, excludes the possibility of some types of weak match. An example is shown in Fig. 13. The subexpression $E_1 = b + c$ at $v'_1$ is matched by vertices $v_1$, $v_2$, and $v_3$. When we continue to search for $E_2 = a + E_1$, we do not check the descendants of $v_3$, so Weak_Match fails, even though a + b + c does exist in F. Note that the LCA of a and b and the LCA of a and c are different in F, whereas $v'_2$ is the LCA of both pairs in n, which contradicts the definition of Problem 1.
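The factor computation used in Example 1 can be sketched as follows. This is a minimal illustration and not the pseudocode of Fig. 11: the `Node` class, the helper names, and the exact binary shape chosen for the tree of F in Fig. 10 are assumptions. A factor is returned as a sorted tuple of literal names, with the empty tuple standing for the factor 1.

```python
class Node:
    """Vertex of a binary factoring tree; op is '+', '*', or a literal name."""
    def __init__(self, op, *children):
        self.op = op
        self.children = list(children)
        self.parent = None
        for c in children:
            c.parent = self

def ancestors(v):
    """Return v, its parent, ..., up to the root."""
    path = []
    while v is not None:
        path.append(v)
        v = v.parent
    return path

def lca(u, v):
    """Lowest common ancestor of u and v."""
    seen = {id(x) for x in ancestors(u)}
    for x in ancestors(v):
        if id(x) in seen:
            return x
    return None

def expr(v):
    """Render a subtree as a parenthesized expression string."""
    if not v.children:
        return v.op
    return "(" + v.op.join(expr(c) for c in v.children) + ")"

def atoms(v):
    """Flatten a product into its literals; a sum is kept as one factor."""
    if not v.children:
        return [v.op]
    if v.op == '*':
        return [a for c in v.children for a in atoms(c)]
    return [expr(v)]

def factor(v, top):
    """Product of off-path children of the '*' vertices strictly between v and top."""
    out, child, p = [], v, v.parent
    while p is not top:
        if p.op == '*':
            for s in p.children:
                if s is not child:
                    out += atoms(s)
        child, p = p, p.parent
    return tuple(sorted(out))

# F = (ged + bh)a + fge(d + a(b + c)); vertex names follow Example 1.
v7, v1 = Node('d'), Node('b')
left = Node('*', Node('+', Node('*', Node('*', Node('g'), Node('e')), v7),
                      Node('*', v1, Node('h'))), Node('a'))
v2, v3, v5 = Node('b'), Node('c'), Node('f')
v4 = Node('+', v2, v3)
v6 = Node('*', Node('*', v5, Node('g')),
          Node('*', Node('e'), Node('+', Node('d'), Node('*', Node('a'), v4))))
v8 = Node('+', left, v6)

print(factor(v1, lca(v1, v3)), factor(v3, lca(v1, v3)))  # factors differ: no match
print(factor(v2, lca(v2, v3)), factor(v3, lca(v2, v3)))  # both empty: v4 is E1
f_v6 = tuple(sorted(factor(v5, v6) + factor(v4, v6) + factor(v6, v8)))
print(factor(v7, v8), f_v6)                              # equal: v8 is E3
```

Representing a factor as a sorted tuple makes the equality test at a + vertex a plain tuple comparison; a production implementation would likely need a richer representation for non-literal factors.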

Fig. 14. Checking for subtree containment in SM.

C. Factored Form Matching

L in Fig. 5 is the list of pairs $(n, M_n)$; in each pair, n is an internal node⁵ of a combinational circuit and $M_n$ is a set of subtrees (or sets of vertices) of $T_F$ that match $T_n$, which is a factoring tree of n (L5). Two lists $L_{sm}$ and $L_{wm}$ are initialized and then updated to track strong and weak matches, respectively (L2). We restrict our Boolean manipulation to algebraic operations [20] to limit computation time; therefore, if a node n contains literals that are not in F, then n is dropped from further matching attempts (L4).

⁵ We use n interchangeably as the name of a node and of the Boolean expression associated with that node.

1) Containment in SM: Strong_Match returns a set of subtrees (L6). We can use the function Containment to check whether some of these subtrees are redundant because they are contained in subtrees that are already determined to be strong matches.

Example 2: Consider Fig. 14, in which the tree $n_1$ is to be matched with subtrees $N_1$ and $N_2$. We set $M_{n_1} = \{N_1, N_2\}$ and $L_{sm} = \{(n_1, M_{n_1})\}$ (L9). We continue to determine $M_{n_2} = \{N_3\}$. Then, we perform the containment check: because $N_2 \subset N_3$, $N_2$ can be discarded and we update $M_{n_1} = \{N_1\}$ (L18) and $L_{sm} = \{(n_1, M_{n_1}), (n_2, M_{n_2})\}$. Similarly, $M_{n_3} = \{N_4\}$, and $N_1 \subset N_4$. Discarding $N_1$ makes $M_{n_1}$ empty. Thus, $(n_1, M_{n_1})$ can also be discarded from $L_{sm}$ (L20), and finally $L_{sm} = \{(n_2, M_{n_2}), (n_3, M_{n_3})\} = \{(n_2, \{N_3\}), (n_3, \{N_4\})\}$.
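The containment check of Example 2 can be sketched as follows. This is a batch variant written for illustration, whereas the algorithm in Fig. 5 updates the list incrementally; the function name `containment`, the encoding of each subtree as a frozenset of vertex identifiers, and the identifiers themselves are assumptions.

```python
def containment(L_sm):
    """Drop every matched subtree that is strictly contained in another match.

    L_sm is a list of pairs (n, M_n); each element of M_n is a frozenset of
    vertex identifiers of T_F (a hypothetical encoding of a subtree).
    """
    all_subtrees = [s for _, M in L_sm for s in M]
    result = []
    for n, M in L_sm:
        # `s < other` on frozensets tests for a proper subset.
        kept = [s for s in M if not any(s < other for other in all_subtrees)]
        if kept:                 # a node with no remaining subtree is discarded
            result.append((n, kept))
    return result

# Hypothetical vertex identifiers arranged as in Example 2:
# N2 is contained in N3, and N1 is contained in N4.
N1 = frozenset({10, 11, 12})
N2 = frozenset({5, 6, 7})
N3 = frozenset({4, 5, 6, 7, 8})
N4 = frozenset({9, 10, 11, 12, 13})
L_sm = [("n1", [N1, N2]), ("n2", [N3]), ("n3", [N4])]
reduced = containment(L_sm)
print([(n, len(M)) for n, M in reduced])
```

With these sets, both subtrees of $n_1$ are discarded, so only the pairs for $n_2$ and $n_3$ remain, as in Example 2.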


Fig. 15. Proof of Proposition 1.

Fig. 16. Finding nested matches: $n_1$ and $n_2$ are strong matches and the remainders are weak matches.

Fig. 17. (a) Weak match when the root of n is an OR operator. (b) Reduced F after the match.

Fig. 18. (a) Weak match when the root of n is an AND operator. (b) Reduced F after the match.

We might think that the subtrees that are determined to be strong matches could partially intersect without one being wholly contained in the other; but this never happens, as we show by proving the following proposition.

Proposition 1: Let $n_i$ and $n_j$ be strong matches with subtrees $N_i$ and $N_j$ of F, respectively. If $N_i \cap N_j \ne \emptyset$, then either $N_i \subseteq N_j$ or $N_j \subseteq N_i$.

Proof: By definition, a subtree of F is a tree consisting of a vertex in F and all of its descendants. Let a subtree $N_k$ be the intersection of $N_i$ and $N_j$, as shown in Fig. 15. If $N_i$ is not a proper subtree of $N_j$, and $N_j$ is not a proper subtree of $N_i$, then, as shown in Fig. 15, the root vertex of $N_k$ has to have two ancestors, the root vertex of $N_i$ and that of $N_j$, but this violates the definition of a tree. Thus, either $N_i \subseteq N_j$ or $N_j \subseteq N_i$ must hold.

2) Containment in WM: Now, we turn our attention to the possibility of nested matches: each weak match is checked to see whether it is contained in or contains other strong or weak matches (L12–L15).

Example 3: Consider Fig. 16: $n_1$ and $n_2$ are strong matches with subtrees $N_1$ and $N_2$, respectively. The set of vertices of F that weakly matches $n_3$ includes $N_1$; thus $n_1$ is dropped from the list of strong matches. The vertices that weakly match $n_4$ make $n_3$ redundant. The three vertices of F that weakly match $n_5$ constitute a subset of $N_2$; thus $n_5$ is also redundant. Matches $n_2$ and $n_4$ remain after the containment check.

D. Implementation of Gating Functions

Implementing the gating functions that correspond to strong matches is straightforward. In Fig. 6, for example, F can be expressed by $n_2 h + f(n_1 e + g n_3)$, in which the terms $n_i$ are considered to be variables.

Using weak matches needs some additional steps. Consider Fig. 10, which is redrawn as Fig. 17(a). Recall that, when we try to match $d + E_2$ in F, we obtain the common factors of $v_1$ (corresponding to d) and $v_2$ (corresponding to $E_2$), shown with bold outlines in the figure, i.e., $\mathcal{F}(v_1) = \mathcal{F}(v_2) = aeg$. This is because the LCA of $v_1$ and $v_2$, which is $v_3$, is an OR operator, and thus F can be expressed by $aeg(d + E_2) + R = aegn + R$ after extracting the common factor aeg. Now, we find an expression for the remainder R by reducing F; $v_4$ is not necessary because its children are all common factors (bold outlines) or matches (dotted outlines). The vertex $v_5$ is subsequently removed and its lone right child becomes a left child of its parent; the resulting left subtree of $v_3$ is shown in Fig. 17(b). The right subtree of $v_3$ can be reduced in a similar fashion. The reduced F in Fig. 17(b) is R = abh + defg.

Now, let us consider Fig. 18(a). This time $E_1 E_2$ is matched by $v_1$, $v_2$, and $v_3$. Because $v_3$ is an AND operator, a common factor is not necessary, and thus it is simpler to reduce F to determine the remainder R by writing F = (a + b)(e + f)R = nR. The left subtree of $v_4$ contains all matched vertices; $v_4$ is thus eliminated and its right subtree becomes a child of its parent $v_3$. The right subtree of $v_3$ can be reduced in a similar way; the reduced F shown in Fig. 18(b) corresponds to R = (c + d)(g + h).
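The decomposition F = aeg·n + R used for the weak match of Fig. 17 can be checked by exhaustive enumeration. The following sketch assumes the expressions given for Fig. 10, namely F = (ged + bh)a + fge(d + a(b + c)), n = d + f(b + c), and R = abh + defg; the Python function names are hypothetical.

```python
from itertools import product

def F(a, b, c, d, e, f, g, h):
    """Gating function of Fig. 10: (ged + bh)a + fge(d + a(b + c))."""
    return (((g & e & d) | (b & h)) & a) | (f & g & e & (d | (a & (b | c))))

def n(b, c, d, f):
    """Existing internal node: d + f(b + c)."""
    return d | (f & (b | c))

def R(a, b, d, e, f, g, h):
    """Remainder after the weak match: abh + defg."""
    return (a & b & h) | (d & e & f & g)

def rebuilt(a, b, c, d, e, f, g, h):
    """Implementation that reuses n: aeg * n + R."""
    return (a & e & g & n(b, c, d, f)) | R(a, b, d, e, f, g, h)

# Compare both forms on all 2^8 input assignments.
mismatches = [bits for bits in product((0, 1), repeat=8)
              if F(*bits) != rebuilt(*bits)]
print(len(mismatches))
```

An empty mismatch list confirms that only the remainder, one AND with the common factor, and one OR need to be added on top of the existing node n.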


TABLE I
TEST CIRCUITS (COLUMNS 2–3), AFTER FLIP-FLOP GROUPING (COLUMNS 4–6), NUMBER OF EXTRA GATES REQUIRED TO IMPLEMENT GATING FUNCTIONS (COLUMN 7)

Fig. 19. Percentage reduction in the number of gates required to implement gating functions. (a) Comparison of SM, approximation, and division. (b) Comparison of SM, SM using good factoring, and combined strong and WM.

IV. EXPERIMENTAL RESULTS

A. Test Circuits

A set of test circuits is compiled from ISCAS and ITC benchmarks, as well as from OpenCores [24]. The numbers of combinational gates and flip-flops in the initial netlist, which is obtained by logic synthesis with a 32-nm ASIC gate library, are shown in Table I.

1) Grouping Flip-Flops: As described in Section II, each individual gating function f_i is first identified. The gating probability P(f_i) is also obtained by simulation using random input patterns [25]; to be specific, each instance of input patterns, except for reset and clock, is randomly generated and applied. It is then necessary to decide whether gating flip-flop i saves power; this requires the ungated and gated powers, which can be determined from P(f_i) and the power used by the CGC (assuming, for now, that the CGC only drives i). The number of flip-flops selected to be gated is shown in Table I.

These gated flip-flops must now be grouped to yield a set of merged gating functions (2). This is done with a simple greedy heuristic, which operates on a graph in which each vertex corresponds to a flip-flop. An edge between i and j is associated with a weight (P(f_i) + P(f_j))/2 − P(f_i f_j). The first term in this expression is the average gating probability when i and j are gated independently (if they are not merged), and P(f_i f_j) is the gating probability when they are gated together (if they are merged). Therefore, a weight shows the loss of gating probability when i and j are merged. The i and j with the smallest edge weight (i.e., corresponding to the smallest loss of gating probability) are selected; the graph is modified by merging the two vertices into a single vertex and removing all the edges that are connected to either i or j, but not to both. A decision is then made as to whether this grouping



Our factored form matching technique is implemented in SIS [26]. Two additional methods of simplifying gating functions, approximation and division (see Section II-A), are also implemented for comparison.

1) Strong Matching: We first compare SM with the two other simplification methods. Fig. 19(a) shows the percentage reduction in gates, with the last column of Table I used as a reference. We see that SM always outperforms the other two methods, reducing the number of gates by 15.3% on average, while approximation and division achieve reductions of 9.4% and 7.5%, respectively. The difference between SM and the other methods is very marked in some of the test circuits, such as s838 and s35932.

The trees discovered by SM are likely to be small, especially as we only search for identical factored forms; for instance, the average tree found in processing the circuit des consists of 15 vertices, and the largest tree has 25 vertices. If a gating function has a well-balanced factoring tree, there is more chance that many small subtrees will be discovered by SM. In Fig. 20, for example, only one subtree of the skewed tree in (a) can be a strong match; (b) is more balanced, and strong matches with two subtrees are possible. Fig. 21 shows how this applies to the test circuits. Black circles show the difference between the results of SM and division in Fig. 19(a). For each factoring tree, we calculate the difference between the minimum possible height of a tree (log_2 N + 1 for N leaf vertices) and its actual height, as a measure of balance; the average of that difference over all factoring trees for each circuit is denoted by white boxes. The two plots appear well correlated, which supports our conjecture. This observation raises the possibility of artificially rebalancing factoring trees to favor the search for strong matches; but we leave this topic for future investigation.
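The balance measure described above can be sketched for the two four-leaf trees of Fig. 20. This is a minimal illustration under stated assumptions: factoring trees are taken to be binary and are written as nested pairs, the height counts the vertices on the longest root-to-leaf path, the ceiling of log_2 N is used when N is not a power of two, and the sign convention (minimum height minus actual height, so that skewed trees give negative values) is inferred from the axis of Fig. 21. The function names are hypothetical.

```python
import math


def leaves(tree):
    """Number of leaf vertices; a leaf is a variable name."""
    return 1 if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def height(tree):
    """Number of vertices on the longest root-to-leaf path."""
    return 1 if isinstance(tree, str) else 1 + max(height(tree[0]), height(tree[1]))


def balance(tree):
    """Minimum possible height (log2 N + 1) minus the actual height."""
    min_height = math.ceil(math.log2(leaves(tree))) + 1
    return min_height - height(tree)


skewed = ((("a", "b"), "c"), "d")      # shape of Fig. 20(a), operators omitted
balanced = (("a", "b"), ("c", "d"))    # shape of Fig. 20(b)
print(balance(skewed), balance(balanced))
```

Under these assumptions the skewed tree scores -1 and the balanced tree scores 0; averaging this value over all factoring trees of a circuit would give one point of the kind plotted in Fig. 21.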
a) Choice of Factoring Methods: As explained in Section III-A2, only identical factored forms are discovered by SM if quick factoring is applied to both the gating functions Fs and the Boolean functions corresponding to internal combinational nodes n_i. Another option is to apply good

Fig. 21. Difference between SM and division, and the average difference between the minimum possible height of a factoring tree and its actual height.

B. Assessment of Factored Form Matching

saves power, assuming that the two flip-flops are attached to a single CGC; the grouping is undone if there is no saving. This process is repeated until there are no edges left. The number of groups, which represents the number of gating functions, and the average gating probability of the gated flip-flops are shown in Table I. The last column shows the number of gates required to implement the resulting gating functions before applying any simplification methods, which serves as a reference for comparison.
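The greedy grouping heuristic can be sketched as follows. This is only an illustration under several assumptions that are not stated in the paper: each gating function is represented by the set of simulated cycles in which it evaluates to 1, so that the merged function f_i f_j is a set intersection; the edge weight is extended from single flip-flops to groups in the obvious way; an undone grouping simply removes its edge; and the power check is a hypothetical callback `saves_power(group_size, gating_prob)`.

```python
from itertools import combinations


def greedy_group(gating, total_cycles, saves_power):
    """Greedily merge flip-flops into groups that share one gating function.

    gating: flip-flop name -> set of simulated cycles in which its gating
            function evaluates to 1 (an assumed representation).
    saves_power: hypothetical callback (group_size, gating_prob) -> bool.
    """
    groups = {frozenset([name]): frozenset(c) for name, c in gating.items()}

    def prob(cycles):
        return len(cycles) / total_cycles

    def weight(edge):
        u, v = tuple(edge)
        # Loss of gating probability: (P(fu) + P(fv)) / 2 - P(fu fv).
        return (prob(groups[u]) + prob(groups[v])) / 2 - prob(groups[u] & groups[v])

    edges = {frozenset(pair) for pair in combinations(groups, 2)}
    while edges:
        # Smallest weight first; names are used only to break ties.
        best = min(edges, key=lambda e: (weight(e), sorted(sorted(g) for g in e)))
        edges.discard(best)
        u, v = tuple(best)
        merged_fn = groups[u] & groups[v]
        if not saves_power(len(u | v), prob(merged_fn)):
            continue  # grouping is undone; this edge is not tried again
        # Keep only neighbors that were connected to both u and v.
        nbrs_u = {k for e in edges if u in e for k in e if k != u}
        nbrs_v = {k for e in edges if v in e for k in e if k != v}
        edges = {e for e in edges if u not in e and v not in e}
        del groups[u], groups[v]
        groups[u | v] = merged_fn
        edges |= {frozenset([u | v, k]) for k in nbrs_u & nbrs_v}
    return groups


# Made-up simulation data: 10 cycles, three flip-flops.
gating = {"ff1": {0, 1, 2, 3, 4, 5}, "ff2": {0, 1, 2, 3, 4, 6}, "ff3": {7, 8}}
result = greedy_group(gating, 10, lambda size, p: size * p > 0.5)
print(sorted(sorted(g) for g in result))
```

In this toy run, ff1 and ff2 have the smallest edge weight (0.6 − 0.5 = 0.1) and are merged; merging the result with ff3 would leave a gating probability of zero, so that grouping is rejected and ff3 stays in its own group.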

Fig. 20. (a) Skewed and (b) balanced factoring trees.

Fig. 22. Additional reduction in gates obtained from WM (lower curve), and average percentage of AND vertices in factoring trees of internal nodes.

factoring to Fs to obtain smaller factoring trees, and still apply quick factoring to the internal nodes n_i to control runtime; both syntactically equivalent and identical factored forms can then be discovered. The two methods are compared in Fig. 19(b). We see that good factoring of F is always better, with an average gate saving of 19.6%; but this extends runtimes, an issue that is discussed further in Section IV-C. This modification seems most effective when many strong matches correspond to syntactical equivalence, e.g., more than 40% of the matches correspond to syntactical equivalence for circuits s9234, s35932, s38584, and i2c. At the other extreme, less than 10% of the matches are syntactically equivalent in circuits such as s13207 and b12; in these cases, the difference between the results obtained by the two methods is marginal.

2) Weak Matching: Full factored form matching, involving both SM (with quick factoring applied to both Fs and the terms n_i) and WM, is applied to all the test circuits, and the results are shown in Fig. 19(b). The average saving is now 24.3%, which is 9 percentage points higher than that obtained by SM alone. There is wide variation in the extent of this further saving: circuits s838 (19.3%), b11 (16.9%), and b13 (16.9%) benefit a lot from WM; b17 (2.8%) and aes (3.1%) benefit very little. Recall that when the LCA is an OR operator, common factors have to be found and extracted (see Fig. 17) to obtain a new factored form of the gating function; this is not the case when

TABLE II
RUNTIMES (S) AFTER SM, SM WITH GOOD FACTORING (SM: GOOD FACTORING) APPLIED TO GATING FUNCTIONS, AND COMBINED SM + WM. RUNTIME FOR DIVISION IS ALSO SHOWN FOR REFERENCE

Fig. 23. Comparison of full factored form matching (strong match + weak match) and application of SIS to both gating functions and existing combinational logic (SIS).

Fig. 24. Effect on runtime, average gating probability, and power consumption of combined SM and WM, for different numbers of flip-flop groups.

the LCA is an AND operator (see Fig. 18). In practice, this tends to mean that n_i terms that contain more AND operators will produce more weak matches. Fig. 22 shows the correlation between the additional saving from WM (left y-axis) and the average percentage of AND operators in the n_i terms (right y-axis).

3) Comparison With Conventional Logic Optimization: Our approach assumes that the combinational circuit is given and tries to simplify gating functions by reusing that circuit as much as possible through matching. An alternative approach would be to consider the combinational circuit and the gating functions as a whole, and submit them together to conventional logic optimization. This approach is tried in SIS and compared with our matching approach; the result is shown in Fig. 23. Matching outperforms SIS in most circuits. The exceptions are s13207, s35932, b17, aes, and i2c. This is understandable, because WM is a key technique in our approach and all five circuits benefit very little from WM, as shown in Fig. 22.

C. Computation Times

Computation times for SM and for combined SM and WM are shown in Table II, together with runtimes for SM when good factoring is applied to the gating functions. In the implementation of Factored_Form_Matching shown in Fig. 5,

Fig. 25. Slack histograms of circuit s9234 before and after factored form matching.

an internal node with a factoring tree (L5) containing fewer than eight vertices is excluded from further matching to reduce runtime; this threshold is chosen empirically, because smaller factoring trees contribute less to reducing the number of literals in the gating function. The runtime of division, another simplification method, is also included in the last column of Table II for reference. Notice that SM and WM are applied after the Factoring() procedure is applied to both a gating function and an internal node, which is not the case in division; this explains the smaller runtime of division.

Runtime is especially significant in circuits that contain many combinational gates and many flip-flop groups, e.g., des and wb_conmax; these circuits require many existing logic nodes n_i to be involved in the matching effort, and also many gating functions. This suggests that runtimes may be shortened by reducing the number of flip-flop groups, which determines the number of gating functions. This idea is tried for three large circuits and the results are shown in Fig. 24. Runtime clearly drops as the number of gating functions decreases; but the average gating probability is also reduced because


TABLE III
COMPARISON OF RTL CLOCK GATING AND COMBINED RTL AND GATE-LEVEL CLOCK GATING

flip-flops whose aggregate gating probability P(f_i f_j) is not high may now be grouped. Power consumption, however, does not necessarily increase, because fewer CGCs are required. When the number of flip-flop groups drops as far as 0.4, the power consumption does increase, because a very low average gating probability masks the benefit of fewer CGCs. The proper grouping of flip-flops to manage runtime and power consumption deserves more investigation.

D. Verification of Factored Form Matching

Factored form matching alone affects neither the average gating probability nor the numbers of gated flip-flops and CGCs; it changes only the number of combinational gates, so its effect on power consumption is likely to be very small. We compared the power consumption of each circuit before and after factored form matching, and observed changes between 1% and 4%.

Some existing logic nodes continue to contribute to gating functions after factored form matching. The timing slack of these nodes will change because of changes in load capacitance, which may negatively affect circuit timing. One method of addressing this issue is to determine the K most critical paths, and exclude all the nodes on these paths from matching. We implement a heuristic along these lines: all the nodes that have less than 20 ps of slack are excluded from matching. Fig. 25 is a histogram of the slacks in circuit s9234 before and after factored form matching. The changes are marginal: only 32 (out of 615) nodes are used 58 times to implement the gating functions.

E. Combined RTL and Gate-Level Clock Gating

We explore the extent to which gate-level clock gating can complement RTL clock gating through tests on nine circuits from OpenCores. The HDL descriptions of these circuits are given to a commercial logic synthesis tool [27], which also performs RTL clock gating. The percentage of gated flip-flops and their average gating probabilities are shown in Table III.
We apply the procedure described in Section IV-A1 to the ungated flip-flops, and some of them are converted to gated ones during the process of grouping. Table III shows that this increases the number of gated flip-flops and their average gating probability. Gating functions are then synthesized using our proposed matching technique. The resulting circuit area and power consumption, normalized to the corresponding figures after RTL clock gating, are also shown in the last two columns of Table III. There is large variation in the additional power saved by combined RTL and gate-level clock gating, from as much as 16% in ac97 to as little as 2% in tv80. The circuits that benefit most tend to have a large proportion of flip-flops, so that a large portion of their power is attributable to sequencing elements, and to have many flip-flops that remain ungated after RTL clock gating and are thus candidates for gate-level clock gating, which in turn leads to a sizable increase in gating probability. The additional power saving comes at the cost of increased synthesis time (including both conventional logic synthesis and the proposed gate-level clock gating), e.g., 2.6 times in ac97.

V. CONCLUSION

Gate-level clock gating relieves the designer of the need to provide gating functions. It can be used if no clock gating functions are specified, or if the functions that are provided yield insufficient gating probability. The key to gate-level clock gating is to synthesize the smallest possible gating logic. We have shown that this can be accomplished by factored form matching, in which gating functions are matched to Boolean functions of internal combinational nodes already in the circuit, so that only the portions of the gating functions that are not matched with existing circuitry have to be realized with additional logic gates. Topics that merit future work include the modification of the factoring trees of gating functions so that more matches can be discovered, and the reduction of computation time to improve scalability. Factored form matching may also be generalized so that it can be applied to general logic.

REFERENCES

[1] I. Han and Y. Shin, “Synthesis of clock gating logic through factored form matching,” in Proc. Int. Conf. IC Design Tech., Jun. 2012, pp. 1–4.
[2] D. Chinnery and K. Keutzer, Closing the Power Gap Between ASIC & Custom. Norwell, MA, USA: Kluwer, 2007.
[3] S. Unger, “Double-edge-triggered flip-flops,” IEEE Trans. Comput., vol. 30, no. 6, pp. 447–451, Jun. 1981.
[4] R. Pokala, R. Feretich, and R. McGuffin, “Physical synthesis for performance optimization,” in Proc. Int. ASIC Conf. Exhibit., Sep. 1992, pp. 34–37.
[5] Power Compiler User Guide, Synopsys, Inc., Mountain View, CA, USA, Dec. 2010.
[6] F. Theeuwen and E. Seelen, “Power reduction through clock gating by symbolic manipulation,” in Proc. Symp. Logic Archit. Design, Dec. 1996, pp. 184–191.
[7] S. Kim, I. Han, S. Paik, and Y. Shin, “Pulser gating: A clock gating of pulsed-latch circuits,” in Proc. Asia South Pacific Design Autom. Conf., Jan. 2011, pp. 190–195.
[8] R. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE Trans. Comput., vol. 35, no. 8, pp. 677–691, Aug. 1986.
[9] A. Farrahi, C. Chen, A. Srivastava, G. Téllez, and M. Sarrafzadeh, “Activity-driven clock design,” IEEE Trans. Comput. Aided Design Integr. Circuits Syst., vol. 20, no. 6, pp. 705–714, Jun. 2001.
[10] C. Chen, C. Kang, and M. Sarrafzadeh, “Activity-sensitive clock tree construction for low power,” in Proc. Int. Symp. Low Power Electron. Design, Aug. 2002, pp. 279–282.
[11] S. Wimer and I. Koren, “The optimal fan-out of clock network for power minimization by adaptive gating,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1772–1780, Oct. 2012.
[12] L. Benini, G. De Micheli, E. Macii, M. Poncino, and R. Scarsi, “Symbolic synthesis of clock-gating logic for power optimization of synchronous controllers,” ACM Trans. Design Autom. Electron. Syst., vol. 4, no. 4, pp. 351–375, Oct. 1999.
[13] A. Hurst, “Automatic synthesis of clock gating logic with controlled netlist perturbation,” in Proc. Design Autom. Conf., Jun. 2008, pp. 654–657.
[14] E. Arbel, C. Eisner, and O. Rokhlenko, “Resurrecting infeasible clock-gating functions,” in Proc. Design Autom. Conf., Jul. 2009, pp. 160–165.
[15] C. Lee, “Representation of switching circuits by binary-decision programs,” Bell Syst. Tech. J., vol. 38, no. 4, pp. 985–999, Jul. 1959.
[16] J. Mohnke and S. Malik, “Permutation and phase independent Boolean comparison,” Integr. VLSI J., vol. 16, no. 2, pp. 109–129, Dec. 1993.
[17] F. Mailhot and G. De Micheli, “Technology mapping using Boolean matching and don’t care sets,” in Proc. Eur. Design Autom. Conf., Mar. 1990, pp. 212–216.
[18] F. Mailhot and G. De Micheli, “Algorithms for technology mapping based on binary decision diagrams and on Boolean operations,” IEEE Trans. Comput. Aided Design Integr. Circuits Syst., vol. 12, no. 5, pp. 599–620, May 1993.
[19] J. Burch and D. Long, “Efficient Boolean function matching,” in Proc. Int. Conf. Comput. Aided Design, Nov. 1992, pp. 408–411.
[20] R. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang, “MIS: A multiple-level logic optimization system,” IEEE Trans. Comput. Aided Design Integr. Circuits Syst., vol. 6, no. 6, pp. 1062–1081, Nov. 1987.
[21] G. Hachtel and F. Somenzi, Logic Synthesis and Verification Algorithms. Norwell, MA, USA: Kluwer, 1996.
[22] E. Goldberg and Y. Novikov, “On complexity of equivalence checking,” Cadence Berkeley Labs, Univ. California, Berkeley, CA, USA, Tech. Rep. CDNL-TR-2003-08026, Aug. 2003.
[23] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. Reading, MA, USA: Addison-Wesley, 1974.
[24] OpenCores [Online]. Available: http://www.opencores.org
[25] S. Paik, I. Han, S. Kim, and Y. Shin, “Clock gating synthesis of pulsed-latch circuits,” IEEE Trans. Comput. Aided Design Integr. Circuits Syst., vol. 31, no. 7, pp. 1019–1030, Jul. 2012.
[26] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, “SIS: A system for sequential circuit synthesis,” Dept. Electr. Eng. Comput. Sci., Univ. California, Berkeley, CA, USA, Tech. Rep. UCB/ERL M92/41, May 1992.
[27] Design Compiler User Guide, Synopsys, Inc., Mountain View, CA, USA, Jun. 2010.

Inhak Han received the B.S. and M.S. degrees in electrical engineering from Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2010 and 2012, respectively, and is currently pursuing the Ph.D. degree with the same department. His current research interests include timing and thermal analysis for VLSI circuits and low-power design using clock gating, pulsed latches, and dual edge-triggered flip-flops.

Youngsoo Shin (M’00–SM’05) received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University, Seoul, Korea. He was a Research Associate with the University of Tokyo, Tokyo, Japan, from 2000 to 2001, and from 2001 to 2004, he was a Research Staff Member with the IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. He joined the Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2004, where he is currently a Professor. His current research interests include computer-aided design with emphasis on low-power design and design tools, high-level synthesis, sequential synthesis, and programmable logic. Dr. Shin was a recipient of several awards, including the Best Paper Award at the 2005 International Symposium on Quality Electronic Design and the 2002 IP Excellence Award from Japan. He has been a member of the technical program committees and organizing committees of many technical conferences, including DAC, ICCAD, ISLPED, and ASP-DAC. He is an Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS and the ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS.
