1 Introduction

Francky Catthoor∗, Gerda Janssens†
∗ {kodambal, catthoor}@imec.be   † {maurice, gerda}@cs.kuleuven.ac.be

Code transformations come into play when a programmer wants much better results than what the optimizations of a compiler can provide. Such a situation is common for programmers of mobile computing and communication systems. They are required to program complex signal processing algorithms for complex platform architectures and yet meet stringent constraints on the performance and energy consumption of the final implementation. Research has shown that a methodical application of global source-to-source code transformations on an original implementation of the algorithm can greatly help in meeting such constraints (for example, [1]).

Every stage in an implementation activity brings forth an associated verification problem, and source code transformations are no exception. The problem here is to ensure that the transformed program preserves the functionality of the original program. A pragmatic approach to this problem is to separate the two concerns, viz., applying the transformations and verifying that they preserve functional equivalence. This implies an ex post facto solution that requires a program equivalence checking tool. Our work is focused on this requirement. Since, in general, the program equivalence problem is undecidable, we target the most important transformations that are applied on a decidable class of programs.

Code transformations considered. Our work particularly targets source code transformations that reduce the accesses to the data memory hierarchy. Data transfers between different layers of the hierarchy are minimized if the data accessed during program execution have temporal and spatial locality. This is best achieved by restructuring transformations on the for-loops in the complete program, called global loop transformations. But often, data-flow dependences come in the way of

achieving locality of data. It is then required to explore the possibilities of removing these dependences by certain transformations which reorder the computation, either by introducing or eliminating temporary variables that hold intermediate values (sub-expression propagation) or by taking advantage of the algebraic properties of the operators (algebraic transformations). Such restructuring transformations are called global data-flow transformations.

In our earlier work [2], we presented an intraprocedural method for showing the equivalence of two programs whose corresponding program functions are related through loop and sub-expression propagation transformations. The contribution of the present work is the extension of the method to also allow global algebraic data-flow transformations, together with the other two transformations, in a single pass of the check.

An example problem. Suppose that we are given a pair of program functions as in Figure 1, where one has been obtained by applying loop and data-flow transformations on the other. Both of these functions, when executed, assign the computed values to the elements of the output array variable C[] and terminate. The difference between the two is that the original function assigns values to C[] as

∀k : 0 ≤ k < 256, C[3*k] = A[2*k] + g(B[k+1]) + f(A[2*k+1]) + B[k+2];

whereas the transformed function assigns C[] as

∀k : 0 ≤ k < 256, C[3*k] = g(B[k+1]) + B[k+2] + f(A[2*k+1]) + A[2*k];

Obviously, if we can ignore a possible overflow in the evaluation of the integer expressions, integer addition is both associative and commutative. Hence, if the same values are input to the two program functions, the same values are assigned to the elements of the output variable C[] in both functions. That is, the two program functions are input-output equivalent under the applied transformations. We have developed a method to check this equivalence fully automatically.

2 Program representation

The class of programs that we consider is required to have the following properties:
① Single-assignment form: every memory location is written only once;
② Static control flow: there are no while-loops, and the data-dependent if-conditions are simple enough to be handled by if-conversion;
③ Affine indices: all expressions in the indices and the loop bounds are affine; and
④ No pointer references: all references to memory are made with explicit indexing.
We rely on a code pre-processing phase to ensure that the original program has the above properties before applying code transformations.

/* Original function */
void foo(int A[], int B[], int C[])
{
  int k, tmp1[256], tmp2[266], tmp3[256];
s1: for(k=0; k<256; k++)
      tmp1[k] = A[2*k] + g(B[k+1]);
s2: for(k=10; k<138; k++)
      tmp2[k] = B[k-8];
    for(k=10; k<266; k++){
s3:   if(k >= 138) tmp2[k] = B[k-8];
s4:   tmp3[k-10] = f(A[2*k-19]) + tmp2[k];
    }
s5: for(k=255; k>=0; k--)
      C[3*k] = tmp1[k] + tmp3[k];
}

/* Transformed function */
void foo(int A[], int B[], int C[])
{
  int k, tmp4[256], tmp5[256];
    for(k=0; k<256; k++){
s1:   tmp4[k] = f(A[2*k+1]) + A[2*k];
s2:   tmp5[k] = B[k+2] + tmp4[k];
s3:   C[3*k] = g(B[k+1]) + tmp5[k];
    }
}

Figure 1: Example programs under loop and data-flow transformations.

Given that a program has the above-mentioned properties, we can extract the complete data-flow in the program. This extracted data-flow can be represented in the form of an array data-flow graph (ADFG). It is a directed graph where nodes represent the variables and the occurrences of operators/functions in the program functions, and edges represent the flow dependences (in the direction opposite to the flow of data). For example, Figure 2 gives the ADFGs of the two program functions in Figure 1. The edges outgoing from operator nodes are labeled by the position of the operand in the computation, and the thick edges outgoing from the variables are labeled by the numbers of the statements where they are defined.


Figure 2: The ADFGs of the functions in Figure 1.

The distinction when compared to a standard data-flow graph is that, in an ADFG, the data-flow, denoted by a reverse directed edge, refers not just to a single value, but to a set of values. Since the program is required to be in single-assignment form, the values are guaranteed to be assigned to different elements of the array variable defined in the statement. The dependence relation between the sets of values of the defined variables and the operand variables is given by the so-called dependency mappings, one for each operand variable of the statement. For example, for statement s4 in the original function, the two dependency mappings, viz., from tmp3 to A and from tmp3 to tmp2, are given below:

Mtmp3,A := {[x] → [y] | x = k − 10 ∧ y = 2 ∗ k − 19 ∧ k ∈ D}
Mtmp3,tmp2 := {[x] → [y] | x = k − 10 ∧ y = k ∧ k ∈ D},

where D := {[k] | 10 ≤ k < 266 ∧ k ∈ Z}.

An important note to make here is that, in an ADFG, for a given path from an output variable (a root node) to an input variable (a leaf node), the composition of the sequence of dependency mappings between the variables provides the dependence relation between the output and the input variable (the output-input dependency mapping) for that path of computation. Also note that an ADFG can contain cycles. For the purposes of this brief article, however, we restrict ourselves to cycle-free ADFGs.

3 Algebraic data-flow transformations

Algebraic data-flow transformations take advantage of the properties of the operators or user-defined functions and modify the data-flow. The algebraic transformations are not restricted to the expression in a single statement; they can have a global scope. This can be seen in our simple example in Figure 1, where the algebraic transformations applied are across expressions of multiple statements. The ADFGs of the two functions, as shown in Figure 2, reflect this. Typically, most such transformations rely just on the associativity and/or commutativity properties of operators like addition and multiplication on a fixed-point data type like integer. The effect of such algebraic transformations on the ADFG is shown in Figure 3, where the operator ⊕ is associative, ⊗ is commutative and ~ is both commutative and associative.

Associativity. As shown in Figure 3(a), the operands are regrouped in a different order with respect to the chain of ⊕-nodes (the associative chain) by the associative transformation.

Commutativity. As shown in Figure 3(b), the effect of a commutative transformation is to permute the positions of the operands to the ⊗-node.

Combination of associativity and commutativity. As shown in Figure 3(c), the effect of the transformation on an ADFG containing a sub-tree rooted at a ~-node, with only ~-nodes (and any variables) as internal nodes, is its substitution with any other sub-tree of the same nature.

Figure 3: Algebraic data-flow transformations. (a) Associativity; (b) Commutativity; (c) Combination of associativity and commutativity.

Figure 4: The ADFGs of our example after flattening.

4 Equivalence checking method

The equivalence of the original and the transformed program functions is shown by checking that every pair of corresponding data-flow paths in the ADFGs extracted from them has identical ① computations and ② output-input dependency mappings. This is achieved by a synchronized traversal of the two ADFGs, starting from each of the corresponding root nodes (output variables with the same names). The traversal is done in lock step between the two ADFGs, by matching corresponding operators and updating the composition of the dependency mappings on the paths on both sides. At any given point during the traversal, the paths already traversed on the two ADFGs are both guaranteed to have the same operator-nodes appearing in the same order on them (modulo flattening).

In our earlier method [2], all operators were left uninterpreted and hence progress could not be made beyond a non-matching operator. This prevented equivalence proofs of program pairs under algebraic transformations. For example, in the leftmost path of our example ADFGs, the second operator is an addition operator in the original, whereas it is the operator g() in the transformed ADFG. In order to deal with this mismatch, we have modified the traversal to incorporate normalization operations that establish a specific normal form. The normalization procedure relies on two operations, invoked depending on the property of the operator:
① Flattening for an associative operator: a lookahead traversal of a sub-tree rooted at an associative node in order to collect the end nodes on each of the associative chains. If any intermediate variables are present on the associative chains, they are eliminated.
② Matching for a commutative operator: a lookahead traversal of a sub-tree rooted at a commutative node in order to uniquely match the corresponding outgoing paths based on the node labels and/or the dependency mappings.
When the operator is both associative and commutative, a flattening operation is followed by a matching operation.

Going back to our example, we see that the first operator starting from the output variable is an addition operator. Therefore, the flattening operation is applied at the node. This results in the ADFGs shown in Figure 4. A subsequent matching operation provides the correspondences between the outgoing paths of the operator as {(1, z), (2, w), (3, y), (4, x), (5, x)}. Note that the edges 4 and 5 to the node B on the left ADFG have both been matched to the only corresponding edge (x) to node B on the right. This means that the union of the output-input dependency mappings for the two paths on the left is accounted for by a single path on the right. The matching is straightforward here because the successor nodes are all distinct; otherwise, a lookahead traversal is required to establish a unique match. The synchronized traversal is continued with the matched nodes as the new points of correspondence. On each path, the traversal culminates at the leaf nodes, at which point it is checked whether the same input variable has been reached on the two ADFGs and, if so, whether the output-input dependency mappings computed on the paths to that input variable during the traversal are identical. If this check fails, the traversal stops, reporting a failure. If it succeeds, the traversal continues until all the paths of the ADFGs are exhausted.

5 Remarks

We have discussed only the most common algebraic transformations, as other algebraic properties, related to identity, inverse, distributivity and the evaluation of constants, are less common in practice. We believe that our method can be extended to include these in a straightforward way, or by adding heuristics to obtain appropriate normal forms. An important aspect of our work is that of providing debugging support when the transformed program is not equivalent to the original. Our method is able to provide such support by supplying information about the cause of the failure of the equivalence proof.

References

[1] F. Catthoor et al. Custom Memory Management Methodology – Exploration of Memory Organization for Embedded Multimedia System Design. Kluwer Academic Publishers, 1998.

[2] K. C. Shashidhar, M. Bruynooghe, F. Catthoor, G. Janssens. Automatic Functional Verification of Memory Oriented Global Source Code Transformations. IEEE HLDVT, 2003.