Generating Decompilers

Peter T. Breuer
Área de Ingeniería Telemática, Departamento de Ingeniería, Universidad Carlos III de Madrid, Butarque 15, E-28911 Leganés, Spain
Email: [email protected] URL: http://www.it.uc3m.es/~ptb/

Jonathan P. Bowen
The University of Reading, Department of Computer Science, Whiteknights, PO Box 225, Reading, Berks RG6 6AY, UK
Email: [email protected] URL: http://www.cs.reading.ac.uk/people/jpb/

October 1998

Abstract Compiler compilers are in widespread use, but decompiler compilers are a more novel concept. This paper presents an approach for the decompilation of object code back to source code using a decompiler generator. An example decompilation is presented. Potential applications include reverse engineering, quality assessment, debugging and safety-critical code validation or verification.

1 Introduction

A decompiler is a software application which turns a piece of object code or assembler back into the source code that it was generated from. It is of utility in the validation or verification of code in safety-critical applications [7]. In this setting, compilers may be considered inherently untrustworthy and it is important that low-level object code should be checked, because this is the code that actually directs the operation of the processor [20]. Using a validated decompiler to automate the production of readable high-level code, which then can be independently validated, offers a route to greatly increased confidence in the correctness of critical application code. In the industrial environment in particular, the hard evidence provided by a decompiler may be perceived as more convincing than the assurances that come with a proved compiler. The nuclear industry is considering such an approach to check the validity of the object code produced by an unvalidated compiler [40, 51], although they find that, in the current state of the art, some human intervention is necessary in practice. NASA have also used decompilation techniques on software for the space shuttle [43]. Safety-related standards provide guidance on the use of this approach for safety-critical software (e.g., see the section on reverse translation in [42]) and these are likely to give increasing motivation for the use of such techniques in safety-critical systems in future.

In this paper, we set out an implementation for a decompiler compiler, an application that can generate a guaranteed decompiler from a formal specification of the relationship between source code and compiled object code, and provide an example of its use in the decompilation of a piece of assembler generated by an Occam [28] compiler described in [24]. The assembler code for the example is shown in Figure 1; at 148 instructions in length, it is not a trivial example. It was compiled from the source code shown in Figure 2, and the decompiled output is represented by the courier font portions of the figure.

The first complete statement of an underlying mathematical theory of decompilation occurs in our recent article on abstract interpretation of grammars [9]. So far as we have been able to determine, this is the first occasion on which a supporting technology has been put forward, indeed, the first time that

0:ldlp(4). 1:ldc(1). 2:ldc(4). 3:in. 4:j(137). nop. 6:ldl(4). 7:stl(3). 8:ldlp(4). 9:ldc(1). 10:ldc(4). 11:in. 12:ldl(4). 13:stl(5). 14:ldc(0). 15:ldl(5). 16:rev. 17:gt. 18:stl(5). 19:ldl(3). 20:stl(6). 21:ldc(0). 22:ldl(6). 23:diff. 24:eqc(0).

25:ldl(5). 26:and. nop. 28:cj(9). 29:ldc(-2). nop. 31:stl(0). 32:ldlp(0). 33:ldc(2). 34:ldc(4). 35:out. 36:j(105). nop. 38:ldl(4). 39:stl(5). 40:ldc(0). 41:ldl(5). 42:rev. 43:gt. 44:stl(5). 45:ldl(3). 46:stl(6). 47:ldc(0). 48:ldl(6). 49:diff.

50:eqc(0). 51:eqc(0). 52:ldl(5). 53:and. nop. 55:cj(9). 56:ldc(-1). nop. 58:stl(0). 59:ldlp(0). 60:ldc(2). 61:ldc(4). 62:out. 63:j(78). nop. 65:ldl(4). 66:stl(5). 67:ldc(0). 68:ldl(5). 69:diff. 70:eqc(0). 71:stl(5). 72:ldl(3). 73:stl(6). 74:ldc(0).

75:ldl(6). 76:diff. 77:eqc(0). 78:ldl(5). 79:and. nop. 81:cj(8). 82:ldc(0). 83:stl(0). 84:ldlp(0). 85:ldc(2). 86:ldc(4). 87:out. 88:j(53). nop. 90:ldl(4). 91:stl(5). 92:ldc(0). 93:ldl(5). 94:gt. 95:stl(5). 96:ldl(3). 97:stl(6). 98:ldc(0). 99:ldl(6).

100:diff. 101:eqc(0). 102:eqc(0). 103:ldl(5). 104:and. nop. 106:cj(8). 107:ldc(1). 108:stl(0). 109:ldlp(0). 110:ldc(2). 111:ldc(4). 112:out. 113:j(28). nop. 115:ldl(4). 116:stl(5). 117:ldc(0). 118:ldl(5). 119:gt. 120:stl(5). 121:ldl(3). 122:stl(6). 123:ldc(0). 124:ldl(6).

125:diff. 126:eqc(0). 127:ldl(5). 128:and. nop. 130:cj(7). 131:ldc(2). 132:stl(0). 133:ldlp(0). 134:ldc(2). 135:ldc(4). 136:out. 137:j(5). 138:ldc(1). 139:cj(1). 140:j(2). 141:stopp. 143:ldc(1). 144:eqc(0). 145:cj(-141). nop. nop.

Figure 1: Assembler code compiled from the Occam source code in Figure 2. Instructions which will occupy more than one byte as machine code are here shown padded with following nops so that the code can be accurately represented as a list in which the position determines the physical address.


/* Autopilot example */
/* Number of RAM locations */
# 16.
int sprev:
int scurr:
seq[ input?scurr,
     while(true,
       seq[ sprev := scurr,
            input?scurr,
            if[ (scurr > 0) /\ (sprev = 0)  -> output!-2,
                (scurr > 0) /\ (sprev <>0)  -> output!-1,
                (scurr = 0) /\ (sprev = 0)  -> output!0,
                (scurr < 0) /\ (sprev <>0)  -> output!1,
                (scurr < 0) /\ (sprev = 0)  -> output!2,
                true -> skip
              ]
          ]
     )
   ].

Figure 2: Occam source code for the assembler shown in Figure 1. The portion shown in italics is not recovered by decompilation; comments, declarations and variable names are lost. The portion shown in courier font is the exact output from a decompiler applied to the assembler code. In addition, the names ‘sprev’ and ‘scurr’ become ‘x3’ and ‘x4’ respectively in the decompiler output.


the idea of a decompiler compiler has appeared in the literature. The implementation technique presented here itself has a rigorous formal foundation, and the reason why it is a correct technique will be sketched. Regarding practical aspects, it is likely that, in the case of optimizing compilers at least, too much information is lost in compilation to make a fully automated approach like this feasible [2]. But in safety-critical systems it is usual to generate non-optimized code, and thus it is in the latter area that the work presented here may prove most useful.

1.1 Background

Automated support for software maintenance is important in making the process tractable [1] and reverse engineering is often cited as a significant component of the effort involved [23]. Reverse engineering certainly encompasses decompilation, but, in contrast, it is usually an interactive activity because the process of abstracting away from the lower-level representation (e.g., program source code) to a higher-level specification is typically uncomputable [12]. At the level of abstraction found in the typical decompilation relation between machine object code and program source code, however, the reverse engineering process can usually be automated, albeit with less efficiency than in the normal compiler direction, and this is what we mean by decompilation. Note that decompilation is not intrinsically a computable process and reverse engineering an uncomputable one; both may be uncomputable if the human or machine process which compiled the code is sufficiently recalcitrant. But machine compilers typically have attributes which make automated decompilation likely to succeed (i.e., terminate with a result), and the theoretical work in [9] shows that if a source code is obtained from a decompiler constructed in the appropriate way, then it is a correct one. On the other side of the coin, however, it is true that in general there are an infinite number of possible source codes that could correspond to a given piece of object code, because much information is lost during compilation. It would be impossible to generate them all, but, in practice, most are variations on a single theme, and one representation suffices. It need not be syntactically the same as the original source code provided that the semantics is recognizably the same. The names of variables, for example, have to be guessed at, but a generic name will suffice at a first pass. Such considerations help make the practical use of decompilers an intrinsically feasible objective.

While the term “decompiler” seems fairly well accepted in general [26], to date there has been very little published literature on the subject. Katz and Wong report on decompiling a database data manipulation language into relational queries [31]. Buettner has investigated the efficient decompilation of code generated by compiling clauses of the logic programming language Prolog [13]. Lichtblau has considered how to decompile program control structures by using transformations on graphs [34], including investigation of the efficiency of such techniques [35]. May presents a very simple ‘toy’ decompiler in [37] for a more general readership. Tool-based interactive decompilation of low-level code to a specification for reverse engineering in the software maintenance process using formal program transformations has been considered by Ward [48, 49]. More recently there has been interest in the decompilation of Java Bytecode [41]. For a history of decompilation and recent developments, see [17].

The legal and ethical aspects of decompilation are of increasing concern; legislation specifically covering this area is being introduced in many countries. For example, in the UK, section 50B of The Copyright (Computer Programs) Regulations 1992 [33] specifically forbids a lawful user of a copy of a program expressed in a low-level language to perform decompilation of the program if he/she “uses the information to create a program which is substantially similar in its expression to the program decompiled”. However, there are many issues to consider [14], including the encouragement of open systems. The European Community directive EC91/250/EEC is trying to steer a middle course, but this has slowed its implementation [15]. Meanwhile the US law courts are often considering the legality of reverse engineering, sometimes judging it to be “fair use” of copyrighted code if this is the only way to access parts of the code [25].

The connection between high-level programs and matching low-level object code is in general relational rather than functional and this has sometimes led to its natural formulation in a logic programming language such as Prolog (e.g. [2, 4]). Such programs may be run both as a compiler and as a decompiler with only minimal alterations. Such changes as may be required are for efficiency reasons, to ensure termination, and to account for those arithmetic operations which are irreversible in standard Prolog. Constraint logic programming [36] can be used to make the number of changes required even fewer [2]. It seems entirely feasible that, with the new technique of implementing higher-order functional languages via narrowing [22], a compiler could be written in a functional language such as Babel [39] and then, without any changes at all, interrogated about which source codes give rise to given object codes. In industrial use, however, these approaches may be deemed too inefficient or too speculative, and imperative programming may prove a necessary component in an acceptable technology. To make such a component, we draw upon the idea of a compiler compiler [11, 10, 30], and introduce the novel concept of a decompiler compiler in this paper. Analogously to a compiler compiler, such a tool eases the production of efficient decompilers.

1.2 An approach to decompilation

A compiler is generally specified in a manner that gives rise to an implementation automatically. One traditional route is via the well-known UNIX parser generator yacc [30], for example, and a yacc specification consists of a description of the source language decorated with C programming language [32] commands that will build the appropriate object code when an application program script is parsed. The yacc utility converts the specification to C code for an automaton-based parser. The route we suggest for building a decompiler is analogous. The decompiler is described in a format that can be converted automatically to working code, for example, C. The decompiler description format consists of a source language specification decorated with corresponding portions of object code. We present the decompiler description language here in a language-independent manner (in Section 2). A real example of a decompiler is presented in Section 3.

The following sections of this paper detail the decompiler compiler technique for the decompilation of low-level object code back to high-level source code. That is, Section 4 gives a stand-alone semantics for the decompiler description language, and Section 5 gives a two-stage implementation technique which does not depend on any particular implementation language, operating system or hardware. The first stage generates code which calls on four library functions. The second stage consists of defining the library functions. For any given supporting language and environment (say, C and UNIX), the library must be coded in a way that ensures the intended functionality is achieved, but that is the limit of the dependencies here.
The method is an extension of other ideas of reverse engineering conceived by the authors on the ESPRIT II REDO project [5, 46], a European collaboration which addressed some of the problems of software maintenance, and reverse engineering in particular [6].¹ Section 5.1 also demonstrates the correctness of the method with respect to the intended semantics. A real decompiler generator must eventually generate executable code, and a concrete implementation for the C programming language is described in the next sections. This language is very popular and has been found to be useful for software engineering applications [27]. Section 5 introduces the small link library containing the type ACTIVELIST, with four constructor functions, and Section 7 describes the generation of decompiler code in C. The implementation given essentially emulates lazy lists, a concept borrowed from the functional programming paradigm, in which entries are not evaluated until needed [45]. Code for a lazy functional programming language is easier to produce, and would operate satisfactorily, but would probably be regarded as a less convincing platform for a utility of this nature in an industrial environment.

¹ See the URL http://www.comlab.ox.ac.uk/archive/redo.html for on-line information concerning the REDO project, including decompilation issues.
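The notion of a lazy list, in which entries are not evaluated until needed, may be illustrated by a small sketch. It is written in Python rather than C purely for brevity and is not the ACTIVELIST library of this paper: the names skip_variants and evaluated are hypothetical, and a Python generator only loosely stands in for a lazy list of candidate source codes.

```python
from itertools import islice

evaluated = []  # records which entries were actually computed (illustration only)

def skip_variants():
    """Lazily yield an unbounded list of source codes: skip, seq[skip, skip], ..."""
    n = 1
    while True:
        evaluated.append(n)  # side effect makes the order of evaluation visible
        yield "skip" if n == 1 else "seq[" + ", ".join(["skip"] * n) + "]"
        n += 1

# Only the three entries requested are ever computed, although the list is unbounded.
first_three = list(islice(skip_variants(), 3))
```

A consumer of such a list pays only for the entries it inspects, which is what allows a decompiler to offer its outputs individually, upon request.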


2 The decompiler description language DeCoDe

Consider the compilation relationship between source code and object code. For example, the following states informally that ‘skip’, as used in the language Occam, is one of the possible source codes (in fact, the only one, but that is impossible to deduce without the complete description of the compiler relation) that may give rise to object code of the shape ‘beginning with a jump over all remaining instructions’:

    program([j(N), Ins2, ..., InsN])  ::=  skip
                                           ...

This is a ‘decorated’ Backus-Naur Form (BNF) [30] syntax description. The decorations have been printed in typewriter font and the rest is bare BNF. A decompiler compiler utility should convert a description like the above to a decompiler with a functionality that generates a set of possible source codes given an object code as input:

\[ \mathit{program}([\,j(N), \mathit{Ins}_2, \ldots, \mathit{Ins}_N\,]) = \{\mathit{skip}\} \cup \ldots \]

where the curly brackets denote sets, and the ‘∪’ symbol denotes a union of sets. The square brackets denote lists, here with comma-separated elements. If the input object code has the shape [j(N), Ins2, ..., InsN], then skip will be somewhere amongst the set of source codes output. We expect to generate from a description of the compilation relationship between source and object code a working decompiler that responds to an object code pattern by outputting (individually, upon request) precisely all the source codes that could have given rise to it through compilation. In the present example, the decompiler is called program and it has the type of a map from object codes to a set of source codes:

\[ \mathit{program} :: \mathit{object\_code} \to \mathit{set\_of}(\mathit{source\_code}) \]

although this is a slight abstraction, because an implementation will produce a list rather than a set. Implicit or explicit ordering of the output is difficult to avoid. However, we are not interested in the output order (in the present paper). It may be useful in practice to order the outputs with the least complex first, but we do not know if that is the order which best reflects the likelihood of the source code having been the real progenitor of the object code given. Moreover, theoretical considerations suggest that attempts at imposing an extrinsic ordering fall foul of the Halting Problem [9], so sets are an appropriate typing abstraction.

Note that the number of source codes output is usually infinite for trivial reasons: source code variable names go unrecorded in the object code, and therefore, strictly speaking, all possible naming variants should appear in the reconstructed source code; SKIP instructions may compile to the empty object code, so all possible variants with extra SKIP instructions should be listed, and so on. However, it may be sufficient in practice to present a unique example from each significant class of variations, and such a strategy can reduce the distinct outputs to a finite number. Elimination of variations is no more difficult than elimination of exact repetitions. However, again, the theory in [9] suggests that a posteriori attempts at reducing the total number of variations lead to problems in achieving good termination characteristics. The best practical advice we have is to write each decompiler specification so as to avoid the introduction of any trivial variants, perhaps naming variables v1, v2, etc., according to their order of first appearance in the source code, and assuming a uniform style of expression bracketing, for example.

We shall require the precise format of description given in Figure 3, and this is the language that we have called DeCoDe (for Decompiler Compiler Description). The figure uses a commonly used semiformal format for language definitions in which alternative phrases are separated by a ‘|’ and the pattern of the allowed syntax appears on the right of each ‘::=’ sign. According to the figure, DeCoDe consists of

    Ddesc  ::=  dform1 ::= dex1 .
                dform2 ::= dex2 .
                ...

    Dform  ::=  did( a_id1, a_id2, ... )

    Dex    ::=  dalt1 ; dalt2 ; ...

    Dalt   ::=  dseq1, dseq2, ... @ a_ex

    Dseq   ::=  { a_ex }
             |  a_id ← did( a_ex1, a_ex2, ... )

    where did ∈ Did, a_id ∈ A_id, a_ex ∈ A_ex

Figure 3: A description of the decompiler description language DeCoDe. The terminals are Did, A_id and A_ex.

a sequence of definitions of the form f ::= e., where the LHS f is an identifier i with variables v as arguments, and the RHS e is a possibly compound expression. Expressions consist of one or more alternates, separated by semi-colons. Each alternate is a comma-separated sequence of sequents, terminating in an ‘@’ sign and an attribute language expression x. Each sequent is either a condition { c }, where c is an attribute language boolean expression, or a generator, of the form v ← i(x1, x2, ...), where the xi are more attribute language expressions. DeCoDe is technically an attribute grammar description language [29], in which the attribute data-types come from a low-level language with identifiers A_id and expressions A_ex. For the utility that we have actually constructed, this language is C, but it could be another language. The fragment of description considered above appears as follows in DeCoDe:

    program(O)  ::=  { condition1(O) } @ skip ;
                     ...

where condition1 is an attribute language predicate which checks that the attribute O (for Object code) has the structure of a list of object code instructions of some length N, in which the head instruction is a j(N). Skip is the attribute language expression which constructs the parse tree or string corresponding to the Occam skip statement. As illustrated above, terminal symbols like skip in the BNF description of the source codes correspond to singleton outputs SKIP from the decompiler. But multiple clauses in the BNF induce multiple outputs, and non-terminals correspond to compound set or list comprehensions [47] with several generators. To illustrate these correspondences, we look at a slightly more complicated example. A sequence of source code instructions is related to a sequence of object code instructions through compilation, as described by the following informal fragment:

    program(O1 ++ O2)  ::=  ...
                            program(O1) ⨟ program(O2)
                            ...

where ‘⨟’ constructs a sequence of statements from two sequence fragments.

Interpreted as a decompiler, the function produced has to match the object code pattern in every possible way, and generate a list of corresponding source code patterns. The output can be described as a set comprehension:

\[ \mathit{program}(O_1 \mathbin{+\!\!+} O_2) = \ldots \cup \{\, s_1 ⨟ s_2 \mid s_1 \in \mathit{program}(O_1),\ s_2 \in \mathit{program}(O_2) \,\} \cup \ldots \]

The comprehension should be read as “the set of s1 ⨟ s2 such that s1 is a member of the set program(O1) and s2 is a member of the set program(O2)”. But this style of non-unique pattern-matching is rarely available to the programmer, even in modern functional programming languages. In the simple language of DeCoDe, it may be rendered as follows:

    program(O)  ::=  ... ;
                     O1 ← init(O), O2 ← diff(O, O1), S1 ← program(O1), S2 ← program(O2) @ seq2(S1, S2) ;
                     ...

where seq2 is the attribute language function which constructs the parse tree or string corresponding to an Occam sequence of two programs. Note that source code constructs can be attributes too, as well as object code. The object code attributes are inherited, and the source code constructs are synthesised, in the usual terminology of attribute grammars. The subsidiary functions init and diff will be described below. The construction illustrates the use of the ‘x ← Y’ notation in DeCoDe to introduce a new code called x from a given list of codes Y. Diff(foo, bar) is the singleton containing foo \ bar, the list-difference between foo and bar. For the purposes of its use here, it may always be supposed that the second parameter, bar, will represent an initial segment of the first parameter, foo. Both parameters are, in practice, attribute structures representing sequences of object codes. A suitable definition of diff in DeCoDe is:

    diff(Long, Short)  ::=  @ listdiff(Long, Short).

where listdiff is the attribute language function which traces along the longer list structure until it reaches the end of the shorter, and then returns the remainder. The function init (foo) delivers all the initial segments of the sequence foo. We will use init as an example throughout the rest of this paper. It is convenient to define init(List) thus in DeCoDe:

    init(List)  ::=  { nonempty(List) }, Tail ← init(tl(List)) @ cons(hd(List), Tail) ;
                     @ emptylist.                                                          (1)

where nonempty is an attribute language predicate that checks that the List parameter is an attribute structure representing a nonempty sequence (of object code instructions). The attribute language functions hd and tl respectively return the instruction at the head of an attribute structure representing a sequence of instructions, and the structure representing the tail, and the attribute language function cons builds a longer sequence structure by prepending a new member to the front of an existing sequence. Emptylist is the attribute structure representing an empty sequence of instructions. (Note that ‘;’ separates alternative productions.) The DeCoDe description corresponds to the following set comprehension in a functional language:

\[ \mathit{init}\ \mathit{List} = \{\, \mathit{hd}\ \mathit{List} : \mathit{Tail} \mid \mathit{List} \neq [\,];\ \mathit{Tail} \leftarrow \mathit{init}\,(\mathit{tl}\ \mathit{List}) \,\} \cup \{\, [\,] \,\} \tag{2} \]

and then, for example:

\[ \mathit{init}\ [1, 2, 3] = \{\, [1, 2, 3],\ [1, 2],\ [1],\ [\,] \,\} \]
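The definitions of init and diff, and their joint use to split an object code in every possible way, can be sketched in a few lines. The sketch below is an illustration in Python, not the generated C code of the paper; Python lists stand in for attribute structures, and the helper name splits is hypothetical.

```python
def init(lst):
    """All initial segments of lst, longest first, following definition (1)."""
    if lst:  # { nonempty(List) }, Tail <- init(tl(List)) @ cons(hd(List), Tail)
        for tail in init(lst[1:]):
            yield [lst[0]] + tail
    yield []  # @ emptylist

def listdiff(long, short):
    """Remainder of long after its initial segment short (short is assumed a prefix)."""
    return long[len(short):]

def splits(obj):
    """All pairs (O1, O2) with O1 ++ O2 == obj: O1 <- init(O), O2 <- diff(O, O1)."""
    return [(o1, listdiff(obj, o1)) for o1 in init(obj)]

segments = list(init([1, 2, 3]))
pairs = splits(["a", "b"])
```

Here segments reproduces the four initial segments of the example above, and every pair delivered by splits concatenates back to the original list, which is the property that the sequence clause of program relies upon.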

This concludes the introduction to DeCoDe. In the next section we apply the language to the description of a real compiler relation.

3 An example of a real decompiler

The DeCoDe description of a decompiler in Figure 4 contains description fragments already discussed in Section 2. It is a decompiler based on the compiler specification in [24], which is proven correct for the subset of Occam considered. A formal description of the subset is given in Figure 5. The full language allows blocks (seq statements) which may be empty or singleton, and the subset allows only sequences of at least two codes, which may not themselves be blocks. This makes sense here because the grouping of blocks of blocks has no semantic effect. Any piece of compiled code, whether it came from code with blocks in or not, can be decompiled to equivalent source code without blocks in. So, in order to normalize the appearance of source code as much as possible, we may as well outlaw multi-level blocks. Moreover, the subset also disallows sequences containing skip codes. The rationale is again that source code with skips in is equivalent to code without, so we may as well standardize by outlawing them. That way, decompilation will not produce many trivial variants. Compiled code which comes from source with skips in will be decompiled to source code without them. Exceptionally, there is one situation in which a skip is valid, and that is when it is the only line of code in a program. Accordingly, the single line skip program may itself form a complete script, or the body of a while loop or one of the alternates in an if statement. This particular subset also has no variable declarations! We will assume that all variables are declared at the head of an Occam script. The only type allowed for them is integer.
It is true that variable declarations in the full language serve to generate layered scopes, but any script is equivalent to one in which the variable references have been disambiguated by renaming, and, from the compiled code, the scoping is no longer visible, so, again for the purpose of readily obtaining a normalized form, we dictate that no declarations appear in the interior of a script. Finally, the subset also disallows empty if statements. These are semantically equivalent to stop statements, and the latter must be used instead.

The decompiler description in Figure 4 encodes more of the source language to object code relationships than have been described so far, and these will be sketched below. Note first that the generic constructor singleton(x) is introduced in the figure as a means of declaring a local variable. Instead of writing

    let O1 = srcpart(O) in ...

the variable O1 is set to range over the singleton list of srcpart(O) instead. It would be possible for DeCoDe syntactic sugar like the let form above to be built in, but for simplicity here the only allowable DeCoDe syntax is

    program(O) ::=
        P ← program0(O) @ P ;
        P ← program1(O) @ P ;
        P ← program2(O) @ P .

    program0(O) ::=
        {condition1(O)} @ skip ;
        {condition2(O)} @ skip .

    program1(O) ::=
        {condition3(O)}, V ← singleton(objvar(O)), O1 ← singleton(srcpart(O)), E ← expr(O1) @ assign(V,E) ;
        {condition4(O)}, O1 ← singleton(outpart(O)), E ← expr(O1) @ output(E) ;
        {condition5(O)} @ stop ;
        {length(O)>0}, Gs ← guards(O) @ if(Gs) ;
        {condition6(O)}, O1 ← singleton(looppart(O)), O2 ← singleton(testpart(O)), S ← program(O1), E ← expr(O2) @ while(E,S) ;
        {condition7(O)}, V ← singleton(inpart(O)) @ input(V) .

    program2(O) ::=
        Ss ← programs(O), {length(Ss)>1} @ seq(Ss) .

    programs(O) ::=
        {nonempty(O)}, O1 ← init(O), O2 ← diff(O,O1), S ← program1(O1), Ss ← programs(O2) @ cons(S,Ss) ;
        {empty(O)} @ emptylist .

    guards(O) ::=
        {condition5(O)} @ emptylist ;
        O1 ← init(O), O2 ← diff(O,O1), S ← guard(O1), Ss ← guards(O2) @ cons(S,Ss) .

    guard(O) ::=
        {condition8(O)}, O1 ← singleton(boolpart(O)), O2 ← singleton(codepart(O)), B ← bool(O1), S ← program(O2) @ case(B,S) .

    bool(O) ::=
        {condition20(O)} @ true ;
        {condition21(O)} @ false ;
        {condition22(O)}, O1 ← singleton(pospart(O)), B ← bool(O1) @ not(B) ;
        {condition23(O)}, O1 ← singleton(leqpart(O)), O2 ← singleton(reqpart(O)), E1 ← expr(O1), E2 ← expr(O2) @ eq(E1,E2) ;
        {condition24(O)}, O1 ← singleton(lltpart(O)), O2 ← singleton(rltpart(O)), E1 ← expr(O1), E2 ← expr(O2) @ lt(E1,E2) ;
        ... .

    expr(O) ::=
        {condition30(O)}, V ← singleton(varpart(O)) @ var(V) ;
        {condition31(O)}, K ← singleton(intpart(O)) @ const(K) ;
        {condition32(O)}, O1 ← singleton(laddpart(O)), O2 ← singleton(raddpart(O)), E1 ← expr(O1), E2 ← expr(O2) @ plus(E1,E2) ;
        ... .

Figure 4: The decompiler description for Occam.

    program  ::=  seq [ program , program , ... ]
               |  input var
               |  output expr
               |  while ( bool , program )
               |  var := expr
               |  if [ guard , ... ]
               |  channel ? var
               |  channel ! expr
               |  stop
               |  skip

    guard    ::=  bool -> program

    expr     ::=  var
               |  integer
               |  expr + expr
                  ...

    bool     ::=  true
               |  false
               |  ~ bool
               |  bool /\ bool
                  ...
               |  expr = expr
               |  expr < expr
                  ...

Figure 5: A subset of the programming language Occam. In this subset a seq block must contain at least two other codes, and they may not themselves be seq blocks or skips. If statements may not be empty (which would be equivalent to a stop) and arithmetic constants and variables are restricted to integer values.

    O1 ← singleton(srcpart(O)), ...

and the generic singleton list is defined in DeCoDe as follows:

    singleton(x)  ::=  @ x.

The description of the decompiler includes several attribute language level condition predicates, and several deconstructor functions. The latter pick out parts of the object code attribute which the condition predicate identifies as being present, as shown in Table 1. For example, the objvar function picks out the variable address V in an attribute structure representing an object code sequence in which the last instruction is stl(V), identified by the predicate application condition3(O).
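The interplay between a condition predicate, its deconstructors and the singleton constructor can be sketched as follows for the assignment clause of program1 in Figure 4. This is a Python illustration only, not the C attribute code of the actual utility. The representation of instructions as (opcode, operand) pairs is an assumption, as is the toy expr function, which recognises only a lone ldl(V) or ldc(K); the paper does not spell out condition30 and condition31 in this section.

```python
def condition3(o):
    """True if o is a nonempty instruction list whose last instruction is stl(V)."""
    return len(o) > 0 and o[-1][0] == "stl"

def objvar(o):
    """The address V of the final stl(V); meaningful only when condition3(o) holds."""
    return o[-1][1]

def srcpart(o):
    """The instructions O1 that precede the final stl(V)."""
    return o[:-1]

def singleton(x):
    """singleton(x) ::= @ x. : a one-element list, used like a local 'let'."""
    return [x]

def expr(o):
    """Toy stand-in for the expr decompiler (assumption: one-instruction operands only)."""
    if len(o) == 1 and o[0][0] == "ldl":
        return [("var", o[0][1])]
    if len(o) == 1 and o[0][0] == "ldc":
        return [("const", o[0][1])]
    return []

def assign_clause(o):
    """First alternate of program1 in Figure 4, read as a list comprehension."""
    return [("assign", v, e)
            for _ in ([None] if condition3(o) else [])  # the condition sequent
            for v in singleton(objvar(o))
            for o1 in singleton(srcpart(o))
            for e in expr(o1)]

code = [("ldl", 4), ("stl", 3)]  # instructions 6 and 7 of Figure 1
result = assign_clause(code)
```

Applied to instructions 6 and 7 of Figure 1, the clause yields a single assignment of variable 4 to variable 3, matching ‘sprev := scurr’ in Figure 2 under the renaming to x3 and x4. The deconstructors are evaluated only inside the guarded part of the comprehension, so they are never applied to code that fails the predicate.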

4 An implementation independent semantics for DeCoDe

We give a set-valued semantics to DeCoDe, shown formally in Figure 6. This is independent of any implementation in a particular programming language. The idea is to interpret the clauses of DeCoDe as set comprehensions, and clauses with alternates as set unions. The semantics of the DeCoDe definition of init is the binding of the identifier init to a function which produces a set of initial sublists as output given the List attribute as input:

\[ \mathit{init}(\mathit{List}) = \{\, \mathit{hd}(\mathit{List}) : \mathit{Tail} \mid \mathit{Tail} \in \mathit{init}(\mathit{tl}(\mathit{List})) \,\} \cup \{\, \mathit{emptyList} \,\} \tag{3} \]
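The reading of clauses as set comprehensions and of alternates as unions can be made concrete with a small evaluator. The sketch below is in Python and is only an illustration of the idea, not the implementation technique of Section 5: the encoding of sequents as tagged tuples, the environment dictionary and the names alt and alts are all assumptions. Tuples are used for sequences so that results can be collected in sets, and init follows definition (1), including its nonempty condition.

```python
def alt(sequents, result):
    """Meaning of one alternate 'seq1, ..., seqn @ x' as a function env -> set."""
    def run(env, i=0):
        if i == len(sequents):           # no sequents left: the singleton { value of x }
            return {result(env)}
        kind, *rest = sequents[i]
        if kind == "cond":               # condition { c }: continue only if c holds
            (c,) = rest
            return run(env, i + 1) if c(env) else set()
        name, gen = rest                 # generator: union over the members generated
        out = set()
        for value in gen(env):
            out |= run({**env, name: value}, i + 1)
        return out
    return run

def alts(*alternates):
    """Alternates separated by ';' denote the union of their meanings."""
    return lambda env: set().union(*(a(env) for a in alternates))

def init(lst):
    body = alts(
        alt([("cond", lambda e: len(e["List"]) > 0),
             ("gen", "Tail", lambda e: init(e["List"][1:]))],
            lambda e: (e["List"][0],) + e["Tail"]),
        alt([], lambda e: ()),           # @ emptylist
    )
    return body({"List": lst})

result = init((1, 2, 3))
```

Each generator extends the environment with one binding per member of the generated set, so nested generators behave exactly like nested set comprehensions.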

The figure formally defines the semantics via fixed point operators in the domain of set-valued total functions, but this just means that equations involving sets are solved in the standard way, and the smallest solution is taken if many are possible. Formally, init is the least fixed point of the right hand side of the equation (3) viewed as a transformation:

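    condition
    number     object code pattern                                                  named parts
    ---------  -------------------------------------------------------------------  -------------------------------------------------
    1          [ ]
    2          [j(N), ins_1, ..., ins_{N-1}]
    3          [ins_1, ..., ins_n, stl(V)]                                          O1 = ins_1, ..., ins_n
    4          [ins_1, ..., ins_n, Stl(0), Ldlp(0), Ldc(2), Ldc(4), Out]            O1 = ins_1, ..., ins_n
    5          [Stopp, Nop]
    6          [j(M), Nop, ins_1, ..., ins_M, ins'_1, ..., ins'_N, cj(-M-N-2), Nop]  O1 = ins_1, ..., ins_M;  O2 = ins'_1, ..., ins'_N
    7          [Ldlp(x), Ldc(1), Ldc(4), In]
    8          [ins_1, ..., ins_N, cj(M+3), nop, ins'_1, ..., ins'_M, j(N), nop]    O1 = ins_1, ..., ins_N;  O2 = ins'_1, ..., ins'_M

Table 1: Object code patterns detected by predicates condition1–condition8.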

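\[
\begin{array}{lcl}
ZF[\;] & :: & \mathit{Ddesc} \to \mathit{binding} \\
ZF[\;] & :: & \mathit{Dex} \to \mathit{set\_of}(\mathit{A\_val}) \\
ZF[\;] & :: & \mathit{Dalt} \to \mathit{set\_of}(\mathit{A\_val}) \\
A[\;]  & :: & \mathit{A\_ex} \to \mathit{A\_val} \\[1ex]
ZF[\, f_1 ::= e_1, \ldots, f_n ::= e_n \,] & = & f := \mu f.\ (ZF[\, e_i \,])_{i = 1, \ldots, n} \\
ZF[\;] & = & \{\,\} \\
ZF[\, a ; \mathit{as} \,] & = & ZF[\, a \,] \cup ZF[\, \mathit{as} \,] \\
ZF[\, i \leftarrow l,\ \mathit{ss}\ @\ x \,] & = & \bigcup \{\, ZF[\, \mathit{ss}\ @\ x \,] \mid i \in l \,\} \\
ZF[\, \{ c \},\ \mathit{ss}\ @\ x \,] & = & \mathbf{if}\ A[\, c \,]\ \mathbf{then}\ ZF[\, \mathit{ss}\ @\ x \,]\ \mathbf{else}\ \{\,\} \\
ZF[\, @\ x \,] & = & \{\, A[\, x \,] \,\}
\end{array}
\]

Figure 6: A formal set-theoretic semantics for the decompiler description language DeCoDe.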


    Sdesc  ::=  f1 = e1  f2 = e2  ...        f1,f2,... ∈ Sform, e1,e2,... ∈ Sex

    Sform  ::=  i( v1,v2,... )               i ∈ Id, v1,v2,... ∈ A_id

    Sex    ::=  a ++ as  |  [ ]              a ∈ Salt, as ∈ Sex

    Salt   ::=  s  p. a  |  s                s ∈ Sseq, p ∈ Spat, a ∈ Salt

    Spat   ::=  v  |  k                      v ∈ A_id, k ∈ A_const

    Sseq   ::=  x  |  i( x1,x2,... )         x ∈ A_ex; i ∈ Id, x1,x2,... ∈ A_ex

Figure 7: A restricted language of set combinators: Set.

\[ \mathit{init} = \mu f.\ T[f] \]

where

\[ T[f] = \lambda \mathit{List}.\ \{\, \mathit{hd}(\mathit{List}) : \mathit{Tail} \mid \mathit{Tail} \in f(\mathit{tl}(\mathit{List})) \,\} \cup \{\, \mathit{emptyList} \,\} \]

and the $\mu$ notation means that f = init both satisfies the equation f = T[f] and that init is the least total set-valued function f to do so, ordering the possible f by increasing values of the set-valued result f(List) at each point List (in this ordering, f(List) = { } for all List is bottom).

In general, the semantics binds all the identifiers defined in the script simultaneously. So mutual recursion is allowed between different definitions in DeCoDe. Init is only simply recursive; it depends on no other definition which depends on it. The binding turns identifiers i into functions on a (notionally object code) attribute a, returning a set of (notionally source code) attribute values $\{a_1, a_2, \ldots\}$:

\[ ZF[\, \mathit{script} \,](i)(a) = \{a_1, a_2, \ldots\}, \qquad a, a_1, a_2, \ldots \in \text{A val} \]

where A val is the set of data values available in the underlying language of attributes, and which must therefore contain structures representing source and object codes.
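The least fixed point reading of init can be made concrete on a finite domain. The following C++ fragment is only an illustrative sketch, not part of the decompiler utility: it tabulates a set-valued function on the suffixes of one list, starts from bottom (the empty set everywhere) and applies the transformation T until nothing changes. The names applyT, leastFixedPoint and the use of integer lists are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

using List = std::vector<int>;
using SetOfLists = std::set<List>;
using Fn = std::map<List, SetOfLists>;  // a set-valued function tabulated on a finite domain

// One application of T:
// T[f](List) = { hd(List) : Tail | Tail in f(tl(List)) } U { emptyList }
Fn applyT(const Fn& f) {
    Fn next;
    for (const auto& entry : f) {
        const List& list = entry.first;
        SetOfLists out;
        out.insert(List{});                            // the { emptyList } alternative
        if (!list.empty()) {                           // hd and tl need a nonempty list
            List tail(list.begin() + 1, list.end());   // tl(List)
            for (const List& t : f.at(tail)) {
                List extended{list.front()};           // hd(List) : Tail
                extended.insert(extended.end(), t.begin(), t.end());
                out.insert(extended);
            }
        }
        next[list] = out;
    }
    return next;
}

// Iterate T from bottom (f(List) = {} for every List) until a fixed point is reached.
Fn leastFixedPoint(const List& whole) {
    Fn f;
    for (std::size_t i = 0; i <= whole.size(); ++i) {
        f.emplace(List(whole.begin() + i, whole.end()), SetOfLists{});
    }
    for (;;) {
        Fn next = applyT(f);
        if (next == f) return f;
        f = next;
    }
}
```

Calling leastFixedPoint on the list 1, 2, 3 and looking up that list in the result yields its four initial sublists, which is the set that equation (3) assigns to init.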

5

A generic implementation technique

Actually implementing a decompiler description entails turning it into a program which returns the elements of the specified sets of source code constructs, given the appropriate object code as parameter. There is a combinator-based way of doing this that is appropriate for a decompiler compiler to make use of, because then only the individual combinators and the generation method need validation in order to ensure correctness. Correctness here means that the implementation provably satisfies its formal specification. The specification is provided by the set-valued semantics ZF [ ] for DeCoDe given in Section 4. Moreover, it is important to give a stand-alone implementation (i.e., one that does not rely on any other platform, such as, perhaps, a non-strict functional programming environment) because, in an industrial context, the correctness of the resulting decompiler should be seen to be as independent as possible of the correctness of other architectural layers. Accordingly, we define a transformation of DeCoDe expressions:

\[
\begin{array}{lcl}
\mathit{Comb}[\ ] & :: & \text{Dex} \to \text{Sex} \\
                  & :: & \text{Ddesc} \to \text{Sdesc}
\end{array}
\]

to a combinator-based presentation of sets, the language of which is defined in Figure 7. There are four combinators:

\[ +\!+ \qquad \star \qquad [\,] \qquad \eta \]

Implementing sets (as lazy lists, or otherwise) then requires the combinators to be implemented correctly, which induces a map of Set presentations onto sets:

\[
\begin{array}{lcl}
\mathit{Eval}[\ ] & :: & \text{Sex} \to \text{set of}(\text{A val}) \\
                  & :: & \text{Sdesc} \to \text{binding}
\end{array}
\]

and the evaluation of definitions as bindings of identifiers to sets is induced too. The implementation of recursively bound functions should be via the least fixed point operator. This is not as hard as it sounds, even in languages which are notionally eager evaluators of data structures, because functions and compound structures are usually passed by reference, not by value. The correctness of the decompiler will then eventually depend on the following equation, (4), which says that the intended semantics ZF [ ] is achieved by first compiling into combinators via Comb[ ] , then evaluating the combinator code according to Eval [ ] , which will be defined below:

\[ \mathit{Eval}[\ ] \circ \mathit{Comb}[\ ] = ZF[\ ] \tag{4} \]

(although subset, rather than equality, would also be an acceptable criterion for correctness) and a practical decompiler compiler can implement this scheme by using Comb[ ] to generate the combinator-based code X for a decompiler, and making the correct library implementations Eval[ ] of the combinators available to be linked in when the code X is compiled into a working decompiler utility. We specify the meaning of the combinators of Set as follows:

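\[
\begin{array}{lclr}
\mathit{Eval}[\, a \mathbin{+\!+} b \,] & = & \mathit{Eval}[\, a \,] \cup \mathit{Eval}[\, b \,] & (5) \\
\mathit{Eval}[\, [\,] \,] & = & \{\} & (6) \\
\mathit{Eval}[\, \eta\, x \,] & = & \{\, A[\, x \,] \,\} & (7) \\
\mathit{Eval}[\, s \star v.\ a \,] & = & \bigcup \{\, \mathit{Eval}[\, a \,] \mid v \in \mathit{Eval}[\, s \,] \,\} & (8) \\
\mathit{Eval}[\, s \star k.\ a \,] & = & \text{if } k \in \mathit{Eval}[\, s \,] \text{ then } \mathit{Eval}[\, a \,] \text{ else } \{\} & (9)
\end{array}
\]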

where A[ ] is the evaluator of attribute language expressions. Although there is a distinction drawn between $\star\, v.$ and $\star\, k.$ here, where v is a variable name, and k is a constant, it is inessential. The translation of $s \star k.\ a$ could be written like the translation of $s \star v.\ a$, as $\bigcup \{\, \mathit{Eval}[\, a \,] \mid k \in \mathit{Eval}[\, s \,] \,\}$, provided that a pattern-match against k is understood. The evaluation of recursion equations is again by least fixed point semantics. And all global identifiers will be defined by such equations, so the evaluation of atomic forms is as the set that will eventually be the least fixed point of the equations. I.e., no special treatment is required:

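\[
\begin{array}{lclr}
\mathit{Eval}[\, f_1 = e_1 \ \ldots\ f_n = e_n \,] & = & f := \mu f.\ (\mathit{Eval}[\, e_i \,])_{i=1,\ldots,n} & (10) \\
\mathit{Eval}[\, i(x_1, x_2, \ldots) \,] & = & i(x_1, x_2, \ldots) & (11)
\end{array}
\]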
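Equations (5) to (9) can be read directly as operations on finite sets. The C++ fragment below is a minimal eager sketch of that reading, assuming integer attribute values and std::set as the set representation; it is not the lazy library of Figure 10, and the function names (zero, eta, dplus, star, starConst) are chosen here for illustration only.

```cpp
#include <cassert>
#include <functional>
#include <set>

using Value = int;
using Set = std::set<Value>;

// (6): the empty set combinator [].
Set zero() { return Set{}; }

// (7): the singleton constructor, eta x.
Set eta(Value x) { return Set{x}; }

// (5): a ++ b is the union of the two sets.
Set dplus(const Set& a, const Set& b) {
    Set r = a;
    r.insert(b.begin(), b.end());
    return r;
}

// (8): s * v. a is the union of the body evaluated at each v drawn from s.
Set star(const Set& s, const std::function<Set(Value)>& body) {
    Set r;
    for (Value v : s) {
        Set part = body(v);
        r.insert(part.begin(), part.end());
    }
    return r;
}

// (9): s * k. a yields the body if the constant k occurs in s, else the empty set.
// The body is passed as a thunk so that it is only evaluated when the match succeeds.
Set starConst(const Set& s, Value k, const std::function<Set()>& body) {
    return s.count(k) ? body() : Set{};
}
```

For instance, star applied to the set {1, 2} with a body that returns eta(v) ++ eta(10 * v) gives {1, 2, 10, 20}, as (8) and (5) predict.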

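\[
\begin{array}{lcl}
\mathit{Comb}[\ ] & :: & \text{Ddesc} \to \text{Sdesc} \\
                  & :: & \text{Dex} \to \text{Sex} \\
                  & :: & \text{Dalt} \to \text{Salt}
\end{array}
\]

\[
\begin{array}{lcl}
\mathit{Comb}[\, f_1 ::= e_1.\ \ldots\ f_n ::= e_n. \,] & = & f_1 = \mathit{Comb}[\, e_1 \,] \ \ldots\ f_n = \mathit{Comb}[\, e_n \,] \\
\mathit{Comb}[\ ] & = & [\,] \\
\mathit{Comb}[\, a;\ as \,] & = & \mathit{Comb}[\, a \,] \mathbin{+\!+} \mathit{Comb}[\, as \,] \\
\mathit{Comb}[\, @\ x \,] & = & \eta\, x \\
\mathit{Comb}[\, \{c\},\ ss\ @\ x \,] & = & \eta\, c \star \mathit{True}.\ \mathit{Comb}[\, ss\ @\ x \,] \\
\mathit{Comb}[\, v \leftarrow l,\ ss\ @\ x \,] & = & l \star v.\ \mathit{Comb}[\, ss\ @\ x \,]
\end{array}
\]

Figure 8: The conversion of decompiler compiler descriptions to Set combinators.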

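We formally specify the interpretation Comb[ ] of DeCoDe into set combinators as shown in Figure 8. The resulting translation of the definition of init into combinators is as follows, for example:

\[
\begin{array}{lcl}
\mathit{init}(\mathit{List}) & = & \eta\,(\mathit{nonempty}(\mathit{List})) \star \mathit{True}.\ \mathit{init}(\mathit{tl}(\mathit{List})) \star \mathit{Tail}.\ \eta\,(\mathit{hd}(\mathit{List}) : \mathit{Tail}) \\
 & +\!+ & \eta\,(\mathit{emptylist})
\end{array}
\]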
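The combinator form of init can be checked by evaluating it with eager set operations. The sketch below assumes integer lists and std::set, and names its helpers eta, dplus, star and guard for the purpose of the example (guard stands for the pattern "eta c, matched against True"); none of this is the library code of Figure 10.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using List = std::vector<int>;
using Lists = std::set<List>;

// eta x: the singleton set containing x.
Lists eta(const List& x) { return Lists{x}; }

// a ++ b: set union.
Lists dplus(const Lists& a, const Lists& b) {
    Lists r = a;
    r.insert(b.begin(), b.end());
    return r;
}

// s * v. body: the union of body(v) over every v in s.
Lists star(const Lists& s, const std::function<Lists(const List&)>& body) {
    Lists r;
    for (const List& v : s) {
        Lists part = body(v);
        r.insert(part.begin(), part.end());
    }
    return r;
}

// eta c * True. body: the body if the condition holds, otherwise the empty set.
// The body is a thunk so that tl(List) is never taken of an empty list.
Lists guard(bool c, const std::function<Lists()>& body) {
    return c ? body() : Lists{};
}

// init(List) = eta(nonempty(List)) * True. init(tl(List)) * Tail. eta(hd(List) : Tail)
//              ++ eta(emptylist)
Lists init(const List& list) {
    Lists guarded = guard(!list.empty(), [&] {
        List tail(list.begin() + 1, list.end());            // tl(List)
        return star(init(tail), [&](const List& t) {
            List cons{list.front()};                        // hd(List) : Tail
            cons.insert(cons.end(), t.begin(), t.end());
            return eta(cons);
        });
    });
    return dplus(guarded, eta(List{}));
}
```

Evaluating init on the list 1, 2, 3 returns the four initial sublists, in agreement with the set-valued reading of the original definition.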

and using Eval[ ] to evaluate this gives (3).

How expressive are Set and DeCoDe? We will not prove it here, but Set can express any set of(A val)-valued function f of an argument x in A val, provided that f(x) is the range of a computable total function g(x, n), as n ranges through the natural numbers {0, 1, 2, ...}. This does not depend on the completeness or otherwise of A ex with respect to computability. If A ex is complete in this respect, however, then it can emulate g by an expression g of its own, say, and the following DeCoDe description of f(x) generates the set $\{\, g(x, n) \mid n \in N \,\}$:

    f(x)     ::= fn(x, 0) .
    fn(x, n) ::= @ g(x, n) ; v ← fn(x, n+1) @ v .

So DeCoDe is computationally complete if A ex is too. It is also complete independently of A ex, but this is harder to show.

5.1 Correctness

We formally state that the proposed implementation scheme, via Comb[ ] and Eval[ ], is correct in the sense stated above. That is, (4) holds.

Proposition 1 The direct set-theoretic semantics for DeCoDe (given in Figure 6) coincides with the natural set theoretic interpretation (5–9) of the translation into the restricted language of set combinators given in Figure 7. I.e.

\[ ZF[\, e \,] = \mathit{Eval}[\, \mathit{Comb}[\, e \,] \,] \]

for e in the part of the language called Dex.

The asserted equality is not syntactic identity, but ‘equivalence under the axioms of set theory’. The proof is by straightforward structural induction. This proposition requires and entails a simultaneous result on the part of the language called Dalt.

Corollary 1

\[ \mathit{Eval}[\, \mathit{Comb}[\, ss\ @\ x \,] \,] = ZF[\, ss\ @\ x \,] \]

and the equality is again in sets. So the translation Comb[ ] of the decompiler language into the language Set is correct, provided that the combinators are implemented correctly (i.e., as specified by Eval[ ] ).

6

Implementing the four set combinators in C++

Once it is known that the decompiler generation scheme is correct, it suffices to describe the generation of combinator code in a particular language, confirm that it implements the scheme, then implement the combinators as library functions. We describe an implementation in the object-oriented language C++ [44], using ‘active lists’. These are objects (in the object-oriented sense [27]) which respond to the messages FIRST and NEXT with an appropriate element of the set that they represent, possibly changing the internal state in the process. If there are no more elements to report, the response should be EMPTY. The abstract specification is as follows:

\[
\begin{array}{lcl}
\text{MESSAGE} & ::= & \text{FIRST} \ \mid\ \text{NEXT} \\
\text{RESULT} & ::= & \text{A val} \ \mid\ \text{EMPTY} \\
\text{activeList} & = & \text{MESSAGE} \to \text{state} \to (\text{RESULT}, \text{state})
\end{array}
\]

An active list a implements a set $\sigma$ (and s is a feasible state of a) if the following rules are satisfied:

(a) a FIRST `s = (v, s) with v $\neq$ EMPTY implies v is an element of $\sigma$ and s is a feasible state of a;

(b) if `s is a feasible state of a, then a NEXT `s = (v, s) with v $\neq$ EMPTY implies v is an element of $\sigma$ and s is a feasible state of a;

(c) if v $\in \sigma$ then there is a feasible state s of a and a message m with a m `s = (v, s).
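The message protocol above can be sketched in present-day C++ as an abstract class with one method per message send. This is an illustration under stated assumptions, not the library of Figure 10: attribute values are taken to be int, std::nullopt stands in for EMPTY, the state is held in the object rather than passed explicitly, and the names ActiveList, send, Zero, Eta and drain are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class Message { FIRST, NEXT };
using Result = std::optional<int>;   // std::nullopt plays the role of EMPTY

// An active list answers FIRST and NEXT with an element of its set, or EMPTY.
struct ActiveList {
    virtual ~ActiveList() = default;
    virtual Result send(Message m) = 0;
};

// The empty set: replies EMPTY to every message, so it has no elements.
struct Zero : ActiveList {
    Result send(Message) override { return std::nullopt; }
};

// The singleton set {x}: replies x to FIRST and EMPTY to NEXT.
struct Eta : ActiveList {
    int x;
    explicit Eta(int value) : x(value) {}
    Result send(Message m) override {
        if (m == Message::FIRST) return x;
        return std::nullopt;
    }
};

// Collect the elements reported by one FIRST followed by NEXT messages until EMPTY.
std::vector<int> drain(ActiveList& a) {
    std::vector<int> out;
    for (Result r = a.send(Message::FIRST); r; r = a.send(Message::NEXT)) {
        out.push_back(*r);
    }
    return out;
}
```

Draining an Eta object built from 7 reports the single element 7, and draining it again reports the same element, because FIRST resets the object; draining Zero reports nothing.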

The feasible states are precisely those that can be reached by a sequence of NEXT messages from the feasible state induced by the FIRST message, which resets the object. The elements are those values returned by the object while it is left in a feasible state. With this definition, it is possible to show that the library implementations of the four combinators shown in Figure 10 are correct.

For example, the object called zero implements the empty set. It has trivial internal state (i.e., state = ()). And it has no feasible states (amongst this single possibility!) because it always replies EMPTY to any message. So it can have no values v in whatever set it implements, which must therefore be the empty set.

The object eta(_, x), where x is an attribute value, implements the singleton set constructor $\eta\, x$. It too has trivial internal state, and it is the only feasible state. It either replies EMPTY (to NEXT) or x (to FIRST), so the set of values it produces from feasible states is just the singleton $\{x\}$.

The union of sets is rather more complicated to implement, and requires internal state s, a product of the internal states of the component lists, plus a local counter $k :: \{0, 1, 2, 3, 4\}$, to define which set it should next take a value from. If l1(a1) and l2(a2) are active lists implementing sets $\sigma_1$ and $\sigma_2$ respectively, and they have internal states s1 :: state1 and s2 :: state2, then the object dplus(k, _, l1, l2, a1, a2) implements the union $\sigma_1 \cup \sigma_2$, and it has internal state taken from

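\[
\begin{array}{lcll}
\text{ddesc} & ::= & f_1 ::= e_1\ .\ \ f_2 ::= e_2\ .\ \ \ldots & f_1, f_2, \ldots \in \text{dform},\ e_1, e_2, \ldots \in \text{dex} \\
\text{dform} & ::= & i(v_1, v_2, \ldots) & i \in \text{Id},\ v_1, v_2, \ldots \in \text{A id} \\
\text{dex} & ::= & v \leftarrow l_1\ @\ v\ ;\ w \leftarrow l_2\ @\ w & v, w \in \text{A id},\ l_1, l_2 \in \text{dact} \\
 & \mid & v \leftarrow l_1\ ,\ w \leftarrow l_2\ @\ w & v, w \in \text{A id},\ l_1, l_2 \in \text{dact} \\
 & \mid & \{c\}\ ,\ v \leftarrow l\ @\ v & v \in \text{A id},\ c \in \text{A ex},\ l \in \text{dact} \\
 & \mid & @\ x & x \in \text{A ex} \\
 & \mid & v \leftarrow l\ @\ v & v \in \text{A id},\ l \in \text{dact} \\
\text{dact} & ::= & i(x_1, x_2, \ldots) & i \in \text{Id},\ x_1, x_2, \ldots \in \text{A ex}
\end{array}
\]

Figure 9: A minimal sublanguage of DeCoDe, lacking multiply compound expressions. All proper dact subexpressions of a dex expression except in the last listed alternative here necessarily carry the same local variables as the root.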

\[ \text{state} = (\{0, 1, 2, 3, 4\},\ \text{state1},\ \text{state2}) \]

The local counter k has the following significance:

    0 = both l1(a1) and l2(a2) are empty.
    1 = l1(a1) is nonempty but l2(a2) is empty.
    2 = l2(a2) is nonempty but l1(a1) is empty.
    3 = both are nonempty, and l1(a1) is next to be called.
    4 = both are nonempty, and l2(a2) is next to be called.

The feasible states include the (1, s1, _) such that s1 is a feasible state of l1(a1), and (2, _, s2) such that s2 is a feasible state of l2(a2). The object responds to the FIRST message with the FIRST reply from l1(a1), unless it is EMPTY, in which case it tries l2(a2). To NEXT messages, it responds with replies taken alternately from l1(a1) and l2(a2). It is possible to implement the functionality using just k mod 2 as the local counter, at the cost of some extra trial and error, but with much shorter code, and it is this implementation which is shown in Figure 10 (for brevity).

Likewise, the star(t, _, l1, l2, a1, a2) function implements a set comprehension $\bigcup \{\, \sigma_2(t) \mid t \in \sigma_1 \,\}$, given that l1(a1) implements $\sigma_1$ and l2(t, a2) implements $\sigma_2(t)$, for any t from $\sigma_1$.

These implementations are not unique to C++. They can be written in C, for example, by making the message just another argument to the library function. And therefore they can be written in any suitable low-level language, especially one for which a proved compiler exists.
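The union and comprehension objects described above can be sketched with ordinary C++ classes. The fragment below is a hedged illustration rather than the code of Figure 10: values are assumed to be int, std::nullopt stands in for EMPTY, the state lives in object fields instead of a pointer argument, and FromVector, DPlus, Star and drain are names invented for the example. DPlus alternates between its two lists and skips one that is exhausted; Star works through the outer list one element at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

enum class Message { FIRST, NEXT };
using Result = std::optional<int>;            // std::nullopt stands for EMPTY

struct ActiveList {
    virtual ~ActiveList() = default;
    virtual Result send(Message m) = 0;
};
using ListPtr = std::shared_ptr<ActiveList>;

// Leaf list over a fixed vector (illustrative helper, not from the paper).
struct FromVector : ActiveList {
    std::vector<int> items;
    std::size_t pos = 0;
    explicit FromVector(std::vector<int> v) : items(std::move(v)) {}
    Result send(Message m) override {
        if (m == Message::FIRST) pos = 0;     // FIRST resets the object
        if (pos >= items.size()) return std::nullopt;
        return items[pos++];
    }
};

// Union: replies are taken alternately from the two component lists.
struct DPlus : ActiveList {
    ListPtr l1, l2;
    bool started[2] = {false, false};         // has FIRST been sent to list i yet?
    bool done[2] = {false, false};            // has list i replied EMPTY?
    int turn = 0;                             // which list is next to be called
    DPlus(ListPtr a, ListPtr b) : l1(std::move(a)), l2(std::move(b)) {}
    Result pull(int i) {
        if (done[i]) return std::nullopt;
        ActiveList& l = (i == 0) ? *l1 : *l2;
        Result r = l.send(started[i] ? Message::NEXT : Message::FIRST);
        started[i] = true;
        if (!r) done[i] = true;
        return r;
    }
    Result send(Message m) override {
        if (m == Message::FIRST) {
            started[0] = started[1] = done[0] = done[1] = false;
            turn = 0;
        }
        for (int attempt = 0; attempt < 2; ++attempt) {
            Result r = pull(turn);
            turn = 1 - turn;
            if (r) return r;
        }
        return std::nullopt;                  // both lists are exhausted
    }
};

// Comprehension: the union of body(t) for each t produced by the outer list.
struct Star : ActiveList {
    ListPtr outer;
    std::function<ListPtr(int)> body;
    ListPtr inner;                            // the list body(t) for the current t
    Star(ListPtr o, std::function<ListPtr(int)> b)
        : outer(std::move(o)), body(std::move(b)) {}
    Result send(Message m) override {
        Result r;
        if (m == Message::FIRST) {
            Result t = outer->send(Message::FIRST);
            if (!t) { inner.reset(); return std::nullopt; }
            inner = body(*t);
            r = inner->send(Message::FIRST);
        } else {
            if (!inner) return std::nullopt;
            r = inner->send(Message::NEXT);
        }
        while (!r) {                          // inner list exhausted: advance the outer one
            Result t = outer->send(Message::NEXT);
            if (!t) { inner.reset(); return std::nullopt; }
            inner = body(*t);
            r = inner->send(Message::FIRST);
        }
        return r;
    }
};

std::vector<int> drain(ActiveList& a) {
    std::vector<int> out;
    for (Result r = a.send(Message::FIRST); r; r = a.send(Message::NEXT)) out.push_back(*r);
    return out;
}
```

One design point worth noting: this Star finishes each inner list before moving on, so it reports every element only when the inner lists are finite, whereas the alternation in DPlus keeps both component lists progressing.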

7

Generating Set for C++

We give an implementation C [ ] for C++ of the translation Comb[ ] of DeCoDe into Set combinators. First of all we require that DeCoDe scripts be transformed so that every nontrivial subexpression is labelled and declared individually. That will ease the task of explanation here by rendering the mechanics of the assignment of internal labels invisible (but the actual decompiler utility performs this process on the fly). The resulting minimal sublanguage of DeCoDe is shown in Figure 9. The sublanguage allows declarations of the form $i(\nu) ::= e$ where e is either of the form $v \leftarrow i'(\nu')\ @\ v$, or is a simple alternation $v \leftarrow i_1(\nu)\ @\ v\ ;\ w \leftarrow i_2(\nu)\ @\ w$, a simple variable introduction and use, $v \leftarrow i_1(\nu)\ ,\ w \leftarrow i_2(\nu)\ @\ w$, a simple conditional, $\{c\}\ ,\ v \leftarrow i(\nu)\ @\ v$, or a valuation, $@\ x$. In all cases but the first, the same environment variables $\nu$ must appear on the RHS of the definition as on the LHS.

The reduction of a DeCoDe description to a description in the sublanguage can be validated by use of the set-valued semantics for DeCoDe given in Figure 6, but, in any case, the reduction only depends on the declarative properties of DeCoDe, and can be carried through by any reasonable subexpression labelling scheme. A definition which has the form:

    init(List) ::= v ← foobar(List) @ v.

causes the compilation of an init object which makes a call to the foobar object. This is the code:

    init(List)->Message
    {
        return foobar(List)->Message;
    }

which we abstract to LET(init; (List); foobar; (List)), using the LET macro defined in Figure 11. It can be understood as a binding of init(List) to a Set combinator expression labelled foobar. The full functionality of the compilation process C [ ] is shown in Figure 11. The code produced is semantically correct combinator code. For example, the compilation of the DeCoDe declaration:

    foobar(List) ::= v ← a1(List) @ v ; w ← a2(List) @ w.

according to Comb[ ] ought to be

\[ \mathit{foobar}(\mathit{List}) = a_1(\mathit{List}) \star v.\ \eta\, v \ \mathbin{+\!+}\ a_2(\mathit{List}) \star w.\ \eta\, w \]

but we can use the equation (12) below, derived from the set semantics of DeCoDe:

\[ a \star v.\ \eta\, v = a \tag{12} \]

to reduce the combinator code to

\[ \mathit{foobar}(\mathit{List}) = a_1(\mathit{List}) \mathbin{+\!+} a_2(\mathit{List}) \]

and this is precisely the combinator code produced by C [ ] , namely a C++ implementation of the assignment of foobar to the ‘++’ combinator applied to a1 and a2, all done with local variables (List):

    PLUS(foobar, (List), a1, a2);

The code generated by the PLUS macro includes a call to the C++ object dplus. It may be confirmed that the other translations of C [ ] correspond in similar fashion to the combinator expressions predicted by Comb[ ] .
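The shape of the compilation step can be illustrated by a small generator that maps each declaration form of the minimal sublanguage to the corresponding macro invocation. This is a sketch under assumptions, not the actual utility: the Decl record, its field names and the compile function are hypothetical, and a declaration is assumed to be already parsed and labelled.

```cpp
#include <cassert>
#include <string>

// The five declaration forms of the minimal sublanguage.
enum class Form { Let, Plus, VStar, KStar, Eta };

// A labelled declaration, already parsed (hypothetical representation).
struct Decl {
    Form form;
    std::string name;    // the identifier i being defined
    std::string env;     // its environment variables, e.g. "(List)"
    std::string a;       // i' or i1, or the expression x for Eta
    std::string b;       // i2, or the target environment for Let
    std::string extra;   // the bound variable v (VStar) or the condition c (KStar)
};

// Emit the macro call that the translation associates with the declaration.
std::string compile(const Decl& d) {
    switch (d.form) {
    case Form::Let:
        return "LET(" + d.name + "," + d.env + "," + d.a + "," + d.b + ");";
    case Form::Plus:
        return "PLUS(" + d.name + "," + d.env + "," + d.a + "," + d.b + ");";
    case Form::VStar:
        return "VSTAR(" + d.name + "," + d.env + "," + d.extra + "," + d.a + "," + d.b + ");";
    case Form::KStar:
        return "KSTAR(" + d.name + "," + d.env + "," + d.extra + "," + d.a + ");";
    case Form::Eta:
        return "ETA(" + d.name + "," + d.env + "," + d.a + ");";
    }
    return "";
}
```

Compiling the alternation declaration for foobar with components a1 and a2 and environment (List) yields the text PLUS(foobar,(List),a1,a2); which the C preprocessor would then expand using the macro definitions.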

8

Summary

The theory behind a generic technique for compiling decompilers has been presented. A practical decompiler compiler generating C++ code has been described, making use of object-oriented implementations of lazy lists. A formal description language presented here enables source code to object code relations to be expressed succinctly, and the description is converted by the utility to correct code via the theoretically sound technique discussed. In practice, difficulties can always arise in the decompilation of optimized code. The approach is probably best suited to the (re)discovery of the control structure of the program code in an object program, and as such may also be useful in debugging (disassembly of the machine code is currently the norm here, but in the future, decompilation of the code to a higher-level representation may be more satisfactory to software engineers). The technique described is intended to be of particular use in the validation and maintenance of safety-critical systems where correctness and fault avoidance are paramount. Standards in this area are likely to provide increasing motivation for the application of such an approach.

    RESULT dplus(state,l1,l2,a1,a2)->Message;
    int *state; ACTIVELIST l1,l2;
    {
        RESULT result; BOOLEAN flag;
        switch(Message) {
        case FIRST:
            switch(result=l1(a1)->FIRST) {
            case EMPTY: *state=1; return l2(a2)->FIRST;
            default:    *state=2; return result;
            }
        case NEXT:
            /* flag keeps the old counter; the switch is on the toggled counter */
            switch(flag=*state,*state=!*state) {
            default:
                switch(result=l1(a1)->NEXT) {
                case EMPTY: return l2(a2)->NEXT;
                default:    return result;
                }
            case 0:
                switch(result=l2(a2)->(flag==2?FIRST:NEXT)) {
                case EMPTY: return l1(a1)->NEXT;
                default:    return result;
                }
            }
        }
    }

Figure 10(a): C++ library function implementing the ‘++’ (concatenation) of active lists.

    RESULT eta(v)->Message;
    RESULT v;
    {
        switch(Message) {
        case FIRST: return v;
        case NEXT:  return EMPTY;
        }
    }

Figure 10(b): C++ library function implementing the ‘η’ (singleton) of active lists.

    RESULT zero()->Message;
    {
        return EMPTY;
    }

Figure 10(c): C++ library function implementing the ‘[]’ (empty) of active lists.

    RESULT star(temp,l1,l2,a1,a2)->Message;
    RESULT *temp; ACTIVELIST l1,l2;
    {
        RESULT result;
        switch(Message) {
        case FIRST:
            switch(*temp=l1(a1)->FIRST) {
            case EMPTY: return EMPTY;
            default:    return l2(*temp,a2)->FIRST;
            }
        case NEXT:
            switch(*temp) {
            case EMPTY: return EMPTY;
            default:
                result=l2(*temp,a2)->NEXT;
                /* advance the outer list until the inner one yields a value */
                while (result==EMPTY)
                    switch(*temp=l1(a1)->NEXT) {
                    case EMPTY: return EMPTY;
                    default:    result=l2(*temp,a2)->FIRST;
                    }
                return result;
            }
        }
    }

Figure 10(d): C++ library function implementing the ‘⋆’ operation on active lists.


\[ C[\, \text{ddesc} \,] :: \text{cpp code} \]

\[
\begin{array}{lcl}
C[\, d_1\ d_2\ \ldots \,] & = & C[\, d_1 \,];\ C[\, d_2 \,];\ \ldots \\
C[\, i(\nu) ::= i'(\nu'). \,] & = & \text{LET}(i, \nu, i', \nu') \\
C[\, i(\nu) ::= v \leftarrow i_1(\nu)\ @\ v\ ;\ w \leftarrow i_2(\nu)\ @\ w. \,] & = & \text{PLUS}(i, \nu, i_1, i_2) \\
C[\, i(\nu) ::= v \leftarrow i_1(\nu)\ ,\ w \leftarrow i_2(\nu)\ @\ w. \,] & = & \text{VSTAR}(i, \nu, v, i_1, i_2) \\
C[\, i(\nu) ::= \{c\}\ ,\ v \leftarrow i'(\nu)\ @\ v. \,] & = & \text{KSTAR}(i, \nu, c, i') \\
C[\, i(\nu) ::= @\ x. \,] & = & \text{ETA}(i, \nu, x)
\end{array}
\]

    LET(i, ν, i', ν')       =  i(ν) -> Message { ACTIVELIST i';
                                 return i'(ν') -> Message; }
    PLUS(i, ν, i1, i2)      =  i(ν) -> Message { static int state; ACTIVELIST i1, i2;
                                 return dplus(&state, i1, ν, i2, ν) -> Message; }
    VSTAR(i, ν, v, i1, i2)  =  i(ν) -> Message { static RESULT v; ACTIVELIST i1, i2;
                                 return star(&v, i1, ν, i2, ν + v) -> Message; }
    KSTAR(i, ν, c, i')      =  i(ν) -> Message { ACTIVELIST i';
                                 return (c ? i'(ν) -> Message : zero() -> Message); }
    ETA(i, ν, x)            =  i(ν) -> Message { return eta(x) -> Message; }

Figure 11: A C++ implementation C [ ] of the translation Comb[ ] of a minimal subset of DeCoDe to Set.


Acknowledgements The original research for this article was initiated when one of the authors, Peter Breuer, was a Visiting Research Fellow at BT Research Laboratories, Martlesham Heath, Ipswich, Suffolk, UK. Jonathan Bowen was previously funded by the UK Engineering and Physical Sciences Research Council (EPSRC) on grant number GR/J15186.

References

[1] K. Bennett, Automated support of software maintenance, Information and Software Technology Vol 33 No 1 (1991) pp 74–85
[2] J.P. Bowen, From programs to object code and back again using logic programming: Compilation and decompilation, Journal of Software Maintenance: Research and Practice, Vol 5 No 4 (December 1991) pp 205–234
[3] J.P. Bowen et al., An invitation to formal methods. IEEE Computer Vol 29 No 4 (April 1996) pp 16–30
[4] J.P. Bowen and P.T. Breuer, Decompilation. Chapter 9 in [46].
[5] J.P. Bowen, P.T. Breuer and K.C. Lano, A compendium of formal techniques for software maintenance, IEE/BCS Software Engineering Journal Vol 8 No 5 (September 1993) pp 253–262
[6] J.P. Bowen, P.T. Breuer and K.C. Lano, Formal specifications in software maintenance: From code to Z++ and back again, Information and Software Technology Vol 35 No 11/12 (November/December 1993) pp 679–690
[7] J.P. Bowen and V. Stavridou, Safety-critical systems, formal methods and standards, IEE/BCS Software Engineering Journal Vol 8 No 4 (July 1993) pp 176–187
[8] P.T. Breuer and J.P. Bowen, Decompilation is the efficient enumeration of types. In M. Billaud et al. (eds), Journées de Travail WSA’92 Analyse Statique, BIGRE 81–82, IRISA-Campus de Beaulieu, F-35042 Rennes cedex, France (1992) pp 255–273
[9] P.T. Breuer and J.P. Bowen, Decompilation: The enumeration of types and grammars. ACM Transactions on Programming Languages and Systems (TOPLAS) Vol 16 No 5 (September 1994) pp 1613–1647
[10] P.T. Breuer and J.P. Bowen, The PRECC Compiler-Compiler. In E. Davies and A. Findlay (eds). Proc. UKUUG/SUKUG Joint New Year 1993 Conference, Oxford, UK (6–8 January 1993) pp 167–182 (Available from UKUUG/SUKUG Secretariat, Owles Hall, Buntingford, Herts SG9 9PL, UK)
[11] P.T. Breuer and J.P. Bowen, A PREttier Compiler-Compiler: Generating higher order parsers in C. Software—Practice and Experience Vol 25 No 11 (November 1995) pp 1263–1297
[12] P.T. Breuer and K.C. Lano, Creating specifications from code: Reverse engineering techniques, Software Maintenance: Research and Practice Vol 3 No 3 (September 1991) pp 145–162
[13] K.A. Buettner, Fast decompilation of compiled Prolog clauses. In E. Shapiro (ed), Third International Conference on Logic Programming, Springer-Verlag, Lecture Notes in Computer Science Vol 225 (1986) pp 663–670
[14] A. Bundy and H. MacQueen, The new software copyright law, The Computer Journal Vol 37 No 2 (1994) pp 79–82
[15] M. Cheek, Decompilation questions slow implementation of EC software laws, IEEE Software Vol 10 No 2 (March 1993) p 97
[16] C. Cifuentes, Interprocedural data-flow decompilation. Journal of Programming Languages Vol 4 No 2 (1996) pp 77–99
[17] C. Cifuentes et al., The Decompilation Page, Centre for Software Maintenance, The University of Queensland, Australia (1998) URL: http://www.csee.uq.edu.au/csm/decompilation/
[18] C. Cifuentes and A. Fitzgerald, Australian recommendations on computer software protection. The Computer Journal Vol 37 No 7 (1996) pp 566–576
[19] C. Cifuentes and K.J. Gough, Decompilation of binary programs. Software—Practice and Experience Vol 25 No 7 (1995) pp 811–829
[20] D.L. Clutterbuck and B.A. Carré, The verification of low-level code, IEE/BCS Software Engineering Journal Vol 3 No 3 (1988) pp 97–111
[21] P. Curzon, A Structured Approach to the Verification of Low Level Microcode, Computer Laboratory, University of Cambridge, Technical Report 215 (February 1991)
[22] J.C. González Moreno, M.T. Hortalá González and M. Rodriguez Artalejo, On the completeness of narrowing as the operational semantics of functional logic programming. In Proc. Computer Science and Logic ’92, Springer-Verlag, Lecture Notes in Computer Science Vol 702 (1993) pp 216–230
[23] P.A.V. Hall, Overview of reverse engineering and reuse research, Information and Software Technology Vol 34 No 4 (1992) pp 239–249
[24] C.A.R. Hoare, He Jifeng, J.P. Bowen and P.K. Pandya, An algebraic approach to verifiable compiling specification and prototyping of the ProCoS level 0 programming language. In Esprit ’90 Conference Proceedings, Kluwer Academic Publishers (1990) pp 804–818
[25] IEEE, Court approves reverse engineering, IEEE Software Vol 10 No 2 (March 1993) p 100
[26] V. Illingworth (ed), Dictionary of Computing, 3rd edition, Oxford University Press (1990)
[27] D. Ince, Object-Oriented Software Engineering with C++, McGraw Hill International Series in Software Engineering (1991)
[28] INMOS Limited, Occam 2 Reference Manual, Prentice Hall International Series in Computer Science (1988)
[29] T. Johnsson, Attribute grammars as a functional programming paradigm. In G. Kahn (ed), 3rd Conf. on Functional Programming Languages and Computer Architecture, Springer-Verlag, Lecture Notes in Computer Science Vol 274 (1987) pp 154–173
[30] S.C. Johnson and M.E. Lesk, Language development tools, The Bell System Technical Journal Vol 57 No 6 Part 2 (July/August 1978) pp 2155–2175
[31] R.H. Katz and E. Wong, Decompiling CODASYL DML into relational queries, ACM Transactions on Database Systems Vol 7 No 1 (1982) pp 1–23
[32] B.W. Kernighan and D.M. Ritchie, The C Programming Language, 2nd edition, Prentice-Hall Software Series (1988)
[33] E. Leigh (Parliamentary Under Secretary of State, Department of Trade and Industry, UK), The Copyright (Computer Programs) Regulations 1992, Statutory Instruments, SI. 1992/3233, HMSO Publications, London (16 December 1992)
[34] U. Lichtblau, Decompilation of control-structures by means of graph-transformations. In H. Ehrig, C. Floyd, M. Nivat and J. Thatcher (eds), Mathematical Foundations of Software Development, Volume 1: Colloquium on Trees in Algebra and Programming (CAAP’85), Springer-Verlag, Lecture Notes in Computer Science Vol 185 (1985) pp 284–297
[35] U. Lichtblau, Recognizing rooted context-free flowgraph languages in polynomial-time. In H. Ehrig, H.-J. Kreowski and G. Rozenberg (eds), Graph Grammars and Their Application to Computer Science, Springer-Verlag, Lecture Notes in Computer Science Vol 532 (1991) pp 538–548
[36] K. Marriott and P.J. Stuckey, Programming with Constraints: An Introduction, MIT Press (1998)
[37] W. May, A simple decompiler, Dr Dobbs Journal Vol 13 No 6 (1988) p 50
[38] O.J. Mengshoel, Definite Clause Grammars for Knowledge Interchange during Knowledge Acquisition, draft paper for 8th Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW94). Knowledge Engineering and Image Processing Group, SINTEF DELAB, N-7034, Trondheim, Norway (1993)
[39] J.J. Moreno Navarro and M. Rodríguez Artalejo, Logic programming with functions and predicates: the language BABEL, Journal of Logic Programming Vol 12 (1992) pp 191–223
[40] D.J. Pavey and L.A. Winsborrow, Demonstrating equivalence of source code and PROM contents, The Computer Journal Vol 36 No 7 (1993) pp 654–667
[41] T.A. Proebsting and S.A. Watterson, Krakatoa: Decompilation in Java (Does Bytecode Reveal Source). In Proc. 3rd USENIX Conference on Object-Oriented Technologies and Systems (COOTS), Portland, Oregon, USA, 16-19 June 1997, USENIX Association, Berkeley, California, USA (1997) pp 185–197
[42] RIA, Safety Related Software for Railway Signalling. BRB/LU Ltd/RIA technical specification No 23, Consultative Document, Railway Industry Association, 6 Buckingham Gate, London SW1E 6JP, UK (1991)
[43] A. Spector and D. Gifford, Case study: The space shuttle primary computer system, Communications of the ACM Vol 27 No 9 (September 1984) pp 872–900
[44] B. Stroustrup, The C++ Programming Language, 2nd edition, Addison-Wesley Publishing Company (1991)
[45] D.A. Turner, Functional programming and Miranda, IFIP Transactions A – Computer Science and Technology Vol 12 (1992) pp 32–41
[46] H. van Zuylen (ed), The REDO Compendium: Reverse Engineering for Software Maintenance, John Wiley & Sons (1993)
[47] P. Wadler, List comprehensions. In S.L. Peyton-Jones (ed), The Implementation of Functional Programming Languages, Prentice Hall International Series in Computer Science (1987)
[48] M. Ward, Abstracting a specification from code, Journal of Software Maintenance: Research and Practice Vol 5 (1993) pp 101–122
[49] M. Ward, Program analysis by formal transformation, The Computer Journal Vol 39 No 7 (1996) pp 598–618
[50] B.A. Wichmann, A.A. Canning, D.L. Clutterbuck, L.A. Winsborrow, N.J. Ward and D.W.R. Marsh, Industrial perspective on static analysis. Software Engineering Journal Vol 10 No 2 (1995) pp 69–75
[51] L.A. Winsborrow and D.J. Pavey, Assuring correctness in a safety critical software application. High Integrity Systems Vol 1 No 5 (1996) pp 453–459
