HANDLING CONCATENATION IN TRACE- AND MODEL-CHECKING

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

2009

By Joachim Baran
School of Computer Science

Contents

Abstract
Declaration
Copyright
Acknowledgements

1 Introduction
  1.1 Verification Techniques
    1.1.1 Applicability
  1.2 Formal Verification
    1.2.1 Verifiable Properties and Behaviours
    1.2.2 Complexity and Decidability
    1.2.3 Runtime Verification
    1.2.4 Model Checking
    1.2.5 Trace Composition
  1.3 Thesis Outline

2 Formal Languages
  2.1 Introduction
    2.1.1 Relationship of Formal Languages to Verification
    2.1.2 Chapter Outline
  2.2 Formal Languages
  2.3 Formal Grammars
    2.3.1 Context-Free Grammars and Their Representations
  2.4 Summary

3 Trace Composition in Eagle
  3.1 Introduction
    3.1.1 Verifying Trace-Composition Properties
    3.1.2 Calculus and Implementation
    3.1.3 Chapter Outline
  3.2 The Runtime-Verification Logic Eagle
  3.3 Interdefinability of Sequential Composition and Concatenation
    3.3.1 Sequential Composition in Terms of Concatenation
    3.3.2 Concatenation in Terms of Sequential Composition
  3.4 Deterministic Cut Operators
    3.4.1 Syntax and Semantics of Deterministic Cut Operators
    3.4.2 Expressiveness of Deterministic Cut Operators
  3.5 On-line Monitoring of Deterministic Cut Operators
    3.5.1 Eagle's On-Line Monitoring Algorithm
    3.5.2 Eagle's Monitoring Algorithm Extended
    3.5.3 On-line Monitoring Complexity
  3.6 Summary

4 Model-Checking Context-Free Properties
  4.1 Introduction
    4.1.1 Model-Checking Nested Context-Free Behaviour
    4.1.2 Chapter Outline
  4.2 Automata over Infinite Words
    4.2.1 Regular Language Representations
    4.2.2 Visibly-Pushdown Language Representations
    4.2.3 Motivation for a Grammatical Representation
  4.3 A Grammatical Representation of Visibly Pushdown Languages
    4.3.1 Balanced and Quasi Balanced Grammars
    4.3.2 A Grammatical Representation of ωVPLs
    4.3.3 ωVPL and ωRL(qBL)+h Coincide
  4.4 Beyond Visibly Pushdown Properties
    4.4.1 Limitations of Visibly Pushdown Languages
    4.4.2 A Generalisation of Injector and Injected Languages
    4.4.3 Factor Avoiding Languages with a Decidable Language Inclusion Problem
    4.4.4 Examples of Context-Free Property Verification
  4.5 Summary

5 Conclusions
  5.1 Introduction
  5.2 Trace-Composition Operators in the Runtime-Verification Logic Eagle
  5.3 Model-Checking Context-Free Properties

A Proofs
Index
Bibliography

Word Count: 51572

List of Figures

1.1 Illustration of call-stack behaviour
1.2 Illustration of data-stack behaviour
1.3 Illustration of 1 : 1- and 1 : 2-matchings, i.e. counting properties
1.4 Illustration of trace-composition behaviour
2.1 Representation of a behavioural trace in terms of a formal language word
2.2 Terminological relationship between verification and formal languages
2.3 Representations of language classes by automata, grammars and logics for infinite word languages
3.1 Runtime-verification overview
3.2 Illustration of trace cuts by sequential composition and concatenation
3.3 Traces of a fail-safe system
3.4 Sequential composition expressed by concatenation
3.5 Concatenation expressed by sequential composition (simplified)
3.6 Examples of deterministic cut operator applications
3.7 An alternative deterministic cut operator
4.1 Overview of the model-checking process
4.2 Embedded 1 : 1- and 1 : 2-matchings
4.3 Structure of matchings in visibly-pushdown languages
4.4 A binary-tree parsing program and its behaviour expressed by automata and grammars
4.5 Language classes and their automata-, grammar- and logic-representations
4.6 A binary-tree parsing program and its behaviour expressed by automata and grammars
4.7 Behaviour of a binary-tree parsing program expressed by automata and grammars
4.8 Description of an ω-visibly pushdown automaton for recognising the behaviour of a binary-tree parsing program
4.9 Illustration of embedded 1 : 1- and 1 : 2-matching behaviour
4.10 Lattice of the Boolean algebra of Example 4.7
4.11 Removal of ambiguity in transitions labelled by surrogate terminals
4.12 Complementation of an injector language
4.13 Lattice of the Boolean algebra of Example 4.9

List of Examples

1.2 Call-Stack Behaviour
1.2 Data-Stack Behaviour
1.2 Counting Properties
2.3 Grammatical Representation of Data-Stack Behaviour
3.1 Program Behaviour
3.1 Safety and Liveness in Eagle
3.1 Call- and Data-Stack Behaviour and Counting Properties in Eagle
3.1 Trace-Length Restricted Operators (Mixfix Operators) in Eagle
3.1 Conditional Concatenation in Eagle
4.1 Data-Stack Behaviour in Model-Checking
4.1 Counting Properties in Model-Checking
4.3 Balanced Grammar over Finite Words
4.3 Quasi Balanced Grammar over Finite Words
4.3 ωRG(qBG)+h Representation of an ωVPL
4.4 Embedded 1 : 1- and 1 : 2-Matching Behaviour
4.4 The Role of Surrogate Terminals in Complementation
4.4 Surrogate Terminals in Complementation Revisited
4.4 Verification of Data-Stack Behaviour

List of Definitions, Lemmas, Theorems and Corollaries

Definition 2.1: Context-Free Grammars, [HMU01]
Definition 3.1: Syntax of Eagle, [BGHS04b]
Definition 3.2: Semantics of Eagle, [BGHS04b]
Definition 3.3: Trace-Checking, [BGHS04b]
Theorem 3.1: Sequential Composition in Terms of Concatenation
Lemma 3.1: Decidability of Trace-Checking the Empty Trace
Theorem 3.2: Concatenation in Terms of Sequential Composition
Definition 3.4: Syntax of Eagle with Mixfix Operators
Definition 3.5: Semantics of Eagle with Mixfix Operators
Theorem 3.3: Maximal Trace-Length Operators are Syntactically Expressible in Eagle
Theorem 3.4: Minimal Trace-Length Operators are Syntactically Expressible in Eagle
Definition 3.6: Runtime-Verification Calculus of Eagle, [BGHS03]
Definition 3.7: Runtime-Verification Calculus of Eagle with Mixfix Operators
Theorem 3.5: Semantical Equivalence between Eagle's Logic and Calculus
Theorem 3.6: Trace-Composition Operators with O(|σ|²) Complexity
Theorem 3.7: Trace-Composition Operators with O(|σ|) Complexity
Definition 4.1: Büchi-Automata, [Büc62]
Definition 4.2: Muller-Automata, [Mul63]
Definition 4.3: Visibly-Pushdown Automata, [AM04]
Definition 4.4: Pseudo-Runs of ω-Visibly Pushdown Automata, [AM04]
Definition 4.5: Balanced Grammars over Finite Words, [BB02]
Definition 4.6: Quasi Balanced Grammars over Finite Words
Expressive Equivalence of Balanced and Quasi Balanced Grammars
Definition 4.7: ω-Regular Grammars
Definition 4.8: ω-Regular Grammars with Injected Quasi-Balanced Grammars under a Morphism
Definition 4.9: Monadic Second-Order Logic with a One-Successor Relation, [Büc62]
Definition 4.10: Monadic Second-Order Logic with a Call/Return-Matching Relation, [AM04]
Lemma 4.2: Quasi Balanced Grammars under a Superficial Mapping are Expressible in Monadic Second-Order Logic of Minimally Well-Matched Words
Definition 4.11: Immediate Matching Context-Free Grammars
Lemma 4.3: Visibly Pushdown Automata of Minimally Well-Matched Words are Expressible by Immediate Matching Context-Free Grammars
Definition 4.12: Translation from Immediate Matching Context-Free Grammars to Balanced Grammars with a Superficial Mapping
Lemma 4.4: Translational Correctness of Definition 4.12
Theorem 4.1: Language Equivalence of Matching-Avoiding ω-Regular Languages with Injected Quasi Balanced Languages under a Superficial Mapping and ω-Visibly-Pushdown Languages
Corollary 4.1: Language Equivalence of Matching-Avoiding ω-Regular Languages with Injected Balanced Languages under a Superficial Mapping and ω-Visibly-Pushdown Languages
Definition 4.13: Injector and Injected Languages
Definition 4.14: L-Factor Avoidance
Theorem 4.2: Closure of ωRL[BpfCFL] under Union, Intersection and Complement

Abstract

Software constantly determines and influences our lives, with considerable advances being made in the medical and military sectors as well as in space exploration. Although software development in these sectors undergoes extensive testing, examples exist where such systems have failed and led to the loss of human lives or put taxpayers' money at risk. In this thesis we address the verification of system behaviours against behavioural specifications that permit the detection of software bugs. A system's behaviour denotes sequences of system actions, where a single sequence – also known as a trace – stands for a particular execution of the system, and all traces of the system constitute its model. As such, traces denote behaviours, e.g. matchings between calls and returns of imperative program behaviour (call-stack behaviour), that can be expressed in specifications by operators that permit the composition of traces. We study the behavioural properties expressible by trace-composition operators for trace- and model-checking.

For trace-checking, i.e. the verification of a trace against a specification, we consider the runtime-verification logic Eagle. Eagle defines two trace-composition operators, where it is an open question whether one of the operators might be obsolete. We show that both operators can be expressed in terms of each other and we provide translations in both directions. For the efficient verification of traces, variants of the composition operators are then introduced whose evaluation can be carried out deterministically. We prove that the deterministic variants do not increase Eagle's expressiveness, but that they can be evaluated more efficiently in Eagle's calculus; a complexity analysis is provided for all trace-composition operators in Eagle.

For model-checking, i.e. the verification of a system's model against a specification, we examine behavioural properties and introduce formalisations of data-stack behaviour and counting properties. We show that these behaviours can be neither described nor verified by current model-checkers. We present a grammatical representation of the current automata-theoretic model-checking approach that is based on the syntactical structure of Eagle formulas. From this grammatical representation we derive a language-theoretic representation, which we then extend so that model-checking data-stack behaviour and counting properties is decidable.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.


Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns any copyright in it (the “Copyright”) and s/he has given The University of Manchester the right to use such Copyright for any administrative, promotional, educational and/or teaching purposes.

ii. Copies of this thesis, either in full or in extracts, may be made only in accordance with the regulations of the John Rylands University Library of Manchester. Details of these regulations may be obtained from the Librarian. This page must form part of any such copies made.

iii. The ownership of any patents, designs, trade marks and any and all other intellectual property rights except for the Copyright (the “Intellectual Property Rights”) and any reproductions of copyright works, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property Rights and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property Rights and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and exploitation of this thesis, the Copyright and any Intellectual Property Rights and/or Reproductions described in it may take place is available from the Head of the School of Computer Science (or the Vice-President).


Acknowledgements

I would like to thank everyone who made my PhD studies a memorable experience. Especially, I would like to thank my supervisor Howard Barringer for the support that I have received from him and for being patient with me when it was needed. I am also grateful for the help given to me by my advisor David Rydeheard, which I very much appreciated.

I am also grateful to my parents, who gave me enough freedom to explore the world, even when it may not always have seemed a sensible thing to do. I thank my late father for being the person he was and I thank my mother for the financial support that made my life considerably easier.

I wish many thanks to Juan Antonio Navarro-Pérez, who was an astonishing colleague to work with, who always had time for some interesting discussions, but more importantly, who became a close friend of mine. I would also like to thank Elton Ballhysa, with whom I am also good friends, and whose determination about making decisive choices deserves my honest admiration. I also would like to thank Jonathan Davies and Sarah Williams for giving me inspiration and for encouraging me to broaden my horizons. I am also more than thankful that Sarah took on the burden of proof-reading this thesis and provided me with valuable feedback.

Finally, I would like to thank the EPSRC for providing me with a studentship in the first place; without its financial support I would not have been able to conduct this study.


Was lange währt, wird endlich gut.†
German proverb



† It was worth it in the end. [CT98]


Chapter 1

Introduction

Mankind's creativity is perhaps the driving force behind our technological success, with considerable advances being made in the medical sector, in the military sector and also in space exploration. The rapid development cannot exclusively be credited to engineering though; it has also to be credited to management efforts which created a logistical basis for the creation of vastly complex computer systems. Examples of such systems are the Therac-25 radiation therapy machine, the MIM-104 Patriot missile defence system and the Mars Exploration Rovers Spirit and Opportunity. Whilst the Therac-25, MIM-104 and Mars Exploration Rover Spirit are certainly great inventions of our time, they also cast a dark light on engineering, since each of these systems was put into use with software errors that were undetected prior to their release. The consequences of these errors, in order of their severity, were:

• three people died after accidentally receiving severe radiation overdoses from Therac-25 radiation therapy units, [LT93],

• 28 people died due to an unintercepted Scud tactical ballistic missile near the end of the Persian Gulf War, where the missile was spotted but neglected by a malfunctioning MIM-104, [YD02, p.42],

• the Mars Exploration Rover Spirit entered a seemingly indefinite loop in which the system was continuously reset, until NASA managed to install a software patch more than two weeks after the first occurrence of the problem, [RN05].



It is disputable whether any of these errors could have been spotted during the development phase of the systems, but it is unquestionable that detecting and removing them would have saved human lives in the first two cases. The presence of the errors given in the examples is rather surprising, since medical, military and space/avionics systems are expected to be particularly fail-safe due to the potentially disastrous consequences for human lives or financial budgets as just described. Considerable investments, both in time and money, are made in the development of these systems to ensure that the systems operate as expected, or in other words, in a way that conforms with the system's specification. We now take a closer look at the previous examples and highlight why these systems actually failed, so as to establish cases to which we come back when addressing various verification techniques later in this chapter.

Therac-25: Units failed due to an improbable but possible input sequence of commands, due to a race condition, and also due to an arithmetic overflow. We cannot be certain, but the first error seems to have been overlooked because of the vast number of possible input combinations, which simply cannot be tested manually in a reasonable amount of time. The race condition was caused by specific input sequences too, which only occurred when the sequence was entered too quickly. Since it took the operators of the Therac-25 some time to get used to the machine, this error did not show up immediately when the system was first released. Finally, the arithmetic overflow turned out to be the only straightforward programming mistake: a Boolean variable was set to the constant “true” by incrementing it by one, since any non-zero value of the variable was interpreted as “true”. The repeated setting of the variable to “true” eventually led to an overflow.
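This class of overflow can be sketched as follows; the routine below is a hypothetical reconstruction of the general bug pattern with an assumed 8-bit variable, not the Therac-25's actual code:

```python
# Hypothetical sketch of the Therac-25 class of bug: a flag that is
# "set to true" by incrementing an 8-bit variable instead of assigning
# a constant. After enough increments the variable wraps around to
# zero, which the surrounding code reads as "false".

def set_flag_by_increment(flag: int) -> int:
    """'Set' an 8-bit flag to true by incrementing it instead of assigning."""
    return (flag + 1) % 256           # 8-bit arithmetic wraps around at 256

flag = 0
for _ in range(256):                  # the flag is "set" over and over...
    flag = set_flag_by_increment(flag)

print(flag != 0)                      # False: at the wrap-around the flag
                                      # reads as "false" although it was set
```

The defect is invisible in any short test run: the flag behaves correctly for 255 consecutive settings and fails only at the wrap-around.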
MIM-104: The failure of the Patriot missile radar system was caused by the accumulation of a rounding error that was inherent in the calculations of the system as it was implemented, [Mar92]. Even though this error was present as soon as the system was started, i.e. rounding errors would accumulate immediately, it only became a significant problem after a couple of hours of system runtime. Even then the error would not be apparent immediately, since the system would seem to work smoothly; unfortunately, however, incoming missile tracks would be computed incorrectly, so that after the first missile was spotted the wrong part of the sky would be searched for the initiation of intercepting counter-measures. Such a “lost” missile was then simply regarded as a spurious track by the MIM-104 and ignored.
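The accumulation can be illustrated with a small sketch; the 24-bit fixed-point approximation of 0.1 s used below is the commonly reported figure for the MIM-104's clock, and the function itself is purely illustrative:

```python
# Hypothetical sketch of the MIM-104 class of bug: 0.1 s has no exact
# binary representation, so a 24-bit fixed-point approximation loses
# roughly 9.5e-8 seconds on every clock tick. The error is negligible
# at first but grows linearly with system uptime.

from fractions import Fraction

TENTH = Fraction(1, 10)               # the exact value of 0.1 s
APPROX = Fraction(209715, 2 ** 21)    # 0.1 s chopped to a 24-bit register

def clock_drift(hours: int) -> float:
    """Accumulated clock error after `hours` of counting 0.1 s ticks."""
    ticks = hours * 3600 * 10         # ten ticks per second
    return float(ticks * (TENTH - APPROX))

print(round(clock_drift(100), 4))     # about a third of a second of drift
                                      # after 100 hours of uptime
```

A drift of a third of a second translates into a positional error of several hundred metres for a target travelling at Scud speeds, which is consistent with the missile leaving the searched range gate.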

Mars Exploration Rover Spirit: Eighteen sols (one sol denotes one Martian day, which is 39 minutes 35.244 seconds longer than a day on Earth) into its mission, the Mars Exploration Rover Spirit entered a reboot loop that was caused by the rover's internal safety system, which had detected a suspended thread. Since the system was designed so that no thread would ever be suspended, this was considered a fatal error, where it was assumed that a reboot would bring the system back to a safe operating state. Each time, a thread suspension would be caused by a memory allocation failure in the file system management module, which tried to expand its in-memory copy of the external flash memory's directory structure. Due to the way the file system was implemented, the directory structure could only grow larger, because the deletion of files would not remove a directory entry, but rather flag the entry as deleted so that its place could be taken by a new entry later on. After a reset, the file system of the flash memory would be mounted again and thus leave only little RAM available, so that the next operation that increased the directory structure would again lead to a system failure. While these are the symptoms of the error, the actual causes were two wrongly set system parameters: dynamic memory allocation was not forbidden and there was no upper bound set for the maximum amount of memory that can be allocated by a thread.

These examples illustrate that even in areas where systems are developed with the greatest care, it is not a simple matter to eliminate all errors during a system's development, since each conditional choice of a computer program stands for two possible actions that the software can take during runtime. The actual action that is then observed whilst a program is running is part of the software's behaviour, which accumulates all actions that are observable by any execution of the program. Since it is simply impractical to exhaustively evaluate the behaviour of a system by manual inspection, there exist verification techniques that automate the process.
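The directory-growth mechanism described for the Spirit rover can be sketched as follows; the class, its fields and the simple reuse policy are hypothetical and drastically simplified from the rover's actual file system:

```python
# Hypothetical sketch of the Spirit class of bug: deleting a file only
# flags its directory entry as unused, so the in-memory copy of the
# directory structure never shrinks below its high-water mark and can
# eventually exhaust the available RAM.

class FlashDirectory:
    """In-memory copy of a flash directory that only ever grows."""

    def __init__(self):
        self.entries = []                 # one record per entry ever created

    def create(self, name: str):
        for entry in self.entries:        # reuse a flagged slot if one exists
            if entry["deleted"]:
                entry["name"], entry["deleted"] = name, False
                return
        self.entries.append({"name": name, "deleted": False})

    def delete(self, name: str):
        for entry in self.entries:
            if entry["name"] == name and not entry["deleted"]:
                entry["deleted"] = True   # flagged as deleted, not removed
                return

    def footprint(self) -> int:
        """Entries held in RAM: grows with peak usage, never shrinks."""
        return len(self.entries)

directory = FlashDirectory()
for i in range(1000):                     # many log files accumulate first...
    directory.create(f"log{i}.dat")
for i in range(1000):                     # ...then every one of them is deleted
    directory.delete(f"log{i}.dat")
print(directory.footprint())              # still 1000 entries held in RAM
```

Rebooting does not help in this sketch either: remounting the flash file system rebuilds the same oversized directory copy, which mirrors the reboot loop observed on the rover.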

1.1 Verification Techniques

In order to verify the correct functioning of a system, the behaviour that it exhibits while in operation is compared to an a priori defined specification of the system's purported behaviour. A whole range of techniques is available to undertake the comparison, each of which allows the verification of certain behavioural properties, in the sense that each verification technique addresses a specific verificational aspect of the behaviour of the system under inspection. In the following, we provide an overview of prominent verification techniques: software testing, runtime-verification, model-checking and theorem proving.

Software Testing

Of all the verification techniques presented here, software testing is the most loosely defined. Software testing involves the execution of the system under inspection, which is fed with random data or predefined inputs, whilst a human or computer agent matches the program's actual output to the program's predicted or expected output, [Jor02]. Discrepancies between the outputs then have to be investigated manually, where the observed behaviour has to be traced back to the corresponding program code by an engineer.

Runtime-Verification

The runtime-verification approach is similar to the software testing approach, but permits a deeper analysis of the system under inspection. Rather than simply monitoring the system's output, the program code is augmented with monitoring code which allows a runtime-verification tool to constantly check a program's behaviour against its purported behavioural specification, [Dru06]. Runtime-verification is a fully automatic technique, so that no user interaction is required during the actual verification process. There also exist efficient implementations of runtime-verification tools that can be executed while the system is deployed, where a specification violation would trigger a reset mechanism that brings the system back to a safe operating state.
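The runtime-verification idea can be made concrete with a minimal sketch; the event names and the safety property (“no write after close”) are illustrative assumptions and are not taken from any particular tool:

```python
# Minimal sketch of runtime-verification: instrumentation emits events,
# and a monitor checks them online against a purported specification.
# Here the (assumed) specification is the safety property that a file
# is never written to after it has been closed.

class SafetyMonitor:
    """Online check of the safety property: no 'write' event after 'close'."""

    def __init__(self):
        self.closed = False
        self.violated = False

    def observe(self, event: str):
        if event == "close":
            self.closed = True
        elif event == "write" and self.closed:
            self.violated = True      # a bad event occurred: flag it

monitor = SafetyMonitor()
for event in ["open", "write", "close", "write"]:   # instrumented trace
    monitor.observe(event)

print(monitor.violated)               # True: this trace violates the spec
```

A deployed monitor of this kind could, as described above, trigger a recovery mechanism as soon as `violated` becomes true, instead of merely reporting the failure afterwards.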



Model-Checking

In model-checking, the complete behaviour of a system is validated against its purported specification. For hardware systems, which are physical implementations of finite-state automata, model-checking can be applied directly; in the case of computer programs, the model-checking is performed on an abstraction of the original program, [BBF+98, CGP99]. The restrictions imposed on the abstracted program are mainly the approximation of arithmetic by propositional values and a simplistic view of memory access. Just as for runtime-verification, model-checking can be carried out as a fully automatic process, where a specification violation usually causes the verification process to stop immediately, so that a counter-example can be presented to the engineer for eradication of the violation.

Theorem Proving

Theorem proving addresses all behaviours of a system as well, but the correctness of the system is established by proving that manually inserted pre- and postconditions in the program code are satisfied, [Oui08]. Since there are theoretical restrictions on the provability of programs, automatic verification can only be carried out for programs that are simplified in a certain way, comparable with the abstraction that takes place in the case of model-checking. Otherwise, the proof is restricted to partial correctness, which means that it is not guaranteed that the verified program code will terminate, but if it terminates then the proven properties hold.
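The pre-/postcondition style can be sketched with runtime assertions; the function and its contracts below are illustrative, and a theorem prover would discharge such annotations statically for all inputs, whereas this sketch only checks one execution:

```python
# Minimal sketch of pre- and postconditions in the Hoare style, written
# as runtime assertions for illustration. The contracts state what must
# hold on entry (preconditions) and what the loop guarantees on exit
# (postconditions) for this simple integer-division routine.

def integer_division(dividend: int, divisor: int) -> tuple[int, int]:
    assert dividend >= 0 and divisor > 0        # preconditions
    quotient, remainder = 0, dividend
    while remainder >= divisor:
        quotient += 1
        remainder -= divisor
    # postconditions: the result reconstructs the dividend exactly
    assert dividend == quotient * divisor + remainder
    assert 0 <= remainder < divisor
    return quotient, remainder

print(integer_division(17, 5))    # (3, 2)
```

Note that the assertions say nothing about termination: proving that the loop terminates for every admissible input is exactly the extra step that separates partial from total correctness.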

1.1.1 Applicability

Each of the presented verification techniques is suited for detecting certain errors in systems, but none of the techniques can be seen as an omnipotent verification approach. For each of the systems presented at the beginning of this introduction, we give an example of a suitable verification technique that could have led to the detection of the errors before the systems were deployed.

Therac-25: With the current state-of-the-art abstractions that are used in model-checking, the arithmetic overflow in the Therac-25 units could possibly have been detected, and there are also model-checking approaches that allow the detection of race conditions, which were also present in the Therac-25 units.

MIM-104: The rounding error in the MIM-104 could probably have been detected using theorem proving, assuming that the requirements imposed on the calculations had addressed it appropriately, but the error would most likely not have been detected by a model-checking approach, due to the abstraction of arithmetic operations. With the runtime-verification approach, it might have been possible to track the accumulated rounding error and check it against a specification that requires it to fall into a certain interval; upon violation of this specification, the monitoring runtime-verifier could have signalled a malfunction to the operator of the machine.

Mars Exploration Rover Spirit: For the Mars Rover Spirit, it is actually known that the system was inspected using a software testing and model-checking approach, [GHJX08]. However, it turned out that the long-term software test of 10 sols was not long enough, since the error which led to the reboot cycle appeared after 18 sols, and the model-checking approach failed in the sense that the model-checking tools used did not generate usable abstractions or did not scale very well. As described in [GHJX08], one solution to overcome difficulties with the abstraction of the software was to execute the flight software whilst performing a home-made model-checking approach. This is, in fact, an excellent practical example of runtime-verification, even though it did not lead to the detection of the error: the suspended thread was observed and interpreted in such a way that the system was triggered to enter a safe state by performing a system reset. Performing a system reset was, however, not enough, whereas a more sophisticated repair mechanism could have fixed the Mars Rover's software bug.

The common assumption that computer programs are expressively equivalent to Turing machines makes their verification theoretically impossible in general, since behavioural properties, such as safety constraints, can be used to formulate reachability constraints, which are reducible to the undecidable halting problem of Turing machines. As a consequence, each of the presented verification techniques is a compromise between the expressiveness of program behaviour and the expressiveness of verifiable properties.



Another factor that has high practical relevance is the suitability for automation, since user interaction during the verification process would require that certain man-hours be invested. All of the presented verification techniques allow us to carry out an automated verification to a certain degree. However, only the runtime-verification and model-checking approaches are fully automated per se, i.e. after being presented with a behavioural trace or an abstracted behavioural model and a purported behavioural specification, the runtime-verification or model-checking can be carried out autonomously.

1.2 Formal Verification

Formal verification describes the area of verification techniques where the verified system, as well as its purported specification, are described formally with well-defined syntax and semantics, and where the verification itself is undertaken by an algorithm that is itself well-defined. A formal model of a system is understood to describe the system's semantics, so that the model permits the derivation of the system's behaviour as it can be observed by an external agent during the system's execution. Behaviours or parts of behaviours that follow a certain behavioural pattern, but are generic in the sense that the pattern can be parametrised to match specific behaviour, are referred to as behavioural properties. The study of behavioural properties and system behaviours that are typical for certain systems constitutes the foundation of formal verification, as building blocks for the composition of larger behavioural specifications.

1.2.1 Verifiable Properties and Behaviours

While specifications can be formulated freely, certain structures of specifications have been studied in the research literature. By taking an isolated view of certain kinds of specifications, it is possible to investigate the behaviour they describe, as well as to undertake complexity studies which are relevant to practical applications of verification. The earliest approaches to the behavioural verification of programs were concerned with establishing the semantic correctness of programs, proving that their


executions would terminate and checking for program equivalence from a behavioural perspective, [FL79]. Programs would commonly be described by abstractions of the basic constructs of imperative programs, such as atomic actions, a non-deterministic choice operator and a conditional loop operator, but behavioural interpretations of hardware systems were also considered, [HMM83a, HMM83b]. Other work focused on interactions in concurrent systems, which is of particular practical interest due to the complex interactions that can occur in these systems, [Lam77, Lam79, OL82, BKP84]. While the areas that have been addressed by the various approaches vary considerably, they have set the foundations for studying interesting behavioural properties. The prominent behavioural properties that are of practical interest today, and which are still attracting research interest, [Sis94, KV01], are as follows:

• safety properties: something³ bad never happens, [OL82],

• liveness properties: something good eventually happens, [OL82],

• fairness properties: some property holds in every execution step, [GPSS80].

Other properties establish the relation between occurrences of certain events as they can be found in the behaviour of imperative programs. The modelling of such behaviour was first undertaken in an abstract language-theoretic setting in [BEM97, EKS01] and [Obd02]. Applications for checking these properties can be found in non-parallel system models, where the latter two items are defined and investigated in this thesis:

• call-stack behaviour: matching between calls and returns as they are observed on the stack of procedural programs, [AM04],

• data-stack behaviour: matching between data values as they are observed on data stacks or first-in/last-out queues,

• counting properties: the relationship between the number of appearances of one action to another action, for example, a sent resource request is to be followed by one or two resource grants.
Examples of call- and data-stack behaviour and counting properties are given in the following, where their relationship to corresponding formal languages is shown.

³ The specification designer has the freedom to define “something” as any possibly observable system behaviour.

(a) Pseudo code:

function getFilename()
    string d = getWorkingDirectory()
    concat(d,"Story.txt")
    return d

function main()
    string n = getFilename()
    file f = open(n)
    append(f,"The End")
    close(f)
    return

(b) Behaviour over time:⁴ the abstracted trace c1 c2 a b r2 c3 d r3 r1, labelled below with the unabstracted actions main, getFilename, getWorkingDirectory, concat, return, open, append, close and return, where arcs connect the matchings c1–r1, c2–r2 and c3–r3.

Figure 1.1: Illustration of call-stack behaviour

We start by focusing on call-stack behaviour, since it is the foundation from which data-stack behaviour and counting properties can be derived, and furthermore, call-stack behaviour serves as an excellent introductory basis for matching behaviour in the context of software programs.

Example 1.1 (Call-Stack Behaviour) Call-stack behaviour is related to the behaviour of imperative programs with functions, where one can establish the relationship between the invocation of a function and the function's return to its caller. Figure 1.1 depicts a pseudo-code program whose execution starts with the function main(), together with the program's corresponding behavioural trace under a straightforward abstraction. The call of a function, i.e. a function's invocation, and the return which gives control back to the function's caller, are coupled in such a way that there is a 1 : 1-matching relation between them. This is visualised with the help of arcs as in Figure 1.1(b). Such 1 : 1-matchings are not restricted to function calls and returns; they can also be used to describe matchings between actions such as the opening and closing of a file, the latter of which is denoted by the matching of c3 and r3 in Figure 1.1(b), since both actions can be interpreted as a call and return respectively. Even though 1 : 1-matchings establish the relationship between two actions, call-stack behaviour implicitly classifies the actions that can appear at the origin

⁴ In the following we are going to omit the descriptive arrow that indicates the start and continuation of time, and we only give the actions of the unabstracted program below the trace in this section.
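The 1 : 1-matching of Example 1.1 can be checked mechanically with a stack. The following Python sketch is not part of the thesis and its names are illustrative; it accepts the abstracted trace of Figure 1.1(b) and rejects traces with crossing or unanswered calls:

```python
def well_matched(trace, call_to_return):
    """Check call-stack behaviour: every return must close the most
    recently opened, still unanswered call (a 1:1-matching)."""
    returns = set(call_to_return.values())
    stack = []
    for action in trace:
        if action in call_to_return:      # a call: remember it
            stack.append(action)
        elif action in returns:           # a return: must match the last open call
            if not stack or call_to_return[stack.pop()] != action:
                return False
        # other actions (e.g. a, b) are ignored by this property
    return not stack                      # no call may remain unanswered

# Abstracted trace of Figure 1.1(b) and its call/return pairing.
matching = {"c1": "r1", "c2": "r2", "c3": "r3"}
print(well_matched(["c1", "c2", "a", "b", "r2", "c3", "d", "r3", "r1"], matching))  # True
print(well_matched(["c1", "c2", "r1", "r2"], matching))  # False: the arcs would cross
```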

The trace of Figure 1.2 consists of the actions push(a) push(b) push(c) push(d) pop pop pop pop, where the four pops return d, c, b and a respectively, and arcs connect each push(x) to the pop that returns x.

Figure 1.2: Illustration of data-stack behaviour

and end of an arc into two classes, namely into a set of actions denoting calls and a set of actions denoting returns. As such, certain 1 : 1-matchings cannot be expressed as call-stack behaviour, for example, data-stack behaviour.

Example 1.2 (Data-Stack Behaviour) Basic operations on a stack involve the pushing of symbols onto the stack, which can later be retrieved in reverse order by popping them off the stack again. Figure 1.2 depicts such behaviour, where a series of symbols is pushed onto/popped off the stack. The matching between the occurrences of the pushed and popped symbols can be expressed in terms of 1 : 1-matchings, which we have visualised by the arcs in Figure 1.2, where the abstraction is chosen to reflect the data as it is used in the program as program actions on the trace. Data-stack behaviour is different from call-stack behaviour in that the observed data values at the origin and the end of an arc are the same, and therefore the behaviours describe incomparable properties of a trace.

The previous two examples focused on matchings between two individual actions, which allowed us to draw non-overlapping arcs between the matchings as shown in Figure 1.1 and Figure 1.2. As has been shown in [LST95], arcs can be drawn in the described way over all words of context-free languages, so it is straightforward that the visualisations we have chosen here are applicable to represent both call- and data-stack behaviour, as these behaviours are within the context-free languages. When considering counting properties, or 1 : n properties as we have also called them, it is beneficial to consider another form of visual representation, one where arcs are still non-overlapping, but multiple arcs are permitted
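Data-stack behaviour can likewise be monitored with an explicit stack; the following sketch is our own illustration, not part of the thesis, and checks that every popped value equals the most recently pushed, not-yet-popped value, as in Figure 1.2:

```python
def stack_consistent(trace):
    """Check data-stack behaviour: a pop must return exactly the value
    pushed most recently and not yet popped (cf. Example 1.2)."""
    stack = []
    for op, value in trace:
        if op == "push":
            stack.append(value)
        else:  # op == "pop"
            if not stack or stack.pop() != value:
                return False              # popped value breaks the 1:1-matching
    return True

# Trace of Figure 1.2: push a, b, c, d, then pop d, c, b, a.
trace = [("push", x) for x in "abcd"] + [("pop", x) for x in "dcba"]
print(stack_consistent(trace))                          # True
print(stack_consistent([("push", "a"), ("pop", "b")]))  # False
```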


The trace of Figure 1.3 consists of the actions request grant request grant grant, where a 1 : 1-matching arc connects the first request r with its single grant g, and 1 : 2-matching arcs connect the second request r with its two grants g; the gap between the second request and its grants is marked by (1).

Figure 1.3: Illustration of 1 : 1- and 1 : 2-matchings, i.e. counting properties

to originate from the same position within a word. We are not proposing a new formalism to visualise matchings here, especially as our representation opposes the findings of [LST95]. However, the visualisation we have chosen in Figure 1.3 to represent 1 : 1- and 1 : 2-matchings highlights that the action r is followed by 1 or 2 occurrences of g, respectively. The next example elaborates on this behaviour further.

Example 1.3 (Counting Properties) Software implementations of communication protocols or asynchronously operating program threads do not necessarily embody strict 1 : 1-matching behaviour besides the obvious call- and data-stack behaviours. Instead, one may observe a loose matching pattern among observable actions. In Figure 1.3, we have depicted such a loose matching pattern, where a sent-out resource request is followed by either one or two granted permissions to access the resource. More concretely, r could stand for a client's print request that is sent out to a server which has control over a cluster of printers, where a g represents the server's affirmative response with a handle to an available printer. A practically relevant specification requirement could then be that a request is always successful, since the server could operate printing queues, but that no more than two grants are sent out per request.

Limiting the maximum number of occurrences of the action g that can follow one occurrence of a request r is a regular property. In other words, it can be encoded by a finite-state machine which switches states to count the number of occurrences of g, so that in the case of 1 : 2-matching behaviour the third g that follows an r causes the machine to enter a failure state that signals a violation of the specification. This is a practical approach if we only consider one client, but a


rather simplistic approach to an actual client/server-related program. By nesting requests and grants, for example by assuming there are further matchings in the gap marked by (1) in Figure 1.3, the matchings between requests and grants become a context-free property, and as a consequence, the counting can no longer be carried out by a finite-state automaton.

From a practical point of view, the verification also has to be efficient in the sense that the algorithm not only produces an answer to the verification problem after a certain time, but also does not need too much time or memory to complete.
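The finite-state encoding of the non-nested 1 : 2-matching described above can be sketched as follows (illustrative Python, not from the thesis): a counter with three states tracks the grants seen since the last request, and a third grant drives the machine into a failure state.

```python
def grants_per_request_ok(trace, limit=2):
    """Finite-state monitor for the counting property of Figure 1.3:
    at most `limit` grants g may follow each request r (single client,
    no nesting).  The state is the number of grants since the last r."""
    state = None                      # None: no request seen yet
    for action in trace:
        if action == "r":
            state = 0                 # new request: reset the grant counter
        elif action == "g":
            if state is None or state == limit:
                return False          # grant without request, or one too many
            state += 1
    return True

print(grants_per_request_ok(["r", "g", "r", "g", "g"]))   # True (the trace of Figure 1.3)
print(grants_per_request_ok(["r", "g", "g", "g"]))        # False: third grant
```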

1.2.2 Complexity and Decidability

Verification algorithms have to be efficient in that the required resources, such as time and memory, are in a reasonable relation to the benefits of the verification. From a naïve point of view, the verification of microprocessor systems and their software should be easy. Since computers are only equipped with finite memory, we can interpret the memory changes, as observed with every clocked step of the processor, as a finite-state automaton. While this is certainly true for hardware designs, it is contrary to the common view that software programs are potentially as expressive as Turing machines. This discrepancy is due to the easier design and implementation of software in expressively rich models, since the formulation of something as simple as a counter, in terms of a finite-state process, is rather cumbersome. For verification purposes, the view of computer programs as Turing machine programs cannot be maintained, because the verification of certain properties would reflect a solution of the halting problem of Turing machines, which is undecidable. It is therefore desirable to choose a formalism for describing system models which, in conjunction with the formalism used for the specification, is

• expressive enough to allow us to model behaviour as general as possible, and

• constrained in such a way that the corresponding verification problem is decidable.

A decidable verification problem enables formal verification in theory, but for practical purposes the complexity of the algorithm is of great interest as


well. As described earlier, the finite memory present in computer systems permits only a finite number of possible system states, so that the whole system can be described as a finite-state automaton. However, the verification of this automaton is non-trivial due to the vast state space it possesses. The two major verification approaches that we address in this thesis tackle the problem differently: runtime-verification addresses the verification of a system's on-line behaviour, i.e. the behaviour as observed during the system's execution, and therefore does not require a reduction of the state space, as the verification happens as the states are traversed. Model-checking addresses the verification of all of the system's behaviours off-line, so that the original system has to be abstracted in such a way that the verification is still feasible and yet still correct.

For runtime-verification, where the specification is validated whilst the program under inspection is executed, the challenge is to keep the actual verification process efficient. This means that with each step of the program, the verification must also advance to a new resolute state, one where the evaluation cannot rely on costly approaches such as backtracking to determine truth values for past-time events.

In the case of model-checking, the state space of a real system can be enormously large, even when modelling simple hardware systems or short computer programs. Specifications, however, do not address every aspect of the system under investigation, and, as such, describe only certain behaviours of the system and neglect others. It is therefore sufficient to consider an abstract system model, which features the behavioural aspects that are addressed in the specification whilst other behavioural aspects are discarded, so that the state space is effectively reduced by orders of magnitude.
Special encodings of the abstracted system model lead to further reductions of the memory required to represent it. This allows us to model-check extremely large state spaces, [BCM+90].

1.2.3 Runtime Verification

Runtime verification focuses on only one observed behaviour of a system at a time, which is of special interest for software programs, where the verification can take place during the execution of the program. In the latter case, we speak of on-line monitoring and assume that a violation of the specification would initiate an error-recovery procedure, which is not discussed further in this thesis.


In on-line verification, an efficient evaluation of whether the specification is violated at some point is paramount. The resource overhead due to the additional verification has to be so small that the application under inspection can function and be used normally. Since the specification is evaluated as events from the program's execution are observed, no assumptions about the occurrence of loops or other structures leading to the observed behaviour can be made. This means that the fulfilment of a specification can only be established once the observation comes to a stop, i.e. when the program terminates.
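As an illustration of these constraints, the following sketch (our own, hypothetical example, not from the thesis) processes each observed event in constant time and reports the verdict so far, never revisiting past events; the monitored property, that no file is closed before it has been opened, stands in for an arbitrary safety specification:

```python
class SafetyMonitor:
    """On-line monitor sketch: the verdict is updated with every event,
    in constant time per event and without backtracking over the past."""

    def __init__(self):
        self.open_files = 0
        self.violated = False

    def step(self, event):
        if event == "open":
            self.open_files += 1
        elif event == "close":
            if self.open_files == 0:
                self.violated = True   # bad event: the property is violated forever
            else:
                self.open_files -= 1
        return not self.violated       # verdict after this step

m = SafetyMonitor()
print([m.step(e) for e in ["open", "close", "close"]])  # [True, True, False]
```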

Expressiveness

Specifications for runtime verification have to be evaluated efficiently, and therefore they are commonly formulated in logics for which an efficient on-line monitoring calculus exists. This does not imply that the logic itself has to be restricted in its expressiveness. In fact, the runtime-verification logic Eagle, which is studied in this thesis, is expressively equivalent to a Turing machine. It is therefore not possible to check whether an arbitrary Eagle specification is satisfiable, i.e. whether there is any model which would satisfy the specification, because this would answer the halting problem of Turing machines.

1.2.4 Model Checking

Model-checking is a branch of formal verification in which it is determined whether the behaviour of a given system model satisfies a given specification. In case the system behaviour does not fulfil the specification requirements, a concrete counter-example is provided that clarifies which part of the specification is violated. Applications of model-checking are mainly found in the verification of non-terminating systems with parallel processes. A typical example would be the verification that a certain system is deadlock-free. However, recent developments in model-checking are directed towards expanding the class of systems that can be verified, without particularly focusing on carrying forward previous results on efficient model-checking algorithms or on model-checking parallel processes, [AM04, AM06, ACM06, Cau06].


Interpretation of Time

Behavioural properties are temporal properties, which reflect propositions that may change over time. Time itself can be interpreted in various ways, which changes the semantics of the behavioural properties we are addressing. The commonly used notions of time assume that the flow of time is either discrete or continuous, and that the flow either progresses linearly or allows for branches of possible alternate futures. For describing the behaviour of software as well as digital hardware, it is common to favour the discrete-time approach, which coincides with the clocked, or synchronous, behaviour of these systems. The question of whether time proceeds in a linear or branching way is not only of theoretical interest, but also the subject of a heated and ongoing debate in the computer science community. The main arguments are that linear time is more easily accessible when it comes to specifications, but leads to high computational complexity in verification, whereas branching time is less accessible for engineers formulating the specifications, but is verifiable at relatively low computational complexity. For the systems in question in this thesis, we restrict ourselves to discrete and linear time flows, since they intuitively resemble the control flow of actual systems, which are typically deterministic, and since we can then establish a stronger connection to formal languages to compare the expressiveness of our approach.

Expressiveness

When considering certain formalisms to describe system models and specifications, we can compare their expressiveness, i.e. their expressive power in terms of which systems are actually describable and which properties can be stated in specifications. The behaviour of a system is represented as a set which contains all observable behaviours of the system's executions. The behaviour of an execution is given by snapshots of the system's current state interpreted over a discrete time flow.
A state reflects a set of atomic propositions that are true at the particular time. Time is usually interpreted as being either linear or branching, i.e. a state may either lead to a single successor state or multiple successor states respectively. Here, we adopt the linear-time approach, and as such, a system’s behaviour can be viewed as a set of sequences over sets of atomic propositions.


In order to carry results from formal language theory over to model-checking, it is common practice to refer to the sets of atomic propositions by uniquely assigned letters, so that the sequences which represent behaviours become words over these letters. When the system behaviour is interpreted in this way, model-checking is a synonym for language inclusion in formal languages, [FL79, HPS81, Wol83]. In formal language theory, language inclusion and related problems have been studied rather extensively, so that the results of this field can be carried forward to model-checking immediately. Infinite-word languages are classified in a similar way to Chomsky's language hierarchy for finite-word languages, where the correspondences are straightforward, [CG77a, CG77b]. More importantly, we can utilise the classification to infer expressiveness results for model-checking, i.e. determine the kind of properties that are expressible for certain classes of systems and specifications. In the following we use the prefix ω to clarify that we are referring to a language class over infinite words, e.g. we refer to the regular languages over infinite words as being ω-regular.

Most work on model-checking has been done for system models and specifications that are both ω-regular. ω-regular system models are often represented by Büchi-automata, [AS87, BCM+90, Sis94, Esp97, BJNT00, EH00, KV01],⁵ whereas ω-regular specifications are typically formulas of a temporal logic, [FL79, OL82, Wol83, BB86, BCM+90, AEM04]. ω-regular specifications are of particular practical importance since they are suitable for verifying properties such as those we mentioned earlier: safety, liveness and fairness. These properties are of special importance in the verification of systems with parallel processes, where the interplay of processes makes a manual verification very difficult and testing various executions of the system is insufficient as a proof of correctness.
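For finite words and regular languages, the reduction from verification to language inclusion is concrete: L(system) ⊆ L(specification) holds iff no reachable state of the product automaton pairs an accepting system state with a rejecting specification state. A minimal sketch for complete deterministic finite automata follows; it is our own illustration (the encodings and example automata are hypothetical), not a construction from the thesis:

```python
from collections import deque

def included(sys_dfa, spec_dfa, alphabet):
    """Each DFA is (initial, accepting_set, delta) with total,
    deterministic transitions delta[state][letter].  Returns True iff
    every word accepted by sys_dfa is also accepted by spec_dfa."""
    (si, sa, sd), (ti, ta, td) = sys_dfa, spec_dfa
    seen, queue = {(si, ti)}, deque([(si, ti)])
    while queue:                        # breadth-first search of the product
        s, t = queue.popleft()
        if s in sa and t not in ta:
            return False                # a counter-example word reaches here
        for letter in alphabet:
            nxt = (sd[s][letter], td[t][letter])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Specification: no two consecutive b's.  System: words of the form (ab)*.
spec = ("q0", {"q0", "q1"},
        {"q0": {"a": "q0", "b": "q1"},
         "q1": {"a": "q0", "b": "qd"},
         "qd": {"a": "qd", "b": "qd"}})
system = ("s0", {"s0"},
          {"s0": {"a": "s1", "b": "sd"},
           "s1": {"a": "sd", "b": "s0"},
           "sd": {"a": "sd", "b": "sd"}})
print(included(system, spec, "ab"))     # True: (ab)* never contains bb
```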
There are also model-checking applications for non-parallel systems; we will return to this point later. Concerning system models which represent procedural computer programs, e.g. software written in the programming languages C/C++/C# or Java, system models in the form of Büchi-automata are insufficiently expressive for representing the behaviour of procedure calls. The behavioural modelling of procedures requires a call-stack, which is a context-free property expressible by pushdown automata.

⁵ In [Sis94], Büchi-automata are simply referred to as “finite state automata”.


Model-checking ω-regular properties is indeed feasible for ω-context-free system models and can be done rather efficiently, [BS94, BEM97, EHRS00]. However, when one wants to verify arbitrary ω-context-free properties of the system model, i.e. behavioural properties which appear in ω-context-free models but not in ω-regular models, then model-checking becomes undecidable, [CG77b]. From a practical perspective, ω-context-free system models are too expressive. For example, calls and returns of procedural computer programs are reflected by a simple context-free matching structure, which is expressible by a subset of the ω-context-free languages. These matchings are describable by the languages of visibly pushdown automata, which are properly included in the ω-context-free languages but are more expressive than the ω-regular languages; more importantly, their model-checking problem is decidable, [AM04]. As such, visibly pushdown automata allow us to give equi-expressive behavioural specifications of system-model behaviours, which include context-free properties such as call-stack behaviour. However, we show in the remainder of this thesis that other important behavioural properties, such as data-stack behaviour and counting properties, are not expressible by visibly pushdown automata. We then proceed to give an extension which permits the model-checking of these properties, where we rely on compositional aspects of these behaviours.

1.2.5 Trace Composition

Call- and data-stack behaviour and the counting properties, which we introduced in Examples 1.1, 1.2 and 1.3, are behavioural properties over finite words. The arcs we drew over matchings emphasise this, since every arc spans only a finite number of actions. In order to express such behaviour, we consider the composition of behavioural properties in specifications, so that the composition of their behaviours assembles a system specification. For example, data-stack behaviour may appear in a behavioural producer/consumer pattern, where a number of producer processes push data onto a stack, which is later popped off the stack completely by a single consumer process. This behaviour can be expressed in terms of trace composition, which we have depicted in Figure 1.4 on the next page, where we write x · y to denote that the occurrence of the action x is followed by an occurrence of the action y later on the trace. The brackets (1) and (2) in the figure denote how a trace-composition operator can be used in a recursive definition to express matching behaviour and bracket


The trace of Figure 1.4 is a b c d d c b a b d d b, where bracket (1) spans the first eight actions and bracket (2) the last four; each bracketed sub-trace is derivable by the recursive definition

DataStack → ε or
DataStack → a · DataStack · a or
DataStack → b · DataStack · b or
DataStack → c · DataStack · c or
DataStack → d · DataStack · d or
. . . · DataStack · DataStack · . . . (3)

Figure 1.4: Illustration of trace-composition behaviour

(3) shows an application of trace composition for the specification of repetitive behaviours.
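Membership in the language generated by the DataStack definition of Figure 1.4 can be decided by a standard dynamic program over subword ranges; the following Python sketch is our own illustration, not part of the thesis. It tries the rule x · DataStack · x on the outer letters of a range and otherwise splits the range, mirroring the composition rule:

```python
from functools import lru_cache

def in_datastack(word):
    """Decide membership in the DataStack language of Figure 1.4:
    DataStack -> empty | x . DataStack . x | DataStack . DataStack,
    for letters x of the word, via memoised recursion (O(n^3))."""
    @lru_cache(maxsize=None)
    def derivable(i, j):               # is word[i:j] derivable from DataStack?
        if i == j:
            return True                # DataStack -> empty
        if (j - i) % 2:
            return False               # every derivable word has even length
        if word[i] == word[j - 1] and derivable(i + 1, j - 1):
            return True                # DataStack -> x . DataStack . x
        return any(derivable(i, k) and derivable(k, j)
                   for k in range(i + 2, j, 2))  # the composition rule
    return derivable(0, len(word))

print(in_datastack("abcddcbabddb"))    # True: brackets (1) and (2) composed
print(in_datastack("abab"))            # False: no arc-matching exists
```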

1.3 Thesis Outline

In the following chapters we focus on the expressiveness of runtime-verification and model-checking. We set the context of our work in Chapter 2, where we give a general introduction to formal languages, their respective formalisms for generating/recognising sequences and their corresponding relationships to runtime-verification and model-checking terminology.

Chapter 3 addresses the runtime-verification logic Eagle, where we address the expressiveness of Eagle's two trace-compositional operators, “sequential composition” and “concatenation”. These binary operators split the trace under inspection, where each of the operator's arguments is evaluated on a sub-trace respectively. We address the open problem of whether sequential composition and concatenation are equi-expressive, and we prove that this is the case. Furthermore, we give translations in both directions, where the formulas involved are part of the guarded fragment of Eagle, which is the subset of Eagle that is covered by Eagle's implementation as a runtime-verification tool. Next, we focus on the non-deterministic semantics of the trace-compositional operators, which means


that the complexity of evaluating these operators is non-linear in the size of the trace under inspection. In order to overcome this problem, deterministic variants of the trace-compositional operators are introduced. We show that they do not increase Eagle's expressiveness by providing translations of the new operators into (the unextended) Eagle. We then carry on and extend Eagle's runtime-verification calculus, on which Eagle's implementation is based, and we prove that some of the new operators can be evaluated linearly in the size of a trace under inspection. These results have already been presented in [BB08].

In Chapter 4 we turn towards model-checking, where we build upon Eagle's rule-based rewriting techniques. Rule-based rewriting techniques are represented by grammars in the area of formal languages, which we utilise to formulate a grammatical representation for the recently introduced language class of ω-visibly pushdown languages. The latter permit the specification and verification of call-stack behaviour as it can be observed in the behaviour of imperative programs written, for example, in C, C# or Java. We have already published this grammatical representation in [BB07]. We then take the grammatical representation as a basis for extending our work on model-checking to a language-theoretic approach. We introduce new interesting verification properties, such as data-stack behaviour⁶ and counting properties, and illustrate their practical relevance by providing appropriate examples. We then prove that said properties are not expressible by ω-visibly pushdown languages, but that they can be model-checked by a language extension that we give. We therefore extend model-checking beyond the expressiveness of ω-visibly pushdown languages to a larger class of languages, which covers practically relevant properties in a verification context.

Finally, Chapter 5 summarises our work, where we recall the aims and achievements of this thesis. For both runtime-verification and model-checking, we also point out potential further work that could build upon the results presented here.

⁶ Data-stack behaviour permits the pushing/popping of arbitrary symbols onto the stack, whereas call-stack behaviour matches symbols representing calls to symbols representing returns.

Chapter 2

Formal Languages and Formal Language Representations

The verification of system behaviour against purported behavioural specifications is closely linked to the study of formal languages, since behavioural traces can be interpreted as sequences of letters which form formal-language words. We have depicted the straightforward translation between a trace and a word in Figure 2.1, where every action simply becomes a letter and vice-versa, whilst the time at which an action occurs is reflected by the position of the corresponding letter within the word. In formal languages, structural properties of occurrences of letters in words are studied, similar to the verification-related 1 : 1-matchings and other properties, but formal language theory is also the study of suitable representations for expressing particular properties. Language classes, which classify sets of words according to their specific structural properties, commonly have several representations in the form of various notational formalisms, such as automata, grammars or even logics, the latter of which

(a) Behavioural trace of Example 1.1/Figure 1.1(b): c1 c2 a b r2 c3 d r3 r1

(b) Formal language word corresponding to Figure 1.1(b): c1c2abr2c3dr3r1

Figure 2.1: Representation of a behavioural trace in terms of a formal language word
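The translation of Figure 2.1 is a simple relabelling and can be sketched in a few lines; the action and letter names below are illustrative, not taken from the thesis:

```python
def trace_to_word(trace, letter_of):
    """Translate a behavioural trace into a formal-language word: every
    action becomes a letter, and the time of its occurrence becomes the
    letter's position within the word (cf. Figure 2.1)."""
    return "".join(letter_of[action] for action in trace)

print(trace_to_word(["open", "write", "close"],
                    {"open": "o", "write": "w", "close": "c"}))  # owc
```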

(a) Terms in verification: a behaviour class contains a model's behaviour, another model's behaviour, . . . ; each model's behaviour contains a trace, another trace, . . .

(b) Terms in formal languages: a language class contains a language, another language, . . . ; each language contains a word, another word, . . .

Figure 2.2: Terminological relationship between verification and formal languages

are usually not commonly associated with formal languages. Each representation serves as a blueprint for describing words, and each notation is suited to express particular aspects of words in a concise manner; more importantly, different representations allow us to take another point of view on how some properties are expressed notation-wise. The latter can lead to a better understanding of the behavioural properties that we are trying to verify, e.g. how call-/data-stack behaviour and counting properties appear in words, besides other properties such as safety, liveness, etc.

2.1 Introduction

Formal languages address words that are sequences of letters, and formal language theory is concerned with how these sequences can be described by suitable notations, where comparisons are made between the expressiveness of different notations in terms of the letter sequences that are definable within them. The sequences of letters are referred to as words, which can be either of finite or infinite length, and a set of words is said to form a language. Depending on whether a language contains only finite or only infinite words, we speak of a finite language or an infinite language, where we do not consider languages that contain both finite and infinite words. Words of formal languages and formal languages themselves are respectively related to behavioural traces and system behaviours, which we will address in more detail in the following.

2.1.1 Relationship of Formal Languages to Verification

Runtime-verification and model-checking address the verification of system behaviours, which are traces of actions that can be observed during a program’s


Language class | Grammar | Automaton | Logic
ω-context-free languages | ω-context-free grammars, [CG77b] | ω-pushdown automata, [CG77a] | monadic second-order logic with matching relation, [LST95]
ω-visibly-pushdown languages | —¹ | ω-visibly-pushdown automata, [AM04] | monadic second-order logic with call/return-matching relation, [AM04]
ω-regular languages | ω-regular grammars² | Büchi- and Muller-automata, [Büc62, Mul63] | monadic second-order logic, [Büc62]

Figure 2.3: Representations of language classes by automata, grammars and logics for infinite word languages execution, which are directly related to words of formal languages as depicted in Figure 2.2 on the previous page. In the case of runtime-verification, we are addressing only a single trace, and as such, we have established the relationship between words of formal languages and traces in runtime-verification already. Nevertheless, we will later show the similarity between the runtime-verification logic Eagle, which we address in Chapter 3, and grammars of formal languages, which are introduced later in this chapter. Grammars as well as automata are wide-spread notational formalisms for the classification of formal languages, where instances of them can be used as notational tool for denoting specifications or they can be used as a visualisation aid of concrete languages. Whilst formal languages can also be expressed by logics, we only give formal definitions of two fragments of monadic second-order logic in Chapter 4, taken from [B¨ uc62] and [AM04] respectively, which will serve as a translational tool to prove the expressive equivalence between ω-visiblypushdown languages and our corresponding grammatical representation, [BB07]. Since the emphasis of our studies is about behavioural expressiveness, we do not focus more closely on alternative logical representations, except for the runtimeverification logic Eagle that we address in Chapter 3, which might as well be 1

We present a grammatical representation of ω-visibly-pushdown languages in Chapter 4. The expressive equivalence between ω-regular grammars that are given in this thesis and B¨ uchi-automata is trivial, since we are merely swapping terminology, so that states become non-terminals and state transitions are right-linear grammar production rules. A similar grammatical representation, which is less expressive than ω-regular languages though, has been given in [Wol83]. 2

2.2. FORMAL LANGUAGES

37

seen as an extension of context-free grammars. Figure 2.3 on the previous page gives an overview of the language classes over infinite words that we address in the following and it shows their formal definability in terms of automata, grammars and logics.

2.1.2 Chapter Outline

We give a formal definition of finite and infinite languages in Section 2.2, where we also define operations on languages and operations on the words of languages, as well as regular expressions. In Section 2.3, we introduce formal grammars and we define the grammar formalism that forms the basis of Eagle’s rule-definitions in Chapter 3 and is used in Chapter 4 to extend the expressiveness of model-checking.

2.2 Formal Languages

We introduced formal languages, [Tho97, HMU01] and [Sta97a]/[Sta97b], as an alternative representation of traces, or more precisely, we treated the atomic actions appearing on a trace as letters, which when concatenated in the order as they occur on the trace would form a word. Formally, we have to reverse this introduction, and we define words to be the concatenation of letters taken from a finite non-empty set Σ, which is also called the alphabet Σ. The concatenation of two letters a and b, a, b ∈ Σ, is written as ab or as ba, where it is said that a precedes b in the former case and a succeeds b in the latter case. Any sequence of letters of Σ is called a word and we write w to denote an arbitrary word over the alphabet Σ. The length of a word w, which we denote by |w|, is equal to the number of concatenated letters in w, where ε denotes the distinguished empty word of length zero, i.e. the empty word consists of no letters at all. In the following, we also consider words of infinite length, by which we mean that |w| equals the first infinite ordinal ω. The concatenation of two words w1 and w2 , where w1 precedes w2 , is denoted by w1 w2 , whilst the resulting word w of the concatenation depends on the length of w1 : • w = w1 w2 in case w1 is a finite word, i.e. |w1 | < ω, and • w = w1 in case w1 is an infinite word, i.e. |w1 | = ω.


A set of words constitutes a formal language L, and as we wrote earlier, we only address languages that contain only finite or only infinite words, but not both. Languages can be finite or infinite sets of words, which should be noted, but since it makes no difference in this thesis, we use the terms finite language and infinite language to refer to languages of finite and infinite words respectively. The language that contains no words is called the empty language and it is consequently denoted by ∅ in set-theoretic fashion. Languages can be concatenated as well, where the concatenation of two arbitrary languages L1 and L2 is denoted by L1 · L2 and the resulting language L of the concatenation is the set of words w1 w2 for any two w1 ∈ L1 and w2 ∈ L2, which we write as {w1 w2 | w1 ∈ L1, w2 ∈ L2}. Since languages are merely sets, the set operations of union, ∪, and intersection, ∩, can be applied to them as well, whilst for the definition of complementation, written as L̄ for an arbitrary language L, we first have to define the iterated application of concatenation.

The finitely repeated iteration of the concatenation operation on a language L is denoted by L+, where it is defined by the infinite union ⋃_{n≥1} L · … · L (the n-fold concatenation of L), with n ranging over the natural numbers. The Kleene-closure L∗ is a generalisation of the former iteration, defined as {ε} ∪ L+. Considering that the alphabet Σ can be treated as a non-empty language of finite words of length one, the complementation L̄ of an arbitrary finite or infinite language L is given by Σ∗ \ L and Σω \ L respectively, with \ denoting set difference. Expressions that make use of the introduced operations on languages, e.g. {a}∗ · L or L1 ∩ L2, are regular expressions, whose expressiveness coincides with the regular languages over finite and infinite words, respectively. With the formalisms presented so far, it is not possible to denote matching behaviour as we have introduced it earlier, since the formal presentation addressed regular and ω-regular expressions only. Matchings can be expressed by the counted iteration of symbols, where we write aⁿ to denote that a is repeated exactly n times, where n is not bound to a certain number, but instead is treated as a parameter ranging over the natural numbers. Actual matchings can then be given in the form cⁿrⁿ, n ≥ 1, c ∈ Σc and r ∈ Σr, which denotes the n matchings between calls and returns. In the next sections and chapters, alternative language representations are given, most of which exceed the expressiveness of the regular and ω-regular languages. For all representations, we will use the notations L(A), L(G) and L(ϕ) to


denote the languages defined by an arbitrary automaton A, an arbitrary grammar G and an arbitrary logic formula ϕ, respectively.
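The language operations defined above can be made concrete for finite languages of finite words. The following is an illustrative sketch only; the function names and the explicit length bound on the iterated operations (needed because L+ and L∗ are infinite unions) are our own choices, not standard notation:

```python
def concat(L1, L2):
    # L1 · L2 = {w1 w2 | w1 in L1, w2 in L2}
    return {w1 + w2 for w1 in L1 for w2 in L2}

def plus(L, max_len):
    # L+ restricted to words of length <= max_len; the unrestricted
    # iteration would be an infinite union over all n >= 1
    result, layer = set(L), set(L)
    while True:
        layer = {w for w in concat(layer, L) if len(w) <= max_len}
        if layer <= result:
            break
        result |= layer
    return result

def star(L, max_len):
    # L* = {ε} ∪ L+, with "" playing the role of the empty word ε
    return {""} | plus(L, max_len)
```

For example, `star({"a"}, 3)` yields `{"", "a", "aa", "aaa"}`, mirroring the truncated Kleene-closure of the one-letter language {a}.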

2.3 Formal Grammars

In this thesis, formal grammars play an important role in the description of behavioural properties. Grammars were first introduced in a seminal paper by Chomsky, [Cho56], where they were used to formally describe the structure of English sentences. Grammars are word-generation systems, defined as rewriting systems over a finite number of letters and non-terminals, where a finite set of production rules, each a mixture of letters and non-terminals, determines the possible rewriting steps for the derivation of a word that consists of letters only. In the following, we will refer to letters as terminals, a term commonly used in contemporary formal language theory, [HMU01]. We now introduce the most general and most expressive grammars used in this thesis, which will be of importance:

• to describe the structure of behavioural specifications in the runtime-verification logic Eagle, Chapter 3, and
• to denote call- and data-stack behaviour as well as counting properties for applications in model-checking, Chapter 4.

2.3.1 Context-Free Grammars and Their Representations

A grammar generates a finite word by deriving it through successive applications of production rules, each of which substitutes one non-terminal by a sequence of terminals and non-terminals as determined by the grammar’s production rules. We denote non-terminals by capital letters A, B, . . . ∈ V, terminals by lower-case letters a, b, . . . ∈ Σ, and possible production rules are of the form A → α, A ∈ V and α ∈ (Σ ∪ V)∗. Any word is derived from a designated initial symbol S, S ∈ V, where we denote an actual rewriting sequence by S → aA → abB → . . . → w, with a, b ∈ Σ; A, B, S ∈ V and w ∈ Σ∗. In the following, we write d(w) to generally denote a finite derivation sequence S → α → β → . . . → w, with α, β ∈ (V ∪ Σ)∗ and w ∈ Σ∗.


Definition 2.1 (Context-Free Grammars, [HMU01]) A context-free grammar over finite words is a structure G = (V, Σ, P, S), where

• V is a finite set of non-terminals A, B, . . .,
• Σ is a finite set of terminals a, b, . . .,
• V and Σ are disjoint,
• P is a finite set of productions, P ⊆ V × (V ∪ Σ)∗, and
• S ∈ V denotes a designated starting non-terminal.

The language L(G) of a context-free grammar G over finite words is the set of finite words {w | there exists a derivation d(w)}.

Example 2.1 (Grammatical Representation of Data-Stack Behaviour) In Figure 1.4 we depicted a trace-compositional interpretation of data-stack behaviour based on Example 1.2/Figure 1.2. We can encode this behaviour in the form of a context-free grammar G = (V, Σ, P, S), with

• V = {A, S},
• Σ = {a, b, c, d},
• and productions in P:

S → A      S → AS
A → aAa    A → aa
A → bAb    A → bb
A → cAc    A → cc
A → dAd    A → dd

The language L(G) is the set of finite words {w1 w1r · · · wn wnr | n ≥ 1, w1, . . . , wn ∈ Σ+}, where we use the notation wr in order to denote the reversal of the word w. In the following, we will not explicitly state the set of non-terminals V, but instead we assume that every non-terminal which appears in a production of the grammar in question is also in V. Furthermore, we contract productions with the same left-hand side so that, for example, S → A and S → AS is expressed by S → A | AS.
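Since every factor wwr is precisely a non-empty even-length palindrome, membership in the language of Example 2.1 can be decided with a short dynamic program. This is our own illustrative check, not part of the thesis’ formal development:

```python
def is_even_palindrome(w):
    # a word of the form w'w'^r, w' non-empty, is exactly a non-empty
    # even-length palindrome
    return len(w) >= 2 and len(w) % 2 == 0 and w == w[::-1]

def in_L(w):
    # can w be split into one or more even-length palindromes?
    n = len(w)
    ok = [False] * (n + 1)   # ok[i]: prefix w[:i] is such a sequence
    ok[0] = True
    for i in range(1, n + 1):
        ok[i] = any(ok[j] and is_even_palindrome(w[j:i]) for j in range(i))
    return n > 0 and ok[n]
```

For instance, `in_L("aabb")` holds via the split "aa" + "bb", while `in_L("ab")` fails, matching the derivations possible from S.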


2.4 Summary

We gave a definition of formal languages in this chapter and we have introduced operations on languages and on the words of these languages. Additionally, we presented the syntax and semantics of regular expressions, which will serve as a concise notation for regular languages in the following chapters. Finally, we gave a grammatical representation of formal languages that is relevant for understanding Eagle’s rule definitions, and it will serve as the foundation for our expansion of the expressiveness of model-checking.

Chapter 3

Trace Composition in the Runtime-Verification Logic Eagle

Runtime-verification addresses the verification of single system behaviours against their specifications, where the system behaviour is either monitored and verified during the system’s execution, or alternatively, the verification takes place on recorded traces of previous executions of the system under inspection. A trace, which represents the behaviour of a system’s execution, is denoted by a finite sequence of state snapshots of the system while it is executed. A state snapshot basically consists of the system’s variables and their bindings as they occur during runtime. The actual verification can be carried out in two modes: on-line monitoring and off-line monitoring. On-line monitoring describes runtime-verification where the specification is evaluated in parallel to the execution of the program under inspection. In contrast, off-line monitoring is performed on recorded traces of program executions. Both operational modes have their advantages: on-line monitoring enables an immediate response to specification violations, bringing the system back to a safe state, whilst off-line monitoring can make use of all available resources to carry out the verification. As such, on-line monitoring has to be carried out efficiently, so that the monitored system is not significantly slowed down by the verification process, or even worse, its behaviour influenced by latencies due to resources that are taken up by the verification happening in parallel.


Figure 3.1: Runtime-verification overview

We focus on the runtime-verification logic Eagle, [BGHS04b], whose semantics is also formalised as an on-line monitoring calculus, where a fragment of the latter with a decidable runtime-verification problem has been implemented as a Java runtime-verification framework. An overview of the on-line runtime-verification is depicted in Figure 3.1. Eagle allows us to formulate specifications that are very close to grammatical representations. The latter is of special interest, since Eagle is equipped with two trace-composition operators that are capable of expressing grammatical rules, but it is not known whether they can be formulated in terms of each other, i.e. whether one of the operators is actually redundant. In the following, we prove that both operators for composing traces are expressively equivalent, and we provide translations that can be used in the fragment of Eagle that is implemented in the runtime-verification framework. Both trace-composition operators are non-deterministic in the sense that more than one composition of traces may satisfy their specification, which in practical runtime-verification problems can be resource-expensive to evaluate, and thus is undesirable. We introduce deterministic variants of the trace-composition operators and study their expressiveness, where we show that the semantics of the deterministic operators do not exceed Eagle’s expressiveness. We then go on to investigate how suitable the deterministic variants are for actual on-line runtime-verification by providing evaluation rules for them in Eagle’s calculus and look at the complexity in terms of evaluation steps required to verify the deterministic


variants on a trace. As a result, it is shown that half of the mixfix variants evaluate more efficiently than their non-deterministic counterparts, while the other half has the same evaluation complexity as the non-deterministic operators.

3.1 Introduction

Temporal logics are propositional logics with additional operators especially defined for expressing behavioural properties [Pnu77, HPS81, HKP82, Wol83, BB86, BB87]. Popular linear-time temporal logics define two operators to reason over time: “until” and “next”. The “until” operator is satisfied when a certain formula holds until a future time at which another formula becomes true, whereas the “next” operator requires that in the next moment of time a certain formula is true. For fixed-point temporal logics [Koz83, BB87], where fixed-point constructors are part of the logic’s language, it is sufficient to have only the “next” operator in order to define other common temporal-logic operators. Although common temporal logics, like propositional temporal logic, [GPSS80], extended temporal logic, [Wol83], and the modal µ-calculus, [Koz83], are quite expressive, they define no operator analogous to the most common principle in imperative programming: sequential composition. Sequential composition allows one to glue two traces together, where the last state of the first trace overlaps with the first state of the second trace. A sequential composition formula is then satisfied at the start of a trace if the trace can be cut into two sub-traces, overlapping as above, on which its two operands hold respectively. Even though the non-fixed-point logics propositional dynamic logic and compositional temporal logic are equipped with sequential composition [FL79, BKP84], their expressiveness does not rise beyond ω-regular properties.1 This is also the case for interval temporal logic, which is formalised over finite and infinite traces, but which focuses on sequential composition explicitly only in order to increase the readability of formulas [Mos97, Mos00, Mos05]. Trivially, fixed-point logics extended by sequential composition are more expressive than the popular temporal logics, i.e. they can express ω-context-free properties and beyond [MO99, LS02, Lan04].

1 ω-regular expressiveness depends on the interpretation of propositions [CHMP81]. In this thesis, propositions are considered to be local, so the above statement is true. Furthermore, propositional dynamic logic is the propositional variant of dynamic logic [Pra76, Pra79], where sequential composition has already been investigated.


Example 3.1 (Program Behaviour) Consider the following pseudo-code of a program that simply increments the variable i from its initial value 0 until it reaches the value 3:

int i = 0;
while(i < 3)
    i = i + 1;

The behaviour of this program at runtime, which happens to be uniquely defined in this example, is expressed by the trace ⟨{i = 0}, {i = 1}, {i = 2}, {i = 3}⟩, i.e. the program’s behaviour represents a snapshot of the program state at each point of the program’s execution.

Specifications in runtime-verification are almost exclusively formulated in a temporal logic, where their expressiveness ranges from specifying basic liveness and safety properties up to Turing-completeness. While the logic’s expressiveness determines the price to pay for an efficient verification algorithm, e.g. verification with a Turing-complete logic is inherently undecidable, it does not imply that the verification of a specification in a very expressive logic leads to high evaluation costs. From a theoretical point of view, it is desirable to consider logics that are bound to a certain expressiveness, so that upper limits on their evaluation complexity can be derived and worst-case scenarios established before an actual verification is carried out. From a practical point of view, the ability to express specific requirements can be more important than the upper-bound complexity that is associated with the evaluation of requirements of such expressiveness in general. Despite a high expressiveness, the memory and time required to undertake an actual verification depend on the specification itself, where the evaluation may indeed perform much better than the upper complexity bound. Eagle is Turing-complete, which is due to the possibility of formulating infinite recursions with the possibility of passing natural numbers as parameters and performing basic arithmetic operations on them.
As such, it is feasible to encode a two-counter machine, also known as a Minsky-machine, in Eagle; such machines are known to be computationally equivalent to Turing-machines. We briefly present Eagle’s syntax and semantics here, so that we can illustrate Eagle’s expressiveness by examples. Specifications in Eagle are split into two sets: monitoring formulas and rule definitions. The monitoring formulas form the actual specification, where a trace


adheres to the specification if all monitoring formulas are satisfied on that trace. The rule definitions are named formulas with optional parameters that can be used to define recursive formulas, or they can be used as simple syntactic sugar. Both monitoring formulas and rule definitions can make use of Eagle’s operators, which are

• propositional operators (Boolean operators): and (∧), or (∨), negation (¬),
• temporal operators: next (#), previous ( ), and
• trace-composition operators: sequential composition ( ; ), concatenation (·),

and atomic formulas can be

• Boolean constants: true (True), false (False),
• expressions:
  – abstract expressions without further definition, (expression),
  – Presburger-arithmetic expressions over runtime-environment integer variables, e.g. x < 5, (x ∗ 3) ≤ (y + 1) or x = y,
  – Java expressions that are evaluated in Eagle’s Java-implementation, e.g. queue.getLength() <= 10.

For the contributions of this chapter, i.e. the investigation of the expressiveness of Eagle’s trace-composition operators and their deterministic variants, we only use Presburger-arithmetic expressions, while we use Java expressions when giving practical examples. The examples in the following are to be considered informal, in the sense that we do not follow the exact syntax of Eagle as we formally define it later on, but rather list monitoring formulas and rule definitions line by line. A small example is given in the following.

Example 3.2 (Safety and Liveness in Eagle) Safety and liveness properties are expressible as reachability properties, where a safety property denotes that a bad system-state is never reached and a liveness property denotes that a good system-state is eventually reached. We use the named rules Deadlock() and Terminate() for expressing that a deadlock has occurred and that the system under inspection has terminated respectively, but we do not give formal definitions in order to keep this example concise and


accessible. A specification of safety and liveness properties in Eagle is then given by:

min Eventually(Form F) = F ∨ #Eventually(F)
mon Safety = ¬Eventually(Deadlock())
mon Liveness = Eventually(Terminate())

The specification is structured as follows:

• min Eventually(Form F) defines a fixed-point formula with a parameter F. The fixed-point formula is satisfied when F is true in the current evaluation step; otherwise the evaluation is repeated on the next state of the trace,
• mon Safety denotes that it is not the case that eventually a deadlock occurs on the trace,
• mon Liveness denotes that eventually the system under inspection terminates.

Whilst all traces in runtime-verification are considered to be finite traces, the liveness property is still of practical interest. For example, at the end of every trace, a propositional variable could be used to indicate whether the system stopped regularly, i.e. Terminate() evaluates to True, or whether the system stopped irregularly, i.e. Terminate() evaluates to False.
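The unfolding of min Eventually(Form F) = F ∨ #Eventually(F) over a finite trace can be mimicked in a few lines. This is a sketch of the intended semantics, not Eagle’s implementation; the dictionary encoding of states and the stand-in predicates for Deadlock() and Terminate() are our own assumptions:

```python
def eventually(pred, trace):
    # min rule: F ∨ #Eventually(F); the minimal fixed point makes the
    # unfolding default to False once the finite trace is exhausted
    if not trace:
        return False
    return pred(trace[0]) or eventually(pred, trace[1:])

trace = [{"deadlock": False}, {"deadlock": False}, {"terminated": True}]

# mon Safety  = ¬Eventually(Deadlock())
safety = not eventually(lambda s: s.get("deadlock", False), trace)
# mon Liveness = Eventually(Terminate())
liveness = eventually(lambda s: s.get("terminated", False), trace)
```

On the example trace both monitors succeed: no state is a deadlock state, and the final state records termination.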

3.1.1 Verifying Trace-Composition Properties

Trace-composition properties express the relationship of two or more properties regarding their position on the trace under inspection. In this thesis, we address trace-composition operators in Eagle that have been studied in various other contexts before. Since the common temporal logics that are used in verification, e.g. propositional temporal logic, extended temporal logic and the modal µ-calculus, are not equipped with trace-composition operators, we provide some practical examples of trace-composition properties first. The effect of a trace-composition operator can be seen from two perspectives, one of which interprets how the operands are satisfied on the trace, while the more common view is to assume that the trace under inspection is actually decomposed into two sub-traces that satisfy the operands respectively. In the latter case, we speak of the composition operator cutting the trace under inspection into two sub-traces.

Sequential composition: a trace models p ; q if it decomposes into a left sub-trace that models p and a right sub-trace that models q, where the two sub-traces overlap in one state.

Concatenation: a trace models p · q if it decomposes into a left sub-trace that models p and a right sub-trace that models q, without any overlap.

Figure 3.2: Illustration of trace cuts by sequential composition and concatenation

The various cuts that are possible on a trace distinguish the operators we are investigating here: sequential composition splits the trace so that the last state of the first sub-trace overlaps with the first state of the second sub-trace, whilst concatenation defines the cut of the trace so that there is no overlapping part, but the two sub-traces glued together form the original trace again (see Figure 3.2). Sequential composition and concatenation are closely related, as was already pointed out in [CHMP81], where concatenation was simply defined in terms of sequential composition.2 This cannot be done in Eagle so easily, which is shown in detail in Section 3.3. Here, we only illustrate the use of trace-composition operators by addressing the trace examples of Chapter 1, which demonstrate the expressive power of Eagle’s concatenation operator. Since we do not make use of parametrised rule definitions, the presented languages are expressively equivalent to context-free languages. For example, the following rule

min main() = c1 · getFilename() · r1

can be considered a syntactic variant of the grammatical production

Amain → c1 · AgetFilename · r1

where c1 and r1 denote terminals and Amain and AgetFilename denote non-terminals.

2 Chandra et al. referred to them as “chop” and “chomp” respectively.
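The two cuts of Figure 3.2 can be phrased operationally: a naive evaluator simply tries every cut position, which already hints at the evaluation cost of these operators. A hedged sketch under our own trace representation (states as sets of propositions; the function names are ours):

```python
def concat_holds(f1, f2, trace):
    # F1 · F2: some cut with no overlapping state
    return any(f1(trace[:k]) and f2(trace[k:]) for k in range(len(trace) + 1))

def seq_holds(f1, f2, trace):
    # F1 ; F2: the two sub-traces share the state at the cut point
    return any(f1(trace[:k + 1]) and f2(trace[k:]) for k in range(len(trace)))

# example operand formulas: "p holds everywhere" / "q holds everywhere"
all_p = lambda t: all("p" in s for s in t)
all_q = lambda t: all("q" in s for s in t)
```

For instance, `seq_holds(all_p, all_q, [{"p"}, {"p", "q"}, {"q"}])` succeeds because the middle state, which satisfies both p and q, can serve as the overlap, whereas on `[{"p"}, {"q"}]` no such overlapping state exists.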


Example 3.3 (Call- and Data-Stack Behaviour and Counting Properties in Eagle) The examples of call-stack behaviour, data-stack behaviour and counting properties are all expressible in Eagle, where we can elegantly formulate the specifications using the logic’s concatenation operator:

Call-stack behaviour: (see Figure 1.1/Example 1.1 in Chapter 1)

min main() = c1 · getFilename() · r1
min getFilename() = c2 · getWorkingDirectory() · concat() · r2
min getWorkingDirectory() = a
min concat() = b
min open() = c3 · append() · r3
min append() = d
mon CallStackBehaviour = main()

The specification represents a rewriting system, where names of functions are interpreted as placeholders which are substituted by their function-body behaviours. The function behaviours are modelled as sequences of program actions, where the concatenation operator separates them temporally.

Data-stack behaviour: (see Figure 1.2/Example 1.2 in Chapter 1)

min Stack() = Empty() ∨ a · Stack() · a ∨ b · Stack() · b ∨ c · Stack() · c ∨ d · Stack() · d

We only gave an excerpt of a trace in Figure 1.2, and as such, we cannot give a complete specification of a data-stack behaviour here. However, with the rule definition min Stack(), we have expressed a specification that can match the part of the shown trace. We use the rule Empty(), which only holds on the empty trace, to stop recursions of min Stack(). For example, the evaluation of a · Stack() · a can be reduced to a · Empty() · a, which in turn becomes a · a.

Counting properties: (see Figure 1.3/Example 1.3 in Chapter 1)

min grantConstraint() = Empty() ∨ r · grantConstraint() · g ∨ r · grantConstraint() · g · grantConstraint() · g


Similar to the specification of data-stack behaviour, we can express the counting property that an occurrence of a letter r is followed by either one or two matching occurrences of g by straightforward context-free grammar-like rules as shown above. Both sequential composition and concatenation are non-deterministic operators, in the sense that the operands may be satisfied by several cuts at different positions on a given trace, which means that the evaluation of those operators is potentially costly, since with every state it has to be examined whether a cut could be made so that operands are satisfied respectively. We extend Eagle with deterministic variants of the concatenation and sequential composition operators, where these variants impose a restriction on either the left or right operand so that the cut point defines either the shortest or longest possible satisfiable cut trace. We provide some examples of specifications that benefit from these new operators.
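To make this evaluation cost tangible, consider checking the grantConstraint() rules of Example 3.3 against a finite word over {r, g}: a direct matcher must try every possible cut point for the composed sub-formulas. The following memoised sketch is our own encoding of the rules, not Eagle’s evaluation mechanism:

```python
from functools import lru_cache

def matches(word):
    # grantConstraint() = Empty ∨ r·G·g ∨ r·G·g·G·g, with G the rule itself
    @lru_cache(maxsize=None)
    def g(lo, hi):
        if lo == hi:                                   # Empty()
            return True
        if word[lo] != 'r' or word[hi - 1] != 'g':
            return False
        if g(lo + 1, hi - 1):                          # r · G · g
            return True
        for m in range(lo + 1, hi - 1):                # r · G · g · G · g:
            if word[m] == 'g' and g(lo + 1, m) and g(m + 1, hi - 1):
                return True                            # try every middle cut
        return False
    return g(0, len(word))
```

The inner loop over every candidate position m is exactly the non-determinism of the concatenation operator made explicit: each cut that could separate the two recursive occurrences must be examined.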

Figure 3.3: Traces of a fail-safe system

Example 3.4 (Trace-Length Restricted Operators (Mixfix Operators) in Eagle) Consider a fail-safe system that, on the occurrence of an error, eventually resets itself and enters a predefined “good” system state. In Figure 3.3(a), an acceptable observation trace is depicted, where “ok” denotes that the system is in a good state, “err” denotes the occurrence of an error and “rst” denotes a reset of the system. We allow a reset to occur with a finite delay after an error has occurred. We can formulate this behaviour by the following specification:

max ErrHandler(Form F) = ⌈F⌉ · (⌊err ∧ Eventually(rst)⌋ · ErrHandler(F))
mon FailSafe = ErrHandler(Always(ok))


The formulas ⌈F1⌉ · F2 and ⌊F1⌋ · F2 are constrained variants of the concatenation formula F1 · F2, where the cut has to be placed so that there is no longer or shorter sub-trace satisfying F1, respectively. For the specification above this means that a trace without erroneous behaviour is completely labelled with “ok”s. If an error occurs, i.e. a state labelled by “err” occurs, then the good behaviour resumes after a reset, i.e. “rst”. While this specification can also be written without formulas of the form ⌈F1⌉ · F2 and ⌊F1⌋ · F2, specifications without the new operators are not necessarily as succinct as specifications that make use of ⌈F1⌉ · F2 and ⌊F1⌋ · F2, and furthermore, they will incur a significant monitoring-cost penalty for the compositional recodings.

Example 3.5 (Conditional Concatenation in Eagle) We introduce a conditional concatenation operator, ⌊F1⌋ →· F2, based on the operator ⌊F1⌋ · F2 which we used in the previous example. Let ⌊F1⌋ →· F2 be the syntactic abbreviation for ¬(⌊F1⌋ · True) ∨ (⌊F1⌋ · F2). Informally, ⌊F1⌋ →· F2 can be interpreted as “as soon as F1 is satisfied, do F2 afterwards” – provided that F2 is not True, since in that case the formula is a tautology. Consider a nested locking pattern, where we wish to detect when a thread t takes a lock l1 and does not release it until t has taken a different lock l2, after which we verify another property ϕ. Using the newly defined operator, we can formulate the corresponding specification as

Always(⌊lock(t, l1) ∧ Until(¬release(t, l1), lock(t, l2) ∧ l1 ≠ l2)⌋ →· ϕ)

The latter specification is not an Eagle monitoring formula, since data parametrisation in Eagle is bound to evaluating the current state.
However, we can formulate a semantically equivalent monitoring formula in Eagle:

mon NestedLck = Always(isLock() → Nested(getThread(), getLock()))

with the rule definition

max Nested(int t, int l) = ⌊Until(¬release(t, l), isLock() ∧ getLock() ≠ l)⌋ →· ϕ

Since Eagle is implemented in Java, we rely on the methods isLock(), getLock(), getThread() and release() with the obvious semantics, and we use integers as handles for threads and locks. It should be noted that getLock() returns the last lock obtained by the current thread, so that its return value when called in the monitoring formula NestedLck and its return value when called in the rule Nested(. . .) eventually differ. The shortest trace-length restriction in the concatenation formula of the rule


definition Nested(. . .) ensures that we match the first occurrence of a newly obtained lock, i.e. the rule parameter l and the return value of getLock() differ, where it is also ensured that the previous lock is not released yet.
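Under one natural reading of the restricted operators, ⌊F1⌋ · F2 commits to the shortest prefix satisfying F1, ⌈F1⌉ · F2 to the longest, and the conditional variant holds vacuously when no prefix satisfies F1 at all. The following is a hedged sketch of this reading over finite traces, not Eagle’s calculus rules; the function names are ours:

```python
def shortest_cut(f1, f2, trace):
    # ⌊F1⌋ · F2: cut after the shortest prefix satisfying F1
    for k in range(len(trace) + 1):
        if f1(trace[:k]):
            return f2(trace[k:])
    return False

def longest_cut(f1, f2, trace):
    # ⌈F1⌉ · F2: cut after the longest prefix satisfying F1
    for k in range(len(trace), -1, -1):
        if f1(trace[:k]):
            return f2(trace[k:])
    return False

def cond_concat(f1, f2, trace):
    # ⌊F1⌋ →· F2  ≡  ¬(⌊F1⌋ · True) ∨ (⌊F1⌋ · F2)
    return (not shortest_cut(f1, lambda t: True, trace)
            or shortest_cut(f1, f2, trace))
```

Note that each deterministic variant inspects at most one cut once a satisfying prefix is found, in contrast to the unrestricted operators, which may have to examine every cut position.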

3.1.2 Calculus and Implementation

An operational description for evaluating an Eagle monitoring formula on traces is given in the form of a calculus, which is suitable for on-line monitoring in the sense that it does not require any backtracking. To evaluate a specification that contains more than one monitoring formula, each monitoring formula is evaluated by a separate instance, where the specification is satisfied if and only if the evaluation of every instance results in True. However, the calculus is only an alternative representation of Eagle’s semantics, so that the evaluation of an arbitrary trace is indeed undecidable due to Eagle being Turing-complete. The actual implementation of Eagle, which is derived from the calculus, overcomes the undecidability problem by working on a syntactic subset of the logic, so that infinite recursions cannot occur and any inspection of a finite trace eventually terminates. This decidable runtime-verification fragment of Eagle restricts occurrences of rules to fall within the scope of a temporal operator, so that an infinite recursion during the evaluation of a particular state on the trace cannot occur, [ABG+05]. Operands of previous operators are evaluated immediately, where the result is then carried forward to the future so that no backtracking is necessary, and the evaluation of operands of next operators is simply postponed to the future. As such, the evaluation always proceeds to a next state, and since any trace under inspection is of finite length, the evaluation will terminate eventually. In the following sections we will introduce new operators to Eagle, where we also show that these operators can be formulated in terms of current Eagle formulas, and additionally, that these formulas fall into the fragment of Eagle just described. It is therefore possible to carry the results of this chapter forward to the actual implementation of Eagle, where we later address this point again by taking a closer look at the efficiency of such an approach.
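One way to see why no backtracking is needed is to evaluate by formula rewriting: each state turns the current formula into a residual obligation on the rest of the trace. The following hypothetical mini-calculus is our own illustration of that idea, not Eagle’s actual evaluation rules; the tuple encoding, the rule names ev/alw and the end-of-trace defaults are all assumptions:

```python
TT, FF = ('tt',), ('ff',)

def step(f, s):
    # rewrite formula f against state s into an obligation on the rest
    op = f[0]
    if op in ('tt', 'ff'):
        return f
    if op == 'prop':
        return TT if f[1](s) else FF
    if op == 'next':
        return f[1]                     # obligation postponed by one state
    if op in ('and', 'or'):
        a, b = step(f[1], s), step(f[2], s)
        unit, absorb = (TT, FF) if op == 'and' else (FF, TT)
        if absorb in (a, b):
            return absorb
        if a == unit:
            return b
        if b == unit:
            return a
        return (op, a, b)
    if op == 'ev':    # min Eventually(F) = F ∨ #Eventually(F)
        return step(('or', f[1], ('next', f)), s)
    if op == 'alw':   # max Always(F) = F ∧ #Always(F)
        return step(('and', f[1], ('next', f)), s)

def at_end(f):
    # residual obligations at trace end: min rules default to False,
    # max rules to True; a pending (strong) next cannot be met, and an
    # unevaluated proposition on the empty trace defaults to False
    op = f[0]
    if op in ('tt', 'alw'):
        return True
    if op in ('ff', 'next', 'ev', 'prop'):
        return False
    if op == 'or':
        return at_end(f[1]) or at_end(f[2])
    return at_end(f[1]) and at_end(f[2])   # 'and'

def monitor(f, trace):
    # single forward pass over the trace: no backtracking ever occurs
    for s in trace:
        f = step(f, s)
    return at_end(f)
```

Because step only ever produces an obligation on the remaining trace, the evaluation proceeds strictly forward, mirroring the property of the calculus described above.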

3.1.3

Chapter Outline

A formal definition of Eagle is given in Section 3.2. In Section 3.3 it is proven that sequential composition and concatenation are definable in terms of each other. In Section 3.4 Eagle is extended with deterministic cut operators, and it is shown that those operators are definable in unextended Eagle. In Section 3.5 Eagle's calculus is extended by rules for the deterministic cut operators, and it is proven that the asymptotic space complexity of on-line monitoring for the variants with restrictions applied to the left operand is no worse than the asymptotic space complexity of their sub-formulas. Section 3.6 summarises this chapter.

3.2

The Runtime-Verification Logic Eagle

Eagle is a temporal logic that is tailored to meet the needs of specification designers by allowing one to formalise structured specifications similar to modularised programming languages. Named rule definitions can be used to define custom operators or to assign meaningful names to sub-formulas. In fact, Eagle's syntax and semantics are very similar to those of imperative and functional programming languages, so that specifications in Eagle are almost self-explanatory,3 as we have demonstrated in the previous examples. Eagle is a temporal logic based on recursively defined temporal predicates (rules) with four primitive temporal operators: # (next), ⊙ (previously), · (concatenation) and ; (sequential composition). Formally:

Definition 3.1 (Syntax of Eagle, [BGHS04b]) Specifications in Eagle are formed by a pair ⟨D, O⟩, where D is the declaration part and O the observer part. Rule definitions R define named parametrised rules N. Monitors M specify the requirements.

D ::= R∗

O ::= M ∗

R ::= {min | max} N (T1 x1 , . . . , Tn xn ) = F

M ::= mon N = F

T ::= Form | primitive type

F ::= False | True | xi | expression | ¬F | F1 ∧ F2 | F1 ∨ F2 | # F | ⊙ F | F1 · F2 | F1 ; F2 | N (F1 , . . . , Fn )

Names of rule definitions as well as monitoring formulas are unique, i.e. there exists only one rule definition or monitoring formula for any name N in an arbitrary specification ⟨D, O⟩. 

Formulas are evaluated over discrete finite traces of observation states. A sequence of states s1 , s2 , . . . , sn constitutes a trace σ of length |σ| = n. In order

3 Apparently, any specification can be obfuscated to a degree where it is not self-explanatory.


to keep track of the positions on the trace, states will be enumerated incrementally starting with one. σ [i,j] then denotes the sub-trace si , si+1 , . . . , sj of a trace σ. For sub-traces, the numbering of states will again begin from one. We write σ(i) to denote the i-th state of the trace. The empty trace, i.e. the trace of length 0, is abbreviated as ε.

Definition 3.2 (Semantics of Eagle, [BGHS04b]) For a trace σ = s1 s2 . . . s|σ| , the satisfiability relation σ, i |=D F , with 0 ≤ i ≤ |σ| + 1, is defined as

σ, i |=D True
σ, i ⊭D False
σ, i |=D expression   iff  1 ≤ i ≤ |σ| and evaluate(expression)(σ(i)) == true
σ, i |=D ¬F1          iff  σ, i ⊭D F1
σ, i |=D F1 ∧ F2      iff  σ, i |=D F1 and σ, i |=D F2
σ, i |=D F1 ∨ F2      iff  σ, i |=D F1 or σ, i |=D F2
σ, i |=D #F           iff  i ≤ |σ| and σ, i + 1 |=D F
σ, i |=D ⊙F           iff  1 ≤ i and |σ| ≥ 1 and σ, i − 1 |=D F
σ, i |=D F1 · F2      iff  ∃j. i ≤ j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j,|σ|] , 1 |=D F2
σ, i |=D F1 ; F2      iff  ∃j. i < j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j−1,|σ|] , 1 |=D F2
σ, i |=D N (F1 , . . . , Fn ) iff
    if 1 ≤ i ≤ |σ| then
        σ, i |=D F [F1 /x1 , . . . , Fn /xn ], where (N (T1 x1 , . . . , Tn xn ) = F ) ∈ D
    if i = 0 or i = |σ| + 1 then
        if (max N (T1 x1 , . . . , Tn xn ) = F ) ∈ D then σ, i |=D True,
        if (min N (T1 x1 , . . . , Tn xn ) = F ) ∈ D then σ, i |=D False

 Whenever D follows from the context, we write |= instead of |=D .

It should be noted that at trace boundaries, i.e. the absent states at index 0 and |σ| + 1, only the logical constant True and rules defined as max evaluate to true, while expressions and trace-composition operators evaluate to False. Also, once the trace has been left, i.e. a step has been made onto the boundary of the trace, it is possible to step back into the trace, but stepping beyond the boundary (stepping to indices −1 and |σ| + 2) evaluates to False.

A specification ⟨D, O⟩ is satisfied by a trace σ if all monitoring formulas of the specification are satisfied on σ. Each monitoring formula is evaluated from position one, regardless of the trace length. A trace is said to model a specification if the specification is satisfied by the trace. The latter we denote by σ |= ⟨D, O⟩.

Definition 3.3 (Trace-Checking, [BGHS04b]) A given trace σ satisfies a specification ⟨D, O⟩ if all monitoring formulas hold on the trace from position one, i.e.

σ |= ⟨D, O⟩ iff ∀(mon N = F ) ∈ O. σ, 1 |=D F. 

In the remainder of the thesis, the rule max Limit() = False is assumed to be part of every specification. It evaluates to True on the boundaries of a trace, i.e. when the current state index is either 0 or |σ| + 1; otherwise it is False.
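To make Definitions 3.2 and 3.3 concrete, the following is a minimal reference evaluator. It is a hypothetical sketch and not Eagle's actual implementation: formulas are nested Python tuples, states are dictionaries of propositions (standing in for `expression`), and rule applications are omitted. The index i = 0 is treated as failing for the composition operators, in line with the boundary behaviour described above; on the empty trace, position 1 is the boundary index |σ| + 1.

```python
def sat(sigma, i, f):
    """Decide sigma, i |= f for a finite trace (list of state dicts), i in 0..len(sigma)+1."""
    n = len(sigma)
    op = f[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "prop":  # stands in for `expression`: only evaluated on actual states
        return 1 <= i <= n and bool(sigma[i - 1].get(f[1], False))
    if op == "not":
        return not sat(sigma, i, f[1])
    if op == "and":
        return sat(sigma, i, f[1]) and sat(sigma, i, f[2])
    if op == "or":
        return sat(sigma, i, f[1]) or sat(sigma, i, f[2])
    if op == "next":  # #F
        return i <= n and sat(sigma, i + 1, f[1])
    if op == "prev":  # previous-state operator
        return i >= 1 and n >= 1 and sat(sigma, i - 1, f[1])
    if op == "cat":   # F1 . F2: disjoint sub-traces sigma[1,j-1] / sigma[j,n]
        if i < 1:
            return False
        return any(sat(sigma[:j - 1], i, f[1]) and sat(sigma[j - 1:], 1, f[2])
                   for j in range(i, n + 2))
    if op == "seq":   # F1 ; F2: sub-traces sigma[1,j-1] / sigma[j-1,n] share one state
        if i < 1:
            return False
        return any(sat(sigma[:j - 1], i, f[1]) and sat(sigma[j - 2:], 1, f[2])
                   for j in range(i + 1, n + 2))
    raise ValueError(op)
```

For instance, on a one-state trace on which both p and q hold, p ; q is satisfied but p · q is not, since concatenation does not let the two sub-traces share a state, while on the empty trace True · True holds and True ; True does not.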

3.3

Interdefinability of Sequential Composition and Concatenation

Interdefinability of operators, i.e. the expressibility of one operator in terms of another via syntactic transformation, simplifies definitions, proofs and implementations of a logic. A proof or implementation then only has to focus on one of the operators, and the obtained results can be carried forward to the other. For the logic Eagle, we show that sequential composition and concatenation are equally expressive, by proving that sequential composition can be syntactically formulated in terms of concatenation and vice versa. In Section 3.3.1 below we define sequential composition recursively in terms of concatenation. The other direction, however, is not so straightforward: Section 3.3.2 outlines our elimination procedure and argues its correctness.

3.3.1

Sequential Composition in Terms of Concatenation

A sequential composition formula F1 ; F2 can be expressed in terms of concatenation by simulating the former operator's semantics using a fixed-point rule definition. The coding is as follows: the semantics of the sequential composition operator is modelled by guessing a cut on the trace, measuring – figuratively speaking – the length of the left sub-trace, and then starting the right sub-trace with a leading gap that is one state shorter than the length of the left sub-trace. We have visualised this translation in Figure 3.4.

[Figure 3.4: Sequential composition expressed by concatenation. A trace that models p ; q decomposes into a left sub-trace that models p and an overlapping right sub-trace that models q. In terms of concatenation, a left sub-trace of length n models p and is followed by a right sub-trace that models q with a leading gap of n − 1 states.]

For the translation itself, which undertakes the length measuring, we define and add the new rule min SequentialComposition(Form F1 , Form F2 ) to every specification. The sequential composition operator can then be removed from arbitrary formulas by substituting each sub-formula of the form F1 ; F2 with an application of the rule SequentialComposition(F1 , F2 ), where the rule is given as:

min SequentialComposition(Form F1 , Form F2 ) =
    (((F1 ∧ #Limit()) · True) ∧ (Limit() · (F2 ∧ Limit()))) ∨ #SequentialComposition(⊙F1 , F2 )

We defined the rule SequentialComposition(F1 , F2 ) as a minimal fixed-point, so that it is not satisfied on the empty trace or on the boundaries of a trace. This behaviour coincides with the semantics of the sequential composition operator. For non-empty traces, the first application of the rule body splits the trace, so that F1 is evaluated on a sub-trace with its boundary in the next state and F2 is evaluated on a sub-trace with its boundary in the previous state. Hence, the evaluations of F1 and F2 overlap at the index at which the rule is evaluated. Additionally, the rule body contains a recursion #SequentialComposition(⊙F1 , F2 ) that


repeats the just described splitting of the trace, but now the sub-trace boundary for F1 is shifted one index to the right, and likewise, the evaluation of F2 begins one index later. The recursion finally terminates when the boundary of the trace on which SequentialComposition(F1 , F2 ) was first invoked is reached.

Theorem 3.1 (Sequential Composition in Terms of Concatenation) For every formula F of Eagle, we can give a semantically equivalent formula F ′ of Eagle, where F ′ contains no sequential composition sub-formula.

Proof. Sequential composition can be expressed by concatenation, such that every occurrence of F1 ; F2 is substituted by SequentialComposition(F1 , F2 ), where

min SequentialComposition(Form F1 , Form F2 ) =
    (((F1 ∧ #Limit()) · True) ∧ (Limit() · (F2 ∧ Limit()))) ∨ #SequentialComposition(⊙F1 , F2 )

For the empty trace, neither the sequential composition formula nor the rule application holds: the sequential composition operator needs at least one shared state on which its operands hold, whilst the translation cannot hold on the empty trace since it is defined as a minimal fixed-point formula. On an arbitrary non-empty trace σ, the rule body can be expanded to the disjunction

∨_{n=0}^{|σ|−i−1} #^n (((⊙^n F1 ∧ #Limit()) · True) ∧ (Limit() · (F2 ∧ Limit())))

where i denotes the index at which the rule is evaluated. Let k denote a fixed value of n, i.e. 0 ≤ k < |σ| − i; then the sub-formula (⊙^k F1 ∧ #Limit()) · True is evaluated at i + k, which denotes that F1 is evaluated at i on the non-empty sub-trace σ [1,i+k] . For the same k, the sub-formula Limit() · (F2 ∧ Limit()) causes the evaluation of F2 on the non-empty sub-trace σ [i+k,|σ|] , i.e. the sub-traces overlap as expected. 

This result can be carried forward to any arbitrary Eagle specification by subsequently replacing occurrences of sequential composition formulas – from innermost sub-formulas to outermost sub-formulas.


[Figure 3.5: Concatenation expressed by sequential composition (simplified). A trace that models p · q decomposes into a left sub-trace that models p and a disjoint right sub-trace that models q. In terms of sequential composition, a left sub-trace for p is followed by a separating mock-trace and a right sub-trace for q.]

3.3.2

Concatenation in Terms of Sequential Composition

Concatenation can be expressed in terms of sequential composition as well. However, due to the semantics of the concatenation operator, there is no single substitution mechanism that substitutes all occurrences of concatenation sub-formulas with equivalent sequential composition sub-formulas. For concatenation, one or even both operands can hold on the empty trace, while sequential composition requires that its operands hold on sub-traces of non-zero length. A substitution of a concatenation sub-formula with an equivalent sequential composition formula therefore has to take into account that one or both of the concatenation's operands might hold on the empty trace. Depending on which of the two operands can hold on the empty trace, a different sequential composition formula has to be substituted. A simplified visualisation of the translation is given in Figure 3.5, for the case where neither of the concatenation formula's operands holds on the empty trace. In the following, it will be proven that for a given Eagle formula it can be determined whether it holds on the empty trace (Lemma 3.1). From this result it follows that concatenation is expressible in terms of sequential composition, such that for each combination of concatenation operands which may or may not hold on the empty trace, a suitable sequential composition formula can be substituted (Theorem 3.2). We show that it is sufficient to inspect an Eagle formula syntactically, in


order to verify whether it would be satisfied on the empty trace or not. More importantly, rule applications do not have to be substituted by their rule bodies at any point, which would otherwise lead to the undecidability of the problem. The latter is due to the possible encoding of a Turing machine or an equivalent device in Eagle.4

Lemma 3.1 (Decidability of Trace-Checking the Empty Trace) For an arbitrary formula in Eagle, it is decidable whether it is satisfiable on the empty trace.

Proof. For an arbitrary formula we can inductively determine whether it holds on the empty trace or not.

Base case:
• False does not hold on the empty trace,
• True does hold on the empty trace,
• expression does not hold on the empty trace,
• N (. . .), with (min N (T1 x1 , . . . , Tn xn ) = F ) ∈ D, is not satisfied on the empty trace,
• N (. . .), with (max N (T1 x1 , . . . , Tn xn ) = F ) ∈ D, is satisfied on the empty trace.

Inductive step:
• F1 ∨ F2 is satisfied on the empty trace, if either F1 or F2 or both are satisfied on the empty trace,
• F1 ∧ F2 is satisfied on the empty trace, if F1 and F2 are satisfied on the empty trace,
• ¬F is satisfied on the empty trace, if F is not satisfied on the empty trace,
• #F and ⊙F are not satisfied on the empty trace, since it is not possible to step beyond a trace's boundaries,
• F1 · F2 is satisfied on the empty trace, if both F1 and F2 are satisfied on the empty trace,

4 It is in fact straightforward to implement a Minsky machine, which is Turing complete [Min61], in Eagle.

• F1 ; F2 is not satisfied on the empty trace, since at least one state is required to denote the overlap of the sub-traces.

For an arbitrary formula F we can therefore decide whether it holds on the empty trace or not by considering F 's sub-formulas that match the base cases above, and then expanding these results to larger sub-formulas using the inductive steps until F is covered. 

In the proof of the following theorem, we give a translation from any formula F1 · F2 to an equivalent concatenation-free formula, which is parametrised by which of the operands F1 and F2 are satisfiable on the empty trace.

Theorem 3.2 (Concatenation in Terms of Sequential Composition) For every formula F of Eagle, we can give a semantically equivalent formula F ′ of Eagle, where F ′ contains no concatenation sub-formula.

Proof. Concatenation can be expressed by sequential composition, such that every occurrence of F1 · F2 is substituted by

ψ                                      iff ε, 1 |= ¬F1 ∧ ¬F2 ,
ψ ∨ F1                                 iff ε, 1 |= ¬F1 ∧ F2 ,
ψ ∨ (⊙Limit() ∧ F2 )                   iff ε, 1 |= F1 ∧ ¬F2 ,
ψ ∨ F1 ∨ (⊙Limit() ∧ F2 ) ∨ Limit()    iff ε, 1 |= F1 ∧ F2 ,

where ψ ≡ (F1 ; (#²Limit() ; F2 )) ∨ (#(F1 ∧ Limit()) ; (#²Limit() ; F2 )).

We first show that the translation is semantically equivalent to the original formula for the case where neither of the concatenation formula's operands holds on the empty trace. It is then sufficient to address only the additional disjuncts of the remaining cases.

Case ε, 1 |= ¬F1 ∧ ¬F2 : The formula ψ denotes that F1 and F2 can either be separated by introducing a separating mock-trace as shown in Figure 3.5, which is due to F1 ; (#²Limit() ; F2 ), or it is the case that F1 is evaluated on the right boundary of its sub-trace because of Eagle's semantics,5 which is expressed by #(F1 ∧ Limit()) ; (#²Limit() ; F2 ). Since neither F1 nor F2 holds on the empty trace, the shortest trace which can satisfy F1 · F2 has to be of length 2. When F1 · F2 is evaluated at index

5 The argument F1 is evaluated on the right boundary of its sub-trace when we set i = j in the semantics of σ, i |=D F1 · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j,|σ|] , 1 |=D F2 .


i, 1 ≤ i ≤ |σ|, then according to the semantics of concatenation, the trace is either split as σ [1,i] /σ [i+1,|σ|] or σ [1,i−1] /σ [i,|σ|] , which is reflected by ψ, as described above.

Case ε, 1 |= ¬F1 ∧ F2 : When F2 is satisfiable on the empty trace, the semantics of concatenation allows an additional split σ [1,|σ|] /σ [|σ|+1,|σ|] . Hence, the evaluation of F2 can be omitted completely, so that only F1 is evaluated, as denoted by the additional disjunct.

Case ε, 1 |= F1 ∧ ¬F2 : If F1 is satisfiable on the empty trace, it cannot simply be omitted. According to the semantics of concatenation, F1 is evaluated only for i = 1, where the trace is then split as σ [1,0] /σ [1,|σ|] . As such, it is necessary to verify that F2 is evaluated at i = 1, which is the case exactly when ⊙Limit() evaluates to True. The formula representing this semantics is therefore ⊙Limit() ∧ F2 .

Case ε, 1 |= F1 ∧ F2 : Trivially, this case includes the previous cases. Additionally, both formulas F1 and F2 are satisfied on the empty trace, so that F1 · F2 also holds on the empty trace, i.e. ε, 1 |= F1 · F2 . The latter is modelled by the disjunct Limit(). 

Again, this result can be carried forward to any arbitrary Eagle specification by subsequently replacing occurrences of concatenation formulas – from innermost sub-formulas to outermost sub-formulas.
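The inductive procedure of Lemma 3.1 is purely syntactic and can be sketched as a recursion over the formula structure. The tuple-based AST encoding below is hypothetical; min and max rule applications are represented only by a marker, since by the lemma their bodies never need to be unfolded:

```python
def holds_on_empty(f):
    """Decide ε, 1 |= f by structural recursion, without unfolding rule bodies."""
    op = f[0]
    if op == "true":
        return True
    if op in ("false", "prop"):
        return False
    if op == "min_rule":          # min rules evaluate to False at trace boundaries
        return False
    if op == "max_rule":          # max rules evaluate to True at trace boundaries
        return True
    if op == "not":
        return not holds_on_empty(f[1])
    if op == "and":
        return holds_on_empty(f[1]) and holds_on_empty(f[2])
    if op == "or":
        return holds_on_empty(f[1]) or holds_on_empty(f[2])
    if op in ("next", "prev"):    # cannot step beyond the boundaries of ε
        return False
    if op == "cat":               # both operands must hold on the empty trace
        return holds_on_empty(f[1]) and holds_on_empty(f[2])
    if op == "seq":               # needs at least one shared state
        return False
    raise ValueError(op)
```

The case analysis of Theorem 3.2 can then select the appropriate sequential composition formula for each concatenation sub-formula by calling this check on its two operands.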

3.4

Deterministic Cut Operators

Both sequential composition and concatenation allow a trace to be split non-deterministically, i.e. due to the semantics of the operators, several cut positions may satisfy a formula F1 ; F2 or F1 · F2 on a given trace. The designer of a monitoring specification may however desire a unique position of the cut, i.e. a deterministic choice of where a trace is being cut. In the following, mixfix operators are introduced which allow us to express deterministic cuts in specifications. These operators extend sequential composition and concatenation by additionally verifying that there is no shorter, respectively longer, sub-trace on which the sub-formula holds. It is shown that all deterministic cut operators can be formulated in unextended Eagle. Even though the operators do not increase Eagle’s expressiveness, we show in Section 3.5 that the new operators enable more efficient on-line monitoring.


3.4.1

Syntax and Semantics of Deterministic Cut Operators

Eagle with deterministic cut operators extends the syntax of Definition 3.1. For brevity, just the new BNF production F is given; the other productions are left unchanged.

Definition 3.4 (Syntax of Eagle with Mixfix Operators) Eagle[] denotes an extension of Eagle with additional mixfix operators, where the production F of Definition 3.1 is replaced by

F ::= False | True | xi | expression | ¬F | F1 ∧ F2 | F1 ∨ F2 | # F | ⊙ F |
      F1 ◦ F2 | ⌊F1 ⌋ ◦ F2 | ⌈F1 ⌉ ◦ F2 | F1 ◦ ⌊F2 ⌋ | F1 ◦ ⌈F2 ⌉ | N (F1 , . . . , Fn )
◦ ::= ; | ·


For an operand F of a concatenation or sequential composition formula, we write ⌊F ⌋ and ⌈F ⌉ to denote that F is only satisfied on the shortest, respectively longest, of all the sub-traces that satisfy the unrestricted F . In the following, we refer to ⌊F ⌋ and ⌈F ⌉ as the minimally and maximally trace-length restricting formulas, respectively. As with the definition of Eagle[] 's syntax, only the extensions to Eagle's semantics are given.

Definition 3.5 (Semantics of Eagle with Mixfix Operators) On traces σ = s1 s2 . . . s|σ| the satisfiability relation σ, i |=D F , with 0 ≤ i ≤ |σ| + 1, is extended by

σ, i |=D ⌊F1 ⌋ · F2  iff ∃j. i ≤ j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j,|σ|] , 1 |=D F2
                         and ¬∃k. i − 1 ≤ k < j − 1 and σ [1,k] , i |= F1
σ, i |=D ⌈F1 ⌉ · F2  iff ∃j. i ≤ j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j,|σ|] , 1 |=D F2
                         and ¬∃k. j ≤ k ≤ |σ| and σ [1,k] , i |= F1
σ, i |=D F1 · ⌊F2 ⌋  iff ∃j. i ≤ j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j,|σ|] , 1 |=D F2
                         and ¬∃k. j < k ≤ |σ| + 1 and σ [k,|σ|] , 1 |= F2
σ, i |=D F1 · ⌈F2 ⌉  iff ∃j. i ≤ j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j,|σ|] , 1 |=D F2
                         and ¬∃k. 1 ≤ k < j and σ [k,|σ|] , 1 |= F2
σ, i |=D ⌊F1 ⌋ ; F2  iff ∃j. i < j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j−1,|σ|] , 1 |=D F2
                         and ¬∃k. i ≤ k < j − 1 and σ [1,k] , i |= F1
σ, i |=D ⌈F1 ⌉ ; F2  iff ∃j. i < j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j−1,|σ|] , 1 |=D F2
                         and ¬∃k. j ≤ k ≤ |σ| and σ [1,k] , i |= F1
σ, i |=D F1 ; ⌊F2 ⌋  iff ∃j. i < j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j−1,|σ|] , 1 |=D F2
                         and ¬∃k. j ≤ k ≤ |σ| and σ [k,|σ|] , 1 |= F2
σ, i |=D F1 ; ⌈F2 ⌉  iff ∃j. i < j ≤ |σ| + 1 and σ [1,j−1] , i |=D F1 and σ [j−1,|σ|] , 1 |=D F2
                         and ¬∃k. 1 ≤ k < j − 1 and σ [k,|σ|] , 1 |= F2 
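As an illustration of the first clause of Definition 3.5, the following brute-force sketch decides σ, i |= ⌊F1⌋ · F2 by exhaustive search over cut positions. The encoding is hypothetical: each operand is given as a Python predicate over a sub-trace and a position, rather than as an Eagle formula.

```python
def sat_min_left_cat(sigma, i, f1, f2):
    """sigma, i |= |F1| . F2: some cut j satisfies both operands, and no earlier
    cut k (with i - 1 <= k < j - 1) yields a left sub-trace already satisfying F1."""
    n = len(sigma)
    for j in range(i, n + 2):
        if (f1(sigma[:j - 1], i) and f2(sigma[j - 1:], 1)
                and not any(f1(sigma[:k], i) for k in range(i - 1, j - 1))):
            return True
    return False
```

The minimality side condition is what makes the cut deterministic: once the shortest left sub-trace satisfying F1 is fixed, F2 has no choice of cut left, so a satisfiable F1 with an incompatible F2 makes the whole formula fail rather than falling back to a longer left sub-trace.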

We depict three applications of the deterministic cut operators in Figure 3.6 below, where we show the evaluation of σ, 1 |= Eventually(err) ; ⌈rst⌉, of σ, 1 |= ϕ and of σ, 1 |= ψ on an example trace σ, with

ϕ ≡ ⌊Eventually(err ∧ Eventually(rst))⌋ · ok
ψ ≡ True · ⌈⌊err ∧ Eventually(rst)⌋ · ok⌉

[Figure 3.6: Examples of deterministic cut operator applications.]

It should be noted that while the formula Eventually(err) ; ⌈rst⌉ is satisfied on the example trace, this is not the case for the formula Eventually(err) ; ⌈ok⌉. The longest sub-trace on which "ok" is satisfied is the whole trace, i.e. due to the "ok" at index one of the trace, but for that cut the left-hand formula Eventually(err) does not hold.

Alternative Deterministic Cuts

We have presented mixfix operators whose cut positions are deterministically determined by the formula to which the trace-length restriction applies. Apparently,

[Figure 3.7: An alternative deterministic cut operator. On an example trace of "err" and "rst" events, Eventually(err) ·lm rst is satisfied, whilst ⌊Eventually(err)⌋ · rst is not.]

other modes of deterministic cut operators can be devised, of which we discuss one possible alternative in the following. We consider the naïve approach where the cut position is deterministically chosen among all possible non-deterministic cuts which satisfy the unconstrained trace-composition operators. In Figure 3.7, we have depicted an alternative deterministic cut operator, ·lm , which shares its semantics with Eagle's concatenation operator, except that for a formula F1 ·lm F2 only the leftmost cut that satisfies the formula F1 · F2 is chosen. We reconsider our example from the previous page about the occurrences of errors and resets ("err" and "rst" above). As depicted, Eventually(err) ·lm rst is satisfied on the trace under consideration, whilst the formula ⌊Eventually(err)⌋ · rst does not hold on the given trace.

The drawback of this approach is clearly the high complexity of evaluating such semantics, since we have to assume that a cut can be made at any index of the trace, and only at the end of the trace can we then pick the leftmost cut position for which both operands evaluate to True. As such, whilst the cut position is determined deterministically, the evaluation of the cut does not perform any better than the evaluation of the respective non-deterministic operator. A trace-length restricted formula, on the other hand, permits the computation of the deterministic cut position independently of the second operand, and can therefore be evaluated at the same computational cost or even more efficiently, as we show in the proofs of Theorem 3.6 and Theorem 3.7 respectively.
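The difference between ·lm and the minimally restricting operator can be reproduced with a small brute-force sketch (hypothetical encoding, operands given as Python predicates over a sub-trace and a position): both search the same cuts, but ·lm merely commits to the leftmost cut among those satisfying both operands, whereas ⌊F1⌋ · F2 additionally demands that F1 fails on every shorter left sub-trace.

```python
def cuts(sigma, i, f1, f2):
    """All cut positions j satisfying the unrestricted F1 . F2 at position i."""
    n = len(sigma)
    return [j for j in range(max(i, 1), n + 2)
            if f1(sigma[:j - 1], i) and f2(sigma[j - 1:], 1)]

def sat_leftmost_cat(sigma, i, f1, f2):
    # F1 .lm F2: satisfied iff some cut works; the leftmost such cut is then chosen
    return len(cuts(sigma, i, f1, f2)) > 0

def sat_min_restricted_cat(sigma, i, f1, f2):
    # |F1| . F2: the chosen cut must additionally be minimal for F1 alone
    return any(not any(f1(sigma[:k], i) for k in range(i - 1, j - 1))
               for j in cuts(sigma, i, f1, f2))
```

On a trace mirroring the err/rst example, the leftmost-cut operator succeeds where the minimally restricting one fails, because the shortest prefix satisfying Eventually(err) does not leave a suffix starting with rst.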


3.4.2


Expressiveness of Deterministic Cut Operators

It is not straightforward to see whether the mixfix variants of sequential composition and concatenation can also be defined in Eagle. In the following it is shown that Eagle[] is no more expressive than Eagle. The translation of the maximal mixfix operators into Eagle is given first, followed by the translation of the minimal mixfix operators. For the maximal mixfix operators, we will use the rules6

min NonMtMxLT(Form F1 , Form F2 ) =
    (((F1 ∧ #Limit()) · F2 ) → ¬((F1 ∧ Eventually(#²Limit())) · True))
    ∨ #NonMtMxLT(⊙F1 , F2 ),

min NonMtMxRT(Form F1 , Form F2 ) =
    ((F1 · (F2 ∧ #Limit())) → ¬(True · (F2 ∧ Eventually(#²Limit()))))
    ∨ #NonMtMxRT(F1 , F2 ),

min NonMtMxOvrlpngLT(Form F1 , Form F2 ) =
    (((F1 ∧ #Limit()) ; F2 ) → ¬((F1 ∧ Eventually(#²Limit())) ; True))
    ∨ #NonMtMxOvrlpngLT(⊙F1 , F2 ),

min NonMtMxOvrlpngRT(Form F1 , Form F2 ) =
    ((F1 ; (F2 ∧ #Limit())) → ¬(True ; (F2 ∧ Eventually(#²Limit()))))
    ∨ #NonMtMxOvrlpngRT(F1 , F2 ),

in order to denote the semantics of ⌈F1 ⌉ · F2 , F1 · ⌈F2 ⌉, ⌈F1 ⌉ ; F2 and F1 ; ⌈F2 ⌉ on non-empty traces, respectively. Each of the above rule definitions ensures that there is no longer non-empty left or right sub-trace on which the respective trace-length restricted formula (F1 or F2 ) is also satisfied. For example, the rule application NonMtMxOvrlpngLT(F1 , F2 ) denotes that when a cut is made on which F1 ; F2 is satisfied, then there is no future cut point on whose left-hand sub-trace F1 is satisfied. As such, NonMtMxOvrlpngLT(F1 , F2 ) coincides with the semantics of ⌈F1 ⌉ ; F2 , which is proven in the following pages. Before we give the formal proof, we outline the semantics of NonMtMxLT(F1 , F2 ); the semantics of the remaining rules can be explained similarly. When NonMtMxLT(F1 , F2 ) is substituted for an occurrence of the formula

6 NonMtMxLT spells out as NonEmptyMaximalLeftTrace, etc.


⌈F1 ⌉ · F2 , the first invocation of its rule body will cause a cut of the form

(((F1 ∧ #Limit()) · F2 ) → ¬((F1 ∧ Eventually(#²Limit())) · True))

The sub-formula (F1 ∧ #Limit()) · F2 denotes that the cut is enforced so that the right boundary of the left sub-trace immediately follows the current index at which the rule body is evaluated. The implication's conclusion, ¬((F1 ∧ Eventually(#²Limit())) · True), then ensures that F1 is not satisfied on any future sub-trace, i.e. a sub-trace for which the cut is made further to the right. The rule body enters a recursion due to the disjunct #NonMtMxLT(⊙F1 , F2 ), so that the rule is evaluated for all positions at which ⌈F1 ⌉ · F2 would be evaluated. With each recursion step, the cut is moved one index further to the right, and the recursion terminates as soon as the boundary of the trace under inspection is reached.

Whilst the maximal sequential composition mixfix operators can be directly expressed by the given rules, this is not the case for the maximal concatenation mixfix operators. Since ⌈F1 ⌉ · F2 and F1 · ⌈F2 ⌉ could be satisfiable on the empty trace, their corresponding rules in the respective translations have to be accompanied by a formula that explicitly handles the operands holding on the empty trace. With these rule definitions and the extra handling for the concatenation operators, the maximal mixfix operators can be expressed in Eagle as

⌈F1 ⌉ · F2 ≡ (((F1 ∧ Limit()) · F2 ) → ¬((F1 ∧ ¬Limit()) · True)) ∨ NonMtMxLT(F1 , F2 )
F1 · ⌈F2 ⌉ ≡ ((F1 · (F2 ∧ Limit())) → ¬(True · (F2 ∧ ¬Limit()))) ∨ NonMtMxRT(F1 , F2 )
⌈F1 ⌉ ; F2 ≡ NonMtMxOvrlpngLT(F1 , F2 )
F1 ; ⌈F2 ⌉ ≡ NonMtMxOvrlpngRT(F1 , F2 )

Theorem 3.3 (Maximal Trace-Length Operators are Syntactically Expressible in Eagle) For each of the formulas ⌈F1 ⌉ · F2 , F1 · ⌈F2 ⌉, ⌈F1 ⌉ ; F2 and F1 ; ⌈F2 ⌉ of Eagle[] there exists a semantically equivalent formula in Eagle.


Proof. (Sketch) The semantic equivalence is shown for ⌈F1 ⌉ · F2 and its translation only; the proof can be carried forward to the remaining equivalences in a straightforward manner. For an arbitrary trace σ, the translation

(((F1 ∧ Limit()) · F2 ) → ¬((F1 ∧ ¬Limit()) · True)) ∨ NonMtMxLT(F1 , F2 )

of ⌈F1 ⌉ · F2 can be rewritten as the formula

(((F1 ∧ Limit()) · F2 ) → ¬((F1 ∧ ¬Limit()) · True))                                           (1)
∨ ∨_{n=0}^{∞} #^n (((⊙^n F1 ∧ #Limit()) · F2 ) → ¬((⊙^n F1 ∧ Eventually(#²Limit())) · True))   (2)

in which the fixed-point equation NonMtMxLT(F1 , F2 ) is fully expanded. In the disjunction over n, it is sufficient to reduce the upper bound from ∞ to |σ|, since the operand of the disjunction evaluates to False as soon as the evaluation goes beyond the trace due to the #^n formula prefix. The expanded formula logically spells out the semantics of ⌈F1 ⌉ · F2 , where

(1) makes sure that when F1 holds on the empty trace, then it does not hold on any longer sub-trace than this, and

(2) ensures that when F1 holds on a non-empty trace of length n, then there is no longer trace (i.e., a trace of length n + x, with x ≥ 1 being the trace-length of the Eventually(. . .) sub-formula). 

For the minimal mixfix operators, the translations are much simpler. Minimal cut operators can be formulated in terms of non-deterministic cut operators by adding a constraint which ensures that the trace-length restricted formula is not satisfied on any cut which yields a shorter sub-trace. For example, the left-minimal trace-length restriction as in ⌊F1 ⌋ ; F2 can be rewritten as (F1 ∧ ϕ) ; F2 , if ϕ guarantees that for the chosen cut point F1 does not hold on a shorter sub-trace. In Eagle, we can express ϕ in terms of a parametrised rule where F1 is passed as an argument and the rule body's semantics express that F1 does not hold on a shorter sub-trace. In fact, we introduce two rules which are satisfied when there is a shorter non-empty sub-trace of the current trace under


inspection on which F is satisfied, and we use their negations to ensure that there is no shorter such sub-trace. The rule min LeftShorterNonEmptyTrace(Form F ) holds when there is a shorter non-empty sub-trace on which F is satisfied, where the trace length is reduced by successively removing states at the end of the sub-trace, whilst the rule min RightShorterNonEmptyTrace(Form F ) holds when there is such a shorter non-empty sub-trace obtained by successively removing states at the beginning of the sub-trace. Formally, the rules are defined as follows:

min LeftShorterNonEmptyTrace(Form F ) =
    ((F ∧ #Limit()) · #True) ∨ #LeftShorterNonEmptyTrace(⊙F )

min RightShorterNonEmptyTrace(Form F ) =
    (#Limit() · F ) ∨ #RightShorterNonEmptyTrace(F )

In the actual translation, it is then sufficient to verify that the restricted sub-formula cannot be satisfied on a shorter sub-trace. For example, ⌊F1 ⌋ · F2 becomes (F1 ∧ ¬LeftShorterNonEmptyTrace(F1 )) · F2 , which reflects the semantics of ⌊F1 ⌋ · F2 under the assumption that F1 is not satisfied on the empty trace. Since for mixfix concatenation formulas the trace-length restricted formula can also be satisfied on the empty trace, we have to add a formula to the translations which explicitly addresses this. Concretely, the translations are as follows:

⌊F1 ⌋ · F2 ≡ ((F1 ∧ Limit()) · F2 ) ∨ (((F1 ∧ ¬LeftShorterNonEmptyTrace(F1 )) · F2 ) → ¬((F1 ∧ Limit()) · True))
F1 · ⌊F2 ⌋ ≡ (F1 · (F2 ∧ Limit())) ∨ ((F1 · (F2 ∧ ¬RightShorterNonEmptyTrace(F2 ))) → ¬(True · (F2 ∧ Limit())))
⌊F1 ⌋ ; F2 ≡ (F1 ∧ ¬LeftShorterNonEmptyTrace(F1 )) ; F2
F1 ; ⌊F2 ⌋ ≡ F1 ; (F2 ∧ ¬RightShorterNonEmptyTrace(F2 ))

Theorem 3.4 (Minimal Trace-Length Operators are Syntactically Expressible in Eagle) For each of the formulas ⌊F1 ⌋ · F2 , F1 · ⌊F2 ⌋, ⌊F1 ⌋ ; F2 and F1 ; ⌊F2 ⌋ of Eagle[] there exists a semantically equivalent formula in Eagle.

Proof. (Sketch) Semantic equivalence is only shown for the first translation, of ⌊F1 ⌋ · F2 ; the proof can easily be carried forward to the remaining translations.

For an arbitrary trace σ, we substitute ⌊F1 ⌋ · F2 by the formula

((F1 ∧ Limit()) · F2 ) ∨ (((F1 ∧ ¬LeftShorterNonEmptyTrace(F1 )) · F2 ) → ¬(F1 ∧ Limit() · True))

and then expand the rule LeftShorterNonEmptyTrace(F1 ) recursively, so that we are left with the formula

((F1 ∧ Limit()) · F2 )   (1)
∨ (((F1 ∧ ¬ ∨_{n=0}^{∞} #^n ((F1 ∧ #Limit()) · #True)) · F2 ) → ¬(F1 ∧ Limit() · True))   (2)

The expanded formula's semantics are as follows: (1) denotes that F1 holds on the empty trace, and as such, there is no shorter sub-trace on which F1 can be satisfied; or it is the case that (2) implies that the previous formula is not satisfied (because the current formula under consideration would then not hold on the shortest sub-trace) and that for the chosen cut point there is no shorter left-hand non-empty sub-trace on which F1 is satisfied. The latter constraint is formalised by the negated disjunction ∨_{n=0}^{∞} #^n ((F1 ∧ #Limit()) · #True), where the right-hand formula #True ensures that the sub-trace on which the F1 under the big disjunction is evaluated is always at least one state shorter than the sub-trace itself. 
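The recursive rule definition unfolds into the big disjunction used in this proof by the usual fixed-point iteration. A sketch of this unfolding, abbreviating the disjunct (F1 ∧ #Limit()) · #True as A:

```latex
\begin{align*}
\mathit{LeftShorterNonEmptyTrace}(F_1)
  &= A \vee \#\,\mathit{LeftShorterNonEmptyTrace}(F_1)\\
  &= A \vee \#\bigl(A \vee \#\,\mathit{LeftShorterNonEmptyTrace}(F_1)\bigr)\\
  &= A \vee \#A \vee \#\#A \vee \cdots
   \;=\; \bigvee_{n=0}^{\infty} \#^{n} A,
  \quad\text{where } A \equiv (F_1 \wedge \#\mathit{Limit}()) \cdot \#\mathit{True}.
\end{align*}
```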

3.5 On-line Monitoring of Deterministic Cut Operators

In [BGHS03], a calculus for Eagle was presented that directly defines an on-line monitoring algorithm in which observation states are consumed on a step-by-step basis, in tandem with a partial evaluation of the monitoring formula. Here, Eagle's calculus is extended by rules that encode the semantics of the mixfix operators of Eagle[] . For the calculus of Eagle[] , we establish that the asymptotic space complexity of on-line monitoring for the variants with restrictions applied to the left operand is no worse than the asymptotic space complexity of the sub-formulas. For the operators with restrictions applied to the right operand, we show that the space complexity coincides with that of the corresponding non-deterministic operators. The extended calculus allows an efficient evaluation which cannot be achieved by substituting appearances of mixfix operators by their semantically equivalent Eagle formulas. For example, in the extended calculus the evaluation of F1 ; ⌊F2 ⌋ takes |σ| applications of on-line monitoring evaluation steps, while the semantically equivalent Eagle formula F1 ; (F2 ∧ ¬RightShorterNonEmptyTrace(F2 )) already takes |σ|² applications of on-line monitoring evaluation steps due to the evaluation of the sequential composition operator in the formula, plus the evaluation steps for the rule RightShorterNonEmptyTrace(F2 ).

3.5.1 Eagle's On-Line Monitoring Algorithm

The evaluation calculus presented in [BGHS03] uses four functions, inithh. . .ii, eval hh. . .ii, updatehh. . .ii and valuehh. . .ii, each of which carries out a designated task during the monitoring of a trace. A short description of each function follows. First, a formula is initialised using inithh. . .ii, which substitutes occurrences of the next- and previous-state temporal operators as well as rule applications by named formulas whose arguments keep track of all possible non-deterministic evaluations of those sub-formulas. For example, a sub-formula F in the scope of the previous-state operator will be rewritten as Previous(inithhF, Z, b′ii, valuehhinithhF, Z, b′iiii), where the first argument keeps track of the evaluation of the current state in case there is a #-operator in a sub-formula of F that undoes the effect of the previous-state operator, whilst the second argument remembers the truth value of F in the current state for the calculus' future evaluation. Second, eval hh. . .ii evaluates the resulting formula in the current state, where, third, updatehh. . .ii takes care of occurrences of Previous(. . .) so that a history of states does not need to be stored. Each evaluation step, i.e. each application of eval hh. . .ii, corresponds to one state on the trace: the first application of eval hh. . .ii refers to the first state of the trace under inspection, the second application to the second state, and so on. Fourth, valuehh. . .ii determines the truth value of the verification at the boundaries of a trace. The boundary of a trace is obviously reached after each state has

been inspected by eval hh. . .ii, but valuehh. . .ii is also applied at the boundaries of sub-traces which arise from occurrences of Eagle's sequential composition and concatenation operators. In the following, ρb.F (b) is a closed term which denotes a fixed point, such that ρb.F (b) = F (ρb.F (b)), where b represents the recursion variable. Furthermore, named operators are introduced. The named operators are functions of some type Form × . . . × Form → Form, such that it is possible to rewrite a formula during evaluation. Rules are assumed to have their parameters ordered by type, in the form N (Form F1 , . . . , Form Fm , primitive type x1 , . . . , primitive type xn ) = F . W.l.o.g. all definitions can be rewritten into this form by simply reordering a rule's arguments. The arguments are then written as two vectors F~ and P~ of types Form~ and T~ respectively. Similarly to the rewriting of #, each rule N is rewritten as N : Form × T~ → Form during initialisation, where the first argument denotes a recursive application of the rule body of N .

Definition 3.6 (Runtime-Verification Calculus of Eagle, [BGHS03]) A monitoring formula F holds on a trace σ = s1 s2 . . . s|σ| iff

valuehheval hh. . . eval hheval hhinithhF, null, nullii, s1 ii, s2 ii . . ., s|σ| iiii

evaluates to True, where null denotes a special element that is not equivalent to any other formula of Eagle. Instances of the vector types Form~ and T~ are denoted by hF1 , . . . , Fn i and hp, . . . , ri respectively. For both vector types, ~∅ denotes the empty vector. The rules for the temporal operators and temporal predicates are:

inithhα, Z, b′ii = α, where α ∈ {expression, True, False}
inithh#F , Z, b′ii = Next(inithhF, Z, b′ii)
inithh F , Z, b′ii = Previous(α, valuehhαii), where α = inithhF, Z, b′ii
inithhF1 ◦ F2 , Z, b′ii = inithhF1 , Z, b′ii ◦ inithhF2 , Z, b′ii, where ◦ ∈ {∧, ∨}
inithhF1 ◦ F2 , Z, b′ii = inithhF1 , Z, b′ii ◦ inithhF2 , Z, b′ii, where ◦ ∈ {·, ; }
inithhN (F~ , P~ ), N (F~ , P~ ′), b′ii = N (b′, P~ )
inithhN (F~ , P~ ), Z, b′ii = N (ρb.inithhF [F̂ /F~ ], N (F~ , P~ ), bii, P~ ), where F̂ = inithhF~ , Z, b′ii and Z ≢ N (F~ , P~ ′), where P~ ′ is an arbitrary vector


valuehhexpressionii = False
valuehhTrueii = True
valuehhFalseii = False
valuehhF1 ◦ F2 ii = valuehhF1 ii ◦ valuehhF2 ii, where ◦ ∈ {∧, ∨}
valuehhNext(F )ii = F , if at the beginning of the trace; False, if at the end of the trace or |σ| = 0
valuehhPrevious(F, F̂ )ii = False, if at the beginning of the trace or |σ| = 0; valuehhF ii, if at the end of the trace
valuehhF1 · F2 ii = valuehhF1 ii ∧ valuehhF2 ii
valuehhF1 ; F2 ii = False
valuehhN (F~ , P~ )ii = True, if (max N (. . .)) ∈ R; False, otherwise

eval hhTrue, sii = True
eval hhFalse, sii = False
eval hhexpression, sii = True, if expression is True in s; False, otherwise
eval hhF1 ◦ F2 , sii = eval hhF1 , sii ◦ eval hhF2 , sii, where ◦ ∈ {∧, ∨}
eval hhNext(F ), sii = updatehhF, s, null, nullii
eval hhPrevious(F, F̂ ), sii = eval hhF̂ , sii
eval hhF1 · F2 , sii = if valuehhF1 ii = True then (α · F2 ) ∨ eval hhF2 , sii else α · F2 , where α = eval hhF1 , sii
eval hhF1 ; F2 , sii = if valuehhαii = True then (α ; F2 ) ∨ eval hhF2 , sii else α ; F2 , where α = eval hhF1 , sii
eval hhN (ρb′.F (b′), P~ ), sii = eval hhF (ρb′.F (b′))[eval hhP~ , sii/p~ ], sii

updatehhα, s, Z, b′ii = α, where α ∈ {expression, True, False}
updatehhF1 ◦ F2 , s, Z, b′ii = updatehhF1 , s, Z, b′ii ◦ updatehhF2 , s, Z, b′ii, where ◦ ∈ {∧, ∨}
updatehhNext(F ), s, Z, b′ii = Next(updatehhF, s, Z, b′ii)
updatehhPrevious(F, F̂ ), s, Z, b′ii = Previous(updatehhF, s, Z, b′ii, eval hhF, sii)
updatehhF1 ◦ F2 , s, Z, b′ii = updatehhF1 , s, Z, b′ii ◦ F2 , where ◦ ∈ { ; , ·}
updatehhα, s, α, b′ii = N (b′, P~ ), where α ≡ N (ρb.F (b), P~ )
updatehhα, s, F̂ , Zii = N (ρb′.updatehhF (ρb′.F (b′)), s, α, P~ ii, P~ ), where α ≡ N (ρb.F (b), P~ ) and Z ≢ N (F~ , . . .)

The rules for propositional constants and operators are defined in the obvious way. 

We provide two examples: one shows the evaluation of the temporal operators # and the previous-state operator, the other demonstrates the evaluation of a rule. For the first example, we have chosen a trace of length one, where the proposition p evaluates to True in the trace's sole state. In the second example we focus on the evaluation of a rule, using a trace of length two.

Evaluating # p on h{p}i:

1. valuehheval hhinithh# p, null, nullii, s1 iiii
2. valuehheval hhNext(inithh p, null, nullii), s1 iiii

For the next step, inithh p, null, nullii is rewritten to Previous(inithhp, null, nullii, valuehhinithhp, null, nulliiii). The first parameter of Previous(. . .) stores the formula that would be evaluated after the previous-state operator. The second parameter stores the past, which at this point refers to the left boundary of the trace.

3. valuehheval hhNext(Previous(inithhp, null, nullii, valuehhinithhp, null, nulliiii)), s1 iiii
4. valuehheval hhNext(Previous(p, valuehhpii)), s1 iiii
5. valuehheval hhNext(Previous(p, False)), s1 iiii

6. valuehhupdatehhPrevious(p, False), s1 , null, nulliiii

When Next(. . .) is evaluated, it is rewritten to updatehh. . .ii. The last two parameters of updatehh. . .ii are only used in conjunction with the evaluation of rules, so they can be ignored in this example. updatehh. . .ii rewrites the arguments of occurrences of Previous(. . .), since, due to the next operator, the previous state is now shifted one state towards the end of the trace. Hence, the first parameter of Previous(. . .), which was used to store the initialised argument of the previous-state operator, is evaluated in this state, and the result of this evaluation is placed in the second parameter.

7. valuehhPrevious(updatehhp, s1 , null, nullii, eval hhp, s1 ii)ii

valuehh. . .ii now refers to the right boundary of the trace. Hence, the second parameter of Previous(. . .), which stores the result of the last state, determines the outcome of valuehh. . .ii.

8. valuehhPrevious(p, True)ii
9. valuehhTrueii
10. True

Evaluating Eventually(p) on h{}, {p}i: Before the actual evaluation can be described, the rule Eventually(. . .) has to be defined. Eventually(F ) denotes that F holds eventually, i.e. F evaluates to True in some state before the boundary of the trace is reached. Formally,

min Eventually(Form F1 ) = F1 ∨ #Eventually(F1 )

In order to evaluate this rule in the calculus, it has to be rewritten into the proposed normal form, where the parameters are ordered. This ordering requires parameters of type Form to come before parameters of primitive type. Since there are no parameters of primitive type associated with Eventually(F ), the transformed rule has an empty vector for its non-existent primitive parameters and thus becomes Eventually(hF i, ~∅).

1. valuehheval hheval hhinithhEventually(hpi, ~∅), null, nullii, s1 ii, s2 iiii

When initialising a rule, it is substituted by a recursive application of its definition. Since a rule can be unfolded infinitely often, its recursion variable b is bound by ρb.F (b, . . .). F (b, . . .) is initialised too, where the substituted rule's name and the recursion variable are passed as parameters to inithh. . .ii. The latter parameters will be used to prevent an infinite unfolding, so that the rule body is indeed only unfolded once.

2. valuehheval hheval hhEventually(ρb.inithh(F1 ∨ #Eventually(hF1 i, ~∅))[hpi/hF1 i], Eventually(hpi, ~∅), bii, ~∅), s1 ii, s2 iiii
3. valuehheval hheval hhEventually(ρb.inithhp ∨ #Eventually(hpi, ~∅), Eventually(hpi, ~∅), bii, ~∅), s1 ii, s2 iiii
4. valuehheval hheval hhEventually(ρb.(inithhp, Eventually(hpi, ~∅), bii ∨ inithh#Eventually(hpi, ~∅), Eventually(hpi, ~∅), bii), ~∅), s1 ii, s2 iiii

5. valuehheval hheval hhEventually(ρb.(p ∨ Next(inithhEventually(hpi, ~∅), Eventually(hpi, ~∅), bii)), ~∅), s1 ii, s2 iiii

inithh. . .ii has now reached a point where another unfolding is prevented, such that the unfolding of the rule body is only done once. This is recognised by the matching first two parameters of inithh. . .ii. The third parameter b is then used to refer back to the recursive application ρb.F (b, . . .), such that the formula is closed and ready for evaluation in the next state.

6. valuehheval hheval hhEventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), s1 ii, s2 iiii
7. valuehheval hheval hhp ∨ Next(Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅)), s1 ii, s2 iiii
8. valuehheval hheval hhp, s1 ii ∨ eval hhNext(Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅)), s1 ii, s2 iiii
9. valuehheval hhFalse ∨ updatehhEventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), s1 , null, nullii, s2 iiii

Updating Eventually(. . .) has no effect on the formula other than substituting b with b′, because the rule body contains no previous-state operator. As in initialisation, the rule body is unfolded only once. This is realised by passing down Eventually(. . .) and the recursion variable, as has already been done using inithh. . .ii.


10. valuehheval hhFalse ∨ Eventually(ρb′.updatehhp ∨ Next(Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅)), s1 , Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), b′ii, ~∅), s2 iiii
11. valuehheval hhFalse ∨ Eventually(ρb′.(updatehhp, s1 , Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), b′ii ∨ updatehhNext(Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅)), s1 , Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), b′ii), ~∅), s2 iiii
12. valuehheval hhFalse ∨ Eventually(ρb′.(p ∨ Next(updatehhEventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), s1 , Eventually(ρb.(p ∨ Next(Eventually(b, ~∅))), ~∅), b′ii)), ~∅), s2 iiii
13. valuehheval hhFalse ∨ Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), s2 iiii
14. valuehheval hhFalse, s2 ii ∨ eval hhEventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), s2 iiii

At this point it is possible to choose how to proceed, i.e. either to continue by evaluating valuehh. . .ii or its argument. Here the latter is chosen, because it saves a derivation step; both approaches lead to the same result.

15. valuehhFalse ∨ eval hhp ∨ Next(Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅)), s2 iiii
16. valuehhFalse ∨ eval hhp, s2 ii ∨ eval hhNext(Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅)), s2 iiii
17. valuehhFalse ∨ True ∨ updatehhEventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), s2 , null, nulliiii
18. valuehhFalse ∨ True ∨ Eventually(ρb″.updatehhp ∨ Next(Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅)), s2 , Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), b″ii, ~∅)ii

19. valuehhFalse ∨ True ∨ Eventually(ρb″.(updatehhp, s2 , Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), b″ii ∨ updatehhNext(Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅)), s2 , Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), b″ii), ~∅)ii
20. valuehhFalse ∨ True ∨ Eventually(ρb″.(p ∨ Next(updatehhEventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), s2 , Eventually(ρb′.(p ∨ Next(Eventually(b′, ~∅))), ~∅), b″ii)), ~∅)ii
21. valuehhFalse ∨ True ∨ Eventually(ρb″.(p ∨ Next(Eventually(b″, ~∅))), ~∅)ii

Finally, the truth value of the rule at the right boundary of the trace is determined. Since the corresponding rule Eventually(. . .) is declared as minimal, valuehh. . .ii evaluates to False for the remaining rule application.

22. valuehhFalse ∨ True ∨ Falseii
23. True
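The named-operator machinery above serves one purpose: partially evaluating the monitoring formula state by state. The same idea can be sketched, outside Eagle's notation, as a small formula-progression monitor; the tuple representation and function names below are our own illustration, not the literal calculus.

```python
# A minimal formula-progression sketch of the idea behind init/eval/update/
# value (our own simplified illustration, NOT the literal Eagle calculus):
# the monitoring formula is partially evaluated against one observation
# state at a time, and the verdict is read off at the trace boundary.

def progress(f, state):
    """Partially evaluate formula `f` against a single state (a set of
    propositions that hold in that state)."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "prop":                      # atomic proposition
        return ("true",) if f[1] in state else ("false",)
    if op == "eventually":                # min Eventually(F) = F ∨ #Eventually(F)
        return ("true",) if progress(f[1], state) == ("true",) else f
    raise ValueError(f"unknown operator: {op}")

def boundary_value(f):
    """Truth value at the right boundary of the trace; a pending minimal
    rule (such as Eventually) evaluates to False there."""
    return f == ("true",)

def monitor(f, trace):
    for state in trace:
        f = progress(f, state)
    return boundary_value(f)
```

For example, `monitor(("eventually", ("prop", "p")), [set(), {"p"}])` yields True, mirroring the derivation above, whilst on a trace where p never holds the pending Eventually obligation evaluates to False at the boundary, as its minimal declaration prescribes.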

3.5.2 Eagle's Monitoring Algorithm Extended

The following ternary rules LMxConcat(. . .), RMnConcat(. . .), RMxConcat(. . .), LMxSeqComp(. . .), RMnSeqComp(. . .) and RMxSeqComp(. . .), each of which corresponds to its respective mixfix operator (i.e. left-maximal concatenation, right-minimal concatenation, etc.), are used during evaluation to keep track of the shortest/longest sub-traces by updating their arguments depending on whether a cut could be made or not. For example, a formula ⌈F1 ⌉ · F2 is rewritten during initialisation as LMxConcat(F1 , F2 , null). (We assume here that the arguments of LMxConcat(. . .) have already been initialised; to simplify the explanation, we refrain from writing inithhF1 , Z, b′ii and inithhF2 , Z, b′ii in full.) With each evaluation step, it is tested whether the first argument would be satisfied if a cut were made at the current index. If this is the case, then the third argument is overwritten by the evaluation of F2 at the current index, where F2 itself is kept untouched as the second argument of the rule. Otherwise, the third argument is evaluated at the current index; it is either null or refers to the evaluation of F2 from an earlier cut. In both cases, the evaluation of the first argument is carried forward to the next index by overwriting the first argument with its evaluation at the current index. As such, the first argument carries the evaluation of F1 over the whole trace, whilst the third argument captures the evaluation of F2 from the latest cut at which the cut would satisfy the evaluation of F1 . The second argument merely stores F2 , so that a fresh evaluation can begin for every cut that is made. At the end of the evaluation, either the first argument has become True, in which case it is tested whether F2 holds on the empty trace, or otherwise the truth value of the evaluation of F2 in the third argument is determined.

The evaluation of the other rules is carried out similarly. The main difference between the rule evaluation explained in the last example and the remaining rules lies in the utilisation of the third argument of the respective rules. The third argument can also be assigned the ternary rule List(. . .), which represents a linked list of evaluated F2 formulas in its second argument, whilst the first argument keeps track of values of the evaluated F1 formula at cuts, and the third argument is the linked list's "next pointer". At the end of the evaluation, the list is traversed sequentially for an occurrence of a second argument which evaluates to True, in which case the shortest/longest right-hand sub-trace is found (depending on the ordering of the list), and the first argument of that list element is taken to determine the truth value of the rule.
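The three-argument bookkeeping just described can be sketched outside the calculus. In the following illustration (our own, with a hypothetical `Monitor` interface; Always and Eventually stand in for arbitrary F1 and F2 ), the loop implements the LMxConcat scheme for ⌈F1 ⌉ · F2 : later cuts overwrite earlier ones, so the longest satisfying left part wins.

```python
# Sketch of the LMxConcat bookkeeping for ⌈F1⌉ · F2 (left-maximal cut),
# using our own tiny monitor interface; an illustration of the
# three-argument scheme, not Eagle's literal rewrite rules.

class Always:
    """F holds in every state of the sub-trace seen so far."""
    def __init__(self, prop, ok=True):
        self.prop, self.ok = prop, ok
    def step(self, state):
        return Always(self.prop, self.ok and self.prop in state)
    def value(self):            # verdict if the sub-trace ended here
        return self.ok

class Eventually:
    """F holds in some state of the sub-trace seen so far."""
    def __init__(self, prop, seen=False):
        self.prop, self.seen = prop, seen
    def step(self, state):
        return Eventually(self.prop, self.seen or self.prop in state)
    def value(self):
        return self.seen

def lmx_concat(f1, f2_template, trace):
    """⌈F1⌉ · F2: cut after the longest prefix satisfying F1."""
    f3 = None                          # evaluation of F2 since the latest cut
    for s in trace:
        if f1.value():                 # a cut before s would satisfy F1, so
            f3 = f2_template.step(s)   # restart F2 here (overwriting earlier
                                       # cuts keeps the maximal left part)
        elif f3 is not None:
            f3 = f3.step(s)            # carry the latest F2 evaluation forward
        f1 = f1.step(s)                # F1 is evaluated over the whole trace
    if f1.value():                     # cut at the very end: F2 on empty trace
        return f2_template.value()
    return f3.value() if f3 is not None else False
```

For instance, with F1 = Always("p") and F2 = Eventually("q"), the trace [{p}, {p}, {q}] is cut after the two p-states and the remaining suffix satisfies F2 .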

Definition 3.7 (Runtime-Verification Calculus of Eagle with Mixfix Operators) (Extension of Definition 3.6) Eagle's calculus is extended by the mixfix variants of sequential composition and concatenation, such that

inithh⌊F1 ⌋ ◦ F2 , Z, b′ii = ⌊inithhF1 , Z, b′ii⌋ ◦ inithhF2 , Z, b′ii
inithh⌈F1 ⌉ ◦ F2 , Z, b′ii = ϕ(inithhF1 , Z, b′ii, inithhF2 , Z, b′ii, null)
inithhF1 ◦ ⌊F2 ⌋, Z, b′ii = ϕ(inithhF1 , Z, b′ii, inithhF2 , Z, b′ii, null)
inithhF1 ◦ ⌈F2 ⌉, Z, b′ii = ϕ(inithhF1 , Z, b′ii, inithhF2 , Z, b′ii, List(False, False, null))

(We opted for the term List(. . .) in order to make clear that we are dealing with a linked list; the rule itself actually represents an element of the list, with data in its first two parameters and the third parameter referring to the next element in the linked list. The end of the list is indicated by the value null.)


valuehhnullii = False
valuehhList(F1 , F2 , F3 )ii = if valuehhF2 ii = True then F1 else valuehhF3 ii
valuehh⌊F1 ⌋ · F2 ii = valuehhF1 ii ∧ valuehhF2 ii
valuehhLMxConcat(F1 , F2 , F3 )ii = if valuehhF1 ii = True then valuehhF2 ii else valuehhF3 ii
valuehhRMnConcat(F1 , F2 , F3 )ii = if valuehhF2 ii = True then valuehhF1 ii else valuehhF3 ii
valuehhRMxConcat(F1 , F2 , F3 )ii = valuehhAppend(F3 , List(valuehhF1 ii, F2 , null))ii
valuehh⌊F1 ⌋ ; F2 ii = False
valuehhLMxSeqComp(F1 , F2 , F3 )ii = valuehhRMnSeqComp(F1 , F2 , F3 )ii = valuehhRMxSeqComp(F1 , F2 , F3 )ii = valuehhF3 ii

eval hhnull, sii = null
eval hhList(F1 , F2 , F3 ), sii = List(F1 , β, γ)
eval hh⌊F1 ⌋ · F2 , sii = if valuehhF1 ii = True then β else ⌊α⌋ · F2
eval hhLMxConcat(F1 , F2 , F3 ), sii = if valuehhF1 ii = True then LMxConcat(α, F2 , β) else LMxConcat(α, F2 , γ)
eval hhRMnConcat(F1 , F2 , F3 ), sii = RMnConcat(α, F2 , eval hhList(valuehhF1 ii, F2 , F3 ), sii)
eval hhRMxConcat(F1 , F2 , F3 ), sii = RMxConcat(α, F2 , eval hhAppend(F3 , List(valuehhF1 ii, F2 , null)), sii)
eval hh⌊F1 ⌋ ; F2 , sii = if valuehhαii = True then β else ⌊α⌋ ; F2
eval hhLMxSeqComp(F1 , F2 , F3 ), sii = if valuehhαii = True then LMxSeqComp(α, F2 , β) else LMxSeqComp(α, F2 , γ)
eval hhRMnSeqComp(F1 , F2 , F3 ), sii = RMnSeqComp(α, F2 , eval hhList(valuehhαii, F2 , F3 ), sii)
eval hhRMxSeqComp(F1 , F2 , F3 ), sii = RMxSeqComp(α, F2 , eval hhAppend(F3 , List(valuehhαii, F2 , null)), sii)

updatehh⌊F1 ⌋ ◦ F2 , s, Z, b′ii = ⌊updatehhF1 , s, Z, b′ii⌋ ◦ F2
updatehhϕ(F1 , F2 , F3 ), s, Z, b′ii = ϕ(updatehhF1 , s, Z, b′ii, F2 , F3 )

where α ≡ eval hhF1 , sii, β ≡ eval hhF2 , sii, γ ≡ eval hhF3 , sii, ◦ ∈ { ; , ·}, and ϕ denotes one of the rules LMxSeqComp, RMnSeqComp, RMxSeqComp, LMxConcat, RMnConcat and RMxConcat, as is apparent from the context. 

Theorem 3.5 (Semantical Equivalence between Eagle's Logic and Calculus) The semantics of Eagle[] 's calculus (Definition 3.7) coincide with the semantics of the corresponding logic (Definition 3.5).

Proof. The rather long but otherwise straightforward proof of Theorem 3.5 is given in Appendix A. 

3.5.3 On-line Monitoring Complexity

We now consider the time and space requirements of the evaluation of concatenation, sequential composition and the mixfix operators. In [BGHS04a], it was shown that the time and space complexity of the future-LTL fragment of Eagle is independent of the length of the monitored trace. We first show below that the evaluation of a non-deterministic cut operator F1 ◦ F2 , ◦ ∈ { ; , ·}, whose operands are free of cut formulas, may require O(|σ|²) calls to evaluate F2 (Theorem 3.6). However, for the mixfix operators with restrictions on the left operand, the complexity is again independent of the trace length (Theorem 3.7).

Consider an arbitrary formula F1 · F2 . The state evaluation rule for concatenation is given by

eval hhF1 · F2 , sii = if valuehhF1 ii = True then (eval hhF1 , sii · F2 ) ∨ eval hhF2 , sii else eval hhF1 , sii · F2

In the worst-case scenario of evaluating concatenation, a non-deterministic cut is made at each state of a trace. This can be enforced by the formula True · F2 . On an arbitrary trace σ, True · F2 is evaluated as

valuehheval hh. . . eval hheval hhinithhTrue · F2 , null, nullii, s1 ii, s2 ii . . ., s|σ| iiii

By straightforward applications of the rules of Eagle[] 's calculus, the evaluation can be unfolded as

valuehh ∨_{n=1}^{|σ|} eval hh. . . eval hheval hhinithhF2 , null, nullii, sn ii, sn+1 ii . . ., s|σ| iiii

Relative to the evaluation of F2 , the formula requires (|σ|² + |σ|)/2 applications of eval hh. . .ii. This argument can be carried forward to sequential composition as well, and additionally to all mixfix operators with restrictions on the right operand.

Theorem 3.6 (Trace-Composition Operators with O(|σ|²) Complexity) For a given trace σ, the operators F1 · F2 , F1 ; F2 , F1 · ⌊F2 ⌋, F1 · ⌈F2 ⌉, F1 ; ⌊F2 ⌋ and F1 ; ⌈F2 ⌉ require up to O(|σ|²) applications of eval hh. . .ii in addition to the applications required to evaluate F1 and F2 .

Proof. Only the proof for the mixfix variants of concatenation is provided, since it is straightforward to carry the results over to the mixfix variants of sequential composition.

Case F1 · F2 : Consider the state evaluation rule for F1 · F2 , i.e.

eval hhF1 · F2 , sii = if valuehhF1 ii = True then (eval hhF1 , sii · F2 ) ∨ eval hhF2 , sii else eval hhF1 , sii · F2

In the worst-case scenario, a non-deterministic cut is made at each state of a trace, which can be enforced by the formula True · F2 . On an arbitrary trace σ, True · F2 is evaluated as

valuehheval hh. . . eval hheval hhinithhTrue · F2 , null, nullii, s1 ii, s2 ii . . ., s|σ| iiii

By straightforward applications of the rules of Eagle[] 's calculus, the evaluation can be unfolded as

valuehh ∨_{n=1}^{|σ|} eval hh. . . eval hheval hhinithhF2 , null, nullii, sn ii, sn+1 ii . . ., s|σ| iiii

Relative to the evaluation of F2 , the formula requires (|σ|² + |σ|)/2 applications of eval hh. . .ii.

Case F1 · ⌊F2 ⌋: The evaluation for an arbitrary trace σ is reflected by

valuehheval hh. . . eval hheval hhinithhF1 · ⌊F2 ⌋, null, nullii, s1 ii, s2 ii . . ., s|σ| iiii

By applying Eagle[] 's evaluation rules, it is straightforward to see that List is used to store inithhF2 , null, nullii, eval hhinithhF2 , null, nullii, s|σ| ii, eval hheval hhinithhF2 , null, nullii, s|σ|−1 ii, s|σ| ii, and so on, up to eval hh. . . eval hheval hhinithhF2 , null, nullii, s1 ii, s2 ii . . ., s|σ| ii. Together with the evaluation of F1 over the indices 1 . . . |σ|, which is tracked in the first argument of RMnConcat, there are (|σ|² + |σ|)/2 + |σ| occurrences of eval hh. . .ii.

Case F1 · ⌈F2 ⌉: The same holds for evaluating F1 · ⌈F2 ⌉ on an arbitrary trace σ. This can easily be seen from the evaluation rule

eval hhRMxConcat(F1 , F2 , F3 ), sii = RMxConcat(α, F2 , eval hhAppend(F3 , List(valuehhF1 ii, F2 , null)), sii)

which coincides with the evaluation rule for F1 · ⌊F2 ⌋, save for the reversed ordering of the List-elements. 

When we consider a mixfix formula with deterministic restrictions on the left operand, e.g. ⌊F1 ⌋ · F2 , then we can show that only linear space is needed for its evaluation, relative to the space required to evaluate the operands. By taking the state-evaluation rule for ⌊F1 ⌋ · F2 , i.e.

eval hh⌊F1 ⌋ · F2 , sii = if valuehhF1 ii = True then eval hhF2 , sii else ⌊eval hhF1 , sii⌋ · F2

one can immediately see that the non-deterministic choice of concatenation (i.e. (eval hhF1 , sii · F2 ) ∨ eval hhF2 , sii) is replaced by a single application of eval hh. . .ii. We can carry this forward to all mixfix operators with restrictions on the left operand, so that we obtain the following result:

Theorem 3.7 (Trace-Composition Operators with O(|σ|) Complexity) For a given trace σ, the left mixfix operators ⌊F1 ⌋ · F2 , ⌈F1 ⌉ · F2 , ⌊F1 ⌋ ; F2 and ⌈F1 ⌉ ; F2 require only O(|σ|) applications of eval hh. . .ii in addition to the applications required to evaluate F1 and F2 .


Proof. Here, only the proofs for concatenation are given; it is straightforward to carry them over to the case of sequential composition.

Case ⌊F1 ⌋ · F2 : The space requirement for evaluating ⌊F1 ⌋ · F2 follows along the lines of the previous proof (Theorem 3.6). By taking the state-evaluation rule for ⌊F1 ⌋ · F2 , i.e.

eval hh⌊F1 ⌋ · F2 , sii = if valuehhF1 ii = True then eval hhF2 , sii else ⌊eval hhF1 , sii⌋ · F2

one immediately gets the unfolding for valuehheval hh. . . eval hheval hhinithh⌊True⌋ · F2 , null, nullii, s1 ii, s2 ii . . ., s|σ| iiii, which is valuehheval hh. . . eval hheval hhinithhF2 , null, nullii, s1 ii, s2 ii . . ., s|σ| iiii. As such, the evaluation of ⌊F1 ⌋ · F2 needs only |σ| applications of eval hh. . .ii.

Case ⌈F1 ⌉ · F2 : It is easy to see that the rule

eval hhLMxConcat(F1 , F2 , F3 ), sii = if valuehhF1 ii = True then LMxConcat(eval hhF1 , sii, F2 , eval hhF2 , sii) else LMxConcat(eval hhF1 , sii, F2 , eval hhF3 , sii)

leads to the evaluation of F1 and of F2 /F3 exactly |σ| times, which is the least number of eval hh. . .ii-applications possible. 
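The two bounds can be made concrete with a small counting sketch (our own illustration, with hypothetical function names): in the non-deterministic worst case True · F2 , a fresh copy of F2 is spawned at every state and every active copy is advanced once per state, whereas ⌊True⌋ · F2 commits to the first cut and advances a single copy of F2 only.

```python
# Counting sketch (our own illustration of the two complexity bounds):
# eval-steps spent on F2 when monitoring True · F2 versus ⌊True⌋ · F2.

def nondet_concat_f2_steps(trace_length):
    """True · F2: a cut is possible at every state, so 1 + 2 + ... + |σ|
    eval-steps are spent on F2 copies in total."""
    steps, active_copies = 0, 0
    for _ in range(trace_length):
        active_copies += 1        # non-deterministic cut: spawn a fresh F2
        steps += active_copies    # advance every active F2 copy by one state
    return steps

def det_left_concat_f2_steps(trace_length):
    """⌊True⌋ · F2: the cut is made deterministically at the first state,
    so only one F2 copy is ever advanced."""
    return trace_length
```

The counts reproduce the (|σ|² + |σ|)/2 and |σ| figures from the proofs above.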

3.6 Summary

For the runtime-verification logic Eagle, we have shown that concatenation and sequential composition are equally expressive. We introduced mixfix operators which limit the possible cuts of sequential composition and concatenation, so that one of the operator's operands has to be satisfied on a sub-trace of minimal or maximal length. These mixfix variants of sequential composition and concatenation are deterministic counterparts of their corresponding non-mixfix operators, as we demonstrated in several examples. We showed that the semantics of the mixfix operators are already definable in unextended Eagle, and we presented semantically equivalent mixfix-operator-free formulas accordingly. We extended Eagle's on-line monitoring calculus with rules for the mixfix operators, and we proved that right-hand-restricted mixfix operators evaluate as efficiently as their non-deterministic counterparts, whilst left-hand-restricted mixfix operators can be evaluated more efficiently.

Chapter 4

Model-Checking Context-Free Properties

Model-checking permits the verification of a system's behaviour against its purported specification. This is in contrast to the runtime-verification approach, where only one possible behaviour of a system is investigated at a time. As in the case of runtime verification, the actual verification is fully automatic, which is achieved by restricting the expressiveness of system behaviours and behavioural specifications in such a way that the model-checking problem is decidable. Whilst we can freely choose the formalism in which we define specifications, and thus make an implicit decision about the behavioural properties that are expressible in our specifications, this is not the case for software written in the popular programming languages C/C++/C# or Java, which are all expressively equivalent to Turing-machines. As such, a straightforward reachability property, such as determining the termination of an arbitrary program, cannot be verified, due to the undecidability of the corresponding halting problem for Turing-machines. It is therefore necessary to consider an abstraction of the original program behaviour, which aims to preserve as many behavioural properties of the program under inspection as possible, whilst maintaining a decidable model-checking problem. The block diagram in Figure 4.1 visualises this process.

[Figure 4.1: Overview of the model-checking process]

We address the model-checking problem in a general context, i.e. we focus on the expressiveness of behavioural specifications and abstracted program behaviours. We approach the verification from a formal-language point of view, which enables us to relate the behaviours of systems and specifications to language classes over finite and infinite words – which we interpret as finite and infinite traces – and to establish a hierarchy of the languages related to model-checking.

Before we focus on the formal-language approach to verification, we present a grammatical representation of ωVPLs that is based on a structural property of matchings as they appear in ω-visibly-pushdown words. The structure of the productions will provide some insight into the representation of 1 : 1-matchings in general, which serves as the basis for our generalised language-theoretic representation of languages whose expressiveness is beyond that of ωVPLs. We establish constraints under which the model-checking problem for these language classes remains decidable, and we provide examples that illustrate their practical relevance for the behavioural verification of systems.

4.1 Introduction

Model-checking is a verification technique where it is decided whether the behaviour of a given system adheres to the behaviour of a given specification, both of which are represented as sets of traces. These traces are interpreted as words of formal languages, as we have explained at the beginning of Chapter 2. The model-checking problem of verifying whether a model M adheres to a specification S is then expressible as the language inclusion problem L(M) ⊆ L(S), where L(M) and L(S) denote the languages, also known as behaviours, of the model and specification respectively.

Model-checking properties such as safety, liveness and fairness play an important role in the verification of hardware and software systems, [GPSS80, OL82, Sis94]. We briefly recapitulate the definition of a safety property here: "something bad never happens" (Section 1.2). As such, this definition and the definitions of liveness and fairness are language-independent and not bound to a particular required expressiveness. The language independence stems from the wording "something"/"some", which represents a placeholder for an arbitrary but concrete specification, so that we can interpret safety, liveness and fairness as parameterised templates. In Eagle, we gave examples of safety and liveness in Example 3.2, where we filled the placeholders with the formulas Deadlock() and Terminate() respectively. By leaving out the latter arguments, we can restate the formulas more generally as:

mon Safety = ¬Eventually(F1)
mon Liveness = Eventually(F2)

and thus, we interpret the formulas F1 and F2 as parameters. The resulting parameterised formulas then express regular behaviour in the case of finite traces and ω-regular behaviour in the case of infinite traces – with respect to their parameters – since we can rephrase them in terms of the regular expressions Σ∗ · L(F1) · Σ∗ and Σ∗ · L(F2) · Σ∗, and the ω-regular expressions Σ∗ · L(F1) · Σω and Σ∗ · L(F2) · Σω. As such, we refer to safety, liveness and fairness as ω-regular model-checking properties, whilst their instantiation with a concrete parameter substitution might be beyond the expressiveness of ω-regular languages.

Call-stack behaviour, on the contrary, is an ω-visibly-pushdown behaviour. The latter can be seen as a refinement of earlier approaches to model-checking call-stack behaviour, [Esp97, EN98, KPV02], which permitted the behaviour of system models to be non-ω-regular, but where specifications themselves were still restricted to ω-regular properties. As such, in those approaches, call-stack behaviour can be modelled in the abstraction of the system, but it is not possible to actually verify it.
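The reduction of verification to language inclusion can be made concrete in the simplest setting, finite automata over finite words: L(M) ⊆ L(S) holds iff L(M) ∩ L(S)-complement is empty, which a product construction decides. The following sketch is purely illustrative (the automata M and S, their states and the alphabet are invented and not part of the formal development of this chapter):

```python
def dfa_inclusion(m, s, alphabet):
    """Check L(M) ⊆ L(S) for complete DFAs M and S.

    A DFA is a triple (delta, q0, F) with delta a dict mapping
    (state, letter) -> state.  L(M) ⊆ L(S) iff no reachable product
    state is accepting in M but rejecting in S, i.e. iff
    L(M) ∩ complement(L(S)) = ∅."""
    (dm, qm, fm), (ds, qs, fs) = m, s
    seen, stack = set(), [(qm, qs)]
    while stack:
        p, q = stack.pop()
        if (p, q) in seen:
            continue
        seen.add((p, q))
        if p in fm and q not in fs:   # a word accepted by M, rejected by S
            return False
        for a in alphabet:
            stack.append((dm[(p, a)], ds[(q, a)]))
    return True

# M: words over {a, b} containing no 'b' at all.
M = ({(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 1}, 0, {0})
# S: words with an even number of 'b's.
S = ({(0, 'a'): 0, (0, 'b'): 1, (1, 'a'): 1, (1, 'b'): 0}, 0, {0})
print(dfa_inclusion(M, S, 'ab'))  # True: no-b words have an even number of b's
print(dfa_inclusion(S, M, 'ab'))  # False: 'bb' ∈ L(S) but 'bb' ∉ L(M)
```

The same reduction underlies the ω-regular and ω-visibly-pushdown settings discussed in this chapter, with acceptance of infinite runs in place of final states.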
ωVPLs unite the expressiveness of system model behaviours and behavioural specifications, in the sense that both are equally expressive, which means that every behaviour of the system model can also be verified.

4.1.1 Model-Checking Nested Context-Free Behaviour

Alur et al. studied various alternative automata-theoretic and logical representations of visibly-pushdown languages over finite and infinite words, [AM04, AM06, ACM06], but nested behaviour in words is not restricted to calls and returns. For example, one may interpret matchings as the synchronisation of requests and acknowledgements as they occur in communication protocols. These matchings between calls and returns, or requests and acknowledgements, are matchings between two symbols each, so that we also refer to them as 1 : 1 matchings. We can also view 1 : 1 matchings as a linear equation x − y = 0 which acts as a constraint over words w of some language, where x denotes the number of calls and y denotes the number of returns in w, and where we simply assume that returns follow calls in w.

Over finite words, the relevance of linear equations in relation to verification problems was addressed in [ISD+00]. Similarly, in [Cau06], the synchronisation of terminals by a regular transducer over finite words was given as an extension of visibly-pushdown languages over finite words. A non-arithmetic approach for finite words was taken in [FP01] and [ENS07], where subclasses of deterministic context-free languages that are suitable for model-checking were investigated. It is an open question whether the results for the finite-word languages can be carried forward to infinite-word languages.

The extension of ωVPLs we provide in this chapter addresses the generalisation of matchings, where we preserve a decidable language inclusion problem for equiexpressible system-model behaviours and behavioural specifications. The envisaged applications of the studies in this chapter are in model-checking, where specifications beyond ω-visibly-pushdown behaviour permit the verification of arithmetic matchings as described above, but also non-arithmetic properties, e.g. the verification of stack behaviour other than what can be observed in call-stacks. We consider two examples of non-ωVPLs in the following: in Example 4.1 a data-stack verification problem is given, and in Example 4.2 a counting property is verified.
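The linear-constraint view of 1 : 1 matchings can be made executable on finite words; a minimal sketch (the letters c and r are illustrative stand-ins for calls and returns) that checks x − y = 0 together with the implicit prefix condition that a return never precedes its call:

```python
def is_well_matched(word, call='c', ret='r'):
    """x - y = 0 as a constraint: the whole word has equally many
    calls and returns, and no prefix has more returns than calls.
    Other letters (internal actions) are ignored."""
    height = 0
    for letter in word:
        if letter == call:
            height += 1
        elif letter == ret:
            height -= 1
        if height < 0:          # a return without a pending call
            return False
    return height == 0

print(is_well_matched('ccarr'))  # True: two calls, two returns, 'a' internal
print(is_well_matched('ccar'))   # False: one call remains unmatched
print(is_well_matched('rc'))     # False: the return precedes its call
```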
Example 4.1 (Data-Stack Behaviour in Model-Checking) In a producer/consumer-pattern, where a producer pushes terminals onto a stack that are later removed by a consumer, such as we have outlined in Example 1.2, it is of interest to verify whether the stack is implemented correctly. In this example we consider such a pattern, where we take multiple producers into account that successively fill the stack, until eventually a single consumer empties the stack and the process starts over again.

Let w = a1 a2 . . . a|w| denote a finite word, whose terminals are pushed onto the stack in the order a1, a2, . . . , a|w|. When emptying the stack, the terminals will appear in the reverse order, i.e. a|w|, a|w|−1, . . . , a1. We write wʳ to denote the reversal of a word w. The aforementioned producer/consumer-pattern can then produce infinite words of the form

cp w1 rp cp w2 rp . . . cp wn rp cc wnʳ wn−1ʳ . . . w1ʳ rc cp w1′ rp cp w2′ rp . . .

where the first part, up to and including the rp before cc, stems from the producers and the factor cc wnʳ . . . w1ʳ rc stems from the consumer, and where we write cp, rp, cc and rc to denote a call to the producer function, a return from the producer function, a call to the consumer function and a return from the consumer function, respectively. Occurrences of wi, i ∈ N, denote finite words that represent the terminals that are pushed onto the stack.

The words of our example given above form a deterministic context-free language.¹ However, it is not an ωVPL, since ωVPLs can only match two distinct terminals. In this example, a matching has to be established between two occurrences of one and the same terminal, which is beyond the expressiveness of ωVPLs, since any two matching terminals in an ωVPL have to be in two mutually disjoint sets of terminals, [AM04].

Figure 4.2: Embedded 1 : 1- and 1 : 2-matchings (a block r g with |r| = |g|, followed by a block r r g g g, each block delimited by a c)
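For a finite prefix of such a trace, the correctness property of Example 4.1 reduces to checking that every pop returns the most recently pushed, not yet popped value — equivalently, that the consumed sequence is the reversal of the produced one. A sketch over an invented event encoding (pairs of operation and value; the encoding is ours, not the thesis's):

```python
def stack_discipline_ok(trace):
    """trace: list of ('push', v) / ('pop', v) events.  The stack is
    implemented correctly iff every pop returns the value most
    recently pushed and not yet popped, and the consumer finally
    empties the stack."""
    stack = []
    for op, value in trace:
        if op == 'push':
            stack.append(value)
        else:  # 'pop'
            if not stack or stack.pop() != value:
                return False
    return not stack

trace = [('push', 'a1'), ('push', 'a2'), ('push', 'a3'),
         ('pop', 'a3'), ('pop', 'a2'), ('pop', 'a1')]
print(stack_discipline_ok(trace))  # True: the pops are the reversal
```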

Example 4.2 (Counting Properties in Model-Checking) In Example 1.3, we introduced counting properties in a behavioural abstraction of a communication between a client and server system, where each resource request by a client was followed by one or two grants of resources. From another point of view, the number of occurrences of requests and grants, which we write as |r| and |g| respectively, within any word of the presented 1 : 1- and 1 : 2-matchings satisfies the relation |r| ≤ |g| ≤ 2 · |r|. We have depicted a closer view of the properties of these matchings in Figure 4.2, in which we have added a delimiter letter c to denote the completion of all outstanding requests, and one can see that the given inequation is true for any subword that is determined by the furthest spanning arc that ends just before an occurrence of a c.

Alternatively, the counting property can be interpreted as the ratio between occurrences of requests and grants; for example, in the word

r . . . g . . . g . . . c (1×r : 2×g)  r . . . r . . . g . . . g . . . g . . . c (2×r : 3×g)

the ratios between occurrences of r and g are 1/2 and 2/3. It is straightforward to see that the ratio will always be situated between 1/2 and 1, which can serve as another view on counting properties.

No matter which view is taken on these matchings, they cannot be formulated in terms of visibly-pushdown languages, since arcs are between two terminals only, whereas 1 : 2-matchings require two arcs in our chosen representation, as shown in Figure 4.2. However, as we have pointed out earlier, our graphical representation of arcs disagrees with the more general representation given in [LST95] that applies to all context-free languages, so that we rather refer to the ratio between requests and grants to argue that the shown matchings are beyond the expressiveness of visibly-pushdown languages. As was pointed out when VPLs were first defined, [AM04], matchings in VPLs are strictly between two terminals due to the transitions a VPA can make: only one symbol can be pushed onto the stack at the time a call (a request in this example) is read, and upon a return (a grant, here) exactly one symbol is removed from the stack, or the stack is empty and remains empty. This led to the definition of summary-edges, [AM04], which are bound to contain a matching number of occurrences of calls and returns, whilst terminals outside of summary-edges are not matched. As such, the only ratio between two distinguished sets of terminals that is expressible thereby is 1.

¹Finite-word languages of palindromes with a unique center marker are known to be deterministic context-free, [HMU01]. In our example, cc acts as such a center marker.
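The per-block counting constraint is directly checkable on finite prefixes; a sketch (the c-delimited block structure follows Figure 4.2, the helper function and its name are our own) that verifies the request/grant ratio r/g lies between 1/2 and 1 for every completed block:

```python
def counting_ok(word):
    """Each c-delimited block must have a request/grant ratio r/g
    between 1/2 and 1, i.e. r <= g <= 2*r: every request is answered
    by one or two grants.  Letters other than r and g are ignored."""
    for block in word.split('c'):
        r, g = block.count('r'), block.count('g')
        if (r, g) == (0, 0):
            continue  # e.g. the empty remainder after the final c
        if not (r <= g <= 2 * r):
            return False
    return True

print(counting_ok('rggcrrgggc'))  # True: ratios 1/2 and 2/3
print(counting_ok('rgggc'))       # False: three grants for one request
print(counting_ok('rrgc'))        # False: one grant for two requests
```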

4.1.2 Chapter Outline

In Section 4.2, we introduce automata-theoretic representations of formal languages, which address the languages of [AM04]. Section 4.3 describes our grammatical representation of ωVPLs, which has been published in [BB07]. We then carry the approach that we have taken for the grammatical representation further in Section 4.4, where we give a language-theoretic extension of model-checking which permits the verification of a larger class of behavioural properties than currently known. Finally, Section 4.5 summarises the achievements of this chapter.

4.2 Automata over Infinite Words

Automata are word-recognising systems that are defined as transition systems over a finite number of states, where transitions between states are utilised to recognise the letters of an input word – one at a time, [Tho94, Chapter 4, Part I]. So-called final states are used in conjunction with a formal acceptance condition in order to determine whether a word is recognised or rejected by an automaton, where the set of all recognised words forms the automaton's language.

4.2.1 Regular Language Representations

Automata-theoretic representations can be given for both regular and ω-regular languages, but in this thesis we do not address finite-word regular languages, and as such, we only define automata whose expressiveness coincides with the ω-regular languages (ωRLs).

An automaton reads an infinite word w = a1 a2 a3 . . ., where a1, a2, a3, . . . ∈ Σ, by starting from a designated initial state, after which each letter a ∈ Σ occurring in w from left to right causes transitions between states. Transitions are therefore driven by letters of Σ, where each transition of the automaton is labelled with such a letter, and a state change from one state q to a next state q′ can only be performed if the current letter of the word the automaton is reading matches the labelling of the transition.

When reading an infinite word, the automaton will pass some of its finitely many states infinitely often, which is used to determine whether the word is recognised by the automaton (accepted by the automaton) or whether it is rejected. Formally, the series of states that are visited upon reading a word w is denoted by r(w), which we call a run of the automaton, and we write inf(r(w)) to denote the set of states that appear infinitely often in r(w). We consider two acceptance conditions in this thesis, both of which address the set inf(r(w)), where

• w is accepted when inf(r(w)) ∩ F ≠ ∅, with F being a designated set of accepting states (Büchi-acceptance), and
• w is accepted when inf(r(w)) ∈ F, with F containing the sets of states that need to be visited infinitely often (Muller-acceptance).

For the remainder of the thesis, we simply assume that every automaton is equipped with a complete transition relation, in the sense that transitions are defined for every letter in Σ in all states of the automaton. Both kinds of automata are equally expressive, i.e. they both can be used to describe ω-regular languages, but only Muller-automata can be determinised so that there is no ambiguity between transitions when going from one state to a next state. Formally, we define the automata as follows.

Definition 4.1 (Büchi-Automata, [Büc62]) A Büchi-automaton is a structure A = (Q, Σ, δ, qi, F), where

• Q is a finite set of states,
• Σ is a finite set of letters,
• δ is a finite set of transitions between states p, q ∈ Q for inputs a ∈ Σ of the form p −a→ q,
• qi ∈ Q is a designated initial state, and
• F denotes a set of final states.

The language L(A) of a Büchi-automaton A is denoted by the set of infinite words {w | inf(r(w)) ∩ F ≠ ∅}. □

Definition 4.2 (Muller-Automata, [Mul63]) A Muller-automaton is a structure A = (Q, Σ, δ, qi, F), where

• Q is a finite set of states,
• Σ is a finite set of letters,
• δ is a finite set of transitions between states p, q ∈ Q for inputs a ∈ Σ of the form p −a→ q,


• qi ∈ Q is a designated initial state, and
• F denotes a set of final-state sets.

The language L(A) of a Muller-automaton A is denoted by the set of infinite words {w | inf(r(w)) ∈ F}. □

In [McN66], it has been shown that Büchi-automata and Muller-automata are equiexpressive, and in fact, for any non-deterministic Büchi-automaton a deterministic Muller-automaton can be constructed. The latter automata will be of little importance in the following, except for the proof of Theorem 4.2. Büchi-automata, and especially the Büchi-acceptance condition, will be used in the rest of the thesis for determining the acceptance of infinite words.
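For an ultimately periodic word u·vω and a deterministic automaton, the Büchi condition inf(r(w)) ∩ F ≠ ∅ is decidable: after reading u, the states reached at the beginnings of successive copies of v must eventually cycle, and the word is accepted iff the cycle passes through an accepting state. A sketch (the automaton and words are invented for illustration; v is assumed non-empty and the transition function total):

```python
def buechi_accepts(delta, q0, accepting, u, v):
    """Decide whether a deterministic Büchi automaton accepts the
    ultimately periodic word u v^ω.  delta maps (state, letter) to
    the successor state."""
    q = q0
    for a in u:                      # read the finite prefix u
        q = delta[(q, a)]
    starts = []                      # state at the start of each v-copy
    while q not in starts:
        starts.append(q)
        for a in v:
            q = delta[(q, a)]
    cycle_from = starts.index(q)     # the v-copies from here on repeat
    hit = False
    for s in starts[cycle_from:]:    # re-run one full cycle of v-copies
        p = s
        for a in v:
            p = delta[(p, a)]
            if p in accepting:
                hit = True
    return hit

# State A is entered on every 'a'; with F = {A} the automaton accepts
# exactly the words containing infinitely many a's.
delta = {('A', 'a'): 'A', ('A', 'b'): 'B',
         ('B', 'a'): 'A', ('B', 'b'): 'B'}
print(buechi_accepts(delta, 'B', {'A'}, 'bb', 'ab'))  # True: (ab)^ω
print(buechi_accepts(delta, 'B', {'A'}, 'a', 'b'))    # False: a b^ω
```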

4.2.2 Visibly-Pushdown Language Representations

Visibly-pushdown languages are represented by finite-state automata that are equipped with an additional stack which can be manipulated when a transition between states is made, [AM04]. Generally, such automata are referred to as pushdown automata, where the operations on the stack are not restricted, in the sense that it is always permitted to read the top-most stack symbol, to push another symbol onto the stack, to pop the top-most stack symbol off the stack, or simply to leave the stack untouched. As such, the stack operations that can be performed by a pushdown automaton can be chosen independently of any letter in Σ among transitions. The automaton that represents the visibly-pushdown languages, i.e. a visibly-pushdown automaton, is constrained in such a way that the alphabet Σ is taken as the disjoint union of three alphabets Σc, Σi and Σr, which represent calls, internal actions and returns, respectively, and transitions must

• push exactly one symbol onto the stack on transitions labelled with calls, where it is permitted to read the top-most symbol as well as the letter,
• not access the stack at all on transitions labelled with internal actions, and
• pop the top-most symbol off the stack (if it is non-empty) on transitions labelled with returns, without inspecting the stack otherwise, and leave an empty stack unaltered.


It is therefore possible to predict a visibly-pushdown automaton's operation on the stack for each letter of an input word.

Before we give the formal definition of ω-visibly-pushdown automata (ωVPAs), i.e. visibly-pushdown automata over infinite words, it is necessary to reconsider our definition of runs, since visibly-pushdown automata in general permit the reading of a word from a set of designated initial states, rather than just from a single designated initial state, as is the case for Büchi- and Muller-automata. As such, a visibly-pushdown automaton non-deterministically chooses one of the initial states, from which it starts reading an arbitrary word w, so that runs r(w) can start from any state that is a designated initial state.

Definition 4.3 (Visibly-Pushdown Automata, [AM04]) A visibly-pushdown automaton is a sextuple A = (Q, Σ, Γ, δ, Q0, F), where

• Q is a finite set of states {p, q, q0, q1, . . .},
• Σ = Σc ∪ Σi ∪ Σr, with Σc, Σi and Σr being mutually disjoint finite sets of terminals representing calls c, c0, c1, . . . , ck, internal actions i, i0, i1, . . . , il, and returns r, r0, r1, . . . , rm, respectively,
• Γ is a finite set of stack symbols A, B, C, . . ., including the stack-bottom marker ⊥,
• δ is a finite set of transition rules between states p, q ∈ Q for inputs c ∈ Σc, i ∈ Σi, or r ∈ Σr and stack symbols A, B ∈ (Γ \ {⊥}), of the form p −c,κ/Bκ→ q for all κ ∈ Γ, p −i,κ/κ→ q for all κ ∈ Γ, p −r,A/ε→ q, or p −r,⊥/⊥→ q,
• Q0 ⊆ Q denotes a non-empty set of designated initial states, and
• F ⊆ Q is the set of final states.

The notation p −x,κ/ν→ q, with p, q ∈ Q, x ∈ Σ, κ ∈ Γ, and ν ∈ ((Γ · Γ) ∪ Γ ∪ {ε}), is used to denote that the automaton makes a state transition from p to q whilst reading x and seeing κ on top of the stack, where

• ν ∈ (Γ · {κ}) denotes that a stack symbol is written to the stack,
• ν = κ denotes that the stack is left unchanged, and
• ν = ε denotes that the top-most symbol is removed from the stack.

The language L(A) of a visibly-pushdown automaton A over finite words is denoted by the set of words {w | qn ∈ F, r(w) = q1 q2 q3 . . . qn−1 qn} and the language of A over infinite words is denoted by the set of words {w | inf(r(w)) ∩ F ≠ ∅}. □
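The three transition shapes of Definition 4.3 can be paraphrased operationally; the following sketch simulates a deterministic VPA (state and stack-symbol names are invented, and determinism is assumed for simplicity, whereas Definition 4.3 permits non-determinism):

```python
def vpa_run(word, trans, q0, calls, internals, returns, bottom='⊥'):
    """Simulate a deterministic visibly-pushdown automaton.  trans
    maps, per the three transition shapes of Definition 4.3:
      call:     (q, c)      -> (q', B)   push B
      internal: (q, i)      -> q'        stack untouched
      return:   (q, r, top) -> q'        pop top; ⊥ is never removed
    Returns the run as the list of visited states."""
    q, stack, run = q0, [bottom], [q0]
    for x in word:
        if x in calls:
            q, symbol = trans[(q, x)]
            stack.append(symbol)
        elif x in internals:
            q = trans[(q, x)]
        else:  # a return
            top = stack[-1]
            q = trans[(q, x, top)]
            if top != bottom:
                stack.pop()
        run.append(q)
    return run

# A one-state VPA that merely exercises the stack discipline.
calls, internals, returns = {'c'}, {'a'}, {'r'}
trans = {('p', 'c'): ('p', 'A'), ('p', 'a'): 'p',
         ('p', 'r', 'A'): 'p', ('p', 'r', '⊥'): 'p'}
print(vpa_run('carcr', trans, 'p', calls, internals, returns))
```

Note how the stack operation is fully determined by the input letter: a call always pushes, an internal never touches the stack, and a return pops unless only ⊥ remains.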


Figure 4.3: Structure of matchings in visibly-pushdown languages, for Σc = {c}, Σi = {a} and Σr = {r}: (a) a minimally well-matched word; (b) first call unmatched; (c) last return unmatched.

In words of VPLs and ωVPLs, calls and returns can appear either matched or unmatched, as we have depicted in Figure 4.3, where a call or return is said to be unmatched when it is not possible to assign an arc to it. Trivially, unmatched calls cannot be followed by unmatched returns, since there would be an opportunity to draw an arc between them, but unmatched returns can be followed by unmatched calls. A word w of the form cαr, c ∈ Σc, r ∈ Σr and α ∈ Σ∗, is called minimally well-matched iff c and r are matching, which implies that α contains no unmatched calls or returns (Figure 4.3). The set of all minimally well-matched words, i.e. the language of minimally well-matched words, is denoted by Lmwm ([LMS04], p. 412, par. 8).

Minimally well-matched words can be abstracted in runs of ωVPAs, where the transitions an automaton would undergo are substituted by a summary-edge, which is a triple (p, q, f), p, q ∈ Q and f ∈ {0, 1}. The parameter f denotes whether a final state was visited when going from p to q, f = 1, or not, f = 0. We write L((p, q, f)) to denote the set of words that the summary-edge (p, q, f), p, q ∈ Q and f ∈ {0, 1}, describes. If we extract every minimally well-matched word from the language of an ωVPA A, we obtain what is called a pseudo-run of that automaton:

Definition 4.4 (Pseudo-Runs of ω-Visibly-Pushdown Automata, [AM04]) A pseudo-run of an ωVPA A = (Q, Σ, Γ, δ, Q0, F) is a word w = α1 α2 α3 . . . with αi ∈ (Σ ∪ {Ω1, . . . , Ωm}), where each Ωn denotes a non-empty set of summary-edges of the form (p, q, f) with p, q ∈ Q and f ∈ {0, 1}. In case αi is a call, then there is no αj, i < j, which is a return, and there is a word w′ = β1 β2 β3 . . ., w′ ∈ L(A), so that either αi = βi, or αi = Ωk and βk is a minimally well-matched word that is generated due to A moving from state p to q, p, q ∈ Q and (p, q, f) ∈ Ωk, where


in case f = 1 a final state is on the path from p to q, and no final state is in case f = 0. □

According to [AM04], p. 210, par. 6, a non-deterministic Büchi-automaton can be constructed that accepts all pseudo-runs of an arbitrary given ωVPA, and for every pseudo-run that is represented by the Büchi-automaton, there exists a corresponding accepting run of the original ωVPA.

4.2.3 Motivation for a Grammatical Representation

We present a grammatical approach towards ωVPLs that provides an alternative view on language specifications, in which matchings between calls and returns are made explicit. An informal example of our work is shown in Figure 4.4 on the next page, which depicts the various representations used to describe the behaviour of a program that traverses a continuous supply of finite binary trees. Figure 4.4(a) shows the pseudo code which we utilise to traverse the binary trees, whilst Figures 4.4(b)–(d) display the formal-language representations of the behaviour of the pseudo code, where the terminals t and r denote occurrences of a call to traverse(...) and a return, respectively.² Both representations in terms of automata and grammars have their particular benefits, which we highlight in the next paragraph.

The automata-theoretic representation is an operational approach, in the sense that transitions of the ωVPA in Figure 4.4(b) make use of the stack to determine further actions; more precisely, the stack symbol $ is used to determine matchings between the calls and returns that initiate the traversal of a tree at the root, while the symbols L and R are used for matchings in a left- and right-branch of a tree, respectively. On the other hand, the grammatical representation reflects a denotational approach, where the traversal from the root node down to left- and right-branches is directly reflected within the production rules. For example, the traversal of a tree whose root's children are leaves, i.e. the tree that represents the letter sequence ttrtrr, is

• recognised by the sequence of transitions q1 −t/$→ q2 −t/L→ q2 −r,L→ q3 −t/R→ q2 −r,R→ q4 −r,$→ q5, or q5 −t/$→ q2 −t/L→ q2 −r,L→ q3 −t/R→ q2 −r,R→ q4 −r,$→ q5, depending on whether the tree is the first tree to be traversed or not, or
• generated by the derivation S1 → tS1S1r → ttrS1r → ttrtrr.

²The omission of a terminal that denotes a call to the function main() is a choice of abstraction that we have made to focus on the matchings between calls and returns.
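The derivation above can be reproduced mechanically: since every production strictly increases the length of the sentential form, a bounded leftmost expansion enumerates all finite words of S1 up to a given length. A sketch (the encoding of S1 as the single character 'S' is ours):

```python
def derive(limit, productions, start):
    """Collect all terminal words of length <= limit derivable from
    `start`, always expanding the leftmost non-terminal.  Terminates
    because every production strictly lengthens the sentential form."""
    words, frontier = set(), {start}
    while frontier:
        form = frontier.pop()
        if len(form) > limit:
            continue
        i = next((k for k, s in enumerate(form) if s in productions), None)
        if i is None:               # no non-terminal left: a word
            words.add(form)
            continue
        for rhs in productions[form[i]]:
            frontier.add(form[:i] + rhs + form[i + 1:])
    return words

# S1 of Figure 4.4(c), encoded as 'S' after applying the morphism
# a -> t, co-terminal -> r, so that the productions read S -> tSSr | tr.
prods = {'S': ['tSSr', 'tr']}
print(sorted(derive(6, prods, 'S')))  # ['tr', 'ttrtrr']
```

The word ttrtrr appears exactly via the derivation quoted above: S → tSSr → ttrSr → ttrtrr.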


function main()
  do forever
    traverse(getTree())

function traverse(node n)
  if 'n is not a leaf' then
    traverse(n's left child)
    traverse(n's right child)
  return

(a) Pseudo code for traversing a continuous supply of binary trees

(b) Behavioural representation as an ω-visibly pushdown automaton

(c)  S → g1 S
     S1 → a S1 S1 ā | a ā
     F = {S}
     h(a) = t, h(ā) = r

(d)  S → g1 S
     S1 → Str $
     Str → t Str Str r | t r
     F = {S}

(c) Behavioural representation as an ω-regular matching-avoiding injector grammar with an injected balanced grammar and a morphism to map the terminals of the balanced grammar to calls and returns
(d) Behavioural representation as an ω-regular (Σ \ {$})∗ · {$}-factor-avoiding grammar with an injected deterministic context-free grammar that has the prefix property

Notational remarks: In figures (c) and (d), the terminal g1 denotes a surrogate terminal, which can be substituted by the non-terminal S1. As such, we can interpret the production S → g1 S to be expressively equivalent to S → S1 S. The notation F = {S} denotes a Büchi-acceptance condition, so that the non-terminal S has to appear infinitely often in the derivation of an infinite word in this example. The terminal $ in figure (d) denotes a designated end-marker, which is erased from words due to a morphism h($) = ε (h(a) = a otherwise) that we implicitly associate with factor-avoiding languages in the following.

Figure 4.4: A binary-tree parsing program and its behaviour expressed by automata and grammars


Figure 4.5: Language classes and their automata-, grammar- and logic-representations

In the next section, Section 4.3, we formally address matchings in ωVPLs and give a grammatical representation that describes them. The grammatical representation of the matchings gives an insight into the structure of call/return-occurrences. In Section 4.4, we then give a generalisation of our grammatical approach and present a language-theoretic study of matchings and other nested behaviours, which effectively provides an extension of model-checking, where examples at the end of the section show practical applications of languages that exceed the expressiveness of ωVPLs. Section 4.5 summarises this chapter.

Figure 4.5 gives an overview of the language classes addressed in the following; it shows their relationship towards each other and their various representations as automata, grammars and logics.

4.3 A Grammatical Representation of Visibly Pushdown Languages

We present a grammatical representation of ωVPLs, taking a compositional approach that builds on pseudo-runs and minimally well-matched words as of [AM04]. We first state our grammatical representation and then decompose it into two types of grammar. We show the resemblance to pseudo-runs and minimally well-matched words, similar to the approach for ωVPAs.

A grammatical representation already exists for visibly-pushdown languages over finite words, [AM05], which is available online in a revised and extended version of the original paper on visibly-pushdown languages, [AM04]. Even though there are similarities between the grammatical representation of visibly-pushdown languages over finite words of [AM05] and our grammatical representation of ω-visibly-pushdown languages, we do not elaborate on the representation for finite words in this thesis, since there are considerable differences in the derivation of words when moving from finite words to infinite words. The aim of our grammatical representation is to provide a foundation for representing call-stack behaviour in infinite words, which we then extend to data-stack behaviour and counting properties in Section 4.4, rather than merely to find production rules of grammars that permit the expression of matching calls and returns.

Before we proceed and give our grammatical representation, we have to introduce balanced grammars over finite words and a syntactical simplification of them. The latter will be used to encode words of Lmwm, which we utilise in our grammatical representation to describe call- and return-matchings.

4.3.1 Balanced and Quasi Balanced Grammars

Balanced grammars are a specialisation of context-free grammars over finite words, [BB02]. Unlike the previous definition of CFGs, Definition 2.1 in Chapter 2, balanced grammars are permitted to have an infinite set of productions. This is due to the permitted use of regular expressions over terminals and/or non-terminals in the right-hand sides of productions.

As already pointed out in [BB02], an infinite set of productions does not raise the grammars' expressiveness, but provides a succinct notation, and the derivation relation that we have given for context-free grammars is still applicable to balanced grammars. The practicality of allowing regular expressions in derivation-based rule systems has been studied much earlier on, when Wirth formulated the syntax and semantics of the Extended Backus-Naur Form, [Wir77]. The latter notation is a syntactical variation of context-free grammars; for example, the productions A → aA | b might be rewritten in Extended Backus-Naur Form as A ::= a∗b. We do not make explicit use of the Extended Backus-Naur Form in this thesis, but we do simplify productions using regular expressions as defined in [BB02] and presented below.

Definition 4.5 (Balanced Grammars over Finite Words, [BB02]) A balanced grammar (BG) G over finite words is a quadruple (V, Σ ∪ Σ̄ ∪ Σ̂, P, S) that is a specialisation of a context-free grammar, where

• V is a finite set of non-terminals A, B, . . .,
• Σ and Σ̄ are finite sets of terminals a1, a2, . . . , ak and co-terminals ā1, ā2, . . . , āk respectively, where each terminal ai is associated with its unique counterpart āi, its co-terminal, and vice versa,
• Σ̂ is a finite set of terminals a, b, . . .,
• the sets Σ, Σ̄ and Σ̂ are mutually disjoint,
• P is a finite or infinite set of productions of the form V × a(V ∪ Σ̂)∗ā, a ∈ Σ and ā ∈ Σ̄, and
• S denotes a designated starting non-terminal S ∈ V.

The language L(G) of a BG G = (V, Σ ∪ Σ̄ ∪ Σ̂, P, S) is the set of words that are derivable from the initial symbol, i.e. L(G) = {w | S →∗ w and w ∈ (Σ ∪ Σ̄ ∪ Σ̂)∗}. □

In the following, we write R to denote an arbitrary regular expression over V ∪ Σ̂, so that a production in V × a(V ∪ Σ̂)∗ā can for example be written as A → a Rx ā, with A ∈ V, a ∈ Σ, ā ∈ Σ̄ and Rx ⊆ (V ∪ Σ̂)∗.

Example 4.3 (Balanced Grammar over Finite Words) In Figure 4.4(c), the production S1 → a S1 S1 ā | a ā is interpreted as a production of an implicitly defined balanced grammar with appropriate sets of non-terminals, terminals/co-terminals, productions, and S1 denoting the designated starting non-terminal. □
The language of the balanced grammar matches the pattern between calls and returns as they can be observed in the behaviour of the pseudo code


depicted in Figure 4.4(a) – under the mapping of terminals a to t and of co-terminals ā to r.

In order to simplify our proofs, we give an alternative – but expressively equivalent – definition of BGs, where only a finite number of productions is admitted. We reformulate occurrences of regular expressions R in terms of production rules PR and substitute each R by an initial non-terminal SR that appears on a left-hand side in PR. Therefore, matchings a R ā, a ∈ Σ and ā ∈ Σ̄, become a SR ā, where the derivation of SR resembles L(R).

Definition 4.6 (Quasi Balanced Grammars over Finite Words) Let G = (V, Σ ∪ Σ̄ ∪ Σ̂, P, S) denote an arbitrary BG; a quasi balanced grammar (qBG) G′ = (V′, Σ ∪ Σ̄ ∪ Σ̂, P′, S) generalises G by having a finite set of productions, where productions are either

a) in double Greibach normal form A → a SR ā, A, SR ∈ V′, a ∈ Σ and ā ∈ Σ̄, or
b) of the form A → BC, A → aC, or A → ε, A, B, C ∈ V′ and a ∈ Σ̂, where B's productions are of the form given by a) and C's productions are of a form given by b). □

Example 4.4 (Quasi Balanced Grammar over Finite Words) Figure 4.6(a) on the next page depicts the pseudo code for traversing n-ary trees, which is a generalisation of the pseudo code for traversing binary trees. The explicit traversal of the left- and right-child has been replaced by an iteration over all of a node's children, so that for a node with m children the function traverse(node n) will be called m times. In Figure 4.6(b), where S denotes the production and designated starting non-terminal of an implicit balanced grammar, the repetitive calls are represented by the regular expression S+. For example, a unary node is reflected in a derivation by the factor a S ā, a binary node is reflected by a S S ā, a ternary node is reflected by a S S S ā, and so on.
It should be noted that the traversed tree does not have to be of fixed arity, which is due to the unconstrained iteration of S in S+, but the tree has to be of finite width, since the iteration S+ is finite.

In Figure 4.6(c), the balanced grammar of Figure 4.6(b) has been reformulated in terms of a quasi balanced grammar. For the quasi balanced grammar the


function traverse(node n)
  if 'n is not a leaf' then
    for i = 1 .. #children
      traverse(child(i))
  return

(a) Pseudo code for traversing an n-ary tree

(b)  S → a S+ ā | a ā
     h(a) = t, h(ā) = r

     Behavioural representation by a balanced grammar

(c)  S → a SS+ ā | a ā
     SS+ → S SS+ | S
     h(a) = t, h(ā) = r

     Behavioural representation by a quasi balanced grammar

Remarks: In figures (b) and (c), the mapping from a to t and from ā to r has been provided to establish the relationship between words of the grammar and the program behaviour as described earlier.

Figure 4.6: An n-ary-tree traversal program and its behaviour expressed by grammars

regular expression S⁺ has been replaced by a new non-terminal S_{S⁺} and a new production S_{S⁺} → S S_{S⁺} | S, which models the same behaviour as the original regular expression. The transformation from a regular expression to productions of a regular grammar can always be carried out in a straightforward manner according to [HMU01].

Lemma 4.1 (Expressive Equivalence of Balanced and Quasi Balanced Grammars) For every BG G = (V, Σ ∪ Σ̄ ∪ Σ̂, P, S) there is a qBG G′ = (V′, Σ ∪ Σ̄ ∪ Σ̂, P′, S), such that L(G) = L(G′).

Proof. The proof is straightforward and therefore omitted. □
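Membership in the balanced grammar of Example 4.4 can be checked mechanically. The following sketch (in Python, with t standing for the terminal a and r for the co-terminal ā under the superficial mapping; the function names are our own) decides whether a finite t/r-word is derivable from S → aS⁺ā | aā:

```python
def parse_tree(word, pos=0):
    """Try to parse one S-derivation (S -> a S+ a-bar | a a-bar) starting
    at `pos`, with a mapped to 't' and a-bar mapped to 'r'.
    Returns the position after the parsed sub-word, or None on failure."""
    if pos >= len(word) or word[pos] != 't':
        return None
    pos += 1
    # Zero children is the leaf case S -> a a-bar; one or more children
    # corresponds to S -> a S+ a-bar.
    while pos < len(word) and word[pos] == 't':
        pos = parse_tree(word, pos)
        if pos is None:
            return None
    if pos < len(word) and word[pos] == 'r':
        return pos + 1
    return None

def is_traversal(word):
    """A word is a single-tree traversal iff one S-derivation consumes it."""
    return parse_tree(word) == len(word)
```

For example, the ternary-node word t(tr)(tr)(tr)r is accepted, whereas a word with an unmatched return is not.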

4.3.2 A Grammatical Representation of ωVPLs

In the grammatical representation of ωVPLs, which we give in the following, we build upon the property that matchings between calls and returns can only appear in words as factors of finite length. This property has already been exploited in the complementation proof of [AM04], where pseudo-runs were used to decompose ωVPLs into ω-regular languages that describe non-matching behaviours and

4.3. A GRAMMATICAL REPRESENTATION

103

summary-edges that reflect call- and return-matchings as minimally well-matched words.

ω-Regular Language Representations

As in the automata-theoretic case, we only give a grammatical representation of ω-regular languages here, since we do not address regular languages of finite words in the following. The representation is almost identical to Büchi automata, which will become apparent when we give the definition of ω-regular grammars (ωRGs). Similar to the case of context-free grammars over finite words, a grammar generates an infinite word by successive applications of production rules. For a rewriting sequence S → aA → abB → . . ., with a, b ∈ Σ and A, B ∈ V, we write d(w) to abbreviate that there exists an infinite derivation of said form in a given grammar, for which the limit of the derivation approaches the infinite word w. Similar to runs of automata, we use the notation inf(d(w)) to denote the set of non-terminals that appear infinitely often in the derivation.

Definition 4.7 (ω-Regular Grammars) An ω-regular grammar is a structure G = (V, Σ, P, S, F), where

• V is a finite set of non-terminals A, B, . . .,
• Σ is a finite set of terminals a, b, . . .,
• V and Σ are disjoint,
• P is a finite set of productions of the form V × Σ · V,
• S ∈ V denotes a designated starting non-terminal, and
• F ⊆ V denotes the set of accepting non-terminals.

The language L(G) of an ω-regular grammar is the set of infinite words {w | inf(d(w)) ∩ F ≠ ∅}. □

We do not formalise ω-context-free grammars (ωCFGs) here, since we only refer to context-free grammars over finite words in the following, but we give an outline of ω-context-free grammars. Only once do we briefly refer to ω-context-free grammars, where we consider the amalgamation of an ω-regular language


and context-free grammars over finite words. In short, infinite derivations in ω-context-free grammars can contain underived non-terminals and terminals that do not contribute to the final word. For example, the production rule S → aSbS leads to the derivation S → aSbS → aaSbSbS → . . . → aⁿS(bS)ⁿ → . . ., so that the limit of this sequence denotes the word aω. In this example, we have always derived the left-most non-terminal, which we adopt for every derivation in this thesis. It is known that left-most derivations in ω-context-free grammars make it possible to describe any ω-context-free language, whilst this is not true for other modes of derivation, [CG77b]. For infinite words, inf(d(w)) is modified to denote the set of non-terminals whose substitution expands to infinitely many non-terminals in the derivation.

ω-Regular Grammars with Injected Quasi Balanced Grammars

Given an infinite word w of an ωVPL, it can be split into sub-words that are either in Lmwm (see page 95) or in Σc ∪ Σi ∪ Σr, where for the latter no sub-word in Σr follows a sub-word in Σc. We abbreviate the latter constraint as Σc/Σr-matching avoiding, or simply matching avoiding when the alphabets concerned follow from the context. Our grammatical representation of ωVPLs utilises Σc/Σr-matching avoiding ωRGs to describe languages of pseudo-runs. Languages of summary-edges, i.e. languages with words in Lmwm, are represented by surrogate terminals in the injector language, and the actual words are separately described by qBGs under a morphism that maps terminals and co-terminals, a/ā, to calls and returns, c/r, respectively. The morphism is required to cover matchings of calls c ∈ Σc that can match more than one return r ∈ Σr, which cannot be reflected as a simple terminal/co-terminal matching a/ā, a ∈ Σ and ā ∈ Σ̄. For example, the matchings c/r₁ and c/r₂ are representable as terminal/co-terminal pairs a/ā and b/b̄ under the mappings h(a) = h(b) = c, h(ā) = r₁ and h(b̄) = r₂.
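The effect of such a morphism can be illustrated concretely. In the minimal sketch below (Python; the capital letters A and B stand in for the co-terminals ā and b̄, and all names are our own choices), h collapses the two distinct terminals onto the single call c while keeping the two returns apart:

```python
# Hypothetical alphabets: call {c}, returns {r1, r2}; the qBG uses the
# terminal/co-terminal pairs a/A and b/B (capitals standing in for the
# overbar), and h maps both terminals onto the single call c.
h = {'a': 'c', 'b': 'c', 'A': 'r1', 'B': 'r2'}

def apply_morphism(word):
    """Apply the letter-to-letter morphism h to a word."""
    return ''.join(h[x] for x in word)
```

Applying the morphism, the word aā becomes cr1 and bb̄ becomes cr2, so both matchings c/r₁ and c/r₂ are realised although c is a single call.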
Finally, the amalgamation of Σc/Σr-matching avoiding ωRGs and qBGs under the aforementioned morphism gives us a grammatical representation of ωVPLs:

Definition 4.8 (ω-Regular Grammars with Injected Quasi-Balanced Grammars under a Morphism) A superficial³ ω-regular grammar with injected quasi balanced grammars (ωRG(qBG)+h) is a septuple G = (V, Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} {g_n}, P, S, F, ⋃_{n=1}^{m} {G_n}, h), where

³ superficial: as understood as being on the surface of something.

• Σc, Σi, Σr and ⋃_{n=1}^{m} {g_n} are mutually disjoint,

• G is Σc/Σr-matching avoiding,

• G_n = (V_n, Σ_n, P_n, S_n) is a qBG for n = 1, 2, . . . , m,⁴ and G forms an ωCFG G′ = (V ∪ ⋃_{n=1}^{m} V_n, Σ ∪ ⋃_{n=1}^{m} Σ_n, P′, S, F) with

• disjoint sets V and {V_1, V_2, . . . , V_m} as well as Σ and {Σ_1, Σ_2, . . . , Σ_m}, and

• P′ is the smallest set satisfying

  – A →_{G′} aB if A →_G aB, where a ∈ (Σc ∪ Σi ∪ Σr) and A, B ∈ V, or
  – A →_{G′} S_i B if A →_G g_i B, g_i ∈ ⋃_{n=1}^{m} {g_n} and S_i ∈ ⋃_{n=1}^{m} V_n, or
  – A →_{G′} α if A →_{G_n} α otherwise,

and h : Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} Σ_n −→ Σc ∪ Σi ∪ Σr is a morphism which is constrained in such a way that it preserves terminals of G, h(a) = a for any a ∈ (Σc ∪ Σi ∪ Σr), and for terminals/co-terminals of grammars G_i ∈ ⋃_{n=1}^{m} {G_n} it maps terminals a ∈ Σ_n to calls c ∈ Σc, maps co-terminals ā ∈ Σ̄_n to returns r ∈ Σr, and maps terminals â ∈ Σ̂_n to internal actions i ∈ Σi.

The language L(G) of an ωRG(qBG)+h G = (V, Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} {g_n}, P, S, F, ⋃_{n=1}^{m} {G_n}, h) denotes the set {h(w) | S →_{G′} w and w ∈ (Σ ∪ Σ_1 ∪ Σ_2 ∪ . . . ∪ Σ_m)ω}, where G′ is the ω-context-free grammar corresponding to G. □

In the following, we refer to the morphism h under the constraints given above as the superficial mapping h.

Consider an arbitrary ωRG(qBG)+h G = (V, Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} {g_n}, P, S, F, ⋃_{n=1}^{m} {G_n}, h). We call the ωRG G↑ = (V, Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} {g_n}, P, S, F) the injector grammar of G, while the qBGs G_1, G_2, . . . , G_m are called the injected grammars of G.⁵ When G is clear from the context, we just talk about the injector grammar G↑ and the injected grammars G_1, . . . , G_m respectively. The languages associated with these grammars are referred to as injector and injected languages respectively. In fact, injector languages resemble pseudo-runs with pseudo-edges g_n, n = 1 . . . m, while injected languages resemble matchings covered by summary-edges.

⁴ Each Σ_n is a shorthand for Σ_n ∪ Σ̄_n ∪ Σ̂_n.
⁵ This should not be confused with nested words, [AM06], which describe the structure induced by matchings in finite words.
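Acceptance by an ωRG as in Definition 4.7 can be tested on ultimately periodic words u·vω. The sketch below (Python) assumes a deterministic grammar, i.e. at most one production A → aB per pair (A, a), and a non-empty period v; the function name and encoding are our own:

```python
def accepts_lasso(delta, start, accepting, u, v):
    """Check whether a deterministic omega-regular grammar accepts the
    lasso word u v^omega.  delta maps (non-terminal, terminal) to the
    successor non-terminal of a production A -> aB; accepting is F."""
    state = start
    for a in u:                       # consume the finite prefix u
        if (state, a) not in delta:
            return False
        state = delta[(state, a)]
    seen = {}                         # iteration index per v-start state
    visited_since = []                # non-terminals reached per iteration
    while state not in seen:
        seen[state] = len(visited_since)
        visited = set()
        for a in v:                   # one full iteration of v
            if (state, a) not in delta:
                return False
            state = delta[(state, a)]
            visited.add(state)
        visited_since.append(visited)
    # States reached inside the cycle of v-iterations; the derivation
    # repeats them forever, so accept iff one of them is in F.
    cycle = set().union(*visited_since[seen[state]:])
    return bool(cycle & accepting)
```

For the injector grammar of the running example, S → g₁S with F = {S}, the lasso with empty prefix and period g₁ is accepted, reflecting the injector language g₁ω.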


S → g₁S
S₁ → aS₁S₁ā | aā
F = {S}
h(a) = t, h(ā) = r

(a) Equivalent behavioural representation as an ω-visibly pushdown automaton

(b) ω-regular matching-avoiding injector grammar with an injected balanced grammar and morphism to map terminals to calls and returns

Figure 4.7: Behaviour of a binary-tree parsing program expressed by automata and grammars

State q1, Input t: The first t marks the first call of traverse(node n), which will be followed by matched ts and rs corresponding to the traversal of a binary tree. In order to unambiguously match this t against its return symbol r, the stack symbol $ is pushed onto the stack, a symbol that is not used during the traversal of the tree's branches.

State q2, Input t: After the initial t of a tree's traversal has been seen, the input t must refer to the traversal of a left-branch, and therefore, an L is pushed onto the stack.

State q2, Input r: When a return is eventually seen, the L on top of the stack will then indicate that the traversal of a left-branch has finished. If the top of the stack is an R, then the traversal of a right-branch is completed. Otherwise, a $ indicates that the traversal of the binary tree in question is completed and a transition to the automaton's final state is made.

State q3, Input t: In state q3 the traversal of a left-branch in the tree has been completed, and therefore, the t indicates the start of a right-branch traversal. As such, an R is pushed onto the stack to record this and a transition to the state q2 is made, so that another possible left-/right-branch of the next node can be traversed.

State q4, Input r: The state q4 is reached after the traversal of a right-branch is completed, which means that further right-branches might complete in case an R is on top of the stack, the traversal of a left-branch might complete in case an L is on top of the stack, or that the r that corresponds to the traversal of the tree is recognised when $ is on top of the stack.

State q5, Input t: Equivalently to the action performed in state q1, the t initiates the traversal of a new binary tree. Even though state q1 and state q5 carry out the same actions, they are not semantically equivalent, because q5 is an accepting state whilst q1 is not an accepting state.

Figure 4.8: Description of an ω-visibly pushdown automaton for recognising the behaviour of a binary-tree parsing program
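The stack discipline described in Figure 4.8 can be sketched directly. The simplified Python fragment below abstracts the branch markers L and R into a single marker B, keeping only the sentinel $ whose removal signals a completed tree; it illustrates the stack usage, not the full automaton with its states q1 to q5:

```python
def completed_trees(word):
    """Count complete binary-tree traversals in a t/r word, following the
    stack discipline of Figure 4.8: the first t of a tree pushes the
    sentinel $, every further t pushes a branch marker ('B' abstracts the
    automaton's L and R), and every r pops; popping $ means one whole
    tree has been recognised."""
    stack, trees = [], 0
    for sym in word:
        if sym == 't':
            stack.append('$' if not stack else 'B')
        elif sym == 'r':
            if not stack:
                raise ValueError('unmatched return')
            if stack.pop() == '$':
                trees += 1
    return trees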


Example 4.5 (ωRG(qBG)+h Representation of an ωVPL) An example of an ωRG(qBG)+h has been given in Figure 4.4(c), page 97, where the behaviour of a given pseudo code for traversing a continuous supply of binary trees was modelled. In Figure 4.4(b), we also gave an ω-visibly pushdown automaton that accepts the same behaviour, but no comparison between the automata-theoretic and grammatical representation was made. Here, we give such a comparison and explain the structure of the grammatical productions for expressing matchings between calls and returns.

In Figure 4.7, the ω-visibly pushdown automaton and injector/injected grammars from Figure 4.4 are presented again. An infinite word is recognised by the ωVPA of Figure 4.7(a) as described in the table in Figure 4.8, whereas we describe the acceptance of an infinite word by the grammar of Figure 4.7(b) below. The designated initial production S → g₁S repeats the derivation of the surrogate terminal g₁ infinitely often, since S is an accepting non-terminal, i.e. S ∈ F. As such, the injector language represented by the grammar is g₁ω, which forms the basis for the amalgamated language that describes the behaviour that can be observed when repetitively traversing binary trees using the pseudo code of Figure 4.4(a). The observable behaviour of a single tree traversal is described by the injected grammar production S₁ → aS₁S₁ā | aā under the mapping from a to calls t and from ā to returns r.

function traverse(node n)
  if 'n is not a leaf' then
    traverse(n's left child)
    traverse(n's right child)
  return

S₁ → aS₁S₁ā | aā

In the illustration above, the correlations between lines of the pseudo code and terminals/non-terminals of the injected grammar production are depicted. A call to traverse(node n), i.e. the start of the actual execution of the function, is represented by a, whilst the return from the function is represented by ā. The call of traverse(node n) on a child, i.e. the invocation of the function, is simply represented by S₁, which is analogous to the recursion as it is present in the pseudo code.
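This correlation can be made executable by instrumenting the traversal with t/r events and parsing the emitted word with a recursive descent for S₁ → aS₁S₁ā | aā. A sketch in Python, where the (left, right)/None tree encoding and all names are our own choices:

```python
def traverse(node, out):
    """The binary-tree pseudo-code of Figure 4.4(a), instrumented to emit
    't' on entering and 'r' on leaving the function; a node is a pair
    (left, right) of children, or None for a leaf."""
    out.append('t')
    if node is not None:
        left, right = node
        traverse(left, out)
        traverse(right, out)
    out.append('r')

def parse_s1(word, pos=0):
    """Recursive descent for S1 -> a S1 S1 a-bar | a a-bar, with a mapped
    to 't' and a-bar to 'r'.  Returns the end position or None."""
    if pos >= len(word) or word[pos] != 't':
        return None
    pos += 1
    if pos < len(word) and word[pos] == 't':   # internal node: two subtrees
        pos = parse_s1(word, pos)
        if pos is None:
            return None
        pos = parse_s1(word, pos)
        if pos is None:
            return None
    if pos < len(word) and word[pos] == 'r':
        return pos + 1
    return None
```

An inner node with two leaf children emits ttrtrr, which the parser accepts in full; a unary word such as ttrr is rejected, since S₁ forces either zero or two subtrees.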

4.3.3 ωVPL and ωRL(qBL)+h Coincide

For the equivalence proof of ωVPLs and ωRL(qBL)+hs, we first show that minimally well-matched words, as described by summary-edges, can be expressed by qBGs plus an appropriate superficial mapping, and vice versa. It is then only a small step to prove language equivalence for the whole language class, by translating an arbitrary ωRG(qBG)+h into an expressively equivalent ωMSO formula and an arbitrary ωVPA into an expressively equivalent ωRG(qBG)+h.

We give a brief outline of our approach: let VPAmwm and MSOmwm refer to the language classes defined by VPAs and monadic second-order (MSO) formulas respectively, whose languages coincide with the language class of minimally well-matched words, Lmwm, i.e. restricted variants of VPAs and MSO formulas that can only describe minimally well-matched words. We show that any qBG under a superficial mapping can be translated into an expressively equivalent MSOmwm formula. Since MSOmwm is defined to express the language class Lmwm, the inclusion qBL ⊆ Lmwm is proven. Second, for an arbitrary VPAmwm a qBG is constructed so that their languages coincide, which in turn proves qBL ⊇ VPLmwm.

We then carry on and show that the ωVPLs can be represented by matching-avoiding ω-regular grammars with injected qBGs, where the matching-avoiding ω-regular grammars resemble pseudo-runs as they can be observed in ωVPLs. It is shown that an MSO formula can be constructed from said grammars, where matchings are expressed by the MSOmwm sub-formulas that we constructed in the previous step. This proves ωRL(qBL)+h ⊆ ωVPL. We then proceed and formulate an ωVPL's pseudo-runs and summary-edges in terms of an ωRG(qBG)+h, which proves the reverse inclusion, i.e. ωRL(qBL)+h ⊇ ωVPL.

Monadic Second-Order Logics over Words

Logics are descriptive formalisms to express structural properties of models, where the models are interpreted by us as finite or infinite words.
Temporal logics are suitable for expressing program behaviours and behavioural properties such as safety, liveness or fairness, [Pnu77, FL79, HPS81, HMM83b, HMM83a, Wol83, Sis94, Mos97], and there are also temporal logics that especially address call-stack behaviour, [AEM04]. The latter logic, Caret, is however not as expressive as VPLs, [AEM04], and can therefore not be used in our language equivalence proof below. For our proof, we consider the MSO-logic with a call-/return-matching


relation, which was introduced and shown to be equivalent to VPLs in [AM04], in order to express matchings between calls and returns. It is then only a matter of convenience that we choose a simpler MSO-logic for representing the regular recognisable parts of ωVPLs, since it is then straightforward to combine the representations of the regular structure of ωVPL words and their factors that represent matchings, which would not be the case when combining logics with different semantics. In the following, we give formal definitions for both logics, where we introduce the MSO-logic that describes ω-regular languages first. Second, the MSO-logic for describing minimally well-matched words is introduced, which extends the MSO-logic over ω-regular languages with an additional operator.

ω-Regular Language Representation

The logic we are considering is monadic second-order logic with a one-successor relation, [Büc62], whose formulas are composed of the usual Boolean operators, logical and, ∧, logical or, ∨, and negation, ¬; and for every terminal a ∈ Σ there exists a second-order predicate T_a that represents the set of time indices that are labelled with a. First-order variables over the positive natural numbers can be used to express temporal dependencies, in conjunction with existential and universal quantifiers, ∃ and ∀ respectively, over them. For example, the formula ∀i (T_a(i) ⇒ T_b(i + 1)) denotes that every occurrence of the letter a is immediately followed by an occurrence of the letter b. Obviously, for any two distinct terminals a, b ∈ Σ, the predicates T_a and T_b are mutually disjoint for all positions on a trace, whilst the successor relation permits the use of basic Presburger arithmetic. Besides the second-order predicates T_a, a ∈ Σ, arbitrary second-order predicates are allowed, which are later used to encode grammars in the logic. Arbitrary second-order predicates will be denoted by X, with appropriate subscripts that set the context of the predicate. We also allow quantification over these predicates, i.e. we allow quantification over sets of positions, which makes this logic effectively a second-order logic.

Definition 4.9 (Monadic Second-Order Logic with a One-Successor Relation, [Büc62]) A formula ϕ is a formula of monadic second-order logic with a one-successor relation over an alphabet Σ, iff it is of the form


• ϕ is the truth constant ⊤, or ϕ is the falsity constant ⊥,

• ϕ is a formula ¬ψ, ψ₁ ∧ ψ₂, or ψ₁ ∨ ψ₂, where ψ, ψ₁ and ψ₂ are themselves monadic second-order logic formulas,

• ϕ is a first-order quantification formula ∃i ψ(i) or ∀i ψ(i), with i being an arbitrary first-order variable, where ψ(i) denotes that i is an unbound variable in ψ,

• ϕ is a second-order quantification formula ∃X ψ(X) or ∀X ψ(X), with X being a second-order predicate, where ψ(X) denotes that X is an unbound variable in ψ,

• ϕ is the predicate T_a(i), a ∈ Σ, and i is either a bound first-order variable or a positive natural number,

• ϕ denotes i ∈ X, for a first-order variable i and second-order predicate X,

• ϕ denotes i < j or i ≤ j, for any two first-order variables i and j,

• ϕ is the arithmetic expression i + 1 for an arbitrary first-order variable i.

With the usual interpretation of this syntax, the language L(ϕ) defined by a monadic second-order logic formula ϕ is the set of infinite words for which the predicates T_a, a ∈ Σ, in ϕ are satisfied. □

Visibly-Pushdown Language Representation

In [AM04], an extension of the monadic second-order logic with a one-successor relation was given, where the extension introduced a matching relation µ. The matching relation µ ranges over two first-order variables, e.g. µ(i, j), which denotes that there is a matching between the terminal at time index i and the terminal at time index j. These matchings coincide with our earlier graphical representation of matchings between calls and returns, and as such, we do not address this further here.

Definition 4.10 (Monadic Second-Order Logic with a Call/Return-Matching Relation, [AM04]) A formula ϕ is a formula of monadic second-order logic of one successor with a call/return-matching relation over the alphabet Σc ∪ Σi ∪ Σr, iff it is of the form

• ϕ is a formula of monadic second-order logic with a one-successor relation,


• ϕ is a matching formula µ(i, j), with i, j denoting first-order variables or positive natural numbers, where there is a matching between a call terminal at time index i and a return terminal at time index j.

The language L(ϕ) defined by a monadic second-order logic with call/return-matching relation formula ϕ is the set of finite or infinite words for which the predicates T_a, a ∈ Σ, in ϕ are satisfied. □

Expressive Equivalence of Quasi Balanced Grammars under a Superficial Mapping and Monadic Second-Order Logic of Minimally Well-Matched Words

In the following lemma, an MSOmwm formula is constructed from a qBG in such a way that their languages coincide. The translation works by quantifying over each matching in the word and filling in the respective regular expressions. In order for the lemma to hold, we assume that all of the qBG's productions are uniquely identified by their terminal/co-terminal pairs. While this clearly restricts a grammar's language in general, the language under a superficial mapping is preserved.

Lemma 4.2 (Quasi Balanced Grammars under a Superficial Mapping are Expressible in Monadic Second-Order Logic of Minimally Well-Matched Words) Let G = (V, Σ ∪ Σ̄ ∪ Σ̂, P, S) denote an arbitrary qBG and let h be an arbitrary superficial mapping. Then the MSOmwm formula⁶

ϕ ≡ ∃X⃗ ∃i ⋁_{(S→aS_Rā)∈P} (Ψ_{a,ā}(1, i) ∧ T_$(i + 1) ∧ ∀k ∈ [1, i] Φ(k))

accepts the same language as G, where

Ψ_{a,ā}(i, j) ≡ T_a(i) ∧ T_ā(j) ∧ µ(i, j),

Φ(k) ≡ ⋁_{a∈Σ} (T_a(k) ⇒ ∃j (µ(k, j) ∧ ϕ_a(k + 1, j))),

∆(s, t, k) ≡ ⋁_{(A→aB)∈P_R} (X_A(k) ∧ X_B(k + 1) ∧ T_a(k)) ∨ ⋁_{(A→BC, B→bS_{R′}b̄)∈P_R} ∃j ∈ [s, t] (X_A(k) ∧ X_C(j + 1) ∧ Ψ_{b,b̄}(k, j)),

ϕ_a(s, t) ≡ X_{S_R}(s) ∧ ⋀_{(A,B)∈V, A≠B} ∀k ¬(X_A(k) ∧ X_B(k)) ∧ ∀k ∈ [s, t − 1] (ϕ_µ(s, t, k) ⇒ ∆(s, t, k)) ∧ ⋁_{(A→ε)∈P_R} X_A(t),

⁶ In Lemma 4.2 and the proof of Theorem 4.1 we use a sans-serif font for terminals and variables to denote that they are not bound in the formula, instead of denoting this explicitly. For example, a ∈ Σ is written as a. We use this notation to draw the focus towards the translation.

where (A → aS_Rā) ∈ P, and ϕ_µ(s, t, k) ≡ ¬∃i, j ∈ [s, t] (µ(i, j) ∧ i < k ≤ j).

Proof. Consider an arbitrary qBG+h G and its translation to an MSOmwm formula ϕ. We show that every word w$ ∈ L(ϕ) is a word w ∈ L(G) and vice versa. The intuition behind the formulas of Lemma 4.2 is as follows:

• ϕ encodes the outermost terminals of a word, which are due to initial productions of the form S → aS_Rā, where T_$ denotes an explicit end-marker that succeeds every word and the sub-formula ∀k ∈ [1, i] Φ(k) dictates the word structure as it is enforced by the right-hand sides of grammar productions,

• Ψ_{a,ā}(i, j) denotes that there is a matching between the terminals a and ā at positions i and j, respectively,

• Φ(k) expresses that in between two matching terminals a and ā, the regular expression R which corresponds to the production aRā is modelled by ϕ_a; this implies that the terminal pairs a and ā in the right-hand sides of productions have to be unique, an assumption that can be made without loss of generality,

• ∆(s, t, k) ensures that there is a grammatical derivation that expresses a regular expression between the boundary positions s and t, where k is used to address a particular position within the interval [s, t],

• ϕ_a(s, t) denotes the semantics of a regular expression R, which is due to a production's right-hand side aS_Rā; again, without loss of generality we can assume that a uniquely determines the right-hand side.

L(ϕ) ⊆ L(G): Let M be an arbitrary model of ϕ that represents the word w. We write ⟨T_1, X_1⟩⟨T_2, X_2⟩ . . . ⟨T_|w|, X_|w|⟩⟨T_$, X_{|w|+1}⟩ to denote the sequence of unique predicate pairs of T and X which hold at indices 1 to |w| + 1 in M. We use the following rewriting rules:


• occurrences of the form ⟨T_a, X_B⟩ are replaced by ⟨T_a, B⟩ if (B → ε) ∈ P,

• occurrences of the form ⟨T_a, X_A⟩⟨T_a, B⟩ are replaced by ⟨T_a, A⟩ if (A → aB) ∈ P,

• occurrences of the form ⟨T_a, X_B⟩⟨T_ā, S_R⟩⟨T_b, C⟩ are replaced with ⟨T_b, A⟩ if (A → BC, B → aS_Rā) ∈ P.

Eventually, ⟨T_a, X_S⟩⟨T_ā, S_R⟩ will be left, which is replaced with S iff (S → aS_Rā) ∈ P. As a result, we have established a bottom-up parse in G for an arbitrary word w$ ∈ L(ϕ), which implies that every word in L(ϕ) is in L(G).

L(G) ⊆ L(ϕ): Let S → α → β → . . . → w denote an arbitrary derivation in G. With each derivation step, we try to find variable assignments that satisfy ϕ, so that after the derivation finishes, w$ is represented by the sequence ⟨T_1, X_1⟩⟨T_2, X_2⟩ . . . ⟨T_$, X_{|w|+1}⟩ of ϕ. The construction of ⟨T_1, X_1⟩⟨T_2, X_2⟩ . . . ⟨T_$, X_{|w|+1}⟩ follows the derivation S → α → β → . . . → w in the sense that there is a mapping between the n-th step of the sequence constructed by the variable assignments and the n-th sentential form reached in the derivation.

We consider triples of the form ⟦A, ψ, B⟧, where A is a non-terminal as it appears in some sentential form derived from the initial non-terminal S, ψ denotes the formula which is supposed to derive a word w with A →* w, and B is a temporarily stored non-terminal. When A derives α with A → α being a production, we try to find variable assignments for ψ that represent the terminals in α. Since terminals appear only in prefixes/postfixes of α, we remove the ground terms in ψ, add pairs ⟨T_k, X_k⟩ to the left/right of ⟦A, ψ, B⟧ accordingly, and replace ⟦A, ψ, B⟧ with ⟦C, ψ′, E⟧ or the sequence ⟦C, ψ′, E⟧⟦D, ψ″, F⟧, depending on whether α has one non-terminal C or two non-terminals CD as sub-word. The non-terminals E and F are associated with productions E → ε and F → ε respectively, where they denote the end of a regular expression embedded between a call and a return.

Starting the rewriting process with ⟦S, ϕ, A⟧, S denoting the initial non-terminal and A chosen arbitrarily, a sequence of tuples of the form ⟨T_k, X_k⟩ is eventually left, which indeed represents w, so that the model for w$ is obtained by adding ⟨T_$, X_A⟩ to the sequence. □

The reverse inclusion, i.e. qBL ⊇ VPLmwm, can be shown by a number of rewriting steps from an arbitrary VPAmwm to a BG equipped with a superficial mapping. Since there is a translation from BGs to qBGs, the inclusion is then proven. The VPAmwm hereby represents L((p, q, f)) of some summary-edge (p, q, f).
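The matching relation µ(i, j) that underlies the MSOmwm formulas above can be computed from a well-matched word with a single stack; a minimal sketch in Python, using 1-based positions as in the logic and assuming the input is well-matched over a call t and a return r:

```python
def matching_relation(word):
    """Compute the matching relation mu for a finite well-matched word
    over calls 't' and returns 'r'; positions are 1-based as in the
    logic.  Returns the set of matched pairs (i, j)."""
    stack, mu = [], set()
    for pos, sym in enumerate(word, start=1):
        if sym == 't':
            stack.append(pos)       # remember the call position
        elif sym == 'r':
            mu.add((stack.pop(), pos))  # match the most recent open call
    return mu
```

For the word ttrtrr this yields the pairs (2, 3), (4, 5) and (1, 6), i.e. the outermost call at position 1 matches the final return.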


Definition 4.11 (Immediate Matching Context-Free Grammars) Let G = (V, Σc ∪ Σi ∪ Σr, P, S) denote the CFG with productions of the form S → cA, A → cBC, A → iB, and A → r that is obtained from a VPAmwm by the standard translation [HMU01, Theorem 6.14, detailed description in its proof]. Then the immediate matching CFG G′ = (V′, Σc ∪ Σi ∪ Σr, P′, S′) is obtained from G, so that

S′ →_{G′} c⟨A, r⟩r iff S →_G cA,
⟨A, r₁⟩ →_{G′} c⟨B, r₂⟩r₂⟨C, r₁⟩ iff A →_G cBC,
⟨A, r⟩ →_{G′} i⟨B, r⟩ iff A →_G iB,
⟨A, r⟩ →_{G′} ε iff A →_G r. □

Lemma 4.3 (Visibly Pushdown Automata of Minimally Well-Matched Words are Expressible by Immediate Matching Context-Free Grammars) The language L(G) of an immediate matching CFG G that is obtained from a VPAmwm A is equal to L(A).

Proof. The translation of Definition 4.11 preserves the language equivalence of the grammars, as it is a special case of the more general translation presented in [Eng92, Page 292]. □

In the following transformation steps, productions are rewritten so that matchings cAr appear exclusively in right-hand sides. Furthermore, we remove all productions that produce no matchings by introducing language-preserving regular expressions R in productions with right-hand sides of the form cAr, so that the resulting right-hand side is cRr. Finally, adding a homomorphism that maps fresh terminal/co-terminal pairs to calls and returns, where the productions are modified accordingly, gives us a BG.

Definition 4.12 (Translation from Immediate Matching Context-Free Grammars to Balanced Grammars with a Superficial Mapping) Let G = (V, Σc ∪ Σi ∪ Σr, P, S) denote an immediate matching CFG. A BG G‴ = (V‴, Σ ∪ Σ̄ ∪ Σ̂, P‴, S‴) and superficial mapping h are obtained from G in three steps as follows:

First step:
• A →_{G′} cBr iff A →_G cBr,
• A →_{G′} A′C, A′ →_{G′} cBr iff A →_G cBrC,
• A →_{G′} iB iff A →_G iB,
• A →_{G′} ε iff A →_G ε.


Second step:
• A →_{G″} cR_B r iff A →_{G′} cBr, where R_B describes the language L(B) over Σi ∪ V_{cαr}, V_{cαr} = {A | A →_{G′} cBr}.

Third step:
• A →_{G‴} aR_Bā, h(a) = c, h(ā) = r iff A →_{G″} cR_B r. □

Lemma 4.4 (Translational Correctness of Definition 4.12) For any immediate matching CFG G and its corresponding BG G‴ plus superficial mapping h as of Definition 4.12, their languages coincide.

Proof. In the first step, we only split up some productions into two separate productions A → A′C and A′ → cBr, which preserves language equivalence. In the second step, every non-terminal B in right-hand sides of the form cBr is substituted with its regular language over Σi ∪ V. This is clearly just a syntactical abbreviation, and hence, does not modify the language either. Finally, in the third step, every call is replaced by a terminal and every return is replaced by a co-terminal, with an appropriate h respectively. □

Language equivalence of ωRL(qBL)+h and ωVPL is now shown by translating an arbitrary ωRG(qBG)+h into an ωMSOµ formula and an arbitrary ωVPA into an ωRG(qBG)+h, where each time the languages of the characterisations coincide.

Theorem 4.1 (Language Equivalence of Matching-Avoiding ω-Regular Languages with Injected Quasi Balanced Languages under a Superficial Mapping and ω-Visibly-Pushdown Languages) The language classes defined by ωRL(qBL)+h and ωVPL coincide.

Proof. ωRL(qBL)+h ⊆ ωVPL: Let G = (V, Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} {g_n}, P, S, F, ⋃_{n=1}^{m} {G_n}, h) denote an arbitrary ωRG(qBG)+h. Its injector language is ω-regular, and hence, is representable as an ωMSO formula by the standard translation. Each of the injected languages L(G_n) is representable as an MSOmwm formula ϕ_n respectively. Let ϕ′_n denote a variation of ϕ_n, where the formula presented in Lemma 4.2 is modified to (∃X⃗ ⋁_{(S→aS_Rā)∈P} (Ψ_{a,ā}(i, j) ∧ ∀k ∈ [i, j] Φ(k)))(i, j) but left unchanged otherwise.
With appropriate renaming of variables, each terminal g_n can then be substituted by the corresponding formula ϕ′_n in the injector grammar, so that we get an ωMSO formula

ϕ ≡ X_S(1) ∧ ∀k (⋁_{(A→aB)∈P} (X_A(k) ∧ X_B(k + 1) ∧ T_a(k)) ∨ ⋁_{(A→g_nB)∈P} ∃j (X_A(k) ∧ X_B(j + 1) ∧ ϕ′_n(k, j))) ∧ ⋁_{A∈F} ∀k ∃j (k < j ∧ X_A(j)).

The formula ϕ is a variation of the commonly found formula for describing languages of Büchi automata, where we made alterations to represent injected languages. X_S(1) denotes that the grammatical derivation has to start from the initial symbol S. The sub-formula ⋁_{(A→aB)∈P} (X_A(k) ∧ X_B(k + 1) ∧ T_a(k)) represents the generation of non-matchings and internal actions, outside the scope of minimally well-matched words. Injected languages are represented by the sub-formula ⋁_{(A→g_nB)∈P} ∃j (X_A(k) ∧ X_B(j + 1) ∧ ϕ′_n(k, j)), where the production of the surrogate is skipped and we jump from position k to j + 1, whilst the injected word fills in the positions in the interval [k, j] according to the formula ϕ′_n. Finally, ⋁_{A∈F} ∀k ∃j (k < j ∧ X_A(j)) models the Büchi acceptance condition, i.e. accepting non-terminals have to be seen infinitely often in the generation of the word. Language inclusion follows from the fact that every ωMSOµ formula can be translated into an ωVPA.

ωRL(qBL)+h ⊇ ωVPL: Consider an ωVPA A and let A′ = (Q′, Σc ∪ Σi ∪ Σr ∪ ⋃_{n=1}^{m} {Ω_n}, δ, q_i, F′) denote the Büchi automaton accepting all pseudo-runs of A. A′ can be represented as a right-linear injector grammar G↑ with productions of the form A → cB, A → rB, and A → (p, q, f)_{n′}B for representing sets of summary-edges Ω_n with (p, q, f)_{n′} ∈ Ω_n. Since summary-edges (p, q, f)_{n′} are treated as terminals in A′, their f component does not contribute to the acceptance of a pseudo-run. Hence, for every production A → (p, q, 1)_{n′}B, it is without loss of generality required that B ∈ F′.

All summary-edges stand for languages in VPLmwm, and hence, are representable as VPAmwm s respectively. Each VPAmwm representing a summary-edge (p, q, f)_{n′} can be transformed into a qBG G_{n′} plus an additional superficial mapping h. By combining G↑ and the various G_{n′} into a superficial ωRG(qBG), we get the language inclusion. □

Injecting BGs instead of qBGs into ωRGs does not change the expressiveness, which is trivially true as every BG can be translated into a qBG.

Corollary 4.1 (Language Equivalence of Matching-Avoiding ω-Regular Languages with Injected Balanced Languages under a Superficial Mapping and ω-Visibly-Pushdown Languages) The language classes defined by ωRL(BL)+h and ωVPL coincide.
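The first step of Definition 4.12 above is a purely mechanical production rewriting; a sketch in Python, where the tuple-based production encoding and the primed-helper naming are our own choices:

```python
def split_step_one(productions):
    """First step of Definition 4.12: a production A -> cBrC is split
    into A -> A'C and A' -> cBr, so that a matching cBr appears only as
    a complete right-hand side.  Productions are (lhs, rhs-tuple) pairs;
    the helper non-terminal is named lhs + "'" in this sketch."""
    result = []
    for lhs, rhs in productions:
        if len(rhs) == 4:                 # right-hand side of form c B r C
            c, b, r, cont = rhs
            helper = lhs + "'"
            result.append((lhs, (helper, cont)))
            result.append((helper, (c, b, r)))
        else:                             # cBr, iB or the empty word
            result.append((lhs, rhs))
    return result
```

Productions that already have matchings as complete right-hand sides, as well as internal-action and ε-productions, pass through unchanged, so the transformation clearly preserves the generated language.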

4.4 Beyond Visibly Pushdown Properties

In this section, we extend the expressiveness of model-checking by presenting a language class with a decidable language inclusion problem whose expressiveness goes beyond that of the ω-visibly-pushdown languages, i.e. a language of that class can describe behavioural properties that are not expressible by an ωVPL. The approach presented in the following builds upon the injector and injected grammars that we introduced in our grammatical representation of ωVPLs. More precisely, it carries the definition of the respectively associated injector and injected languages further, since we do not rely on particular automata or grammatical representations to describe the said language class. In other words, we investigate the model-checking problem from a language-theoretic point of view, where we accompany concrete language examples with grammatical representations later in this section.

4.4.1 Limitations of Visibly Pushdown Languages

Visibly pushdown languages over infinite words have enabled model-checking beyond ω-regular properties, so that it is possible to verify ω-visibly-pushdown properties or behaviours of ω-visibly-pushdown behavioural models. By showing that there is a language class whose expressiveness exceeds that of the ωVPLs and for which the model-checking problem is decidable, we enable the verification of properties, such as data-stack properties and counting properties, that are not expressible by ωVPLs, and thus effectively extend model-checking to a larger class of system models and specifications. However, similarly to our previous grammatical representation of ωVPLs, where we used a homomorphism to raise the expressiveness of balanced grammars so that they could express words represented by summary edges, we first examine two analogous encodings for ωVPLs that would enable us to specify counting properties with them. While this is not a formal study of suitable mappings for formulating properties or behaviour beyond the expressiveness of ωVPLs, it gives an insight that mappings are not a straightforward solution to extending the expressiveness of model-checking beyond ωVPLs, albeit further research might find promising solutions to the drawbacks we present in the following.

Example 4.6 (Embedded 1 : 1- and 1 : 2-Matching Behaviour)
We consider the implementation of a simple protocol where each request of a

CHAPTER 4. MODEL-CHECKING CONTEXT-FREE PROPERTIES

Figure 4.9: Illustration of embedded 1 : 1- and 1 : 2-matching behaviour [diagram: the trace r g c r r g g g c, with arcs pairing each request r with one or two grants g]

bundle of resources is followed by at least one instance of the resource granted, but no more than two instances are ever granted per request. As such, the relationship between the number of resource requests, n_req, and the number of granted resources, n_gnt, can be specified as n_req ≤ n_gnt ≤ 2 · n_req. An accepted program behaviour that satisfies this specification is depicted in Figure 4.9. By building on top of our grammatical approach from the previous section, we can formalise this specification as a context-free grammar:

S → AS
A → rAg | rAgAg | ε

From the production A → rAgAg it is apparent that one occurrence of the letter r matches two occurrences of the letter g, which makes this language a CFL that is not a VPL. Even though the language we have given in the example exceeds the expressiveness of the VPLs, we can give a VPL that describes a similar property, namely that every n successive occurrences of requests are matched by n granted resources. In terms of a grammar, this is formalised as:

S → AS
A → rAg | rrAgAg | ε

In the system model, this has to be reflected accordingly, where we could permit a non-deterministic choice between single and paired occurrences of r whenever a request is made. While the encoding for this particular example is an alternative for representing 1 : 2-matchings by 1 : 1-matchings, it has the drawback that counter-examples of system behaviour that do not satisfy the specification are ambiguous in the


sense that it may not be possible to see which occurrences of the letter r are supposed to be single or paired occurrences. For example, the word rrrgggg is not generated by the grammar above and is therefore a counter-example, i.e. a program behaviour that does not match the specification, which is easily determined; but unfortunately, it is not clear which matching history led to the extra g, since there are three possible matching patterns here:

[three bracketing diagrams over the word rrrgggg, pairing occurrences of r with occurrences of g via the productions A → rAg and A → rrAgAg in three different ways]
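To make the discussion concrete, membership in the A-language of the first grammar (A → rAg | rAgAg | ε) can be tested with a short memoized recursion over substrings. This is our own illustrative sketch; the function name `derives_from_A` and the use of Python are not part of the thesis:

```python
from functools import lru_cache

def derives_from_A(word):
    """Membership test for L(A), where A -> r A g | r A g A g | epsilon."""
    @lru_cache(maxsize=None)
    def is_A(i, j):  # does word[i:j] derive from A?
        if i == j:
            return True                      # A -> epsilon
        if word[i] != 'r' or word[j - 1] != 'g':
            return False
        if is_A(i + 1, j - 1):               # A -> r A g
            return True
        for m in range(i + 1, j - 1):        # A -> r A g A g, middle g at m
            if word[m] == 'g' and is_A(i + 1, m) and is_A(m + 1, j - 1):
                return True
        return False
    return is_A(0, len(word))

print(derives_from_A("rg"))       # True:  one request, one grant
print(derives_from_A("rgg"))      # True:  one request, two grants
print(derives_from_A("rrrgggg"))  # True:  3 <= 4 <= 2*3 holds
print(derives_from_A("rggg"))     # False: three grants for one request
```

Note that rrrgggg is a word of this first (non-VPL) grammar; it fails only for the second, VPL encoding discussed above.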

In order to uniquely determine the history of the program execution that led to the erroneous program behaviour, it is necessary to resolve this non-determinism. Yet another alternative grammar, which is a VPL but also unambiguous while reflecting a language similar to that of the first grammar, can be formulated with a distinguishable new terminal x which matches the second appearance of g:

S → AS
A → rAg | rxAgAg | ε

Unfortunately, this encoding entangles appearances of r and x, in the sense that we interpret the sub-word rx as if it were a single terminal that leads to the 1 : 2-matching due to the two appearances of g following a single appearance of rx. As such, the use of the terminal x in productions has to be restricted, for example by allowing its use only in production rules that describe 1 : 2-matchings of the form above. However, while this is a straightforward approach in terms of a grammatical characterisation, it becomes troublesome when regarding language complementation, which is not uncommon when formulating behavioural specifications. For the grammar we have given here, this means in particular that the complemented language would, for example, contain the word rxg, which we could interpret in two ways: x is wrongly placed within a matching between r and g, or alternatively, it is only the prefix of the word rxgg. Of course, these are just examples, and there may well be a homomorphism or a more complex function or mapping on letters or sub-words that resolves the problems we have given as examples here. However, in the following, we extend our work on a grammatical representation of ωVPLs in a way that does not impose


restrictions on the production rules of the grammars. We only require that the grammars in question fall within a certain language class, for which we show that the model-checking problem is still decidable.

4.4.2 A Generalisation of Injector and Injected Languages

We generalise the concept of injecting languages by presenting a language-theoretic approach from which we derive examples and concrete grammatical representations later on, whilst maintaining the decidability of the language inclusion problem for the language classes given in the following. The injector languages we consider are ω-regular, just as in the previous section, which is not a strong restriction, since any ω-context-free language can be expressed as the Kleene-closure U · V^ω, with U, V context-free languages over finite words, [CG77a]. The latter apparently describes the language class of ωRL(CFL). For ωCFLs it is known that their expressiveness is rich enough to make testing language inclusion of two ωCFLs undecidable, so that they are not suited for model-checking; moreover, they do not form a Boolean algebra. Of course, for any language class for which testing language emptiness is decidable and which forms a Boolean algebra, testing language inclusion is trivially decidable, i.e. solving the language inclusion L1 ⊆ L2 then becomes equivalent to verifying whether the equation L1 ∩ ¬L2 = ∅ is satisfied.

Definition 4.13 (Injector and Injected Languages)
An injector language is an infinite-word language L↑ over a finite alphabet of terminals Σ and a finite alphabet of surrogate terminals ⋃_{n=1}^{m} {g_n}. The two alphabets are disjoint. An injected language is a finite-word language L↓ over an alphabet Σ. The unification of an injector language and multiple injected languages is denoted by L↑(L⃗↓), where each surrogate terminal corresponds to an injected language, i.e. g_i is a surrogate of L↓,i. □

The definition of language amalgamation does not prevent the overlapping of sub-words of the injector language and words of the injected language, a constraint which we introduced earlier as "matching avoiding" when we gave the grammatical representation of ωVPLs.
Here, we give a generalisation of the previous matching avoiding constraint, where we require that no sub-words of the injector language overlap with words of the injected languages.
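For the finite-word, regular base case, the reduction of inclusion to emptiness mentioned above can be sketched directly: complement the second automaton, intersect with the first, and test reachability of an accepting state. The DFA encoding and all names below are our own; the thesis works with Büchi- rather than finite automata, so this is only the simplest decidable instance of the idea:

```python
from collections import deque

# A DFA is (states, alphabet, delta, start, accepting); delta is assumed complete.
def complement(dfa):
    states, alph, delta, start, acc = dfa
    return (states, alph, delta, start, states - acc)

def intersect(d1, d2):  # product automaton for L(d1) & L(d2)
    s1, alph, t1, q1, a1 = d1
    s2, _, t2, q2, a2 = d2
    states = {(p, q) for p in s1 for q in s2}
    delta = {((p, q), c): (t1[(p, c)], t2[(q, c)])
             for p in s1 for q in s2 for c in alph}
    return (states, alph, delta, (q1, q2), {(p, q) for p in a1 for q in a2})

def is_empty(dfa):  # BFS: is any accepting state reachable?
    states, alph, delta, start, acc = dfa
    seen, todo = {start}, deque([start])
    while todo:
        p = todo.popleft()
        if p in acc:
            return False
        for c in alph:
            q = delta[(p, c)]
            if q not in seen:
                seen.add(q)
                todo.append(q)
    return True

def included(d1, d2):  # L(d1) <= L(d2)  iff  L(d1) & complement(L(d2)) = empty
    return is_empty(intersect(d1, complement(d2)))

alph = {'a', 'b'}
# L1: words with an even number of a's; L2: all words over {a, b}.
even_a = ({0, 1}, alph, {(0, 'a'): 1, (1, 'a'): 0, (0, 'b'): 0, (1, 'b'): 1}, 0, {0})
all_w = ({0}, alph, {(0, 'a'): 0, (0, 'b'): 0}, 0, {0})
print(included(even_a, all_w))  # True
print(included(all_w, even_a))  # False: "a" has an odd number of a's
```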


We refer to the refinement of the matching-avoiding constraint as "L-factor avoiding", indicating that factors that are described by the language L have to be avoided. For example, the matching-avoiding constraint can be formulated by the regular language Σc · (Σc ∪ Σi)∗ · Σr, where Σc, Σi and Σr denote the ωVPL alphabets of calls, internal actions and returns respectively. The language Σc · (Σc ∪ Σi)∗ · Σr denotes that once a call is seen, a succeeding return is sufficient to establish a matching, i.e. a call that is eventually followed by a return establishes a matching.

Definition 4.14 (L-Factor Avoidance)
An infinite-word language, Linf, is called L-factor avoiding iff no sub-word of Linf is in L. □

In the following, we use the notation L↑[L⃗↓] to denote that L↑ is an injector language over the injected languages ⋃_{n=1}^{m} {L↓,n}, the latter of which we abbreviate in vector notation, L⃗↓. We use an upwards-pointing arrow when we refer to injector languages, and a downwards-pointing arrow when we refer to injected languages. Whilst we relied on an extensive morphism that mapped terminals and co-terminals, a ∈ Σ and ā ∈ Σ̄, of qBGs to their respective counterparts of calls and returns in Σc and Σr in the case of ωRL(qBL)+h, we will only need one simple morphism, h($) = ε, in the next sections. As in the proof of Lemma 4.2, the terminal $ is used as an explicit end-marker, which is removed from words by the given morphism. We omit h in conjunction with the notation L↑[L⃗↓] and therefore assert that the explicit end-marker $ can always be eradicated in the following. Nevertheless, we keep $ in the upcoming examples in order to clarify the structural properties of the languages we address.
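On a finite prefix of a behaviour, checking the L-factor-avoidance condition against a regular L amounts to a search for a forbidden factor. A minimal sketch (the one-letter alphabet c/i/r and the function name are ours, not part of the thesis):

```python
import re

# Matching-avoiding constraint from the text: avoid factors in
# Sigma_c (Sigma_c | Sigma_i)* Sigma_r, with c = call, i = internal, r = return.
MATCHING = re.compile(r"c[ci]*r")

def is_factor_avoiding(word, pattern=MATCHING):
    """True iff no factor (sub-word) of `word` is in L(pattern)."""
    return pattern.search(word) is None

print(is_factor_avoiding("ciicic"))  # True:  no call is ever followed by a return
print(is_factor_avoiding("icirri"))  # False: the factor "cir" is a matching
```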

4.4.3 Factor Avoiding Languages with a Decidable Language Inclusion Problem

We decide the language inclusion problem L1 ⊆ L2 by solving the language emptiness problem of the language L1 ∩ ¬L2. In order to preserve the decidability of determining the language intersection, language complement and language emptiness for factor-avoiding languages, we have to face the following problems:

• factor avoidance must be realisable in the injector language class, i.e. the

avoidance of factors must be recognisable as an ω-regular language in the following, and

• the injected languages which are represented by surrogate terminals have to be chosen in such a way that the complementation of the injector language can be done independently of the injected languages.

The first bullet point is easily explained by considering the very simple injected language class 2^{{ww^r | w ∈ Σ∗}}, i.e. the language class of data-stack behaviour, and an injector language class over the same alphabet, i.e. Σ. In order to avoid factors ww^r, the injector language itself would have to be an ω-context-free language, so that the matching occurrences between data-symbols could be recognised and in turn be avoided. Since we restrict ourselves to ω-regular injector languages here, factors of injected languages have to be regular recognisable in order to ensure their absence in the injector language. For example, minimally well-matched words of ωVPLs are regular recognisable by the regular expression Σc · (Σc ∪ Σi)∗ · Σr, which we previously coined by the term "matching avoiding". In the following, we use a designated terminal as an end-marker to ensure that the injected languages are regular recognisable, which is very similar to the "matched" return as it is seen at the end of call-/return-matchings, and we provide an example that illustrates the practical suitability of this choice. Bullet point two addresses the complementation of injector languages, which we require to be independent of the injected languages, in the sense that no expansion of surrogate terminals is required in order to carry out the complementation. In other words, the complementation of the injector language L↑ must
yield the complement for the resulting amalgamated language L↑[L⃗↓].

Example 4.7 (The Role of Surrogate Terminals in Complementation)
Consider the language L↑[L⃗↓] over the alphabet Σ = {c, r1, r2, r3, $}, where $ denotes an explicit end-marker whose role we explain below, and surrogate terminals corresponding to elements of the lattice depicted in Figure 4.10, with

• L↑ = {g1^ω},
• L↓,1 = ({c^n r1^n} ∪ {c^n r2^n}) · {$}, and
• L↓,2 = ({c^n r2^n} ∪ {c^n r3^n}) · {$}.
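The lattice of Figure 4.10 can be sketched by abstracting each block language {c^n r_i^n} · {$} to an atom named r1, r2, r3 and closing the injected languages under union, intersection and complement relative to the top element. The finite-set representation and all names are our own abstraction, not part of the thesis:

```python
# Atoms stand for {c^n r1^n}.{$}, {c^n r2^n}.{$} and {c^n r3^n}.{$};
# a lattice element is a union of atoms, encoded as a frozenset of atom names.
ATOMS = frozenset({"r1", "r2", "r3"})
L1 = frozenset({"r1", "r2"})   # abstraction of L_{down,1} from Example 4.7
L2 = frozenset({"r2", "r3"})   # abstraction of L_{down,2}

def boolean_closure(langs, top=ATOMS):
    """Smallest family containing `langs` that is closed under union,
    intersection and complement relative to `top` (a Boolean algebra)."""
    closed = set(langs) | {top, frozenset()}
    changed = True
    while changed:
        changed = False
        for a in list(closed):
            for b in list(closed):
                for c in (a | b, a & b, top - a):
                    if c not in closed:
                        closed.add(c)
                        changed = True
    return closed

lattice = boolean_closure({L1, L2})
print(len(lattice))  # 8: the full Boolean algebra over the three atoms
```

Here L1 ∩ L2 yields the atom {r2}, and the relative complements yield {r1} and {r3}, so the closure is the full eight-element algebra visualised in Figure 4.10.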

Figure 4.10: Lattice of the Boolean algebra of Example 4.7 [lattice diagram: the bottom element is L↓,⊥ = ∅; the atoms include {c^n r1^n} · {$}, {c^n r2^n} · {$} and {c^n r3^n} · {$}; intermediate elements include L↓,1 = ({c^n r1^n} ∪ {c^n r2^n}) · {$}, L↓,2 = ({c^n r2^n} ∪ {c^n r3^n}) · {$}, L↓,3 = ({c^n r1^n} · {$}) ∩ L↓,> and ({c^n r1^n} ∪ {c^n r2^n} ∪ {c^n r3^n}) · {$}; the top element is L↓,> = {c, r1, r2, r3}∗ · {$}]

The amalgamation L↑[L⃗↓] = {(({c^n r1^n} ∪ {c^n r2^n}) · {$})^ω} denotes the set of words of matchings between a call c and the returns r1 and r2, but matchings between c and r3 are not part of the language. For each of the lattice's elements, we can pick a surrogate terminal that corresponds to its language. Every language in the lattice has the prefix property, which is straightforward to see due to the explicit end-marker $, [HMU01]. The latter will be of importance when we revise this example to show the correct way of complementing amalgamated languages in Example 4.8 (page 129). The complement of L↑, i.e. ¬L↑, cannot naïvely be taken as the complement of the injector language. By complementing the injector language using well-known techniques, e.g. [Saf88], surrogate terminals are treated as ordinary letters, and as such, the language complementation fails. For example, the complement of L↑ avoids the ω-iteration of g1, which means that g2^ω is a valid word in the complemented language. In the amalgamated language, g2^ω expands to ({c^n r2^n} ∪ {c^n r3^n})^ω. Clearly, {c^n r2^n}^ω should not be in the complement of L↑, since it is a word of L↑ as well. We can reformulate the language in our example so that the injected languages form a Boolean algebra, which allows us to complement the injector language as an ordinary ω-regular language – plus factor avoidance.7 The set of elements over which every Boolean algebra is defined can be visualised in form of a lattice. In our case, the top element of the lattice is always the regular language that recognises all words of the particular injected languages in question, L↓,>, and the bottom element is always the empty language, L↓,⊥ = ∅. We address these points formally in the next subsections, where we proceed as follows:

1. the complementation of an injector language is examined by carrying logic considerations over to formal languages, i.e.
we relate the law of the excluded middle and the law of non-contradiction to their language-theoretic counterparts,

2. the recognition of "forbidden" factors (i.e., words of injected languages)

7 It should be noted that factor avoidance is trivial here, since we require that the injected language is regular recognisable. Consider the language amalgamation with a single injected language L↓ and its supremum L↓,>, i.e. the regular language L↓,> which "recognises" all injected words of L↓ in the sense that L↓ ⊆ L↓,>; then factor avoidance can be formulated via ω-regular set-operations as we describe in the following.


in injector languages is addressed, so that these factors can be avoided in injector languages, and

3. the unambiguous recognition of injected words is discussed, i.e. every injected word can be traced back to the surrogate terminal it stems from, which allows us to treat surrogate terminals as ordinary terminals regardless of their substitution with injected-language words later on.

Excluded Middle and Non-Contradiction

Since the injector language class does not contain any factors that are part of the injected language class, we ensure that the amalgamated language class is consistent, in the sense that it has properties which do not contradict common sense: it adheres to the law of the excluded middle ([Lu89, p.53, Theorem 2.6.8]) and the law of non-contradiction ([Lu89, p.54, Theorem 2.6.9]). In language terms, this means that L↑[L>] ∪ ¬L↑[L>] = Σ^ω and L↑[L>] ∩ ¬L↑[L>] = ∅ hold respectively,8 where L> refers to the top element of the injected language class. In specification terms, this means that language complementation neither misses out words, nor do the sets of words of the language under consideration and its complement overlap. We can then carry these notions forward and make sure that a class of injected languages, L↓, forms a Boolean algebra9 (L↓, ∩, ∪, ¬, L↓,>, ∅), L↓ ⊆ CFL, where L↓,> denotes the top language in L↓. In the following, we write L↓,⊥ to denote the bottom element of the lattice, which can always be chosen to be the empty set here. The aim is to ensure that the language equation L↑,1[L⃗↓,1] ∩ ¬L↑,2[L⃗↓,2] will then be an ωCFL, for which it is known that the emptiness problem is decidable [CG77a], and as such we can solve the model-checking problem.

Deciding Factor Avoidance

Generally, it is not straightforward to decide that no factor of the injector language matches a word of an injected language, since we have to take into account that an arbitrary word of the form w1 w2 g1 w3 w4 of the injector language might

8 The law of non-contradiction actually addresses the complement of the presented equation, i.e. ¬(L↑[L>] ∩ ¬L↑[L>]) = Σ^ω, which is equivalent to the equation given in the text.
9 In order to form a Boolean algebra, a language class has to form a lattice, i.e. there has to be a unique top language and a unique bottom language in L↓, [DP02], where the ordering on the lattice is set inclusion.


expand to a word w1 w2 w w3 w4, w ∈ L↓,1, in the amalgamated language L↑(L⃗↓), such that w2 w w3 is also a word of L↓,1. As such, we have to expand the surrogate terminals in the injector language, whilst ensuring that the factors we are testing for inclusion in L⃗↓ are not expanded surrogate terminals. For the latter, it is of course the case that they match words of the injected languages, because this is how we defined their expansion. We determine whether L↑ is L⃗↓ factor avoiding by reducing this problem to a language emptiness test of the language

L↑(L↓,> × {#}+) ∩ Σ∗ (L↓,> × (Σ+ {#}+ Σ∗ ∪ Σ∗ {#}+ Σ+)) Σ^ω

where the left operand, a.), denotes the amalgamation of the top injector language over Σ and the top injected language over Σ × {#}, which covers all words that are possibly expressible by the corresponding language class while at the same time marking injected factors with the letter #, and the right operand, b.), denotes all words that contain a factor of the supremum injected language, but where only proper infixes of words of the injected language contain the letter #. The intersection of these languages is the fragment whose words contain a factor of the supremum injected language, where the latter has a prefix and/or postfix that is not marked with the letter #, which means that at least one letter of the injector language is part of a factor which is a word of an injected language as well. In order to carry out the intersection between the languages annotated a.) and b.) above, it is necessary that L↓,> is a regular language, so that we intersect two regular languages rather than two context-free languages. Otherwise, the intersection may result in a context-sensitive language – a result which cannot be algorithmically determined, [HMU01].

Unambiguous Transitions Labelled by Surrogate Terminals

Transitions labelled by surrogate terminals are ambiguous, as we have described before, in the sense that for an injector language {g1}^ω over the two surrogate


terminals g1 and g2, one cannot simply assume that the complement of the injector language, i.e. {g2}^ω, can be taken to denote the complement of the amalgamated languages. In case their injected languages are not disjoint, i.e. L↓,1 ∩ L↓,2 ≠ ∅, the intersection of the injector language {g1}^ω and its complement will not be empty either. Similarly, ambiguity can also stem from consecutive appearances of surrogate terminals in the injector language, e.g. the occurrence of g1 g2 in an injector language might expand to a factor in the amalgamated language which is in L↓,1 · L↓,2, L↓,1, L↓,2, or any combination of these languages. In order to ensure that a factor within the amalgamated language can be unambiguously traced back to the surrogate terminal it stems from, we use the fact that the injected languages form a Boolean algebra to construct a Büchi-automaton with unambiguously labelled surrogate transitions, but we need to put one more constraint on the injected languages in order to avoid ambiguity due to consecutively appearing surrogate terminals. We enforce the prefix property on injected languages, which implies that a word in some injected language L↓,x cannot be in L↓,x · L↓,y, L↓,x · L↓,y · L↓,z, etc., for any injected languages x, y, z and so on. In fact, with the prefix property imposed on injected languages, no proper prefix of any word of an injected language is a word of the language it is part of or of any other injected language. We abbreviate the language class of the context-free languages with the prefix property that form a Boolean algebra by writing BpfCFL. We are now able to prove that ωRL[BpfCFL] has a decidable language inclusion problem.
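The prefix property can be checked on any finite sample of an injected language by sorting: if some word is a proper prefix of another, it is also a prefix of its immediate successor in sorted order. A small sketch (the helper name is ours, not from the thesis):

```python
def has_prefix_property(words):
    """True iff no word in the (finite sample of the) language is a proper
    prefix of another; an explicit end-marker $ enforces this by construction."""
    ws = sorted(words)
    # A proper-prefix pair, if any, occurs between lexicographic neighbours.
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))

print(has_prefix_property({"ccrr$", "cr$", "crcr$"}))  # True: $ ends every word
print(has_prefix_property({"cr", "crcr"}))             # False: "cr" prefixes "crcr"
```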
The proof follows along the lines of the proofs of closure under union and complement for ωVPAs in [AM04], which implies that language inclusion L↑,1[L⃗↓,1] ⊆ L↑,2[L⃗↓,2] is decidable due to the decidability of the equivalent problem L↑,1[L⃗↓,1] ∩ ¬L↑,2[L⃗↓,2] ≠ ∅ for any L↑,1[L⃗↓,1], L↑,2[L⃗↓,2] ∈ ωRL[BpfCFL], but there are subtle differences in the complementation proof. A brief overview of the complementation proof is given in the following:

1. we use a Büchi-automaton that models the injector language that is to be complemented, where all states of the automaton have an outgoing transition for all terminals and surrogate terminals (automata completeness),10

10 This is usually achieved by introducing a "failure" state that is non-accepting, to which all states lead with transitions labelled by all terminals of the automaton's alphabet. The failure state's outgoing transitions, again labelled by all terminals, lead back to the failure state. Since the state is non-accepting, this does not alter the automaton's language.


Figure 4.11: Removal of ambiguity in transitions labelled by surrogate terminals. (a) Combining of parallel transitions, with Lk = Li ∪ Lj. (b) Separation of transitions, with Lk = Li ∩ Lj, Lm = Li \ Lj and Ln = Lj \ Li, when assuming that Li ∩ Lj ≠ ∅ and Li ≠ Lj.

2. ambiguous surrogate transitions whose origin and destination are the same state are combined into a single surrogate transition (see "Removal of transition ambiguity I" in the proof), and

3. surrogate terminals are separated into two or more surrogate terminals, so that the languages they represent are mutually disjoint (see "Removal of transition ambiguity II" in the proof).

The first point addresses an extension of the definition of completeness from ordinary Büchi-automata to Büchi-automata that represent injector languages: a Büchi-automaton that represents an injector language is said to be complete when, for every state p, there is a transition from p to a designated non-accepting failure state qf labelled a, for all terminals a, and a transition from p to qf labelled g↓,>, where L(g↓,>) = L↓,> denotes the top language of L↓, i.e. the top element of the respective Boolean algebra. Clearly, every Büchi-automaton that represents an injector language can be transformed into a complete Büchi-automaton. In the second step, we combine transitions labelled by surrogate terminals that start and end in the same states. We have depicted this in Figure 4.11(a), where L(gk) denotes the language L(gi) ∪ L(gj). In the left-hand


side of Figure 4.11(a), the surrogate terminals gi and gj might be substituted with identical factors in the amalgamated language, which is the case when their corresponding injected languages Li and Lj are not disjoint. The single transition shown in the right-hand side of Figure 4.11(a) is unambiguous, in the sense that any injected factor in the amalgamated language which is due to taking the transition from p to q must stem from the surrogate terminal gk. Finally, the third step separates the labelling of surrogate transitions, so that for any factor in the amalgamated language, it is possible to determine which surrogate terminal got substituted. This is depicted in Figure 4.11(b), where gi is split into the two surrogate terminals gm and gk, whilst gj is split into the surrogate terminals gn and gk. The rewritten transitions generate the same factors as the automaton advances from p to either q or r, due to Li = Lm ∪ Lk and Lj = Ln ∪ Lk, but since Lm, Ln and Lk are mutually disjoint,11 factors can be traced back to the surrogate terminal that generated them. This does not imply that the choice of transitions becomes deterministic, which is straightforward to see from the two outgoing transitions labelled by gk in Figure 4.11(b). With these three steps carried out, surrogate terminals in the injector language can be treated as ordinary terminals, since every factor in the amalgamated language can be traced back to the surrogate terminal that generated it. If we now complement the injector language, we exclude words that would otherwise wrongly have been in the complemented language, as we have shown in Example 4.7.

Example 4.8 (Surrogate Terminals in Complementation Revisited)
We reconsider the complementation of the language L↑[L⃗↓] from Example 4.7, i.e. the amalgamated language L↑[L⃗↓] = {(({c^n r1^n} ∪ {c^n r2^n}) · {$})^ω}. The construction of the complementary injector language, which leads to the complementation of the amalgamated language, is carried out step by step. The lattice depicted in Figure 4.10 (page 123) is reused to visualise the languages represented by the surrogate terminals used in this example. The Büchi-automaton that represents the injector language L↑ = {g1^ω} is depicted in Figure 4.12(a). Any input that is not the surrogate terminal g1 causes the automaton's transition into the failure state q. For each state, the outgoing transitions are labelled with every terminal as well as

11 It should be noted that other solutions are possible here, depending on whether Li and Lj are identical, one language is properly included in the other, or Li and Lj intersect each other. In the example, the latter is assumed.


Figure 4.12: Complementation of an injector language

with the surrogate terminal that corresponds to the top injected language L↓,> = {c, r1, r2, r3}∗ · {$}, and as such, the automaton is complete. In other words, for every terminal or injected factor that could be generated, the automaton can make a transition whilst being in any of its states. In Figure 4.12(b), the first step of the complementation is shown. Since there are no two distinct transitions which originate and end in the same states and are labelled by distinct surrogate terminals, the depicted automaton in Figure 4.12(b) is identical to the automaton in Figure 4.12(a). However, if we had labelled the transition from p to q less elegantly, for example, if there were several transitions from p to q labelled g1, g2 and g3, then we would fuse these transitions into a single transition, so that the new surrogate terminal's corresponding language represents the language L↓,1 ∪ L↓,2 ∪ L↓,3. Here, the union L↓,1 ∪ L↓,2 ∪ L↓,3 denotes the supremum L↓,>, which coincides with the labelling that we have already chosen in Figure 4.12(a). Figure 4.12(c) depicts the second step in the complementation, where we replace the transition from p to q labelled g> with two transitions from p to q labelled g1 and g3. For the corresponding languages, the equation L↓,> = L↓,1 ∪ L↓,3 holds, and as such, the automaton of Figure 4.12(c) accepts the same language as the automaton in Figure 4.12(a). The effect of this separation becomes apparent in the next step.
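Abstracting each surrogate's injected language to a finite set of atoms, the two disambiguation steps of Figure 4.11 can be sketched as plain set operations; the encoding and function names below are our own illustration, not part of the thesis:

```python
# Surrogate labels are modelled as frozensets of atoms (their injected languages).

def combine_parallel(transitions):
    """Step I (Figure 4.11(a)): fuse parallel surrogate transitions between the
    same pair of states into one transition labelled by the union language."""
    return {pq: frozenset().union(*labels) for pq, labels in transitions.items()}

def split_disjoint(li, lj):
    """Step II (Figure 4.11(b)): split two overlapping labels into mutually
    disjoint pieces Lk = Li & Lj, Lm = Li - Lj, Ln = Lj - Li."""
    lk, lm, ln = li & lj, li - lj, lj - li
    return [s for s in (lk, lm, ln) if s]

li = frozenset({"a", "b"})
lj = frozenset({"b", "c"})
print(combine_parallel({("p", "q"): [li, lj]}))  # one label: {'a','b','c'}
print(split_disjoint(li, lj))                    # disjoint pieces {'b'}, {'a'}, {'c'}
```

After step II, any factor generated by one of the pieces identifies its label uniquely, mirroring how the proof traces injected factors back to surrogate terminals.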


Finally, Figure 4.12(d) shows the complemented Büchi-automaton, for which the amalgamated language denotes the correct language complementation. The injector language excludes all runs that would result in the language {g1^ω}, which is straightforward to determine due to the separation in the previous step. Without the separation, the complementation would be in jeopardy, since the corresponding language of the surrogate terminal g>, L↓,>, includes the language L↓,1. The terminals g> are not in the language of the Büchi-automaton of Figure 4.12(d), but they cannot be in the automaton's complement either, since the resulting automaton would then include the language of the original automaton.

Theorem 4.2 (Closure of ωRL[BpfCFL] under Union, Intersection and Complement)
The language class ωRL[BpfCFL] is closed under union, intersection and complement.

Proof. Consider two arbitrary languages L↑,1[L⃗↓,1], L↑,2[L⃗↓,2] ∈ ωRL[BpfCFL].

Closure under union: Let A↑,1 = (Q1, Σ1, δ1, qi,1, F1) and A↑,2 = (Q2, Σ2, δ2, qi,2, F2) denote Büchi-automata, so that L(A↑,1) = L↑,1 and L(A↑,2) = L↑,2. W.l.o.g., the state spaces Q1 and Q2 are assumed to be disjoint. We construct a Büchi-automaton A′↑ = ({⟨qi,1, qi,2⟩} ∪ Q1 ∪ Q2, Σ, δ′, ⟨qi,1, qi,2⟩, F1 ∪ F2), where δ′(⟨qi,1, qi,2⟩, a) = {q | qi,1 →a q in A↑,1 or qi,2 →a q in A↑,2} and δ′ agrees with δ1 and δ2 on all other inputs. Clearly, L(A′↑) denotes the union of the injector languages L↑,1 and L↑,2. It is now straightforward to construct the union of L↑,1[L⃗↓,1] and L↑,2[L⃗↓,2] by relabelling all surrogates gn1,1 and gn2,2 in A′↑, 1 ≤ n1 ≤ |L⃗↓,1| and 1 ≤ n2 ≤ |L⃗↓,2|, so that they uniquely match their corresponding injected languages in L⃗′↓ = L⃗↓,1 · L⃗↓,2. Let L′↑ = L(A′↑); then L′↑[L⃗′↓] = L↑,1[L⃗↓,1] ∪ L↑,2[L⃗↓,2].

Closure under complement: Let A↑,1 = (Q, Σ, δ, qi, F) denote a Büchi-automaton, so that L(A↑,1) = L↑,1. W.l.o.g.
we assume that the transition function of A↑,1 is complete. We now derive a Büchi-automaton A′↑,1 from A↑,1, so that L(A′↑,1) = L(A↑,1), and for any two transitions from p to q labelled gi and gj in A′↑,1 it holds that L↓,i ∩ L↓,j is the empty set. In other words, in A′↑,1, when going from p to q while reading a sub-word that is due to an injected language, we can deterministically determine the transition that accepts the sub-word.

Removal of transition ambiguity I: For two given states p and q, let X denote the set of surrogates gi for which there is a transition from p to q labelled gi. We remove all transitions from p to q that are labelled by a surrogate and introduce a single new transition from p to q labelled g′, with L(g′) = ⋃_{gi∈X} L↓,i.


CHAPTER 4. MODEL-CHECKING CONTEXT-FREE PROPERTIES

Removal of transition ambiguity II: Each transition of the form p →ᵍⁱ q is replaced by transitions of the form p →ᵍ′ʲ q, where L′↓,j = L↓,i ∩ ⋂_{gk∈X} L↓,k ∩ ⋂_{gk∉X} ∁L↓,k, where X ∈ 2^(⋃_{n=1}^{m′} {gn}) and m′ is the new upper bound due to the parallelism removal before.

From A′↑,1 we can now construct a deterministic Muller-automaton A″↑,1 = (Q″, Σ ∪ ⋃_{n=1}^{m″} {gn}, δ″, q″i, F″) using Safra's construction, [Saf88]. Safra's construction is applicable for compact Büchi-automata, since for compact Büchi-automata every transition accepts unique sub-words, which implies that surrogates can be treated as ordinary terminals.

Since A↑,1 was chosen to be complete, we have L↓,⊤ = ⋃_{n=1}^{m″} L(L↓,n). For an arbitrary accepting or non-accepting run of A″↑,1, at any position of the run a transition representing a sub-word in Σ ∪ L↓,⊤ can be taken. Hence, one can complement A″↑,1 to obtain the language complement of A↑,1, and thus, of L↑,1[L⃗↓,1].

Closure under intersection: Follows from closure under union and complement, as L↑,1[L⃗↓,1] ∩ L↑,2[L⃗↓,2] = ∁(∁L↑,1[L⃗↓,1] ∪ ∁L↑,2[L⃗↓,2]). □

Corollary 4.2 The language inclusion problem for ωRL[BpfCFL] is decidable.
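The decidability argument behind Corollary 4.2 rests on the closure properties just proven together with a decidable emptiness test: L1 ⊆ L2 iff L1 ∩ ∁L2 = ∅. The following sketch is a hypothetical illustration over finite word sets rather than ωRL[BpfCFL] representations; the function name and set-based modelling are our own.

```python
# The reduction behind Corollary 4.2: by closure under complement and
# intersection, language inclusion reduces to a (decidable) emptiness test:
#   L1 ⊆ L2  iff  L1 ∩ ∁L2 = ∅
# Illustrated on finite language fragments over a fixed universe of words.

def inclusion_via_emptiness(l1, l2, universe):
    complement_l2 = universe - l2        # closure under complement
    return len(l1 & complement_l2) == 0  # intersection + emptiness test
```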

4.4.4 Examples of Context-Free Property Verification

We provide some practical examples of our theoretical work, focusing exclusively on properties that are beyond the expressiveness of ωVPLs and that concretise the example given in the introduction.

Verification of Data-Stack Behaviour

Data-stack behaviour, or synonymously first-in/last-out queue behaviour, was introduced in Chapter 1 along with a graphical representation of an exemplary matching structure in Figure 1.2. Here, we give a concrete language representation of a simplified version of the language presented in Example 4.1 (restricted to a single producer), which includes the construction of an appropriate injector and injected language, the provision of the Boolean algebra related to the injected language, and a brief outline of how a real-world program abstraction may be generated.



[Figure 4.13: Lattice of the Boolean algebra of Example 4.9, with top element L0 · {rc}, the intermediate elements L↓ and ∁L↓ ∩ (L0 · {rc}), and bottom element ∅.]

Example 4.9 (Verification of Data-Stack Behaviour) We consider a recurring producer/consumer pattern between two functions in a program, where the first function, cp, represents the producer that transfers data onto a stack, and the second function, cc, represents the consumer which empties the said stack. Formally, we can express this by the language amalgamation L↑[L↓] with the injected language L↓ = {cp w rp cc wᴿ rc | w ∈ Σd⁺} denoting the production and consumption of data atoms in Σd = {a, b, c, d}, while the infinite repetition is given by the straightforward injector language L↑ = {g1}^ω. We choose the Boolean algebra associated with the injected language to be (L↓, ∩, ∪, ¬, L0 · {rc}, ∅) over the lattice given in Figure 4.13.¹² L↓ is a deterministic CFL, which is known to be closed under complementation and intersection with a regular language, so that the language ∁L↓ ∩ (Σ \ {rc})∗ · {rc} is indeed a deterministic CFL with the prefix property due to the end-marker rc. The behaviour of a system that is to be verified can be abstracted in the following way:

• behaviour that does not match a word in L0 · {rc} is modelled as ω-regular behaviour in the injector language

• behaviour that does match a word in L0 · {rc} is modelled in an injected language, where

– data-stack behaviour as we expect it is abstracted as L↓,

– data-stack behaviour that is part of a non-deterministic choice is abstracted as L0 · {rc}, and

– other behaviour is abstracted as ∁L↓ ∩ (L0 · {rc}).

¹² In Example 4.9, we write L0 in order to abbreviate {cp} · (Σ \ {rc})∗ · {rp cc} · (Σ \ {rc})∗.
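For illustration, the injected language L↓ of Example 4.9 can be recognised by a simple deterministic check. The sketch below models trace symbols as strings and is an illustrative approximation, not part of the thesis's formal construction; all names are ours.

```python
# A minimal recogniser for the injected language of Example 4.9,
#   L↓ = { cp w rp cc w^R rc | w ∈ Σd⁺ },  Σd = {a, b, c, d},
# i.e. the producer cp pushes a non-empty data word w, then rp cc hands over
# to the consumer cc, which pops w in reverse before the end-marker rc.

SIGMA_D = {"a", "b", "c", "d"}

def in_injected_language(word):
    # expected shape: ["cp", *w, "rp", "cc", *reversed(w), "rc"]
    if len(word) < 6 or word[0] != "cp" or word[-1] != "rc":
        return False
    try:
        i = word.index("rp")            # data symbols can never be "rp"
    except ValueError:
        return False
    w = word[1:i]
    if not w or any(s not in SIGMA_D for s in w):
        return False                    # w must be a non-empty data word
    if word[i + 1] != "cc":
        return False
    return word[i + 2:-1] == list(reversed(w))
```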



Verification of Counting Properties

Recently, an extension of VPLs over finite words has been given in terms of synchronised pushdown automata, [Cau06]. The language class defined by synchronised pushdown automata is properly contained within the deterministic CFLs, but exceeds the expressiveness of VPLs. Synchronised pushdown automata are ordinary pushdown automata whose transition graph can be "synchronised" by a finite-state transducer¹³ T from Σ∗ to weights in Z. For a fixed transducer T, the class of synchronised pushdown languages forms a Boolean algebra with respect to T [Cau06, Theorem 5.5].

Synchronised pushdown automata can be taken as a representation for an injected language, since they satisfy the requirements for a decidable language inclusion problem as given before:

• they form a Boolean algebra for a given transducer T, [Cau06],

• the languages of synchronised pushdown automata are contained in the CFLs, and thus, their emptiness problem is decidable, [HMU01], and

• languages of synchronised pushdown automata for which the prefix property holds are constructible, as described below.

The languages of synchronised pushdown automata do not have the prefix property by default, which can easily be inferred, since they are an extension of the regular languages, which do not share this property, [HMU01]. It is therefore necessary to construct languages of synchronised pushdown automata which have the prefix property, e.g. by introducing an end-marker as we have done before. We demonstrated this in the previous example, where the injected language can be formulated as all words of producers and consumers such that the stack is empty at the beginning and end of each word. Since in that case the terminal rc appears exclusively at the end of each word, the language has the prefix property. A similar approach can be taken for injecting synchronised pushdown languages.
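The end-marker argument above can be made concrete on finite word samples. The sketch below, with hypothetical helper names, checks the prefix property and applies the end-marker transformation; it assumes words are plain strings and that the marker letter is fresh.

```python
# The prefix property requires that no word of the language is a proper
# prefix of another. Appending a fresh end-marker that occurs nowhere else
# enforces it. Sketch over finite word samples (illustrative only).

def has_prefix_property(words):
    ws = sorted(words)                  # a prefix sorts directly before the
    return not any(                     # shortest word it is a prefix of
        len(u) < len(v) and v[:len(u)] == u
        for u, v in zip(ws, ws[1:]))

def add_end_marker(words, marker="#"):
    assert all(marker not in w for w in words), "marker must be fresh"
    return {w + marker for w in words}
```

For instance, {"a", "ab"} lacks the prefix property, while {"a#", "ab#"} has it.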

4.5 Summary

In this chapter, we have given a grammatical representation of ωVPLs. The new formalism was then used to formalise a generalised language-theoretic approach,

¹³ A transducer is a finite-state automaton with additional output labels on its transitions.



which was proven to be more expressive than ωVPLs, whilst its language inclusion problem is still decidable. Concrete extensions beyond ωVPLs were given by example, where we addressed data-stack behaviour and counting properties.

Chapter 5

Conclusions

We addressed the formal verification of software systems in the context of trace- and model-checking,¹ both of which permit the fully automated verification of software behaviour against its purported behavioural specification. Our studies focused on concatenation operators, i.e. operators for specifying trace composition, and their expressive power with regard to the respective verification approaches.

5.1 Introduction

In the previous chapters we addressed the formal verification of software by trace- and model-checking, where in the case of trace-checking we specifically focused on runtime-verification, i.e. the inspection of software programs whilst they are executed. Our contributions in this thesis can be summarised as follows:

• We considered the runtime-verification logic Eagle and

– showed the expressive equivalence between two trace-composition operators, sequential composition and concatenation,

– introduced deterministic variants of the trace-composition operators,

– showed that the deterministic operators are already expressible in Eagle, even in the fragment of Eagle that is covered by Eagle's implementation,

¹ As a reminder, model-checking is a generalisation of trace-checking, where all possible traces of a system's behaviour are inspected.




– proved that some of the deterministic operators can be evaluated in a number of evaluation steps that is only linear in the size of the trace, while others require the same number of evaluation steps as the original non-deterministic operators, and

• approached model-checking, for which we

– gave a grammatical representation of ω-visibly pushdown languages,

– defined and visualised the matching structure of data-stack behaviour and counting properties and showed that these properties are beyond the expressiveness of ω-visibly pushdown languages, and

– extended the expressiveness of model-checking beyond ω-visibly pushdown languages by proving decidability of a language fragment which includes the aforementioned behaviours.

We restate the results given in this thesis in the following sections, in a concise form and with emphasis on their practical applications. We also point out potential further work that could build upon our findings, and state conjectures about particularly interesting open problems. The discussions are spread over the next two sections, where Section 5.2 refers to Chapter 3 on runtime-verification and Section 5.3 refers to Chapter 4 on model-checking.

5.2 Trace-Composition Operators in the Runtime-Verification Logic Eagle

In the runtime-verification logic Eagle, two operators are defined that permit the composition of traces, namely, sequential composition and concatenation. Formal semantics are given for both operators in Eagle. However, it was not clear whether the operators could be formulated syntactically in terms of each other, or whether one of the operators might be more expressive than the other. We showed that each operator can be expressed in terms of the other, and we provided appropriate translations in both directions. Additionally, the translations we gave lie within the guarded fragment of Eagle, which is the fragment of the logic that has been implemented as a runtime-verification framework in Java; as such, our work extends beyond the theoretical realm to practical applications in Eagle's implementation.



It is known that for actual verification problems the trace-composition operators in Eagle are relatively expensive to evaluate, since both require a quadratic number of evaluation steps in the size of the trace due to their non-deterministic semantics. Consequently, we introduced deterministic variants of sequential composition and concatenation, studied their expressiveness, and examined their complexity during evaluation. As we proved, the expressiveness of the deterministic variants does not exceed the expressiveness of the non-deterministic trace-composition operators, where we also provided translations into the guarded fragment of Eagle. However, when considering the calculus on which Eagle's implementation is based, we proved that some of the deterministic variants require only a linear number of evaluation steps in the size of the trace under inspection, whilst the remaining deterministic variants require the same number of evaluation steps as their non-deterministic counterparts.

Further Work

In the motivational discussion of the deterministic trace-composition operators, we defined in Example 3.5 the conditional concatenation operator ⌊F1⌋→ · F2, which is derived from the deterministic operators in the sense that its semantics are formulated as syntactic sugar for an otherwise larger formula in deterministic operators. It is of interest whether it is possible to encode this operator, and further practically relevant operators, in Eagle's calculus, beyond straightforward translations from the logic into the calculus, so that an efficient on-line monitoring complexity is achieved. It is also of interest whether further deterministic cut operators exist that permit efficient on-line monitoring, but with constraints different from those presented in this thesis. This is by no means a straightforward research question.
For example, a slight variation on our definition of ⌊F1⌋ · F2, so that the formula is satisfied on the first cut for which F1 · F2 is satisfied, can no longer be evaluated in a linear number of evaluation steps in the size of the trace under inspection, by the same argumentation that we have given in the proof of Theorem 3.6.²

² As a reminder, the proof of Theorem 3.6 establishes that concatenation, sequential composition, and their mixfix variants with a trace-length restriction placed on the right-hand operator require O(|σ|²) evaluation steps on a trace σ, because each state along the trace has to be assumed to be a potentially viable cut position, while the correct cut position is only determined once the end of the trace is reached.
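The complexity gap described in the footnote can be illustrated with trace predicates in place of Eagle formulas. The sketch below is an illustrative model, not Eagle's calculus: the non-deterministic operator must consider every cut position (the source of the quadratic blow-up when evaluated incrementally), while the deterministic variant commits to the first cut where F1 holds.

```python
# Illustration of the evaluation-cost gap. F1 and F2 are predicates over
# finite sub-traces. Non-deterministic concatenation considers every cut,
# while the deterministic variant ⌊F1⌋·F2 cuts at the FIRST satisfying
# position, which may yield a different verdict.

def concat_nondet(f1, f2, trace):
    # σ ⊨ F1 · F2 iff some cut j splits σ with σ[:j] ⊨ F1 and σ[j:] ⊨ F2
    return any(f1(trace[:j]) and f2(trace[j:])
               for j in range(len(trace) + 1))

def concat_det_first_cut(f1, f2, trace):
    # ⌊F1⌋ · F2: commit to the first cut position where F1 is satisfied
    for j in range(len(trace) + 1):
        if f1(trace[:j]):
            return f2(trace[j:])
    return False
```

On the trace ["x", "b"] with F1 = "prefix contains x" and F2 = "suffix is empty", the non-deterministic operator succeeds (cut at the end), while the deterministic variant commits to the cut after "x" and fails.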



Recently, the rule-based rewriting system RuleR has been introduced, a derivative of Eagle whose aim is to permit more efficient runtime-verification, [BRH07]. RuleR does not have concatenation operators per se, and as such, it is not clear how call- or data-stack behaviour can be formulated in the system. In the context of runtime-verification, comparing how efficiently Eagle and RuleR evaluate the behavioural properties that, as we have shown, are easily formulated in terms of concatenation operators would not only be of practical interest, but would also lead to a better understanding of how these behavioural properties are best evaluated.

5.3 Model-Checking Context-Free Properties

From our work on Eagle, we have already seen that Eagle's syntax, and partly its semantics, are very close to grammars in formal language theory, where specifications resemble context-free grammars to a certain degree. Eagle's concatenation operator allowed us to specify call-stack behaviour, data-stack behaviour and counting properties in a straightforward manner. In the context of model-checking, we demonstrated in the introduction of this thesis that these behaviours and properties are expressible as context-free languages. Context-free properties play an important role in model-checking, where it has recently been shown that model-checking is decidable for ω-visibly pushdown languages (ωVPLs). The expressiveness of ωVPLs is sufficiently rich to permit the model-checking of call-stack behaviour, but not expressive enough for model-checking data-stack behaviour or counting properties.

Inspired by Eagle's syntax, which closely resembles grammars, we provided a grammatical representation of ωVPLs that enabled us to study the patterns of regular and non-regular behaviour in ωVPLs, similar to the previously introduced pseudo-runs and summary-edges of [AM04]. Moreover, we provided examples showing that our grammatical representation can be used to formulate matching properties and behaviour concisely, due to their denotational expression as grammar productions.

Based on the grammatical representation of ωVPLs, we then introduced a purely language-theoretic approach to express call- and data-stack behaviour as well as counting properties. Our work focused on ω-regular languages that may



contain designated surrogate terminals – injector languages – and injected languages, whose words can be substituted for said surrogate terminals. The amalgamation of injector and injected languages was then studied with the aim of providing a language class for which model-checking is decidable. Unlike ωVPLs, our language class has no set boundaries, in the sense that we have formulated constraints that must hold for both injector and injected languages, so that candidates for injected languages can be chosen from various language classes, provided that the constraints we have given apply to these languages. By giving actual verification problems that lie beyond the expressiveness of ωVPLs but are covered by the amalgamated injector- and injected-language class of Chapter 4, we have supported the practical relevance of our work.

Further Work

It is unknown whether there are other practically relevant behaviours or properties, besides the examples we have given, that are expressible in the language classes we have studied. Since our model-checking approach allows us to give equi-expressive descriptions of systems' behaviour and behavioural specifications, it is desirable to find more practically relevant behaviours and properties that are of concern to the software development community. It is also desirable to abstract as little as possible from actual programs' behaviour, so that the behavioural verification can address more behavioural patterns and properties in order to find errors in their implementations.

Further studies should also address the computational complexity of model-checking behaviours that match our language class beyond ωVPLs. This is of practical interest, since it allows us to estimate the memory and time requirements of model-checker implementations, but it would also give theoretical insights into the general structure of the languages.
While we showed in Chapter 4 that extending ω-visibly-pushdown languages by morphisms which permit expressing data-stack behaviour or counting properties is not advisable, we conjecture that a morphism can be used to translate any of our extended languages into an ω-visibly-pushdown language, and back. The latter would imply that our extended language class is not harder to model-check than ω-visibly-pushdown languages.



Our results also give rise to a logical representation in fixed-point temporal logic. In [Lan02], the expressive equivalence between alternating context-free grammars and linear-time fixed-point logic with sequential composition has been shown. Since the expressiveness of these grammars and this logic includes the ω-context-free languages, the languages presented in this thesis are trivially included in the language class formulated in [Lan02]. It is straightforward to give a fixed-point logic representation of ωVPLs due to the grammatical representation presented, where a suitable translation from grammars into fixed-point logic was given in [Lan02]; for our extended language class it would first be necessary to find a suitable grammatical characterisation. Candidates for such grammars, including results regarding their closure properties and other language-theoretic aspects, exist, [Okh01, Okh04, MHHO05]. From the logical characterisation, it might then be viable to carry the results on deterministic trace-composition operators forward to model-checking applications.

Appendix A

Proofs

Theorem 3.5 The semantics of Eagle[]'s calculus (Definition 3.7) coincide with the semantics of the corresponding logic (Definition 3.5).

Proof. We only prove that the semantics of the concatenation mixfix operators coincide, since the semantic equivalence for sequential composition can be proven along the same lines.

Semantic equivalence of ⌊F1⌋ · F2: The calculus for these mixfix operators coincides with the corresponding non-mixfix operators, except for the evaluation rule eval⟨⟨. . .⟩⟩. In the case of the mixfix operators, the non-deterministic choice in the then-path of eval⟨⟨. . .⟩⟩ is removed, which causes a forced cut as soon as F1 is satisfied. It is straightforward to see that these semantics coincide with the semantics of the logic.

Semantic equivalence of ⌈F1⌉ · F2: We prove the semantic equivalence inductively. For each possible argument of the trace-length restricted operand, we first provide the semantics due to the logic, and second, we show the equivalent result due to Eagle[]'s calculus. In the following, we assume n is a natural number and 0 ≤ n ≤ i.

Case ⌈False⌉ · F2: Evaluation in Eagle[]:

1. σ, i ⊨ ⌈False⌉ · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ[1,j−1], i ⊨D False and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊨ False

2. σ, i ⊭ ⌈False⌉ · F2

Evaluation in Eagle[]'s calculus: On the empty trace:

1. value⟨⟨init⟨⟨⌈False⌉ · F2, null, null⟩⟩⟩⟩
2. value⟨⟨LMxConcat(init⟨⟨False, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
3. value⟨⟨LMxConcat(False, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
4. value⟨⟨null⟩⟩
5. False

On a non-empty trace:

1. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨init⟨⟨⌈False⌉ · F2, null, null⟩⟩, si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩

2. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(init⟨⟨False, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
3. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(False, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
4. value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(False, init⟨⟨F2, null, null⟩⟩, False), si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
...
5. value⟨⟨LMxConcat(False, init⟨⟨F2, null, null⟩⟩, False)⟩⟩
6. value⟨⟨False⟩⟩
7. False

Case ⌈True⌉ · F2: Evaluation in Eagle[]:

1. σ, i ⊨ ⌈True⌉ · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ[1,j−1], i ⊨D True and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊨ True



2. σ, i ⊨ ⌈True⌉ · F2 iff σ[1,|σ|], i ⊨D True and σ[|σ|+1,|σ|], 1 ⊨D F2
3. σ, i ⊨ ⌈True⌉ · F2 iff σ[|σ|+1,|σ|], 1 ⊨D F2

Evaluation in Eagle[]'s calculus: On the empty trace:

1. value⟨⟨init⟨⟨⌈True⌉ · F2, null, null⟩⟩⟩⟩
2. value⟨⟨LMxConcat(init⟨⟨True, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
3. value⟨⟨LMxConcat(True, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
4. value⟨⟨init⟨⟨F2, null, null⟩⟩⟩⟩

On a non-empty trace:

1. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨init⟨⟨⌈True⌉ · F2, null, null⟩⟩, si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
2. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(init⟨⟨True, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
3. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(True, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
4. value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(True, init⟨⟨F2, null, null⟩⟩, eval⟨⟨init⟨⟨F2, null, null⟩⟩, si−n⟩⟩), si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
5. value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(True, init⟨⟨F2, null, null⟩⟩, eval⟨⟨init⟨⟨F2, null, null⟩⟩, si−n+1⟩⟩), si−n+2⟩⟩ . . . , s|σ|⟩⟩⟩⟩
...
6. value⟨⟨LMxConcat(True, init⟨⟨F2, null, null⟩⟩, eval⟨⟨init⟨⟨F2, null, null⟩⟩, s|σ|⟩⟩)⟩⟩
7. value⟨⟨init⟨⟨F2, null, null⟩⟩⟩⟩

Case ⌈expression⌉ · F2: Evaluation in Eagle[]:

1. σ, i ⊨ ⌈expression⌉ · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ[1,j−1], i ⊨D expression and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊨ expression
2. If expression is True in σ(i): σ, i ⊨ ⌈True⌉ · F2
3. If expression is False in σ(i): σ, i ⊨ ⌈False⌉ · F2

Evaluation in Eagle[]'s calculus: On the empty trace:

1. value⟨⟨init⟨⟨⌈expression⌉ · F2, null, null⟩⟩⟩⟩
2. value⟨⟨LMxConcat(init⟨⟨expression, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
3. value⟨⟨LMxConcat(expression, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
4. value⟨⟨null⟩⟩
5. False

On a non-empty trace:

1. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨init⟨⟨⌈expression⌉ · F2, null, null⟩⟩, si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
2. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(init⟨⟨expression, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
3. value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(expression, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩ . . . , s|σ|⟩⟩⟩⟩ (Note: value⟨⟨expression⟩⟩ is False)
4. value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(eval⟨⟨expression, s1⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
5(a). In the case eval⟨⟨expression, si−n⟩⟩ = True: value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(True, init⟨⟨F2, null, null⟩⟩, null), si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
5(b). otherwise: value⟨⟨eval⟨⟨. . . eval⟨⟨LMxConcat(False, init⟨⟨F2, null, null⟩⟩, null), si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩



We continue by induction over True and False respectively.

Case ⌈¬F1⌉ · F2: Evaluation in Eagle[]:

1. σ, i ⊨ ⌈¬F1⌉ · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ[1,j−1], i ⊨D ¬F1 and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊨ ¬F1
2. σ, i ⊨ ⌈¬F1⌉ · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ[1,j−1], i ⊭D F1 and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊭ F1

Evaluation in Eagle[]'s calculus: On the empty trace:

1. value⟨⟨init⟨⟨⌈¬F1⌉ · F2, null, null⟩⟩⟩⟩
2. value⟨⟨LMxConcat(init⟨⟨¬F1, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
3. value⟨⟨LMxConcat(¬init⟨⟨F1, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null)⟩⟩

According to the calculus, the truth value is determined by:

if value⟨⟨¬init⟨⟨F1, null, null⟩⟩⟩⟩ = True then value⟨⟨init⟨⟨F2, null, null⟩⟩⟩⟩ else value⟨⟨null⟩⟩

We remove the negation by inverting the test condition, so that

if value⟨⟨init⟨⟨F1, null, null⟩⟩⟩⟩ ≠ True then value⟨⟨init⟨⟨F2, null, null⟩⟩⟩⟩ else value⟨⟨null⟩⟩

On a non-empty trace:

1. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨init⟨⟨⌈¬F1⌉ · F2, null, null⟩⟩, si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
2. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(init⟨⟨¬F1, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩

3. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(¬init⟨⟨F1, null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩

Note: In the calculus, the negation can be removed by inverting the test condition, i.e.

if value⟨⟨¬init⟨⟨F1, null, null⟩⟩⟩⟩ = True then

becomes

if value⟨⟨init⟨⟨F1, null, null⟩⟩⟩⟩ ≠ True then

Let ϕ denote the outcome of the test; then we continue as

4. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(eval⟨⟨¬init⟨⟨F1, null, null⟩⟩, si−n⟩⟩, init⟨⟨F2, null, null⟩⟩, ϕ), si−n+1⟩⟩, si−n+2⟩⟩ . . . , s|σ|⟩⟩⟩⟩
5. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(¬eval⟨⟨init⟨⟨F1, null, null⟩⟩, si−n⟩⟩, init⟨⟨F2, null, null⟩⟩, ϕ), si−n+1⟩⟩, si−n+2⟩⟩ . . . , s|σ|⟩⟩⟩⟩

We can remove the negation by inverting the test again, so that

if value⟨⟨¬eval⟨⟨init⟨⟨F1, null, null⟩⟩, s1⟩⟩⟩⟩ = True then

becomes

if value⟨⟨eval⟨⟨init⟨⟨F1, null, null⟩⟩, s1⟩⟩⟩⟩ ≠ True then

...

This carries forward for the whole trace, where inverting the test condition in the calculus results in the negation of the semantics accordingly.

Cases ⌈F1 ∧ F2⌉ · F3, ⌈F1 ∨ F2⌉ · F3, ⌈#F1⌉ · F2, ⌈ F1⌉ · F2, ⌈F1 · F2⌉ · F3, ⌈F1 ; F2⌉ · F3: We omit the proof for these cases, as they can be proven by straightforward application of the rules of Eagle[]'s calculus.

Case ⌈N(F⃗, P⃗)⌉ · F2: Evaluation in Eagle[]:

1. σ, i ⊨ ⌈N(F⃗)⌉ · F2 iff ∃j. i ≤ j ≤ |σ| + 1 and σ[1,j−1], i ⊨D N(F⃗) and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊨ N(F⃗)
2. σ, i ⊨ ⌈N(F⃗)⌉ · F2 iff ∃j. i < j ≤ |σ| + 1 and σ[1,j−1], i ⊨D F[F⃗/x⃗] and σ[j,|σ|], 1 ⊨D F2 and ¬∃k. j ≤ k ≤ |σ| and σ[1,k], i ⊨ F[F⃗/x⃗]


where (N(T⃗ x⃗) = F) ∈ D

3. σ, i ⊨ ⌈N(F⃗)⌉ · F2 iff σ[1,i−1], i ⊨D N(F⃗) and σ[i,|σ|], 1 ⊨D F2 and ¬∃k. i ≤ k ≤ |σ| and σ[1,k], i ⊨ N(F⃗)
4. σ, i ⊨ ⌈N(F⃗)⌉ · F2 iff (max N(T⃗ x⃗) = F) ∈ D and σ[i,|σ|], 1 ⊨D F2 and ¬∃k. i ≤ k ≤ |σ| and σ[1,k], i ⊨ N(F⃗)

Evaluation in Eagle[]'s calculus: On the empty trace:

1. value⟨⟨init⟨⟨⌈N(F⃗, P⃗)⌉ · F2, null, null⟩⟩⟩⟩
2. value⟨⟨LMxConcat(init⟨⟨N(F⃗, P⃗), null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null)⟩⟩
3. value⟨⟨LMxConcat(N(ρb.init⟨⟨F[init⟨⟨F⃗, N(F⃗, P⃗), b⟩⟩/F⃗]⟩⟩, P⃗), init⟨⟨F2, null, null⟩⟩, null)⟩⟩
4(a). If N is defined as a maximal rule: value⟨⟨init⟨⟨F2, null, null⟩⟩⟩⟩
4(b). If N is defined as a minimal rule: value⟨⟨null⟩⟩, and thus, False

On a non-empty trace:

1. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨init⟨⟨⌈N(F⃗, P⃗)⌉ · F2, null, null⟩⟩, si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
2. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(init⟨⟨N(F⃗, P⃗), null, null⟩⟩, init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩
3. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(N(ρb.init⟨⟨F[init⟨⟨F⃗, N(F⃗, P⃗), b⟩⟩/F⃗]⟩⟩, P⃗), init⟨⟨F2, null, null⟩⟩, null), si−n⟩⟩, si−n+1⟩⟩ . . . , s|σ|⟩⟩⟩⟩

Note: Depending on whether N is a maximal or minimal fixed-point, value⟨⟨N(ρb.init⟨⟨F[init⟨⟨F⃗, N(F⃗, P⃗), b⟩⟩/F⃗]⟩⟩, P⃗)⟩⟩ will evaluate to True or False. As such, the third argument of LMxConcat(. . .) changes accordingly. We abbreviate the respectively chosen formula as ϕ.

4. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(eval⟨⟨N(ρb.init⟨⟨F[init⟨⟨F⃗, N(F⃗, P⃗), b⟩⟩/F⃗]⟩⟩, P⃗), s1⟩⟩, init⟨⟨F2, null, null⟩⟩, ϕ), si−n+1⟩⟩, si−n+2⟩⟩ . . . , s|σ|⟩⟩⟩⟩

Since N is a fixed-point formula, we unwind the fixed-point equation:

5. value⟨⟨eval⟨⟨. . . eval⟨⟨eval⟨⟨LMxConcat(eval⟨⟨(init⟨⟨F[init⟨⟨F⃗, N(F⃗, P⃗), b⟩⟩/F⃗]⟩⟩[ρb.init⟨⟨F[init⟨⟨F⃗, N(F⃗, P⃗), b⟩⟩/F⃗]⟩⟩/N(b, P⃗)])[eval⟨⟨P⃗, s1⟩⟩/p⃗], si−n⟩⟩, init⟨⟨F2, null, null⟩⟩, ϕ), si−n+1⟩⟩, si−n+2⟩⟩ . . . , s|σ|⟩⟩⟩⟩

The correct semantics then follows by induction.

Case ⌈ϕ⌉ · F2, where ϕ is any mixfix operator formula: By induction.

Semantic equivalence of F1 · ⌊F2⌋: We do not restate the steps performed by Eagle[]'s calculus here, since the evaluation rules are very similar to the left-maximal mixfix operator cases. The difference in the evaluation is only due to the rules

value⟨⟨RMnConcat(F1, F2, F3)⟩⟩ = if value⟨⟨F2⟩⟩ = True then value⟨⟨F1⟩⟩ else value⟨⟨F3⟩⟩

eval⟨⟨RMnConcat(F1, F2, F3), s⟩⟩ = RMnConcat(α, F2, eval⟨⟨List(value⟨⟨F1⟩⟩, F2, F3), s⟩⟩)

At each stage of the evaluation, the head of the list represents the most recent cut being made. With each new evaluation step, previous cuts are moved towards the end of the list, while the evaluation continues for each entry of the list. Finally, value⟨⟨RMnConcat(F1, F2, F3)⟩⟩ determines the truth value of the evaluation by choosing the first cut in the list for which F2 evaluates to True. Since the cuts in the list are ordered from rightmost cut (head) to leftmost cut (tail), we have chosen the shortest sub-trace on which F2 holds.

Semantic equivalence of F1 · ⌈F2⌉: Semantic equivalence follows from the previous proof for F1 · ⌊F2⌋. For the maximal mixfix operators, the list is populated



in reverse order, due to the rules:

eval⟨⟨RMxConcat(F1, F2, F3), s⟩⟩ = RMxConcat(α, F2, eval⟨⟨Append(F3, List(value⟨⟨F1⟩⟩, F2, null)), s⟩⟩)

value⟨⟨RMxConcat(F1, F2, F3)⟩⟩ = value⟨⟨Append(F3, List(value⟨⟨F1⟩⟩, F2, null))⟩⟩

As such, the list contains evaluations starting with the longest sub-trace on which F2 could be satisfied, followed by the second longest, etc. □

Index

automaton
  Büchi-automaton, 92
  Muller-automaton, 92
  pseudo-run, 95
  visibly-pushdown automaton, 94

behaviour
  1:1 matching behaviour, 88, 117
  1:2 matching behaviour, 88, 117
  call-stack behaviour, 22, 23, 49, 87
  compositional property, 47
  counting property, 22, 25, 49, 89, 134
  data-stack behaviour, 22, 24, 40, 49, 88, 132
  fairness property, 22
  liveness property, 22, 46, 87
  matching avoiding, 104
  matching behaviour, 88
  minimally well-matched word, 95
  safety property, 22, 46, 87
  summary-edge, 95

formal language
  alphabet, 37
  Boolean algebra, 125
  co-terminal, 100
  concatenation, 37
  injected language, 120
  injector language, 120
  Kleene-closure, 120
  L-factor avoidance, 121
  language, 38
  law of non-contradiction, 125
  law of the excluded middle, 125
  letter, 37
  regular expression, 38
  superficial mapping, 105
  surrogate terminal, 97, 127
  terminal, 40
  word, 37
  word length, 37

grammar
  ω-regular grammar, 103
  ω-regular grammar with injected quasi-balanced grammar, 104
  balanced grammar, 100
  context-free grammar, 40
  immediate matching context-free grammar, 114
  injected grammar, 105
  injector grammar, 105
  non-terminal, 39
  production, 39
  quasi-balanced grammar, 101
  rewriting, 39

logic
  monadic second-order logic, 109
  monadic second-order logic with call/return-matching relation, 110

operator
  concatenation, 46
  conditional concatenation, 51
  deterministic concatenation, 62
  deterministic cut operator, 62
  deterministic sequential composition, 62
  evaluation complexity, 80
  minimal cut operator, 62
  mixfix operator, 62
  sequential composition, 46
  trace-length restricted, 50

Boolean algebra, see formal language
co-terminal, see formal language
Kleene-closure, see formal language
law of non-contradiction, see formal language
law of the excluded middle, see formal language
matching, see behaviour
matching avoiding, see behaviour
minimally well-matched word, see behaviour
property, see behaviour
pseudo-run, see automaton
summary-edge, see behaviour
superficial mapping, see formal language
surrogate terminal, see formal language
terminal, see formal language

Bibliography

[ABG+05] C. Artho, H. Barringer, A. Goldberg, K. Havelund, S. Khurshid, M. Lowry, C. Pasareanu, G. Rosu, K. Sen, W. Visser, and R. Washington. Combining test case generation and runtime verification. Theoretical Computer Science, 336(2–3):209–234, 2005.

[ACM06] R. Alur, S. Chaudhuri, and P. Madhusudan. A fixpoint calculus for local and global program flows. In Conference Record of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 153–165. ACM Press, 2006.

[AEM04] R. Alur, K. Etessami, and P. Madhusudan. A temporal logic of nested calls and returns. In K. Jensen and A. Podelski, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 2988 of LNCS, pages 467–481. Springer-Verlag, 2004.

[AM04] R. Alur and P. Madhusudan. Visibly pushdown languages. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 202–211. ACM Press, 2004.

[AM05] R. Alur and P. Madhusudan. Visibly pushdown languages. http://www.cis.upenn.edu/~alur/Stoc04.pdf, 2005. (Available on 30th September 2008.)

[AM06] R. Alur and P. Madhusudan. Adding nesting structure to words. In O.H. Ibarra and Z. Dang, editors, Developments in Language Theory, 10th International Conference, DLT 2006, volume 4036 of LNCS, pages 1–13. Springer-Verlag, 2006.

[AS87] B. Alpern and F.B. Schneider. Recognizing safety and liveness. Distributed Computing, 2(3):117–126, 1987.


[BB86] B. Banieqbal and H. Barringer. A study of an extended temporal logic and a temporal fixed point calculus. Technical Report UMCS-86-10-2, University of Manchester, 1986.

[BB87] B. Banieqbal and H. Barringer. Temporal logic with fixed points. In B. Banieqbal, H. Barringer, and A. Pnueli, editors, Temporal Logic in Specification, volume 398 of LNCS, pages 62–74. Springer-Verlag, 1987.

[BB02] J. Berstel and L. Boasson. Balanced grammars and their languages. In W. Brauer, H. Ehrig, J. Karhumäki, and A. Salomaa, editors, Formal and Natural Computing, LNCS, pages 3–25. Springer-Verlag, 2002.

[BB07] J. Baran and H. Barringer. A grammatical representation of visibly pushdown languages. In D. Leivant and R. de Queiroz, editors, Proceedings of the 14th Workshop on Logic, Language, Information and Computation, WoLLIC'07, volume 4576 of LNCS, pages 1–11. Springer-Verlag, 2007.

[BB08] J. Baran and H. Barringer. Forays into concatenation and sequential composition in EAGLE. In M. Leucker, editor, 8th International Workshop on Runtime Verification, RV'08, Budapest, Hungary, volume 5289 of LNCS, pages 69–85. Springer-Verlag, 2008.

[BBF+98] B. Bérard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and P. Schnoebelen. Systems and Software Verification: Model-Checking Techniques and Tools. Springer-Verlag, 1998.

[BCM+90] J.R. Burch, E.M. Clarke, K.L. McMillan, D.L. Dill, and L.J. Hwang. Symbolic model checking: 10^20 states and beyond. In LICS'90: 5th Annual IEEE Symposium on Logic in Computer Science, pages 428–439, 1990.

[BEM97] A. Bouajjani, J. Esparza, and O. Maler. Reachability analysis of pushdown automata: Application to model-checking. In A. Mazurkiewicz and J. Winkowski, editors, Proceedings of the 8th International Conference on Concurrency Theory, CONCUR'97, Warsaw, Poland, volume 1243 of LNCS, pages 135–150. Springer-Verlag, 1997.


[BGHS03] H. Barringer, A. Goldberg, K. Havelund, and K. Sen. EAGLE monitors by collecting facts and generating obligations. Technical Report CSPP-26, University of Manchester, October 2003.

[BGHS04a] H. Barringer, A. Goldberg, K. Havelund, and K. Sen. Program monitoring with LTL in EAGLE. In 18th International Parallel and Distributed Processing Symposium, IPDPS 2004. IEEE Computer Society, 2004.

[BGHS04b] H. Barringer, A. Goldberg, K. Havelund, and K. Sen. Rule-based runtime verification. In Verification, Model Checking, and Abstract Interpretation: 5th International Conference, VMCAI 2004, volume 2937 of LNCS, pages 44–57. Springer-Verlag, 2004.

[BJNT00] A. Bouajjani, B. Jonsson, M. Nilsson, and T. Touili. Regular model checking. In E.A. Emerson and A.P. Sistla, editors, Computer Aided Verification, 12th International Conference, CAV 2000, volume 1855 of LNCS, pages 403–418. Springer-Verlag, 2000.

[BKP84] H. Barringer, R. Kuiper, and A. Pnueli. Now you may compose temporal logic specifications. In STOC '84: Proceedings of the 16th Annual ACM Symposium on Theory of Computing, pages 51–63. ACM Press, 1984.

[BRH07] H. Barringer, D. Rydeheard, and K. Havelund. Rule systems for run-time monitoring: From EAGLE to RULER. In O. Sokolsky and S. Taşıran, editors, Proceedings of the 11th International Workshop, RV'07, Vancouver, Canada, volume 4839 of LNCS, pages 111–125. Springer-Verlag, 2007.

[BS94] O. Burkart and B. Steffen. Pushdown processes: Parallel composition and model checking. In B. Jonsson and J. Parrow, editors, Concurrency Theory, 5th International Conference, CONCUR 1994, volume 836 of LNCS, pages 98–113. Springer-Verlag, 1994.

[Büc62] J.R. Büchi. On a decision method in restricted second order arithmetic. In E. Nagel, P. Suppes, and A. Tarski, editors, Proceedings of the International Congress on Logic, Method, and Philosophy of Science, pages 1–12. Stanford University Press, 1962.


[Cau06] D. Caucal. Synchronization of pushdown automata. In O.H. Ibarra and Z. Dang, editors, Developments in Language Theory, 10th International Conference, DLT 2006, volume 4036 of LNCS, pages 120–132. Springer-Verlag, 2006.

[CG77a] R.S. Cohen and A.Y. Gold. Theory of ω-languages. I: Characterizations of ω-context-free languages. Journal of Computer and System Sciences, 15:169–184, 1977.

[CG77b] R.S. Cohen and A.Y. Gold. Theory of ω-languages. II: A study of various models of ω-type generation and recognition. Journal of Computer and System Sciences, 15:185–208, 1977.

[CGP99] E.M. Clarke, O. Grumberg, and D.A. Peled. Model Checking. The MIT Press, 1999.

[CHMP81] A. Chandra, J. Halpern, A. Meyer, and R. Parikh. Equations between regular terms and an application to process logic. In STOC '81: Proceedings of the 13th Annual ACM Symposium on Theory of Computing, pages 384–390. ACM Press, 1981.

[Cho56] N. Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2(3):113–124, 1956.

[CT98] M. Clark and O. Thyen. The Concise Oxford-Duden German Dictionary: German-English / English-German. Oxford University Press, 1998.

[DP02] B.A. Davey and H.A. Priestley. Introduction to Lattices and Order. Cambridge University Press, second edition, 2002.

[Dru06] D. Drusinsky. Modeling and Verification using UML Statecharts: A Working Guide to Reactive System Design, Runtime Monitoring, and Execution-Based Model Checking. Newnes, 2006.

[EH00] K. Etessami and G.J. Holzmann. Optimizing Büchi automata. In C. Palamidessi, editor, Concurrency Theory, 11th International Conference, CONCUR 2000, volume 1877 of LNCS, pages 153–168. Springer-Verlag, 2000.


[EHRS00] J. Esparza, D. Hansel, P. Rossmanith, and S. Schwoon. Efficient algorithms for model checking pushdown systems. In E.A. Emerson and A.P. Sistla, editors, Computer Aided Verification, 12th International Conference, CAV 2000, volume 1855 of LNCS, pages 232–247. Springer-Verlag, 2000.

[EKS01] J. Esparza, A. Kučera, and S. Schwoon. Model-checking LTL with regular valuations for pushdown systems. In N. Kobayashi and B.C. Pierce, editors, Proceedings of the 4th International Symposium on Theoretical Aspects of Computer Software, TACS'01, Sendai, Japan, volume 2215 of LNCS, pages 316–339. Springer-Verlag, 2001.

[EN98] E.A. Emerson and K.S. Namjoshi. On model checking for nondeterministic infinite-state systems. In Proceedings of the 13th Annual IEEE Symposium on Logic in Computer Science, LICS, pages 70–80. IEEE Computer Society, 1998.

[Eng92] J. Engelfriet. An elementary proof of double Greibach normal form. Information Processing Letters, 44(6):291–293, 1992.

[ENS07] J. Esparza, D. Nowotka, and J. Srba. Deterministic context-free model checking. Technical report, Universität Stuttgart, Fakultät Informatik, Elektrotechnik und Informationstechnik, 2007.

[Esp97] J. Esparza. Decidability of model checking for infinite-state concurrent systems. Acta Informatica, 34(2):85–107, 1997.

[FL79] M.J. Fischer and R.E. Ladner. Propositional dynamic logic of regular programs. Journal of Computer and System Sciences, 18(2):194–211, 1979.

[FP01] D. Fisman and A. Pnueli. Beyond regular model checking. In R. Hariharan and M. Mukund, editors, Proceedings of the 21st Conference on Foundations of Software Technology and Theoretical Computer Science, FST TCS 2001, volume 2245 of LNCS. Springer-Verlag, 2001.

[GHJX08] A. Groce, G. Holzmann, R. Joshi, and R-G. Xu. Putting flight software through the paces with testing, model checking, and constraint-solving. In Workshop on Constraints in Formal Verification, LNCS, pages 1–15, 2008. To appear.


[GPSS80] D. Gabbay, A. Pnueli, S. Shelah, and J. Stavi. On the temporal analysis of fairness. In Proceedings of the 7th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 163–173. ACM Press, 1980.

[HKP82] D. Harel, D. Kozen, and R. Parikh. Process logic: Expressiveness, decidability, completeness. Journal of Computer and System Sciences, 25(2):144–170, 1982.

[HMM83a] J. Halpern, Z. Manna, and B.C. Moszkowski. A hardware semantics based on temporal intervals. Technical Report STAN-CS-83-963, Stanford University, March 1983.

[HMM83b] J.Y. Halpern, Z. Manna, and B.C. Moszkowski. A hardware semantics based on temporal intervals. In J. Díaz, editor, Automata, Languages and Programming, 10th Colloquium, Barcelona, Spain, volume 154 of LNCS, pages 278–291. Springer-Verlag, 1983.

[HMU01] J.E. Hopcroft, R. Motwani, and J.D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, second edition, 2001.

[HPS81] D. Harel, A. Pnueli, and J. Stavi. Propositional dynamic logic of context-free programs. Technical Report CS81-17, Weizmann Institute of Science, 1981.

[ISD+00] O. Ibarra, J. Su, Z. Dang, T. Bultan, and R. Kemmerer. Counter machines: Decidable properties and applications to verification problems. In M. Nielsen and B. Rovan, editors, Mathematical Foundations of Computer Science 2000: 25th International Symposium, MFCS 2000, volume 1893 of LNCS, pages 426–435. Springer-Verlag, 2000.

[Jor02] P.C. Jorgensen. Software Testing: A Craftsman's Approach. CRC Press, second edition, 2002.

[Koz83] D. Kozen. Results on the propositional µ-calculus. Theoretical Computer Science, 27:333–354, 1983.


[KPV02] O. Kupferman, N. Piterman, and M.Y. Vardi. Pushdown specifications. In M. Baaz and A. Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning: 9th International Conference, LPAR 2002, volume 2514 of LNCS, pages 262–277. Springer-Verlag, 2002.

[KV01] O. Kupferman and M.Y. Vardi. Model checking of safety properties. Formal Methods in System Design, 19(3):291–314, 2001.

[Lam77] L. Lamport. Proving the correctness of multiprocess programs. IEEE Transactions on Software Engineering, 3(2):125–143, 1977.

[Lam79] L. Lamport. A new approach to proving the correctness of multiprocess programs. ACM Transactions on Programming Languages and Systems, 1(1):84–97, 1979.

[Lan02] M. Lange. Alternating context-free languages and linear time µ-calculus with sequential composition. Electronic Notes in Theoretical Computer Science, 68(2):70–86, 2002.

[Lan04] M. Lange. Symbolic model checking of non-regular properties. In R. Alur and D. Peled, editors, Proceedings of the 16th Conference on Computer Aided Verification, CAV'04, volume 3114 of LNCS, pages 83–95. Springer-Verlag, 2004.

[LMS04] C. Löding, P. Madhusudan, and O. Serre. Visibly pushdown games. In K. Lodaya and M. Mahajan, editors, FSTTCS 2004: Foundations of Software Technology and Theoretical Computer Science: 24th International Conference, Chennai, India, volume 3328 of LNCS, pages 408–420. Springer-Verlag, 2004.

[LS02] M. Lange and C. Stirling. Model checking fixed point logic with chop. In M. Nielsen and U.H. Engberg, editors, Proceedings of the 5th Conference on Foundations of Software Science and Computation Structures, FOSSACS'02, volume 2303 of LNCS, pages 250–263. Springer-Verlag, 2002.

[LST95] C. Lautemann, T. Schwentick, and D. Thérien. Logics for context-free languages. In L. Pacholski and J. Tiuryn, editors, Selected Papers from the 8th International Workshop on Computer Science Logic, 1994, volume 933 of LNCS, pages 205–216. Springer-Verlag, 1995.


[LT93] N.G. Leveson and C.S. Turner. An investigation of the Therac-25 accidents. IEEE Computer, 26(7):18–41, 1993.

[Lu89] Z. Lu. Mathematical Logic for Computer Science. World Scientific, 1989.

[Mar92] E. Marshall. Fatal error: how Patriot overlooked a Scud. Science, 255(5050):1347, 1992.

[McN66] R. McNaughton. Testing and generating infinite sequences by a finite automaton. Information and Control, 9(5):521–530, 1966.

[MHHO05] E. Moriya, D. Hofbauer, M. Huber, and F. Otto. On state-alternating context-free grammars. Theoretical Computer Science, 337(1–3):183–216, 2005.

[Min61] M.L. Minsky. Recursive unsolvability of Post's problem of "tag" and other topics in theory of Turing machines. The Annals of Mathematics, 74(3):437–455, 1961.

[MO99] M. Müller-Olm. A modal fixpoint logic with chop. In C. Meinel and S. Tison, editors, STACS 99: 16th Annual Symposium on Theoretical Aspects of Computer Science, volume 1563 of LNCS, pages 510–520. Springer-Verlag, 1999.

[Mos97] B.C. Moszkowski. Compositional reasoning using interval temporal logic and Tempura. In W.-P. de Roever, H. Langmaack, and A. Pnueli, editors, Compositionality: The Significant Difference: International Symposium, COMPOS'97, volume 1536 of LNCS, pages 439–464. Springer-Verlag, 1997.

[Mos00] B.C. Moszkowski. A complete axiomatization of interval temporal logic with infinite time. In 15th Annual IEEE Symposium on Logic in Computer Science, pages 241–252, 2000.

[Mos05] B.C. Moszkowski. A hierarchical analysis of propositional temporal logic based on intervals. In S.N. Artëmov, H. Barringer, A.S. d'Avila Garcez, L.C. Lamb, and J. Woods, editors, We Will Show Them!, volume 2, pages 371–440. College Publications, 2005.


[Mul63] D.E. Muller. Infinite sequences and finite machines. In Proceedings of the 4th Annual Symposium on Switching Circuit Theory and Logical Design, pages 3–16. IEEE Computer Society, 1963.

[Obd02] J. Obdržálek. Model checking Java using pushdown systems. In Proceedings of the Workshop on Formal Techniques for Java Programs, FTfJP'02, Málaga, Spain, 2002. Appeared as technical report, number NIII-R0204, at the Computing Science Department, University of Nijmegen.

[Okh01] A. Okhotin. Conjunctive grammars. Journal of Automata, Languages and Combinatorics, 6(4):519–535, 2001.

[Okh04] A. Okhotin. Boolean grammars. Information and Computation, 194(1):19–48, 2004.

[OL82] S. Owicki and L. Lamport. Proving liveness properties of concurrent programs. ACM Transactions on Programming Languages and Systems, 4(3):455–495, 1982.

[Oui08] M. Ouimet. Formal software verification: Model checking and theorem proving. Technical report, Massachusetts Institute of Technology, 2008.

[Pnu77] A. Pnueli. The temporal logic of programs. In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, pages 46–57, 1977.

[Pra76] V.R. Pratt. Semantical considerations on Floyd-Hoare logic. Technical Report MIT-LCS-TR-168, Massachusetts Institute of Technology, 1976.

[Pra79] V.R. Pratt. Process logic: preliminary report. In POPL '79: Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 93–100. ACM Press, 1979.

[RN05] G. Reeves and T. Neilson. The Mars rover Spirit FLASH anomaly. In IEEE Aerospace Conference, pages 4186–4199, 2005.

[Saf88] S. Safra. On the complexity of ω-automata. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, FOCS'88, pages 319–327. IEEE Computer Society Press, 1988.

[Sis94] A.P. Sistla. Safety, liveness and fairness in temporal logic. Formal Aspects of Computing, 6(5):495–511, 1994.

[Sta97a] L. Staiger. Handbook of Formal Languages, volume 3, chapter 6. Springer-Verlag, 1997.

[Sta97b] L. Staiger. On ω-power languages. In G. Păun and A. Salomaa, editors, New Trends in Formal Languages - Control, Cooperation, and Combinatorics (to Jürgen Dassow on the Occasion of his 50th Birthday), volume 1218 of LNCS, pages 377–394. Springer-Verlag, 1997.

[Tho94] W. Thomas. Handbook of Theoretical Computer Science, volume B, chapter 4. MIT Press, 1994.

[Tho97] W. Thomas. Handbook of Formal Languages, volume 3, chapter 7. Springer-Verlag, 1997.

[Wir77] N. Wirth. What can we do about the unnecessary diversity of notation for syntactic definitions? Communications of the ACM, 20(11):822–828, 1977.

[Wol83] P. Wolper. Temporal logic can be more expressive. Information and Control, 56(1–2):72–99, 1983.

[YD02] W. Yurcik and D. Doss. Software technology issues for a US national missile defense system. IEEE Technology and Society Magazine, 21(2):36–46, 2002.
