Component Reconn-exion by Andrew Le Gear
Submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science at the University of Limerick, Ireland
Supervised by Jim Buckley
Examined by Tony Cahill and Gail Murphy
9th November 2006
Declaration

The work described in this thesis is, except where otherwise stated, entirely that of the author and has not been submitted as an exercise for a degree at this or any other university.

Signed: Andrew Le Gear

In my opinion, the work described in this thesis is, except where otherwise stated, entirely that of the author.

Signed: Jim Buckley
“There is no such thing as a long piece of work, except one that you dare not start.” - Charles Baudelaire.
Abstract

For over thirty years, increased software reuse and replaceability have been touted as a means of easing software development. Unfortunately, achieving them is a non-trivial task. Component-based development attempts to ease the creation of replaceable and reusable software. However, the majority of legacy systems are not implemented using the component-based development paradigm. To enable the reuse of portions of legacy software as part of a component-based development process, a component recovery technique must first be employed. The two phases of component recovery are (1) encapsulation of a candidate component, followed by (2) the application of a component wrapper to allow the component to be used with component-based technologies. This thesis focuses on the first phase, proposing and evaluating a human-driven process for targeted component encapsulation using a combination of two existing techniques: a dynamic software understanding technique called Software Reconnaissance and a design recovery technique called Reflexion Modelling. Specifically, reuse of core assets in a system is identified using a variation of the Software Reconnaissance technique called the reuse perspective. The set of reuse elements in the reuse perspective is subsequently investigated and partitioned into cohesive units of functionality using an adapted version of the Reflexion Modelling technique. The potential usefulness of this process is demonstrated and evaluated using two large-scale industrial case studies. The results of the studies, for the most part, would seem to indicate that the process is worthwhile and affords significant time-saving opportunities for software engineers.
Dedication Dedicated to a little multi-purpose air conditioner.
Acknowledgements

First and foremost, I’d like to give a big thanks to my supervisor, Jim Buckley: a man of eternal patience with me, who always had time for a question, and who educated me so well from my beginnings as a “research illiterate” in September 2003. None of this would have been possible without him.

To my parents, Betty and Joseph Le Gear, who have unfailingly supported me through the years leading to my PhD. I’ll never forget your unquestioning support as I announced it would be a good idea to spend another three years without a job, after seventeen years of education.

To Nicola Quinn, for always seeming so interested as I’d tell her about what a “great day I had at the lab,” and for never once saying “I told you so” when I didn’t.

To everyone else, past and present, from the SAE group and CS1-045: J.J. Collins, Brendan Cleary, Damien Conroy, Thomas Collins, Deirdre Carew, Andrea Suchankova, Seamus Galvin, Finbar McGurren, Darwin Slattery and Chris Exton. Thank you all for discussion, debate, inspiration, help and friendship, and, in darker times, for the “post graduate support group” for those “thesis blues.”

A thank you to those at QAD, our research and funding partners, and to IBM as research partners. Thanks to Enterprise Ireland, Lero and SFI for their generous funding of the research project. Thank you to those at the CSIS department in UL for their support of this research. Finally, thanks to my examiners, Prof. Tony Cahill and Gail Murphy.
Contents

I Literature Review

1 Introduction
  1.1 Software Maintenance and Development
  1.2 Software Reuse
  1.3 Reusable and Replaceable Software Through Component-based Development
  1.4 Problem Statement: The Legacy Dilemma
  1.5 Reengineering Towards Components
  1.6 Objectives and Contributions
  1.7 Thesis Structure

2 Software Components
  2.1 A Component Definition
  2.2 Interfaces
  2.3 Introducing Component-Based Development
    2.3.1 A Component-Based Development Process
  2.4 The Legacy Dilemma Revisited
  2.5 Encapsulation
    2.5.1 A Brief History of Encapsulation in Software Development
      2.5.1.1 Monitors
      2.5.1.2 Information Hiding
      2.5.1.3 Object Oriented Programming
    2.5.2 Coupling and Cohesion
    2.5.3 Encapsulation Features of Component-Based Development

3 Software Reengineering
  3.1 Agendas for Reengineering Software Systems
  3.2 Dynamic versus Static Analysis
  3.3 Reuse Identification Techniques
    3.3.1 Clone Detection
    3.3.2 Fan-in Analysis
    3.3.3 Frequency Spectrum Analysis
  3.4 Dependency Graphs
  3.5 Reengineering Towards Components
    3.5.1 Design Recovery
    3.5.2 Clustering for Architectural Recovery and Component Recovery
      3.5.2.1 Dataflow-based Approaches
      3.5.2.2 Structure-Based Approaches
      3.5.2.3 Domain-Model Based Approaches
    3.5.3 Aggregated Recovery Approaches
    3.5.4 Componentisation Processes
    3.5.5 Component Wrappers

4 Software Reconnaissance
  4.1 A Functionality View of Software
    4.1.1 Common Software Elements
    4.1.2 Potentially Involved Software Elements
    4.1.3 Indispensably Involved Software Elements
    4.1.4 Uniquely Involved Software Elements
  4.2 Related Work
    4.2.1 Software Instrumentation Enabling Software Reconnaissance
    4.2.2 Best Practices When Applying Software Reconnaissance
    4.2.3 Previous Work Using Software Reconnaissance

5 Software Reflexion Modelling
  5.1 The Reflexion Modelling Process
  5.2 Related Work
    5.2.1 Early Experiences with Reflexion Modelling
    5.2.2 Extensions and Further Uses of Reflexion Modelling
    5.2.3 A Cognitive Basis for Reflexion Modelling
      5.2.3.1 Encoding
      5.2.3.2 Retrieval
      5.2.3.3 Human Learning
      5.2.3.4 Learning Preferences

6 Research Methodology
  6.1 Scientific Method
  6.2 Validity
  6.3 Quantitative and Qualitative Research Methods
  6.4 The Culture of Research Evaluation in Computer Science
    6.4.1 Arguing for Hybrid Approaches to Research in Computer Science
  6.5 A Research Model for This Thesis
    6.5.1 Empirical Techniques Employed

II “Component Reconn-exion”: Reengineering Towards Components Using Variations on Reconnaissance and Reflexion

7 Reconn-exion
  7.1 A Conjecture for Prompting Component Abstractions
    7.1.1 A New Reuse Perspective Derived from Software Reconnaissance
  7.2 A Hypothesis for Encapsulating Components Using Reflexion Modelling
  7.3 Hypothesising a Process For Component Encapsulation
  7.4 A Small Example
    7.4.1 The House Application
    7.4.2 Part 1: A Reuse Perspective
    7.4.3 Part 2: Encapsulating with Reflexion

8 Evaluating the Basis Techniques of Reconn-exion
  8.1 Validating the Reuse Perspective: The JIT/S Shipping Case Study
    8.1.1 Purpose and Research Questions
    8.1.2 The Subject System: JIT/S
    8.1.3 The Participants
    8.1.4 Tool Support
    8.1.5 The Study Protocol
    8.1.6 Case Study Part 1
    8.1.7 Case Study Part 2
    8.1.8 Discussion
  8.2 Validating Reflexion-Based Component Encapsulation: The Workspace Case Study
    8.2.1 Purpose and Research Questions
    8.2.2 The Subject System: The Learning Management System
    8.2.3 The Participants
    8.2.4 Participant Tasks
    8.2.5 Tool Support
    8.2.6 The Study Protocol
    8.2.7 Enacting the Process
      8.2.7.1 Participant 1
      8.2.7.2 Participant 2
    8.2.8 Evaluation
      8.2.8.1 Questions 1 and 2
      8.2.8.2 Question 3
      8.2.8.3 Question 3
      8.2.8.4 Question 4
    8.2.9 Discussion

9 Evaluating Reconn-exion: The AIM Case Study
  9.1 Purpose and Research Questions
  9.2 The Subject System: The Advanced Inventory Management Application
  9.3 The Participants
  9.4 Tool Support
  9.5 Study Protocol
  9.6 Recounting the Process
    9.6.1 Participant 1
      9.6.1.1 Creating the reuse perspective
      9.6.1.2 Using the reuse perspective to prompt component abstractions
      9.6.1.3 Encapsulation with Reflexion
      9.6.1.4 Identifying Multiple Interfaces
    9.6.2 Participant 2
      9.6.2.1 Creating the reuse perspective
      9.6.2.2 Using the reuse perspective to prompt component abstractions
      9.6.2.3 Encapsulation with Reflexion
      9.6.2.4 Identifying Multiple Interfaces
  9.7 Evaluation
    9.7.1 Process: The Effectiveness of Reuse Perspective as a Prompt
    9.7.2 Process: Reconn-exion for Component Encapsulation
    9.7.3 Product: An Architect’s Assessment of Encapsulated Components
    9.7.4 Product: Metrics of Coupling and Cohesion on Encapsulated Components
    9.7.5 Contextual Knowledge
  9.8 Discussion

10 Scoping Reconn-exion
  10.1 Validity of Studies
  10.2 Theoretical Limitations of Reconn-exion

11 Future Work
  11.1 Catalogue of Guidelines
  11.2 Exploring Database Accesses
  11.3 Component Wrapping
  11.4 Combining with Automated Techniques
  11.5 Feature-based Decomposition of Software
  11.6 Software Product Line Recovery
  11.7 Aspect Recovery
  11.8 Inverse Structural Summarization
  11.9 Collaborative Design Recovery
  11.10 Temporal Summarization
  11.11 Alternate Source Models
  11.12 Metrics and Heuristics

12 Conclusion

III Bibliography
  Bibliography

IV Appendices
A Reuse Perspectives
  A.1 Scrabble Emulator Reuse Perspective
B Pilot Models and Maps
C Workplace - Participant 1
D Workplace - Participant 2
E AIM - Participant 1
F AIM - Participant 2
G Interfaces on the House Application
H Peer Reviewed Publications
List of Figures

2.1 Levels of interface specification adapted from Beugnard et al. (Beugnard et al., 1999).
2.2 A Generic Component Architecture adapted from Bachmann et al. (Bachmann et al., 2000).
2.3 The component-based development process adapted from (Cheesman and Daniels, 2001).
2.4 A detailed description of the provisioning workflow.
2.5 A code fragment from a C program.
2.6 A visualisation of the encapsulation exercise shown in figure 2.7.
2.7 A revised version of the code fragment in figure 2.5.
2.8 An inheritance hierarchy code sample.
2.9 A visualisation of the inheritance hierarchy in 2.8.
2.10 Using polymorphism for encapsulation code example.
2.11 An example of a loosely coupled and highly cohesive component.
2.12 Many classes without an encapsulation policy.
2.13 Many classes from figure 2.12 encapsulated by a component.
2.14 Event handling code sample.
2.15 A deployment diagram of two distributed components.
2.16 A call between the distributed components shown in 2.15.
3.1 Code example 1.
3.2 A possible graph representation of code example 1 in figure 3.1.
3.3 Code example 2.
3.4 A possible graph representation of code example 2 in figure 3.3.
3.5 Code example 3.
3.6 A possible graph representation of code example 3 in figure 3.5.
3.7 Code example 4.
3.8 A possible graph representation of code example 4 in figure 3.7.
4.1 Identifying Features from Running Systems.
4.2 Common Software Elements.
4.3 Potentially Involved Software Elements, shaded in black.
4.4 Indispensably Involved Software Elements, shaded in black.
4.5 Uniquely involved software elements, shaded in black.
5.1 Collapsing strategy in operation.
5.2 The Software Reflexion Modelling Process.
5.3 Adapted from (Brace and Roth, 2005).
7.1 The three sets used to form the shared set.
7.2 Encapsulating a component and making its interface explicit.
7.3 Identifying multiple interfaces on a component using Reflexion Modelling.
7.4 The Component Reconn-exion process.
7.5 A screenshot of the house application.
7.6 First house application Reflexion model.
7.7 First house application Reflexion model map.
7.8 Second house application Reflexion model.
7.9 Second house application Reflexion model map.
7.10 Third house application Reflexion model.
7.11 Third house application Reflexion model map.
8.1 A screenshot of the jRMTool eclipse plug-in.
8.2 First iteration by the first participant.
8.3 Last iteration by the first participant.
8.4 Web administration user interface component. (Note the bolded black box is not part of the tool’s visual output.)
8.5 A design pattern in the LMS.
9.1 Procedure clusters prompted by the reuse perspective (participant 1). File names are blurred for copyright reasons.
9.2 Participant one’s first Reflexion model and map.
9.3 Participant one’s final Reflexion model.
9.4 Procedure clusters prompted by the reuse perspective (participant 2). Note that the file names are blurred for copyright reasons.
9.5 Participant two’s third Reflexion model.
9.6 Participant two’s final Reflexion model.
11.1 Decomposing a system in terms of its SHARED sets.
11.2 Including the common software elements in the model of the system.
11.3 A feature-based decomposition of a software system that shows shared, unique and common software elements of a system.
11.4 A simple temporal source model.
11.5 An example temporal summarization.
12.1 The Component Reconn-exion process.
B.1 Pilot Study - Scrabble Emulator - Reflexion model and map 1.
B.2 Pilot Study - Scrabble Emulator - Reflexion model and map 2.
B.3 Pilot Study - Scrabble Emulator - Reflexion model and map 3.
B.4 Pilot Study - Scrabble Emulator - Reflexion model and map 4.
C.1 Case Study - Workplace - Participant 1 - Reflexion model and map 1.
C.2 Case Study - Workplace - Participant 1 - Reflexion model and map 2.
C.3 Case Study - Workplace - Participant 1 - Reflexion model and map 3.
C.4 Case Study - Workplace - Participant 1 - Reflexion model and map 4.
C.5 Case Study - Workplace - Participant 1 - Reflexion model and map 5.
C.6 Case Study - Workplace - Participant 1 - Reflexion model 6.
C.7 Case Study - Workplace - Participant 1 - Reflexion model and map 7.
C.8 Case Study - Workplace - Participant 1 - Reflexion model and map 8.
D.1 Case Study - Workplace - Participant 2 - Reflexion model and map 1.
D.2 Case Study - Workplace - Participant 2 - Reflexion model and map 2.
D.3 Case Study - Workplace - Participant 2 - Reflexion model and map 3.
D.4 Case Study - Workplace - Participant 2 - Reflexion model and map 4.
D.5 Case Study - Workplace - Participant 2 - Reflexion model and map 5.
D.6 Case Study - Workplace - Participant 2 - Reflexion model and map 6.
D.7 Case Study - Workplace - Participant 2 - Reflexion model and map 7.
D.8 Case Study - Workplace - Participant 2 - Reflexion model and map 8.
D.9 Case Study - Workplace - Participant 2 - Reflexion model and map 9.
D.10 Case Study - Workplace - Participant 2 - Reflexion model 10.
D.11 Case Study - Workplace - Participant 2 - Reflexion model and map 11.
E.1 Case Study - AIM - Participant 1 - Reflexion model and map 1.
E.2 Case Study - AIM - Participant 1 - Reflexion model and map 2.
E.3 Case Study - AIM - Participant 1 - Reflexion model and map 3.
E.4 Case Study - AIM - Participant 1 - Reflexion model and map 4.
E.5 Case Study - AIM - Participant 1 - Reflexion model and map 5.
E.6 Case Study - AIM - Participant 1 - Reflexion model and map 6.
E.7 Case Study - AIM - Participant 1 - Reflexion model and map 7.
E.8 Case Study - AIM - Participant 1 - Reflexion model and map 8.
E.9 Case Study - AIM - Participant 1 - Reflexion model and map 9.
E.10 Case Study - AIM - Participant 1 - Reflexion model and map 10.
F.1 Case Study - AIM - Participant 2 - Reflexion model 1.
F.2 Case Study - AIM - Participant 2 - Reflexion model 2.
F.3 Case Study - AIM - Participant 2 - Reflexion model 3.
F.4 Case Study - AIM - Participant 2 - Reflexion model 4.
F.5 Case Study - AIM - Participant 2 - Reflexion model 5.
F.6 Case Study - AIM - Participant 2 - Reflexion model 8.
F.7 Case Study - AIM - Participant 2 - Reflexion model 9.
G.1 “Transforms” component’s interface on “Main.” Provides interface.
G.2 “Transforms” component’s interface on “GUI.” Provides interface.
List of Tables

2.1 Motivations for software components described by four tiers.
2.2 Levels of maturity among CBD Technologies.
7.1 Features identified for the house application.
8.1 JIT/S Summary.
8.2 Particulars of the participants of the JIT/S study.
8.3 Features identified in JIT/S.
8.4 Summary of reuse perspective.
8.5 Workplace case study participant details.
9.1 Participant details.
9.2 Feature set examined by participant 1 during case study.
9.3 The set of features chosen by the second participant.
Part I Literature Review
Chapter 1
Introduction

“Every new beginning comes from some other beginning’s end.” - Dan Wilson, Closing Time
1.1 Software Maintenance and Development
Despite advances in software development techniques, such as object-oriented programming (OOPSLA, 2006), aspect-oriented programming (Kiczales et al., 1997) and component-based development (Cheesman and Daniels, 2001), existing implementation abstractions are not used on a large scale to create new systems; the majority of software is therefore still built from scratch (Greenfield et al., 2004). Software systems are projected to become ever more complex in the future (Corbi, 1989). This will require increased time and effort when developing new software systems (Voas, 1998; Meyer and Mingins, 1999; Zweben et al., 1995), making such development even less feasible in the future (Meyer and Mingins, 1999). Not only is it imperative that we begin developing software artifacts that are reusable to address this agenda (Voas, 1998; Meyer and Mingins, 1999; Washizaki et al., 2002), but it is also desirable with respect to software maintenance. Software artifacts that are reusable and replaceable (Cheesman and Daniels, 2001) will facilitate software evolution and help to curb the time spent on maintenance and evolution, an activity (Leintz and Swanson, 1980) that accounts for up to 80% of the entire development process (Bowen et al., 1993; Seaman, 2002).¹

¹ It should be noted that a comprehensive survey of software maintenance has not been carried out since that of Lientz and Swanson in 1978 (Leintz and Swanson, 1980; Lientz and Swanson, 1978). However, the increased complexity of software systems (Pressman, 2004) and the longevity of software systems such as Microsoft Word (Microsoft, 2006b) or Adobe Acrobat (Inc., 2006) suggest, if anything, that the proportion of effort consumed in maintaining and evolving software systems has increased since their landmark study.
1.2 Software Reuse

Software reuse refers to the construction and evolution of software systems from existing software assets rather than creating a new system from scratch (Krueger, 1992). The potential gains of reuse were first voiced at a NATO software engineering conference in 1968 as a means of curbing the unsustainable trend of creating increasingly larger systems in this way (Nauer and Randell, 1968). Furthermore, software solutions for a specific domain contain many valuable business rules that an organisation would find difficult to reimplement. Over time, these rules become embedded in a system; they are no longer explicit to developers and maintainers, making it difficult to reuse these valuable assets in other systems within an organisation (Verhoef, 2000).

The promise of pragmatic software reuse has been partially realised. During the mid-1980s, the reuse of up to 40% of software within Japanese organisations was reported (Mii and Takeshita, 1993). Software process models, once optimally applied within an organisation, have also prompted increased amounts of reuse during software development (Institute, 1997; Richardson, 1999). However, reuse of this kind depends entirely upon the institutionalised cultural norms within an organisation's development teams, rather than on a prescribed technological approach such as object orientation or structured programming, and such a high cultural standard is notoriously difficult to achieve (Richardson, 1999).

Other attempts at enabling future reuse on a larger scale have included approaches to software development such as object orientation, or the availability of reuse libraries (see section 3.3) for common tasks such as string manipulation or lists. Neither of these, however, provides the sought-after panacea for software development. Object orientation has proved too fine-grained for large-scale reuse, while library reuse is often too generic to have a major impact on the domain-specific software solutions produced within many organisations (Greenfield et al., 2004).
1.3 Reusable and Replaceable Software Through Component-based Development
Component-based development refers to the assembly of software from pre-packaged components (Cheesman and Daniels, 2001), and builds upon earlier development approaches, such as object-oriented programming, procedural programming, fourth-generation languages and aspect-oriented software development. Component-based development presents itself as a possible solution that can introduce reuse and replaceability to software development practice on a large scale. Components will often encapsulate much larger bodies of code than their object-oriented predecessors and lead to more easily understood software solutions (Zweben et al., 1995). This approach to software development promises many benefits. Most notably, with respect to this thesis, it accommodates the reuse and replaceability of existing software during development and evolution at an encapsulated level of interaction (Parnas, 2002, 1971, 1972). For example, an organisation's business rules, which may be realised within a component of one of its systems, may be reused by deploying that component to augment the functionality of another system. Conversely, if that business rule changes, a new component implementing the new business rule could quickly replace the existing one.
1.4 Problem Statement: The Legacy Dilemma
The vision of large-scale reuse and replaceability using a modern development approach, such as component-based development, would seem to afford significant benefits in hiding complexity and facilitating software reuse. However, the vast majority of existing software was developed in accordance with older, different software development paradigms, such as structured programming or object-orientation (Kontogiannis et al., 1996; Johnson, 2002). These systems lessen the potential for reuse.

This thesis addresses the need to reuse portions of existing non-component-based software when implementing and evolving new or existing component-based systems within a domain vertical.
1.5 Reengineering Towards Components
Reengineering and maintenance research provides a plethora of techniques for analysing and altering existing systems. Attempts at recovering modules for reuse have existed since the early 1980s (Belady and Evangelisti, 1982; Hutchens and Basili, 1985; Livadas and Johnson, 1994), with the most comprehensive review on the subject to date found in (Koschke, 2000a). Automatic approaches, in isolation, have shown poor rates of success when recovering components (Koschke, 2000a). Hence, Koschke (Koschke, 1999) states that the best approach when attempting component recovery should:

Aggregate several techniques: Research has shown that any single automated component recovery technique has a success rate no greater than 40%. To increase accuracy to acceptable levels, two or more techniques should be combined.

Include the human in the recovery process: Human inference on data often provides insight that cannot be achieved algorithmically. Thus, a semi-automated recovery or encapsulation process can be greatly assisted by human insight.

Include domain knowledge: Domain knowledge can attribute semantics to the syntax of the program under analysis. Without meaningful semantics attached to the code under analysis, the identified component is often less useful. For example, semantic information such as “what is the purpose of the identified component?” is vital domain knowledge that can be provided by a software engineer during recovery.

This thesis is cognizant of these requirements when exploring existing reengineering and maintenance research to arrive at a proposed solution to the legacy dilemma stated in section 1.4.
1.6 Objectives and Contributions
In approaching component recovery, two distinct steps can be identified:

1. Encapsulating reusable code.
2. Wrapping the reusable code to conform with a component-based development process.

If step one is adequately performed, then step two becomes trivial. Thus the main focus of this thesis lies with the first step, with step two given a lighter treatment (Le Gear et al., 2004). Given this focus, the objectives of this thesis can be outlined as follows:

1. To produce a repeatable process for targeted component encapsulation that is driven by the software engineer, makes use of domain knowledge, is human-oriented and aggregates two or more appropriate techniques, based on a comprehensive literature review of the field.
2. To evaluate this process and its products in ecologically valid settings.

The key contributions of this thesis include:

1. Exploiting a previously unexplored variation on Software Reconnaissance that can be used to identify reuse in software systems. The view produced is called the reuse perspective.
2. Creating a tailored version of Reflexion Modelling designed specifically for component interface identification and component encapsulation.
3. The novel combination of the reuse perspective and the variation on Reflexion Modelling as a process for component recovery. This combination forms a component encapsulation process called Reconn-exion.
4. The positioning of Reconn-exion within existing literature.
5. An empirical evaluation of Reconn-exion, acquired in realistic industrial settings, that assesses the validity of Reconn-exion with respect to both the process and its products, and that suggests some future refinements to the process.
6. A repeatable process for component recovery.
1.7 Thesis Structure
The remainder of this thesis is structured as follows:

• Chapters 2, 3, 4 and 5 form a literature review relevant to Component Reconn-exion. The broad areas of software components and software reengineering are first discussed, followed by a more detailed look at two software analysis techniques: Software Reconnaissance and Software Reflexion Modelling.

• Chapter 6 explores research methods and, in particular, describes the research approach adopted in this thesis.

• Chapter 7 introduces the Component Reconn-exion process itself: a reuse perspective derived from Software Reconnaissance, combined with an adapted Reflexion Modelling technique for component encapsulation, illustrated on a small example application.

• Chapters 8 and 9 follow with an evaluation of Component Reconn-exion. First, the two constituent parts of Reconn-exion, which are core contributions in themselves, are evaluated in industrial case studies. Then a large industrial case study is undertaken to evaluate the complete Reconn-exion process.

• Chapter 10 reflects on the previous three chapters, providing a critical evaluation in terms of the validity of the studies undertaken and the theoretical limitations of the technique.

• Chapter 11 suggests potential future work that could be undertaken to refine Reconn-exion and evaluate results to date, as well as suggesting potential expansions to Reconn-exion.

• Finally, chapter 12 provides a conclusion to the thesis.
Chapter 2
Software Components

“Architecture starts when you carefully put two bricks together. There it begins.” - Ludwig Mies van der Rohe (German-born American architect, 1886-1969).
Though many definitions of components exist, no officially recognised, sufficiently constrained standard definition exists at present (Hamlet, 2001). The word component has been used to describe procedures (Wilde and Scully, 1995), collections of reusable code (Mii and Takeshita, 1993), identified modules within systems (Cimitile and Visaggio, 1995; Girard and Koschke, 1997), class libraries (Zweben et al., 1995) and, more recently, black-box units of composition in software (Eddon, 1999; Ran et al., 2001). These are but a few understandings of components from a much wider list of contradictory viewpoints (Bachmann et al., 2000; McGurren, 2004; Szyperski, 2003; Johnson, 2002; Wang et al., 1999; Stevens and Pooley, 2000; Girard and Koschke, 1997; Chiricota et al., 2003; Cimitile and Visaggio, 1995; Cheesman and Daniels, 2001; Allen and Frost, 1998; Wilde and Scully, 1995; Ran et al., 2001; Mii and Takeshita, 1993).

More recently, Clemens Szyperski took a different approach to component definition (Szyperski, 2003). Rather than attempting to adopt a single interpretation that serves as a panacea, he instead categorised existing viewpoints on components into a four-tiered classification framework based upon their use and history of inception. This is summarised in table 2.1. The first two tiers in table 2.1 describe reuse only; terms that relate to more recent component-based development are introduced in tiers 3 and 4. The entries in this table are core issues of consideration when designing a state-of-the-art component (Woodman et al., 2001) and should be addressed when implementing component recovery. For example, the designer must decide if the system needs to be able to undergo dynamic alteration (Buckley et al., 2003), be deployable, or is likely to be regarded as reusable again. Szyperski refined this categorisation to describe four tiers of maturity among component technologies (table 2.2), defining where the elements of reuse are available and whether they can be introduced dynamically.
Table 2.1: Motivations for software components described by four tiers.

Tier 1 - Basic Reuse: A programmer informally reuses source as he writes a program, i.e. cut-and-paste code cloning.

Tier 2 - Advanced Reuse: Standard libraries of reusable source code are introduced and used, as-is, across a company or several companies.

Tier 3 - Deployable Components: Units of construction are built and deployed to software systems. Real-world examples include the Java bean (Ran et al., 2001; Inc, 2005) and COM+ (Eddon, 1999) implementations of component-based development. They allow static composition only.

Tier 4 - Dynamic Components: The units of construction allow runtime alteration of their properties and the dynamic replacement of components in assembled systems. Examples may be found in (Sadd, 2003) and (Dowling et al., 1999).
Table 2.2: Levels of maturity among CBD Technologies.

1. Maintainability: Modular solutions to software systems.
2. Internal Reuse: E.g. product lines within companies, for companies promoting CMM 5 development processes.
3.a.i. Closed Composition: Make and buy from a closed pool of organisations.
3.a.ii. Open Composition: Make and buy from the open market.
3.b. Closed Dynamic: Dynamically upgrade from restricted markets.
4. Open and Dynamic: Completely open and dynamic upgrades from a potentially unlimited pool of organisations.
The component technologies referred to in this thesis are concerned primarily with tier 3 of table 2.1, at maturity level 3.a.i (closed composition). That is, this thesis focuses on reuse identification and the replaceability of components from an organisation's existing systems. In doing so, it is likely that the reusable entities identified will be of most use in generating varietal systems for that organisation, or for organisations in a similar domain; alternatively, they could be used in evolving the organisation's current systems.
2.1 A Component Definition
By confining the definition of a component in this thesis to tier three of table 2.1, at a maturity level of 3.a.i, our ability to precisely define the nature of components relevant to this thesis is considerably clarified. Three core criteria are identified to characterise the nature of components:

1. A black-box implementation of some functionality (Bass et al., 2000; Bachmann et al., 2000; Wallnau, 2003).
2. May be reused “as-is” by a third-party consumer (Washizaki et al., 2002).
3. Conforms to some component model (Councill, 2001).

This definition can be further expanded to account for the maintenance life cycle in evolving systems by additionally describing components as units of versioning and replacement (Szyperski, 2003; Cheesman and Daniels, 2001). This definition should not be taken as firmly established yet; for example, debate surrounds the requirement that a component must be a black-box implementation (Cho et al., 2001). However, for the purposes of this thesis, we will consider these characteristics as defining components.
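To make these three criteria concrete, consider the following C++ sketch (illustrative only: names such as ISpellChecker are invented here, and the interface-plus-factory convention stands in for a much richer component model). The consumer obtains the component solely through an abstract interface and a factory function, so the implementation remains a black box that can be reused as-is:

    #include <iostream>
    #include <memory>
    #include <string>

    // Criterion 3 (sketched): the "component model" is simply the convention
    // that every component exposes a pure-virtual interface plus a factory.
    class ISpellChecker {
    public:
        virtual ~ISpellChecker() = default;
        virtual bool isCorrect(const std::string& word) const = 0;
    };

    namespace {
    // Criterion 1: a black-box implementation; consumers never see this class.
    class SimpleSpellChecker : public ISpellChecker {
    public:
        bool isCorrect(const std::string& word) const override {
            return !word.empty();  // trivial stand-in for real functionality
        }
    };
    }

    // The factory is the only way a third party obtains the component.
    std::unique_ptr<ISpellChecker> createSpellChecker() {
        return std::make_unique<SimpleSpellChecker>();
    }

    // Criterion 2: a third-party consumer reuses the component "as-is",
    // depending only on the interface.
    int main() {
        auto checker = createSpellChecker();
        std::cout << checker->isCorrect("component") << '\n';
        return 0;
    }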
2.2 Interfaces
Core to components, and to the assembly of components into a component architecture, is the concept of an interface (Cheesman and Daniels, 2001; Lau, 2001). The IEEE describes an interface as

“A shared boundary across which information is passed ... To connect two or more components for the purpose of passing information from one to the other.” (IEEE, 1990)

This sixteen-year-old definition still carries weight in describing the underlying principles of interfaces. Expanding upon this, the shared boundary described is essentially a formalism for controlling dependencies between software implementations, where a software implementation could be an operating system’s modules, class libraries or even procedure libraries (Bachmann et al., 2000). Beugnard et al. (Beugnard et al., 1999) describe how complete interface specifications can be constructed by categorising interface properties into four distinct levels (figure 2.1) (McGurren, 2004; Bachmann et al., 2000):
2.2 Interfaces
16
Level 4: Quality−of−Service Level
Dynamically Negotiable
Level 3: Synchronisation Level
Level 2: Behavioural Level
Level 1: Syntactic Level
Negotiable
Figure 2.1: Levels of interface specification adapted from Beugnard et. al. (Beugnard et al., 1999)
Syntactic Level: The format of method and function signatures, as prescribed by the grammar of a programming language, caters for this interface level. APIs already adequately cater for syntactic-level interfaces.

Behavioural Level: A behavioural specification is a formal description of what should happen when a software artifact executes, and is often called a contract (Cicalese and Rotenstreich, 1999). Languages such as Eiffel (Eiffel Software, 2004) and OCL (Clark, 2002) support behavioural-level interfaces.

Synchronisation Level: At this level, properties describing component synchronisation, mutual exclusion, atomicity and transactions are specified. Java already implements a lightweight version of synchronisation through its “synchronized” keyword.

Quality-of-Service Level: The previous three levels concern properties that can be precisely defined. The quality-of-service level, however, is concerned with quantifying component properties such as “average response” and “quality of result.” These are a measure of one’s trust in a component (Councill, 2001) and are usually specified by third-party certification.
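The first three levels can be roughly illustrated in code. The C++ sketch below is an invented example (using modern C++ rather than the languages named above): the method signature supplies the syntactic level, assertions approximate a behavioural contract in the style of an Eiffel or OCL specification, and a mutex provides a crude synchronisation guarantee; the quality-of-service level has no direct language support and is noted only as a comment.

    #include <cassert>
    #include <mutex>

    class Account {
    public:
        // Level 1 - syntactic: the signature fixes the name, parameter
        // types and return type.
        // Level 4 - quality-of-service could only be stated externally,
        // e.g. "deposit completes in under 1 ms on average", certified by
        // a third party rather than expressed in code.
        void deposit(long amount) {
            std::lock_guard<std::mutex> guard(m_);  // Level 3 - synchronisation:
                                                    // mutual exclusion, much like
                                                    // Java's "synchronized".
            assert(amount > 0);                     // Level 2 - behavioural:
            long before = balance_;                 // a precondition and a
            balance_ += amount;                     // postcondition standing in
            assert(balance_ == before + amount);    // for a formal contract.
        }

        long balance() const {
            std::lock_guard<std::mutex> guard(m_);
            return balance_;
        }

    private:
        mutable std::mutex m_;
        long balance_ = 0;
    };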
2.3 Introducing Component-Based Development
In section 1.3, component-based development was specifically cited as a means of curbing the problems associated with monolithic software development by explicitly placing reuse at the core of the process. Component-based development can be simply defined as

“... the building of software systems out of prepackaged generic elements.” (Meyer and Mingins, 1999)

“... [this] involves the technical steps for designing and implementing software components, assembling systems from pre-built software components, and deploying assembled systems into their target environments.” (Bass et al., 2000)

A graphical description of the component architectural style, adapted from (Bachmann et al., 2000), can be seen in figure 2.2. It includes (Bachmann et al., 2000):

1. Components - these form the building blocks of the system.
2. Clearly defined interfaces on the components to describe the services that each component offers.
3. The components assembled in accordance with clearly defined contracts that describe the interaction between component instances.
4. Multiple instances of component types, which describe families of component instances in the same way that an object is an instance of a class.
5. Each instance of these component types can be deployed either statically or dynamically, forming a component-based piece of software. A statically deployed component is deployed at implementation time; a dynamically deployed component is deployed at runtime.
6. The combination of component types, their interfaces and an explicit description of their valid patterns of interaction forms a component model.
7. The component model is supported by a component framework. A framework consists of a set of supporting services and other components that are useful, and sometimes necessary, in building the component-based application.
8. The component framework provides an array of runtime services that enforce the component model.
2.3 Introducing Component-Based Development
19
3 − Contract
2 − Interfaces
1 − Component
5 − Deployed 9 −Multiple instances of a Component Type may be Deployed 4 − Component Type
7 − Component Framework
8 − Runtime Services
6− Component Model
Figure 2.2: A Generic Component Architecture adapted from Bachmann et al. (Bachmann et al., 2000).
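The flavour of this architecture can be suggested in code. The following C++ sketch is illustrative only (none of its names come from the component models cited above): a component type implements a contracted interface, and multiple instances of that type are deployed into a minimal framework that stands in for the runtime services.

    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // (2) A clearly defined interface describing the services offered.
    class ILogger {
    public:
        virtual ~ILogger() = default;
        virtual void log(const std::string& message) = 0;
    };

    // (4) A component type; families of instances are created from it.
    class ConsoleLogger : public ILogger {
    public:
        void log(const std::string& message) override {
            std::cout << message << '\n';
        }
    };

    // (7)/(8) A toy "framework" offering one runtime service:
    // instance registration and message dispatch.
    class Framework {
    public:
        // (5) Instances are deployed (here, statically at start-up).
        void deploy(std::unique_ptr<ILogger> component) {
            components_.push_back(std::move(component));
        }
        void broadcast(const std::string& message) {
            // (3) Interaction happens only through the contracted interface.
            for (auto& c : components_) c->log(message);
        }
    private:
        std::vector<std::unique_ptr<ILogger>> components_;
    };

    int main() {
        Framework fw;
        fw.deploy(std::make_unique<ConsoleLogger>());  // multiple instances of
        fw.deploy(std::make_unique<ConsoleLogger>());  // the same component type
        fw.broadcast("a component model in miniature");
        return 0;
    }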
With this architecture comes the potential to take software development from a practiced craft to a fully fledged engineering discipline (Johnson, 2002; Whittaker and Voas, 2002) that includes the predictable assembly (Wallnau, 2003) of software systems. This component-based software engineering approach can be defined as

“... the practices needed to perform [component-based development] in a repeatable way to build systems that have predictable properties.” (Bass et al., 2000)

While this “holy grail” is yet to be achieved, positive research towards component-based software engineering (Wallnau, 2003; Cheesman and Daniels, 2001; Hamlet, 2001), plus software support for component-based development principles (Eddon, 1999; Ran et al., 2001; Sadd, 2003), is emerging. Two examples are Progress Dynamics (Sadd, 2003) and the .NET framework (Microsoft, 2006a).
2.3.1 A Component-Based Development Process
Several subtly different approaches to component-based development exist (Allen and Frost, 1998; Cheesman and Daniels, 2001; Wallnau, 2003), and these are often supported by existing component technologies (Ran et al., 2001; Eddon, 1999; Progress Software, 2003). Here, Cheesman and Daniels’ process of specifying component-based software is discussed to contextualise the research (Cheesman and Daniels, 2001). Built upon the UML notation, the process is portable to a wide variety of platforms and component technologies.¹ All projects follow two processes simultaneously: a management process with a subservient development process (Cheesman and Daniels, 2001, page 25). The management process accounts for time constraints and the setting of milestones and goals.

¹ Cheesman and Daniels also extend the current UML notation to explicitly handle components. However, the focus of this discussion is on the process.
[Figure 2.3: The component-based development process adapted from (Cheesman and Daniels, 2001).]

The development process, with which we are concerned, creates working software from requirements. The diagram in figure 2.3 describes Cheesman and Daniels’ component-based development process. The process is driven by five workflows, a workflow being a sequence of actions that produces an output of observable value (Kruchten, 1999):

Requirements: The requirements of the system are gathered and organised in a useful way. Two new artifacts are output by this workflow: the business concept model and the use case model. The business concept model is a conceptual model of the business domain that provides a common vocabulary to be used by software engineers and project managers in relation to the project. The use case model is
a set of use cases describing all identified functional requirements of the system.

Specification: A complete set of requirements, a business concept model and the set of use case models are taken as input and combined with other existing information regarding software assets. This extra information could include existing documentation, designs, or recovered or existing software components. These are used to produce a complete set of component specifications and a component architecture as output. The component specifications describe, in detail, what component types will be required; the component architecture shows how instances of these types will interact. The specification workflow can be subdivided further into three major tasks:

  Component identification: Taking the business concept model and the use case model as input, the component identification stage identifies an initial set of component interfaces and an architecture.

  Component interaction: This stage examines how the system’s operations will be achieved using the identified component architecture, thus refining the component identification workflow.

  Component specification: Detailed specifications for components are created, along with an interface information model artifact. The interface information model describes operations, states and constraints that are enforced on the component.

Provisioning: The component specifications and architecture, taken as input, are used to determine the components that are available, the ones that must be built and the ones that must be bought. It is the job of the provisioning workflow to make available the required components for subsequent workflows. The reuse of components is explicitly catered for here, as can be seen in figure 2.4. Furthermore,
the reuse is not confined to components and can include any existing software assets. The potential of reengineering towards components, from existing legacy applications, to supplement the provisioning workflow is the focus of this thesis.

Assembly: The components, a suitable user interface and existing software assets, such as recovered components or components from a repository, are combined to form an application.

Testing and Deployment: During this workflow, standard testing and roll-out of the new application occurs. Individual components are unit tested and the entire assembly is functionally tested.
2.4 The Legacy Dilemma Revisited
The previous section introduced the concepts of Cheesman and Daniels’ component-based development process. In particular, it was suggested as a means of introducing the widespread reuse of software across systems. However, component-based development remains a relatively new concept, which implies that the majority of existing software is written using different, or even obsolete, development paradigms. This existing source code should somehow be exploited for reuse in modern component-based development processes, since it is prohibitively difficult to reimplement the implicit business rules in the existing system (Verhoef, 2000). The provisioning workflow (figure 2.3), whose task it is to make components available for subsequent development, caters explicitly for just such exploitation. Figure 2.4 expands the provisioning workflow presented in figure 2.3, showing that components may be acquired from three sources:

• Components may be bought from external vendors.
• Components may be built.
• Components may be reused from a repository of existing components.
[Figure 2.4: A detailed description of the provisioning workflow.]

Of particular interest is the ability to take existing software components from a repository for reuse. This repository may be established in advance, particularly as part of a development philosophy such as product line software development (Bergey et al., 2000; Simon and Eisenbarth, 2002; Eisenbarth and Simon, 2001). Alternatively, the repository could be populated by recovering components from existing software: legacy source code reengineered towards components. Reengineering legacy systems towards components for reuse in existing or new systems is the primary concern of this thesis.
2.5 Encapsulation
Encapsulation is a means of reducing interdependence between parts of a software system (Snyder, 1986). By applying encapsulation to portions of software appropriately, increased ease of development can be afforded to software engineers (Zweben et al., 1995). This section explores the evolution of encapsulation in software development, describes encapsulation in detail, discusses the core quality measures for encapsulation - coupling and cohesion - and finally discusses why component-based development is yet another improved means of development that better supports encapsulation in software.
2.5.1 A Brief History of Encapsulation in Software Development

2.5.1.1 Monitors
During the early 1970s, Hoare introduced the concept of monitors as a means of controlling access to procedures and local data in a running program (Hoare, 1974). A monitor can be declared according to the following template:

    monitorname: monitor
    begin
        ... declarations of data local to the monitor ...
        ... procedure declarations ...
    end

The monitor construct would allow any number of processes in the operating system to request access to the monitor source code. However, never would more than one process be allowed to execute the source code or access the local data of the monitor at any particular time. In this fashion, a monitor would achieve process encapsulation by grouping operations and data that should only be executed together.
The grouping of data and procedures would be referred to simply by the monitor name, hence achieving the desired abstraction effect afforded by encapsulation.
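Hoare's construct predates today's mainstream languages, but its effect can be approximated in modern C++ (a speculative sketch, not Hoare's notation): the class below plays the role of the monitor, with a mutex ensuring that no more than one thread executes the monitor's procedures or touches its local data at any time.

    #include <mutex>

    // A monitor-like class: "count_" is the data local to the monitor, and
    // every public procedure acquires the monitor lock before touching it.
    class CounterMonitor {
    public:
        void increment() {
            std::lock_guard<std::mutex> entry(lock_);  // at most one process inside
            ++count_;
        }
        int read() {
            std::lock_guard<std::mutex> entry(lock_);
            return count_;
        }
    private:
        std::mutex lock_;  // enforces the monitor's mutual exclusion
        int count_ = 0;    // local data of the monitor
    };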
2.5.1.2 Information Hiding
At roughly the same time as Hoare published his work on monitors (Hoare, 1974), Parnas had begun to introduce the concept of information hiding (Parnas, 1972, 1971, 2002). Parnas first introduced the concepts of information hiding in (Parnas, 1972). Information hiding, similar to the process encapsulation afforded by Hoare monitors, advocates the hiding of portions of a program’s data and operations associated with that data behind a defined interface (Parnas, 2002, 1972). Unlike Hoare’s monitors, information hiding is advocating encapsulation and abstraction on the static structure of the program, rather than encapsulation of the program in terms of the running processes of the operating system. The principles of information hiding provides the necessary basis for dividing a software system into modules, hiding complexity of the system and interacting through well defined interfaces (Wikipedia, 2006b). This form of encapsulation would eventually form the basis for mainstream software development and design. 2.5.1.3 Object Oriented Programming Object oriented programming is a style of programming that supports the concepts of information hiding as a first class language construct. The first object oriented programming languages emerged during the 1960’s with Simula (Simula, 2006). Object oriented programming introduces modern programming concepts such as inheritance, polymorphism and, most relevant to this discussion, information hiding through data encapsulation. This encapsulation is afforded to the user using the class construct in object oriented languages. As research began to highlight the importance of information hiding (Parnas, 1972;
Zweben et al., 1995) and the need to promote these concepts during software development, object oriented languages, with their explicit use of data encapsulation, began to rise in popularity. By the early 1990's object oriented programming had gained widespread acceptance in software development and has been shown to provide significant benefits in ease and manageability of development (Zweben et al., 1995). Object oriented languages achieve better encapsulation than their non-object oriented counterparts by providing several key language concepts to the programmer:

• The ability to group related operations and data using the class construct.
• The ability to limit access to methods and data to a given scope. This is achieved using keywords such as public, private and protected.
• The ability to abstract over related class types using inheritance and polymorphism.

One possible interpretation of encapsulation is that its purpose is to protect portions of a system from operations and data that are irrelevant to those portions. In the non-object oriented code example in figure 2.5 no such protection is put in place in the program. All four procedures potentially have access to all the data of the code fragment, in spite of the fact that "procedure1" and "procedure2" only access variables r, s, t and u, and "procedure3" and "procedure4" only access v, w, x, y and z. A clear division between the data and the operations that act over that data exists in the code fragment; however, no first class entity of the programming language exists that makes this encapsulation of data explicit or enforces it. Furthermore, within our two divisions of the code fragment, "procedure2" and "procedure4" are only ever accessed via "procedure1" and "procedure3" respectively and never from "main". Again, as the fragment currently stands, the potential to access "procedure2" and "procedure4" from "main" does exist. The language offers no means of encapsulation that would "hide" access to "procedure2" and "procedure4" from "main".
int r, s, t, u, v, w, x, y, z;

void main() {
    procedure1();
    procedure3();
}

void procedure1() {
    r = s + t;
    procedure2();
}

void procedure2() {
    u = r + s + t;
}

void procedure3() {
    v = w + x;
    procedure4();
}

void procedure4() {
    x = y;
    y = v;
    z++;
    z += v;
}
Figure 2.5: A code fragment from a C program.
Object oriented languages provide these needed language constructs. Figure 2.7 is a revised version of the code example in figure 2.5 that makes appropriate use of an object oriented language's class and scoping constructs. Using the class construct, the separate variable groupings mentioned above have been placed into separate classes and grouped with the procedures that operate on those variables. The variables of these classes have been scoped as private, since we wish to encapsulate this data within the scope of the class and deny access to the data from outside the class. Likewise, one of the procedures within each of the classes has been marked as "private" ("procedure2" and "procedure4"), because no procedures outside of their respective classes access these procedures.

Notice, in particular, how even tighter scoping can be achieved by declaring certain variables as local variables within the methods. In this case u is declared locally in "procedure2," w is declared locally in "procedure3" and y and z are declared locally in "procedure4." This is because those variables are used exclusively by the procedures they are now declared in. The classes themselves, and the procedures within them that we wish to provide access to, are marked with the "public" modifier. This allows access to these elements program-wide. The result of these measures is a reduction, through information hiding, in the list of operations and data that classes and procedures can access. This encapsulation results in a reduction in complexity when creating and maintaining the application by separating the concerns of the program into explicitly scoped groupings of data and associated operations, accessible only through clearly defined interfaces. In figure 2.6 a visualisation of this encapsulation is shown to help clarify what has been achieved.

Further encapsulation benefits are provided by object oriented languages through effective use of the object oriented concepts of "inheritance" and "polymorphism." Inheritance allows one to define a hierarchy of class types in a program. The inheriting type will inherit the characteristics (data and operations) of the type it inherits from. Take the "Animal" inheritance hierarchy in figure 2.8, which is visualised in figure 2.9,
[Figure 2.6 depicts "Program main()" accessing two classes: MyClass1, with private data r, s and t, procedure1 and procedure2 (local data u); and MyClass2, with private data v and x, procedure3 (local data w) and procedure4 (local data y and z).]

Figure 2.6: A visualisation of the encapsulation exercise shown in figure 2.7.
this time using Java syntax (Sun Microsystems, 2006). The hierarchy describes a set of animals that share some characteristics, and become more specialised as we descend the hierarchy. Note carefully the use of the "protected" modifier. "private," as we saw, limits access to the enclosing class. "protected," however, limits access to the enclosing class and any classes that inherit from that class. Classes outside of the inheritance hierarchy will still have no access to the protected members of the class. Polymorphism is a feature of object orientation that operates on inheritance hierarchies and provides the ability to treat a derived class just like its parent class, sometimes to the extent that the derived class's use becomes invisible to the programmer. This encapsulation effect is demonstrated in the code example in figure 2.10. The code example models a scenario where an animal is caught in the "Wild," brought to a "Clinic" to be treated and then put into captivity in a "Zoo." This example makes use of the inheritance hierarchy in figure 2.8. Notice how the "capture" method will capture a specific type of
void main() {
    MyClass1 cl1 = new MyClass1();
    MyClass2 cl2 = new MyClass2();
    cl1.procedure1();
    cl2.procedure3();
}

public class MyClass1 {
    private int r, s, t;

    public void procedure1() {
        r = s + t;
        procedure2();
    }

    private void procedure2() {
        int u;
        u = r + s + t;
    }
}

public class MyClass2 {
    private int v, x;

    public void procedure3() {
        int w;
        v = w + x;
        procedure4();
    }

    private void procedure4() {
        int y, z;
        x = y;
        y = v;
        z++;
        z += v;
    }
}
Figure 2.7: A revised version of the code fragment in figure 2.5.
class Animal {
    protected int morale = 0;

    public void raiseMorale() {
        morale++;
    }

    public void decreaseMorale() {
        morale--;
    }
}

class Biped extends Animal { }
class Quadruped extends Animal { }
class Monkey extends Biped { }
class Orangutan extends Biped { }
class Dog extends Quadruped { }
class Cat extends Quadruped { }
Figure 2.8: An inheritance hierarchy code sample.
Animal
├── Biped
│   ├── Monkey
│   └── Orangutan
└── Quadruped
    ├── Dog
    └── Cat

Figure 2.9: A visualisation of the inheritance hierarchy in figure 2.8.
animal depending on the circumstances. However, the type of the animal is not of any concern to the "Clinic" class, as the clinic will treat any type of animal and place it in the "Zoo". Using polymorphism, this form of information hiding can be achieved in the example. All operations in the "Clinic" occur on the type "Animal" and the "Clinic" class remains agnostic to the actual type of the instance it is dealing with. In this way encapsulation over an inheritance hierarchy can be achieved, shielding, where possible, portions of the program from the complexities of the type hierarchies.
2.5.2 Coupling and Cohesion
By the late 1970's researchers had begun to arrive at a consensus regarding the merits of encapsulation and abstraction during software development and design. The focus next began to shift to how to assess the quality of encapsulation. This led to two commonly accepted indicators of encapsulation quality - coupling and cohesion. These indicators of "good" design were conceived over thirty years ago (Stephens et al., 1974). Coupling is the degree of interdependence between components or modules; cohesion is the extent to which a component or module's individual parts are needed to perform the same task (Yourdon and Constantine, 1979). Low coupling and high cohesion often indicate a more replaceable (and reusable) component and, by measuring coupling and cohesion, we can get an indirect measure of replaceability and reusability. We define and measure coupling between two modules in terms of the type and degree of communication between them (Fenton, 1991). Figure 2.11 is an example of a recovered component ("Transforms") from later in the thesis. Notice the high number of internal connections within the components relative to the interconnections between the components. In line with the use of encapsulation suggested in the previous section (section 2.5.1.3), it could be said that this component is well encapsulated, since many calls that are irrelevant to clients are encapsulated in the component, with a minimised number of calls between the components.
public class Zoo {
    public void incarcerate(Animal animal) {
        if (animal instanceof Dog) {
            Dog dog = (Dog) animal;
            dog.raiseMorale();
        } else {
            Monkey monkey = (Monkey) animal;
            monkey.decreaseMorale();
        }
    }
}

public class Wild {
    String loc;

    Wild(String location) {
        loc = location;
    }

    public Animal capture() {
        if (loc.equals("Europe"))
            return new Dog();
        else
            return new Monkey();
    }
}

public class Clinic {
    public static void main(String args[]) {
        Wild theWild = new Wild("Europe");
        Zoo theZoo = new Zoo();
        Animal capturedAnimal = theWild.capture();
        treatAnimal(capturedAnimal);
        theZoo.incarcerate(capturedAnimal);
    }

    static void treatAnimal(Animal theAnimal) {
        theAnimal.raiseMorale();
    }
}
Figure 2.10: Using polymorphism for encapsulation code example.
Figure 2.11: An example of a loosely coupled and highly cohesive component.
Correspondingly, these components are loosely coupled due to the low interdependence between the components, and they display high cohesion by virtue of the fact that there is a large number of internal, hidden calls relative to the intercomponent calls. Fenton (Fenton, 1991) describes a classification of six types of coupling between two modules x and y, arranged by increasing strength (0-5) and based upon the type of communication between the two modules:

• 0: x and y have no communication; that is, they are totally independent of one another.
• 1: x and y communicate by parameters, where each parameter is either a single data element or a homogeneous set of data items that incorporate no control element. This type of coupling is necessary for any meaningful communication between modules.
• 2: x and y accept the same record type as a parameter. This type of coupling may cause interdependency between otherwise unrelated modules.
• 3: x passes a parameter to y with the intention of controlling its behavior; that is, the parameter is a flag.
• 4: x and y refer to the same global data. This type of coupling is undesirable; if the format of the global data must be changed, then all common coupled modules must also be changed.
• 5: x refers to the inside of y; that is, it branches into, changes data in, or alters a statement in y.

He continues to provide a calculable metric for coupling,

$c(x, y) = i + \frac{n}{n+1}$

where c is the coupling between two modules x and y, i is the level of coupling on the six part scale and n is the number of interconnections between x and y (a short worked example is given after the cohesion scale below).

To measure cohesion, Yourdon and Constantine (Yourdon and Constantine, 1979) provide a seven point scale. Functional cohesion, where the module performs a single well defined function, is the best; subsequent items are presented here in terms of decreasing cohesion:

• Functional: the module performs a single well defined function.
• Sequential: the module performs more than one function, but they occur in an order prescribed in the specification.
• Communicational: the module performs multiple functions, but all on the same body of data (which is not organised as a single type or structure).
• Procedural: the module performs more than one function, and they are related only to a common procedure of the software.
• Temporal: the module performs more than one function and they are only related by the fact that they must occur within the same timespan.
• Logical: the module performs more than one function, and they are related only logically.
• Coincidental: the module performs more than one function and they are unrelated.
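As a worked illustration of Fenton's coupling metric (the modules and counts here are invented for the example, not drawn from the case studies of this thesis): suppose modules $x$ and $y$ refer to the same global data, so the strongest coupling type present is level $i = 4$, and there are $n = 3$ interconnections between them. Then

$c(x, y) = i + \frac{n}{n+1} = 4 + \frac{3}{4} = 4.75$

Since $\frac{n}{n+1}$ is always less than 1, the metric orders module pairs first by the strongest type of coupling present and only then, within a level, by the number of interconnections.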
2.5.3 Encapsulation Features of Component-Based Development
As stated earlier in this chapter, software components and component-based development claim to provide a better means of software development than the current state of the practice in the software industry. Software components are intended to build upon existing object oriented technologies (Meyer and Mingins, 1999) by adding to and improving the means of encapsulation during development (Meyer and Mingins, 1999; Cheesman and Daniels, 2001). A software component may constitute any number of classes. Thus, encapsulation can be implemented on a much larger scale. This is an important feature of encapsulation that becomes desirable as an application, or the domain being modeled, grows large. Take, for example, the classes in the diagram in figure 2.12. The edges in the diagram represent dependencies in the program between the classes. After brief examination of the diagram you will notice that two distinct groupings of classes exist (d, e, f, g, h, u, v and w, x, y, z, a, b, c) and that all communication between these two groupings occurs only via three classes (u, v, w). Similar to the problem posed in figure 2.5 in the previous section, a mechanism for aggregating classes and hiding complexity through encapsulation would be useful (see figure 2.13). Software components provide this explicit construct as a first class entity. Unlike packages in Java, for example, which also may be considered in this light, typical component technologies provide one or several explicit,
[Figure 2.12 depicts classes a to h and u to z connected by dependency edges, with the two groupings communicating only through u, v and w.]

Figure 2.12: Many classes without an encapsulation policy.
localised interfaces that declare the public services of the component, thus increasing encapsulation. When we encapsulate the two groupings as components, as in figure 2.13, the interface (the classes that are public with respect to the component) becomes u, v and w. By making explicit which classes are public and which are private to a component, one removes the potential for breaking the desired encapsulation. Earlier, in section 2.3, it was noted how a component framework offers an array of runtime services to the programmer. One such service is an event-based model of programming. In such a model, client code may register with the component to listen for a specific event that occurs in the component. When such an event is fired, the client code can respond to the event with the invocation of a specified procedure. Figure 2.14 shows a sample C#.Net code fragment (Microsoft, 2006a) that demonstrates client code registering to listen for a "ComponentShutDown" event. In this circumstance, when the component is shut down, the component ("Component") will raise an event. Because the client code has registered to listen for this event, it will notice that the event has occurred and respond to it by invoking its own procedure ("ClientProcedure"). Better encapsulation of the state of the component is achieved by hiding more internal data of the component using the event-based model.
[Figure 2.13 depicts the classes from figure 2.12 grouped into Component 1 and Component 2, with only u, v and w exposed at the component boundary.]

Figure 2.13: Many classes from figure 2.12 encapsulated by a component.

public class ClientCode
{
    ClientCode()
    {
        // Registers the client to listen for an event on the component.
        Component.ComponentShutDownEvent += new EventHandler(ClientProcedure);
    }

    public void ClientProcedure(object sender, EventArgs e)
    {
        // Some operations that respond to the event here.
    }
}

Figure 2.14: Event handling code sample.

The alternative to using such a model would have been to have the client source code continually poll the component to see if there has been a change of state in the component. Instead, with the event-based model, the client code becomes a passive entity and the relationship between client and component becomes inverted. The state of the component is no longer a concern of the client code until it is informed by an event. The onus is on the component to provide clients with notification of an event and information about that event (including changes in state) via the "EventArgs e" argument passed by the component to the client code in figure 2.14.
[Figure 2.15 depicts Component1 and Component2 deployed on separate machines (192.168.11.1 and 192.168.11.2), communicating over HTTP.]

Figure 2.15: A deployment diagram of two distributed components.
A form of geographic or topological encapsulation is also provided by a component framework. Component frameworks such as J2EE or .Net provide a mechanism for components2 (Microsoft, 2006a; Inc, 2005) on different machines, which could potentially be in very different geographic locations, to register with a component framework. This makes the component available for use in a distributed fashion. However, calls to these components may be made as though they were on the same machine. Take, for example, the two components on different machines shown in the deployment diagram in figure 2.15. Once properly registered with the framework, a method call between the two components could potentially be as simple as shown in figure 2.16. In this way, information regarding the location of components, and the information required to reach these components over a network, is encapsulated by the component and the framework and completely hidden from clients of that component. Also noted in section 2.3 was that components can communicate with each other using clearly defined interfaces. The intention here is that the only knowledge we have about a component should be through its interface, and that all other information about the component should be encapsulated, including the original implementation language of that component.
2 The specific names for these types of component are Enterprise Java Beans (J2EE) and Web Services (.Net).
// Component1 definition
...
// definition of a class within the component
public class MyClass {
    public void someMethod() {
        Component2.Component2Class component2Class = new Component2.Component2Class();
        component2Class.component2Method();
    }
}
...
// remainder of component.
Figure 2.16: A call between the distributed components shown in figure 2.15.

This suggests that a component-based system could potentially be composed of many components written in many different languages. The .Net component framework, for example, supports the definition of components written in over 50 different languages (Ritchie, 2006).
Chapter 3

Software Reengineering

"It is the neglect of timely repair that makes rebuilding necessary." - Richard Whately
3.1 Agendas for Reengineering Software Systems
When software needs to evolve to prolong its lifetime, software development teams may have only three choices:

1. Purchase a new system.
2. Develop a new system.
3. Leverage the existing system.

The third choice is often the only feasible option, since the former two routes are generally too expensive (Rochester and Douglass, 1991). As a result, a large body of research has been produced in the areas of reengineering, maintaining and leveraging existing systems. Reengineering is a subset of software maintenance, specifically directed at leveraging existing systems. Several definitions of reengineering exist (Chikofsky and Cross II, 1990; Arnold, 1993; Corp., 1989). These definitions differ only in allowing or disallowing the behaviour of a system to be altered as a result of applying a reengineering technique. We will use the widely accepted Chikofsky and Cross (Chikofsky and Cross II, 1990) definition of reengineering:

"... the examination and alteration of a subject system to reconstitute it in a new form and the subsequent implementation of the new form."

Thus, we can view reengineering as an extension of maintenance where the new form is an evolved version of the system (Tilley et al., 1994; Leintz and Swanson, 1980). This definition does not explicitly exclude alteration of the system's behaviour (Arnold, 1993); however, it does remain ambiguous. Many fields of research exist within the category of Software Reengineering and Maintenance:
• Software Comprehension (O'Brien and Buckley, 2001).
• Design Recovery (Biggerstaff, 1989).
  – Architectural Recovery (Aldrich et al., 2002).
  – Component Recovery (Koschke, 2000a).
• Refactoring and Restructuring (Chikofsky and Cross II, 1990).
  – Language Transformations (Terekhov, 2000).
  – Rearchitecting (Fowler et al., 1997).
  – Wrapping (Aldrich et al., 2002).
• Data Analysis.
  – Slicing (Weiser, 1982).
  – Control flow analysis (Urschler, 1975).
  – Normalisation (Connolly and Begg, 2004).
• Reuse Identification.
  – Clone Detection (Baxter et al., 1998).
  – Frequency Spectrum Analysis (Ball, 1999).
  – Fan-in analysis (Fenton, 1991).

The following sections in this chapter focus only on the topics from software reengineering and maintenance that are applicable to the objectives of this thesis, namely component recovery, reuse identification, and dynamic and static analysis.
3.2 Dynamic versus Static Analysis
In reengineering, some form of analysis of the software artifact must be undertaken. This analysis can derive information such as call relations, data flows or metrics of complexity, some of which may be necessary before reconstituting the system in a new form. Techniques employed to analyse software can be broadly categorised as static and dynamic (Tip, 1995; Ritsch and Sneed, 1993). The difference between the two lies in the distinction between programs and processes1. A program is a static representation and is characterized by source code. A process is an instance of that program executing, and is dynamic. The scenario is analogous to a recipe and the baking of a cake (O'Gorman, 2001); the recipe being the program and the baking of it being the process. Thus, static analysis will present information based upon the source representation of the system. A dynamic analysis will glean its information from source execution at runtime. This runtime information is typically retrieved in the form of a coverage profile or program trace (Ball, 1999) using a form of software instrumentation (Wilde, 1998). Deciding which approach to employ is a matter of context. For example, consider control or data flow analysis. In the case of a static analysis the resulting data set can be program wide, but this can be problematic where programs are large, yielding a massive data set after analysis. Attempts to identify software components within legacy software, for the purpose of extraction or modernization, are well documented in the reengineering literature, with varying degrees of success (Riva and Deursen, 2001; Johnson, 2002; Cimitile and Visaggio, 1995; Girard and Koschke, 1997; Quilici, 1993; Eisenbarth, Koschke and Simon, 2001; Zhao et al., 2004). With the exception of a few solutions, such as the concept analysis based feature location (Koschke, 2004) described by Eisenbarth et al. (Eisenbarth, Koschke and Simon, 2001), most rely heavily upon static analyses and utilise little or no information gleaned from the dynamic execution of the software.
1 This is the operating system notion of a process.
However, static and dynamic approaches may be viewed as complementary when analysing software (Ball, 1999). Techniques that exclude dynamic analyses deny access to key information regarding (Ritsch and Sneed, 1993):

1. The software elements that are used, and those that are not, for given execution scenarios.
2. Performance information.
3. Relationships between code and particular business transactions.
4. Sequence of execution.

The first and third points are particularly relevant to this thesis' core agenda - reengineering towards components - as they state that it is possible to relate source code to a prescribed execution scenario (often realised by a test case) and then to further relate an execution scenario to the business transaction that it instantiates. This offers the potential to identify implementations of behaviors of interest during the targeting phase of the component recovery process.
3.3 Reuse Identification Techniques
As identified for component-based development in the previous chapter, software reuse is a core concern for a software engineer. The software reengineering and maintenance literature provides us with several means of identifying reuse in software systems. Several types of reuse exist; based on the review of reuse undertaken here, two broad categories appear to emerge:

1. Reuse internal to a system.
2. Reuse across systems.

The latter type of reuse is probably the most familiar and realises the "Software Reuse" approach to software development (Nauer and Randell, 1968), defined as:

"the process of creating software from existing software rather than building software systems from scratch." (Krueger, 1992)

This type of reuse can be realised in any number of ways, including component deployment from repositories and the use of libraries in the form of header files or web services (Priéto-Diáz, 1991). The principles of component-based development are intended to foster this type of reuse (Cheesman and Daniels, 2001). Reuse internal to a system can exist in several forms, which are identifiable by their detection techniques, as illustrated in the following sections.
3.3.1 Clone Detection
A software clone is duplicated code within a system (Baxter et al., 1998). Typically, between 5% and 10% of an application consists of code clones (Baxter et al., 1998). Clones in a system tend to be viewed as a maintenance risk, since a change to a cloned piece of code may require changes to the other clones of that piece of code that are not immediately obvious to the maintainer. This is particularly true given that identifying clones in a system is not straightforward, since subtle changes to the piece of code being cloned may have occurred in the cloning process. As a result, several algorithms that attempt clone detection in software have proliferated (Baxter et al., 1998; Baker, 1997; Johnson, 1994). These algorithms work off the source code text, or the abstract syntax tree of the partially compiled program, to identify its clones. It is also worth noting that the existence of a clone is not always bad, and may indicate to the software engineer portions of code that are highly reusable, since a clone is the explicit
reuse, by a programmer, of some implementation abstraction (Johnson, 1994). Clones will not be made apparent to the user if a dynamic analysis approach such as Software Reconnaissance is used. For example, if code that implements logging is cloned in a system and the logging feature is traced using a test case, only one example of the duplicated logging code will be captured.
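The detectors cited above operate on source text or abstract syntax trees. As a rough, hedged illustration of the textual flavour only (this is a toy sketch, not a reimplementation of any cited algorithm), the code below reports k-line windows that recur after whitespace normalisation:

    import java.util.*;

    // Toy textual clone detector: hashes each k-line window of the source
    // (whitespace-normalised) and reports windows seen more than once.
    // Real detectors tolerate renamed identifiers and edited clones.
    public class NaiveCloneDetector {
        public static List<String> findClones(List<String> lines, int k) {
            Map<String, Integer> firstSeen = new HashMap<>();
            List<String> clones = new ArrayList<>();
            for (int i = 0; i + k <= lines.size(); i++) {
                StringBuilder window = new StringBuilder();
                for (int j = i; j < i + k; j++) {
                    window.append(lines.get(j).trim()).append('\n');
                }
                Integer previous = firstSeen.putIfAbsent(window.toString(), i);
                if (previous != null) {
                    clones.add("window at line " + i + " repeats line " + previous);
                }
            }
            return clones;
        }
    }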
3.3.2 Fan-in Analysis
Calls to a procedure from various parts of a system demonstrate another type of reuse, known as fan-in. A fan-in analysis determines the number of incoming calls for a procedure or class (Fenton, 1991). Fan-in analysis can also provide other valuable insights into a system, including the identification of aspects, since procedures called from many diverse locations can indicate the presence of aspects (Marin et al., 2004). However, while fan-in is useful in identifying direct procedural reuse, the reuse is not shown to be associated with a particular domain feature set, as it is with the reuse perspective defined in this thesis.
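Given a call graph of the kind described later in section 3.4, fan-in is straightforward to compute. The sketch below is illustrative and assumes the call graph has already been extracted as caller/callee pairs:

    import java.util.*;

    // Fan-in analysis over an extracted call graph: counts the incoming
    // call edges of each procedure. High counts suggest heavily reused
    // (or aspect-like) procedures.
    public class FanIn {
        public static Map<String, Integer> fanIn(List<String[]> callEdges) {
            Map<String, Integer> counts = new HashMap<>();
            for (String[] edge : callEdges) {
                // edge[0] is the caller, edge[1] the callee.
                counts.merge(edge[1], 1, Integer::sum);
            }
            return counts;
        }
    }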
3.3.3 Frequency Spectrum Analysis
Execution traces can be used to determine how often a software element is used for a given run of the program. Measuring this type of reuse is called frequency spectrum analysis (FSA) (Ball, 1999). This analysis can provide runtime reuse frequency information for particular elements, or patterns of reuse for groups of elements. Calculation of the reuse perspective also relies on dynamically generated information.
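Ball's analysis is richer than this, but a minimal sketch of the core idea - count executions for one run, then group elements that share a frequency - might look as follows (the trace format, one element name per entry, is an assumption):

    import java.util.*;

    // Miniature frequency spectrum: counts occurrences of each element in
    // an execution trace, then bands together elements with equal counts.
    // Elements sharing a frequency often participate in the same feature.
    public class FrequencySpectrum {
        public static Map<Long, Set<String>> spectrum(List<String> trace) {
            Map<String, Long> counts = new HashMap<>();
            for (String element : trace) {
                counts.merge(element, 1L, Long::sum);
            }
            Map<Long, Set<String>> bands = new TreeMap<>();
            for (Map.Entry<String, Long> entry : counts.entrySet()) {
                bands.computeIfAbsent(entry.getValue(), f -> new TreeSet<>())
                     .add(entry.getKey());
            }
            return bands;
        }
    }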
int x;
int y;
int z;
x = 1;
z = 1;
y = x + z;

Figure 3.1: Code example 1.

[Figure 3.2 depicts a graph whose nodes are int x, int y, int z and the literal 1, with edges modelling the declarations and assignments of code example 1.]

Figure 3.2: A possible graph representation of code example 1 in figure 3.1.
3.4 Dependency Graphs
A dependency graph is a graph representation of dependencies in a software system. This intermediate representation of the system is a convenient depiction of the source code that lends itself readily to analysis (Larsen and Harrold, 1996) and code optimization of programs (Ferrante and Warren, 1987). In the code example in figure 3.1 we have a number of declarations and assignments. To help understand data flow, one might decide to model the assignments and declarations in the program as with the graph in figure 3.2. The next code example is a program with an 'if' statement and a 'while' loop (figure 3.3). It is possible to model the control flow structures of this program in graph form, as shown in figure 3.4.
int x = 0;
int y = 0;
while (x == 0)
{
    if (y > 10)
    {
        x = -1;
    }
    else
    {
        x++;
    }
    y++;
}
x = 0;
y = 0;

Figure 3.3: Code example 2.

[Figure 3.4 depicts a control flow graph with nodes for the declarations, the while and if conditions, the branches x = -1 and x++, the increment y++, and the final assignments x = 0 and y = 0.]

Figure 3.4: A possible graph representation of code example 2 in figure 3.3.
public class Animal { }
public class Dog extends Animal { }
public class Cat extends Animal { }
public class Greyhound extends Dog { }

Figure 3.5: Code example 3.

Animal
├── Cat
└── Dog
    └── Greyhound

Figure 3.6: A possible graph representation of code example 3 in figure 3.5.
Early use of dependency graphs saw them used to help implement code optimizations and analyses such as program slicing (Ottenstein and Ottenstein, 1984; Ferrante and Warren, 1987). Ferrante and Ottenstein describe what they call the "Program Dependency Graph", which combines data flow and control flow for a program in a single graph. To model programs written in object oriented languages, dependencies such as inheritance relationships may also be included. Take the code example in figure 3.5, where a three tier inheritance hierarchy exists. A dependency graph modelling this type of dependency can be seen in figure 3.6. Similarly, other constructs that introduce dependencies in object oriented languages, such as polymorphism and the friend construct, may be modeled in a graph representation.
void procedure1() {
    procedure2(7);
    procedure3("hello", 3);
}

void procedure2(int x) {
    procedure3("hello again", x);
}

void procedure3(String str, int y) {
}

Figure 3.7: Code example 4.
Another commonly derived dependency is the method or procedure calls made in a program. Take the code example in figure 3.7 and its corresponding graph representation in figure 3.8. The resulting dependency graph is known as a call graph (Fenton, 1991). In (Larsen and Harrold, 1996) the authors describe a "System Dependency Graph", where the special dependency relations for object oriented software described above, together with the call relation dependencies, are combined with the dependencies of the program dependency graph to form a single comprehensive dependency graph of the system. Their application of the graph is to enable slicing of object oriented software. In this context the call graph can be seen as a subset of the system dependency graph. The analyses performed by the technique proposed in this thesis only require the call relations of a program. Therefore it is the call graph representation that is used in this thesis as a basis for analyses.
[Figure 3.8 depicts a call graph with edges from Procedure1 to Procedure2 and Procedure3, and from Procedure2 to Procedure3.]

Figure 3.8: A possible graph representation of code example 4 in figure 3.7.
3.5 Reengineering Towards Components

3.5.1 Design Recovery
Design recovery is a subset of reverse engineering (Chikofsky and Cross II, 1990). Chikofsky and Cross, in their taxonomy, define design recovery as:

"a subset of reverse engineering in which domain knowledge, external information, and deduction or fuzzy reasoning are added to the observations of the subject system to identify meaningful higher level abstractions beyond those obtained directly by examining the system itself"

Other descriptions of design recovery do exist (Stoemer et al., 2003; Dean and Chen, 2003; Sartipi et al., 2000; Malton and Schneider, 2001), but they all essentially capture similar basic concepts - the implicit agenda behind design recovery is to help the programmer understand the system and its design. Biggerstaff (Biggerstaff, 1989) first brought the term into the mainstream in 1989 with his accompanying tool DESIRE. Here, the inadequacies of source code alone in
an understanding context are identified. Application domain, programming style and supplementary documentation are just a few of the factors, external to the source code, that have an impact on the understanding of that source code (Shaft, 1995). Design recovery can include elements of domain knowledge regarding the system, the system's context, documentation supporting the system and input from an expert developer of the system. Core to this topic is the concept of a domain model. A domain model records the expectations of a programmer, formed during an understanding process, regarding the real-world situation the system is modelling, and attempts to match these expectations with source code, hence introducing traceability from hypotheses to source code. An attempt at automation was made in Biggerstaff's DESIRE tool (Biggerstaff, 1989). The tool is analyzed further in (Biggerstaff et al., 1993), where he identifies what is known as the concept assignment problem: the problem of matching expectations and hypotheses to their implementations in source code (Brooks, 1983). Where these source implementations are clichéd they are known as programming plans (Brooks, 1983). Creating domain models automatically has proved difficult (Biggerstaff et al., 1993). Research in the areas of plan detection (Quilici, 1993; Quilici et al., 1997; Quilici and Yang, 1996; Rich, 1984; Woods and Quilici, 1996) and pattern detection (O'Cinneide, 2001; O'Cinneide and Nixon, 1999, 2000, 2001; Heuzeroth et al., 2003), though worthwhile and partially grounded in comprehension theory, has not yet reached a level of practical application. At present, the best application for automated design recovery through plan detection would seem to be in vertical domains, where a far narrower range of plans and expectations would exist, thus making the solution space (i.e. the coding alternatives for each plan) manageable (Quilici et al., 1997). Given that automating design recovery is not currently practical, semi-automated approaches are being investigated as viable solutions. In recent years, semi-automated approaches such as Reflexion Modelling, the CME and FEAT have been used with very promising results (Kosche and Daniel, 2003; Murphy and Notkin, 1997; Murphy et al.,
1995; Sartipi, 2001; Tran et al., 2000; Murphy et al., 2001; Walenstein, 2002; Chung et al., 2005; Robillard and Murphy, 2002; Lindvall et al., 2002). These processes follow these general steps2:

1. Hypothesise categories, and relationships between the hypothesised categories, in the application under analysis.
2. Map parts of the application into these categories, creating a hypothesised model.
3. Extract a concrete, lower level model of the application.
4. Compare the hypothesised model against the concrete model of the system.
5. Refine the results and repeat the process until satisfied.

Dynamic analysis techniques have also shown promise as a means of design recovery (Ritsch and Sneed, 1993; Heuzeroth et al., 2003; Komondoor and Horwitz, 2003; Rajlich and Wilde, 2002). Dynamic analysis offers the potential to remove the need for source code domain knowledge3 prior to analysing the system. For example, dynamic analysis techniques for feature location, such as Software Reconnaissance or concept analysis (Wilde and Scully, 1995; Eisenbarth et al., 2003; Wong et al., 1999), use knowledge of the system's execution with respect to test cases that exhibit certain business transactions to relate code to business function.
2 Some of these steps may be implicit in the use of the technique, or appear merged to the user; however, they do exist.
3 Knowledge of the style of source code written for that domain. E.g. all compilers may have approximately the same design, therefore someone with domain knowledge of compiler development would expect certain modules to exist in the implementation.
3.5.2 Clustering for Architectural Recovery and Component Recovery
With respect to step one (the encapsulation phase) of reengineering towards components, the most relevant reengineering and maintenance techniques are those that involve clustering. Clustering is a widely used technique of software maintenance and reengineering that identifies the contents of potential modules in a system and the cohesive interfaces between those modules. The contents of these modules are called clusters (Hutchens and Basili, 1985). Clustering is often used to aid software comprehension, design recovery, component recovery and architectural reconstruction (Doval et al., 1999; Mitchell et al., 2002; Rennard, 2000; Ogando et al., 1994; Choi and Scacchi, 1990; Lindig and Snelting, 1997; Gall and Klösch, 1995; Patel et al., 1992; Valasareddi and Carver, 1998; Yeh et al., 1995; Kazman and Carrière, 1997; Murphy and Notkin, 1997). Component recovery and architectural recovery are highly related, and yet subtly different, software analysis tasks, used for different purposes. An architectural recovery process will generally follow two steps (Koschke, 1999):

1. Identify the code that implements each component in a system.
2. Identify dependencies between the code of the components of the system.

This type of analysis is used to redocument systems, communicate their design and help software engineers understand unfamiliar systems. In contrast, the goal of component recovery is to identify individual components in a system and extract them, possibly for reuse in other systems (Koschke, 2000b). To achieve this, a limited form of architectural recovery will occur; however, the global view that architectural recovery achieves is generally not required. We suggest that a component recovery process follows these generic steps (illustrated earlier in section 2.4), which are similar to those of architectural recovery:
1. Identify the code that implements the component of interest only.
2. Identify dependencies on the code of the component of interest only.
3. Conform with a component model by wrapping the component with a component wrapper. This is discussed in section 3.5.5.

It is important to note the final step, where a component wrapper is applied to achieve conformance with a component model. The majority of component recovery techniques described here recover components that conform to an older and simpler definition of a component (table 2.1 tier 1, basic reuse) and therefore the last step is often not necessary. Unless otherwise explicitly stated, the review of clustering techniques for component recovery in this section does not include the final step. However, this does not pose a dilemma, since, if the first two steps are carried out to define a cohesive, reusable component, the application of a component wrapper becomes relatively trivial. Approaches to clustering can be placed into three broad categories, based upon the type of information they act upon (Koschke, 2000a):

• Dataflow-based approaches.
• Structure-based approaches.
• Domain-model-based approaches.

3.5.2.1 Dataflow-based Approaches
Dataflow-based approaches cluster based upon data relationships in the source. The relationships examined can be data types (Doval et al., 1999), abstract data types (Ogando et al., 1994; Yeh et al., 1995) or simply the declared variables themselves (Hutchens and Basili, 1985; Gallagher and Lyle, 1991). The way in which we clustered parts of a code fragment to form valid encapsulations in the example in section 2.5.1.3 could be considered a form of data clustering.
Hutchens and Basili (Hutchens and Basili, 1985) describe a dataflow clustering technique based upon whether data is passed, received, used or altered between two or more procedures. Their work demonstrates some of the earliest evaluation of clustering as a means of architectural recovery. They compared the structure recovered by their technique against descriptions of the systems produced by software engineers to determine the success or failure of the approach. Their results exhibited preliminary success for architectural recovery and set a precedent for evaluating future automated clustering techniques geared toward architectural recovery. Unfortunately, their technique was limited by its inability to analyse abstract data types and pointer usage. The work of Livadas and Johnson (Livadas and Johnson, 1994) overcame some of these shortcomings through the use of system dependency graphs (SDGs). Livadas and Johnson successfully implemented several clustering algorithms, based upon data type usage identified in an SDG, to recover objects from source code that was not object oriented. In (Gall and Klösch, 1995) the authors also implemented a clustering technique based on the analysis of data types. The goal of their work was to identify abstractions in procedural code which could be transformed to object oriented code, with the specific goal of reuse. Their approach is semi-automated and human-oriented. It begins by extracting low level program representations such as data flow diagrams and call graphs. Using these, two types of component are identified algorithmically:

• Data store entities (DSE): that is, a clustering of source code that uses the same persistent data.
• Non-data store entities (NDSE): that is, a clustering of source code that uses the same internal data.

Components identified in this fashion are then compared against a domain model
generated from human derived information such as requirements documents. Mappings are made manually between the domain concepts and the recovered components. In doing so, it becomes clearer which components are valid and which are not. The human orientation of this component recovery technique was ahead of its time; the semi-automated, human-oriented nature of the approach is now seen as best practice in the component recovery literature (Koschke, 2000a). More recently, dataflow clustering techniques have been incorporated into aggregated approaches for component recovery (Ogando et al., 1994). These are discussed later in section 3.5.3.

3.5.2.2 Structure-Based Approaches

Structure-based approaches to clustering operate by analysing the structure of the system. Examples include (Girard and Koschke, 1997; Schwanke, 1991; Siff and Reps, 1997). Some structure-based approaches operate by applying a specific metric to the relations between elements in the system. For example, Schwanke (Schwanke, 1991) calculates a similarity measure between variables in procedures in a system to create a weighted relationship between them. Unfortunately, using this method on its own produced poor results. Results were improved when the author introduced an AI-based tuning method to weight the metric, over time, in favour of the user's clustering preferences. Other structure-based approaches that derive a weighted relationship between clusters in a system include (Chiricota et al., 2003) and (Muller et al., 1993). Another style of structure-based approach applies graph theory algorithms to a dependency graph of a system to cluster elements together. One of the most common examples is dominance analysis (Cimitile and Visaggio, 1995; Girard and Koschke, 1997), which clusters procedures together in a similar fashion to the first example shown in section 2.5.1.3. Thus the technique suggests code clusters based on high encapsulation in the system. In (Cimitile and Visaggio, 1995) the authors effectively demonstrate
how modules of software can be identified using dominance analysis with the goal of reuse in mind. A more recent flavour of clustering has seen the use of concept analysis. Concept analysis is a mathematical technique used to analyse binary relations (Koschke, 2004). In recent years it has been successfully applied in the software engineering field (Snelting, 1998, 1996; Eisenbarth et al., 2003). One potential application of concept analysis is the identification of modules in source code. For example, in (Lindig and Snelting, 1997) the authors attempt to reengineer modules from legacy code by examining the binary relationship between procedures and global variables. Their results were mixed; an architectural recovery seemed possible only where an underlying structure existed in the first place. Two of the three case studies they performed their analysis on had undergone serious structural degradation as a result of years of ongoing maintenance, and did not yield a recovered architecture as a result of their analysis. In (Siff and Reps, 1997) the authors also apply concept analysis to recover modules. This time the binary relation was placed over functions and their properties (i.e. arguments and return values).

3.5.2.3 Domain-Model Based Approaches

The concept of a domain model was introduced in section 3.5.1. A domain model, formed using the domain knowledge of a user, can be very effective in producing an accurate4 decomposition of a software system (Murphy and Notkin, 1997; Kazman and Carrière, 1997), particularly as a prelude to reuse (Patel et al., 1992). One of the most common domain-model-based approaches to software clustering is the Reflexion Modelling technique (Murphy and Notkin, 1997; Murphy et al., 1995, 2001; Murphy, 1996, 2003). Reflexion Modelling forms part of the component recovery process proposed by this thesis and is described in greater detail in chapter 5. Other domain-model based
4 Accurate from that user's point of view.
approaches include FEAT (Robillard and Murphy, 2002) and the CME (Chung et al., 2005).
3.5.3 Aggregated Recovery Approaches
It has been shown in the literature that clustering techniques generally perform poorly as a means of component or architectural recovery when used in isolation (Koschke, 2000a; Kazman and Carrière, 1997). Furthermore, it has been suggested that future approaches to solving this problem should aggregate existing individual approaches, involve the human more in the decisions of the process, make use of dataflow information and place more emphasis on domain knowledge (Koschke, 1999). The approach proposed in this thesis is an aggregated process for component recovery that satisfies these recommendations. A good example of an aggregated approach is that of Ogando et al. (Ogando et al., 1994), who proposed an aggregated approach to recovering objects from source code. Their approach uses a combination of bottom-up and top-down understanding to achieve object recovery. From a top-down standpoint, objects are identified using two techniques:

• Routines are grouped together based on what global variables they use. For example, if a single global variable is used by four procedures then they are grouped together.
• User defined data types and the routines that use them are grouped together.

These clustering techniques provide an initial architectural recovery of the system, thus facilitating understanding from the top down. From a bottom-up perspective, a human-oriented grouping of subcomponents is performed. For example, sometimes the automated clustering techniques used will suggest that a routine belongs to many different objects. These conflicts are resolved in a bottom-up, semi-automated fashion by presenting them to the user. The type of domain knowledge used was reported to be mainly derived from the naming conventions in the code.
In (Girard and Koschke, 1997) a framework for component recovery is proposed that uses dominance analysis as the primary technique and combines it with two dataflow clustering techniques and another graph-based structural clustering technique:

1. First, all mutually recursive routines are clustered.
2. Then each global variable and the procedures that use it are clustered together. These are called abstract state encapsulations (ASE).
3. Then each user defined data type, and the procedures that use it, are clustered together. These are called abstract data types (ADT).
4. Finally, dominance analysis is performed on the collapsed call graph to yield further component suggestions.

The results of the authors' studies showed a marked improvement over simply using dominance analysis alone. Two aggregated approaches to architectural/component recovery are described in (Koschke, 1999) and (Kazman and Carrière, 1997), where over fifteen clustering techniques are placed at the user's disposal to apply at his discretion. These approaches demonstrate the most comprehensive solutions to date. Importantly, both approaches incorporate domain knowledge input from the user, which increased the effectiveness of their solutions. The ultimate goal of this thesis would not be to replace techniques like these, but rather to evaluate our technique with an eventual view to integrating it into larger aggregated processes.
3.5.4 Componentisation Processes
In this thesis componentisation refers to techniques used to convert entire systems to a component-based implementation; hence it is more similar to architectural recovery than component recovery. In a componentisation process the improved encapsulation
mechanisms are applied to the non-component-based parts of a system, similar to what was described in section 2.5.3. An early componentisation process was described in (Choi and Scacchi, 1990). Using their module interconnection language, NuMIL, the authors describe their suggested process for augmenting a program with what would, in modern terms, be described as a component architecture description. The modules that they describe, however, are not consistent with the definition of component used in this thesis. Another componentisation approach related to the recovery of components is Aldrich et al.'s application of the ArchJava component-based software development language to a legacy system (Aldrich et al., 2002). Reflexion Modelling is used to explicitly and accurately identify component boundaries in the system before applying the ArchJava language to it (see chapter 5). Though successful in its goal of applying a component language extension to an existing system, their work does not explore the objective of identifying individual components with the goal of reuse in mind. P. D. Johnson in (Johnson, 2002) describes another componentisation approach using black-box reengineering. Black-box reengineering is any reengineering approach that only requires the maintainer to understand the system down to the functionality level and not the detail of implementation (understanding to the implementation level during reengineering is known as white-box reengineering). A good example of black-box reengineering techniques is the many feature location approaches that exist5 (Eisenbarth et al., 2003; Wilde and Scully, 1995; Zhao et al., 2004; Wong et al., 1999). These techniques often do not require understanding of the implementation to locate the source code responsible for implementing a feature. Removing the requirement for detailed understanding of implementation details, of course, presents time saving benefits at the comprehension stage of a "reengineering to components" process. P. D. Johnson's process follows three steps:
5 Where a feature is any operation that produces a result of observable value (Eisenbarth et al., 2003).
1. Identify business components: apply a chosen technique that identifies components in code.
2. Create wrapper components: supplement the code chosen to be recovered as a component with wrapping code so that it may conform to the definition of a component (Bergey et al., 2000). This is a necessary step when reengineering towards components and is described in the next section.
3. Deploy wrapper components: use the recovered components in a system. In the case of P. D. Johnson's process, he only considers deployment in the existing system, to replace the same piece of the system that was chosen to be wrapped as a component.

Importantly, this process is only a framework process for componentisation, and leaves many of the details of the precise process steps, and of what tools and techniques to use to achieve those steps, to the discretion of the user. The process proposed by this thesis partially fits into this framework by describing in detail a set of steps that can be used to fulfil step one (component encapsulation) of componentisation. Also, it is crucial that the differences between what is proposed by this thesis and the above componentisation process are understood: Reconn-exion implements targeted component recovery, where individual components are chosen by the software engineer and encapsulated, whereas componentisation is a process that is applied to every component in a system. For this thesis, only the first step is within scope; that is, only the encapsulation of a new component is of concern, not its alteration to conform with a component model or its integration in a new system. However, for completeness, step two is considered briefly in the next section.
3.5.5 Component Wrappers
Wrapping is a mechanism by which legacy source code may be supplemented to modernise the system to conform with new development paradigms (Bergey et al., 2000). By legacy system we mean that it meets the following requirements (Juric et al., 2000):

• Has existing code.
• Is currently useful.
• Is currently used.
• Does not conform to the component model for which we are applying the wrapping technique.

Due to the relatively recent appearance of component-based development, few legacy systems, or the software assets that constitute them, conform to the requirements of existing component models and frameworks6. Therefore, before source code from a legacy system can be considered fully reengineered towards components, it must first be amended so as to achieve conformance with the component model and framework to which it will be applied. This is known as component wrapping (Comella-Dorda et al., 2000). A simple process for wrapping a legacy system as a JavaBean, for example, follows these three steps (Comella-Dorda et al., 2000):

1. Modularise, by identifying the component's code and distinct interfaces in the legacy system.
2. By identifying the interfaces, identify the points of contact with the remainder of the system.
3. With sufficient information about the component now present, implement a wrapper bean for each component (a hedged sketch follows below).
6 Interestingly, the legacy source code does conform to a model of sorts - the operating system - which enforces constraints that today seem ubiquitous, such as process management, memory management and file management.
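A hedged sketch of step three follows. LegacyInventory and its operations are invented names standing in for the modularised legacy code from steps one and two (here stubbed in Java so the sketch is self-contained; in practice the legacy code might be reached via a generated binding such as JNI):

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;

    // Stub standing in for the modularised legacy code and its
    // identified interface (steps one and two of the process).
    class LegacyInventory {
        private final Map<String, Integer> stock = new HashMap<>();
        int queryStock(String itemCode) { return stock.getOrDefault(itemCode, 0); }
        void updateStock(String itemCode, int level) { stock.put(itemCode, level); }
    }

    // Wrapper bean: exposes only the identified interface, with the
    // conventions a JavaBean requires (no-argument constructor,
    // get/set pairs, serializability).
    public class InventoryBean implements Serializable {
        private transient LegacyInventory legacy = new LegacyInventory();

        public InventoryBean() { }

        public int getStockLevel(String itemCode) {
            return legacy.queryStock(itemCode); // delegate to the legacy interface
        }

        public void setStockLevel(String itemCode, int level) {
            legacy.updateStock(itemCode, level);
        }
    }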
3. Now the sufficient information necessary about the component is present to implement a wrapper bean for each component. A more generic approach perhaps would be to apply a component wrapper using the language independent component model. In this model a state of the art architectural description language that has been extended to allow the definition of components, such as xADL (Galvin et al., 2004) could be used. An example of this can be found in (Le Gear et al., 2004). A generic approach using this technology would follow these steps: 1. Identify a portion of legacy source to wrap as a component. 2. Identify its interface boundaries. 3. Apply the mark-up languages appropriately to describe a xADL component type. Component wrapping is a necessary step when reengineering towards components but it is not a core contribution of this paper and is not discussed further in this thesis.
Chapter 4 Software Reconnaissance “What is the difference between exploring and being lost?.” - Dan Eldon, photojournalist.
68
Execution Paths Through a System
Test case can isolate a path for the feature it exhibits
Figure 4.1: Identifying Features from Running Systems. Software Reconnaissance1 is a dynamic analysis technique, that, through the acquisition of coverage profiles (Ball, 1999), yielded by exercising carefully selected test cases on instrumented code (see section 4.2.1), allows mappings to be created (see sections 4.1) between program features and the software elements that implement them (Wilde and Scully, 1995). Figure 4.1 illustrates how Software Reconnaissance works at an abstract level. The term program feature is understood as being a realised functional requirement that produces an observable result of value to the user (Eisenbarth, Koschke and Simon, 2001; Eisenbarth et al., 2003; Eisenbarth, Kosche and Simon, 2001). The software 1
The dynamic search method is another name for the technique (Rajlich and Wilde, 2002).
4.1 A Functionality View of Software
69
elements2 , to which the features are mapped vary in granularity, depending upon the level of instrumentation, and may be branches of the decision tree (Wilde and Scully, 1995), individual statements (Wong et al., 1999) or procedures (Eisenbarth et al., 2003). The appropriate choice of granularity depends upon the context of use.
4.1
A Functionality View of Software
The mappings created between program features and code during Software Reconnaissance create a functionality view of software. Described more formally (Wilde et al., 1992), given a set of potentially overlapping functionalities3 , FUNCS = {f1 , f2 , . . . , fN } and a set of source elements, ELEMS = {e1 , e2 , . . . , eN } it should be possible to construct an implements relation,IMPL, over FUNCS X ELEMS, revealing what functionalities are implemented by what source elements. The link that allows us to construct a relation, is the test case, since a test case Ti represents an execution scenario that may exhibit one or more functionalities, F(Ti ) = {fi, 1 , fi, 2 , . . . } and will also exercise a set of source elements which can be identified through instrumentation (see section 4.2.1), E(Ti ) = {ei, 1 , ei, 2 , . . . } 2
In Norman Wilde’s seminal papers on Software Reconnaissance (Wilde et al., 1992; Wilde and Scully, 1995), he repeatedly refers to components. However, due to the potential confusion between this definition and the modern concept of components described in section 2.1, the term software elements is used in this thesis. 3 Feature and functionality are used interchangeably.
4.1 A Functionality View of Software
70
Put another way, we can define an EXERCISES relation over T X ELEMS where EXERCISES(t, e) is true if a software element e is exercised by a test case t and where t is a set of test cases defined as, T = {t1 , t2 , . . . , tN } Furthermore the relation EXHIBITS over T X FUNCS, may be defined, where EXHIBITS(t, f ) is true if test case t exhibits functionality f (Wilde and Scully, 1995). This type of analysis allows us to identify a number of interesting sets. These are discussed in the following subsections.
4.1.1
Common Software Elements
The set of common software elements refers to the source code that will always be executed regardless of the test case. This is illustrated in figure 4.2, where the large circles represent test cases and the small circles represent the software elements executed by them. The small, black circles are members of the set of common software elements, while the small, white circles are not. The set generally contains utility code of the system (Wilde and Scully, 1995). The set of common software elements, CELEMS, is defined as: CELEMS = {e:ELEMS|∀t ∈T, EXERCISES(t, e)}
4.1.2
Potentially Involved Software Elements
The potentially involved software elements for a feature includes software elements exercised by any test case exhibiting the feature f. This is illustrated in figure 4.3 where the large ovals represent test cases and the small circles represent the software elements executed by them. The small, black circles are members of the set of potentially involved software elements. This set will include software elements directly involved
4.1 A Functionality View of Software
71
Software elements
Common software elements in black
Test cases exhibiting any feature Figure 4.2: Common Software Elements.
4.1 A Functionality View of Software
72
in the implementation of f, however, it may also include elements that do not directly implement the feature f. The set of potentially involved software elements also tends to be quite large as a percentage of the entire system for most features (Wilde and Scully, 1995). When trying to map from feature to location, the primary use of the set of potentially involved software elements is as a foundation for more refined sets. The set of potentially involved software elements, IELEMS, is formally defined as: IELEMS(f ) = {e:ELEMS|∃t∈T, EXHIBITS(t, f ) ∧EXERCISES(t, e)}
4.1.3
Indispensably Involved Software Elements
The set of indispensably involved software elements, IIELEMS, is a refinement of the set of potentially involved software elements. IIELEMS is the set of software elements exercised by all test cases exhibiting f. Figure 4.4 illustrates this set. The small, black circles are members of the set indispensably involved software elements and the small white circles are not. This yields a set the same size or smaller than that of the set of potentially involved components for the same feature. However, the problem of scale remains when trying to locate the code that implements a specific feature (Wilde and Scully, 1995). The elements that solely implement the feature compared to the the size of the set is sometimes a small proportion. Again the set of indispensably involved software elements is more useful in defining more refined sets as described in the following sections. The set of indispensably involved software elements, IIELEMS, can itself be formally defined as: IIELEMS(f ) = {e:ELEMS|∀t∈T, EXHIBITS(t, f ) ⇒EXERCISES(t, e)}
4.1 A Functionality View of Software
Software Elements
Test cases exhibiting the same feature Figure 4.3: Potentially Involved Software Elements, shaded in black.
73
4.1 A Functionality View of Software
74
Software Elements
Test cases exhibiting the same feature Figure 4.4: Indispensably Involved Software Elements, are shaded in black.
4.2 Related Work
4.1.4
75
Uniquely Involved Software Elements
The set of uniquely involved software elements, UELEMS, is a further refinement upon previously described sets. It contains software elements used only by the functionality f and no other. The set is arrived at by taking the set of software elements exercised by any test case exhibiting f except for any elements that are also exhibited by features that do not exhibit f. Figure 4.5 illustrates this set where the large ovals represent test cases and the small circles represent the software elements executed by them. The small, black circles are members of the set of uniquely involved software elements and the small white circles are not. The set can be defined as follows: UELEMS(f ) = IELEMS(f ) - {e:ELEMS | ∃t∈T, ¬EXHIBITS(t, f) ∧ EXERCISES(t, e)} It has been shown experimentally that UELEMS provides a useful starting point when trying to understand the implementation of a particular feature (Wilde and Scully, 1995; Wilde et al., 1992) and that IIELEMS provides a context for understanding when using UELEMS.
4.2
Related Work
4.2.1
Software Instrumentation Enabling Software Reconnaissance
Obviously some form of instrumentation is required for software reconnaissance. Instrumentation is a dynamic software analysis technique that involves the inclusion of output statements in source code to help developers understand programs (Wilde, 1998). Several approaches to instrumentation exist, and have been comprehensively reviewed in (Wilde, 1998). An instrumented program, when run, will output a trace or profile of execution. A trace will show what was run during execution, and in what sequence. A profile will only show what was run during execution (Ball, 1999).
4.2 Related Work
76
Software Elements
Uniquely Involved Software elements in black
Test cases exhibiting features other than f1 Test cases exhibiting the same feature (f1)
Figure 4.5: Uniquely involved software elements, are shaded in black.
4.2 Related Work
4.2.2
77
Best Practices When Applying Software Reconnaissance
Early applications of Software Reconnaissance, which were concerned with finding general starting points for understanding specific program features, determined that instrumentation to the branches of the decision tree was necessary for optimal results (Wilde and Casey, 1996; Gunderson et al., 1995; Wilde et al., 2001; Wilde and Scully, 1995; Fantozzi, 2002). However, more recent variations of the technique use procedure level instrumentation (Eisenbarth et al., 2003; Eisenbarth, Koschke and Simon, 2001; Zhao et al., 2004) and statement level instrumentation (Wong et al., 1999). The former approach is used for feature mapping to source code by incorporating concept analysis while the latter focuses on software comprehension using execution slices. Most influential to the ultimate outcome of the technique’s use is the user’s choice of test cases (Wilde and Casey, 1996). For example, if chosen carefully, no more than two testcases may be required (Rajlich and Wilde, 2002) - One test case exhibiting the feature and another that does not4 , but the exact nature and amount of test cases will vary depending upon the technique’s use. In general, it has been experimentally shown that the fewer test cases used to identify a feature, the better the outcome of the technique (Wilde and Casey, 1996). Also, simply choosing existing test suites used in regression testing will not always suffice (Wilde and Casey, 1996). Test cases, for software testing purposes, are usually more complex since they tend to be attempting to reveal errors and therefore may exercise many features in a single test case (Eisenbarth et al., 2003). However, with the rise of new approaches to testing, such as unit testing and test case driven development (JUnit, 2006) the nature of test cases themselves are changing. In a publication from Eisenberg and Volder (Eisenberg and De Volder, 2005) the authors manage to successfully apply software reconnaissance using an existing suite of JUnit test cases. 4
Of course, this will mean that the involved and indispensably involved software elements sets will be the same.
4.2 Related Work
4.2.3
78
Previous Work Using Software Reconnaissance
In (Wilde et al., 1992) Norman Wilde published his seminal work on feature location and defined many of the fundamentals that would eventually become Software Reconnaissance. In this paper a feature location case study on a telecommunications switch application called PBX is presented. Even with feature location techniques at an early stage of research maturity, the study managed to return results with up to 50% accuracy. Such was the importance of the publication it received the “Most Influential Paper” accolade at the International Conference on Software Maintenance that year. However, the authors did draw the following conclusions from their experience: • The technique cannot replace the knowledge of an informed human. • The technique cannot identify source code responsible for a feature if it is always executed. In 1995 Wilde and Scully published “Software Reconnaissance: Mapping Program Feature to Code” (Wilde and Scully, 1995) which defined and demonstrated Software Reconnaissance for the first time. Much of the functionality view of software described in section 4.1 is derived from this work. The authors describe a case study on a portion of a C compiler with 15 KLOC approx. and successfully demonstrated the feasibility of the approach. In particular they identify the set of unique software elements as a useful point for a software engineer to begin searching for the implementation of a feature of the system he is looking for. In (Wilde and Casey, 1996) the authors evaluate Software Reconnaissance with the agenda of enabling technology transfer to industry. Three case studies on industrial or commercial systems are presented. The systems included: • The visitor control program: An application for logging and issuing passes for visitors to a company.
4.2 Related Work
79
• The graph display system: An XWindows graph display and browsing application. • The test coverage monitor: A program that provides test coverage information when executing test cases. Their experience in transferring the technique to industry allowed the authors to draw some interesting conclusions: • Test cases should be as few and as simple as possible when implementing Software Reconnaissance. • Software Reconnaissance is best applied to an unfamiliar system where focus on a single area is needed. • The use of existing test cases as part of a regression testing suite is not suiteable for applying Software Reconnaissance. New test cases, designed to exhibit specific features are needed. • For any technology being transfered to industry, flexibility and portability are needed to enable industrial trials. A means of visualising program traces to aid Software Reconnaissance is described in (Lukoit et al., 2000) using a tool call TraceGraph. The tool is applied in two case studies: • JointSTARS which is a defence system from the US DoD and approximately 300 KLOC in size. • The Mosaic web browser. Based on observations from the studies the authors noticed:
4.2 Related Work
80
• Only features that the user can control can be located using Software Reconnaissance. • The quality of results will depend upon the test cases used. • In some cases the human eye in conjunction with the visual representation provided by trace graph was able to replace the set difference operator of older Software Reconnaissance tools. The type of Software Reconnaissance used in the case studies is a slight variation on the original Software Reconnaissance technique design for multi-threaded applications (Wilde et al., 1997). The instrumentor required must be able to return a single trace where many traces and processes exist. This extension to Software Reconnaissance was proposed as a solution to problems that practitioners were experiencing while applying Software Reconnaissance during a case study on a system called InterBase in (Gunderson et al., 1995). In (Wilde et al., 2001) the authors compare Software Reconnaissance to a different static analysis feature location technique called the Dependency Graph method. The authors present a case study on a legacy fortran application called CONVERT3, which was used by the U.S. Navy as a raytracer for modelling explosions. Two teams applied the techniques to find two features and observed the following differences with respect to the two feature location techniques: • Because the dependency graph method forces the user to understand more of the source code the technique may be more suitable for users who lack domain knowledge regarding the system. • Software reconnaissance requires far less browsing of the code to locate a feature. • For large, infrequently changing programs Software Reconnaissance is a better alternative to the Dependency Graph method.
4.2 Related Work
81
In general the empirical evidence strongly suggests that Software Reconnaissance is a useful technique when trying to locate features in unfamiliar source code (Wilde and Scully, 1995; Wilde et al., 2001, 1997; Wilde and Casey, 1996; Wilde et al., 1998, 2003; Loeckx and Sieber, 1987; Wilde, 1998, 1994; Wilde et al., 1992; Fantozzi, 2002; Lukoit et al., 2000; Gunderson et al., 1995). Other techniques that resemble Software Reconnaissance are used in other problem domains such as software architecture reconstruction (Yan et al., 2004; Riva and Deursen, 2001) and component recovery from legacy software (Eisenbarth et al., 2003; Eisenbarth, Koschke and Simon, 2001), which is discussed at greater length in section 3.5. Finally, the activities required to undertake a Software Reconnaissance resemble many debugging activities, with, of course, the crucial difference that Software Reconnaissance is attempting to locate features not faults
5
(Wilde and Scully, 1995; IEEE,
1990).
5
However, it could be argued that a fault is simply an undesired feature.
Chapter 5 Software Reflexion Modelling “Any sufficiently advanced technology is indistinguishable from magic.” - Arthur C. Clarke.
83
One of the most successful software understanding and design recovery techniques of recent years has been Reflexion Modelling (Murphy et al., 2001). Impressive improvements in time taken by software engineers in understanding systems have been reported while using the technique. The details of the Software Reflexion Modelling process are explained in the next section, however the underlying principals of the technique are based on what are know as “collapsing strategies” (Stoermer et al., 2004). Given a software system, modelled as a dependency graph, we can decide to group certain elements of the graph together. This is the essence of a collapsing strategy. Figure 5.1 explains this in three steps: 1. We start with a dependency graph of a system (A). 2. Certain elements of this are chosen to be collapsed together (B)(Marked in black). 3. After the collapse is performed we have a slightly more abstracted graph (C). Many of the clustering techniques discussed in section 3.5.2 use collapsing strategies to achieve their goal. Importantly, a collapsing strategy alone is not a clustering technique. Clustering consists of both an analysis of the software followed by a decision to cluster based upon that analysis. For example, dominance clustering is a popular clustering technique (Girard and Koschke, 1997). First dominance analysis is performed and that determines dominating nodes in a call graph (the analysis). This is followed by a decision to cluster based upon the dominating nodes (collapsing strategy). Collapsing strategies that are preceded by a manual analysis guided by human input have shown to be helpful in allowing software engineerings to recover the designs of software systems (Refl, 2005; Murphy and Notkin, 1997; Robillard and Murphy, 2002; Chung et al., 2005). The manual analysis performed is typically a mnemonic analysis (Refl, 2005; Murphy and Notkin, 1997; Robillard and Murphy, 2002; Chung et al., 2005). Mnemonics, in this context, usually refers to the naming conventions of software
84
(A)
(B)
(C)
Figure 5.1: Collapsing strategy in operation.
5.1 The Reflexion Modelling Process
85
elements in a system (Refl, 2005). Software Reflexion Modelling (Murphy and Notkin, 1997), FEAT (Robillard and Murphy, 2002) and the CME (Chung et al., 2005) are all modern examples of this. The next section discusses the Software Reflexion Modelling process in greater detail.
5.1
The Reflexion Modelling Process
Software Reflexion modeling is a semi-automated, diagram-based, structural summarisation technique that programmers can use to aid their comprehension of particular software systems. Introduced by Murphy et al. (Murphy et al., 1995), the technique is primarily aimed towards aiding software understanding. Reflexion Modelling follows a six step process, illustrated in figure 5.2: 1. The programmer who wishes to understand the subject system hypothesises a high-level conceptual model of the system. 2. The computer extracts, using a program analysis tool, a dependency graph of the subject system’s source code called the source model. 3. The programmer then creates a map which maps the elements of the source model onto individual nodes of the high-level model (collapsing strategy). 4. The computer then assesses the call relationships and data accesses in the source code to generate its own high-level model (called the reflexion model). This model shows the relationships between the source code elements mapped to different nodes in the programmer’s high-level model. This allows comparisons between the computer’s model with the programmer’s model and the tool can report consistencies or inconsistencies in three ways:
5.1 The Reflexion Modelling Process
86
• A dashed edge in the reflexion model represents dependencies between elements of the programmer’s high-level model that exist in the source model, but were not actually included in the high-level model. • A dotted edge in the reflexion model represents a hypothesised dependency edge of the programmer’s high-level model that does not actually exist in the source model. • A solid edge in the reflexion model represents a hypothesised edge of the programmer’s high-level model that was also found to exist in the source model. 5. By targeting and studying the inconsistencies highlighted by the reflexion model the programmer can either alter their hypothesised map or high-level model to produce a better recovered model of the system. 6. The previous two steps repeat until the software engineer is satisfied that the recovered model is correct. Many approaches to software understanding and design recovery have since used Software Reflexion modelling as a basis for their techniques (Tran et al., 2000; Kosche and Daniel, 2003; Hassan and Holt, 2004; Chung et al., 2005; Robillard and Murphy, 2002). The success of Reflexion as a software understanding technique can be attributed to it’s parallels with the state of the art in cognitive psychology (This is explored in detail later in this chapter) and the state of the art in component recovery (Koschke, 2000a).
5.1 The Reflexion Modelling Process
87
5.
A B
C
1. High−Level Model
A
A
B
B
C
C
3. Map
Reflexion Model
2. Source Model
5.
Figure 5.2: The Software Reflexion Modelling Process
4.
5.2 Related Work
5.2
Related Work
5.2.1
Early Experiences with Reflexion Modelling
88
In Gail Murphy’s seminal paper on Reflexion Modelling she formally defines the Reflexion Modelling technique and applies it in an example to the NetBSD UNIX operating system (Murphy et al., 1995). In only a few hours the user of the technique was able to produce an accurately recovered model of the system. A case study that better demonstrates the usefulness of Reflexion Modelling shortly followed this publication in (Murphy and Notkin, 1997). A Microsoft software engineer, with 10 years development experience, applied the technique in order to gain a sufficient understanding of the Microsoft Excel spreadsheet product with the aim of later performing experimental engineering. After one month of applying the technique, the software engineer said that he had gained a level of understanding of the system that would have normally taken two years using other available approaches. While this reported improvement may seem dramatic, the result is consistent with subsequent Reflexion Modelling case studies (Murphy et al., 2001), including those undertaken by this thesis.
5.2.2
Extensions and Further Uses of Reflexion Modelling
In (Kosche and Daniel, 2003) the original Reflexion Modelling technique is adapted to incorporate the use of hierarchies in the high-level model description. To demonstrate, the approach it is applied to two C compilers, using a generic compiler reference architecture as an initial high-level model. While the authors evaluation is weak, the case studies were useful in highlighting a number of points: • Architectural recovery remains a difficult task. • The Reflexion Modelling approach to architectural recovery is highly iterative.
5.2 Related Work
89
• Much of the Reflexion Modelling process remains manual. • Domain knowledge is necessary for the success of Reflexion Modelling, at least in the absence of other software understanding aids. This finding is also supported in (Christl et al., 2005). In (Tran et al., 2000) a form of Reflexion modelling is used to repair the architecture of two open source systems. Unlike the original Reflexion modelling technique and similar to the approach used in (Kosche and Daniel, 2003), hierarchies are used to described the high level model, which they call the architectural level. The authors suggest four possible courses of action if the architecture is in need of repair: 1. Splitting: A high-level entity may be split into two separate nodes. 2. Kidnapping: Source model elements may have their mappings changed, thus moving them from one source model entity to another. 3. Change high level model dependencies: Unexpected dependencies may be placed in the high-level model, thus making them expected. 4. Change source model dependencies: The source code may be refactored to make is conform to architectural constraints. This is a highly invasive procedure compared with the other options. The authors applied their technique to the Linux Kernel and the VIM text editor. They used the original, published architectures of both systems as a base and repaired the architecture of both systems. However, it is important to note that this repair occurred without changing the source code (i.e. - Splitting, Kidnapping and changes to the high-level model dependencies were made). This makes their evaluation purely a modelling exercise and does not examine repair or refactoring in the traditional sense.
5.2 Related Work
90
In (Christl et al., 2005) the authors augment the Reflexion Modelling technique with two automatic clustering techniques to help aid the user in the mapping process. The two clustering techniques used are called MQ Attract and CountAttract and identify mappings based on measures of coupling and cohesion in the system. The techniques assume that a good set of modules in a system are those that have a low level of inter module coupling. Their technique was applied in a case study of the SHriMP graph visualisation tool (Storey and Muller, 1995) and, consistent with Reflexion-based case studies, showed good results. Their approach is similar to the one proposed in this thesis in the sense that Reconnexion augments Reflexion Modelling with a further reengineering technique. However, in (Christl et al., 2005) the authors use only static analysis techniques. As identified in the previous chapter, dynamic analysis techniques can facilitate the software comprehension stage of component recovery by providing a link from the initial, behavioral understanding of a system down to the architectural and implementation details of that system. Software Reconnaissance and Reflexion Modelling are proposed as a means of facilitating this understanding in Reconn-exion. Another dimension of information that the use of static analysis techniques will not address is historical information with respect to the software system being analysed (temporal analysis). The authors in (Hassan and Holt, 2004) derive historical information for software systems from source code control repositories and use it to augment the original Reflexion Modelling approach using what they call sticky notes. When iterating through the Reflexion Modelling process the authors suggest that extra information from the source control system, can help explain divergences in a model, could be useful to help answer important questions for the software developer such as: • Who introduced the unexpected dependency? • When was the unexpected dependency introduced?
5.2 Related Work
91
• Why was the dependency introduced? The authors evaluate their approach on a large, open source operating system called NetBSD to demonstrate how their approach could be applied. In (Aldrich et al., 2002) the author proposed and applied a new architectural description language extension to Java called ArchJava. While the focus of the paper is not Reflexion Modelling the authors do use Reflexion modelling to help them reengineering a case study system to apply their architectural description language. In effect, Reflexion Modelling is used, in this instance, to help componentised the system. However, no changes to the Reflexion Modelling process are made to make it suit componentisation and no comments or evaluation are attributed to Reflexion Modelling as a means of componentisation in the publication. A further application of Reflexion Modelling that has gained popularity is its use as a means of enforcing architectural decisions in an evolving software system (Tvedt et al., 2002; Eick et al., 2001; Sefika et al., 1996; Hochstein and Lindvall, 2003; Tvedt et al., 2004; Lindvall et al., 2002). Using Reflexion modelling in this fashion requires the software engineer to produce a high-level model at design time, as part of the normal forward engineering process. Then, as the parts of the system are implemented, they are mapped to the high-level model and the Reflexion model is created to reveal if dependencies exist that should not be there. In (Tvedt et al., 2002) and (Lindvall et al., 2002) the authors use this approach during the reimplementation of a commercial software system, written in Java, called the Experience Management System. Using the approach as part of the development in the company the authors observed several benefits: • The premise of the reimplementation project was to produce a better system. The concrete, as implemented view, provided by the Reflexion model, was useful in demonstrating to management that the project was succeeding.
5.2 Related Work
92
• Developers were initially resistant to the new architecture, which was based upon the mediator pattern. Using Reflexion Modelling, programmers who did not adhere to the pattern could be identified immediately. • Even with the teams best efforts, the desired architecture could not be fully achieved. Using Reflexion Modelling they were at the very least able to document these violations where they would normally have been overlooked. • Catching architectural problems early ultimately led to an easier to understand and more evolvable software system and prevented the normally inevitable decay of the architecture (Eick et al., 2001). A more detailed record of this study, by the authors is available in (Tvedt et al., 2004) and (Hochstein and Lindvall, 2003).
5.2.3
A Cognitive Basis for Reflexion Modelling
The component recovery process described in this thesis is as much an exercise in memory efficiency and recall as it is creating a new component, since the process is semiautomated and the recovered artifact is based on a system that already exists. Reflexion Modelling, and hence the tailored version of Reflexion Modelling that Reconn-exion employs, can be seen as software and process support for many memory retention and recall practises, as proposed and demonstrated by the field of cognitive psychology. The following is a review of the state of the art in cognitive psychology with respect to recovery, retention and recall practices, much of which is collated in (Brace and Roth, 2005), (Littleton et al., 2005) and (Fleming, 2006). The human memory processing structure can be divided into three main workflows (Brace and Roth, 2005), as illustrated in figure 5.3: 1. Encoding involves taking information from the outside world via the senses and
5.2 Related Work
93
Encoding:
Storage:
Retrieval:
Putting information into
Retaining information
Getting information back
in memory
out of memory
Memory
Figure 5.3: Adapted from (Brace and Roth, 2005).
creating a representation for that piece of information so it can be stored internally in the brain. 2. Storage is the process of retaining the encoded information in the brain for long periods. 3. Retrieval involves getting coded information from where it is stored in memory. Two types of retrieval process exist: Recognition and recall. The success of reflexion can be directly attributed to its support for encoding and retrieval. These are discussed in the following sections. 5.2.3.1
Encoding
Encoding is responsible for taking in, preparing and placing information into storage in the brain. Information can be encoded to different depths (Craik and Lockhart, 1972). For example, learning a passage of a book by simply repeating it over and over would constitute a shallow depth of encoding, since the information is simply encoded using its physical characteristics (i.e. the sound of the passage in this case). This type of encoding is known as maintenance encoding and is not a very robust means of encoding information. A deeper means of processing the information would be to encode it in terms of its meaning. Encoding in this fashion requires more effort, but tends to be far more lasting and useful in memory. It is known as semantic processing or elaborative rehearsal.
5.2 Related Work
94
Reflexion can be seen as a form of elaborative rehearsal since the user is not only linking the system to appropriate abstractions (that may typically associate to domain concepts) but also learning the syntax of the system. This also begins to explain why a lack of domain knowledge can have a detrimental effect on the use of technique. Elaborative rehearsal has been shown to dramatically improve encoding (Craik and Tulving, 1975) and there is no reason to believe that it should be any different with the application of Reflexion. A further interesting addendum to elaborative reasoning is what is known as the generation effect (Craik and Tulving, 1975). This states that an individual is more likely to remember information generated themselves rather than information presented to them by a third party. This helps to further explain the success of Reflexion where high level elements and mappings are created and named by the user, thus making it easier for the individual to remember information recovered from the system. The encoding of large bodies of information is also more robust if spread over many separate sessions rather than confined to a single one. This is known as the spacing effect (Ebbinghaus, 1913). The technique described in this thesis places no restrictions on the software engineer to complete his task in a single sitting. The work may be saved and restored over the course of many sessions. Indeed, in many of the case studies reported in this thesis, this is exactly what happened. This may in part contribute to a more effective process. For large bodies of information, organising parts of it into meaningful categories and hierarchies of categories has been shown to dramatically increase the amount of information that can be encoded, in a given time frame (Bousfield, 1953; Bower et al., 1969). The information encoded also tends to be far more lasting and easily retrievable from memory also. The subsequent retrieval tends to occur by category also. This is known (in cognitive psychology terms) as clustering. The nature of this categorisation is almost identically supported by the Reflexion process and tool support provided in
5.2 Related Work
95
this thesis, through the use of high-level models and mappings. Evidence also exists that new information is not encoded as received. It has been shown that when we are presented with new information that we use our knowledge of past experiences to make sense of the of it. This has been termed effort after meaning (Bartlett, 1932), and describes schemas in memory that exist based upon our past experiences. New information is interpreted in terms of these schemata in memory. With respect to Reflexion Modelling the existing schemata is the application domain knowledge that that person has. In this instance, by application domain knowledge we mean the typical implementation of a programming solution for that domain. For example, while all compilers are implemented quite differently, they all may have many design concepts, specific to that domain in common. Knowing this constitutes application domain knowledge. Having no application domain knowledge is equivalent to having no schemata into which new information can fit. The presence of these schemata makes it easier to encode information. This further explains why the Reflexion Modelling technique can be more effectively used by those with extensive domain knowledge regarding the application and programming domains. A user of Reflexion Modelling who has domain knowledge will have an existing schema in memory, therefore the creation of an initial high-level model will be much easier. 5.2.3.2 Retrieval Retrieval processes are responsible for fetching information from memory and bringing it to consciousness. A well encoded piece of information is useless if it cannot be effectively retrieved, however, it will also come as no surprise that the effectiveness of retrieval can be heavily dependant on how the information was previously encoded, as described in Tulving’s encoding specificity principle (Tulving, 1983, 1975). Information is retrieved from memory by accessing existing cues (routes to the piece of information in the brain). Information previously encoded using elaborate rehearsal
5.2 Related Work
96
provides far more cues than a shallowly encoded piece of information and is therefore far easier to retrieve. The previous section showed how the Reflexion process facilitates the user to encode information using hierarchical, clustering, elaborate rehearsal, the spacing effect and the generation effect. This, combined with the fact that recognition provides stronger cues than recall, makes for a very deeply encoded piece of information with many cues for retrieval (Tulving, 1983, 1975). In practice this means that information discovered about the system using Reflexion is very easily accessible by that person at a later date. Furthermore the presence of domain knowledge also provides cues at discovery time, making it easier again to encode new information into an existing schema present due to domain knowledge. 5.2.3.3 Human Learning Learning is the process of change as a result of experience (Littleton et al., 2005). Several models of the human learning process have been proposed in literature. Of most interest to us is a learning approach called category learning. Again this theory of learning is proposed by the field of cognitive psychology. A state of the art review of learning approaches proposed in cognitive psychology is collated in (Brace and Roth, 2005), (Littleton et al., 2005) and (Fleming, 2006). We have already touched on the subject when we spoke clustering in the memory process in section 5.2.3.1. Category learning is the learning that occurs when people come to understand that certain objects or entities belong together in particular categories (Littleton et al., 2005). The learning process is theorised as having these steps: • A group of entities are observed by the individual. • The individual forms a hypothesis that a portion of these belong together. • The individual tests his hypothesis against his existing knowledge.
5.2 Related Work
97
• The hypothesis is accepted, rejected indefinitely or put on hold until new knowledge causes a change in understanding. We can see from this brief overview that the Reflexion Modelling process is almost an identical implementation of category learning. This direct support for what is thought to be the way in which human learning processes actually operate in part explains the success of the Reflexion Modelling technique (Littleton et al., 2005; Brace and Roth, 2005; Fleming, 2006). Furthermore, studies in category learning have convincingly shown that prior knowledge and past experiences play an integral part in the formation of categories (Murphy and Medin, 1985; Murphy and Allopenna, 1994; Kaplan and Murphy, 2000). In the same way we can explain why existing domain knowledge is of benefit to those who use the Reflexion Modelling technique. Another form of learning model popular in cognitive psychology is constructivism (Huitt, 2003; Ausbel, 1968; Bruner, 1990; Seaman, 1999). An authoritative collation of constructivism research can be found in (Huitt, 2003). Constructivism states that an individual learner must actively “build” knowledge and skills and that information exists within these built constructs rather than in the external environment (Huitt, 2003; Bruner et al., 1956). This learning model suggests that the learners processing of stimuli from the environment produces adaptive behavior thus inducing learning. The Reflexion Modelling process supports this form of learning by continually create stimuli for the user. Each iteration of the Reflexion Model produces stimuli in the form of feedback regarding the correctness of his high-level model. If the stimuli challenges the user understanding of the environment (in this case the model of the system) an adaptive change is induced (i.e. the learner learns something new about the system). This adaptive change is eventually realised when the learner alters his high-level model for the following iteration.
5.2 Related Work
5.2.3.4
98
Learning Preferences
Separate to the underlying learning process are the individual preferences of people with respect to learning. People exhibit different likes and dislikes when assimilating new information and integrating it with their understanding of the world. These are our learning preferences, over which we have little control, and form part of a learning style that includes a range of other influencing factor over which we do have control (i.e. diet or environment) (Fleming, 2006). One of the more popular learning preference models is known as VARK1 and was first pioneered in 1992 (Fleming and Mills, 1992; Fleming, 2006). The model suggests that there are four types of learning modes that individuals may have a preference towards (or a mixture of these, making that person multimodal) (Fleming, 2006): Visual This is the preference to take information in in the form or charts, graphs, symbols or hierarchies. It does not include movies or text based presentations. Aural This is the preference to learn from information that is heard or spoken to you. For example, lectures, tapes, email or group discussions. Read/Write This is a preference to learn from information presented to you in the form of words. This is typically reading or writing or both. Kinesthetic This is the preference to “learn by doing.” People of this preference learn better from practical experience or dialogue rather than through direct tuition. Statistically the mixture of preferences among people is diverse, with over 58% of people being multimodal. The process and tool support examined in this thesis is particularly suited to the multimodality of the general population. Three out of the four learning preferences are directly appealed to by the process. The kinesthetic preference 1
Which stands for Visual, Aural, Read/write and Kinesthetic, the fours types of learning preference examined by the VARK assessment.
5.2 Related Work
99
is immediately satisfied through the hypothesis → test → refine nature of the process, strongly appealing to a learn-by-doing approach. The interpretation of results in the Reflexion models and the creation of maps is presented in a textual form to the user, thus satisfying the read/write preference. Finally, the visual preference is addressed through the presentation of reflexion models using a graphical output. This broad appeal, allied with other theories on learning, encoding and retrieval, may explain the positive results and comments with respect to Reflexion and Reconnexion as a learning tool. Experience from this thesis would also seem to support this conjecture.
Chapter 6 Research Methodology “In the fields of observation chance favors only the prepared mind.” - Louis Pasteur, lecture.
6.1 Scientific Method
101
A secondary goal of this thesis is to evaluate the proposed approach. To do this research methodology must first be discussed. A research methodology is a strategy of inquiry which moves from the underlying philosophical assumptions of the researcher through empirical research design and data collection (Myers, 1997). Science offers the modern researcher an arsenal of methodologies that may be used when evaluating a hypothesis. It is the purpose of this chapter to explain existing research methods, relate them to the approach adopted by this thesis and justify this approach to the reader.
6.1
Scientific Method
The foundations of modern science rests upon what is known as “the scientific method.” The purpose of the method is to ensure repeatable, rigorous evaluation of hypotheses across science. Depending on specific context the method will vary, however, across many science disciplines it follows these common steps (O’Callaghan, 2005; Basili, 1996; Perry et al., 1997a): 1. Observation In this initial step a scientist observes some interesting property of the world he wishes to investigate. This guides the formation of subsequent hypotheses and experimentation. 2. Hypothesis Based on these observations a scientist will form a hypothesis. The hypothesis aims to explain the observations made. 3. Evaluation Next an evaluation of the hypothesis is designed and implemented. Special care is needed when designing the evaluation. 4. Collection and Interpretation of data Once the evaluation is performed the gathered data must be scrutinised and understood. The degree of certainty with which we can make statements regarding the data is known as validity (O’Brien et al., 2005).
6.2 Validity
102
5. Conclusion Having assessed the validity and interpreted the data it is now possible to draw conclusions with respect to the hypothesis. The conclusions may support or refute the hypothesis. Sometimes it may not be possible to draw a conclusions from an evaluation and the results may be deemed inconclusive. 6. Relating the conclusion to existing knowledge In order to attain a greater understanding of the meaning of conclusions drawn, the hypothesis, the experiments and the results should be positioned within existing literature. 7. Reporting and publishing results This final step ensures that the knowledge gained is not lost. The publication of results will allow other researchers to confirm claims made and more importantly to build further hypotheses that extend the work.
6.2
Validity
All research is based on some underlying assumptions about what constitutes valid research evaluation (step 3 of scientific method) and which research methods are appropriate (Myers, 1997). Validity refers to the meaning of research results (O’Brien et al., 2005) and the degree to which one can make statements about those results from within the researchers adopted research method. Thus it specifically concerns steps 4 and 5 of the scientific method presented at the beginning of the chapter. Validity is described in three ways (Perry et al., 1997b): External validity is the degree to which the conclusions of a study are applicable to the wider population. The larger and more representative the sample population used, the more applicable the results will be. Internal validity is the certainty with which we can say that the known independent variables in the study are the only causes of what was observed in the depen-
6.3 Quantitative and Qualitative Research Methods
103
dant variables. Internal validity can be maintained by producing many streams of complimentary evidence (Kitchenham et al., 2005) that support the hypothesis being researched. Construct validity refers to the degree to which the structure of the experiment affords the measurement of what the experimenter set out to measure. For example, the experiment may be valid, but the variable under scrutiny may not describe what is proposed in the hypothesis. These descriptions of validity are often in conflict and difficult to balance. For example, to maintain a high level of external validity we may decide to perform studies of industrial programmers in their workplace as opposed to the laboratory. This makes the population more representative and is known as ecological validity (Kellogg, 2003; Perry et al., 1997b). However, in doing so, control over the variables of the experiment may not be possible, thus affecting internal validity.
6.3
Quantitative and Qualitative Research Methods
Once the researcher has decided upon a guiding philosophy for his research, he must then choose from the array of research methods, ones that will be appropriate in evaluating his hypothesis. The types of research method can be categorised in many ways, however the most common distinction that is made is between qualitative and quantitative research methods (Myers, 1997). Quantitative methods imply the ability to numerically measure facets of an experimental setting and includes methods such as: • Surveys - A written or oral survey of questions can be presented to a population and statistical results inferred from the answers (Yip, 1995).
6.3 Quantitative and Qualitative Research Methods
104
• Laboratory experiments - using a laboratory experiment the researcher can control independent variables that affect the object of the hypothesis under scrutiny. Usually a single variable can be adjusted by the researcher and the resulting effect it has can be observed. If the outcome is in line with the prediction of the hypothesis then one can say that the experiment has produced evidence in support of the hypothesis. Evaluating the outcome of such experiments is often associated with statistical hypothesis testing approaches (Carew et al., 2005). • Formal methods - An example of a formal method would be econometrics, which is a combination of mathematical economics, statistics, economic statistics and economic theory (Myers, 1997). Qualitative methods produce data of a textual nature, as opposed to the numerical output of quantitative methods. Qualitative methods can use a range of types of qualitative data1 to evaluate the hypothesis under scrutiny, such as data produced by: • Action research - This type of research is aimed at examining hypotheses that can be applied directly to an industrial setting and their benefit assessed. This is not to be confused with applied science. In the case of action research there is a real contribution back to the scientific community as well as industry, as a result of the application of the hypothesis (Myers, 1997). • Case Study - This type of method is an empirical enquiry that investigates ones hypothesis in a real-life context, known as in-vivo (Basili, 1996). However, the boundaries between what is under evaluation and the context are not necessarily clear (Myers, 1997). Often with a case study there is only one, or a few data points (participants from a population). Therefore the data is not suited to a statistical evaluation. A richer insight into the context is achieved through qualitative data capture. 1
Often produced by technique that can also generate qualitative data.
6.3 Quantitative and Qualitative Research Methods
105
• Ethnography - In an ethnographic study the researcher immerses himself in the context of the hypothesis under study. This is often very time consuming, however it also provides a rich data set. A typical ethnographic study in an IT organisation may involve spending several months working as part of a software development team (Myers, 1997). • Grounded theory - This research method suggests that a hypothesis to explain a certain phenomenon can emerge from an analysis of the gathered data rather than an a-priori hypothesis that guides the formation of the data gathering (Myers, 1997). Data used to realise these qualitative research methods can be gathered from various data sources including: • Observation - The participant or object is simply observed with no interference apart from the study set up. • Interviews - The participant is prompted or questioned to express their views and answers to various topics of relevance to the study. • Questionnaires - These are similar to the quantitative surveys mentioned above. However using qualitative methods the questionnaire can also include essay style answers allowing the participant to express his or her opinion. • Documents or texts - Documentation, emails, letters, memos, faxes, dictations and diaries can all be used as valid data sources. • Researchers impressions - The researcher himself may draw conclusions from observation during the study before analysis. • Think-aloud - The think-aloud method, pioneered by Erisson and Simon during the 1980’s (Ericsson and Simon, 1993), is implemented by having the participant
6.4 The Culture of Research Evaluation in Computer Science
106
of a study speak his thoughts out loud while performing the tasks of the experiment. Think-aloud is known to provide the richest insight into a persons mental state at a given moment in time (Russo et al., 1989) when carried out in line with Ericsson and Simon’s best practice guidelines (Ericsson and Simon, 1993).
6.4
The Culture of Research Evaluation in Computer Science
Thus far research methods in general have been discussed. While generally applicable to the research question of this thesis, the more specific research culture of computer science must also be considered when arguing for a chosen set of research methods. Existing reviews highlight a severe lack of research evaluation of any philosophical tradition in computer science (Glass et al., 2002; Segal et al., 2005). Only 14% of research papers surveyed in (Glass et al., 2002) were found to be evaluative. Even in a journal such as Empirical Software Engineering, whose focus is intended to be that of empirical studies, it was found that between the years of 1997 and 2003 only 53% of the papers within it were evaluative (Segal et al., 2005). Of the evaluative papers a hypothesis testing-based, quantitative approach dominated the evaluations. Furthermore, the evaluations tended to be laboratory-based, did not refer to other scientific disciplines and were not people focussed.
6.4.1
Arguing for Hybrid Approaches to Research in Computer Science
The culture of research in computer science, as highlighted in the previous section and by Basili in (Basili, 1996), indicate that computer science is an emerging discipline with an immature research model. Other, more established disciplines have seen a
6.4 The Culture of Research Evaluation in Computer Science
107
research scenario emerge where the research community divides into two groups - theorists and practitioners. In physics, for example, theoretical physicists create mathematical models of the universe, while experimental physicists test these models. Likewise in medicine, theorists and practitioners of their emerging science exist. However, their fundamental difference to computer science is that the essence of what they are studying is unchanging - The nature of the universe will always be examined by physicists and medical researchers will always be concerned with the human species. Computer scientists, on the other hand, not only attempt to improve the process that operates on an artifact in question, but the artifact itself can also be improved. Thus, in computer science the model of evaluation must be cognizant of both the process and the product (Basili, 1996). The closest scientific analogy to this scenario can be found in the manufacturing domain, where research is undertaken to improve the processes for producing products. However, similar to computer science, the product itself can also be improved. Therefore the role of the researcher in computer science is to understand the evolving nature of processes and products and the relationship between them (Basili, 1996). Moreover, in evaluating technologies or techniques that aim to improve software development, the human will always be a key element in its operation, and therefore its evaluation. This complicates experimentation since different results will be obtained, depending upon the people involved (Basili, 1996). Research in cognitive sciences have developed a long established evaluation approach called the socio-cultural perspective that suggests that to evaluate a hypothesis involving people, studies should be undertaken using real activities, in real situations, in their natural environment (O’Brien et al., 2005). Advocates of the approach argue that the richness of context of such a setting cannot be replicated by any feasible laboratory controlled evaluation. From a computer science perspective this translates to the suggestion that all models created by computer science theorists should eventually be evaluated by computer science practi-
6.4 The Culture of Research Evaluation in Computer Science
108
tioners in software laboratories where real, commercial software is actually being developed (known as in-vivo) (Harrison, 2006). As it currently stands, however, there is a gross imbalance between the body of theoretical models produced by computer science theorists and a corresponding body of work that evaluates these models, in favour of the former (Buckley, 2002; Basili, 1996). Proponents of purist positivist research philosophies often cite that in-vivo evaluation is not repeatable, therefore the results cannot be corroborated. Segal provides a retort that best counters this standpoint, “An argument is often made against field studies is that they cannot be replicated - but neither can a software software engineering activity in the real world (one cannot dip one’s toes in the same river twice!). Validation of the study cannot be based on the replication of the study but on the replication of the interpretation: the question to ask is, would other researchers from the same scientific cultural tradition as the original researcher(s) and given the same data, come to the same conclusions?” It should be noted that performing experiments in-vivo does not preclude the gathering of quantitative data. However, the degree to which once can draw conclusions from quantitative data gathered in an in-vivo experiment can be limited, since many subtle, immeasurable factors may be occurring external to those measurable factors. Such is the nature of human-based evaluations. Therefore, there is a convincing need for the use of qualitative data sources. Take, for example, in a hypothetical evaluation where the performance of a programmer using a tool is investigated and where the user unexpectedly underperforms. Quantitative measures of time and productivity gathered will measure this underperformance and results can be reported on this data. However, later upon the gathering of data using a qualitative data source, such as an interview, we find that that particular
participant had a headache that day, impeding his performance. This highlights that his underperformance had little to do with the process under evaluation, nor was it an accurate reflection of the average person using that process. Vital information would therefore have been overlooked had qualitative methods not been used alongside the quantitative ones. Notice also how the interpretivist and positivist philosophies complement one another in this instance. Furthermore, the key to a compelling evaluation is to provide a convincing argument in favour of one’s hypothesis. To this end, mounting evidence should ideally be provided by many streams (Kitchenham et al., 2005). For example, a combination of quantitative measures and qualitative data sources, assessing both the product and the process, evaluates a hypothesis from many angles. This is known as triangulation and is a means of creating a large body of evidence in support of one’s hypothesis while also appeasing a range of research philosophies (Myers, 1997).
6.5 A Research Model for This Thesis
Deciding upon a research methodology depends primarily upon the research objectives. This thesis attempts an initial evaluation of a repeatable process for component encapsulation that is useful to software engineers and industrially applicable. These objectives immediately highlight certain requirements when choosing an appropriate methodology for the thesis:
• Industrial applicability requires an ecologically valid setting for evaluation.
• Investigating the usefulness of the component encapsulation approach to programmers requires methods that can reveal the full complexity of human-computer interaction. Again, an ecologically valid setting would be advisable; however, it also presents strong motivation for the use of qualitative methods of evaluation.
• The outcome of the component encapsulation process is a software artifact, which is quantifiable. Therefore, quantitative measures in the form of software metrics would seem appropriate when assessing the product. Complementary qualitative measures should also be used to buttress the findings.
An important observation on these requirements is that the required research methods do not fall under a single research philosophy or method grouping. Immediately we see the opportunity for the triangulation discussed in the previous section.
6.5.1 Empirical Techniques Employed
Some quantitative measures are used in an attempt to provide an objective evaluation of the product of our process:
• Software metrics used to assess the product of the process.
• Project data, such as the length of time for a project or lines of code.
Assessing the process for its usefulness requires more intricate methods that can examine the full richness of context and human-computer interaction. The available qualitative methods can account for this complexity. Importantly, the evaluation is carried out in-vivo (Basili, 1996), helping to preserve a high level of ecological validity (O’Brien et al., 2005). This in-vivo evaluation takes the form of several case studies. Qualitative data sources used in the evaluation include:
• Observation: The participant will be observed and video recorded. The recordings can then be analysed to gain insight into the process.
• Diaries: The participant will produce a diary of his experience of the process.
• Note-taking: At any point during the case study, interesting information with respect to the study can be recorded in the form of notes taken by the investigator.
• Think-aloud: While carrying out the process, the participant will be encouraged to speak his thoughts out loud. This data, it is expected, will provide a deep insight into the mental state and impressions of the participant during the process.
• Interviews: After the process has taken place, interviews will be used to further assess the process and also to assess the product components produced as a result of the process.
• Project documents: Existing documentation on the subject system can be used to further help the assessment of both the process and the product of the process.
From these streams a strong triangulation of evidence is built. Both the process and the product are evaluated using several research methods from both the quantitative and qualitative categories, with think-aloud data being the most used stream of qualitative evidence. Several actions have been taken to raise the validity of the studies:
• All of the studies performed in the thesis are designed to have high external, ecological validity, owing to the in-vivo nature of all the evaluations.
• A high level of internal validity is maintained by creating several streams of evidence through triangulation.
• Construct validity is kept high by clearly identifying the attributes of the process and the product that lead to quality components. This has been discussed during the earlier literature review.
The appropriateness of these measures has also been confirmed in a pilot study undertaken on Reconn-exion (Le Gear and Buckley, 2005a).
Part II “Component Reconn-exion”: Reengineering Towards Components Using Variations on Reconnaissance and Reflexion
Chapter 7
Reconn-exion
“Live out of your imagination, not your history.” - Stephen Covey.
In the previous two chapters an in-depth review of Software Reconnaissance and Reflexion Modelling was provided. Extensions of both techniques are combined in this chapter to form a process called “Reconn-exion.” This process, allied with its evaluation, constitutes the core contribution of this thesis.
7.1 A Conjecture for Prompting Component Abstractions
7.1.1 A New Reuse Perspective Derived from Software Reconnaissance
In section 2.5.2 the reusability of components was highlighted as an important quality attribute. Previously, in section 3.3, several types of software reuse internal to a system were defined, along with techniques for identifying them. In this section we define a new type of software reuse and a means of identifying it. Software Reconnaissance (chapter 4) showed how the source code responsible for implementing observable behavior of a system can be identified by gathering execution profiles. However, another facet of the functionality view provided by Software Reconnaissance is the set of shared software elements. This set contains software elements that are neither unique to the functionality in question nor utility code common to all features; rather, they are shared across some features but not all. This is calculated as

SHARED(f) = IIELEMS(f) - UELEMS(f) - CELEMS

That is, the set contains the software elements indispensably involved in feature f, except for those that are unique to that feature or common to all features. The set of shared software elements gives a snapshot of the software elements being reused by the
features of the running system, from the context of feature f. Though difficult to visualise graphically, figure 7.1 is provided to aid understanding of the SHARED set. In his seminal paper on Software Reconnaissance, Norman Wilde remarks on the potential worth of the source code shared across the features exhibited by a system, but never investigates it further (Wilde and Scully, 1995). We can extend this view by combining the sets of shared software elements for all features in the feature set. This gives a reuse view for the entire domain(s) or feature class(es) profiled and is calculated as:
SHARED(f1) ∪ SHARED(f2) ∪ … ∪ SHARED(fn)
where n is the number of features in a particular feature set. At this point we no longer think in terms of features and the source code elements shared by features. The view produced should simply be considered another reuse view, giving an interesting perspective on the software system. We call this view a feature-based reuse perspective of a software system, or the reuse perspective for short. In particular, this view should contain software elements that are generic, reused and architecturally core in the system. By reused we mean that the software element is used more than once within the context of the features under examination in the reuse perspective. By generic we mean that the potential exists for the elements to be reused in a wider context (i.e. they are reusable). Finally, by architecturally core, we are again alluding to the idea that the software elements are somewhat generic; such architecturally core source code may be generic boilerplate or framework code. The truth of our conjectures on the contents of the reuse perspective will be examined during the evaluation in this thesis and is one of its core contributions. Our hypothesis is that this shared code, reused in implementing more than one feature, is a useful starting point that warrants further investigation when identifying code that is reused, reusable, or forms part of architecturally core components of that system.
Figure 7.1: The three sets used to form the shared set.
This set is called the “reuse perspective” and is a core contribution of this thesis.
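To make the calculation concrete, the following is a minimal sketch of the SHARED-set and reuse-perspective computations over execution profiles. It assumes one profile (a set of exercised element names) per feature, so IIELEMS(f) is approximated by f's profile; the class and method names are illustrative and not part of any existing tool.

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A sketch, not production tooling: one profile per feature, and
// IIELEMS(f) approximated by f's profile. Assumes at least one profile.
public final class ReusePerspective {

    // CELEMS: elements exercised by every profile.
    static Set<String> common(List<Set<String>> profiles) {
        Set<String> common = new HashSet<>(profiles.get(0));
        for (Set<String> p : profiles) common.retainAll(p);
        return common;
    }

    // UELEMS(f): elements exercised only by feature f's profile.
    static Set<String> unique(int f, List<Set<String>> profiles) {
        Set<String> unique = new HashSet<>(profiles.get(f));
        for (int k = 0; k < profiles.size(); k++)
            if (k != f) unique.removeAll(profiles.get(k));
        return unique;
    }

    // SHARED(f) = IIELEMS(f) - UELEMS(f) - CELEMS.
    static Set<String> shared(int f, List<Set<String>> profiles) {
        Set<String> shared = new HashSet<>(profiles.get(f));
        shared.removeAll(unique(f, profiles));
        shared.removeAll(common(profiles));
        return shared;
    }

    // The reuse perspective: the union of SHARED(fk) for k = 1..n.
    public static Set<String> of(Collection<Set<String>> featureProfiles) {
        List<Set<String>> profiles = new ArrayList<>(featureProfiles);
        Set<String> reuse = new HashSet<>();
        for (int f = 0; f < profiles.size(); f++)
            reuse.addAll(shared(f, profiles));
        return reuse;
    }
}

The small example in section 7.4 exercises exactly this calculation against concrete profiles.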
7.2 A Hypothesis for Encapsulating Components Using Reflexion Modelling
Another component quality attribute identified in chapter 2 was replaceability. This should equally be addressed in a component recovery process, just as the reusability quality attribute was addressed in the previous section. Given the successes of Reflexion Modelling as a means of partitioning a system into higher-level abstractions, it is one of the conjectures of this thesis that Reflexion Modelling could also be adapted to unambiguously encapsulate components of existing systems and aid in the definition of their interfaces, thus supporting the important replaceability quality attribute necessary for a good component. We propose the following guidelines for Reflexion Modelling specifically for the encapsulation of components:
1. The programmer creates a high-level model that contains only two nodes:
• a high-level node representing a first attempt at the component he wishes to encapsulate;
• a second high-level node representing the remainder of the system (see figure 7.2).
2. The programmer maps the appropriate elements of the software system to the nodes in the high-level model.
3. The programmer then iterates through the traditional Reflexion Modelling process, allowing the tool to build the reflexion model and the programmer to study the edges
between the nodes for expected and unexpected dependencies. Refinements at this stage are limited to changing the map only, not the high-level model. This process iterates until he is satisfied that the component has been encapsulated. This, we suggest, explicitly identifies the interface of the component, as illustrated by figure 7.2.
4. The programmer then proceeds to divide the rest of the system into its major constituent parts, by first altering the high-level model and then altering the mappings of the map appropriately. The division of the remainder of the system is usually guided by the programmer’s domain knowledge of the major services provided by the system. This is shown in figure 7.3.
5. Again, the programmer continues with several iterations until he is satisfied with the new breakdown of the system.
6. The model will now potentially show the dependencies that the component has with several parts of the system. We suggest that this identifies the roles the component plays in the system.
Of course, the potential for variation upon these guidelines does exist. Some expected variations include:
• A user of the process may decide to define a sub-architecture for the component that he is encapsulating.
• A user may not be able to fully clarify the contents of his component until he has begun to break down the remainder of the system.
• A component with a single interface may be encapsulated. It will, at most, have one role in the system.
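As an illustration of step 3, the following hedged sketch (not the jRMTool implementation) lifts source-level dependencies through a two-node map; the element names echo the house example later in this chapter, and the dependency edges are invented for illustration only.

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Lifting source-level dependencies to high-level edges: any edge that
// crosses between "Component" and "Rest" belongs to the candidate
// component's interface.
public class TwoNodeReflexion {
    record Dep(String from, String to) {}

    public static void main(String[] args) {
        // map: source element -> high-level node.
        Map<String, String> map = Map.of(
            "Translate_House", "Component",
            "matrix_multiplication", "Component",
            "Rotate_House", "Rest",
            "paint", "Rest");

        // Illustrative source-level dependencies (caller -> callee).
        List<Dep> sourceDeps = List.of(
            new Dep("Rotate_House", "matrix_multiplication"),
            new Dep("Translate_House", "matrix_multiplication"),
            new Dep("paint", "Rotate_House"));

        // Lift each source dependency through the map.
        Set<Dep> lifted = new LinkedHashSet<>();
        for (Dep d : sourceDeps)
            lifted.add(new Dep(map.get(d.from()), map.get(d.to())));

        // Cross edges form the component's interface with the system.
        lifted.stream()
              .filter(e -> !e.from().equals(e.to()))
              .forEach(e -> System.out.println("interface edge: " + e));
        // prints: interface edge: Dep[from=Rest, to=Component]
    }
}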
Figure 7.2: Encapsulating a component and making its interface explicit.
Figure 7.3: Identifying multiple interfaces on a component using Reflexion Modelling.
7.3 Hypothesising a Process For Component Encapsulation
The two previous sections defined two novel approaches, derived from Software Reconnaissance and Reflexion Modelling, designed to aid in the identification and encapsulation of components in software systems. This section brings these together as a new process for component recovery called Reconn-exion. Two motivations drive the decision to combine the reuse perspective and the variation on Reflexion Modelling into a single, aggregated process for component recovery:
• The reuse perspective, generated through the proposed adaptation of Software Reconnaissance, may be a useful means of narrowing the search for reusable, generic and core components of a system. However, it lacks a means of following through and allowing the user to explicitly encapsulate them.
Reflexion-based techniques, by contrast, have a proven track record of clearly identifying the boundaries and contents of components in existing systems. Given this observation, the reuse perspective and the variation on Reflexion Modelling appear complementary to one another.
• Likewise, Reflexion Modelling has the potential to make recovered components more replaceable. However, the identification of the clusters forming the map in a normal reflexion model is most often based upon naming conventions in the source code (Kosche and Daniel, 2003; Christl et al., 2005). Existing tools explicitly support this by allowing the user to define regular expressions that encompass groupings of software elements with a specified naming convention (Murphy, 2003). Naming conventions have been shown to be an excellent means of aiding comprehension and are pervasive throughout industry (Refl, 2005). However, relying solely upon the naming conventions of a system creates a single point of failure for the Reflexion Modelling technique, hence the variation on it proposed by this thesis and, indeed, by other authors (Christl et al., 2005; Hassan and Holt, 2004). Any means of reducing the dependence upon naming conventions may be a benefit. The reuse perspective is one potential means of reducing this dependence, since it is generated by analysing the behavior of the system.
The proposed process, illustrated by figure 7.4, consists of the following steps:
1. The proposed adaptation of Software Reconnaissance is performed on the subject system, automatically producing the reuse perspective as output.
2. The participant is then presented with the reuse perspective of the subject system. He uses this, combined with the existing naming convention approach, to prompt initial mappings for possible component abstractions in the system.
Figure 7.4: The Component Reconn-exion process.
3. From these component abstractions, the participant chooses a component of interest that he wishes to recover.
4. The participant then creates his initial reflexion model, as prescribed by this thesis’ proposed variation, with the map being prompted by the examination of the reuse perspective.
5. Further iterations of the variation on Reflexion Modelling are undertaken. The participant is free to refer to the reuse perspective at any stage. This continues until the component is encapsulated.
This integrated process for component recovery is called “Component Reconn-exion” and is another core contribution of this thesis.
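One way the prompting in steps 2 and 4 might be seeded automatically is sketched below: elements in the reuse perspective, or matching a naming-convention regular expression, are proposed for the component node. The node names, method signature and the idea of a fully automatic seed are hypothetical; in the process itself the participant makes these choices.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical seeding of the initial two-node map: reuse-perspective
// elements and naming-convention matches go to the candidate component,
// everything else to the remainder of the system.
public final class SeedMap {
    public static Map<String, String> seed(Set<String> allElements,
                                           Set<String> reusePerspective,
                                           Pattern componentNaming) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String element : allElements) {
            boolean toComponent = reusePerspective.contains(element)
                    || componentNaming.matcher(element).matches();
            map.put(element, toComponent ? "Component" : "RestOfSystem");
        }
        return map;
    }
}

The engineer would supply a pattern capturing the system's naming convention and then refine the resulting map over successive reflexion iterations, as described in step 5.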
Table 7.1: Features identified for the house application.

No.  Feature
1    Start and stop the application.
2    Rotate house.
3    Translate house.
4    Scale house.
5    Change house colour.
7.4 A Small Example
This section introduces a small example to demonstrate the Reconn-exion process.
7.4.1 The House Application
A small house application is used as the example. The application was an undergraduate, third-year graphics project created by the author. Figure 7.5 is a screenshot from the application. Its purpose is to allow the user to manipulate the drawing of a house on the screen: the user can scale the size of the house, change its colour, rotate it and translate it to a different position. The application is approximately 1000 LOC in size, contained in one file and written in Java.
7.4.2 Part 1: A Reuse Perspective
The first part of Reconn-exion requires that a reuse perspective of the system be generated. This involves:
• Identifying the features of the application.
• Exercising appropriate test cases to exhibit and profile the named features.
• Examining the profiles to calculate the reuse perspective.
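For the second of these steps, profiling can be as simple as recording which methods each test case enters. The thesis does not prescribe a particular profiler at this point, so the sketch below illustrates the idea with manual instrumentation; in practice a coverage or tracing tool would gather the same data.

import java.util.Set;
import java.util.TreeSet;

// A minimal per-test-case profile: each instrumented method reports its
// own name on entry, and the accumulated set is the execution profile.
public final class Profile {
    private static final Set<String> exercised = new TreeSet<>();

    // Called at the entry of every instrumented method.
    static void hit(String element) { exercised.add(element); }

    // The execution profile gathered for the current test case.
    static Set<String> snapshot() { return new TreeSet<>(exercised); }

    // Reset between test cases so each feature gets its own profile.
    static void reset() { exercised.clear(); }
}

Calling Profile.hit("House_Transforms1.Draw_House") at the top of each method, then snapshot() after a test case and reset() before the next, yields one profile per feature.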
Figure 7.5: A screenshot of the house application.
The features identified by the author in the house application are listed in table 7.1. Next, test cases were designed and executed for each feature, and their execution was profiled. In this example only one test case is created per feature; however, one could potentially execute many test cases for a feature, with small profiling differences. The profiles for the test cases exercised the following methods:
• Test case 1 exhibits feature 1 (Start and stop the application) and exercises:
  – AssignOne9921966.main
  – House_GUI.House_GUI
  – House_Transforms1.Bezier_Walls
  – House_Transforms1.Draw_House
  – House_Transforms1.Line_Clipped_House
  – House_Transforms1.Setup
  – House_Transforms1.bezier
  – House_Transforms1.clipcode
  – House_Transforms1.clipline
  – House_Transforms1.drawLine_Level1
  – House_Transforms1.drawLine_Level2
  – House_Transforms1.middle
  – House_Transforms1.paint
• Test case 2 exhibits feature 2 (Rotate house) and exercises:
  – AssignOne9921966.main
  – House_GUI.House_GUI
  – House_Transforms1.Bezier_Walls
  – House_Transforms1.Draw_House
  – House_Transforms1.Line_Clipped_House
  – House_Transforms1.Rotate_House
  – House_Transforms1.Setup
  – House_Transforms1.Translate_House
  – House_Transforms1.bezier
  – House_Transforms1.clipcode
  – House_Transforms1.clipline
  – House_Transforms1.drawLine_Level1
  – House_Transforms1.drawLine_Level2
  – House_Transforms1.matrix_multiplication
  – House_Transforms1.middle
  – House_Transforms1.paint
• Test case 3 exhibits feature 3 (Translate house) and exercises:
  – AssignOne9921966.main
  – House_GUI.House_GUI
  – House_Transforms1.Bezier_Walls
  – House_Transforms1.Draw_House
  – House_Transforms1.Line_Clipped_House
  – House_Transforms1.Setup
  – House_Transforms1.Translate_House
  – House_Transforms1.bezier
  – House_Transforms1.clipcode
  – House_Transforms1.clipline
  – House_Transforms1.drawLine_Level1
  – House_Transforms1.drawLine_Level2
  – House_Transforms1.get_firatX
  – House_Transforms1.get_firstY
  – House_Transforms1.matrix_multiplication
  – House_Transforms1.middle
  – House_Transforms1.paint
• Test case 4 exhibits feature 4 (Scale house) and exercises:
  – AssignOne9921966.main
  – House_GUI.House_GUI
  – House_Transforms1.Bezier_Walls
  – House_Transforms1.Draw_House
  – House_Transforms1.Line_Clipped_House
  – House_Transforms1.Scale_House
  – House_Transforms1.Setup
  – House_Transforms1.Translate_House
  – House_Transforms1.bezier
  – House_Transforms1.clipcode
  – House_Transforms1.clipline
  – House_Transforms1.drawLine_Level1
  – House_Transforms1.drawLine_Level2
  – House_Transforms1.matrix_multiplication
  – House_Transforms1.middle
  – House_Transforms1.paint
• Test case 5 exhibits feature 5 (Change house colour) and exercises:
  – AssignOne9921966.main
  – House_GUI.House_GUI
  – House_Transforms1.Bezier_Walls
  – House_Transforms1.Draw_House
  – House_Transforms1.Line_Clipped_House
  – House_Transforms1.Setup
  – House_Transforms1.bezier
  – House_Transforms1.change_color
  – House_Transforms1.clipcode
  – House_Transforms1.clipline
  – House_Transforms1.drawLine_Level1
  – House_Transforms1.drawLine_Level2
  – House_Transforms1.middle
  – House_Transforms1.paint
Given the profiles listed, the reuse perspective can be calculated from the shared sets, as described in section 7.1. Two elements comprise the resulting reuse perspective for the house application:
House_Transforms1.Translate_House: This is a generic method that can be used to translate coordinates to different positions. It is directly involved in implementing the translate house feature (no. 3), and indirectly involved in the implementation of the rotate and scale house features, making it a generic component, core to this application.

House_Transforms1.matrix_multiplication: All the features of the house application that involve graphical transforms (no.’s 2, 3 and 4) are implemented, in part, using matrix multiplication. However, the use of this method is not necessarily confined to graphical transforms; it could potentially be used to multiply any two matrices. Again this makes the method reusable within the application, but also reusable in many graphical or maths-based applications.

The size of the reuse perspective is small because the set of common software elements is large. As predicted, the elements of the reuse perspective seem architecturally core, generic and reused. Steps one and two of the Reconn-exion process state that the reuse perspective should be presented to a software engineer, who can examine and use portions or all of it to help him begin the component recovery process. This example provides an excellent starting point for recovering a “transforms” component, encapsulating generic graphical transformation routines, from the house application. Both elements of the reuse perspective will, in this case, form part of the new component. In larger applications, with larger reuse perspectives, it may be possible to notice several potential components within the reuse perspective.
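As a quick, illustrative check of the calculation (assuming the ReusePerspective sketch from section 7.1), the five profiles above can be replayed in abbreviated form; the thirteen methods exercised by every test case are grouped here as a single core set, and method names are shortened.

import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class HouseReusePerspective {
    public static void main(String[] args) {
        // The thirteen methods exercised by every test case.
        Set<String> core = Set.of("main", "House_GUI", "Bezier_Walls",
            "Draw_House", "Line_Clipped_House", "Setup", "bezier", "clipcode",
            "clipline", "drawLine_Level1", "drawLine_Level2", "middle", "paint");

        // One profile per feature; "get_firatX" reproduced as listed above.
        Map<String, Set<String>> profiles = new LinkedHashMap<>();
        profiles.put("start/stop", with(core));
        profiles.put("rotate", with(core, "Rotate_House", "Translate_House",
                "matrix_multiplication"));
        profiles.put("translate", with(core, "Translate_House", "get_firatX",
                "get_firstY", "matrix_multiplication"));
        profiles.put("scale", with(core, "Scale_House", "Translate_House",
                "matrix_multiplication"));
        profiles.put("colour", with(core, "change_color"));

        // Prints the two reuse-perspective elements discussed above:
        // Translate_House and matrix_multiplication.
        System.out.println(ReusePerspective.of(profiles.values()));
    }

    static Set<String> with(Set<String> base, String... extra) {
        Set<String> profile = new HashSet<>(base);
        for (String e : extra) profile.add(e);
        return profile;
    }
}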
7.4.3 Part 2: Encapsulating with Reflexion
Now that we have chosen the component we wish to encapsulate, we must describe it using our adaptation of Reflexion Modelling. We begin by creating a high-level model of the application consisting of a “transforms component” node and a “remainder of system” node.
Figure 7.6: First house application Reflexion model.
Figure 7.7: First house application Reflexion model map.
For our initial map we map the elements prompted to us by the reuse perspective to the transforms component node, and the remainder of the elements to the rest-of-system node. This is illustrated by figures 7.6 and 7.7. However, the map prompted by the reuse perspective is very much a seed set with regard to mappings. From this initial reflexion model we can now attempt the encapsulation of our component. By examining the edges between the transforms component and the rest of the system, further software elements that should be mapped to the transforms component were identified (see figure 7.6). The second reflexion model produced took further elements into the mapped definition of the transforms component.
Figure 7.8: Second house application Reflexion model.
This is shown in figures 7.8 and 7.9. In this case it is interesting to note the seemingly high degree of two-way coupling between the transforms component and the rest of the system. On further investigation it was found that these dependencies were solely on the Java framework and could be ignored; this is why we were later able to remove this link. The edges entering and leaving the transforms component of the reflexion model in figure 7.9 represent the interface that the component provides to, and requires of, the rest of the system (an investigation of an edge with the jRMTool will provide a list of the software elements involved in that edge). However, we can gain better insight into the different roles that the component plays in the system by investigating in further detail its interaction with other components of the system. For this reason we now begin to divide up the remainder of the system in an effort to reveal multiple interfaces of our component. We decide that the rest-of-system component can be divided into “GUI,” a component that handles the display of graphics, and “Main,” which is the main driver of the application. This resembles a model-view-controller