On the Use of Relevance Feedback in IR-Based Concept Location
Gregory Gay*, Sonia Haiduc**, Andrian Marcus**, Tim Menzies*
* West Virginia University, Morgantown, WV, USA
** Wayne State University, Detroit, MI, USA
[Figure: software change -> IR-based concept location; a query run against the source code yields a ranked list of results]
Challenge: the query
• Text in the query needs to match the text in the source code
• Difficult to formulate good queries:
  - unfamiliar source code
  - unknown target -> hard to describe something that you do not know
Eclipse bug #13926
Bug description: JFace Text Editor Leaves a Black Rectangle on Content Assist text insertion. Inserting a selected completion proposal from the context information popup causes a black rectangle to appear on top of the display.
Queries
• Q1: jface text editor black rectangle insert text
• Q2: jface text editor rectangle insert context information
• Q3: jface text editor content assist
• Q4: jface insert selected completion proposal context information
Queries and results
• Q1: jface text editor black rectangle insert text – position of modified method: 7496
• Q2: jface text editor rectangle insert context information – position of modified method: 258
• Q3: jface text editor content assist – position of modified method: 119
• Q4: jface insert selected completion proposal context information – position of modified method: 723
• Whole change request as the query – position of modified method: 54
IR-based CL in unfamiliar software
Developers:
• Rarely begin with a good query: it is hard to choose the right words
• Analyze the list of results only briefly before reformulating the query
• Even after reformulation, have only a vague idea of what to look for -> new queries are not always better
• Can recognize whether the retrieved results are relevant to the problem or not
Questions
• Is there a way to make query formulation easier for developers?
• Is there a way to ensure that subsequent queries lead us in the right direction?
• Can we do this by following the common practices of developers?
• Can we improve IR-based CL using this approach?
Relevance feedback
• Uses developer feedback about the relevance of returned results to automatically reformulate queries
• Queries are reformulated by:
  – adding terms from relevant documents
  – removing terms from irrelevant documents
• Iterative process
• Common technique in text retrieval; also used in SE
Query (bug description): JFace Text Editor Leaves a Black Rectangle on Content Assist text insertion. Inserting a selected completion proposal from the context information popup causes a black rectangle to appear on top of the display.
Results:
1. createContextInfoPopup() in org.eclipse.jface.text.contentassist.ContextInformationPopup
2. configure() in org.eclipse.jdt.internal.debug.ui.JDIContentAssistPreference
3. showContextProposals() in org.eclipse.jface.text.contentassist.ContextInformationPopup
New query = query + words in relevant documents - words in irrelevant documents
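The "+ words / - words" reformulation step can be illustrated with a deliberately simplified, set-based sketch. This is not the Rocchio algorithm used by the IRRF tool; it only shows the idea of adding words from methods marked relevant and dropping words that appear only in methods marked irrelevant. The function name `reformulate` and the word lists are hypothetical, since the slides do not show the method texts.

```python
def reformulate(query_terms, relevant_docs, irrelevant_docs):
    """Simplified set-based reformulation (illustration only, not Rocchio)."""
    # Collect every word that appears in a method marked relevant.
    relevant_words = set()
    for doc in relevant_docs:
        relevant_words.update(doc)
    # Collect every word that appears in a method marked irrelevant.
    irrelevant_words = set()
    for doc in irrelevant_docs:
        irrelevant_words.update(doc)
    # + words in relevant documents
    new_query = set(query_terms) | relevant_words
    # - words in irrelevant documents (kept if they also occur in a relevant one)
    new_query -= irrelevant_words - relevant_words
    return sorted(new_query)


# Hypothetical word lists; the real method texts are not shown in the slides.
query = ["jface", "black", "rectangle"]
relevant = [["context", "information", "popup", "jface"]]
irrelevant = [["debug", "preference", "rectangle"]]
print(reformulate(query, relevant, irrelevant))
# ['black', 'context', 'information', 'jface', 'popup']
```

Keeping words that occur in both kinds of methods is a design choice of this sketch; a weighted approach such as Rocchio resolves such conflicts with term weights instead.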
IRRF tool
• IR engine: Lucene
  – based on the Vector Space Model (VSM)
  – input: methods, query
  – output: a ranked list of methods ordered by their textual similarity to the query
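To make the VSM ranking step concrete, here is a minimal pure-Python sketch that treats each method as a bag of words, weights terms with TF-IDF, and ranks methods by cosine similarity to the query. It is an approximation of the VSM idea, not Lucene: Lucene's own scoring adds further factors, and the function names (`tfidf_vectors`, `cosine`, `rank_methods`) and the per-method word lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each term of each document by tf * idf (smoothed idf, always > 0)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_methods(methods, query):
    """Return method names ordered by decreasing similarity to the query."""
    names = list(methods)
    vectors, idf = tfidf_vectors([methods[name] for name in names])
    # Query terms that never occur in the corpus carry no weight and are dropped.
    query_vec = {t: c * idf[t] for t, c in Counter(query).items() if t in idf}
    scored = [(cosine(query_vec, vec), name) for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [name for _, name in scored]


# Hypothetical word lists for three methods (not the real method bodies).
methods = {
    "createContextInfoPopup": ["create", "context", "information", "popup"],
    "configure": ["configure", "debug", "preference"],
    "showContextProposals": ["show", "context", "proposals"],
}
print(rank_methods(methods, ["context", "information", "popup"]))
# ['createContextInfoPopup', 'showContextProposals', 'configure']
```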
• Relevance feedback: Rocchio algorithm
  – the classic algorithm for RF; also used in SE
  – models a way of incorporating relevance feedback information into the VSM
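A minimal sketch of the standard Rocchio update, q' = alpha*q + (beta/|Dr|)*sum(Dr) - (gamma/|Dn|)*sum(Dn), where Dr and Dn are the vectors of methods marked relevant and irrelevant. The weights alpha = 1.0, beta = 0.75, gamma = 0.15 are common textbook defaults and an assumption here; the slides do not state the parameter values used by IRRF. Vectors are sparse `{term: weight}` dicts, and the function name `rocchio` and the example weights are hypothetical.

```python
def rocchio(query_vec, relevant_vecs, irrelevant_vecs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*q + beta*centroid(relevant) - gamma*centroid(irrelevant).

    Vectors are sparse dicts {term: weight}; negative weights are clipped to zero.
    The default alpha/beta/gamma values are assumed, not taken from the IRRF tool.
    """
    terms = set(query_vec)
    for vec in [*relevant_vecs, *irrelevant_vecs]:
        terms.update(vec)
    new_query = {}
    for term in terms:
        weight = alpha * query_vec.get(term, 0.0)
        if relevant_vecs:  # move toward the centroid of relevant methods
            weight += beta * sum(v.get(term, 0.0) for v in relevant_vecs) / len(relevant_vecs)
        if irrelevant_vecs:  # move away from the centroid of irrelevant methods
            weight -= gamma * sum(v.get(term, 0.0) for v in irrelevant_vecs) / len(irrelevant_vecs)
        if weight > 0:
            new_query[term] = weight
    return new_query


# Hypothetical feedback round: one method marked relevant, one irrelevant.
query = {"jface": 1.0, "rectangle": 1.0}
relevant = [{"context": 2.0, "popup": 2.0}]
irrelevant = [{"debug": 1.0, "rectangle": 1.0}]
print(sorted(rocchio(query, relevant, irrelevant).items()))
```

In this example `context` and `popup` enter the query with weight 1.5, `rectangle` is down-weighted to about 0.85, and `debug` is dropped because its weight becomes negative.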
Evaluation
• Extracted bug descriptions and the sets of methods modified in the bug fixes from bug tracking systems
• Used the bug descriptions as initial queries for IR
• Measured the number of methods investigated until reaching a modified method, before and after using RF
• Relevance feedback:
  – one developer provides feedback
  – a feedback round ends after N methods have been marked as relevant or irrelevant (N = 1, 3, 5)
Stop criteria
• Target method in the top N results
• More than 50 methods analyzed
• Position of the target method in the ranked list of results increases for 2 consecutive rounds -> query moving away from the wanted methods
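The three stop criteria can be expressed as a single check run after each feedback round. This is a sketch under stated assumptions: ranks are 1-based, `positions[0]` is the target's rank for the initial query, and the criteria are checked in the order listed on the slide. The function name `stop_reason` and its parameters are hypothetical.

```python
def stop_reason(positions, methods_analyzed, n, max_methods=50):
    """Return why the feedback loop should stop, or None to run another round.

    positions: 1-based rank of the target method after each query, oldest first
               (positions[0] is assumed to be the rank for the initial query).
    methods_analyzed: methods the developer has inspected so far.
    n: methods marked per feedback round (1, 3 or 5 in the study).
    """
    if positions and positions[-1] <= n:
        return "target method in top N results"
    if methods_analyzed > max_methods:
        return f"more than {max_methods} methods analyzed"
    # Rank got worse (larger number) in each of the last two rounds.
    if len(positions) >= 3 and positions[-3] < positions[-2] < positions[-1]:
        return "query moving away from the target"
    return None


print(stop_reason([428, 300, 2], methods_analyzed=6, n=3))
# target method in top N results
print(stop_reason([428, 450, 453], methods_analyzed=2, n=1))
# query moving away from the target
```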
Systems
System      Version   LOC         Methods   Classes
Eclipse     2.0       2,500,000   74,996    7,500
jEdit       4.2       300,000     5,366     750
Adempiere   3.1.0     330,000     28,622    1,900
Results
System      RF improves IR   RF does not improve IR
Eclipse     6                1
jEdit       3                3
Adempiere   4                1
All         13               5
Results (number of methods investigated until reaching a modified method; (kr) = k feedback rounds)
• Eclipse:
Report #   Baseline   IRRF N=1   IRRF N=3   IRRF N=5
19686      428        453 (5r)   48 (16r)   46 (9r)
• jEdit:
Report #   Baseline   IRRF N=1   IRRF N=3   IRRF N=5
1607211    354        103 (5r)   36 (12r)   28 (6r)
• Adempiere:
Report #   Baseline   IRRF N=1   IRRF N=3   IRRF N=5
1628050    52         3 (3r)     5 (2r)     7 (2r)
Questions – revisited (1)
• Is there a way to make query formulation easier for developers?
  – automatic query formulation
• Is there a way to ensure that subsequent queries lead us in the right direction?
  – add terms from relevant documents, remove terms from irrelevant documents
  – stop when we move away from the target (results worsen for 2 consecutive rounds)
Questions – revisited (2)
• Can we do this by following the common practices of developers?
  – developers still analyze only a few results in the result list before reformulation
• Can we improve IR-based CL using relevance feedback?
  – in some cases, yes: RF improved IR for 13 of the 18 bug reports (Eclipse 6 of 7, jEdit 3 of 6, Adempiere 4 of 5)
Current and future work
• Studies involving more systems and more developers
• Automatically calibrating the parameters for a specific system and a specific set of change requests
• Studying the circumstances under which RF does not improve IR