Integrating Software Engineering Data Using Semantic Web Technologies

Yuan-Fang Li, Monash University, Melbourne, Australia

Hongyu Zhang, Tsinghua University, Beijing, China

MSR 2011, May 2011

•  A huge amount of SE data have been accumulated over the years…

[Figure: sources of SE data: Bugzilla, Mailings, Source Code, Requirements, CVS/SVN, Execution traces, Crash, Developer, Logs, Metrics, Customer, …]

•  The abundance of SE data enables us to extract useful information from the data in order to better understand and manage software engineering activities.

Mining software repositories

Collecting and integrating SE data is a non-trivial task:

•  Great variability in the SE data: data may come from different sources, for different purposes, in different formats, languages, etc.
•  The data are often disparate and distributed as well.

The lack of an open, commonly-agreed schema hinders the integration of SE data.

Semantic Web

Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. They are empowered by standard technologies such as RDF, SPARQL, OWL, and SKOS, and have been successfully applied in many domains to provide a solution to data integration and knowledge management.

Overall Structure

•  Ontology Definition
•  Data Collection and Translation
•  Data Integration

A Case Study on Eclipse /1

Proof of concept: an initial study of Eclipse 3.0, a large open source project containing more than 10,000 files. We integrate the following data from Eclipse 3.0:

•  Object-oriented language data: model OO language elements such as class, method, attribute, visibility, …; collected with MOOSE.
•  Program dependency data: model dependable and dependent classes; collected with Dependency Finder.
•  Metrics data: model various complexity metrics; collected with Understand for Java.

Note that the data may come from different sources.

A Case Study on Eclipse /2

We developed a program to automatically convert different datasets into RDF triples and store them in a native (on-disk, persistent) Sesame triple store.
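The conversion step can be illustrated with a minimal sketch, which is not the authors' program. It turns one metrics row and one dependency pair into N-Triples lines using only the Python standard library. The namespace `http://example.org/se#`, the property names (`loc`, `wmc`, `dependsOn`) and the sample values are assumptions made for illustration; the slides do not show the actual ontology IRIs, and loading the output into Sesame is not covered here.

```python
# Hypothetical namespace; the real ontology IRIs are not given in the slides.
BASE = "http://example.org/se#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local_name):
    """Wrap a local name in the assumed namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def int_literal(value):
    """Serialize an integer as a typed N-Triples literal."""
    return f'"{value}"^^<{XSD_INT}>'


def metrics_to_triples(record):
    """Turn one metrics row (class name plus integer metrics) into N-Triples lines."""
    subject = iri(record["class"])
    lines = [f"{subject} {RDF_TYPE} {iri('Class')} ."]
    for metric, value in record.items():
        if metric == "class":
            continue  # the class name is the subject, not a metric
        lines.append(f"{subject} {iri(metric)} {int_literal(value)} .")
    return lines


def dependency_to_triple(source, target):
    """Express 'source depends on target' as a single N-Triples line."""
    return f"{iri(source)} {iri('dependsOn')} {iri(target)} ."


# Made-up sample values, for illustration only.
row = {"class": "org.eclipse.swt.graphics.Point", "loc": 120, "wmc": 6}
triples = metrics_to_triples(row)
triples.append(
    dependency_to_triple("org.eclipse.swt.widgets.Button", "org.eclipse.swt.graphics.Point")
)
print("\n".join(triples))
```

Because every tool's output ends up as triples that share the same class IRIs, data from MOOSE, Dependency Finder and Understand for Java can be merged simply by loading all the lines into one store.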

Querying the Semantic Repository /1

Having integrated the data, we can then perform queries to understand the software project. SPARQL queries can be issued over the integrated RDF dataset.

Query examples:

•  Find the top 10 Eclipse 3.0 classes that are larger than 500 LOC (lines of code) and have WMC (weighted methods per class) larger than 10, ordered by descending LOC value.
•  Find all classes that use the public attribute x defined in class org.eclipse.swt.graphics.Point.
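As a rough illustration of the first query, the sketch below gives one possible SPARQL formulation as a string, together with a plain-Python evaluation of the same conditions over in-memory triples. The `se:` prefix, the property names `se:loc` and `se:wmc`, and the sample data are hypothetical; the slides do not show the actual vocabulary or query text. The SPARQL string is not executed here, and the Python version skips the class type check for brevity.

```python
# One possible SPARQL formulation (assumed vocabulary, not executed here).
TOP_CLASSES_QUERY = """
PREFIX se: <http://example.org/se#>
SELECT ?cls ?loc ?wmc
WHERE {
  ?cls a se:Class ;
       se:loc ?loc ;
       se:wmc ?wmc .
  FILTER (?loc > 500 && ?wmc > 10)
}
ORDER BY DESC(?loc)
LIMIT 10
"""


def top_classes(triples, min_loc=500, min_wmc=10, limit=10):
    """Apply the same filter, ordering and limit to (subject, predicate, object) tuples."""
    metrics = {}
    for s, p, o in triples:
        if p in ("loc", "wmc"):
            metrics.setdefault(s, {})[p] = o
    hits = [
        (cls, m["loc"], m["wmc"])
        for cls, m in metrics.items()
        if "loc" in m and "wmc" in m and m["loc"] > min_loc and m["wmc"] > min_wmc
    ]
    hits.sort(key=lambda row: row[1], reverse=True)  # descending LOC
    return hits[:limit]


# Made-up sample data with placeholder class names.
sample = [
    ("pkg.A", "loc", 900), ("pkg.A", "wmc", 25),
    ("pkg.B", "loc", 1500), ("pkg.B", "wmc", 40),
    ("pkg.C", "loc", 700), ("pkg.C", "wmc", 5),
    ("pkg.D", "loc", 300), ("pkg.D", "wmc", 30),
]
print(top_classes(sample))  # [('pkg.B', 1500, 40), ('pkg.A', 900, 25)]
```

In the integrated store, LOC and WMC would come from the metrics tool while the class entities come from the OO language data, so the query spans both datasets without any manual joining.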

Querying the Semantic Repository /2

More query examples:

•  Find classes in Eclipse 3.0 that depend on package org.apache.tools.ant but not on the org.eclipse.ant.core package, and that have more than one subclass.
•  Find subclasses of org.eclipse.jdt.internal.compiler.ASTVisitor that will be affected if method traverse() in class org.eclipse.jdt.internal.compiler.ast.ASTNode is changed.
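The first query on this slide combines dependency data with inheritance data. A minimal Python sketch of that logic over in-memory triples follows. It is not the authors' SPARQL, and it rests on assumptions: the predicate names `dependsOn` and `subClassOf` are hypothetical, a class is taken to depend on a package when it depends on a class directly in that package, and only direct subclasses are counted. The `x.*` class names and the sample triples are made up.

```python
from collections import defaultdict


def package_of(class_name):
    """Return the package part of a fully qualified class name."""
    return class_name.rpartition(".")[0]


def find_classes(triples, required_pkg, excluded_pkg, min_subclasses=2):
    """Classes that depend on required_pkg, not on excluded_pkg, with enough subclasses."""
    depends_on = defaultdict(set)  # class -> packages it depends on
    subclasses = defaultdict(set)  # class -> its direct subclasses
    for s, p, o in triples:
        if p == "dependsOn":
            depends_on[s].add(package_of(o))
        elif p == "subClassOf":
            subclasses[o].add(s)
    return sorted(
        cls
        for cls, pkgs in depends_on.items()
        if required_pkg in pkgs
        and excluded_pkg not in pkgs
        and len(subclasses.get(cls, ())) >= min_subclasses  # "more than one" subclass
    )


# Made-up sample triples; the x.* classes are placeholders.
sample = [
    ("x.Task", "dependsOn", "org.apache.tools.ant.Project"),
    ("x.TaskA", "subClassOf", "x.Task"),
    ("x.TaskB", "subClassOf", "x.Task"),
    ("x.Runner", "dependsOn", "org.apache.tools.ant.Project"),
    ("x.Runner", "dependsOn", "org.eclipse.ant.core.AntRunner"),
    ("x.RunnerA", "subClassOf", "x.Runner"),
    ("x.RunnerB", "subClassOf", "x.Runner"),
    ("x.Leaf", "dependsOn", "org.apache.tools.ant.Project"),
]
print(find_classes(sample, "org.apache.tools.ant", "org.eclipse.ant.core"))  # ['x.Task']
```

In SPARQL the negative condition ("but not on") would typically need a negation construct, which is one reason such queries are more naturally answered once the dependency and inheritance data share one graph.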

•  A huge amount of SE data have been accumulated.
•  We propose to apply Semantic Web techniques to representing and integrating SE data.
•  We believe effective representation and integration of SE data can pave the way for more powerful analysis, mining and reasoning.

[Figure: sources of SE data: Mailings, Bugzilla, Source Code, CVS/SVN, Crash, Developer, Requirements, Execution traces, Logs, …]


Thank you!

Hongyu Zhang
School of Software, Tsinghua University
Beijing 100084, China
Email: [email protected]
Web: http://info.thss.tsinghua.edu.cn/hongyu
