An Investigation of the Relationships between Lines of Code and Defects

Hongyu Zhang
Tsinghua University, Beijing, China
September 2009

Software Metrics 

Science and engineering can be neither effective nor practical without measurement.



We measure software in order to better understand its status and to control its behavior: "You cannot control what you cannot measure."

[Chart: average and maximum cyclomatic complexity (avg CC, max CC) per year, 1999–2007]

Metrics-based quality analysis 





It is widely believed that there are relationships between external software characteristics (e.g., quality) and internal product attributes. Discovering such relationships has become one of the objectives of software metrics.

Software metrics and software quality:
• Metrics can be used for quality control
• Metrics can be used for defect prediction

LOC-based quality analysis 

There are many code attributes:
• Complexity metrics (Vg, number of functions, etc.)
• AST metrics (number of if statements, blocks, etc.)
• …

Many defect prediction models are built on top of these metrics.

LOC (Lines of Code) is the simplest code metric, and it has strong correlations with other code metrics. We investigate the relationship between LOC and defects, and perform defect prediction based on LOC.

Dataset 

Eclipse dataset
• Contains measurement and defect data for Eclipse versions 2.0, 2.1, and 3.0.
• Defect data
  • mined from Eclipse's bug databases and version archives
  • pre-release defects (defects reported in the last six months before release)
  • post-release defects (defects reported in the first six months after release)
• Measurement data
  • contains 198 code metrics, including complexity metrics and AST metrics

Empirical Analysis of LOC 

An empirical analysis of program LOC and defects (Zhang, APSEC'07):
• We studied the sizes of 18 large open source Java systems.
• A small number of programs are very large, but a large number of programs are small.
• We find that the distribution of LOC can be formally represented using the lognormal function.
• We call this phenomenon the small program phenomenon.

The Distribution of LOC in Eclipse/1

• For Eclipse, we rank the programs by their size (from largest to smallest), and observe the same small program phenomenon.
• Most programs are small. For Eclipse 3.0:
  • 38.03% of the programs are smaller than 32 LOC
  • 56.42% of the programs are smaller than 64 LOC
• Still, a small number of programs are very large:
  • 4.39% of programs are larger than 512 LOC
  • 1.13% of programs are larger than 1024 LOC

The Distribution of LOC in Eclipse/2 

The lognormal distribution of LOC:

f(x) = 1 / (σ x √(2π)) · exp(−(ln x − µ)² / (2σ²))

Eclipse   µ        σ        R²       Se
2.0       3.9006   1.3451   0.9979   0.0161
2.1       3.9383   1.3621   0.9978   0.0166
3.0       3.9006   1.3744   0.9976   0.0171



The lognormal distribution of program sizes reveals the regularity behind software construction.



The skewed distribution of program size also implies that the distribution of defects across programs is skewed.
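The fitted lognormal model can be sanity-checked numerically. Below is a minimal sketch (standard-library Python, written for this document, not from the original study) that evaluates the lognormal density and CDF with the Eclipse 3.0 parameters reported above, and compares the result with the small-program percentages quoted earlier:

```python
import math

# Lognormal density: f(x) = 1/(sigma*x*sqrt(2*pi)) * exp(-(ln x - mu)^2 / (2*sigma^2))
def lognormal_pdf(x, mu, sigma):
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (sigma * x * math.sqrt(2 * math.pi))

# Lognormal CDF: P(X < x) = Phi((ln x - mu) / sigma), via the error function
def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

mu, sigma = 3.9006, 1.3744  # reported Eclipse 3.0 parameters
print(round(lognormal_cdf(32, mu, sigma), 3))  # ≈ 0.376, close to the reported 38.03% below 32 LOC
```

The model's prediction (about 37.6% of programs below 32 LOC) is close to the observed 38.03%, consistent with the reported R² of 0.9976.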

Correlation between LOC and defects 



We measure the Spearman correlation between LOC and the number of defects. To statistically test whether there is a relationship between LOC and defects, we make the following hypotheses:
• H0: there is no relationship between LOC and the number of defects.
• H1: there is a relationship between LOC and the number of defects.

The Spearman rank test rejects the null hypothesis; we conclude that there is a weak but positive relationship between LOC and defects.
• The Spearman correlation ranges from 0.259 to 0.585.
• Larger programs tend to have more defects.
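The Spearman coefficient used here is simply the Pearson correlation computed on ranks. A self-contained sketch on a small hypothetical (LOC, defect-count) sample (the numbers are illustrative, not from the Eclipse dataset):

```python
# Average ranks (1-based), with ties sharing their mean rank
def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

# Spearman rho = Pearson correlation of the two rank vectors
def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

loc     = [30, 120, 45, 800, 15, 260]  # hypothetical module sizes
defects = [0,  2,   1,  9,   0,  3]    # hypothetical defect counts
print(round(spearman(loc, defects), 3))  # → 0.986
```

Because it works on ranks, the statistic captures monotone association without assuming the heavily skewed LOC values are normally distributed.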

The Ranking Ability of LOC /1

Further studies show that a small number of programs account for a large number of defects:
• For example, the top 10% largest programs account for about 46% of Eclipse 3.0 defects.
• We could quickly locate a large number of defects by simply ranking the programs by metrics.
• This was termed "ranking ability" by Fenton et al. (2000).

File Level   Pre-release defects           Post-release defects
             2.0      2.1      3.0        2.0      2.1      3.0
Top 5%       24.57%   28.82%   32.98%     34.16%   28.09%   29.97%
Top 10%      37.01%   43.46%   46.28%     46.87%   40.52%   44.05%
Top 15%      46.99%   53.97%   55.05%     55.73%   47.72%   52.41%
Top 20%      53.48%   61.01%   62.29%     61.88%   54.31%   60.62%
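The percentages in the table above come from a straightforward computation: sort the modules by LOC in descending order and accumulate the defects in the top k%. A sketch on a small hypothetical dataset (module sizes and defect counts are invented for illustration):

```python
# (LOC, defects) pairs for hypothetical modules
modules = [
    (1200, 9), (800, 5), (400, 3), (300, 4), (150, 1),
    (120, 2), (90, 0), (60, 1), (40, 0), (20, 0),
]

# Rank modules by LOC, largest first, and take the top 20%
modules.sort(key=lambda m: m[0], reverse=True)
top_20pct = modules[: max(1, len(modules) // 5)]

# Fraction of all defects contained in the top 20% largest modules
total = sum(d for _, d in modules)
share = sum(d for _, d in top_20pct) / total
print(round(share, 2))  # → 0.56
```

Even in this tiny example the two largest modules hold 56% of the defects, the same order of magnitude as the 53–62% reported for the Eclipse top 20%.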

The Ranking Ability of LOC /2

The cumulative distribution of defects can be visualized as an "Alberg diagram"-like plot, with the modules ordered by LOC.

[Alberg diagrams: cumulative % defects vs. % modules (ordered by LOC), actual data vs. fitted Weibull curve; shown at the package level (pre-release) and the file level (post-release)]

The Ranking Ability of LOC /3

Further analysis shows that the "ranking ability of LOC" can be modeled by a Weibull function:

P(x) = 1 − exp(−(x/γ)^β),   (γ > 0, β > 0)

                      Eclipse   γ       β       R²      Se
Pre-release defects   2.0       0.259   0.897   0.991   0.023
                      2.1       0.207   0.830   0.995   0.017
                      3.0       0.193   0.780   0.992   0.019
Post-release defects  2.0       0.190   0.811   0.986   0.026
                      2.1       0.242   0.853   0.988   0.026
                      3.0       0.203   0.827   0.993   0.019

[Diagram: cumulative % defects vs. % programs, actual data vs. fitted Weibull curve, pre-release]
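Assuming the reconstructed form P(x) = 1 − exp(−(x/γ)^β), where x is the fraction of modules (ranked by LOC, largest first) and P(x) the fraction of defects they contain, the fitted Eclipse 3.0 pre-release parameters reproduce the top-10% figure from the earlier table:

```python
import math

# Weibull model of ranking ability: P(x) = 1 - exp(-(x/gamma)**beta)
def weibull_cdf(x, gamma, beta):
    return 1 - math.exp(-(x / gamma) ** beta)

# Eclipse 3.0 pre-release parameters from the table above
gamma, beta = 0.193, 0.780
print(round(weibull_cdf(0.10, gamma, beta), 2))  # → 0.45
```

The model predicts that the top 10% largest modules hold about 45% of the defects, matching the 46.28% observed for Eclipse 3.0 pre-release defects.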

The Implications of the Empirical Results 

The regularity of LOC distribution implies that:
• A small percentage of programs have high complexity (the small program phenomenon). This shows that, in practice, programmers do not adhere strictly to complexity thresholds.
• A small percentage of the most complex programs are responsible for a large number of defects, while a large number of less complex programs contain a small number of defects.
• By using LOC, we can quickly locate a large number of defects.

Predicting defect counts /1

To explore LOC's ability in defect prediction, we examine the defect density of the top k% largest programs (dd_k%):

dd_k% = (the number of defects the top k% largest modules contain) / (the total KLOC of the top k% largest modules)

[Alberg diagram: cumulative % defects vs. % modules, actual data vs. fitted Weibull curve, pre-release]

Predicting defect counts /2

As an example, we calculate the defect density values from the 10% largest programs, and then use the obtained values to predict the total number of defects in the system.

File Level                  Defect Density   #Actual defects   #Predicted defects   MRE
Pre-release defects   2.0   7.22             7635              5756                 24.61%
                      2.1   4.41             4975              4359                 12.39%
                      3.0   5.20             7422              6790                 8.51%
Post-release defects  2.0   2.03             1692              1615                 4.54%
                      2.1   0.98             1182              966                  18.29%
                      3.0   1.78             2679              2332                 12.95%
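The prediction scheme above can be sketched in a few lines: multiply the defect density (defects per KLOC) measured on the top 10% largest modules by the system's total size, then score the estimate with the magnitude of relative error (MRE). The total size used below (~797.2 KLOC for Eclipse 2.0) is not stated in the table; it is back-derived from it (5756 / 7.22):

```python
# Estimate total defects from a density measured on the largest modules
def predict_defects(density, total_kloc):
    return density * total_kloc

# Magnitude of relative error of a prediction
def mre(actual, predicted):
    return abs(actual - predicted) / actual

# Eclipse 2.0 pre-release: 7.22 defects/KLOC, ~797.2 KLOC (derived figure)
predicted = predict_defects(7.22, 797.2)
print(round(mre(7635, predicted), 4))  # → 0.2461, i.e. the 24.61% MRE in the table
```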

Predicting Defect-Prone Components /1

• Many attributes (such as complexity metrics, AST metrics, etc.) are used in existing defect prediction models.
• We propose a LOC-based method for predicting defective components. Classification models are built.

Technique              Classifier in WEKA
Neural Network         Multilayer Perceptron
Logistic Regression    Logistic
Naive Bayes            NaiveBayes
Decision Tree (C4.5)   J48
K-Star                 KStar

Classifying Defective Components /2

• Before training the prediction models, we first use a logarithmic filter to transform each data value n into its natural logarithm ln(n).
  • The transformation makes the data range narrower and makes it easier for classifiers to learn.
• We then construct the classification models using LOC data.
• We use 10-fold cross-validation to evaluate the classification models.
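The logarithmic filter is an elementwise natural log. A minimal sketch on hypothetical LOC values, showing how the transform compresses the heavy right tail before training:

```python
import math

# Logarithmic filter: replace each metric value n (n > 0) with ln(n)
loc_values = [12, 32, 64, 512, 4096]  # hypothetical module sizes
transformed = [round(math.log(n), 2) for n in loc_values]
print(transformed)  # → [2.48, 3.47, 4.16, 6.24, 8.32]
```

A 340x spread in raw LOC becomes roughly a 3x spread after the transform, which is what makes the skewed data easier for the classifiers to handle.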

Classifying Defective Components /3

To evaluate the prediction models, we use Recall, Precision, F-measure, and Accuracy:

Recall = TP / (TP + FN)
Precision = TP / (TP + FP)
F-measure = 2 × Recall × Precision / (Recall + Precision)
Accuracy = (TP + TN) / (TP + TN + FP + FN)

The values of Recall, Precision, F-measure and Accuracy are between 0 and 1, the higher the better.
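All four measures follow directly from the confusion-matrix counts. A sketch on a hypothetical matrix (the counts are illustrative, not from the study):

```python
# Recall, Precision, F-measure and Accuracy from TP/FP/TN/FN counts
def evaluate(tp, fp, tn, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return recall, precision, f_measure, accuracy

# Hypothetical counts: 100 truly defective, 100 truly clean components
r, p, f, a = evaluate(tp=86, fp=34, tn=66, fn=14)
print(round(r, 2), round(p, 2), round(f, 2), round(a, 2))  # → 0.86 0.72 0.78 0.76
```

Note the trade-off visible even here: a high recall (few defective components missed) coexists with a lower precision (some clean components flagged), the same pattern as in the Eclipse results on the next slide.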

Classifying Defective Components /4

The cross-validation results for the Eclipse 3.0 dataset:

                        Pre-release                                    Post-release
Classifier              Recall(%)  Precision(%)  F-measure  Acc(%)    Recall(%)  Precision(%)  F-measure  Acc(%)
Multilayer Perceptron   85.5       72.0          0.78       70.0      67.7       66.5          0.67       68.5
Logistic Regression     86.7       72.0          0.79       70.5      70.0       67.6          0.69       69.9
Naive Bayes             89.4       71.2          0.79       70.7      77.0       65.0          0.71       69.4
Decision Tree           88.9       71.7          0.79       71.0      71.2       62.8          0.67       66.4
K-Star                  87.5       71.6          0.79       70.3      68.7       67.0          0.68       69.1

Classifying Defective Components /5

For the Eclipse dataset, all classification models obtain good results:

For pre-release results:
• Recall values are above 85%
• Precision values are above 71%
• F-measures are about 0.79
• Acc values are about 70%

For post-release results:
• Recall ranges from 67% to 77%
• Precision ranges from 63% to 68%
• F-measure and Acc values are about 70%

Replication Study on NASA dataset

As a replication study, we experiment with the NASA IV&V Facility Metrics Data Program (MDP) repository.
• The data is collected from many NASA projects such as flight control, spacecraft instruments, storage management, and scientific data processing.
• The systems are developed in C/C++/Java.
• They are very different from the Eclipse system.

The NASA datasets contain software measurement data and associated defect data.

Results of Replication Study

The results of the replication study confirm our findings:
• The distributions of LOC and defects are highly skewed.
• There is a weak but positive relationship between LOC and defects.
• The distribution of defects follows a Weibull distribution when modules are ranked by LOC (R² from 0.948 to 0.998).
• A small percentage of the largest modules (e.g., the top 10%) contain a large percentage of defects (e.g., 51%–100%).
• We can predict defect counts based on defect density (e.g., MMRE = 16.94% based on the top 10% largest modules).
• We can predict defect-prone components based on LOC (with Recall 90.1%, Precision 60.3%, and F-measure 0.72).

Conclusion

We investigated the relationship between LOC and defects. Our experiments show that simple static code attributes such as LOC can be useful indicators of software quality.

Future work:
• Further analysis of the LOC-defect relationship
• Cross-project defect prediction
  • Can we use a model built from one project for a new project?
  • Within-company/cross-company prediction

Thank you!

Hongyu Zhang
Associate Professor
School of Software, Tsinghua University
Beijing 100084, China
Email: [email protected]
Web: http://thss.tsinghua.edu.cn/hongyu.htm
