Software defect prediction using relational association rule mining

2014 ◽  
Vol 264 ◽  
pp. 260-278 ◽  
Author(s):  
Gabriela Czibula ◽  
Zsuzsanna Marian ◽  
Istvan Gergely Czibula
Author(s):  
Bharavi Mishra ◽  
K.K. Shukla

Software defect prediction, if effective, enables developers to distribute their testing effort efficiently and focus on defect-prone modules. Testing all modules would be very resource-consuming when the defects lie in only a fraction of them. Information about the fault-proneness of classes and methods can be used to develop new strategies that help reduce overall development cost and increase customer satisfaction. Several machine learning strategies have been used in the recent past to identify defective modules. These models are built using publicly available historical software defect data sets. Most of the proposed techniques are not able to deal with the class imbalance problem efficiently. Therefore, it is necessary to develop a prediction model that consists of small, simple and comprehensible rules. Considering these facts, in this paper the authors propose a novel defect prediction approach named GUHA based Classification Association Rule Mining algorithm (G-CARM), where “GUHA” stands for General Unary Hypothesis Automaton. The G-CARM approach is primarily based on Classification Association Rule Mining and deploys a two-stage process involving attribute discretization and rule generation using GUHA. GUHA is one of the oldest, yet very powerful, methods of pattern mining. The basic idea of the GUHA procedure is to mine interesting attribute patterns that indicate defect-proneness. The new method has been compared against five other models reported in recent literature, viz. Naive Bayes, Support Vector Machine, RIPPER, J48 and Nearest Neighbour classifier, using several measures, including AUC and probability of detection. The experimental results indicate that the prediction performance of the G-CARM approach is better than that of the other prediction approaches.
The authors' approach achieved 76% mean recall and 83% mean precision for defective modules and 93% mean recall and 83% mean precision for non-defective modules on the CM1, KC1, KC2 and Eclipse data sets. Further, the defect rule generation process often generates a large number of rules, which require considerable effort to use as a defect predictor; hence, a rule subset selection process is also proposed to select the best set of rules according to the requirements. Evaluation criteria for defect prediction, such as sensitivity, specificity and precision, often compete against each other. It is therefore important to use multi-objective optimization algorithms for selecting prediction rules. In this paper the authors report prediction rules that are Pareto efficient, in the sense that no further improvement in the rule set is possible without sacrificing some performance criterion. The Non-Dominated Sorting Genetic Algorithm has been used to find the Pareto front and the defect prediction rules.
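
As a rough illustration of what "Pareto efficient" means for competing criteria such as sensitivity, specificity and precision, the sketch below filters a handful of candidate rule subsets down to the non-dominated ones. It is not the authors' implementation and it is not the Non-Dominated Sorting Genetic Algorithm itself; it only shows the dominance test such an algorithm relies on. The rule-set names and scores are made up for the example.

```python
from typing import Dict, List, Tuple

# (sensitivity, specificity, precision); higher is better for all three.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    )


# Hypothetical rule subsets with invented scores, for illustration only.
rule_sets: Dict[str, Scores] = {
    "A": (0.80, 0.70, 0.75),
    "B": (0.70, 0.85, 0.80),
    "C": (0.65, 0.70, 0.70),  # dominated by both A and B
    "D": (0.80, 0.70, 0.75),  # ties with A, so neither dominates the other
}

front = pareto_front(rule_sets)
print(front)
```

Candidates A and B trade sensitivity against specificity, so neither can replace the other without giving something up; that trade-off is why a single "best" rule set generally does not exist.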


Author(s):  
CHING-PAO CHANG

Reducing software defects is an essential activity for Software Process Improvement. The Action-Based Defect Prediction (ABDP) approach fragments the software process into actions and builds software defect prediction models using data collected from the execution of actions and from reported defects. Though the ABDP approach can be applied to predict possible defects in subsequent actions, the efficiency of corrections depends on the skill and knowledge of the stakeholders. To address this problem, this study proposes the Action Correction Recommendation (ACR) model, which provides recommendations for action correction using the Negative Association Rule mining technique. In addition to applying association rule mining to build a High Defect Prediction Model (HDPM) that identifies high-defect actions, the ACR builds a Low Defect Prediction Model (LDPM). For a submitted action, each HDPM rule used to predict the action as a high-defect action is analyzed together with the LDPM rules, using negative association rule mining, to spot the rule items with different characteristics in the HDPM and LDPM rules. This information not only identifies the attributes that require correction, but also provides a range (or a value) to facilitate the correction of high-defect actions. This study applies the ACR approach to a business software project to validate the efficiency of the proposed approach. The results show that the recommendations obtained can be applied to decrease software defect removal effort.
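
For readers unfamiliar with the underlying measures, the following minimal sketch computes support and confidence for a positive rule X → Y and the confidence of the corresponding negative rule X → ¬Y, using the standard identity conf(X → ¬Y) = 1 − conf(X → Y). It does not reproduce the ACR, HDPM or LDPM models; the attribute names and the four toy actions are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(
    transactions: List[Transaction],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """conf(X -> Y) = supp(X ∪ Y) / supp(X)."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    return support(transactions, antecedent | consequent) / antecedent_support


def negative_confidence(
    transactions: List[Transaction],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """conf(X -> not Y) = 1 - conf(X -> Y)."""
    return 1.0 - confidence(transactions, antecedent, consequent)


# Hypothetical actions; each is the set of attribute values observed for it.
actions: List[Transaction] = [
    frozenset({"complexity=high", "high_defect"}),
    frozenset({"complexity=high", "high_defect"}),
    frozenset({"complexity=high"}),
    frozenset({"complexity=low"}),
]

x = frozenset({"complexity=high"})
y = frozenset({"high_defect"})

conf_pos = confidence(actions, x, y)
conf_neg = negative_confidence(actions, x, y)
print(conf_pos, conf_neg)
```

In this toy data two of the three high-complexity actions are high-defect actions, so the positive rule has confidence 2/3 and the negative rule 1/3.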


2019 ◽  
Vol 5 ◽  
pp. e187 ◽  
Author(s):  
Rainer Niedermayr ◽  
Tobias Röhm ◽  
Stefan Wagner

Background. Test resources are usually limited and therefore it is often not possible to completely test an application before a release. To cope with the problem of scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios. Aims. We take an inverse view on defect prediction and aim to identify methods that can be deferred when testing because they contain hardly any faults due to their code being “trivial”. We expect that characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions. Method. We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk (LFR). We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level. Results. Our results show that inverse defect prediction can identify approx. 32–44% of the methods of a project to have a LFR; on average, they are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diversified training sets, identified methods are even 11 times less likely to contain a fault. Conclusions. Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be treated with less priority in testing activities and is well applicable in cross-project prediction scenarios.
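
The two figures reported in the results (the share of methods classified as low fault risk, and how many times less likely those methods are to contain a fault) can be computed as in the sketch below. The rule `is_low_fault_risk`, the metric names and the eight toy methods are all hypothetical; the study mines its rules from data rather than hand-writing them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Method:
    name: str
    sloc: int       # source lines of code
    branches: int   # number of branching statements
    faulty: bool    # whether a fault was recorded for this method


def is_low_fault_risk(m: Method) -> bool:
    """Hypothetical hand-written rule standing in for a mined association rule."""
    return m.sloc <= 3 and m.branches == 0


def fault_rate(methods: List[Method]) -> float:
    """Fraction of the given methods that are faulty."""
    return sum(m.faulty for m in methods) / len(methods)


# Invented data, for illustration only.
methods = [
    Method("getX", 1, 0, False),
    Method("setX", 1, 0, False),
    Method("isEmpty", 2, 0, False),
    Method("toString", 3, 0, True),
    Method("parse", 40, 9, True),
    Method("merge", 25, 4, True),
    Method("render", 30, 6, False),
    Method("validate", 12, 3, True),
]

lfr = [m for m in methods if is_low_fault_risk(m)]
other = [m for m in methods if not is_low_fault_risk(m)]

lfr_share = len(lfr) / len(methods)          # share of methods classified as LFR
ratio = fault_rate(other) / fault_rate(lfr)  # "times less likely to contain a fault"
print(lfr_share, ratio)
```

Note that the ratio is undefined when no LFR method is faulty, so a real evaluation would need to handle a zero fault rate explicitly.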


2018 ◽  
Author(s):  
Rainer Niedermayr ◽  
Tobias Röhm ◽  
Stefan Wagner

Background. Test resources are usually limited and therefore it is often not possible to completely test an application before a release. To cope with the problem of scarce resources, development teams can apply defect prediction to identify fault-prone code regions. However, defect prediction tends to have low precision in cross-project prediction scenarios. Aims. We take an inverse view on defect prediction and aim to identify methods that can be deferred when testing because they contain hardly any faults due to their code being "trivial". We expect that characteristics of such methods might be project-independent, so that our approach could improve cross-project predictions. Method. We compute code metrics and apply association rule mining to create rules for identifying methods with low fault risk. We conduct an empirical study to assess our approach with six Java open-source projects containing precise fault data at the method level. Results. Our results show that inverse defect prediction can identify approx. 32-44% of the methods of a project to have a low fault risk; on average, they are about six times less likely to contain a fault than other methods. In cross-project predictions with larger, more diversified training sets, identified methods are even eleven times less likely to contain a fault. Conclusions. Inverse defect prediction supports the efficient allocation of test resources by identifying methods that can be treated with less priority in testing activities and is well applicable in cross-project prediction scenarios.

