Software Defect Prediction Based on GUHA Data Mining Procedure and Multi-Objective Pareto Efficient Rule Selection

Author(s):  
Bharavi Mishra ◽  
K.K. Shukla

Software defect prediction, if is effective, enables the developers to distribute their testing efforts efficiently and let them focus on defect prone modules. It would be very resource consuming to test all the modules while the defect lies in fraction of modules. Information about fault-proneness of classes and methods can be used to develop new strategies which can help mitigate the overall development cost and increase the customer satisfaction. Several machine learning strategies have been used in recent past to identify defective modules. These models are built using publicly available historical software defect data sets. Most of the proposed techniques are not able to deal with the class imbalance problem efficiently. Therefore, it is necessary to develop a prediction model which consists of small simple and comprehensible rules. Considering these facts, in this paper, the authors propose a novel defect prediction approach named GUHA based Classification Association Rule Mining algorithm (G-CARM) where “GUHA” stands for General Unary Hypothesis Automaton. G-CARM approach is primarily based on Classification Association Rule Mining, and deploys a two stage process involving attribute discretization, and rule generation using GUHA. GUHA is oldest yet very powerful method of pattern mining. The basic idea of GUHA procedure is to mine the interesting attribute patterns that indicate defect proneness. The new method has been compared against five other models reported in recent literature viz. Naive Bayes, Support Vector Machine, RIPPER, J48 and Nearest Neighbour classifier by using several measures, including AUC and probability of detection. The experimental results indicate that the prediction performance of G-CARM approach is better than other prediction approaches. The authors' approach achieved 76% mean recall and 83% mean precision for defective modules and 93% mean recall and 83% mean precision for non-defective modules on CM1, KC1, KC2 and Eclipse data sets. Further defect rule generation process often generates a large number of rules which require considerable efforts while using these rules as a defect predictor, hence, a rule sub-set selection process is also proposed to select best set of rules according to the requirements. Evolution criteria for defect prediction like sensitivity, specificity, precision often compete against each other. It is therefore, important to use multi-objective optimization algorithms for selecting prediction rules. In this paper the authors report prediction rules that are Pareto efficient in the sense that no further improvements in the rule set is possible without sacrificing some performance criteria. Non-Dominated Sorting Genetic Algorithm has been used to find Pareto front and defect prediction rules.

2014 ◽  
Vol 264 ◽  
pp. 260-278 ◽  
Author(s):  
Gabriela Czibula ◽  
Zsuzsanna Marian ◽  
Istvan Gergely Czibula

Author(s):  
CHING-PAO CHANG

Reducing software defects is an essential activity for Software Process Improvement. The Action-Based Defect Prediction (ABDP) approach fragments the software process into actions, and builds software defect prediction models using data collected from the execution of actions and reported defects. Though the ABDP approach can be applied to predict possible defects in subsequent actions, the efficiency of corrections is dependent on the skill and knowledge of the stakeholders. To address this problem, this study proposes the Action Correction Recommendation (ACR) model to provide recommendations for action correction, using the Negative Association Rule mining technique. In addition to applying the association rule mining technique to build a High Defect Prediction Model (HDPM) to identify high defect action, the ACR builds a Low Defect Prediction Model (LDPM). For a submitted action, each HDPM rule used to predict the action as a high defect action and the LDPM rules are analyzed using negative association rule mining to spot the rule items with different characteristics in HDPM and LDPM rules. This information not only identifies the attributes required for corrections, but also provides a range (or a value) to facilitate the high defect action corrections. This study applies the ACR approach to a business software project to validate the efficiency of the proposed approach. The results show that the recommendations obtained can be applied to decrease software defect removal efforts.


A Data mining is the method of extracting useful information from various repositories such as Relational Database, Transaction database, spatial database, Temporal and Time-series database, Data Warehouses, World Wide Web. Various functionalities of Data mining include Characterization and Discrimination, Classification and prediction, Association Rule Mining, Cluster analysis, Evolutionary analysis. Association Rule mining is one of the most important techniques of Data Mining, that aims at extracting interesting relationships within the data. In this paper we study various Association Rule mining algorithms, also compare them by using synthetic data sets, and we provide the results obtained from the experimental analysis


Author(s):  
LIGUO YU ◽  
STEPHEN R. SCHACH

A software system evolves as changes are made to accommodate new features and repair defects. Software components are frequently interdependent, so changes made to one component can result in changes having to be made to other components to ensure that the system remains consistent; this is called change propagation. Accurate detection of change propagation is essential for software maintenance, which can be aided by accurate prediction of change propagation. In this paper, we study change propagation in three leading open-source software products: Linux, FreeBSD, and Apache HTTP Server. We use association rules-based data-mining techniques to detect change-propagation rules from the product version history. These rules are evaluated with respect to different training data sets and different test data sets. We discuss the applicability of using association-rule mining for change propagation, and several related issues. We find that a challenging issue in association-rule mining, concept drift, exists in software systems. Concept drift complicates the task of change-propagation prediction and requires special approaches, different from currently-used techniques for predicting change propagation.


2013 ◽  
Vol 327 ◽  
pp. 197-200
Author(s):  
Guo Fang Kuang ◽  
Ying Cun Cao

The material is used by humans to manufacture the machines, components, devices and other products of substances. Association rules originated in the field of data mining, people use it to find large amounts of data between itemsets of the association. Apriori is a breadth-first algorithm to obtain the support is greater than the minimum support of frequent itemsets by repeatedly scanning the database. This paper presents the construction of materials science and information model based on association rule mining. Experimental data sets prove that the proposed algorithm is effective and reasonable.


Author(s):  
K.GANESH KUMAR ◽  
H.VIGNESH RAMAMOORTHY ◽  
M.PREM KUMAR ◽  
S. SUDHA

Association rule mining (ARM) discovers correlations between different item sets in a transaction database. It provides important knowledge in business for decision makers. Association rule mining is an active data mining research area and most ARM algorithms cater to a centralized environment. Centralized data mining to discover useful patterns in distributed databases isn't always feasible because merging data sets from different sites incurs huge network communication costs. In this paper, an improved algorithm based on good performance level for data mining is being proposed. In local sites, it runs the application based on the improved LMatrix algorithm, which is used to calculate local support counts. Local Site also finds a center site to manage every message exchanged to obtain all globally frequent item sets. It also reduces the time of scan of partition database by using LMatrix which increases the performance of the algorithm. Therefore, the research is to develop a distributed algorithm for geographically distributed data sets that reduces communication costs, superior running efficiency, and stronger scalability than direct application of a sequential algorithm in distributed databases.


2020 ◽  
Author(s):  
Oguz Celik ◽  
Muruvvet Hasanbasoglu ◽  
Mehmet S. Aktas ◽  
Oya Kalipsiz

Sign in / Sign up

Export Citation Format

Share Document