Software Defect Prediction Based on GUHA Data Mining Procedure and Multi-Objective Pareto Efficient Rule Selection

Software defect prediction, when effective, lets developers distribute their testing effort efficiently and focus on defect-prone modules. Testing every module is very resource-consuming when the defects lie in only a fraction of them. Information about the fault-proneness of classes and methods can be used to develop new strategies that help reduce overall development cost and increase customer satisfaction. Several machine learning strategies have been used in the recent past to identify defective modules. These models are built using publicly available historical software defect data sets. Most of the proposed techniques are not able to deal with the class imbalance problem efficiently. Therefore, it is necessary to develop a prediction model that consists of small, simple, and comprehensible rules. Considering these facts, the authors propose in this paper a novel defect prediction approach named GUHA-based Classification Association Rule Mining (G-CARM), where “GUHA” stands for General Unary Hypothesis Automaton. The G-CARM approach is primarily based on Classification Association Rule Mining and deploys a two-stage process involving attribute discretization and rule generation using GUHA. GUHA is among the oldest methods of pattern mining, yet it remains very powerful. The basic idea of the GUHA procedure is to mine interesting attribute patterns that indicate defect-proneness. The new method has been compared against five other models reported in recent literature, viz. Naive Bayes, Support Vector Machine, RIPPER, J48, and the Nearest Neighbour classifier, using several measures, including AUC and probability of detection. The experimental results indicate that the prediction performance of the G-CARM approach is better than that of the other prediction approaches.
The authors' approach achieved 76% mean recall and 83% mean precision for defective modules and 93% mean recall and 83% mean precision for non-defective modules on the CM1, KC1, KC2, and Eclipse data sets. Further, the defect rule generation process often generates a large number of rules, which take considerable effort to use as a defect predictor; hence, a rule subset selection process is also proposed to select the best set of rules according to the requirements. Evaluation criteria for defect prediction, such as sensitivity, specificity, and precision, often compete against each other. It is therefore important to use multi-objective optimization algorithms for selecting prediction rules. In this paper the authors report prediction rules that are Pareto efficient, in the sense that no further improvement in the rule set is possible without sacrificing some performance criterion. The Non-Dominated Sorting Genetic Algorithm has been used to find the Pareto front and the defect prediction rules.
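
The notion of Pareto efficiency used in the abstract above can be illustrated with a minimal sketch. It only shows the non-dominated filtering step on individual rules, not the Non-Dominated Sorting Genetic Algorithm itself, and it is not the authors' implementation. The rule names and score values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List

# Hypothetical per-rule scores; the numbers are illustrative, not from the paper.
rules: Dict[str, Dict[str, float]] = {
    "r1": {"sensitivity": 0.80, "specificity": 0.70, "precision": 0.60},
    "r2": {"sensitivity": 0.70, "specificity": 0.90, "precision": 0.75},
    "r3": {"sensitivity": 0.65, "specificity": 0.85, "precision": 0.70},
    "r4": {"sensitivity": 0.80, "specificity": 0.70, "precision": 0.55},
}


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def pareto_front(scored: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the names of rules that no other rule dominates."""
    return sorted(
        name
        for name, score in scored.items()
        if not any(
            dominates(other, score)
            for other_name, other in scored.items()
            if other_name != name
        )
    )


front = pareto_front(rules)
print(front)  # ['r1', 'r2']
```

Here r3 is dominated by r2 and r4 by r1, so neither can be part of a Pareto-efficient selection; r1 and r2 each win on a different criterion and both remain.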

A novel software defect prediction based on atomic class-association rule mining

Expert Systems with Applications ◽

10.1016/j.eswa.2018.07.042 ◽

2018 ◽

Vol 114 ◽

pp. 237-254 ◽

Cited By ~ 9

Author(s):

Yuanxun Shao ◽

Bin Liu ◽

Shihai Wang ◽

Guoqi Li

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Defect Prediction ◽

Software Defect Prediction ◽

Rule Mining ◽

Software Defect ◽

Class Association Rule

Software defect prediction based on correlation weighted class association rule mining

Knowledge-Based Systems ◽

10.1016/j.knosys.2020.105742 ◽

2020 ◽

Vol 196 ◽

pp. 105742 ◽

Cited By ~ 2

Author(s):

Yuanxun Shao ◽

Bin Liu ◽

Shihai Wang ◽

Guoqi Li

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Defect Prediction ◽

Software Defect Prediction ◽

Rule Mining ◽

Software Defect ◽

Class Association Rule

Software defect prediction using relational association rule mining

Information Sciences ◽

10.1016/j.ins.2013.12.031 ◽

2014 ◽

Vol 264 ◽

pp. 260-278 ◽

Cited By ~ 64

Author(s):

Gabriela Czibula ◽

Zsuzsanna Marian ◽

Istvan Gergely Czibula

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Defect Prediction ◽

Software Defect Prediction ◽

Rule Mining ◽

Software Defect

INTEGRATING ACTION-BASED DEFECT PREDICTION TO PROVIDE RECOMMENDATIONS FOR DEFECT ACTION CORRECTION

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194013500022 ◽

2013 ◽

Vol 23 (02) ◽

pp. 147-172

Author(s):

CHING-PAO CHANG

Keyword(s):

Prediction Model ◽

Association Rule ◽

Association Rule Mining ◽

Software Process ◽

Negative Association ◽

Defect Prediction ◽

Rule Mining ◽

Mining Technique ◽

Software Defect ◽

Recommendations For Action

Reducing software defects is an essential activity for Software Process Improvement. The Action-Based Defect Prediction (ABDP) approach fragments the software process into actions and builds software defect prediction models using data collected from the execution of actions and from reported defects. Though the ABDP approach can be applied to predict possible defects in subsequent actions, the efficiency of corrections depends on the skill and knowledge of the stakeholders. To address this problem, this study proposes the Action Correction Recommendation (ACR) model, which provides recommendations for action correction using the negative association rule mining technique. In addition to applying association rule mining to build a High Defect Prediction Model (HDPM) that identifies high-defect actions, the ACR builds a Low Defect Prediction Model (LDPM). For a submitted action, each HDPM rule that predicts the action as a high-defect action is analyzed together with the LDPM rules, using negative association rule mining, to spot the rule items whose characteristics differ between HDPM and LDPM rules. This information not only identifies the attributes that require correction but also provides a range (or a value) to guide the correction of high-defect actions. This study applies the ACR approach to a business software project to validate the efficiency of the proposed approach. The results show that the recommendations obtained can be applied to decrease software defect removal effort.

Present State-of-The-Art of Association Rule Mining Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2202.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6398-6405

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

State Of The Art ◽

Synthetic Data ◽

Data Sets ◽

Evolutionary Analysis ◽

Rule Mining ◽

Transaction Database ◽

Mining Algorithms

Data mining is the process of extracting useful information from various repositories such as relational databases, transaction databases, spatial databases, temporal and time-series databases, data warehouses, and the World Wide Web. The functionalities of data mining include characterization and discrimination, classification and prediction, association rule mining, cluster analysis, and evolutionary analysis. Association rule mining is one of the most important data mining techniques; it aims to extract interesting relationships within the data. In this paper we study various association rule mining algorithms, compare them using synthetic data sets, and present the results of the experimental analysis.

Automatic Rule Generation of Fuzzy Systems: A Comparative Assessment on Software Defect Prediction

2018 3rd International Conference on Computer Science and Engineering (UBMK) ◽

10.1109/ubmk.2018.8566479 ◽

2018 ◽

Cited By ~ 1

Author(s):

Begum Mutlu ◽

Ebru A. Sezer ◽

M. Ali Akcayol

Keyword(s):

Fuzzy Systems ◽

Comparative Assessment ◽

Defect Prediction ◽

Software Defect Prediction ◽

Rule Generation ◽

Software Defect

APPLYING ASSOCIATION MINING TO CHANGE PROPAGATION

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194008004008 ◽

2008 ◽

Vol 18 (08) ◽

pp. 1043-1061 ◽

Cited By ~ 2

Author(s):

LIGUO YU ◽

STEPHEN R. SCHACH

Keyword(s):

Association Rule ◽

Software Maintenance ◽

Association Rule Mining ◽

Concept Drift ◽

Training Data ◽

Software Systems ◽

Association Mining ◽

Change Propagation ◽

Data Sets ◽

Rule Mining

A software system evolves as changes are made to accommodate new features and repair defects. Software components are frequently interdependent, so changes made to one component can result in changes having to be made to other components to ensure that the system remains consistent; this is called change propagation. Accurate detection of change propagation is essential for software maintenance, which can be aided by accurate prediction of change propagation. In this paper, we study change propagation in three leading open-source software products: Linux, FreeBSD, and Apache HTTP Server. We use association rules-based data-mining techniques to detect change-propagation rules from the product version history. These rules are evaluated with respect to different training data sets and different test data sets. We discuss the applicability of using association-rule mining for change propagation, and several related issues. We find that a challenging issue in association-rule mining, concept drift, exists in software systems. Concept drift complicates the task of change-propagation prediction and requires special approaches, different from currently-used techniques for predicting change propagation.
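
As a rough illustration of how change-propagation rules might be mined from version history, the sketch below derives pairwise rules of the form "if x changes, y changes too" from sets of files modified together. It is not the method used in the paper: the commit data, the file names, the restriction to pairs, and the two thresholds are all assumptions made for the example.

```python
from collections import Counter
from itertools import permutations

# Hypothetical change sets: each set lists files modified together in one commit.
commits = [
    {"a.c", "a.h"},
    {"a.c", "a.h", "b.c"},
    {"a.c", "b.c"},
    {"a.h"},
]


def co_change_rules(history, min_support=2, min_confidence=0.6):
    """Mine rules (x, y) meaning 'a change to x propagates to y'.

    support    = number of commits that changed both x and y
    confidence = support / number of commits that changed x
    """
    single = Counter()
    pair = Counter()
    for files in history:
        single.update(files)
        # Ordered pairs (x, y) with x != y; a one-file commit yields none.
        pair.update(permutations(sorted(files), 2))
    found = {}
    for (x, y), support in pair.items():
        confidence = support / single[x]
        if support >= min_support and confidence >= min_confidence:
            found[(x, y)] = confidence
    return found


rules = co_change_rules(commits)
for (x, y), confidence in sorted(rules.items()):
    print(f"{x} -> {y} (confidence {confidence:.2f})")
```

To probe for concept drift in the sense described above, one could mine rules from an early slice of the history and check how often they still hold on a later slice; a falling hit rate would suggest that the co-change patterns have shifted.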

The Novel Model of Construct Materials Science and Information Based on Association Rule Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.327.197 ◽

2013 ◽

Vol 327 ◽

pp. 197-200

Author(s):

Guo Fang Kuang ◽

Ying Cun Cao

Keyword(s):

Experimental Data ◽

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Materials Science ◽

Frequent Itemsets ◽

Data Sets ◽

The Novel ◽

Rule Mining ◽

Novel Model

Materials are the substances humans use to manufacture machines, components, devices, and other products. Association rules originated in the field of data mining, where they are used to find associations between itemsets in large amounts of data. Apriori is a breadth-first algorithm that finds the frequent itemsets, i.e. those whose support exceeds the minimum support, by repeatedly scanning the database. This paper presents the construction of a materials science and information model based on association rule mining. Experiments on data sets show that the proposed algorithm is effective and reasonable.
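
The level-wise search that Apriori performs can be sketched as follows. This is a minimal textbook-style version and not the model proposed in the paper. The transactions and the threshold are made up, and the sketch counts an itemset as frequent when its support count is greater than or equal to `min_support`.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One database scan per level to count the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c
            for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions, for illustration only.
data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
frequent = apriori(data, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this data every single item and every pair is frequent, while {A, B, C} appears in only two transactions and is dropped at the third level.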

AN OPTIMIZED ARM SCHEME FOR DISTINCT NETWORK DATA SET

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2015.1302 ◽

2015 ◽

pp. 191-195

Author(s):

K.GANESH KUMAR ◽

H.VIGNESH RAMAMOORTHY ◽

M.PREM KUMAR ◽

S. SUDHA

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

Distributed Databases ◽

Research Area ◽

Sequential Algorithm ◽

Data Sets ◽

Rule Mining ◽

Data Set ◽

Communication Costs

Association rule mining (ARM) discovers correlations between different item sets in a transaction database. It provides important knowledge for business decision makers. Association rule mining is an active data mining research area, and most ARM algorithms cater to a centralized environment. Centralized data mining to discover useful patterns in distributed databases isn't always feasible, because merging data sets from different sites incurs huge network communication costs. This paper proposes an improved data mining algorithm with a good level of performance. At each local site, it runs an application based on the improved LMatrix algorithm, which is used to calculate local support counts. Each local site also finds a center site that manages every message exchanged to obtain all globally frequent item sets. Using LMatrix also reduces the scan time of the partitioned database, which improves the performance of the algorithm. The aim of the research is therefore to develop a distributed algorithm for geographically distributed data sets that has lower communication costs, superior running efficiency, and stronger scalability than direct application of a sequential algorithm in distributed databases.
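
The general idea of exchanging support counts instead of raw transactions can be sketched as below. This is not the LMatrix algorithm, whose details the abstract does not give. The site data, the function names, and the cap on itemset size are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Run at each site: count every itemset up to max_size in the local partition."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def global_frequent(site_counts, min_support):
    """Run at the center site: sum the local counts and keep globally frequent itemsets."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical partitions held at two sites.
site_a = [{"x", "y"}, {"x", "z"}]
site_b = [{"x", "y"}, {"y"}]

result = global_frequent([local_counts(site_a), local_counts(site_b)], min_support=3)
print(result)  # {('x',): 3, ('y',): 3}
```

Neither `x` nor `y` reaches the threshold at a single site, so the globally frequent itemsets only emerge once the counts are combined. Only the count tables cross the network, which is the communication saving the abstract refers to.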

Association Rule Mining on Big Data Sets

10.5772/intechopen.91478 ◽

2020 ◽

Author(s):

Oguz Celik ◽

Muruvvet Hasanbasoglu ◽

Mehmet S. Aktas ◽

Oya Kalipsiz

Keyword(s):

Big Data ◽

Association Rule ◽

Association Rule Mining ◽

Data Sets ◽

Rule Mining
