Improved point center algorithm for K-Means clustering to increase software defect prediction

K-means is a widely used clustering algorithm that is easy to apply. However, it is sensitive to its randomly chosen initial centroids, so it may fail to produce optimal results. This research aimed to improve the performance of k-means by applying a proposed algorithm called point center. The proposed algorithm replaces the random initial centroid values in k-means and was then applied to predict errors in software defect modules. The point center algorithm determines the initial centroid values for k-means optimization; the selection of X and Y variables then determines the cluster center members. Ten datasets were used for testing, nine of which were used for predicting software defects. The proposed point center algorithm showed the lowest errors. It also improved the performance of k-means by an average of 12.82% in cluster errors compared with the simple k-means algorithm using randomly obtained centroid values. The findings are beneficial and contribute to developing clustering models for tasks such as predicting software defect modules more accurately.
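
The sensitivity of k-means to its starting centroids can be illustrated with a minimal sketch. This is plain Lloyd's k-means that accepts the initial centroids as an argument; it is not the paper's point center algorithm, whose details the abstract does not give. The function names `kmeans` and `sse` and the toy points are assumptions made for illustration.

```python
import math

def kmeans(points, centroids, iterations=100):
    """Lloyd's k-means starting from caller-supplied initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

# Hypothetical toy data: two tight pairs of points, far apart on the x-axis.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])   # informed starting centroids
poor = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])   # unlucky starting centroids
print(sse(points, *good), sse(points, *poor))
```

On this toy data the informed start converges with a sum of squared errors of 1.0, while the unlucky start stalls in a local optimum at 16.0, which is the kind of gap a deterministic initialization aims to avoid.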

Software Defect Prediction Using AWEIG+ADACOST Bayesian Algorithm for Handling High Dimensional Data and Class Imbalance Problem

International Journal of Information Technology and Business ◽

10.24246/ijiteb.112018.36-41 ◽

2018 ◽

Vol 1 (1) ◽

pp. 36-41

Author(s):

Joko Suntoro ◽

Febrian Wahyu Christanto ◽

Henny Indriyawati

Keyword(s):

High Dimensional Data ◽

Naïve Bayes ◽

High Dimensional ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defects ◽

Bayesian Algorithm ◽

Software Defect ◽

Bayes Algorithm

Software defect prediction is the most important part of software engineering. It is defined as the process of predicting errors, failures, and system errors in software. Researchers use machine learning methods to predict software defects, including estimation, association, classification, clustering, and dataset analysis. The NASA Metrics Data Program (NASA MDP) datasets are among the software metrics datasets that researchers use to predict software defects. NASA MDP datasets contain unbalanced classes and high dimensional data, which lowers classification evaluation results. In this research, unbalanced classes are handled with the AdaCost method and high dimensional data is handled with the Average Weight Information Gain (AWEIG) method, while the Naïve Bayes algorithm is used for classification. The proposed method is named AWEIG + AdaCost Bayesian. In the experiment, the AWEIG + AdaCost Bayesian algorithm is compared with the Naïve Bayes algorithm. The results showed that AWEIG + AdaCost Bayesian achieved a better mean Area Under the Curve (AUC) than Naïve Bayes alone, with mean AUC values of 0.752 and 0.696, respectively.
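
For readers unfamiliar with the AUC metric reported above, the following minimal sketch computes it as the probability that a randomly chosen defective module receives a higher score than a randomly chosen clean one (the rank-based definition). The function name `auc` and the toy labels and scores are assumptions for illustration; they are not data from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for defective, 0 for non-defective.
    scores: classifier scores, higher meaning more likely defective.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four modules, two of them defective.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which makes the metric less sensitive to class imbalance than plain accuracy.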

Analyzing Software Defect Prediction Using K-Means and Expectation Maximization Clustering Algorithm Based On Genetic Feature Selection

i-manager’s Journal on Software Engineering ◽

10.26634/jse.11.1.8194 ◽

2016 ◽

Vol 11 (1) ◽

pp. 28

Author(s):

REENA R. ◽

SELVI R. THIRUMALAI

Keyword(s):

Feature Selection ◽

Expectation Maximization ◽

Clustering Algorithm ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Genetic Feature ◽

Genetic Feature Selection

Incremental Feature Selection Method for Software Defect Prediction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1252.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 1345-1353 ◽

Cited By ~ 1

Keyword(s):

Feature Selection ◽

Software Metrics ◽

Prediction Models ◽

Search Algorithm ◽

Feature Selection Method ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Defect Prediction Models ◽

Selection Of

Software defect prediction models are essential for understanding the quality attributes that software organizations need to deliver better software reliability. This paper focuses mainly on the selection of attributes from the perspective of software quality estimation for incremental databases. A new dimensionality reduction method, Wilk’s Lambda Average Threshold (WLAT), is presented for selecting the optimal features used to classify modules as fault prone or not. The paper uses software metrics and defect data collected from benchmark data sets. The comparative results confirm that the statistical search algorithm (WLAT) outperforms the other relevant feature selection methods for most classifiers. The main advantage of the proposed WLAT method is that the selected features can be reused when the database size increases or decreases, without extracting features afresh. In addition, the performance of the defect prediction models either remains unchanged or improves even after eliminating 85% of the software metrics.
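
As a rough illustration of the statistic behind the method's name, the sketch below computes the univariate Wilks' lambda of each feature (within-class sum of squares divided by total sum of squares; smaller values mean the feature separates the classes better). The selection rule shown, keeping features whose lambda does not exceed the average lambda, is only an assumption about what an "average threshold" could mean; the abstract does not specify the actual WLAT procedure. The feature names `loc` and `noise` are hypothetical.

```python
def wilks_lambda(values, classes):
    """Univariate Wilks' lambda: within-class SS divided by total SS."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(c, []).append(v)
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values()
    )
    # A constant feature carries no class information; report lambda = 1.
    return ss_within / ss_total if ss_total else 1.0

def select_by_average_threshold(features, classes):
    """ASSUMED rule: keep features whose lambda is at or below the mean lambda."""
    lambdas = {name: wilks_lambda(col, classes) for name, col in features.items()}
    threshold = sum(lambdas.values()) / len(lambdas)
    selected = [name for name, lam in lambdas.items() if lam <= threshold]
    return selected, lambdas

# Hypothetical metrics for four modules; class 1 marks fault-prone modules.
classes = [0, 0, 1, 1]
features = {
    "loc": [1.0, 2.0, 9.0, 10.0],    # separates the two classes well
    "noise": [1.0, 9.0, 2.0, 10.0],  # nearly unrelated to the class
}
selected, lambdas = select_by_average_threshold(features, classes)
print(selected, lambdas)
```

Here `loc` has a lambda of about 0.015 and `noise` about 0.985, so only `loc` falls under the average and is kept.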

An optimized feature selection using fuzzy mutual information based ant colony optimization for software defect prediction

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.1.9954 ◽

2017 ◽

Vol 7 (1.1) ◽

pp. 456

Author(s):

G Manivasagam ◽

R Gunasundari

Keyword(s):

Feature Selection ◽

Software Engineering ◽

Mutual Information ◽

Ant Colony Optimization ◽

Optimization Technique ◽

Ant Colony ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defects ◽

Software Defect

In recent years, significant attention has been directed towards the prediction of software defects in the field of software engineering. Predicting software defects assists in reducing the cost of testing effort, improving the software testing process, and concentrating only on the fault-prone software modules. As a result, software defect prediction has become an important research topic in the software engineering field. One of the important factors that affect software defect detection is the presence of noisy features in the dataset. The objective of this proposed work is to contribute an optimization technique for selecting potential features so that software defects can be predicted more accurately. Fuzzy Mutual Information Ant Colony Optimization is used to search for the optimal feature set, drawing on its metaheuristic search ability. The efficiency of the proposed feature selection is evaluated using datasets from the NASA metric data repository. Simulation results indicate that the proposed method yields an impressive enhancement in prediction for the three different classifiers used in this work.

Discriminating features-based cost-sensitive approach for software defect prediction

Automated Software Engineering ◽

10.1007/s10515-021-00289-8 ◽

2021 ◽

Vol 28 (2) ◽

Author(s):

Aftab Ali ◽

Naveed Khan ◽

Mamun Abu-Tair ◽

Joost Noppen ◽

Sally McClean ◽

...

Keyword(s):

Prediction Models ◽

Source Code ◽

Area Under The Curve ◽

Software Component ◽

Defect Prediction ◽

Unbalanced Data ◽

Software System ◽

Software Defect Prediction ◽

Software Defect ◽

Selection Of

Correlated quality metrics extracted from a source code repository can be utilized to design a model that automatically predicts defects in a software system. The extracted metrics will result in highly unbalanced data, since the number of defects in a good quality software system should be far less than the number of normal instances. The selection of the best discriminating features also significantly improves the robustness and accuracy of a prediction model. Therefore, the contribution of this paper is twofold. First, it selects the best discriminating features that help in accurately predicting a defect in a software component. Second, cost-sensitive logistic regression and decision tree ensemble-based prediction models are applied to the best discriminating features for precisely predicting a defect in a software component. The proposed models are compared with the most recent schemes in the literature in terms of accuracy, area under the curve, and recall. The models are evaluated using 11 datasets, and the results and analysis show that the proposed prediction models outperform the schemes in the literature.
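
One common way to make logistic regression cost-sensitive is to weight the log loss so that missing a defect costs more than raising a false alarm. The sketch below shows only that weighted loss; it is an illustration under assumed costs, not the formulation used in the paper, which the abstract does not describe. The name `weighted_log_loss` and the default costs `cost_fn=5.0` and `cost_fp=1.0` are hypothetical.

```python
import math

def weighted_log_loss(labels, probs, cost_fn=5.0, cost_fp=1.0):
    """Mean log loss with separate costs for the two error types.

    labels: 1 for defective, 0 for normal.
    probs: predicted probability of the defective class.
    cost_fn: ASSUMED cost multiplier for errors on defective modules.
    cost_fp: ASSUMED cost multiplier for errors on normal modules.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        if y == 1:
            total += -cost_fn * math.log(p)
        else:
            total += -cost_fp * math.log(1.0 - p)
    return total / len(labels)

# An equally confident mistake is penalised more when the module is defective.
missed_defect = weighted_log_loss([1], [0.1])
false_alarm = weighted_log_loss([0], [0.9])
print(missed_defect, false_alarm)
```

Minimising a loss of this shape pushes the model towards higher recall on the rare defective class, at the price of more false alarms.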

Towards Predicting Software Defects with Clustering Techniques

International Journal of Artificial Intelligence & Applications ◽

10.5121/ijaia.2021.12103 ◽

2021 ◽

Vol 12 (1) ◽

pp. 39-54

Author(s):

Waheeda Almayyan

Keyword(s):

Feature Selection ◽

Predictive Model ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Grey Wolf Optimizer ◽

Software Defect Prediction ◽

Software Defects ◽

Particle Swarm Optimizer ◽

Clustering Techniques ◽

Software Defect

The purpose of software defect prediction is to improve the quality of a software project by building a predictive model that decides whether or not a software module is fault prone. In recent years, much research has applied machine learning techniques to this topic. Our aim was to evaluate the performance of clustering techniques with feature selection schemes in addressing the software defect prediction problem. We analysed the National Aeronautics and Space Administration (NASA) dataset benchmarks using three clustering algorithms: (1) Farthest First, (2) X-Means, and (3) self-organizing map (SOM). To evaluate different feature selection algorithms, this article presents a comparative analysis of software defect prediction based on Bat, Cuckoo, Grey Wolf Optimizer (GWO), and particle swarm optimizer (PSO). The results obtained with the proposed clustering models enabled us to build an efficient predictive model with a satisfactory detection rate and an acceptable number of features.
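
Of the three clustering algorithms named above, Farthest First is the simplest to sketch: it picks a starting center, then repeatedly adds the point that lies farthest from all centers chosen so far, and finally assigns each point to its nearest center. The version below is a minimal illustration, not the implementation evaluated in the paper; starting from index 0 is an assumption (implementations often pick the first center at random), and the names `farthest_first_centers` and `assign` are hypothetical.

```python
import math

def farthest_first_centers(points, k, first=0):
    """Choose k centers by farthest-first traversal."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[first]]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]

# Hypothetical 2-D data: two pairs of nearby points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (5.0, 8.0)]
centers = farthest_first_centers(points, k=3)
print(centers, assign(points, centers))
```

Because each new center is the current farthest point, a single pass over the data per center is enough, which keeps the method fast compared with iterative schemes such as k-means.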
