Improved point center algorithm for K-Means clustering to increase software defect prediction

Author(s):  
Riski Annisa ◽  
Didi Rosiyadi ◽  
Dwiza Riana

K-means is a widely used clustering algorithm that is easy to apply. However, it is sensitive to its randomly chosen initial centroids, so it may fail to produce optimal results. This research aimed to improve the performance of k-means by applying a proposed algorithm called point center. The proposed algorithm replaces the random initial centroid values in k-means and was then applied to predict errors in defective software modules. The point center algorithm determines the initial centroid values used to optimize k-means; the selection of the X and Y variables then determines the members of each cluster center. Ten datasets were used for testing, nine of which were used for predicting software defects. The proposed point center algorithm showed the lowest errors. It also improved the performance of k-means by an average of 12.82% in cluster errors compared with the simple k-means algorithm using randomly obtained centroid values. The findings contribute to developing clustering models for handling data, for example to predict defective software modules more accurately.
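The abstract does not spell out the point center rule, so the sketch below only illustrates the general idea of replacing random initial centroids with deterministic ones. The function `sorted_chunk_init` is a hypothetical stand-in chosen for illustration and is not the authors' algorithm; it assumes 2-D points (X and Y variables) and `k` no larger than the number of points.

```python
import random


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means on 2-D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


def random_init(points, k, seed=None):
    """Baseline: pick k distinct points at random as initial centroids."""
    return random.Random(seed).sample(points, k)


def sorted_chunk_init(points, k):
    """Hypothetical deterministic initializer (NOT the paper's point center rule):
    sort points by x + y, split into k contiguous chunks, use each chunk's mean."""
    ordered = sorted(points, key=lambda p: p[0] + p[1])
    n = len(ordered)
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in chunks]


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum((p[0] - centroids[lab][0]) ** 2 + (p[1] - centroids[lab][1]) ** 2
               for p, lab in zip(points, labels))
```

Running `kmeans` once with `random_init` and once with `sorted_chunk_init` on the same data, then comparing `sse`, gives a rough feel for how much the starting centroids can matter.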

Author(s):  
Joko Suntoro ◽  
Febrian Wahyu Christanto ◽  
Henny Indriyawati

Software defect prediction is one of the most important parts of software engineering. It is defined as the process of predicting errors, failures, and system errors in software. Researchers use machine learning methods to predict software defects, including estimation, association, classification, clustering, and dataset analysis. The NASA Metrics Data Program (NASA MDP) datasets are among the software metrics datasets that researchers use to predict software defects. NASA MDP datasets contain unbalanced classes and high-dimensional data, which lower classification evaluation results. In this research, the class imbalance is handled with the AdaCost method and the high dimensionality with the Average Weight Information Gain (AWEIG) method, while the Naïve Bayes algorithm is used for classification. The proposed method is named AWEIG + AdaCost Bayesian. In the experiment, AWEIG + AdaCost Bayesian is compared with the Naïve Bayes algorithm. The results show that AWEIG + AdaCost Bayesian achieves a higher mean Area Under the Curve (AUC) than Naïve Bayes alone, with mean AUC values of 0.752 and 0.696, respectively.
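Since the comparison above is reported in terms of AUC, a small sketch of how AUC can be computed from classifier scores may help. It uses the pairwise-ranking definition (the probability that a randomly chosen defective module scores higher than a randomly chosen clean one); the function name and the 0/1 label encoding are assumptions made for illustration and are not taken from the paper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for defective, 0 for non-defective (assumed encoding).
    scores: classifier scores, higher meaning "more likely defective".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This quadratic-time version is fine for small examples; larger datasets would normally use a rank-based computation instead.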


2019 ◽  
Vol 8 (2S3) ◽  
pp. 1345-1353 ◽  

Software defect prediction models are essential for understanding the quality attributes that help software organizations deliver more reliable software. This paper focuses mainly on attribute selection from the perspective of software quality estimation for incremental databases. A new dimensionality reduction method, Wilk’s Lambda Average Threshold (WLAT), is presented for selecting the optimal features used to classify modules as fault-prone or not. The paper uses software metrics and defect data collected from benchmark datasets. The comparative results confirm that the statistical search algorithm (WLAT) outperforms the other relevant feature selection methods for most classifiers. The main advantage of the proposed WLAT method is that the selected features can be reused when the database size increases or decreases, without extracting features afresh. In addition, the performance of the defect prediction models either remains unchanged or improves even after eliminating 85% of the software metrics.
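The abstract does not give the exact WLAT procedure. As a rough illustration, the sketch below computes the standard univariate Wilks' lambda for each metric (within-class sum of squares divided by total sum of squares, where smaller values indicate better class separation) and keeps the metrics whose lambda falls below the average. That "below the average" selection rule, and the function names, are assumptions made for this sketch.

```python
def wilks_lambda(values, labels):
    """Univariate Wilks' lambda: within-class SS / total SS (range 0..1)."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 1.0  # a constant feature cannot separate the classes
    within_ss = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(group) / len(group)
        within_ss += sum((v - mean) ** 2 for v in group)
    return within_ss / total_ss


def select_below_average(features, labels):
    """Assumed rule: keep features whose lambda is below the mean lambda.

    features: dict mapping a metric name to its list of values per module.
    Returns the selected names and the lambda of every feature.
    """
    lambdas = {name: wilks_lambda(col, labels) for name, col in features.items()}
    threshold = sum(lambdas.values()) / len(lambdas)
    selected = [name for name, lam in lambdas.items() if lam < threshold]
    return selected, lambdas
```

Because each lambda depends only on one metric and the class labels, the scores can be recomputed cheaply when modules are added or removed.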


2017 ◽  
Vol 7 (1.1) ◽  
pp. 456
Author(s):  
G Manivasagam ◽  
R Gunasundari

In recent years, software defect prediction has attracted significant attention and has become an important research topic in software engineering. Predicting software defects helps reduce the cost of testing effort, improve the software testing process, and concentrate effort on the fault-prone software modules. One important factor that affects software defect detection is the presence of noisy features in the dataset. The objective of this work is to contribute an optimization technique for selecting potential features so that software defects are predicted more accurately. Fuzzy Mutual Information Ant Colony Optimization is used to search for the optimal feature set with metaheuristic search. The efficiency of the proposed feature selection is evaluated using datasets from the NASA metric data repository. Simulation results indicate that the proposed method substantially improves prediction performance for the three classifiers used in this work.
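To make the scoring idea concrete, the sketch below computes plain (crisp) mutual information between one discretized feature and the defect label. It is a simplified stand-in: the fuzzy variant and the ant colony search described above are not reproduced here, and the function name is an assumption made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )
```

A score like this could serve as the heuristic that guides a search over feature subsets: a feature that determines the label exactly scores the label's full entropy, while an independent feature scores zero.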


2021 ◽  
Vol 28 (2) ◽  
Author(s):  
Aftab Ali ◽  
Naveed Khan ◽  
Mamun Abu-Tair ◽  
Joost Noppen ◽  
Sally McClean ◽  
...  

Correlated quality metrics extracted from a source code repository can be used to design a model that automatically predicts defects in a software system. The extracted metrics typically yield highly unbalanced data, since the number of defects in a good-quality software system should be far smaller than the number of normal instances. Selecting the best discriminating features also significantly improves the robustness and accuracy of a prediction model. The contribution of this paper is therefore twofold. First, it selects the best discriminating features that help to accurately predict a defect in a software component. Second, cost-sensitive logistic regression and decision tree ensemble-based prediction models are applied to these features to precisely predict a defect in a software component. The proposed models are compared with the most recent schemes in the literature in terms of accuracy, area under the curve, and recall. The models are evaluated on 11 datasets, and the results and analysis show that the proposed prediction models outperform the schemes in the literature.
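One common way to make logistic regression cost-sensitive is to weight each example's loss by a per-class misclassification cost, which may or may not match the formulation used in the paper. The sketch below does this with plain batch gradient descent; the parameter names `cost_pos` and `cost_neg`, the learning rate, and the epoch count are illustrative assumptions.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, cost_pos=1.0, cost_neg=1.0, lr=0.1, epochs=2000):
    """Logistic regression trained on a class-weighted log loss.

    X: list of feature lists; y: list of 0/1 labels (1 = defective, assumed).
    cost_pos / cost_neg scale the loss of positive / negative examples.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            cost = cost_pos if yi == 1 else cost_neg
            err = cost * (p - yi)  # gradient of the weighted log loss w.r.t. z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the dataset and take one descent step.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Raising `cost_pos` above `cost_neg` pushes predicted probabilities for the rare defective class upward, trading some precision for recall on unbalanced data.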


Author(s):  
Waheeda Almayyan

The purpose of software defect prediction is to improve the quality of a software project by building a predictive model that decides whether a software module is fault-prone. In recent years, much research has been performed on applying machine learning techniques to this topic. Our aim was to evaluate the performance of clustering techniques combined with feature selection schemes in addressing the software defect prediction problem. We analysed the National Aeronautics and Space Administration (NASA) benchmark datasets using three clustering algorithms: (1) Farthest First, (2) X-Means, and (3) self-organizing map (SOM). To evaluate different feature selection algorithms, this article presents a comparative analysis of software defect prediction based on Bat, Cuckoo, Grey Wolf Optimizer (GWO), and particle swarm optimizer (PSO). The results obtained with the proposed clustering models enabled us to build an efficient predictive model with a satisfactory detection rate and an acceptable number of features.
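Of the three clustering algorithms named above, Farthest First is the simplest to sketch: each new center is the point farthest from all centers chosen so far, and every point is then assigned to its nearest center. The version below is a minimal illustration; the fixed `start` index is an assumption (implementations often pick the first center at random), and it requires Python 3.8+ for `math.dist`.

```python
from math import dist


def farthest_first_centers(points, k, start=0):
    """Pick k centers by farthest-first traversal, beginning at points[start]."""
    centers = [points[start]]
    # Distance from each point to its nearest chosen center so far.
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]
```

Unlike k-means, this procedure makes a single pass per center and never moves a center once chosen, which keeps it fast but sensitive to outliers.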


2011 ◽  
Vol 34 (6) ◽  
pp. 1148-1154 ◽  
Author(s):  
Hui-Yan JIANG ◽  
Mao ZONG ◽  
Xiang-Ying LIU

2019 ◽  
Vol 28 (5) ◽  
pp. 925-932
Author(s):  
Hua WEI ◽  
Chun SHAN ◽  
Changzhen HU ◽  
Yu ZHANG ◽  
Xiao YU
