Research on Software Defect Prediction Method Based on Machine Learning

2014 ◽  
Vol 687-691 ◽  
pp. 2182-2185 ◽  
Author(s):  
Wei Zhang ◽  
Zhen Yu Ma ◽  
Qing Ling Lu ◽  
Xiao Bing Nie ◽  
Juan Liu

This paper analyzes 44 metrics at the application, file, class and function levels and correlates them with the number of software defects and with defect density. The results show that software metrics have little correlation with the number of software defects but are correlated with defect density. Through correlation analysis, we selected the five metrics most strongly correlated with defect density. On the basis of this feature selection, we predicted defect density for 33 actual software projects with 16 machine learning models. The Spearman rank correlation coefficient (SRCC) between the predicted and the actual defect density is 0.6727 for the SVR model, higher than for the other 15 models. The model with the second-largest absolute SRCC is IBk, at only -0.3557. These results indicate that the SVR-based method has the highest prediction accuracy.
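The correlation-based feature selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`ranks`, `spearman`, `top_k_metrics`) and the toy metric values are assumptions introduced here, and the abstract does not say how ties were handled (average ranks are assumed).

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def top_k_metrics(metrics, target, k):
    """Return the k metric names with the largest absolute SRCC against target."""
    scored = {name: spearman(vals, target) for name, vals in metrics.items()}
    return sorted(scored, key=lambda name: abs(scored[name]), reverse=True)[:k]


# Toy data (hypothetical): three metrics measured on four projects.
toy_metrics = {"a": [1, 2, 3, 4], "b": [1, 3, 2, 4], "c": [2, 1, 4, 3]}
toy_density = [1, 2, 3, 4]
selected = top_k_metrics(toy_metrics, toy_density, k=2)
```

The same `spearman` function can score a model by comparing its predicted defect densities with the actual ones, which is how the abstract reports SRCC for each of the 16 models.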

2019 ◽  
Vol 14 (2) ◽  
pp. 97-106
Author(s):  
Ning Yan ◽  
Oliver Tat-Sheung Au

Purpose: The purpose of this paper is to analyze the correlation between students' online learning behavior features and course grade, and to attempt to build effective prediction models based on limited data. Design/methodology/approach: The prediction label in this paper is the students' course grade, and the available features are student age, student gender, connection time, hits count and days of access. The machine learning model used is a classical three-layer feedforward neural network trained with the scaled conjugate gradient algorithm. Pearson correlation analysis is used to find the relationships between course grade and the student features. Findings: Days of access has the highest correlation with course grade, followed by hits count; connection time is less relevant to students' course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models, such as the ANN model in this paper. Originality/value: This paper may help teachers find clues to identify students with learning difficulties in advance and give timely help, using online learning behavior data. It shows that acceptable prediction models based on machine learning can be built from a small and limited data set. However, introducing external data into machine learning models to improve their prediction accuracy remains a valuable and difficult problem.
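The normalization and discretization steps mentioned in the findings can be sketched as below. This is an illustrative assumption rather than the paper's preprocessing: the abstract does not state which normalization, how many bins, or what pass mark was used, so min-max scaling, equal-width bins and the `pass_mark` parameter are all hypothetical choices.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, n_bins):
    """Equal-width binning over [min, max]; returns bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def to_binary_label(grades, pass_mark):
    """Turn numeric grades into a binary label (1 = at or above pass_mark)."""
    return [1 if g >= pass_mark else 0 for g in grades]


# Hypothetical values for one feature (days of access) and for course grades.
days_of_access = [10, 20, 30]
normalized_days = min_max_normalize(days_of_access)
grade_labels = to_binary_label([40, 50, 75], pass_mark=50)
```

Reducing the grade to two classes, as in `to_binary_label`, corresponds to the binary classification setting that the abstract reports as more accurate than multi-class prediction.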


2020 ◽  
Vol 15 (1) ◽  
pp. 35-42
Author(s):  
A.O. Balogun ◽  
A.O. Bajeh ◽  
H.A. Mojeed ◽  
A.G. Akintola

Failure of software systems as a result of software testing is rampant, as modern software systems are large and complex. Software testing, which is an integral part of the software development life cycle (SDLC), consumes both human and capital resources. As such, software defect prediction (SDP) mechanisms are deployed to strengthen the software testing phase of the SDLC by predicting defect-prone modules or components in software systems. Machine learning models have been used to develop SDP models with great success. Moreover, some studies have highlighted that a combination of machine learning models in the form of an ensemble is better than single SDP models in terms of prediction accuracy. However, the efficiency of machine learning models can change with different predictive evaluation metrics. Thus, more studies are needed to establish the effectiveness of ensemble SDP models over single SDP models. This study proposes the deployment of Multi-Criteria Decision Method (MCDM) techniques to rank machine learning models. Analytic Network Process (ANP) and Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE), which are types of MCDM techniques, are deployed on 9 machine learning models with 11 performance evaluation metrics and 11 software defect datasets. The experimental results showed that ensemble SDP models are the most appropriate SDP models, as Boosted SMO and Boosted PART ranked highest for each of the MCDM techniques. The experimental results also supported the position that accuracy should not be the only performance evaluation metric for SDP models. In conclusion, performance metrics other than predictive accuracy should be considered when ranking and evaluating machine learning models. Keywords: Ensemble; Multi-Criteria Decision Method; Software Defect Prediction
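To illustrate how an MCDM technique can rank models across several metrics at once, here is a minimal sketch of PROMETHEE II net outranking flows. It assumes the "usual" preference function (an alternative is preferred on a criterion whenever it scores strictly higher), weights that sum to 1, and criteria where higher is better; the abstract does not state the study's preference functions or weights, and the model names and scores below are hypothetical.

```python
def promethee_ii(scores, weights):
    """Rank alternatives by PROMETHEE II net flow.

    scores:  {alternative: [value per criterion]}, higher is better (assumed).
    weights: one weight per criterion, summing to 1 (assumed).
    Requires at least two alternatives.
    """
    names = list(scores)
    n = len(names)

    def preference(a, b):
        # Aggregated preference of a over b: total weight of the criteria
        # on which a strictly beats b ("usual" preference function).
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    net_flow = {}
    for a in names:
        others = [b for b in names if b != a]
        positive = sum(preference(a, b) for b in others) / (n - 1)
        negative = sum(preference(b, a) for b in others) / (n - 1)
        net_flow[a] = positive - negative

    ranking = sorted(names, key=lambda a: net_flow[a], reverse=True)
    return ranking, net_flow


# Hypothetical scores on two metrics (for example accuracy and AUC).
example_scores = {
    "model_a": [0.90, 0.80],
    "model_b": [0.70, 0.75],
    "model_c": [0.60, 0.60],
}
ranking, net_flow = promethee_ii(example_scores, weights=[0.5, 0.5])
```

Because every metric contributes to the net flow, a model that leads on accuracy alone does not automatically rank first, which is the point the abstract makes about not relying on accuracy as the sole criterion.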


2015 ◽  
Vol 713-715 ◽  
pp. 2225-2228
Author(s):  
Wei Zhang ◽  
Zhen Yu Ma ◽  
Wen Ge Zhang ◽  
Qing Ling Lu ◽  
Xiao Bing Nie

Knowing which software metrics correlate most strongly with software defects or defect density is very useful for improving software quality. Based on 33 actual software projects, we analyzed 44 software metrics at the application, file, class and function levels and correlated them with the number of software defects and with defect density. The results show that software metrics have little correlation with the number of software defects but are correlated with defect density. Through correlation analysis, we selected the five metrics most strongly correlated with defect density; these metrics can be used for improving software quality and predicting software defect density.


Author(s):  
Ishrat-Un-Nisa Uqaili ◽  
Syed Nadeem Ahsan

During software development and maintenance, fixing severe bugs is usually very challenging and requires more effort, since they must be fixed on a priority basis. Several research works have used software metrics to predict fault-prone software modules. In this paper, we propose an approach that categorizes different types of bugs by severity and priority and then uses these categories to label software metrics data. We then use the labeled data to train supervised machine learning models to predict fault-prone software modules. Moreover, to build an effective prediction model, we use a genetic algorithm to search for the sets of metrics that are highly correlated with severe bugs.
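A genetic search over metric subsets can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' algorithm: the bit-mask encoding, single-point crossover, elitism, all parameter defaults, and the toy fitness function (hypothetical relevance scores with a size penalty) are choices made here because the abstract gives no such details.

```python
import random


def genetic_select(n_features, fitness, generations=30, pop_size=20,
                   mutation_rate=0.1, seed=0):
    """Search for a 0/1 feature mask that maximizes a deterministic fitness.

    Requires n_features >= 2 and pop_size >= 4.
    Returns (best_mask, history), where history holds the best fitness of
    each generation followed by the fitness of the final best mask.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_population = [population[0][:]]      # elitism: keep the best mask
        parents = population[: pop_size // 2]     # truncation selection
        while len(next_population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)    # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical relevance of five metrics to severe bugs (illustrative values).
relevance = [0.8, 0.1, 0.6, 0.05, 0.7]


def toy_fitness(mask):
    """Reward relevant metrics and charge 0.2 per selected metric."""
    gain = sum(r for r, bit in zip(relevance, mask) if bit)
    return gain - 0.2 * sum(mask)


best_mask, fitness_history = genetic_select(len(relevance), toy_fitness)
```

In practice the fitness function would be replaced by a measure computed from labeled data, such as the correlation of the selected metrics with severe bugs or the validation score of a model trained on them.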


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


Author(s):  
Feidu Akmel ◽  
Ermiyas Birihanu ◽  
Bahir Siraj

Software systems are software products or applications that support business domains such as manufacturing, aviation, health care, insurance and so on. Software quality is a means of measuring how software is designed and how well the software conforms to that design. Some of the variables we look for in software quality are correctness, product quality, scalability, completeness and absence of bugs. However, the quality standard used by one organization differs from that of another; for this reason, it is better to apply software metrics to measure software quality. Attributes gathered from source code through software metrics can be an input for a software defect predictor. Software defects are errors introduced by software developers and stakeholders. Finally, in this study we identified applications of machine learning to software defects, gathered from previous research works.

