Research on Software Defect Prediction Method Based on Machine Learning

This paper analyzes 44 metrics at the application, file, class, and function levels and correlates them with both the number of software defects and defect density. The results show that the software metrics correlate only weakly with the number of defects but do correlate with defect density. Based on this correlation analysis, we selected the five metrics most strongly correlated with defect density. Using these selected features, we predicted defect density for 33 real software projects with 16 machine learning models. The Spearman rank correlation coefficient (SRCC) between predicted and actual defect density is 0.6727 for the SVR model, higher than for the other 15 models. The model with the second-largest absolute SRCC is IBk, at only -0.3557. These results indicate that the SVR-based method has the highest prediction accuracy of the models tested.
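The evaluation measure named in this abstract, the Spearman rank correlation coefficient, can be illustrated with a minimal sketch. It assumes the common definition (the Pearson correlation of the two rank vectors, with tied values given their average rank); the abstract does not say how the authors computed it, and the sample values below are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical defect densities for five projects (not data from the paper).
actual_density = [0.8, 2.1, 1.4, 3.0, 0.5]
predicted_density = [1.0, 1.9, 1.6, 2.4, 0.9]
srcc = spearman(actual_density, predicted_density)
```

Because the coefficient depends only on rank order, a model can score well here even when its predicted values are offset or rescaled relative to the actual ones.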

Online learning behavior analysis based on machine learning

Asian Association of Open Universities Journal ◽

10.1108/aaouj-08-2019-0029 ◽

2019 ◽

Vol 14 (2) ◽

pp. 97-106

Author(s):

Ning Yan ◽

Oliver Tat-Sheung Au

Keyword(s):

Machine Learning ◽

Online Learning ◽

Correlation Analysis ◽

Prediction Accuracy ◽

Classification Models ◽

Limited Data ◽

Learning Models ◽

Learning Behavior ◽

Content Type ◽

Machine Learning Models

Purpose: This paper analyzes the correlation between features of students’ online learning behavior and their course grades, and attempts to build effective prediction models from limited data. Design/methodology/approach: The prediction label is the student's course grade, and the available input features (called eigenvalues in the paper) are student age, student gender, connection time, hits count and days of access. The machine learning model is a classical three-layer feedforward neural network trained with the scaled conjugate gradient algorithm. Pearson correlation analysis is used to find the relationships between course grade and the student features. Findings: Days of access has the highest correlation with course grade, followed by hits count; connection time is less relevant to course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models such as the ANN model used in this paper. Originality/value: This paper may help teachers use online learning behavior data to identify students with learning difficulties in advance and give them timely help. It shows that acceptable machine learning prediction models can be built from a small, limited data set. However, introducing external data into machine learning models to improve their prediction accuracy remains a valuable but difficult problem.
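The two preprocessing steps this abstract credits with improving accuracy, data normalization and data discretization, can be sketched as follows. This is an illustration under assumptions: min-max scaling and a single pass/fail cut-off are common choices, but the abstract does not state which schemes the authors used, and the threshold of 50 and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize_binary(grades, threshold):
    """Turn numeric grades into two classes: 1 at or above threshold, else 0."""
    return [1 if g >= threshold else 0 for g in grades]


# Hypothetical records (not data from the paper).
days_of_access = [3, 12, 27, 18]
course_grades = [41, 55, 88, 67]

normalized_days = min_max_normalize(days_of_access)
pass_labels = discretize_binary(course_grades, threshold=50)  # assumed cut-off
```

Collapsing grades into two classes gives each class more examples than a multi-class split would, which is one plausible reason binary models fare better on a small data set.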

Software defect prediction: A multi-criteria decision-making approach

Nigerian Journal of Technological Research ◽

10.4314/njtr.v15i1.7 ◽

2020 ◽

Vol 15 (1) ◽

pp. 35-42

Author(s):

A.O. Balogun ◽

A.O. Bajeh ◽

H.A. Mojeed ◽

A.G. Akintola

Keyword(s):

Machine Learning ◽

Software Testing ◽

Evaluation Metrics ◽

Defect Prediction ◽

Software Systems ◽

Software Defect Prediction ◽

Learning Models ◽

Decision Method ◽

Software Defect ◽

Machine Learning Models

Failures of software systems attributed to software testing are widespread because modern software systems are large and complex. Software testing, an integral part of the software development life cycle (SDLC), consumes both human and capital resources. Software defect prediction (SDP) mechanisms are therefore deployed to strengthen the testing phase of the SDLC by predicting defect-prone modules or components in software systems. Machine learning models have been used to build SDP models with great success, and some studies report that an ensemble of machine learning models achieves better prediction accuracy than a single SDP model. However, the measured performance of machine learning models can change with the evaluation metric used. Thus, more studies are needed to establish the effectiveness of ensemble SDP models over single SDP models. This study proposes using Multi-Criteria Decision Method (MCDM) techniques to rank machine learning models. Two MCDM techniques, Analytic Network Process (ANP) and Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE), are applied to 9 machine learning models with 11 performance evaluation metrics and 11 software defect datasets. The experimental results show that ensemble SDP models are the most appropriate SDP models: Boosted SMO and Boosted PART ranked highest under each MCDM technique. The results also support the position that accuracy should not be the only performance evaluation metric for SDP models. In conclusion, performance metrics other than predictive accuracy should be considered when ranking and evaluating machine learning models. Keywords: Ensemble; Multi-Criteria Decision Method; Software Defect Prediction
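To make the ranking idea concrete, here is a minimal sketch of PROMETHEE II net-flow ranking. It assumes the simplest ("usual") preference function, where one alternative is fully preferred on a criterion whenever it is strictly better; the abstract does not state which preference functions or weights the authors used. The model names, scores and weights are hypothetical.

```python
def promethee_ii(scores, weights, maximize):
    """Rank alternatives by PROMETHEE II net outranking flow.

    scores:   dict mapping alternative name -> list of criterion values
    weights:  one non-negative weight per criterion
    maximize: one bool per criterion (True if larger values are better)
    """
    names = list(scores)
    n = len(names)
    total_weight = sum(weights)

    def preference(a, b):
        # Weighted share of criteria on which a is strictly better than b.
        p = 0.0
        for j, w in enumerate(weights):
            diff = scores[a][j] - scores[b][j]
            if not maximize[j]:
                diff = -diff
            if diff > 0:
                p += w
        return p / total_weight

    net_flow = {}
    for a in names:
        others = [b for b in names if b != a]
        positive = sum(preference(a, b) for b in others) / (n - 1)
        negative = sum(preference(b, a) for b in others) / (n - 1)
        net_flow[a] = positive - negative

    ranking = sorted(names, key=lambda a: net_flow[a], reverse=True)
    return ranking, net_flow


# Hypothetical models scored on [accuracy, false-alarm rate].
scores = {
    "model_a": [0.90, 0.10],
    "model_b": [0.80, 0.20],
    "model_c": [0.70, 0.30],
}
ranking, net_flow = promethee_ii(scores, weights=[1, 1], maximize=[True, False])
```

Because each criterion contributes through pairwise comparisons rather than a single aggregate score, a model that leads on accuracy alone does not automatically rank first.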

Semi-empirical prediction method for monthly precipitation prediction based on environmental factors and comparison with stochastic and machine learning models

Hydrological Sciences Journal ◽

10.1080/02626667.2020.1784901 ◽

2020 ◽

Vol 65 (11) ◽

pp. 1928-1942 ◽

Cited By ~ 1

Author(s):

Huihui Zhang ◽

Hugo A. Loáiciga ◽

Fu Ren ◽

Qingyun Du ◽

Da Ha

Keyword(s):

Machine Learning ◽

Environmental Factors ◽

Prediction Method ◽

Monthly Precipitation ◽

Learning Models ◽

Empirical Prediction ◽

Precipitation Prediction ◽

Semi Empirical ◽

Machine Learning Models

Transfer Learning Code Vectorizer based Machine Learning Models for Software Defect Prediction

2020 International Conference on Computational Performance Evaluation (ComPE) ◽

10.1109/compe49325.2020.9200076 ◽

2020 ◽

Author(s):

Rituraj Singh ◽

Jasmeet Singh ◽

Mehrab Singh Gill ◽

Ruchika Malhotra ◽

Garima

Keyword(s):

Machine Learning ◽

Transfer Learning ◽

Defect Prediction ◽

Software Defect Prediction ◽

Learning Models ◽

Software Defect ◽

Machine Learning Models

Correlation Analysis of Software Defects Density and Metrics

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.713-715.2225 ◽

2015 ◽

Vol 713-715 ◽

pp. 2225-2228

Author(s):

Wei Zhang ◽

Zhen Yu Ma ◽

Wen Ge Zhang ◽

Qing Ling Lu ◽

Xiao Bing Nie

Keyword(s):

Correlation Analysis ◽

Software Quality ◽

Software Metrics ◽

Defect Density ◽

Software Defects ◽

Software Projects ◽

Class Level ◽

And Function

Knowing which software metrics correlate most strongly with software defects or defect density is very useful for improving software quality. Based on 33 real software projects, we analyzed 44 software metrics at the application, file, class, and function levels and correlated them with the number of software defects and with defect density. The results show that software metrics correlate only weakly with the number of software defects but do correlate with defect density. Through correlation analysis, we selected the five metrics most strongly correlated with defect density; these metrics can be used for improving software quality and predicting software defect density.
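The selection step in this abstract can be sketched as ranking metrics by the absolute value of their correlation with defect density and keeping the top k. This is an illustration only: it assumes Pearson correlation and defines defect density as defects per thousand lines of code (KLOC), neither of which the abstract specifies. The metric names and values are hypothetical.

```python
def defect_density(defect_count, kloc):
    """Defects per thousand lines of code (an assumed definition)."""
    return defect_count / kloc


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def top_k_metrics(metric_table, target, k):
    """Return the k metric names with the largest |correlation| to target."""
    strength = {name: abs(pearson(column, target))
                for name, column in metric_table.items()}
    return sorted(strength, key=lambda name: strength[name], reverse=True)[:k]


# Hypothetical per-project metric values (one entry per project).
metric_table = {
    "metric_1": [1, 2, 3, 4],
    "metric_2": [4, 1, 3, 2],
    "metric_3": [4, 3, 2, 1],
}
densities = [defect_density(d, k) for d, k in [(2, 2.0), (6, 3.0), (12, 4.0), (20, 5.0)]]
selected = top_k_metrics(metric_table, densities, k=2)
```

Taking the absolute value keeps strongly negative correlations in play, since a metric that falls as defect density rises is as informative as one that rises with it.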

Machine Learning Based Prediction of Complex Bugs in Source Code

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/1/4 ◽

2019 ◽

pp. 26-37

Author(s):

Ishrat-Un-Nisa Uqaili ◽

Syed Nadeem Ahsan

Keyword(s):

Machine Learning ◽

Software Metrics ◽

Source Code ◽

Supervised Machine Learning ◽

Learning Models ◽

Software Module ◽

Different Types ◽

Software Modules ◽

Highly Correlated ◽

Machine Learning Models

During software development and maintenance, fixing severe bugs is often very challenging and takes extra effort because they must be fixed on a priority basis. Several research works have used software metrics to predict fault-prone software modules. In this paper, we propose an approach that categorizes different types of bugs by severity and priority and then uses these categories to label software metrics data. We then use the labeled data to train supervised machine learning models for predicting fault-prone software modules. Moreover, to build an effective prediction model, we use a genetic algorithm to search for the sets of metrics that are highly correlated with severe bugs.
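A genetic search over metric subsets, as mentioned in this abstract, might look like the sketch below. It is a generic illustration, not the authors' algorithm: the operators (tournament selection, one-point crossover, bit-flip mutation, one elite survivor), the parameter values, and the fitness function are all assumptions, and the relevance scores are hypothetical stand-ins for each metric's correlation with severe bugs.

```python
import random


def genetic_feature_search(n_features, fitness, generations=30,
                           pop_size=20, mutation_rate=0.05, seed=0):
    """Search for a boolean feature mask that maximizes fitness(mask).

    Requires n_features >= 2 and pop_size >= 3.
    Returns (best_mask, history), where history holds the best fitness
    seen in each generation plus the final population.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_1 = max(rng.sample(pop, 3), key=fitness)
            parent_2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = parent_1[:cut] + parent_2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical relevance of six metrics to severe bugs (not from the paper).
relevance = [0.60, 0.10, 0.50, 0.05, 0.40, 0.02]


def fitness(mask):
    """Reward relevant metrics; charge a small cost per selected metric."""
    gain = sum(r for r, chosen in zip(relevance, mask) if chosen)
    return gain - 0.15 * sum(mask)


best_mask, history = genetic_feature_search(6, fitness, generations=10)
```

Keeping one elite individual per generation guarantees that the best fitness never decreases, which makes the search easy to monitor even though it offers no guarantee of reaching the global optimum.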

Improving XGBoost with Imagination Sampling

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.2.1.holloway.1 ◽

2020 ◽

Vol 2 (1) ◽

pp. 3-6

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

General System ◽

Learning Models ◽

Starting Point ◽

Machine Learning Models

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

Development of Machine Learning Models to Predict Student Performance in Computer Literacy Courses

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v13i1.16863 ◽

2018 ◽

Vol 13 (1) ◽

pp. 21

Author(s):

George Anderson ◽

Oduronke T. Eyitayo

Keyword(s):

Machine Learning ◽

Student Performance ◽

Computer Literacy ◽

Learning Models ◽

Machine Learning Models

Experimental Comparison of Machine Learning Models in Malware Packing Detection

2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS) ◽

10.23919/apnoms50412.2020.9237007 ◽

2020 ◽

Author(s):

Jong-Wouk Kim ◽

Juhong Namgung ◽

Yang-Sae Moon ◽

Mi-Jung Choi

Keyword(s):

Machine Learning ◽

Experimental Comparison ◽

Learning Models ◽

Machine Learning Models

A Literature Review Study of Software Defect Prediction using Machine Learning Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i6.286 ◽

2018 ◽

Vol 6 (6) ◽

pp. 300 ◽

Cited By ~ 3

Author(s):

Feidu Akmel ◽

Ermiyas Birihanu ◽

Bahir Siraj

Keyword(s):

Machine Learning ◽

Software Metrics ◽

Quality Standard ◽

Machine Learning Techniques ◽

Software Systems ◽

Health Care Insurance ◽

Software Defect ◽

Learning Techniques ◽

Software Product

Software systems are software products or applications that support business domains such as manufacturing, aviation, health care, insurance and so on. Software quality is a measure of how well software is designed and how well it conforms to that design. Some of the variables considered for software quality are correctness, product quality, scalability, completeness and absence of bugs. However, the quality standard used by one organization differs from that of another, so it is better to apply software metrics to measure software quality. Attributes gathered from source code through software metrics can serve as input for a software defect predictor. Software defects are errors introduced by software developers and stakeholders. Finally, in this study we survey the application of machine learning to software defects, as gathered from previous research works.
