Statistical and Machine Learning Methods for Software Fault Prediction Using CK Metric Suite: A Comparative Analysis

Software metrics used for fault prediction in object-oriented systems need to be validated experimentally with statistical and machine learning methods, since this validation helps ensure the quality of the software products an organization delivers. Object-oriented metrics play a crucial role in predicting faults. This paper examines the application of linear regression, logistic regression, and artificial neural network methods to software fault prediction using the Chidamber and Kemerer (CK) metrics. Here, the fault is treated as the dependent variable and the CK metric suite as the independent variables. Statistical methods (linear regression and logistic regression) and machine learning methods (neural networks in their different forms) are applied to detect faults associated with classes. The comparison was carried out on a case study, the Apache integration framework (AIF) version 1.6. The analysis highlights the significance of the weighted methods per class (WMC) metric for fault classification and shows that the hybrid radial basis function network approach achieved a better fault prediction rate than the other three neural network models.
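
As a rough illustration of the setup described above (fault as the dependent variable, CK metrics as the independent variables), the following sketch fits a logistic regression by batch gradient descent. It is not the paper's implementation: the six data rows are invented, only two metrics are used (WMC and coupling between objects, CBO), and the names `standardize`, `fit_logistic`, and `predict_proba` are hypothetical helpers introduced here.

```python
import math

# Hypothetical toy data, invented for illustration (not from the AIF 1.6 study):
# each row is (WMC, CBO) for one class; label 1 means the class had a fault.
X = [(2, 1), (3, 2), (5, 1), (20, 8), (25, 10), (30, 12)]
y = [0, 0, 0, 1, 1, 1]

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    scaled = [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]
    return scaled, means, stds

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that a class with scaled metrics x is faulty."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

X_scaled, means, stds = standardize(X)
weights, bias = fit_logistic(X_scaled, y)
probabilities = [predict_proba(x, weights, bias) for x in X_scaled]
predictions = [1 if p >= 0.5 else 0 for p in probabilities]
print(predictions)
```

The metrics are standardized before fitting so that a single learning rate suits both columns. A real study would evaluate on held-out classes rather than on the training rows.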

Taxonomy of machine learning algorithms in software fault prediction using object oriented metrics

Procedia Computer Science ◽

10.1016/j.procs.2018.05.115 ◽

2018 ◽

Vol 132 ◽

pp. 993-1001 ◽

Cited By ~ 7

Author(s):

Ajmer Singh ◽

Rajesh Bhatia ◽

Anita Singhrova

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Object Oriented ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault ◽

Object Oriented Metrics

Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table

Atmospheric Chemistry and Physics ◽

10.5194/acp-16-8181-2016 ◽

2016 ◽

Vol 16 (13) ◽

pp. 8181-8191 ◽

Cited By ~ 10

Author(s):

Jani Huttunen ◽

Harri Kokkola ◽

Tero Mielonen ◽

Mika Esa Juhani Mononen ◽

Antti Lipponen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Learning Methods ◽

Surface Solar Radiation ◽

Machine Learning Methods ◽

Look Up Table ◽

Non Linear

Abstract. To obtain a good estimate of the current forcing by anthropogenic aerosols, knowledge of past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure of aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option for extending the AOD time series to before the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we evaluated several inversion methods designed for this task. We compared a look-up table (LUT) method based on radiative transfer modelling, a non-linear regression method, and four machine learning methods (Gaussian process, neural network, random forest, and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to those of the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations, with the lowest correlation coefficient being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, the neural network and support vector machine showed better overall correspondence across the whole AOD range. The differences in reproducing both ends of the AOD range seem to be caused by differences in aerosol composition. High AODs were in most cases those with high water vapour content, which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even if the SSA changed during the observation period.

Machine Learning Enabled Capacitance Prediction for Carbon-Based Supercapacitors

10.26434/chemrxiv.6222221.v1 ◽

2018 ◽

Author(s):

Shan Zhu ◽

Jiajun Li ◽

Liying Ma ◽

Chunnian He ◽

Enzuo Liu ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Linear Regression ◽

Learning Methods ◽

Machine Learning Methods ◽

Artificial Neural ◽

Carbon Based

This work applies three machine learning methods (linear regression, Lasso, and an artificial neural network) to predict the capacitance of carbon-based supercapacitors.

A Software Fault Prediction on Inter and Intra Release Prediction Scenarios

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.287611 ◽

2021 ◽

Vol 12 (4) ◽

pp. 0-0

Keyword(s):

Machine Learning ◽

Research Work ◽

Fault Prediction ◽

Machine Learning Techniques ◽

Software Fault Prediction ◽

Machine Learning Methods ◽

Learning Techniques ◽

Software Modules ◽

Software Fault

Software quality engineering applies numerous techniques to assure software quality, namely testing, verification, validation, fault tolerance, and fault prediction. Machine learning techniques help classify software modules as faulty or non-faulty. In most research, these approaches predict fault-prone modules within the same release of the software. However, a model is considered more efficient and better validated when its training and test data are taken from previous and subsequent releases of the software, respectively. The contribution of this paper is to predict faults in two scenarios, i.e., inter-release and intra-release prediction. A comparison of the two scenarios, based on various performance metrics computed with machine learning methods, shows that intra-release prediction has better accuracy than inter-release prediction across all releases. Both scenarios also achieve good results in comparison with existing research work.
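
The difference between the two scenarios comes down to how the training and test sets are drawn. The sketch below illustrates that with invented records and a deliberately simple stand-in model (a size threshold). The data, the release labels, and the helper names `intra_release_split`, `inter_release_split`, `fit_threshold`, and `accuracy` are all assumptions made for illustration, not the paper's method.

```python
import random

# Hypothetical records, invented for illustration: (release, lines_of_code, is_faulty).
modules = [
    ("1.0", 120, 0), ("1.0", 900, 1), ("1.0", 80, 0), ("1.0", 700, 1),
    ("1.1", 150, 0), ("1.1", 950, 1), ("1.1", 60, 0), ("1.1", 400, 1),
]

def intra_release_split(rows, release, test_fraction=0.5, seed=0):
    """Intra-release: train and test modules both come from the same release."""
    same = [r for r in rows if r[0] == release]
    random.Random(seed).shuffle(same)
    cut = int(len(same) * (1 - test_fraction))
    return same[:cut], same[cut:]

def inter_release_split(rows, train_release, test_release):
    """Inter-release: train on an earlier release, test on a later one."""
    train = [r for r in rows if r[0] == train_release]
    test = [r for r in rows if r[0] == test_release]
    return train, test

def fit_threshold(train):
    """Stand-in model: midpoint between mean size of faulty and clean modules."""
    faulty = [loc for _, loc, f in train if f == 1]
    clean = [loc for _, loc, f in train if f == 0]
    if not faulty or not clean:
        # Only one label present in the training split: fall back to the overall mean.
        sizes = [loc for _, loc, _ in train]
        return sum(sizes) / len(sizes)
    return (sum(faulty) / len(faulty) + sum(clean) / len(clean)) / 2

def accuracy(threshold, test):
    """Fraction of test modules whose label matches 'size above threshold'."""
    hits = sum(1 for _, loc, f in test if (1 if loc > threshold else 0) == f)
    return hits / len(test)

intra_train, intra_test = intra_release_split(modules, "1.0")
inter_train, inter_test = inter_release_split(modules, "1.0", "1.1")

intra_acc = accuracy(fit_threshold(intra_train), intra_test)
inter_acc = accuracy(fit_threshold(inter_train), inter_test)
print("intra-release accuracy:", intra_acc)
print("inter-release accuracy:", inter_acc)
```

Any classifier could replace `fit_threshold`. The point of the sketch is only that the inter-release test set never shares a release with the training set.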

Machine learning based methods for software fault prediction: A survey

Expert Systems with Applications ◽

10.1016/j.eswa.2021.114595 ◽

2021 ◽

Vol 172 ◽

pp. 114595

Author(s):

Sushant Kumar Pandey ◽

Ravi Bhushan Mishra ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Fault Prediction ◽

Software Fault Prediction ◽

Software Fault

Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires a substantial amount of effort and time if the number of mammals to be observed is large or if the observation is conducted over a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to detect and estimate whether a goat is in estrus based on the goat’s behavior, and the adequacy of each method was verified. Goat tracking data were obtained using a video tracking system and used to estimate whether goats, in “estrus” or “non-estrus”, were in either of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC value for goats other than those whose data were used in the training data sets is relatively low, which suggests that random forest tends to overfit the training data. Apart from random forest, the PC of HMMs and SVMs is high. Considering the calculation time and the HMM’s advantage of being a time series model, the HMM is the better method. The PC of the neural network is low overall; however, if more goat data were acquired, a neural network could be an adequate method for estimation.

Landslide susceptibility mapping based on convolutional neural network and conventional machine learning methods

10.21203/rs.3.rs-190195/v1 ◽

2021 ◽

Author(s):

Rui Liu ◽

Xin Yang ◽

Chong Xu ◽

Luyao Li ◽

Xiangqiang Zeng

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Conventional Machine

Abstract. Landslide susceptibility mapping (LSM) is a useful tool for estimating the probability of landslide occurrence, providing a scientific basis for natural hazard prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the advanced Convolutional Neural Network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN-based model in LSM and systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine. We selected the Jiuzhaigou region in Sichuan Province, China, as the study area. A total of 710 landslides and 12 predisposing factors were stacked to form spatial datasets for LSM. ROC analysis and several statistical metrics, such as accuracy, root mean square error (RMSE), Kappa coefficient, sensitivity, and specificity, were used to evaluate the performance of the models on the training and validation datasets. Finally, the trained models were applied and the landslide susceptibility zones were mapped. Results suggest that both the CNN and the conventional machine learning models perform satisfactorily (AUC: 85.72%–90.17%). The CNN-based model exhibits excellent goodness-of-fit and prediction capability; it not only achieves the highest performance (AUC: 90.17%) but also significantly reduces the salt-and-pepper effect, which indicates its great potential for application to LSM.
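
The evaluation metrics named in the abstract (accuracy, RMSE, Kappa coefficient, sensitivity, specificity, and AUC) can be computed for a binary landslide/non-landslide classifier as in the sketch below. The labels and predicted probabilities are invented, the 0.5 decision threshold is an assumption, and the function names `evaluate` and `auc` are hypothetical. This is not the study's evaluation code.

```python
import math

def evaluate(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics; assumes both classes occur in y_true."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "kappa": (accuracy - expected) / (1 - expected),
        "rmse": math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n),
    }

def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = landslide cell) and model probabilities, invented for illustration.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.6, 0.3]

metrics = evaluate(labels, probs)
metrics["auc"] = auc(labels, probs)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The pairwise AUC used here is quadratic in the number of samples, which is fine for a small illustration but would normally be replaced by a rank-based computation on a full raster dataset.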
