Investigating Tree Family Machine Learning Techniques for a Predictive System to Unveil Software Defects

Software defects prediction at the initial period of the software development life cycle remains a critical and important assignment. Defect prediction and correctness leads to the assurance of the quality of software systems and has remained integral to study in the previous years. The quick forecast of imperfect or defective modules in software development can serve the development squad to use the existing assets competently and effectively to provide remarkable software products in a given short timeline. Hitherto, several researchers have industrialized defect prediction models by utilizing statistical and machine learning techniques that are operative and effective approaches to pinpoint the defective modules. Tree family machine learning techniques are well-thought-out to be one of the finest and ordinarily used supervised learning methods. In this study, different tree family machine learning techniques are employed for software defect prediction using ten benchmark datasets. These techniques include Credal Decision Tree (CDT), Cost-Sensitive Decision Forest (CS-Forest), Decision Stump (DS), Forest by Penalizing Attributes (Forest-PA), Hoeffding Tree (HT), Decision Tree (J48), Logistic Model Tree (LMT), Random Forest (RF), Random Tree (RT), and REP-Tree (REP-T). Performance of each technique is evaluated using different measures, i.e., mean absolute error (MAE), relative absolute error (RAE), root mean squared error (RMSE), root relative squared error (RRSE), specificity, precision, recall, F-measure (FM), G-measure (GM), Matthew’s correlation coefficient (MCC), and accuracy. The overall outcomes of this paper suggested RF technique by producing best results in terms of reducing error rates as well as increasing accuracy on five datasets, i.e., AR3, PC1, PC2, PC3, and PC4. The average accuracy achieved by RF is 90.2238%. The comprehensive outcomes of this study can be used as a reference point for other researchers. Any assertion concerning the enhancement in prediction through any new model, technique, or framework can be benchmarked and verified.

Download Full-text

Software Defect Prediction for Healthcare Big Data: An Empirical Evaluation of Machine Learning Techniques

Journal of Healthcare Engineering ◽

10.1155/2021/8899263 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Bilal Khan ◽

Rashid Naseem ◽

Muhammad Arif Shah ◽

Karzan Wakil ◽

Atif Khan ◽

...

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Software Development ◽

Absolute Error ◽

Defect Prediction ◽

Support Vector ◽

Software Defect Prediction ◽

Software Defect ◽

Squared Error ◽

Average Accuracy

Software defect prediction (SDP) in the initial period of the software development life cycle (SDLC) remains a critical and important assignment. SDP is essentially studied during few last decades as it leads to assure the quality of software systems. The quick forecast of defective or imperfect artifacts in software development may serve the development team to use the existing assets competently and more effectively to provide extraordinary software products in the given or narrow time. Previously, several canvassers have industrialized models for defect prediction utilizing machine learning (ML) and statistical techniques. ML methods are considered as an operative and operational approach to pinpoint the defective modules, in which moving parts through mining concealed patterns amid software metrics (attributes). ML techniques are also utilized by several researchers on healthcare datasets. This study utilizes different ML techniques software defect prediction using seven broadly used datasets. The ML techniques include the multilayer perceptron (MLP), support vector machine (SVM), decision tree (J48), radial basis function (RBF), random forest (RF), hidden Markov model (HMM), credal decision tree (CDT), K-nearest neighbor (KNN), average one dependency estimator (A1DE), and Naïve Bayes (NB). The performance of each technique is evaluated using different measures, for instance, relative absolute error (RAE), mean absolute error (MAE), root mean squared error (RMSE), root relative squared error (RRSE), recall, and accuracy. The inclusive outcome shows the best performance of RF with 88.32% average accuracy and 2.96 rank value, second-best performance is achieved by SVM with 87.99% average accuracy and 3.83 rank values. Moreover, CDT also shows 87.88% average accuracy and 3.62 rank values, placed on the third position. The comprehensive outcomes of research can be utilized as a reference point for new research in the SDP domain, and therefore, any assertion concerning the enhancement in prediction over any new technique or model can be benchmarked and proved.

Download Full-text

Predicting Heating Load in Energy-Efficient Buildings Through Machine Learning Techniques

Applied Sciences ◽

10.3390/app9204338 ◽

2019 ◽

Vol 9 (20) ◽

pp. 4338 ◽

Cited By ~ 6

Author(s):

Hossein Moayedi ◽

Dieu Tien Bui ◽

Anastasios Dounis ◽

Zongjie Lyu ◽

Loke Kok Foong

Keyword(s):

Machine Learning ◽

Energy Efficient ◽

Design Procedure ◽

Absolute Error ◽

Energy Performance ◽

Machine Learning Techniques ◽

Energy Efficient Buildings ◽

Squared Error ◽

Learning Techniques ◽

Heating Load

The heating load calculation is the first step of the iterative heating, ventilation, and air conditioning (HVAC) design procedure. In this study, we employed six machine learning techniques, namely multi-layer perceptron regressor (MLPr), lazy locally weighted learning (LLWL), alternating model tree (AMT), random forest (RF), ElasticNet (ENet), and radial basis function regression (RBFr) for the problem of designing energy-efficient buildings. After that, these approaches were used to specify a relationship among the parameters of input and output in terms of the energy performance of buildings. The calculated outcomes for datasets from each of the above-mentioned models were analyzed based on various known statistical indexes like root relative squared error (RRSE), root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R2), and relative absolute error (RAE). It was found that between the discussed machine learning-based solutions of MLPr, LLWL, AMT, RF, ENet, and RBFr, the RF was nominated as the most appropriate predictive network. The RF network outcomes determined the R2, MAE, RMSE, RAE, and RRSE for the training dataset to be 0.9997, 0.19, 0.2399, 2.078, and 2.3795, respectively. The RF network outcomes determined the R2, MAE, RMSE, RAE, and RRSE for the testing dataset to be 0.9989, 0.3385, 0.4649, 3.6813, and 4.5995, respectively. These results show the superiority of the presented RF model in estimation of early heating load in energy-efficient buildings.

Download Full-text

Empirical Assessment of Machine Learning Techniques for Software Requirements Risk Prediction

Electronics ◽

10.3390/electronics10020168 ◽

2021 ◽

Vol 10 (2) ◽

pp. 168

Author(s):

Rashid Naseem ◽

Zain Shaukat ◽

Muhammad Irfan ◽

Muhammad Arif Shah ◽

Arshad Ahmad ◽

...

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Naive Bayes ◽

Absolute Error ◽

Naïve Bayes ◽

Error Rates ◽

Machine Learning Techniques ◽

Decision Table ◽

Software Project ◽

Learning Techniques

Software risk prediction is the most sensitive and crucial activity of Software Development Life Cycle (SDLC). It may lead to the success or failure of a project. The risk should be predicted earlier to make a software project successful. A model is proposed for the prediction of software requirement risks using requirement risk dataset and machine learning techniques. In addition, a comparison is made between multiple classifiers that are K-Nearest Neighbour (KNN), Average One Dependency Estimator (A1DE), Naïve Bayes (NB), Composite Hypercube on Iterated Random Projection (CHIRP), Decision Table (DT), Decision Table/Naïve Bayes Hybrid Classifier (DTNB), Credal Decision Trees (CDT), Cost-Sensitive Decision Forest (CS-Forest), J48 Decision Tree (J48), and Random Forest (RF) achieve the best suited technique for the model according to the nature of dataset. These techniques are evaluated using various evaluation metrics including CCI (correctly Classified Instances), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE), Root Relative Squared Error (RRSE), precision, recall, F-measure, Matthew’s Correlation Coefficient (MCC), Receiver Operating Characteristic Area (ROC area), Precision-Recall Curves area (PRC area), and accuracy. The inclusive outcome of this study shows that in terms of reducing error rates, CDT outperforms other techniques achieving 0.013 for MAE, 0.089 for RMSE, 4.498% for RAE, and 23.741% for RRSE. However, in terms of increasing accuracy, DT, DTNB, and CDT achieve better results.

Download Full-text

Detection and Severity Evaluation of Combined Rail Defects Using Deep Learning

Vibration ◽

10.3390/vibration4020022 ◽

2021 ◽

Vol 4 (2) ◽

pp. 341-356

Author(s):

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Mean Absolute Error ◽

Absolute Error ◽

Machine Learning Techniques ◽

Rolling Stock ◽

Raw Data ◽

Learning Techniques ◽

Combined Defects

Various techniques have been developed to detect railway defects. One of the popular techniques is machine learning. This unprecedented study applies deep learning, which is a branch of machine learning techniques, to detect and evaluate the severity of rail combined defects. The combined defects in the study are settlement and dipped joint. Features used to detect and evaluate the severity of combined defects are axle box accelerations simulated using a verified rolling stock dynamic behavior simulation called D-Track. A total of 1650 simulations are run to generate numerical data. Deep learning techniques used in the study are deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). Simulated data are used in two ways: simplified data and raw data. Simplified data are used to develop the DNN model, while raw data are used to develop the CNN and RNN model. For simplified data, features are extracted from raw data, which are the weight of rolling stock, the speed of rolling stock, and three peak and bottom accelerations from two wheels of rolling stock. In total, there are 14 features used as simplified data for developing the DNN model. For raw data, time-domain accelerations are used directly to develop the CNN and RNN models without processing and data extraction. Hyperparameter tuning is performed to ensure that the performance of each model is optimized. Grid search is used for performing hyperparameter tuning. To detect the combined defects, the study proposes two approaches. The first approach uses one model to detect settlement and dipped joint, and the second approach uses two models to detect settlement and dipped joint separately. The results show that the CNN models of both approaches provide the same accuracy of 99%, so one model is good enough to detect settlement and dipped joint. To evaluate the severity of the combined defects, the study applies classification and regression concepts. Classification is used to evaluate the severity by categorizing defects into light, medium, and severe classes, and regression is used to estimate the size of defects. From the study, the CNN model is suitable for evaluating dipped joint severity with an accuracy of 84% and mean absolute error (MAE) of 1.25 mm, and the RNN model is suitable for evaluating settlement severity with an accuracy of 99% and mean absolute error (MAE) of 1.58 mm.

Download Full-text

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models

Download Full-text

SDP-ML: An Automated Approach of Software Defect Prediction employing Machine Learning Techniques

10.1109/icecit54077.2021.9641218 ◽

2021 ◽

Author(s):

Md Nasir Uddin ◽

Bixin Li ◽

Md Naim Mondol ◽

Md Mostafizur Rahman ◽

Md Suman Mia ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques

Download Full-text

Improved argumentative paragraphs detection in academic theses supported with unit segmentation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219237 ◽

2021 ◽

pp. 1-11

Author(s):

Jesús Miguel García-Gorrostieta ◽

Aurelio López-López ◽

Samuel González-López ◽

Adrián Pastor López-Monroy

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Automatic Detection ◽

Machine Learning Techniques ◽

Svm Classifier ◽

Complex Task ◽

Decision Tree Classifier ◽

Learning Techniques ◽

Tree Classifier ◽

Academic Author

Academic theses writing is a complex task that requires the author to be skilled in argumentation. The goal of the academic author is to communicate clear ideas and to convince the reader of the presented claims. However, few students are good arguers, and this is a skill that takes time to master. In this paper, we present an exploration of lexical features used to model automatic detection of argumentative paragraphs using machine learning techniques. We present a novel proposal, which combines the information in the complete paragraph with the detection of argumentative segments in order to achieve improved results for the detection of argumentative paragraphs. We propose two approaches; a more descriptive one, which uses the decision tree classifier with indicators and lexical features; and another more efficient, which uses an SVM classifier with lexical features and a Document Occurrence Representation (DOR). Both approaches consider the detection of argumentative segments to ensure that a paragraph detected as argumentative has indeed segments with argumentation. We achieved encouraging results for both approaches.

Download Full-text

Machine Learning Techniques to Predict Software Defect

Encyclopedia of Business Analytics and Optimization ◽

10.4018/978-1-4666-5202-6.ch129 ◽

2014 ◽

pp. 1422-1434 ◽

Cited By ~ 1

Author(s):

Ramakanta Mohanty ◽

Vadlamani Ravi

Keyword(s):

Machine Learning ◽

Feature Subset Selection ◽

Machine Learning Techniques ◽

Group Method ◽

Software Project ◽

Feature Subset ◽

Software Defects ◽

Software Defect ◽

Learning Techniques ◽

Sensitivity Specificity

The past 10 years have seen the prediction of software defects proposed by many researchers using various metrics based on measurable aspects of source code entities (e.g. methods, classes, files or modules) and the social structure of software project in an effort to predict the software defects. However, these metrics could not predict very high accuracies in terms of sensitivity, specificity and accuracy. In this chapter, we propose the use of machine learning techniques to predict software defects. The effectiveness of all these techniques is demonstrated on ten datasets taken from literature. Based on an experiment, it is observed that PNN outperformed all other techniques in terms of accuracy and sensitivity in all the software defects datasets followed by CART and Group Method of data handling. We also performed feature selection by t-statistics based approach for selecting feature subsets across different folds for a given technique and followed by the feature subset selection. By taking the most important variables, we invoked the classifiers again and observed that PNN outperformed other classifiers in terms of sensitivity and accuracy. Moreover, the set of ‘if- then rules yielded by J48 and CART can be used as an expert system for prediction of software defects.

Download Full-text

Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learning Techniques

Pattern Discovery Using Sequence Data Mining ◽

10.4018/978-1-61350-056-9.ch009 ◽

2012 ◽

pp. 155-165

Author(s):

S. Prasanthi ◽

S.Durga Bhavani ◽

T. Sobha Rani ◽

Raju S. Bapi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Kinase Inhibitors ◽

Kinase Inhibitor ◽

Classification Problem ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Data Set ◽

Learning Techniques

Vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of a protein leading to the concept of druggability. A target protein is druggable if it has the potential to bind the drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences since kinases are known to be potential drug targets. Also we do a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as positive data set and non-druggable kinases are chosen as negative data set. The classification problem is addressed using machine learning techniques like support vector machine (SVM) and decision tree (DT) and using sequence-specific features. One of the challenges of this classification problem is due to the unbalanced data with only 48 druggable kinases available against 509 non-drugggable kinases present at Uniprot. The accuracy of the decision tree classifier obtained is 57.65 which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also gets improved. Thus the overall model is shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in literature.

Download Full-text

Machine Learning Techniques Applied to Profile Mobile Banking Users in India

International Journal of Information Systems in the Service Sector ◽

10.4018/jisss.2013010105 ◽

2013 ◽

Vol 5 (1) ◽

pp. 82-92 ◽

Cited By ~ 8

Author(s):

M. Carr ◽

V. Ravi ◽

G. Sridharan Reddy ◽

D. Veranna

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Decision Tree ◽

Decision Trees ◽

Multilayer Perceptron ◽

Machine Learning Techniques ◽

Mobile Banking ◽

Classification Rules ◽

Learning Techniques ◽

Potential Customers

This paper profiles mobile banking users using machine learning techniques viz. Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM to test a research model with fourteen independent variables and a dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees the profile of the mobile banking adopter’s profile was identified. Comparing different machine learning techniques it was found that Decision Trees outperformed the Logistic Regression and Multilayer Perceptron and SVM. Out of all the techniques, Decision Tree is recommended for profiling studies because apart from obtaining high accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers to adopt mobile banking by offering them appropriate incentives.

Download Full-text