Software vulnerability prediction using Grey wolf optimized Random forest on the unbalanced data sets

Any vulnerability in software creates a security threat and helps hackers gain unauthorized access to resources. Vulnerability prediction models help software engineers allocate their resources effectively to find vulnerable classes in the software before its delivery to customers. Vulnerable classes must be carefully reviewed by security experts and tested to identify potential threats that may arise in the future. In the present work, a novel technique based on the Grey Wolf algorithm and Random Forest is proposed for software vulnerability prediction. Grey Wolf is a metaheuristic technique, used here to select the best subset of features. The proposed technique is compared with other machine learning techniques. Experiments were performed on three publicly available datasets. The proposed technique (GW-RF) was observed to outperform all the other techniques for software vulnerability prediction.

Machine learning predictivity applied to consumer creditworthiness

Future Business Journal ◽

10.1186/s43093-020-00041-w ◽

2020 ◽

Vol 6 (1) ◽

Author(s):

Maisa Cardoso Aniceto ◽

Flavio Barboza ◽

Herbert Kimura

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Credit Risk ◽

Performance Metrics ◽

Prediction Models ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Default Prediction

Credit risk evaluation plays a relevant role for financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities in managing credit risk. This study analyzes the adequacy of borrower classification models using a Brazilian bank’s loan database and machine learning techniques. We develop Support Vector Machine, Decision Tree, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are based on the usual classification performance metrics. Our results show that Random Forest and AdaBoost perform better than the other models. Moreover, Support Vector Machine models show poor performance with both linear and nonlinear kernels. Our findings suggest that there are value-creating opportunities for banks to improve default prediction models by exploring machine learning techniques.

A Review of Statistical and Machine Learning Techniques for Microvascular Complications in Type 2 Diabetes

Current Diabetes Reviews ◽

10.2174/1573399816666200511003357 ◽

2020 ◽

Vol 16 ◽

Author(s):

Nitigya Sambyal ◽

Poonam Saini ◽

Rupali Syal

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Clinical Medicine ◽

Microvascular Complications ◽

Descriptive Analysis ◽

Machine Learning Techniques ◽

World Health ◽

Public Health Issue ◽

Learning Techniques ◽

Health Organization

Background and Introduction: Diabetes mellitus (DM) is a metabolic disorder that has emerged as a serious public health issue worldwide. According to the World Health Organization (WHO), without interventions, the number of diabetes cases is expected to be at least 629 million by 2045. Uncontrolled diabetes gradually leads to progressive damage to the eyes, heart, kidneys, blood vessels and nerves. Method: The paper presents a critical review of existing statistical and Artificial Intelligence (AI) based machine learning techniques with respect to the DM complications retinopathy, neuropathy and nephropathy. The statistical and machine learning analytic techniques are used to structure the subsequent content review. Result: It has been inferred that statistical analysis can help only in inferential and descriptive analysis, whereas AI-based machine learning models can also provide actionable prediction models for faster and more accurate diagnosis of complications associated with DM. Conclusion: The integration of AI-based analytics techniques such as machine learning and deep learning in clinical medicine will result in improved disease management through faster disease detection and reduced treatment cost.

Sustainability Performance Assessment Using Self-Organizing Maps (SOM) and Classification and Ensembles of Regression Trees (CART)

Sustainability ◽

10.3390/su13073870 ◽

2021 ◽

Vol 13 (7) ◽

pp. 3870

Author(s):

Mehrbakhsh Nilashi ◽

Shahla Asadi ◽

Rabab Ali Abumalloh ◽

Sarminah Samad ◽

Fahad Ghabban ◽

...

Keyword(s):

Performance Assessment ◽

Prediction Accuracy ◽

Prediction Models ◽

Sustainability Assessment ◽

Regression Trees ◽

Machine Learning Techniques ◽

Coefficient Of Determination ◽

Sustainability Performance ◽

Learning Techniques ◽

Self Organizing

This study aims to develop a new approach based on machine learning techniques to assess sustainability performance. Two main dimensions of sustainability, ecological sustainability and human sustainability, were considered. A set of sustainability indicators was used, and the research method was developed using cluster analysis and prediction learning techniques. A Self-Organizing Map (SOM) was applied for data clustering, while Classification and Regression Trees (CART) were applied to assess sustainability performance. The proposed method was evaluated on the Sustainability Assessment by Fuzzy Evaluation (SAFE) dataset, which comprises various indicators of sustainability performance in 128 countries. The SOM clustering technique found eight clusters in the data. A prediction model was built in each cluster with the CART technique. In addition, an ensemble of CART was constructed in each SOM cluster to increase the prediction accuracy of CART. All prediction models were assessed using the adjusted coefficient of determination. The results demonstrated that the prediction accuracy values were high in all CART models, and that the method combining ensembles of CART with clustering provides higher prediction accuracy than individual CART models. The main advantage of the proposed method is its ability to automate the extraction of decision rules from big data for prediction models. The method proposed in this study could be implemented as an effective tool for sustainability performance assessment.
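
The adjusted coefficient of determination mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard formula, not the study's code; the function names and the sample values are assumptions made up for the example.

```python
def r_squared(y_true, y_pred):
    """Ordinary coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors used by the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Hypothetical observed and predicted scores (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_predictors=2)
print(r2, adj_r2)
```

The adjustment never raises the score for a model with at least one predictor, so a model with many indicators is not rewarded simply for having more inputs.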

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models

Autism Spectrum Disorder Identification Using Polynomial Distribution based Convolutional Neural Network

NeuroQuantology ◽

10.14704/nq.2021.19.2.nq21013 ◽

2021 ◽

Vol 19 (2) ◽

pp. 19-30

Author(s):

G. Nagarajan ◽

Dr.A. Mahabub Basha ◽

R. Poornima

Keyword(s):

Neural Network ◽

Prediction Models ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Machine Learning Techniques ◽

Hybrid Technique ◽

Learning Techniques ◽

Chicken Swarm Optimization ◽

Sensitivity Specificity ◽

Polynomial Distribution

ASD (Autism Spectrum Disorder) is one of the main psychiatric disorders found in humans. It manifests as a mental disorder that restricts an individual's abilities in communication, language and speech. Even though a cure is complex and practically impossible, early detection is required to mitigate its intensity. ASD has no pre-defined age of onset. This work proposes a system for effectively predicting ASD based on MLTs (Machine Learning Techniques). Hybrid APMs (Autism Prediction Models) that combine multiple techniques such as RF (Random Forest), CART (Classification and Regression Trees) and RF-ID3 (RF-Iterative Dichotomiser 3) perform well, but face issues with memory usage, execution time and inadequate feature selection. To overcome these hurdles, this work proposes a hybrid technique that combines the MCSO (Modified Chicken Swarm Optimization) and PDCNN (Polynomial Distribution based Convolution Neural Network) algorithms. The experimental results of the proposed scheme show higher levels of accuracy, precision, sensitivity, specificity and FPRs (False Positive Rates), and lower time complexity, when compared to other methods.

Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.1019 ◽

2021 ◽

Author(s):

Gediminas Adomavicius ◽

Yaqiong Wang

Keyword(s):

Machine Learning ◽

General Purpose ◽

Reliability Estimation ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Learning Techniques ◽

Reliability Indicator ◽

Machine Learning Approach ◽

Prediction Reliability

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
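
The reframing described in this abstract, where reliability estimation becomes a second numeric prediction problem whose target is the absolute error of the first model, can be sketched roughly as below. This is only an illustration under strong assumptions: a one-feature least-squares line stands in for both models, the data are synthetic, and all names (`fit_line`, `noisy_target`, and so on) are hypothetical rather than taken from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    intercept, slope = model
    return intercept + slope * x


def noisy_target(i, x):
    """Synthetic outcome whose deterministic 'noise' grows with x."""
    sign = 1 if i % 2 == 0 else -1
    return 2.0 * x + 0.5 * x * sign


indices = range(1, 21)
train_x = [float(i) for i in indices]
train_y = [noisy_target(i, x) for i, x in zip(indices, train_x)]
holdout_x = [i + 0.5 for i in indices]
holdout_y = [noisy_target(i, x) for i, x in zip(indices, holdout_x)]

# Step 1: fit the outcome model (any numeric predictor could be used here).
outcome_model = fit_line(train_x, train_y)

# Step 2: measure its absolute errors on data it was not fitted on.
abs_errors = [abs(y - predict(outcome_model, x))
              for x, y in zip(holdout_x, holdout_y)]

# Step 3: fit a second model that predicts the absolute error itself.
reliability_model = fit_line(holdout_x, abs_errors)

for x in (2.0, 20.0):
    print(x, predict(outcome_model, x), predict(reliability_model, x))
```

In this synthetic setting the second model assigns a larger estimated error to larger inputs, which is the kind of per-prediction reliability signal the abstract describes. Using a separate holdout set for the error targets is a design choice of this sketch, not a detail confirmed by the abstract.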

Decision Tree: A Machine Learning for Intrusion Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f1234.0486s419 ◽

2019 ◽

Vol 8 (6S4) ◽

pp. 1126-1130

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Detection System ◽

Research Work ◽

Machine Learning Techniques ◽

Data Sets ◽

Legitimate User ◽

Learning Techniques ◽

Three Stages

Intrusion is a major threat in which unauthorized access to data or to a legitimate network is gained using a legitimate user's identity or any of the back doors and vulnerabilities in the network. IDS mechanisms are developed to detect intrusions at various levels. The objective of the research work is to improve Intrusion Detection System performance by applying machine learning techniques based on decision trees for the detection and classification of attacks. The methodology adopted processes the datasets in three stages. The experimentation is conducted on KDDCUP99 data sets based on the number of features. The Bayesian three modes are analyzed for data sets of different sizes based on the total number of attacks. The time taken by the classifier to build the model is analyzed and the accuracy is evaluated.

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’s Performance

Students' academic performance is one of the most important parameters for evaluating the standard of any institute. It has become of paramount importance for any institute to identify students at risk of underperforming, failing or even dropping out of the course. Machine learning techniques may be used to develop a model for predicting a student’s performance as early as the time of admission. The task is challenging, however, because the educational data available for modelling are usually imbalanced. We explore ensemble machine learning techniques, namely a bagging algorithm, random forest (rf), and boosting algorithms such as adaptive boosting (adaboost), stochastic gradient boosting (gbm) and extreme gradient boosting (xgbTree), in an attempt to develop a model for predicting student performance at a private university in Meghalaya using three categories of data: demographic, prior academic record and personality. The collected data are found to be highly imbalanced and also contain missing values. We employ the k-nearest neighbor (knn) data imputation technique to handle the missing values. The models are developed on the imputed data with 10-fold cross-validation and are evaluated using precision, specificity, recall and kappa metrics. As the data are imbalanced, we avoid using accuracy as the metric for evaluating the models and instead use balanced accuracy and F-score. We compare the ensemble techniques with the single classifier C4.5. The best result is provided by random forest and adaboost, with an F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.
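
To illustrate why accuracy can mislead on imbalanced data, the sketch below computes accuracy, balanced accuracy and F-score from a confusion matrix. The counts are invented for illustration and are not the study's confusion matrix; the function name `imbalance_metrics` is likewise an assumption.

```python
def imbalance_metrics(tp, fn, fp, tn):
    """Return accuracy, balanced accuracy and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)            # sensitivity on the minority class
    specificity = tn / (tn + fp)       # recall on the majority class
    precision = tp / (tp + fp)
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, balanced_accuracy, f1


# Hypothetical test set: 6 at-risk students (positive class) out of 98.
# The classifier finds only half of them and raises no false alarms.
accuracy, balanced_accuracy, f1 = imbalance_metrics(tp=3, fn=3, fp=0, tn=92)

print(f"accuracy          = {accuracy:.4f}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
print(f"F1 score          = {f1:.4f}")
```

With these made-up counts, accuracy stays close to 97% even though half of the minority class is missed, while balanced accuracy (0.75) and the F-score (about 0.67) expose the weak minority-class recall.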

Heart Disease Prediction using Machine Learning Techniques

International Journal of Scientific Research in Science and Technology ◽

10.32628/ijsrst2183218 ◽

2021 ◽

pp. 42-47

Author(s):

Ramesh Ponnala ◽

K. Sai Sowjanya

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Random Forest ◽

Linear Model ◽

Machine Learning Techniques ◽

Disease Prediction ◽

Huge Amount ◽

Healthcare Enterprise ◽

Learning Techniques ◽

Accuracy Level

Prediction of cardiovascular disease is an important task in the area of clinical data analysis. Machine learning has been shown to be effective in assisting decision making and prediction from the huge amount of data produced by the healthcare industry. In this paper, we propose a novel method that aims at finding significant features by applying ML techniques, resulting in improved accuracy in the prediction of heart disease. The severity of the heart disease is classified based on various methods such as KNN, decision trees and so on. The prediction model is introduced with different combinations of features and several known classification techniques. We produce an enhanced performance level, with an accuracy level of 100%, through the prediction model for heart disease with the Hybrid Random Forest with a Linear Model (HRFLM).

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

10.21203/rs.3.rs-22670/v3 ◽

2020 ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E Braat ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Survival Prediction ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism to ML is related to unsuitable performance measures and lack of interpretability which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, over more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is given on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. Clinical endpoint is overall graft-survival defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. From the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.
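
The two kinds of measure named in this abstract can be sketched in a few lines. The code below is a simplified illustration, not the study's evaluation code: the C-index follows Harrell's pairwise definition, the Brier score is computed at a single time horizon and ignores censoring (the survival-analysis version adds censoring weights), and all data values and function names are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    A pair is comparable when the subject with the shorter follow-up time
    actually had the event (events[i] == 1). Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def brier_score(survival_probs, event_free):
    """Mean squared gap between predicted survival probability at a fixed
    horizon and the observed status (1 = event-free at the horizon)."""
    n = len(survival_probs)
    return sum((p - o) ** 2 for p, o in zip(survival_probs, event_free)) / n


# Hypothetical follow-up data: times in years, 1 = graft failure or death.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.7, 0.1]   # higher value = higher predicted risk

c_index = concordance_index(times, events, risks)
brier = brier_score([0.9, 0.2, 0.7], [1, 0, 0])
print(c_index, brier)
```

The C-index measures discrimination (ranking), while the Brier score also reflects calibration, which is one reason two models can be ordered differently by the two measures, as the results above report.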
