Predicting in-Hospital Mortality of Patients with COVID-19 Using Machine Learning Techniques

The present work aims to identify the predictors of COVID-19 in-hospital mortality testing a set of Machine Learning Techniques (MLTs), comparing their ability to predict the outcome of interest. The model with the best performance will be used to identify in-hospital mortality predictors and to build an in-hospital mortality prediction tool. The study involved patients with COVID-19, proved by PCR test, admitted to the “Ospedali Riuniti Padova Sud” COVID-19 referral center in the Veneto region, Italy. The algorithms considered were the Recursive Partition Tree (RPART), the Support Vector Machine (SVM), the Gradient Boosting Machine (GBM), and Random Forest. The resampled performances were reported for each MLT, considering the sensitivity, specificity, and the Receiving Operative Characteristic (ROC) curve measures. The study enrolled 341 patients. The median age was 74 years, and the male gender was the most prevalent. The Random Forest algorithm outperformed the other MLTs in predicting in-hospital mortality, with a ROC of 0.84 (95% C.I. 0.78–0.9). Age, together with vital signs (oxygen saturation and the quick SOFA) and lab parameters (creatinine, AST, lymphocytes, platelets, and hemoglobin), were found to be the strongest predictors of in-hospital mortality. The present work provides insights for the prediction of in-hospital mortality of COVID-19 patients using a machine-learning algorithm.

Download Full-text

Classification of Agriculture Farm Machinery Using Machine Learning and Internet of Things

Symmetry ◽

10.3390/sym13030403 ◽

2021 ◽

Vol 13 (3) ◽

pp. 403

Author(s):

Muhammad Waleed ◽

Tai-Won Um ◽

Tariq Kamal ◽

Syed Muhammad Usman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Farm Machinery ◽

Learning Techniques

In this paper, we apply the multi-class supervised machine learning techniques for classifying the agriculture farm machinery. The classification of farm machinery is important when performing the automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of a fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques—K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training of the model, we use the vibration and tilt of machinery. The vibration and tilt of machinery are recorded using the accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis on the collected data revealed that the farm machinery (when in operation) showed big variations in vibration and tilt, but observed similar means. Additionally, the accuracies of vibration-based and tilt-based classifications of farm machinery show good accuracy when used alone (with vibration showing slightly better numbers than the tilt). However, the accuracies improve further when both (the tilt and vibration) are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, but random forest was the best performing. The gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while the gradient boosting takes the most time.

Download Full-text

Learning from Imbalanced Educational Data Using Ensemble Machine Learning Algorithms

Webology ◽

10.14704/web/v18si01/web18053 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 183-195

Author(s):

Thingbaijam Lenin ◽

N. Chandrasekaran

Keyword(s):

Machine Learning ◽

Random Forest ◽

Missing Values ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Adaptive Boosting ◽

Stochastic Gradient Boosting ◽

Ensemble Machine Learning ◽

Learning Techniques ◽

Student’S Performance

Student’s academic performance is one of the most important parameters for evaluating the standard of any institute. It has become a paramount importance for any institute to identify the student at risk of underperforming or failing or even drop out from the course. Machine Learning techniques may be used to develop a model for predicting student’s performance as early as at the time of admission. The task however is challenging as the educational data required to explore for modelling are usually imbalanced. We explore ensemble machine learning techniques namely bagging algorithm like random forest (rf) and boosting algorithms like adaptive boosting (adaboost), stochastic gradient boosting (gbm), extreme gradient boosting (xgbTree) in an attempt to develop a model for predicting the student’s performance of a private university at Meghalaya using three categories of data namely demographic, prior academic record, personality. The collected data are found to be highly imbalanced and also consists of missing values. We employ k-nearest neighbor (knn) data imputation technique to tackle the missing values. The models are developed on the imputed data with 10 fold cross validation technique and are evaluated using precision, specificity, recall, kappa metrics. As the data are imbalanced, we avoid using accuracy as the metrics of evaluating the model and instead use balanced accuracy and F-score. We compare the ensemble technique with single classifier C4.5. The best result is provided by random forest and adaboost with F-score of 66.67%, balanced accuracy of 75%, and accuracy of 96.94%.

Download Full-text

Comparison of Machine Learning algorithm for COVID-19 Death Risk Prediction

10.21203/rs.3.rs-196077/v1 ◽

2021 ◽

Author(s):

Praveeen Anandhanathan ◽

Priyanka Gopalan

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Machine Learning Techniques ◽

Support Vector ◽

Nearest Neighbour ◽

Decision Tree Algorithm ◽

The Past ◽

Random Forest Method ◽

Learning Techniques ◽

The World

Abstract Coronavirus disease (COVID-19) is spreading across the world. Since at first it has appeared in Wuhan, China in December 2019, it has become a serious issue across the globe. There are no accurate resources to predict and find the disease. So, by knowing the past patients’ records, it could guide the clinicians to fight against the pandemic. Therefore, for the prediction of healthiness from symptoms Machine learning techniques can be implemented. From this we are going to analyse only the symptoms which occurs in every patient. These predictions can help clinicians in the easier manner to cure the patients. Already for prediction of many of the diseases, techniques like SVM (Support vector Machine), Fuzzy k-Means Clustering, Decision Tree algorithm, Random Forest Method, ANN (Artificial Neural Network), KNN (k-Nearest Neighbour), Naïve Bayes, Linear Regression model are used. As we haven’t faced this disease before, we can’t say which technique will give the maximum accuracy. So, we are going to provide an efficient result by comparing all the such algorithms in RStudio.

Download Full-text

Classification study of solvation free energies of organic molecules using machine learning techniques

RSC Advances ◽

10.1039/c4ra07961b ◽

2014 ◽

Vol 4 (106) ◽

pp. 61624-61630 ◽

Cited By ~ 8

Author(s):

N. S. Hari Narayana Moorthy ◽

Silvia A. Martins ◽

Sergio F. Sousa ◽

Maria J. Ramos ◽

Pedro A. Fernandes

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Organic Molecules ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Models ◽

Free Energies ◽

Learning Techniques ◽

Solvation Free Energies

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.

Download Full-text

Prediction of probable backorder scenarios in the supply chain using Distributed Random Forest and Gradient Boosting Machine learning techniques

Journal Of Big Data ◽

10.1186/s40537-020-00345-2 ◽

2020 ◽

Vol 7 (1) ◽

Cited By ~ 1

Author(s):

Samiul Islam ◽

Saman Hassanzadeh Amin

Keyword(s):

Machine Learning ◽

Supply Chain ◽

Random Forest ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Techniques ◽

Gradient Boosting Machine

Download Full-text

Estimating Warehouse Rental Price using Machine Learning Techniques

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.2.3034 ◽

2018 ◽

Vol 13 (2) ◽

pp. 235-250 ◽

Cited By ~ 3

Author(s):

Yixuan Ma ◽

Zhenji Zhang ◽

Alexander Ihler ◽

Baoxiang Pan

Keyword(s):

Machine Learning ◽

Random Forest ◽

Real Estate ◽

Rapid Development ◽

Supply And Demand ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Logistics Industry ◽

Real Estate Price ◽

Learning Techniques

Boosted by the growing logistics industry and digital transformation, the sharing warehouse market is undergoing a rapid development. Both supply and demand sides in the warehouse rental business are faced with market perturbations brought by unprecedented peer competitions and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real world warehouse dataset using data collected from the classified advertisements websites. Based on the dataset, we applied machine learning techniques to relate warehouse price with its relevant features, such as warehouse size, location and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors have better skill in estimating warehouse rent, compared to singlefactor estimation. Additionally, tree models have better performance than the linear model, with the best model (Random Forest) achieving correlation coefficient of 0.57 in the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.

Download Full-text

Credit Risk Assessment using Machine Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4936.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3482-3486

Keyword(s):

Machine Learning ◽

Risk Assessment ◽

Random Forest ◽

Credit Risk ◽

Banking Sector ◽

Machine Learning Techniques ◽

Support Vector ◽

Credit Risk Assessment ◽

Learning Techniques ◽

Cart Algorithm

Analysis of credit scoring is an effective credit risk assessment technique, which is one of the major research fields in the banking sector. Machine learning has a variety of applications in the banking sector and it has been widely used for data analysis. Modern techniques such as machine learning have provided a self-regulating process to analyze the data using classification techniques. The classification method is a supervised learning process in which the computer learns from the input data provided and makes use of this information to classify the new dataset. This research paper presents a comparison of various machine learning techniques used to evaluate the credit risk. A credit transaction that needs to be accepted or rejected is trained and implemented on the dataset using different machine learning algorithms. The techniques are implemented on the German credit dataset taken from UCI repository which has 1000 instances and 21 attributes, depending on which the transactions are either accepted or rejected. This paper compares algorithms such as Support Vector Network, Neural Network, Logistic Regression, Naive Bayes, Random Forest, and Classification and Regression Trees (CART) algorithm and the results obtained show that Random Forest algorithm was able to predict credit risk with higher accuracy

Download Full-text

Machine Learning Algorithms For Understanding The Determinants of Under-Five Mortality

10.21203/rs.3.rs-1021040/v1 ◽

2021 ◽

Author(s):

Rakesh Kumar Saroj ◽

Pawan Kumar Yadav ◽

Rajneesh Singh ◽

Obvious Nchimunya Chilyabanyama

Keyword(s):

Machine Learning ◽

Random Forest ◽

Information Gain ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Mortality Data ◽

Mortality Factors ◽

Under Five ◽

Learning Techniques

Abstract Background: The death rate of under-five children in India declined last few decades, but few bigger states have poor performance. This is a matter of serious concern for the child's health as well as social development. Nowadays, machine learning techniques play a crucial role in the smart health care system to capture the hidden factors and patterns of outcomes. In this paper, we used machine learning techniques to predict the important factors of under-five mortality.This study aims to explore the importance of machine learning techniques to predict under-five mortality and to find the important factors that cause under-five mortality.The data was taken from the National Family Health Survey-IV of Uttar Pradesh. We used four machine learning techniques like decision tree, support vector machine, random forest, and logistic regression to predict under-five mortality factors and model accuracy of each model. We have also used information gain to rank to know the important variables for accurate predictions in under-five mortality data.Result: Random Forest (RF) predicts the child mortality factors with the highest accuracy of 97.5 %, and the number of living children, births in the last five years, educational level, birth order, total children ever born, currently breastfeeding, and size of child at birth that identifying as essential factors for under-five mortality.Conclusion: The study focuses on machine learning techniques to predict and identify important factors for under-five mortality. The random forest model provides an excellent predictive result for estimating the risk factors of under-five mortality. Based on the resulting outcome, policymakers can make policies and plans to reduce under-five mortality.

Download Full-text

Evaluating machine learning techniques for archaeological lithic sourcing: a case study of flint in Britain

Scientific Reports ◽

10.1038/s41598-021-87834-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tom Elliot ◽

Robert Morse ◽

Duane Smythe ◽

Ashley Norris

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Random Forest ◽

Objective Evaluation ◽

Classification Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Vector Machines

AbstractIt is 50 years since Sieveking et al. published their pioneering research in Nature on the geochemical analysis of artefacts from Neolithic flint mines in southern Britain. In the decades since, geochemical techniques to source stone artefacts have flourished globally, with a renaissance in recent years from new instrumentation, data analysis, and machine learning techniques. Despite the interest over these latter approaches, there has been variation in the quality with which these methods have been applied. Using the case study of flint artefacts and geological samples from England, we present a robust and objective evaluation of three popular techniques, Random Forest, K-Nearest-Neighbour, and Support Vector Machines, and present a pipeline for their appropriate use. When evaluated correctly, the results establish high model classification performance, with Random Forest leading with an average accuracy of 85% (measured through F1 Scores), and with Support Vector Machines following closely. The methodology developed in this paper demonstrates the potential to significantly improve on previous approaches, particularly in removing bias, and providing greater means of evaluation than previously utilised.

Download Full-text

A Novel Approach for Detecting DGA-Based Botnets in DNS Queries Using Machine Learning Techniques

Journal of Computer Networks and Communications ◽

10.1155/2021/4767388 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Ali Soleymani ◽

Fatemeh Arabgol

Keyword(s):

Machine Learning ◽

Random Forest ◽

Text Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Detection Accuracy ◽

Domain Name ◽

Botnet Detection ◽

Learning Techniques

In today’s security landscape, advanced threats are becoming increasingly difficult to detect as the pattern of attacks expands. Classical approaches that rely heavily on static matching, such as blacklisting or regular expression patterns, may be limited in flexibility or uncertainty in detecting malicious data in system data. This is where machine learning techniques can show their value and provide new insights and higher detection rates. The behavior of botnets that use domain-flux techniques to hide command and control channels was investigated in this research. The machine learning algorithm and text mining used to analyze the network DNS protocol and identify botnets were also described. For this purpose, extracted and labeled domain name datasets containing healthy and infected DGA botnet data were used. Data preprocessing techniques based on a text-mining approach were applied to explore domain name strings with n-gram analysis and PCA. Its performance is improved by extracting statistical features by principal component analysis. The performance of the proposed model has been evaluated using different classifiers of machine learning algorithms such as decision tree, support vector machine, random forest, and logistic regression. Experimental results show that the random forest algorithm can be used effectively in botnet detection and has the best botnet detection accuracy.

Download Full-text