Abstract 13455: Predicting Emergency Department Disposition at Triage for Suspected Patients With Acute Coronary Syndrome Using Machine Learning Algorithms

Introduction: Overcrowded emergency departments (ED) and undifferentiated patients make the provision of care and resources challenging. We examined whether machine learning algorithms could identify ED patients’ disposition (hospitalization and critical care admission) using readily available objective triage data among patients with symptoms suggestive of acute coronary syndrome (ACS). Methods: This was a retrospective observational cohort study of adult patients who were triaged at the ED for a suspected coronary event. A total of 162 input variables (k) were extracted from the electronic health record: demographics (k=3), mode of transportation (k=1), past medical/surgical history (k=57), first ED vital signs (k=7), home medications (k=31), symptomology (k=40), and the computer generated automatic interpretation of 12-lead electrocardiogram (k=23). The primary outcomes were hospitalization and critical care admission (i.e., admission to intensive or step-down care unit). We used 10-fold stratified cross validation to evaluate the performance of five machine learning algorithms to predict the study outcomes: logistic regression, naïve Bayes, random forest, gradient boosting and artificial neural network classifiers. We determined the best model by comparing the area under the receiver operating characteristic curve (AUC) of all models. Results: Included were 1201 patients (age 64±14, 39% female; 10% Black) with a total of 956 hospitalizations, and 169 critical care admissions. The best performing machine learning classifier for the outcome of hospitalization was gradient boosting machine with an AUC of 0.85 (95% CI, 0.82–0.89), 89% sensitivity, and F-score of 0.83; random forest classifier performed the best for the outcome of critical care admission with an AUC of 0.73 (95% CI, 0.70–0.77), 76% sensitivity, and F-score of 0.56. 
Conclusion: Predictive machine learning algorithms demonstrate excellent to good discriminative power to predict hospitalization and critical care admission, respectively. Administrators and clinicians could benefit from machine learning approaches to predict hospitalization and critical care admission, to optimize and allocate scarce ED and hospital resources and provide optimal care.
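
As a rough illustration of the evaluation described above, the sketch below shows how stratified 10-fold assignment and the AUC can be computed in plain Python. It is not the authors' code: the function names, the round-robin fold assignment, and the toy label vector are assumptions made for this example, and the classifiers themselves are omitted.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels with the class balance reported above (956 of 1201 hospitalized).
y = [1] * 956 + [0] * 245
folds = stratified_folds(y, k=10)
```

In each round, one fold would serve as the held-out set and the remaining nine as training data; the per-fold AUC values can then be averaged to compare models.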

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, such as packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features such as packet size, arrival time, source and destination addresses, and similar metadata to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
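
To make the idea of metadata-only features concrete, the following sketch summarizes a flow from packet sizes and arrival times without touching payloads. The feature names, the tuple layout, and the sample flow are hypothetical; the abstract does not list the features or the dataset schema actually used.

```python
from statistics import mean


def flow_features(packets):
    """Summarize one flow from (arrival_time_seconds, size_bytes) tuples.

    Only metadata is used; payload bytes are never inspected.
    """
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets)  # sort by arrival time
    times = [t for t, _ in ordered]
    sizes = [s for _, s in ordered]
    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": len(ordered),
        "mean_size": mean(sizes),
        "max_size": max(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "duration": times[-1] - times[0],
    }


# Hypothetical flow: four packets given as (arrival time, size).
example = [(0.00, 60), (0.05, 1500), (0.10, 1500), (0.40, 60)]
features = flow_features(example)
```

A table of such per-flow rows is the kind of input that a feature selection step and a classifier could then operate on.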

Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Water ◽

10.3390/w12102927 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2927

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Joo Hyun Bae ◽

Jimin Lee ◽

Woon Ji Park ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Multilayer Perceptron ◽

Short Term Memory ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Dam Inflow

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with a Nash–Sutcliffe efficiency (NSE) of 0.812, root mean squared error (RMSE) of 77.218 m³/s, mean absolute error (MAE) of 29.034 m³/s, correlation coefficient (R) of 0.924, and coefficient of determination (R²) of 0.817. However, when the amount of dam inflow is below 100 m³/s, the ensemble models (random forest and gradient boosting) performed better than MLP. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for the prediction of the dam inflow, using the ensemble methods (RF and GB) at precipitation below 16 mm and the MLP at precipitation above 16 mm. The precipitation of 16 mm is the average daily precipitation at the inflow of 100 m³/s or more. Accuracy verification gave NSE 0.857, RMSE 68.417 m³/s, MAE 18.063 m³/s, R 0.927, and R² 0.859 for RF_MLP, and NSE 0.829, RMSE 73.918 m³/s, MAE 18.093 m³/s, R 0.912, and R² 0.831 for GB_MLP, which implies that the combination of the models predicts the dam inflow the most accurately. CombML algorithms showed that it is possible to predict inflow through inflow learning, considering flow characteristics such as flow regimes, by combining several machine learning algorithms.
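
The routing rule and the error metrics named above can be sketched as follows. This is an illustration under assumptions, not the study's implementation: the two models are passed in as plain callables, the function names are invented, and a day with exactly 16 mm is sent to the MLP because the abstract does not say how that boundary is handled.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance about the mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def combined_predict(precip_mm, features, ensemble_model, mlp_model, threshold_mm=16.0):
    """Use the ensemble model below the precipitation threshold, else the MLP.

    Assumption: a day at exactly the threshold goes to the MLP.
    """
    model = ensemble_model if precip_mm < threshold_mm else mlp_model
    return model(features)


# Stand-in models that only report which branch was taken.
dry_day = combined_predict(5.0, None, lambda f: "ensemble", lambda f: "mlp")
wet_day = combined_predict(40.0, None, lambda f: "ensemble", lambda f: "mlp")
```

An NSE of 1 means a perfect fit, while an NSE of 0 means the model does no better than always predicting the observed mean.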

Mortality Prediction in Cerebral Hemorrhage Patients Using Machine Learning Algorithms in Intensive Care Units

Frontiers in Neurology ◽

10.3389/fneur.2020.610531 ◽

2021 ◽

Vol 11 ◽

Author(s):

Ximing Nie ◽

Yuan Cai ◽

Jingyi Liu ◽

Xiran Liu ◽

Jiahui Zhao ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Random Forest ◽

Hospital Mortality ◽

Intensive Care Units ◽

Cerebral Hemorrhage ◽

Characteristic Curve ◽

Learning Algorithms ◽

Mortality Prediction ◽

Machine Learning Algorithms

Objectives: This study aims to investigate whether machine learning algorithms could provide an optimal early mortality prediction method compared with other scoring systems for patients with cerebral hemorrhage in intensive care units in clinical practice. Methods: Between 2008 and 2012, all cerebral hemorrhage patients in the Medical Information Mart for Intensive Care III (MIMIC-III) database who were monitored with the MetaVision system and admitted to intensive care units were enrolled in this study. The calibration, discrimination, and risk classification of predicted hospital mortality based on machine learning algorithms were assessed. The primary outcome was hospital mortality. Model performance was assessed with accuracy and receiver operating characteristic curve analysis. Results: Of 760 cerebral hemorrhage patients enrolled from the MIMIC database [mean age, 68.2 years (SD, ±15.5)], 383 (50.4%) patients died in hospital, and 377 (49.6%) patients survived. The area under the receiver operating characteristic curve (AUC) of six machine learning algorithms was 0.600 (nearest neighbors), 0.617 (decision tree), 0.655 (neural net), 0.671 (AdaBoost), 0.819 (random forest), and 0.725 (gcForest). The AUC was 0.423 for the Acute Physiology and Chronic Health Evaluation II score. The random forest had the highest specificity and accuracy, as well as the greatest AUC, showing the best ability to predict in-hospital mortality. Conclusions: Compared with the conventional scoring system and the other five machine learning algorithms in this study, the random forest algorithm had better performance in predicting in-hospital mortality for cerebral hemorrhage patients in intensive care units, and thus further research should be conducted on the random forest algorithm.

Classification of hazelnut cultivars: comparison of DL4J and ensemble learning algorithms

Notulae Botanicae Horti Agrobotanici Cluj-Napoca ◽

10.15835/nbha48412041 ◽

2020 ◽

Vol 48 (4) ◽

pp. 2316-2327

Author(s):

Caner KOC ◽

Dilara GERDAN ◽

Maksut B. EMİNOĞLU ◽

Uğur YEGÜL ◽

Bulent KOC ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Random Forest ◽

Ensemble Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Performance Criteria ◽

Gradient Boosting ◽

Data Set

Classification of hazelnuts is one of the value-adding processes that increase the marketability and profitability of hazelnut production. While traditional classification methods are commonly used, machine learning and deep learning can be implemented to enhance the hazelnut classification process. This paper presents the results of a comparative study of machine learning frameworks to classify hazelnut (Corylus avellana L.) cultivars (‘Sivri’, ‘Kara’, ‘Tombul’) using DL4J and ensemble learning algorithms. For each cultivar, 50 samples were used for evaluations. Maximum length, width, compression strength, and weight of hazelnuts were measured using a caliper and a force transducer. Gradient boosting machine (boosting), random forest (bagging), and DL4J feedforward (deep learning) algorithms were applied. The data set was evaluated using 10-fold cross-validation. The classifier performance criteria of accuracy (%), error percentage (%), F-measure, Cohen’s kappa, recall, precision, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values are provided in the results section. The results showed classification accuracies of 94% for Gradient Boosting, 100% for Random Forest, and 94% for DL4J Feedforward algorithms.
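
Two of the criteria listed above, accuracy and Cohen's kappa, can be computed from a confusion matrix as in the sketch below. The 3x3 matrix is invented for illustration (50 samples per class, matching only the sample design described above) and does not reproduce any result from the paper.

```python
def accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohens_kappa(matrix):
    """Cohen's kappa from a confusion matrix (rows: true class, columns: predicted)."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    # Agreement expected by chance from the row and column marginals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for three cultivars, 50 samples each (not from the paper).
toy = [
    [45, 3, 2],
    [4, 44, 2],
    [1, 2, 47],
]
toy_accuracy = accuracy(toy)
toy_kappa = cohens_kappa(toy)
```

Kappa discounts the agreement that would occur by chance, which is one third here because the three classes are balanced.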

Mapping Landslide Susceptibility Using Machine Learning Algorithms and GIS: A Case Study in Shexian County, Anhui Province, China

Symmetry ◽

10.3390/sym12121954 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1954 ◽

Cited By ~ 1

Author(s):

Zitao Wang ◽

Qimeng Liu ◽

Yu Liu

Keyword(s):

Machine Learning ◽

Random Forest ◽

Landslide Susceptibility ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Good Effect ◽

Gradient Boosting ◽

Support Vector ◽

Four Levels ◽

Very High

In this study, Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Machine (GBM), and Multilayer Perceptron (MLP) machine learning algorithms are combined with GIS techniques to map landslide susceptibility in Shexian County, China. By using satellite images and various topographic and geological maps, 16 landslide susceptibility factor maps of Shexian County were initially constructed. Then, for a total of 502 landslide and random safety points, parameters for the 16 factors were extracted using the “Extract Multivalues To Points” tool in ArcGIS and imported into models for the five algorithms; 70% of the samples were used for training and 30% for verification, which makes sense for data symmetry. The Shexian grid was converted into 260130 vector points and imported into the five models, and the natural breakpoint method was used to divide the grid into four levels: low, moderate, high, and very high. Finally, using column results obtained from Area Under Curve (AUC) analysis and a grid chart, the susceptibility results for landslide prediction mapping in Shexian County were compared across the five methods. Results indicate that the ratio of landslide points of high or very high levels from LR, SVM, RF, GBM, and MLP was 1.52, 1.77, 1.95, 1.83, and 1.64, and the ratio of very high landslide points to grade area was 1.92, 2.20, 2.98, 2.62, and 2.14, respectively. The success rate of training samples for the five methods was 0.781, 0.824, 0.853, 0.828, and 0.811, and prediction accuracy was 0.772, 0.803, 0.821, 0.815, and 0.803, respectively; the order of accuracy of the five algorithms was RF > SVM > MLP > GBM > LR. Our results indicate that the five machine learning algorithms have a good effect on landslide susceptibility evaluation in the Shexian area, with Random Forest having the best effect.

Ionospheric VTEC Forecasting using Machine Learning

10.5194/egusphere-egu21-8907 ◽

2021 ◽

Author(s):

Randa Natras ◽

Michael Schmidt

Keyword(s):

Magnetic Field ◽

Machine Learning ◽

Random Forest ◽

Space Weather ◽

Early Warning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Learning Techniques

The accuracy and reliability of Global Navigation Satellite System (GNSS) applications are affected by the state of the Earth's ionosphere, especially when using single frequency observations, which are employed mostly in mass-market GNSS receivers. In addition, space weather can be the cause of strong sudden disturbances in the ionosphere, representing a major risk for GNSS performance and reliability. Accurate corrections of ionospheric effects and early warning information in the presence of space weather are therefore crucial for GNSS applications. This correction information can be obtained by employing a model that describes the complex relation of space weather processes with the non-linear spatial and temporal variability of the Vertical Total Electron Content (VTEC) within the ionosphere and includes a forecast component considering space weather events to provide an early warning system. Developing such a model is a challenging but important task and of high interest for the GNSS community. To model the impact of space weather, a complex chain of physical dynamical processes between the Sun, the interplanetary magnetic field, the Earth's magnetic field and the ionosphere needs to be taken into account. Machine learning techniques are suitable for finding patterns and relationships in historical data to solve problems that are too complex for a traditional approach requiring an extensive set of rules (equations) or for which there is no acceptable solution available yet. The main objective of this study is to develop a model for forecasting the ionospheric VTEC, taking into account physical processes and utilizing state-of-the-art machine learning techniques to learn complex non-linear relationships from the data. In this work, supervised learning is applied to forecast VTEC. This means that the model is provided with a set of (input) variables that have some influence on the VTEC forecast (output).
To be more specific, data on solar activity, solar wind, the interplanetary and geomagnetic fields and other information connected to the VTEC variability are used as input to predict future VTEC values. Different machine learning algorithms are applied, such as decision tree regression, random forest regression and gradient boosting. Decision trees are the simplest and easiest to interpret of these machine learning algorithms, but the forecasted VTEC lacks smoothness. On the other hand, random forest and gradient boosting use a combination of multiple regression trees, which leads to improvements in the prediction accuracy and smoothness. However, the results show that the overall performance of the algorithms, measured by the root mean square error, does not differ much between them and improves when the data are well prepared, i.e. cleaned and transformed to remove trends. Preliminary results of this study will be presented, including the methodology, goals, challenges and perspectives of developing the machine learning model.
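
One simple way to "transform to remove trends" is to subtract a least-squares straight line from the series, sketched below. The abstract does not state which transformation the authors applied, so this is only one possible reading; the function name and the sample values are hypothetical.

```python
def remove_linear_trend(series):
    """Subtract the least-squares straight line from an evenly sampled series."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2.0
    mean_y = sum(series) / n
    # Ordinary least-squares slope and intercept against the sample index.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, series)]


# Hypothetical series with an upward drift.
detrended = remove_linear_trend([10.0, 12.5, 14.0, 17.5, 19.0])
```

The residuals average to zero by construction, so a model trained on them only needs to learn the variation around the trend.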

Random Forest Algorithm to Investigate the Case of Acute Coronary Syndrome

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i2.3000 ◽

2021 ◽

Vol 5 (2) ◽

pp. 369-378

Author(s):

Eka Pandu Cynthia ◽

M. Afif Rizky A. ◽

Alwis Nazir ◽

Fadhilah Syafria

Keyword(s):

Machine Learning ◽

Acute Coronary Syndrome ◽

Random Forest ◽

Data Science ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Random Forest Algorithm ◽

Coronary Syndrome ◽

Use Of Data

This paper explains the use of the Random Forest Algorithm (RFA) to investigate the case of Acute Coronary Syndrome (ACS). The objective of this study is to evaluate the use of data science techniques and machine learning algorithms in creating a model that can classify whether or not cases of acute coronary syndrome occur. The research method used in this study follows the IBM Foundational Methodology for Data Science and includes: i) inventorying a dataset about ACS, ii) preprocessing the data in four sub-processes, i.e. requirements, collection, understanding, and preparation, iii) determination of the RFA, i.e. the number "n" of trees that will form the forest and the forming of trees from the random forest that has been created, and iv) determination of the model evaluation and result analysis based on the Python programming language. The experiments were conducted using a random forest machine-learning algorithm with an n-estimator value of 100 and a maximum depth (max depth) of 4 for each tree, with learning scenarios of 70:30, 80:20, and 90:10 on 444 cases of acute coronary syndrome data. The results show that the 70:30 scenario model has the best results, with an accuracy value of 83.45%, a precision value of 85%, and a recall value of 92.4%. The conclusions were drawn from experimental results evaluated with various statistical metrics (accuracy, precision, and recall) in each learning scenario on 444 cases of acute coronary syndrome data with 10-fold cross-validation.
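
The three metrics reported above follow directly from confusion-matrix counts, as the sketch below shows. The counts passed to `classification_metrics` are made up for illustration, and the F1 value is derived here from the reported precision and recall; it is not a figure given in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the paper.
acc, prec, rec = classification_metrics(tp=8, fp=2, tn=7, fn=3)

# Derived from the reported precision (85%) and recall (92.4%).
derived_f1 = f1_score(0.85, 0.924)
```

Because the F1 score is a harmonic mean, it always lies between precision and recall and sits closer to the lower of the two.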

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances, and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each group is observed by comparing the performance of the models. Then the number of target classes is increased to five and eight, and the results are compared with the previously developed models in the literature.

Feasibility of Machine Learning Algorithms for Predicting the Deformation of Anodic Titanium Films by Modulating Anodization Processes

Materials ◽

10.3390/ma14051089 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1089

Author(s):

Sung-Hee Kim ◽

Chanyoung Jeong

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Multiclass Classification ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Manufacturing ◽

Gradient Boosting ◽

Experimental Conditions ◽

Learning Techniques ◽

Tio2 Nanostructures

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to both binary and multiclass classification. For binary classification, the random forest and gradient boosting algorithms had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high predictive performance in estimating the presence of pores. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or battery efficiency.

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods: Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
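
For readers unfamiliar with net reclassification improvement, the sketch below computes the common two-category form at a single risk threshold. The abstract does not specify which NRI variant or threshold the authors used, so the 0.5 cut-off, the function name, and the toy inputs are assumptions.

```python
def net_reclassification_improvement(outcomes, old_probs, new_probs, threshold=0.5):
    """Two-category NRI of a new model over an old one at one risk threshold.

    Events (outcome 1) should move up to the high-risk class; non-events
    (outcome 0) should move down to the low-risk class.
    """
    events = sum(1 for y in outcomes if y == 1)
    non_events = len(outcomes) - events
    if events == 0 or non_events == 0:
        raise ValueError("need at least one event and one non-event")
    up_event = down_event = up_non = down_non = 0
    for y, old, new in zip(outcomes, old_probs, new_probs):
        old_high = old >= threshold
        new_high = new >= threshold
        if new_high and not old_high:  # reclassified upward
            if y == 1:
                up_event += 1
            else:
                up_non += 1
        elif old_high and not new_high:  # reclassified downward
            if y == 1:
                down_event += 1
            else:
                down_non += 1
    return (up_event - down_event) / events + (down_non - up_non) / non_events


# Toy example: the new model fixes one event and one non-event.
toy_nri = net_reclassification_improvement(
    outcomes=[1, 1, 0, 0],
    old_probs=[0.4, 0.6, 0.6, 0.3],
    new_probs=[0.7, 0.8, 0.4, 0.2],
)
```

The value ranges from -2 to 2; a positive result means the new model moves more patients into the correct risk class than into the wrong one.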
