Machine learning approaches for predicting difficult airway and first-pass success in the emergency department: A multiple prospective observational study (Preprint)

BACKGROUND There is still room for improvement in the modified LEMON criteria for difficult airway prediction, and there is no prediction tool for first-pass success in the ED. OBJECTIVE We applied modern machine learning approaches to predict difficult airway and first-pass success. METHODS In a multicenter prospective study that enrolled consecutive patients who underwent tracheal intubation in 13 EDs, we developed seven machine learning models (e.g., random forest model) using routinely collected data (e.g., demographics, initial airway assessment). The outcomes were difficult airway and first-pass success. Model performance was evaluated by c-statistics, calibration slope, and association measures (e.g., sensitivity) in the test set (a randomly selected 20% of the data). Performance was compared with the modified LEMON criteria for difficult airway and with a logistic regression model for first-pass success. RESULTS Of 10,741 patients who underwent intubation, 543 patients (5%) had a difficult airway, and 7,690 patients (71%) had first-pass success. In predicting difficult airway, machine learning models (except for k-point nearest neighbor and multilayer perceptron) had higher discrimination ability than the modified LEMON criteria (P<0.01). For example, the ensemble method had the highest c-statistic (0.74 vs 0.62 for the modified LEMON criteria; P<0.01). For first-pass success, machine learning models (except for k-point nearest neighbor and random forest models) had higher discrimination ability. In particular, the ensemble model had the highest c-statistic (0.81 vs 0.76 for the reference regression; P<0.01). CONCLUSIONS Machine learning models demonstrated a greater ability to predict difficult airway and first-pass success in the ED.
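
The c-statistic used above to compare models can be illustrated with a minimal sketch. For a binary outcome it is the share of (case, non-case) pairs in which the case receives the higher predicted score, with ties counted as one half. The function name, labels, and scores below are invented for illustration and are not the study's data or code.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = outcome present, 0 = outcome absent
y = [1, 0, 1, 0, 0, 1]
scores_a = [0.9, 0.2, 0.7, 0.4, 0.6, 0.5]  # scores from a better-ranking model
scores_b = [0.6, 0.5, 0.4, 0.3, 0.7, 0.2]  # scores from a worse-ranking model

print(c_statistic(y, scores_a))
print(c_statistic(y, scores_b))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.74 vs 0.62 and 0.81 vs 0.76 comparisons are reported.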


Graduate Admission Prediction Using Machine Learning

International Journal of Computers and Communications ◽

10.46300/91013.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

Educational Institutions ◽

Master's Program ◽

Learning Models ◽

K Nearest Neighbor ◽

Machine Learning Models

The student admission problem is important in educational institutions. This paper applies machine learning models to predict a student's chance of being admitted to a master's program, which helps students know in advance whether they are likely to be accepted. The machine learning models are multiple linear regression, k-nearest neighbor, random forest, and multilayer perceptron. Experiments show that the multilayer perceptron model outperforms the other models.


Performance of Statistical and Machine Learning-Based Methods for Predicting Biogeographical Patterns of Fungal Productivity in Forest Ecosystems

10.21203/rs.3.rs-122045/v1 ◽

2020 ◽

Author(s):

Albert Morera ◽

Juan Martínez de Aragón ◽

José Antonio Bonet ◽

Jingjing Liang ◽

Sergio de-Miguel

Keyword(s):

Machine Learning ◽

Random Forest ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models ◽

Modelling Approaches

BACKGROUND The prediction of biogeographical patterns from a large number of driving factors with complex interactions, correlations and non-linear dependences requires advanced analytical methods and modelling tools. This study compares different statistical and machine learning models for predicting fungal productivity biogeographical patterns as a case study for the thorough assessment of the performance of alternative modelling approaches to provide accurate and ecologically-consistent predictions. METHODS We evaluated and compared the performance of two statistical modelling techniques, namely, generalized linear mixed models and geographically weighted regression, and four machine learning models, namely, random forest, extreme gradient boosting, support vector machine and deep learning to predict fungal productivity. We used a systematic methodology based on substitution, random, spatial and climatic blocking combined with principal component analysis, together with an evaluation of the ecological consistency of spatially-explicit model predictions. RESULTS Fungal productivity predictions were sensitive to the modelling approach and complexity. Moreover, the importance assigned to different predictors varied between machine learning modelling approaches. Decision tree-based models increased prediction accuracy by ~7% compared to other machine learning approaches and by more than 25% compared to statistical ones, and resulted in higher ecological consistency at the landscape level. CONCLUSIONS Whereas a large number of predictors are often used in machine learning algorithms, in this study we show that proper variable selection is crucial to create robust models for extrapolation in biophysically differentiated areas. When dealing with spatial-temporal data in the analysis of biogeographical patterns, climatic blocking is postulated as a highly informative technique to be used in cross-validation to assess the prediction error over larger scales. Random forest was the best approach for prediction both in sampling-like environments and in extrapolation beyond the spatial and climatic range of the modelling data.
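
The idea of blocking in cross-validation can be sketched as follows. The abstract does not describe how blocks were constructed, so this example assumes a simple square grid: samples falling in the same grid cell form one block, and each block is held out in turn so that test samples are never spatial neighbours of training samples. The function name, `cell_size` parameter, and coordinates are hypothetical.

```python
from collections import defaultdict


def spatial_block_folds(coords, cell_size):
    """Yield leave-one-block-out (train, test) index lists from grid-cell blocks."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        # Assumption: a block is the square grid cell containing the sample
        blocks[(int(x // cell_size), int(y // cell_size))].append(i)
    keys = sorted(blocks)
    for held_out in keys:
        test = blocks[held_out]
        train = [i for k in keys if k != held_out for i in blocks[k]]
        yield train, test


# Toy plot coordinates (hypothetical units)
coords = [(0.5, 0.5), (0.7, 0.1), (1.5, 0.2), (1.9, 0.9), (0.2, 1.4)]
folds = list(spatial_block_folds(coords, cell_size=1.0))

for train, test in folds:
    print("train:", train, "test:", test)
```

Climatic blocking follows the same pattern if the grid is built over climate variables (for example, binned temperature and precipitation) instead of geographic coordinates.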


Development and validation of risk scores for all-cause mortality for the purposes of a smartphone-based ‘general health score’ application: a prospective cohort study using the UK Biobank (Preprint)

10.2196/preprints.25655 ◽

2020 ◽

Author(s):

Ashley K. Clift ◽

Erwann Le Lannou ◽

Arsi Hyvärinen ◽

Sachin S. Shah ◽

Devin D. Dunn ◽

...

Keyword(s):

Machine Learning ◽

Cohort Study ◽

Cox Model ◽

Statistical Modelling ◽

Learning Approaches ◽

Uk Biobank ◽

Learning Models ◽

C Statistic ◽

All Cause Mortality ◽

Machine Learning Models

BACKGROUND Even though established links exist between individuals' behaviours and potentially adverse health outcomes, to date either univariate, simpler models or multivariate, yet difficult to employ ones, have been developed. Such models are unlikely to be successful at capturing the wider determinants of health in the broader population. Hence, there is a need for a multidimensional, yet widely employable and accessible, way to obtain a comprehensive health metric. OBJECTIVE To develop and validate a novel, easily interpretable points-based health score ("C-Score") derived from metrics measurable using smartphone components, and iterations thereof that utilise statistical modelling and machine learning approaches. METHODS Comprehensive literature review to identify suitable predictor variables for inclusion in a first iteration points-based model. This was followed by a prospective cohort study in a UK Biobank population for the purposes of validating the C-Score, and developing and comparatively validating variations of the score using statistical/machine learning models to assess the balance between expediency and ease of interpretability versus model complexity. Primary and secondary outcome measures: Discrimination of a points-based score for all-cause mortality within 10 years (Harrell’s c-statistic). Discrimination and calibration of Cox proportional hazards models and machine learning models that incorporate C-Score values (or raw data inputs) and other predictors to predict risk of all-cause mortality within 10 years. RESULTS The cohort comprised 420,560 individuals. During a cohort follow-up of 4,526,452 person-years, there were 16,188 deaths from any cause (3.85%). The points-based model had good discrimination (c-statistic = 0.66). There was a 31% relative reduction in risk of all-cause mortality per decile of increasing C-Score (hazard ratio: 0.69, 95% CI: 0.663 to 0.675). A Cox model integrating age and C-Score had improved discrimination (8 percentage points, c-statistic = 0.74) and good calibration. Machine learning approaches did not offer improved discrimination over statistical modelling. CONCLUSIONS The novel health metric (‘C-Score’) has good predictive capabilities for all-cause mortality within 10 years. Embedding C-Score within a smartphone application may represent a useful tool for democratised, individualised health risk prediction. A simple Cox model using C-Score and age optimally balances parsimony and accuracy of risk predictions and could be used to produce absolute risk estimations for application users.
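
Harrell's c-statistic, the discrimination measure named above, extends the c-statistic to censored follow-up data. The sketch below is a simplified version that assumes no tied event times and uses invented toy data; it is not the study's code. It also reproduces two figures from the abstract by direct arithmetic: the share of deaths and the relative risk reduction implied by a hazard ratio of 0.69.

```python
def harrell_c(times, events, risk):
    """Harrell's c: among comparable pairs, share where the earlier failure has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy follow-up data (hypothetical): event 1 = died, 0 = censored
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
risk = [0.9, 0.3, 0.1, 0.2]
print(harrell_c(times, events, risk))

# Figures reported in the abstract, recomputed
death_share_pct = round(100 * 16188 / 420560, 2)   # deaths / cohort size
relative_reduction_pct = round((1 - 0.69) * 100)   # 1 - hazard ratio
print(death_share_pct, relative_reduction_pct)
```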


Development and validation of risk scores for all-cause mortality for the purposes of a smartphone-based ‘general health score’ application: a prospective cohort study using the UK Biobank

10.1101/2020.11.23.20229161 ◽

2020 ◽

Author(s):

Ashley K. Clift ◽

Erwann Le Lannou ◽

Christian P. Tighe ◽

Sachin S. Shah ◽

Matthew Beatty ◽

...

Keyword(s):

Machine Learning ◽

Cohort Study ◽

Cox Model ◽

Statistical Modelling ◽

Learning Approaches ◽

Uk Biobank ◽

Learning Models ◽

C Statistic ◽

All Cause Mortality ◽

Machine Learning Models

BACKGROUND Even though established links exist between individuals' behaviours and potentially adverse health outcomes, to date either univariate, simpler models or multivariate, yet difficult to employ ones, have been developed. Such models are unlikely to be successful at capturing the wider determinants of health in the broader population. Hence, there is a need for a multidimensional, yet widely employable and accessible, way to obtain a comprehensive health metric. OBJECTIVE To develop and validate a novel, easily interpretable points-based health score (“C-Score”) derived from metrics measurable using smartphone components, and iterations thereof that utilise statistical modelling and machine learning approaches. METHODS Comprehensive literature review to identify suitable predictor variables for inclusion in a first iteration points-based model. This was followed by a prospective cohort study in a UK Biobank population for the purposes of validating the C-Score, and developing and comparatively validating variations of the score using statistical/machine learning models to assess the balance between expediency and ease of interpretability versus model complexity. Primary and secondary outcome measures: Discrimination of a points-based score for all-cause mortality within 10 years (Harrell’s c-statistic). Discrimination and calibration of Cox proportional hazards models and machine learning models that incorporate C-Score values (or raw data inputs) and other predictors to predict risk of all-cause mortality within 10 years. RESULTS The cohort comprised 420,560 individuals. During a cohort follow-up of 4,526,452 person-years, there were 16,188 deaths from any cause (3.85%). The points-based model had good discrimination (c-statistic = 0.66). There was a 31% relative reduction in risk of all-cause mortality per decile of increasing C-Score (hazard ratio: 0.69, 95% CI: 0.663 to 0.675). A Cox model integrating age and C-Score had improved discrimination (8 percentage points, c-statistic = 0.74) and good calibration. Machine learning approaches did not offer improved discrimination over statistical modelling. CONCLUSIONS The novel health metric (‘C-Score’) has good predictive capabilities for all-cause mortality within 10 years. Embedding C-Score within a smartphone application may represent a useful tool for democratised, individualised health risk prediction. A simple Cox model using C-Score and age optimally balances parsimony and accuracy of risk predictions and could be used to produce absolute risk estimations for application users.


Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: polarity tells readers whether a given document carries positive or negative sentiment, which can help them choose and make decisions. Sentiment analysis is usually done manually, so an automatic classification process is needed. However, few studies discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis in the Amazon food review case. This research explores feature extraction methods such as bag of words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, together with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can add variety to polarity sentiment analysis. With document preparation (handling HTML tags, punctuation and special characters) and Snowball stemming, TF-IDF features combined with SVM proved suitable for polarity classification in unstructured sentiment analysis of Amazon food reviews, with a performance of 87.3 percent.
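
As a rough illustration of the TF-IDF weighting mentioned above, the sketch below computes term frequency times inverse document frequency for a few invented review snippets. It assumes whitespace tokenisation and the plain, unsmoothed form tf × log(N / df); library implementations often apply smoothing and normalisation, and the abstract does not state which variant the authors used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Invented review snippets, for illustration only
reviews = ["great tasty coffee", "terrible stale coffee", "great snack"]
w = tf_idf(reviews)

for doc_weights in w:
    print(doc_weights)
```

Terms that appear in fewer documents ("tasty", "stale") receive larger weights than terms shared across reviews ("coffee", "great"), which is what makes the representation useful as input to a classifier such as an SVM.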


Machine learning approaches to understand and predict rate constants for organic processes in mixtures containing ionic liquids

Physical Chemistry Chemical Physics ◽

10.1039/d0cp04227g ◽

2021 ◽

Vol 23 (4) ◽

pp. 2742-2752

Author(s):

Tamar L. Greaves ◽

Karin S. Schaffarczyk McHale ◽

Raphael F. Burkart-Radke ◽

Jason B. Harper ◽

Tu C. Le

Keyword(s):

Machine Learning ◽

Ionic Liquids ◽

Rate Constants ◽

Learning Approaches ◽

Learning Models ◽

Organic Reaction ◽

Machine Learning Models

Machine learning models were developed for an organic reaction in ionic liquids and validated on a selection of ionic liquids.


Random forest and long short-term memory based machine learning models for classification of ion mobility spectrometry spectra

Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE) Sensing XXII ◽

10.1117/12.2585829 ◽

2021 ◽

Author(s):

Patrick C. Riley ◽

Samir V. Deshpande ◽

Brian S. Ince ◽

Brian C. Hauck ◽

Kyle P. O'Donnell ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Ion Mobility ◽

Short Term Memory ◽

Learning Models ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Machine Learning Models


Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models to Predict Foodborne Pathogen Presence in Agricultural Water

Frontiers in Environmental Science ◽

10.3389/fenvs.2021.701288 ◽

2021 ◽

Vol 9 ◽

Author(s):

Daniel Lowell Weller ◽

Tanzy M. T. Love ◽

Martin Wiedmann

Keyword(s):

Machine Learning ◽

Random Forest ◽

Predictive Models ◽

Training Data ◽

Agricultural Water ◽

Learning Models ◽

Safety Hazards ◽

E Coli ◽

Resampling Method ◽

Machine Learning Models

Recent studies have shown that predictive models can supplement or provide alternatives to E. coli-testing for assessing the potential presence of food safety hazards in water used for produce production. However, these studies used balanced training data and focused on enteric pathogens. As such, research is needed to determine 1) if predictive models can be used to assess Listeria contamination of agricultural water, and 2) how resampling (to deal with imbalanced data) affects performance of these models. To address these knowledge gaps, this study developed models that predict nonpathogenic Listeria spp. (excluding L. monocytogenes) and L. monocytogenes presence in agricultural water using various combinations of learner (e.g., random forest, regression), feature type, and resampling method (none, oversampling, SMOTE). Four feature types were used in model training: microbial, physicochemical, spatial, and weather. “Full models” were trained using all four feature types, while “nested models” used between one and three types. In total, 45 full (15 learners*3 resampling approaches) and 108 nested (5 learners*9 feature sets*3 resampling approaches) models were trained per outcome. Model performance was compared against baseline models where E. coli concentration was the sole predictor. Overall, the machine learning models outperformed the baseline E. coli models, with random forests outperforming models built using other learners (e.g., rule-based learners). Resampling produced more accurate models than not resampling, with SMOTE models outperforming, on average, oversampling models. Regardless of resampling method, spatial and physicochemical water quality features drove accurate predictions for the nonpathogenic Listeria spp. and L. monocytogenes models, respectively. Overall, these findings 1) illustrate the need for alternatives to existing E. coli-based monitoring programs for assessing agricultural water for the presence of potential food safety hazards, and 2) suggest that predictive models may be one such alternative. Moreover, these findings provide a conceptual framework for how such models can be developed in the future with the ultimate aim of developing models that can be integrated into on-farm risk management programs. For example, future studies should consider using random forest learners, SMOTE resampling, and spatial features to develop models to predict the presence of foodborne pathogens, such as L. monocytogenes, in agricultural water when the training data is imbalanced.
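
The difference between plain oversampling and SMOTE can be sketched briefly. Plain oversampling duplicates existing minority-class samples, whereas SMOTE creates synthetic samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The function below is a simplified illustration of that interpolation step under assumed parameters (`k`, `seed`) and invented feature values; it is not the study's pipeline or a library implementation.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the chosen sample by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Invented two-feature minority-class samples (e.g., pathogen-positive water samples)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)

for point in new_points:
    print(point)
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay within the region already occupied by the minority class instead of repeating identical rows.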
