Development and validation of prognosis model of mortality risk in patients with COVID-19

Abstract This study aimed to identify clinical features for predicting mortality risk in patients with coronavirus disease 2019 (COVID-19) using machine-learning methods. We report a retrospective study of inpatients with COVID-19 admitted from 15 January to 15 March 2020 in Wuhan. Data on symptoms, comorbidities, demographics, vital signs, CT scan results and laboratory test results on admission were collected. Machine-learning methods (Random Forest and XGBoost) were used to rank clinical features by mortality risk. Multivariate logistic regression models were applied to identify clinical features with statistical significance. Based on 500 bootstrapped samples, the predictors of mortality were lactate dehydrogenase (LDH), C-reactive protein (CRP) and age. A multivariate logistic regression model was built to predict mortality in 292 in-sample patients, with an area under the receiver operating characteristic curve (AUROC) of 0.9521, which was better than CURB-65 (AUROC of 0.8501) and the machine-learning-based model (AUROC of 0.4530). An out-of-sample data set of 13 patients was further tested and showed that our model (AUROC of 0.6061) was also better than CURB-65 (AUROC of 0.4608) and the machine-learning-based model (AUROC of 0.2292). LDH, CRP and age can be used to identify severe patients with COVID-19 on hospital admission.
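The AUROC values quoted above compare how well each model ranks patients who died above those who survived. As a minimal sketch of how such a value can be computed, the function below uses the pairwise (Mann-Whitney) definition. The function name `auroc` and the six toy patients are assumptions for illustration and are not taken from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for the positive class (e.g. died), 0 for the negative class.
    scores: predicted risk; a higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, NOT from the study: six patients with made-up risk scores.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.8]
print(auroc(labels, scores))
```

Under this definition a value of 1.0 means every positive case is scored above every negative case, and 0.5 corresponds to random ranking.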


Post-stroke Anxiety Analysis via Machine Learning Methods

Frontiers in Aging Neuroscience ◽

10.3389/fnagi.2021.657937 ◽

2021 ◽

Vol 13 ◽

Author(s):

Jirui Wang ◽

Defeng Zhao ◽

Meiqing Lin ◽

Xinyu Huang ◽

Xiuli Shang

Keyword(s):

Diabetes Mellitus ◽

Machine Learning ◽

Risk Factors ◽

Logistic Regression ◽

Ischemic Stroke ◽

Multivariate Logistic Regression ◽

Learning Methods ◽

Anxiety Group ◽

Post Stroke ◽

Machine Learning Methods

Post-stroke anxiety (PSA) has caused wide public concern in recent years, and the analysis and prediction of its risk factors is still an open issue. As research has deepened, machine learning has been widely applied to various scenarios with increasing success, which brings new approaches to this field. In this paper, 395 patients with acute ischemic stroke are enrolled and evaluated with anxiety scales (i.e., HADS-A, HAMA, and SAS), and the patients are divided into an anxiety group and a non-anxiety group accordingly. Afterward, demographic data and general laboratory examination results are compared between the two groups to identify the factors with statistically significant differences. These factors are then incorporated into a multivariate logistic regression to obtain risk factors and protective factors of PSA. Statistical analysis shows significant differences in gender, age, serious stroke, hypertension, diabetes mellitus, drinking, and HDL-C level between the PSA group and the non-anxiety group under HADS-A and HAMA evaluation. Meanwhile, as evaluated by the SAS scale, gender, serious stroke, hypertension, diabetes mellitus, drinking, and HDL-C level differ between the PSA group and the non-anxiety group. Multivariate logistic regression analysis of the HADS-A, HAMA, and SAS scales suggests that hypertension, diabetes mellitus, drinking, high NIHSS score, and low serum HDL-C level are related to PSA. In other words, gender, age, disability, hypertension, diabetes mellitus, HDL-C, and drinking are closely related to anxiety during the acute stage of ischemic stroke. Hypertension, diabetes mellitus, drinking, and disability increased the risk of PSA, and a higher serum HDL-C level decreased the risk of PSA. Several machine learning methods are employed to predict PSA according to HADS-A, HAMA, and SAS scores, respectively. The experimental results indicate that random forest outperforms the competing methods in PSA prediction, which contributes to early intervention in clinical treatment.


Automatic Misinformation Detection About COVID-19 in Brazilian Portuguese WhatsApp Messages

10.5753/sbbd_estendido.2021.18173 ◽

2021 ◽

Author(s):

Antônio Diogo Forte Martins ◽

José Maria Monteiro ◽

Javam Machado

Keyword(s):

Machine Learning ◽

Social Networks ◽

Brazilian Portuguese ◽

Primary Sources ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods

During the coronavirus pandemic, the problem of misinformation arose once again, quite intensely, through social networks. In Brazil, one of the primary sources of misinformation is the messaging application WhatsApp. However, due to WhatsApp's private messaging nature, there are still few misinformation detection methods developed specifically for this platform. In this context, automatic misinformation detection (MID) about COVID-19 in Brazilian Portuguese WhatsApp messages becomes a crucial challenge. In this work, we present COVID-19.BR, a data set of WhatsApp messages about coronavirus in Brazilian Portuguese, collected from Brazilian public groups and manually labeled. We then investigate different machine learning methods in order to build an efficient MID for WhatsApp messages. So far, our best result is an F1 score of 0.774, which is limited by the predominance of short texts. However, when texts with fewer than 50 words are filtered out, the F1 score rises to 0.85.
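The effect of dropping short messages on the F1 score can be illustrated with a small sketch. The record layout (`text`, `label`, `pred`), the helper names and the four toy messages are hypothetical and are not drawn from COVID-19.BR. Word counts here use whitespace splitting, which may differ from the tokenization the authors used.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels, where 1 marks misinformation."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def keep_long_texts(records, min_words=50):
    """Keep records whose text has at least min_words whitespace-separated words."""
    return [r for r in records if len(r["text"].split()) >= min_words]


def f1_for(records):
    """F1 score over a list of records with 'label' and 'pred' fields."""
    return f1_score([r["label"] for r in records],
                    [r["pred"] for r in records])


# Toy records, NOT real messages: two short texts and two long ones.
records = [
    {"text": "short rumor", "label": 1, "pred": 0},
    {"text": " ".join(["word"] * 60), "label": 1, "pred": 1},
    {"text": " ".join(["word"] * 55), "label": 0, "pred": 0},
    {"text": "tiny", "label": 0, "pred": 1},
]
print(f1_for(records))                   # all messages
print(f1_for(keep_long_texts(records)))  # only messages with 50+ words
```

In this toy set both classification errors fall on the short messages, so the score improves once they are removed.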


Comparison of machine learning methods for crack localization

Acta et Commentationes Universitatis Tartuensis de Mathematica ◽

10.12697/acutm.2019.23.13 ◽

2019 ◽

Vol 23 (1) ◽

pp. 125-142

Author(s):

Helle Hein ◽

Ljubov Jaanuska

Keyword(s):

Machine Learning ◽

Random Forests ◽

Crack Depth ◽

Haar Wavelet ◽

Extensive Investigation ◽

Learning Methods ◽

Data Set ◽

Crack Location ◽

Machine Learning Methods ◽

Discrete Transform

In this paper, the Haar wavelet discrete transform, artificial neural networks (ANNs), and random forests (RFs) are applied to predict the location and severity of a crack in an Euler–Bernoulli cantilever subjected to transverse free vibration. An extensive investigation into two data collection sets and machine learning methods showed that the depth of a crack is more difficult to predict than its location. The data set of eight natural frequency parameters produces more accurate predictions of the crack depth, whereas the data set of eight Haar wavelet coefficients produces more precise predictions of the crack location. Furthermore, the analysis of the results showed that an ensemble of 50 ANNs trained by Bayesian regularization and Levenberg–Marquardt algorithms slightly outperforms RF.


Modelling of diesel engine performance using advanced machine learning methods under scarce and exponential data set

Applied Soft Computing ◽

10.1016/j.asoc.2013.06.006 ◽

2013 ◽

Vol 13 (11) ◽

pp. 4428-4441 ◽

Cited By ~ 25

Author(s):

Ka In Wong ◽

Pak Kin Wong ◽

Chun Shun Cheung ◽

Chi Man Vong

Keyword(s):

Machine Learning ◽

Diesel Engine ◽

Engine Performance ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods


Analysis of Cancer Data Set with Statistical and Unsupervised Machine Learning Methods

Smart Intelligent Computing and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-1921-1_27 ◽

2018 ◽

pp. 267-276

Author(s):

T. Panduranga Vital ◽

K. Dileep Kumar ◽

H. V. Bhagya Sri ◽

M. Murali Krishna

Keyword(s):

Machine Learning ◽

Learning Methods ◽

Data Set ◽

Unsupervised Machine Learning ◽

Cancer Data ◽

Machine Learning Methods


Assessing Replicability of Machine Learning Results: An Introduction to Methods on Predictive Accuracy in Social Sciences

Social Science Computer Review ◽

10.1177/0894439319888445 ◽

2019 ◽

pp. 089443931988844

Author(s):

Ranjith Vijayakumar ◽

Mike W.-L. Cheung

Keyword(s):

Machine Learning ◽

Empirical Data ◽

Fixed Effects ◽

Predictive Accuracy ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Replication Studies ◽

Machine Learning Methods ◽

Accuracy Measure

Machine learning methods have become very popular in diverse fields due to their focus on predictive accuracy, but little work has been conducted on how to assess the replicability of their findings. We introduce and adapt replication methods advocated in psychology to the aims and procedural needs of machine learning research. In Study 1, we illustrate these methods with the use of an empirical data set, assessing the replication success of a predictive accuracy measure, namely, R² on the cross-validated and test sets of the samples. We introduce three replication aims. First, tests of inconsistency examine whether single replications have successfully rejected the original study. Rejection will be supported if the 95% confidence interval (CI) of R² difference estimates between replication and original does not contain zero. Second, tests of consistency help support claims of successful replication. We can decide a priori on a region of equivalence, where population values of the difference estimates are considered equivalent for substantive reasons. The 90% CI of a difference estimate lying fully within this region supports replication. Third, we show how to combine replications to construct meta-analytic intervals for better precision of predictive accuracy measures. In Study 2, R² is reduced from the original in a subset of replication studies to examine the ability of the replication procedures to distinguish true replications from nonreplications. We find that when combining studies sampled from the same population to form meta-analytic intervals, random-effects methods perform best for cross-validated measures while fixed-effects methods work best for test measures. Among machine learning methods, regression was comparable to many complex methods, while support vector machine performed most reliably across a variety of scenarios. Social scientists who use machine learning to model empirical data can use these methods to enhance the reliability of their findings.
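The two interval-based decision rules described above can be sketched in a few lines. This sketch assumes a normal-approximation interval built from a difference estimate and its standard error; the abstract does not say how the intervals are constructed, so that choice, the function names and the example numbers are all assumptions.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level):
    """Two-sided normal-approximation confidence interval (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def is_inconsistent(diff, std_error):
    """Test of inconsistency: the 95% CI of the difference excludes zero."""
    low, high = normal_ci(diff, std_error, 0.95)
    return not (low <= 0.0 <= high)


def is_consistent(diff, std_error, region):
    """Test of consistency: the 90% CI lies fully inside the equivalence region."""
    low, high = normal_ci(diff, std_error, 0.90)
    return region[0] <= low and high <= region[1]


# Made-up numbers: difference in R² (replication minus original), its
# standard error, and an equivalence region chosen a priori.
region = (-0.05, 0.05)
print(is_inconsistent(-0.01, 0.02), is_consistent(-0.01, 0.02, region))
print(is_inconsistent(-0.10, 0.02), is_consistent(-0.10, 0.02, region))
```

The first call describes a replication whose difference is small and precisely estimated, so it passes the consistency test. The second describes a clear drop in R², which the inconsistency test flags.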


Evaluation of Tree-Based Ensemble Machine Learning Models in Predicting Stock Price Direction of Movement

Information ◽

10.3390/info11060332 ◽

2020 ◽

Vol 11 (6) ◽

pp. 332

Author(s):

Ernest Kwame Ampomah ◽

Zhiguang Qin ◽

Gabriel Nyame

Keyword(s):

Machine Learning ◽

Stock Market ◽

Stock Price ◽

Superior Performance ◽

Operating Characteristics ◽

Training Set ◽

Data Set ◽

Test Set ◽

Ensemble Machine Learning ◽

Better Than

Forecasting the direction and trend of stock prices is an important task that helps investors make prudent financial decisions in the stock market. Investment in the stock market carries substantial risk, and minimizing prediction error reduces that risk. Machine learning (ML) models typically perform better than statistical and econometric models. Also, ensemble ML models have been shown in the literature to outperform single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Eight stock data sets from three stock exchanges (NYSE, NASDAQ, and NSE) are randomly collected and used for the study. Each data set is split into a training set and a test set. Ten-fold cross-validation accuracy is used to evaluate the ML models on the training set. In addition, the ML models are evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC). The Kendall W test of concordance is used to rank the performance of the tree-based ML algorithms. For the training set, the AdaBoost model performed better than the rest of the models. For the test set, the accuracy, precision, F1-score, and AUC metrics produced results significant enough to rank the models, and the Extra Trees classifier outperformed the other models in all the rankings.
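Kendall's W measures how strongly several rankings of the same items agree, from 0 (no agreement) to 1 (identical rankings). The sketch below implements the standard formula for untied ranks, W = 12S / (m²(n³ − n)). Treating each data set as a "judge" that ranks the models is an assumption about how the test is applied here, and the rank tables are invented.

```python
def kendalls_w(rank_table):
    """Kendall's coefficient of concordance W for untied ranks.

    rank_table[j][i] is the rank that judge j gives to item i (1 = best).
    """
    m = len(rank_table)      # number of judges (e.g. data sets)
    n = len(rank_table[0])   # number of items (e.g. models)
    totals = [sum(row[i] for row in rank_table) for i in range(n)]
    mean_total = m * (n + 1) / 2
    # S: sum of squared deviations of the rank totals from their mean.
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Invented rankings: three judges that rank four models identically,
# then two judges that rank three models in opposite order.
print(kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))
print(kendalls_w([[1, 2, 3], [3, 2, 1]]))
```

This version does not apply the correction for tied ranks, so it is only suitable when every judge gives each item a distinct rank.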


Early Prediction of the Carbapenem Resistance Gram-negative Bacteria Carriage in Intensive Care Unit using Machine-Learning

10.21203/rs.3.rs-60222/v1 ◽

2020 ◽

Author(s):

Qiqiang Liang ◽

Qinyu Zhao ◽

Xin Xu ◽

Yu Zhou ◽

Man Huang

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Carbapenem Resistance ◽

Multivariate Logistic Regression Model ◽

Gram Negative Bacteria ◽

Multivariate Logistic Regression ◽

Gram Negative ◽

Better Than

Abstract Background The prevention and control of carbapenem-resistant gram-negative bacteria (CR-GNB) is a difficulty and focus for clinicians in the intensive care unit (ICU). This study constructs a CR-GNB carriage prediction model in order to predict CR-GNB incidence within one week. Methods The database comprises nearly 10,000 patients. The model is constructed using a multivariate logistic regression model and three machine learning algorithms. We then choose the optimal model and verify its accuracy by predicting and recording daily the occurrence of CR-GNB in all patients admitted over 4 months. Results There are 1385 patients with positive CR-GNB cultures and 1535 negative patients in this study. Forty-five variables show statistically significant differences. We include 17 variables in the multivariate logistic regression model and build three machine learning models using all variables. In terms of accuracy and the area under the receiver operating characteristic (AUROC) curve, the random forest outperforms XGBoost and the multivariate logistic regression model, which in turn outperform the decision tree model (accuracy: 84% > 82% > 81% > 72%; AUROC: 0.9089 > 0.8947 ≈ 0.8987 > 0.7845). In the 4-month prospective study, 81 cases were predicted to be positive in CR-GNB culture within 7 days, 146 cases were predicted to be negative, 86 cases were positive, and 120 cases were negative, with an overall accuracy of 84% and AUROC of 91.98%. Conclusions Machine learning prediction models can predict the occurrence of CR-GNB colonization or infection within a one-week period, and can predict in real time and guide medical staff to identify high-risk groups more accurately.


Prediction of Collapsibility of Loess of Construction Sites in Xining Based on Machine Learning Methods

10.21203/rs.3.rs-307514/v1 ◽

2021 ◽

Author(s):

Qifei Zhao ◽

Xiaojun Li ◽

Yunning Cao ◽

Zhikun Li ◽

Jixin Fan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Training Data ◽

Support Vector ◽

Engineering Practice ◽

Burial Depth ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

North East

Abstract Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples are collected from the north, east, west and middle regions of Xining. Seventy percent of the samples are used to generate the training data set, and the rest are used to generate the verification data set, so as to construct and validate the machine learning models. The six most important factors are selected from thirteen factors by using Grey Relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void rate, geostatic stress and plasticity limit. In order to predict the collapsibility of loess, four machine learning methods, namely Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree), are studied and compared. The receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence interval (CI) are used to verify and compare the models in different research areas. The results show that the RF model is the most efficient in predicting the collapsibility of loess in Xining, with an average AUC above 80%, and can be used in engineering practice.
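A 70/30 division of the 4,256 samples into training and verification sets could look like the sketch below. The abstract does not state whether the split was random or stratified by region, so the seeded random shuffle, the function name `split_70_30` and the use of integer placeholders for samples are assumptions.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle a copy of the samples and split it 70% / 30%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * 7 // 10          # integer arithmetic, rounds down
    return shuffled[:cut], shuffled[cut:]


# Integers stand in for the 4,256 loess samples.
train, verification = split_70_30(range(4256))
print(len(train), len(verification))
```

With 4,256 samples this yields 2,979 training samples and 1,277 verification samples, and fixing the seed makes the split repeatable.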


Natural language processing systems for data extraction and mapping on the basis of unstructured text blocks

Proceedings of the International conference “InterCarto/InterGIS” ◽

10.35595/2414-9179-2020-3-26-53-61 ◽

2020 ◽

Vol 26 (3) ◽

pp. 53-61

Author(s):

Pavel Kikin ◽

Alexey Kolesnikov ◽

Alexey Portnov ◽

Denis Grischenko

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Mathematical Models ◽

Optimal Algorithm ◽

The State ◽

Gradient Boosting ◽

Learning Methods ◽

Data Set ◽

Machine Learning Methods ◽

Spatio Temporal

The state of ecological systems, along with their general characteristics, is almost always described by indicators that vary in space and time, which significantly complicates the construction of mathematical models for predicting the state of such systems. One way to simplify and automate the construction of these models is the use of machine learning methods. The article compares traditional and neural-network-based algorithms and machine learning methods for predicting spatio-temporal series representing ecosystem data. The following algorithms and methods were analyzed and compared: logistic regression, random forest, gradient boosting on decision trees, SARIMAX, and long short-term memory (LSTM) and gated recurrent unit (GRU) neural networks. To conduct the study, data sets were selected that have both spatial and temporal components: the number of mosquitoes, the number of dengue infections, the physical condition of tropical grove trees, and the water level in a river. The article discusses the necessary steps for preliminary data processing, depending on the algorithm used. In addition, Kolmogorov complexity was calculated as one of the parameters that can help formalize the choice of the optimal algorithm when constructing mathematical models of spatio-temporal data for the sets used. Based on the results of the analysis, recommendations are given on the application of certain methods and specific technical solutions, depending on the characteristics of the data set that describes a particular ecosystem.
