scholarly journals Development and External Evaluation of Predictions Models for Mortality of COVID-19 Patients using Machine Learning Method

2020 ◽  
Author(s):  
Simin Li ◽  
Yulan Lin ◽  
Tong Zhu ◽  
Mengjie Fan ◽  
Shicheng Xu ◽  
...  

Abstract Background To develop and evaluate the prognostic machine-learning model for mortality in patients with coronavirus disease 2019 (COVID-19).Methods Clinical data of confirmed COVID-19 were retrospectively collected from Wuhan between 18th January and 29th March 2020. Gradient Boosting Decision Tree (GBDT), logistic regression (LR) model, and simplified LR with selected 5 features (LR-5) model were built to predict the mortality of COVID-19. 5-fold area under curve (AUC), accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated and compared between models.Results A total of 2,924 patients were included in the final analysis, 257(8.8%) of whom died during hospitalization and 2,667 (91.2%) survived. There were 21(0.7%) mild cases, 2,051(70.1%) moderate case, 779(26.6%) severe cases, and 73(2.5%) critically severe cases of COVID-19 on admission. The overall 5-fold AUC was observed highest in GBDT model (0.941), followed by LR (0.928) and LR-5 (0.913). The diagnostic accuracy were 0.889 in GBDT, 0.868 in LR and 0.887 in LR-5. GBDT model also showed the highest sensitivity (0.899) and speciality (0.889). The NPV of all three models exceeded 97%, while the PPV were relatively low in all models, 0.381 for LR, 0.402 for LR-5 and 0.432 for GBDT. In subgroups analysis with severe cases only, GBDT model also performed the best with a accuracy of 0.799 and 5-fold AUC (0.918).Conclusion The finding revealed that mortality prediction performance of the GBDT was superior to the LR models in confirmed cases of COVID-19, regardless of disease severity.

Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4068
Author(s):  
Xu Huang ◽  
Mirna Wasouf ◽  
Jessada Sresakoolchai ◽  
Sakdirat Kaewunruen

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions; and may occur anytime in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks based on physical or chemical reactions in concrete matrix. It is imperative to investigate the healing performance that autogenous healing concrete possesses, to assess the extent of the cracking and to predict the extent of healing. In the research of self-healing concrete, testing the healing performance of concrete in a laboratory is costly, and a mass of instances may be needed to explore reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms, which are capable of predicting the healing performance (HP) of self-healing concrete. These algorithms involve an artificial neural network (ANN), a k-nearest neighbours (kNN), a gradient boosting regression (GBR), a decision tree regression (DTR), a support vector regression (SVR) and a random forest (RF). Parameters of these algorithms are tuned utilising grid search algorithm (GSA) and genetic algorithm (GA). The prediction performance indicated by coefficient of determination (R2) and root mean square error (RMSE) measures of these algorithms are evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR performs higher prediction performance (R2GSA-GBR = 0.958) and stronger robustness (RMSEGSA-GBR = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance on the design of autogenous healing concrete can be achieved.


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Mohammad Nahid Hossain ◽  
Mohammad Helal Uddin ◽  
K. Thapa ◽  
Md Abdullah Al Zubaer ◽  
Md Shafiqul Islam ◽  
...  

Cognitive impairment has a significantly negative impact on global healthcare and the community. Holding a person’s cognition and mental retention among older adults is improbable with aging. Early detection of cognitive impairment will decline the most significant impact of extended disease to permanent mental damage. This paper aims to develop a machine learning model to detect and differentiate cognitive impairment categories like severe, moderate, mild, and normal by analyzing neurophysical and physical data. Keystroke and smartwatch have been used to extract individuals’ neurophysical and physical data, respectively. An advanced ensemble learning algorithm named Gradient Boosting Machine (GBM) is proposed to classify the cognitive severity level (absence, mild, moderate, and severe) based on the Standardised Mini-Mental State Examination (SMMSE) questionnaire scores. The statistical method “Pearson’s correlation” and the wrapper feature selection technique have been used to analyze and select the best features. Then, we have conducted our proposed algorithm GBM on those features. And the result has shown an accuracy of more than 94%. This paper has added a new dimension to the state-of-the-art to predict cognitive impairment by implementing neurophysical data and physical data together.


Diagnostics ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 2102
Author(s):  
Eyal Klang ◽  
Robert Freeman ◽  
Matthew A. Levin ◽  
Shelly Soffer ◽  
Yiftach Barash ◽  
...  

Background & Aims: We aimed at identifying specific emergency department (ED) risk factors for developing complicated acute diverticulitis (AD) and evaluate a machine learning model (ML) for predicting complicated AD. Methods: We analyzed data retrieved from unselected consecutive large bowel AD patients from five hospitals from the Mount Sinai health system, NY. The study time frame was from January 2011 through March 2021. Data were used to train and evaluate a gradient-boosting machine learning model to identify patients with complicated diverticulitis, defined as a need for invasive intervention or in-hospital mortality. The model was trained and evaluated on data from four hospitals and externally validated on held-out data from the fifth hospital. Results: The final cohort included 4997 AD visits. Of them, 129 (2.9%) visits had complicated diverticulitis. Patients with complicated diverticulitis were more likely to be men, black, and arrive by ambulance. Regarding laboratory values, patients with complicated diverticulitis had higher levels of absolute neutrophils (AUC 0.73), higher white blood cells (AUC 0.70), platelet count (AUC 0.68) and lactate (AUC 0.61), and lower levels of albumin (AUC 0.69), chloride (AUC 0.64), and sodium (AUC 0.61). In the external validation cohort, the ML model showed AUC 0.85 (95% CI 0.78–0.91) for predicting complicated diverticulitis. For Youden’s index, the model showed a sensitivity of 88% with a false positive rate of 1:3.6. Conclusions: A ML model trained on clinical measures provides a proof of concept performance in predicting complications in patients presenting to the ED with AD. Clinically, it implies that a ML model may classify low-risk patients to be discharged from the ED for further treatment under an ambulatory setting.


An Individual method of living on with a daily existence it directly influences on your overall health. Since stress is the significant infection of our human body. Like depression, heart attack and mental illness. WHO says “Globally, more than 264 million people of all ages suffer from depression.”[8]. Also the report says that most of the time people are stressed because of their work. 10.7% of People disorder with stress, anxiety and depression [8]. There are different method to discovering stress ex. Smart watches, chest belt, and extraordinary machine. Our principle objective is to figure out pressure progressively utilizing smart watches through their Sensor. There are different kinds of sensor available to find stress such as PPG, GSR, HRV, ECG and temperature. Smart watches contain a wide range of data through various sensor. This kind of gathered information are applied on various machine learning method. Like linear regression, SVM, KNN, decision tree. Technique have distinct, comparing accuracy and chooses best Machine learning model. This paper investigation have different analysis to find and compare accuracy by various sensors data. It is also check whether using one sensor or multiple sensors such as HRV, ECG or GSR and PPG to predict the better accuracy score for stress detection.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
B. A Omodunbi

Diabetes mellitus is a health disorder that occurs when the blood sugar level becomes extremely high due to body resistance in producing the required amount of insulin. The aliment happens to be among the major causes of death in Nigeria and the world at large. This study was carried out to detect diabetes mellitus by developing a hybrid model that comprises of two machine learning model namely Light Gradient Boosting Machine (LGBM) and K-Nearest Neighbor (KNN). This research is aimed at developing a machine learning model for detecting the occurrence of diabetes in patients. The performance metrics employed in evaluating the finding for this study are Receiver Operating Characteristics (ROC) Curve, Five-fold Cross-validation, precision, and accuracy score. The proposed system had an accuracy of 91% and the area under the Receiver Operating Characteristic Curve was 93%. The experimental result shows that the prediction accuracy of the hybrid model is better than traditional machine learning


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Zvi Segal ◽  
Dan Kalifa ◽  
Kira Radinsky ◽  
Bar Ehrenberg ◽  
Guy Elad ◽  
...  

Abstract Background End stage renal disease (ESRD) describes the most severe stage of chronic kidney disease (CKD), when patients need dialysis or renal transplant. There is often a delay in recognizing, diagnosing, and treating the various etiologies of CKD. The objective of the present study was to employ machine learning algorithms to develop a prediction model for progression to ESRD based on a large-scale multidimensional database. Methods This study analyzed 10,000,000 medical insurance claims from 550,000 patient records using a commercial health insurance database. Inclusion criteria were patients over the age of 18 diagnosed with CKD Stages 1–4. We compiled 240 predictor candidates, divided into six feature groups: demographics, chronic conditions, diagnosis and procedure features, medication features, medical costs, and episode counts. We used a feature embedding method based on implementation of the Word2Vec algorithm to further capture temporal information for the three main components of the data: diagnosis, procedures, and medications. For the analysis, we used the gradient boosting tree algorithm (XGBoost implementation). Results The C-statistic for the model was 0.93 [(0.916–0.943) 95% confidence interval], with a sensitivity of 0.715 and specificity of 0.958. Positive Predictive Value (PPV) was 0.517, and Negative Predictive Value (NPV) was 0.981. For the top 1 percentile of patients identified by our model, the PPV was 1.0. In addition, for the top 5 percentile of patients identified by our model, the PPV was 0.71. All the results above were tested on the test data only, and the threshold used to obtain these results was 0.1. Notable features contributing to the model were chronic heart and ischemic heart disease as a comorbidity, patient age, and number of hypertensive crisis events. Conclusions When a patient is approaching the threshold of ESRD risk, a warning message can be sent electronically to the physician, who will initiate a referral for a nephrology consultation to ensure an investigation to hasten the establishment of a diagnosis and initiate management and therapy when appropriate.


2020 ◽  
Vol 34 (01) ◽  
pp. 784-791 ◽  
Author(s):  
Qinbin Li ◽  
Zhaomin Wu ◽  
Zeyi Wen ◽  
Bingsheng He

The Gradient Boosting Decision Tree (GBDT) is a popular machine learning model for various tasks in recent years. In this paper, we study how to improve model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differential private models. Existing solutions for GBDT with differential privacy suffer from the significant accuracy loss due to too loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds lead to more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Chalachew Muluken Liyew ◽  
Haileyesus Amsaya Melese

AbstractPredicting the amount of daily rainfall improves agricultural productivity and secures food and water supply to keep citizens healthy. To predict rainfall, several types of research have been conducted using data mining and machine learning techniques of different countries’ environmental datasets. An erratic rainfall distribution in the country affects the agriculture on which the economy of the country depends on. Wise use of rainfall water should be planned and practiced in the country to minimize the problem of the drought and flood occurred in the country. The main objective of this study is to identify the relevant atmospheric features that cause rainfall and predict the intensity of daily rainfall using machine learning techniques. The Pearson correlation technique was used to select relevant environmental variables which were used as an input for the machine learning model. The dataset was collected from the local meteorological office at Bahir Dar City, Ethiopia to measure the performance of three machine learning techniques (Multivariate Linear Regression, Random Forest, and Extreme Gradient Boost). Root mean squared error and Mean absolute Error methods were used to measure the performance of the machine learning model. The result of the study revealed that the Extreme Gradient Boosting machine learning algorithm performed better than others.


Sign in / Sign up

Export Citation Format

Share Document