An ensemble prediction model for COVID-19 mortality risk

Author(s):  
Jie Li ◽  
Xin Li ◽  
John Hutchinson ◽  
Mohammad Asad ◽  
Yadong Wang ◽  
...  

Background: It is critical to identify COVID-19 patients with a higher risk of death at an early stage so that they can receive better hospitalization or intensive care. However, thus far, no machine learning model has been shown to be successful in an independent cohort. We aim to develop a machine learning model that can accurately predict the death risk of COVID-19 patients at an early stage in other independent cohorts. Methods: We used a cohort of 4711 patients, with clinical features associated with patient physiological conditions and lab test data associated with inflammation, hepatorenal function, cardiovascular function and so on, to identify key features. To do so, we first developed a novel data preprocessing approach to clean up the clinical features and then developed an ensemble machine learning method to identify key features. Results: We identified 14 key clinical features whose combination reached a good predictive performance, with an AUC of 0.907. Most importantly, we successfully validated these key features in a large independent cohort containing 15,790 patients. Conclusions: Our study shows that the 14 key features are robust and useful in predicting the risk of death at an early stage in patients with confirmed SARS-CoV-2 infection, and are potentially useful in clinical settings to support clinical decisions.
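
For reference, an AUC such as the one reported above can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. A minimal Python sketch of that rank-based computation follows. The function name `auc_score` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def auc_score(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = died, 0 = survived.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)  # 5 of 6 pairs ranked correctly
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a sort-based version would be preferable for cohorts of the size described here.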

Blood ◽  
2020 ◽  
Vol 136 (20) ◽  
pp. 2249-2262 ◽  
Author(s):  
Yasunobu Nagata ◽  
Ran Zhao ◽  
Hassan Awada ◽  
Cassandra M. Kerr ◽  
Inom Mirzaev ◽  
...  

Abstract Morphologic interpretation is the standard in diagnosing myelodysplastic syndrome (MDS), but it has limitations, such as varying reliability in pathologic evaluation and lack of integration with genetic data. Somatic events shape morphologic features, but the complexity of morphologic and genetic changes makes clear associations challenging. This article interrogates novel clinical subtypes of MDS using a machine-learning technique devised to identify patterns of cooccurrence among morphologic features and genomic events. We sequenced 1079 MDS patients and analyzed bone marrow morphologic alterations and other clinical features. A total of 1929 somatic mutations were identified. Five distinct morphologic profiles with unique clinical characteristics were defined. Seventy-seven percent of higher-risk patients clustered in profile 1. All lower-risk (LR) patients clustered into the remaining 4 profiles: profile 2 was characterized by pancytopenia, profile 3 by monocytosis, profile 4 by elevated megakaryocytes, and profile 5 by erythroid dysplasia. These profiles could also separate patients with different prognoses. LR MDS patients were classified into 8 genetic signatures (eg, signature A had TET2 mutations, signature B had both TET2 and SRSF2 mutations, and signature G had SF3B1 mutations), demonstrating association with specific morphologic profiles. Six morphologic profiles/genetic signature associations were confirmed in a separate analysis of an independent cohort. Our study demonstrates that nonrandom or even pathognomonic relationships between morphology and genotype to define clinical features can be identified. This is the first comprehensive implementation of machine-learning algorithms to elucidate potential intrinsic interdependencies among genetic lesions, morphologies, and clinical prognostic attributes of MDS.


2022 ◽  
Author(s):  
Albane Ruaud ◽  
Niklas A Pfister ◽  
Ruth E Ley ◽  
Nicholas D Youngblut

Background: Tree ensemble machine learning models are increasingly used in microbiome science as they are compatible with the compositional, high-dimensional, and sparse structure of sequence-based microbiome data. While such models are often good at predicting phenotypes based on microbiome data, they only yield limited insights into how microbial taxa or genomic content may be associated. Results: We developed endoR, a method to interpret a fitted tree ensemble model. First, endoR simplifies the fitted model into a decision ensemble from which it then extracts information on the importance of individual features and their pairwise interactions and also visualizes these data as an interpretable network. Both the network and importance scores derived from endoR provide insights into how features, and interactions between them, contribute to the predictive performance of the fitted model. Adjustable regularization and bootstrapping help reduce the complexity and ensure that only essential parts of the model are retained. We assessed the performance of endoR on both simulated and real metagenomic data. We found endoR to infer true associations with accuracy greater than or comparable to that of other commonly used approaches while easing and enhancing model interpretation. Using endoR, we also confirmed published results on gut microbiome differences between cirrhotic and healthy individuals. Finally, we utilized endoR to gain insights into components of the microbiome that predict the presence of human gut methanogens, as these hydrogen-consumers are expected to interact with fermenting bacteria in a complex syntrophic network. Specifically, we analyzed a global metagenome dataset of 2203 individuals and confirmed the previously reported association between Methanobacteriaceae and Christensenellales. Additionally, we observed that Methanobacteriaceae are associated with a network of hydrogen-producing bacteria.
Conclusion: Our method accurately captures how tree ensembles use features and interactions between them to predict a response. As demonstrated by our applications, the resultant visualizations and summary outputs facilitate model interpretation and enable the generation of novel hypotheses about complex systems. An implementation of endoR is available as an open-source R-package on GitHub (https://github.com/leylabmpi/endoR).


2019 ◽  
Vol 9 (24) ◽  
pp. 5324 ◽  
Author(s):  
Will Y. Lin ◽  
Ying-Hua Huang

Construction projects are usually designed by different professional teams, where design clashes may inevitably occur. With the clash detection tools provided by Building Information Modeling (BIM) software, these clashes can be discovered at an early stage. However, the number of clashes detected by BIM software is often huge. The literature states that the majority of those clashes are found to be irrelevant, i.e., harmless to the building and its construction. How to filter out these irrelevant clashes from the detection report is one of the issues to be resolved urgently in the construction industry. This study develops a method that automatically screens for irrelevant clashes by combining the two techniques of rule-based reasoning and supervised machine learning. First, we acquire experts’ knowledge through interviews to compile rules for the preliminary classification of clash types. Subsequently, the results of the initial classification inferred by the rules are added into the training dataset to improve the predictive performance of the classifiers implemented by supervised machine learning. The average predictive performance obtained by using the hybrid method is up to 0.96, which has been improved from the traditional machine learning process only using individual or ensemble learning classifiers by 6%–17%.
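
The hybrid idea described above, feeding the rule-based preliminary classification into the training data of a supervised classifier, can be sketched as a simple feature-augmentation step. Everything specific in the sketch is a hypothetical placeholder: the field names (`penetration_mm`, `element_a`), the threshold, and the rule itself are invented for illustration and are not the expert rules compiled in the study.

```python
def rule_based_label(clash):
    """Hypothetical expert rule returning a preliminary clash class.

    The field names and the 5 mm threshold are illustrative assumptions only.
    """
    if clash["penetration_mm"] < 5 and clash["element_a"] == "insulation":
        return "irrelevant"
    return "relevant"


def augment_with_rule(clashes):
    """Return copies of the clash records with the rule output added as a feature."""
    return [dict(c, rule_label=rule_based_label(c)) for c in clashes]


# Toy records (hypothetical) standing in for rows of a clash detection report.
raw_clashes = [
    {"penetration_mm": 2, "element_a": "insulation", "element_b": "duct"},
    {"penetration_mm": 40, "element_a": "beam", "element_b": "pipe"},
]
training_rows = augment_with_rule(raw_clashes)
```

The augmented rows would then be passed to whichever supervised classifier is being trained, so the learner can use the rule output alongside the raw clash attributes.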


2021 ◽  
Vol 21 (suppl 2) ◽  
pp. 445-451
Author(s):  
Tiago Pessoa Ferreira Lima ◽  
Gabrielle Ribeiro Sena ◽  
Camila Soares Neves ◽  
Suely Arruda Vidal ◽  
Jurema Telles Oliveira Lima ◽  
...  

Abstract Objectives: to train a Random Forest (RF) classifier to estimate the risk of death in elderly people (over 60 years old) diagnosed with COVID-19 in Pernambuco. A property of this classifier, called feature importance, was used to identify the attributes (main risk factors) related to the outcome (cure or death) through information gain. Methods: data on confirmed cases of COVID-19 were obtained between February 13 and June 19, 2020, in Pernambuco, Brazil. K-fold cross-validation (K=10) was used to assess RF performance and the importance of clinical features. Results: the RF algorithm correctly classified 78.33% of the elderly people, with an AUC of 0.839. Advanced age was the factor representing the highest risk of death. The main comorbidity and symptom were cardiovascular disease and oxygen saturation ≤ 95%, respectively. Conclusion: this study applied the RF classifier to predict the risk of death and identified the main clinical features related to this outcome in elderly people with COVID-19 in the state of Pernambuco.
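
As an illustration of the K-fold procedure mentioned above (K=10), the following Python sketch splits the sample indices into folds so that every case is used for testing exactly once, then averages the accuracy across folds. It is a generic sketch, not the authors' pipeline: the helper names, the fixed shuffling seed, and the `fit`/`predict` callables (which would wrap a Random Forest in practice) are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


def cross_validated_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean test-fold accuracy; `fit` and `predict` are user-supplied callables."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Any model can be plugged in through `fit(X_train, y_train)` and `predict(model, x)`; a majority-class baseline is a useful first check before comparing against a trained classifier.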


Computers ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 31
Author(s):  
Aziz Alotaibi ◽  
Mohammad Shiblee ◽  
Adel Alshahrani

Precisely assessing the severity of COVID-19 at an early stage is an effective way to increase patients' survival rate. Identifying and triaging, at initial screening, the people at highest risk of complications that can lead to death is a challenging problem, especially in developing nations. The problem is further aggravated by the shortage of specialists. Using machine learning (ML) techniques to predict the severity of COVID-19 during initial screening can be an effective way to sort patients so that they receive appropriate clinical management with optimum use of medical facilities. In this study, we applied and evaluated the effectiveness of three types of Artificial Neural Network (ANN), a Support Vector Machine and Random Forest regression, using a variety of learning methods, for early prediction of severity from patient history and laboratory findings. The performance of these machine learning techniques shows that they can be successfully applied to assess, precisely and quickly, the severity of a patient's condition and the risk of death from patient history and laboratory findings, supporting triage and treatment.


2020 ◽  
Author(s):  
Yijun Wu ◽  
Yuming Chong ◽  
Jianghao Liu ◽  
Pancheng Wu ◽  
Yanyu Wang ◽  
...  

Abstract Background: Lymph node metastasis (LNM) status can be a critical decisive factor for clinical management of lung cancer. Accurately evaluating the risk of LNM during or after the surgery can be helpful for making clinical decisions. This study aims to incorporate clinicopathological characteristics to develop reliable machine learning (ML)-based models for predicting LNM in patients with early-stage lung adenocarcinoma. Methods: A total of 709 lung adenocarcinoma patients with tumor size ≤ 2 cm were enrolled for analysis and modeling by multiple ML algorithms. The receiver operating characteristic (ROC) curve and decision curve were used for evaluating model’s predictive performance and clinical usefulness. Feature selection based on potential models was performed to identify most-contributed predictive factors. Results: LNM occurred in 11.3% (80/709) of patients with lung adenocarcinoma. Most models reached high areas under the ROC curve (AUCs) > 0.9. In the decision curve, all models performed better than the treat-all and treat-none lines. The random forest classifier (RFC) model, with a minimal number of 5 variables introduced (including carcinoembryonic antigen, solid component, micropapillary component, lymphovascular invasion and pleural invasion), was identified as the optimal model for predicting LNM, because of its excellent performance in both ROC and decision curves. The cost-efficient application of RFC model could precisely predict LNM during or after the operation of early-stage adenocarcinomas (sensitivity: 87.5%; specificity: 82.2%). Conclusions: Incorporating clinicopathological characteristics, it is feasible to predict LNM intraoperatively or postoperatively by ML algorithms. Trial registration: NA
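
To make the reported sensitivity and specificity concrete, the short Python sketch below computes both from confusion-matrix counts. The counts are hypothetical: they are chosen to reproduce 87.5% and 82.2% under the assumption that the figures refer to the full cohort (80 LNM-positive and 629 LNM-negative patients), which the abstract does not state.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate among LNM-positive patients
    specificity = tn / (tn + fp)  # true-negative rate among LNM-negative patients
    return sensitivity, specificity


# Hypothetical counts, ASSUMING evaluation on all 709 patients:
# 70 + 10 = 80 LNM-positive, 517 + 112 = 629 LNM-negative.
sens, spec = sensitivity_specificity(tp=70, fn=10, tn=517, fp=112)
```

Under that assumption, roughly 10 of the 80 patients with metastasis would be missed and about 112 of the 629 without metastasis would be flagged; if the figures come from a held-out subset, the absolute counts would differ.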


2019 ◽  
Vol 19 (292) ◽  
Author(s):  
Nan Hu ◽  
Jian Li ◽  
Alexis Meyer-Cirkel

We compared the predictive performance of a series of machine learning and traditional methods for monthly CDS spreads, using firms’ accounting-based, market-based and macroeconomic variables for the period from 2006 to 2016. We find that ensemble machine learning methods (Bagging, Gradient Boosting and Random Forest) strongly outperform other estimators, and Bagging particularly stands out in terms of accuracy. Traditional credit risk models using OLS techniques have the lowest out-of-sample prediction accuracy. The results suggest that the non-linear machine learning methods, especially the ensemble methods, add considerable value to existing credit risk prediction accuracy and enable CDS shadow pricing for companies missing those securities.


2021 ◽  
Author(s):  
Christine Tedijanto ◽  
Solomon Aragie ◽  
Zerihun Tadesse ◽  
Mahteme Haile ◽  
Taye Zeru ◽  
...  

Trachoma is an infectious disease characterized by repeated exposures to Chlamydia trachomatis (Ct) that may ultimately lead to blindness. Certain areas, particularly in Africa, pose persistent challenges to elimination of trachoma as a public health problem. Efficiently identifying communities with high infection burden could help target more intensive control efforts. We hypothesized that IgG seroprevalence in combination with geospatial layers, machine learning, and model-based geostatistics would be able to accurately predict future community-level ocular Ct infections detected by PCR. We used measurements from 40 communities in the hyperendemic Amhara region of Ethiopia. Median Ct infection prevalence among children 0-5 years old increased from 6% at enrollment to 29% by month 36. At baseline, correlation between seroprevalence and Ct infection was stronger among children 0-5 years old (ρ = 0.77) than children 6-9 years old (ρ = 0.48), and stronger than the correlation between clinical trachoma and Ct infection (0-5y ρ = 0.56; 6-9y ρ = 0.40). Seroprevalence was the strongest concurrent predictor of infection prevalence at month 36 among children 0-5 years old (cross-validated R² = 0.75, 95% CI: 0.58-0.85), though predictive performance declined substantially with increasing temporal lag between predictor and outcome measurements. Geospatial variables, a spatial Gaussian process, and stacked ensemble machine learning did not meaningfully improve predictions. Serological markers among children 0-5 years old may be an objective, programmatic tool for identifying communities with high levels of active ocular Ct infections, but accurate, future prediction in the context of changing transmission remains an open challenge.


Időjárás ◽  
2021 ◽  
Vol 125 (4) ◽  
pp. 609-624
Author(s):  
Sándor Baran ◽  
Ágnes Baran

In recent decades, wind power has become the second largest energy source in the EU, covering 16% of its electricity demand. However, due to its volatility, accurate short-range wind power predictions are required for successful integration of wind energy into the electrical grid. Accurate predictions of wind power require accurate hub height wind speed forecasts, where the state-of-the-art method is the probabilistic approach based on ensemble forecasts obtained from multiple runs of numerical weather prediction models. Nonetheless, ensemble forecasts are often uncalibrated and might also be biased, and thus require some form of post-processing to improve their predictive performance. We propose a novel flexible machine learning approach for calibrating wind speed ensemble forecasts, which results in a truncated normal predictive distribution. In a case study based on 100 m wind speed forecasts produced by the operational ensemble prediction system of the Hungarian Meteorological Service, the forecast skill of this method is compared with the predictive performance of three different ensemble model output statistics approaches and the raw ensemble forecasts. We show that, compared with the raw ensemble, post-processing always improves the calibration of probabilistic forecasts and the accuracy of point forecasts, and that of the four competing methods, the novel machine learning based approach results in the best overall performance.
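
For readers unfamiliar with the truncated normal predictive distribution mentioned above, the Python sketch below evaluates its density and cumulative distribution function for a given location `mu` and scale `sigma`. Truncation from below at zero is an assumption made here because wind speed is non-negative; the abstract does not state the cut-off, and the sketch says nothing about how the proposed method estimates `mu` and `sigma` from the ensemble.

```python
import math


def _std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def truncnorm_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) truncated from below at 0 (assumed cut-off)."""
    if x < 0:
        return 0.0
    # Renormalise by the probability mass that lies above zero.
    return _std_normal_pdf((x - mu) / sigma) / (sigma * _std_normal_cdf(mu / sigma))


def truncnorm_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) truncated from below at 0 (assumed cut-off)."""
    if x < 0:
        return 0.0
    lower = _std_normal_cdf(-mu / sigma)  # mass removed by the truncation
    return (_std_normal_cdf((x - mu) / sigma) - lower) / (1.0 - lower)
```

For example, `truncnorm_cdf(8.0, mu, sigma)` would give the forecast probability that the wind speed stays at or below 8 m/s for whatever `mu` and `sigma` a calibration method supplies.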


Author(s):  
Preeth B.Meena ◽  
Radha, P.

In today’s scenario, disease prediction plays an important role in the medical field. Early detection of diseases is essential because of fast-food habits and lifestyle. In my previous study on predicting diseases from radiology test reports, three classifiers, Naïve Bayes (NB), Support Vector Machine (SVM) and Modified Extreme Learning Machine (MELM), were used to classify the disease as positive or negative and to increase the accuracy of results. To increase the efficiency of disease prediction and to find which disease affects society most, an ensemble machine learning algorithm is used. The large volume of data from the healthcare industry was preprocessed, categorized and analyzed to predict which patients should be treated and given priority, and which disease hits society the most. Ensemble machine learning is popular in the medical industry for a variety of reasons. The classifiers used are K Nearest Neighbors, Nearest Mean Classifier, Mean Feature Voting Classifier, KDtree KNN, and Random Forest. Automating manual processes in the medical field has become important. Electronic medical records and significant advances in health care have provided an opportunity to find out which patients need to be given more importance. Several methodologies and techniques were used to preprocess the data in order to meet the study's requirements. To improve the performance of the machine learning algorithms, feature selection was performed using Tabu search. When ensemble prediction is combined with the Random Forest algorithm as the combiner, the results are more reliable. The aim of this study is to create a system that classifies medical records as diseased or not and finds out which disease rate has increased. This research will help individuals in society to get treated easily and to take preventive measures to avoid diseases.

