Machine learning demonstrates that somatic mutations imprint invariant morphologic features in myelodysplastic syndromes

Blood ◽  
2020 ◽  
Vol 136 (20) ◽  
pp. 2249-2262 ◽  
Author(s):  
Yasunobu Nagata ◽  
Ran Zhao ◽  
Hassan Awada ◽  
Cassandra M. Kerr ◽  
Inom Mirzaev ◽  
...  

Abstract: Morphologic interpretation is the standard in diagnosing myelodysplastic syndrome (MDS), but it has limitations, such as varying reliability in pathologic evaluation and lack of integration with genetic data. Somatic events shape morphologic features, but the complexity of morphologic and genetic changes makes clear associations challenging. This article interrogates novel clinical subtypes of MDS using a machine-learning technique devised to identify patterns of cooccurrence among morphologic features and genomic events. We sequenced 1079 MDS patients and analyzed bone marrow morphologic alterations and other clinical features. A total of 1929 somatic mutations were identified. Five distinct morphologic profiles with unique clinical characteristics were defined. Seventy-seven percent of higher-risk patients clustered in profile 1. All lower-risk (LR) patients clustered into the remaining 4 profiles: profile 2 was characterized by pancytopenia, profile 3 by monocytosis, profile 4 by elevated megakaryocytes, and profile 5 by erythroid dysplasia. These profiles could also separate patients with different prognoses. LR MDS patients were classified into 8 genetic signatures (eg, signature A had TET2 mutations, signature B had both TET2 and SRSF2 mutations, and signature G had SF3B1 mutations), each associated with specific morphologic profiles. Six morphologic profile/genetic signature associations were confirmed in a separate analysis of an independent cohort. Our study demonstrates that nonrandom or even pathognomonic relationships between morphology and genotype that define clinical features can be identified. This is the first comprehensive implementation of machine-learning algorithms to elucidate potential intrinsic interdependencies among genetic lesions, morphologies, and clinical prognostic attributes of MDS.

2021 ◽  
Author(s):  
Fang He ◽  
John H Page ◽  
Kerry R Weinberg ◽  
Anirban Mishra

BACKGROUND The current COVID-19 pandemic is unprecedented. In resource-constrained settings, predictive algorithms can help stratify disease severity and alert physicians to high-risk patients; however, few risk scores have been derived from a substantially large EHR dataset using simplified predictors as input. OBJECTIVE To develop and validate simplified machine learning algorithms that predict COVID-19 adverse outcomes; to evaluate the AUC (area under the receiver operating characteristic curve), sensitivity, specificity, and calibration of the algorithms; and to derive clinically meaningful thresholds. METHODS We developed and validated machine learning models in a cohort study using multi-center, patient-level, longitudinal electronic health records (EHR) from the Optum® COVID-19 database, which provides anonymized, longitudinal EHR data from across the US. The models were developed based on clinical characteristics to predict 28-day in-hospital mortality, ICU admission, respiratory failure, and mechanical ventilator use in the inpatient setting. Data from patients admitted prior to Sep 7, 2020, were randomly sampled into development, test, and validation datasets; data collected from Sep 7, 2020, through Nov 15, 2020, were reserved as a prospective validation dataset. RESULTS Of 3.7M patients in the analysis, a total of 585,867 patients were diagnosed with or tested positive for SARS-CoV-2, and 50,703 adult patients were hospitalized with COVID-19 between Feb 1 and Nov 15, 2020. Among the study cohort (N=50,703), there were 6,204 deaths, 9,564 ICU admissions, 6,478 mechanically ventilated or ECMO patients, and 25,169 patients who developed ARDS or respiratory failure within 28 days of hospital admission.
The algorithms demonstrated high accuracy (AUC = 0.89 (0.89 - 0.89) on the validation dataset (N=10,752)), consistent prediction through the second wave of the pandemic from September to November (AUC = 0.85 (0.85 - 0.86) on post-development validation (N=14,863)), and great clinical relevance and utility. In addition, a comprehensive set of 386 input covariates from baseline and at admission was included in the analysis; the end-to-end pipeline automates feature selection and model development, producing 10 key predictors as input, such as age, blood urea nitrogen, and oxygen saturation, which are both commonly measured and concordant with recognized risk factors for COVID-19. CONCLUSIONS The systematic approach and rigorous validations demonstrate consistent model performance, even beyond the time period of data collection, with satisfactory discriminatory power and great clinical utility. Overall, the study offers an accurate, validated, and reliable prediction model based on only ten clinical features as a prognostic tool for stratifying COVID-19 patients into intermediate, high, and very high-risk groups. This simple predictive tool could be shared with the wider healthcare community to serve as an early warning system that alerts physicians to possible high-risk patients, or as a resource triaging tool to optimize healthcare resources. CLINICALTRIAL N/A
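
As an illustration of the metrics named in this abstract, the sketch below computes an AUC directly from its definition (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case) and assigns a risk group from a predicted probability. It is a minimal sketch, not the study's pipeline: the function names and the cutoffs 0.2 and 0.5 are hypothetical, since the abstract does not report the thresholds it derived.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def risk_group(probability: float, cutoffs: Tuple[float, float] = (0.2, 0.5)) -> str:
    """Map a predicted probability to one of three risk groups.

    The cutoffs are hypothetical placeholders, not values from the study.
    """
    low_cut, high_cut = cutoffs
    if probability < low_cut:
        return "intermediate"
    if probability < high_cut:
        return "high"
    return "very high"
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as the one described a rank-based implementation from an established library would be the practical choice; the version here is only meant to make the definition concrete.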


10.2196/16042 ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. e16042
Author(s):  
Emily R Pfaff ◽  
Miles Crosskey ◽  
Kenneth Morton ◽  
Ashok Krishnamurthy

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
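
The evaluation choices described above (number of cross-validation folds, and metrics such as sensitivity, specificity, and predictive values) can be sketched in a few lines. This is an illustrative sketch only and does not reproduce CLARK's code; `k_fold_indices` and `confusion_metrics` are hypothetical helper names, and the folds are contiguous and unshuffled for simplicity.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV for binary labels (1 = phenotype present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

In a typical workflow a classifier would be fitted on each `train` list and scored on the matching `test` list, with the metrics then averaged across folds.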


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Maoxin Huang ◽  
Yi Zhang ◽  
Xiaohong Ou ◽  
Caiyun Wang ◽  
Xueqing Wang ◽  
...  

Background. Cutaneous melanoma (CM) is one of the most life-threatening primary skin cancers and is prone to distant metastases. A widespread presence of posttranscriptional modification of RNA, 5-methylcytosine (m5C), has been observed in human cancers. However, the potential mechanism by which dysregulated m5C-related regulators affect tumorigenesis and prognosis in CM is obscure. Methods. We used comprehensive bioinformatics analyses to explore the expression of m5C regulators in CM, the prognostic implications of the m5C regulators, and the frequency of copy number variants (CNVs) and somatic mutations in m5C regulators. Additionally, the CM patients were divided into three clusters via consensus clustering of m5C regulators to better predict clinical features and outcomes. Then, the risk score was established via Lasso Cox regression analysis. Next, the prognostic value and clinical characteristics of m5C-related signatures were further explored. Then, machine learning was used to identify the m5C regulators that contributed most to the risk score. Finally, the expression level and clinical value of NSUN6 were analyzed via the tissue microarray (TMA) cohort. Results. We found that m5C regulators were dysregulated in CM, with a high frequency of somatic mutations and CNV alterations of the m5C regulatory genes. Furthermore, 16 m5C-related proteins interacted with each other frequently, and we divided CM patients into three clusters to better predict clinical features and outcomes. Then, five m5C regulators were selected to build a risk score based on the LASSO model. The XGBoost algorithm recognized that NOP2 and NSUN6 were the most significant risk score contributors. Immunohistochemistry verified that low expression of NSUN6 was closely correlated with CM progression. Conclusion. The m5C-related signatures can be used as new prognostic biomarkers and therapeutic targets for CM, and NSUN6 might play a vital role in tumorigenesis and malignant progression.


Computers ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 4
Author(s):  
Silvia Panicacci ◽  
Massimiliano Donati ◽  
Francesco Profili ◽  
Paolo Francesconi ◽  
Luca Fanucci

Together with population ageing, the number of people suffering from multimorbidity is increasing and is projected to exceed half of the population by 2035. This part of the population is composed of the highest-risk patients, who are, at the same time, the major users of healthcare systems. The early identification of this sub-population can help to improve people’s quality of life and reduce healthcare costs. In this paper, we describe a population health management tool based on state-of-the-art intelligent algorithms, starting from administrative and socio-economic data, for the early identification of high-risk patients. The study refers to the population of the Local Health Unit of Central Tuscany in 2015, which amounts to 1,670,129 residents. After a trade-off analysis of machine learning models and input data, Random Forest applied to 1 year of historical data achieves the best results, outperforming state-of-the-art models. The most important variables for this model, in terms of mean minimal depth, accuracy decrease, and Gini decrease, turn out to be age and some groups of drugs, such as high-ceiling diuretics. Thanks to the low inference time and reduced memory usage, the resulting model allows for real-time risk prediction updates whenever new data become available, giving General Practitioners the possibility to adopt personalised medicine early.
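
One of the importance measures named above, Gini decrease, can be made concrete with a small worked example. The sketch below computes the impurity reduction achieved by a single binary split; in a random forest, a variable's Gini decrease is usually obtained by accumulating such reductions over every split that uses the variable and averaging over trees. The function names are hypothetical and the code is not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: Sequence[Hashable],
                  left: Sequence[Hashable],
                  right: Sequence[Hashable]) -> float:
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    if n == 0 or len(left) + len(right) != n:
        raise ValueError("left and right must partition a non-empty parent")
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted
```

A split that separates the classes perfectly recovers the full parent impurity, while a split that leaves both children with the parent's class mix yields a decrease of zero.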


2022 ◽  
Author(s):  
Jie Li ◽  
Xin Li ◽  
John Hutchinson ◽  
Mohammad Asad ◽  
Yadong Wang ◽  
...  

Background: It is critical to identify COVID-19 patients with a higher death risk at an early stage to give them better hospitalization or intensive care. However, thus far, none of the machine learning models has been shown to be successful in an independent cohort. We aim to develop a machine learning model that can accurately predict the death risk of COVID-19 patients at an early stage in other independent cohorts. Methods: To identify key features, we used a cohort of 4711 patients with clinical features associated with physiological condition and lab test data associated with inflammation, hepatorenal function, cardiovascular function, and so on. We first developed a novel data preprocessing approach to clean up clinical features and then developed an ensemble machine learning method to identify key features. Results: We identified 14 key clinical features whose combination reached a good predictive performance of AUC 0.907. Most importantly, we successfully validated these key features in a large independent cohort containing 15,790 patients. Conclusions: Our study shows that 14 key features are robust and useful in predicting the risk of death at an early stage in patients with confirmed SARS-CoV-2 infection, and are potentially useful in clinical settings to support clinical decisions.


2020 ◽  
Author(s):  
Abin Abraham ◽  
Brian L Le ◽  
Idit Kosti ◽  
Peter Straub ◽  
Digna R Velez Edwards ◽  
...  

Abstract: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. Here, we apply machine learning to diverse data from EHRs to predict singleton preterm birth. Leveraging a large cohort of 35,282 deliveries, we find that a prediction model based on billing codes alone can predict preterm birth at 28 weeks of gestation (ROC-AUC=0.75, PR-AUC=0.40) and outperforms a comparable model trained using known risk factors (ROC-AUC=0.59, PR-AUC=0.21). Our machine learning approach is also able to accurately predict preterm birth sub-types (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. We demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5,978 deliveries) with only a modest decrease in performance. Interpreting the features identified by the model as most informative for risk stratification demonstrates that they capture non-linear combinations of known risk factors and patterns of care. The strong performance of our approach across multiple clinical contexts and an independent cohort highlights the potential of machine learning algorithms to improve medical care during pregnancy.
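
Because this abstract reports PR-AUC alongside ROC-AUC, a short sketch of one common PR-AUC estimator, average precision, may help. Average precision is an assumption here: the abstract does not say how its PR-AUC values were computed. The function name is hypothetical, and ties between scores are broken by input order.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values observed at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, y in ranked if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` cases
    return total / n_pos
```

Unlike ROC-AUC, whose chance level is 0.5, the chance level of this metric is roughly the prevalence of the positive class, which is why PR-AUC values for a relatively rare outcome such as preterm birth tend to look low next to the corresponding ROC-AUC.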


Author(s):  
Shiladitya Raj ◽  
Megha Jain ◽  
Dr. Pradeep Chouksey ◽  
...  

Massive volumes of network traffic and data are generated by common technologies including the Internet of Things, cloud computing, and social networking. Intrusion detection systems (IDS) are therefore required to monitor the network and dynamically analyse incoming traffic. The purpose of an IDS is to inspect attacks and to support security management with intrusion data. To date, several approaches to intrusion detection have been suggested to anticipate malicious network traffic. In this paper, the NSL-KDD dataset is used to test machine learning algorithms for intrusion detection. We examine the viability of ELM by evaluating its advantages and disadvantages. In the preceding part of this work, we noted that ELM does not degrade generalisation capability in the expectation sense if the activation function is selected correctly. In this paper, we carry out a separate analysis and demonstrate that the randomness of ELM often has some negative effects. For this reason, we employ a new machine learning technique, Categorical Boosting (CatBoost), to overcome the problems of ELM.


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Jawed Nawabi ◽  
Helge Kniep ◽  
Gerhard Schön ◽  
Jens Fiehler ◽  
Uta Hanning

Background: Intracranial hemorrhage (ICH) requires prompt diagnosis to optimize patient outcomes [1]. We hypothesized that machine learning algorithms could automatically analyze non-contrast computed tomography (NECT) of the head and predict the clinical outcome of ICH patients [2]. Methods: 300 NECTs with acute spontaneous ICH acquired between 2014 and 2019 were retrospectively included from the database of a tertiary university hospital. A binary outcome was defined as modified Rankin Scale (mRS) 0-3 (good outcome) versus mRS 4-6 (bad outcome) at discharge. Radiomic features including shape, histogram, and texture markers were extracted from non-, wavelet-, and log-sigma-filtered images using regions of interest of the ICH. The quantitative predictors were evaluated using random forest algorithms with 5-fold model-external cross-validation. Results: The model achieved an area under the ROC curve of 0.81 (95% CI [0.77; 0.86]; P<0.01); specificity and sensitivity both reached 78% at the Youden’s Index optimal cut-off point for the prediction of functional clinical outcome at discharge (mRS). Discussion: In conclusion, quantitative features of acute NECT images in a machine learning algorithm provided high discriminatory power in predicting functional outcome. In clinical routine, the proposed approach could allow early triage of patients at high risk of poor outcome. Indication of source: [1] Qureshi, A. I. et al. Intracerebral haemorrhage. Lancet. 2009. [2] Mohammad R. Arbabshirani et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. npj Digital Medicine. 2018.
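
The cut-off selection mentioned in the results can be sketched as follows. Youden's index is J = sensitivity + specificity - 1, and the optimal cut-off is the threshold that maximizes J. The sketch assumes that a case is predicted positive (here, poor outcome) when its score is at or above the threshold and that observed scores serve as candidate thresholds; the function name is hypothetical and the code is not the authors' implementation.

```python
from typing import Sequence, Tuple


def youden_optimal_cutoff(labels: Sequence[int],
                          scores: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (threshold, sensitivity, specificity, J) for the threshold maximizing J.

    A case is called positive when score >= threshold. If several thresholds
    tie on J, the lowest one is returned.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[3]:
            best = (threshold, sensitivity, specificity, j)
    return best
```

Maximizing J weights sensitivity and specificity equally; a triage setting that penalizes missed high-risk patients more heavily could justify a different operating point.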

