458. A Machine Learning Approach Identifies Distinct Early-Symptom Cluster Phenotypes Which Correlate with Severe SARS-CoV-2 Outcomes

Abstract

Background: The novel coronavirus disease 2019 (COVID-19) pandemic remains a global challenge, and accurate COVID-19 prognosis remains an important aspect of clinical management. While many prognostic systems have been proposed, most are derived from analyses of individual symptoms or biomarkers. Here, we take a machine learning approach to first identify discrete clusters of early-stage symptoms that may delineate groups with distinct symptom phenotypes. We then sought to identify whether these groups correlate with subsequent disease severity.

Methods: The Epidemiology, Immunology, and Clinical Characteristics of Emerging Infectious Diseases with Pandemic Potential (EPICC) study is a longitudinal cohort study with data and biospecimens collected from nine military treatment facilities over 1 year of follow-up. Demographic and clinical characteristics were measured with interviews and electronic medical record review. Early symptoms by organ domain were measured by FLU-PRO-plus surveys collected for 14 days post-enrollment, with surveys completed a median of 14.5 (interquartile range, IQR = 13) days post-symptom onset. Using these FLU-PRO-plus responses, we applied principal component analysis followed by the unsupervised machine learning algorithm k-means to identify groups with distinct clusters of symptoms. We then fit multivariate logistic regression models to determine how these early-symptom clusters correlated with hospitalization risk after controlling for age, sex, race, and obesity.

Results: Using SARS-CoV-2 positive participants (n = 1137) from the EPICC cohort (Figure 1), we transformed reported symptoms into domains and identified three groups of participants with distinct clusters of symptoms. Logistic regression demonstrated that cluster-2 was associated with an approximately three-fold increased odds [3.01 (95% CI: 2-4.52); P < 0.001] of hospitalization, which remained significant after controlling for other factors [2.97 (95% CI: 1.88-4.69); P < 0.001].

Figure 1: (A) Baseline characteristics of SARS-CoV-2 positive participants. (B) Heatmap comparing FLU-PRO response in each participant. (C) Principal component analysis followed by k-means clustering identified three groups of participants. (D) Crude and adjusted association of identified cluster with hospitalization.

Conclusion: Our findings have identified three distinct groups with early-symptom phenotypes. With further validation of the clusters' significance, this tool could be used to improve COVID-19 prognosis in a precision medicine framework and may assist in patient triaging and clinical decision-making.

Disclosures: David A. Lindholm, MD, American Board of Internal Medicine (Individual(s) Involved: Self): Member of Auxiliary R&D Infectious Disease Item-Writer Task Force. No financial support received. No exam questions will be disclosed. (Other Financial or Material Support). Ryan C. Maves, MD, EMD Serono (Advisor or Review Panel member); Heron Therapeutics (Advisor or Review Panel member). Simon Pollett, MBBS, AstraZeneca (Other Financial or Material Support, HJF, in support of USU IDCRP, funded under a CRADA to augment the conduct of an unrelated Phase III COVID-19 vaccine trial sponsored by AstraZeneca as part of USG response (unrelated work)).
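The clustering step described in the abstract above can be illustrated with a minimal sketch of k-means (Lloyd's algorithm). This is not the authors' code: the 2-D `scores` are hypothetical stand-ins for principal component scores, the function name `kmeans` is illustrative, and the starting centroids are fixed by hand (an assumption made only so the run is reproducible). The PCA step itself is not shown.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` are caller-supplied starting centers."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's center
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D scores (illustrative only, not study data).
scores = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
    (0.1, 5.0), (0.0, 5.2), (0.3, 4.9),
]
labels, centroids = kmeans(scores, [scores[0], scores[3], scores[6]])
print(labels)
```

In practice the number of clusters and the initialization are choices that affect the result; the abstract reports three clusters but does not describe how these choices were made.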

A machine learning approach to medical data identification through principal component analysis

Big Data III: Learning, Analytics, and Applications ◽

10.1117/12.2586038 ◽

2021 ◽

Author(s):

Lorenzo E. Jaques ◽

Arthur C. Depoian ◽

Dong Xie ◽

Colleen P. Bailey ◽

Parthasarathy Guturu

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Medical Data ◽

Learning Approach ◽

Machine Learning Approach

Comparative Machine Learning Approach in Dementia Patient Classification using Principal Component Analysis

Proceedings of the 12th International Conference on Agents and Artificial Intelligence ◽

10.5220/0009096907800784 ◽

2020 ◽

Cited By ~ 1

Author(s):

Gopi Battineni ◽

Nalini Chintalapudi ◽

Francesco Amenta

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Dementia Patient ◽

Learning Approach ◽

Patient Classification ◽

Machine Learning Approach

Compression of Multilead Electrocardiogram Using Principal Component Analysis and Machine Learning Approach

2018 IEEE Applied Signal Processing Conference (ASPCON) ◽

10.1109/aspcon.2018.8748572 ◽

2018 ◽

Cited By ~ 1

Author(s):

Soumyendu Banerjee ◽

Rajarshi Gupta ◽

Jayanta Saha

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Learning Approach ◽

Machine Learning Approach

Enhancement of the Au/ZnO-NA Plasmonic SERS Signal Using Principal Component Analysis as a Machine Learning Approach

IEEE Photonics Journal ◽

10.1109/jphot.2020.3015740 ◽

2020 ◽

Vol 12 (5) ◽

pp. 1-11

Author(s):

Akhilesh Kumar Gupta ◽

Chih-Hsien Hsu ◽

Chao-Sung Lai

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Learning Approach ◽

Sers Signal ◽

Machine Learning Approach

Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

10.31234/osf.io/eqcdx ◽

2019 ◽

Author(s):

Oskar Flygare ◽

Jesper Enander ◽

Erik Andersson ◽

Brjánn Ljótsson ◽

Volen Z Ivanov ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Clinical Utility ◽

Body Dysmorphic Disorder ◽

Prediction Models ◽

Behavioral Therapy ◽

Learning Approach ◽

Learning Approaches ◽

Machine Learning Approach

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.

A machine learning approach for the factorization of psychometric data with application to the Delis Kaplan Executive Function System

Scientific Reports ◽

10.1038/s41598-021-96342-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

J. A. Camilleri ◽

S. B. Eickhoff ◽

S. Weis ◽

J. Chen ◽

J. Amunts ◽

...

Keyword(s):

Machine Learning ◽

Executive Function ◽

Executive Functioning ◽

Factor Model ◽

Five Factor Model ◽

Principal Component ◽

Function System ◽

Learning Approach ◽

Psychometric Data ◽

Machine Learning Approach

Abstract: While a replicability crisis has shaken psychological sciences, the replicability of multivariate approaches for psychometric data factorization has received little attention. In particular, Exploratory Factor Analysis (EFA) is frequently promoted as the gold standard in psychological sciences. However, the application of EFA to executive functioning, a core concept in psychology and cognitive neuroscience, has led to divergent conceptual models. This heterogeneity severely limits the generalizability and replicability of findings. To tackle this issue, in this study, we propose to capitalize on a machine learning approach, OPNMF (Orthonormal Projective Non-Negative Factorization), and leverage internal cross-validation to promote generalizability to an independent dataset. We examined its application to the scores of 334 adults on the Delis–Kaplan Executive Function System (D-KEFS), comparing it to standard EFA and Principal Component Analysis (PCA). We further evaluated the replicability of the derived factorization across specific gender and age subsamples. Overall, OPNMF and PCA both converge towards a two-factor model as the best data-fit model. The derived factorization suggests a division between low-level and high-level executive functioning measures, a model further supported in subsamples. In contrast, EFA highlighted a five-factor model, which reflects the segregation of the D-KEFS battery into its main tasks while still clustering higher-level tasks together. However, this model was poorly supported in the subsamples. Thus, the parsimonious two-factor model revealed by OPNMF encompasses the more complex factorization yielded by EFA while enjoying higher generalizability. Hence, OPNMF provides a conceptually meaningful, technically robust, and generalizable factorization for psychometric tools.

Logistic Regression Model for Loan Prediction: A Machine Learning Approach

10.1109/eti4.051663.2021.9619201 ◽

2021 ◽

Author(s):

Richa Manglani ◽

Anuja Bokhare

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Learning Approach ◽

Machine Learning Approach

Prediksi Not Operational Transaction Menggunakan Logistic Regression pada Bank XYZ di Kota Kupang

AITI ◽

10.24246/aiti.v17i1.42-55 ◽

2020 ◽

Vol 17 (1) ◽

pp. 42-55

Author(s):

Radius Tanone ◽

Arnold B Emmanuel

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Method ◽

Learning Approach ◽

Know How ◽

Independent Variables ◽

Machine Learning Approach ◽

Logistic Regression Method ◽

Python Programming

Bank XYZ is a bank in Kupang City, East Nusa Tenggara Province, that operates several ATMs placed at various merchant locations. Both customers and non-customers use these ATMs to conduct transactions. Depending on where an ATM is placed, customers may not use it optimally, which wastes machine resources and produces a condition called Not Operational Transaction (NOP). Given data consisting of several numeric independent variables, the task is to classify the dependent variable, NOP. A machine learning approach using the logistic regression method is applied to this classification. The research steps were data collection, analysis with machine learning in Python, and report writing. The approach yielded a prediction value of 0.507 for the classification. This suggests that in the future Bank XYZ can classify NOP conditions based on the behavior of customers and non-customers when they make transactions at its ATMs.
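As an illustration of the kind of classification this abstract describes, here is a minimal sketch of logistic regression fitted by batch gradient descent. It is not the authors' implementation: the two features, the toy data, the learning rate, and the names `fit_logistic` and `predict` are all hypothetical, and the abstract does not specify which variables or library were used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row drives the gradient.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (NOP) when the predicted probability reaches the threshold."""
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return 1 if p >= threshold else 0


# Hypothetical, pre-scaled numeric features per ATM (illustrative only).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [1, 1, 1, 0, 0, 0]  # 1 = NOP, 0 = operational (labels are made up)

w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

The toy data are deliberately well separated, so the fitted model classifies them cleanly; real transaction data would need a held-out test set to estimate performance.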

A Machine Learning Approach for Efficient Selection of Enzyme Concentrations and Its Application for Flux Optimization

Catalysts ◽

10.3390/catal10030291 ◽

2020 ◽

Vol 10 (3) ◽

pp. 291 ◽

Cited By ~ 1

Author(s):

Anamya Ajjolli Nagaraja ◽

Philippe Charton ◽

Xavier F. Cadet ◽

Nicolas Fontaine ◽

Mathieu Delsaut ◽

...

Keyword(s):

Machine Learning ◽

Glass Ceiling ◽

Principal Component ◽

Enzyme Concentration ◽

Learning Approach ◽

Neural Network Approach ◽

Free System ◽

Machine Learning Approach ◽

Selection Of

The metabolic engineering of pathways has been used extensively to produce molecules of interest on an industrial scale. Methods like gene regulation or substrate channeling helped to improve the desired product yield. Cell-free systems are used to overcome the weaknesses of engineered strains. One of the challenges in a cell-free system is selecting the optimized enzyme concentration for optimal yield. Here, a machine learning approach is used to select the enzyme concentration for the upper part of glycolysis. The artificial neural network approach (ANN) is known to be inefficient in extrapolating predictions outside the box: high predicted values will bump into a sort of “glass ceiling”. In order to explore this “glass ceiling” space, we developed a new methodology named glass ceiling ANN (GC-ANN). Principal component analysis (PCA) and data classification methods are used to derive a rule for a high flux, and ANN to predict the flux through the pathway using the input data of 121 balances of four enzymes in the upper part of glycolysis. The outcomes of this study are i. in silico selection of optimum enzyme concentrations for a maximum flux through the pathway and ii. experimental in vitro validation of the “out-of-the-box” fluxes predicted using this new approach. Surprisingly, flux improvements of up to 63% were obtained. Gratifyingly, these improvements are coupled with a cost decrease of up to 25% for the assay.

Predicting patient outcomes in psychiatric hospitals with routine data: a machine learning approach

10.21203/rs.2.15371/v4 ◽

2020 ◽

Author(s):

Jan Wolff ◽

Alexander Gary ◽

Daniela Jung ◽

Claus Normann ◽

Klaus Kaier ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Psychiatric Hospital ◽

Hospital Care ◽

Routine Data ◽

Psychiatric Hospitals ◽

Predictive Performance ◽

Learning Approach ◽

Machine Learning Approach ◽

Better Than

Abstract Background: A common problem in machine learning applications is the availability of data at the point of decision making. The aim of the present study was to use routine data readily available at admission to predict aspects relevant to the organization of psychiatric hospital care. A further aim was to compare the results of a machine learning approach with those obtained through a traditional method and those obtained through a naive baseline classifier. Methods: The study included patients discharged consecutively between 1 January 2017 and 31 December 2018 from nine psychiatric hospitals in Hesse, Germany. We compared the predictive performance achieved by stochastic gradient boosting (GBM) with multiple logistic regression and a naive baseline classifier. We tested the performance of our final models on unseen patients from another calendar year and from different hospitals. Results: The study included 45,388 inpatient episodes. The models' performance, as measured by the area under the Receiver Operating Characteristic curve, varied strongly between the predicted outcomes, with relatively high performance in the prediction of coercive treatment (area under the curve: 0.83) and 1:1 observations (0.80) and relatively poor performance in the prediction of short length of stay (0.69) and non-response to treatment (0.65). The GBM performed slightly better than logistic regression. Both approaches were substantially better than a naive prediction based solely on basic diagnostic grouping. Conclusion: The present study has shown that administrative routine data can be used to predict aspects relevant to the organization of psychiatric hospital care. Future research should investigate the predictive performance that is necessary to provide effective assistance in clinical practice for the benefit of both staff and patients.
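The performance measure used above, the area under the ROC curve, can be computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is illustrative only: the labels and scores are made up, the function name `roc_auc` is hypothetical, and the constant-score baseline is a simplified stand-in for a naive classifier, not the diagnostic-grouping baseline used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up outcomes and predicted risks (illustrative only).
labels = [0, 0, 1, 1]
model_scores = [0.1, 0.4, 0.35, 0.8]
baseline_scores = [0.5, 0.5, 0.5, 0.5]  # uninformative constant score

model_auc = roc_auc(labels, model_scores)
baseline_auc = roc_auc(labels, baseline_scores)
print(model_auc, baseline_auc)  # 0.75 0.5
```

An AUC of 0.5 corresponds to an uninformative ranking, which is why values such as 0.83 and 0.65 in the abstract are read against that reference point.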
