AutoScore: A Machine Learning–Based Automatic Clinical Score Generator and Its Application to Mortality Prediction Using Electronic Health Records

Background: Risk scores can be useful in clinical risk stratification and accurate allocation of medical resources, helping health providers improve patient care. Point-based scores are more understandable and explainable than other complex models and are now widely used in clinical decision making. However, developing a risk scoring model is nontrivial and has not yet been systematically presented, with few studies investigating methods of clinical score generation using electronic health records.

Objective: This study aims to propose AutoScore, a machine learning–based automatic clinical score generator consisting of 6 modules for developing interpretable point-based scores. Future users can employ the AutoScore framework to create clinical scores effortlessly in various clinical applications.

Methods: We proposed the AutoScore framework, comprising 6 modules: variable ranking, variable transformation, score derivation, model selection, score fine-tuning, and model evaluation. To demonstrate the performance of AutoScore, we used data from the Beth Israel Deaconess Medical Center to build a scoring model for mortality prediction and then compared it with other baseline models using receiver operating characteristic analysis. A software package in R 3.5.3 (R Foundation) was also developed to demonstrate the implementation of AutoScore.

Results: Implemented on a data set of 44,918 individual intensive care admission episodes, the AutoScore-created scoring models performed comparably to other standard methods (ie, logistic regression, stepwise regression, least absolute shrinkage and selection operator, and random forest) in terms of predictive accuracy and model calibration but required fewer predictors and offered high interpretability and accessibility. The nine-variable, AutoScore-created, point-based scoring model achieved an area under the curve (AUC) of 0.780 (95% CI 0.764-0.798), whereas the logistic regression model with 24 variables had an AUC of 0.778 (95% CI 0.760-0.795). Moreover, the AutoScore framework also drives the clinical research continuum and automation with its integration of all necessary modules.

Conclusions: We developed an easy-to-use, machine learning–based automatic clinical score generator, AutoScore; systematically presented its structure; and demonstrated its superiority (predictive performance and interpretability) over other conventional methods using a benchmark database. AutoScore will emerge as a potential scoring tool in various medical applications.
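
To make the ideas of score derivation and receiver operating characteristic analysis concrete, here is a minimal sketch in Python. It is not the AutoScore R package or the authors' procedure. The variable names, coefficient values, patients, and the scaling rule (divide each coefficient by the smallest nonzero magnitude, then round) are all assumptions chosen for illustration.

```python
from typing import Dict, List


def coefficients_to_points(coefs: Dict[str, float]) -> Dict[str, int]:
    """Turn regression coefficients into integer points.

    Assumed rule: scale by the smallest nonzero magnitude, then round.
    """
    base = min(abs(b) for b in coefs.values() if b != 0)
    return {name: round(b / base) for name, b in coefs.items()}


def total_score(points: Dict[str, int], flags: Dict[str, bool]) -> int:
    """Sum the points of every risk factor that is present for one patient."""
    return sum(points[name] for name, present in flags.items() if present)


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients for three binary risk factors (illustration only).
coefs = {"age_over_65": 0.9, "low_blood_pressure": 0.45, "abnormal_lactate": 1.4}
points = coefficients_to_points(coefs)

# Hypothetical patients: which risk factors are present, and the outcome (1 = died).
patients = [
    {"age_over_65": True, "low_blood_pressure": False, "abnormal_lactate": True},
    {"age_over_65": True, "low_blood_pressure": False, "abnormal_lactate": False},
    {"age_over_65": False, "low_blood_pressure": True, "abnormal_lactate": False},
    {"age_over_65": False, "low_blood_pressure": True, "abnormal_lactate": True},
    {"age_over_65": True, "low_blood_pressure": True, "abnormal_lactate": False},
    {"age_over_65": False, "low_blood_pressure": False, "abnormal_lactate": True},
]
labels = [1, 0, 0, 1, 1, 0]

scores = [total_score(points, p) for p in patients]
example_auc = auc(scores, labels)
print(points, scores, round(example_auc, 3))
```

A point table of this kind can be applied by hand at the bedside, which is the interpretability advantage the abstract refers to; the AUC function then measures how well the summed points separate the two outcome groups.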

AutoScore: A Machine Learning–Based Automatic Clinical Score Generator and Its Application to Mortality Prediction Using Electronic Health Records (Preprint)

10.2196/preprints.21798 ◽

2020 ◽

Author(s):

Feng Xie ◽

Bibhas Chakraborty ◽

Marcus Eng Hock Ong ◽

Benjamin Alan Goldstein ◽

Nan Liu

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Electronic Health Records ◽

Clinical Score ◽

Mortality Prediction ◽

Fine Tuning ◽

Health Records ◽

Scoring Model ◽

Benchmark Database ◽

Electronic Health

Predicting dementia diagnosis from cognitive footprints in electronic health records: a case–control study protocol

BMJ Open ◽

10.1136/bmjopen-2020-043487 ◽

2020 ◽

Vol 10 (11) ◽

pp. e043487

Author(s):

Hao Luo ◽

Kui Kai Lau ◽

Gloria H Y Wong ◽

Wai-Chi Chan ◽

Henry K F Mak ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Hong Kong ◽

Electronic Health Records ◽

Case Control Study ◽

Case Control ◽

Dementia Diagnosis ◽

Health Records ◽

Electronic Health ◽

Control Study

Introduction: Dementia is a group of disabling disorders that can be devastating for persons living with it and for their families. Data-informed decision-making strategies to identify individuals at high risk of dementia are essential to facilitate large-scale prevention and early intervention. This population-based case–control study aims to develop and validate a clinical algorithm for predicting dementia diagnosis, based on the cognitive footprint in personal and medical history.

Methods and analysis: We will use territory-wide electronic health records from the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong between 1 January 2001 and 31 December 2018. All individuals who were at least 65 years old by the end of 2018 will be identified from CDARS. A random sample of control individuals who did not receive any diagnosis of dementia will be matched in a 1:1 ratio with those who did receive such a diagnosis, by age, gender and index date. Exposure to potential protective/risk factors will be included in both conventional logistic regression and machine-learning models. Established risk factors of interest will include diabetes mellitus, midlife hypertension, midlife obesity, depression, head injuries and low education. Exploratory risk factors will include vascular disease, infectious disease and medication. The prediction accuracy of several state-of-the-art machine-learning algorithms will be compared.

Ethics and dissemination: This study was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 18-225). Patients’ records are anonymised to protect privacy. Study results will be disseminated through peer-reviewed publications. The code for the resulting dementia risk prediction algorithm will be made publicly available on the website of the Tools to Inform Policy: Chinese Communities’ Action in Response to Dementia project (https://www.tip-card.hku.hk/).
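
The 1:1 matching step described in the protocol can be sketched as follows. This is an illustration only, not the study's code: the field names (`age`, `gender`, `index_year`), the use of exact matching, and the sample records are assumptions, and the protocol does not state what matching tolerances will be used.

```python
import random
from typing import Dict, List, Tuple

Person = Dict[str, object]


def matching_key(person: Person) -> Tuple[object, object, object]:
    """Assumed matching variables: exact age, gender, and index year."""
    return (person["age"], person["gender"], person["index_year"])


def match_one_to_one(
    cases: List[Person], controls: List[Person], seed: int = 0
) -> List[Tuple[Person, Person]]:
    """Pair each case with one randomly chosen, not yet used, matching control."""
    rng = random.Random(seed)
    pool: Dict[Tuple[object, object, object], List[Person]] = {}
    for control in controls:
        pool.setdefault(matching_key(control), []).append(control)

    pairs = []
    for case in cases:
        candidates = pool.get(matching_key(case), [])
        if not candidates:
            continue  # no eligible control: the case stays unmatched
        chosen = candidates.pop(rng.randrange(len(candidates)))
        pairs.append((case, chosen))
    return pairs


# Hypothetical records (illustration only).
cases = [
    {"id": "case1", "age": 70, "gender": "F", "index_year": 2010},
    {"id": "case2", "age": 82, "gender": "M", "index_year": 2015},
    {"id": "case3", "age": 90, "gender": "F", "index_year": 2018},
]
controls = [
    {"id": "ctrl1", "age": 70, "gender": "F", "index_year": 2010},
    {"id": "ctrl2", "age": 70, "gender": "F", "index_year": 2010},
    {"id": "ctrl3", "age": 82, "gender": "M", "index_year": 2015},
    {"id": "ctrl4", "age": 66, "gender": "M", "index_year": 2012},
]

pairs = match_one_to_one(cases, controls)
for case, control in pairs:
    print(case["id"], "<->", control["id"])
```

Removing a control from the pool once it is chosen is what keeps the ratio at exactly 1:1; cases with no eligible control are left out rather than matched loosely.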

Comparative analysis of machine learning methods for analyzing security practice in electronic health records’ logs.

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378353 ◽

2020 ◽

Author(s):

Prosper K Yeng ◽

Muhammad Ali Fauzi ◽

Bian Yang

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Electronic Health Records ◽

Learning Methods ◽

Health Records ◽

Machine Learning Methods ◽

Electronic Health

Workflow-based anomaly detection using machine learning on electronic health records’ logs: A Comparative Study

2020 International Conference on Computational Science and Computational Intelligence (CSCI) ◽

10.1109/csci51800.2020.00143 ◽

2020 ◽

Author(s):

Prosper K Yeng ◽

Muhammad Ali Fauzi ◽

Bian Yang

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Anomaly Detection ◽

Comparative Study ◽

Health Records ◽

Electronic Health

The process of sourcing and preparing electronic health records data to implement a machine-learning algorithm for early identification of maternal cardiovascular risk (Preprint)

10.2196/preprints.34932 ◽

2021 ◽

Author(s):

Nawar Shara ◽

Kelley M. Anderson ◽

Noor Falah ◽

Maryam F. Ahmad ◽

Darya Tavazoei ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Risk ◽

Electronic Health Records ◽

Patient Care ◽

Early Identification ◽

Health Data ◽

Patient Specific ◽

Patient Records ◽

Health Records ◽

Electronic Health

BACKGROUND: Healthcare data are fragmenting as patients seek care from diverse sources. Consequently, patient care is negatively impacted by disparate health records. Machine learning (ML) offers a disruptive force in its ability to inform and improve patient care and outcomes [6]. However, the differences that exist in each individual’s health records, combined with the lack of health-data standards, in addition to systemic issues that render the data unreliable and that fail to create a single view of each patient, create challenges for ML. While these problems exist throughout healthcare, they are especially prevalent within maternal health, and exacerbate the maternal morbidity and mortality (MMM) crisis in the United States.

OBJECTIVE: Maternal patient records were extracted from the electronic health records (EHRs) of a large tertiary healthcare system and made into patient-specific, complete datasets through a systematic method so that a machine-learning-based (ML-based) risk-assessment algorithm could effectively identify maternal cardiovascular risk prior to evidence of diagnosis or intervention within the patient’s record.

METHODS: We outline the effort that was required to define the specifications of the computational systems, the dataset, and access to relevant systems, while ensuring data security, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for its use by a proprietary risk-stratification algorithm designed to establish patient-specific baselines to identify and establish cardiovascular risk based on deviations from the patient’s baselines to inform early interventions.

RESULTS: Patient records can be made actionable for the goal of effectively employing machine learning (ML), specifically to identify cardiovascular risk in pregnant patients.

CONCLUSIONS: Upon acquiring data, including the concatenation, anonymization, and normalization of said data across multiple EHRs, the use of a machine-learning-based (ML-based) tool can provide early identification of cardiovascular risk in pregnant patients.

CLINICAL TRIAL: N/A
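
The data-preparation idea in this abstract (combine records from several EHRs, replace identifiers, then compare later readings with a patient-specific baseline) can be sketched as below. The risk-stratification algorithm in the study is proprietary and is not reproduced here. Every name, field, value, and threshold-free rule in this sketch is an assumption, and salted hashing is pseudonymization, which is weaker than the full anonymization the abstract mentions.

```python
import hashlib
from statistics import mean
from typing import Dict, List


def pseudonymize(patient_id: str, salt: str) -> str:
    """Replace a raw identifier with a salted SHA-256 digest (truncated)."""
    return hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()[:12]


def merge_sources(sources: List[List[dict]], salt: str) -> Dict[str, List[float]]:
    """Concatenate readings from several EHR extracts, keyed by pseudonym.

    Readings for each patient are ordered by day before being returned.
    """
    merged: Dict[str, list] = {}
    for records in sources:
        for rec in records:
            key = pseudonymize(rec["patient_id"], salt)
            merged.setdefault(key, []).append((rec["day"], rec["systolic_mmHg"]))
    return {key: [value for _, value in sorted(readings)]
            for key, readings in merged.items()}


def deviation_from_baseline(values: List[float], baseline_n: int = 3) -> List[float]:
    """Baseline = mean of the first `baseline_n` readings (an assumed rule)."""
    baseline = mean(values[:baseline_n])
    return [v - baseline for v in values[baseline_n:]]


# Hypothetical extracts from two EHR systems for the same patient.
ehr_a = [
    {"patient_id": "A-001", "day": 1, "systolic_mmHg": 110},
    {"patient_id": "A-001", "day": 3, "systolic_mmHg": 114},
]
ehr_b = [
    {"patient_id": "A-001", "day": 2, "systolic_mmHg": 112},
    {"patient_id": "A-001", "day": 4, "systolic_mmHg": 140},
]

merged = merge_sources([ehr_a, ehr_b], salt="example-salt")
deviations = {key: deviation_from_baseline(vals) for key, vals in merged.items()}
print(deviations)
```

Measuring each reading against the patient's own earlier values, rather than against a population cutoff, is what "patient-specific baseline" means in this sketch.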

Leveraging Genetic Reports and Electronic Health Records for Predicting Primary Cancers Based on FHIR and RDF (Preprint)

10.2196/preprints.23586 ◽

2020 ◽

Author(s):

Nansu Zong ◽

Victoria Ngo ◽

Daniel J. Stone ◽

Andrew Wen ◽

Yiqing Zhao ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Cancer Patients ◽

Genetic Data ◽

Precision Oncology ◽

Primary Cancer ◽

Health Records ◽

Web Resource ◽

Cancer Types ◽

Electronic Health

BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on early detection of primary cancers and the potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions.

OBJECTIVE: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict unknown primaries.

METHODS: We extracted the genetic data elements from a collection of oncology genetic reports of 1,011 cancer patients, and corresponding phenotypic data from the Mayo Clinic electronic health records (EHRs). We modeled both genetic and EHR data with HL7 Fast Healthcare Interoperability Resources (FHIR). The semantic web Resource Description Framework (RDF) was employed to generate the network-based data representation (i.e., patient-phenotypic-genetic network). Based on the RDF data graph, the graph embedding algorithm Node2vec was applied to generate features, and then multiple machine learning and deep learning backbone models were adopted for cancer prediction.

RESULTS: With six machine-learning tasks designed in the experiment, we demonstrated that the proposed method achieved favorable results in classifying primary cancer types and predicting unknown primaries. To demonstrate the interpretability, phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review.

CONCLUSIONS: Accurate prediction of cancer types can be achieved with existing EHR data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early at the diagnosis stage for cancer patients.
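
As a rough illustration of how a patient-phenotypic-genetic network can feed a graph embedding method, the sketch below builds an undirected graph from RDF-style triples and samples uniform random walks over it. This is a simplification, not the study's pipeline: uniform walks correspond to Node2vec only in the special case where its return and in-out parameters are both 1, the skip-gram training step that turns walks into vectors is omitted, and the node and predicate names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def build_graph(triples: List[Tuple[str, str, str]]) -> Graph:
    """Treat each (subject, predicate, object) triple as an undirected edge."""
    graph: Graph = {}
    for subject, _predicate, obj in triples:
        graph.setdefault(subject, []).append(obj)
        graph.setdefault(obj, []).append(subject)
    return graph


def random_walks(
    graph: Graph, walk_length: int, walks_per_node: int, seed: int = 0
) -> List[List[str]]:
    """Sample fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical triples (illustration only).
triples = [
    ("patient1", "hasPhenotype", "phenotypeA"),
    ("patient1", "hasVariant", "geneX"),
    ("patient2", "hasPhenotype", "phenotypeA"),
    ("patient2", "hasVariant", "geneY"),
]

graph = build_graph(triples)
walks = random_walks(graph, walk_length=4, walks_per_node=2)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

Walks that pass through shared phenotype or gene nodes place related patients in similar contexts, which is what an embedding model trained on these walks would pick up as features.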

Synchronization of Machine Learning into Electronic Health Records

International Journal of Computer Applications ◽

10.5120/ijca2019919751 ◽

2019 ◽

Vol 177 (26) ◽

pp. 40-47

Author(s):

Meet N. ◽

Eshan Vatsa ◽

Nitin S.

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Health Records ◽

Electronic Health

Early prediction of clinical deterioration using data-driven machine learning modeling of electronic health records

Journal of Thoracic and Cardiovascular Surgery ◽

10.1016/j.jtcvs.2021.10.060 ◽

2021 ◽

Author(s):

Victor M. Ruiz ◽

Michael P. Goldsmith ◽

Lingyun Shi ◽

Allan F. Simpao ◽

Jorge A. Gálvez ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Clinical Deterioration ◽

Data Driven ◽

Early Prediction ◽

Health Records ◽

Electronic Health ◽

Using Data

Accurate COVID-19 Health Outcome Prediction and Risk Factors Identification through an Innovative Machine Learning Framework Using Longitudinal Electronic Health Records

10.1109/ichi52183.2021.00099 ◽

2021 ◽

Author(s):

Alice Feng

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Health Outcome ◽

Electronic Health Records ◽

Outcome Prediction ◽

Health Records ◽

Learning Framework ◽

Electronic Health

Application of Machine Learning in Chronic Kidney Disease Risk Prediction Using Electronic Health Records (EHR)

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-6673-2.ch014 ◽

2021 ◽

pp. 213-233

Author(s):

Laxmi Kumari Pathak ◽

Pooja Jha

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Electronic Health Records ◽

Bone Diseases ◽

Machine Learning Algorithms ◽

Health Records ◽

Chronic Kidney Disease Risk ◽

Electronic Health ◽

Heart Disorders

Chronic kidney disease (CKD) is a disorder in which the kidneys are weakened and become unable to filter blood, which reduces a person's ability to remain healthy. The field of biosciences has progressed and produced vast volumes of knowledge from electronic health records. Heart disorders, anemia, bone diseases, and elevated potassium and calcium are highly prevalent complications of kidney failure. Early identification of CKD can greatly improve quality of life. To achieve this, various machine learning techniques have been introduced that use the data in electronic health records (EHRs) to predict CKD. This chapter studies various machine learning algorithms, such as support vector machine, random forest, probabilistic neural network, Apriori, ZeroR, OneR, naive Bayes, J48, IBk (k-nearest neighbor), and ensemble methods, and compares their accuracy. The study aims to find the machine learning technique best suited to the early detection of CKD, one whose predictions medical professionals can interpret easily.
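
Two of the algorithms named above, ZeroR and OneR, are simple enough to sketch in a few lines, which also shows what an accuracy comparison looks like. The features, records, and labels below are invented for illustration and are not from the chapter. Accuracy is measured on the training data here, so it overstates how the rules would perform on unseen patients.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def zero_r(labels: List[str]) -> str:
    """ZeroR: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows: List[Dict[str, str]], labels: List[str]) -> Tuple[str, Dict[str, str]]:
    """OneR: keep the single feature whose value-to-class rule fits best."""
    best_feature, best_rule, best_correct = "", {}, -1
    for feature in rows[0]:
        by_value: Dict[str, Counter] = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[feature]][label] += 1
        # For each feature value, predict the class seen most often with it.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        correct = sum(counts[rule[value]] for value, counts in by_value.items())
        if correct > best_correct:
            best_feature, best_rule, best_correct = feature, rule, correct
    return best_feature, best_rule


def accuracy(predictions: List[str], labels: List[str]) -> float:
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical records (illustration only).
rows = [
    {"high_creatinine": "yes", "hypertension": "yes"},
    {"high_creatinine": "yes", "hypertension": "no"},
    {"high_creatinine": "no", "hypertension": "yes"},
    {"high_creatinine": "no", "hypertension": "no"},
    {"high_creatinine": "no", "hypertension": "yes"},
    {"high_creatinine": "no", "hypertension": "no"},
    {"high_creatinine": "no", "hypertension": "no"},
]
labels = ["ckd", "ckd", "no_ckd", "no_ckd", "ckd", "no_ckd", "no_ckd"]

majority = zero_r(labels)
zero_r_accuracy = accuracy([majority] * len(labels), labels)

feature, rule = one_r(rows, labels)
one_r_accuracy = accuracy([rule[row[feature]] for row in rows], labels)

print("ZeroR:", majority, round(zero_r_accuracy, 3))
print("OneR:", feature, rule, round(one_r_accuracy, 3))
```

ZeroR serves as the floor that any useful model must beat, while OneR yields a one-line rule that a clinician can read directly, which ties in with the chapter's interest in interpretable predictions.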