Integrating Block Chain based Ensuring Security and Privacy of Electronic Health Record Systems using Machine Learning Techniques

Electronic health record (EHR) analysis has become increasingly important for improving the quality of healthcare. To draw the full insight from large EHR collections, it is important to define application scenarios for which the relevant data can be extracted to train machine learning models toward the expected goals. In this paper, we develop a system that recommends treatment solutions for patients with schizophrenia who live in the countryside or in small cities, where doctors in local hospitals do not have sufficient expertise to treat such cases. We treat the patients' symptom descriptions in the EHRs as documents and develop NLP and unsupervised machine learning techniques to analyze these documents and find the relevant and effective treatment solutions provided by medical experts. Extensive experiments with different vector representations of the documents show that the binary keyword vector representation works best for finding relevant and effective treatment plans in the EHRs for any input symptom description.
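The abstract reports that a binary keyword vector representation worked best for matching an input symptom description against the symptom descriptions stored in the EHRs. The following is a minimal sketch of that idea under stated assumptions, not the authors' pipeline:

- The keyword vocabulary is fixed.
- Tokenization is lowercase alphabetic.
- Records are ranked by cosine similarity. The abstract does not name the similarity measure.
- `binary_vector`, `cosine`, `rank_records`, `VOCAB` and `RECORDS` are hypothetical names with toy data.

```python
import math
import re


def binary_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Return a 0/1 vector marking which vocabulary keywords occur in text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in tokens else 0 for word in vocabulary]


def cosine(a: list[int], b: list[int]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_records(query: str, records: list[tuple[str, str]],
                 vocabulary: list[str]) -> list[tuple[float, str]]:
    """Rank (symptom_description, treatment) pairs by similarity to the query."""
    q = binary_vector(query, vocabulary)
    scored = [(cosine(q, binary_vector(symptoms, vocabulary)), treatment)
              for symptoms, treatment in records]
    return sorted(scored, reverse=True)


# Hypothetical toy data: symptom descriptions paired with treatment plans.
VOCAB = ["hallucinations", "delusions", "insomnia", "withdrawal", "anxiety"]
RECORDS = [
    ("auditory hallucinations and delusions", "plan A"),
    ("insomnia and anxiety", "plan B"),
    ("social withdrawal with delusions", "plan C"),
]

best = rank_records("patient reports hallucinations and delusions", RECORDS, VOCAB)
print(best[0])  # (1.0, 'plan A')
```

Because the vectors are binary, the cosine score reduces to the number of shared keywords divided by the geometric mean of each document's keyword count. Swapping in Jaccard similarity would change only the `cosine` function.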


Validation of an Internationally Derived Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data

Journal of the American Medical Informatics Association, 2021. DOI: 10.1093/jamia/ocab018

Author(s): Jeffrey G Klann, Griffin M Weber, Hossein Estiri, Bertrand Moal, Paul Avillach, ...

Keyword(s): Machine Learning; Electronic Health Record; Chart Review; Electronic Health Record Data; ICU Admission; Machine Learning Approach

Abstract

Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data.

Objective: We sought to develop and validate a computable phenotype for COVID-19 severity.

Methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes and validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and, at one site, compared selected predictors of severity with the 4CE phenotype.

Results: The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity was highly variable, differing by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared with an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies for ICU admission, with precision and recall as low as 49% compared with chart review.

Discussion: We developed a severity phenotype using six code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit to hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions.

Conclusion: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
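Sensitivity, specificity and precision all come from a 2×2 confusion matrix that compares the phenotype (or a billing code) with the reference outcome. Recall is the same quantity as sensitivity.

This is a minimal sketch under stated assumptions:

- Sites are pooled by summing their counts before computing the rates. The abstract does not state how pooling was done, and meta-analytic pooling is another common choice.
- The `Counts` class, the helper functions and the per-site counts are hypothetical illustrations, not 4CE data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2x2 confusion-matrix counts for one site (hypothetical numbers)."""
    tp: int  # phenotype positive, outcome positive
    fp: int  # phenotype positive, outcome negative
    fn: int  # phenotype negative, outcome positive
    tn: int  # phenotype negative, outcome negative


def sensitivity(c: Counts) -> float:
    """True-positive rate, also called recall."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """True-negative rate."""
    return c.tn / (c.tn + c.fp)


def precision(c: Counts) -> float:
    """Positive predictive value."""
    return c.tp / (c.tp + c.fp)


def pool(sites: list[Counts]) -> Counts:
    """Pool sites by summing their counts (an assumed pooling rule)."""
    return Counts(
        tp=sum(s.tp for s in sites),
        fp=sum(s.fp for s in sites),
        fn=sum(s.fn for s in sites),
        tn=sum(s.tn for s in sites),
    )


# Two hypothetical sites with very different per-site sensitivity (0.6 vs 0.9).
sites = [Counts(tp=60, fp=20, fn=40, tn=180), Counts(tp=90, fp=30, fn=10, tn=70)]
pooled = pool(sites)
print(f"sensitivity={sensitivity(pooled):.2f} specificity={specificity(pooled):.2f}")
# sensitivity=0.75 specificity=0.83
```

The toy numbers show how a pooled figure can mask large per-site differences. This is the kind of variability the abstract reports for individual code categories.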


Assessing stroke severity using electronic health record data: a machine learning approach

BMC Medical Informatics and Decision Making, 2020, Vol 20 (1). DOI: 10.1186/s12911-019-1010-x. Cited by ~3

Author(s): Emily Kogan, Kathryn Twyman, Jesse Heap, Dejan Milentijevic, Jennifer H. Lin, ...

Keyword(s): Machine Learning; Electronic Health Record; Patient Outcomes; Stroke Severity; Electronic Health Record Data; Machine Learning Models

Abstract

Background: Stroke severity is an important predictor of patient outcomes and is commonly measured with National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.

Methods: NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of the 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used to train the model. Several machine learning models were evaluated, and their parameters were optimized using cross-validation on the training set. The best-performing model, a random forest, was then evaluated on the holdout set.

Results: Using machine learning, we identified the main factors in EHR data for assessing stroke severity. These include death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores with the NLP-extracted NIHSS scores on the holdout data set yielded the following:

- an R² (coefficient of determination) of 0.57;
- an R (Pearson correlation coefficient) of 0.76;
- a root-mean-squared error of 4.5.

Conclusions: Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated into studies of stroke patient outcomes that use administrative and EHR databases.
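The three holdout metrics above measure different things:

- **R²** compares the squared prediction errors with the variance of the reference scores. On held-out data it is not, in general, the square of Pearson's R, and it can even be negative. Here 0.76² ≈ 0.58 happens to be close to the reported 0.57.
- **Pearson's R** measures linear association only.
- **RMSE** is expressed in NIHSS points.

The pure-Python sketch below computes all three. The score lists are hypothetical, not study data.

```python
import math


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-squared error, in the units of the scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical NLP-extracted (reference) and imputed NIHSS scores;
# every imputed score is one point too high.
extracted = [2.0, 4.0, 6.0, 8.0]
imputed = [3.0, 5.0, 7.0, 9.0]

print(round(r_squared(extracted, imputed), 3),
      round(pearson_r(extracted, imputed), 3),
      round(rmse(extracted, imputed), 3))  # 0.8 1.0 1.0
```

The toy case shows why all three numbers are worth reporting. A constant one-point bias leaves Pearson's R at 1.0, but it lowers R² and appears directly in the RMSE.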


Development of a Prediction Model for Incident Myocardial Infarction Using Machine Learning Applied to Harmonized Electronic Health Record Data

Journal of the American College of Cardiology, 2020, Vol 75 (11), pp. 194. DOI: 10.1016/s0735-1097(20)30821-4

Author(s): Divneet Mandair, Premanand Tiwari, Steven Simon, Michael Rosenberg

Keyword(s): Machine Learning; Myocardial Infarction; Prediction Model; Electronic Health Record; Electronic Health Record Data


Use of electronic health record data and machine learning to identify candidates for HIV pre-exposure prophylaxis: a modelling study

The Lancet HIV, 2019, Vol 6 (10), pp. e688-e695. DOI: 10.1016/s2352-3018(19)30137-7. Cited by ~19

Author(s): Julia L Marcus, Leo B Hurley, Douglas S Krakower, Stacey Alexeeff, Michael J Silverberg, ...

Keyword(s): Machine Learning; Electronic Health Record; Electronic Health Record Data; Pre-Exposure Prophylaxis; Modelling Study


Prediction of Acute Kidney Injury With a Machine Learning Algorithm Using Electronic Health Record Data

Canadian Journal of Kidney Health and Disease, 2018, Vol 5, pp. 205435811877632. DOI: 10.1177/2054358118776326. Cited by ~31

Author(s): Hamid Mohamadlou, Anna Lynn-Palevsky, Christopher Barton, Uli Chettipally, Lisa Shieh, ...

Keyword(s): Machine Learning; Acute Kidney Injury; Electronic Health Record; Machine Learning Algorithm; Electronic Health Record Data


The Development and Validation of a Machine Learning Model to Predict Bacteremia and Fungemia in Hospitalized Patients Using Electronic Health Record Data

Critical Care Medicine, 2020, Vol 48 (11), pp. e1020-e1028. DOI: 10.1097/ccm.0000000000004556

Author(s): Sivasubramanium V. Bhavani, Zachary Lonjers, Kyle A. Carey, Majid Afshar, Emily R. Gilbert, ...

Keyword(s): Machine Learning; Electronic Health Record; Hospitalized Patients; Electronic Health Record Data; Machine Learning Model; Development and Validation


Classifying Pseudogout using Machine Learning Approaches with Electronic Health Record Data

Arthritis Care & Research, 2020. DOI: 10.1002/acr.24132. Cited by ~1

Author(s): Sara K. Tedeschi, Tianrun Cai, Zeling He, Yuri Ahuja, Chuan Hong, ...

Keyword(s): Machine Learning; Electronic Health Record; Electronic Health Record Data


Development of a Machine Learning Model Using Electronic Health Record Data to Identify Antibiotic Use Among Hospitalized Patients

JAMA Network Open, 2021, Vol 4 (3), pp. e213460. DOI: 10.1001/jamanetworkopen.2021.3460

Author(s): Rebekah W. Moehring, Matthew Phelan, Eric Lofgren, Alicia Nelson, Elizabeth Dodds Ashley, ...

Keyword(s): Machine Learning; Electronic Health Record; Antibiotic Use; Hospitalized Patients; Electronic Health Record Data; Machine Learning Model


Machine Learning Electronic Health Record Identification of Patients with Rheumatoid Arthritis: Algorithm Pipeline Development and Validation Study (Preprint)

2020. DOI: 10.2196/preprints.23930

Author(s): Tjardo D Maarseveen, Timo Meinderink, Marcel J T Reinders, Johannes Knitza, Tom W J Huizinga, ...

Keyword(s): Rheumatoid Arthritis; Machine Learning; Electronic Health Record; Support Vector Machine; Free Text; Electronic Health Record Data; Data Set

BACKGROUND: Financial codes are often used to extract diagnoses from electronic health records, but this approach is prone to false positives. Alternatively, queries are constructed, but these are highly center- and language-specific. A tantalizing alternative is to identify patients automatically by applying machine learning to format-free text entries.

OBJECTIVE: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records.

METHODS: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771).

- Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naïve word-matching algorithm using 10-fold cross-validation.
- Performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
- The F1 score was the primary criterion for selecting the best method to build a classifying algorithm.
- We selected the optimal positive predictive value threshold for case identification based on the output of the best method in the training data.
- This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293).
- For testing, the best-performing methods were applied to the remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation.

RESULTS: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55). Four methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). Applying this support vector machine classifier to the test data gave similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94). With this method, we could identify 2873 patients with rheumatoid arthritis out of the complete collection of 23,300 patients in the Leiden electronic health record system in less than 7 seconds. For the Erlangen data set, gradient boosting performed best in the training set (AUROC 0.94; AUPRC 0.85; F1 score 0.82). Applied to the test data, it again gave good results (F1 score 0.67; PPV 0.97).

CONCLUSIONS: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations at limited cost. Our approach is language- and center-independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows observational studies of unprecedented size, covering different countries, to be created at low cost from data already available in electronic health record systems.
