Applying machine learning algorithms to electronic health records predicted pneumonia after respiratory tract infection

Novel electronic health records applied for prediction of pre-eclampsia: machine-learning algorithms

Pregnancy Hypertension ◽

10.1016/j.preghy.2021.10.006 ◽

2021 ◽

Author(s):

Yi-xin Li ◽

Xiao-ping Shen ◽

Chao Yang ◽

Zuo-zeng Cao ◽

Rui Du ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Health Records ◽

Electronic Health

Identifying lupus patients in electronic health records: Development and validation of machine learning algorithms and application of rule-based algorithms

Seminars in Arthritis and Rheumatism ◽

10.1016/j.semarthrit.2019.01.002 ◽

2019 ◽

Vol 49 (1) ◽

pp. 84-90 ◽

Cited By ~ 7

Author(s):

April Jorge ◽

Victor M. Castro ◽

April Barnado ◽

Vivian Gainer ◽

Chuan Hong ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Health Records ◽

Rule Based ◽

Electronic Health ◽

Development And Validation

Development of Phenotyping Algorithms for the Identification of Organ Transplant Recipients: Cohort Study (Preprint)

10.2196/preprints.18001 ◽

2020 ◽

Author(s):

Lee Wheless ◽

Laura Baker ◽

LaVar Edwards ◽

Nimay Anand ◽

Kelly Birdwell ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Organ Transplantation ◽

Learning Algorithms ◽

Organ Transplant ◽

Machine Learning Algorithms ◽

Transplant Recipients ◽

Health Records ◽

Organ Transplant Recipients ◽

Electronic Health

BACKGROUND: Studies involving organ transplant recipients (OTRs) are often limited to the variables collected in the national Scientific Registry of Transplant Recipients database. Electronic health records contain additional variables that can augment this data source if OTRs can be identified accurately. OBJECTIVE: The aim of this study was to develop phenotyping algorithms to identify OTRs from electronic health records. METHODS: We used Vanderbilt’s deidentified version of its electronic health record database, which contains nearly 3 million subjects, to develop algorithms to identify OTRs. We identified all 19,817 individuals with at least one International Classification of Diseases (ICD) or Current Procedural Terminology (CPT) code for organ transplantation. We performed a chart review on 1350 randomly selected individuals to determine the transplant status. We constructed machine learning models to calculate positive predictive values and sensitivity for combinations of codes by using classification and regression trees, random forest, and extreme gradient boosting algorithms. RESULTS: Of the 1350 reviewed patient charts, 827 were organ transplant recipients while 511 had no record of a transplant, and 12 were equivocal. Most patients with only 1 or 2 transplant codes did not have a transplant. The most common reasons for being labeled a nontransplant patient were the lack of data (229/511, 44.8%) or the patient being evaluated for an organ transplant (174/511, 34.1%). All 3 machine learning algorithms identified OTRs with overall >90% positive predictive value and >88% sensitivity. CONCLUSIONS: Electronic health records linked to biobanks are increasingly used to conduct large-scale studies but have not been well-utilized in organ transplantation research. We present rigorously evaluated methods for phenotyping OTRs from electronic health records that will enable the use of the full spectrum of clinical data in transplant research. Using several different machine learning algorithms, we were able to identify transplant cases with high accuracy by using only ICD and CPT codes.
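
The counts reported in the abstract above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it defines positive predictive value (PPV) and sensitivity, then computes the PPV of a naive rule, "at least one transplant code means transplant recipient", on the chart-reviewed sample. It assumes the 12 equivocal charts are excluded, which the abstract does not state. The function and variable names are hypothetical.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of flagged patients who are true cases."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity (recall): share of true cases that the rule flags."""
    return true_pos / (true_pos + false_neg)


# Chart-review counts taken from the abstract.
confirmed, not_transplant, equivocal = 827, 511, 12
assert confirmed + not_transplant + equivocal == 1350

# Everyone reviewed had at least one transplant code, so the naive
# "any code" rule flags all of them. Assumption: equivocal charts excluded.
baseline_ppv = ppv(confirmed, not_transplant)

# Shares of nontransplant patients by reason, as reported in the abstract.
lack_of_data_share = 229 / not_transplant
evaluated_only_share = 174 / not_transplant

print(f"baseline PPV of 'any code' rule: {baseline_ppv:.1%}")
print(f"lack of data: {lack_of_data_share:.1%}")
print(f"evaluated only: {evaluated_only_share:.1%}")
```

Under that assumption the naive rule has a PPV of roughly 62%, which gives a sense of how much the reported >90% PPV improves on using any single code. The abstract does not give false-negative counts, so `sensitivity` is defined here only for completeness.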

Development of Phenotyping Algorithms for the Identification of Organ Transplant Recipients: Cohort Study

JMIR Medical Informatics ◽

10.2196/18001 ◽

2020 ◽

Vol 8 (12) ◽

pp. e18001

Author(s):

Lee Wheless ◽

Laura Baker ◽

LaVar Edwards ◽

Nimay Anand ◽

Kelly Birdwell ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Organ Transplantation ◽

Learning Algorithms ◽

Organ Transplant ◽

Machine Learning Algorithms ◽

Transplant Recipients ◽

Health Records ◽

Organ Transplant Recipients ◽

Electronic Health

Background: Studies involving organ transplant recipients (OTRs) are often limited to the variables collected in the national Scientific Registry of Transplant Recipients database. Electronic health records contain additional variables that can augment this data source if OTRs can be identified accurately. Objective: The aim of this study was to develop phenotyping algorithms to identify OTRs from electronic health records. Methods: We used Vanderbilt’s deidentified version of its electronic health record database, which contains nearly 3 million subjects, to develop algorithms to identify OTRs. We identified all 19,817 individuals with at least one International Classification of Diseases (ICD) or Current Procedural Terminology (CPT) code for organ transplantation. We performed a chart review on 1350 randomly selected individuals to determine the transplant status. We constructed machine learning models to calculate positive predictive values and sensitivity for combinations of codes by using classification and regression trees, random forest, and extreme gradient boosting algorithms. Results: Of the 1350 reviewed patient charts, 827 were organ transplant recipients while 511 had no record of a transplant, and 12 were equivocal. Most patients with only 1 or 2 transplant codes did not have a transplant. The most common reasons for being labeled a nontransplant patient were the lack of data (229/511, 44.8%) or the patient being evaluated for an organ transplant (174/511, 34.1%). All 3 machine learning algorithms identified OTRs with overall >90% positive predictive value and >88% sensitivity. Conclusions: Electronic health records linked to biobanks are increasingly used to conduct large-scale studies but have not been well-utilized in organ transplantation research. We present rigorously evaluated methods for phenotyping OTRs from electronic health records that will enable the use of the full spectrum of clinical data in transplant research. Using several different machine learning algorithms, we were able to identify transplant cases with high accuracy by using only ICD and CPT codes.

Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms with Electronic Health Records (Preprint)

JMIR Medical Informatics ◽

10.2196/25237 ◽

2020 ◽

Author(s):

Zakhriya Alhassan ◽

Matthew Watson ◽

David Budgen ◽

Riyad Alshammari ◽

Ali Alessa ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Glycated Hemoglobin ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Health Records ◽

Electronic Health

Application of Machine Learning in Chronic Kidney Disease Risk Prediction Using Electronic Health Records (EHR)

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-6673-2.ch014 ◽

2021 ◽

pp. 213-233

Author(s):

Laxmi Kumari Pathak ◽

Pooja Jha

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Electronic Health Records ◽

Bone Diseases ◽

Machine Learning Algorithms ◽

Health Records ◽

Chronic Kidney Disease Risk ◽

Electronic Health ◽

Heart Disorders

Chronic kidney disease (CKD) is a disorder in which the kidneys are weakened and become unable to filter blood, which lowers a person's ability to remain healthy. Advances in the biosciences have produced vast volumes of data in electronic health records. Heart disorders, anemia, bone diseases, elevated potassium, and calcium are among the most prevalent complications of kidney failure. Early identification of CKD can greatly improve quality of life. To this end, various machine learning techniques have been introduced that use the data in electronic health records (EHRs) to predict CKD. This chapter studies machine learning algorithms such as support vector machine, random forest, probabilistic neural network, Apriori, ZeroR, OneR, naive Bayes, J48, IBk (k-nearest neighbor), and ensemble methods, and compares their accuracy. The study aims to find the machine learning technique best suited to early detection of CKD, one whose predictions medical professionals can interpret easily.

Dense phenotyping from electronic health records enables machine-learning-based prediction of preterm birth

10.1101/2020.07.15.20154864 ◽

2020 ◽

Author(s):

Abin Abraham ◽

Brian L Le ◽

Idit Kosti ◽

Peter Straub ◽

Digna R Velez Edwards ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Preterm Birth ◽

Electronic Health Records ◽

Prediction Models ◽

Mode Of Delivery ◽

Machine Learning Algorithms ◽

Health Records ◽

Electronic Health ◽

Independent Cohort

Abstract: Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. Here, we apply machine learning to diverse data from EHRs to predict singleton preterm birth. Leveraging a large cohort of 35,282 deliveries, we find that a prediction model based on billing codes alone can predict preterm birth at 28 weeks of gestation (ROC-AUC=0.75, PR-AUC=0.40) and outperforms a comparable model trained using known risk factors (ROC-AUC=0.59, PR-AUC=0.21). Our machine learning approach is also able to accurately predict preterm birth sub-types (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. We demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5,978 deliveries) with only a modest decrease in performance. Interpreting the features identified by the model as most informative for risk stratification demonstrates that they capture non-linear combinations of known risk factors and patterns of care. The strong performance of our approach across multiple clinical contexts and an independent cohort highlights the potential of machine learning algorithms to improve medical care during pregnancy.
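
To make the two metrics quoted in the abstract above concrete, here is a minimal pure-Python sketch of ROC-AUC and of average precision, one common estimator of PR-AUC. The abstract does not say which PR-AUC estimator the authors used, and the labels and scores below are invented toy data, not study results.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical toy data: 1 = preterm, 0 = term; scores are model risk estimates.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

auc = roc_auc(toy_labels, toy_scores)
ap = average_precision(toy_labels, toy_scores)
print(f"ROC-AUC = {auc:.3f}, average precision = {ap:.3f}")
```

A model that ranks at random scores about 0.5 on ROC-AUC, while its PR-AUC is roughly the prevalence of the positive class. PR-AUC values are therefore best read against the outcome's base rate rather than against 0.5.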

Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application (Preprint)

10.2196/preprints.10497 ◽

2018 ◽

Author(s):

Gondy Leroy ◽

Yang Gu ◽

Sydney Pettygrove ◽

Maureen K Galindo ◽

Ananyaa Arora ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Diagnostic Criteria ◽

Large Scale ◽

Autism Spectrum ◽

Machine Learning Algorithms ◽

Health Records ◽

Rule Based ◽

Electronic Health ◽

Rule Based Approach

BACKGROUND: Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive. OBJECTIVE: Our objective was to automatically extract from EHRs the description of behaviors noted by the clinicians in evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data. METHODS: We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves are encompassed in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms. RESULTS: We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs. CONCLUSIONS: Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
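
The general shape of a lexicon-driven, rule-based extractor and its evaluation can be sketched as follows. This is not the authors' parser. The two-term lexicon, the criterion labels, and the example sentences are all invented for illustration, and matching is reduced to case-insensitive phrase lookup.

```python
import re

# Hypothetical lexicon: criterion label -> trigger phrases (invented examples).
LEXICON = {
    "A1": ["eye contact"],
    "A3": ["hand flapping"],
}
PATTERNS = {
    criterion: re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    for criterion, terms in LEXICON.items()
}


def extract(sentence: str) -> set:
    """Return the set of criteria whose lexicon matches the sentence."""
    return {c for c, pattern in PATTERNS.items() if pattern.search(sentence)}


def evaluate(gold):
    """Score predictions over every (sentence, criterion) pair."""
    tp = fp = fn = tn = 0
    for sentence, truth in gold:
        predicted = extract(sentence)
        for criterion in LEXICON:
            if criterion in predicted and criterion in truth:
                tp += 1
            elif criterion in predicted:
                fp += 1
            elif criterion in truth:
                fn += 1
            else:
                tn += 1
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented gold-standard sentences with their annotated criteria.
GOLD = [
    ("He avoids eye contact with peers.", {"A1"}),
    ("She engages in hand flapping when excited.", {"A3"}),
    ("He rarely shares enjoyment with others.", {"A1"}),  # phrasing not in lexicon
    ("Hearing screening was normal.", set()),
]
metrics = evaluate(GOLD)
print(metrics)
```

The toy run reproduces the pattern the abstract describes: rules rarely fire incorrectly, so precision and specificity are high, while unanticipated phrasings are missed, so recall is lower.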

Predicting dementia diagnosis from cognitive footprints in electronic health records: a case–control study protocol

BMJ Open ◽

10.1136/bmjopen-2020-043487 ◽

2020 ◽

Vol 10 (11) ◽

pp. e043487

Author(s):

Hao Luo ◽

Kui Kai Lau ◽

Gloria H Y Wong ◽

Wai-Chi Chan ◽

Henry K F Mak ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Hong Kong ◽

Electronic Health Records ◽

Case Control Study ◽

Case Control ◽

Dementia Diagnosis ◽

Health Records ◽

Electronic Health ◽

Control Study

Introduction: Dementia is a group of disabling disorders that can be devastating for persons living with it and for their families. Data-informed decision-making strategies to identify individuals at high risk of dementia are essential to facilitate large-scale prevention and early intervention. This population-based case–control study aims to develop and validate a clinical algorithm for predicting dementia diagnosis, based on the cognitive footprint in personal and medical history. Methods and analysis: We will use territory-wide electronic health records from the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong between 1 January 2001 and 31 December 2018. All individuals who were at least 65 years old by the end of 2018 will be identified from CDARS. A random sample of control individuals who did not receive any diagnosis of dementia will be matched at a 1:1 ratio, by age, gender and index date, with those who did receive such a diagnosis. Exposure to potential protective/risk factors will be included in both conventional logistic regression and machine-learning models. Established risk factors of interest will include diabetes mellitus, midlife hypertension, midlife obesity, depression, head injuries and low education. Exploratory risk factors will include vascular disease, infectious disease and medication. The prediction accuracy of several state-of-the-art machine-learning algorithms will be compared. Ethics and dissemination: This study was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 18-225). Patients’ records are anonymised to protect privacy. Study results will be disseminated through peer-reviewed publications. Code for the resulting dementia risk prediction algorithm will be made publicly available at the website of the Tools to Inform Policy: Chinese Communities’ Action in Response to Dementia project (https://www.tip-card.hku.hk/).
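
The 1:1 matching step described in the protocol above can be sketched as follows. The protocol does not specify matching tolerances, so this example makes strong assumptions: exact matching on age and gender, the index date coarsened to a calendar year, and each control used at most once. All field names and records are hypothetical.

```python
import random
from collections import defaultdict


def match_key(person):
    """Assumed matching key: exact age, gender, and index year."""
    return (person["age"], person["gender"], person["index_year"])


def match_one_to_one(cases, controls, seed=0):
    """Pair each case with one randomly chosen, unused control sharing its key."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[match_key(control)].append(control)
    for bucket in pool.values():
        rng.shuffle(bucket)  # random sampling within each stratum
    pairs = []
    for case in cases:
        bucket = pool.get(match_key(case))
        if bucket:  # cases with no eligible control stay unmatched
            pairs.append((case["id"], bucket.pop()["id"]))
    return pairs


# Hypothetical records for illustration only.
toy_cases = [
    {"id": "c1", "age": 70, "gender": "F", "index_year": 2010},
    {"id": "c2", "age": 82, "gender": "M", "index_year": 2015},
    {"id": "c3", "age": 90, "gender": "F", "index_year": 2012},
]
toy_controls = [
    {"id": "k1", "age": 70, "gender": "F", "index_year": 2010},
    {"id": "k2", "age": 70, "gender": "F", "index_year": 2010},
    {"id": "k3", "age": 82, "gender": "M", "index_year": 2015},
    {"id": "k4", "age": 66, "gender": "M", "index_year": 2011},
]
matched = dict(match_one_to_one(toy_cases, toy_controls))
print(matched)
```

In this toy run `c3` has no eligible control and is left unmatched. A real study would need a stated policy for such cases, for example widening the age or date tolerance.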

Comparative analysis of machine learning methods for analyzing security practice in electronic health records’ logs.

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378353 ◽

2020 ◽

Author(s):

Prosper K Yeng ◽

Muhammad Ali Fauzi ◽

Bian Yang

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Electronic Health Records ◽

Learning Methods ◽

Health Records ◽

Machine Learning Methods ◽

Electronic Health