Mining Primary Care Electronic Health Records for Automatic Disease Phenotyping: A Transparent Machine Learning Framework

Diagnostics, 2021, Vol 11 (10), pp. 1908
Author(s): Fabiola Fernández-Gutiérrez, Jonathan I. Kennedy, Roxanne Cooksey, Mark Atkinson, Ernest Choy, ...

(1) Background: We aimed to develop a transparent machine-learning (ML) framework to automatically identify patients with a condition from electronic health records (EHRs) via a parsimonious set of features. (2) Methods: We linked multiple sources of EHRs, including 917,496,869 primary care records, 40,656,805 secondary care records and 694,954 records from specialist surgeries between 2002 and 2012, to generate a unique dataset. We then treated patient identification as a text classification problem and proposed a transparent disease-phenotyping framework. This framework comprises the generation of patient representations, feature selection, and the development of an optimal phenotyping algorithm, designed to tackle the imbalanced nature of the data. The framework was extensively evaluated on the identification of rheumatoid arthritis (RA) and ankylosing spondylitis (AS). (3) Results: When applied to the linked dataset of 9657 patients, with 1484 cases of RA and 204 cases of AS, the framework achieved an accuracy of 86.19% and a positive predictive value of 88.46% for RA, and 99.23% and 97.75%, respectively, for AS, comparable to expert knowledge-driven methods. (4) Conclusions: This framework could potentially be used as an efficient tool for identifying patients with a condition of interest from EHRs, helping clinicians in the clinical decision-support process.
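The Results report accuracy and positive predictive value (PPV). On imbalanced data, where cases are far fewer than non-cases, accuracy alone can look respectable even for a useless classifier, which is one reason PPV is worth reporting alongside it. The minimal Python sketch below shows how both metrics follow from a binary confusion matrix; the 100-patient cohort and its predictions are hypothetical toy data, not figures from the study.

```python
import math
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # predicted case, truly a case
    fp: int  # predicted case, truly a non-case
    tn: int  # predicted non-case, truly a non-case
    fn: int  # predicted non-case, truly a case

def confusion(y_true, y_pred):
    """Tally a binary confusion matrix from 0/1 labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )

def accuracy(c):
    """Share of all patients whose case status was predicted correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)

def ppv(c):
    """Positive predictive value: share of flagged patients who are true cases."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else math.nan

# Hypothetical cohort: 100 patients, 15 of them cases.
y_true = [1] * 15 + [0] * 85
always_negative = [0] * 100
model_output = [1] * 12 + [0] * 3 + [1] * 2 + [0] * 83

baseline = confusion(y_true, always_negative)
model = confusion(y_true, model_output)
print(accuracy(baseline), ppv(baseline))      # 0.85 nan
print(accuracy(model), round(ppv(model), 3))  # 0.95 0.857
```

The "always negative" baseline reaches 85% accuracy on this toy cohort while flagging nobody, so its PPV is undefined; the second set of predictions improves both metrics.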

PLoS ONE, 2016, Vol 11 (5), pp. e0154515
Author(s): Shang-Ming Zhou, Fabiola Fernandez-Gutierrez, Jonathan Kennedy, Roxanne Cooksey, Mark Atkinson, ...

2020
Author(s): Nicholas B. Link, Selena Huang, Tianrun Cai, Zeling He, Jiehuan Sun, ...

Objective: The use of electronic health record (EHR) systems has grown over the past decade, and with it the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce an unsupervised method for acronym disambiguation, the task of classifying the correct sense of acronyms in clinical EHR notes.
Methods: We developed an unsupervised ensemble machine learning algorithm (CASEml) to automatically classify acronyms by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml with another standard unsupervised method and with a baseline that selects the most frequent acronym sense. We additionally evaluated the effects of RA disambiguation on NLP-driven phenotyping of rheumatoid arthritis.
Results: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than the most-frequent-sense baseline and, on average, higher than a state-of-the-art unsupervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis.
Conclusion: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and unsupervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
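CASEml itself is an ensemble that combines semantic embeddings, visit-level text and billing information, and its internals are not described here. As a much simpler illustration of the embedding ingredient alone, the sketch below assigns an acronym the sense whose vector is most cosine-similar to the averaged embeddings of its context words, and implements the most-frequent-sense baseline the abstract compares against. The 2-D word and sense vectors are hypothetical toy values; real systems learn high-dimensional embeddings from large corpora.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def context_vector(tokens, word_vectors):
    """Average the embeddings of the context words that have one; None if none do."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    return [sum(column) / len(vecs) for column in zip(*vecs)]

def disambiguate(tokens, sense_vectors, word_vectors, fallback):
    """Pick the sense closest to the context; use `fallback` when no context word is known."""
    ctx = context_vector(tokens, word_vectors)
    if ctx is None:
        return fallback
    return max(sense_vectors, key=lambda sense: cosine(ctx, sense_vectors[sense]))

def most_frequent_sense(observed_senses):
    """Baseline: always predict the sense seen most often."""
    return Counter(observed_senses).most_common(1)[0][0]

# Hypothetical toy embeddings for the acronym "RA".
word_vectors = {
    "joint": [1.0, 0.1], "methotrexate": [0.9, 0.0],
    "atrium": [0.0, 1.0], "echo": [0.1, 0.9],
}
sense_vectors = {"rheumatoid arthritis": [1.0, 0.0], "right atrium": [0.0, 1.0]}

print(disambiguate(["swollen", "joint", "methotrexate"], sense_vectors, word_vectors, "unknown"))
# rheumatoid arthritis
print(disambiguate(["dilated", "atrium", "echo"], sense_vectors, word_vectors, "unknown"))
# right atrium
```

Falling back to a fixed sense when no context word has an embedding mirrors how the most-frequent-sense baseline behaves; an ensemble such as CASEml would add further evidence sources on top of this.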


PLoS ONE, 2013, Vol 8 (2), pp. e54878
Author(s): Amanda Nicholson, Elizabeth Ford, Kevin A. Davies, Helen E. Smith, Greta Rait, ...

2018, Vol 68 (suppl 1), pp. bjgp18X696749
Author(s): Maimoona Hashmi, Mark Wright, Kirin Sultana, Benjamin Barratt, Lia Chatzidiakou, ...

Background: Chronic obstructive pulmonary disease (COPD) is marked by exacerbations that are often severely debilitating. Efficient patient-centric research approaches are needed to better inform health management in primary care.
Aim: The ‘COPE study’ aims to develop a method of predicting COPD exacerbations utilising personal air quality sensors, environmental exposure modelling and electronic health records, through the recruitment of patients from consenting GPs contributing to the Clinical Practice Research Datalink (CPRD).
Method: The study used electronic health records (EHR) from CPRD, an anonymised GP records database, to screen and locate patients within GP practices in Central London. Personal air monitors were used to capture data on individual activities and environmental exposures. Output from the monitors was then linked with the EHR data to obtain information on COPD management, severity, comorbidities and exacerbations. Symptom changes not amounting to full exacerbations were captured on diary cards. Linear regression was used to investigate the relationship between subject peak flow, symptoms, exacerbation events and exposure data.
Results: Preliminary results from the first 80 patients who have completed the study indicate variable susceptibility to environmental stressors among COPD patients. Some individuals appear highly susceptible to environmental stress, while others appear to have unrelated triggers.
Conclusion: Recruiting patients through EHR for a study is feasible and allows easy collection of data for long-term follow-up. Portable environmental sensors could now be used to develop personalised models to predict the risk of COPD exacerbations in susceptible individuals. Identifying direct links between participant health and activities would allow improved health management and thus cost savings.
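The Method mentions linear regression of peak flow on symptom, exacerbation and exposure data without giving the model specification. As a hedged illustration of the simplest case only, the sketch below fits an ordinary least-squares line with a single predictor, using the closed-form estimates slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The exposure and peak-flow values are hypothetical; a real analysis would include several predictors and would account for repeated measurements on the same participant.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares (one predictor)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: daily personal exposure (arbitrary units) vs peak flow (L/min).
exposure = [10, 20, 30, 40]
peak_flow = [380, 360, 340, 320]
intercept, slope = ols_fit(exposure, peak_flow)
print(intercept, slope)  # 400.0 -2.0
```

A negative slope here would be read as lower peak flow on higher-exposure days; with real data, per-participant slopes could be one way to express the "variable susceptibility" described in the Results.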


BMJ Open, 2020, Vol 10 (11), pp. e043487
Author(s): Hao Luo, Kui Kai Lau, Gloria H Y Wong, Wai-Chi Chan, Henry K F Mak, ...

Introduction: Dementia is a group of disabling disorders that can be devastating for the people living with it and for their families. Data-informed decision-making strategies to identify individuals at high risk of dementia are essential to facilitate large-scale prevention and early intervention. This population-based case–control study aims to develop and validate a clinical algorithm for predicting dementia diagnosis, based on the cognitive footprint in personal and medical history.
Methods and analysis: We will use territory-wide electronic health records from the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong between 1 January 2001 and 31 December 2018. All individuals who were at least 65 years old by the end of 2018 will be identified from CDARS. A random sample of control individuals who did not receive any diagnosis of dementia will be matched 1:1 to those who did receive such a diagnosis, by age, gender and index date. Exposure to potential protective and risk factors will be included in both conventional logistic regression and machine-learning models. Established risk factors of interest will include diabetes mellitus, midlife hypertension, midlife obesity, depression, head injuries and low education. Exploratory risk factors will include vascular disease, infectious disease and medication. The prediction accuracy of several state-of-the-art machine-learning algorithms will be compared.
Ethics and dissemination: This study was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 18-225). Patients’ records are anonymised to protect privacy. Study results will be disseminated through peer-reviewed publications. The code for the resulting dementia risk prediction algorithm will be made publicly available on the website of the Tools to Inform Policy: Chinese Communities’ Action in Response to Dementia project (https://www.tip-card.hku.hk/).
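The Methods and analysis describe 1:1 matching of dementia-free controls to cases by age, gender and index date, without saying how closely each variable must agree. The sketch below assumes exact matching on age in years, gender and the calendar year of the index date, drawing controls at random without replacement. The `Person` record and the coarsening of the index date to a year are hypothetical choices for illustration, not the study's protocol.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    gender: str
    index_year: int  # hypothetical: index date coarsened to its calendar year

def match_one_to_one(cases, controls, seed=0):
    """Pair each case with one unused control sharing (age, gender, index_year).

    Controls are shuffled within each stratum and drawn without replacement,
    so no control is used twice. Cases with no available control are returned
    separately rather than silently dropped.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in controls:
        strata[(person.age, person.gender, person.index_year)].append(person)
    for bucket in strata.values():
        rng.shuffle(bucket)
    pairs, unmatched = [], []
    for case in cases:
        bucket = strata.get((case.age, case.gender, case.index_year))
        if bucket:
            pairs.append((case, bucket.pop()))
        else:
            unmatched.append(case)
    return pairs, unmatched

# Hypothetical toy cohort.
cases = [Person("c1", 70, "F", 2010), Person("c2", 80, "M", 2012), Person("c3", 90, "F", 2015)]
controls = [Person("k1", 70, "F", 2010), Person("k2", 70, "F", 2010), Person("k3", 80, "M", 2012)]
pairs, unmatched = match_one_to_one(cases, controls)
print(len(pairs), [c.pid for c in unmatched])  # 2 ['c3']
```

Fixing the random seed makes the control sample reproducible; exact stratum matching is the simplest design, and caliper matching on age or date is a common looser alternative.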


Rheumatology, 2021
Author(s): Dahai Yu, George Peat, Kelvin P Jordan, James Bailey, Daniel Prieto-Alhambra, ...

Objectives: Better indicators from affordable, sustainable data sources are needed to monitor the population burden of musculoskeletal conditions. We propose five indicators of musculoskeletal health and assess whether routinely available primary care electronic health records (EHR) can estimate their population levels among musculoskeletal consulters.
Methods: We collected validated patient-reported measures of pain experience, function and health status through a local survey of adults (≥35 years) presenting to English general practices over 12 months with low back pain, shoulder pain, osteoarthritis and other regional musculoskeletal disorders. Using EHR data, we derived and validated models for estimating population levels of five self-reported indicators: prevalence of high impact chronic pain, overall musculoskeletal health (based on the Musculoskeletal Health Questionnaire), quality of life (based on the EuroQoL health utility measure), and prevalence of moderate-to-severe low back pain and of moderate-to-severe shoulder pain. We applied the models to a national EHR database (Clinical Practice Research Datalink) to obtain national estimates of each indicator for three successive years.
Results: The optimal models included recorded demographics, deprivation, consultation frequency, analgesic and antidepressant prescriptions, and multimorbidity. Applying the models to national EHR data, we estimated that 31.9% of adults (≥35 years) presenting with non-inflammatory musculoskeletal disorders in England in 2016/17 experienced high impact chronic pain. Estimated population health levels were worse in women, older people and those in the most deprived neighbourhoods, and changed little over 3 years.
Conclusion: National and subnational estimates for a range of subjective indicators of non-inflammatory musculoskeletal health can be obtained using information from routine electronic health records.
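The abstract says models built on EHR predictors were applied to a national database to produce population estimates, but it does not state the model form. One common approach, assumed here purely for illustration, is to fit a logistic model for an individual's probability of an outcome such as high impact chronic pain and then average the predicted probabilities over all consulters, or within subgroups such as sex. The predictor names and coefficients below are hypothetical, not the study's estimates.

```python
import math
from collections import defaultdict

def sigmoid(z):
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_prob(record, intercept, coefs):
    """Logistic-model probability; predictors absent from the record count as 0."""
    z = intercept + sum(beta * record.get(name, 0.0) for name, beta in coefs.items())
    return sigmoid(z)

def estimated_prevalence(records, intercept, coefs):
    """Population estimate = mean predicted probability across records."""
    if not records:
        raise ValueError("no records to average over")
    return sum(predict_prob(r, intercept, coefs) for r in records) / len(records)

def prevalence_by_group(records, group_field, intercept, coefs):
    """Mean predicted probability within each level of `group_field`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_field]].append(r)
    return {g: estimated_prevalence(rs, intercept, coefs) for g, rs in groups.items()}

# Hypothetical model and records (not values from the study).
coefs = {"consultations_per_year": 0.3, "analgesic_prescribed": 0.8}
records = [
    {"sex": "F", "consultations_per_year": 4, "analgesic_prescribed": 1},
    {"sex": "F", "consultations_per_year": 2, "analgesic_prescribed": 0},
    {"sex": "M", "consultations_per_year": 1, "analgesic_prescribed": 0},
]
print(round(estimated_prevalence(records, -2.0, coefs), 3))
print(prevalence_by_group(records, "sex", -2.0, coefs))
```

Averaging probabilities rather than counting thresholded predictions keeps the estimate unbiased when the model is well calibrated, which is why calibration in the validation step matters for this kind of indicator.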


2013, Vol 112 (3), pp. 731-737
Author(s): Usman Iqbal, Cheng-Hsun Ho, Yu-Chuan (Jack) Li, Phung-Anh Nguyen, Wen-Shan Jian, ...
