Leveraging Large-scale Electronic Health Records and Interpretable Machine Learning for Clinical Decision Making at the Emergency Department: Protocol for System Development and Validation (Preprint)

2021 ◽  
Author(s):  
Nan Liu ◽  
Feng Xie ◽  
Fahad Javaid Siddiqui ◽  
Andrew Fu Wah Ho ◽  
Bibhas Chakraborty ◽  
...  

BACKGROUND There is a growing demand globally for emergency department (ED) services. An increase in ED visits has resulted in overcrowding and longer wait times. The triage process plays a crucial role in assessing and stratifying patients' risks and ensuring that the critically ill promptly receive appropriate priority and emergency treatment. A substantial amount of research has been conducted on the use of machine learning tools to construct triage and risk prediction models; however, the black box nature of these models has limited their clinical application and interpretation. OBJECTIVE In this study, we plan to develop an innovative, dynamic, and interpretable System for Emergency Risk Triage (SERT) for risk stratification in the ED by leveraging large-scale electronic health records (EHR) and machine learning. METHODS To achieve this objective, we will conduct a retrospective, single-centre study based on a large, longitudinal dataset obtained from the EHR of the largest tertiary hospital in Singapore. Study outcomes include adverse events experienced by patients, such as the need for intensive care unit admission and inpatient death, among others. With pre-identified candidate variables drawn from expert opinions and relevant literature, we will apply AutoScore, an interpretable machine learning-based scoring framework, to develop three SERT scores. These three scores can be used at different times in the ED, i.e., upon arrival, during the ED stay, and at admission. Furthermore, we will compare our novel SERT scores with established clinical scores and previously described black box machine learning models as baselines. Receiver operating characteristic analysis will be conducted on the testing cohorts for performance evaluation. RESULTS The study is currently being conducted. The extracted data indicate approximately 1.8 million ED visits by over 810,000 unique patients. Modelling results are expected to be published in 2022. 
CONCLUSIONS The SERT scoring system proposed in this study will be unique and innovative due to its dynamic nature and modelling transparency. If successfully validated, our proposed solution will establish a standard for data processing and modelling by taking advantage of large-scale EHRs and interpretable machine learning tools.
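The protocol above evaluates each SERT score by receiver operating characteristic analysis on the testing cohorts. As a minimal illustration of the headline metric, the area under the ROC curve can be computed directly from scores and labels via the rank-sum (Mann-Whitney U) formulation; this is a generic sketch, not the authors' evaluation code.

```python
def auc_from_scores(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) formulation: the probability
    that a randomly chosen positive case receives a higher score than a
    randomly chosen negative case, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to a score that ranks patients no better than chance; 1.0 means every adverse-event case is ranked above every non-case.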

Information ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 386
Author(s):  
Sheikh S. Abdullah ◽  
Neda Rostamzadeh ◽  
Kamran Sedig ◽  
Amit X. Garg ◽  
Eric McArthur

Acute kidney injury (AKI) is a common complication in hospitalized patients and can result in increased hospital stay, health-related costs, mortality and morbidity. A number of recent studies have shown that AKI is predictable and avoidable if early risk factors can be identified by analyzing Electronic Health Records (EHRs). In this study, we employ machine learning techniques to identify older patients who have a risk of readmission with AKI to the hospital or emergency department within 90 days after discharge. The records of one million patients who visited the hospital or emergency department in Ontario between 2014 and 2016 are included in this study. The predictor variables include patient demographics, comorbid conditions, medications and diagnosis codes. We developed 31 prediction models based on different combinations of two sampling techniques, three ensemble methods, and eight classifiers. These models were evaluated through 10-fold cross-validation and compared based on the AUROC metric. The performances of these models were consistent, with AUROC ranging between 0.61 and 0.88 across the 31 models. In general, the performances of ensemble-based methods were higher than that of cost-sensitive logistic regression. We also validated the features that are most relevant in predicting AKI with a healthcare expert to improve the performance and reliability of the models. This study predicts the risk of AKI for a patient after discharge, giving healthcare providers enough time to intervene before the onset of AKI.
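The 31 models above were evaluated through 10-fold cross-validation, which for a rare outcome such as AKI readmission is usually stratified so that every fold preserves the overall class balance. A minimal stdlib-only sketch of such a splitter (not the authors' pipeline, which also layers sampling techniques and ensemble methods on top):

```python
import random

def stratified_kfold(labels, k=10, seed=42):
    """Yield (train_idx, test_idx) pairs for stratified k-fold CV:
    indices of each class are shuffled and dealt round-robin into k
    folds, so each fold mirrors the overall class balance."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test
```

Each model would be fit on the train indices and scored (e.g. AUROC) on the held-out test indices, averaging over the 10 folds.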


2018 ◽  
Author(s):  
Gondy Leroy ◽  
Yang Gu ◽  
Sydney Pettygrove ◽  
Maureen K Galindo ◽  
Ananyaa Arora ◽  
...  

BACKGROUND Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive. OBJECTIVE Our objective was to automatically extract from EHRs the descriptions of behaviors noted by clinicians as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data. METHODS We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves appear in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms. RESULTS We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). 
For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs. CONCLUSIONS Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
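The parser described above is rule-based: lexicon terms matched against sentence text yield criterion labels, which are then scored for precision and recall against gold annotations. A heavily reduced sketch of that pattern (the lexicons here are hypothetical examples; the real parser uses 104 patterns and 92 lexicons with 1,787 terms):

```python
import re

# Hypothetical, much-reduced lexicons keyed by DSM criterion label.
LEXICONS = {
    "A1_social": ["poor eye contact", "does not respond to name"],
    "A3_repetitive": ["hand flapping", "lines up toys"],
}

def extract_criteria(sentence):
    """Return the set of criterion labels whose lexicon terms appear
    in the sentence (case-insensitive, word-boundary match)."""
    found = set()
    for label, terms in LEXICONS.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE):
                found.add(label)
    return found

def precision_recall(predicted, gold):
    """Micro-averaged precision/recall over per-sentence label sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    fp = sum(len(p - g) for p, g in zip(predicted, gold))
    fn = sum(len(g - p) for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

High precision with modest recall, as reported above, corresponds to conservative lexicons: a match is almost always correct, but many paraphrased expressions are missed.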


BMJ Open ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. e043487
Author(s):  
Hao Luo ◽  
Kui Kai Lau ◽  
Gloria H Y Wong ◽  
Wai-Chi Chan ◽  
Henry K F Mak ◽  
...  

Introduction: Dementia is a group of disabling disorders that can be devastating for persons living with it and for their families. Data-informed decision-making strategies to identify individuals at high risk of dementia are essential to facilitate large-scale prevention and early intervention. This population-based case–control study aims to develop and validate a clinical algorithm for predicting dementia diagnosis, based on the cognitive footprint in personal and medical history. Methods and analysis: We will use territory-wide electronic health records from the Clinical Data Analysis and Reporting System (CDARS) in Hong Kong between 1 January 2001 and 31 December 2018. All individuals who were at least 65 years old by the end of 2018 will be identified from CDARS. A random sample of control individuals who did not receive any diagnosis of dementia will be matched with those who did receive such a diagnosis by age, gender and index date at a 1:1 ratio. Exposure to potential protective/risk factors will be included in both conventional logistic regression and machine-learning models. Established risk factors of interest will include diabetes mellitus, midlife hypertension, midlife obesity, depression, head injuries and low education. Exploratory risk factors will include vascular disease, infectious disease and medication. The prediction accuracy of several state-of-the-art machine-learning algorithms will be compared. Ethics and dissemination: This study was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 18-225). Patients’ records are anonymised to protect privacy. Study results will be disseminated through peer-reviewed publications. Code for the resulting dementia risk prediction algorithm will be made publicly available at the website of the Tools to Inform Policy: Chinese Communities’ Action in Response to Dementia project (https://www.tip-card.hku.hk/).
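The case–control design above pairs each dementia case with one control matched on age, gender and index date. A minimal greedy sketch of 1:1 exact matching (illustrative only; the field names are hypothetical, and a real pipeline would also align index dates and may use caliper matching on age):

```python
def match_controls(cases, controls, keys=("age", "gender")):
    """Greedy 1:1 exact matching: pool controls by their key tuple,
    then draw one unused control per case with the same key tuple.
    Cases with an exhausted pool are left unmatched."""
    pools = {}
    for c in controls:
        pools.setdefault(tuple(c[k] for k in keys), []).append(c)
    pairs = []
    for case in cases:
        pool = pools.get(tuple(case[k] for k in keys), [])
        if pool:
            pairs.append((case, pool.pop()))
    return pairs
```

Matching on confounders like age and gender before modelling means the subsequent logistic regression and machine-learning comparisons measure the contribution of the exposure variables rather than of the matching factors.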


2021 ◽  
Author(s):  
Nawar Shara ◽  
Kelley M. Anderson ◽  
Noor Falah ◽  
Maryam F. Ahmad ◽  
Darya Tavazoei ◽  
...  

BACKGROUND Healthcare data are fragmenting as patients seek care from diverse sources. Consequently, patient care is negatively impacted by disparate health records. Machine learning (ML) can be a disruptive force in its ability to inform and improve patient care and outcomes. However, differences across individuals' health records, the lack of health-data standards, and systemic issues that render the data unreliable and prevent a single view of each patient all create challenges for ML. While these problems exist throughout healthcare, they are especially prevalent within maternal health, and exacerbate the maternal morbidity and mortality (MMM) crisis in the United States. OBJECTIVE Maternal patient records were extracted from the electronic health records (EHRs) of a large tertiary healthcare system and made into patient-specific, complete datasets through a systematic method so that a machine-learning-based (ML-based) risk-assessment algorithm could effectively identify maternal cardiovascular risk prior to evidence of diagnosis or intervention within the patient’s record. METHODS We outline the effort that was required to define the specifications of the computational systems, the dataset, and access to relevant systems, while ensuring that data security requirements, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for its use by a proprietary risk-stratification algorithm designed to establish patient-specific baselines and to identify cardiovascular risk based on deviations from those baselines, informing early interventions. RESULTS Patient records can be made actionable for the goal of effectively employing ML, specifically to identify cardiovascular risk in pregnant patients. 
CONCLUSIONS Once data have been acquired, concatenated, anonymized, and normalized across multiple EHRs, an ML-based tool can provide early identification of cardiovascular risk in pregnant patients. CLINICALTRIAL N/A
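Two of the preparation steps named above, anonymization and normalization, have simple canonical forms. A stdlib-only sketch (the salt value and field choices are placeholders, not details from the study; a real pipeline would manage the salt as a protected secret and follow applicable privacy regulations):

```python
import hashlib

def anonymize_id(patient_id, salt="study-salt"):
    """One-way pseudonymisation of a patient identifier via salted
    SHA-256: stable across records (so EHRs can be concatenated per
    patient) but not reversible to the original identifier."""
    return hashlib.sha256((salt + str(patient_id)).encode()).hexdigest()[:16]

def min_max_normalize(values):
    """Scale a numeric column to [0, 1], a common normalization step
    before feeding features drawn from different EHRs to an ML model."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Using the same salted hash across all source EHRs is what lets records from different systems be joined into one patient-specific dataset without retaining the raw identifier.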


2020 ◽  
Author(s):  
Nansu Zong ◽  
Victoria Ngo ◽  
Daniel J. Stone ◽  
Andrew Wen ◽  
Yiqing Zhao ◽  
...  

BACKGROUND Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and the potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. OBJECTIVE This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict unknown primaries. METHODS We extracted the genetic data elements from a collection of oncology genetic reports of 1,011 cancer patients, and corresponding phenotypic data from the Mayo Clinic electronic health records (EHRs). We modeled both genetic and EHR data with HL7 Fast Healthcare Interoperability Resources (FHIR). The semantic web Resource Description Framework (RDF) was employed to generate a network-based data representation (i.e., a patient-phenotypic-genetic network). Based on the RDF data graph, the graph-embedding algorithm Node2vec was applied to generate features, and then multiple machine learning and deep learning backbone models were adopted for cancer prediction. RESULTS Across six machine-learning tasks designed in the experiment, we demonstrated that the proposed method achieved favorable results in classifying primary cancer types and predicting unknown primaries. To demonstrate interpretability, the phenotypic and genetic features that contributed the most to the prediction of each cancer were identified and validated based on a literature review. CONCLUSIONS Accurate prediction of cancer types can be achieved with existing EHR data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early at the diagnosis stage for cancer patients.
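Node2vec, used above to derive features from the patient-phenotypic-genetic RDF graph, works by generating biased random walks over the graph and then training a skip-gram model on them. A stdlib-only sketch of the walk-generation step in its simplest form (uniform walks, i.e. Node2vec with p = q = 1; the tiny graph and its node labels are hypothetical, not from the study's data):

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=2, seed=0):
    """Generate uniform random walks over an adjacency dict mapping
    node -> list of neighbours. The resulting walks would normally be
    fed to a skip-gram model to produce one embedding vector per node."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# Tiny patient-phenotype-gene graph with hypothetical labels.
graph = {
    "patient1": ["phen:cough", "gene:KRAS"],
    "phen:cough": ["patient1"],
    "gene:KRAS": ["patient1"],
}
```

Nodes that co-occur frequently in walks (e.g. a patient and their phenotypes and variants) end up with nearby embeddings, which is what lets a downstream classifier use graph structure as features.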


2020 ◽  
Vol 8 (10) ◽  
pp. 1-140
Author(s):  
Alison Porter ◽  
Anisha Badshah ◽  
Sarah Black ◽  
David Fitzpatrick ◽  
Robert Harris-Mayes ◽  
...  

Background Ambulance services have a vital role in the shift towards the delivery of health care outside hospitals, when this is better for patients, by offering alternatives to transfer to the emergency department. The introduction of information technology in ambulance services to electronically capture, interpret, store and transfer patient data can support out-of-hospital care. Objective We aimed to understand how electronic health records can be most effectively implemented in a pre-hospital context in order to support a safe and effective shift from acute to community-based care, and how their potential benefits can be maximised. Design and setting We carried out a study using multiple methods and with four work packages: (1) a rapid literature review; (2) a telephone survey of all 13 freestanding UK ambulance services; (3) detailed case studies examining electronic health record use through qualitative methods and analysis of routine data in four selected sites consisting of UK ambulance services and their associated health economies; and (4) a knowledge-sharing workshop. Results We found limited literature on electronic health records. Only half of the UK ambulance services had electronic health records in use at the time of data collection, with considerable variation in hardware and software and some reversion to use of paper records as services transitioned between systems. The case studies found that the ambulance services’ electronic health records were in a state of change. Not all patient contacts resulted in the generation of electronic health records. Ambulance clinicians were dealing with partial or unclear information, which may not fit comfortably with the electronic health records. Ambulance clinicians continued to use indirect data input approaches (such as first writing on a glove) even when using electronic health records. The primary function of electronic health records in all services seemed to be as a store for patient data. 
There was, as yet, limited evidence of electronic health records’ full potential being realised to transfer information, support decision-making or change patient care. Limitations Limitations included the difficulty of obtaining sets of matching routine data for analysis, difficulties of attributing any change in practice to electronic health records within a complex system and the rapidly changing environment, which means that some of our observations may no longer reflect reality. Conclusions Realising all the benefits of electronic health records requires engagement with other parts of the local health economy and dealing with variations between providers and the challenges of interoperability. Clinicians and data managers, and those working in different parts of the health economy, are likely to want very different things from a data set and need to be presented with only the information that they need. Future work There is scope for future work analysing ambulance service routine data sets, qualitative work to examine transfer of information at the emergency department and patients’ perspectives on record-keeping, and to develop and evaluate feedback to clinicians based on patient records. Study registration This study is registered as Health and Care Research Wales Clinical Research Portfolio 34166. Funding This project was funded by the National Institute for Health Research (NIHR) Health Services and Delivery Research programme and will be published in full in Health Services and Delivery Research; Vol. 8, No. 10. See the NIHR Journals Library website for further project information.

