Data-driven discovery of seasonally linked diseases from an Electronic Health Records system

2014 ◽  
Vol 15 (Suppl 6) ◽  
pp. S3 ◽  
Author(s):  
Rachel D Melamed ◽  
Hossein Khiabanian ◽  
Raul Rabadan
2020 ◽  
Vol 4 (4) ◽  
Author(s):  
Jie Xu ◽  
Fei Wang ◽  
Zhenxing Xu ◽  
Prakash Adekkanattu ◽  
Pascal Brandt ◽  
...  

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Larry Y. Liu ◽  
William S. Bush ◽  
Mehmet Koyutürk ◽  
Günnur Karakurt

Background: It is estimated that a majority of intimate partner violence (IPV) victims suffer blunt force to the head, neck, and face. Injuries to the head and neck are among the major causes of traumatic brain injury (TBI). Methods: In this interdisciplinary study, we aimed to characterize the key associations between IPV and TBI by mining de-identified electronic health records data, comprising more than 12 million records from 1999 to 2017, from the IBM Explorys platform. For this purpose, we formulated a data-driven analytical framework to identify significant health correlates among IPV, TBI, and six control cohorts. Using this framework, we assessed the co-morbidity, shared prevalence, and synergy between pairs of conditions. Results: Our findings suggested that health effects attributed to malnutrition, acquired thrombocytopenia, post-traumatic wound infection, local infection of wound, poisoning by cardiovascular drug, alcoholic cirrhosis, alcoholic fatty liver, and drug-induced cirrhosis were highly significant in the joint presence of IPV and TBI. Conclusion: To develop a better understanding of how IPV is related to negative health effects, it is potentially useful to determine the interactions and relationships between symptom categories. Our results can potentially improve the accuracy and confidence of existing clinical screening techniques in determining IPV-induced TBI diagnoses.


2020 ◽  
Vol 4 (1) ◽  
pp. 7 ◽  
Author(s):  
Neda Rostamzadeh ◽  
Sheikh S. Abdullah ◽  
Kamran Sedig

Electronic health records (EHRs) can be used to make critical decisions, to study the effects of treatments, and to detect hidden patterns in patient histories. In this paper, we present a framework to identify and analyze EHR-data-driven tasks and activities in the context of interactive visualization tools (IVTs)—that is, all the activities, sub-activities, tasks, and sub-tasks that are and can be supported by EHR-based IVTs. A systematic literature survey was conducted to collect the research papers that describe the design, implementation, and/or evaluation of EHR-based IVTs that support clinical decision-making. Databases included PubMed, the ACM Digital Library, the IEEE Library, and Google Scholar. These sources were supplemented by gray literature searching and reference list reviews. Of the 946 initially identified articles, the survey analyzes 19 IVTs described in 24 articles that met the final selection criteria. The survey includes an overview of the goal of each IVT, a brief description of its visualization, and an analysis of how sub-activities, tasks, and sub-tasks blend and combine to accomplish the tool’s main higher-level activities of interpreting, predicting, and monitoring. Our proposed framework shows the gaps in support of higher-level activities supported by existing IVTs. It appears that almost all existing IVTs focus on the activity of interpreting, while only a few of them support predicting and monitoring—this despite the importance of these activities in assisting users in finding patients that are at high risk and tracking patients’ status after treatment.


2020 ◽  
Vol 38 (4_suppl) ◽  
pp. 679-679
Author(s):  
Limor Appelbaum ◽  
Jose Pablo Cambronero ◽  
Karla Pollick ◽  
George Silva ◽  
Jennifer P. Stevens ◽  
...  

Background: Pancreatic Adenocarcinoma (PDAC) is often diagnosed at an advanced stage. We sought to develop a model for early PDAC prediction in the general population, using electronic health records (EHRs) and machine learning. Methods: We used three EHR datasets from Beth-Israel Deaconess Medical Center (BIDMC) and Partners Healthcare (PHC): 1. “BIDMC-Development-Data” (BIDMC-DD) for model development, using a feed-forward neural network (NN) and L2-regularized logistic regression, randomly split (80:20) into training and test groups. We tuned hyperparameters using cross-validation in training, and report performance on the test split. 2. “BIDMC-Large-Data” (BIDMC-LD) to re-fit and calibrate models. 3. “PHC-Data” for external validation. We evaluate using Area Under the Receiver Operating Characteristic Curve (AUC) and compute 95% CI using empirical bootstrap over test data. PDAC patients were selected using ICD9/-10 codes and validated with tumor registries. In contrast to prior work, we did not predefine feature sets based on known clinical correlates and instead employed data-driven feature selection, specifically importance-based feature pruning, regularization, and manual validation, to identify diagnostic-based features. Results: BIDMC-DD included demographics, diagnoses, labs and medications for 1018 patients (cases = 509; age-sex paired controls). BIDMC-LD included diagnoses for 547,917 patients (cases = 509), and PHC included diagnoses for 160,593 patients (cases = 408). We compared our approach to adapted and re-fitted published baselines. With a 365-day lead-time, NN obtained a BIDMC-DD test AUC of 0.84 (CI 0.79 - 0.90) versus the previous best baseline AUC of 0.70 (CI 0.62 - 0.78). We also validated using BIDMC-DD’s test cancer patients and BIDMC-LD controls. The AUC was 0.71 (CI 0.67 - 0.76) at the 365-day cutoff. NN’s external validation AUC on PHC-Data was 0.71 (CI 0.63 - 0.79), outperforming an existing model’s AUC of 0.61 (CI 0.52 - 0.70) (Baecker et al, 2019). Conclusions: Models based on data-driven feature selection outperform models that use predefined sets of known clinical correlates and can help in early prediction of PDAC development.
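The abstract above reports AUC with a 95% CI obtained by bootstrapping the test data. The following is a minimal sketch of that idea, not the authors' code: it assumes a percentile bootstrap (the abstract does not say which bootstrap variant was used), and the function names `auc` and `bootstrap_auc_ci` and the toy inputs are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap interval (assumed variant)."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample test cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes, so redraw
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


# Toy example with made-up labels and scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point, lo, hi = bootstrap_auc_ci(labels, scores, n_boot=200)
```

The pairwise AUC is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable for cohorts the size of those in the abstract.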


BMJ Open ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. e034396 ◽  
Author(s):  
Patrick Rockenschaub ◽  
Vincent Nguyen ◽  
Robert W Aldridge ◽  
Dionisio Acosta ◽  
Juan Miguel García-Gómez ◽  
...  

Objectives: To demonstrate how data-driven variability methods can be used to identify changes in disease recording in two English electronic health records databases between 2001 and 2015. Design: Repeated cross-sectional analysis that applied data-driven temporal variability methods to assess month-by-month changes in routinely collected medical data. A measure of difference between months was calculated based on joint distributions of age, gender, socioeconomic status and recorded cardiovascular diseases. Distances between months were used to identify temporal trends in data recording. Setting: 400 English primary care practices from the Clinical Practice Research Datalink (CPRD GOLD) and 451 hospital providers from the Hospital Episode Statistics (HES). Main outcomes: The proportion of patients (CPRD GOLD) and hospital admissions (HES) with a recorded cardiovascular disease (CPRD GOLD: coronary heart disease, heart failure, peripheral arterial disease, stroke; HES: International Classification of Disease codes I20-I69/G45). Results: Both databases showed gradual changes in cardiovascular disease recording between 2001 and 2008. The recorded prevalence of included cardiovascular diseases in CPRD GOLD increased by 47%–62%, which partially reversed after 2008. For hospital records in HES, there was a relative decrease in angina pectoris (−34.4%) and unspecified stroke (−42.3%) over the same time period, with a concomitant increase in chronic coronary heart disease (+14.3%). Multiple abrupt changes in the use of myocardial infarction codes in hospital were found in March/April 2010, 2012 and 2014, possibly linked to updates of clinical coding guidelines. Conclusions: Identified temporal variability could be related to potentially non-medical causes such as updated coding guidelines. These artificial changes may introduce temporal correlation among diagnoses inferred from routine data, violating the assumptions of frequently used statistical methods. Temporal variability measures provide an objective and robust technique to identify, and subsequently account for, those changes in electronic health records studies without any prior knowledge of the data collection process.
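The month-by-month comparison described in the abstract above can be illustrated with a small sketch. The abstract does not name the distance measure, so the Jensen-Shannon distance used here is an assumption, as are the helper names (`distribution`, `js_distance`, `consecutive_distances`) and the toy records.

```python
import math
from collections import Counter


def distribution(records):
    """Empirical joint distribution over the categorical tuples in `records`."""
    counts = Counter(records)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def js_distance(p, q):
    """Jensen-Shannon distance (base 2): 0 for identical, 1 for disjoint distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture `m`.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))


def consecutive_distances(monthly_records):
    """Distance between each month and the next, in sorted month order."""
    months = sorted(monthly_records)
    dists = {m: distribution(monthly_records[m]) for m in months}
    return [(a, b, js_distance(dists[a], dists[b])) for a, b in zip(months, months[1:])]


# Made-up records: (age band, gender, recorded diagnosis) per patient.
monthly = {
    "2010-02": [("60-69", "F", "angina"), ("70-79", "M", "stroke")],
    "2010-03": [("60-69", "F", "angina"), ("70-79", "M", "stroke")],
    "2010-04": [("60-69", "F", "chronic CHD"), ("70-79", "M", "MI")],
}
for a, b, d in consecutive_distances(monthly):
    print(a, b, round(d, 3))
```

In this toy data, a jump in distance between March and April is the kind of abrupt change that, per the abstract, might point to a coding-guideline update rather than a change in disease burden.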

