Asthma/COPD Data Set And Definitions: Environmental Scan Of Electronic Health Records

Author(s):  
Janice P. Minard ◽  
Mary Ann Juurlink ◽  
Carole Madeley ◽  
Anne Van Dam ◽  
M. D. Lougheed
GigaScience ◽  
2020 ◽  
Vol 9 (8) ◽  
Author(s):  
Carlos Sáez ◽  
Alba Gutiérrez-Sacristán ◽  
Isaac Kohane ◽  
Juan M García-Gómez ◽  
Paul Avillach

Abstract Background Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal data-set shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal data-set shifts to ensure reliable data reuse. Results EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal data-set shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate data-set shifts in three impact case studies, one of which is available for reproducibility. Conclusions EHRtemporalVariability enables the exploration and identification of data-set shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users. EHRtemporalVariability is designed for technical users who are programmatically utilizing the R package, as well as users who are not familiar with programming via the Shiny user interface. 
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/ Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html Online demo: http://ehrtemporalvariability.upv.es/
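
The idea of estimating distributions per time batch and measuring how they change can be illustrated with a minimal Python sketch. This is not the EHRtemporalVariability API (which is an R package); the function names, the monthly batching, and the choice of the Jensen-Shannon distance as the dissimilarity measure are assumptions made here for illustration only.

```python
from collections import Counter, defaultdict
from math import log2, sqrt


def monthly_distributions(records):
    """Estimate the relative frequency of each code per calendar month.

    records: iterable of (date, code) pairs, with date as 'YYYY-MM-DD'.
    Returns a dict {month: {code: probability}} ordered by month.
    """
    counts = defaultdict(Counter)
    for date, code in records:
        counts[date[:7]][code] += 1  # 'YYYY-MM' is the batch key
    return {
        month: {code: n / sum(c.values()) for code, n in c.items()}
        for month, c in sorted(counts.items())
    }


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions.

    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    support = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return sqrt(max(0.0, 0.5 * kl(p) + 0.5 * kl(q)))


def consecutive_shifts(dists):
    """Distance between each month's distribution and the next month's."""
    months = list(dists)
    return [
        (months[i], months[i + 1], js_distance(dists[months[i]], dists[months[i + 1]]))
        for i in range(len(months) - 1)
    ]
```

A large distance between two consecutive months would flag a candidate abrupt shift worth inspecting, for example a change in coding practice.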


2010 ◽  
Vol 40 (7/8) ◽  
pp. 336-343 ◽  
Author(s):  
Bonnie L. Westra ◽  
Amarnath Subramanian ◽  
Colleen M. Hart ◽  
Susan A. Matney ◽  
Patricia S. Wilson ◽  
...  

Author(s):  
Phil Appleby

ABSTRACT Objectives: To build a searchable database for SNP array data from the GoDARTS data set, in which a combined view of genotype data derived from multiple assay platforms can be extracted for both candidate gene and GWA studies, and to combine this with a database of phenotype descriptors which are saved as shareable, reusable database objects and which persist beyond the lifetime of any analysis script. To build databases and software solutions which can be made readily available to laboratories and academic institutions which may not have the resources to adopt one of the larger Genotype / Phenotype integration solutions. Approach: Two databases were built. The first is a hybrid genomics database in which variant and study subject data are stored in a database, with variant detail data retained in Variant Call Format (VCF) files. The second database saves phenotype descriptors as shareable, modifiable database objects alongside a table of events derived from the set of available Electronic Health Records (EHRs). All detail from the EHRs is also retained in the database, which is delivered on a project-by-project basis using virtual machines. Both databases are accessed using web applications, allowing delivery of data to the users’ desktops. Results: Traditionally, the process of deriving genotype and phenotype data for epidemiological studies can be laborious, with genotype data being retrieved from large, flat data files and phenotypes being defined by codes in flat EHR records which are tested and filtered in scripts written for analysis in a statistical package such as Stata, SPSS or R. In our solution, genotype data can be retrieved in seconds and delivered to the users’ desktops. Similarly, lists of cases and controls can be downloaded based on saved or transient phenotype descriptors. Phenotype descriptors derived from codes in Electronic Health Records are saved as reusable, shareable and modifiable database objects, allowing rapid retrieval of phenotype data. Conclusion: The ability to access genomic data from multiple assay platforms and to use this in conjunction with shareable libraries of phenotype objects allows rapid access to data for analysis using both genomic SNP array data and linked Electronic Health Records. Analysis of data extracted from our linked databases should proceed more rapidly and should be more easily reproducible.


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Neal Yuan ◽  
Khalid Latif ◽  
Patrick G. Botting ◽  
Yaron Elad ◽  
Steven M. Bradley ◽  
...  

Background Contrast‐associated acute kidney injury (CA‐AKI) is associated with substantial morbidity and may be prevented by using less contrast during percutaneous coronary intervention (PCI). However, tools for determining safe contrast volumes are limited. We developed risk models to tailor safe contrast volume limits during PCI. Methods and Results Using data from all PCIs performed at 18 hospitals from January 2015 to March 2018, we developed logistic regression models for predicting CA‐AKI, including simpler models (“pragmatic full,” “pragmatic minimum”) using only predictors easily derivable from electronic health records. We prospectively validated these models using PCI data from April 2018 to December 2018 and compared them to preexisting safe contrast models using the area under the receiver operating characteristic curve (AUC). The model derivation data set included 20 579 PCIs with 2102 CA‐AKI cases. When applying models to the separate validation data set (5423 PCIs, 488 CA‐AKI cases), prior safe contrast limits (5*Weight/Creatinine, 2*CreatinineClearance) were poor measures of safety with accuracies of 53.7% and 56.6% in predicting CA‐AKI, respectively. The full, pragmatic full, and pragmatic minimum models performed significantly better (accuracy, 73.1%, 69.3%, 66.6%; AUC, 0.80, 0.76, 0.72 versus 0.59 for 5 * Weight/Creatinine, 0.61 for 2*CreatinineClearance). We found that applying safe contrast limits could meaningfully reduce CA‐AKI risk in one‐quarter of patients. Conclusions Compared with preexisting equations, new multivariate models for safe contrast limits were substantially more accurate in predicting CA‐AKI and could help determine which patients benefit most from limiting contrast during PCI. Using readily available electronic health record data, these models could be implemented into electronic health records to provide actionable information for improving PCI safety.
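
For reference, the two preexisting safe contrast limits named in the abstract (5*Weight/Creatinine and 2*CreatinineClearance) can be written out as a short Python sketch. The abstract does not state units; weight in kg, serum creatinine in mg/dL, creatinine clearance in mL/min, and contrast volume in mL are assumptions here, and the function names are hypothetical. The abstract reports that these limits were poor measures of safety, so the sketch only shows how they are computed, not the authors' new models.

```python
def limit_weight_creatinine(weight_kg, creatinine_mg_dl):
    """Preexisting limit: 5 * weight / creatinine (units are assumed, see lead-in)."""
    if creatinine_mg_dl <= 0:
        raise ValueError("creatinine must be positive")
    return 5 * weight_kg / creatinine_mg_dl


def limit_creatinine_clearance(crcl_ml_min):
    """Preexisting limit: 2 * creatinine clearance (units are assumed)."""
    return 2 * crcl_ml_min


def exceeds_limit(contrast_ml, limit_ml):
    """True if the contrast volume used is above the given limit."""
    return contrast_ml > limit_ml
```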


Author(s):  
Cristina Lopez ◽  
Jose Luis Holgado ◽  
Raquel Cortes ◽  
Inma Sauri ◽  
Antonio Fernandez ◽  
...  

Artificial intelligence is creating a paradigm shift in health care, and phenotyping patients through clustering techniques is one of the areas of interest. Objective: To develop a predictive model to classify heart failure (HF) patients according to their left ventricular ejection fraction (LVEF), using data available in Electronic Health Records (EHR). Subjects and methods: 2854 subjects older than 25 years with a diagnosis of HF and LVEF measured by echocardiography were selected to develop an algorithm to predict patients with reduced EF using supervised analysis. The performance of the algorithm was tested in heart failure patients from Primary Care. To select the most influential variables, the LASSO algorithm was used, and to address class imbalance, in which one class exceeds the other by a large proportion, we used the Synthetic Minority Oversampling Technique (SMOTE). Finally, Random Forest (RF) and XGBoost models were constructed. Results: The full XGBoost model obtained the highest accuracy, a high negative predictive value and the highest positive predictive value. Gender, age, unstable angina, atrial fibrillation and acute myocardial infarction are the variables that most influence the EF value. Applied to the EHR data set with a total of 25594 patients with an ICD code of HF and no regular follow-up in Cardiology clinics, 6170 (21.1%) were identified as belonging to the reduced EF group. Conclusion: The algorithm is able to identify a number of HF patients with reduced ejection fraction who could benefit from a protocol with a strong recommendation. Furthermore, the methodology can be used for studies with data extracted from Electronic Health Records.


Author(s):  
Amrita Bandyopadhyay ◽  
Karen Tingay ◽  
Ashley Akbari ◽  
Lucy Griffiths ◽  
Mario Cortina-Borja ◽  
...  

Background: Harmonisation of different data sources from various electronic health records (EHRs) across systems enhances the potential scope and granularity of data available to health data research. Objective: To describe data harmonisation of routine electronic healthcare records in Wales and Scotland linked to a UK longitudinal birth cohort, the Millennium Cohort Study (MCS). Methods: Comparable secondary care data was linked, with parental consent, to MCS information for 1838 and 1431 children participating in MCS and residing in Wales and Scotland, by assigning, respectively, unique Anonymised Linkage Fields to person-based records in the privacy-protecting Secure Anonymised Information Linkage (SAIL) databank at Swansea University, and by the National Health Service (NHS) Information Standards Division. Survey and non-response weights were created to account for the clustered sample, sample attrition and consent to linkage. Heterogeneous variables from the Patient Episode Dataset for Wales, Emergency Department Data Set for Wales, Scottish Medical Record 01 and Accident and Emergency dataset for Scotland were harmonised, enabling data to be pooled and standardised for research. Findings: Overall linkage to harmonised health care data was achieved for 98.9% (99.9% for Wales and 97.6% for Scotland) of consented MCS participants. 66% of children experienced at least one hospital admission (total 5747 hospital admissions) up to their 14th birthday, while 60% attended A&E departments at least once (total 5221 attendances) between their 9th and 14th birthday. We managed date granularity by generating random dates of birth, standardising periods of data collection, identifying inconsistencies and then mapping and bridging differences in definitions of periods of care across countries and datasets. Conclusions: Combining and harmonising data from multiple sources and linking them to information from a longitudinal cohort create useful resources for population health research. These methods are reproducible and can be utilised by other researchers and projects.


2019 ◽  
Vol 65 (12) ◽  
pp. 1522-1531 ◽  
Author(s):  
Jacob J Hughey ◽  
Jennifer M Colby

Abstract BACKGROUND Exposure to drugs of abuse is frequently assessed using urine drug screening (UDS) immunoassays. Although fast and relatively inexpensive, UDS assays often cross-react with unrelated compounds, which can lead to false-positive results and impair patient care. The current process of identifying cross-reactivity relies largely on case reports, making it sporadic and inefficient, and rendering knowledge of cross-reactivity incomplete. Here, we present a systematic approach to discover cross-reactive substances using data from electronic health records (EHRs). METHODS Using our institution's EHR data, we assembled a data set of 698651 UDS results across 10 assays and linked each UDS result to the corresponding individual's previous medication exposures. We hypothesized that exposure to a cross-reactive ingredient would increase the odds of a false-positive screen. For 2201 assay–ingredient pairs, we quantified potential cross-reactivity as an odds ratio from logistic regression. We then evaluated cross-reactivity experimentally by spiking the ingredient or its metabolite into drug-free urine and testing the spiked samples on each assay. RESULTS Our approach recovered multiple known cross-reactivities. After accounting for concurrent exposures to multiple ingredients, we selected 18 compounds (13 parent drugs and 5 metabolites) to evaluate experimentally. We validated 12 of 13 tested assay–ingredient pairs expected to show cross-reactivity by our analysis, discovering previously unknown cross-reactivities affecting assays for amphetamines, buprenorphine, cannabinoids, and methadone. CONCLUSIONS Our findings can help laboratorians and providers interpret presumptive positive UDS results. Our data-driven approach can serve as a model for high-throughput discovery of substances that interfere with laboratory tests.
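
The quantity used to rank assay-ingredient pairs, an odds ratio for a positive screen given prior exposure, can be illustrated with a small Python sketch. The authors estimated odds ratios by logistic regression and accounted for concurrent exposures; the version below is a simpler, unadjusted 2x2 calculation, which coincides with the logistic regression estimate only when exposure is the sole binary predictor. The function name, the Wald confidence interval, and the 0.5 continuity correction for empty cells are assumptions made here for illustration.

```python
from math import exp, log, sqrt


def odds_ratio(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table.

    exposed_pos / exposed_neg: positive / negative screens among people
    previously exposed to the ingredient; unexposed_* likewise for the rest.
    Returns (odds_ratio, ci_low, ci_high).
    """
    a, b, c, d = exposed_pos, exposed_neg, unexposed_pos, unexposed_neg
    if 0 in (a, b, c, d):
        # Continuity correction so the ratio stays finite (an assumption
        # of this sketch, not something stated in the abstract).
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        estimate,
        exp(log(estimate) - z * se_log),
        exp(log(estimate) + z * se_log),
    )
```

An odds ratio well above 1, with a confidence interval excluding 1, would mark a pair as a candidate for experimental follow-up.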

