The Emerging Landscape of Epidemiological Research Based on Biobanks Linked to Electronic Health Records: Existing Resources, Analytic Challenges and Potential Opportunities

Author(s):  
Lauren Beesley ◽  
Maxwell Salvatore ◽  
Lars Fritsche ◽  
Anita Pandit ◽  
Arvind Rao ◽  
...  

Biobanks linked to electronic health records provide a rich data resource for health-related research. With the establishment of large-scale infrastructure, the availability and utility of data from biobanks have dramatically increased over time. As more researchers become interested in using biobank data to explore a diverse spectrum of scientific questions, resources guiding the data access, design, and analysis of biobank-based studies will be crucial. The first aim of this review is to characterize the types of biobanks that are discussed in the recent literature and provide detailed descriptions of specific biobanks including their location, size, data access, data linkages and more. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, new discoveries, and hypothesis-generating studies of disease-treatment, disease-exposure and disease-gene associations. Rather than spending time and money designing and implementing a single study with pre-defined objectives, researchers can use biobanks’ existing data-rich resources to answer scientific questions as quickly as they can analyze them. While the data are becoming increasingly available, additional thought is needed to address issues related to the design of such studies and analysis of these data. In the second aim of this review, we discuss statistical issues related to biobank research in general including study design, sampling strategy, phenotype identification, and missing data. These issues are illustrated using data from the Michigan Genomics Initiative, UK Biobank, and Genes for Good. We summarize the current body of statistical literature aimed at addressing some of these challenges and discuss some of the standing open problems in this area.
This work serves to complement and extend recent reviews about biobank-based research and aims to provide a resource catalog with statistical and practical guidance to researchers pursuing biobank-based research.

2020 ◽  
Vol 17 (4) ◽  
pp. 370-376
Author(s):  
Benjamin A Goldstein

Electronic health records data are becoming a key data resource in clinical research. Owing to issues of data efficiency, electronic health records data are being used for clinical trials. This includes both large-scale pragmatic trials and smaller, more focused point-of-care trials. While electronic health records data open up a number of scientific opportunities, they also present a number of analytic challenges. This article discusses five particular challenges related to organizing electronic health records data for analytic purposes. These are as follows: (1) data are not organized for research purposes, (2) data are both densely and irregularly observed, (3) we don’t have all data elements we may want or need, (4) data are both cross-sectional and longitudinal, and (5) data may be informatively observed. While laying out these challenges, the article notes how many of these challenges can be addressed by careful and thoughtful study design as well as by integration of clinicians and informaticians into the analytic team.


2020 ◽  
pp. 0272989X2095440
Author(s):  
Glen B. Taksler ◽  
Jarrod E. Dalton ◽  
Adam T. Perzynski ◽  
Michael B. Rothberg ◽  
Alex Milinovich ◽  
...  

Electronic health records (EHRs) offer the potential to study large numbers of patients but are designed for clinical practice, not research. Despite the increasing availability of EHR data, their use in research comes with its own set of challenges. In this article, we describe some important considerations and potential solutions for commonly encountered problems when working with large-scale, EHR-derived data for health services and community-relevant health research. Specifically, using EHR data requires the researcher to define the relevant patient subpopulation, reliably identify the primary care provider, recognize the EHR as containing episodic (i.e., unstructured longitudinal) data, account for changes in health system composition and treatment options over time, understand that the EHR is not always well-organized and accurate, design methods to identify the same patient across multiple health systems, account for the enormous size of the EHR, and consider barriers to data access. Associations found in the EHR may be nonrepresentative of associations in the general population, but a clear understanding of the EHR-based associations can be enormously valuable to the process of improving outcomes for patients in learning health care systems. In the context of building 2 large-scale EHR-derived data sets for health services research, we describe the potential pitfalls of EHR data and propose some solutions for those planning to use EHR data in their research. As ever greater amounts of clinical data are amassed in the EHR, use of these data for research will become increasingly common and important. Attention to the intricacies of EHR data will allow for more informed analysis and interpretation of results from EHR-based data sets.


2014 ◽  
Vol 53 (04) ◽  
pp. 264-268 ◽  
Author(s):  
R. Bache ◽  
M. McGilchrist ◽  
C. Daniel ◽  
M. Dugas ◽  
F. Fritz ◽  
...  

Background: Pharmaceutical clinical trials are primarily conducted across many countries, yet recruitment numbers are frequently not met in time. Electronic health records store large amounts of potentially useful data that could aid in this process. The EHR4CR project aims at re-using EHR data for clinical research purposes. Objective: To evaluate whether the protocol feasibility platform produced by the Electronic Health Records for Clinical Research (EHR4CR) project can be installed and set up in accordance with local technical and governance requirements to execute protocol feasibility queries uniformly across national borders. Methods: We installed specifically engineered software and warehouses at local sites. Approvals for data access and usage of the platform were acquired and terminology mapping of local site codes to central platform codes was performed. A test data set, or real EHR data where approvals were in place, was loaded into data warehouses. Test feasibility queries were created on a central component of the platform and sent to the local components at eleven university hospitals. Results: To use real, de-identified EHR data we obtained permissions and approvals from ‘data controllers’ and ethics committees. Through the platform we were able to create feasibility queries, distribute them to eleven university hospitals and retrieve aggregated patient counts of both test data and de-identified EHR data. Conclusion: It is possible to install a uniform piece of software in different university hospitals in five European countries and configure it to the requirements of the local networks, while complying with local data protection regulations. We were also able to set up ETL processes and data warehouses to reuse EHR data for feasibility queries distributed over the EHR4CR platform.


2018 ◽  
Vol 26 (3) ◽  
pp. 219-227 ◽  
Author(s):  
Nathaniel D Mercaldo ◽  
Kyle B Brothers ◽  
David S Carrell ◽  
Ellen W Clayton ◽  
John J Connolly ◽  
...  

Objective: We describe a stratified sampling design that combines electronic health records (EHRs) and United States Census (USC) data to construct the sampling frame and an algorithm to enrich the sample with individuals belonging to rarer strata. Materials and Methods: This design was developed for a multi-site survey that sought to examine patient concerns about and barriers to participating in research studies, especially among under-studied populations (eg, minorities, low educational attainment). We defined sampling strata by cross-tabulating several socio-demographic variables obtained from EHR and augmented with census-block-level USC data. We oversampled rarer and historically underrepresented subpopulations. Results: The sampling strategy, which included USC-supplemented EHR data, led to a far more diverse sample than would have been expected under random sampling (eg, 3-, 8-, 7-, and 12-fold increase in African Americans, Asians, Hispanics and those with less than a high school degree, respectively). We observed that our EHR data tended to misclassify minority races more often than majority races, and that non-majority races, Latino ethnicity, younger adult age, lower education, and urban/suburban living were each associated with lower response rates to the mailed surveys. Discussion: We observed substantial enrichment from rarer subpopulations. The magnitude of the enrichment depends on the accuracy of the variables that define the sampling strata and the overall response rate. Conclusion: EHR and USC data may be used to define sampling strata that in turn may be used to enrich the final study sample. This design may be of particular interest for studies of rarer and understudied populations.


2021 ◽  
Author(s):  
Sergiusz Wesolowski ◽  
Gordon Howard Lemmon ◽  
Edgar J Hernandez ◽  
Alex Ryan Henrie ◽  
Thomas A Miller ◽  
...  

Understanding the conditionally-dependent clinical variables that drive cardiovascular health outcomes is a major challenge for precision medicine. Here, we deploy a recently developed massively scalable comorbidity discovery method called Poisson Binomial based Comorbidity discovery (PBC), to analyze Electronic Health Records (EHRs) from the University of Utah and Primary Children's Hospital (over 1.6 million patients and 77 million visits) for comorbid diagnoses, procedures, and medications. Using explainable Artificial Intelligence (AI) methodologies, we then tease apart the intertwined, conditionally-dependent impacts of comorbid conditions and demography upon cardiovascular health, focusing on the key areas of heart transplant, sinoatrial node dysfunction and various forms of congenital heart disease. The resulting multimorbidity networks make possible wide-ranging explorations of the comorbid and demographic landscapes surrounding these cardiovascular outcomes, and can be distributed as web-based tools for further community-based outcomes research. The ability to transform enormous collections of EHRs into compact, portable tools devoid of Protected Health Information solves many of the legal, technological, and data-scientific challenges associated with large-scale EHR analyses.


2017 ◽  
pp. 528-542
Author(s):  
Milica Milutinovic ◽  
Bart De Decker

Electronic Health Records (EHRs) are becoming the ubiquitous technology for managing patients' records in many countries. They allow for easier transfer and analysis of patient data on a large scale. However, privacy concerns linked to this technology are emerging. Namely, patients rarely fully understand how EHRs are managed. Additionally, the records are not necessarily stored within the organization where the patient is receiving her healthcare. This service may be delegated to a remote provider, and it is not always clear which health-provisioning entities have access to this data. Therefore, in this chapter the authors propose an alternative where users can keep and manage their records in their existing eHealth systems. The approach is user-centric and enables the patients to have better control over their data while still allowing for special measures to be taken in case of emergency situations with the goal of providing the required care to the patient.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Laila Rasmy ◽  
Yang Xiang ◽  
Ziqian Xie ◽  
Cui Tao ◽  
Degui Zhi

Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required by these models to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous successes in the natural language processing domain. The pretraining of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework originally developed for the text domain to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves the prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21–6.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT obtains promising performances on tasks with small fine-tuning training sets and can boost the AUC by more than 20% or obtain an AUC as high as a model trained on a training set ten times larger, compared with deep learning models without Med-BERT. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence aided healthcare.

