Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records

2017 ◽  
Author(s):  
Chia-Yen Chen ◽  
Phil H. Lee ◽  
Victor M. Castro ◽  
Jessica Minnier ◽  
Alexander W. Charney ◽  
...  

Abstract Bipolar disorder (BD) is a heritable mood disorder characterized by episodes of mania and depression. Although genomewide association studies (GWAS) have successfully identified genetic loci contributing to BD risk, sample size has become a rate-limiting obstacle to genetic discovery. Electronic health records (EHRs) represent a vast but relatively untapped resource for high-throughput phenotyping. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we previously validated automated EHR-based phenotyping algorithms for BD against in-person diagnostic interviews (Castro et al. 2015). Here, we establish the genetic validity of these phenotypes by determining their genetic correlation with traditionally-ascertained samples. Case and control algorithms were derived from structured and narrative text in the Partners Healthcare system comprising more than 4.6 million patients over 20 years. Genomewide genotype data for 3,330 BD cases and 3,952 controls of European ancestry were used to estimate SNP-based heritability (h2g) and genetic correlation (rg) between EHR-based phenotype definitions and traditionally-ascertained BD cases in GWAS by the ICCBD and Psychiatric Genomics Consortium (PGC) using LD score regression. We evaluated BD cases identified using 4 EHR-based algorithms: an NLP-based algorithm (95-NLP) and 3 rule-based algorithms using codified EHR with decreasing levels of stringency: “coded-strict”, “coded-broad”, and “coded-broad based on a single clinical encounter” (coded-broad-SV). The analytic sample comprised 862 95-NLP, 1,968 coded-strict, 2,581 coded-broad, 408 coded-broad-SV BD cases, and 3,952 controls. The estimated h2g were 0.24 (p=0.015), 0.09 (p=0.064), 0.13 (p=0.003), and 0.00 (p=0.591) for 95-NLP, coded-strict, coded-broad, and coded-broad-SV BD, respectively. The h2g for all EHR-based cases combined except coded-broad-SV (excluded due to 0 h2g) was 0.12 (p=0.004).
These h2g estimates were lower than or similar to the h2g observed by the ICCBD+PGCBD (0.23, p=3.17×10^-80, total N=33,181). However, the rg between ICCBD+PGCBD and the EHR-based cases were high for 95-NLP (0.66, p=3.69×10^-5), coded-strict (1.00, p=2.40×10^-4), and coded-broad (0.74, p=8.11×10^-7). The rg between EHR-based BDs ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.

2020 ◽  
Vol 27 (11) ◽  
pp. 1675-1687
Author(s):  
Neil S Zheng ◽  
QiPing Feng ◽  
V Eric Kerchberger ◽  
Juan Zhao ◽  
Todd L Edwards ◽  
...  

Abstract Objective Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within EHRs. Materials and Methods PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes that have been extracted by natural language processing from publicly available resources. PheMap searches EHRs for each phenotype’s quantified concepts and uses them to calculate an individual’s probability of having this phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We implemented PheMap-based phenotypes for genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2. Results In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% threshold and eMERGE case-control status as a reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap provided an equivalent or more significant P value than eMERGE-based phenotypes. The PheMap-based PheWAS performed comparably to or better than a traditional phecode-based PheWAS. PheMap is publicly available online.
Conclusions PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with performance comparable to or better than current phenotyping approaches.
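
The accuracy evaluation described in the Results can be illustrated with a minimal sketch: each individual's phenotype probability is dichotomized at a threshold (50% in the abstract) and compared with a reference case-control label. This is not PheMap's code. The function name `threshold_accuracy`, the toy probabilities, the toy labels, and the choice to count a probability exactly at the threshold as a case are all assumptions made for illustration.

```python
from typing import Sequence


def threshold_accuracy(
    probs: Sequence[float],
    reference: Sequence[bool],
    threshold: float = 0.5,
) -> float:
    """Fraction of individuals whose thresholded probability matches the reference label."""
    if len(probs) != len(reference):
        raise ValueError("probs and reference must have the same length")
    if len(probs) == 0:
        raise ValueError("at least one individual is required")
    # Assumption: a probability at or above the threshold is called a case.
    predicted = [p >= threshold for p in probs]
    correct = sum(pred == ref for pred, ref in zip(predicted, reference))
    return correct / len(probs)


# Hypothetical toy data: phenotype probabilities and reference case status.
toy_probs = [0.92, 0.10, 0.55, 0.48, 0.03]
toy_reference = [True, False, True, True, False]

accuracy = threshold_accuracy(toy_probs, toy_reference)
print(f"accuracy at 50% threshold: {accuracy:.2f}")
```

In the toy data the fourth individual (probability 0.48, reference case) is the only disagreement, so four of five calls match the reference.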


2013 ◽  
Vol 20 (e2) ◽  
pp. e253-e259 ◽  
Author(s):  
Yukun Chen ◽  
Robert J Carroll ◽  
Eugenia R McPeek Hinz ◽  
Anushi Shah ◽  
Anne E Eyler ◽  
...  

2013 ◽  
Vol 20 (e2) ◽  
pp. e341-e348 ◽  
Author(s):  
Jyotishman Pathak ◽  
Kent R Bailey ◽  
Calvin E Beebe ◽  
Steven Bethard ◽  
David S Carrell ◽  
...  

2018 ◽  
Vol 83 (12) ◽  
pp. 997-1004 ◽  
Author(s):  
Thomas H. McCoy ◽  
Sheng Yu ◽  
Kamber L. Hart ◽  
Victor M. Castro ◽  
Hannah E. Brown ◽  
...  

2018 ◽  
Vol 27 (01) ◽  
pp. 177-183 ◽  
Author(s):  
Christel Daniel ◽  
Dipak Kalra ◽  

Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select best papers published in 2017. Method: A bibliographic search using a combination of MeSH descriptors and free terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. A consensus meeting between the two section editors and the editorial team was organized to finally conclude on the selection of best papers. Results: Among the 741 returned papers published in 2017 in the various areas of CRI, the full review process selected five best papers. The first best paper reports on the implementation of consent management considering patient preferences for the use of de-identified data of electronic health records for research. The second best paper describes an approach using natural language processing to extract symptoms of severe mental illness from clinical text. The authors of the third best paper describe the challenges and lessons learned when leveraging the EHR4CR platform to support patient inclusion in academic studies in the context of an important collaboration between private industry and public health institutions. The fourth best paper describes a method and an interactive tool for case-crossover analyses of electronic medical records for patient safety. The last best paper proposes a new method for bias reduction in association studies using electronic health records data. Conclusions: Research in the CRI field continues to accelerate and to mature, leading to tools and platforms deployed at national or international scales with encouraging results. Beyond securing these new platforms for exploiting large-scale health data, another major challenge is the limitation of biases related to the use of “real-world” data. Controlling these biases is a prerequisite for the development of learning health systems.


2019 ◽  
Vol 111 (1) ◽  
pp. 110-121 ◽  
Author(s):  
Bianca Vora ◽  
Elizabeth A E Green ◽  
Natalia Khuri ◽  
Frida Ballgren ◽  
Marina Sirota ◽  
...  

ABSTRACT Background Transporter-mediated drug–nutrient interactions have the potential to cause serious adverse events. However, unlike drug–drug interactions, these drug–nutrient interactions receive little attention during drug development. The clinical importance of drug–nutrient interactions was highlighted when a phase III clinical trial was terminated due to severe adverse events resulting from potent inhibition of thiamine transporter 2 (ThTR-2; SLC19A3). Objective In this study, we tested the hypothesis that therapeutic drugs inhibit the intestinal thiamine transporter ThTR-2, which may lead to thiamine deficiency. Methods For this exploration, we took a multifaceted approach, starting with a high-throughput in vitro primary screen to identify inhibitors, building in silico models to characterize inhibitors, and leveraging real-world data from electronic health records to begin to understand the clinical relevance of these inhibitors. Results Our high-throughput screen of 1360 compounds, including many clinically used drugs, identified 146 potential inhibitors at 200 μM. Inhibition kinetics were determined for 28 drugs with half-maximal inhibitory concentration (IC50) values ranging from 1.03 μM to >1 mM. Several oral drugs, including metformin, were predicted to have intestinal concentrations that may result in ThTR-2–mediated drug–nutrient interactions. Complementary analysis using electronic health records suggested that thiamine laboratory values are reduced in individuals receiving prescription drugs found to significantly inhibit ThTR-2, particularly in vulnerable populations (e.g., individuals with alcoholism). Conclusions Our comprehensive analysis of prescription drugs suggests that several marketed drugs inhibit ThTR-2, which may contribute to thiamine deficiency, especially in at-risk populations.
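
To give a sense of what the reported IC50 range implies at the 200 μM screening concentration, the sketch below computes fractional inhibition under a simple one-site model with a Hill slope of 1. That model is an assumption introduced here for illustration; the abstract does not state which inhibition model the authors fitted, and the function name `fractional_inhibition` is hypothetical.

```python
def fractional_inhibition(concentration_um: float, ic50_um: float, hill_slope: float = 1.0) -> float:
    """Fraction of transporter activity inhibited at a given inhibitor concentration.

    Assumes a simple Hill-type model (an illustrative assumption, not the
    authors' stated method): inhibition = C^h / (C^h + IC50^h).
    """
    if concentration_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    c_h = concentration_um ** hill_slope
    return c_h / (c_h + ic50_um ** hill_slope)


SCREEN_CONCENTRATION_UM = 200.0  # screening concentration reported in the abstract

# The two IC50 values bracket the range reported in the abstract (1.03 uM to >1 mM).
potent = fractional_inhibition(SCREEN_CONCENTRATION_UM, ic50_um=1.03)
weak = fractional_inhibition(SCREEN_CONCENTRATION_UM, ic50_um=1000.0)

print(f"IC50 = 1.03 uM: {potent:.1%} inhibited at 200 uM")
print(f"IC50 = 1 mM:    {weak:.1%} inhibited at 200 uM")
```

Under this assumed model, a compound with an IC50 near the low end of the reported range would inhibit almost all transport at the screening concentration, while one with an IC50 of 1 mM would inhibit about one sixth of it.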


2019 ◽  
Author(s):  
Lauren J. Beesley ◽  
Bhramar Mukherjee

Abstract Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-specific factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies to address this situation. For all methods proposed, we derive valid standard errors and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative (MGI), a longitudinal EHR-linked biorepository.
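
The general idea behind inverse probability weighting for selection bias can be sketched with a generic weighted mean in which each sampled individual is weighted by the reciprocal of their probability of being in the sample. This is a textbook-style illustration, not the estimators developed in the paper. The function name `ipw_mean`, the assumption that selection probabilities are known, and the toy values are all hypothetical.

```python
from typing import Sequence


def ipw_mean(outcomes: Sequence[float], selection_probs: Sequence[float]) -> float:
    """Weighted mean with weights 1 / P(selected), normalized by the sum of weights."""
    if len(outcomes) != len(selection_probs):
        raise ValueError("outcomes and selection_probs must have the same length")
    if len(outcomes) == 0:
        raise ValueError("at least one observation is required")
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in selection_probs]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Hypothetical toy sample: diseased individuals (outcome 1) are assumed to be
# four times as likely to appear in the EHR sample as non-diseased individuals.
outcomes = [1, 1, 0, 0]
selection_probs = [0.8, 0.8, 0.2, 0.2]

naive_prevalence = sum(outcomes) / len(outcomes)
weighted_prevalence = ipw_mean(outcomes, selection_probs)

print(f"naive prevalence:    {naive_prevalence:.2f}")
print(f"weighted prevalence: {weighted_prevalence:.2f}")
```

In this toy sample the unweighted prevalence is 0.50, while down-weighting the over-represented diseased individuals gives 0.20. In practice the selection probabilities are not known and must be estimated, which is part of what the paper addresses.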


PLoS Genetics ◽  
2021 ◽  
Vol 17 (6) ◽  
pp. e1009593
Author(s):  
Neil S. Zheng ◽  
Cosby A. Stone ◽  
Lan Jiang ◽  
Christian M. Shaffer ◽  
V. Eric Kerchberger ◽  
...  

Understanding the contribution of genetic variation to drug response can improve the delivery of precision medicine. However, genome-wide association studies (GWAS) for drug response are uncommon and are often hindered by small sample sizes. We present a high-throughput framework to efficiently identify eligible patients for genetic studies of adverse drug reactions (ADRs) using “drug allergy” labels from electronic health records (EHRs). As a proof-of-concept, we conducted GWAS for ADRs to 14 common drug/drug groups with 81,739 individuals from Vanderbilt University Medical Center’s BioVU DNA Biobank. We identified 7 genetic loci associated with ADRs at P < 5 × 10^-8, including known genetic associations such as CYP2D6 and OPRM1 for CYP2D6-metabolized opioid ADR. Additional expression quantitative trait loci and phenome-wide association analyses added evidence to the observed associations. Our high-throughput framework is both scalable and portable, enabling impactful pharmacogenomic research to improve precision medicine.


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Chia-Yen Chen ◽  
Phil H. Lee ◽  
Victor M. Castro ◽  
Jessica Minnier ◽  
Alexander W. Charney ◽  
...  
