PheWAS-ME: a web-app for interactive exploration of multimorbidity patterns in PheWAS

Author(s):  
Nick Strayer ◽  
Jana K Shirey-Rice ◽  
Yu Shyr ◽  
Joshua C Denny ◽  
Jill M Pulley ◽  
...  

Summary: Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for the evaluation of relationships between genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene–disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. We present PheWAS-ME: an interactive dashboard to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data.

Availability and implementation: A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. Sample datasets are provided for exploration with the option to upload custom PheWAS results and corresponding individual-level data. Online versions of the appendices are available at https://prod.tbilab.org/phewas_me_info/. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer).

Supplementary information: Supplementary data are available at Bioinformatics online.
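One way to explore multimorbidity patterns against a variant is to group individuals by their exact combination of phenotypes and compare the fraction of variant carriers in each group. The Python sketch below shows that tabulation step under stated assumptions: each individual is a hypothetical `(phecodes, is_carrier)` pair, and the function name and input format are illustrative only, not part of the PheWAS-ME R package.

```python
from collections import Counter


def multimorbidity_patterns(individuals):
    """Tabulate exact phenotype combinations and the carrier fraction in each.

    individuals: iterable of (phecodes, is_carrier) pairs, where phecodes is any
    iterable of phenotype codes (hypothetical input format for this sketch).
    Returns {frozenset_of_codes: (count, carrier_fraction)}.
    """
    totals = Counter()
    carriers = Counter()
    for phecodes, is_carrier in individuals:
        pattern = frozenset(phecodes)  # order-independent comorbidity pattern
        totals[pattern] += 1
        if is_carrier:
            carriers[pattern] += 1
    return {p: (n, carriers[p] / n) for p, n in totals.items()}


if __name__ == "__main__":
    # Toy cohort with example codes; not real data.
    cohort = [
        ({"250.2", "401.1"}, True),
        ({"401.1", "250.2"}, False),
        ({"401.1"}, False),
    ]
    summary = multimorbidity_patterns(cohort)
    # Print patterns from most to least common.
    for pattern, (n, frac) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
        print(sorted(pattern), n, round(frac, 2))
```

Comparing carrier fractions across patterns (rather than per single phenotype, as in a standard PheWAS table) is what surfaces combinations that are enriched among carriers.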

2019 ◽  
Author(s):  
Nick Strayer ◽  
Jana K Shirey-Rice ◽  
Yu Shyr ◽  
Joshua C. Denny ◽  
Jill M. Pulley ◽  
...  

Summary: Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities to use big data for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for high-throughput evaluation of relationships between a set of genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between pairs of a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene-disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. Here, we present a web application to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results in an interactive dashboard, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data.

Availability: A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. A simulated sample dataset is provided for exploration, with the option to upload custom PheWAS results and corresponding individual-level data. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer).


2017 ◽  
Author(s):  
Josine Min ◽  
Gibran Hemani ◽  
George Davey Smith ◽  
Caroline Relton ◽  
Matthew Suderman

Background: Technological advances in high-throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to handle these data efficiently.

Methods: We have developed meffil, an R package designed to quality-control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, the Accessible Resource for Integrative Epigenomic Studies (ARIES) and the Genetics of Overweight Young Adults (GOYA) study.

Results: A complete reimplementation of functional normalization reduces computational memory requirements to 5% of those required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, together with automated estimation of functional normalization parameters, reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations, without sharing any biologically based individual-level data, may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with case-control status, functional normalization is unable to prevent spurious associations.

Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.
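Functional normalization, which meffil reimplements, models each sample's quantile function as depending on covariates derived from control probes, and removes that dependence. The pure-Python sketch below shows the core regression step for a single covariate per sample. The covariate, the quantile grid and the choice to centre the covariate are assumptions of this sketch; it is not meffil's R interface or its memory-efficient implementation.

```python
def remove_covariate_effect(quantiles, covariate):
    """Regress each quantile level on one covariate and subtract the fitted effect.

    quantiles: list of samples, each a list of quantile values at shared levels.
    covariate: one value per sample (e.g. a control-probe principal component).
    The covariate is centred so the mean quantile function is preserved
    (an assumption of this sketch).
    """
    n = len(quantiles)
    z_mean = sum(covariate) / n
    zc = [z - z_mean for z in covariate]
    z_ss = sum(z * z for z in zc)
    levels = len(quantiles[0])
    out = [row[:] for row in quantiles]
    for r in range(levels):
        q = [row[r] for row in quantiles]
        q_mean = sum(q) / n
        # Ordinary least-squares slope of quantile level r on the covariate.
        beta = sum(z * (v - q_mean) for z, v in zip(zc, q)) / z_ss
        for i in range(n):
            out[i][r] = quantiles[i][r] - zc[i] * beta
    return out


if __name__ == "__main__":
    # Three samples whose quantiles shift linearly with a technical covariate.
    q = [[0.1, 0.9], [0.2, 1.0], [0.3, 1.1]]
    print(remove_covariate_effect(q, [0.0, 1.0, 2.0]))
```

A full implementation would use several control-probe principal components and then map each probe value through the adjusted quantile function; both steps are omitted here.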


2021 ◽  
Author(s):  
IS Arriaga-MacKenzie ◽  
G Matesi ◽  
S Chen ◽  
A Ronco ◽  
KM Marker ◽  
...  

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. While several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using five continental reference ancestries, namely African (AFR), Non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM) and South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) in all simulation scenarios. We apply Summix to gnomAD v2.1 exome and genome groups and subgroups and find heterogeneous continental ancestry in several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared with the unadjusted gnomAD AFs, Summix’s ancestry-adjusted AFs more closely match the respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing estimation of confidence intervals via block bootstrap. With its accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
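Summix estimates ancestry proportions by fitting the observed allele frequencies as a mixture of reference-ancestry allele frequencies, with proportions constrained to be non-negative and to sum to one. The sketch below solves that constrained least-squares problem with projected gradient descent in pure Python. The solver, step size, iteration count and input layout are assumptions of this sketch, not the algorithm used inside the Summix R package.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # keeps the threshold for the largest valid i
    return [max(x - theta, 0.0) for x in v]


def estimate_ancestry(observed, reference, steps=5000, lr=0.5):
    """Least-squares mixture proportions via projected gradient descent.

    observed: observed AF per SNP.
    reference: reference[s][k] = AF of SNP s in reference ancestry k.
    """
    k = len(reference[0])
    n = len(observed)
    pi = [1.0 / k] * k  # start from equal proportions
    for _ in range(steps):
        grad = [0.0] * k
        for obs, ref in zip(observed, reference):
            resid = sum(p * r for p, r in zip(pi, ref)) - obs
            for j in range(k):
                grad[j] += 2.0 * resid * ref[j] / n
        pi = project_to_simplex([p - lr * g for p, g in zip(pi, grad)])
    return pi


if __name__ == "__main__":
    ref = [[0.1, 0.8], [0.5, 0.2], [0.9, 0.3], [0.2, 0.6]]  # toy reference AFs
    obs = [0.7 * a + 0.3 * b for a, b in ref]  # simulated 70/30 mixture
    print(estimate_ancestry(obs, ref))
```

Summix also reports ancestry-adjusted AFs and block-bootstrap confidence intervals; those steps are not shown.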


2019 ◽  
Author(s):  
Tao Wang ◽  
Quanwei Yin ◽  
Yongzhuang Liu ◽  
Jin Chen ◽  
Yadong Wang ◽  
...  

Motivation: Quantitative trait locus (QTL) analysis of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), has been widely used to infer the effects of genomic variation on multiple levels of molecular activity. However, the power of xQTL (various types of QTLs) detection is largely limited by missing association statistics due to missing genotypes and limited effective sample size. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes and molecular traits, which are rarely available. No implementation exists for imputing xQTL summary statistics when individual-level data are missing.

Results: We present xQTLImp, a C++ software package specifically designed for efficient imputation of xQTL summary statistics based on multivariate Gaussian approximation. Experiments on a single-cell eQTL dataset demonstrate that a considerable number of novel significant eQTL associations can be rediscovered by xQTLImp.

Availability: Software is available at https://github.com/hitbc/.

Supplementary information: Supplementary data are available at Bioinformatics online.
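Under a multivariate Gaussian approximation, summary-statistic imputation typically predicts the z-score of an untyped variant as its conditional mean given the observed z-scores, z_u = Σ_uo Σ_oo⁻¹ z_o, where Σ is the LD (correlation) matrix estimated from a reference panel; Σ_uo Σ_oo⁻¹ Σ_ou serves as an imputation quality score. The pure-Python sketch below implements that standard formula with an optional ridge term. The function names, the ridge parameter and the toy LD values are assumptions, and this is not xQTLImp's C++ code.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def impute_z(z_obs, ld_oo, ld_uo, ridge=0.0):
    """Conditional-mean imputation of z-scores for untyped variants.

    z_obs: observed z-scores; ld_oo: LD among observed variants;
    ld_uo: one row per untyped variant, one column per observed variant.
    Returns a list of (imputed_z, r2_quality) tuples.
    """
    n = len(z_obs)
    # Optional ridge on the diagonal stabilises near-singular LD matrices.
    reg = [[ld_oo[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    w = solve(reg, z_obs)  # Sigma_oo^{-1} z_o
    results = []
    for row in ld_uo:
        z_hat = sum(r * wi for r, wi in zip(row, w))
        alpha = solve(reg, row)  # Sigma_oo^{-1} Sigma_ou
        r2 = sum(r * a for r, a in zip(row, alpha))
        results.append((z_hat, r2))
    return results


if __name__ == "__main__":
    print(impute_z([1.5, 1.5], [[1.0, 0.5], [0.5, 1.0]], [[0.5, 0.5]]))
```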


2020 ◽  
Vol 36 (15) ◽  
pp. 4283-4290
Author(s):  
Jiajing Xie ◽  
Yang Xu ◽  
Haifeng Chen ◽  
Meirong Chi ◽  
Jun He ◽  
...  

Motivation: For some specific tissues, such as the heart and brain, normal controls are difficult to obtain. Thus, studies with only one type of disease sample (one phenotype) cannot be analyzed using common methods, such as significance analysis of microarrays, edgeR and limma. The RankComp algorithm, which was mainly developed to identify individual-level differentially expressed genes (DEGs), can be applied to identify population-level DEGs in one-phenotype data but cannot identify the dysregulation directions of DEGs.

Results: Here, we optimized the RankComp algorithm, yielding a method termed PhenoComp. Compared with RankComp, PhenoComp provided the dysregulation directions of DEGs and had more robust detection power in both simulated and real one-phenotype data. Moreover, using the DEGs detected by common methods as the ‘gold standard’, the DEGs detected by PhenoComp using only one-phenotype data were comparable to those identified by common methods using case-control samples, independent of the measurement platform. PhenoComp also performed well on data with weak differential expression signals.

Availability and implementation: The PhenoComp algorithm is available on the web at https://github.com/XJJ-student/PhenoComp.

Supplementary information: Supplementary data are available at Bioinformatics online.
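RankComp-style methods rest on within-sample relative expression orderings (REOs): gene pairs whose order (gene A above gene B) is highly stable across reference samples, and whose reversal in disease samples points to dysregulation. The sketch below finds stable pairs and measures how often each ordering is reversed in another set of samples. The threshold, function names and toy data are assumptions; PhenoComp's actual statistical tests for DEGs and their directions are not reproduced here.

```python
from itertools import combinations


def stable_pairs(samples, threshold=0.95):
    """Return (hi, lo) gene pairs with hi > lo in at least `threshold` of samples.

    samples: list of dicts mapping gene -> expression value.
    """
    genes = sorted(samples[0])
    n = len(samples)
    pairs = []
    for a, b in combinations(genes, 2):
        a_gt_b = sum(1 for s in samples if s[a] > s[b])
        b_gt_a = sum(1 for s in samples if s[b] > s[a])
        if a_gt_b / n >= threshold:
            pairs.append((a, b))
        elif b_gt_a / n >= threshold:
            pairs.append((b, a))
    return pairs


def reversal_fraction(pairs, samples):
    """Fraction of samples in which each stable (hi, lo) ordering is reversed."""
    return {(hi, lo): sum(1 for s in samples if s[lo] > s[hi]) / len(samples)
            for hi, lo in pairs}


if __name__ == "__main__":
    reference = [{"G1": 10, "G2": 5, "G3": 1}, {"G1": 9, "G2": 6, "G3": 2}]
    disease = [{"G1": 3, "G2": 5, "G3": 8}, {"G1": 9, "G2": 6, "G3": 2}]
    print(reversal_fraction(stable_pairs(reference), disease))
```

Because REOs compare genes within the same sample, they are insensitive to monotone batch or platform effects, which is consistent with the platform-independence reported above.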


2021 ◽  
Author(s):  
Amy M Mason ◽  
Stephen Burgess

Motivation: Mendelian randomisation methods that estimate non-linear exposure-outcome relationships typically require individual-level data. This package implements non-linear Mendelian randomisation methods using stratified summarised data, facilitating analyses where individual-level data cannot easily be shared, and additionally increasing reproducibility as summarised data can be reported. Dependence on summarised data means the methods are independent of the form of the individual-level data, increasing flexibility to different outcome types (such as continuous, binary, or time-to-event outcomes).

Implementation: SUMnlmr is available as an R package (version 3.1.0 or higher).

General features: The package implements the previously proposed fractional polynomial and piecewise linear methods on stratified summarised data that can either be estimated from individual-level data using the package or supplied by a collaborator. It constructs plots to visualise the estimated exposure-outcome relationship, and provides statistics to assess preference for a non-linear model over a linear model.

Availability: The package is freely available from GitHub (https://github.com/amymariemason/SUMnlmr).
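Stratified non-linear MR starts from stratum-specific summarised associations. A common building block is the per-stratum ratio estimate (gene-outcome association divided by gene-exposure association) with a first-order standard error; heterogeneity of these estimates across strata, measured here with Cochran's Q, is one informal signal that the exposure-outcome relationship may not be linear. This Python sketch is an illustration under those assumptions, not SUMnlmr's R interface, and it omits the fractional polynomial and piecewise linear fits themselves.

```python
def stratum_ratio_estimates(beta_y, se_y, beta_x):
    """Per-stratum ratio estimates with first-order standard errors.

    The first-order SE ignores uncertainty in the gene-exposure association.
    """
    est = [by / bx for by, bx in zip(beta_y, beta_x)]
    se = [sy / abs(bx) for sy, bx in zip(se_y, beta_x)]
    return est, se


def cochran_q(est, se):
    """Inverse-variance weighted mean and Cochran's Q heterogeneity statistic."""
    w = [1.0 / s ** 2 for s in se]
    pooled = sum(wi * e for wi, e in zip(w, est)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, est))
    return pooled, q


if __name__ == "__main__":
    # Hypothetical summarised associations for three exposure strata.
    est, se = stratum_ratio_estimates([0.2, 0.4, 0.6], [0.05, 0.05, 0.05],
                                      [0.1, 0.1, 0.1])
    print(est, se, cochran_q(est, se))
```

Under homogeneity, Q is approximately chi-squared with (number of strata minus one) degrees of freedom; a large Q, as in this toy example with steadily increasing stratum estimates, suggests the causal effect varies with the exposure level.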


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ruowang Li ◽  
Rui Duan ◽  
Xinyuan Zhang ◽  
Thomas Lumley ◽  
Sarah Pendergrass ◽  
...  

Increasingly, clinical phenotypes with matched genetic data from biobank-linked electronic health records (EHRs) have been used for pleiotropy analyses. Thus far, pleiotropy analysis using individual-level EHR data has been limited to data from one site. However, it is desirable to integrate EHR data from multiple sites to improve the detection power and generalizability of the results. Due to privacy concerns, individual-level patient data are not easily shared across institutions. To address this, we introduce Sum-Share, a method designed to efficiently integrate EHR and genetic data from multiple sites to perform pleiotropy analysis. Sum-Share requires only summary-level data and one round of communication from each site, yet it produces test statistics identical to those obtained from pooled individual-level data. Consequently, Sum-Share achieves lossless integration of multiple datasets. Using real EHR data from eMERGE, Sum-Share identified 1734 potential pleiotropic SNPs for five cardiovascular diseases.
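Summary-level sharing can be lossless when the statistic of interest depends on the data only through sums that add up across sites. The sketch below illustrates this general principle for a simple linear regression slope: each site shares five sums, and the combined sums reproduce the pooled estimate exactly. It illustrates additivity only; Sum-Share's actual pleiotropy test statistics and shared quantities are not reproduced, and the function names are hypothetical.

```python
def site_summary(x, y):
    """Additive sufficient statistics for a simple linear regression of y on x."""
    return {
        "n": len(x),
        "sx": sum(x),
        "sy": sum(y),
        "sxx": sum(v * v for v in x),
        "sxy": sum(a * b for a, b in zip(x, y)),
    }


def combine(summaries):
    """Add per-site sums; no individual-level values are needed."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    return {k: sum(s[k] for s in summaries) for k in keys}


def ols_slope(s):
    """OLS slope computed from the (possibly combined) sums."""
    n = s["n"]
    return (s["sxy"] - s["sx"] * s["sy"] / n) / (s["sxx"] - s["sx"] ** 2 / n)


if __name__ == "__main__":
    xa, ya = [0, 1, 2], [1, 3, 5]  # site A
    xb, yb = [1, 2], [2, 6]        # site B
    shared = ols_slope(combine([site_summary(xa, ya), site_summary(xb, yb)]))
    pooled = ols_slope(site_summary(xa + xb, ya + yb))
    print(shared, pooled)
```

The same one-round pattern extends to multivariable regression by sharing the matrices X'X and X'y, since both are sums over individuals.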


2017 ◽  
Vol 48 (2) ◽  
pp. 296-325 ◽  
Author(s):  
André Klima ◽  
Thomas Schlesinger ◽  
Paul W. Thurner ◽  
Helmut Küchenhoff

Our objective is to estimate voter transitions between two consecutive parliamentary elections. Usually, such analyses have been based either on individual survey data or on aggregated data. To move beyond these methods and their respective problems, we propose applying so-called hybrid models, which combine aggregate and individual data. We use a Bayesian approach and extend a multinomial-Dirichlet model proposed in the ecological inference literature. Our new hybrid model has been implemented in the R package eiwild (Ecological Inference with individual-level data). Based on extensive simulations, we show that our new estimator performs very well in many realistic scenarios. As an application, we study voter transitions between the Bavarian regional election and the 2013 German federal election in the Metropolitan City of Munich. Our approach is also applicable to other areas of electoral research, market research, and epidemiology.
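In a multinomial-Dirichlet model, each row of the voter transition matrix (the probabilities that a voter of one party moves to each party) has a Dirichlet prior, and individual-level survey counts update that row by simple addition. The sketch below shows only this conjugate update for the survey component; the prior values, party labels and counts are hypothetical, and the aggregate-data part of eiwild's hybrid model is not shown.

```python
def posterior_transitions(prior_alpha, survey_counts):
    """Conjugate Dirichlet update for each row of a transition matrix.

    prior_alpha: {origin_party: [Dirichlet parameter per destination party]}.
    survey_counts: {origin_party: [observed count per destination party]};
    origins missing from the survey keep their prior.
    """
    result = {}
    for origin, alpha in prior_alpha.items():
        counts = survey_counts.get(origin, [0] * len(alpha))
        post = [a + c for a, c in zip(alpha, counts)]
        total = sum(post)
        result[origin] = {"alpha": post, "mean": [p / total for p in post]}
    return result


if __name__ == "__main__":
    prior = {"A": [1, 1, 1], "B": [1, 1, 1]}  # uniform priors (hypothetical)
    survey = {"A": [8, 1, 1]}                  # toy survey counts for party A
    for origin, row in posterior_transitions(prior, survey).items():
        print(origin, row["alpha"], [round(m, 3) for m in row["mean"]])
```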


2020 ◽  
Vol 36 (9) ◽  
pp. 2856-2861
Author(s):  
Gabriel E Hoffman ◽  
Jaroslav Bendl ◽  
Kiran Girdhar ◽  
Panos Roussos

Motivation: Identifying correlated epigenetic features and finding differences in correlation between individuals with disease and controls can give novel insight into disease biology. This framework has been successful in the analysis of gene expression data, but its application to epigenetic data has been limited by computational cost, lack of scalable software and lack of robust statistical tests.

Results: Decorate (differential epigenetic correlation test) identifies correlated epigenetic features and finds clusters of features that are differentially correlated between two or more subsets of the data. The software scales to genome-wide datasets of epigenetic assays on hundreds of individuals. We apply decorate to four large-scale datasets of DNA methylation, ATAC-seq and histone modification ChIP-seq.

Availability and implementation: The decorate R package is available from https://github.com/GabrielHoffman/decorate.

Supplementary information: Supplementary data are available at Bioinformatics online.
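For a single pair of features and two groups, a classic way to test for a difference in correlation is Fisher's z-transformation. The sketch below uses only the Python standard library; it is a textbook two-sample test chosen for illustration and is not necessarily one of the cluster-level tests implemented in decorate. The input correlations and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 using Fisher's z-transformation.

    r1, r2: sample Pearson correlations; n1, n2: group sizes (each > 3).
    Returns (z statistic, two-sided p-value).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # variance-stabilising transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


if __name__ == "__main__":
    # Correlation 0.8 in cases vs 0.2 in controls, 103 individuals each.
    print(compare_correlations(0.8, 103, 0.2, 103))
```

Testing every feature pair this way scales quadratically with the number of features, which is one reason decorate first clusters correlated features and then tests clusters.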


2022 ◽  
Author(s):  
Laurence Howe ◽  
Humaira Rasheed ◽  
Paul R Jones ◽  
Dorret I Boomsma ◽  
David M Evans ◽  
...  

Previous Mendelian randomization (MR) studies using population samples (population-MR) have provided evidence for beneficial effects of educational attainment on health outcomes in adulthood. However, estimates from these studies may have been susceptible to bias from population stratification, assortative mating and indirect genetic effects due to unadjusted parental genotypes. Mendelian randomization using genetic association estimates derived from within-sibship models (within-sibship MR) can avoid these potential biases because genetic differences between siblings are due to random segregation at meiosis. Applying both population and within-sibship MR, we estimated the effects of genetic liability to educational attainment on body mass index (BMI), cigarette smoking, systolic blood pressure (SBP) and all-cause mortality. MR analyses used individual-level data on 72,932 siblings from UK Biobank and the Norwegian HUNT study and summary-level data from a within-sibship genome-wide association study including over 140,000 individuals. Both population and within-sibship MR estimates provided evidence that educational attainment influences BMI, cigarette smoking and SBP. Genetic variant-outcome associations attenuated in the within-sibship model, but genetic variant-educational attainment associations also attenuated to a similar extent. Thus, within-sibship and population MR estimates were largely consistent. The within-sibship MR estimate of education on mortality was imprecise but consistent with a putative effect. These results provide evidence of beneficial individual-level effects of education (or liability to education) on adulthood health, independent of potential demographic and family-level confounders.
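The argument that within-sibship and population MR estimates agree because both sets of associations attenuated rests on MR estimates being ratios of variant-outcome to variant-exposure associations. The sketch below computes a standard fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics and checks that scaling both associations by the same factor leaves it unchanged. Variable names and numbers are hypothetical, and the study itself may have used additional estimators.

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW MR estimate and its first-order standard error.

    beta_x: variant-exposure associations; beta_y: variant-outcome
    associations; se_y: standard errors of beta_y.
    """
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * bx * by for wi, bx, by in zip(w, beta_x, beta_y))
    den = sum(wi * bx ** 2 for wi, bx in zip(w, beta_x))
    return num / den, 1.0 / den ** 0.5


if __name__ == "__main__":
    bx = [0.1, 0.2, 0.05]
    by = [0.02, 0.04, 0.01]
    se = [0.01, 0.01, 0.02]
    c = 0.6  # hypothetical common attenuation factor
    print(ivw_estimate(bx, by, se))
    print(ivw_estimate([c * v for v in bx], [c * v for v in by], se))
```

The point estimate is unchanged by a common attenuation factor, but its standard error grows by 1/c, which mirrors the loss of precision typical of within-sibship estimates.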

