xQTLImp: efficient and accurate xQTL summary statistics imputation

Mapping Intimacies ◽

10.1101/726182 ◽

2019 ◽

Author(s):

Tao Wang ◽

Quanwei Yin ◽

Yongzhuang Liu ◽

Jin Chen ◽

Yadong Wang ◽

...

Keyword(s):

Gaussian Approximation ◽

Genomic Variation ◽

Supplementary Information ◽

Effective Sample Size ◽

Summary Statistics ◽

Individual Level ◽

Level Data ◽

Multiple Levels ◽

Trait Locus ◽

Missing Genotypes

Abstract Motivation Quantitative trait locus (QTL) analysis of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), has been widely used to infer the effects of genomic variation on multiple levels of molecular activities. However, the power of xQTL (various types of QTLs) detection is largely limited by missing association statistics due to missing genotypes and limited effective sample size. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes and molecular traits, which are rarely available. No implementation is available for imputing xQTL summary statistics when individual-level data are missing. Results We present xQTLImp, a C++ software package specifically designed for efficient imputation of xQTL summary statistics based on multivariate Gaussian approximation. Experiments on a single-cell eQTL dataset demonstrate that a considerable number of novel significant eQTL associations can be rediscovered by xQTLImp. Availability Software is available at https://github.com/hitbc/[email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
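The multivariate Gaussian approximation mentioned in the abstract above can be illustrated with a small sketch. It assumes the general summary-statistics imputation idea: Z-scores of variants in a region are treated as jointly Gaussian with covariance equal to the LD (correlation) matrix, so the Z-score of an untyped variant is the conditional mean given the typed ones. The function names (`solve`, `impute_z`) and the `ridge` parameter are hypothetical and are not taken from the xQTLImp code.

```python
# Sketch of summary-statistics imputation under a multivariate Gaussian model.
# All names here are illustrative; this is not the xQTLImp implementation.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def impute_z(z_obs, ld_obs, ld_cross, ridge=0.0):
    """Impute the Z-score of one untyped variant.

    z_obs    : Z-scores of the typed variants
    ld_obs   : LD correlation matrix among the typed variants
    ld_cross : LD between the untyped variant and each typed variant
    ridge    : value added to the diagonal of ld_obs for numerical stability
    Returns (imputed Z-score, variance of the imputed Z-score under the model).
    """
    n = len(z_obs)
    reg = [[ld_obs[i][j] + (ridge if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    # ld_obs is symmetric, so solving reg w = ld_cross yields the weights.
    w = solve(reg, ld_cross)
    z_imp = sum(wi * zi for wi, zi in zip(w, z_obs))
    # w' R w: how much of the untyped variant's signal the typed ones capture.
    var_imp = sum(w[i] * ld_obs[i][j] * w[j] for i in range(n) for j in range(n))
    return z_imp, var_imp


z, r2 = impute_z([3.0, 3.0], [[1.0, 0.5], [0.5, 1.0]], [0.5, 0.5])
print(z, r2)
```

The second return value is often used as a quality filter: an imputed statistic with a small model variance carries little information from its neighbours and would typically be discarded.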

Risk Projection for Time-to-event Outcome Leveraging Summary Statistics With Source Individual-level Data

Journal of the American Statistical Association ◽

10.1080/01621459.2021.1895810 ◽

2021 ◽

pp. 1-34

Author(s):

Jiayin Zheng ◽

Yingye Zheng ◽

Li Hsu

Keyword(s):

Summary Statistics ◽

Time To Event ◽

Individual Level ◽

Level Data ◽

Risk Projection

Equivalence of LD-Score Regression and Individual-Level-Data Methods

10.1101/211821 ◽

2017 ◽

Cited By ~ 8

Author(s):

Ronald de Vlaming ◽

Magnus Johannesson ◽

Patrik K.E. Magnusson ◽

M. Arfan Ikram ◽

Peter M. Visscher

Keyword(s):

Maximum Likelihood ◽

Recent Work ◽

Principal Components ◽

Population Stratification ◽

Summary Statistics ◽

Test Statistics ◽

Ceteris Paribus ◽

Individual Level ◽

Level Data ◽

Genomic Relatedness

Abstract LD-score (LDSC) regression disentangles the contribution of polygenic signal, in terms of SNP-based heritability, and population stratification, in terms of a so-called intercept, to GWAS test statistics. Whereas LDSC regression uses summary statistics, methods like Haseman-Elston (HE) regression and genomic-relatedness-matrix (GRM) restricted maximum likelihood infer parameters such as SNP-based heritability from individual-level data directly. Therefore, these two types of methods are typically considered to be profoundly different. Nevertheless, recent work has revealed that LDSC and HE regression yield near-identical SNP-based heritability estimates when confounding stratification is absent. We now extend the equivalence; under the stratification assumed by LDSC regression, we show that the intercept can be estimated from individual-level data by transforming the coefficients of a regression of the phenotype on the leading principal components from the GRM. Using simulations, considering various degrees and forms of population stratification, we find that intercept estimates obtained from individual-level data are nearly equivalent to estimates from LDSC regression (R² > 99%). An empirical application corroborates these findings. Hence, LDSC regression is not profoundly different from methods using individual-level data; parameters that are identified by LDSC regression are also identified by methods using individual-level data. In addition, our results indicate that, under strong stratification, there is misattribution of stratification to the slope of LDSC regression, inflating estimates of SNP-based heritability from LDSC regression ceteris paribus. Hence, the intercept is not a panacea for population stratification. Consequently, LDSC-regression estimates should be interpreted with caution, especially when the intercept estimate is significantly greater than one.
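To make the slope and intercept in the abstract above concrete, here is a minimal sketch of the LDSC regression model, in which the expected chi-square statistic of SNP j is intercept + (N h² / M) × (LD score of SNP j). The sketch uses plain unweighted least squares, which is a simplification: the published LDSC software uses weighted regression and a block jackknife for standard errors. The function name `ldsc_fit` and the toy numbers are illustrative assumptions.

```python
# Sketch: unweighted LD-score regression on toy data.
# ldsc_fit is a hypothetical helper, not part of any LDSC package.

def ldsc_fit(chi2, ld_scores, n_samples, n_snps):
    """Regress GWAS chi-square statistics on LD scores.

    Returns (intercept, h2), where h2 = slope * n_snps / n_samples.
    """
    k = len(chi2)
    mean_x = sum(ld_scores) / k
    mean_y = sum(chi2) / k
    sxx = sum((x - mean_x) ** 2 for x in ld_scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    h2 = slope * n_snps / n_samples
    return intercept, h2


# Toy data generated exactly from the model with intercept 1.05 and slope 0.02.
ld = [10.0, 20.0, 30.0, 40.0]
chi2 = [1.05 + 0.02 * l for l in ld]
intercept, h2 = ldsc_fit(chi2, ld, n_samples=50_000, n_snps=1_000_000)
print(intercept, h2)
```

An intercept above one in this fit is what the abstract reads as a sign of stratification, while the slope carries the heritability estimate.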

CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies

10.1101/652263 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yi Yang ◽

Xingjie Shi ◽

Yuling Jiao ◽

Jian Huang ◽

Min Chen ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variants ◽

Complex Traits ◽

Mixed Model ◽

Association Studies ◽

GWAS Data ◽

Supplementary Information ◽

Summary Statistics ◽

Individual Level ◽

The Relationship

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) [42] was proposed to jointly interrogate genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data. [email protected]
Availability and implementation The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.

Mendelian randomization while jointly modeling cis genetics identifies causal relationships between gene expression and lipids

Nature Communications ◽

10.1038/s41467-020-18716-x ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Adriaan van der Graaf ◽

Annique Claringbould ◽

Antoine Rimbert ◽

Harm-Jan Westra ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Mendelian Randomization ◽

Low Density Lipoprotein ◽

Density Lipoprotein ◽

Summary Statistics ◽

Low Density Lipoprotein Cholesterol ◽

Causal Inferences ◽

Individual Level ◽

Level Data

Abstract Inference of causality between gene expression and complex traits using Mendelian randomization (MR) is confounded by pleiotropy and linkage disequilibrium (LD) of gene-expression quantitative trait loci (eQTL). Here, we propose an MR method, MR-link, that accounts for unobserved pleiotropy and LD by leveraging information from individual-level data, even when only one eQTL variant is present. In simulations, MR-link shows false-positive rates close to expectation (median 0.05) and high power (up to 0.89), outperforming all other tested MR methods and coloc. Application of MR-link to low-density lipoprotein cholesterol (LDL-C) measurements in 12,449 individuals with expression and protein QTL summary statistics from blood and liver identifies 25 genes causally linked to LDL-C. These include the known SORT1 and ApoE genes as well as PVRL2, located in the APOE locus, for which a causal role in liver was not known. Our results showcase the strength of MR-link for transcriptome-wide causal inferences.

PheWAS-ME: a web-app for interactive exploration of multimorbidity patterns in PheWAS

Bioinformatics ◽

10.1093/bioinformatics/btaa870 ◽

2020 ◽

Author(s):

Nick Strayer ◽

Jana K Shirey-Rice ◽

Yu Shyr ◽

Joshua C Denny ◽

Jill M Pulley ◽

...

Keyword(s):

Genetic Variant ◽

Statistical Tests ◽

R Package ◽

Supplementary Information ◽

Health Records ◽

Individual Level ◽

Level Data ◽

Phenotype Data ◽

Tests Of Association ◽

Web App

Abstract Summary Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for the evaluation of relationships between genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene–disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. We present PheWAS-ME: an interactive dashboard to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data. Availability and implementation A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. Sample datasets are provided for exploration with the option to upload custom PheWAS results and corresponding individual-level data. Online versions of the appendices are available at https://prod.tbilab.org/phewas_me_info/. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer). Supplementary information Supplementary data are available at Bioinformatics online.

Approximate conditional phenotype analysis based on genome wide association summary statistics

Scientific Reports ◽

10.1038/s41598-021-82000-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Peitao Wu ◽

Biqi Wang ◽

Steven A. Lubitz ◽

Emelia J. Benjamin ◽

James B. Meigs ◽

...

Keyword(s):

Large Scale ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Summary Statistics ◽

Phenotypic Data ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

A Genome ◽

Phenotype Analysis

Abstract Because single genetic variants may have pleiotropic effects, one trait can be a confounder in a genome-wide association study (GWAS) that aims to identify loci associated with another trait. A typical approach to address this issue is to perform an additional analysis adjusting for the confounder. However, obtaining conditional results can be time-consuming. We propose an approximate conditional phenotype analysis based on GWAS summary statistics, the covariance between outcome and confounder, and the variant minor allele frequency (MAF). GWAS summary statistics and MAF are taken from GWAS meta-analysis results, while the trait covariance may be estimated by two strategies: (i) estimates from a subset of the phenotypic data; or (ii) estimates from published studies. We compare our two strategies with estimates using individual-level data from the full GWAS sample (gold standard). A simulation study for both binary and continuous traits demonstrates that our approximate approach is accurate. We apply our method to the Framingham Heart Study (FHS) GWAS and to large-scale cardiometabolic GWAS results. We observed high consistency of genetic effect size estimates between our method and individual-level data analysis. Our approach leads to an efficient way to perform approximate conditional analysis using large-scale GWAS summary statistics.
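The ingredients listed in the abstract above (marginal effect estimates, the outcome-confounder covariance, and MAF) are enough to recover a confounder-adjusted effect through ordinary least-squares algebra. The sketch below is one such derivation and is not necessarily the authors' estimator. It assumes continuous traits, both marginal effects estimated in the same sample, additive genotype coding, and Hardy-Weinberg equilibrium so that the genotype variance is 2 × MAF × (1 − MAF). It returns a point estimate only. The function name `conditional_beta` is hypothetical.

```python
# Sketch: confounder-adjusted SNP effect from summary-level quantities.
# conditional_beta is an illustrative helper, not the published method's code.

def conditional_beta(beta_y, beta_c, cov_yc, var_c, maf):
    """Effect of a SNP on outcome Y after adjusting for confounder C.

    beta_y : marginal SNP effect on the outcome
    beta_c : marginal SNP effect on the confounder
    cov_yc : covariance between outcome and confounder
    var_c  : variance of the confounder
    maf    : minor allele frequency of the SNP
    """
    var_g = 2.0 * maf * (1.0 - maf)  # genotype variance under HWE (assumption)
    # Covariance and variance left over once the SNP is regressed out.
    cov_resid = cov_yc - beta_y * beta_c * var_g
    var_resid = var_c - beta_c ** 2 * var_g
    gamma = cov_resid / var_resid  # coefficient of C in the joint model
    return beta_y - gamma * beta_c


adjusted = conditional_beta(beta_y=0.2, beta_c=0.5, cov_yc=0.3, var_c=1.0, maf=0.5)
print(adjusted)
```

When the SNP has no effect on the confounder (`beta_c = 0`), the adjusted effect equals the marginal one, which is a quick sanity check on the formula.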

Improved polygenic prediction by Bayesian multiple regression on summary statistics

Nature Communications ◽

10.1038/s41467-019-12653-0 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 34

Author(s):

Luke R. Lloyd-Jones ◽

Jian Zeng ◽

Julia Sidorenko ◽

Loïc Yengo ◽

Gerhard Moser ◽

...

Keyword(s):

Multiple Regression ◽

Association Studies ◽

Meta Analysis ◽

Multiple Regression Model ◽

Data Sets ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Individual Level ◽

Level Data ◽

The UK

Abstract Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
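For readers unfamiliar with the clumping and p value thresholding baseline named in the abstract above, the following sketch shows the basic idea: keep the most significant SNP, drop SNPs in high LD with an already kept SNP, and build a score from the survivors. It is a simplified illustration and not SBayesR or any specific tool. Real clumping also restricts comparisons to a physical distance window. The SNP identifiers, thresholds and function names are all hypothetical.

```python
# Sketch: greedy LD clumping, p-value thresholding, and a polygenic score.
# Data and names are made up for illustration.

def clump_and_threshold(snps, ld_r2, r2_max=0.1, p_max=5e-8):
    """Select approximately independent, significant SNPs.

    snps  : dict mapping SNP name -> (effect size, p value)
    ld_r2 : dict mapping frozenset({a, b}) -> r2; missing pairs count as 0
    Returns a dict mapping retained SNP name -> effect size.
    """
    kept = {}
    for name in sorted(snps, key=lambda s: snps[s][1]):
        beta, p = snps[name]
        if p > p_max:
            break  # SNPs are sorted by p value, so the rest fail as well
        if all(ld_r2.get(frozenset((name, k)), 0.0) <= r2_max for k in kept):
            kept[name] = beta
    return kept


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages; SNPs without a dosage contribute 0."""
    return sum(weights[s] * dosages.get(s, 0.0) for s in weights)


snps = {
    "snpA": (0.10, 1e-12),
    "snpB": (0.08, 1e-9),   # in strong LD with snpA, so it is clumped away
    "snpC": (-0.05, 1e-8),
    "snpD": (0.02, 0.01),   # fails the p value threshold
}
ld = {frozenset(("snpA", "snpB")): 0.8}
weights = clump_and_threshold(snps, ld)
score = polygenic_score({"snpA": 2, "snpC": 1}, weights)
print(weights, score)
```

Methods such as SBayesR and LDpred differ from this baseline by modelling all SNP effects jointly rather than discarding correlated or sub-threshold SNPs.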

Genomic Prediction Using Individual-Level Data and Summary Statistics from Multiple Populations

Genetics ◽

10.1534/genetics.118.301109 ◽

2018 ◽

Vol 210 (1) ◽

pp. 53-69 ◽

Cited By ~ 7

Author(s):

Jeremie Vandenplas ◽

Mario P. L. Calus ◽

Gregor Gorjanc

Keyword(s):

Genomic Prediction ◽

Summary Statistics ◽

Multiple Populations ◽

Individual Level ◽

Level Data

CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btz880 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2009-2016 ◽

Cited By ~ 6

Author(s):

Yi Yang ◽

Xingjie Shi ◽

Yuling Jiao ◽

Jian Huang ◽

Min Chen ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variants ◽

Complex Traits ◽

Mixed Model ◽

Association Studies ◽

GWAS Data ◽

Supplementary Information ◽

Summary Statistics ◽

Individual Level ◽

The Relationship

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data.
Availability and implementation The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.

An Approach to Identify New Pleiotropic Genetic Loci From Publicly Available Univariate GWAS Results

Innovation in Aging ◽

10.1093/geroni/igaa057.426 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 130-130

Author(s):

Yury Loika ◽

Alexander Kulminski

Keyword(s):

Complex Traits ◽

Metabolic Networks ◽

Association Studies ◽

Density Lipoprotein ◽

Reproductive Age ◽

Summary Statistics ◽

Omnibus Test ◽

Individual Level ◽

Age Related ◽

Level Data

Abstract The connections between genes and multifactorial polygenic age-related traits are not trivial due to the complexity of metabolic networks in an organism, which were primarily adapted to maximize fitness at reproductive age in ancient environments. Given this complexity, pleiotropy in predisposition to complex traits appears to be a common phenomenon. Identifying mechanisms of pleiotropic predisposition to multiple age-related traits can be a key factor in developing strategies for extending health-span and lifespan. Correlation between complex traits may be a factor shedding light on these mechanisms. Recently, we used an omnibus test leveraging correlation between multiple age-related traits to gain insights into pleiotropic predisposition to them. The analysis using individual-level data identified a large number of new pleiotropic loci and highlighted a novel phenomenon of antagonistic genetic heterogeneity, which was characterized by antagonistic directions of genetic effects for directly correlated traits. Here, we demonstrate the feasibility of our approach using summary statistics from univariate genome-wide (GW) association studies (GWAS). Our analysis focused on the results for high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) from the Global Lipids Genetic Consortium, which reported 94 GW significant loci (p ≤ 5×10⁻⁸). The traits’ correlation was estimated from individual-level data. Our approach identified 28 loci with pleiotropic predisposition to HDL-C and TG at p ≤ 5×10⁻⁸, which did not attain univariate GW significance with either of these traits. Fifteen of them (53%) demonstrated antagonistic heterogeneity. These results show that our approach can be efficiently used in the analysis of summary statistics from published studies to identify novel pleiotropic loci.
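The way an omnibus test can flag loci that miss univariate significance, as described in the abstract above, can be sketched with a generic two-trait chi-square statistic. This is an illustration under stated assumptions and may differ from the authors' exact test: the two univariate Z-scores are taken to be bivariate normal under the null with correlation `r` equal to the trait correlation, so z' R⁻¹ z follows a chi-square distribution with 2 degrees of freedom, whose tail probability is exp(−x/2). The function name and the example Z-scores are hypothetical.

```python
# Sketch: two-trait omnibus test from univariate Z-scores.
# omnibus_two_traits is an illustrative helper, not the published method's code.
import math


def omnibus_two_traits(z1, z2, r):
    """Return (statistic, p value) for the joint test of two Z-scores.

    r is the correlation between the two Z-scores under the null,
    here assumed equal to the correlation between the traits.
    """
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    p = math.exp(-stat / 2.0)  # chi-square survival function with 2 df
    return stat, p


# |z| = 5 corresponds to a univariate p value above 5e-8 for each trait.
stat_opp, p_opp = omnibus_two_traits(5.0, -5.0, r=0.4)   # opposite directions
stat_same, p_same = omnibus_two_traits(5.0, 5.0, r=0.4)  # same direction
print(stat_opp, p_opp, stat_same, p_same)
```

With positively correlated traits, effects in opposite directions produce the larger statistic, which is the pattern the abstract calls antagonistic heterogeneity.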