Risk Projection for Time-to-event Outcome Leveraging Summary Statistics With Source Individual-level Data

Abstract LD-score (LDSC) regression disentangles the contribution of polygenic signal, in terms of SNP-based heritability, and population stratification, in terms of a so-called intercept, to GWAS test statistics. Whereas LDSC regression uses summary statistics, methods like Haseman-Elston (HE) regression and genomic-relatedness-matrix (GRM) restricted maximum likelihood infer parameters such as SNP-based heritability from individual-level data directly. Therefore, these two types of methods are typically considered to be profoundly different. Nevertheless, recent work has revealed that LDSC and HE regression yield near-identical SNP-based heritability estimates when confounding stratification is absent. We now extend the equivalence: under the stratification assumed by LDSC regression, we show that the intercept can be estimated from individual-level data by transforming the coefficients of a regression of the phenotype on the leading principal components from the GRM. Using simulations that consider various degrees and forms of population stratification, we find that intercept estimates obtained from individual-level data are nearly equivalent to estimates from LDSC regression (R² > 99%). An empirical application corroborates these findings. Hence, LDSC regression is not profoundly different from methods using individual-level data; parameters that are identified by LDSC regression are also identified by methods using individual-level data. In addition, our results indicate that, under strong stratification, part of the stratification is misattributed to the slope of LDSC regression, inflating estimates of SNP-based heritability from LDSC regression, ceteris paribus. Hence, the intercept is not a panacea for population stratification. Consequently, LDSC-regression estimates should be interpreted with caution, especially when the intercept estimate is significantly greater than one.
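To make the slope and intercept concrete: LDSC regression models the expected χ² statistic of SNP j as E[χ²_j] = N·h²·ℓ_j / M + N·a + 1, where ℓ_j is the LD score, N the sample size, M the number of SNPs, and N·a + 1 the intercept that absorbs confounding such as stratification. The sketch below is a simplified illustration under stated assumptions: it uses plain unweighted least squares, whereas the LDSC software uses regression weights and a block jackknife, and the function name `ldsc_fit` and the toy numbers are hypothetical.

```python
import numpy as np

def ldsc_fit(chi2, ld_scores, n, m):
    """Unweighted OLS sketch of LD-score regression (hypothetical helper).

    Regresses chi-square statistics on LD scores and converts the slope to
    SNP-based heritability via slope = n * h2 / m. Returns (h2, intercept).
    The real LDSC method also uses regression weights and a block jackknife.
    """
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    h2 = slope * m / n
    return h2, intercept

# Toy example (hypothetical numbers): noise-free chi-square statistics generated
# from the LDSC expectation with h2 = 0.5 and an intercept of 1.05.
rng = np.random.default_rng(0)
n, m = 10_000, 1_000
ld_scores = rng.uniform(1.0, 50.0, size=m)
chi2 = 1.05 + n * 0.5 / m * ld_scores
h2_hat, intercept_hat = ldsc_fit(chi2, ld_scores, n, m)
print(round(h2_hat, 6), round(intercept_hat, 6))  # expected: 0.5 1.05
```

With real GWAS statistics the χ² values are noisy, and the abstract's caution applies: if stratification leaks into the slope, the h² recovered this way is inflated even when the intercept looks reasonable.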

ISSUES OF CONVENTIONAL META-ANALYSIS FOR TIME-TO-EVENT OUTCOMES: REANALYSIS RESULTS OF THE EFFICACY AND SAFETY OF NOVEL ORAL ANTICOAGULANTS VERSUS WARFARIN IN PATIENTS WITH ATRIAL FIBRILLATION USING RECONSTRUCTED INDIVIDUAL-LEVEL DATA

Journal of the American College of Cardiology ◽

10.1016/s0735-1097(16)31950-7 ◽

2016 ◽

Vol 67 (13) ◽

pp. 1949

Author(s):

Masayuki Kaneko ◽

Hajime Uno

Keyword(s):

Atrial Fibrillation ◽

Meta Analysis ◽

Oral Anticoagulants ◽

Novel Oral Anticoagulants ◽

Efficacy And Safety ◽

Time To Event ◽

Individual Level ◽

Level Data

Mendelian randomization while jointly modeling cis genetics identifies causal relationships between gene expression and lipids

Nature Communications ◽

10.1038/s41467-020-18716-x ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Adriaan van der Graaf ◽

Annique Claringbould ◽

Antoine Rimbert ◽

Harm-Jan Westra ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Mendelian Randomization ◽

Low Density Lipoprotein ◽

Density Lipoprotein ◽

Summary Statistics ◽

Low Density Lipoprotein Cholesterol ◽

Causal Inferences ◽

Individual Level ◽

Level Data

Abstract Inference of causality between gene expression and complex traits using Mendelian randomization (MR) is confounded by pleiotropy and linkage disequilibrium (LD) of gene-expression quantitative trait loci (eQTL). Here, we propose an MR method, MR-link, that accounts for unobserved pleiotropy and LD by leveraging information from individual-level data, even when only one eQTL variant is present. In simulations, MR-link shows false-positive rates close to expectation (median 0.05) and high power (up to 0.89), outperforming all other tested MR methods and coloc. Application of MR-link to low-density lipoprotein cholesterol (LDL-C) measurements in 12,449 individuals with expression and protein QTL summary statistics from blood and liver identifies 25 genes causally linked to LDL-C. These include the known SORT1 and APOE genes as well as PVRL2, located in the APOE locus, for which a causal role in liver was not known. Our results showcase the strength of MR-link for transcriptome-wide causal inferences.

Approximate conditional phenotype analysis based on genome wide association summary statistics

Scientific Reports ◽

10.1038/s41598-021-82000-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Peitao Wu ◽

Biqi Wang ◽

Steven A. Lubitz ◽

Emelia J. Benjamin ◽

James B. Meigs ◽

...

Keyword(s):

Large Scale ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Summary Statistics ◽

Phenotypic Data ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

A Genome ◽

Phenotype Analysis

Abstract Because single genetic variants may have pleiotropic effects, one trait can be a confounder in a genome-wide association study (GWAS) that aims to identify loci associated with another trait. A typical approach to address this issue is to perform an additional analysis adjusting for the confounder. However, obtaining conditional results can be time-consuming. We propose an approximate conditional phenotype analysis based on GWAS summary statistics, the covariance between outcome and confounder, and the variant minor allele frequency (MAF). GWAS summary statistics and MAF are taken from GWAS meta-analysis results, while the traits' covariance may be estimated by two strategies: (i) estimates from a subset of the phenotypic data; or (ii) estimates from published studies. We compare our two strategies with estimates using individual-level data from the full GWAS sample (gold standard). A simulation study for both binary and continuous traits demonstrates that our approximate approach is accurate. We apply our method to the Framingham Heart Study (FHS) GWAS and to large-scale cardiometabolic GWAS results. We observed high consistency of genetic effect size estimates between our method and individual-level data analysis. Our approach provides an efficient way to perform approximate conditional analysis using large-scale GWAS summary statistics.
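For a continuous outcome Y and confounder C, the inputs named above are enough to approximate the conditional SNP effect through the standard two-predictor least-squares identity β_{Y|C} = (β_Y·Var(C) − β_C·Cov(Y,C)) / (Var(C) − β_C²·Var(G)), where β_Y and β_C are the marginal GWAS effects and Var(G) = 2p(1−p) under Hardy-Weinberg equilibrium for MAF p. This is a minimal sketch of that identity, not necessarily the paper's exact estimator (which also handles binary traits); the function names and the simulated data are hypothetical. The simulation checks the identity against a full regression on individual-level data.

```python
import numpy as np

def genotype_variance(maf):
    """Variance of an additively coded SNP (0/1/2) under Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)

def conditional_beta(beta_y, beta_c, var_c, cov_yc, var_g):
    """Effect of the SNP on outcome Y adjusted for confounder C (hypothetical helper).

    beta_y, beta_c: marginal GWAS effects of the SNP on Y and on C.
    var_c, cov_yc:  phenotypic variance of C and covariance of Y with C.
    var_g:          genotype variance, e.g. genotype_variance(maf).
    """
    return (beta_y * var_c - beta_c * cov_yc) / (var_c - beta_c**2 * var_g)

# Simulated (hypothetical) individual-level data used only to check the identity.
rng = np.random.default_rng(1)
n = 5_000
g = rng.binomial(2, 0.3, size=n).astype(float)
c = 0.5 * g + rng.normal(size=n)            # confounder influenced by the SNP
y = 0.2 * g + 0.8 * c + rng.normal(size=n)  # outcome influenced by SNP and confounder

cov = np.cov(np.vstack([g, c, y]))          # rows/columns ordered g, c, y
var_g, var_c, cov_yc = cov[0, 0], cov[1, 1], cov[1, 2]
beta_y = cov[0, 2] / var_g                  # marginal effects, as in summary statistics
beta_c = cov[0, 1] / var_g

approx = conditional_beta(beta_y, beta_c, var_c, cov_yc, var_g)
design = np.column_stack([np.ones(n), g, c])
exact = np.linalg.lstsq(design, y, rcond=None)[0][1]
print(approx, exact)  # equal up to floating-point rounding

# Summary-statistic setting: Var(G) taken from the MAF alone, so the result is
# close to, but not identical with, the individual-level estimate.
approx_from_maf = conditional_beta(beta_y, beta_c, var_c, cov_yc, genotype_variance(0.3))
```

Using the sample genotype variance makes the identity exact; substituting 2p(1−p) is the approximation that lets the calculation run on summary statistics plus MAF.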

Improved polygenic prediction by Bayesian multiple regression on summary statistics

Nature Communications ◽

10.1038/s41467-019-12653-0 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 34

Author(s):

Luke R. Lloyd-Jones ◽

Jian Zeng ◽

Julia Sidorenko ◽

Loïc Yengo ◽

Gerhard Moser ◽

...

Keyword(s):

Multiple Regression ◽

Association Studies ◽

Meta Analysis ◽

Multiple Regression Model ◽

Data Sets ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Individual Level ◽

Level Data ◽

The UK

Abstract Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to SBayesR, a version that utilises summary statistics from genome-wide association studies (GWAS). In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.

xQTLImp: efficient and accurate xQTL summary statistics imputation

10.1101/726182 ◽

2019 ◽

Author(s):

Tao Wang ◽

Quanwei Yin ◽

Yongzhuang Liu ◽

Jin Chen ◽

Yadong Wang ◽

...

Keyword(s):

Gaussian Approximation ◽

Genomic Variation ◽

Supplementary Information ◽

Effective Sample Size ◽

Summary Statistics ◽

Individual Level ◽

Level Data ◽

Multiple Levels ◽

Trait Locus ◽

Missing Genotypes

Abstract
Motivation: Quantitative trait locus (QTL) analysis of multiomic molecular traits, such as gene transcription (eQTL), DNA methylation (mQTL) and histone modification (haQTL), has been widely used to infer the effects of genomic variation on multiple levels of molecular activity. However, the power of xQTL (various types of QTLs) detection is largely limited by association statistics that are missing because of missing genotypes and limited effective sample size. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes and molecular traits, which are rarely available. No implementation exists for imputing xQTL summary statistics when individual-level data are missing.
Results: We present xQTLImp, a C++ software package specifically designed for efficient imputation of xQTL summary statistics based on a multivariate Gaussian approximation. Experiments on a single-cell eQTL dataset demonstrate that a considerable number of novel significant eQTL associations can be rediscovered by xQTLImp.
Availability: Software is available at https://github.com/hitbc/…
Supplementary information: Supplementary data are available at Bioinformatics online.
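Summary-statistic imputation under a multivariate Gaussian approximation typically predicts the z-scores of untyped variants u from observed z-scores o and an LD (correlation) matrix Σ as the conditional mean ẑ_u = Σ_uo (Σ_oo + λI)⁻¹ z_o, with a small ridge λ for numerical stability and Σ_uo (Σ_oo + λI)⁻¹ Σ_ou as a per-variant imputation-quality measure. The sketch below shows that generic formula only; it is not xQTLImp's C++ implementation, and the function name, the ridge parameter and the toy LD matrix are assumptions.

```python
import numpy as np

def impute_z(ld, z_obs, obs_idx, miss_idx, ridge=0.0):
    """Impute z-scores at miss_idx from observed z-scores at obs_idx (hypothetical helper).

    ld: full SNP-by-SNP correlation matrix, e.g. from a reference panel.
    Returns (imputed z-scores, imputation quality r2 for each imputed SNP).
    """
    s_oo = ld[np.ix_(obs_idx, obs_idx)] + ridge * np.eye(len(obs_idx))
    s_uo = ld[np.ix_(miss_idx, obs_idx)]
    weights = np.linalg.solve(s_oo, s_uo.T).T  # Sigma_uo @ inv(Sigma_oo + ridge*I)
    z_imp = weights @ z_obs
    r2 = np.einsum("ij,ij->i", weights, s_uo)  # diag(Sigma_uo inv(Sigma_oo) Sigma_ou)
    return z_imp, r2

# Toy example: SNP 0 is observed with z = 3, SNP 1 is missing, LD r = 0.8.
ld = np.array([[1.0, 0.8], [0.8, 1.0]])
z_imp, r2 = impute_z(ld, np.array([3.0]), obs_idx=[0], miss_idx=[1])
print(z_imp, r2)  # approximately [2.4] and [0.64]
```

The ridge term matters in practice because reference-panel LD matrices are often near-singular; here it defaults to zero so the toy result matches the hand calculation 0.8 × 3.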

Genomic Prediction Using Individual-Level Data and Summary Statistics from Multiple Populations

Genetics ◽

10.1534/genetics.118.301109 ◽

2018 ◽

Vol 210 (1) ◽

pp. 53-69 ◽

Cited By ~ 7

Author(s):

Jeremie Vandenplas ◽

Mario P. L. Calus ◽

Gregor Gorjanc

Keyword(s):

Genomic Prediction ◽

Summary Statistics ◽

Multiple Populations ◽

Individual Level ◽

Level Data

An Approach to Identify New Pleiotropic Genetic Loci From Publicly Available Univariate GWAS Results

Innovation in Aging ◽

10.1093/geroni/igaa057.426 ◽

2020 ◽

Vol 4 (Supplement_1) ◽

pp. 130-130

Author(s):

Yury Loika ◽

Alexander Kulminski

Keyword(s):

Complex Traits ◽

Metabolic Networks ◽

Association Studies ◽

Density Lipoprotein ◽

Reproductive Age ◽

Summary Statistics ◽

Omnibus Test ◽

Individual Level ◽

Age Related ◽

Level Data

Abstract The connections between genes and multifactorial polygenic age-related traits are not trivial due to the complexity of metabolic networks in an organism, which were primarily adapted to maximize fitness at reproductive age in ancient environments. Given this complexity, pleiotropy in predisposition to complex traits appears to be a common phenomenon. Identifying mechanisms of pleiotropic predisposition to multiple age-related traits can be a key factor in developing strategies for extending health span and lifespan. Correlation between complex traits may shed light on these mechanisms. Recently, we used an omnibus test leveraging the correlation between multiple age-related traits to gain insights into pleiotropic predisposition to them. The analysis using individual-level data identified a large number of new pleiotropic loci and highlighted a novel phenomenon of antagonistic genetic heterogeneity, characterized by opposite directions of genetic effects for directly correlated traits. Here, we demonstrate the feasibility of our approach using summary statistics from univariate genome-wide (GW) association studies (GWAS). Our analysis focused on the results for high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG) from the Global Lipids Genetics Consortium, which reported 94 GW significant loci (p ≤ 5×10⁻⁸). The traits' correlation was estimated from individual-level data. Our approach identified 28 loci with pleiotropic predisposition to HDL-C and TG at p ≤ 5×10⁻⁸ that did not attain univariate GW significance for either trait. Fifteen of them (53%) demonstrated antagonistic heterogeneity. These results show that our approach can be used efficiently to analyze summary statistics from published studies and identify novel pleiotropic loci.
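One common form of such an omnibus test for two traits combines a variant's z-scores z = (z₁, z₂) with their correlation r under the null (for overlapping samples, approximately the trait correlation estimated from individual-level data): T = zᵀR⁻¹z follows a χ² distribution with 2 degrees of freedom under the null, so p = exp(−T/2). This is a hedged sketch; the authors' exact statistic may differ, and the function name and z-scores are hypothetical. It illustrates why opposite-sign effects on positively correlated traits (antagonistic heterogeneity) can reach genome-wide significance when neither univariate result does.

```python
import math

def omnibus_two_traits(z1, z2, r):
    """Chi-square (2 df) omnibus statistic z^T R^{-1} z for two correlated z-scores.

    r is the correlation between the two statistics under the null, here
    assumed to equal the trait correlation. Returns (T, p).
    """
    t = (z1**2 - 2.0 * r * z1 * z2 + z2**2) / (1.0 - r**2)
    p = math.exp(-t / 2.0)  # survival function of a chi-square with 2 df
    return t, p

# Hypothetical variant with |z| = 4 on both traits (univariate p ~ 6e-5 each),
# for traits whose statistics correlate at r = 0.4.
same_sign = omnibus_two_traits(4.0, 4.0, r=0.4)
opposite_sign = omnibus_two_traits(4.0, -4.0, r=0.4)
print(same_sign[1], opposite_sign[1])  # roughly 1e-5 versus 3e-12
```

Opposite-sign effects run against the direction that the trait correlation predicts, so the correlated noise cannot explain them and the joint statistic grows; same-sign effects are partly explained by that correlation and the statistic shrinks.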

Bayesian large-scale multiple regression with summary statistics from genome-wide association studies

10.1101/042457 ◽

2016 ◽

Cited By ~ 5

Author(s):

Xiang Zhu ◽

Matthew Stephens

Keyword(s):

Multiple Regression ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

Wide Range

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual data, both for estimating heritability and for detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
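The RSS likelihood takes the form β̂ | β ~ N(Ŝ R Ŝ⁻¹ β, Ŝ R Ŝ), where β̂ are the univariate effect estimates, Ŝ is the diagonal matrix of their standard errors, R is the SNP correlation (LD) matrix and β the multiple-regression coefficients. The sketch below only evaluates that log-likelihood with NumPy; it omits the priors and MCMC entirely, and the function name and toy inputs are illustrative assumptions, not the interface of the rss software.

```python
import numpy as np

def rss_loglik(beta, beta_hat, se, ld):
    """Log-density of beta_hat under N(S R S^{-1} beta, S R S), with S = diag(se)."""
    s = np.diag(se)
    mean = s @ ld @ np.diag(1.0 / se) @ beta
    cov = s @ ld @ s
    resid = beta_hat - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("S R S must be positive definite")
    k = len(beta_hat)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

# Toy inputs (hypothetical): two SNPs in LD r = 0.6 with marginal estimates.
ld = np.array([[1.0, 0.6], [0.6, 1.0]])
se = np.array([0.02, 0.03])
beta_hat = np.array([0.05, 0.04])

# Compare "only SNP 1 is causal" with "only SNP 2 is causal".
ll_snp1 = rss_loglik(np.array([0.05, 0.0]), beta_hat, se, ld)
ll_snp2 = rss_loglik(np.array([0.0, 0.04]), beta_hat, se, ld)
print(ll_snp1 > ll_snp2)  # True: SNP 1's effect plus LD explains both estimates
```

Under the SNP 1 model, LD alone predicts a marginal estimate of about 0.045 for SNP 2, close to the observed 0.04, which is why that model scores higher; a Bayesian analysis would combine this likelihood with a prior over β.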
