Bayesian large-scale multiple regression with summary statistics from genome-wide association studies

2017 ◽  
Vol 11 (3) ◽  
pp. 1561-1592 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens
2016 ◽  
Author(s):  
Xiang Zhu ◽  
Matthew Stephens

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate the heritability of complex traits while allowing for both polygenic and sparse models, and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions and sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual-level data, both for estimating heritability and for detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
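The abstract describes the RSS likelihood only in words. As we understand the published model, with univariate estimates beta_hat, their standard errors se, S = diag(se) and the SNP correlation (LD) matrix R, it takes the form beta_hat | beta ~ N(S R S^-1 beta, S R S); treat that form as an assumption here. The minimal sketch below evaluates this Gaussian log-likelihood in plain Python through a Cholesky factorization. The function names are hypothetical and are not the interface of the linked rss software; real LD matrices would call for a numerical library and care with near-singular R.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for y by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def rss_loglik(beta_hat, se, R, beta):
    """log N(beta_hat; S R S^-1 beta, S R S) with S = diag(se) (assumed RSS form)."""
    p = len(beta_hat)
    # Mean: univariate effects implied by the joint effects beta under LD R.
    mean = [se[i] * sum(R[i][j] * beta[j] / se[j] for j in range(p)) for i in range(p)]
    cov = [[se[i] * R[i][j] * se[j] for j in range(p)] for i in range(p)]
    L = cholesky(cov)
    resid = [b - m for b, m in zip(beta_hat, mean)]
    y = forward_solve(L, resid)
    quad = sum(v * v for v in y)  # resid^T cov^-1 resid
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return -0.5 * (p * math.log(2.0 * math.pi) + logdet + quad)
```

For two SNPs with correlation 0.5, a joint effect on the first SNP alone (beta = [0.3, 0.0]) already implies a nonzero univariate effect on the second, which is exactly the LD-induced signal the likelihood accounts for.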


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract
Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most significant genotype-phenotype associations the true causal variants remain unknown. Identifying causal genetic variants and interpreting how they confer disease susceptibility remains a major challenge. Here we introduce a new database, CAUSALdb, which integrates the most comprehensive collection of GWAS summary statistics to date and identifies credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS-significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to the MeSH ontology, making it simple for users to browse studies on the trait tree; (iv) provides highly interactive Manhattan and LocusZoom-like plots that allow credible sets to be visualized efficiently on a single web page; (v) enables online comparison of causal relations at the variant, gene and trait levels among studies with different sample sizes or populations; and (vi) offers comprehensive variant annotations by integrating extensive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
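The abstract does not name CAUSALdb's three fine-mapping tools, so the following is not its pipeline. As a minimal sketch of what "causal probabilities" and "credible sets" mean, it swaps in the simplest approach: assume a single causal variant per locus, convert each variant's effect estimate and standard error into Wakefield's approximate Bayes factor (the prior effect variance of 0.04 is an assumed value), normalize these into posterior inclusion probabilities (PIPs), and take the smallest set of variants whose PIPs sum to at least 95%. All function names are hypothetical.

```python
import math

def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor for association vs. no association."""
    v = se * se
    r = prior_var / (v + prior_var)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z

def pips_single_causal(betas, ses, prior_var=0.04):
    """PIPs assuming exactly one causal variant and a uniform prior over variants."""
    logs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    m = max(logs)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(x - m) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]

def credible_set(pips, coverage=0.95):
    """Indices of the highest-PIP variants until their PIPs reach `coverage`."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen
```

A locus dominated by one strong signal yields a one-variant credible set, whereas variants in tight LD with similar z-scores split the probability and produce a larger set.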


2018 ◽  
Vol 35 (14) ◽  
pp. 2512-2514 ◽  
Author(s):  
Bongsong Kim ◽  
Xinbin Dai ◽  
Wenchao Zhang ◽  
Zhaohong Zhuang ◽  
Darlene L Sanchez ◽  
...  

Abstract
Summary: We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to analyze large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices through which experimental designs that may include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data consisting of up to 10 million markers and 10 000 samples from replicable lines or hybrids. Its interface substantially shortens the learning curve for new GWAS investigators.
Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO.
Supplementary information: Supplementary data are available at Bioinformatics online.
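The abstract does not show GWASpro's own interface. As a hedged illustration of what building a design matrix for a linear mixed model y = Xb + Zu + e involves, the sketch below assembles only the fixed-effect matrix X: an intercept, reference-level dummy columns for each experimental factor (such as treatment or location), and a final column holding the tested marker's allele dosage. The function names and the choice of treatment (reference-level) coding are assumptions; the random-effect part (Z and its kinship covariance) is omitted.

```python
def dummy_code(levels):
    """Treatment coding: one 0/1 column per level except the first (sorted) level."""
    non_reference = sorted(set(levels))[1:]
    return [[1.0 if x == lvl else 0.0 for lvl in non_reference] for x in levels]

def design_matrix(factors, genotype):
    """Rows: intercept, dummy columns for each factor in order, then marker dosage."""
    coded = [dummy_code(f) for f in factors]
    X = []
    for i in range(len(genotype)):
        row = [1.0]
        for columns in coded:
            row.extend(columns[i])
        row.append(float(genotype[i]))
        X.append(row)
    return X
```

For instance, `design_matrix([["ctrl", "drought", "ctrl", "drought"], ["A", "A", "B", "B"]], [0, 1, 2, 1])` gives one row per plot with columns for intercept, drought, location B and dosage. Dropping one level per factor keeps X full rank alongside the intercept.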


2018 ◽  
Author(s):  
Doug Speed ◽  
David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. We therefore present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First, we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next, we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
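The equal-contribution assumption that SumHer criticizes is visible in LDSC's regression model, under which the expected association statistic of SNP j is E[chi2_j] = N h2 l_j / M + N a + 1, where l_j is the SNP's LD score, N the sample size, M the number of SNPs, h2 the SNP heritability and a a confounding term. The sketch below fits that line by unweighted least squares and converts the slope into h2 = slope * M / N; an intercept above 1 is read as confounding. It is a simplification (LDSC itself uses regression weights and a block jackknife) and it implements LDSC, not SumHer's alternative heritability models. The function name is hypothetical.

```python
def ldsc_fit(chi2, ld_scores, n, m):
    """Unweighted LD score regression: returns (h2 estimate, intercept)."""
    k = len(chi2)
    mean_x = sum(ld_scores) / k
    mean_y = sum(chi2) / k
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ld_scores, chi2))
    sxx = sum((x - mean_x) ** 2 for x in ld_scores)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Slope equals N * h2 / M under the model, so rescale to get h2.
    h2 = slope * m / n
    return h2, intercept
```

With noise-free statistics generated as chi2_j = 1 + 0.05 l_j for N = 100000 and M = 1000000, the fit recovers h2 = 0.5 and an intercept of 1, that is, no confounding.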


Author(s):  
Tim B Bigdeli ◽  
Ayman H Fanous ◽  
Yuli Li ◽  
Nallakkandi Rajeevan ◽  
Frederick Sayward ◽  
...  

Abstract
Background: Schizophrenia (SCZ) and bipolar disorder (BIP) are debilitating neuropsychiatric disorders, collectively affecting 2% of the world’s population. Recognizing the major impact of these psychiatric disorders on the psychosocial function of more than 200 000 US Veterans, the Department of Veterans Affairs (VA) recently completed genotyping of more than 8000 veterans with SCZ and BIP in the Cooperative Studies Program (CSP) #572.
Methods: We performed genome-wide association studies (GWAS) in CSP #572 and benchmarked the predictive value of polygenic risk scores (PRS) constructed from published findings. We combined our results with available summary statistics from several recent GWAS, yielding the largest and most diverse studies of these disorders to date.
Results: Our primary GWAS uncovered new associations between CHD7 variants and SCZ, and novel BIP associations with variants in Sortilin Related VPS10 Domain Containing Receptor 3 (SORCS3) and downstream of PCDH11X. Combining our results with published summary statistics for SCZ yielded 39 novel susceptibility loci, including CRHR1, and we identified 10 additional findings for BIP (28 326 cases and 90 570 controls). PRS trained on published GWAS were significantly associated with case-control status among European American (P < 10^-30) and African American (P < .0005) participants in CSP #572.
Conclusions: We have demonstrated that published findings for SCZ and BIP are robustly generalizable to a diverse cohort of US veterans. Leveraging available summary statistics from GWAS of global populations, we report 52 new susceptibility loci and improved fine-mapping resolution for dozens of previously reported associations.
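Two computations in this abstract can be sketched briefly, neither taken from the study's own code: combining per-SNP results across studies, and scoring individuals with published effect sizes. Below, inverse-variance-weighted fixed-effect meta-analysis (a common default; the abstract does not state which method was used) pools effect estimates for one SNP, and a polygenic risk score is computed as the sum of allele dosages weighted by published per-allele effects (for example log odds ratios). The function names are hypothetical.

```python
import math

def ivw_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP."""
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = 1.0 / math.sqrt(total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p

def polygenic_score(dosages, weights):
    """Sum over SNPs of allele dosage (0, 1 or 2 copies) times per-allele weight."""
    return sum(d * w for d, w in zip(dosages, weights))
```

Pooling two studies that each report a standard error of 0.05 shrinks the combined standard error by a factor of sqrt(2), which is why combining cohorts can push loci past genome-wide significance.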


2020 ◽  
Vol 117 (21) ◽  
pp. 11608-11613 ◽  
Author(s):  
Marcelo Blatt ◽  
Alexander Gusev ◽  
Yuriy Polyakov ◽  
Shafi Goldwasser

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been their dependence on individual-level data, which typically have strict privacy requirements, creating an urgent need for methods that preserve the privacy of individual participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis on a real dataset of more than 25,000 individuals while keeping all individual data encrypted and requiring no user interaction. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). This is more than an order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interaction, while the accuracy of the two solutions is similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
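To make "computing on encrypted data" concrete, the sketch below uses a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a server can aggregate values it cannot read. Paillier is swapped in here only as the simplest such scheme; it is not necessarily the scheme the paper builds on, and the tiny primes make it completely insecure. Function names are hypothetical.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier keys with g = n + 1. These primes are far too small for security."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return (n,), (n, lam, mu)

def encrypt(pub, m, rng=random):
    """c = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    (n,) = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    (n,) = pub
    return (c1 * c2) % (n * n)
```

For example, each participant could encrypt an allele count, the server could combine all ciphertexts with `add_encrypted`, and only the key holder would decrypt the aggregate count. Full GWAS regressions also need multiplications, which is why practical systems use richer schemes than this one.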


2019 ◽  
Vol 122 (2) ◽  
pp. 121-130 ◽  
Author(s):  
Marie-Joe Dib ◽  
Ruan Elliott ◽  
Kourosh R. Ahmadi

Abstract
Rapid advances in ‘omics’ technologies have paved the way for an era in which more ‘precise’ approaches (‘precision’ nutrition), which leverage data on genetic variability alongside the traditional indices, have been put forth as the state-of-the-art solution for redressing the effects of malnutrition across the life course. We contend that this inference is premature and that it is imperative first to review and critique the existing evidence from large-scale epidemiological findings. We set out to provide a critical evaluation of findings from genome-wide association studies (GWAS) on the roadmap to precision nutrition, focusing on GWAS of micronutrient disposition. We found that a large number of loci associated with biomarkers of micronutrient status have been identified. Mean estimates of heritability of micronutrient status ranged between 20 and 35 % for minerals, 56–59 % for water-soluble vitamins and 30–70 % for fat-soluble vitamins. With some exceptions, the majority of the identified genetic variants explained little of the overall variance in status for each micronutrient: 1·3–8 % for minerals, <0·1–12 % for water-soluble vitamins and 1·7–2·3 % for fat-soluble vitamins. However, GWAS have provided some novel insight into mechanisms that underpin variability in micronutrient status. Our findings highlight obvious gaps that need to be addressed if the full scope of precision nutrition is ever to be realised, including research aimed at (i) dissecting the genetic basis of micronutrient deficiencies or ‘response’ to intake/supplementation; (ii) identifying trans-ethnic and ethnic-specific effects; and (iii) identifying gene–nutrient interactions for the purpose of unravelling molecular ‘behaviour’ in a range of environmental contexts.
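Percentages of variance explained, such as those quoted above, are typically derived from per-variant effect sizes. Under a standard additive model (an assumption here; the abstract does not describe the review's own calculations), a biallelic variant with effect-allele frequency f and per-allele effect b explains 2 f (1 - f) b^2 / Var(y) of the trait variance, and the contributions of independent variants add up. The helper names below are hypothetical.

```python
def variance_explained(freq, beta, trait_var=1.0):
    """Fraction of trait variance explained by one additive biallelic variant."""
    return 2.0 * freq * (1.0 - freq) * beta * beta / trait_var

def total_variance_explained(freqs, betas, trait_var=1.0):
    """Sum over variants; only valid if the variants are independent (no LD)."""
    return sum(variance_explained(f, b, trait_var) for f, b in zip(freqs, betas))
```

A common variant (f = 0.5) with an effect of 0.2 trait standard deviations explains only 2% of the variance, which illustrates why even well-replicated loci can leave most of the heritability unaccounted for.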

