A comparative study of data integration methods, integrating genetic association and functional annotation summary statistics

2020 ◽  
Author(s):  
Jianhui Gao ◽  
Lei Sun

Abstract: The power of many genome-wide association studies (GWAS) remains low despite increasing sample sizes, because the genetic effects for complex traits are small, the case sample size may not be large, and the variants analyzed may be rare. One direction is to integrate available functional annotation meta-scores, such as CADD and Eigen, to increase the power of a GWAS. Here we examine four data-integration methods: meta-analysis, Fisher’s method, weighted p-value, and stratified FDR control, all based on summary statistics only. We focus on robustness, considering settings where the functional meta-score may or may not be informative, or may even be misleading. In addition to extensive simulation studies, we also apply the four methods to 945 binary outcomes in the UK Biobank data, including all 633 traits with ICD-10 codes, 28 self-reported cancers and 284 self-reported non-cancer diseases, integrating publicly available GWAS summary statistics (http://www.nealelab.is/uk-biobank/) with CADD or Eigen scores. While the observed trade-off between power and robustness is expected, our application shows some but limited utility of current functional meta-scores in terms of leading to new genome-wide significant association findings.
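Two of the four methods named above, Fisher’s method and the weighted p-value, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the functional meta-score has already been converted, by some preprocessing step not described in the abstract, into either a per-variant p-value (for Fisher’s method) or a positive per-variant weight (for the weighted p-value). The function names are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Under the null, -2 * sum(ln p) is chi-square with 2k degrees of
    freedom; for even degrees of freedom the upper tail has the closed
    form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * total


def weighted_p_values(p_values, raw_weights):
    """Divide each p-value by its weight after rescaling weights to mean 1.

    Assumption: raw_weights are positive numbers derived from the
    functional score; a larger weight means more prior support.
    """
    mean_w = sum(raw_weights) / len(raw_weights)
    weights = [w / mean_w for w in raw_weights]
    return [min(1.0, p / w) for p, w in zip(p_values, weights)]


# Hypothetical usage: one GWAS p-value and one annotation-derived p-value.
combined = fisher_combine([0.05, 0.05])
reweighted = weighted_p_values([0.01, 0.01], [3.0, 1.0])
```

The robustness concern raised in the abstract is visible here: a misleading weight below 1 inflates the p-value of a truly associated variant, while Fisher’s method lets an uninformative annotation p-value dilute a strong GWAS signal.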

Author(s):  
Nasa Sinnott-Armstrong ◽  
Sahin Naqvi ◽  
Manuel Rivas ◽  
Jonathan K Pritchard

Summary: Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. However, for most traits it remains difficult to interpret what genes and biological processes are impacted by the top hits. Here, as a contrast, we describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) that are biologically simpler than most diseases, and for which we know a great deal in advance about the core genes and pathways. Unlike most GWAS of complex traits, for all three traits we find that most top hits are readily interpretable. We observe huge enrichment of significant signals near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of variation in each trait, including insights into differences in testosterone regulation between females and males. Meanwhile, in other respects the results are reminiscent of GWAS for more-complex traits. In particular, even these molecular traits are highly polygenic, with most of the variance coming not from core genes, but from thousands to tens of thousands of variants spread across most of the genome. Given that diseases are often impacted by many distinct biological processes, including these three, our results help to illustrate why so many variants can affect risk for any given disease.


2018 ◽  
Author(s):  
Doug Speed ◽  
David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
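To make the equal-heritability assumption concrete, the sketch below fits the basic LD score regression line, in which the expected chi-square statistic of SNP j is 1 + N·a + (N·h²/M)·ℓ_j, with ℓ_j the LD score, N the sample size and M the number of SNPs. It illustrates only the LDSC model that the abstract critiques, not SumHer. It is also a simplification: it uses unweighted least squares, whereas the LDSC software uses a weighted regression with block-jackknife standard errors. The function name and inputs are hypothetical.

```python
def ld_score_regression(chi2, ld_scores, n_samples, n_snps):
    """Unweighted least-squares fit of chi2_j on the LD score l_j.

    Returns (h2_estimate, intercept). The slope equals N * h2 / M under
    the model where every SNP contributes equal heritability; an
    intercept above 1 is interpreted by LDSC as confounding.
    """
    m = len(chi2)
    mean_l = sum(ld_scores) / m
    mean_c = sum(chi2) / m
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    h2_estimate = slope * n_snps / n_samples
    return h2_estimate, intercept


# Hypothetical noiseless toy data: h2 = 0.5, intercept = 1.05.
toy_ld = [1.0, 2.0, 3.0, 4.0]
toy_chi2 = [1.05 + 125.0 * l for l in toy_ld]  # slope = 1000 * 0.5 / 4
h2, intercept = ld_score_regression(toy_chi2, toy_ld, n_samples=1000, n_snps=4)
```

Because the slope is converted to heritability through a single factor M/N, any departure from equal per-SNP heritability is absorbed into the slope or the intercept, which is the source of the biases the abstract describes.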


2015 ◽  
Author(s):  
Guo-Bo Chen ◽  
Sang Hong Lee ◽  
Matthew R Robinson ◽  
Maciej Trzaskowski ◽  
Zhi-Xiang Zhu ◽  
...  

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small, and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fst statistic to verify the genetic origin of each cohort and its geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors, which does not require the sharing of individual-level genotype data and does not breach individual privacy.
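As an illustration of method I, Fst between two cohorts can be computed directly from their reported allele frequencies. The sketch below uses the textbook heterozygosity-based definition, Fst = (H_T − H_S) / H_T, with the two cohorts weighted equally and the ratio taken over sums across SNPs. This is an assumption made for illustration; the estimator actually used in the study may differ (for example, by correcting for cohort sample sizes). The function name is hypothetical.

```python
def fst_two_cohorts(freqs_a, freqs_b):
    """Heterozygosity-based Fst from per-SNP allele frequencies.

    freqs_a and freqs_b hold the frequency of the same reference allele
    at the same SNPs in cohort A and cohort B. Returns 0.0 when no SNP
    is polymorphic in the pooled sample.
    """
    between = 0.0  # accumulates H_T - H_S
    total = 0.0    # accumulates H_T
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        h_total = 2.0 * p_bar * (1.0 - p_bar)
        h_within = (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
        between += h_total - h_within
        total += h_total
    return between / total if total > 0.0 else 0.0


# Hypothetical usage: identical cohorts give 0; divergent ones give more.
same = fst_two_cohorts([0.3, 0.6], [0.3, 0.6])
divergent = fst_two_cohorts([0.2], [0.8])
```

A matrix of such pairwise values across cohorts is the kind of input from which geographic structure and outlier cohorts could then be examined.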


2017 ◽  
Author(s):  
Oriol Canela-Xandri ◽  
Konrad Rawlik ◽  
Albert Tenesa

Abstract: Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 third degree or closer related participants. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size; however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive and requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well as downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy way without incurring high computational costs.


Author(s):  
Lars G. Fritsche ◽  
Snehal Patil ◽  
Lauren J. Beesley ◽  
Peter VandeHaar ◽  
Maxwell Salvatore ◽  
...  

Abstract: To facilitate scientific collaboration on polygenic risk scores (PRS) research, we created an extensive PRS online repository for 49 common cancer traits integrating freely available genome-wide association studies (GWAS) summary statistics from three sources: published GWAS, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWAS. Our framework condenses these summary statistics into PRS using various approaches such as linkage disequilibrium pruning / p-value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRS in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance, calibration, and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRS. We expect this integrated platform to accelerate PRS-related cancer research.
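The p-value thresholding construct mentioned above reduces to a weighted sum of risk-allele dosages over the variants that pass a significance cutoff. The following is a minimal sketch under stated assumptions: linkage disequilibrium pruning is assumed to have been done upstream, the effect sizes are log odds ratios taken from GWAS summary statistics, and the function name and inputs are hypothetical rather than part of the Cancer-PRSweb platform.

```python
def polygenic_risk_score(dosages, betas, p_values, p_threshold):
    """Sum beta * dosage over variants with GWAS p-value below the cutoff.

    dosages: risk-allele counts (0, 1, 2 or imputed values) for one person.
    betas, p_values: per-variant summary statistics in the same order.
    Assumption: the variant list has already been LD-pruned.
    """
    return sum(
        beta * dosage
        for dosage, beta, p in zip(dosages, betas, p_values)
        if p < p_threshold
    )


# Hypothetical usage: only the first and third variants pass the cutoff.
score = polygenic_risk_score(
    dosages=[0, 1, 2],
    betas=[0.1, 0.2, -0.3],
    p_values=[1e-9, 0.5, 1e-6],
    p_threshold=1e-5,
)
```

A data-adaptive threshold, as described in the abstract, would repeat this calculation over a grid of cutoffs and keep the one with the best predictive performance in a held-out sample.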


2019 ◽  
Vol 10 ◽  
Author(s):  
Justin M. Luningham ◽  
Daniel B. McArtor ◽  
Anne M. Hendriks ◽  
Catharina E. M. van Beijsterveldt ◽  
Paul Lichtenstein ◽  
...  

2021 ◽  
Vol 118 (25) ◽  
pp. e2023184118
Author(s):  
Yuchang Wu ◽  
Xiaoyuan Zhong ◽  
Yunong Lin ◽  
Zijie Zhao ◽  
Jiawen Chen ◽  
...  

Marginal effect estimates in genome-wide association studies (GWAS) are mixtures of direct and indirect genetic effects. Existing methods to dissect these effects require family-based, individual-level genetic and phenotypic data with large samples, which are difficult to obtain in practice. Here, we propose a statistical framework to estimate direct and indirect genetic effects using summary statistics from GWAS conducted on own and offspring phenotypes. Applied to birth weight, our method gave results nearly identical to those obtained using individual-level data. We also decomposed direct and indirect genetic effects of educational attainment (EA), which showed distinct patterns of genetic correlations with 45 complex traits. The known genetic correlations between EA and higher height, lower body mass index, less-active smoking behavior, and better health outcomes were mostly explained by the indirect genetic component of EA. In contrast, the consistently identified genetic correlation of autism spectrum disorder (ASD) with higher EA resides in the direct genetic component. A polygenic transmission disequilibrium test showed a significant overtransmission of the direct component of EA from healthy parents to ASD probands. Taken together, we demonstrate that traditional GWAS approaches, in conjunction with offspring phenotypic data collection in existing cohorts, could greatly benefit studies on genetic nurture and shed important light on the interpretation of genetic associations for human complex traits.


2021 ◽  
Author(s):  
Jicai Jiang

Summary statistics from genome-wide association studies (GWAS) have been widely used for fine-mapping complex traits in humans. The statistical framework was largely developed for unrelated samples. Though it is possible to apply the framework to fine-mapping with related individuals, extensive modifications are needed. Unfortunately, this has often been ignored in summary-statistics-based fine-mapping with related individuals. In this paper, we show in theory and simulation what modifications are necessary to extend the use of summary statistics to related individuals. The analysis also demonstrates that though existing summary-statistics-based fine-mapping methods can be adapted for related individuals, they appear to have no computational advantage over individual-data-based methods.


2019 ◽  
Author(s):  
Daniel F. Levey ◽  
Joel Gelernter ◽  
Renato Polimanti ◽  
Hang Zhou ◽  
Zhongshan Cheng ◽  
...  

Abstract: We used GWAS in the Million Veteran Program sample (nearly 200,000 informative individuals) using a continuous trait for anxiety (GAD-2) to identify 5 genome-wide significant (GWS) signals for European Americans (EA) and 1 for African Americans. The strongest findings were on chromosome 3 (rs4603973, p=7.40×10⁻¹¹) near the SATB1 locus, a global regulator of gene expression, and on chromosome 6 (rs6557168, p=1.04×10⁻⁹) near ESR1, which encodes estrogen receptor α. A locus identified on chromosome 7 near MAD1L1 (p=1.62×10⁻⁸) has been previously identified in GWAS of bipolar disorder and of schizophrenia and may represent a risk factor for psychiatric disorders broadly. SNP-based heritability was estimated to be ~6% for GAD-2. We also performed GWAS for self-reported anxiety disorder diagnoses (N=224,330) and identified two GWS loci, one (rs35546597, MAF=0.42, p=1.88×10⁻⁸) near the AURKB locus, and the other (rs10534613, MAF=0.41, p=4.92×10⁻⁸) near the IQCHE and MAD1L1 locus identified in the GAD-2 analysis. We demonstrate reproducibility by replicating our top findings in the summary statistics from the Anxiety NeuroGenetics Study (ANGST) and a UK Biobank neuroticism GWAS. We also replicated top findings from a large UK Biobank preprint, demonstrating stability of GWAS findings in complex traits once sufficient power is attained. Finally, we found evidence of significant genetic overlap between anxiety and major depression using polygenic risk scores, but also found that the main anxiety signals are independent of those for MDD. This work presents novel insights into the neurobiological risk underpinning anxiety and related psychiatric disorders.

Significance: Anxiety disorders are common and often disabling. They are also frequently co-morbid with other mental disorders such as major depressive disorder (MDD); these disorders may share commonalities in their underlying genetic architecture. Using one of the largest homogeneously phenotyped cohorts available, the Million Veteran Program sample, we investigated common variants associated with anxiety in genome-wide association studies (GWASes), using survey results from the GAD-2 anxiety scale (as a continuous trait, n=199,611), and self-reported anxiety disorder diagnosis (as a binary trait, n=224,330). This largest GWAS to date for anxiety and related traits identified numerous novel significant associations, several of which are replicated in other datasets, and allows inference of underlying biology.

