The distribution of common-variant effect sizes

2020
Author(s):  
Luke Jen O’Connor

Abstract: The genetic effect-size distribution describes the number of variants that affect disease risk and the range of their effect sizes. Accurate estimates of this distribution would provide insights into genetic architecture and set sample-size targets for future genome-wide association studies (GWAS). We developed Fourier Mixture Regression (FMR) to estimate common-variant effect-size distributions from GWAS summary statistics. We validated FMR in simulations and in analyses of UK Biobank data, using interim-release summary statistics (max N = 145k) to predict the results of the full release (N = 460k). Analyzing summary statistics for 10 diseases (average N_eff = 169k) and 22 other traits, we estimated the sample size required for genome-wide significant SNPs to explain 50% of SNP-heritability. For most diseases the requisite number of cases is 100k to 1M, an attainable number; ten times more would be required to explain 90% of heritability. In well-powered GWAS, genome-wide significance is a conservative threshold, and loci at less stringent thresholds have true positive rates that remain close to 1 if confounding is controlled. Analyzing the shape of the effect-size distribution, we estimate that heritability accumulates across many thousands of SNPs with a wide range of effect sizes: the largest effects (at the 90th percentile of heritability) are 100 times larger than the smallest (10th percentile), and while the midpoint of this range varies across traits, its width is similar. These results suggest attainable sample-size targets for future GWAS, and they underscore the complexity of genetic architecture.
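The sample-size question can be made concrete with a simple calculation. The sketch below takes a hypothetical set of per-SNP effects and computes the expected fraction of SNP-heritability carried by genome-wide significant SNPs at sample size N. It also finds the N at which that fraction first reaches a target.

This is not FMR. It makes several simplifying assumptions:

- The effect sizes are already known, and LD is ignored.
- β is a per-SNP effect on a standardized phenotype per standardized genotype, so β² is that SNP's heritability share.
- Each SNP's statistic follows the approximation z ~ N(β√N, 1).
- Significance uses a two-sided threshold of p < 5×10⁻⁸.

All function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gws_threshold(alpha: float = 5e-8) -> float:
    """Two-sided |z| cutoff for a p-value threshold (about 5.45 for 5e-8)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def power(beta: float, n: int, alpha: float = 5e-8) -> float:
    """P(|z| > t) when z ~ N(beta * sqrt(n), 1).

    Assumption: beta is on the standardized scale, so beta**2 is the SNP's
    share of heritability; LD between SNPs is ignored.
    """
    t = gws_threshold(alpha)
    mean = beta * n ** 0.5
    return STD_NORMAL.cdf(-t - mean) + STD_NORMAL.cdf(-t + mean)


def fraction_h2_explained(betas, n, alpha=5e-8):
    """Expected fraction of SNP-heritability carried by significant SNPs."""
    total = sum(b * b for b in betas)
    found = sum(b * b * power(b, n, alpha) for b in betas)
    return found / total


def required_n(betas, target=0.5, alpha=5e-8, lo=1, hi=10**9):
    """Smallest N whose expected explained fraction reaches `target`.

    Bisection is valid because power, and hence the fraction, rises with N.
    """
    if fraction_h2_explained(betas, lo, alpha) >= target:
        return lo
    if fraction_h2_explained(betas, hi, alpha) < target:
        raise ValueError("target not reached even at N = hi")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fraction_h2_explained(betas, mid, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical architecture: 1,000 SNPs with beta = 0.002, 100 with beta = 0.01.
    betas = [0.002] * 1000 + [0.01] * 100
    print("N for 50%:", required_n(betas, target=0.5))
    print("N for 90%:", required_n(betas, target=0.9))
```

Both targets depend entirely on the hypothetical `betas`. Adding more small effects pushes the 90% target much further out than the 50% target.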

2016
Author(s):  
Daniel S. Quintana

Abstract: The calculation of heart rate variability (HRV) is a popular tool for investigating differences in cardiac autonomic control between population samples. When interpreting effect sizes to quantify the magnitude of group differences, researchers typically use Cohen's guidelines of small (0.2), medium (0.5), and large (0.8) effects. However, these guidelines were proposed only for use when the effect size distribution (ESD) was unknown. Despite the availability of effect sizes from hundreds of HRV studies, researchers still largely rely on Cohen's guidelines to interpret effect sizes. This article describes an ESD analysis of 297 HRV effect sizes from case-control studies, revealing that the 25th, 50th, and 75th effect size percentiles correspond with effect sizes of 0.25, 0.5, and 0.84, respectively. ESDs for separate clinical groups are also presented. The data suggest that Cohen's guidelines underestimate the magnitude of small and large effect sizes for the body of HRV case-control research. Therefore, to better reflect observed HRV effect sizes, effect sizes of 0.25, 0.5, and 0.85 should be interpreted as small, medium, and large effects, respectively. Researchers are encouraged to use the ESD dataset, or their own collected datasets, together with the provided analysis script to perform bespoke ESD analyses relevant to their specific research area.
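The percentile-based benchmarks described above are straightforward to compute. This sketch makes three assumptions:

- Effect sizes are taken as absolute values (for example, Cohen's d or Hedges' g).
- Percentiles come from Python's `statistics.quantiles` with the inclusive method. Other percentile definitions shift the cut points slightly on small samples.
- The helper names and the placeholder values in the usage lines are hypothetical. They are not the 297-effect-size dataset or the article's analysis script.

```python
from statistics import quantiles


def esd_benchmarks(effect_sizes):
    """25th, 50th, and 75th percentiles of absolute effect sizes."""
    q25, q50, q75 = quantiles([abs(d) for d in effect_sizes], n=4, method="inclusive")
    return q25, q50, q75


def label_effect(d, benchmarks):
    """Label |d| as small, medium, or large using empirical cut points."""
    small, medium, large = benchmarks
    size = abs(d)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Placeholder effect sizes, not real HRV data.
    effect_sizes = [0.12, 0.31, 0.45, 0.52, 0.60, 0.95, 1.10]
    empirical = esd_benchmarks(effect_sizes)
    cohen = (0.2, 0.5, 0.8)
    for d in (0.22, 0.55, 0.82):
        print(d, "empirical:", label_effect(d, empirical), "| Cohen:", label_effect(d, cohen))
```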


2010
Vol 42 (7)
pp. 570-575
Author(s):
Ju-Hyun Park
Sholom Wacholder
Mitchell H Gail
Ulrike Peters
Kevin B Jacobs
...

Author(s):  
Junji Morisawa ◽  
Takahiro Otani ◽  
Jo Nishino ◽  
Ryo Emoto ◽  
Kunihiko Takahashi ◽  
...  

Abstract: Bayes factor analysis has the attractive property of accommodating the risks of both false negatives and false positives when identifying susceptibility gene variants in genome-wide association studies (GWASs). For a particular SNP, the critical aspect of this analysis is that it incorporates the probability of obtaining the observed value of a disease-association statistic under the alternative hypothesis of non-null association. An approximate Bayes factor (ABF) was proposed by Wakefield (Genetic Epidemiology 2009;33:79–86) based on a normal prior for the underlying effect-size distribution. However, misspecification of the prior can lead to a failure to properly account for the probability under the alternative hypothesis. In this paper, we propose a semi-parametric, empirical Bayes factor (SP-EBF) based on a nonparametric effect-size distribution estimated from the data. Analysis of several GWAS datasets revealed the presence of substantial numbers of SNPs with small effect sizes, and the SP-EBF attributed much greater significance to such SNPs than the ABF did. Overall, the SP-EBF incorporates an effect-size distribution estimated from the data, and it has the potential to improve the accuracy of Bayes factor analysis in GWASs.
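For reference, Wakefield's ABF uses a prior β ~ N(0, W), an estimate β̂ with standard error √V, and z = β̂/√V. It is the ratio p(data | H0) / p(data | H1):

ABF = √((V+W)/V) · exp(-z²W / (2(V+W)))

The sketch implements that formula. For contrast, it also computes a Bayes factor whose alternative is a discrete distribution over non-zero effect sizes. That is the kind of prior one might estimate nonparametrically from genome-wide data.

The discrete version is an illustrative assumption, not the SP-EBF estimator from this paper. The grid, weights, and function names are hypothetical.

```python
import math


def wakefield_abf(beta_hat: float, se: float, prior_sd: float) -> float:
    """Wakefield (2009) approximate Bayes factor, p(data | H0) / p(data | H1).

    H1 places a N(0, prior_sd**2) prior on the true effect; small values
    favour association.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta_hat / se
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z * z * w / (v + w))


def discrete_prior_bf10(beta_hat: float, se: float, grid, weights) -> float:
    """Bayes factor p(data | H1) / p(data | H0) for a discrete effect-size prior.

    `grid` holds candidate true effects and `weights` their prior probabilities
    (assumed to sum to 1). Each term is the normal likelihood ratio
    N(beta_hat; b, se^2) / N(beta_hat; 0, se^2), written in log form.
    """
    total = 0.0
    for b, w in zip(grid, weights):
        log_ratio = (beta_hat ** 2 - (beta_hat - b) ** 2) / (2 * se ** 2)
        total += w * math.exp(log_ratio)
    return total


if __name__ == "__main__":
    beta_hat, se = 0.03, 0.01  # hypothetical SNP estimate (z = 3)
    print("BF10, wide normal prior:", 1 / wakefield_abf(beta_hat, se, prior_sd=0.2))
    grid = [-0.04, -0.02, 0.02, 0.04]  # hypothetical estimated prior support
    weights = [0.1, 0.4, 0.4, 0.1]
    print("BF10, discrete prior:", discrete_prior_bf10(beta_hat, se, grid, weights))
```

The usage lines show the qualitative point of the abstract. A wide normal prior spreads its mass over large effects, so it gives a modest small-effect SNP less support than a prior concentrated on small effects.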


2016
Vol 283 (1828)
pp. 20153065
Author(s):
Emily L. Dittmar
Christopher G. Oakley
Jeffrey K. Conner
Billie A. Gould
Douglas W. Schemske

The distribution of effect sizes of adaptive substitutions has been central to evolutionary biology since the modern synthesis. Early theory proposed that because large-effect mutations have negative pleiotropic consequences, only small-effect mutations contribute to adaptation. More recent theory suggested instead that large-effect mutations could be favoured when populations are far from their adaptive peak. Here we suggest that the distributions of effect sizes are expected to differ among study systems, reflecting the wide variation in evolutionary forces and ecological conditions experienced in nature. These include selection, mutation, genetic drift, gene flow, and other factors such as the degree of pleiotropy, the distance to the phenotypic optimum, whether the optimum is stable or moving, and whether new mutation or standing genetic variation provides the source of adaptive alleles. Our goal is to review how these factors might affect the distribution of effect sizes and to identify new research directions. Until more theory and empirical work are available, we feel that it is premature to make broad generalizations about the effect size distribution of adaptive substitutions important in nature.


2019
Author(s):
Alexey A. Shadrin
Oleksandr Frei
Olav B. Smeland
Francesco Bettella
Kevin S. O’Connell
...

Abstract: Determining the contribution of functional genetic categories is fundamental to understanding the genetic etiology of complex human traits and diseases. Here we present Annotation Informed MiXeR, a likelihood-based method that uses summary statistics from genome-wide association studies to estimate the number of variants influencing a phenotype and their effect sizes across different functional annotation categories of the genome. Applying the model to 11 complex phenotypes suggests diverse patterns of functional category-specific genetic architectures across human diseases and traits.
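Simple arithmetic can illustrate the quantities such a model targets. As a simplification, assume disjoint annotation categories with a point-normal prior in each:

- A SNP in category c is causal with probability π_c.
- Causal effects are N(0, σ_c²) on a standardized genotype and phenotype.

Under these assumptions, category c has M_c·π_c expected causal variants and M_c·π_c·σ_c² expected heritability. The sketch computes those and the resulting enrichment, defined as the heritability share divided by the SNP share.

It does not implement the Annotation Informed MiXeR likelihood, which is fitted to GWAS summary statistics and accounts for LD. The category values and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    n_snps: int     # M_c: SNPs in this (assumed disjoint) category
    pi: float       # fraction of those SNPs that are causal
    sigma2: float   # variance of causal effects (standardized scale)


def summarize(categories):
    """Expected causal counts, heritability, and enrichment per category.

    Under beta ~ pi * N(0, sigma2) + (1 - pi) * delta_0, each SNP contributes
    pi * sigma2 heritability in expectation.
    """
    total_snps = sum(c.n_snps for c in categories)
    h2 = {c.name: c.n_snps * c.pi * c.sigma2 for c in categories}
    total_h2 = sum(h2.values())
    out = {}
    for c in categories:
        h2_share = h2[c.name] / total_h2
        snp_share = c.n_snps / total_snps
        out[c.name] = {
            "n_causal": c.n_snps * c.pi,
            "h2": h2[c.name],
            "h2_share": h2_share,
            "enrichment": h2_share / snp_share,
        }
    return out


if __name__ == "__main__":
    # Hypothetical categories: a small, highly polygenic "coding" set and the rest.
    cats = [
        Category("coding", 10_000, pi=0.02, sigma2=1e-4),
        Category("other", 990_000, pi=0.001, sigma2=1e-4),
    ]
    for name, stats in summarize(cats).items():
        print(name, stats)
```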


2015
Vol 135 (2)
pp. 171-184
Author(s):
Lei Zhang
Yue-Ping Shen
Wen-Zhu Hu
Shu Ran
Yong Lin
...

2015
Author(s):
Dominic Holland
Yunpeng Wang
Wesley K Thompson
Andrew Schork
Chi-Hua Chen
...

Genome-wide association studies (GWAS) yield millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components arising from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need to better understand the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities that does not require raw genotype data, relying only on summary statistics from GWAS substudies, together with a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular because it takes into account the ubiquitous non-null effects that are likely present in the datasets due to weak linkage disequilibrium (LD) with causal SNPs. The four-parameter model allows us to estimate the degree of polygenicity of the phenotype, that is, the proportion of SNPs (after uniform pruning, so that large LD blocks are not over-represented) likely to be in strong LD with causal or mechanistically associated SNPs, and to predict the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and, for purposes of illustration, putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases.
We show that, over a broad range of z-scores and sample sizes, the model accurately predicts the expected true effect sizes and replication probabilities in multistage GWAS designs. We estimate the degree to which effect sizes are overestimated when based on linear regression association coefficients. We estimate the polygenicity of schizophrenia to be 0.037 and that of putamen volume to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10⁶ and 10⁵. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.
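A minimal sketch of the Gaussian-mixture idea follows. It uses a simplified three-parameter spike-and-slab model, not the authors' four-parameter model:

- A pruned SNP's z-score is z = δ + ε, with noise ε ~ N(0, σ0²).
- The true signal δ is 0 with probability 1 - π and N(0, τ²) otherwise.

Under this model, z is marginally a two-component Gaussian mixture, and π plays the role of polygenicity. Two quantities then follow:

- The posterior mean E[δ | z1] measures how much a discovery z-score overstates the true signal.
- Replication probabilities follow because the true z-score scales with the square root of sample size, δ2 = δ·√(N2/N1).

The sketch makes three further assumptions:

- Pruned z-scores are treated as independent.
- Replication means a same-sign result with two-sided p < 5×10⁻⁸.
- The replication sample has the same noise variance σ0².

All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def _components(z, pi, tau2, sigma0_2):
    """Prior-weighted null and non-null densities of z."""
    null = (1 - pi) * NormalDist(0.0, math.sqrt(sigma0_2)).pdf(z)
    alt = pi * NormalDist(0.0, math.sqrt(sigma0_2 + tau2)).pdf(z)
    return null, alt


def log_likelihood(zs, pi, tau2, sigma0_2=1.0):
    """Mixture log-likelihood, treating pruned z-scores as independent."""
    return sum(math.log(sum(_components(z, pi, tau2, sigma0_2))) for z in zs)


def posterior(z1, pi, tau2, sigma0_2=1.0):
    """(p1, mean, var): P(non-null | z1) and the slab posterior N(mean, var) of delta."""
    null, alt = _components(z1, pi, tau2, sigma0_2)
    shrink = tau2 / (tau2 + sigma0_2)
    return alt / (null + alt), shrink * z1, shrink * sigma0_2


def expected_true_z(z1, pi, tau2, sigma0_2=1.0):
    """E[delta | z1]; the gap between z1 and this value is the expected overestimation."""
    p1, mean, _ = posterior(z1, pi, tau2, sigma0_2)
    return p1 * mean


def replication_probability(z1, n1, n2, pi, tau2, sigma0_2=1.0, alpha=5e-8):
    """P(replication z2 has the sign of z1 and two-sided p < alpha)."""
    t = STD.inv_cdf(1 - alpha / 2)
    s = 1.0 if z1 >= 0 else -1.0
    r = math.sqrt(n2 / n1)  # true z scales with sqrt(sample size)
    p1, mean, var = posterior(z1, pi, tau2, sigma0_2)
    # In each component s * z2 is normal, so P(s * z2 > t) = 1 - cdf((t - s*mu) / sd).
    p_null = 1 - STD.cdf(t / math.sqrt(sigma0_2))
    p_alt = 1 - STD.cdf((t - s * r * mean) / math.sqrt(r * r * var + sigma0_2))
    return (1 - p1) * p_null + p1 * p_alt


def simulate(n, pi, tau2, sigma0_2=1.0, seed=0):
    """Draw n z-scores from the spike-and-slab model (for checking fits)."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        delta = rng.gauss(0.0, math.sqrt(tau2)) if rng.random() < pi else 0.0
        zs.append(delta + rng.gauss(0.0, math.sqrt(sigma0_2)))
    return zs


if __name__ == "__main__":
    # Grid-search a hypothetical polygenicity of 0.05, holding tau2 fixed.
    zs = simulate(5000, pi=0.05, tau2=16.0)
    grid = [i / 100 for i in range(1, 21)]
    print("estimated pi:", max(grid, key=lambda p: log_likelihood(zs, p, 16.0)))
    for z1 in (5.5, 7.0, 10.0):
        print(z1, expected_true_z(z1, 0.05, 16.0),
              replication_probability(z1, 80_000, 160_000, 0.05, 16.0))
```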

