A new method for estimating effect size distribution and heritability from genome-wide association summary results

2015 ◽  
Vol 135 (2) ◽  
pp. 171-184 ◽  
Author(s):  
Lei Zhang ◽  
Yue-Ping Shen ◽  
Wen-Zhu Hu ◽  
Shu Ran ◽  
Yong Lin ◽  
...  
2010 ◽  
Vol 42 (7) ◽  
pp. 570-575 ◽  
Author(s):  
Ju-Hyun Park ◽  
Sholom Wacholder ◽  
Mitchell H Gail ◽  
Ulrike Peters ◽  
Kevin B Jacobs ◽  
...  

Author(s):  
Junji Morisawa ◽  
Takahiro Otani ◽  
Jo Nishino ◽  
Ryo Emoto ◽  
Kunihiko Takahashi ◽  
...  

Abstract
Bayes factor analysis has the attractive property of accommodating the risks of both false negatives and false positives when identifying susceptibility gene variants in genome-wide association studies (GWASs). For a particular SNP, the critical aspect of this analysis is that it incorporates the probability of obtaining the observed value of a disease-association statistic under the alternative hypothesis of non-null association. An approximate Bayes factor (ABF) was proposed by Wakefield (Genetic Epidemiology 2009;33:79–86) based on a normal prior for the underlying effect-size distribution. However, if the prior is misspecified, the probability under the alternative hypothesis may not be incorporated correctly. In this paper, we propose a semi-parametric empirical Bayes factor (SP-EBF) based on a nonparametric effect-size distribution estimated from the data. Analysis of several GWAS datasets revealed substantial numbers of SNPs with small effect sizes, and the SP-EBF attributed much greater significance to such SNPs than the ABF did. Overall, the SP-EBF incorporates an effect-size distribution estimated from the data and has the potential to improve the accuracy of Bayes factor analysis in GWASs.
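Wakefield's ABF has a closed form: for an estimated log odds ratio β̂ with variance V and a normal prior N(0, W) on the true effect under the alternative, ABF = sqrt((V+W)/V) · exp(−z²W / (2(V+W))) with z = β̂/√V, a Bayes factor in favour of the null. The minimal sketch below contrasts that with a Bayes factor whose alternative-hypothesis likelihood averages over a discrete (point-mass grid) effect-size prior, the kind of nonparametric prior an empirical Bayes approach could estimate. The function names, the grid atoms and their weights are hypothetical and fixed by hand; how SP-EBF actually estimates its distribution is not described in the abstract.

```python
import math


def normal_pdf(x, mean, var):
    """Density of Normal(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def wakefield_abf(beta_hat, v, w):
    """Approximate Bayes factor in favour of H0 (no association), normal prior N(0, w).

    beta_hat: estimated effect (e.g. log odds ratio); v: its squared standard error.
    """
    z2 = beta_hat ** 2 / v
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * w / (v + w))


def bf10_discrete_prior(beta_hat, v, atoms, weights):
    """Bayes factor for H1 over H0 when the effect-size prior is a mixture of point masses.

    atoms: candidate true effect sizes; weights: their (unnormalized) prior masses.
    """
    total = sum(weights)
    marginal_h1 = sum(wt * normal_pdf(beta_hat, mu, v) for mu, wt in zip(atoms, weights)) / total
    return marginal_h1 / normal_pdf(beta_hat, 0.0, v)


# Hypothetical SNP: beta_hat = 0.05 with standard error 0.015; prior variance W = 0.04.
beta_hat, v, w = 0.05, 0.015 ** 2, 0.04
bf10_normal = 1.0 / wakefield_abf(beta_hat, v, w)
# A prior concentrated on small effects (hypothetical grid and weights).
bf10_small = bf10_discrete_prior(beta_hat, v, [-0.05, 0.05], [0.5, 0.5])
```

With these toy numbers the small-effect prior yields a larger Bayes factor for association than the wide normal prior, which is the qualitative pattern the abstract reports for small-effect SNPs.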


2020 ◽  
Author(s):  
Luke Jen O’Connor

Abstract
The genetic effect-size distribution describes the number of variants that affect disease risk and the range of their effect sizes. Accurate estimates of this distribution would provide insights into genetic architecture and set sample-size targets for future genome-wide association studies. We developed Fourier Mixture Regression (FMR) to estimate common-variant effect-size distributions from GWAS summary statistics. We validated FMR in simulations and in analyses of UK Biobank data, using interim-release summary statistics (max N=145k) to predict the results of the full release (N=460k). Analyzing summary statistics for 10 diseases (avg Neff=169k) and 22 other traits, we estimated the sample size required for genome-wide significant SNPs to explain 50% of SNP-heritability. For most diseases the requisite number of cases is 100k-1M, an attainable number; ten times more would be required to explain 90% of heritability. In well-powered GWAS, genome-wide significance is a conservative threshold, and loci at less stringent thresholds have true positive rates that remain close to 1 if confounding is controlled. Analyzing the shape of the effect-size distribution, we estimate that heritability accumulates across many thousands of SNPs with a wide range of effect sizes: the largest effects (at the 90th percentile of heritability) are 100 times larger than the smallest (10th percentile), and while the midpoint of this range varies across traits, its size is similar. These results suggest attainable sample size targets for future GWAS, and they underscore the complexity of genetic architecture.
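Two quantities in this abstract can be illustrated once an effect-size distribution is in hand: the sample size at which genome-wide significant SNPs are expected to explain a target share of SNP-heritability, and the effect sizes at given heritability percentiles. The sketch below is not FMR; it assumes standardized genotypes and phenotype (so a SNP with effect β contributes β² to heritability), independent SNPs (no LD), and a z-score distributed as Normal(√N·β, 1). All function names and the toy effect sizes are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def critical_z(alpha=5e-8):
    """Two-sided critical value of a standard normal test at level alpha."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def power_two_sided(beta, n, z_crit):
    """Power to detect a standardized per-SNP effect beta at sample size n,
    assuming the z-score is approximately Normal(sqrt(n) * beta, 1)."""
    shift = (n ** 0.5) * abs(beta)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def fraction_h2_explained(betas, n, alpha=5e-8):
    """Expected share of SNP-heritability (sum of beta^2) carried by significant SNPs."""
    z_crit = critical_z(alpha)
    total = sum(b * b for b in betas)
    found = sum(b * b * power_two_sided(b, n, z_crit) for b in betas)
    return found / total


def sample_size_for_fraction(betas, target=0.5, alpha=5e-8, n_max=10**9):
    """Smallest integer n at which the expected explained fraction reaches target."""
    if fraction_h2_explained(betas, 1, alpha) >= target:
        return 1
    if fraction_h2_explained(betas, n_max, alpha) < target:
        raise ValueError("target fraction not reached by n_max")
    lo, hi = 1, n_max  # invariant: fraction(lo) < target <= fraction(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fraction_h2_explained(betas, mid, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


def heritability_percentile_effect(betas, q):
    """|beta| at which a fraction q of total heritability has accumulated,
    adding SNPs from the smallest effect upward."""
    per_snp = sorted(b * b for b in betas)
    total = sum(per_snp)
    running = 0.0
    for h in per_snp:
        running += h
        if running >= q * total:
            return h ** 0.5
    return per_snp[-1] ** 0.5


# Toy architecture (hypothetical): 10 larger-effect SNPs and 1,000 small-effect SNPs.
betas = [0.02] * 10 + [0.005] * 1000
n50 = sample_size_for_fraction(betas, target=0.5)
p10 = heritability_percentile_effect(betas, 0.1)
p90 = heritability_percentile_effect(betas, 0.9)
```

Because power rises monotonically with N for any nonzero effect, the explained fraction is monotone in N, which is what makes the bisection search valid. The ratio p90 / p10 is the toy analogue of the "100 times larger" spread quoted above.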


2017 ◽  
Vol 77 ◽  
pp. 211-218 ◽  
Author(s):  
Jieyun Li ◽  
Awais Rasheed ◽  
Qi Guo ◽  
Yan Dong ◽  
Jindong Liu ◽  
...  

Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

Abstract
With the establishment of large biobanks, the discovery of single nucleotide polymorphisms (SNPs) associated with various phenotypes has accelerated. An open question is whether SNPs identified at genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified, for each phenotype, two independent GWAS (an earlier “discovery” GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). SNP effect sizes decreased by 18.0% for binary phenotypes but increased by 12.0% for quantitative phenotypes. Using the discovery SNP effect size, phenotype type (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.
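The three summary measures in this abstract (replication rate, relative change in effect size, and area under the ROC curve for a replication predictor) can be computed as in the sketch below. The replication rule used here (same direction of effect and replication p < 0.05) is an assumption, since the abstract does not state the criterion; the records are hypothetical, and the predictor is just the discovery −log10(p) rather than the authors' multivariable model. The AUC is computed through its Mann-Whitney interpretation: the probability that a randomly chosen replicated SNP scores higher than a randomly chosen non-replicated one.

```python
import math


def is_replicated(disc_beta, rep_beta, rep_p, alpha=0.05):
    """Hypothetical replication rule: same direction of effect and rep_p < alpha."""
    same_direction = (disc_beta > 0) == (rep_beta > 0)
    return same_direction and rep_p < alpha


def mean_relative_change(pairs):
    """Mean of (|rep_beta| - |disc_beta|) / |disc_beta| over (disc_beta, rep_beta) pairs."""
    changes = [(abs(r) - abs(d)) / abs(d) for d, r in pairs]
    return sum(changes) / len(changes)


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; ties between classes count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy records (hypothetical): (discovery beta, discovery p, replication beta, replication p)
records = [
    (0.10, 1e-20, 0.09, 1e-10),
    (0.08, 1e-12, 0.07, 1e-4),
    (0.05, 4e-8, -0.01, 0.6),
    (0.06, 1e-9, 0.02, 0.2),
]
labels = [is_replicated(b_d, b_r, p_r) for b_d, _, b_r, p_r in records]
rate = sum(labels) / len(labels)
change = mean_relative_change([(b_d, b_r) for b_d, _, b_r, _ in records])
# Score each SNP by its discovery -log10(p); a real model would combine several predictors.
auc = roc_auc([-math.log10(p_d) for _, p_d, _, _ in records], labels)
```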


Author(s):  
Ian J. Deary

‘What are the contributions of environments and genes to intelligence differences?’ asks whether genetic inheritance and the environments people experience affect intelligence differences. Researchers use two main resources to answer this question: twins and samples of DNA. Studies of identical and non-identical twins are used to show the contributions of genes, shared environment, and non-shared environment to people’s differences in traits. Twin studies tell us that by adulthood, about two-thirds of intelligence differences are caused by how people vary in their genetic inheritance, and that both shared and non-shared environments make significant contributions to intelligence differences. The introduction of genome-wide association studies in 2011 has provided a new method of estimating the heritability of intelligence.
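The twin-based partition of variance described above is often summarized with Falconer's classic approximation. It assumes purely additive genetic effects and equally similar shared environments for identical (MZ) and non-identical (DZ) twins, and it is shown here only as an illustration, not necessarily as the model behind the estimates quoted in this chapter. With twin-pair correlations r_MZ and r_DZ for a trait, the shares of variance due to genes (h²), shared environment (c²) and non-shared environment (e², which also absorbs measurement error) are:

```latex
\begin{aligned}
h^2 &= 2\,(r_{MZ} - r_{DZ}) \\
c^2 &= 2\,r_{DZ} - r_{MZ} \\
e^2 &= 1 - r_{MZ}
\end{aligned}
```

For example, hypothetical correlations r_MZ = 0.85 and r_DZ = 0.50 give h² = 0.70, c² = 0.15 and e² = 0.15, which sum to 1.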


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Jiali Sun ◽  
Qingtai Wu ◽  
Dafeng Shen ◽  
Yangjun Wen ◽  
Fengrong Liu ◽  
...  

Abstract
One of the most important tasks in genome-wide association studies (GWAS) is the detection of single-nucleotide polymorphisms (SNPs) related to target traits. With the development of sequencing technology, traditional statistical methods struggle to analyze the resulting high-dimensional, massive SNP data. Recently, machine learning methods have become more popular in high-dimensional genetic data analysis because of their fast computation. However, most machine learning methods have drawbacks such as poor generalization, over-fitting, unsatisfactory classification and low detection accuracy. This study proposed a two-stage algorithm based on least angle regression and random forest (TSLRF), which first controls for population structure and polygenic effects, then selects SNPs potentially related to target traits using least angle regression (LARS), and finally analyzes this variable subset using random forest (RF) to detect quantitative trait nucleotides (QTNs) associated with target traits. The new method showed greater detection power in both simulation experiments and real data analyses. The simulation results showed that, compared with existing approaches, the new method improved QTN detection and model fit and required less computation time. In addition, the new method significantly distinguished QTNs from other SNPs. The new method was then applied to five flowering-related traits in Arabidopsis. The results showed that the distinction between QTNs and unrelated SNPs was more significant than with the other methods. The new method detected 60 genes confirmed to be related to the target traits, significantly more than the other methods, and simultaneously detected multiple gene clusters associated with the target traits.
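The two-stage idea (LARS to preselect a small SNP subset, then random-forest importance to rank it) can be sketched with scikit-learn's `Lars` and `RandomForestRegressor`. This is a simplified illustration, not TSLRF itself: the population-structure step here just regresses the phenotype on the top principal components of the genotypes and models no polygenic term, and the function name, parameter values and simulated data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lars


def two_stage_lars_rf(X, y, n_pcs=2, n_lars=20, n_top=5, seed=0):
    """Hypothetical two-stage screen: LARS preselects SNPs, RF importance ranks them.

    X: (n_samples, n_snps) genotype matrix coded 0/1/2; y: (n_samples,) phenotype.
    Returns column indices of the n_top highest-ranked SNPs.
    """
    Xc = X - X.mean(axis=0)
    # Stage 0 (simplification): adjust y for population structure by regressing it on
    # the top principal components of the genotypes. No polygenic term is modelled.
    u, _, _ = np.linalg.svd(Xc, full_matrices=False)
    design = np.column_stack([np.ones(len(y)), u[:, :n_pcs]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_adj = y - design @ coef

    # Stage 1: least angle regression, stopped after n_lars active SNPs.
    lars = Lars(n_nonzero_coefs=n_lars).fit(Xc, y_adj)
    selected = np.flatnonzero(lars.coef_)
    if selected.size == 0:
        raise ValueError("LARS selected no SNPs")

    # Stage 2: random forest on the preselected SNPs, ranked by impurity-based importance.
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(Xc[:, selected], y_adj)
    ranking = np.argsort(rf.feature_importances_)[::-1]
    return selected[ranking[:n_top]]


# Toy simulation (hypothetical): two causal SNPs among 100.
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(200, 100)).astype(float)
y = 1.0 * X[:, 3] + 0.8 * X[:, 17] + rng.normal(scale=0.5, size=200)
top_snps = two_stage_lars_rf(X, y)
```

The design choice is that LARS is cheap and linear, so it shrinks the search space before the more expensive, nonlinear forest is fitted on only the surviving columns.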

