Exact p-values for large-scale single step genome-wide association, with an application for birth weight in American Angus

ABSTRACT. BACKGROUND: Single Step GBLUP (ssGBLUP) is the most comprehensive method for genomic prediction. Point estimates of marker effects from ssGBLUP are often used for Genome Wide Association Studies (GWAS) without a formal framework of hypothesis testing. Our objective was to implement p-values for GWAS in the ssGBLUP framework, showing algorithms, computational procedures, and an application to a large beef cattle population. METHODS: P-values were obtained based on the prediction error (co)variance for SNP, which uses the inverse of the coefficient matrix and formulas to compute SNP effects. RESULTS: Computation of p-values took a negligible time for a dataset with almost 2 million animals in the pedigree and 1424 genotyped sires, and no inflation was observed. The SNPs passing the Bonferroni threshold of 5.9 in the −log10 scale were the same as those that explained the highest proportion of additive genetic variance, but the latter was penalized (as a GWAS signal) by low allele frequency. CONCLUSION: The exact p-value for ssGWAS is a very general and efficient strategy for QTL detection and testing. It can be used in complex data sets such as those used in animal breeding, where only a proportion of pedigreed animals are genotyped.
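
As a rough illustration of the test described in the Methods, a SNP effect estimate divided by the square root of its prediction error variance can be treated as a standard normal statistic under the null hypothesis. The sketch below assumes that normal approximation. The function names, the example effect and variance, and the marker count of 40,000 are hypothetical and are not taken from the paper; the marker count was chosen only because it yields a Bonferroni threshold close to the 5.9 quoted above.

```python
import math

def snp_p_value(effect, pev):
    """Two-sided p-value, assuming effect / sqrt(pev) ~ N(0, 1) under the null."""
    if pev <= 0:
        raise ValueError("prediction error variance must be positive")
    z = effect / math.sqrt(pev)
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni_neg_log10(alpha, n_tests):
    """Bonferroni significance threshold on the -log10(p) scale."""
    return -math.log10(alpha / n_tests)

# Hypothetical values for illustration only (not from the paper).
p = snp_p_value(effect=0.12, pev=0.0004)        # z is about 6
threshold = bonferroni_neg_log10(0.05, 40_000)  # about 5.9
significant = -math.log10(p) > threshold
```

Working on the −log10 scale keeps very small p-values readable and makes the comparison against the threshold a simple inequality.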

FORGE: multivariate calculation of gene-wide p-values from Genome-Wide Association Studies

10.1101/023648 ◽

2015 ◽

Cited By ~ 2

Author(s):

Inti Inal Pedroso ◽

Michael R Barnes ◽

Anbarasu Lourdusamy ◽

Ammar Al-Chalabi ◽

Gerome Breen

Keyword(s):

Statistical Power ◽

Association Studies ◽

Single Point ◽

Genome Wide Association ◽

P Value ◽

Disease Genes ◽

Snp Analysis ◽

Genome Wide Association Studies ◽

P Values ◽

Genome Wide

Genome-wide association studies (GWAS) have proven a valuable tool to explore the genetic basis of many traits. However, many GWAS lack statistical power, and the commonly used single-point analysis method needs to be complemented to enhance power and interpretation. Multivariate region-wide or gene-wide association analyses are an alternative, allowing for identification of disease genes in a manner more robust to allelic heterogeneity. Gene-based association also facilitates systems biology analyses by generating a single p-value per gene. We have designed and implemented FORGE, a software suite which implements a range of methods for the combination of p-values for the individual genetic variants within a gene or genomic region. The software can be used with summary statistics (marker ids and p-values) and accepts as input the result file formats of commonly used genetic association software. When applied to a study of Crohn's disease (CD) susceptibility, it identified all genes found by single SNP analysis and additional genes identified by a large independent meta-analysis. FORGE p-values on gene-set analyses highlighted association with the Jak-STAT and cytokine signalling pathways, both previously associated with CD. We highlight the software's main features and its future development directions, and provide a comparison with alternative available software tools. FORGE can be freely accessed at https://github.com/inti/FORGE.
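
To make the idea of combining variant-level p-values into one gene-level p-value concrete, the sketch below uses Fisher's method, a classic combination rule. It is only an illustration and is not presented as FORGE's implementation: it assumes the p-values are independent, which SNPs in linkage disequilibrium within a gene generally are not, so practical tools need a correction for correlation. The function name and the input values are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-square distributed with 2k
    degrees of freedom under the null. For even degrees of freedom the
    upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

# Hypothetical SNP-level p-values within one gene.
gene_p = fisher_combined_p([0.01, 0.04, 0.20])
```

Because the function needs only p-values, it matches the summary-statistic input described above (marker ids and p-values) rather than requiring individual-level genotypes.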

Efficient and Powerful Method for Combining P-Values in Genome-Wide Association Studies

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2015.2509977 ◽

2016 ◽

Vol 13 (6) ◽

pp. 1100-1106 ◽

Cited By ~ 2

Author(s):

Natalia Vilor-Tejedor ◽

Juan R. Gonzalez ◽

M. Luz Calle

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Powerful Method ◽

P Values ◽

Genome Wide

Reproducibility in the UK Biobank of Genome-Wide Significant Signals Discovered in Earlier Genome-wide Association Studies

10.1101/2020.06.24.20139576 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jack W. O’Sullivan ◽

John P. A. Ioannidis

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Single Nucleotide ◽

Genome Wide ◽

The Uk ◽

Open Question

Abstract: With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier "discovery" GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.

Genome-Wide Association Study Identifies Loci Associated with Sensitive Skin

Cosmetics ◽

10.3390/cosmetics7020049 ◽

2020 ◽

Vol 7 (2) ◽

pp. 49

Author(s):

Miranda A. Farage ◽

Yunxuan Jiang ◽

Jay P. Tiesman ◽

Pierre Fontanillas ◽

Rosemarie Osborne

Keyword(s):

Genome Wide Association Study ◽

Association Studies ◽

Genome Wide Association ◽

European Ancestry ◽

P Value ◽

Genome Wide Association Studies ◽

Online Questionnaire ◽

Sensitive Skin ◽

Genome Wide ◽

Skin Conditions

Individuals suffering from sensitive skin often have other skin conditions and/or diseases, such as fair skin, freckles, rosacea, or atopic dermatitis. Genome-wide association studies (GWAS) have been performed for some of these conditions, but not for sensitive skin. In this study, a total of 23,426 unrelated participants of European ancestry from the 23andMe database were evaluated for self-declared sensitive skin, other skin conditions, and diseases using an online questionnaire format. Responders were separated into two groups: those who declared they had sensitive skin (n = 8971) and those who declared their skin was not sensitive (controls, n = 14,455). A GWAS of sensitive skin individuals identified three genome-wide significant loci (p-value < 5 × 10⁻⁸) and seven suggestive loci (p-value < 1 × 10⁻⁶). Of the three most significant loci, all have been associated with pigmentation and two have been associated with acne.

A Regression-based Framework for Scalable Pathway-guided Search in Genome-wide Association Studies

10.1101/241265 ◽

2017 ◽

Author(s):

Shrayashi Biswas ◽

Soumen Pal ◽

Samsiddhi Bhattacharjee

Keyword(s):

Association Studies ◽

Analytical Framework ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Biological Databases ◽

Disease Etiology ◽

P Values ◽

Guided Search ◽

Genome Wide ◽

A Genome

Abstract: Traditional unbiased genome-wide association studies (GWAS) have successfully identified thousands of loci associated with various complex diseases, but there is evidence to suggest that many variants were missed at stringent genome-wide thresholds. Fortunately, there is a rapidly increasing amount of prior knowledge in publicly available genomic datasets and biological databases that can be harnessed to enhance the power of discovering SNPs/genes from existing or new GWAS datasets. For most diseases, many of the identified loci tend to cluster into a few specific biological pathways/networks. From the point of view of disease etiology, such clustering is generally to be expected. This phenomenon can be exploited to conduct a more powerful genome-wide scan that is tailored to identify loci that are interconnected in pathways. We propose a scalable regression-based analytical framework to enable such a pathway-guided GWAS and demonstrate that it provides significant gains in power to detect disease-associated SNPs. Our method requires two inputs, namely a) genome-wide summary level data (e.g., SNP p-values) and b) a grouping of genes into biologically meaningful categories (e.g., a database of pathways). It automatically adjusts the input p-values by incorporating the knowledge derived adaptively from the data and the pathways specified. The method involves a regularized logistic regression analysis to derive priors for each SNP and then re-weights the p-values of SNPs so as to maximize the overall power of making discoveries. It increases the power to discover SNPs co-clustering into some of these pathways, while maintaining the global type-1 error (FWER) at the desired level. We used whole-genome simulations and summary data from real GWA studies of psoriasis, SLE, coronary artery disease and type-2 diabetes to illustrate the power improvement achieved by pathway-guided search. Our pipeline, implemented as an R package, can flexibly handle a large number of prior annotations possibly derived from multiple databases.
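
One standard way to re-weight p-values while keeping the family-wise error rate fixed is weighted Bonferroni, sketched below. This is a generic technique shown for intuition, not the authors' pipeline: the weights are taken as given (how the paper derives priors by regularized logistic regression is not reproduced here), and the guarantee assumes the weights do not depend on the p-values being tested. The function name and all numbers are hypothetical.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return the indices rejected by weighted Bonferroni.

    Weights are rescaled to average 1, so the per-test thresholds
    alpha * w_i / m sum to alpha and the union bound keeps FWER <= alpha.
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    m = len(p_values)
    mean_w = sum(weights) / m
    scaled = [w / mean_w for w in weights]
    return [i for i, (p, w) in enumerate(zip(p_values, scaled))
            if p <= alpha * w / m]

# Hypothetical SNP p-values; the second SNP sits in a favored pathway.
p_vals = [0.004, 0.02, 0.3, 0.011]
unweighted = weighted_bonferroni(p_vals, [1, 1, 1, 1])
pathway_guided = weighted_bonferroni(p_vals, [0.5, 2.5, 0.5, 0.5])
```

In this toy example the up-weighted SNP becomes significant while a down-weighted one no longer passes, which shows the trade-off: power is shifted toward favored SNPs rather than created for free.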

A meta-analysis of the genome-wide association studies on two genetically correlated phenotypes (self-reported headache and self-reported migraine) identifies four new risk loci for headaches (N=397,385)

10.1101/2021.09.15.21263668 ◽

2021 ◽

Author(s):

Weihua Meng ◽

Parminder Reel ◽

Charvi Nangia ◽

Aravind Rajendrakumar ◽

Harry Hebert ◽

...

Keyword(s):

Association Studies ◽

Meta Analysis ◽

The Self ◽

Genome Wide Association ◽

P Value ◽

Clinical Settings ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Genetic Mechanisms ◽

The Uk

Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) on the self-reported headache phenotype from the UK Biobank cohort and the self-reported migraine phenotype from the 23andMe resource using metaUSAT for genetically correlated phenotypes (N=397,385). We identified 38 loci for headaches, of which 34 loci have been reported before and 4 loci were newly identified. The LRP1-STAT6-SDR9C7 region in chromosome 12 was the most significantly associated locus, with a leading P value of 1.24 × 10⁻⁶² for rs11172113. The ONECUT2 gene locus in chromosome 18 was the strongest signal among the 4 new loci, with a P value of 1.29 × 10⁻⁹ for rs673939. Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together in theory and in practice to boost study power to identify more new variants for headaches. This study has paved the way for a large GWAS meta-analysis involving cohorts of different, though genetically correlated, headache phenotypes.

Flaw or discovery? Calculating exact p-values for genome-wide association studies in inbred populations

10.1101/015339 ◽

2015 ◽

Author(s):

Xia Shen

Keyword(s):

Allele Frequency ◽

Frequency Distribution ◽

Minor Allele Frequency ◽

Association Studies ◽

Genome Wide Association ◽

Allele Frequency Distribution ◽

Genome Wide Association Studies ◽

P Values ◽

Genome Wide ◽

Inbred Populations

Motivation: Genome-wide association studies have been conducted in inbred populations where the sample size is small. The ordinary association p-values and multiple testing correction therefore become questionable, as the detected genetic effect may or may not be due to chance, depending on the minor allele frequency distribution across the genome. Instead of permutation testing, the marker-specific false positive rate can be analytically calculated in inbred populations without heterozygotes. Results: Solutions for exact p-values for genome-wide association studies in inbred populations were derived and implemented. An example is presented to illustrate that the marker-specific experiment-wise p-value varies as the genome-wide minor allele frequency distribution changes. A simulation using a real Arabidopsis thaliana genome indicates that the use of exact p-values improves detection power and reduces inflation due to population structure. An analysis of a defense-related case-control phenotype using the exact p-values revealed the causal locus, where markers with higher MAFs had smaller p-values than the top variants with lower MAFs in ordinary genome-wide association analysis. Availability and Implementation: Project URL: https://r-forge.r-project.org/projects/statomics/. The R package p.exact: https://r-forge.r-project.org/R/?group_id=2030.

SeqBreed: a python tool to evaluate genomic prediction in complex scenarios

10.1101/748624 ◽

2019 ◽

Author(s):

M. Pérez-Enciso ◽

L. C. Ramírez-Ayala ◽

L.M. Zingaretti

Keyword(s):

Genomic Prediction ◽

Predictive Accuracy ◽

Sequence Data ◽

Association Studies ◽

Single Step ◽

Genome Wide Association ◽

Drosophila Genome ◽

Genome Wide Association Studies ◽

Complex Phenotypes ◽

Genome Wide

Abstract. Background: Genomic Prediction (GP) is the procedure whereby molecular information is used to predict complex phenotypes. Although GP can significantly enhance predictive accuracy, it can be expensive and difficult to implement. To help in designing optimum experiments, including genome wide association studies and genomic selection experiments, we have developed SeqBreed, a generic and flexible python3 forward simulator. Results: SeqBreed accommodates sex and mitochondrion chromosomes as well as autopolyploidy. It can simulate any number of complex phenotypes determined by any number of causal loci. SeqBreed implements several GP methods, including single step GBLUP. We demonstrate its functionality with Drosophila Genome Reference Panel (DGRP) sequence data and with tetraploid potato genotypes. Conclusions: SeqBreed is a flexible and easy to use tool appropriate for optimizing GP or genome wide association studies. It incorporates some of the most popular GP methods and includes several visualization tools. Code is open and can be freely modified. Software, documentation and examples are available at https://github.com/miguelperezenciso/SeqBreed.

The paltry power of priors versus populations

10.1101/737676 ◽

2019 ◽

Author(s):

Jianan Zhan ◽

Dan E. Arking ◽

Joel S. Bader

Keyword(s):

Population Size ◽

Multiple Testing ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Significant Finding ◽

Rna Seq ◽

Test Power ◽

Genome Wide

Abstract: Biological experiments often involve hypothesis testing at the scale of thousands to millions of tests. Alleviating the multiple testing burden has been a goal of many methods designed to boost test power by focusing tests on the alternative hypotheses most likely to be true. Very often, these methods either explicitly or implicitly make use of prior probabilities that bias significance for favored sets thought to be enriched for significant findings. Nevertheless, most genomics experiments, and in particular genome-wide association studies (GWAS), still use traditional univariate tests rather than more sophisticated approaches. Here we use GWAS to demonstrate why unbiased tests remain in favor. We calculate test power assuming perfect knowledge of a prior distribution and then derive the population size increase required to provide the same boost without a prior. We show that population size is exponentially more important than the prior, providing a rigorous explanation for the observed avoidance of prior-based methods. Author summary: Biological experiments often test thousands to millions of hypotheses. Gene-based tests for human RNA-Seq data, for example, involve approximately 20,000; genome-wide association studies (GWAS) involve about 1 million effective tests. The conventional approach is to perform individual tests and then apply a Bonferroni correction to account for multiple testing. This approach implies a single-test p-value of 2.5 × 10⁻⁶ for RNA-Seq experiments, and a p-value of 5 × 10⁻⁸ for GWAS, to control the false-positive rate at a conventional value of 0.05. Many methods have been proposed to alleviate the multiple-testing burden by incorporating a prior probability that boosts the significance for a subset of candidate genes or variants. At the extreme limit, only the candidate set is tested, corresponding to a decreased multiple testing burden. Despite decades of methods development, prior-based tests have not been generally used. Here we compare the power increase possible with a prior with the increase possible with the much simpler strategy of increasing the study size. We show that increasing the population size is exponentially more valuable than increasing the strength of the prior, even when the true prior is known exactly. These results provide a rigorous explanation for the continued use of simple, robust methods rather than more sophisticated approaches.
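
The two single-test thresholds quoted in the author summary follow from dividing the overall false-positive rate by the number of tests. The symbols α (overall rate) and m (number of tests) are introduced here only for illustration; the numbers are those given in the text.

```latex
\[
p_{\text{single}} = \frac{\alpha}{m}, \qquad
\frac{0.05}{20{,}000} = 2.5 \times 10^{-6} \ \text{(RNA-Seq)}, \qquad
\frac{0.05}{10^{6}} = 5 \times 10^{-8} \ \text{(GWAS)}.
\]
```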

Multi-ethnic genome-wide association study of decomposed cardioelectric phenotypes illustrates strategies to identify and characterize evidence of shared genetic effects for complex traits

10.1101/654012 ◽

2019 ◽

Cited By ~ 1

Author(s):

Antoine R. Baldassari ◽

Colleen M. Sitlani ◽

Heather M. Highland ◽

Dan E. Arking ◽

Steve Buyske ◽

...

Keyword(s):

Complex Traits ◽

Genome Wide Association Study ◽

Association Studies ◽

Genetic Effects ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

Genetic Loci ◽

Genome Wide ◽

Trait Loci

ABSTRACT. Background: Published genome-wide association studies (GWAS) are mainly European-centric, examine a narrow view of phenotypic variation, and infrequently interrogate genetic effects shared across traits. We therefore examined the extent to which a multi-ethnic, combined trait GWAS of phenotypes that map to well-defined biology can enable detection and characterization of complex trait loci. Methods: With 1000 Genomes Phase 3 imputed data in 34,668 participants (15% African American; 3% Chinese American; 51% European American; 30% Hispanic/Latino), we performed covariate-adjusted univariate GWAS of six contiguous electrocardiogram (ECG) traits that decomposed an average heartbeat and two commonly reported composite ECG traits that summed contiguous traits. Combined phenotype testing was performed using the adaptive sum of powered scores test (aSPU). Results: We identified six novel and 87 known ECG trait loci (aSPU p-value < 5E-9). Lead SNP rs3211938 at novel locus CD36 was common in African Americans (minor allele frequency=10%) and near-monomorphic in European Americans, with effect sizes for the composite trait, QT interval, among the largest reported. Only one novel locus was detected for the composite traits, due to opposite directions of effects across contiguous traits that summed to near-zero. Combined phenotype testing did not detect novel loci unapparent by univariate testing. However, this approach aided locus characterization, particularly when loci harbored multiple independent signals that differed by trait. Conclusions: Despite including one-third as few participants as the largest published GWAS of ECG traits, our study identifies multiple novel ECG genetic loci, emphasizing the importance of ancestral diversity and phenotype measurement in this era of ever-growing GWAS. AUTHOR SUMMARY: We leveraged a multiethnic cohort with precise measures of cardioelectric function to identify novel genetic loci affecting this complex, multifaceted phenotype. The success of our approach stresses the importance of phenotypic precision and participant diversity for future locus discovery and characterization efforts, and cautions against compromises made in genome-wide association studies to pursue ever-growing sample sizes.
