Flaw or discovery? Calculating exact p-values for genome-wide association studies in inbred populations

Keyword(s):

P Values

Genome Wide

Inbred Populations

Motivation: Genome-wide association studies have been conducted in inbred populations where the sample size is small. In this setting, ordinary association p-values and multiple testing corrections become questionable, because whether a detected genetic effect is due to chance depends on the minor allele frequency (MAF) distribution across the genome. Instead of relying on permutation testing, the marker-specific false positive rate can be calculated analytically in inbred populations without heterozygotes. Results: Exact p-values for genome-wide association studies in inbred populations were derived and implemented. An example illustrates that the marker-specific experiment-wise p-value varies as the genome-wide MAF distribution changes. A simulation using the real Arabidopsis thaliana genome indicates that exact p-values improve detection power and reduce inflation due to population structure. An analysis of a defense-related case-control phenotype using exact p-values revealed the causal locus, where markers with higher MAFs had smaller p-values than the top variants with lower MAFs in the ordinary genome-wide association analysis. Availability and Implementation: Project URL: https://r-forge.r-project.org/projects/statomics/. The R package p.exact: https://r-forge.r-project.org/R/?group_id=2030.
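The idea that a marker's experiment-wise p-value depends on the MAFs of all other markers can be illustrated with a minimal sketch. It assumes a case-control phenotype, fully homozygous lines (so each marker reduces to a binary carrier/non-carrier indicator with a minor allele count `m`), Fisher's exact test per marker (whose null distribution is hypergeometric), and independent markers, which real genomes in LD are not. For each marker it computes the null probability of reaching p ≤ p0 given its MAF; low-MAF markers often cannot reach small p-values at all and therefore add little to the multiple testing burden. This is not the p.exact package's algorithm, and all function names are hypothetical.

```python
from math import comb, prod
from typing import Sequence


def hypergeom_pmf(x: int, n: int, n_case: int, m: int) -> float:
    """P(X = x): x of the m minor-allele lines fall among the n_case cases out of n lines."""
    return comb(m, x) * comb(n - m, n_case - x) / comb(n, n_case)


def support(n: int, n_case: int, m: int) -> range:
    """Feasible numbers of minor-allele lines among the cases."""
    return range(max(0, m + n_case - n), min(m, n_case) + 1)


def fisher_p(x_obs: int, n: int, n_case: int, m: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 (case/control x allele) table."""
    p_obs = hypergeom_pmf(x_obs, n, n_case, m)
    return sum(
        hypergeom_pmf(x, n, n_case, m)
        for x in support(n, n_case, m)
        if hypergeom_pmf(x, n, n_case, m) <= p_obs * (1 + 1e-7)  # tolerance for ties
    )


def marker_size(p0: float, n: int, n_case: int, m: int) -> float:
    """Null probability that a marker with m minor-allele lines yields p <= p0."""
    return sum(
        hypergeom_pmf(x, n, n_case, m)
        for x in support(n, n_case, m)
        if fisher_p(x, n, n_case, m) <= p0 * (1 + 1e-9)
    )


def experiment_wise_p(p0: float, n: int, n_case: int, minor_counts: Sequence[int]) -> float:
    """P(at least one marker reaches p <= p0) under H0, assuming independent markers."""
    return 1.0 - prod(1.0 - marker_size(p0, n, n_case, m) for m in minor_counts)


if __name__ == "__main__":
    n, n_case = 20, 10  # hypothetical panel: 20 inbred lines, 10 of them cases
    p_top = fisher_p(10, n, n_case, 10)  # all 10 minor-allele lines are cases
    print(p_top)
    # Singleton markers (m = 1) can never reach p <= p_top, so they add nothing:
    print(experiment_wise_p(p_top, n, n_case, [10, 1, 1, 1]))
    # Markers as common as the top one roughly quadruple the experiment-wise p-value:
    print(experiment_wise_p(p_top, n, n_case, [10, 10, 10, 10]))
```

A Bonferroni correction would multiply p_top by four in both scenarios; the MAF-aware calculation distinguishes them.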

Comparison of the Performance of Two Commercial Genome-Wide Association Study Genotyping Platforms in Han Chinese Samples

G3 Genes|Genome|Genetics

10.1534/g3.112.004069

2013

Vol 3 (1)

pp. 23-29

Cited By ~ 16

Author(s):

Lei Jiang

Dana Willner

Patrick Danoy

Huji Xu

Matthew A Brown

Keyword(s):

Allele Frequency

Minor Allele Frequency

Genome Wide Association Study

Association Studies

Han Chinese

Minor Allele

Genome Wide Association

Nucleotide Polymorphisms

Most genome-wide association studies to date have been performed in populations of European descent, but there is increasing interest in expanding these studies to other populations. The performance of genotyping chips in Asian populations is not well established. Therefore, we tested the performance of widely used fixed-marker genome-wide association study chips in the Han Chinese population. Non-HapMap Chinese samples (n = 396) were genotyped using the Illumina OmniExpress and Affymetrix 6.0 platforms, and a subset was also genotyped using the Immunochip. Genotyped markers from the Affymetrix 6.0 and Illumina OmniExpress were used for full genome imputation based on the HapMap 2 JPT+CHB (Japanese from Tokyo, Japan, and Chinese from Beijing, China) reference panel. The concordance of marker genotypes across the three platforms was very high, whether directly genotyped or genotyped-plus-imputed single nucleotide polymorphisms (SNPs) were compared (>99.8% and >99.5%, respectively). The OmniExpress chip data enabled more SNPs to be imputed, particularly SNPs with minor allele frequency >5%. The OmniExpress chip achieved better coverage of HapMap SNPs than the Affymetrix 6.0 chip (73.6% vs. 65.9%, respectively, for minor allele frequency >5%). The Affymetrix 6.0 and Illumina OmniExpress chips have similar genotyping accuracy and provide similar accuracy of imputed SNPs. The OmniExpress chip, however, provides better coverage of Asian HapMap SNPs, although its coverage of HapMap SNPs is moderate.
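Genotype concordance of the kind reported above is the share of markers whose calls agree on two platforms. A minimal sketch, assuming both call sets are already aligned to the same markers and the same strand, with `None` marking a missing call; allele order is ignored, but strand flips (including A/T and C/G SNPs) are not handled, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def _canonical(call: str) -> str:
    """Order-independent genotype, so that 'AG' and 'GA' compare equal."""
    return "".join(sorted(call))


def concordance(calls_a: Sequence[Optional[str]], calls_b: Sequence[Optional[str]]) -> float:
    """Fraction of identical calls among markers called on both platforms (None = missing)."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call vectors must be aligned to the same markers")
    both = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not both:
        raise ValueError("no marker was called on both platforms")
    return sum(_canonical(a) == _canonical(b) for a, b in both) / len(both)
```

Markers missing on either platform are excluded from the denominator, so concordance and call rate should be reported separately.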

Efficient and Powerful Method for Combining P-Values in Genome-Wide Association Studies

IEEE/ACM Transactions on Computational Biology and Bioinformatics

10.1109/tcbb.2015.2509977

2016

Vol 13 (6)

pp. 1100-1106

Cited By ~ 2

Author(s):

Natalia Vilor-Tejedor

Juan R. Gonzalez

M. Luz Calle

Keyword(s):

Association Studies

Genome Wide Association

Powerful Method

P Values

An Evaluation of the MiDCoP Method for Imputing Allele Frequency in Genome Wide Association Studies

Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing - Studies in Computational Intelligence

10.1007/978-3-319-10389-1_5

2015

pp. 57-67

Author(s):

Yadu Gautam

Carl Lee

Chin-I Cheng

Carl Langefeld

Keyword(s):

Allele Frequency

Association Studies

Genome Wide Association ◽

A Regression-based Framework for Scalable Pathway-guided Search in Genome-wide Association Studies

10.1101/241265

2017

Author(s):

Shrayashi Biswas

Soumen Pal

Samsiddhi Bhattacharjee

Keyword(s):

Association Studies

Analytical Framework

Genome Wide Association

Biological Databases

Disease Etiology

P Values

Guided Search

Genome Wide

A Genome

Traditional unbiased genome-wide association studies (GWAS) have successfully identified thousands of loci associated with various complex diseases, but there is evidence to suggest that many variants were missed at stringent genome-wide thresholds. Fortunately, a rapidly increasing amount of prior knowledge in publicly available genomic datasets and biological databases can be harnessed to enhance the power of discovering SNPs/genes from existing or new GWAS datasets. For most diseases, many of the identified loci tend to cluster into a few specific biological pathways/networks, which is generally to be expected from the point of view of disease etiology. This clustering can be exploited to conduct a more powerful genome-wide scan that is tailored to identify loci interconnected in pathways. We propose a scalable regression-based analytical framework to enable such a pathway-guided GWAS and demonstrate that it provides significant gains in power to detect disease-associated SNPs. Our method requires two inputs: a) genome-wide summary-level data (e.g., SNP p-values) and b) a grouping of genes into biologically meaningful categories (e.g., a database of pathways). It automatically adjusts the input p-values by incorporating knowledge derived adaptively from the data and the specified pathways. The method uses a regularized logistic regression analysis to derive a prior for each SNP and then re-weights the SNP p-values so as to maximize the overall power of making discoveries. It increases the power to discover SNPs co-clustering into some of these pathways while maintaining the family-wise type-1 error rate (FWER) at the desired level. We used whole-genome simulations and summary data from real GWA studies of psoriasis, systemic lupus erythematosus (SLE), coronary artery disease and type-2 diabetes to illustrate the power improvement achieved by pathway-guided search.
Our pipeline, implemented as an R package, can flexibly handle a large number of prior annotations, possibly derived from multiple databases.
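Re-weighting p-values while keeping the FWER under control can be illustrated with the weighted Bonferroni procedure: SNP i is declared significant if p_i ≤ α·w_i/m, where the m non-negative weights average to 1, so the union bound still gives FWER ≤ α. A minimal sketch, assuming the weights are already given (for example, derived from per-SNP priors) and are fixed independently of the p-values being tested; it does not reproduce the authors' regularized logistic regression step, and the function name is hypothetical.

```python
from typing import List, Sequence


def weighted_bonferroni(
    pvalues: Sequence[float], weights: Sequence[float], alpha: float = 0.05
) -> List[bool]:
    """Reject H_i when p_i <= alpha * w_i / m, with weights rescaled to mean 1.

    Because the rescaled weights sum to m, the union bound gives FWER <= alpha
    under any dependence, provided the weights do not depend on these p-values.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    mean_w = sum(weights) / m
    return [p <= alpha * (w / mean_w) / m for p, w in zip(pvalues, weights)]
```

With equal weights this reduces to the ordinary Bonferroni correction. With weights [2, 1, 0.5, 0.5], a SNP with p = 0.02 passes its raised threshold of 0.025, paid for by stricter thresholds on the down-weighted SNPs.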

FORGE: multivariate calculation of gene-wide p-values from Genome-Wide Association Studies

10.1101/023648

2015

Cited By ~ 2

Author(s):

Inti Inal Pedroso

Michael R Barnes

Anbarasu Lourdusamy

Ammar Al-Chalabi

Gerome Breen

Keyword(s):

Statistical Power

Association Studies

Single Point

Genome Wide Association

P Value

Disease Genes

SNP Analysis

P Values

Genome-wide association studies (GWAS) have proven a valuable tool to explore the genetic basis of many traits. However, many GWAS lack statistical power, and the commonly used single-point analysis method needs to be complemented to enhance power and interpretation. Multivariate region- or gene-wide association analyses are an alternative, allowing disease genes to be identified in a manner more robust to allelic heterogeneity. Gene-based association also facilitates systems biology analyses by generating a single p-value per gene. We have designed and implemented FORGE, a software suite that implements a range of methods for combining the p-values of the individual genetic variants within a gene or genomic region. The software can be used with summary statistics (marker IDs and p-values) and accepts as input the result file formats of commonly used genetic association software. When applied to a study of Crohn's disease (CD) susceptibility, it identified all genes found by single-SNP analysis as well as additional genes identified by a large independent meta-analysis. FORGE p-values in gene-set analyses highlighted association with the Jak-STAT and cytokine signalling pathways, both previously associated with CD. We describe the software's main features and future development directions, and compare it with alternative available software tools. FORGE can be freely accessed at https://github.com/inti/FORGE.
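Fisher's method is one classic way to combine the p-values of the k variants in a gene into a single gene-wide p-value: X = −2 Σ ln pᵢ follows a χ² distribution with 2k degrees of freedom when the k tests are independent. A minimal sketch using only the Python standard library; it assumes independent p-values, which SNPs in LD within a gene usually are not, so it illustrates the combination idea rather than FORGE's multivariate methods. The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_combine(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.

    Valid only if the k p-values are independent.
    """
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival with even df 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

For correlated SNPs, the null distribution of X has to be adjusted for LD (or obtained by simulation), otherwise gene-wide p-values are anti-conservative.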

Evaluating Strategies for Marker Ranking in Genome-wide Association Studies of Complex Traits

Methods of Information in Medicine

10.3414/me09-02-0055

2010

Vol 49 (06)

pp. 632-640

Cited By ~ 2

Author(s):

J. Hebebrand

H.-E. Wichmann

K.-H. Jöckel

A. Scherag

Keyword(s):

False Positive

Complex Traits

Association Studies

A Priori

Genome Wide Association

P Values

Genome Wide

A Genome

Mean Square Errors

Summary Background: Genome-wide association studies (GWAS) have been highly successful in identifying new susceptibility loci for complex traits. Such studies usually start by genotyping fixed arrays of genetic markers in an initial sample. From these markers, some are selected for further genotyping in independent samples. Due to the very low a priori probability of a true positive association, the vast majority of marker signals will turn out to be false positives. Thus, several methods to rank markers have been proposed, which are evaluated here. Objectives: We compared the statistical properties of ranking by p-values, q-values, the False Positive Report Probability (FPRP) and the Bayesian False-Discovery Probability (BFDP). Methods: We performed simulation studies for a genomic region derived from GWAS data sets and calculated descriptive statistics as well as mean square errors with regard to the true marker ranking. Additionally, we applied all measures to a GWAS for early-onset extreme obesity, superimposing a priori information on candidate genes. Results: Despite the known, more extreme probability results for traditional p-values, we observed that both p-values and the BFDP were more precise in reconstructing the “true” order of the markers in a region. In addition, the BFDP was useful for attenuating unexpected effects at a genome-wide scale. Conclusions: For the purpose of selecting markers from an initial GWAS and within the limits of this study, we recommend either ranking by p-values or applying a full Bayesian approach, for which the BFDP is a first approximation.
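Two of the ranking measures compared above can be computed from standard formulas: FPRP = α(1 − π) / [α(1 − π) + (1 − β)π] (Wacholder et al.), where π is the prior probability of a true association and 1 − β the power, and the BFDP from Wakefield's approximate Bayes factor for an effect estimate and its standard error. A minimal sketch, assuming a log odds ratio estimate `beta_hat` with standard error `se`, a prior probability of association `prior`, and a normal prior N(0, W) on the true effect with `prior_sd` = √W; the parameter names are illustrative, and this is not the code used in the study.

```python
import math


def fprp(alpha: float, power: float, prior: float) -> float:
    """False Positive Report Probability at significance level alpha."""
    false_pos = alpha * (1 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


def bfdp(beta_hat: float, se: float, prior: float, prior_sd: float) -> float:
    """Bayesian False-Discovery Probability from an approximate Bayes factor."""
    v = se ** 2                      # sampling variance of the estimate
    w = prior_sd ** 2                # prior variance of the true effect under H1
    z2 = (beta_hat / se) ** 2
    # Approximate Bayes factor in favour of H0 (no association) over H1
    abf = math.sqrt((v + w) / v) * math.exp(-z2 * w / (2 * (v + w)))
    post_odds_null = abf * (1 - prior) / prior
    return post_odds_null / (1 + post_odds_null)
```

Unlike a p-value, both measures depend on the assumed prior, which is how a priori information on candidate genes can be superimposed when ranking markers.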

Exact p-values for large-scale single step genome-wide association, with an application for birth weight in American Angus

10.1101/555243

2019

Cited By ~ 1

Author(s):

Ignacio Aguilar

Andres Legarra

Fernando Cardoso

Yutaka Masuda

Daniela Lourenco

...

Keyword(s):

Association Studies

Additive Genetic Variance

Single Step

Genome Wide Association

P Value

Complex Data

P Values

Genome Wide

Formal Framework

Background: Single-step GBLUP (ssGBLUP) is the most comprehensive method for genomic prediction. Point estimates of marker effects from ssGBLUP are often used for genome-wide association studies (GWAS) without a formal hypothesis-testing framework. Our objective was to implement p-values for GWAS in the ssGBLUP framework, presenting algorithms, computational procedures, and an application to a large beef cattle population. Methods: P-values were obtained from the prediction error (co)variance of SNP effects, which uses the inverse of the coefficient matrix and the formulas used to compute SNP effects. Results: Computing p-values took negligible time for a dataset with almost 2 million animals in the pedigree and 1424 genotyped sires, and no inflation was observed. The SNPs passing the Bonferroni threshold of 5.9 on the −log10 scale were the same as those explaining the highest proportion of additive genetic variance, but the latter measure was penalized (as a GWAS signal) by low allele frequency. Conclusion: The exact p-value for single-step GWAS (SSGWAS) is a very general and efficient strategy for QTL detection and testing. It can be used with complex data sets such as those in animal breeding, where only a proportion of pedigreed animals are genotyped.
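The final step of such a test can be sketched for a single SNP. A minimal sketch, assuming each SNP effect estimate `a_hat` is a BLUP with prior variance `sigma2_a` and prediction error variance `pev`, and using the standard BLUP identity Var(â) = σ²_a − PEV to form a z statistic. This simplifies the single-step procedure described above, which derives the SNP prediction error (co)variances from the animal-level solutions; the function names are hypothetical.

```python
import math


def snp_pvalue(a_hat: float, sigma2_a: float, pev: float) -> float:
    """Two-sided p-value for a BLUP SNP effect.

    Assumes Var(a_hat) = sigma2_a - pev (standard BLUP identity), so that
    z = a_hat / sqrt(sigma2_a - pev) is approximately N(0, 1) under H0.
    """
    var_a_hat = sigma2_a - pev
    if var_a_hat <= 0:
        raise ValueError("sigma2_a must exceed the prediction error variance")
    z = a_hat / math.sqrt(var_a_hat)
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


def bonferroni_threshold_log10(n_tests: int, alpha: float = 0.05) -> float:
    """Bonferroni significance threshold expressed on the -log10(p) scale."""
    return -math.log10(alpha / n_tests)
```

For a hypothetical number of tested SNPs `n_snps`, `bonferroni_threshold_log10(n_snps)` gives −log10(0.05 / n_snps), the kind of cut-off behind a genome-wide threshold expressed on the −log10 scale.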

Computing Empirical P-Values for Estimating Gene-Gene Interactions in Genome-Wide Association Studies: A Parallel Computing Approach

2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)

10.1109/pdp2018.2018.00071

2018

Author(s):

Valentina Giansanti

Daniele D'Agostino

Carlo Maj

Stefano Beretta

Ivan Merelli

Keyword(s):

Parallel Computing

Association Studies

Genome Wide Association

Gene Interactions

P Values

Genome Wide

Computing Approach

Efficiency of genome-wide association study in open-pollinated populations

10.1101/050955

2016

Author(s):

José Marcelo Soriano Viana

Gabriel Borges Mundim

Fabyano Fonseca e Silva

Antonio Augusto Franco Garcia

Keyword(s):

Sample Size

False Positive

Genome Wide Association Study

Association Studies

Inbred Lines

Genome Wide Association

QTL Detection

Genome Wide

Inbred Populations

Genome-wide association studies (GWAS) in plant species have employed inbred line panels; to our knowledge, no information is available on the theory and efficiency of GWAS in open-pollinated populations. Our objectives are to present quantitative genetics theory for GWAS; to evaluate the relative efficiency of GWAS in non-inbred and inbred populations and in an inbred line panel; and to assess factors affecting GWAS, such as linkage disequilibrium (LD), sample size, and quantitative trait locus (QTL) heritability. Fifty samples of 400 individuals from populations with LD were simulated. Individuals were genotyped for 10,000 single nucleotide polymorphisms (SNPs) and phenotyped for traits with different degrees of dominance controlled by 10 QTLs and 90 minor genes. The average SNP density was one SNP per 0.1 centiMorgan, and the trait heritabilities were 0.4 and 0.8. We assessed GWAS efficiency based on the power of QTL detection, the number of false-positive associations, the bias in the estimated QTL position, and the range of significant SNPs for the same QTL. When the LD between a QTL and one or more SNPs is restricted to markers very close to or within the QTL, GWAS in open-pollinated populations can be highly efficient, depending mainly on QTL heritability and sample size. GWAS achieved the highest power of QTL detection, the smallest number of false-positive associations, and the lowest bias in the estimated QTL position for the inbred line panel when correcting for population structure. Under low QTL heritability and reduced sample size, GWAS is ineffective for non-inbred and inbred populations and for the inbred line panel.
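The dependence of detection power on QTL heritability and sample size can be illustrated with the usual large-sample approximation for a 1-df single-marker test: a marker explaining a fraction q² of phenotypic variance in n unrelated individuals gives a χ² statistic with non-centrality λ ≈ n·q²/(1 − q²). A minimal sketch, assuming additive effects, no population structure, and that a SNP in LD r² with the QTL captures about r²·q² of the variance; it is not the simulation used in the study, and the names are hypothetical.

```python
from statistics import NormalDist


def single_marker_power(n: int, q2: float, alpha: float, r2_ld: float = 1.0) -> float:
    """Approximate power of a two-sided 1-df test at significance level alpha.

    q2:    fraction of phenotypic variance explained by the QTL (hypothetical input)
    r2_ld: LD r^2 between the tested SNP and the QTL (1.0 = the SNP is the QTL)
    """
    explained = r2_ld * q2
    if not 0.0 <= explained < 1.0:
        raise ValueError("r2_ld * q2 must lie in [0, 1)")
    ncp = n * explained / (1.0 - explained)  # non-centrality of the chi-square statistic
    std = NormalDist()
    crit = std.inv_cdf(1.0 - alpha / 2.0)    # two-sided critical z value
    shift = ncp ** 0.5                       # mean of the z statistic under H1
    return std.cdf(shift - crit) + std.cdf(-shift - crit)


if __name__ == "__main__":
    for n in (200, 400, 800):
        # hypothetical QTL explaining 2% of variance, genome-wide alpha of 5e-6
        print(n, round(single_marker_power(n, 0.02, 5e-6), 3))
```

Power rises with both n and q², and drops when only a SNP in incomplete LD with the QTL is tested, which mirrors the factors assessed in the abstract.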

Estimation of linkage disequilibrium levels and allele frequency distribution in crossbred Vrindavani cattle using 50K SNP data

PLoS ONE

10.1371/journal.pone.0259572

2021

Vol 16 (11)

pp. e0259572

Author(s):

Akansha Singh

Amit Kumar

Arnav Mehrotra

Karthikeyan A.

Ashwni Kumar Pandey

...

Keyword(s):

Linkage Disequilibrium

Allele Frequency

Minor Allele Frequency

Association Studies

Minor Allele

Pairwise Distance

Allele Frequency Distribution

Effective Population

Autosomal SNPs

The objective of this study was to calculate the extent and decay of linkage disequilibrium (LD) in 96 crossbred Vrindavani cattle genotyped with the Bovine SNP50K BeadChip. After filtering, 43,821 SNPs spanning 2500.3 Mb of the autosomes were retained for the final analysis. A considerable proportion of SNPs had a minor allele frequency below 0.20. The extent of LD between autosomal SNPs up to 10 Mb apart across the genome was measured using the r² statistic. The mean r² was 0.43 for marker pairs less than 10 kb apart and decreased to 0.21 for pairs 25–50 kb apart. The effects of minor allele frequency and sample size on LD estimates were also investigated. LD decreased with increasing inter-marker distance and increased with increasing minor allele frequency. The estimated inbreeding coefficient and effective population size for the present generation were 0.04 and 46, respectively, indicating a small and unstable Vrindavani cattle population. These findings suggest that a denser or breed-specific SNP panel would be required to cover the whole genome of Vrindavani cattle for genome-wide association studies (GWAS).
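For unphased genotypes coded 0/1/2, the r² between two SNPs is commonly estimated as the squared Pearson correlation of allele counts, and LD decay is then summarised as the mean r² within distance bins (for example, < 10 kb and 25–50 kb, as above). A minimal sketch under those assumptions; the study may have estimated r² from phased haplotypes instead, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def genotype_r2(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Squared Pearson correlation of 0/1/2 allele counts at two SNPs."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two aligned genotype vectors with at least two animals")
    m1, m2 = sum(g1) / n, sum(g2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    sxx = sum((a - m1) ** 2 for a in g1)
    syy = sum((b - m2) ** 2 for b in g2)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return sxy * sxy / (sxx * syy)


def mean_r2_by_distance(
    pairs: Sequence[Tuple[int, float]], bins: List[Tuple[int, int]]
) -> Dict[Tuple[int, int], float]:
    """Mean r^2 of SNP pairs whose distance (bp) falls in each half-open bin [lo, hi)."""
    summary = {}
    for lo, hi in bins:
        values = [r2 for dist, r2 in pairs if lo <= dist < hi]
        summary[(lo, hi)] = sum(values) / len(values) if values else float("nan")
    return summary
```

Because r² has a sampling floor of roughly 1/n for unlinked loci, estimates from 96 animals are inflated at long distances, which is one reason sample size matters for the LD estimates reported above.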