EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations

We apply the statistical framework of genome-wide association studies (GWAS) to eigenvector decomposition (EigenGWAS), which is commonly used in population genetics to characterise the structure of genetic data. We show that loci under selection can be detected in a structured population by using eigenvectors as phenotypes in a single-marker GWAS. We find LCT to be under selection between the HapMap CEU and TSI cohorts, a finding that was replicated across European countries in the POPRES samples. HERC2 was also found to be differentiated both between the CEU and TSI cohorts and among the POPRES samples, reflecting the likely anthropological differences in skin and hair colour between northern and southern European populations. We show that, for determining the effect of a SNP on an eigenvector, three methods are equivalent to each other: single-marker regression of eigenvectors, best linear unbiased prediction of eigenvectors, and singular value decomposition of SNP data. We also demonstrate that SNP effects on eigenvectors estimated from a reference panel can be used to predict eigenvectors (the projected eigenvectors) in a target sample with high accuracy, particularly for the primary eigenvectors. Under this GWAS framework, ancestry informative markers and loci under selection can be identified, and population structure can be captured and easily interpreted. We have developed freely available software to facilitate the application of the methods (https://github.com/gc5k/GEAR/wiki/EigenGWAS).
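The equivalence between single-marker regression and singular value decomposition can be checked numerically. The following is a minimal sketch on simulated toy data, not the GEAR implementation: the sample sizes, allele frequencies, and all variable names are assumptions made for illustration. It regresses the primary eigenvector of the genetic relationship matrix on each standardised SNP and compares the slopes with the scaled SNP loadings from an SVD of the same matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two subpopulations whose allele frequencies differ slightly.
n_per_pop, n_snps = 50, 200
base = rng.uniform(0.2, 0.8, size=n_snps)
freq_a = np.clip(base + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
freq_b = np.clip(base + rng.normal(0, 0.1, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)

# Drop monomorphic SNPs, then standardise each SNP to mean 0 and variance 1.
geno = geno[:, geno.std(axis=0) > 0]
X = (geno - geno.mean(axis=0)) / geno.std(axis=0)
n, m = X.shape

# Primary eigenvector of the genetic relationship matrix (GRM).
grm = X @ X.T / m
eigvals, eigvecs = np.linalg.eigh(grm)  # eigenvalues in ascending order
lam1, e1 = eigvals[-1], eigvecs[:, -1]

# (1) Single-marker regression: OLS slope of the eigenvector on each SNP.
#     Because each SNP column is centred, the intercept does not affect the slope.
beta_reg = X.T @ e1 / (X ** 2).sum(axis=0)

# (2) SVD of the SNP matrix: X = U S V^T, hence X^T u_1 = s_1 v_1.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
sign = np.sign(U[:, 0] @ e1)  # eigenvectors are defined only up to sign
beta_svd = sign * S[0] * Vt[0] / n

# Projection: applying the SNP effects to the genotypes recovers the
# eigenvector up to the scale factor (m / n) * lam1.
projected = X @ beta_reg

print(np.allclose(beta_reg, beta_svd))
print(np.allclose(projected, (m / n) * lam1 * e1))
```

In this sketch the projection is done within the same sample; projecting into a separate target sample would apply `beta_reg` to target genotypes standardised in the same way, which is where prediction accuracy becomes an empirical question.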


Use of the Multivariate Discriminant Analysis for Genome-Wide Association Studies in Cattle

Animals ◽

10.3390/ani10081300 ◽

2020 ◽

Vol 10 (8) ◽

pp. 1300 ◽

Cited By ~ 1

Author(s):

Elisabetta Manca ◽

Alberto Cesarani ◽

Giustino Gaspa ◽

Silvia Sorbolini ◽

Nicolò P.P. Macciotta ◽

...

Keyword(s):

Discriminant Analysis ◽

Association Studies ◽

Real Data ◽

Genome Wide Association ◽

Stepwise Discriminant Analysis ◽

Genome Wide Association Studies ◽

Multivariate Method ◽

Genome Wide ◽

Single Marker ◽

Multivariate Gwas

Genome-wide association studies (GWAS) are traditionally carried out using the single-marker regression model, which often yields very few associations when only a small number of individuals is involved. Bayesian methods, such as BayesR, have obtained encouraging results when applied to GWAS. However, these approaches require that an a priori posterior inclusion probability threshold be fixed, which arbitrarily affects the associations obtained. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for the genes involved in its determinism. Three multivariate techniques were used to highlight the differences between individuals assembled in high and low phenotype groups: canonical discriminant analysis, discriminant analysis, and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The results of the simulation study showed that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single-marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single-marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.
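The idea of contrasting high and low phenotype groups can be illustrated with a two-group discriminant function. With only two groups, canonical discriminant analysis reduces to Fisher's linear discriminant, so the sketch below uses that as a stand-in. It is an illustration under stated assumptions, not the authors' pipeline: the simulated genotypes, the two causal SNPs, the quartile cut-offs, and the function name `fisher_discriminant` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 400 animals, 8 SNPs, two of which affect the trait.
n_animals, n_snps = 400, 8
geno = rng.binomial(2, 0.5, size=(n_animals, n_snps)).astype(float)
causal = [0, 3]
pheno = geno[:, causal] @ np.array([1.0, -1.0]) + rng.normal(0, 1.0, n_animals)

# Assemble high and low phenotype groups (top and bottom quartiles, an assumption).
low_cut, high_cut = np.quantile(pheno, [0.25, 0.75])
low = geno[pheno <= low_cut]
high = geno[pheno >= high_cut]


def fisher_discriminant(group_a, group_b):
    """Return Fisher's discriminant weights: pooled_cov^{-1} (mean_a - mean_b)."""
    mean_diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    scatter = (
        np.cov(group_a, rowvar=False) * (len(group_a) - 1)
        + np.cov(group_b, rowvar=False) * (len(group_b) - 1)
    )
    pooled = scatter / (len(group_a) + len(group_b) - 2)
    return np.linalg.solve(pooled, mean_diff)


weights = fisher_discriminant(high, low)

# SNPs with the largest absolute weights contribute most to group separation.
ranked = np.argsort(-np.abs(weights))
print(ranked[:2], weights[ranked[:2]])
```

With real marker panels the number of SNPs far exceeds the number of animals, so the pooled covariance matrix is singular and some form of variable selection (such as the stepwise procedure named in the abstract) or regularisation is needed before a discriminant can be computed.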


GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btn516 ◽

2008 ◽

Vol 24 (23) ◽

pp. 2784-2785 ◽

Cited By ~ 119

Author(s):

Marit Holden ◽

Shiwei Deng ◽

Leszek Wojnowski ◽

Bettina Kulle

Keyword(s):

Association Studies ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Gene Set Enrichment ◽

Gene Set ◽

Snp Data ◽

Genome Wide


Genome-wide association studies in elite varieties of German winter barley using single-marker and haplotype-based methods

Plant Breeding ◽

10.1111/pbr.12237 ◽

2015 ◽

Vol 134 (1) ◽

pp. 28-39 ◽

Cited By ~ 21

Author(s):

Inka Gawenda ◽

Patrick Thorwarth ◽

Torsten Günther ◽

Frank Ordon ◽

Karl J. Schmid

Keyword(s):

Association Studies ◽

Winter Barley ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Single Marker


SNP data analysis in genome-wide association studies

10.14711/thesis-b1146292 ◽

2011 ◽

Author(s):

Can Yang

Keyword(s):

Data Analysis ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Snp Data ◽

Genome Wide


networkGWAS: A network-based approach for genome-wide association studies in structured populations

10.1101/2021.11.11.468206 ◽

2021 ◽

Author(s):

Giulia Muzio ◽

Leslie O'Bray ◽

Laetitia Meng-Papaxanthos ◽

Juliane Klatt ◽

Karsten Borgwardt

Keyword(s):

Genetic Markers ◽

Complex Traits ◽

Multiple Testing ◽

Association Studies ◽

Search Space ◽

Structured Populations ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Testing Correction ◽

Genome Wide

While the search for associations between genetic markers and complex traits has discovered tens of thousands of trait-related genetic variants, the vast majority of these explain only a tiny fraction of the observed phenotypic variation. One possible strategy to detect stronger associations is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association with a phenotype. The latter, network-based genome-wide association studies, in particular suffer from a huge search space and an inherent multiple testing problem. As a consequence, current approaches are based on greedy feature selection, thereby risking that they miss relevant associations, and/or neglect multiple testing correction, which can lead to an abundance of false positive findings. To address these shortcomings of current network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to gene-based genome-wide association studies based on mixed models and neighborhood aggregation. It allows for population structure correction and for well-calibrated p-values, which we obtain through a block permutation scheme. networkGWAS successfully detects known or plausible associations on simulated rare variants from H. sapiens data as well as on semi-simulated and real data with common variants from A. thaliana, and enables the systematic combination of gene-based genome-wide association studies with biological network information.
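Neighborhood aggregation can be pictured as pooling the SNPs of a gene with those of its direct neighbors in a biological network, so that the pooled set is tested as one unit. The sketch below is only one plausible reading of that step and is not taken from the networkGWAS code: the function name `neighborhood_snps`, the adjacency-dictionary representation, and the gene and SNP identifiers are all hypothetical.

```python
from typing import Dict, List


def neighborhood_snps(
    gene: str,
    network: Dict[str, List[str]],
    gene_to_snps: Dict[str, List[str]],
) -> List[str]:
    """Collect the SNPs of a gene and of its direct (1-hop) network neighbors."""
    genes = {gene} | set(network.get(gene, []))
    snps = set()
    for g in genes:
        snps.update(gene_to_snps.get(g, []))
    return sorted(snps)


# Hypothetical toy network (undirected, stored as adjacency lists).
network = {
    "G1": ["G2", "G3"],
    "G2": ["G1"],
    "G3": ["G1", "G4"],
    "G4": ["G3"],
}
gene_to_snps = {
    "G1": ["rs1", "rs2"],
    "G2": ["rs3"],
    "G3": ["rs2", "rs4"],  # rs2 is shared by G1 and G3
    "G4": ["rs5"],
}

for gene in network:
    print(gene, neighborhood_snps(gene, network, gene_to_snps))
```

Each aggregated SNP set would then be passed to a set-based association test; because neighborhoods overlap, the resulting test statistics are correlated, which is one reason a permutation-based calibration is attractive.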


Joint Genotype- and Ancestry-based Genome-wide Association Studies in Admixed Populations

10.1101/062554 ◽

2016 ◽

Cited By ~ 2

Author(s):

Piotr Szulc ◽

Malgorzata Bogdan ◽

Florian Frommlet ◽

Hua Tang

Keyword(s):

Linkage Disequilibrium ◽

Complex Traits ◽

Association Studies ◽

Real Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Single Marker Analysis ◽

Marker Analysis ◽

Genome Wide ◽

Single Marker

In genome-wide association studies (GWAS), genetic loci that influence complex traits are localized by inspecting associations between the genotypes of genetic markers and the values of the trait of interest. Admixture mapping, on the other hand, which is performed in populations consisting of a recent mix of two ancestral groups, relies on the ancestry information at each locus (locus-specific ancestry). Recently it has been proposed to jointly model genotype and locus-specific ancestry within the framework of single-marker tests. Here we extend this approach for population-based GWAS in the direction of multi-marker models. A modified version of the Bayesian Information Criterion is developed for building a multi-locus model, which accounts for the differential correlation structure due to linkage disequilibrium and admixture linkage disequilibrium. Simulation studies and a real data example illustrate the advantages of this new approach compared to single-marker analysis and modern model selection strategies based on separately analyzing genotype and ancestry data, as well as to single-marker analysis combining genotypic and ancestry information. Depending on the signal strength, our procedure automatically chooses whether genotypic or locus-specific ancestry markers are added to the model. This results in a good compromise between the power to detect causal mutations and the precision of their localization. The proposed method has been implemented in R and is available at http://www.math.uni.wroc.pl/~mbogdan/admixtures/.


High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software

F1000Research ◽

10.12688/f1000research.4867.1 ◽

2014 ◽

Vol 3 ◽

pp. 200 ◽

Cited By ~ 11

Author(s):

Diego Fabregat-Traver ◽

Sodbo Zh. Sharapov ◽

Caroline Hayward ◽

Igor Rudan ◽

Harry Campbell ◽

...

Keyword(s):

High Performance ◽

Mixed Model ◽

Association Studies ◽

Structured Populations ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Traits ◽

Multiple Trait ◽

Model Based ◽

Genome Wide

To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed-model based tests. When large samples are used, and when multiple traits are to be studied in the ’omics’ context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for an arbitrary number of traits, and demonstrate that different computational algorithms are optimal for the single-trait and multiple-trait scenarios. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.
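One common way to avoid redundant computation in mixed-model GWAS is to eigen-decompose the kinship matrix once and reuse the rotation for every SNP and trait. The sketch below illustrates that general idea with NumPy; it is not OmicABEL's algorithm, which the abstract does not spell out. The variance components are assumed known, and the simulated data and the function names `gls_direct` and `gls_rotated` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy data: kinship built from 300 standardised background markers.
n = 60
markers = rng.binomial(2, 0.4, size=(n, 300)).astype(float)
Z = (markers - markers.mean(axis=0)) / markers.std(axis=0)
kinship = Z @ Z.T / Z.shape[1]

# Variance components are assumed known here; in practice they are estimated.
sigma_g2, sigma_e2 = 0.6, 0.4
cov = sigma_g2 * kinship + sigma_e2 * np.eye(n)

# One tested SNP with a true effect of 0.5, plus structured noise.
snp = rng.binomial(2, 0.3, size=n).astype(float)
y = 0.5 * snp + rng.multivariate_normal(np.zeros(n), cov)
design = np.column_stack([np.ones(n), snp])  # intercept + SNP


def gls_direct(design, y, cov):
    """Generalised least squares via a dense solve with the full covariance."""
    cov_inv_design = np.linalg.solve(cov, design)
    return np.linalg.solve(design.T @ cov_inv_design, cov_inv_design.T @ y)


def gls_rotated(design, y, eigvals, eigvecs, sigma_g2, sigma_e2):
    """Same estimate after rotating by the kinship eigenvectors.

    In the rotated space the covariance is diagonal, so GLS becomes
    weighted least squares with weights 1 / (sigma_g2 * d_i + sigma_e2).
    """
    weights = 1.0 / (sigma_g2 * eigvals + sigma_e2)
    design_rot = eigvecs.T @ design
    y_rot = eigvecs.T @ y
    lhs = design_rot.T @ (weights[:, None] * design_rot)
    rhs = design_rot.T @ (weights * y_rot)
    return np.linalg.solve(lhs, rhs)


# The eigen-decomposition is computed once and can be reused for all SNPs and traits.
eigvals, eigvecs = np.linalg.eigh(kinship)
beta_direct = gls_direct(design, y, cov)
beta_rotated = gls_rotated(design, y, eigvals, eigvecs, sigma_g2, sigma_e2)
print(beta_direct, beta_rotated)
```

The direct solve costs on the order of n^3 operations per call, whereas after the one-off decomposition each additional SNP only needs matrix-vector products, which is the kind of saving that matters when many traits are analysed.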
