Use of the Multivariate Discriminant Analysis for Genome-Wide Association Studies in Cattle

Genome-wide association studies (GWAS) are traditionally carried out with the single marker regression model, which often leads to very few associations when only a small number of individuals is involved. Bayesian methods, such as BayesR, have given encouraging results when applied to GWAS. However, these approaches require that an a priori posterior inclusion probability threshold be fixed, which arbitrarily affects the associations obtained. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for the genes involved in its determination. Three multivariate techniques were used to highlight the differences between individuals assembled in high and low phenotype groups: canonical discriminant analysis, discriminant analysis and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The simulation study showed that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. With the real data, the multivariate GWAS found on average 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.

Joint Genotype- and Ancestry-based Genome-wide Association Studies in Admixed Populations

10.1101/062554 ◽ 2016 ◽ Cited By ~ 2

Author(s): Piotr Szulc ◽ Malgorzata Bogdan ◽ Florian Frommlet ◽ Hua Tang

Keyword(s): Linkage Disequilibrium ◽ Complex Traits ◽ Association Studies ◽ Real Data ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Single Marker Analysis ◽ Marker Analysis ◽ Genome Wide ◽ Single Marker

Abstract. In genome-wide association studies (GWAS), genetic loci that influence complex traits are localized by inspecting associations between the genotypes of genetic markers and the values of the trait of interest. Admixture mapping, on the other hand, is performed in populations consisting of a recent mix of two ancestral groups and relies on the ancestry information at each locus (locus-specific ancestry). Recently it has been proposed to jointly model genotype and locus-specific ancestry within the framework of single marker tests. Here we extend this approach for population-based GWAS in the direction of multi-marker models. A modified version of the Bayesian Information Criterion is developed for building a multi-locus model, which accounts for the differential correlation structure due to linkage disequilibrium and admixture linkage disequilibrium. Simulation studies and a real data example illustrate the advantages of this new approach compared to single-marker analysis and modern model selection strategies based on separately analyzing genotype and ancestry data, as well as to single-marker analysis combining genotypic and ancestry information. Depending on the signal strength, our procedure automatically chooses whether genotypic or locus-specific ancestry markers are added to the model. This results in a good compromise between the power to detect causal mutations and the precision of their localization. The proposed method has been implemented in R and is available at http://www.math.uni.wroc.pl/~mbogdan/admixtures/.

Genome-wide association studies in elite varieties of German winter barley using single-marker and haplotype-based methods

Plant Breeding ◽ 10.1111/pbr.12237 ◽ 2015 ◽ Vol 134 (1) ◽ pp. 28-39 ◽ Cited By ~ 21

Author(s): Inka Gawenda ◽ Patrick Thorwarth ◽ Torsten Günther ◽ Frank Ordon ◽ Karl J. Schmid

Keyword(s): Association Studies ◽ Winter Barley ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Single Marker

Mixture model-based association analysis with case-control data in genome wide association studies

Statistical Applications in Genetics and Molecular Biology ◽ 10.1515/sagmb-2016-0022 ◽ 2017 ◽ Vol 16 (3)

Author(s): Fadhaa Ali ◽ Jian Zhang

Keyword(s): Mixture Model ◽ Multiple Testing ◽ Hypothesis Test ◽ Association Studies ◽ Real Data ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Model Based ◽ Genome Wide ◽ The Individual

Abstract. Multilocus haplotype analysis of candidate variants with genome-wide association studies (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet this challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the model is provided. Furthermore, we introduce a hypothesis test for the haplotype inheritance patterns which underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.

A fast mrMLM algorithm for multi-locus genome-wide association studies

10.1101/341784 ◽ 2018 ◽ Cited By ~ 23

Author(s): Cox Lwaka Tamba ◽ Yuan-Ming Zhang

Keyword(s): False Positive ◽ Statistical Power ◽ Association Studies ◽ False Positive Rate ◽ Real Data ◽ High Accuracy ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Positive Rate

Abstract. Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, so fast algorithms are needed in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN estimation and a low false positive rate.

Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM.

Conclusions: FASTmrMLM is a fast and reliable algorithm in multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate.

Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Due to the computational challenge faced, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to quicken the testing of each random marker effect. The target functions and derivatives which are in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. From simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.
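The two-stage structure described in this abstract (scan every marker, keep those with P ≤ 0.01, then fit the survivors jointly) can be sketched as follows. This is a simplified illustration and not FASTmrMLM itself: the first stage here is an ordinary fixed-effect regression with a normal-approximation p-value rather than a random-effect mixed model, and the second stage uses ridge regression as a stand-in for LARS or EM-Empirical Bayes. The function names and the `ridge` parameter are hypothetical.

```python
import math
import numpy as np

def marginal_p_values(genotypes, phenotype):
    """Stage 1 (simplified): per-marker simple linear regression.

    Returns two-sided p-values from a normal approximation to the
    slope's t-statistic. Assumes every marker is polymorphic.
    """
    n = genotypes.shape[0]
    x = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    sxx = (x ** 2).sum(axis=0)
    beta = x.T @ y / sxx
    resid_ss = (y ** 2).sum() - beta ** 2 * sxx
    se = np.sqrt(resid_ss / (n - 2) / sxx)
    z = beta / se
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])

def two_stage_scan(genotypes, phenotype, p_threshold=0.01, ridge=1.0):
    """Stage 2 (stand-in): joint ridge fit of markers passing the scan."""
    p = marginal_p_values(genotypes, phenotype)
    selected = np.flatnonzero(p <= p_threshold)
    x = genotypes[:, selected] - genotypes[:, selected].mean(axis=0)
    y = phenotype - phenotype.mean()
    penalty = ridge * np.eye(len(selected))
    effects = np.linalg.solve(x.T @ x + penalty, x.T @ y)
    return selected, effects
```

Calling `two_stage_scan(G, y)` on an individuals-by-markers genotype matrix `G` and a phenotype vector `y` returns the indices of the retained markers and their jointly estimated effects.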

Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies

Frontiers in Cell and Developmental Biology ◽ 10.3389/fcell.2021.801113 ◽ 2021 ◽ Vol 9

Author(s): Yingjie Guo ◽ Chenxi Wu ◽ Zhian Yuan ◽ Yansu Wang ◽ Zhen Liang ◽ ...

Keyword(s): Association Studies ◽ Real Data ◽ Gene Interaction ◽ Genome Wide Association ◽ Superior Performance ◽ Gene Interactions ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Genome Wide ◽ The Difference

Among the myriad of statistical methods that identify gene–gene interactions in qualitative genome-wide association studies, gene-based interaction methods are not only statistically powerful but also biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. Thus, a gene-based method (GGInt-XGBoost) derived from XGBoost is proposed in this article. Assuming that the log odds ratio of disease traits is additive when a pair of genes does not interact, the difference in error between the XGBoost model with and without the additive constraint can indicate a gene–gene interaction; we then used a permutation-based statistical test to assess this difference and to provide a p-value representing the significance of the interaction. Experimental results on both simulated and real data showed that our approach performed better than previous approaches at detecting gene–gene interactions.
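The idea of comparing a model restricted to additive effects with an unrestricted one, and judging the gap in error by permutation, can be illustrated with a much simpler stand-in. The sketch below is not GGInt-XGBoost: it swaps XGBoost for ordinary least squares, uses a quantitative trait instead of disease log odds, treats each gene as a single variable, and builds the null distribution by permuting residuals of the additive fit, which is one common permutation scheme and an assumption here. All names are hypothetical.

```python
import numpy as np

def fit_rss(design, y):
    """Least-squares fit; returns residual sum of squares and fitted values."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    resid = y - fitted
    return float(resid @ resid), fitted

def interaction_permutation_test(g1, g2, y, n_perm=500, seed=0):
    """Permutation p-value for the error gap between additive and full models."""
    rng = np.random.default_rng(seed)
    ones = np.ones_like(y)
    additive = np.column_stack([ones, g1, g2])        # no interaction allowed
    full = np.column_stack([ones, g1, g2, g1 * g2])   # interaction allowed
    rss_add, fitted_add = fit_rss(additive, y)
    rss_full, _ = fit_rss(full, y)
    observed = rss_add - rss_full
    resid = y - fitted_add
    exceed = 0
    for _ in range(n_perm):
        # Shuffle additive-model residuals: keeps main effects, breaks interaction.
        y_perm = fitted_add + rng.permutation(resid)
        gap = fit_rss(additive, y_perm)[0] - fit_rss(full, y_perm)[0]
        exceed += gap >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

A small p-value indicates that allowing an interaction reduces the error more than would be expected if the two variables acted additively.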

Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies

Statistical Methods in Medical Research ◽ 10.1177/0962280214551815 ◽ 2014 ◽ Vol 26 (2) ◽ pp. 567-582 ◽ Cited By ~ 13

Author(s): Zhongxue Chen ◽ Hon Keung Tony Ng ◽ Jing Li ◽ Qingzhong Liu ◽ Hanwen Huang

Keyword(s): Single Nucleotide Polymorphisms ◽ X Chromosome ◽ Association Studies ◽ Statistical Tests ◽ Real Data ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Single Nucleotide ◽ Genome Wide

In the past decade, hundreds of genome-wide association studies have been conducted to detect single-nucleotide polymorphisms that are significantly associated with certain diseases. However, most of the data from the X chromosome were not analyzed, and only a few significantly associated single-nucleotide polymorphisms on the X chromosome have been identified by genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that efficiently combines the information on X-chromosome single-nucleotide polymorphisms from both males and females. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method whose only assumption is that the risk allele is the same for females and males if the single-nucleotide polymorphism is associated with the disease for both genders. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association studies data.

Controlling for background genetic effects using polygenic scores improves the power of genome-wide association studies

Scientific Reports ◽ 10.1038/s41598-021-99031-3 ◽ 2021 ◽ Vol 11 (1)

Author(s): Declan Bennett ◽ Donal O’Shea ◽ John Ferguson ◽ Derek Morris ◽ Cathal Seoighe

Keyword(s): Complex Traits ◽ Association Studies ◽ Real Data ◽ Genetic Effects ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Genotype Data ◽ Phenotype Prediction ◽ Genome Wide ◽ Polygenic Scores

Abstract. Ongoing increases in the size of human genotype and phenotype collections offer the promise of improved understanding of the genetics of complex diseases. In addition to the biological insights that can be gained from the nature of the variants that contribute to the genetic component of complex trait variability, these data bring forward the prospect of predicting complex traits and the risk of complex genetic diseases from genotype data. Here we show that advances in phenotype prediction can be applied to improve the power of genome-wide association studies. We demonstrate a simple and efficient method to model genetic background effects using polygenic scores derived from SNPs that are not on the same chromosome as the target SNP. Using simulated and real data we found that this can result in a substantial increase in the number of variants passing genome-wide significance thresholds. This increase in power to detect trait-associated variants also translates into an increase in the accuracy with which the resulting polygenic score predicts the phenotype from genotype data. Our results suggest that advances in methods for phenotype prediction can be exploited to improve the control of background genetic effects, leading to more accurate GWAS results and further improvements in phenotype prediction.
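As a rough illustration of the idea in this abstract, the sketch below builds a polygenic score from SNPs on chromosomes other than the target SNP's and includes it as a covariate when testing the target SNP. It is a minimal sketch under stated assumptions and not the authors' implementation: the per-SNP `weights` are taken as given (how they are estimated is outside this example), the test is ordinary least squares, and all function and parameter names are hypothetical.

```python
import numpy as np

def loco_polygenic_score(genotypes, chromosomes, weights, target_chromosome):
    """Polygenic score using only SNPs that are NOT on the target chromosome."""
    mask = chromosomes != target_chromosome
    return genotypes[:, mask] @ weights[mask]

def snp_association(snp, phenotype, background_score=None):
    """OLS effect and t-statistic for one SNP, optionally adjusting for a score."""
    columns = [np.ones_like(phenotype), snp]
    if background_score is not None:
        columns.append(background_score)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    resid = phenotype - design @ coef
    dof = len(phenotype) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[1], coef[1] / np.sqrt(cov[1, 1])
```

When the score captures real background signal, the residual variance shrinks and the t-statistic of a truly associated SNP tends to increase, which is the power gain the abstract describes.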

Invited review: Genome-wide association analysis for quantitative traits in livestock – a selective review of statistical models and experimental designs

Archives Animal Breeding ◽ 10.5194/aab-60-335-2017 ◽ 2017 ◽ Vol 60 (3) ◽ pp. 335-346 ◽ Cited By ~ 17

Author(s): Markus Schmid ◽ Jörn Bennewitz

Keyword(s): Statistical Models ◽ Complex Traits ◽ Quantitative Traits ◽ Association Studies ◽ Real Data ◽ Genome Wide Association ◽ Future Research ◽ Genome Wide Association Studies ◽ Livestock Breeding ◽ Genome Wide

Abstract. Quantitative or complex traits are controlled by many genes and environmental factors. Most traits in livestock breeding are quantitative traits. Mapping genes and causative mutations generating the genetic variance of these traits is still a very active area of research in livestock genetics. Since genome-wide and dense SNP panels are available for most livestock species, genome-wide association studies (GWASs) have become the method of choice in mapping experiments. Different statistical models are used for GWASs. We will review the frequently used single-marker models and additionally describe Bayesian multi-marker models. The importance of nonadditive genetic and genotype-by-environment effects along with GWAS methods to detect them will be briefly discussed. Different mapping populations are used and will also be reviewed. Whenever possible, our own real-data examples are included to illustrate the reviewed methods and designs. Future research directions including post-GWAS strategies are outlined.

Novel Methods for Epistasis Detection in Genome-Wide Association Studies

10.1101/442749 ◽ 2018 ◽ Cited By ~ 2

Author(s): Lotfi Slim ◽ Clément Chatelain ◽ Chloé-Agathe Azencott ◽ Jean-Philippe Vert

Keyword(s): Randomized Clinical Trials ◽ Association Studies ◽ Real Data ◽ Gene Interaction ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ New Approach ◽ Pairwise Interactions ◽ Genome Wide ◽ Or Gene

More and more genome-wide association studies are being designed to uncover the full genetic basis of common diseases. Nonetheless, the resulting loci are often insufficient to fully recover the observed heritability. Epistasis, or gene-gene interaction, is one of many hypotheses put forward to explain this missing heritability. In the present work, we propose epiGWAS, a new approach for epistasis detection that identifies interactions between a target SNP and the rest of the genome. This contrasts with the classical strategy of epistasis detection through exhaustive pairwise SNP testing. We draw inspiration from causal inference in randomized clinical trials, which allows us to take into account linkage disequilibrium. epiGWAS encompasses several methods, which we compare to state-of-the-art techniques for epistasis detection on simulated and real data. The promising results empirically demonstrate the benefits of epiGWAS for identifying pairwise interactions.

EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations

10.1101/023457 ◽ 2015 ◽ Cited By ~ 1

Author(s): Guo-Bo Chen ◽ Sang Hong Lee ◽ Zhi-Xiang Zhu ◽ Beben Benyamin ◽ Matthew R Robinson

Keyword(s): Association Studies ◽ Structured Populations ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Statistical Framework ◽ Snp Data ◽ Genome Wide ◽ Single Marker ◽ Marker Regression ◽ Value Decomposition

We apply the statistical framework for genome-wide association studies (GWAS) to eigenvector decomposition (EigenGWAS), which is commonly used in population genetics to characterise the structure of genetic data. We show that loci under selection can be detected in a structured population by using eigenvectors as phenotypes in a single-marker GWAS. We find LCT to be under selection between HapMap CEU-TSI cohorts, a finding that was replicated across European countries in the POPRES samples. HERC2 was also found to be differentiated both between the CEU-TSI cohorts and among POPRES samples, reflecting the likely anthropological differences in skin and hair colour between northern and southern European populations. We show that when determining the effect of a SNP on an eigenvector, three methods (single-marker regression of eigenvectors, best linear unbiased prediction of eigenvectors, and singular value decomposition of SNP data) are equivalent to each other. We also demonstrate that estimated SNP effects on eigenvectors from a reference panel can be used to predict eigenvectors (the projected eigenvectors) in a target sample with high accuracy, particularly for the primary eigenvectors. Under this GWAS framework, ancestry informative markers and loci under selection can be identified, and population structure can be captured and easily interpreted. We have developed freely available software to facilitate the application of the methods (https://github.com/gc5k/GEAR/wiki/EigenGWAS).
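The core step described here, using an eigenvector of the genetic relationship matrix as the phenotype in a single-marker regression, can be sketched in a few lines of NumPy. This is an illustrative sketch and not the GEAR software: the function name is hypothetical, genotypes are assumed to be coded 0/1/2 with no missing values and no monomorphic markers, and no significance testing or genomic-control correction is included.

```python
import numpy as np

def eigengwas(genotypes, n_components=1):
    """Regress the top eigenvectors of the relationship matrix on each SNP.

    genotypes: individuals x markers array (0/1/2 coding assumed).
    Returns the leading eigenvalues, the eigenvectors, and a
    markers x n_components array of single-marker regression slopes.
    """
    # Standardize each SNP to mean 0 and variance 1.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    n, m = z.shape
    grm = z @ z.T / m                      # genetic relationship matrix
    eigvals, eigvecs = np.linalg.eigh(grm)
    order = np.argsort(eigvals)[::-1][:n_components]
    pcs = eigvecs[:, order]
    pcs_centered = pcs - pcs.mean(axis=0)
    # Each standardized SNP has sum of squares n, so the slope is z'y / n.
    slopes = z.T @ pcs_centered / n
    return eigvals[order], pcs, slopes
```

Because the slopes equal `z.T @ pc / n`, they are proportional to the corresponding right singular vector of the standardized genotype matrix, which mirrors the equivalence between single-marker regression and singular value decomposition stated in the abstract.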
