Novel methods for epistasis detection in genome-wide association studies

More and more genome-wide association studies are being designed to uncover the full genetic basis of common diseases. Nonetheless, the resulting loci are often insufficient to fully recover the observed heritability. Epistasis, or gene-gene interaction, is one of many hypotheses put forward to explain this missing heritability. In the present work, we propose epiGWAS, a new approach for epistasis detection that identifies interactions between a target SNP and the rest of the genome. This contrasts with the classical strategy of epistasis detection through exhaustive pairwise SNP testing. We draw inspiration from causal inference in randomized clinical trials, which allows us to take linkage disequilibrium into account. The epiGWAS framework encompasses several methods, which we compare to state-of-the-art techniques for epistasis detection on simulated and real data. The promising results demonstrate empirically the benefits of epiGWAS for identifying pairwise interactions.
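
To make the contrast with exhaustive pairwise testing concrete, the following minimal Python sketch counts the candidate interactions in each setting. The function names and the panel size of 500,000 SNPs are illustrative assumptions, not values from the paper, and the sketch says nothing about how epiGWAS scores each candidate.

```python
from math import comb


def exhaustive_pair_count(n_snps: int) -> int:
    """Number of unordered SNP pairs an exhaustive pairwise scan would test."""
    return comb(n_snps, 2)


def target_pair_count(n_snps: int) -> int:
    """Number of candidate partners once a single target SNP is fixed."""
    return max(n_snps - 1, 0)


if __name__ == "__main__":
    n = 500_000  # hypothetical panel size, chosen only for illustration
    print(exhaustive_pair_count(n))  # 124999750000
    print(target_pair_count(n))      # 499999
```

Fixing the target SNP makes the number of candidates grow linearly rather than quadratically with the number of SNPs.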

Novel Methods for Epistasis Detection in Genome-Wide Association Studies

10.1101/442749 ◽ 2018 ◽ Cited By ~ 2

Author(s): Lotfi Slim ◽ Clément Chatelain ◽ Chloé-Agathe Azencott ◽ Jean-Philippe Vert

Keyword(s): Randomized Clinical Trials ◽ Association Studies ◽ Real Data ◽ Gene Interaction ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ New Approach ◽ Pairwise Interactions ◽ Genome Wide ◽ Or Gene

Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies

Frontiers in Cell and Developmental Biology ◽ 10.3389/fcell.2021.801113 ◽ 2021 ◽ Vol 9

Author(s): Yingjie Guo ◽ Chenxi Wu ◽ Zhian Yuan ◽ Yansu Wang ◽ Zhen Liang ◽ ...

Keyword(s): Association Studies ◽ Real Data ◽ Gene Interaction ◽ Genome Wide Association ◽ Superior Performance ◽ Gene Interactions ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Genome Wide ◽ The Difference

Among the many statistical methods for identifying gene–gene interactions in genome-wide association studies of qualitative traits, gene-based methods are not only statistically powerful but also biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. We therefore propose GGInt-XGBoost, a gene-based method derived from XGBoost. Assuming that the log odds ratio of the disease trait is additive when a pair of genes does not interact, the difference in error between XGBoost models fitted with and without the additive constraint can indicate a gene–gene interaction; we then used a permutation-based statistical test to assess this difference and to provide a p-value for the significance of the interaction. Experimental results on both simulated and real data showed that our approach outperformed previous methods in detecting gene–gene interactions.

Testing Gene-Gene Interactions Based on a Neighborhood Perspective in Genome-wide Association Studies

Frontiers in Genetics ◽ 10.3389/fgene.2021.801261 ◽ 2021 ◽ Vol 12

Author(s): Yingjie Guo ◽ Honghong Cheng ◽ Zhian Yuan ◽ Zhen Liang ◽ Yang Wang ◽ ...

Keyword(s): Association Studies ◽ Real Data ◽ Gene Interaction ◽ Statistical Test ◽ Genome Wide Association ◽ Gene Interactions ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Wide Range ◽ The Difference

Unexplained genetic variation that causes complex diseases is often induced by gene-gene interactions (GGIs). Gene-based methods are among the current statistical approaches for discovering GGIs in case-control genome-wide association studies; they are not only statistically powerful but also biologically interpretable. However, most approaches make assumptions about the form of GGIs, which results in poor statistical performance. We therefore propose a gene-based test built on the maximal neighborhood coefficient (MNC), called gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC). MNC is a metric for capturing a wide range of relationships between two random vectors with arbitrary, but not necessarily equal, dimensions. We established a statistic that uses the difference between the MNC in case samples and the MNC in control samples as an indication of the existence of GGIs, based on the assumption that the joint distribution of two genes should not differ substantially between cases and controls if there is no interaction between them. We then used a permutation-based statistical test to evaluate this statistic and calculate a p-value for the significance of the interaction. Experimental results on both simulated and real data showed that our approach outperformed earlier methods for detecting GGIs.
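
The permutation step can be sketched in plain Python. This is only an illustration under stated assumptions: the MNC is not defined in this abstract, so the absolute Pearson correlation between two scalar per-gene scores stands in for it, and the names `pearson`, `dependence_gap` and `permutation_pvalue` are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def dependence_gap(pairs, labels):
    """|dependence in cases - dependence in controls| (stand-in for the MNC gap).

    pairs  : list of (gene1_score, gene2_score), one tuple per sample
    labels : list of 1 (case) / 0 (control); both groups must be non-empty
    """
    cases = [p for p, lab in zip(pairs, labels) if lab == 1]
    ctrls = [p for p, lab in zip(pairs, labels) if lab == 0]
    r_case = pearson([a for a, _ in cases], [b for _, b in cases])
    r_ctrl = pearson([a for a, _ in ctrls], [b for _, b in ctrls])
    return abs(r_case - r_ctrl)


def permutation_pvalue(pairs, labels, n_perm=200, seed=0):
    """Shuffle case/control labels and count gaps at least as large as observed."""
    rng = random.Random(seed)
    observed = dependence_gap(pairs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dependence_gap(pairs, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)
```

Shuffling the labels keeps the genotype data intact while breaking any link between case status and the joint behaviour of the two genes, which is the null situation the statistic is compared against.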

Use of the Multivariate Discriminant Analysis for Genome-Wide Association Studies in Cattle

Animals ◽ 10.3390/ani10081300 ◽ 2020 ◽ Vol 10 (8) ◽ pp. 1300 ◽ Cited By ~ 1

Author(s): Elisabetta Manca ◽ Alberto Cesarani ◽ Giustino Gaspa ◽ Silvia Sorbolini ◽ Nicolò P.P. Macciotta ◽ ...

Keyword(s): Discriminant Analysis ◽ Association Studies ◽ Real Data ◽ Genome Wide Association ◽ Stepwise Discriminant Analysis ◽ Genome Wide Association Studies ◽ Multivariate Method ◽ Genome Wide ◽ Single Marker ◽ Multivariate GWAS

Genome-wide association studies (GWAS) are traditionally carried out using the single marker regression model, which often yields very few associations when only a small number of individuals is involved. Bayesian methods, such as BayesR, have obtained encouraging results when applied to GWAS. However, these approaches require that a posterior inclusion probability threshold be fixed a priori, thus arbitrarily affecting the obtained associations. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values of a specific trait carry different allelic combinations for the genes involved in its determinism. Three multivariate techniques were used to highlight the differences between the individuals assembled in high and low phenotype groups: the canonical discriminant analysis, the discriminant analysis and the stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The results from the simulation study highlighted that the multivariate GWAS detected a greater number of truly associated single nucleotide polymorphisms (SNPs) and quantitative trait loci (QTLs) than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. On average, with the real data, the multivariate GWAS found 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.

Identification of Critical Core Genes of Sarcoma Based on Centrality Analysis of Networks Nodes

Journal of Medical Imaging and Health Informatics ◽ 10.1166/jmihi.2020.3080 ◽ 2020 ◽ Vol 10 (7) ◽ pp. 1776-1784

Author(s): Shudong Wang ◽ Jixiao Wang ◽ Xinzeng Wang ◽ Yuanyuan Zhang ◽ Tao Yi

Keyword(s): Association Studies ◽ Meta Analysis ◽ Complex Diseases ◽ Enrichment Analysis ◽ Gene Interaction ◽ Core Gene ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Gene Set ◽ Genome Wide

Genome-wide association studies (GWAS) are powerful tools for identifying pathogenic genes of complex diseases and revealing the genetic structure of diseases. However, due to gene-to-gene interactions, only part of the hereditary factors can be revealed. Meta-analysis based on GWAS can integrate gene expression data at multiple levels and reveal the complex relationships between genes. Therefore, we used meta-analysis to integrate GWAS data of sarcoma, establish complex networks and examine their significant genes. First, we established gene interaction networks based on the data of different subtypes of sarcoma to analyze the node centralities of genes. Second, we calculated the significance score of each gene according to the Staged Significant Gene Network Algorithm (SSGNA). Third, we obtained the critical gene set HYC of sarcoma by ranking the scores, and combined Gene Ontology enrichment analysis and protein network analysis to screen it further. Finally, the critical core gene set Hcore, containing 47 genes, was obtained and validated by GEPIA analysis. Our method generalizes to some extent to the study of complex diseases with prior knowledge, and it is a useful supplement to genome-wide association studies.
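
As a rough illustration of ranking genes by node centrality, the sketch below computes degree centrality on an undirected gene interaction network. It is an assumption-based stand-in: SSGNA is not specified in this abstract, degree centrality is simply the most basic centrality score, and the function names and gene labels are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality for an undirected network given as (gene_a, gene_b) edges.

    Self-loops are ignored and duplicate edges are counted once.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {gene: 0.0 for gene in neighbours}
    # Normalise by the maximum possible degree, n - 1.
    return {gene: len(nb) / (n - 1) for gene, nb in neighbours.items()}


def top_genes(edges, k):
    """Return the k most central genes, breaking ties alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


if __name__ == "__main__":
    # Hypothetical toy network; the labels are placeholders, not sarcoma genes.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
    print(top_genes(toy_edges, 2))  # ['A', 'B']
```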

Mixture model-based association analysis with case-control data in genome wide association studies

Statistical Applications in Genetics and Molecular Biology ◽ 10.1515/sagmb-2016-0022 ◽ 2017 ◽ Vol 16 (3)

Author(s): Fadhaa Ali ◽ Jian Zhang

Keyword(s): Mixture Model ◽ Multiple Testing ◽ Hypothesis Test ◽ Association Studies ◽ Real Data ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Model Based ◽ Genome Wide ◽ The Individual

Multilocus haplotype analysis of candidate variants with genome wide association studies (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet the challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease-penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the above model is provided. Furthermore, we introduce a hypothesis test for haplotype inheritance patterns which underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.

Incorporating biological knowledge in the search for gene × gene interaction in genome-wide association studies

BMC Proceedings ◽ 10.1186/1753-6561-3-s7-s81 ◽ 2009 ◽ Vol 3 (S7) ◽ Cited By ~ 3

Author(s): Alisa K Manning ◽ Julius Suh Ngwa ◽ Audrey E Hendricks ◽ Ching-Ti Liu ◽ Andrew D Johnson ◽ ...

Keyword(s): Association Studies ◽ Gene Interaction ◽ Genome Wide Association ◽ Biological Knowledge ◽ Genome Wide Association Studies ◽ Genome Wide

A fast mrMLM algorithm for multi-locus genome-wide association studies

10.1101/341784 ◽ 2018 ◽ Cited By ~ 23

Author(s): Cox Lwaka Tamba ◽ Yuan-Ming Zhang

Keyword(s): False Positive ◽ Statistical Power ◽ Association Studies ◽ False Positive Rate ◽ Real Data ◽ High Accuracy ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Genome Wide ◽ Positive Rate

Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), we can get tens of millions of SNPs that need to be tested for association with a trait of interest. This poses a great computational challenge, and there is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in quantitative trait nucleotide (QTN) detection, high accuracy in QTN estimation and a low false positive rate.

Results: Here, we accelerated the mrMLM algorithm by using the GEMMA idea, matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scanning are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM.

Conclusions: FASTmrMLM is a fast and reliable algorithm for multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate.

Author Summary: The current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, we can get tens of millions of markers that need to be tested for association with a trait of interest. Due to the computational challenge faced, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scanning, are transformed into some simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.

Addressing the Challenges of Detecting Epistasis in Genome-Wide Association Studies of Common Human Diseases Using Biological Expert Knowledge

Handbook of Research on Computational and Systems Biology ◽ 10.4018/978-1-60960-491-2.ch006 ◽ 2011 ◽ pp. 128-147

Author(s): Kristine A. Pattin ◽ Jason H. Moore

Keyword(s): Expert Knowledge ◽ Association Studies ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Protein Protein Interaction ◽ Genome Wide ◽ Technological Developments ◽ Or Gene ◽ High Dimensional Datasets

Recent technological developments in the field of genetics have given rise to an abundance of research tools, such as genome-wide genotyping, that allow researchers to conduct genome-wide association studies (GWAS) for detecting genetic variants that confer increased or decreased susceptibility to disease. However, discovering epistatic, or gene-gene, interactions in high dimensional datasets is a problem due to the computational complexity that results from the analysis of all possible combinations of single-nucleotide polymorphisms (SNPs). A recently explored approach to this problem employs biological expert knowledge, such as pathway or protein-protein interaction information, to guide an analysis by the selection or weighting of SNPs based on this knowledge. Narrowing the evaluation to gene combinations that have been shown to interact experimentally provides a biologically concise reason why those two genes may be detected together statistically. This chapter discusses the challenges of discovering epistatic interactions in GWAS and how biological expert knowledge can be used to facilitate genome-wide genetic studies.
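
The idea of narrowing the search with expert knowledge can be sketched as a simple filter over SNP pairs. This is a minimal illustration under stated assumptions: the SNP-to-gene map, the list of known interacting gene pairs and the function name `knowledge_guided_pairs` are hypothetical, and real analyses may weight SNPs rather than discard pairs outright.

```python
from itertools import combinations


def knowledge_guided_pairs(snp_to_gene, known_interactions):
    """Keep only SNP pairs whose genes form a known interacting pair.

    snp_to_gene        : dict mapping SNP identifier -> gene name
    known_interactions : iterable of (gene_a, gene_b), order irrelevant
    """
    known = {frozenset(pair) for pair in known_interactions}
    kept = []
    for s1, s2 in combinations(sorted(snp_to_gene), 2):
        # Two SNPs in the same gene give a one-element set, which never
        # matches a two-gene interaction, so within-gene pairs are dropped.
        if frozenset((snp_to_gene[s1], snp_to_gene[s2])) in known:
            kept.append((s1, s2))
    return kept


if __name__ == "__main__":
    # Placeholder identifiers, invented for illustration only.
    snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_B",
                   "snp3": "GENE_C", "snp4": "GENE_B"}
    known = [("GENE_B", "GENE_A")]
    print(knowledge_guided_pairs(snp_to_gene, known))
    # [('snp1', 'snp2'), ('snp1', 'snp4')]
```

In this toy case, two of the six possible SNP pairs survive the filter.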

Two-Stage Procedures for the Identification of Gene × Environment and Gene × Gene Interactions in Genome-Wide Association Studies

Statistical Approaches to Gene X Environment Interactions for Complex Phenotypes ◽ 10.7551/mitpress/9780262034685.003.0002 ◽ 2016

Author(s): Charles Kooperberg ◽ James Y. Dai ◽ Li Hsu

Keyword(s): Association Studies ◽ Gene Interaction ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Two Stage ◽ Genetic Studies ◽ Gene Environment ◽ Genome Wide ◽ Sequencing Studies ◽ Generation Sequencing

Genome-wide association studies and next generation sequencing studies offer us an unprecedented opportunity to study the genetic etiology of diseases and other traits. Over the last few years, many replicated associations between SNPs and traits have been published. It is of particular interest to identify how genes may interact with environmental factors and other genes. In this chapter, we show that a two-stage approach, in which SNPs are first screened for their potential to be involved in interactions and interactions are then tested only among SNPs that pass the screening, can greatly enhance power for detecting gene-environment and gene-gene interactions in large genetic studies compared with tests without screening.
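
The two-stage logic can be sketched as follows. This is a hedged illustration, not the chapter's procedure: the screening scores, the threshold, the Bonferroni correction and the function names are all assumptions. A valid procedure also needs the screening statistic to be independent of the interaction test under the null hypothesis, which this sketch does not check.

```python
from itertools import combinations


def two_stage_pairs(screen_scores, threshold):
    """Stage 1: keep SNPs whose screening score reaches the threshold.
    Stage 2 candidates: all unordered pairs among the SNPs that were kept.
    """
    kept = sorted(snp for snp, score in screen_scores.items() if score >= threshold)
    return list(combinations(kept, 2))


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests if n_tests else alpha


if __name__ == "__main__":
    # Hypothetical screening scores, e.g. absolute marginal test statistics.
    scores = {"snp1": 3.2, "snp2": 0.4, "snp3": 2.9, "snp4": 0.1, "snp5": 4.0}
    pairs = two_stage_pairs(scores, threshold=2.0)
    print(len(pairs))  # 3
    print(pairs)       # [('snp1', 'snp3'), ('snp1', 'snp5'), ('snp3', 'snp5')]
```

With five SNPs, an unscreened scan would test ten pairs; screening leaves three, so each remaining interaction test faces a less stringent multiple-testing threshold.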
