Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies

Author(s):  
Yingjie Guo ◽  
Chenxi Wu ◽  
Zhian Yuan ◽  
Yansu Wang ◽  
Zhen Liang ◽  
...  

Among the many statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies, gene-based methods are both statistically powerful and biologically interpretable. However, their power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. This article therefore proposes a gene-based method derived from XGBoost (GGInt-XGBoost). Assuming that the log odds ratio of the disease trait is additive when a pair of genes does not interact, the difference in error between XGBoost models fitted with and without the additive constraint can indicate a gene–gene interaction; we then used a permutation-based statistical test to assess this difference and to provide a p-value representing the significance of the interaction. Experimental results on both simulated and real data showed that our approach outperformed previous methods in detecting gene–gene interactions.
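The additive assumption and the resulting test can be written compactly. The following is only a sketch of what the abstract describes: the symbols $G_A$, $G_B$, $f_A$, $f_B$, $\mathcal{L}$, $\Delta$ and $B$ are notation assumed here, not taken from the paper, and the "+1" form of the permutation p-value is a common convention rather than a detail confirmed by the abstract.

```latex
% Null hypothesis (genes A and B do not interact): the log odds are additive
\log \frac{P(y = 1 \mid G_A, G_B)}{P(y = 0 \mid G_A, G_B)} = f_A(G_A) + f_B(G_B)

% Test statistic: error of the additive-constrained model minus
% error of the unconstrained model
\Delta = \mathcal{L}_{\mathrm{additive}} - \mathcal{L}_{\mathrm{unconstrained}}

% Permutation p-value from B label permutations
p = \frac{1 + \#\{\, b : \Delta^{(b)} \ge \Delta_{\mathrm{obs}} \,\}}{1 + B}
```

A large $\Delta$ means the unconstrained model fits noticeably better than the additive one, which is the signal the abstract attributes to an interaction.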

2021 ◽  
Vol 12 ◽  
Author(s):  
Yingjie Guo ◽  
Honghong Cheng ◽  
Zhian Yuan ◽  
Zhen Liang ◽  
Yang Wang ◽  
...  

Unexplained genetic variation that causes complex diseases is often induced by gene-gene interactions (GGIs). Among current statistical methodologies for discovering GGIs in case-control genome-wide association studies, gene-based methods are not only statistically powerful but also biologically interpretable. However, most approaches include assumptions about the form of GGIs, which results in poor statistical performance. We therefore propose a gene-based test built on the maximal neighborhood coefficient (MNC), called gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC). MNC is a metric for capturing a wide range of relationships between two random vectors with arbitrary, but not necessarily equal, dimensions. We established a statistic that uses the difference in MNC between case and control samples as an indication of the existence of GGIs, based on the assumption that the joint distribution of two genes in cases and controls should not be substantially different if there is no interaction between them. We then used a permutation-based statistical test to evaluate this statistic and calculate a p-value representing the significance of the interaction. Experimental results using both simulated and real data showed that our approach outperformed earlier methods for detecting GGIs.
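The permutation step described above can be sketched generically. This sketch does not implement MNC: the `dependence_gap` statistic below is a stand-in based on Pearson correlation, each gene is reduced to a single hypothetical score, and all function names and the synthetic data are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def dependence_gap(g1, g2, labels):
    """Stand-in statistic (NOT MNC): |dependence in cases - dependence in controls|."""
    cases = [i for i, lab in enumerate(labels) if lab == 1]
    controls = [i for i, lab in enumerate(labels) if lab == 0]
    r_case = pearson([g1[i] for i in cases], [g2[i] for i in cases])
    r_ctrl = pearson([g1[i] for i in controls], [g2[i] for i in controls])
    return abs(r_case - r_ctrl)


def permutation_p_value(g1, g2, labels, statistic, n_perm=500, seed=0):
    """Shuffle case/control labels to build the null distribution of the statistic."""
    rng = random.Random(seed)
    observed = statistic(g1, g2, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(g1, g2, shuffled) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic data (not from the study): the two gene scores are strongly
# dependent in cases (label 1) and independent in controls (label 0).
rng = random.Random(1)
labels = [1] * 100 + [0] * 100
g1 = [rng.gauss(0, 1) for _ in labels]
g2 = [x + rng.gauss(0, 0.3) if lab == 1 else rng.gauss(0, 1)
      for x, lab in zip(g1, labels)]
p_value = permutation_p_value(g1, g2, labels, dependence_gap)
print(f"permutation p-value: {p_value:.4f}")
```

Because the statistic is passed in as a function, any other dependence measure for two vectors could be substituted without changing the permutation loop.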


2014 ◽  
Vol 26 (2) ◽  
pp. 567-582 ◽  
Author(s):  
Zhongxue Chen ◽  
Hon Keung Tony Ng ◽  
Jing Li ◽  
Qingzhong Liu ◽  
Hanwen Huang

In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed, and only a few significantly associated single-nucleotide polymorphisms on the X chromosome have been identified from genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that efficiently combines the information from single-nucleotide polymorphisms on the X chromosome in both males and females. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that assumes only that the risk allele is the same for females and males if the single-nucleotide polymorphism is associated with the disease in both genders. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association study data.


2018 ◽  
Author(s):  
Lotfi Slim ◽  
Clément Chatelain ◽  
Chloé-Agathe Azencott ◽  
Jean-Philippe Vert

More and more genome-wide association studies are being designed to uncover the full genetic basis of common diseases. Nonetheless, the resulting loci are often insufficient to fully recover the observed heritability. Epistasis, or gene-gene interaction, is one of many hypotheses put forward to explain this missing heritability. In the present work, we propose epiGWAS, a new approach for epistasis detection that identifies interactions between a target SNP and the rest of the genome. This contrasts with the classical strategy of epistasis detection through exhaustive pairwise SNP testing. We draw inspiration from causal inference in randomized clinical trials, which allows us to take into account linkage disequilibrium. EpiGWAS encompasses several methods, which we compare to state-of-the-art techniques for epistasis detection on simulated and real data. The promising results demonstrate empirically the benefits of EpiGWAS to identify pairwise interactions.


Endocrinology ◽  
2016 ◽  
Vol 157 (8) ◽  
pp. 3002-3008 ◽  
Author(s):  
Kayla A. Boortz ◽  
Kristen E. Syring ◽  
Chunhua Dai ◽  
Lynley D. Pound ◽  
James K. Oeser ◽  
...  

The glucose-6-phosphatase catalytic 2 (G6PC2) gene is expressed specifically in pancreatic islet beta cells. Genome-wide association studies have shown that single nucleotide polymorphisms in the G6PC2 gene are associated with variations in fasting blood glucose (FBG) but not fasting plasma insulin. Molecular analyses examining the functional effects of these single nucleotide polymorphisms demonstrate that elevated G6PC2 expression is associated with elevated FBG. Studies in mice complement these genome-wide association data and show that deletion of the G6pc2 gene lowers FBG without affecting fasting plasma insulin. This suggests that, together with glucokinase, G6PC2 forms a substrate cycle that determines the glucose sensitivity of insulin secretion. Because genome-wide association studies and mouse studies demonstrate that elevated G6PC2 expression raises FBG and because chronically elevated FBG is detrimental to human health, increasing the risk of type 2 diabetes, it is unclear why G6PC2 evolved. We show here that the synthetic glucocorticoid dexamethasone strongly induces human G6PC2 promoter activity and endogenous G6PC2 expression in isolated human islets. Acute treatment with dexamethasone selectively induces endogenous G6pc2 expression in 129SvEv but not C57BL/6J mouse pancreas and isolated islets. The difference is due to a single nucleotide polymorphism in the C57BL/6J G6pc2 promoter that abolishes glucocorticoid receptor binding. In 6-hour fasted, nonstressed 129SvEv mice, deletion of G6pc2 lowers FBG. In response to the stress of repeated physical restraint, which is associated with elevated plasma glucocorticoid levels, G6pc2 gene expression is induced and the difference in FBG between wild-type and knockout mice is enhanced. These data suggest that G6PC2 may have evolved to modulate FBG in response to stress.


2018 ◽  
Vol 20 (6) ◽  
pp. 2217-2223
Author(s):  
Zhiyu Hao ◽  
Li Jiang ◽  
Jin Gao ◽  
Jinhua Ye ◽  
Jingli Zhao ◽  
...  

Standard normal statistics, chi-squared statistics, Student’s t statistics and F statistics are used to map quantitative trait nucleotides for both small and large sample sizes. In genome-wide association studies (GWASs) of single-nucleotide polymorphisms (SNPs), the statistical distributions depend on both genetic effects and SNPs but are independent of SNPs under the null hypothesis of no genetic effects. Therefore, hypothesis testing when a nuisance parameter is present only under the alternative was introduced to quickly approximate the critical thresholds of these test statistics for GWASs. When only the statistical probabilities are available for high-throughput SNPs, the approximate critical thresholds can be estimated with chi-squared statistics, formulated by statistical probabilities with a degree of freedom of two. High similarities in the critical thresholds between the accurate and approximate estimations were demonstrated by extensive simulations and real data analysis.
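One plausible reading of "chi-squared statistics, formulated by statistical probabilities with a degree of freedom of two" is the standard identity that, for a p-value p, the quantity -2 ln(p) follows a chi-squared distribution with two degrees of freedom under the null. The sketch below illustrates only that identity; the function names are hypothetical, and the abstract does not confirm that this is the exact procedure used in the paper.

```python
import math


def chi2_2df_from_p(p):
    """Map a p-value to the chi-squared (2 df) statistic with the same tail area."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -2.0 * math.log(p)


def p_from_chi2_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 df: exp(-x / 2)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.exp(-x / 2.0)


# Illustration with a Bonferroni-style level for one million SNPs
# (the number of SNPs here is an arbitrary example, not from the paper).
alpha = 0.05 / 1_000_000
threshold = chi2_2df_from_p(alpha)
print(f"chi-squared (2 df) threshold: {threshold:.3f}")
```

The closed form exists only for two degrees of freedom, which may be why that particular choice is convenient when nothing but p-values is available.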


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242927
Author(s):  
Lotfi Slim ◽  
Clément Chatelain ◽  
Chloé-Agathe Azencott ◽  
Jean-Philippe Vert

More and more genome-wide association studies are being designed to uncover the full genetic basis of common diseases. Nonetheless, the resulting loci are often insufficient to fully recover the observed heritability. Epistasis, or gene-gene interaction, is one of many hypotheses put forward to explain this missing heritability. In the present work, we propose epiGWAS, a new approach for epistasis detection that identifies interactions between a target SNP and the rest of the genome. This contrasts with the classical strategy of epistasis detection through exhaustive pairwise SNP testing. We draw inspiration from causal inference in randomized clinical trials, which allows us to take into account linkage disequilibrium. EpiGWAS encompasses several methods, which we compare to state-of-the-art techniques for epistasis detection on simulated and real data. The promising results demonstrate empirically the benefits of EpiGWAS to identify pairwise interactions.


Author(s):  
Ting-Hao Chen ◽  
Chen-Cheng Yang ◽  
Kuei-Hau Luo ◽  
Chia-Yen Dai ◽  
Yao-Chung Chuang ◽  
...  

Aluminum (Al) toxicity is related to renal failure and the failure of other systems. Although genome-wide association studies (GWAS) have been conducted in Australia and England, to our knowledge none has been conducted in Han Chinese populations. This research therefore used whole-genome genotypes from the Taiwan Biobank to explore the association between plasma Al concentrations and renal function. Participants, who underwent questionnaire interviews, biomarker measurement, and genotyping, were drawn from the Taiwan Biobank database. We measured their plasma Al concentrations with ICP-MS in the laboratory at Kaohsiung Medical University. We used these data in genome-wide association (GWA) tests to look for candidate genes and to relate plasma Al concentration to renal function. Furthermore, we examined the path relationship between single nucleotide polymorphisms (SNPs), Al concentrations, and estimated glomerular filtration rate (eGFR) through mediation analysis with 3000 bootstrap replicates. Following the principles of GWAS, we focused on three SNPs within the dipeptidyl peptidase-like protein 6 (DPP6) gene on chromosome 7: rs10224371, rs2316242, and rs10268004. The mediation analysis showed that all of the selected SNPs indirectly affected eGFR through Al concentrations. Our analysis revealed an association between DPP6 SNPs, plasma Al concentrations, and eGFR. Further longitudinal studies and research on the mechanism are needed. Nevertheless, this is the first study to explore the association between DPP6 SNPs, plasma Al, and eGFR.
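The mediation analysis with bootstrap replicates can be illustrated with a product-of-coefficients sketch. Everything below is an assumption for illustration: the data are synthetic (not Taiwan Biobank data), the model is ordinary least squares without covariates, the interval is a simple percentile interval, and the demo uses 300 replicates for speed rather than the 3000 reported in the abstract.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def indirect_effect(snp, al, egfr):
    """Product of path a (SNP -> Al) and path b (Al -> eGFR, adjusted for SNP)."""
    a = slope(snp, al)
    m_snp, m_al = mean(snp), mean(al)
    # Residualise Al on SNP; the slope of eGFR on these residuals equals the
    # Al coefficient in the regression eGFR ~ SNP + Al (Frisch-Waugh theorem).
    resid = [m - m_al - a * (s - m_snp) for s, m in zip(snp, al)]
    b = slope(resid, egfr)
    return a * b


def bootstrap_ci(snp, al, egfr, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(snp)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([snp[i] for i in idx],
                                         [al[i] for i in idx],
                                         [egfr[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic example: genotype coded 0/1/2, true indirect effect 0.5 * -0.8 = -0.4.
data_rng = random.Random(42)
snp = [sum(data_rng.random() < 0.3 for _ in range(2)) for _ in range(400)]
al = [0.5 * s + data_rng.gauss(0, 1) for s in snp]
egfr = [90 - 0.8 * m + data_rng.gauss(0, 1) for m in al]

point = indirect_effect(snp, al, egfr)
ci_low, ci_high = bootstrap_ci(snp, al, egfr)
print(f"indirect effect {point:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
```

An interval that excludes zero is the usual evidence for an indirect path, which matches the abstract's conclusion that the SNPs affected eGFR through Al concentrations.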


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Guomin Zhang ◽  
Rongsheng Wang ◽  
Juntao Ma ◽  
Hongru Gao ◽  
Lingwei Deng ◽  
...  

Background: Heilongjiang Province is a high-quality japonica rice cultivation area in China. One in ten bowls of Chinese rice is produced here. Increasing yield is one of the main aims of rice production in this area. However, yield is a complex quantitative trait composed of many factors. The purpose of this study was to determine how many genetic loci are associated with yield-related traits. Genome-wide association studies (GWAS) were performed on 450 accessions collected from northeast Asia, including Russia, Korea, Japan and Heilongjiang Province of China. These accessions consist of elite varieties and landraces introduced into Heilongjiang Province decades ago. Results: After resequencing of the 450 accessions, 189,019 single nucleotide polymorphisms (SNPs) were used for association studies with two different models, a general linear model (GLM) and a mixed linear model (MLM), examining four traits: days to heading (DH), plant height (PH), panicle weight (PW) and tiller number (TI). Over 25 SNPs were found to be associated with each trait. Among them, 22 SNPs were selected to identify candidate genes, and 2, 8, 1 and 11 SNPs were located in 3′ UTR, intronic, coding and intergenic regions, respectively. Conclusions: All SNPs detected in this research may become candidates for further fine mapping and may be used in the molecular breeding of high-latitude rice.


Genes ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 608
Author(s):  
Yingjie Guo ◽  
Chenxi Wu ◽  
Maozu Guo ◽  
Xiaoyan Liu ◽  
Alon Keinan

Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene-gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
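The distance correlation between two genes with different numbers of SNPs, and the case-versus-control difference used as the interaction indicator, can be sketched as follows. This is the standard (biased) sample distance correlation written in plain Python; the function names, genotype coding (0/1/2) and synthetic data are assumptions for illustration, and the permutation test that assigns a p-value to the gap is omitted.

```python
import math
import random


def centered_distances(rows):
    """Double-centred matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def distance_correlation(x_rows, y_rows):
    """Sample dCor of two vector samples; the row dimensions may differ."""
    a = centered_distances(x_rows)
    b = centered_distances(y_rows)
    n = len(a)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Synthetic genotypes (not real data): gene A has 2 SNPs, gene B has 3 SNPs.
rng = random.Random(7)


def random_gene(n_snps):
    return [rng.randint(0, 2) for _ in range(n_snps)]


# Cases: gene B is a function of gene A. Controls: the genes are independent.
case_a = [random_gene(2) for _ in range(40)]
case_b = [[g[0], g[1], (g[0] + g[1]) % 3] for g in case_a]
ctrl_a = [random_gene(2) for _ in range(40)]
ctrl_b = [random_gene(3) for _ in range(40)]

dcor_cases = distance_correlation(case_a, case_b)
dcor_controls = distance_correlation(ctrl_a, ctrl_b)
gap = abs(dcor_cases - dcor_controls)
print(f"dCor cases {dcor_cases:.3f}, controls {dcor_controls:.3f}, gap {gap:.3f}")
```

Because the measure works on pairwise distances between samples rather than on the coordinates themselves, the two genes do not need the same number of SNPs.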

