Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.
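
The two-step idea described above can be illustrated with a rough sketch. This is not the authors' score test: it assumes a continuous outcome fitted by least squares (the abstract concerns case-control data), uses a plug-in Wald-type statistic for the single interaction parameter, and the function names `leading_pcs` and `tukey_1df_test` are hypothetical.

```python
import math
import numpy as np

def leading_pcs(genotypes, n_pcs):
    """Step 1: scores on the first n_pcs principal components of a SNP matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; columns of u * s are the PC scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def tukey_1df_test(y, pcs_a, pcs_b):
    """Step 2 (simplified): plug-in Tukey 1-df interaction test.

    Assumes a continuous outcome y; returns a 1-df statistic and its
    chi-square p-value. A sketch only, not the score test from the paper.
    """
    n = len(y)
    main = np.hstack([np.ones((n, 1)), pcs_a, pcs_b])
    # Fit the main-effects model first.
    coef, *_ = np.linalg.lstsq(main, y, rcond=None)
    k_a = pcs_a.shape[1]
    effect_a = pcs_a @ coef[1:1 + k_a]   # fitted contribution of gene A
    effect_b = pcs_b @ coef[1 + k_a:]    # fitted contribution of gene B
    # Tukey's form: a single interaction term, the product of the main effects.
    product = effect_a * effect_b
    full = np.hstack([main, product[:, None]])
    coef_full, *_ = np.linalg.lstsq(full, y, rcond=None)
    resid = y - full @ coef_full
    sigma2 = resid @ resid / (n - full.shape[1])
    cov = sigma2 * np.linalg.inv(full.T @ full)
    stat = float(coef_full[-1] ** 2 / cov[-1, -1])
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Whatever the number of SNPs or PCs per gene, the interaction is summarized by one parameter, which is the parsimony the abstract refers to.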

PCA-Based Multiple-Trait GWAS Analysis: A Powerful Model for Exploring Pleiotropy

Animals ◽

10.3390/ani8120239 ◽

2018 ◽

Vol 8 (12) ◽

pp. 239 ◽

Cited By ~ 4

Author(s):

Wengang Zhang ◽

Xue Gao ◽

Xinping Shi ◽

Bo Zhu ◽

Zezhao Wang ◽

...

Keyword(s):

Quantitative Trait ◽

Statistical Power ◽

Muscle Development ◽

Association Studies ◽

Simulated Data ◽

Principal Component ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Multiple Trait ◽

Gwas Analysis

Principal component analysis (PCA) is a potential approach that can be applied in multiple-trait genome-wide association studies (GWAS) to explore pleiotropy, as well as increase the power of quantitative trait loci (QTL) detection. In this study, the relationship between single-trait GWAS and PCA-based GWAS was determined for the tested single nucleotide polymorphisms (SNPs). We found that the estimated effects of pleiotropic quantitative trait nucleotides (QTNs), $\hat{\beta}^{*}$, were in most cases larger than the single-trait model estimates ($\hat{\beta}_1$ and $\hat{\beta}_2$). Analysis using the simulated data showed that PCA-based multiple-trait GWAS has improved statistical power for detecting QTL compared to single-trait GWAS. For the minor allele frequency (MAF), when the MAF of QTNs was greater than 0.2, the PCA-based model had a significant advantage in detecting the pleiotropic QTNs, but when its MAF was reduced from 0.2 to 0, the advantage began to disappear. In addition, as the linkage disequilibrium (LD) of the pleiotropic QTNs decreased, its detection ability declined in the co-localization effect model. Furthermore, on the real data of 1141 Simmental cattle, we applied the PCA model to the multiple-trait GWAS analysis and identified a QTL that was consistent with a candidate gene, MCHR2, which was associated with presoma muscle development in cattle. In summary, PCA-based multiple-trait GWAS is an efficient model for exploring pleiotropic QTNs in quantitative traits.
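
A minimal sketch of the general idea, under stated assumptions: traits are standardized, projected onto their principal components, and each component is then regressed on a SNP one at a time. The abstract does not give the exact model used, so the helper names `trait_pcs` and `snp_effect` and the simple-regression form are illustrative only.

```python
import numpy as np

def trait_pcs(traits):
    """Standardize the traits (columns) and return their PC scores."""
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    # Rows of vt are the principal axes of the standardized traits.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt.T

def snp_effect(phenotype, snp):
    """Slope and t statistic from a simple regression of phenotype on SNP dosage."""
    x = snp - snp.mean()
    y = phenotype - phenotype.mean()
    sxx = x @ x
    beta = (x @ y) / sxx
    resid = y - beta * x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    se = np.sqrt(resid @ resid / (len(y) - 2) / sxx)
    return beta, beta / se
```

Running `snp_effect` on each column of `trait_pcs(traits)` and on each raw trait gives the kind of single-trait versus PCA-based comparison the abstract describes.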

AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2015-0074 ◽

2016 ◽

Vol 15 (2) ◽

Cited By ~ 4

Author(s):

Mathieu Emily

Keyword(s):

Statistical Power ◽

Association Studies ◽

Case Control ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Biological Interpretation ◽

Limited Power ◽

Control Association

Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity as they confer advantages in both statistical power and biological interpretation. All of the gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, leading to a limited power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests before aggregating the obtained

Investigation of gene–environment interactions in relation to tic severity

Journal of Neural Transmission ◽

10.1007/s00702-021-02396-y ◽

2021 ◽

Author(s):

Mohamed Abdulkadir ◽

Dongmei Yu ◽

Lisa Osiecki ◽

Robert A. King ◽

Thomas V. Fernandez ◽

...

Keyword(s):

Tourette Syndrome ◽

Association Studies ◽

Autism Spectrum ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Linear Regression Models ◽

Compulsive Disorder ◽

Gene Environment ◽

Tic Severity

Tourette syndrome (TS) is a neuropsychiatric disorder with involvement of genetic and environmental factors. We investigated genetic loci previously implicated in Tourette syndrome and associated disorders in interaction with pre- and perinatal adversity in relation to tic severity using a case-only (N = 518) design. We assessed 98 single-nucleotide polymorphisms (SNPs) selected from (I) top SNPs from genome-wide association studies (GWASs) of TS; (II) top SNPs from GWASs of obsessive–compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD); (III) SNPs previously implicated in candidate-gene studies of TS; (IV) SNPs previously implicated in OCD or ASD; and (V) tagging SNPs in neurotransmitter-related candidate genes. Linear regression models were used to examine the main effects of the SNPs on tic severity, and the interaction effect of these SNPs with a cumulative pre- and perinatal adversity score. Replication was sought for SNPs that met the threshold of significance (after correcting for multiple testing) in a replication sample (N = 678). One SNP (rs7123010), previously implicated in a TS meta-analysis, was significantly related to higher tic severity. We found a gene–environment interaction for rs6539267, another top TS GWAS SNP. These findings were not independently replicated. Our study highlights the future potential of TS GWAS top hits in gene–environment studies.

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the closest gene to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.

A nonparametric test for association with multiple loci in the retrospective case-control study

Statistical Methods in Medical Research ◽

10.1177/0962280219842892 ◽

2019 ◽

Vol 29 (2) ◽

pp. 589-602

Author(s):

Chan Wang ◽

Shufang Deng ◽

Leiming Sun ◽

Liming Li ◽

Yue-Qing Hu

Keyword(s):

Rare Variants ◽

Association Studies ◽

Nonparametric Test ◽

Case Control ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Retrospective Case ◽

Multiple Loci ◽

Common Diseases ◽

The Difference

Genome-wide association studies aim at identifying common or rare variants associated with common diseases and explaining more heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. In order to powerfully detect association signals, it is highly desirable to take account of correlations or linkage disequilibrium (LD) information among multiple SNPs in testing for association. In this article, we propose a test, SLIDE, that depicts the difference of the average multi-locus genotypes between cases and controls, and we derive its variance–covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. Thus SLIDE can borrow strength from an external database in the population of interest, with a few thousand to hundreds of thousands of individuals, to improve the power for detecting association. Extensive simulations show that SLIDE has apparent superiority over the existing methods, especially in situations involving both common and rare variants and both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in the application to the data from the Wellcome Trust Case Control Consortium.

Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies

Genes ◽

10.3390/genes9120608 ◽

2018 ◽

Vol 9 (12) ◽

pp. 608

Author(s):

Yingjie Guo ◽

Chenxi Wu ◽

Maozu Guo ◽

Xiaoyan Liu ◽

Alon Keinan

Keyword(s):

Correlation Coefficient ◽

Statistical Power ◽

Association Studies ◽

Gene Interaction ◽

P Value ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Real World Data ◽

Distance Correlation ◽

The Difference

Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene-gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
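
The case-versus-control comparison of distance correlation described above could be sketched roughly as follows. This is an illustration under assumptions, not the GBDcor implementation: it uses the standard sample distance correlation, takes the absolute difference as the statistic, and permutes case/control labels; the function names and the default of 200 permutations are hypothetical.

```python
import numpy as np

def _centered_distances(x):
    """Double-centered Euclidean distance matrix for the rows of x."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between two matrices with matching rows."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()
    dvar = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if dvar == 0 else float(np.sqrt(max(dcov2, 0.0) / dvar))

def dcor_difference_test(gene1, gene2, is_case, n_perm=200, seed=0):
    """Permutation p-value for |dCor in cases - dCor in controls|.

    gene1 and gene2 are genotype matrices (individuals x SNPs); is_case is a
    boolean NumPy array. The choice of statistic is an assumption.
    """
    rng = np.random.default_rng(seed)

    def stat(labels):
        in_cases = distance_correlation(gene1[labels], gene2[labels])
        in_controls = distance_correlation(gene1[~labels], gene2[~labels])
        return abs(in_cases - in_controls)

    observed = stat(is_case)
    # Shuffle case/control labels to approximate the null distribution.
    permuted = [stat(rng.permutation(is_case)) for _ in range(n_perm)]
    p_value = (1 + sum(s >= observed for s in permuted)) / (n_perm + 1)
    return observed, p_value
```

Because distance correlation is defined for vectors of different dimensions, the two genes may contain different numbers of SNPs, as the abstract notes.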

Addressing the Missing Heritability Problem With the Help of Regulatory Features

Evolutionary Bioinformatics ◽

10.1177/1176934319860861 ◽

2019 ◽

Vol 15 ◽

pp. 117693431986086

Author(s):

Shan-Shan Dong ◽

Yan Guo ◽

Tie-Lin Yang

Keyword(s):

Target Genes ◽

Association Studies ◽

Complex Diseases ◽

Regulatory Elements ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Susceptibility Loci ◽

Missing Heritability ◽

Genome Wide ◽

Missing Heritability Problem

Genome-wide association studies (GWASs) have successfully identified thousands of susceptibility loci for human complex diseases. However, missing heritability is still a challenging problem. Considering that most GWAS loci are located in regulatory elements, we recently developed a pipeline named functional disease-associated single-nucleotide polymorphisms (SNPs) prediction (FDSP) to predict novel susceptibility loci for complex diseases based on the interpretation of regulatory features and published GWAS results with machine learning. When applied to type 2 diabetes and hypertension, the susceptibility loci predicted by FDSP proved capable of explaining additional heritability. In addition, potential target genes of the predicted positive SNPs were significantly enriched in disease-related pathways. Our results suggest that taking regulatory features into consideration might be a useful way to address the missing heritability problem. We hope FDSP can help in the identification of novel susceptibility loci for complex diseases.

Association Analysis of Candidate Variants in Admixed Brazilian Patients With Genetic Generalized Epilepsies

Frontiers in Genetics ◽

10.3389/fgene.2021.672304 ◽

2021 ◽

Vol 12 ◽

Author(s):

Felipe S. Kaibara ◽

Tânia K. de Araujo ◽

Patricia A. O. R. A. Araujo ◽

Marina K. M. Alvim ◽

Clarissa L. Yasuda ◽

...

Keyword(s):

Native Americans ◽

Statistical Power ◽

Association Studies ◽

Snp Array ◽

Absence Epilepsy ◽

Candidate Snps ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Clonic Seizures ◽

Candidate Regions

Genetic generalized epilepsies (GGEs) include well-established epilepsy syndromes with generalized onset seizures: childhood absence epilepsy, juvenile myoclonic epilepsy (JME), juvenile absence epilepsy (JAE), myoclonic absence epilepsy, epilepsy with eyelid myoclonia (Jeavons syndrome), generalized tonic–clonic seizures, and generalized tonic–clonic seizures alone. Genome-wide association studies (GWASs) and exome sequencing have identified 48 single-nucleotide polymorphisms (SNPs) associated with GGE. However, these studies were mainly based on non-admixed, European, and Asian populations. Thus, it remains unclear whether these results apply to patients of other origins. This study aims to evaluate whether these previous results could be replicated in a cohort of admixed Brazilian patients with GGE. We obtained SNP-array data from 87 patients with GGE, compared with 340 controls from the BIPMed public dataset. We could directly access genotypes of 17 candidate SNPs, available in the SNP array, and the remaining 31 SNPs were imputed using the BEAGLE v5.1 software. We performed an association test by logistic regression analysis, including the first five principal components as covariates. Furthermore, to expand the analysis of the candidate regions, we also interrogated 14,047 SNPs that flank the candidate SNPs (1 Mb). The statistical power was evaluated in terms of odds ratio and minor allele frequency (MAF) by the genpwr package. Differences in SNP frequencies between Brazilians and Europeans, sub-Saharan Africans, and Native Americans were evaluated by a two-proportion Z-test. We identified nine flanking SNPs, located in eight candidate regions, which presented association signals that passed the Bonferroni correction (rs12726617; rs9428842; rs1915992; rs1464634; rs6459526; rs2510087; rs9551042; rs9888879; and rs8133217; p-values <3.55e–06). In addition, the two-proportion Z-test indicates that the lack of association of the remaining candidate SNPs could be due to different genomic backgrounds observed in admixed Brazilians. This is the first time that candidate SNPs for GGE have been analyzed in an admixed Brazilian population, and we could successfully replicate the association signals in eight candidate regions. In addition, our results provide new insights on how we can account for population structure to improve risk stratification estimation in admixed individuals.
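
The two generic calculations mentioned above, a Bonferroni threshold and a two-proportion Z-test, can be sketched as below. This is a textbook illustration rather than the study's pipeline: the pooled-variance form of the Z-test and the significance level of 0.05 are assumptions, since the abstract does not state them.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

def two_proportion_z_test(count1, n1, count2, n2):
    """Two-sided pooled two-proportion Z-test.

    count1/n1 and count2/n2 are, for example, allele counts over the total
    number of alleles in two populations. Returns (z, p_value).
    """
    p1, p2 = count1 / n1, count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With an assumed level of 0.05 and the 14,047 flanking SNPs, `bonferroni_threshold(0.05, 14047)` is about 3.56e-06, which is in line with the cutoff reported in the abstract.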

CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS

10.1101/2021.11.05.467462 ◽

2021 ◽

Author(s):

Hector Roux de Bezieux ◽

Leandro Lima ◽

Fanny Perraudeau ◽

Arnaud Mary ◽

Sandrine Dudoit ◽

...

Keyword(s):

Statistical Power ◽

Association Studies ◽

Bacterial Species ◽

De Bruijn Graph ◽

Testable Hypothesis ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

A Genome ◽

De Bruijn ◽

Connected Subgraphs

Genome-wide association studies (GWAS), aiming to find genetic variants associated with a trait, have widely been used on bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids the reliance on a common reference genome. However, the same gene can exist in slightly different versions across different strains, leading to diluted effects when trying to detect its association to a phenotype through k-mer based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the De Bruijn graph (DBG) defined over genomic k-mers. These covariates are able to capture polymorphic genes as a single entity, improving k-mer based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would result in very low statistical power due to multiple testing corrections, and the mere exploration of these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has successfully been used to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs by proposing a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, resulting in drastic improvements in computational efficiency. We illustrate this on both real and simulated datasets and also demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. We also provide an implementation of our method, as well as code to reproduce all results at https://github.com/HectorRDB/Caldera_Recomb.

Genome-Phenome Linkages in Human Population Surveys, with Special Emphasis on the Health and Retirement Survey

Forum for Health Economics & Policy ◽

10.2202/1558-9544.1261 ◽

2011 ◽

Vol 14 (3) ◽

Cited By ~ 1

Author(s):

Burton Singer

Keyword(s):

Human Population ◽

Statistical Power ◽

Association Studies ◽

Large Population ◽

Causal Modeling ◽

Genome Wide Association Studies ◽

Population Surveys ◽

Entire Genome ◽

Gene Environment ◽

Genetic Contributions

We review a diversity of genome-wide association studies (GWAS) with particular emphasis on precision in specifying phenotypes. This implies that examination of any specific phenotype involves considering the likely genetic contributions to it from the entire genome. We consider a variety of phenotypes specifiable with data from the Health and Retirement Survey (HRS). However, evidence from other large population studies is also incorporated as part of the process of developing and refining pathway representations from the genome through a hierarchy of intermediate endpoints to behavioral, cognitive, and economic phenotypes. Any causal modeling focused on genome-phenotype connections must, of necessity, include consideration of intermediate endpoints (endophenotypes) as mediators of such associations. We also discuss metabolic and gene expression consequences of gene-environment interactions as a next research step beyond GWAS, not only for HRS but also for an integrated set of human population surveys that can provide much more statistical power than any one of them used alone. A variety of concrete examples based on physiological, psychological, sociological, and economic outcomes are carried along throughout our discussion.
