Gene, Environment and Methylation (GEM): a tool suite to efficiently navigate large scale epigenome wide association studies and integrate genotype and interaction between genotype and environment

Abstract Background The identification of gene-gene and gene-environment interactions in genome-wide association studies is challenging due to the unknown nature of the interactions and the overwhelmingly large number of possible combinations. Classical logistic regression models are suitable for finding pre-defined interactions, while more complex models, such as tree ensemble models, which can detect arbitrary interactions, have previously been difficult to interpret. However, with the development of methods for model explainability, it is now possible to interpret tree ensemble models efficiently and on strong theoretical grounds. Results We propose a tree ensemble- and SHAP-based method for identifying as well as interpreting both gene-gene and gene-environment interactions on large-scale biobank data. A set of independent cross-validation runs is used to implicitly investigate the whole genome. We apply and evaluate the method using data from the UK Biobank with obesity as the phenotype. The results are in line with previous research on obesity, as we identify top SNPs previously associated with obesity. We further demonstrate how to interpret and visualize interactions. The analysis suggests that the new method finds interactions between features that logistic regression models have difficulty detecting. Conclusions The new method robustly detects interesting interactions and can be applied to large-scale biobanks with high-dimensional data.

Testing Gene-Environment Interaction in Large-Scale Case-Control Association Studies: Possible Choices and Comparisons

American Journal of Epidemiology ◽

10.1093/aje/kwr367 ◽

2011 ◽

Vol 175 (3) ◽

pp. 177-190 ◽

Cited By ~ 78

Author(s):

Bhramar Mukherjee ◽

Jaeil Ahn ◽

Stephen B. Gruber ◽

Nilanjan Chatterjee

Keyword(s):

Large Scale ◽

Association Studies ◽

Case Control ◽

Environment Interaction ◽

Gene Environment Interaction ◽

Gene Environment ◽

Control Association

“Reports of My Death Were Greatly Exaggerated”: Behavior Genetics in the Postgenomic Era

Annual Review of Psychology ◽

10.1146/annurev-psych-052220-103822 ◽

2021 ◽

Vol 72 (1) ◽

pp. 37-60 ◽

Cited By ~ 1

Author(s):

K. Paige Harden

Keyword(s):

Large Scale ◽

Behavior Genetics ◽

Behavioral Genetics ◽

Association Studies ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

New Methods ◽

Gene Environment ◽

Meta Analyses ◽

And Behavior

Behavior genetics studies how genetic differences among people contribute to differences in their psychology and behavior. Here, I describe how the conclusions and methods of behavior genetics have evolved in the postgenomic era in which the human genome can be directly measured. First, I revisit the first law of behavioral genetics stating that everything is heritable, and I describe results from large-scale meta-analyses of twin data and new methods for estimating heritability using measured DNA. Second, I describe new methods in statistical genetics, including genome-wide association studies and polygenic score analyses. Third, I describe the next generation of work on gene × environment interaction, with a particular focus on how genetic influences vary across sociopolitical contexts and exogenous environments. Genomic technology has ushered in a golden age of new tools to address enduring questions about how genes and environments combine to create unique human lives.

Estimating the effective sample size in association studies of quantitative traits

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab057 ◽

2021 ◽

Author(s):

Andrey Ziyatdinov ◽

Jihye Kim ◽

Dmitry Prokopenko ◽

Florian Privé ◽

Fabien Laporte ◽

...

Keyword(s):

Statistical Power ◽

Quantitative Traits ◽

Mixed Model ◽

Association Studies ◽

Effective Sample Size ◽

Environment Interaction ◽

UK Biobank ◽

Gene Environment Interaction ◽

Gene Environment ◽

The UK

Abstract The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports ◽

10.1038/s41598-021-86871-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chao-Yu Guo ◽

Reng-Hong Wang ◽

Hsin-Chou Yang

Keyword(s):

Complex Traits ◽

Association Studies ◽

Association Test ◽

Whole Genome Sequence ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequence Kernel Association Test ◽

Gene Environment ◽

Family Based

Abstract After the genome-wide association studies (GWAS) era, whole-genome sequencing is now widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared with methods that do not consider gene-by-environment interactions.

Investigation of gene–environment interactions in relation to tic severity

Journal of Neural Transmission ◽

10.1007/s00702-021-02396-y ◽

2021 ◽

Author(s):

Mohamed Abdulkadir ◽

Dongmei Yu ◽

Lisa Osiecki ◽

Robert A. King ◽

Thomas V. Fernandez ◽

...

Keyword(s):

Tourette Syndrome ◽

Association Studies ◽

Autism Spectrum ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Linear Regression Models ◽

Compulsive Disorder ◽

Gene Environment ◽

Tic Severity

Abstract Tourette syndrome (TS) is a neuropsychiatric disorder with involvement of genetic and environmental factors. We investigated genetic loci previously implicated in Tourette syndrome and associated disorders in interaction with pre- and perinatal adversity in relation to tic severity using a case-only (N = 518) design. We assessed 98 single-nucleotide polymorphisms (SNPs) selected from (I) top SNPs from genome-wide association studies (GWASs) of TS; (II) top SNPs from GWASs of obsessive–compulsive disorder (OCD), attention-deficit/hyperactivity disorder (ADHD), and autism spectrum disorder (ASD); (III) SNPs previously implicated in candidate-gene studies of TS; (IV) SNPs previously implicated in OCD or ASD; and (V) tagging SNPs in neurotransmitter-related candidate genes. Linear regression models were used to examine the main effects of the SNPs on tic severity, and the interaction effect of these SNPs with a cumulative pre- and perinatal adversity score. Replication was sought for SNPs that met the threshold of significance (after correcting for multiple testing) in a replication sample (N = 678). One SNP (rs7123010), previously implicated in a TS meta-analysis, was significantly related to higher tic severity. We found a gene–environment interaction for rs6539267, another top TS GWAS SNP. These findings were not independently replicated. Our study highlights the future potential of TS GWAS top hits in gene–environment studies.
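
The linear model with a SNP-by-environment interaction term described in this abstract can be illustrated with a minimal sketch. It assumes an additively coded genotype (0, 1, 2), a numeric exposure score, and an ordinary least squares fit of outcome ~ 1 + g + e + g×e. The function names `solve_linear_system` and `fit_interaction_model` are hypothetical, and the sketch omits the covariates, significance tests, and multiple-testing correction that the study itself would need. It is not the authors' code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction_model(genotype, exposure, outcome):
    """OLS fit of outcome ~ 1 + g + e + g*e.

    Returns [intercept, b_genotype, b_exposure, b_interaction].
    Assumes the design matrix has full column rank.
    """
    rows = [[1.0, g, e, g * e] for g, e in zip(genotype, exposure)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Synthetic, noise-free data (illustrative only, not study data).
    g = [i % 3 for i in range(30)]
    e = [float(i % 5) for i in range(30)]
    y = [1.0 + 2.0 * gi + 3.0 * ei + 0.5 * gi * ei for gi, ei in zip(g, e)]
    print(fit_interaction_model(g, e, y))
```

The last coefficient is the interaction effect: how much the per-allele effect on the outcome changes for each unit increase in the exposure score.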

Optimized permutation testing for information theoretic measures of multi-gene interactions

BMC Bioinformatics ◽

10.1186/s12859-021-04107-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

James M. Kunert-Graf ◽

Nikita A. Sakhanenko ◽

David J. Galas

Keyword(s):

Large Scale ◽

Permutation Test ◽

Association Studies ◽

Genome Wide Association Studies ◽

Permutation Testing ◽

Exact Test ◽

Information Theoretic ◽

Information Theoretic Measures ◽

Full Analysis ◽

Computational Bottleneck

Abstract Background Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. Results In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based on information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with a large number of samples. Conclusions The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
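
To make the bottleneck concrete, the following is a minimal sketch of the naive baseline that the abstract contrasts against: the phenotype labels are shuffled and the count table is rebuilt on every iteration. It is not the authors' optimized method (see their repository for that). Mutual information stands in as one example of an information-theoretic measure, and the function names `mutual_information` and `naive_permutation_pvalue` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) estimated from the joint count table."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # building this table is the costly step
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def naive_permutation_pvalue(genotypes, phenotypes, n_perm=1000, seed=0):
    """Naive permutation test: shuffle labels, rebuild counts, recompute.

    `genotypes` may hold tuples such as (snp_a, snp_b) so that the measure
    captures a SNP-SNP-phenotype dependency. Returns (observed, p_value),
    using the add-one estimate (hits + 1) / (n_perm + 1).
    """
    rng = random.Random(seed)
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(genotypes, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic XOR-style interaction: neither SNP alone predicts the label.
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
    labels = [a ^ b for a, b in pairs]
    print(naive_permutation_pvalue(pairs, labels, n_perm=200))
```

Each permutation here costs a full pass over all samples to rebuild the table, which is the step the paper reports eliminating.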

Guidelines for Large-Scale Sequence-Based Complex Trait Association Studies: Lessons Learned from the NHLBI Exome Sequencing Project

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2016.08.012 ◽

2016 ◽

Vol 99 (4) ◽

pp. 791-801 ◽

Cited By ~ 48

Author(s):

Paul L. Auer ◽

Alex P. Reiner ◽

Gao Wang ◽

Hyun Min Kang ◽

Goncalo R. Abecasis ◽

...

Keyword(s):

Exome Sequencing ◽

Large Scale ◽

Association Studies ◽

Complex Trait ◽

Lessons Learned ◽

Sequencing Project ◽

Trait Association ◽

Exome Sequencing Project ◽

Scale Sequence

Power comparison of Cochran-Armitage trend test against allelic and genotypic tests in large-scale case-control genetic association studies

Statistical Methods in Medical Research ◽

10.1177/0962280216683979 ◽

2016 ◽

Vol 27 (9) ◽

pp. 2657-2673 ◽

Cited By ~ 2

Author(s):

Mathieu Emily

Keyword(s):

Large Scale ◽

Association Studies ◽

Disease Model ◽

Trend Test ◽

Genome Wide Association Studies ◽

Power Functions ◽

Power Comparison ◽

Powerful Test ◽

Armitage Trend Test ◽

Mode Of Inheritance

The Cochran-Armitage trend test (CA) has become a standard procedure for association testing in large-scale genome-wide association studies (GWAS). However, when the disease model is unknown, there is no consensus on the most powerful test to be used between CA, allelic, and genotypic tests. In this article, we tackle the question of whether CA is best suited to single-locus scanning in GWAS and propose a power comparison of CA against allelic and genotypic tests. Our approach relies on the evaluation of the Taylor decompositions of non-centrality parameters, thus allowing an analytical comparison of the power functions of the tests. Compared to simulation-based comparison, our approach offers the advantage of simultaneously accounting for the multidimensionality of the set of features involved in power functions. Although power for CA depends on the sample size, the case-to-control ratio and the minor allelic frequency (MAF), our results first show that it is largely influenced by the mode of inheritance and a deviation from Hardy–Weinberg Equilibrium (HWE). Furthermore, when compared to other tests, CA is shown to be the most powerful test under a multiplicative disease model or when the single-nucleotide polymorphism largely deviates from HWE. In all other situations, CA lacks in power and differences can be substantial, especially for the recessive mode of inheritance. Finally, our results are illustrated by the comparison of the performances of the statistics in two genome scans.
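
For readers unfamiliar with the two statistics being compared, the sketch below computes the Cochran-Armitage trend statistic and the allelic chi-square statistic from a 2 × 3 genotype table. It uses the standard textbook formulas with additive weights (0, 1, 2) and 1-degree-of-freedom chi-square p-values; it does not reproduce the paper's analytical power comparison. The function names and the example counts are illustrative assumptions.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts (aa, Aa, AA).

    Returns (chi2, p_value). Assumes both groups are non-empty and that
    at least two genotype classes are observed (otherwise division by zero).
    """
    n = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    t_r = sum(t * r for t, r in zip(weights, cases))
    t_n = sum(t * c for t, c in zip(weights, n))
    t2_n = sum(t * t * c for t, c in zip(weights, n))
    chi2 = N * (N * t_r - R * t_n) ** 2 / (R * S * (N * t2_n - t_n ** 2))
    return chi2, chi2_1df_pvalue(chi2)


def allelic_test(cases, controls):
    """Allelic chi-square test on the 2 x 2 table of allele counts.

    Treats the two alleles of each individual as independent draws,
    which is only justified under Hardy-Weinberg equilibrium.
    """
    a = cases[1] + 2 * cases[2]        # A alleles in cases
    b = 2 * cases[0] + cases[1]        # a alleles in cases
    c = controls[1] + 2 * controls[2]  # A alleles in controls
    d = 2 * controls[0] + controls[1]  # a alleles in controls
    total = a + b + c + d
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_1df_pvalue(chi2)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    cases, controls = [10, 20, 30], [30, 20, 10]
    print(cochran_armitage(cases, controls))
    print(allelic_test(cases, controls))
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) gives the dominant and recessive versions of the trend test, which is one way to see the dependence on mode of inheritance that the abstract describes.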

Sex differences in white adipose tissue expansion: emerging molecular mechanisms

Clinical Science ◽

10.1042/cs20210086 ◽

2021 ◽

Vol 135 (24) ◽

pp. 2691-2708

Author(s):

Simon T. Bond ◽

Anna C. Calkin ◽

Brian G. Drew

Keyword(s):

Sex Differences ◽

Sexual Dimorphism ◽

Gene Networks ◽

Large Scale ◽

Molecular Mechanisms ◽

Association Studies ◽

Tissue Expansion ◽

Economic Systems ◽

Genomic Association ◽

Males And Females

Abstract The escalating prevalence of overweight and obesity is a rapidly growing global health problem, placing an enormous burden on health and economic systems worldwide. Whilst obesity has well-described lifestyle drivers, there is also a significant and poorly understood component that is regulated by genetics. Furthermore, there is clear evidence for sexual dimorphism in obesity, where overall risk, degree, subtype and potential complications arising from obesity all differ between males and females. The molecular mechanisms that dictate these sex differences remain mostly uncharacterised. Many studies have demonstrated that this dimorphism cannot be explained solely by changes in hormones and their nuclear receptors, and instead arises from coordinated and highly regulated gene networks, both during development and throughout life. The more knowledge we acquire in this area from approaches such as large-scale genomic association studies, the more we appreciate the true complexity and heterogeneity of obesity. Nevertheless, over the past two decades, researchers have made enormous progress in this field, and some consistent and robust mechanisms continue to be established. In this review, we will discuss some of the proposed mechanisms underlying sexual dimorphism in obesity, and discuss some of the key regulators that influence this phenomenon.
