An Ancestry Based Approach for Detecting Interactions

Abstract Background: Epistasis and gene-environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene-environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies. Results: In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine interactions that were significant at p < 5×10⁻⁸. We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low p-values (p < 1.8×10⁻⁶). Conclusion: We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits.

Estimating the effective sample size in association studies of quantitative traits

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab057 ◽

2021 ◽

Author(s):

Andrey Ziyatdinov ◽

Jihye Kim ◽

Dmitry Prokopenko ◽

Florian Privé ◽

Fabien Laporte ◽

...

Keyword(s):

Statistical Power ◽

Quantitative Traits ◽

Mixed Model ◽

Association Studies ◽

Effective Sample Size ◽

Environment Interaction ◽

Uk Biobank ◽

Gene Environment Interaction ◽

Gene Environment ◽

The Uk

Abstract The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.
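
As a rough illustration of how relatedness reduces the effective sample size, the sketch below uses the classical design-effect approximation for equally sized clusters with an exchangeable within-cluster correlation. This is an assumption made for illustration, not the analytical form derived in the paper; the function names and the `icc` parameter are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equally sized clusters whose members share
    a common pairwise correlation `icc` (exchangeable correlation)."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations carrying the same information
    about a mean as `n_total` clustered observations."""
    return n_total / design_effect(cluster_size, icc)


# 20 unrelated individuals: every cluster has size 1, so nothing is lost.
ess_unrelated = effective_sample_size(20, cluster_size=1, icc=0.5)

# 10 pairs with within-pair correlation 0.5: the design effect is 1.5.
ess_pairs = effective_sample_size(20, cluster_size=2, icc=0.5)
```

Under this simplification the loss depends only on cluster size and within-cluster correlation; the abstract indicates that for gene-environment interaction tests the exposure distribution also matters, which this sketch does not model.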

Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits

Scientific Reports ◽

10.1038/s41598-021-86871-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chao-Yu Guo ◽

Reng-Hong Wang ◽

Hsin-Chou Yang

Keyword(s):

Complex Traits ◽

Association Studies ◽

Association Test ◽

Whole Genome Sequence ◽

Environment Interaction ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Sequence Kernel Association Test ◽

Gene Environment ◽

Family Based

Abstract After the genome-wide association studies (GWAS) era, whole-genome sequencing is widely used to identify associations of complex traits with rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to methods that do not consider gene-by-environment interactions.

Simulation and Sensitivity Analysis and Cross-Validation, Demonstrating the Utility of Genteract GxE Discovery methods

10.1101/2020.11.25.396861 ◽

2020 ◽

Author(s):

Brody Holohan ◽

Raphael Laderman

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Cross Validation ◽

Environmental Changes ◽

Disease Incidence ◽

Limiting Factor ◽

Individual Response ◽

Analytical Technique ◽

Gene Environment ◽

Study Results

Abstract Gene-environment interactions are at the heart of why many complex traits are not fully heritable, and why prediction of disease incidence and individual response to environmental changes based on genetics has been underwhelming in utility. Understanding these interactions is the primary limiting factor for the application of personalized medicine, but current methods are not well suited for dealing with complex traits that pose both a dimensionality and sparse data problem to unsupervised analysis methods. Genteract has developed a proprietary analytical technique that allows for detection and interpretation of GxEs regarding specific pairs of a single phenotype with a single environmental factor; these methods allow us to develop a platform that can be used to predict how individuals will respond to changes in their environment based on their genetics. To validate the methods we performed two types of testing: cross-validation against a dataset of clinical study results, and application of the methods in a simulated dataset. These tests enable a greater understanding of the methods’ utility, statistical power and predictive capabilities.

Powerful Tukey's One Degree-of-Freedom Test for Detecting Gene-Gene and Gene-Environment Interactions

Cancer Informatics ◽

10.4137/cin.s17305 ◽

2015 ◽

Vol 14s2 ◽

pp. CIN.S17305 ◽

Cited By ~ 1

Author(s):

Yaping Wang ◽

Donghui Li ◽

Peng Wei

Keyword(s):

Statistical Power ◽

Association Studies ◽

Score Test ◽

Principal Component ◽

Case Control ◽

Degree Of Freedom ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Missing Heritability ◽

Gene Environment

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of GWAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.
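
For orientation, the general shape of a Tukey 1-df interaction model on leading principal components can be written as below. This is a sketch of the standard Tukey form under assumed notation, not a formula quoted from the paper: $\eta$ is the linear predictor, $\mathrm{PC}_k$ are the $K$ leading PCs of the SNPs in a gene region, $E$ is the exposure (or a second gene's PC score in the G × G case), and $\theta$ is the single interaction parameter. The paper's exact parameterization may differ.

```latex
\eta \;=\; \beta_0
      \;+\; \sum_{k=1}^{K} \beta_k \,\mathrm{PC}_k
      \;+\; \gamma E
      \;+\; \theta \Bigl(\sum_{k=1}^{K} \beta_k \,\mathrm{PC}_k\Bigr)\bigl(\gamma E\bigr),
\qquad H_0:\ \theta = 0 .
```

Because the interaction term is the product of the two main-effect components scaled by one parameter, the test of $H_0$ has one degree of freedom regardless of $K$, which is the source of the parsimony the abstract refers to.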

Transcriptome Analysis in Domesticated Species: Challenges and Strategies

Bioinformatics and Biology Insights ◽

10.4137/bbi.s29334 ◽

2015 ◽

Vol 9S4 ◽

pp. BBI.S29334 ◽

Cited By ~ 4

Author(s):

Jessica P. Hekman ◽

Jennifer L Johnson ◽

Anna V. Kukekova

Keyword(s):

Complex Traits ◽

Gene Networks ◽

Association Studies ◽

Cultural Value ◽

Genomic Research ◽

Model Organisms ◽

Genome Wide Association Studies ◽

Rna Seq ◽

Genome Wide ◽

Genome Assemblies

Domesticated species occupy a special place in the human world due to their economic and cultural value. In the era of genomic research, domesticated species provide unique advantages for investigation of diseases and complex phenotypes. RNA sequencing, or RNA-seq, has recently emerged as a new approach for studying transcriptional activity of the whole genome, changing the focus from individual genes to gene networks. RNA-seq analysis in domesticated species may complement genome-wide association studies of complex traits with economic importance or direct relevance to biomedical research. However, RNA-seq studies are more challenging in domesticated species than in model organisms. These challenges are at least in part associated with the lack of quality genome assemblies for some domesticated species and the absence of genome assemblies for others. In this review, we discuss strategies for analyzing RNA-seq data, focusing particularly on questions and examples relevant to domesticated species.

Gamete simulation improves polygenic transmission disequilibrium analysis

10.1101/2020.10.26.355602 ◽

2020 ◽

Author(s):

Jiawen Chen ◽

Jing You ◽

Zijie Zhao ◽

Zheng Ni ◽

Kunling Huang ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Autism Spectrum ◽

Genetic Maps ◽

Risk Scores ◽

Parental Genotype ◽

Genome Wide Association Studies ◽

Transmission Disequilibrium ◽

Polygenic Risk

Abstract Polygenic risk scores (PRS) derived from summary statistics of genome-wide association studies (GWAS) have enjoyed great popularity in human genetics research. Applied to population cohorts, PRS can effectively stratify individuals by risk group and have promising applications in early diagnosis and clinical intervention. However, our understanding of within-family polygenic risk is incomplete, in part because the small sample size per family significantly limits power. Here, to address this challenge, we introduce ORIGAMI, a computational framework that uses parental genotype data to simulate offspring genomes. ORIGAMI uses state-of-the-art genetic maps to simulate realistic recombination events on phased parental genomes and allows quantifying the prospective PRS variability within each family. We quantify and showcase the substantially reduced yet highly heterogeneous PRS variation within families for numerous complex traits. Further, we incorporate within-family PRS variability to improve the polygenic transmission disequilibrium test (pTDT). Through simulations, we demonstrate that modeling within-family risk substantially improves the statistical power of pTDT. Applied to 7,805 trios of autism spectrum disorder (ASD) probands and healthy parents, we successfully replicated previously reported over-transmission of ASD, educational attainment, and schizophrenia risk, and identified multiple novel traits with significant transmission disequilibrium. These results provided novel etiologic insights into the shared genetic basis of various complex traits and ASD.

Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast

10.1101/2021.09.08.459513 ◽

2021 ◽

Author(s):

Alex N. Nguyen Ba ◽

Katherine R. Lawrence ◽

Artur Rego-Costa ◽

Shreyas Gopalakrishnan ◽

Daniel Temko ◽

...

Keyword(s):

Quantitative Trait Locus ◽

Qtl Mapping ◽

Quantitative Trait ◽

Complex Traits ◽

Large Scale ◽

Genetic Basis ◽

Association Studies ◽

Model Organisms ◽

Genome Wide Association Studies ◽

Trait Locus

Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits. Significance statement: Understanding the genetic basis of important phenotypes is a central goal of genetics. However, the highly polygenic architectures of complex traits inferred by large-scale genome-wide association studies (GWAS) in humans stand in contrast to the results of quantitative trait locus (QTL) mapping studies in model organisms. Here, we use a barcoding approach to conduct QTL mapping in budding yeast at a scale two orders of magnitude larger than the previous state of the art. The resulting increase in power reveals the polygenic nature of complex traits in yeast, and offers insight into widespread patterns of pleiotropy and epistasis. Our data and analysis methods offer opportunities for future work in systems biology, and have implications for large-scale GWAS in human populations.

Across-cohort QC analyses of genome-wide association study summary statistics from complex traits

10.1101/033787 ◽

2015 ◽

Author(s):

Guo-Bo Chen ◽

Sang Hong Lee ◽

Matthew R Robinson ◽

Maciej Trzaskowski ◽

Zhi-Xiang Zhu ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

False Negative ◽

Genome Wide Association ◽

Effect Sizes ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Unknown Sample ◽

Genome Wide

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts we propose a method that uses randomly generated genetic predictors, which does not require the sharing of individual-level genotype data and does not breach individual privacy.
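
To make the idea behind method I concrete, the sketch below computes a textbook heterozygosity-based Fst for a single biallelic SNP from the allele frequencies reported by two cohorts. It assumes equal weighting of the two cohorts and ignores sampling error; the estimator actually used in the paper is not specified in the abstract and may differ. The function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_cohorts(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted cohorts.

    H_T uses the pooled allele frequency; H_S is the mean within-cohort
    heterozygosity. A monomorphic locus is given Fst = 0 by convention.
    """
    pooled = (p1 + p2) / 2.0
    h_total = heterozygosity(pooled)
    if h_total == 0.0:
        return 0.0
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Identical frequencies give no differentiation; fixed differences give 1.
same = fst_two_cohorts(0.5, 0.5)
fixed = fst_two_cohorts(0.0, 1.0)
partial = fst_two_cohorts(0.2, 0.8)
```

In practice such per-SNP values would be combined across many SNPs before comparing cohorts, but how the paper aggregates them is not stated here.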

Geographic Confounding in Genome-Wide Association Studies

10.21203/rs.3.rs-362358/v1 ◽

2021 ◽

Author(s):

Abdel Abdellaoui ◽

Karin Verweij ◽

Michel G Nivard

Keyword(s):

Educational Attainment ◽

Social Stratification ◽

Complex Traits ◽

Association Studies ◽

Genetic Correlations ◽

Geographic Region ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Gene Environment ◽

Genome Wide

Abstract Gene-environment correlations can bias associations between genetic variants and complex traits in genome-wide association studies (GWASs). Here, we control for geographic sources of gene-environment correlation in GWASs on 56 complex traits (N = 69,772–271,457). Controlling for geographic region significantly decreases heritability signals for SES-related traits, most strongly for educational attainment and income, indicating that socio-economic differences between regions induce gene-environment correlations that become part of the polygenic signal. For most other complex traits investigated, genetic correlations with educational attainment and income are significantly reduced, most significantly for traits related to BMI, sedentary behavior, and substance use. Controlling for current address has greater impact on the polygenic signal than birth place, suggesting both active and passive sources of gene-environment correlations. Our results show that societal sources of social stratification that extend beyond families introduce regional-level gene-environment correlations that affect GWAS results.

Modeling epistasis in mice and yeast using the proportion of two or more distinct genetic backgrounds: evidence for “polygenic epistasis”

10.1101/555383 ◽

2019 ◽

Cited By ~ 1

Author(s):

Christoph D. Rau ◽

Natalia M. Gonzales ◽

Joshua S. Bloom ◽

Danny Park ◽

Julien Ayroles ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Statistical Tests ◽

Inbred Mice ◽

Statistical Test ◽

Genetic Ancestry ◽

Human Populations ◽

Epistatic Interactions ◽

Variable Effect ◽

The Impact

Abstract Background: The majority of quantitative genetic models used to map complex traits assume that alleles have similar effects across all individuals. Significant evidence suggests, however, that epistatic interactions modulate the impact of many alleles. Nevertheless, identifying epistatic interactions remains computationally and statistically challenging. In this work, we address some of these challenges by developing a statistical test for polygenic epistasis that determines whether the effect of an allele is altered by the global genetic ancestry proportion from distinct progenitors. Results: We applied our method to data from mice and yeast. For the mice, we observed 49 significant genotype-by-ancestry interaction associations across 14 phenotypes as well as over 1,400 Bonferroni-corrected genotype-by-ancestry interaction associations for mouse gene expression data. For the yeast, we observed 92 significant genotype-by-ancestry interactions across 38 phenotypes. Given this evidence of epistasis, we test for and observe evidence of rapid selection pressure on ancestry specific polymorphisms within one of the cohorts, consistent with epistatic selection. Conclusions: Unlike our prior work in human populations, we observe widespread evidence of ancestry-modified SNP effects, perhaps reflecting the greater divergence present in crosses using mice and yeast. Author Summary: Many statistical tests which link genetic markers in the genome to differences in traits rely on the assumption that the same polymorphism will have identical effects in different individuals. However, there is substantial evidence indicating that this is not the case. Epistasis is the phenomenon in which multiple polymorphisms interact with one another to amplify or negate each other’s effects on a trait. We hypothesized that individual SNP effects could be changed in a polygenic manner, such that the proportion of genetic ancestry, rather than specific markers, might be used to capture epistatic interactions. Motivated by this possibility, we develop a new statistical test that allows us to examine the genome to identify polymorphisms which have different effects depending on the ancestral makeup of each individual. We use our test in two different populations of inbred mice and a yeast panel and demonstrate that these sorts of variable effect polymorphisms exist in 14 different physical traits in mice and 38 phenotypes in yeast as well as in murine gene expression. We use the term “polygenic epistasis” to distinguish these interactions from the more conventional two- or multi-locus interactions.
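
A genotype-by-ancestry interaction of the kind described above can be illustrated with an ordinary least-squares model containing a genotype × ancestry product term. The sketch below is a minimal version under stated assumptions: it uses plain linear regression on simulated data, whereas the authors' test may include further covariates or a mixed-model component. All names (`interaction_test`, `solve_linear`, the simulated effect sizes) are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def interaction_test(genotype, ancestry, phenotype):
    """Fit y = b0 + b1*g + b2*a + b3*g*a by OLS.

    Returns the interaction estimate b3, its standard error, and the
    t statistic for H0: b3 = 0.
    """
    rows = [[1.0, g, a, g * a] for g, a in zip(genotype, ancestry)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    resid = [y - sum(b * x for b, x in zip(beta, r))
             for r, y in zip(rows, phenotype)]
    sigma2 = sum(e * e for e in resid) / (len(rows) - p)
    # Last diagonal entry of (X'X)^-1, obtained by solving X'X v = e_4.
    var_scale = solve_linear(xtx, [0.0, 0.0, 0.0, 1.0])[3]
    se = math.sqrt(sigma2 * var_scale)
    return beta[3], se, beta[3] / se


# Simulated example: the allele's effect grows with ancestry proportion.
random.seed(0)
n = 400
genotype = [random.choice([0, 1, 1, 2]) for _ in range(n)]
ancestry = [random.random() for _ in range(n)]
phenotype = [1.0 + 0.5 * g + 2.0 * a + 1.5 * g * a + random.gauss(0.0, 0.1)
             for g, a in zip(genotype, ancestry)]
estimate, std_err, t_stat = interaction_test(genotype, ancestry, phenotype)
```

Here ancestry is a single global proportion per individual, so one extra parameter per SNP captures how the allelic effect shifts with genetic background, without enumerating pairs of loci.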
