A unified method for rare variant analysis of gene-environment interactions

Abstract Advanced technology in whole-genome sequencing has offered the opportunity to comprehensively investigate the genetic contribution, particularly that of rare variants, to complex traits. Many rare variant analysis methods have been developed to jointly model the marginal effect, but methods to detect gene-environment (GE) interactions are underdeveloped. Identifying the modification effects of environmental factors on genetic risk poses a considerable challenge. To tackle this challenge, we develop a unified method to detect GE interactions of a set of rare variants using a generalized linear mixed effect model. The proposed method can accommodate both binary and continuous traits in related or unrelated samples. Under this model, genetic main effects, sample relatedness and GE interactions are modeled as random effects. We adopt a kernel-based method to leverage the joint information across rare variants and implement variance component score tests to reduce the computational burden. Our simulation study shows that the proposed method maintains correct type I error rates and high power under various scenarios, such as different directions of the main genotype and GE interaction effects and different proportions of causal variants in the model, for both continuous and binary traits. We illustrate our method by testing gene-based interaction with smoking on body mass index or overweight status in the Framingham Heart Study and replicate the CHRNB4 gene association reported in a previous large consortium meta-analysis of single nucleotide polymorphism (SNP)-smoking interaction. Our proposed set-based GE test is computationally efficient and is applicable to both binary and continuous phenotypes, while appropriately accounting for familial or cryptic relatedness.
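
The abstract above does not give the model equations. As a reading aid only, one common way to write a set-based GE mixed model of this kind is sketched below. All notation is an assumption introduced here, not taken from the paper: g is the link function, X_i the covariates (including the environmental factor E_i), G_i the rare-variant genotypes in the set, w_j hypothetical per-variant weights, and Φ a kinship matrix.

```latex
\begin{aligned}
g(\mu_i) &= X_i^{\top}\alpha + G_i^{\top}\beta + E_i\,G_i^{\top}\gamma + b_i,\\
\beta_j &\sim (0,\ \tau w_j^2),\qquad
\gamma_j \sim (0,\ \phi w_j^2),\qquad
b \sim N(0,\ \sigma_b^2\,\Phi),\\
H_0 &: \phi = 0 .
\end{aligned}
```

In this sketch the GE interaction test reduces to a test of a single variance component, φ. A score test of that hypothesis needs only the model fitted under the null (φ = 0), which is the usual reason variance component score tests are computationally cheap.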

Comparison of haplotype-based tests for detecting gene–environment interactions with rare variants

Briefings in Bioinformatics ◽

10.1093/bib/bbz031 ◽

2019 ◽

Vol 21 (3) ◽

pp. 851-862 ◽

Cited By ~ 1

Author(s):

Charalampos Papachristou ◽

Swati Biswas

Keyword(s):

Lung Cancer ◽

Rare Variants ◽

Practical Interest ◽

Serum Triglyceride ◽

Error Rates ◽

Type I ◽

Rare Haplotype ◽

Data Set ◽

Cancer Data ◽

Gene Environment

Abstract Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene–environment interactions (GXE). However, detecting GXE is a challenging problem especially when the genetic variants under study are rare. Haplotype-based tests have several advantages over the so-called collapsing tests for detecting rare variants as highlighted in recent literature. Thus, it is of practical interest to compare haplotype-based tests for detecting GXE including the recent ones developed specifically for rare haplotypes. We compare the following methods: haplo.glm, hapassoc, HapReg, Bayesian hierarchical generalized linear model (BhGLM) and logistic Bayesian LASSO (LBL). We simulate data under different types of association scenarios and levels of gene–environment dependence. We find that when the type I error rates are controlled to be the same for all methods, LBL is the most powerful method for detecting GXE. We applied the methods to a lung cancer data set, in particular, in region 15q25.1 as it has been suggested in the literature that it interacts with smoking to affect the lung cancer susceptibility and that it is associated with smoking behavior. LBL and BhGLM were able to detect a rare haplotype–smoking interaction in this region. We also analyzed the sequence data from the Dallas Heart Study, a population-based multi-ethnic study. Specifically, we considered haplotype blocks in the gene ANGPTL4 for association with trait serum triglyceride and used ethnicity as a covariate. Only LBL found interactions of haplotypes with race (Hispanic). Thus, in general, LBL seems to be the best method for detecting GXE among the ones we studied here. Nonetheless, it requires the most computation time.

Scalable generalized linear mixed model for region-based association tests in large biobanks and cohorts

10.1101/583278 ◽

2019 ◽

Cited By ~ 5

Author(s):

Wei Zhou ◽

Zhangchen Zhao ◽

Jonas B. Nielsen ◽

Lars G. Fritsche ◽

Jonathon LeFaive ◽

...

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Rare Variants ◽

Error Rates ◽

Association Test ◽

Type I ◽

Sample Sizes ◽

Large Sample ◽

Genetic Components

Abstract With very large sample sizes, population-based cohorts and biobanks provide an exciting opportunity to identify genetic components of complex traits. To analyze rare variants, gene or region-based multiple variant aggregate tests are commonly used to increase association test power. However, due to the substantial computation cost, existing region-based rare variant tests cannot analyze hundreds of thousands of samples while accounting for confounders, such as population stratification and sample relatedness. Here we propose a scalable generalized mixed model region-based association test that can handle large sample sizes and accounts for unbalanced case-control ratios for binary traits. This method, SAIGE-GENE, utilizes state-of-the-art optimization strategies to reduce computational and memory cost, and hence is applicable to exome-wide and genome-wide region-based analysis for hundreds of thousands of samples. Through the analysis of the HUNT study of 69,716 Norwegian samples and the UK Biobank data of 408,910 White British samples, we show that SAIGE-GENE can efficiently analyze large sample data (N > 400,000) with type I error rates well controlled.

Testing gene-environment interactions for rare and/or common variants in sequencing association studies

10.1101/796540 ◽

2019 ◽

Author(s):

Zihan Zhao ◽

Jianjun Zhang ◽

Qiuying Sha ◽

Han Hao

Keyword(s):

Rare Variants ◽

Association Studies ◽

Error Rates ◽

Type I ◽

Common Variants ◽

Next Generation Sequencing Technology ◽

Gene Environment ◽

Variable Weight ◽

Protective Variants ◽

Weighted Combination

Abstract The risk of many complex diseases is determined by a complex interplay of genetic and environmental factors. Advanced next generation sequencing technology makes identification of gene-environment (GE) interactions for both common and rare variants possible. However, most existing methods focus on testing the main effects of common and/or rare genetic variants. Few methods have been developed to test the effects of GE interactions for rare variants only or for rare and common variants simultaneously. In this study, we develop novel approaches to test the effects of GE interactions of rare and/or common risk and/or protective variants in sequencing association studies. We propose two approaches: 1) testing the effects of an optimally weighted combination of GE interactions for rare variants (TOW-GE); 2) testing the effects of a weighted combination of GE interactions for both rare and common variants (variable weight TOW-GE, VW-TOW-GE). Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared to the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are GE interaction effects for rare risk and/or protective variants; VW-TOW-GE is more powerful when there are GE interaction effects for both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GE interactions. We demonstrate the applications of TOW-GE and VW-TOW-GE using imputed data from the COPDGene Study.

Sept8/SEPTIN8 involvement in cellular structure and kidney damage is identified by genetic mapping and a novel human tubule hypoxic model

Scientific Reports ◽

10.1038/s41598-021-81550-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gregory R. Keele ◽

Jeremy W. Prokop ◽

Hong He ◽

Katie Holl ◽

John Littrell ◽

...

Keyword(s):

Complex Traits ◽

Genetic Model ◽

Association Studies ◽

Model Systems ◽

Linear Mixed Effect Model ◽

Genome Wide Association Studies ◽

Tubulointerstitial Injury ◽

Heritable Variation ◽

Mixed Effect

Abstract Chronic kidney disease (CKD), which can ultimately progress to kidney failure, is influenced by genetics and the environment. Genes identified in human genome wide association studies (GWAS) explain only a small proportion of the heritable variation and lack functional validation, indicating the need for additional model systems. Outbred heterogeneous stock (HS) rats have been used for genetic fine-mapping of complex traits, but have not previously been used for CKD traits. We performed GWAS for urinary protein excretion (UPE) and CKD related serum biochemistries in 245 male HS rats. Quantitative trait loci (QTL) were identified using a linear mixed effect model that tested for association with imputed genotypes. Candidate genes were identified using bioinformatics tools and targeted RNAseq followed by testing in a novel in vitro model of human tubule, hypoxia-induced damage. We identified two QTL for UPE and five for serum biochemistries. Protein modeling identified a missense variant within Septin 8 (Sept8) as a candidate for UPE. Sept8/SEPTIN8 expression increased in HS rats with elevated UPE and tubulointerstitial injury and in the in vitro hypoxia model. SEPTIN8 is detected within proximal tubule cells in human kidney samples and localizes with acetyl-alpha tubulin in the culture system. After hypoxia, SEPTIN8 staining becomes diffuse and appears to relocalize with actin. These data suggest a role of SEPTIN8 in cellular organization and structure in response to environmental stress. This study demonstrates that integration of a rat genetic model with an environmentally induced tubule damage system identifies Sept8/SEPTIN8 and informs novel aspects of the complex gene by environmental interactions contributing to CKD risk.

GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data

Genome Biology ◽

10.1186/s13059-021-02403-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jisu Shin ◽

Sang Hong Lee

Keyword(s):

Complex Traits ◽

Error Rates ◽

Type I ◽

Phenotypic Variance ◽

Environment Interaction ◽

Summary Statistics ◽

Gxe Interaction ◽

Genome Wide ◽

Scale Data ◽

Variance Explained

Abstract Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and infeasible to handle biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank with 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and its computational efficiency can be hundreds of times higher than existing GxE methods.

Trans-ethnic meta-analysis of rare variants in sequencing association studies

Biostatistics ◽

10.1093/biostatistics/kxz061 ◽

2019 ◽

Author(s):

Jingchunzi Shi ◽

Michael Boehnke ◽

Seunggeun Lee

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Meta Analysis ◽

Error Rates ◽

Efficient Estimation ◽

Gene Region ◽

Type I ◽

Test Statistic ◽

Different Populations

Summary Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among different populations, existing gene-/region-based rare variant meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among different populations. In response, we propose a score test under the modified random effects model for gene-/region-based rare variant associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate the asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare variant associations with several traits.

Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole Genome Sequencing Studies

10.1101/552950 ◽

2019 ◽

Author(s):

Zilin Li ◽

Xihao Li ◽

Yaowu Liu ◽

Jincheng Shen ◽

Han Chen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variant ◽

Type I Error ◽

Rare Variants ◽

Error Rates ◽

Type I ◽

Whole Genome ◽

Rare Variant Association ◽

Dynamic Scan

Abstract Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice and are subject to limitations given that genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a fixed window size a priori. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling the genome-wise type I error rate. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.

Test Gene-Environment Interactions for Multiple Traits in Sequencing Association Studies

10.1101/710574 ◽

2019 ◽

Author(s):

Jianjun Zhang ◽

Qiuying Sha ◽

Han Hao ◽

Shuanglin Zhang ◽

Xiaoyi Raymond Gao ◽

...

Keyword(s):

Association Studies ◽

Complex Diseases ◽

Error Rates ◽

Phenotypic Trait ◽

Type I ◽

Environment Interaction ◽

Multiple Traits ◽

Gene Environment ◽

Variable Weight ◽

Protective Variants

Abstract The risk of many complex diseases is determined by a complex interplay of genetic and environmental factors. Data on multiple traits are often collected for many complex diseases in order to obtain a better understanding of the diseases. Examination of gene-environment interactions (GxEs) for multiple traits can yield valuable insights about the etiology of the disease and increase power in detecting disease-associated genes. Most existing methods focus on testing gene-environment interaction (GxE) for a single trait. In this study, we develop novel approaches to test GxEs for multiple traits in sequencing association studies. We first transform the multiple traits by using either principal component analysis or standardization analysis. Then, we detect the effect of GxE for each transformed phenotypic trait using novel proposed tests: testing the effect of an optimally weighted combination of GxE (TOW-GE) and/or variable weight TOW-GE (VW-TOW-GE). Finally, we employ Fisher’s combination test to combine the p-values of TOW-GE and/or VW-TOW-GE. Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared to the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are only rare risk and protective variants; VW-TOW-GE is more powerful when there are both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to directions of effects of causal GxEs. Application to the COPDGene Study demonstrates that our proposed methods are very powerful.
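
The final step named in this abstract, Fisher's combination test, can be illustrated with a minimal sketch. It covers only the generic combination step, not the TOW-GE or VW-TOW-GE tests themselves, and it assumes the per-trait p-values are independent. The function name `fisher_combine` is hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Illustrative p-values only; these are not results from the paper.
    print(fisher_combine([0.04, 0.20, 0.01]))
```

The chi-square reference distribution holds only for independent p-values, which may be one reason to decorrelate the traits (for example with principal components) before testing each one.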

On Robust Association Testing for Quantitative Traits and Rare Variants

G3 Genes|Genome|Genetics ◽

10.1534/g3.116.035485 ◽

2016 ◽

Vol 6 (12) ◽

pp. 3941-3950 ◽

Cited By ~ 9

Author(s):

Peng Wei ◽

Ying Cao ◽

Yiwei Zhang ◽

Zhiyuan Xu ◽

Il-Youp Kwak ◽

...

Keyword(s):

Type I Error ◽

Rare Variants ◽

Error Rates ◽

Routine Practice ◽

Type I ◽

Sequencing Data ◽

High Data ◽

Association Testing ◽

Sequencing Technologies ◽

Inflated Type

Abstract With the advance of sequencing technologies, it has become a routine practice to test for association between a quantitative trait and a set of rare variants (RVs). While a number of RV association tests have been proposed, there is a dearth of studies on the robustness of RV association testing for non-normally distributed traits, e.g., due to skewness, which is ubiquitous in cohort studies. By extensive simulations, we demonstrate that commonly used RV tests, including sequence kernel association test (SKAT) and optimal unified SKAT (SKAT-O), are not robust to heavy-tailed or right-skewed trait distributions and show inflated type I error rates; in contrast, the adaptive sum of powered score (aSPU) test is much more robust. Here we further propose a robust version of the aSPU test, called aSPUr. We conduct extensive simulations to evaluate the power of the tests, finding that for a larger number of RVs, aSPU is often more powerful than SKAT and SKAT-O, owing to its high data-adaptivity. We also compare different tests by conducting association analysis of triglyceride levels using the NHLBI ESP whole-exome sequencing data. The QQ plots for SKAT and SKAT-O were severely inflated (λ = 1.89 and 1.78, respectively), while those for aSPU and aSPUr behaved normally. Given the relatively high robustness to outliers and high power of the aSPU test, we recommend its use as a complement to SKAT and SKAT-O. If there is evidence of an inflated type I error rate from the aSPU test, we would recommend the use of the more robust, but less powerful, aSPUr test.
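
The abstract reports inflation as λ values read off QQ plots. Assuming λ is the usual genomic inflation factor (the abstract does not define it), a minimal sketch of how such a value is typically computed from a list of test p-values follows. The function name `genomic_inflation` is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic inflation factor from p-values.

    Each p-value is mapped to the chi-square(1) statistic with that
    upper-tail probability; lambda is the median of those statistics
    divided by the median of a chi-square(1) distribution (about 0.455).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    std_normal = NormalDist()
    # If Z is standard normal, Z^2 is chi-square(1), so the statistic with
    # upper-tail probability p is the squared normal quantile at p / 2.
    stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    null_median = std_normal.inv_cdf(0.75) ** 2
    return median(stats) / null_median


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated test (lambda near 1).
    calibrated = [(i + 0.5) / 1001 for i in range(1001)]
    print(genomic_inflation(calibrated))
```

A value near 1 indicates p-values consistent with the null distribution, while values well above 1, such as those reported for SKAT and SKAT-O, indicate inflation.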

An evaluation of approaches for rare variant association analyses of binary traits in related samples

Scientific Reports ◽

10.1038/s41598-021-82547-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ming-Huei Chen ◽

Achilleas Pitsillides ◽

Qiong Yang

Keyword(s):

Logistic Regression ◽

Rare Variants ◽

Association Studies ◽

Family Relationship ◽

Genetic Association Studies ◽

Error Rates ◽

Ratio Test ◽

Type I ◽

Association Analyses ◽

Binary Traits

Abstract Recognizing that family data provide a unique advantage in identifying rare risk variants in genetic association studies, many cohorts with related samples have gone through whole genome sequencing in large initiatives such as the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Analyzing rare variants poses challenges for binary traits in that some genotype categories may have few or no observed events, causing bias and inflation in commonly used methods. Several methods have recently been proposed to better handle rare variants while accounting for family relationship, but their performance has not been thoroughly evaluated together. Here we compare several existing approaches including SAIGE but not limited to related samples using simulations based on the Framingham Heart Study samples and genotype data from the Illumina HumanExome BeadChip, where rare variants are the majority. We found that logistic regression with likelihood ratio test applied to related samples was the only approach that did not have inflated type I error rates in both the single variant test (SVT) and gene-based tests, followed by Firth logistic regression, which had inflation in its direction-insensitive gene-based test at prevalence 0.01 only, applied to either related or unrelated samples, though theoretically logistic regression and Firth logistic regression do not account for relatedness in samples. SAIGE had inflation in SVT at prevalence 0.1 or lower, and the inflation was eliminated with a minor allele count filter of 5. As for power, no approach consistently outperformed the others across all single variant tests and gene-based tests.
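
The minor allele count (MAC) filter mentioned above can be illustrated with a minimal sketch. Three things here are assumptions not stated in the abstract: genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, "a filter of 5" means keeping variants with MAC of at least 5, and the function names are hypothetical.

```python
def minor_allele_count(genotypes):
    """Minor allele count for one diploid variant.

    genotypes: alternate-allele counts (0, 1 or 2) per sample,
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    alt_count = sum(called)
    total_alleles = 2 * len(called)
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_count, total_alleles - alt_count)


def filter_by_mac(variants, min_mac=5):
    """Keep variants whose minor allele count is at least min_mac.

    variants: dict mapping a variant identifier to its genotype list.
    """
    return {
        variant_id: genotypes
        for variant_id, genotypes in variants.items()
        if minor_allele_count(genotypes) >= min_mac
    }


if __name__ == "__main__":
    # Toy genotypes for illustration only.
    toy = {
        "var_a": [0] * 95 + [1] * 5,  # MAC 5, kept
        "var_b": [0] * 98 + [1] * 2,  # MAC 2, dropped
    }
    print(sorted(filter_by_mac(toy)))
```

Whether the threshold is inclusive, and whether it is applied overall or within cases and controls, would need to be checked against the paper itself.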
