Testing gene-environment interactions for rare and/or common variants in sequencing association studies

Abstract: The risk of many complex diseases is determined by a complex interplay of genetic and environmental factors. Advanced next-generation sequencing technology makes it possible to identify gene-environment (GE) interactions for both common and rare variants. However, most existing methods focus on testing the main effects of common and/or rare genetic variants, and few methods have been developed to test the effects of GE interactions for rare variants only, or for rare and common variants simultaneously. In this study, we develop novel approaches to test the effects of GE interactions of rare and/or common, risk and/or protective variants in sequencing association studies. We propose two approaches: 1) testing the effects of an optimally weighted combination of GE interactions for rare variants (TOW-GE); 2) testing the effects of a weighted combination of GE interactions for both rare and common variants (variable weight TOW-GE, VW-TOW-GE). Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared with the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are GE interaction effects for rare risk and/or protective variants, and VW-TOW-GE is more powerful when there are GE interaction effects for both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GE interactions. We demonstrate the applications of TOW-GE and VW-TOW-GE using imputed data from the COPDGene Study.
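A small sketch may help make the idea of an optimally weighted combination of interactions concrete. It is an illustration under stated assumptions, not the authors' TOW-GE implementation: the trait is continuous and is residualized on an intercept, the environmental factor E and the genotype main effects, giving residuals r. Each G×E column z_j (also residualized on those covariates, written z̃_j) has score U_j = Σ_i r_i z̃_ij and variance term V_j = Σ_i z̃_ij². If dependence between columns is ignored, the Cauchy–Schwarz inequality shows that the weights maximizing the score statistic of Σ_j w_j z_j are w_j ∝ U_j / V_j, which yields T = Σ_j U_j² / V_j. Significance is approximated here by permuting the residualized trait, which is an assumption of this sketch. The function names (`residualize`, `tow_ge_stat`, `tow_ge_test`) are hypothetical.

```python
import numpy as np


def residualize(A, X):
    """Residuals of A (vector or matrix) after least-squares projection onto the columns of X."""
    beta, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ beta


def tow_ge_stat(r, Z):
    """TOW-type statistic T = sum_j U_j**2 / V_j with U_j = r . z_j and V_j = ||z_j||**2."""
    U = Z.T @ r                    # per-interaction score
    V = np.sum(Z ** 2, axis=0)     # diagonal of Z^T Z (dependence between columns ignored)
    keep = V > 1e-12               # skip columns with no variation, e.g. monomorphic variants
    return float(np.sum(U[keep] ** 2 / V[keep]))


def tow_ge_test(y, G, E, n_perm=1000, seed=0):
    """Statistic and permutation p-value for a TOW-style G x E test (illustrative sketch).

    y: (n,) continuous trait; G: (n, m) genotype dosages; E: (n,) environmental factor.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), E, G])      # null model: intercept, E, genotype main effects
    r = residualize(y, X)                        # trait residuals under the null model
    Z = residualize(G * E[:, None], X)           # G x E columns adjusted for the null covariates
    t_obs = tow_ge_stat(r, Z)
    rng = np.random.default_rng(seed)
    t_perm = np.array([tow_ge_stat(rng.permutation(r), Z) for _ in range(n_perm)])
    p_value = (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)
    return t_obs, float(p_value)
```

Because the observed residuals r are orthogonal to the null covariates, U_j is the same whether or not z_j is residualized; residualizing only changes the variance term V_j.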

Test Gene-Environment Interactions for Multiple Traits in Sequencing Association Studies

2019. DOI: 10.1101/710574

Author(s): Jianjun Zhang, Qiuying Sha, Han Hao, Shuanglin Zhang, Xiaoyi Raymond Gao, ...

Keyword(s): Association Studies, Complex Diseases, Error Rates, Phenotypic Trait, Type I, Environment Interaction, Multiple Traits, Gene Environment, Variable Weight, Protective Variants

Abstract: The risk of many complex diseases is determined by a complex interplay of genetic and environmental factors. Data on multiple traits are often collected for complex diseases in order to gain a better understanding of the diseases. Examining gene-environment interactions (GxEs) for multiple traits can yield valuable insights into the etiology of a disease and increase power to detect disease-associated genes. Most existing methods focus on testing gene-environment interaction (GxE) for a single trait. In this study, we develop novel approaches to test GxEs for multiple traits in sequencing association studies. We first transform the multiple traits using either principal component analysis or standardization analysis. Then, we detect the effect of GxE for each transformed phenotypic trait using the newly proposed tests: testing the effect of an optimally weighted combination of GxE (TOW-GE) and/or variable weight TOW-GE (VW-TOW-GE). Finally, we employ Fisher’s combination test to combine the p-values of TOW-GE and/or VW-TOW-GE. Extensive simulation studies based on the Genetic Analysis Workshop 17 data show that the type I error rates of the proposed methods are well controlled. Compared with the existing interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are only rare risk and protective variants, and VW-TOW-GE is more powerful when there are both rare and common risk and protective variants. Both TOW-GE and VW-TOW-GE are robust to the directions of effects of causal GxEs. Application to the COPDGene Study demonstrates that our proposed methods are very powerful.
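The transform-then-combine pipeline can be sketched as follows, assuming a continuous trait matrix and per-trait p-values that are already available (for example, from a TOW-GE test on each transformed trait). `pca_transform` rotates the centered traits onto their principal components, and `fisher_combine` computes Fisher's statistic X = −2 Σ_k log p_k, referred to a χ² distribution with 2K degrees of freedom, whose survival function has the closed form e^(−x/2) Σ_{i<K} (x/2)^i / i!. That reference distribution presumes independent p-values; uncorrelated principal components are independent only under extra assumptions such as joint normality. Both function names are hypothetical, and the standardization option is not shown.

```python
import math

import numpy as np


def pca_transform(Y):
    """Project centered traits (n x K) onto principal components, largest variance first."""
    Yc = Y - Y.mean(axis=0)
    cov = np.cov(Yc, rowvar=False)
    _, vecs = np.linalg.eigh(cov)      # eigenvectors of the trait covariance, ascending eigenvalues
    return Yc @ vecs[:, ::-1]          # uncorrelated principal component scores


def fisher_combine(pvals):
    """Fisher's method: X = -2 * sum(log p); p-value from chi-square with 2K df (independence assumed)."""
    k = len(pvals)
    x = -2.0 * sum(math.log(p) for p in pvals)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):              # partial sum of the Poisson-type series for even df
        term *= half / i
        total += term
    return x, math.exp(-half) * total
```

With a single p-value the combined p-value equals the input, which is a quick sanity check on the closed form.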

Testing an Optimally Weighted Combination of Common and/or Rare Variants with Multiple Traits

2018. DOI: 10.1101/281832

Author(s): Zhenchuan Wang, Qiuying Sha, Kui Zhang, Shuanglin Zhang

Keyword(s): Statistical Power, Rare Variants, Association Studies, Error Rates, Common Variant, Joint Analysis, Type I, Multiple Traits, Multiple Sequence, Weighted Combination

Abstract: Joint analysis of multiple traits has recently become popular because it can increase statistical power to detect genetic variants, and there is increasing evidence that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single common variant. However, the variant-by-variant methods used in common variant association studies may not be optimal for rare variant association studies because of allelic heterogeneity and the extreme rarity of individual variants. In this article, we develop a statistical method, testing an optimally weighted combination of variants with multiple traits (TOWmuT), to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrate the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.

Comparison of haplotype-based tests for detecting gene–environment interactions with rare variants

Briefings in Bioinformatics, 2019, Vol 21 (3), pp. 851-862. DOI: 10.1093/bib/bbz031. Cited by ~1.

Author(s): Charalampos Papachristou, Swati Biswas

Keyword(s): Lung Cancer, Rare Variants, Practical Interest, Serum Triglyceride, Error Rates, Type I, Rare Haplotype, Data Set, Cancer Data, Gene Environment

Abstract: Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene–environment interactions (GXE). However, detecting GXE is challenging, especially when the genetic variants under study are rare. As highlighted in recent literature, haplotype-based tests have several advantages over the so-called collapsing tests for detecting rare variants. Thus, it is of practical interest to compare haplotype-based tests for detecting GXE, including recent ones developed specifically for rare haplotypes. We compare the following methods: haplo.glm, hapassoc, HapReg, Bayesian hierarchical generalized linear model (BhGLM) and logistic Bayesian LASSO (LBL). We simulate data under different types of association scenarios and levels of gene–environment dependence. We find that when the type I error rates are controlled to be the same for all methods, LBL is the most powerful method for detecting GXE. We applied the methods to a lung cancer data set, focusing on region 15q25.1, because the literature suggests that this region interacts with smoking to affect lung cancer susceptibility and that it is associated with smoking behavior. LBL and BhGLM were able to detect a rare haplotype–smoking interaction in this region. We also analyzed sequence data from the Dallas Heart Study, a population-based multi-ethnic study. Specifically, we considered haplotype blocks in the gene ANGPTL4 for association with serum triglyceride and used ethnicity as a covariate. Only LBL found interactions of haplotypes with race (Hispanic). Thus, among the methods studied here, LBL generally seems to be the best for detecting GXE; however, it requires the most computation time.

Trans-ethnic meta-analysis of rare variants in sequencing association studies

Biostatistics, 2019. DOI: 10.1093/biostatistics/kxz061

Author(s): Jingchunzi Shi, Michael Boehnke, Seunggeun Lee

Keyword(s): Rare Variants, Sequence Data, Association Studies, Meta Analysis, Error Rates, Efficient Estimation, Gene Region, Type I, Test Statistic, Different Populations

Summary: Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among populations, existing gene- or region-based rare-variant meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among populations. In response, we propose a score test under a modified random effects model for gene- or region-based rare-variant associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate the asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare variant associations with several traits.

Detecting association of rare and common variants by adaptive combination of P-values

Genetics Research, 2015, Vol 97. DOI: 10.1017/s0016672315000208. Cited by ~2.

Author(s): Yajing Zhou, Yong Wang

Keyword(s): Rare Variants, Association Studies, Genome Wide Association Studies, Common Variants, Next Generation Sequencing Technology, Adaptive Combination, Genome Wide, Wide Range, Causal Variants, Burden Tests

Summary: Genome-wide association studies (GWAS) can detect common variants associated with diseases, and next-generation sequencing technology has made it possible to detect rare variants. Most association tests, including burden and non-burden tests, mainly target rare variants by upweighting rare variant effects and downweighting common variant effects. However, there is increasing evidence that complex diseases are caused by both common and rare variants. In this paper, we extend the ADA method (adaptive combination of P-values; Lin et al., 2014), which was designed for rare variants only, and propose the RC-ADA method (common and rare variants by adaptive combination of P-values). Our proposed method combines per-site P-values with weights based on minor allele frequencies (MAFs). RC-ADA is robust to the directions of effects of causal variants and to the inclusion of a high proportion of neutral variants. We compare the performance of RC-ADA with several other association methods; extensive simulation studies show that RC-ADA is more powerful than the other methods over a wide range of models.
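The core of an adaptive combination of per-site P-values can be sketched as a set of truncated, weighted sums: for each candidate threshold t, S_t = Σ_{j: p_j ≤ t} w_j (−log p_j). This is a minimal sketch under assumptions, not the RC-ADA definition. The threshold grid and the weight w_j = 1/√(MAF_j(1 − MAF_j)) are illustrative placeholders, and the functions `maf_weight` and `truncated_stats` are hypothetical. Only the statistic computation is shown.

```python
import math


def maf_weight(maf):
    """Hypothetical MAF-based weight (placeholder, not RC-ADA's definition): rarer variants weigh more."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def truncated_stats(pvals, mafs, thresholds=(0.05, 0.10, 0.15, 0.20)):
    """For each truncation threshold t, return S_t = sum of w_j * (-log p_j) over sites with p_j <= t."""
    weights = [maf_weight(m) for m in mafs]
    return {
        t: sum(w * -math.log(p) for p, w in zip(pvals, weights) if p <= t)
        for t in thresholds
    }
```

In an adaptive test of this kind, each S_t would be compared with its permutation distribution, the smallest resulting p-value taken as the overall statistic, and that minimum calibrated with the same permutations; that layer is omitted here.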

An evaluation of approaches for rare variant association analyses of binary traits in related samples

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-82547-z

Author(s): Ming-Huei Chen, Achilleas Pitsillides, Qiong Yang

Keyword(s): Logistic Regression, Rare Variants, Association Studies, Family Relationship, Genetic Association Studies, Error Rates, Ratio Test, Type I, Association Analyses, Binary Traits

Abstract: Recognizing that family data provide a unique advantage in identifying rare risk variants in genetic association studies, many cohorts with related samples have undergone whole genome sequencing in large initiatives such as the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Analyzing rare variants poses challenges for binary traits because some genotype categories may have few or no observed events, causing bias and inflation in commonly used methods. Several methods have recently been proposed to better handle rare variants while accounting for family relationships, but their performance has not been thoroughly evaluated together. Here we compare several existing approaches, including SAIGE and approaches not limited to related samples, using simulations based on Framingham Heart Study samples and genotype data from the Illumina HumanExome BeadChip, where rare variants are the majority. We found that logistic regression with the likelihood ratio test applied to related samples was the only approach without inflated type I error rates in both the single variant test (SVT) and gene-based tests. It was followed by Firth logistic regression, applied to either related or unrelated samples, which showed inflation only in its direction-insensitive gene-based test at prevalence 0.01. These results held even though, in theory, neither logistic regression nor Firth logistic regression accounts for relatedness among samples. SAIGE showed inflation in SVT at prevalence 0.1 or lower, and the inflation was eliminated with a minor allele count filter of 5. As for power, no approach consistently outperformed the others across all single variant and gene-based tests.
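Firth logistic regression, one of the approaches compared above, replaces the usual logistic score with a bias-reduced version, U*(β) = Xᵀ(y − μ + h ∘ (½ − μ)), where h is the diagonal of the weighted hat matrix W^½X(XᵀWX)⁻¹XᵀW^½. This keeps estimates finite when a rare genotype category has no events. The following is a minimal NumPy sketch written from those standard equations, not code from any package evaluated in the paper; it omits the penalized likelihood ratio test, step-halving and any adjustment for relatedness, and the cap of 5 on the Newton step is an assumed safeguard. The function name `firth_logistic` is hypothetical.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression via Newton-type steps on the modified score.

    X: (n, p) design matrix including an intercept column; y: (n,) array of 0/1 outcomes.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))           # fitted probabilities
        w = mu * (1.0 - mu)                              # logistic variance weights
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X)) # inverse Fisher information
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X) # hat-matrix diagonal w_i x_i^T I^-1 x_i
        score = X.T @ (y - mu + h * (0.5 - mu))          # Firth-modified score
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:                           # cap very large steps
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    return beta
```

With an intercept-only model, the Firth estimate of the event probability reduces to (k + ½)/(n + 1) for k events among n samples, so a group with no events still gets a finite log-odds.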

A General Statistic to Test an Optimally Weighted Combination of Common and/or Rare Variants

2019. DOI: 10.1101/572115

Author(s): Jianjun Zhang, Baolin Wu, Qiuying Sha, Shuanglin Zhang, Xuexia Wang

Keyword(s): Rare Variants, Association Studies, Genome Wide Association, P Value, Type I, Genome Wide Association Studies, Simulation Studies, Genome Wide, Combination Scheme, Weighted Combination

Abstract: Both genome-wide association studies and next-generation sequencing data analyses are widely employed to identify common and/or rare genetic variants conferring disease susceptibility in many large-scale genetic studies. Rare variants generally have large effects, though they are hard to detect because of their low frequency. Currently, many existing statistical methods for rare variant association studies employ a weighted combination scheme, which usually assigns subjective or suboptimal weights based on ad hoc assumptions (e.g. ignoring dependence between rare variants). In this study, we analytically derive optimal weights for both common and rare variants and propose a General and novel approach to Test association between an Optimally Weighted combination of variants (G-TOW) in a gene or pathway for a continuous or dichotomous trait, while easily adjusting for covariates. We conduct extensive simulation studies to evaluate the performance of G-TOW. The results show that G-TOW has properly controlled type I error rates and is the most powerful test among the methods we compared, when testing the effects of either both rare and common variants or rare variants only. We also illustrate the effectiveness of G-TOW using the Genetic Analysis Workshop 17 (GAW17) data. In addition, we applied G-TOW and other competing methods to test association with schizophrenia. G-TOW successfully confirmed the associations of the genes FYN and VPS39 with schizophrenia reported in existing publications; both genes are missed by the weighted sum statistic (WSS) and the sequence kernel association test (SKAT). For FYN, G-TOW also showed much stronger significance (p-value = 0.0037) than our previously developed method, Testing the effect of an Optimally Weighted combination of variants (TOW) (p-value = 0.0143).
FYN is a member of the protein-tyrosine kinase oncogene family; it phosphorylates glutamate metabotropic receptors and ionotropic N-methyl-D-aspartate (NMDA) receptors and modulates their trafficking, subcellular distribution and function. It is involved in neuronal apoptosis, brain development and synaptic transmission, and lower FYN expression has been observed in the platelets of schizophrenic patients compared with controls. The schizophrenia application indicates that G-TOW is a powerful tool in genome-wide association studies.

A unified method for rare variant analysis of gene-environment interactions

2019. DOI: 10.1101/570226

Author(s): Elise Lim, Han Chen, Josée Dupuis, Ching-Ti Liu

Keyword(s): Complex Traits, Rare Variants, Error Rates, Advanced Technology, Linear Mixed Effect Model, Marginal Effect, Type I, Mixed Effect, Gene Environment, Unified Method

Abstract: Advanced technology in whole-genome sequencing offers the opportunity to comprehensively investigate the genetic contribution, particularly of rare variants, to complex traits. Many rare-variant analysis methods have been developed to jointly model marginal effects, but methods to detect gene-environment (GE) interactions remain underdeveloped. Identifying the modifying effects of environmental factors on genetic risk poses a considerable challenge. To tackle this challenge, we develop a unified method to detect GE interactions of a set of rare variants using a generalized linear mixed effect model. The proposed method can accommodate both binary and continuous traits in related or unrelated samples. Under this model, genetic main effects, sample relatedness and GE interactions are modeled as random effects. We adopt a kernel-based method to leverage the joint information across rare variants and implement variance component score tests to reduce the computational burden. Our simulation study shows that the proposed method maintains correct type I error rates and high power under various scenarios, such as different directions of the main genotype and GE interaction effects and different proportions of causal variants in the model, for both continuous and binary traits. We illustrate our method by testing gene-based interactions with smoking on body mass index or overweight status in the Framingham Heart Study, and we replicate the CHRNB4 gene association reported in a previous large consortium meta-analysis of single nucleotide polymorphism (SNP)-smoking interaction. Our proposed set-based GE test is computationally efficient and applicable to both binary and continuous phenotypes, while appropriately accounting for familial or cryptic relatedness.
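The idea of a variance component score test for a set of G×E terms can be illustrated with a deliberately simplified sketch: a continuous trait in unrelated samples, with genotype main effects entered as fixed covariates instead of random effects and no relatedness term, so it is closer to a linear-model kernel test than to the unified mixed-model method described above. With a linear kernel on the weighted G×E columns S, the statistic is Q = rᵀSSᵀr / σ̂², whose null distribution is approximately a weighted sum of χ²₁ variables with weights equal to the eigenvalues of SᵀPS, where P projects out the null covariates. Instead of an exact mixture-of-χ² method (such as Davies' algorithm), this sketch substitutes a Satterthwaite moment-matched scaled χ². The function name and the default unit weights are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def ge_variance_component_test(y, G, E, weights=None):
    """Linear-kernel score statistic for a G x E variance component, Satterthwaite p-value.

    Simplifications (assumptions): continuous trait, unrelated samples, and genotype
    main effects treated as fixed covariates rather than random effects.
    """
    n, m = G.shape
    w = np.ones(m) if weights is None else np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones(n), E, G])          # null model covariates
    P = np.eye(n) - X @ np.linalg.pinv(X)            # projection removing the null covariates
    r = P @ y                                        # null-model residuals
    sigma2 = r @ r / (n - np.linalg.matrix_rank(X))  # residual variance estimate
    S = (G * E[:, None]) * w                         # weighted interaction columns
    Q = float(np.sum((S.T @ r) ** 2) / sigma2)
    lam = np.linalg.eigvalsh(S.T @ P @ S)            # null: Q ~ sum_j lam_j * chi2_1 (approx.)
    lam = lam[lam > 1e-10]
    a = np.sum(lam ** 2) / np.sum(lam)               # scale matching mean and variance
    d = np.sum(lam) ** 2 / np.sum(lam ** 2)          # effective degrees of freedom
    return Q, float(chi2.sf(Q / a, d))
```

Moment matching is cheap but can be inaccurate far in the tail, which is why exact mixture methods are usually preferred for very small p-values.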

Abstract 367: Extreme High-Density Lipoprotein Cholesterol Genetics: An Assortment of Large and Small Polygenic Effects

Arteriosclerosis Thrombosis and Vascular Biology, 2017, Vol 37 (suppl_1). DOI: 10.1161/atvb.37.suppl_1.367

Author(s): Jacqueline S Dron, Jian Wang, Cécile Low-Kam, Sumeet A Khetarpal, John F Robinson, ...

Keyword(s): Large Scale, Genetic Basis, Rare Variants, Association Studies, Density Lipoprotein, Copy Number Variations, Genome Wide Association Studies, Common Variants, Targeted Next Generation Sequencing, Common Genetic Variants

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes.
Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), to extreme HDL-C concentrations.
Methods: In clinically ascertained patients with low (N = 136) and high (N = 119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ accumulation of common variants that genome-wide association studies have shown to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N = 1,746 and N = 1,139, respectively) were used for PTS validation.
Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS than normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis of extreme HDL-C phenotypes.
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to multiple types of genetic determinants, not just rare variants, to further our understanding of this complex, controversial quantitative trait.
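A polygenic trait score of the kind described in the Methods is, in its common form, a (possibly weighted) count of trait-associated alleles per person. The sketch below assumes allele dosages coded 0 to 2 and oriented toward the trait-raising allele, optional per-SNP weights, and 10th/90th percentile cut-offs of a reference distribution to define an "extreme" score. The abstract states none of the SNPs, weights or cut-offs, so all of these, and the function names `polygenic_trait_score` and `extreme_flags`, are hypothetical.

```python
import numpy as np


def polygenic_trait_score(dosages, effect_weights=None):
    """Per-person score: allele dosages (n_people x n_snps, values 0..2), optionally weighted."""
    dosages = np.asarray(dosages, dtype=float)
    if effect_weights is None:
        return dosages.sum(axis=1)                     # unweighted allele count
    return dosages @ np.asarray(effect_weights, dtype=float)


def extreme_flags(scores, reference_scores, lower_pct=10, upper_pct=90):
    """Flag scores at or beyond hypothetical percentile cut-offs of a reference distribution."""
    lo, hi = np.percentile(reference_scores, [lower_pct, upper_pct])
    scores = np.asarray(scores, dtype=float)
    return scores <= lo, scores >= hi
```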

SMARCA2 common variant association and rare variant excess in Schizophrenia patients from an Algerian Trio Cohort

European Psychiatry, 2011, Vol 26 (S2), pp. 1346-1346. DOI: 10.1016/s0924-9338(11)73051-6

Author(s): D. Benmessaoud, A.-M. Lepagnol-Bestel, M. Delepine, J. Hager, J.-M. Moalic, ...

Keyword(s): Rare Variants, Association Studies, Common Variant, Genome Wide Association Studies, Common Variants, Fisher Test, Coding Regions, Genome Wide, Whole Exome, Positive Evolution

Genome-wide association studies (GWAS) of schizophrenia (SZ) patients have identified common variants in ten genes, including SMARCA2 (Koga et al., HMG, 2009). We found that the SZ-GWAS genes are part of an interacting network centered on SMARCA2 (Loe-Mie et al., HMG, 2010). Furthermore, SMARCA2 was found to be disrupted in SZ (Walsh et al., Science, 2008). SMARCA2 encodes the ATPase (BRM) of the SWI/SNF chromatin remodeling complex, which lies at the interface between the genome and environmental adaptation. Taking advantage of an Algerian trio cohort of one hundred SZ patients (Benmessaoud et al., BMC Psychiatry, 2008), we replicated the association of SNP rs2296212, located in exon 33, which was previously shown to be associated in the Koga study and results in a D1546E amino acid change in the SMARCA2 protein. We studied SMARCA2 codons and found that exon 33 displays a signature of positive evolution in the primate lineage. Our working hypothesis is that coding regions displaying positive selection are targets of novel rare variants. To address this question, we sequenced two exons displaying positive evolution and one exon without evidence of positive evolution. We found (i) that rare variants are significantly in excess in SZ patients compared with their parents (p = 0.038, Fisher test) and (ii) a higher proportion of rare variants in the primate-accelerated exons than in the non-evolutionary exon in SZ patients (p = 0.032, Fisher test). SMARCA2 exon sequencing and whole exome sequencing of patients harboring the common variant SNP rs2296212 are in progress. Altogether, these results are expected to give new insights into the genetic architecture of SZ.
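The Fisher tests reported above compare counts between two groups in a 2×2 table. As an illustration only, a one-sided Fisher exact p-value can be computed from the hypergeometric distribution with the Python standard library. The table layout (rows: patients vs parents; columns: rare variant observed vs not observed) and the one-sided alternative are assumptions, since the abstract reports neither the counts nor the sidedness, and the toy tables in the check are not data from the study. The function name `fisher_exact_greater` is hypothetical; `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided quantity.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Margins are held fixed; X (the top-left cell) is hypergeometric under the null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)                           # largest value the top-left cell can take
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)
```

For the toy table [[3, 0], [0, 3]], only one arrangement out of C(6, 3) = 20 is at least as extreme, so the one-sided p-value is 0.05.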