Selection and explosive growth may hamper the performance of rare variant association tests

2015 ◽  
Author(s):  
Lawrence H. Uricchio ◽  
John S. Witte ◽  
Ryan D. Hernandez

Much recent debate has focused on the role of rare variants in complex phenotypes. However, it is well known that rare alleles can only contribute a substantial proportion of the phenotypic variance when they have much larger effect sizes than common variants, which is most easily explained by natural selection constraining trait-altering alleles to low frequency. It is also plausible that demographic events will influence the genetic architecture of complex traits. Unfortunately, most rare variant association tests do not explicitly model natural selection or non-equilibrium demography. Here, we develop a novel evolutionary model of complex traits. We perform numerical calculations and simulate phenotypes under this model using inferred human demographic and selection parameters. We show that rare variants only contribute substantially to complex traits under very strong assumptions about the relationship between effect size and selection strength. We then assess the performance of state-of-the-art rare variant tests using our simulations across a broad range of model parameters. Counterintuitively, we find that statistical power is lowest when rare variants make the greatest contribution to the additive variance, and that power is substantially lower under our model than previously studied models. While many empirical studies have attempted to identify causal loci using rare variant association methods, few have reported novel associations. Some authors have interpreted this to mean that rare variants contribute little to heritability, but our results show that an alternative explanation is that rare variant tests have less power than previously estimated.
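The claim that rare alleles need much larger effect sizes to matter can be illustrated with the textbook additive-variance formula, V_A = 2p(1 - p)β², for a biallelic variant at frequency p with per-allele effect β under Hardy-Weinberg proportions. The sketch below is not the authors' evolutionary model; the function names and the example frequencies are illustrative assumptions.

```python
def additive_variance(freq: float, effect: float) -> float:
    """Additive genetic variance contributed by one biallelic variant.

    Assumes Hardy-Weinberg proportions and a purely additive effect
    (textbook formula, not the model developed in the paper).
    """
    return 2.0 * freq * (1.0 - freq) * effect ** 2


def effect_ratio_for_equal_variance(rare_freq: float, common_freq: float) -> float:
    """How many times larger a rare variant's effect must be to contribute
    the same additive variance as a common variant with unit effect."""
    common_term = common_freq * (1.0 - common_freq)
    rare_term = rare_freq * (1.0 - rare_freq)
    return (common_term / rare_term) ** 0.5


if __name__ == "__main__":
    # Hypothetical frequencies chosen only for illustration.
    ratio = effect_ratio_for_equal_variance(rare_freq=0.001, common_freq=0.3)
    print(f"Required effect-size ratio: {ratio:.1f}")
```

Because the variance scales with p(1 - p), the required effect-size ratio grows roughly as the square root of the frequency ratio, which is why the abstract ties a large rare-variant contribution to selection holding large-effect alleles at low frequency.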

2020 ◽  
Author(s):  
Hana Susak ◽  
Laura Serra-Saurina ◽  
Raquel Rabionet Janssen ◽  
Laura Domènech ◽  
Mattia Bosio ◽  
...  

Abstract Rare variants are thought to play an important role in the etiology of complex diseases and may explain a significant fraction of the missing heritability in genetic disease studies. Next-generation sequencing facilitates the association of rare variants in coding or regulatory regions with complex diseases in large cohorts at genome-wide scale. However, rare variant association studies (RVAS) still lack power when cohorts are small to medium-sized and if genetic variation explains a small fraction of phenotypic variance. Here we present a novel Bayesian rare variant Association Test using Integrated Nested Laplace Approximation (BATI). Unlike existing RVAS tests, BATI allows integration of individual or variant-specific features as covariates, while efficiently performing inference based on full model estimation. We demonstrate that BATI outperforms established RVAS methods on realistic, semi-synthetic whole-exome sequencing cohorts, especially when using meaningful biological context, such as functional annotation. We show that BATI achieves power above 75% in scenarios in which competing tests fail to identify risk genes, e.g. when risk variants in sum explain less than 0.5% of phenotypic variance. We have integrated BATI, together with five existing RVAS tests in the ‘Rare Variant Genome Wide Association Study’ (rvGWAS) framework for data analyzed by whole-exome or whole genome sequencing. rvGWAS supports rare variant association for genes or any other biological unit such as promoters, while allowing the analysis of essential functionalities like quality control or filtering. Applying rvGWAS to a Chronic Lymphocytic Leukemia study we identified eight candidate predisposition genes, including EHMT2 and COPS7A.
Data availability and implementation: All relevant data are within the manuscript and pipeline implementation on https://github.com/hanasusak/rvGWAS
Author summary: Complex diseases are characterized by being related to genetic factors and environmental factors such as air pollution, diet, etc. that together define the susceptibility of each individual to develop a given disease. Much effort has been applied to advance the knowledge of the genetic bases of such diseases, especially in the discovery of frequent genetic variants in the population that increase disease risk. However, these variants usually explain only a small part of the etiology of such diseases. Previous studies have shown that rare variants, i.e. variants present in less than 1% of the population, may explain the rest of the variability related to genetic aspects of the disease. Genome sequencing offers the opportunity to discover rare variants, but powerful statistical methods are needed to discriminate those variants that induce susceptibility to the disease. Here we have developed a powerful and flexible statistical approach for the detection of rare variants associated with a disease and we have integrated it into a computer tool that is easy and intuitive for researchers and clinicians to use. We have shown that our approach outperformed other common statistical methods, especially in situations where these variants explain just a small part of the disease. The discovery of these rare variants will contribute to the knowledge of the molecular mechanism of complex diseases.


2019 ◽  
Vol 101 ◽  
Author(s):  
Lifeng Liu ◽  
Pengfei Wang ◽  
Jingbo Meng ◽  
Lili Chen ◽  
Wensheng Zhu ◽  
...  

Abstract In recent years, there has been an increasing interest in detecting disease-related rare variants in sequencing studies. Numerous studies have shown that common variants can only explain a small proportion of the phenotypic variance for complex diseases. More and more evidence suggests that some of this missing heritability can be explained by rare variants. Considering the importance of rare variants, researchers have proposed a considerable number of methods for identifying the rare variants associated with complex diseases. Extensive research has been carried out on testing the association between rare variants and dichotomous, continuous or ordinal traits. So far, however, there has been little discussion about the case in which both genotypes and phenotypes are ordinal variables. This paper introduces a method based on the γ-statistic, called OV-RV, for examining disease-related rare variants when both genotypes and phenotypes are ordinal. At present, little is known about the asymptotic distribution of the γ-statistic when conducting association analyses for rare variants. One advantage of OV-RV is that it provides a robust estimation of the distribution of the γ-statistic by employing the permutation approach proposed by Fisher. We also perform extensive simulations to investigate the numerical performance of OV-RV under various model settings. The simulation results reveal that OV-RV is valid and efficient; namely, it controls the type I error approximately at the pre-specified significance level and achieves greater power at the same significance level. We also apply OV-RV for rare variant association studies of diastolic blood pressure.
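As a rough illustration of the idea described above, the following sketch computes the Goodman-Kruskal γ-statistic between two ordinal variables and estimates its null distribution by permutation. It is not the OV-RV implementation: it treats a single ordinal genotype score per individual, and the function names, the two-sided p-value, and the default number of permutations are all assumptions made for the example.

```python
import random
from typing import Sequence, Tuple


def goodman_kruskal_gamma(x: Sequence[int], y: Sequence[int]) -> float:
    """Goodman-Kruskal gamma: (C - D) / (C + D) over all pairs,
    where C and D count concordant and discordant pairs; ties are ignored."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    total = concordant + discordant
    return 0.0 if total == 0 else (concordant - discordant) / total


def permutation_p_value(
    genotype: Sequence[int],
    phenotype: Sequence[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed gamma, two-sided permutation p-value).

    Phenotype labels are shuffled to break any genotype-phenotype link,
    so the shuffled statistics approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = goodman_kruskal_gamma(genotype, phenotype)
    shuffled = list(phenotype)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(goodman_kruskal_gamma(genotype, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: genotype coded 0/1/2, phenotype in three ordered grades.
    geno = [0, 0, 0, 1, 1, 2, 0, 1, 2, 0]
    pheno = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
    gamma, p = permutation_p_value(geno, pheno)
    print(f"gamma = {gamma:.3f}, permutation p = {p:.4f}")
```

The permutation step is what removes the need for an asymptotic distribution of γ; the cost is the quadratic pair count repeated for every shuffle, which a real analysis would likely need to optimize.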


2021 ◽  
Vol 17 (2) ◽  
pp. e1007784
Author(s):  
Hana Susak ◽  
Laura Serra-Saurina ◽  
German Demidov ◽  
Raquel Rabionet ◽  
Laura Domènech ◽  
...  

Rare variants are thought to play an important role in the etiology of complex diseases and may explain a significant fraction of the missing heritability in genetic disease studies. Next-generation sequencing facilitates the association of rare variants in coding or regulatory regions with complex diseases in large cohorts at genome-wide scale. However, rare variant association studies (RVAS) still lack power when cohorts are small to medium-sized and if genetic variation explains a small fraction of phenotypic variance. Here we present a novel Bayesian rare variant Association Test using Integrated Nested Laplace Approximation (BATI). Unlike existing RVAS tests, BATI allows integration of individual or variant-specific features as covariates, while efficiently performing inference based on full model estimation. We demonstrate that BATI outperforms established RVAS methods on realistic, semi-synthetic whole-exome sequencing cohorts, especially when using meaningful biological context, such as functional annotation. We show that BATI achieves power above 70% in scenarios in which competing tests fail to identify risk genes, e.g. when risk variants in sum explain less than 0.5% of phenotypic variance. We have integrated BATI, together with five existing RVAS tests in the ‘Rare Variant Genome Wide Association Study’ (rvGWAS) framework for data analyzed by whole-exome or whole genome sequencing. rvGWAS supports rare variant association for genes or any other biological unit such as promoters, while allowing the analysis of essential functionalities like quality control or filtering. Applying rvGWAS to a Chronic Lymphocytic Leukemia study we identified eight candidate predisposition genes, including EHMT2 and COPS7A.


2021 ◽  
Author(s):  
Mary J. Emond ◽  
T.Eoin West

As genomic sequencing becomes more accurate and less costly, large cohorts and consortiums of cohorts are providing high power for rare variant association studies for many conditions. When large sample sizes are not attainable and the phenotype under study is continuous, an extreme phenotypes design can provide high statistical power with a small to moderate sample size. We extend the extreme phenotypes design to the dichotomous infectious disease outcome by sampling on extremes of the pathogenic exposure instead of sampling on extremes of phenotype. We use a likelihood ratio test (LRT) to test the significance of association between infection status and presence of susceptibility rare variants. More than 10 billion simulations are studied to assess the method. The method results in high sample enrichment for rare variants affecting susceptibility. Greater than 90% power to detect rare variant associations is attained in reasonable scenarios. The ordinary case-control design requires orders of magnitude more samples to achieve the same power. The Type I error rate of the LRT is accurate even for p-values < 10^-7. We find that erroneous exposure assessment can lead to power loss more severe than excluding the observations with errors. Nevertheless, careful sampling on exposure extremes can make a study feasible by providing adequate statistical power. Limitations of this method are not unique to this design, and the power is never less than that of the ordinary case-control design. The method applies without modification to other dichotomous outcomes that have strong association with a continuous covariate.


2019 ◽  
Author(s):  
Brent D. Davis ◽  
Jacqueline S. Dron ◽  
John F. Robinson ◽  
Robert A. Hegele ◽  
Dan J. Lizotte

Abstract Region-based rare variant association analysis (RVAA) is a popular method to study rare genetic variation in large datasets, especially in the context of complex traits and diseases. Although this method shows great promise in increasing our understanding of the genetic architecture of complex phenotypes, performing a region-based RVAA can be challenging. The sequence kernel association test (SKAT) can be used to perform this analysis, but its inputs and modifiable parameters can be extremely overwhelming and may lead to results that are difficult to reproduce. We have developed a software package called “Exautomate” that contains the tools necessary to run a region-based RVAA using SKAT and is easy-to-use for any researcher, regardless of their previous bioinformatic experiences. In this report, we discuss the utilities of Exautomate and provide detailed examples of implementing our package. Importantly, we demonstrate a proof-of-principle analysis using a previously studied cohort of 313 familial hypercholesterolemia (FH) patients. Our results show an increased burden of rare variants in genes known to cause FH, thereby demonstrating a successful region-based RVAA using Exautomate. With our easy-to-use package, we hope researchers will be able to perform reproducible region-based RVAA to further our collective understanding behind the genetics of complex traits and diseases.


2015 ◽  
Vol 32 (4) ◽  
pp. 1080-1090 ◽  
Author(s):  
Ian H. Cheeseman ◽  
Marina McDew-White ◽  
Aung Pyae Phyo ◽  
Kanlaya Sriprawat ◽  
François Nosten ◽  
...  

2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Yoshiro Morimoto ◽  
Mihoko Shimada-Sugimoto ◽  
Takeshi Otowa ◽  
Shintaro Yoshida ◽  
Akira Kinoshita ◽  
...  

2021 ◽  
Author(s):  
Megan Null ◽  
Josée Dupuis ◽  
Christopher R. Gignoux ◽  
Audrey E. Hendricks

Abstract Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirrors the distribution of rare variants and haplotype structure in real data. Additionally, importing real variant annotation enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests, and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotation or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.


2019 ◽  
Author(s):  
Zilin Li ◽  
Xihao Li ◽  
Yaowu Liu ◽  
Jincheng Shen ◽  
Han Chen ◽  
...  

Abstract Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice and are subject to limitations given that genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally-efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a fixed window size in advance. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.


2016 ◽  
Vol 98 ◽  
Author(s):  
YING ZHOU ◽  
YANGYANG CHENG ◽  
WENSHENG ZHU ◽  
QIAN ZHOU

Summary More and more rare genetic variants are being detected in the human genome, and it is believed that besides common variants, some rare variants also explain part of the phenotypic variance for human diseases. Due to the importance of rare variants, many statistical methods have been proposed to test for associations between rare variants and human traits. However, in existing studies, most methods only test for associations between multiple loci and one trait; therefore, the joint information of multiple traits has not been considered simultaneously and sufficiently. In this article, we present a study of testing for associations between rare variants and multiple traits, where trait value can be binary, ordinal, quantitative and/or any mixture of them. Based on the method of generalized Kendall's τ, a nonparametric method called NM-RV is proposed. A new kernel function for U-statistic, which could incorporate the information of each rare variant itself, is also presented and is expected to enhance the power of rare variant analysis. We further consider the asymptotic distribution of the proposed association test statistic. Our simulation work suggests that the proposed method is more powerful and robust than existing methods in testing for associations between rare variants and multiple traits, especially for multivariate ordinal traits.

