Testing Rare-Variant Association without Calling Genotypes Allows for Systematic Differences in Sequencing between Cases and Controls

2015 ◽  
Author(s):  
Yi-Juan Hu ◽  
Peizhou Liao ◽  
Henry Richard Johnston ◽  
Andrew Allen ◽  
Glen Satten

Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, when testing the association between rare variants and traits of interest, the current practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths or on different platforms. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistic for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available.
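To make the form of the (weighted) burden statistic concrete, here is a minimal sketch in Python. It assumes genotypes are already known, which is exactly the step the article avoids by modelling sequencing reads directly, so it illustrates only the "weighted sum of per-variant score statistics" structure and not the authors' read-based likelihood, screening algorithm, or bootstrap. The function name, argument layout, and the use of a simple case-control score are illustrative assumptions.

```python
def burden_statistic(genotypes, phenotypes, weights=None):
    """Weighted sum of per-variant score statistics (illustrative sketch).

    genotypes:  list of per-sample lists; genotypes[i][j] is the
                minor-allele count of sample i at variant j (assumed known)
    phenotypes: list of 0/1 case-control labels, one per sample
    weights:    optional per-variant weights; defaults to 1 for every variant
    """
    n = len(phenotypes)
    m = len(genotypes[0])
    if weights is None:
        weights = [1.0] * m
    y_bar = sum(phenotypes) / n
    # Score statistic for variant j under the null of no association:
    # U_j = sum_i (y_i - y_bar) * g_ij
    scores = [
        sum((phenotypes[i] - y_bar) * genotypes[i][j] for i in range(n))
        for j in range(m)
    ]
    # Burden statistic: weighted sum of the per-variant scores.
    return sum(w * u for w, u in zip(weights, scores))


# Two cases each carry one rare allele; the two controls carry none.
G = [[1, 0], [0, 1], [0, 0], [0, 0]]
y = [1, 1, 0, 0]
print(burden_statistic(G, y))
print(burden_statistic(G, y, weights=[2.0, 1.0]))
```

With unit weights this reduces to the unweighted burden statistic; in practice the weights are often chosen to up-weight rarer variants, but the choice here is left to the caller.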

2019 ◽  
Author(s):  
Zilin Li ◽  
Xihao Li ◽  
Yaowu Liu ◽  
Jincheng Shen ◽  
Han Chen ◽  
...  

Abstract Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intron regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice and are subject to limitations given genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally-efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variants association regions without the need of specifying a prior fixed window size. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare variants association region sizes to vary across the genome. Through extensive simulated studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.


2016 ◽  
Vol 98 ◽  
Author(s):  
YING ZHOU ◽  
YANGYANG CHENG ◽  
WENSHENG ZHU ◽  
QIAN ZHOU

Summary More and more rare genetic variants are being detected in the human genome, and it is believed that besides common variants, some rare variants also explain part of the phenotypic variance for human diseases. Due to the importance of rare variants, many statistical methods have been proposed to test for associations between rare variants and human traits. However, in existing studies, most methods only test for associations between multiple loci and one trait; therefore, the joint information of multiple traits has not been considered simultaneously and sufficiently. In this article, we present a study of testing for associations between rare variants and multiple traits, where trait value can be binary, ordinal, quantitative and/or any mixture of them. Based on the method of generalized Kendall's τ, a nonparametric method called NM-RV is proposed. A new kernel function for U-statistic, which could incorporate the information of each rare variant itself, is also presented and is expected to enhance the power of rare variant analysis. We further consider the asymptotic distribution of the proposed association test statistic. Our simulation work suggests that the proposed method is more powerful and robust than existing methods in testing for associations between rare variants and multiple traits, especially for multivariate ordinal traits.


2019 ◽  
Vol 101 ◽  
Author(s):  
Lifeng Liu ◽  
Pengfei Wang ◽  
Jingbo Meng ◽  
Lili Chen ◽  
Wensheng Zhu ◽  
...  

Abstract In recent years, there has been an increasing interest in detecting disease-related rare variants in sequencing studies. Numerous studies have shown that common variants can only explain a small proportion of the phenotypic variance for complex diseases. More and more evidence suggests that some of this missing heritability can be explained by rare variants. Considering the importance of rare variants, researchers have proposed a considerable number of methods for identifying the rare variants associated with complex diseases. Extensive research has been carried out on testing the association between rare variants and dichotomous, continuous or ordinal traits. So far, however, there has been little discussion about the case in which both genotypes and phenotypes are ordinal variables. This paper introduces a method based on the γ-statistic, called OV-RV, for examining disease-related rare variants when both genotypes and phenotypes are ordinal. At present, little is known about the asymptotic distribution of the γ-statistic when conducting association analyses for rare variants. One advantage of OV-RV is that it provides a robust estimation of the distribution of the γ-statistic by employing the permutation approach proposed by Fisher. We also perform extensive simulations to investigate the numerical performance of OV-RV under various model settings. The simulation results reveal that OV-RV is valid and efficient; namely, it controls the type I error approximately at the pre-specified significance level and achieves greater power at the same significance level. We also apply OV-RV for rare variant association studies of diastolic blood pressure.
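The permutation idea in the abstract above can be sketched as follows. This is a hedged illustration, not the OV-RV implementation: it assumes the γ-statistic is Goodman and Kruskal's gamma (the abstract does not say so explicitly), that each sample has a single ordinal genotype score and a single ordinal phenotype, and that a two-sided test is wanted. The function names, the number of permutations, and the seed are all illustrative choices.

```python
import random


def gk_gamma(x, y):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant)."""
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # Pairs tied on x or y (s == 0) are ignored by gamma.
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for gamma, shuffling the phenotype labels."""
    rng = random.Random(seed)
    observed = abs(gk_gamma(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(gk_gamma(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical ordinal genotype scores and ordinal phenotype categories.
genotype = [0, 0, 1, 1, 2, 2, 0, 1]
phenotype = [0, 1, 1, 2, 2, 2, 0, 1]
print(gk_gamma(genotype, phenotype))
print(permutation_pvalue(genotype, phenotype))
```

Shuffling the phenotype vector breaks any genotype-phenotype association while keeping both marginal distributions fixed, which is why no asymptotic distribution for the statistic is needed.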


2016 ◽  
Author(s):  
Hailiang Huang ◽  
Gina M. Peloso ◽  
Daniel Howrigan ◽  
Barbara Rakitsch ◽  
Carl Johann Simon-Gabriel ◽  
...  

Abstract Recent advances in genotyping and sequencing technologies have made detecting rare variants in large cohorts possible. Various analytic methods for associating disease to rare variants have been proposed, including burden tests, C-alpha and SKAT. Most of these methods, however, assume that samples come from a homogeneous population, which is not realistic for analyses of large samples. Not correcting for population stratification causes inflated p-values and false-positive associations. Here we propose a population-informed bootstrap resampling method that controls for population stratification (Bootstrat) in rare variant tests. In essence, the Bootstrat procedure uses genetic distance to create a phenotype probability for each sample. We show that this empirical approach can effectively correct for population stratification while maintaining statistical power comparable to established methods of controlling for population stratification. The Bootstrat scheme can be easily applied to existing rare variant testing methods with reasonable computational complexity.

Author Summary Recent technology advances have enabled large-scale analysis of rare variants, but properly testing rare variants remains a significant challenge as most rare variant testing methods assume a sample of homogenous ethnicity, an assumption often not true for large cohorts. Failure to account for this heterogeneity increases the type I error rate. Here we propose a bootstrap scheme applicable to most existing rare variant testing methods to control for population heterogeneity. This scheme uses a randomization layer to establish a null distribution of the test statistics while preserving the sample genetic relationships. The null distribution is then used to calculate an empirical p-value that accounts for population heterogeneity. We demonstrate how this scheme successfully controls the type I error rate without loss of statistical power.


2021 ◽  
pp. 096228022110082
Author(s):  
Yang Li ◽  
Wei Ma ◽  
Yichen Qin ◽  
Feifang Hu

Concerns have been expressed over the validity of statistical inference under covariate-adaptive randomization despite its extensive use in clinical trials. In the literature, the inferential properties under covariate-adaptive randomization have been mainly studied for continuous responses; in particular, it is well known that the usual two-sample t-test for treatment effect is typically conservative. This phenomenon of invalid tests has also been found for generalized linear models without adjusting for the covariates and is sometimes more worrisome due to inflated Type I error. The purpose of this study is to examine the unadjusted test for treatment effect under generalized linear models and covariate-adaptive randomization. For a large class of covariate-adaptive randomization methods, we obtain the asymptotic distribution of the test statistic under the null hypothesis and derive the conditions under which the test is conservative, valid, or anti-conservative. Several commonly used generalized linear models, such as logistic regression and Poisson regression, are discussed in detail. An adjustment method is also proposed to achieve a valid size based on the asymptotic results. Numerical studies confirm the theoretical findings and demonstrate the effectiveness of the proposed adjustment method.


Author(s):  
Zaheer Ahmed ◽  
Alberto Cassese ◽  
Gerard van Breukelen ◽  
Jan Schepers

Abstract We present a novel method, REMAXINT, that captures the gist of two-way interaction in row by column (i.e., two-mode) data, with one observation per cell. REMAXINT is a probabilistic two-mode clustering model that yields two-mode partitions with maximal interaction between row and column clusters. For estimation of the parameters of REMAXINT, we maximize a conditional classification likelihood in which the random row (or column) main effects are conditioned out. For testing the null hypothesis of no interaction between row and column clusters, we propose a max-F test statistic and discuss its properties. We develop a Monte Carlo approach to obtain its sampling distribution under the null hypothesis. We evaluate the performance of the method through simulation studies. Specifically, for selected values of data size and (true) numbers of clusters, we obtain critical values of the max-F statistic, determine empirical Type I error rate of the proposed inferential procedure and study its power to reject the null hypothesis. Next, we show that the novel method is useful in a variety of applications by presenting two empirical case studies and end with some concluding remarks.


Biostatistics ◽  
2019 ◽  
Author(s):  
Jingchunzi Shi ◽  
Michael Boehnke ◽  
Seunggeun Lee

Summary Trans-ethnic meta-analysis is a powerful tool for detecting novel loci in genetic association studies. However, in the presence of heterogeneity among different populations, existing gene-/region-based rare variants meta-analysis methods may be unsatisfactory because they do not consider genetic similarity or dissimilarity among different populations. In response, we propose a score test under the modified random effects model for gene-/region-based rare variants associations. We adapt the kernel regression framework to construct the model and incorporate genetic similarities across populations into modeling the heterogeneity structure of the genetic effect coefficients. We use a resampling-based copula method to approximate asymptotic distribution of the test statistic, enabling efficient estimation of p-values. Simulation studies show that our proposed method controls type I error rates and increases power over existing approaches in the presence of heterogeneity. We illustrate our method by analyzing T2D-GENES consortium exome sequence data to explore rare variant associations with several traits.


2013 ◽  
Vol 52 (04) ◽  
pp. 351-359 ◽  
Author(s):  
M. O. Scheinhardt ◽  
A. Ziegler

Summary Background: Gene, protein, or metabolite expression levels are often non-normally distributed, heavy tailed and contain outliers. Standard statistical approaches may fail as location tests in this situation. Objectives: In three Monte-Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O’Gorman, Can J Stat 1997; 25: 269–279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267–293; Szymczak et al., Stat Med 2013; 32: 524–537] for a wide range of distributions. Methods: We simulated two-sample scenarios using the g-and-k-distribution family to systematically vary tail length and skewness with identical and varying variability between groups. Results: All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. Conclusions: The standard U-test was a powerful and robust location test for most of the simulated scenarios except for very heavy tailed or heavy skewed data, and it is thus to be recommended except for these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. But when sample variances differed, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy tailed distributions.


2018 ◽  
Vol 28 (9) ◽  
pp. 2868-2875
Author(s):  
Zhongxue Chen ◽  
Qingzhong Liu ◽  
Kai Wang

Several gene- or set-based association tests have been proposed recently in the literature. Powerful statistical approaches are still highly desirable in this area. In this paper we propose a novel statistical association test, which uses information of the burden component and its complement from the genotypes. This new test statistic has a simple null distribution, which is a special and simplified variance-gamma distribution, and its p-value can be easily calculated. Through a comprehensive simulation study, we show that the new test can control type I error rate and has superior detecting power compared with some popular existing methods. We also apply the new approach to a real data set; the results demonstrate that this test is promising.


1982 ◽  
Vol 7 (3) ◽  
pp. 207-214 ◽  
Author(s):  
Jennifer J. Clinch ◽  
H. J. Keselman

The ANOVA, Welch, and Brown and Forsythe tests for mean equality were compared using Monte Carlo methods. The tests’ rates of Type I error and power were examined when populations were non-normal, variances were heterogeneous, and group sizes were unequal. The ANOVA F test was most affected by the assumption violations. The test proposed by Brown and Forsythe appeared, on the average, to be the “best” test statistic for testing an omnibus hypothesis of mean equality.
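As a concrete reference for one of the procedures compared above, the following is a minimal sketch of Welch's heteroscedastic one-way test statistic, written from the standard textbook formula rather than from the article itself. The function name and return layout are illustrative assumptions, and the sketch returns only the statistic and its degrees of freedom; turning these into a p-value would need an F distribution routine that the Python standard library does not provide.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way test for equal means without assuming equal variances.

    groups: list of lists of observations, one inner list per group
    Returns (F statistic, numerator df, denominator df).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, the inverse variance of its mean.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df_denominator = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df_denominator


# Two groups with unequal variances; for k = 2 the statistic equals the
# square of Welch's two-sample t statistic.
f_stat, df1, df2 = welch_anova([[1, 2, 3], [2, 4, 6]])
print(f_stat, df1, df2)
```

For two groups the denominator degrees of freedom reduce to the Welch-Satterthwaite approximation, which is a convenient sanity check on the formula.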

