Multi-SKAT: General framework to test multiple phenotype associations of rare variants

2017 ◽  
Author(s):  
Diptavo Dutta ◽  
Laura Scott ◽  
Michael Boehnke ◽  
Seunggeun Lee

In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait-associated variants within genes or regions of interest. Existing multi-phenotype tests for rare variants make specific assumptions about the patterns of association of the underlying causal variants, and violation of these assumptions can reduce power to detect association. Here we develop a general framework for testing pleiotropic effects of rare variants based on multivariate kernel regression (Multi-SKAT). Multi-SKAT models the effect sizes of variants on the phenotypes through a kernel matrix and performs a variance component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices within the Multi-SKAT framework. To increase power to detect association across tests with different kernel matrices, we developed a fast and accurate approximation of the significance of the minimum observed p-value across tests. To account for related individuals, our framework includes a random effect based on the kinship matrix. Using simulated data and amino acid and exome-array data from the METSIM study, we show that Multi-SKAT can improve power over the single-phenotype SKAT-O test and existing multiple-phenotype tests, while maintaining the type I error rate.
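The variance component test at the core of this framework can be sketched in a few lines. Below is a minimal single-phenotype illustration of a SKAT-style score statistic with a weighted linear kernel, assuming an intercept-only null model; the function name and weighting scheme are illustrative, not the authors' multi-phenotype implementation.

```python
import numpy as np

def skat_score_stat(y, G, weights=None):
    """SKAT-style variance-component score statistic for one phenotype.

    y: length-n phenotype vector; G: n x m genotype matrix of m rare
    variants; weights: optional per-variant weights. With an intercept-only
    null model, Q = sum_j (w_j * g_j' r)^2 where r = y - mean(y).
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    if weights is None:
        weights = np.ones(G.shape[1])
    r = y - y.mean()              # residuals under the null model
    s = G.T @ r * weights         # weighted per-variant scores
    return float(s @ s)           # large Q suggests association
```

Multi-SKAT generalizes this idea by encoding cross-phenotype effect structure in the kernel matrix rather than assuming a single trait.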


2019 ◽  
Author(s):  
Diptavo Dutta ◽  
Sarah A. Gagliano Taliun ◽  
Joshua S. Weinstock ◽  
Matthew Zawistowski ◽  
Carlo Sidore ◽  
...  

The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare variants and for the combined effects of common and rare variants. To achieve robust power, within Meta-MultiSKAT we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes and heterogeneity across studies. Additionally, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type I error rate at the exome-wide level of 2.5×10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve power of detection from 23% to 38% on average over single-phenotype meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.



2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Farhad Hormozdiari ◽  
Junghyun Jung ◽  
Eleazar Eskin ◽  
Jong Wha J. Joo

In standard genome-wide association studies (GWAS), the association test is underpowered to detect loci that harbor multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype while accounting for the causal status of variants, requiring only existing summary statistics to detect associated risk loci. Using extensive simulated data and real data, we show that MARS increases the power to detect truly associated risk loci compared with previous approaches that consider multiple variants, while controlling the type I error.



Author(s):  
Luan L. Lee ◽  
Miguel G. Lizarraga ◽  
Natanael R. Gomes ◽  
Alessandro L. Koerich

This paper describes a prototype for Brazilian bankcheck recognition. The description is divided into three topics: bankcheck information extraction, digit amount recognition, and signature verification. In bankcheck information extraction, our algorithms provide signature and digit amount images free of background patterns and bankcheck printed information. In digit amount recognition, we deal with digit amount segmentation and the implementation of a complete numeral character recognition system involving image processing, feature extraction, and neural classification. In signature verification, we designed and implemented a static signature verification system suitable for banking and commercial applications. Our signature verification algorithm is capable of detecting simple, random, and skilled forgeries. The proposed automatic bankcheck recognition prototype was intensively tested on real as well as simulated bankcheck data, with the following performance results: for skilled forgeries, a 4.7% equal error rate; for random forgeries, zero Type I error and a 7.3% Type II error; for bankcheck numerals, a 92.7% correct recognition rate.
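The equal error rate quoted above can be made concrete with a small sketch of how it is computed from verification scores. The threshold sweep below is a generic illustration, assuming higher scores indicate a more likely genuine signature; it does not reproduce the paper's classifier.

```python
def equal_error_rate(genuine, forgery):
    """Equal error rate for a score-based verifier, assuming higher scores
    mean 'more likely genuine'. Sweeps candidate thresholds and returns the
    operating point where the false-reject rate (Type I error: a genuine
    signature rejected) is closest to the false-accept rate (Type II error:
    a forgery accepted)."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(forgery)):
        frr = sum(g < t for g in genuine) / len(genuine)   # Type I rate
        far = sum(f >= t for f in forgery) / len(forgery)  # Type II rate
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```

When the two score distributions are perfectly separable, the EER is zero, matching the paper's zero Type I error for random forgeries at a suitable threshold.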



2017 ◽  
Author(s):  
František Váša ◽  
Edward T. Bullmore ◽  
Ameera X. Patel

Functional connectomes are commonly analysed as sparse graphs, constructed by thresholding cross-correlations between regional neurophysiological signals. Thresholding generally retains the strongest edges (correlations), either by retaining edges surpassing a given absolute weight or by constraining the edge density. The latter (more widely used) method risks inclusion of false-positive edges at high edge densities and exclusion of true-positive edges at low edge densities. Here we apply new wavelet-based methods, which enable construction of probabilistically thresholded graphs controlled for type I error, to a dataset of resting-state fMRI scans of 56 patients with schizophrenia and 71 healthy controls. By thresholding connectomes at a fixed edge-specific P value, we found that functional connectomes of patients with schizophrenia were more dysconnected than those of healthy controls, exhibiting a lower edge density and a higher number of (dis)connected components. Furthermore, many participants’ connectomes could not be built up to the fixed edge densities commonly studied in the literature (~5-30%) while controlling for type I error. Additionally, we showed that the topological randomisation previously reported in the schizophrenia literature is likely attributable to “non-significant” edges added when thresholding connectomes to a fixed density based on correlation. Finally, by explicitly comparing connectomes thresholded by increasing P value and by decreasing correlation, we showed that probabilistically thresholded connectomes show decreased randomness and increased consistency across participants. Our results have implications for future analyses of functional connectivity using graph theory, especially within datasets exhibiting heterogeneous distributions of edge weights (correlations) between groups or across participants.
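The contrast between probabilistic and density-based thresholding can be sketched numerically. The snippet below converts a correlation to an approximate two-sided P value via the Fisher z-transform (a textbook simplification; the paper uses wavelet-based edge statistics) and keeps only individually significant edges.

```python
import math

def corr_pvalue(r, n):
    """Approximate two-sided P value for a Pearson correlation r estimated
    from n samples, via the Fisher z-transform normal approximation."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def significant_edges(corr, n, alpha):
    """Edges (i, j), i < j, of a correlation matrix whose correlation passes
    an edge-specific P value threshold alpha. No multiple-testing correction
    is applied here; the paper's type I error control is more careful."""
    m = len(corr)
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if corr_pvalue(corr[i][j], n) < alpha]
```

A fixed-density rule would instead sort all edges by |r| and keep the top k regardless of significance, which is exactly the practice the abstract cautions against at high densities.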



2018 ◽  
Vol 28 (9) ◽  
pp. 2868-2875
Author(s):  
Zhongxue Chen ◽  
Qingzhong Liu ◽  
Kai Wang

Several gene- or set-based association tests have been proposed recently in the literature. Powerful statistical approaches are still highly desirable in this area. In this paper we propose a novel statistical association test that uses information from the burden component and its complement from the genotypes. The new test statistic has a simple null distribution, which is a special and simplified variance-gamma distribution, and its p-value can be easily calculated. Through a comprehensive simulation study, we show that the new test controls the type I error rate and has superior detection power compared with some popular existing methods. We also apply the new approach to a real data set; the results demonstrate that this test is promising.
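The idea of pairing a burden component with its complement can be illustrated by an orthogonal decomposition of the per-variant score vector. This is a hypothetical sketch of that decomposition only; it is not the paper's exact statistic, and the variance-gamma null distribution is not derived here.

```python
import numpy as np

def burden_and_complement(y, G):
    """Split the per-variant score vector s = G'(y - mean(y)) into its
    burden part (projection onto the equal-weights direction) and the
    orthogonal complement, returning their squared norms.

    Hypothetical illustration of the burden/complement decomposition.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    s = G.T @ (y - y.mean())                  # per-variant scores
    u = np.ones(len(s)) / np.sqrt(len(s))     # unit burden direction
    b = s @ u                                 # burden coordinate
    comp = s - b * u                          # orthogonal complement
    return float(b ** 2), float(comp @ comp)
```

By the Pythagorean identity the two pieces sum to the total squared score, so a test can weigh burden-like signal against dispersed signal separately.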



1996 ◽  
Vol 1 (1) ◽  
pp. 25-28 ◽  
Author(s):  
Martin A. Weinstock

Background: Accurate understanding of certain basic statistical terms and principles is key to critical appraisal of published literature. Objective: This review describes type I error, type II error, null hypothesis, p value, statistical significance, α, two-tailed and one-tailed tests, effect size, alternate hypothesis, statistical power, β, publication bias, confidence interval, standard error, and standard deviation, while including examples from reports of dermatologic studies. Conclusion: The application of the results of published studies to individual patients should be informed by an understanding of certain basic statistical concepts.
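The relationship between α, type I error, power, and β described in this review can be illustrated with a small simulation. The critical value of 2.0 below is a rough stand-in for the two-sided 5% t cutoff at n = 30; all numbers are illustrative, not drawn from the review.

```python
import random
import statistics

def one_sample_t_reject(sample, mu0=0.0, crit=2.0):
    """Crude two-sided one-sample t-test: reject H0 (mean == mu0) when
    |t| exceeds crit (crit = 2.0 approximates the 5% level for n ~ 30)."""
    n = len(sample)
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / n ** 0.5)
    return abs(t) > crit

random.seed(1)
# Type I error rate: how often H0 is rejected when it is true (mean is 0)
type1 = sum(one_sample_t_reject([random.gauss(0, 1) for _ in range(30)])
            for _ in range(2000)) / 2000
# Power (1 - beta): how often H0 is rejected when the true mean is 0.5
power = sum(one_sample_t_reject([random.gauss(0.5, 1) for _ in range(30)])
            for _ in range(2000)) / 2000
```

The simulated type I error rate lands near the nominal α of about 5%, while power depends on the effect size, illustrating the trade-offs the review describes.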



Author(s):  
Abhaya Indrayan

Background: Small P-values have conventionally been considered evidence to reject a null hypothesis in empirical studies. However, there is now widespread criticism of P-values, and the threshold we use for statistical significance is questioned. Methods: This communication takes a contrarian view and explains why the P-value and its threshold are still useful for ruling out sampling fluctuation as a source of the findings. Results: The problem is not with P-values themselves but with their misuse, abuse, and over-use, including the dominant role they have assumed in empirical results. False results may be mostly due to errors in design, invalid data, inadequate analysis, inappropriate interpretation, accumulation of Type-I error, and selective reporting, and not due to P-values per se. Conclusion: A threshold for P-values such as 0.05 for statistical significance is helpful in making a binary inference for practical application of the result. However, a lower threshold can be suggested to reduce the chance of false results. Also, the emphasis should be on detecting a medically significant effect, not a zero effect.



2018 ◽  
Vol 28 (5) ◽  
pp. 1508-1522 ◽  
Author(s):  
Qianya Qi ◽  
Li Yan ◽  
Lili Tian

In testing for differentially expressed genes between tumor and healthy tissues, data are usually collected in paired form. However, incomplete paired data often occur. While extensive statistical research exists for paired data with incompleteness in both arms, hardly any recent work can be found on paired data with incompleteness in a single arm. This paper aims to fill this gap by proposing new methods, namely P-value pooling methods and a nonparametric combination test. Simulation studies are conducted to investigate the performance of the proposed methods in terms of type I error and power at small to moderate sample sizes. A real data set from The Cancer Genome Atlas (TCGA) breast cancer study is analyzed using the proposed methods.
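P-value pooling of the kind proposed here typically combines evidence from the complete-pairs analysis with evidence from the single-arm remainder. Below is a minimal sketch using Fisher's method, assuming the component P values are independent; the paper's pooling rules and its nonparametric combination test are more elaborate.

```python
import math

def fisher_pool(pvalues):
    """Fisher's method: under the global null, X = -2 * sum(log p) follows
    a chi-square distribution with 2k degrees of freedom for k independent
    P values. The survival function has a closed form for even df:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total
```

For example, pooling a P value from the paired-sample test with one from the unpaired single-arm test gives a single combined significance level for the gene.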



2021 ◽  
Author(s):  
Megan Null ◽  
Josée Dupuis ◽  
Christopher R. Gignoux ◽  
Audrey E. Hendricks

Identification of rare variant associations is crucial to fully characterize the genetic architecture of complex traits and diseases. Essential to this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and the haplotype structure of real data. Additionally, importing real variant annotations enables in silico comparison of methods that focus on putative causal variants, such as rare variant association tests and polygenic scoring methods. Existing simulation methods are either unable to employ real variant annotations or severely under- or over-estimate the number of singletons and doubletons, reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real variant annotations. We highlight RAREsim’s utility across various genetic regions, sample sizes, ancestries, and variant classes.



2019 ◽  
Author(s):  
Zilin Li ◽  
Xihao Li ◽  
Yaowu Liu ◽  
Jincheng Shen ◽  
Han Chen ◽  
...  

Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set-based analyses are commonly used instead. However, existing variant-set-based approaches need to pre-specify genetic regions for analysis and hence are not directly applicable to WGS data, given the large number of intergenic and intronic regions comprising a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice, and are subject to limitations given that genetic association region sizes are likely to vary across the genome and across phenotypes. We propose a computationally efficient and dynamic scan statistic method, Scan the Genome (SCANG), for analyzing WGS data that flexibly detects the sizes and locations of rare-variant association regions without a pre-specified fixed window size. The proposed method controls the genome-wise type I error rate and accounts for linkage disequilibrium among genetic variants, while allowing detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies covering a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling the genome-wise type I error rate. We illustrate SCANG by analyzing WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
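The dynamic scan can be caricatured as a search over candidate windows of varying size. The toy sketch below maximizes a standardized sum of hypothetical per-variant scores over all windows in a size range; it omits SCANG's genome-wise error control and linkage disequilibrium adjustment, which are the substance of the method.

```python
def scan_windows(scores, min_len=2, max_len=4):
    """Scan all contiguous windows with lengths in [min_len, max_len] and
    return (best_statistic, (start, stop)) for the window maximizing the
    size-standardized sum of per-variant scores. A toy dynamic scan."""
    best = (float("-inf"), None)
    for size in range(min_len, max_len + 1):
        for start in range(len(scores) - size + 1):
            stat = sum(scores[start:start + size]) / size ** 0.5
            if stat > best[0]:
                best = (stat, (start, start + size))
    return best
```

Because window size varies, a tight cluster of signal is not diluted by a fixed window that is too wide, which is the motivation the abstract gives for avoiding pre-specified window sizes.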


