Interpretable network-guided epistasis detection

Abstract: Detecting epistatic interactions at the gene level is essential to understanding the biological mechanisms of complex diseases. Unfortunately, genome-wide interaction association studies (GWAIS) involve many statistical challenges that make such detection hard. We propose a multi-step protocol for epistasis detection along the edges of a gene-gene co-function network. Such an approach reduces the number of tests performed and provides interpretable interactions, while keeping type I error controlled. Yet mapping gene interactions into testable SNP-interaction hypotheses, as well as computing gene pair association scores from SNP pair ones, is not trivial. Here we compare three SNP-gene mappings (positional overlap, eQTL and proximity in 3D structure) and use the adaptive truncated product method to compute gene pair scores. This method is non-parametric, does not require a known null distribution, and is fast to compute. We apply multiple variants of this protocol to a GWAS inflammatory bowel disease (IBD) dataset. Different configurations produced different results, highlighting that various mechanisms are implicated in IBD, while at the same time the results overlapped with known disease biology. Importantly, the results of the proposed pipeline also differ from those of a conventional approach where no network is used, showing the potential for additional discoveries when prior biological knowledge is incorporated into epistasis detection.
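To make the idea of combining SNP pair p-values into a gene pair score concrete, here is a minimal sketch of the plain truncated product statistic with a Monte Carlo p-value. It is not the authors' implementation and not the adaptive variant named in the abstract: the fixed threshold `tau`, the function names, and the null model of independent uniform p-values are all assumptions made for illustration. In practice the null is typically built from permutations so that correlation between SNP pairs is preserved.

```python
import math
import random


def truncated_product_stat(pvalues, tau):
    """Log of the product of all p-values at or below tau (0.0 if none qualify)."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def tpm_pvalue(pvalues, tau=0.05, n_sim=2000, seed=0):
    """Monte Carlo p-value of the truncated product statistic.

    Assumption: under the null, the p-values are independent and uniform.
    """
    rng = random.Random(seed)
    observed = truncated_product_stat(pvalues, tau)
    k = len(pvalues)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(k)]
        if truncated_product_stat(null, tau) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Hypothetical SNP pair p-values mapped to one gene pair.
print(tpm_pvalue([1e-6, 1e-5, 0.5]))
```

An adaptive version would repeat this over several candidate values of `tau`, keep the smallest p-value, and calibrate that minimum with a second layer of resampling.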

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.080030 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant ◽

Alexander F. Wilson

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Multi-trait genome-wide analyses of the brain imaging phenotypes in UK Biobank

10.1101/758326 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chong Wu

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Type I ◽

Genome Wide Association Studies ◽

Uk Biobank ◽

Type I Error Rates ◽

Trait Association ◽

Genome Wide ◽

Inflation Factor

Abstract: Many genetic variants identified in genome-wide association studies (GWAS) are associated with multiple, sometimes seemingly unrelated traits. This motivates multi-trait association analyses, which have successfully identified novel associated loci for many complex diseases. While appealing, most existing methods focus on analyzing a relatively small number of traits and may yield inflated Type I error rates when a large number of traits need to be analyzed jointly. As deep phenotyping data are rapidly becoming available, we develop a novel method, referred to as aMAT (adaptive multi-trait association test), for multi-trait analysis of any number of traits. We applied aMAT to GWAS summary statistics for a set of 58 volumetric imaging derived phenotypes from the UK Biobank. aMAT had a genomic inflation factor of 1.04, indicating that the Type I error rates were well controlled. More importantly, aMAT identified 24 distinct risk loci, 13 of which were ignored by standard GWAS. In comparison, the competing methods either had a suspicious genomic inflation factor or identified far fewer risk loci. Finally, four additional sets of traits were analyzed and led to similar conclusions.
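The genomic inflation factor quoted above is commonly computed as the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with one degree of freedom (about 0.455). The sketch below follows that common definition; the function name and the use of z-scores as input are assumptions, and it is not taken from the aMAT software.

```python
from statistics import NormalDist, median

# Median of a chi-square(1) variable: the squared 75th percentile of N(0, 1).
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(z_scores):
    """Median of the squared z-scores over the chi-square(1) median."""
    return median(z * z for z in z_scores) / CHI2_1_MEDIAN


# Illustrative input: perfectly calibrated z-scores on an even quantile grid.
calibrated = [NormalDist().inv_cdf(i / 1000) for i in range(1, 1000)]
print(round(genomic_inflation_factor(calibrated), 3))
```

Values close to 1 suggest well-calibrated test statistics, while values well above 1 point to inflation, for example from population structure or a miscalibrated test.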

QCAT: testing causality of variants using only summary association statistics

10.1101/072355 ◽

2016 ◽

Author(s):

Donghyung Lee ◽

T. Bernard Bigdeli ◽

Vladimir I. Vladimirov ◽

Ayman H. Fanous ◽

Silviu-Alin Bacanu

Keyword(s):

Fine Mapping ◽

Type I Error ◽

Association Studies ◽

High Linkage Disequilibrium ◽

Causal Snps ◽

Type I ◽

Genome Wide ◽

Multiple Regions ◽

Z Scores ◽

Causal Variants

Abstract: Genome-wide and, very soon, sequencing association studies might yield multiple regions harbouring interesting association signals. Given that each region encompasses numerous variants in high linkage disequilibrium, it is not clear which are i) truly causal or ii) just reasonably close to the causal ones. Researchers have proposed many methods to predict, albeit not test, the causal SNPs in a region, a process commonly denoted as fine-mapping. Unfortunately, all existing fine-mapping methods output posterior causality probabilities assuming that causal SNPs are among those already measured in the study, or have been catalogued elsewhere. However, due to technological and computational obstacles in calling many types of genetic variants, such an assumption is not realistic. We propose a novel method/software, denoted as Quasi-CAusality Test (QCAT), for testing (not just predicting) the causality of any catalogued genetic variant. QCAT i) makes no assumption that causal variants are among catalogued variants, and ii) makes use of easily available summary statistics from genetic studies, e.g. variant association Z-scores, to make statistical inferences. The proposed statistical test controls the type I error at or below the desired level. Its practical application to well-known smoking association signals provides some insightful results. The QCAT software is publicly available at: http://dleelab.github.io/qcat/

The effect of different sets of critical values on type I error rates in tiled regression for genome-wide association studies

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2016.10000871 ◽

2016 ◽

Vol 16 (2) ◽

pp. 111

Author(s):

Alexander F. Wilson ◽

Heejong Sung ◽

Jeremy A. Sabourin ◽

Alexa J.M. Sorant

Keyword(s):

Type I Error ◽

Association Studies ◽

Error Rates ◽

Critical Values ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

ComPaSS-GWAS: A method to reduce type I error in genome-wide association studies when replication data are not available

Genetic Epidemiology ◽

10.1002/gepi.22168 ◽

2018 ◽

Vol 43 (1) ◽

pp. 102-111 ◽

Cited By ~ 4

Author(s):

Jeremy A. Sabourin ◽

Cheryl D. Cropp ◽

Heejong Sung ◽

Lawrence C. Brody ◽

Joan E. Bailey-Wilson ◽

...

Keyword(s):

Type I Error ◽

Association Studies ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Genome Wide

Multiple-Tissue Integrative Transcriptome-Wide Association Studies Discovered New Genes Associated With Amyotrophic Lateral Sclerosis

Frontiers in Genetics ◽

10.3389/fgene.2020.587243 ◽

2020 ◽

Vol 11 ◽

Author(s):

Lishun Xiao ◽

Zhongshang Yuan ◽

Siyi Jin ◽

Ting Wang ◽

Shuiping Huang ◽

...

Keyword(s):

Amyotrophic Lateral Sclerosis ◽

Type I Error ◽

Association Studies ◽

Type I ◽

Genome Wide Association Studies ◽

Combination Strategy ◽

New Genes ◽

Genome Wide ◽

Causal Genes ◽

Lateral Sclerosis

Genome-wide association studies (GWAS) have identified multiple causal genes associated with amyotrophic lateral sclerosis (ALS); however, the genetic architecture of ALS remains completely unknown and a large number of causal genes have yet to be discovered. To fill this gap in part, we implemented an integrative transcriptome-wide association study (TWAS) analysis for ALS to prioritize causal genes, using summary statistics from 80,610 European individuals and 13 GTEx brain tissues as reference transcriptome panels. The summary-level TWAS analysis with a single brain tissue was undertaken first, and then a flexible p-value combination strategy, called summary data-based Cauchy Aggregation TWAS (SCAT), was proposed to pool association signals from the single-tissue TWAS analyses while protecting against highly positive correlation among tests. Extensive simulations demonstrated that SCAT can produce well-calibrated p-values for the control of type I error and was often much more powerful at identifying association signals across various scenarios than single-tissue TWAS analysis. Using SCAT, we replicated three ALS-associated genes (i.e., ATXN3, SCFD1, and C9orf72) identified in previous GWASs and discovered five additional genes (i.e., SLC9A8, FAM66D, TRIP11, JUP, and RP11-529H20.6) which had not been reported before. Furthermore, we found that the five associations were largely driven by the genes themselves, which might therefore be new genes related to the risk of ALS. However, further investigations are warranted to verify these results and untangle the pathophysiological function of the genes in developing ALS.
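The p-value pooling step can be illustrated with the generic Cauchy combination rule, in which each p-value is mapped to a standard Cauchy variate, the variates are averaged with weights, and the average is mapped back to a p-value. This is a sketch of the general technique only: the function name and the default equal weights are assumptions, and the exact weighting used by SCAT is not given in the abstract.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values in (0, 1) with the Cauchy combination rule.

    Weights are normalised to sum to 1; equal weights are used by default.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    # Each p-value becomes a standard Cauchy variate under the null.
    t = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    # Upper tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical single-tissue TWAS p-values for one gene.
print(cauchy_combine([0.001, 0.20, 0.65, 0.04]))
```

The heavy tail of the Cauchy distribution is what makes the combined p-value approximately valid even when the individual tests are positively correlated, which matches the motivation stated above.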

Power and type I error rate of false discovery rate approaches in genome-wide association studies

BMC Genetics ◽

10.1186/1471-2156-6-s1-s134 ◽

2005 ◽

Vol 6 (Suppl 1) ◽

pp. S134 ◽

Cited By ~ 58

Author(s):

Qiong Yang ◽

Jing Cui ◽

Irmarie Chazaro ◽

L Adrienne Cupples ◽

Serkalem Demissie

Keyword(s):

False Discovery Rate ◽

Error Rate ◽

Type I Error ◽

Association Studies ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rate ◽

False Discovery ◽

Genome Wide

Statistical Learning Methods Applicable to Genome-Wide Association Studies on Unbalanced Case-Control Disease Data

Genes ◽

10.3390/genes12050736 ◽

2021 ◽

Vol 12 (5) ◽

pp. 736

Author(s):

Xiaotian Dai ◽

Guifang Fu ◽

Shaofei Zhao ◽

Yifei Zeng

Keyword(s):

Type I Error ◽

Association Studies ◽

Case Control ◽

Error Rates ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Learning Approaches ◽

Genome Wide ◽

Control Disease

Although imbalance between case and control groups is prevalent in genome-wide association studies (GWAS), it is often overlooked. This imbalance is becoming more significant and urgent as the rapid growth of biobanks and electronic health records has enabled the collection of thousands of phenotypes from large cohorts, in particular for diseases with low prevalence. Unbalanced binary traits pose serious challenges to traditional statistical methods in terms of both genomic selection and disease prediction. For example, the well-established linear mixed models (LMM) yield inflated type I error rates in the presence of unbalanced case-control ratios. In this article, we review multiple statistical approaches that have been developed to overcome the inaccuracy caused by the unbalanced case-control ratio, commenting on the advantages and limitations of each approach. In addition, we explore the potential for applying several powerful and popular state-of-the-art machine-learning approaches that have not yet been applied to the GWAS field. This review paves the way for better analysis and understanding of unbalanced case-control disease data in GWAS.

Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes

10.1101/2020.10.09.333146 ◽

2020 ◽

Author(s):

Wenjian Bi ◽

Wei Zhou ◽

Rounak Dey ◽

Bhramar Mukherjee ◽

Joshua N Sampson ◽

...

Keyword(s):

Mixed Model ◽

Type I Error ◽

Association Studies ◽

Error Rates ◽

Genome Wide Association ◽

Alternative Methods ◽

Type I ◽

Genome Wide Association Studies ◽

Type I Error Rates ◽

Genome Wide

Abstract: In genome-wide association studies (GWAS), ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, due to the lack of analysis tools, methods designed for binary and quantitative traits have often been used inappropriately to analyze categorical phenotypes, which produces inflated type I error rates or reduces power. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, Proportional Odds Logistic Mixed Model (POLMM). POLMM is demonstrated to be computationally efficient for analyzing large datasets with hundreds of thousands of genetically related samples, to control type I error rates at a stringent significance level regardless of the phenotypic distribution, and to be more powerful than alternative methods. We applied POLMM to 258 ordinal categorical phenotypes on array-genotyped and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which 424 variants (7.2%) are rare variants with MAF < 0.01.

Operating Characteristics of the Rank-Based Inverse Normal Transformation for Quantitative Trait Analysis in Genome-Wide Association Studies

10.1101/635706 ◽

2019 ◽

Cited By ~ 2

Author(s):

Zachary R. McCaw ◽

Jacqueline M. Lane ◽

Richa Saxena ◽

Susan Redline ◽

Xihong Lin

Keyword(s):

Type I Error ◽

Association Studies ◽

Genome Wide Association ◽

Type I ◽

Genome Wide Association Studies ◽

Operating Characteristics ◽

Association Tests ◽

Genome Wide ◽

Normal Transformation ◽

Normally Distributed

Summary: Quantitative traits analyzed in Genome-Wide Association Studies (GWAS) are often non-normally distributed. For such traits, association tests based on standard linear regression are subject to reduced power and inflated type I error in finite samples. Applying the rank-based Inverse Normal Transformation (INT) to non-normally distributed traits has become common practice in GWAS. However, the different variations on INT-based association testing have not been formally defined, and guidance is lacking on when to use which approach. In this paper, we formally define and systematically compare the direct (D-INT) and indirect (I-INT) INT-based association tests. We discuss their assumptions, underlying generative models, and connections. We demonstrate that the relative powers of D-INT and I-INT depend on the underlying data generating process. Since neither approach is uniformly most powerful, we combine them into an adaptive omnibus test (O-INT). O-INT is robust to model misspecification, protects the type I error, and is well powered against a wide range of non-normally distributed traits. Extensive simulations were conducted to examine the finite sample operating characteristics of these tests. Our results demonstrate that, for non-normally distributed traits, INT-based tests outperform the standard untransformed association test (UAT), both in terms of power and type I error rate control. We apply the proposed methods to GWAS of spirometry traits in the UK Biobank. O-INT has been implemented in the R package RNOmni, which is available on CRAN.
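The rank-based transformation itself is short enough to sketch: each observation is replaced by the normal quantile of its offset-adjusted fractional rank. The sketch below is not the RNOmni code. The function name, the Blom offset of 3/8, and the use of average ranks for ties are assumptions made for illustration, and only the transformation is shown, not the D-INT, I-INT or O-INT tests.

```python
from statistics import NormalDist


def rank_int(values, offset=0.375):
    """Rank-based inverse normal transformation with average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    normal = NormalDist()
    # Map offset-adjusted fractional ranks to standard normal quantiles.
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# A skewed, hypothetical trait: the output depends only on the ordering.
print(rank_int([0.1, 250.0, 3.2, 1.7, 40.0]))
```

Because only ranks enter the formula, any strictly increasing rescaling of the trait gives the same transformed values, which is why the transformation is robust to heavy tails and outliers.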