Interaction screening by Kendall’s partial correlation for ultrahigh-dimensional data with survival trait

Jie-Huei Wang; Yi-Hau Chen

doi:10.1093/bioinformatics/btaa017

Bioinformatics ◽

10.1093/bioinformatics/btaa017 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2763-2769

Author(s):

Jie-Huei Wang ◽

Yi-Hau Chen

Keyword(s):

Partial Correlation ◽

Association Studies ◽

B Cell Lymphoma ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Interaction Screening ◽

Relationship Of

Abstract Motivation In gene expression and genome-wide association studies, the identification of interaction effects is an important and challenging issue owing to its ultrahigh-dimensional nature. In particular, contaminated data and right-censored survival outcomes make the associated feature screening even more challenging. Results In this article, we propose an inverse probability-of-censoring weighted Kendall’s tau statistic to measure the association of a survival trait with biomarkers, as well as a Kendall’s partial correlation statistic to measure the relationship of a survival trait with an interaction variable conditional on the main effects. The Kendall’s partial correlation is then used to conduct interaction screening. Simulation studies under various scenarios are performed to compare the performance of our proposal with some commonly available methods. In the real data application, we utilize our proposed method to identify epistasis associated with the clinical survival outcomes of non-small-cell lung cancer, diffuse large B-cell lymphoma and lung adenocarcinoma patients. Both simulation and real data studies demonstrate that our method performs well and outperforms existing methods in identifying main and interaction biomarkers. Availability and implementation R-package ‘IPCWK’ is available to implement this method, together with a reference manual describing how to use the ‘IPCWK’ package. Supplementary information Supplementary data are available at Bioinformatics online.
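As a rough illustration of the partial-correlation idea in this abstract, the sketch below computes a plain Kendall's tau and the classical first-order Kendall partial correlation in Python. It is not the authors' statistic or the ‘IPCWK’ package: it assumes fully observed data with no ties, applies no inverse probability-of-censoring weights, and conditions on a single variable `z` rather than on both main effects. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no censoring; tied pairs simply contribute zero.
    """
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)


def kendall_partial_tau(x, y, z):
    """Classical Kendall partial correlation of x and y given z."""
    t_xy = kendall_tau(x, y)
    t_xz = kendall_tau(x, z)
    t_yz = kendall_tau(y, z)
    return (t_xy - t_xz * t_yz) / sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))


# Toy data (illustrative only): trait, interaction variable, one main effect.
trait = [1, 2, 3, 4, 5]
interaction = [2, 1, 4, 3, 5]
main_effect = [1, 3, 2, 5, 4]
partial = kendall_partial_tau(trait, interaction, main_effect)
```

In a screening setting, one would compute such a statistic for every candidate interaction and keep those with the largest absolute values; the censoring weights described in the abstract would have to be added for survival outcomes.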

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs), and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation bGWAS package is freely available under a GPL-2 License, and can be accessed, along with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.

Multi-SNP mediation intersection-union test

Bioinformatics ◽

10.1093/bioinformatics/btz285 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4724-4729 ◽

Cited By ~ 4

Author(s):

Wujuan Zhong ◽

Cassandra N Spracklen ◽

Karen L Mohlke ◽

Xiaojing Zheng ◽

Jason Fine ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Alternative Methods ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mediation Effects ◽

Coding Regions ◽

Genome Wide ◽

Plasma Adiponectin Level ◽

Intersection Union Test

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.
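The general intersection-union principle named in this abstract can be sketched in a few lines of Python. This is a generic illustration, not the SMUT package or its model: it assumes the two component p-values (SNPs to mediator, and mediator to outcome) have already been obtained by some other means, and the function names are hypothetical. An intersection-union test rejects the composite null only when every component test rejects, so its p-value is the largest component p-value.

```python
def iut_p_value(component_p_values):
    """Intersection-union p-value: the largest of the component p-values."""
    p_values = list(component_p_values)
    if not p_values:
        raise ValueError("at least one component p-value is required")
    return max(p_values)


def iut_reject(component_p_values, alpha=0.05):
    """Reject the composite null only if every component test rejects."""
    return iut_p_value(component_p_values) <= alpha


# Illustrative p-values (made up): SNPs -> mediator, mediator -> outcome.
p_mediation = iut_p_value([0.001, 0.03])
```

Taking the maximum means no multiplicity correction is needed across the component tests, at the cost of a conservative test when only one path is truly non-null.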

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz930 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2626-2627

Author(s):

Corentin Molitor ◽

Matt Brember ◽

Fady Mohareb

Keyword(s):

Association Studies ◽

Genetic Disorders ◽

R Package ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

High Quality Information

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian genome 5, the Genotype-Tissue Expression and the Genome Wide Association Studies catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit the research of variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.

Mixed Logistic Regression in Genome-Wide Association Studies

10.1101/2020.01.17.910109 ◽

2020 ◽

Author(s):

Jacqueline Milet ◽

Hervé Perdry

Keyword(s):

Logistic Regression ◽

Linear Models ◽

Association Studies ◽

Score Test ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mixed Linear Models ◽

Genome Wide

Abstract Motivation Mixed linear models (MLM) have been widely used to account for population structure in case-control genome-wide association studies, the status being analyzed as a quantitative phenotype. Chen et al. proved that this method is inappropriate and proposed a score test for the mixed logistic regression (MLR). However, this test does not allow an estimation of the variants’ effects. Results We propose two computationally efficient methods to estimate the variants’ effects. Their properties are evaluated on two simulation sets, and compared with other methods (MLM, logistic regression). MLR performs the best in all circumstances. The variants’ effects are well evaluated by our methods, with a moderate bias when the effect sizes are large. Additionally, we propose a stratified QQ-plot, enhancing the diagnosis of p-value inflation or deflation when population strata are not clearly identified in the sample. Availability All methods are implemented in the R package milorGWAS available at https://github.com/genostats/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.

RAISS: robust and accurate imputation from summary statistics

Bioinformatics ◽

10.1093/bioinformatics/btz466 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4837-4839 ◽

Cited By ~ 1

Author(s):

Hanna Julienne ◽

Huwenbo Shi ◽

Bogdan Pasaniuc ◽

Hugues Aschard

Keyword(s):

Effect Size ◽

Association Studies ◽

Real Data ◽

Supplementary Information ◽

P Value ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Small Effect Size ◽

Python Package

Abstract Motivation Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves the existing imputation methods and reaches a precision suitable for multi-trait analyses. Results We fine-tuned parameters to obtain a very high accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data of 28 GWAS. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWAS in parallel. Availability and implementation The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss, and its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/. Supplementary information Supplementary data are available at Bioinformatics online.

deTS: tissue-specific enrichment analysis to decode tissue specificity

Bioinformatics ◽

10.1093/bioinformatics/btz138 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3842-3845 ◽

Cited By ~ 8

Author(s):

Guangsheng Pei ◽

Yulin Dai ◽

Zhongming Zhao ◽

Peilin Jia

Keyword(s):

Expression Profiles ◽

Association Studies ◽

Gene Expression Profiles ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Tissue Specific ◽

Genome Wide ◽

Specific Regulation

Abstract Motivation Diseases and traits are under dynamic tissue-specific regulation. However, heterogeneous tissues are often collected in biomedical studies, which reduces the power to identify disease-associated variants and gene expression profiles. Results We present deTS, an R package, to conduct tissue-specific enrichment analysis with two built-in reference panels. Statistical methods are developed and implemented for detecting tissue-specific genes and for enrichment tests of different forms of query data. Our applications using multi-trait genome-wide association studies data and cancer expression data showed that deTS could effectively identify the most relevant tissues for each query trait or sample, providing insights for future studies. Availability and implementation https://github.com/bsml320/deTS and CRAN https://cran.r-project.org/web/packages/deTS/ Supplementary information Supplementary data are available at Bioinformatics online.

CASMAP: detection of statistically significant combinations of SNPs in association mapping

Bioinformatics ◽

10.1093/bioinformatics/bty1020 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2680-2682 ◽

Cited By ~ 3

Author(s):

Felipe Llinares-López ◽

Laetitia Papaxanthos ◽

Damian Roqueiro ◽

Dean Bodenham ◽

Karsten Borgwardt

Keyword(s):

Association Mapping ◽

Pattern Mining ◽

Association Studies ◽

R Package ◽

Higher Order ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Epistatic Interactions ◽

Significant Pattern ◽

Genome Wide

Abstract Summary Combinatorial association mapping aims to assess the statistical association of higher-order interactions of genetic markers with a phenotype of interest. This article presents combinatorial association mapping (CASMAP), a software package that leverages recent advances in significant pattern mining to overcome the statistical and computational challenges that have hindered combinatorial association mapping. CASMAP can be used to perform region-based association studies and to detect higher-order epistatic interactions of genetic variants. Most importantly, unlike other existing significant pattern mining-based tools, CASMAP allows for the correction of categorical covariates such as age or gender, making it suitable for genome-wide association studies. Availability and implementation The R and Python packages can be downloaded from our GitHub repository http://github.com/BorgwardtLab/CASMAP. The R package is also available on CRAN. Supplementary information Supplementary data are available at Bioinformatics online.

HiGwas: how to compute longitudinal GWAS data in population designs

Bioinformatics ◽

10.1093/bioinformatics/btaa294 ◽

2020 ◽

Vol 36 (14) ◽

pp. 4222-4224

Author(s):

Zhong Wang ◽

Nating Wang ◽

Zilu Wang ◽

Libo Jiang ◽

Yaqun Wang ◽

...

Keyword(s):

Data Analysis ◽

Complex Traits ◽

Association Studies ◽

Computer Software ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Significance Level

Abstract Summary Genome-wide association studies (GWAS), particularly those designed with thousands and thousands of single-nucleotide polymorphisms (SNPs) (big p) genotyped on tens of thousands of subjects (small n), face the major challenge of p ≫ n. Although the integration of longitudinal information can significantly enhance a GWAS’s power to comprehend the genetic architecture of complex traits and diseases, an additional challenge is generated by an autocorrelative process. We have developed several statistical models for addressing these two challenges by implementing dimension reduction methods and longitudinal data analysis. To make these models computationally accessible to applied geneticists, we wrote an R package, HiGwas, designed to analyze longitudinal GWAS datasets. Functions in the package encompass single SNP analyses, significance-level adjustment, preconditioning and model selection for a high-dimensional set of SNPs. HiGwas provides the estimates of genetic parameters and the confidence intervals of these estimates. We demonstrate the features of HiGwas through real data analysis and a vignette document in the package. Availability and implementation https://github.com/wzhy2000/higwas. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Abstract Summary We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information Supplementary data are available at Bioinformatics online.

Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data

Bioinformatics ◽

10.1093/bioinformatics/btz333 ◽

2019 ◽

Vol 35 (14) ◽

pp. i427-i435 ◽

Cited By ~ 3

Author(s):

Héctor Climente-González ◽

Chloé-Agathe Azencott ◽

Samuel Kaski ◽

Makoto Yamada

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Association Studies ◽

Real Data ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Model Free ◽

Computational Overhead ◽

Single Cell Rna Sequencing ◽

Non Linear

Abstract Motivation Finding non-linear relationships between biomolecules and a biological outcome is computationally expensive and statistically challenging. Existing methods have important drawbacks, including among others lack of parsimony, non-convexity and computational overhead. Here we propose block HSIC Lasso, a non-linear feature selector that does not present the previous drawbacks. Results We compare block HSIC Lasso to other state-of-the-art feature selection techniques in both synthetic and real data, including experiments over three common types of genomic data: gene-expression microarrays, single-cell RNA sequencing and genome-wide association studies. In all cases, we observe that features selected by block HSIC Lasso retain more information about the underlying biology than those selected by other techniques. As a proof of concept, we applied block HSIC Lasso to a single-cell RNA sequencing experiment on mouse hippocampus. We discovered that many genes linked in the past to brain development and function are involved in the biological differences between the types of neurons. Availability and implementation Block HSIC Lasso is implemented in the Python 2/3 package pyHSICLasso, available on PyPI. Source code is available on GitHub (https://github.com/riken-aip/pyHSICLasso). Supplementary information Supplementary data are available at Bioinformatics online.
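To give a feel for the dependence measure underlying this method, the sketch below computes the standard biased empirical Hilbert-Schmidt Independence Criterion (HSIC) between two one-dimensional samples in plain Python. It is not the pyHSICLasso API and it omits the block and Lasso parts of the method entirely; the Gaussian kernel, the bandwidth `sigma=1.0` and all function names are assumptions made for illustration.

```python
from math import exp


def gaussian_kernel_matrix(values, sigma=1.0):
    """Gaussian kernel matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2))."""
    return [[exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]


def center(kernel):
    """Double-center a symmetric kernel matrix (H K H with H = I - 11^T / n)."""
    n = len(kernel)
    row_means = [sum(row) / n for row in kernel]
    grand_mean = sum(row_means) / n
    return [[kernel[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    k_c = center(gaussian_kernel_matrix(x, sigma))
    l_c = center(gaussian_kernel_matrix(y, sigma))
    total = sum(k_c[i][j] * l_c[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


# Toy data: y = x^2 has zero Pearson correlation with x but is fully dependent.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_quadratic = [v ** 2 for v in x]
y_constant = [1.0] * len(x)

dependent_score = hsic(x, y_quadratic)
independent_score = hsic(x, y_constant)
```

On this toy input the quadratic outcome receives a clearly positive score while the constant outcome scores zero, which is the kind of non-linear signal a kernel-based feature selector can rank features by.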
