MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations

Bioinformatics ◽

10.1093/bioinformatics/btaa957 ◽

2020 ◽

Author(s):

Sisheng Liu ◽

Jinpeng Liu ◽

Yanqi Xie ◽

Tingting Zhai ◽

Eugene W Hinderer ◽

...

Keyword(s):

Mutation Rate ◽

De Novo ◽

R Package ◽

Supplementary Information ◽

Driver Mutations ◽

Mutual Exclusivity ◽

Statistical Framework ◽

Gene Sets ◽

Genome Wide ◽

Background Mutation Rate

Abstract Motivation Cancer somatic driver mutations associated with genes within a pathway often show a mutually exclusive pattern across a cohort of patients. This mutually exclusive mutational signal has been frequently used to distinguish driver from passenger mutations and to investigate relationships among driver mutations. Current methods for de novo discovery of mutually exclusive mutational patterns are limited because the heterogeneity in background mutation rate can confound mutational patterns, and the presence of highly mutated genes can lead to spurious patterns. In addition, most methods only focus on a limited number of pre-selected genes and are unable to perform genome-wide analysis due to computational inefficiency. Results We introduce a statistical framework, MEScan, for accurate and efficient mutual exclusivity analysis at the genomic scale. Our framework contains a fast and powerful statistical test for mutual exclusivity with adjustment of the background mutation rate and impact of highly mutated genes, and a multi-step procedure for genome-wide screening with the control of false discovery rate. We demonstrate that MEScan more accurately identifies mutually exclusive gene sets than existing methods and is at least two orders of magnitude faster than most methods. By applying MEScan to data from four different cancer types and pan-cancer, we have identified several biologically meaningful mutually exclusive gene sets. Availability and implementation MEScan is available as an R package at https://github.com/MarkeyBBSRF/MEScan. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Connecting mathematical models to genomes: joint estimation of model parameters and genome-wide marker effects on these parameters

Bioinformatics ◽

10.1093/bioinformatics/btaa129 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3169-3176 ◽

Cited By ~ 1

Author(s):

Akio Onogi

Keyword(s):

Mathematical Models ◽

R Package ◽

Joint Estimation ◽

Supplementary Information ◽

Joint Analysis ◽

Accurate Estimation ◽

Model Parameters ◽

Statistical Framework ◽

Genome Wide ◽

Estimation Of Model Parameters

Abstract Motivation Parameters of mathematical models used in biology may be genotype-specific and regarded as new traits. Therefore, an accurate estimation of these parameters and the association mapping on the estimated parameters can lead to important findings regarding the genetic architecture of biological processes. In this study, a statistical framework for a joint analysis (JA) of model parameters and genome-wide marker effects on these parameters was proposed and evaluated. Results In the simulation analyses based on different types of mathematical models, the JA inferred the model parameters and identified the responsible genomic regions more accurately than the independent analysis (IA). The JA of real plant data provided interesting insights into photosensitivity, which were uncovered by the IA. Availability and implementation The statistical framework is provided by the R package GenomeBasedModel available at https://github.com/Onogi/GenomeBasedModel. All R and C++ scripts used in this study are also available at the site. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

TFEA.ChIP: a tool kit for transcription factor binding site enrichment analysis capitalizing on ChIP-seq datasets

Bioinformatics ◽

10.1093/bioinformatics/btz573 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5339-5340 ◽

Cited By ~ 8

Author(s):

Laura Puente-Santamaria ◽

Wyeth W Wasserman ◽

Luis del Peso

Keyword(s):

Genomic Analysis ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Web Based ◽

Factor Binding Site ◽

Gene Sets ◽

Transcription Regulators ◽

Computational Identification ◽

On Chip

Abstract Summary The computational identification of the transcription factors (TFs) [more generally, transcription regulators, (TR)] responsible for the co-regulation of a specific set of genes is a common problem found in genomic analysis. Herein, we describe TFEA.ChIP, a tool that makes use of ChIP-seq datasets to estimate and visualize TR enrichment in gene lists representing transcriptional profiles. We validated TFEA.ChIP using a wide variety of gene sets representing signatures of genetic and chemical perturbations as input and found that the relevant TR was correctly identified in 126 of a total of 174 analyzed. Comparison with other TR enrichment tools demonstrates that TFEA.ChIP is an highly customizable package with an outstanding performance. Availability and implementation TFEA.ChIP is implemented as an R package available at Bioconductor https://www.bioconductor.org/packages/devel/bioc/html/TFEA.ChIP.html and github https://github.com/LauraPS1/TFEA.ChIP_downloads. A web-based GUI to the package is also available at https://www.iib.uam.es/TFEA.ChIP/ Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract Summary Increasing sample size is not the only strategy to improve discovery in Genome Wide Association Studies (GWASs) and we propose here an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects by leveraging GWASs of related risk factors and their causal effect estimates on the focal trait using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes Factors, posterior and direct effects. The approach not only increases power, but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation bGWAS package is freely available under a GPL-2 License, and can be accessed, alongside with user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

BICORN: An R package for integrative inference of de novo cis-regulatory modules

Scientific Reports ◽

10.1038/s41598-020-63043-2 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Xi Chen ◽

Jinghua Gu ◽

Andrew F. Neuwald ◽

Leena Hilakivi-Clarke ◽

Robert Clarke ◽

...

Keyword(s):

Gene Transcription ◽

Target Genes ◽

De Novo ◽

R Package ◽

Model Parameters ◽

Expression Data ◽

Regulatory Modules ◽

Genome Wide ◽

Context Specific ◽

Tf Gene

Abstract Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites, based on which cis-regulatory modules (CRMs) can be inferred. CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.

Download Full-text

Multi-SNP mediation intersection-union test

Bioinformatics ◽

10.1093/bioinformatics/btz285 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4724-4729 ◽

Cited By ~ 4

Author(s):

Wujuan Zhong ◽

Cassandra N Spracklen ◽

Karen L Mohlke ◽

Xiaojing Zheng ◽

Jason Fine ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Alternative Methods ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mediation Effects ◽

Coding Regions ◽

Genome Wide ◽

Plasma Adiponectin Level ◽

Intersection Union Test

Abstract Summary Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, with the vast majority falling in non-coding regions resulting in no eventual protein products, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any methods, for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose multi-SNP mediation intersection-union test (SMUT) to fill in this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial, up to 92%, power gains over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

De novo mutations across 1,465 diverse genomes reveal novel mutational insights and reductions in the Amish founder population

10.1101/553214 ◽

2019 ◽

Author(s):

Michael D. Kessler ◽

Douglas P. Loesch ◽

James A. Perry ◽

Nancy L. Heard-Costa ◽

Brian E. Cade ◽

...

Keyword(s):

Genetic Variation ◽

Mutation Rate ◽

De Novo ◽

Population Level ◽

Mutation Rates ◽

Founder Population ◽

Single Nucleotide ◽

Genome Wide ◽

Nucleotide Mutation ◽

Rare Genetic Variation

Abstractde novo Mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) program, we directly estimate and analyze DNM counts, rates, and spectra from 1,465 trios across an array of diverse human populations. Using the resulting call set of 86,865 single nucleotide DNMs, we find a significant positive correlation between local recombination rate and local DNM rate, which together can explain up to 35.5% of the genome-wide variation in population level rare genetic variation from 41K unrelated TOPMed samples. While genome-wide heterozygosity does correlate weakly with DNM count, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, interestingly, we do find significantly fewer DNMs in Amish individuals compared with other Europeans, even after accounting for parental age and sequencing center. Specifically, we find significant reductions in the number of T→C mutations in the Amish, which seems to underpin their overall reduction in DNMs. Finally, we calculate near-zero estimates of narrow sense heritability (h2), which suggest that variation in DNM rate is significantly shaped by non-additive genetic effects and/or the environment, and that a less mutagenic environment may be responsible for the reduced DNM rate in the Amish.SignificanceHere we provide one of the largest and most diverse human de novo mutation (DNM) call sets to date, and use it to quantify the genome-wide relationship between local mutation rate and population-level rare genetic variation. While we demonstrate that the human single nucleotide mutation rate is similar across numerous human ancestries and populations, we also discover a reduced mutation rate in the Amish founder population, which shows that mutation rates can shift rapidly. Finally, we find that variation in mutation rates is not heritable, which suggests that the environment may influence mutation rates more significantly than previously realized.

Download Full-text

gwasurvivr: an R package for genome wide survival analysis

10.1101/326033 ◽

2018 ◽

Author(s):

Abbas A Rizvi ◽

Ezgi Karaesmen ◽

Martin Morgan ◽

Leah Preus ◽

Junke Wang ◽

...

Keyword(s):

Survival Analysis ◽

Cox Model ◽

R Package ◽

Supplementary Information ◽

Parameter Estimates ◽

Survival Analyses ◽

Link Type ◽

Genome Wide ◽

Size Number ◽

Simple Interface

ABSTRACTSummaryTo address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting genome wide survival analyses using VCF (outputted from Michigan or Sanger imputation servers), IMPUTE2 or PLINK files. To decrease the number of iterations needed for convergence when optimizing the parameter estimates in the Cox model we modified the R package survival; covariates in the model are first fit without the SNP, and those parameter estimates are used as initial points. We benchmarked gwasurvivr with other software capable of conducting genome wide survival analysis (genipe, SurvivalGWAS_SV, and GWASTools). gwasurvivr is significantly faster and shows better scalability as sample size, number of SNPs and number of covariates increases.Availability and implementationgwasurvivr, including source code, documentation, and vignette are available at: http://bioconductor.org/packages/gwasurvivrContactAbbas Rizvi, [email protected]; Lara E Sucheston-Campbell, [email protected] information: Supplementary data are available at https://github.com/suchestoncampbelllab/gwasurvivr_manuscript

Download Full-text

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz930 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2626-2627

Author(s):

Corentin Molitor ◽

Matt Brember ◽

Fady Mohareb

Keyword(s):

Association Studies ◽

Genetic Disorders ◽

R Package ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

High Quality Information

Abstract Summary Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and rank variants related to diseases and genetic disorders, using a collection public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian genome 5, the Genotype-Tissue Expression and the Genome Wide Association Studies catalog). This package is also capable of annotating these variants to identify the most impactful ones. We expect that this tool will benefit the research of variant-disease relationships. Availability and implementation VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Destin: toolkit for single-cell analysis of chromatin accessibility

Bioinformatics ◽

10.1093/bioinformatics/btz141 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3818-3820 ◽

Cited By ~ 10

Author(s):

Eugene Urrutia ◽

Li Chen ◽

Haibo Zhou ◽

Yuchao Jiang

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

New Technology ◽

R Package ◽

Chromatin Accessibility ◽

Supplementary Information ◽

Cell Type ◽

Statistical Framework ◽

Specific Association ◽

Accessible Chromatin

Abstract Summary Single-cell assay of transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging new technology for the study of gene regulation with single-cell resolution. The data from scATAC-seq are unique—sparse, binary and highly variable even within the same cell type. As such, neither methods developed for bulk ATAC-seq nor single-cell RNA-seq data are appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principle component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic datasets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Compared to existing methods, Destin was shown to outperform across all datasets and platforms. For demonstration, we further applied Destin to 2088 adult mouse forebrain cells and identified cell-type-specific association of previously reported schizophrenia GWAS loci. Availability and implementation Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

bWGR: Bayesian whole-genome regression

Bioinformatics ◽

10.1093/bioinformatics/btz794 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alencar Xavier ◽

William M Muir ◽

Katy M Rainey

Keyword(s):

Bayesian Methods ◽

Expectation Maximization ◽

Complex Traits ◽

Hierarchical Models ◽

R Package ◽

Supplementary Information ◽

Whole Genome ◽

Regression Methods ◽

Genome Wide ◽

User Friendly

AbstractMotivationWhole-genome regressions methods represent a key framework for genome-wide prediction, cross-validation studies and association analysis. The bWGR offers a compendium of Bayesian methods with various priors available, allowing users to predict complex traits with different genetic architectures.ResultsHere we introduce bWGR, an R package that enables users to efficient fit and cross-validate Bayesian and likelihood whole-genome regression methods. It implements a series of methods referred to as the Bayesian alphabet under the traditional Gibbs sampling and optimized expectation-maximization. The package also enables fitting efficient multivariate models and complex hierarchical models. The package is user-friendly and computational efficient.Availability and implementationbWGR is an R package available in the CRAN repository. It can be installed in R by typing: install.packages(‘bWGR’).Supplementary informationSupplementary data are available at Bioinformatics online.

Download Full-text