Connecting mathematical models to genomes: joint estimation of model parameters and genome-wide marker effects on these parameters

Akio Onogi

doi:10.1093/bioinformatics/btaa129

Bioinformatics ◽

10.1093/bioinformatics/btaa129 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3169-3176 ◽

Cited By ~ 1

Author(s):

Akio Onogi

Keyword(s):

Mathematical Models ◽

R Package ◽

Joint Estimation ◽

Supplementary Information ◽

Joint Analysis ◽

Accurate Estimation ◽

Model Parameters ◽

Statistical Framework ◽

Genome Wide ◽

Estimation Of Model Parameters

Abstract. Motivation: Parameters of mathematical models used in biology may be genotype-specific and can be regarded as new traits. Therefore, accurate estimation of these parameters and association mapping on the estimated parameters can lead to important findings regarding the genetic architecture of biological processes. In this study, a statistical framework for a joint analysis (JA) of model parameters and genome-wide marker effects on these parameters was proposed and evaluated. Results: In simulation analyses based on different types of mathematical models, the JA inferred the model parameters and identified the responsible genomic regions more accurately than the independent analysis (IA). The JA of real plant data provided interesting insights into photosensitivity that were not uncovered by the IA. Availability and implementation: The statistical framework is provided by the R package GenomeBasedModel, available at https://github.com/Onogi/GenomeBasedModel. All R and C++ scripts used in this study are also available at this site. Supplementary information: Supplementary data are available at Bioinformatics online.
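The independent analysis (IA) used above as a baseline can be sketched in two steps: estimate each individual's model parameter on its own, then regress those estimates on markers. The sketch below is a hypothetical illustration, not code from GenomeBasedModel: it assumes a logistic growth model with a known capacity K, a growth rate r that depends additively on 0/1/2 marker genotypes, and ridge regression for the second step; all function names and values are made up. In the JA, by contrast, model parameters and marker effects are estimated together rather than in sequence.

```python
import numpy as np

def simulate(n=200, p=20, causal=3, effect=0.05, K=100.0, y0=5.0, sigma=0.05, seed=0):
    """Hypothetical data: logistic growth curves whose rate r_i depends on markers."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, p)).astype(float)  # genotypes coded 0/1/2
    beta = np.zeros(p)
    beta[causal] = effect
    r = 0.3 + X @ beta                                  # genotype-specific model parameter
    t = np.arange(10.0)
    # On the logit scale a logistic curve is linear in time: log(y/(K-y)) = logit0 + r*t
    logit0 = np.log(y0 / (K - y0))
    z = logit0 + np.outer(r, t) + rng.normal(0.0, sigma, size=(n, t.size))
    y = K / (1.0 + np.exp(-z))
    return X, y, t, r, K

def estimate_rates(y, t, K):
    """IA step 1: fit each individual's curve on its own (OLS slope of logit(y/K) on t)."""
    z = np.log(y / (K - y))
    tc = t - t.mean()
    return (z - z.mean(axis=1, keepdims=True)) @ tc / (tc @ tc)

def ridge_markers(X, r_hat, lam=1.0):
    """IA step 2: regress the estimated parameters on genome-wide markers (ridge)."""
    Xc = X - X.mean(axis=0)
    rc = r_hat - r_hat.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ rc)

X, y, t, r_true, K = simulate()
r_hat = estimate_rates(y, t, K)
beta_hat = ridge_markers(X, r_hat)
print("largest estimated marker effect at index", int(np.argmax(np.abs(beta_hat))))
```

Because a logistic curve is linear on the logit scale, the per-individual fit reduces to a least-squares slope, which keeps the sketch short; with noisier data or nonlinear models, errors from step 1 carry into step 2, which is the setting in which the abstract reports the JA to be more accurate.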

MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations

Bioinformatics ◽

10.1093/bioinformatics/btaa957 ◽

2020 ◽

Author(s):

Sisheng Liu ◽

Jinpeng Liu ◽

Yanqi Xie ◽

Tingting Zhai ◽

Eugene W Hinderer ◽

...

Keyword(s):

Mutation Rate ◽

De Novo ◽

R Package ◽

Supplementary Information ◽

Driver Mutations ◽

Mutual Exclusivity ◽

Statistical Framework ◽

Gene Sets ◽

Genome Wide ◽

Background Mutation Rate

Abstract. Motivation: Cancer somatic driver mutations associated with genes within a pathway often show a mutually exclusive pattern across a cohort of patients. This mutually exclusive mutational signal has been frequently used to distinguish driver from passenger mutations and to investigate relationships among driver mutations. Current methods for de novo discovery of mutually exclusive mutational patterns are limited because heterogeneity in the background mutation rate can confound mutational patterns, and the presence of highly mutated genes can lead to spurious patterns. In addition, most methods focus only on a limited number of pre-selected genes and cannot perform genome-wide analysis because of computational inefficiency. Results: We introduce a statistical framework, MEScan, for accurate and efficient mutual exclusivity analysis at the genomic scale. Our framework contains a fast and powerful statistical test for mutual exclusivity that adjusts for the background mutation rate and the impact of highly mutated genes, and a multi-step procedure for genome-wide screening with false discovery rate control. We demonstrate that MEScan identifies mutually exclusive gene sets more accurately than existing methods and is at least two orders of magnitude faster than most methods. By applying MEScan to data from four different cancer types and to pan-cancer data, we have identified several biologically meaningful mutually exclusive gene sets. Availability and implementation: MEScan is available as an R package at https://github.com/MarkeyBBSRF/MEScan. Supplementary information: Supplementary data are available at Bioinformatics online.
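To make the idea of a mutual exclusivity signal concrete, here is a deliberately naive permutation test, assumed for illustration only and not MEScan's test. The statistic counts samples with exactly one mutation in a candidate gene set; permuting each gene's column keeps gene mutation frequencies fixed but, unlike MEScan, does not adjust for heterogeneous background mutation rates across samples. The function names and the toy cohort are hypothetical.

```python
import random

def exclusive_count(matrix, genes):
    """Number of samples (rows) carrying a mutation in exactly one of the given genes."""
    return sum(1 for row in matrix if sum(row[g] for g in genes) == 1)

def permutation_pvalue(matrix, genes, n_perm=2000, seed=1):
    """Naive one-sided permutation test for mutual exclusivity.

    Each gene's column is shuffled independently, which keeps every gene's
    mutation frequency but ignores per-sample background mutation rates,
    the confounder that MEScan's test is designed to adjust for.
    """
    rng = random.Random(seed)
    observed = exclusive_count(matrix, genes)
    columns = {g: [row[g] for row in matrix] for g in genes}
    hits = 0
    for _ in range(n_perm):
        shuffled = {}
        for g, col in columns.items():
            col = col[:]
            rng.shuffle(col)
            shuffled[g] = col
        # Rebuild rows as {gene: status} dicts so exclusive_count can index them by gene
        perm_rows = [{g: shuffled[g][i] for g in genes} for i in range(len(matrix))]
        if exclusive_count(perm_rows, genes) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy cohort: 40 samples x 2 genes
exclusive = [[1, 0]] * 10 + [[0, 1]] * 10 + [[0, 0]] * 20
cooccurring = [[1, 1]] * 10 + [[0, 0]] * 30
p_exclusive = permutation_pvalue(exclusive, genes=[0, 1])
p_cooccur = permutation_pvalue(cooccurring, genes=[0, 1])
print(p_exclusive, p_cooccur)
```

A highly mutated sample inflates co-occurrence and a hypermutated gene inflates coverage; neither is modelled here, which is why a test like this can produce the spurious patterns the abstract warns about.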

Modular Dynamic Biomolecular Modelling: The Unification of Stoichiometry, Thermodynamics, Kinetics and Data

10.1101/2021.03.24.436792 ◽

2021 ◽

Author(s):

Peter J. Gawthrop ◽

Michael Pan ◽

Edmund J. Crampin

Keyword(s):

Dynamic Models ◽

System Development ◽

Kinetic Modelling ◽

Simulation Models ◽

Bond Graph ◽

Model Parameters ◽

Biomolecular Systems ◽

Genome Wide ◽

Estimation Of Model Parameters ◽

Genome Scale

Abstract. Renewed interest in dynamic simulation models of biomolecular systems has arisen from advances in genome-wide measurement and from applications of such models in biotechnology and synthetic biology. In particular, genome-scale models of cellular metabolism that go beyond the steady state are required to represent transient and dynamic regulatory properties of the system. Developing such whole-cell models requires new modelling approaches. Here we propose the energy-based bond graph methodology, which integrates stoichiometric models with thermodynamic principles and kinetic modelling. We demonstrate how the bond graph approach intrinsically enforces thermodynamic constraints, provides a modular approach to modelling, and gives a basis for estimating model parameters, leading to dynamic models of biomolecular systems. The approach is illustrated using a well-established stoichiometric model of E. coli and published experimental data.
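A single reaction A <=> B gives a minimal sketch of how an energy-based formulation builds thermodynamic consistency in. It assumes the usual bond-graph choices of a chemical potential mu = RT ln(K x) for each species and Marcelin-de Donder kinetics for the reaction; the constants K_a, K_b and kappa are illustrative values, not parameters from the paper or from the authors' software.

```python
import math

R = 8.314   # gas constant, J/(mol K)
T = 310.0   # assumed temperature, K

def potential(K, x):
    """Chemical potential of a species (bond-graph C component): mu = RT ln(K x)."""
    return R * T * math.log(K * x)

def reaction_flow(kappa, mu_forward, mu_reverse):
    """Flow through a reaction (Re component) with Marcelin-de Donder kinetics."""
    return kappa * (math.exp(mu_forward / (R * T)) - math.exp(mu_reverse / (R * T)))

def simulate(K_a=2.0, K_b=0.5, kappa=0.1, x_a=1.0, x_b=0.01, dt=0.01, steps=20000):
    """Forward-Euler integration of the single reaction A <=> B."""
    for _ in range(steps):
        v = reaction_flow(kappa, potential(K_a, x_a), potential(K_b, x_b))
        x_a -= v * dt   # A is consumed at rate v
        x_b += v * dt   # B is produced at rate v
    return x_a, x_b

x_a, x_b = simulate()
print(x_b / x_a)  # approaches K_a / K_b (= 4 here), whatever kappa is
```

Because the flow vanishes when K_a x_a = K_b x_b, the equilibrium ratio x_b / x_a = K_a / K_b is fixed by the species' thermodynamic constants alone; the rate constant kappa only changes how fast that equilibrium is reached, so kinetic parameters cannot be chosen in a way that violates thermodynamics.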

bGWAS: an R package to perform Bayesian genome wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btaa549 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4374-4376

Author(s):

Ninon Mounier ◽

Zoltán Kutalik

Keyword(s):

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

R Package ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Biological Mechanisms ◽

Genome Wide ◽

Related Risk

Abstract. Summary: Increasing sample size is not the only strategy to improve discovery in genome-wide association studies (GWASs); here we propose an approach that leverages published studies of related traits to improve inference. Our Bayesian GWAS method derives informative prior effects from GWASs of related risk factors and from their causal effect estimates on the focal trait, obtained using multivariable Mendelian randomization. These prior effects are combined with the observed effects to yield Bayes factors, posterior effects and direct effects. The approach not only increases power but also has the potential to dissect direct and indirect biological mechanisms. Availability and implementation: The bGWAS package is freely available under a GPL-2 license and can be accessed, alongside user guides and tutorials, from https://github.com/n-mounier/bGWAS. Supplementary information: Supplementary data are available at Bioinformatics online.
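The combination of prior and observed effects can be illustrated with a normal-normal model on the z-score scale. This is an assumption-laden sketch rather than the bGWAS implementation: the prior mean for a SNP is taken as the sum, over risk factors, of the SNP's effect on each risk factor times that factor's causal effect on the focal trait; the prior variance is fixed by hand; and the Bayes factor compares the marginal density under that prior with a no-effect null. All names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def prior_effect(snp_on_risk_factors, causal_effects):
    """Prior mean for one SNP: sum_k (SNP effect on risk factor k) * (causal effect of k)."""
    return sum(g * a for g, a in zip(snp_on_risk_factors, causal_effects))

def combine(z_obs, prior_mean, prior_var):
    """Normal-normal update of an observed z-score (sampling variance 1) with a prior."""
    # Bayes factor: marginal density under the informative prior vs. a null of no effect
    bf = normal_pdf(z_obs, prior_mean, prior_var + 1.0) / normal_pdf(z_obs, 0.0, 1.0)
    post_var = 1.0 / (1.0 / prior_var + 1.0)
    post_mean = post_var * (prior_mean / prior_var + z_obs)
    direct = z_obs - prior_mean   # part of the observed effect not predicted by the prior
    return bf, post_mean, direct

# Hypothetical SNP: z-scores 3.0 and -1.0 on two risk factors with causal effects 0.5 and 0.2
mu = prior_effect([3.0, -1.0], [0.5, 0.2])
bf, post_mean, direct = combine(z_obs=2.0, prior_mean=mu, prior_var=1.0)
print(round(mu, 3), round(bf, 2), round(post_mean, 3), round(direct, 3))
```

The posterior mean is a precision-weighted average of prior and observation, and the "direct" quantity here is simply the observed effect minus the prior mean, one way to read the direct effects mentioned in the abstract.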

BICORN: An R package for integrative inference of de novo cis-regulatory modules

Scientific Reports ◽

10.1038/s41598-020-63043-2 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Xi Chen ◽

Jinghua Gu ◽

Andrew F. Neuwald ◽

Leena Hilakivi-Clarke ◽

Robert Clarke ◽

...

Keyword(s):

Gene Transcription ◽

Target Genes ◽

De Novo ◽

R Package ◽

Model Parameters ◽

Expression Data ◽

Regulatory Modules ◽

Genome Wide ◽

Context Specific ◽

Tf Gene

Abstract. Genome-wide transcription factor (TF) binding signal analyses reveal co-localization of TF binding sites, based on which cis-regulatory modules (CRMs) can be inferred. CRMs play a key role in understanding the cooperation of multiple TFs under specific conditions. However, the functions of CRMs and their effects on nearby gene transcription are highly dynamic and context-specific and therefore are challenging to characterize. BICORN (Bayesian Inference of COoperative Regulatory Network) builds a hierarchical Bayesian model and infers context-specific CRMs based on TF-gene binding events and gene expression data for a particular cell type. BICORN automatically searches for a list of candidate CRMs based on the input TF bindings at regulatory regions associated with genes of interest. Applying Gibbs sampling, BICORN iteratively estimates model parameters of CRMs, TF activities, and corresponding regulation on gene transcription, which it models as a sparse network of functional CRMs regulating target genes. The BICORN package is implemented in R (version 3.4 or later) and is publicly available on the CRAN server at https://cran.r-project.org/web/packages/BICORN/index.html.
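BICORN's model is too rich to reproduce here, but the Gibbs sampling scheme it relies on can be shown on a toy target: each variable is redrawn in turn from its full conditional distribution given the current values of the others. The example below samples a standard bivariate normal with correlation rho, chosen only because its conditionals are known in closed form; it is a generic illustration with hypothetical function names, not BICORN's sampler.

```python
import random

def gibbs_bivariate_normal(rho, n_iter=20000, burn_in=1000, seed=42):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is univariate normal: x | y ~ N(rho*y, 1 - rho^2), and
    symmetrically for y | x, so the sampler alternates exact conditional draws.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho ** 2) ** 0.5
    x = y = 0.0
    draws = []
    for i in range(n_iter):
        x = rng.gauss(rho * y, sd)   # update x given the current y
        y = rng.gauss(rho * x, sd)   # update y given the new x
        if i >= burn_in:             # discard early draws before the chain settles
            draws.append((x, y))
    return draws

def sample_correlation(draws):
    n = len(draws)
    mx = sum(x for x, _ in draws) / n
    my = sum(y for _, y in draws) / n
    sxy = sum((x - mx) * (y - my) for x, y in draws)
    sxx = sum((x - mx) ** 2 for x, _ in draws)
    syy = sum((y - my) ** 2 for _, y in draws)
    return sxy / (sxx * syy) ** 0.5

draws = gibbs_bivariate_normal(rho=0.8)
print(round(sample_correlation(draws), 3))  # should be close to 0.8
```

In a hierarchical model such as BICORN's, the same loop would cycle over blocks of parameters (for example, module memberships and activities) instead of two scalars.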

Multi-SNP mediation intersection-union test

Bioinformatics ◽

10.1093/bioinformatics/btz285 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4724-4729 ◽

Cited By ~ 4

Author(s):

Wujuan Zhong ◽

Cassandra N Spracklen ◽

Karen L Mohlke ◽

Xiaojing Zheng ◽

Jason Fine ◽

...

Keyword(s):

Association Studies ◽

R Package ◽

Alternative Methods ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Mediation Effects ◽

Coding Regions ◽

Genome Wide ◽

Plasma Adiponectin Level ◽

Intersection Union Test

Abstract. Summary: Tens of thousands of reproducibly identified GWAS (genome-wide association study) variants, the vast majority of which fall in non-coding regions and therefore do not change any protein product, call urgently for mechanistic interpretation. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via a mediator (e.g. the expression of a nearby gene) on a phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, of up to 92%, over alternative methods. In addition, in a real dataset of Finns, SMUT confirmed known mediators of plasma adiponectin level that were missed by many alternative methods. We believe SMUT will become a useful tool for generating mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation: The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information: Supplementary data are available at Bioinformatics online.
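The intersection-union principle behind SMUT can be sketched independently of how the component tests are computed. Mediation requires both a SNP-to-mediator effect and a mediator-to-outcome effect, so the null hypothesis is the union of the two "no effect" hypotheses and the combined p-value is the maximum of the two component p-values. In this sketch the component p-values are assumed to be given; the multi-SNP tests SMUT uses to obtain them are not reproduced, and the function names are hypothetical.

```python
import random

def iut_pvalue(p_snps_to_mediator, p_mediator_to_outcome):
    """Intersection-union test: mediation requires BOTH paths to be non-null.

    The null hypothesis is the union "no SNP->mediator effect OR no
    mediator->outcome effect", so the combined p-value is the larger of the two.
    """
    for p in (p_snps_to_mediator, p_mediator_to_outcome):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    return max(p_snps_to_mediator, p_mediator_to_outcome)

def type_i_error(alpha=0.05, n=100_000, seed=7):
    """Rejection rate when the SNP->mediator path is strong but mediator->outcome is null."""
    rng = random.Random(seed)
    rejections = sum(iut_pvalue(0.0, rng.random()) <= alpha for _ in range(n))
    return rejections / n

print(iut_pvalue(0.001, 0.03))   # 0.03: both paths significant at 0.05
print(iut_pvalue(1e-8, 0.2))     # 0.2: the weaker path decides
print(type_i_error())            # close to the nominal 0.05
```

The simulation checks the worst case for an intersection-union test: when one path is clearly non-null and the other is null, rejecting at max(p) <= alpha still keeps the error rate at about alpha.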

gwasurvivr: an R package for genome wide survival analysis

10.1101/326033 ◽

2018 ◽

Author(s):

Abbas A Rizvi ◽

Ezgi Karaesmen ◽

Martin Morgan ◽

Leah Preus ◽

Junke Wang ◽

...

Keyword(s):

Survival Analysis ◽

Cox Model ◽

R Package ◽

Supplementary Information ◽

Parameter Estimates ◽

Survival Analyses ◽

Link Type ◽

Genome Wide ◽

Size Number ◽

Simple Interface

Abstract. Summary: To address the limited software options for performing survival analyses with millions of SNPs, we developed gwasurvivr, an R/Bioconductor package with a simple interface for conducting genome-wide survival analyses using VCF files (output by the Michigan or Sanger imputation servers), IMPUTE2 files or PLINK files. To decrease the number of iterations needed for convergence when optimizing the parameter estimates in the Cox model, we modified the R package survival: the covariates in the model are first fit without the SNP, and those parameter estimates are used as initial points. We benchmarked gwasurvivr against other software capable of conducting genome-wide survival analysis (genipe, SurvivalGWAS_SV and GWASTools). gwasurvivr is significantly faster and scales better as sample size, number of SNPs and number of covariates increase. Availability and implementation: gwasurvivr, including source code, documentation and vignette, is available at http://bioconductor.org/packages/gwasurvivr. Contact: Abbas Rizvi; Lara E Sucheston-Campbell. Supplementary information: Supplementary data are available at https://github.com/suchestoncampbelllab/gwasurvivr_manuscript
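The warm-start idea described above (fit the covariates first, then start the per-SNP fit from those estimates) can be sketched with a small Newton-Raphson solver for the Cox partial likelihood. This is a NumPy illustration under stated assumptions, not the modified survival code in gwasurvivr: it assumes no tied event times, uses simulated data with one covariate and one SNP, and its function and variable names are hypothetical.

```python
import numpy as np

def cox_newton(X, time, event, beta0, tol=1e-9, max_iter=50):
    """Newton-Raphson for the Cox partial likelihood (assumes no tied event times)."""
    order = np.argsort(-time)          # decreasing time: each risk set is a prefix
    X, event = X[order], event[order].astype(bool)
    beta = np.array(beta0, dtype=float)
    for it in range(1, max_iter + 1):
        w = np.exp(X @ beta)
        S0 = np.cumsum(w)                                   # sum of w over the risk set
        S1 = np.cumsum(w[:, None] * X, axis=0)              # sum of w * x
        S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        xbar = S1[event] / S0[event][:, None]               # risk-set weighted means
        score = (X[event] - xbar).sum(axis=0)
        info = (S2[event] / S0[event][:, None, None]
                - xbar[:, :, None] * xbar[:, None, :]).sum(axis=0)
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            return beta, it
    raise RuntimeError("Newton-Raphson did not converge")

# Hypothetical data: one covariate (age) and one SNP dosage
rng = np.random.default_rng(3)
n = 300
age = rng.normal(size=n)
snp = rng.integers(0, 3, size=n).astype(float)
X = np.column_stack([age, snp])
t_event = rng.exponential(1.0 / np.exp(0.5 * age + 0.3 * snp))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens

beta_cold, it_cold = cox_newton(X, time, event, np.zeros(2))
beta_cov, _ = cox_newton(X[:, :1], time, event, np.zeros(1))   # covariates only
beta_warm, it_warm = cox_newton(X, time, event, np.r_[beta_cov, 0.0])
print(it_cold, it_warm, beta_warm)
```

Both starts reach the same estimates; the warm start simply begins closer to them. In a genome-wide scan the covariate-only fit is computed once and reused as the starting point for every SNP, which is where any saving in iterations would accumulate.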

VarGen: an R package for disease-associated variant discovery and annotation

Bioinformatics ◽

10.1093/bioinformatics/btz930 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2626-2627

Author(s):

Corentin Molitor ◽

Matt Brember ◽

Fady Mohareb

Keyword(s):

Association Studies ◽

Genetic Disorders ◽

R Package ◽

Tissue Expression ◽

Mendelian Inheritance ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Variant Discovery ◽

Genome Wide ◽

High Quality Information

Abstract. Summary: Over the past decade, there has been an exponential increase in the amount of disease-related genomic data available in public databases. However, this high-quality information is spread across independent sources, and researchers often need to access these separately. Hence, there is a growing need for tools that gather and compile this information in an easy and automated manner. Here, we present ‘VarGen’, an easy-to-use, customizable R package that fetches, annotates and ranks variants related to diseases and genetic disorders, using a collection of public databases (viz. Online Mendelian Inheritance in Man, the Functional Annotation of the Mammalian Genome 5, the Genotype-Tissue Expression project and the Genome Wide Association Studies catalog). This package can also annotate these variants to identify the most impactful ones. We expect that this tool will benefit research on variant-disease relationships. Availability and implementation: VarGen is open-source and freely available via GitHub: https://github.com/MCorentin/VarGen. The software is implemented as an R package and is supported on Linux, MacOS and Windows. Supplementary information: Supplementary data are available at Bioinformatics online.

Destin: toolkit for single-cell analysis of chromatin accessibility

Bioinformatics ◽

10.1093/bioinformatics/btz141 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3818-3820 ◽

Cited By ~ 10

Author(s):

Eugene Urrutia ◽

Li Chen ◽

Haibo Zhou ◽

Yuchao Jiang

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

New Technology ◽

R Package ◽

Chromatin Accessibility ◽

Supplementary Information ◽

Cell Type ◽

Statistical Framework ◽

Specific Association ◽

Accessible Chromatin

Abstract. Summary: Single-cell assay for transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging technology for studying gene regulation at single-cell resolution. The data from scATAC-seq are unique: sparse, binary and highly variable even within the same cell type. As such, methods developed for bulk ATAC-seq or single-cell RNA-seq data are not appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principal component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic datasets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Destin outperformed existing methods across all datasets and platforms. For demonstration, we further applied Destin to 2088 adult mouse forebrain cells and identified cell-type-specific associations of previously reported schizophrenia GWAS loci. Availability and implementation: The Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin. Supplementary information: Supplementary data are available at Bioinformatics online.
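Weighted PCA of the kind described here can be sketched by scaling each region (column) of a binary cell-by-region matrix by a weight before centering and taking a singular value decomposition. In Destin the weights come from genomic annotations and regulomic data and are tuned by likelihood; in this sketch they are fixed by hand, the data are simulated, the function name is hypothetical, and no clustering step is included, so treat it as an illustration of the weighting idea only.

```python
import numpy as np

def weighted_pca(counts, region_weights, n_components=2):
    """PCA of a binary cell-by-region matrix after weighting each region (column)."""
    Xw = counts * region_weights                 # broadcast weights across columns
    Xc = Xw - Xw.mean(axis=0)                    # center each region
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]   # cell scores on the leading PCs

# Hypothetical data: 50 cells of type A open regions 0-19, 50 cells of type B open 20-39
rng = np.random.default_rng(5)
n_regions = 200
prob = np.full((100, n_regions), 0.02)
prob[:50, :20] = 0.8
prob[50:, 20:40] = 0.8
counts = (rng.random(prob.shape) < prob).astype(float)

# Hypothetical weights: up-weight regions 0-39, e.g. because an annotation marks them
weights = np.ones(n_regions)
weights[:40] = 2.0
scores = weighted_pca(counts, weights)
pc1 = scores[:, 0]
print(np.sign(pc1[:50]).mean(), np.sign(pc1[50:]).mean())   # opposite signs, one per cell type
```

Up-weighting informative regions increases their share of the variance, so the leading components are driven more by those regions than by the many sparse background regions.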

Joint Estimation of Model Parameters and Outlier Effects in Time Series

Journal of the American Statistical Association ◽

10.2307/2290724 ◽

1993 ◽

Vol 88 (421) ◽

pp. 284 ◽

Cited By ~ 129

Author(s):

Chung Chen ◽

Lon-Mu Liu

Keyword(s):

Time Series ◽

Joint Estimation ◽

Model Parameters ◽

Estimation Of Model Parameters

bWGR: Bayesian whole-genome regression

Bioinformatics ◽

10.1093/bioinformatics/btz794 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alencar Xavier ◽

William M Muir ◽

Katy M Rainey

Keyword(s):

Bayesian Methods ◽

Expectation Maximization ◽

Complex Traits ◽

Hierarchical Models ◽

R Package ◽

Supplementary Information ◽

Whole Genome ◽

Regression Methods ◽

Genome Wide ◽

User Friendly

Abstract. Motivation: Whole-genome regression methods represent a key framework for genome-wide prediction, cross-validation studies and association analysis. bWGR offers a compendium of Bayesian methods with various priors, allowing users to predict complex traits with different genetic architectures. Results: Here we introduce bWGR, an R package that enables users to efficiently fit and cross-validate Bayesian and likelihood whole-genome regression methods. It implements a series of methods referred to as the Bayesian alphabet under both traditional Gibbs sampling and optimized expectation-maximization. The package also enables fitting efficient multivariate models and complex hierarchical models. The package is user-friendly and computationally efficient. Availability and implementation: bWGR is an R package available in the CRAN repository. It can be installed in R by typing: install.packages('bWGR'). Supplementary information: Supplementary data are available at Bioinformatics online.
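As a minimal example of whole-genome regression, the posterior mean of marker effects under a Gaussian (ridge) prior with a known variance ratio lam solves (X'X + lam I) beta = X'y for centered X and y. The Python sketch below solves that system by Gauss-Seidel, updating one marker at a time against a running residual, and checks the result against a direct solve. It is not bWGR's code or API (bWGR is used from R), and the data, variance ratio and function name are hypothetical.

```python
import numpy as np

def ridge_gauss_seidel(X, y, lam, max_sweeps=500, tol=1e-10):
    """Marker effects of a ridge (Bayesian-ridge posterior mean) whole-genome regression.

    Solves (Xc'Xc + lam*I) beta = Xc'(y - mean(y)) one marker at a time
    (Gauss-Seidel), keeping a residual vector up to date instead of forming Xc'Xc.
    """
    Xc = X - X.mean(axis=0)
    e = y - y.mean()                      # residuals with every marker effect at zero
    beta = np.zeros(X.shape[1])
    xx = (Xc ** 2).sum(axis=0)            # diagonal of Xc'Xc
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(X.shape[1]):
            old = beta[j]
            beta[j] = (Xc[:, j] @ e + xx[j] * old) / (xx[j] + lam)
            e -= Xc[:, j] * (beta[j] - old)   # keep residuals consistent with beta
            max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta

# Hypothetical data: 200 individuals, 50 markers, 5 of them causal
rng = np.random.default_rng(11)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
beta_true = np.zeros(50)
beta_true[:5] = 0.5
y = 1.0 + X @ beta_true + rng.normal(size=200)
lam = 10.0   # assumed variance ratio sigma_e^2 / sigma_beta^2

beta_gs = ridge_gauss_seidel(X, y, lam)
Xc = X - X.mean(axis=0)
beta_direct = np.linalg.solve(Xc.T @ Xc + lam * np.eye(50), Xc.T @ (y - y.mean()))
print(np.max(np.abs(beta_gs - beta_direct)))
```

Updating against a residual avoids forming the p-by-p matrix X'X, which matters when markers far outnumber individuals; fully Bayesian variants (Gibbs or EM) would additionally update the variance components instead of fixing lam.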
