blockcluster: An R Package for Model-Based Co-Clustering

Abstract: The study of diversification largely relies on model-based approaches, estimating rates of speciation and extinction from phylogenetic trees. While a plethora of different models exist – all with different features, strengths and weaknesses – there is increasing concern about the reliability of the inference we gain from them. Apart from simply finding the model with the best fit for the data, we should find ways to assess a model’s suitability to describe the data in an absolute sense. The R package BoskR implements a simple way of judging a model’s adequacy for a given phylogeny using metrics for tree shape, assuming that a model is inadequate for a phylogeny if it produces trees that are consistently dissimilar in shape from the tree that should be analyzed. Tree shape is assessed via metrics derived from the tree’s modified graph Laplacian spectrum, as provided by RPANDA. We exemplify the use of the method using simulated and empirical example phylogenies. BoskR was mostly able to correctly distinguish trees simulated under clearly different models and revealed that not all models are adequate for the empirical example trees. We believe the metrics of tree shape to be an intuitive and relevant means of assessing diversification model adequacy. Furthermore, by implementing the approach in an openly available R package, we enable and encourage researchers to adopt adequacy testing into their workflow.

MIMOSA2: A metabolic network-based tool for inferring mechanism-supported relationships in microbiome-metabolome data

10.1101/2021.09.14.459910 ◽

2021 ◽

Author(s):

Cecilia Noecker ◽

Alexander Eng ◽

Elhanan Borenstein

Keyword(s):

Web Application ◽

Ground Truth ◽

Metabolic Model ◽

R Package ◽

Data Types ◽

Metabolomics Data ◽

Model Based ◽

Technological Developments ◽

Reference Databases ◽

Reference Knowledge

Motivation: Recent technological developments have facilitated an expansion of microbiome-metabolome studies, in which a set of microbiome samples are assayed using both genomic and metabolomic technologies to characterize the composition of microbial taxa and the concentrations of various metabolites. A common goal of many of these studies is to identify microbial features (species or genes) that contribute to differences in metabolite levels across samples. Previous work indicated that integrating these datasets with reference knowledge on microbial metabolic capacities may enable more precise and confident inference of such microbe-metabolite links. Results: We present MIMOSA2, an R package and web application for model-based integrative analysis of microbiome-metabolome datasets. MIMOSA2 uses reference databases to construct a community metabolic model based on microbiome data and uses this model to predict differences in metabolite levels across samples. These predictions are compared with metabolomics data to identify putative microbiome-governed metabolites and specific taxonomic contributors to metabolite variation. MIMOSA2 supports various input data types and can be customized to incorporate user-defined metabolic pathways. We demonstrate MIMOSA2's ability to identify ground truth microbial mechanisms in simulation datasets, and compare its results with experimentally inferred mechanisms in a dataset describing honeybee gut microbiota. Overall, MIMOSA2 combines reference databases, a validated statistical framework, and a user-friendly interface to facilitate modeling and evaluating relationships between members of the microbiota and their metabolic products. Availability and Implementation: MIMOSA2 is implemented in R under the GNU General Public License v3.0 and is freely available as a web server and R package from www.borensteinlab.com/software_MIMOSA2.html.

MatTransMix: an R Package for Matrix Model-Based Clustering and Parsimonious Mixture Modeling

Journal of Classification ◽

10.1007/s00357-021-09401-9 ◽

2021 ◽

Author(s):

Xuwen Zhu ◽

Shuchismita Sarkar ◽

Volodymyr Melnykov

Keyword(s):

Matrix Model ◽

R Package ◽

Mixture Modeling ◽

Model Based Clustering ◽

Model Based

Model-based boosting in R: a hands-on tutorial using the R package mboost

Computational Statistics ◽

10.1007/s00180-012-0382-5 ◽

2012 ◽

Vol 29 (1-2) ◽

pp. 3-35 ◽

Cited By ~ 78

Author(s):

Benjamin Hofner ◽

Andreas Mayr ◽

Nikolay Robinzonov ◽

Matthias Schmid

Keyword(s):

R Package ◽

Model Based ◽

Hands On

Fast Hierarchical Bayesian Analysis of Population Structure

10.1101/454355 ◽

2018 ◽

Cited By ~ 1

Author(s):

Gerry Tonkin-Hill ◽

John A. Lees ◽

Stephen D. Bentley ◽

Simon D.W. Frost ◽

Jukka Corander

Keyword(s):

Dirichlet Process ◽

Phylogenetic Trees ◽

Marginal Likelihood ◽

Simulated Data ◽

R Package ◽

Multilocus Genotype ◽

Dirichlet Process Mixture ◽

Model Based Clustering ◽

Hierarchical Bayesian Analysis ◽

Model Based

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet Process Mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analysing an alignment of over 110,000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximise the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides solutions comparable to or better than those of previous model-based methods, while generally being significantly faster. The method is made freely available under an open source MIT licence as an easy-to-use R package at https://github.com/gtonkinhill/fastbaps.

Model-based clustering with mclust R package: Multivariate assessment of mathematics performance of students in Qatar

10.36334/modsim.2021.a1.alzahrani ◽

2021 ◽

Keyword(s):

R Package ◽

Mathematics Performance ◽

Model Based Clustering ◽

Model Based

BayesBinMix: an R Package for Model Based Clustering of Multivariate Binary Data

The R Journal ◽

10.32614/rj-2017-022 ◽

2017 ◽

Vol 9 (1) ◽

pp. 403 ◽

Cited By ~ 5

Author(s):

Panagiotis Papastamoulis ◽

Magnus Rattray

Keyword(s):

Binary Data ◽

R Package ◽

Model Based Clustering ◽

Model Based ◽

Multivariate Binary Data

phyr: An R package for phylogenetic species-distribution modelling in ecological communities

10.1101/2020.02.17.952317 ◽

2020 ◽

Author(s):

Daijiang Li ◽

Russell Dinnage ◽

Lucas Nell ◽

Matthew R. Helmus ◽

Anthony Ives

Keyword(s):

Community Composition ◽

Species Distribution ◽

Species Distribution Models ◽

R Package ◽

Bipartite Network ◽

List Type ◽

Ecological Communities ◽

Phylogenetic Species ◽

Distribution Models ◽

Model Based

Summary: Model-based approaches are increasingly popular in ecological studies. A good example of this trend is the use of joint species distribution models to ask questions about ecological communities. However, most current applications of model-based methods do not include phylogenies despite the well-known importance of phylogenetic relationships in shaping species distributions and community composition. In part, this is due to a lack of accessible tools allowing ecologists to fit phylogenetic species distribution models easily. To fill this gap, the R package phyr (pronounced fire) implements a suite of metrics, comparative methods and mixed models that use phylogenies to understand and predict community composition and other ecological and evolutionary phenomena. The phyr workhorse functions are implemented in C++, making all calculations and model estimations fast. phyr can fit a variety of models such as phylogenetic joint-species distribution models, spatiotemporal-phylogenetic autocorrelation models, and phylogenetic trait-based bipartite network models. phyr also estimates phylogenetically independent trait correlations with measurement error to test for adaptive syndromes and performs fast calculations of common alpha and beta phylogenetic diversity metrics. All phyr methods are united under Brownian motion or Ornstein-Uhlenbeck models of evolution, and phylogenetic terms are modelled as phylogenetic covariance matrices. The functions and model formula syntax we propose in phyr serve as a simple and unified framework that ignites the use of phylogenies to address a variety of ecological questions.
