Model-based boosting in R: a hands-on tutorial using the R package mboost

Abstract: The study of diversification largely relies on model-based approaches, estimating rates of speciation and extinction from phylogenetic trees. While a plethora of different models exist – all with different features, strengths and weaknesses – there is increasing concern about the reliability of the inference we gain from them. Apart from simply finding the model with the best fit for the data, we should find ways to assess a model’s suitability to describe the data in an absolute sense. The R package BoskR implements a simple way of judging a model’s adequacy for a given phylogeny using metrics for tree shape, assuming that a model is inadequate for a phylogeny if it produces trees that are consistently dissimilar in shape from the tree that should be analyzed. Tree shape is assessed via metrics derived from the tree’s modified graph Laplacian spectrum, as provided by RPANDA. We exemplify the use of the method using simulated and empirical example phylogenies. BoskR was mostly able to correctly distinguish trees simulated under clearly different models and revealed that not all models are adequate for the empirical example trees. We believe the metrics of tree shape to be an intuitive and relevant means of assessing diversification model adequacy. Furthermore, by implementing the approach in an openly available R package, we enable and encourage researchers to adopt adequacy testing into their workflow.

blockcluster: An R Package for Model-Based Co-Clustering

Journal of Statistical Software ◽

10.18637/jss.v076.i09 ◽

2017 ◽

Vol 76 (9) ◽

Cited By ~ 6

Author(s):

Parmeet Singh Bhatia ◽

Serge Iovleff ◽

Gérard Govaert

Keyword(s):

R Package ◽

Model Based

MIMOSA2: A metabolic network-based tool for inferring mechanism-supported relationships in microbiome-metabolome data

10.1101/2021.09.14.459910 ◽

2021 ◽

Author(s):

Cecilia Noecker ◽

Alexander Eng ◽

Elhanan Borenstein

Keyword(s):

Web Application ◽

Ground Truth ◽

Metabolic Model ◽

R Package ◽

Data Types ◽

Metabolomics Data ◽

Model Based ◽

Technological Developments ◽

Reference Databases ◽

Reference Knowledge

Motivation: Recent technological developments have facilitated an expansion of microbiome-metabolome studies, in which a set of microbiome samples are assayed using both genomic and metabolomic technologies to characterize the composition of microbial taxa and the concentrations of various metabolites. A common goal of many of these studies is to identify microbial features (species or genes) that contribute to differences in metabolite levels across samples. Previous work indicated that integrating these datasets with reference knowledge on microbial metabolic capacities may enable more precise and confident inference of such microbe-metabolite links. Results: We present MIMOSA2, an R package and web application for model-based integrative analysis of microbiome-metabolome datasets. MIMOSA2 uses reference databases to construct a community metabolic model based on microbiome data and uses this model to predict differences in metabolite levels across samples. These predictions are compared with metabolomics data to identify putative microbiome-governed metabolites and specific taxonomic contributors to metabolite variation. MIMOSA2 supports various input data types and can be customized to incorporate user-defined metabolic pathways. We demonstrate MIMOSA2's ability to identify ground truth microbial mechanisms in simulation datasets, and compare its results with experimentally inferred mechanisms in a dataset describing honeybee gut microbiota. Overall, MIMOSA2 combines reference databases, a validated statistical framework, and a user-friendly interface to facilitate modeling and evaluating relationships between members of the microbiota and their metabolic products. Availability and Implementation: MIMOSA2 is implemented in R under the GNU General Public License v3.0 and is freely available as a web server and R package from www.borensteinlab.com/software_MIMOSA2.html.

MatTransMix: an R Package for Matrix Model-Based Clustering and Parsimonious Mixture Modeling

Journal of Classification ◽

10.1007/s00357-021-09401-9 ◽

2021 ◽

Author(s):

Xuwen Zhu ◽

Shuchismita Sarkar ◽

Volodymyr Melnykov

Keyword(s):

Matrix Model ◽

R Package ◽

Mixture Modeling ◽

Model Based Clustering ◽

Model Based

Comparing the Use of Two Different Model Approaches on Students’ Understanding of DNA Models

Education Sciences ◽

10.3390/educsci9020115 ◽

2019 ◽

Vol 9 (2) ◽

pp. 115 ◽

Cited By ~ 4

Author(s):

Julia Mierdel ◽

Franz X. Bogner

Keyword(s):

Scientific Literacy ◽

Positive Impact ◽

Ninth Graders ◽

Scientific Models ◽

Multiple Models ◽

Model Based ◽

Hands On ◽

Dna Models ◽

Models In Science ◽

School Laboratory

As effective methods to foster students’ understanding of scientific models in science education are needed, increased reflection on thinking about models is regarded as a relevant competence associated with scientific literacy. Our study focuses on the influence of model-based approaches (modeling vs. model viewing) in an out-of-school laboratory module on the students’ understanding of scientific models. A mixed method design examines three subsections of the construct: (1) students’ reasoning about multiple models in science, (2) students’ understanding of models as exact replicas, and (3) students’ understanding of the changing nature of models. A total of 293 ninth graders from Bavarian grammar schools participated in our hands-on module using creative model-based tasks. An open-ended test item evaluated the students’ understanding of “multiple models” (MM). We defined five categories with a majority of students arguing that the individuality of DNA structure leads to various DNA models (modelers = 36.3%, model viewers = 41.1%). Additionally, when applying two subscales of the quantitative instrument Students’ Understanding of Models in Science (SUMS) at three testing points (before, after, and delayed-after participation), a short- and mid-term decrease for the subscale “models as exact replicas” (ER) appeared, while mean scores increased short- and mid-term for the subscale “the changing nature of models” (CNM). Despite the lack of differences between the two approaches, a positive impact of model-based learning on students’ understanding of scientific models was observed.

Fast Hierarchical Bayesian Analysis of Population Structure

10.1101/454355 ◽

2018 ◽

Cited By ~ 1

Author(s):

Gerry Tonkin-Hill ◽

John A. Lees ◽

Stephen D. Bentley ◽

Simon D.W. Frost ◽

Jukka Corander

Keyword(s):

Dirichlet Process ◽

Phylogenetic Trees ◽

Marginal Likelihood ◽

Simulated Data ◽

R Package ◽

Multilocus Genotype ◽

Dirichlet Process Mixture ◽

Model Based Clustering ◽

Hierarchical Bayesian Analysis ◽

Model Based

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet Process Mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analysing an alignment of over 110,000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximise the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while generally being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.

Model-based clustering with mclust R package: Multivariate assessment of mathematics performance of students in Qatar

10.36334/modsim.2021.a1.alzahrani ◽

2021 ◽

Keyword(s):

R Package ◽

Mathematics Performance ◽

Model Based Clustering ◽

Model Based

BayesBinMix: an R Package for Model Based Clustering of Multivariate Binary Data

The R Journal ◽

10.32614/rj-2017-022 ◽

2017 ◽

Vol 9 (1) ◽

pp. 403 ◽

Cited By ~ 5

Author(s):

Panagiotis Papastamoulis ◽

Magnus Rattray

Keyword(s):

Binary Data ◽

R Package ◽

Model Based Clustering ◽

Model Based ◽

Multivariate Binary Data

phyr: An R package for phylogenetic species-distribution modelling in ecological communities

10.1101/2020.02.17.952317 ◽

2020 ◽

Author(s):

Daijiang Li ◽

Russell Dinnage ◽

Lucas Nell ◽

Matthew R. Helmus ◽

Anthony Ives

Keyword(s):

Community Composition ◽

Species Distribution ◽

Species Distribution Models ◽

R Package ◽

Bipartite Network ◽

List Type ◽

Ecological Communities ◽

Phylogenetic Species ◽

Distribution Models ◽

Model Based

Summary: Model-based approaches are increasingly popular in ecological studies. A good example of this trend is the use of joint species distribution models to ask questions about ecological communities. However, most current applications of model-based methods do not include phylogenies despite the well-known importance of phylogenetic relationships in shaping species distributions and community composition. In part, this is due to lack of accessible tools allowing ecologists to fit phylogenetic species distribution models easily. To fill this gap, the R package phyr (pronounced fire) implements a suite of metrics, comparative methods and mixed models that use phylogenies to understand and predict community composition and other ecological and evolutionary phenomena. The phyr workhorse functions are implemented in C++ making all calculations and model estimations fast. phyr can fit a variety of models such as phylogenetic joint-species distribution models, spatiotemporal-phylogenetic autocorrelation models, and phylogenetic trait-based bipartite network models. phyr also estimates phylogenetically independent trait correlations with measurement error to test for adaptive syndromes and performs fast calculations of common alpha and beta phylogenetic diversity metrics. All phyr methods are united under Brownian motion or Ornstein-Uhlenbeck models of evolution, and phylogenetic terms are modelled as phylogenetic covariance matrices. The functions and model formula syntax we propose in phyr serve as a simple and unified framework that ignites the use of phylogenies to address a variety of ecological questions.

Poisson regression for linguists: A tutorial introduction to modeling count data with brms

10.31219/osf.io/93kaf ◽

2021 ◽

Author(s):

Bodo Winter ◽

Paul-Christian Bürkner

Keyword(s):

Logistic Regression ◽

Poisson Distribution ◽

Upper Bound ◽

Count Data ◽

Poisson Regression ◽

R Package ◽

Canonical Distribution ◽

Discourse Particles ◽

Hands On ◽

Case Markers

Count data is prevalent in many different areas of linguistics, such as when counting words, syntactic constructions, discourse particles, case markers, or speech errors. The Poisson distribution is the canonical distribution for characterizing count data with no or unknown upper bound. Whereas logistic regression is very common in linguistics, Poisson regression is little known. This tutorial introduces readers to foundational concepts needed for Poisson regression, followed by a hands-on tutorial using the R package brms. We discuss a dataset where Catalan and Korean speakers change the frequency of their co-speech gestures as a function of politeness contexts. This dataset also involves exposure variables (the incorporation of time to deal with unequal intervals) and overdispersion (excess variance). Altogether, we hope that more linguists will consider Poisson regression for the analysis of count data.
