hilldiv: an R package for the integral analysis of diversity based on Hill numbers

Abstract: Hill numbers provide a powerful framework for measuring, comparing and partitioning the diversity of biological systems as characterised using high-throughput DNA sequencing approaches. To facilitate the implementation of Hill numbers in such analyses, whether focusing on diet reconstruction, microbial community profiling or more general ecosystem characterisation, we present a new R package. ‘hilldiv’ provides a set of functions to assist analysis of diversity based on Hill numbers, using count tables (e.g. OTU, ASV) and associated phylogenetic trees as inputs. Multiple functionalities of the library are introduced, including diversity measurement, diversity profile plotting, diversity comparison between samples and groups, multi-level diversity partitioning and (dis)similarity measurement. All of these are grounded in abundance-based and incidence-based Hill numbers, and can accommodate phylogenetic or functional correlation among OTUs or ASVs. The package can be installed from CRAN or GitHub, and tutorials and example scripts can be found on the package’s page (https://github.com/anttonalberdi/hilldiv).
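
For orientation, the abundance-based Hill number of order q is commonly defined as (sum of p_i^q)^(1/(1-q)) for q ≠ 1, and as the exponential of Shannon entropy in the limit q → 1, where p_i is the relative abundance of type i. The following is a minimal Python sketch of that definition only; it is not hilldiv's R interface, and the function name `hill_number` and the example counts are illustrative assumptions.

```python
import math


def hill_number(counts, q):
    """Hill number (effective number of types) of order q for one sample.

    `counts` is a sequence of non-negative abundances, e.g. one column of an
    OTU/ASV count table. Hypothetical helper, not part of any package.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Relative abundances of the types actually present in the sample.
    p = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; use its limit,
        # the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Made-up counts for five OTUs in a single sample (one OTU is absent).
otu_counts = [50, 30, 15, 5, 0]

# A small diversity profile: richness (q=0), Shannon (q=1), Simpson (q=2).
profile = {q: hill_number(otu_counts, q) for q in (0, 1, 2)}
```

Raising q gives more weight to abundant types, so the values in `profile` do not increase with q; for a perfectly even sample all orders equal the number of types present.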

RPANDA: an R package for macroevolutionary analyses on phylogenetic trees

Methods in Ecology and Evolution ◽ 10.1111/2041-210x.12526 ◽ 2016 ◽ Vol 7 (5) ◽ pp. 589-597 ◽ Cited By ~ 126

Author(s): Hélène Morlon ◽ Eric Lewitus ◽ Fabien L. Condamine ◽ Marc Manceau ◽ Julien Clavel ◽ ...

Keyword(s): Phylogenetic Trees ◽ R Package

cati: an R package using functional traits to detect and quantify multi-level community assembly processes

Ecography ◽ 10.1111/ecog.01433 ◽ 2015 ◽ Vol 39 (7) ◽ pp. 699-708 ◽ Cited By ~ 33

Author(s): Adrien Taudiere ◽ Cyrille Violle

Keyword(s): Functional Traits ◽ Community Assembly ◽ R Package ◽ Multi Level

Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data

Molecular Biology and Evolution ◽ 10.1093/molbev/msz240 ◽ 2019 ◽ Vol 37 (2) ◽ pp. 599-603 ◽ Cited By ~ 25

Author(s): Li-Gen Wang ◽ Tommy Tsan-Yuk Lam ◽ Shuangbin Xu ◽ Zehan Dai ◽ Lang Zhou ◽ ...

Keyword(s): Phylogenetic Tree ◽ Phylogenetic Trees ◽ R Package ◽ External Data ◽ Input And Output ◽ Evolutionary Context ◽ Tree Data ◽ Downstream Analysis ◽ Different Sources ◽ Associated Data

Abstract: Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous associated data to a single tree file, including BEAST-compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.

BoskR – Testing Adequacy of Diversification Models Using Tree Shape

10.1101/2020.12.21.423829 ◽ 2020 ◽ Cited By ~ 1

Author(s): Orlando Schwery ◽ Brian C. O’Meara

Keyword(s): Phylogenetic Trees ◽ Graph Laplacian ◽ R Package ◽ Laplacian Spectrum ◽ Tree Shape ◽ Model Adequacy ◽ Model Based ◽ Absolute Sense ◽ Best Fit

Abstract: The study of diversification largely relies on model-based approaches, estimating rates of speciation and extinction from phylogenetic trees. While a plethora of different models exist – all with different features, strengths and weaknesses – there is increasing concern about the reliability of the inference we gain from them. Apart from simply finding the model with the best fit for the data, we should find ways to assess a model’s suitability to describe the data in an absolute sense. The R package BoskR implements a simple way of judging a model’s adequacy for a given phylogeny using metrics for tree shape, assuming that a model is inadequate for a phylogeny if it produces trees that are consistently dissimilar in shape from the tree that should be analyzed. Tree shape is assessed via metrics derived from the tree’s modified graph Laplacian spectrum, as provided by RPANDA. We exemplify the use of the method using simulated and empirical example phylogenies. BoskR was mostly able to correctly distinguish trees simulated under clearly different models and revealed that not all models are adequate for the empirical example trees. We believe the metrics of tree shape to be an intuitive and relevant means of assessing diversification model adequacy. Furthermore, by implementing the approach in an openly available R package, we enable and encourage researchers to adopt adequacy testing into their workflow.

speciesgeocodeR: An R package for linking species occurrences, user-defined regions and phylogenetic trees for biogeography, ecology and evolution

10.1101/032755 ◽ 2015 ◽ Cited By ~ 6

Author(s): Alexander Zizka ◽ Alexandre Antonelli

Keyword(s): Data Quality ◽ Phylogenetic Trees ◽ Large Scale ◽ Data Cleaning ◽ R Package ◽ Species Occurrence ◽ Occurrence Data ◽ User Friendly ◽ Species Occurrences

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality.

2. Here we present speciesgeocodeR, a user-friendly R package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and for linking them to analyses invoking phylogenetic trees.

3. The three core functions of the package are 1) automated and reproducible data cleaning, 2) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and 3) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates the access and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. SpeciesgeocodeR is accompanied by a richly illustrated and easy-to-follow tutorial and help functions.

PhySortR: a fast, flexible tool for sorting phylogenetic trees in R

10.7287/peerj.preprints.1609v1 ◽ 2015

Author(s): Timothy G Stephens ◽ Debashish Bhattacharya ◽ Mark A Ragan ◽ Cheong Xin Chan

Keyword(s): Phylogenetic Trees ◽ R Package ◽ Command Line ◽ Flexible Tool ◽ Command Line Tool ◽ Whole Tree

A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, such as robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for sorting phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa, with customisable options to assess clades within the context of the whole tree. PhySortR is a command-line tool that is freely available, highly scalable, and easily automatable.

BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees

Methods in Ecology and Evolution ◽ 10.1111/2041-210x.12199 ◽ 2014 ◽ Vol 5 (7) ◽ pp. 701-707 ◽ Cited By ~ 404

Author(s): Daniel L. Rabosky ◽ Michael Grundler ◽ Carlos Anderson ◽ Pascal Title ◽ Jeff J. Shi ◽ ...

Keyword(s): Phylogenetic Trees ◽ Evolutionary Dynamics ◽ R Package

ratematrix: An R package for studying evolutionary integration among several traits on phylogenetic trees

Methods in Ecology and Evolution ◽ 10.1111/2041-210x.12826 ◽ 2017 ◽ Vol 8 (12) ◽ pp. 1920-1927 ◽ Cited By ~ 22

Author(s): Daniel S. Caetano ◽ Luke J. Harmon

Keyword(s): Phylogenetic Trees ◽ R Package

Fast Hierarchical Bayesian Analysis of Population Structure

10.1101/454355 ◽ 2018 ◽ Cited By ~ 1

Author(s): Gerry Tonkin-Hill ◽ John A. Lees ◽ Stephen D. Bentley ◽ Simon D.W. Frost ◽ Jukka Corander

Keyword(s): Dirichlet Process ◽ Phylogenetic Trees ◽ Marginal Likelihood ◽ Simulated Data ◽ R Package ◽ Multilocus Genotype ◽ Dirichlet Process Mixture ◽ Model Based Clustering ◽ Hierarchical Bayesian Analysis ◽ Model Based

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet Process Mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analysing an alignment of over 110,000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximise the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while generally being significantly faster. The method is made freely available under an open-source MIT licence as an easy-to-use R package at https://github.com/gtonkinhill/fastbaps.

Towards a unifying diversity-area relationship (DAR) of species- and gene-diversity

10.1101/2020.05.16.099861 ◽ 2020

Author(s): Zhanshan (Sam) Ma ◽ Aaron M. Ellison

Keyword(s): Gene Diversity ◽ Temporal Distribution ◽ Operational Taxonomic Unit ◽ Population Diversity ◽ Human Gut ◽ Individual Level ◽ Successful Case ◽ Species Area ◽ Diversity Profile ◽ Hill Numbers

Abstract

Aim: The microbiome as a biogeographic entity can be investigated, at a minimum, from two perspectives: one is the spatial/temporal distribution of species (or any level of the operational taxonomic unit or OTU) diversity, and another is the spatial/temporal distribution of metagenomic gene diversity. Both are necessary for a comprehensive understanding of the taxonomical, ecological, evolutionary and functional aspects of microbiome biogeography. Here we propose to investigate the metagenomic diversity-area relationship (m-DAR), which is a transformation of the species-DAR (s-DAR) that extended the classic SAR (species-area relationship) by replacing the species richness with general species diversity measured in Hill numbers.

Innovation: The m-DAR and s-DAR, using the same mathematical models, offer a unifying tool for investigating the biogeography of the microbiome from ecological, metagenomic and functional perspectives. Specifically, we investigate the m-DAR of the human gut metagenome in terms of the MG (metagenomic gene) and MFGC (metagenome functional gene cluster) respectively, by sketching out the DAR profile, PDO (pair-wise diversity overlap) profile, MAD (maximal accrual diversity) profile, and RIP (ratio of individual- to population-diversity) profile at each scale. These profiles constitute our unifying DAR toolset and can be applied to any microbiome beyond the human gut microbiome.

Main conclusions: We demonstrate the construction and applications of the m-DAR and its associated four profiles with six large datasets of human gut metagenomes, including three microbiome-associated diseases (obesity, diabetes, IBD) and their healthy controls, supported with randomization tests to determine the differences between healthy and diseased treatments in their m-DAR parameters. Theoretically, our study presents a successful case demonstrating the feasibility of unifying systematic and evolutionary biogeography, and of an inclusive biogeography of plants, animals and microbes. Practically, our approach offers an important tool for investigating the spatial scaling of human metagenome diversity in a population (cohort) and its relationship with individual-level diversity.
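
The abstract does not state which mathematical models are fitted. As a hedged illustration only, the classic SAR is usually written as the power law S = cA^z, and swapping richness S for a Hill number D gives D = cA^z, which becomes a straight line after a log transform. The sketch below fits that assumed form by ordinary least squares in Python; the function name `fit_power_law_dar` and the synthetic data are hypothetical and do not come from the paper.

```python
import math


def fit_power_law_dar(areas, diversities):
    """Fit D = c * A**z by least squares on log-transformed values.

    Assumes the classic SAR power-law form with a Hill number in place of
    species richness. Returns the pair (c, z). Hypothetical helper.
    """
    xs = [math.log(a) for a in areas]
    ys = [math.log(d) for d in diversities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the scaling exponent z.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx
    # The intercept of the line is log(c).
    c = math.exp(mean_y - z * mean_x)
    return c, z


# Synthetic example: "area" is the number of pooled samples, and the
# diversities are generated exactly from D = 12 * A**0.3 (made-up values).
areas = list(range(1, 11))
diversities = [12.0 * a ** 0.3 for a in areas]
c, z = fit_power_law_dar(areas, diversities)
```

With real data the fit would be repeated for each diversity order q to obtain a profile of scaling exponents; that step is omitted here.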
