RPANDA: an R package for macroevolutionary analyses on phylogenetic trees

hilldiv: an R package for the integral analysis of diversity based on Hill numbers

Abstract Hill numbers provide a powerful framework for measuring, comparing and partitioning the diversity of biological systems as characterised using high-throughput DNA sequencing approaches. To facilitate the use of Hill numbers in such analyses, whether focusing on diet reconstruction, microbial community profiling or more general ecosystem characterisation, we present a new R package. ‘Hilldiv’ provides a set of functions to assist analysis of diversity based on Hill numbers, using count tables (e.g. OTU, ASV) and associated phylogenetic trees as inputs. Multiple functionalities of the library are introduced, including diversity measurement, diversity profile plotting, diversity comparison between samples and groups, multi-level diversity partitioning and (dis)similarity measurement. All of these are grounded in abundance-based and incidence-based Hill numbers, and can accommodate phylogenetic or functional correlation among OTUs or ASVs. The package can be installed from CRAN or GitHub, and tutorials and example scripts can be found on the package’s page (https://github.com/anttonalberdi/hilldiv).
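To make the central quantity concrete, the following is a minimal sketch of an abundance-based Hill number of order q computed from a vector of counts. It is written in plain Python for illustration and is not the hilldiv API; the function name `hill_number` and the sample counts are hypothetical. It uses the standard definition, D = (sum of p_i^q)^(1/(1-q)), with the q = 1 case taken as the limit, exp(Shannon entropy).

```python
import math

def hill_number(counts, q):
    """Hill number (effective number of types) of order q for raw counts."""
    total = sum(counts)
    # Relative abundances; types with zero count are dropped.
    p = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # q = 1 is defined as a limit: the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Hypothetical OTU counts for a single sample.
sample = [50, 30, 15, 5]
for q in (0, 1, 2):
    print(f"q={q}: {hill_number(sample, q):.3f}")
```

At q = 0 the value is simply the number of types present (richness); higher orders give progressively more weight to abundant types, so the value does not increase as q increases.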

Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data

Molecular Biology and Evolution ◽ 10.1093/molbev/msz240 ◽ 2019 ◽ Vol 37 (2) ◽ pp. 599-603 ◽ Cited By ~ 25

Author(s): Li-Gen Wang ◽ Tommy Tsan-Yuk Lam ◽ Shuangbin Xu ◽ Zehan Dai ◽ Lang Zhou ◽ ...

Keyword(s): Phylogenetic Tree ◽ Phylogenetic Trees ◽ R Package ◽ External Data ◽ Input And Output ◽ Evolutionary Context ◽ Tree Data ◽ Downstream Analysis ◽ Different Sources ◽ Associated Data

Abstract Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.

BoskR – Testing Adequacy of Diversification Models Using Tree Shape

10.1101/2020.12.21.423829 ◽ 2020 ◽ Cited By ~ 1

Author(s): Orlando Schwery ◽ Brian C. O’Meara

Keyword(s): Phylogenetic Trees ◽ Graph Laplacian ◽ R Package ◽ Laplacian Spectrum ◽ Tree Shape ◽ Model Adequacy ◽ Model Based ◽ Absolute Sense ◽ Best Fit

Abstract The study of diversification largely relies on model-based approaches, estimating rates of speciation and extinction from phylogenetic trees. While a plethora of different models exist – all with different features, strengths and weaknesses – there is increasing concern about the reliability of the inference we gain from them. Apart from simply finding the model with the best fit for the data, we should find ways to assess a model’s suitability to describe the data in an absolute sense. The R package BoskR implements a simple way of judging a model’s adequacy for a given phylogeny using metrics for tree shape, assuming that a model is inadequate for a phylogeny if it produces trees that are consistently dissimilar in shape from the tree that should be analyzed. Tree shape is assessed via metrics derived from the tree’s modified graph Laplacian spectrum, as provided by RPANDA. We exemplify the use of the method using simulated and empirical example phylogenies. BoskR was mostly able to correctly distinguish trees simulated under clearly different models and revealed that not all models are adequate for the empirical example trees. We believe the metrics of tree shape to be an intuitive and relevant means of assessing diversification model adequacy. Furthermore, by implementing the approach in an openly available R package, we enable and encourage researchers to adopt adequacy testing into their workflow.

speciesgeocodeR: An R package for linking species occurrences, user-defined regions and phylogenetic trees for biogeography, ecology and evolution

10.1101/032755 ◽ 2015 ◽ Cited By ~ 6

Author(s): Alexander Zizka ◽ Alexandre Antonelli

Keyword(s): Data Quality ◽ Phylogenetic Trees ◽ Large Scale ◽ Data Cleaning ◽ R Package ◽ Species Occurrence ◽ Occurrence Data ◽ User Friendly ◽ Species Occurrences

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality.

2. Here we present speciesgeocodeR, a user-friendly R package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and for linking them to analyses invoking phylogenetic trees.

3. The three core functions of the package are 1) automated and reproducible data cleaning, 2) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and 3) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates the access and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. speciesgeocodeR is accompanied by a richly illustrated and easy-to-follow tutorial and help functions.
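The classification of point occurrences into discrete operational areas can be illustrated with a small point-in-polygon sketch. This is not speciesgeocodeR code: the function names, the two square areas and the occurrence records are hypothetical, and the ray-casting test treats longitude and latitude as planar coordinates, which is an assumption that only holds approximately for small regions.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # x-coordinate at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def classify(occurrences, areas):
    """Map each species to the set of area names its points fall in."""
    result = {}
    for species, lon, lat in occurrences:
        hits = result.setdefault(species, set())
        for name, polygon in areas.items():
            if point_in_polygon(lon, lat, polygon):
                hits.add(name)
    return result

# Hypothetical operational areas (two adjacent squares) and occurrences.
areas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
occurrences = [("sp1", 5, 5), ("sp1", 15, 5), ("sp2", 2, 8), ("sp3", 30, 30)]

for species, found in sorted(classify(occurrences, areas).items()):
    print(species, sorted(found))
```

A species whose points all fall outside every area ends up with an empty set, which makes unclassified records easy to spot during data cleaning.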

PhySortR: a fast, flexible tool for sorting phylogenetic trees in R

10.7287/peerj.preprints.1609v1 ◽ 2015

Author(s): Timothy G Stephens ◽ Debashish Bhattacharya ◽ Mark A Ragan ◽ Cheong Xin Chan

Keyword(s): Phylogenetic Trees ◽ R Package ◽ Command Line ◽ Flexible Tool ◽ Command Line Tool ◽ Whole Tree

A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, such as robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for sorting phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa, with customisable options to assess clades within the context of the whole tree. PhySortR is a command-line tool that is freely available, highly scalable, and easily automatable.

BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees

Methods in Ecology and Evolution ◽ 10.1111/2041-210x.12199 ◽ 2014 ◽ Vol 5 (7) ◽ pp. 701-707 ◽ Cited By ~ 404

Author(s): Daniel L. Rabosky ◽ Michael Grundler ◽ Carlos Anderson ◽ Pascal Title ◽ Jeff J. Shi ◽ ...

Keyword(s): Phylogenetic Trees ◽ Evolutionary Dynamics ◽ R Package

ratematrix: An R package for studying evolutionary integration among several traits on phylogenetic trees

Methods in Ecology and Evolution ◽ 10.1111/2041-210x.12826 ◽ 2017 ◽ Vol 8 (12) ◽ pp. 1920-1927 ◽ Cited By ~ 22

Author(s): Daniel S. Caetano ◽ Luke J. Harmon

Keyword(s): Phylogenetic Trees ◽ R Package

Fast Hierarchical Bayesian Analysis of Population Structure

10.1101/454355 ◽ 2018 ◽ Cited By ~ 1

Author(s): Gerry Tonkin-Hill ◽ John A. Lees ◽ Stephen D. Bentley ◽ Simon D.W. Frost ◽ Jukka Corander

Keyword(s): Dirichlet Process ◽ Phylogenetic Trees ◽ Marginal Likelihood ◽ Simulated Data ◽ R Package ◽ Multilocus Genotype ◽ Dirichlet Process Mixture ◽ Model Based Clustering ◽ Hierarchical Bayesian Analysis ◽ Model Based

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet Process Mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analysing an alignment of over 110,000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximise the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while generally being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.

Bayesian inference of ancestral dates on bacterial phylogenetic trees

10.1101/347385 ◽ 2018

Author(s): Xavier Didelot ◽ Nicholas J Croucher ◽ Stephen D Bentley ◽ Simon R Harris ◽ Daniel J Wilson

Keyword(s): Phylogenetic Trees ◽ Single Species ◽ R Package ◽ Bacterial Genomes ◽ Phylogenetic Methods ◽ Bacterial Genomics ◽ Wide Range ◽ Genomic Studies ◽ Dated Phylogeny ◽ Phylogenetic Method

Abstract The sequencing and comparative analysis of a collection of bacterial genomes from a single species or lineage of interest can lead to key insights into its evolution, ecology or epidemiology. The tool of choice for such a study is often to build a phylogenetic tree, and more specifically, when possible, a dated phylogeny, in which the dates of all common ancestors are estimated. Here we propose a new Bayesian methodology to construct dated phylogenies which is specifically designed for bacterial genomics. Unlike previous Bayesian methods aimed at building dated phylogenies, we consider that the phylogenetic relationships between the genomes have been previously evaluated using a standard phylogenetic method, which makes our methodology much faster and more scalable. This two-step approach also allows us to directly exploit existing phylogenetic methods that detect bacterial recombination, and therefore to account for the effect of recombination in the construction of a dated phylogeny. We analysed many simulated datasets in order to benchmark the performance of our approach in a wide range of situations. Furthermore, we present applications to three different real datasets from recent bacterial genomic studies. Our methodology is implemented in an R package called BactDating, which is freely available for download at https://github.com/xavierdidelot/BactDating.

On the automatic annotation of gene functions using observational data and phylogenetic trees

10.1101/2020.05.14.095687 ◽ 2020

Author(s): George G. Vega Yon ◽ Duncan C. Thomas ◽ John Morrison ◽ Huaiyu Mi ◽ Paul D. Thomas ◽ ...

Keyword(s): Gene Function ◽ Phylogenetic Trees ◽ Evolutionary Model ◽ Computational Prediction ◽ Gene Families ◽ R Package ◽ Biomedical Sciences ◽ Computationally Efficient ◽ Link Type ◽ Gene Functions

Abstract

Motivation: Gene function annotation is important for a variety of downstream analyses of genetic data. Yet experimental characterization of function remains costly and slow, making computational prediction an important endeavor. In this paper we use a probabilistic evolutionary model built upon phylogenetic trees and experimental Gene Ontology functional annotations that allows automated prediction of function for unannotated genes.

Results: We have developed a computationally efficient model of evolution of gene annotations using phylogenies based on a Bayesian framework using Markov Chain Monte Carlo for parameter estimation. Unlike previous approaches, our method is able to estimate parameters over many different phylogenetic trees and functions. The resulting parameters agree with biological intuition, such as the increased probability of function change following gene duplication. The method performs well on leave-one-out validation, and we further validated some of the predictions in the experimental scientific literature.

Availability: Our method has been implemented as an R package and is available online at https://github.com/USCBiostats/aphylo. Code needed to reproduce the tables and figures can be found at https://github.com/USCbiostats/aphylo-simulations.

Author summary: Understanding the individual role that genes play in life is a key issue in the biomedical sciences. While information regarding gene functions is continuously growing, the number of genes with unknown biological purpose is greater still. Because of this, scientists have dedicated much of their time to building and designing tools that automatically infer gene functions. In this paper, we present yet another attempt to do so. While very simple, our model of gene-function evolution has some key features that have the potential to generate an impact in the field: (a) compared to other methods, ours is highly scalable, which means that it is possible to simultaneously analyze hundreds of what are known as gene families, comprising thousands of genes; (b) it supports our biological intuition, as the model’s data-driven results agree with what theory dictates regarding how gene functions evolved; (c) notwithstanding its simplicity, the model’s prediction accuracy is comparable to that of more complex alternatives; and (d) perhaps most importantly, it can be used both to support new annotations and to suggest areas in which existing annotations show inconsistencies that may indicate errors or controversies.
