BoskR – Testing Adequacy of Diversification Models Using Tree Shape

Author(s):  
Orlando Schwery ◽  
Brian C. O’Meara

Abstract The study of diversification largely relies on model-based approaches, estimating rates of speciation and extinction from phylogenetic trees. While a plethora of different models exist – all with different features, strengths and weaknesses – there is increasing concern about the reliability of the inference we gain from them. Apart from simply finding the model with the best fit for the data, we should find ways to assess a model’s suitability to describe the data in an absolute sense. The R package BoskR implements a simple way of judging a model’s adequacy for a given phylogeny using metrics for tree shape, assuming that a model is inadequate for a phylogeny if it produces trees that are consistently dissimilar in shape from the tree that should be analyzed. Tree shape is assessed via metrics derived from the tree’s modified graph Laplacian spectrum, as provided by RPANDA. We exemplify the use of the method using simulated and empirical example phylogenies. BoskR was mostly able to correctly distinguish trees simulated under clearly different models and revealed that not all models are adequate for the empirical example trees. We believe the metrics of tree shape to be an intuitive and relevant means of assessing diversification model adequacy. Furthermore, by implementing the approach in an openly available R package, we enable and encourage researchers to adopt adequacy testing into their workflow.

2018 ◽  
Author(s):  
Gerry Tonkin-Hill ◽  
John A. Lees ◽  
Stephen D. Bentley ◽  
Simon D.W. Frost ◽  
Jukka Corander

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet Process Mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analysing an alignment of over 110,000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximise the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while generally being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.


2021 ◽  
Author(s):  
Orlando Schwery ◽  
Brian C. O’Meara

Abstract To investigate how biodiversity arose, the field of macroevolution largely relies on model-based approaches to estimate rates of diversification and what factors influence them. The number of available models is rising steadily, facilitating the modeling of an increasing number of possible diversification dynamics, and multiple hypotheses relating to what fueled or stifled lineage accumulation within groups of organisms. However, growing concerns about unchecked biases and limitations in the employed models suggest the need for rigorous validation of the methods used for inference. Here, we address two points: the practical use of model adequacy testing, and what model adequacy can tell us about the overall state of diversification models. Using a large set of empirical phylogenies, and a new approach to test models using aspects of tree shape, we test how a set of staple models performs with regard to adequacy. Patterns of adequacy are described across trees and models and causes for inadequacy – particularly if all models are inadequate – are explored. The findings make clear that overall, only a few empirical phylogenies cannot be described by at least one model. However, finding that the best fitting of a set of models might not necessarily be adequate makes clear that adequacy testing should become a step in the standard procedures for diversification studies.


2019 ◽  
Vol 1 (1) ◽  
Author(s):  
D C Blackburn ◽  
G Giribet ◽  
D E Soltis ◽  
E L Stanley

Abstract Although our inventory of Earth’s biodiversity remains incomplete, we still require analyses using the Tree of Life to understand evolutionary and ecological patterns. Because incomplete sampling may bias our inferences, we must evaluate how future additions of newly discovered species might impact analyses performed today. We describe an approach that uses taxonomic history and phylogenetic trees to characterize the impact of past species discoveries on phylogenetic knowledge using patterns of branch-length variation, tree shape, and phylogenetic diversity. This provides a framework for assessing the relative completeness of taxonomic knowledge of lineages within a phylogeny. To demonstrate this approach, we use recent large phylogenies for amphibians, reptiles, flowering plants, and invertebrates. Well-known clades exhibit a decline in the mean and range of branch lengths that are added each year as new species are described. With increased taxonomic knowledge over time, deep lineages of well-known clades become known such that most recently described new species are added close to the tips of the tree, reflecting changing tree shape over the course of taxonomic history. The same analyses reveal other clades to be candidates for future discoveries that could dramatically impact our phylogenetic knowledge. Our work reveals that species are often added non-randomly to the phylogeny over multiyear time-scales in a predictable pattern of taxonomic maturation. Our results suggest that we can make informed predictions about how new species will be added across the phylogeny of a given clade, thus providing a framework for accommodating unsampled undescribed species in evolutionary analyses.


2019 ◽  
Author(s):  
Antton Alberdi ◽  
M Thomas P Gilbert

Abstract Hill numbers provide a powerful framework for measuring, comparing and partitioning the diversity of biological systems as characterised using high throughput DNA sequencing approaches. In order to facilitate the implementation of Hill numbers into such analyses, whether focusing on diet reconstruction, microbial community profiling or more general ecosystem characterisation analyses, we present a new R package. ‘Hilldiv’ provides a set of functions to assist analysis of diversity based on Hill numbers, using count tables (e.g. OTU, ASV) and associated phylogenetic trees as inputs. Multiple functionalities of the library are introduced, including diversity measurement, diversity profile plotting, diversity comparison between samples and groups, multi-level diversity partitioning and (dis)similarity measurement. All of these are grounded in abundance-based and incidence-based Hill numbers, and can accommodate phylogenetic or functional correlation among OTUs or ASVs. The package can be installed from CRAN or Github, and tutorials and example scripts can be found on the package’s page (https://github.com/anttonalberdi/hilldiv).
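As a rough illustration of the quantity the abstract above refers to, the following minimal sketch computes an abundance-based Hill number of order q from a vector of counts, using the standard textbook definition (the sum of p_i^q raised to 1/(1-q), with the exponential of Shannon entropy as the q = 1 limit). The function name `hill_number` and the example counts are assumptions made for illustration; this is not the hilldiv API and does not cover the phylogenetic or incidence-based variants.

```python
import math


def hill_number(counts, q):
    """Abundance-based Hill number of order q (hypothetical helper, not hilldiv)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    # Convert counts to relative abundances; taxa with zero counts are dropped.
    p = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; its limit is exp(Shannon entropy).
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Made-up count vectors: one perfectly even sample, one dominated by a single taxon.
even_sample = [10, 10, 10, 10]
uneven_sample = [90, 10]

for order in (0, 1, 2):
    print(order, hill_number(even_sample, order), hill_number(uneven_sample, order))
```

For a perfectly even sample every order returns the number of taxa, while for an uneven sample the value falls as q increases, because higher orders give more weight to abundant taxa.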


2005 ◽  
Vol 480-481 ◽  
pp. 197-200
Author(s):  
Y. Sayad ◽  
A. Nouiri

An increase in donor centres has been detected in n-InSb subjected to anneal/quench treatment at various annealing temperatures (450 °C - 850 °C) and for various annealing times (5 - 100 hours). A theoretical study of the kinetics of the conduction conversion of n-InSb at annealing temperatures above 250 °C has been made. The present analysis indicates that the donor concentration increases with increasing annealing time. To study this variation and to model the donor centres generated, a proposed model based on simple kinetics is used to fit the donor concentration as a function of annealing time. From the best fit of the proposed model to the experimental data, the activation energy is determined.


2016 ◽  
Vol 7 (5) ◽  
pp. 589-597 ◽  
Author(s):  
Hélène Morlon ◽  
Eric Lewitus ◽  
Fabien L. Condamine ◽  
Marc Manceau ◽  
Julien Clavel ◽  
...  

2019 ◽  
Vol 37 (2) ◽  
pp. 599-603 ◽  
Author(s):  
Li-Gen Wang ◽  
Tommy Tsan-Yuk Lam ◽  
Shuangbin Xu ◽  
Zehan Dai ◽  
Lang Zhou ◽  
...  

Abstract Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.


2015 ◽  
Author(s):  
Alexander Zizka ◽  
Alexandre Antonelli

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality. 2. Here we present speciesgeocodeR, a user-friendly R-package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and linking them to analyses invoking phylogenetic trees. 3. The three core functions of the package are 1) automated and reproducible data cleaning, 2) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and 3) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates the access and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. SpeciesgeocodeR is accompanied by a richly illustrated and easy-to-follow tutorial and help functions.


2017 ◽  
Vol 76 (9) ◽  
Author(s):  
Parmeet Singh Bhatia ◽  
Serge Iovleff ◽  
Gérard Govaert
