The Probabilities of Trees and Cladograms under Ford’s α-Model

Ford’s α-model is one of the most popular random parametric models of bifurcating phylogenetic tree growth, having as specific instances both the uniform and the Yule models. Its general properties have been used to study the behavior of phylogenetic tree shape indices under the probability distribution it defines. But the explicit formulas provided by Ford for the probabilities of unlabeled trees and phylogenetic trees fail in some cases. In this paper we give correct explicit formulas for these probabilities.
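As a rough illustration of the growth process behind this abstract, the sketch below simulates Ford's α-model under its usual description: in a tree with n leaves, each of the n leaf edges gets weight 1 - α and each of the n - 1 other edges (counting a root edge) gets weight α, an edge is picked with probability proportional to its weight, and a new leaf is attached in the middle of it. The function name `grow_alpha_tree` and the parent/children dictionaries are assumptions made for this example; the code does not compute the probability formulas the paper corrects.

```python
import random

def grow_alpha_tree(n_leaves, alpha, rng=None):
    """Grow a rooted binary tree with n_leaves >= 2 leaves under Ford's alpha-model.

    Returns (parent, children, root). Every node stands for the edge above it;
    the root node stands for the root edge.
    """
    rng = rng or random.Random()
    # Start from the cherry: root 0 with leaves 1 and 2.
    parent = {0: None, 1: 0, 2: 0}
    children = {0: [1, 2]}
    leaves, internal = [1, 2], [0]
    root, next_id = 0, 3
    while len(leaves) < n_leaves:
        n = len(leaves)
        # Total weight is n*(1 - alpha) + (n - 1)*alpha = n - alpha.
        if rng.random() * (n - alpha) < n * (1 - alpha):
            v = rng.choice(leaves)      # a leaf edge was chosen
        else:
            v = rng.choice(internal)    # an internal (or root) edge was chosen
        # Subdivide the edge above v with a new node u and hang a new leaf on u.
        u, new_leaf = next_id, next_id + 1
        next_id += 2
        p = parent[v]
        parent[u] = p
        if p is None:
            root = u
        else:
            children[p][children[p].index(v)] = u
        children[u] = [v, new_leaf]
        parent[v] = u
        parent[new_leaf] = u
        internal.append(u)
        leaves.append(new_leaf)
    return parent, children, root
```

Setting `alpha=0` corresponds to the Yule model and `alpha=0.5` to the uniform model, the two special cases the abstract mentions.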


A MODEL OF MACROEVOLUTION AS A BRANCHING PROCESS BASED ON INNOVATIONS

Advances in Complex Systems ◽

10.1142/s0219525912500439 ◽

2012 ◽

Vol 15 (07) ◽

pp. 1250043 ◽

Cited By ~ 6

Author(s):

STEPHANIE KELLER-SCHMIDT ◽

KONSTANTIN KLEMM

Keyword(s):

Phylogenetic Trees ◽

Branching Process ◽

Average Distance ◽

Random Processes ◽

Mean Values ◽

Tree Shape ◽

Shape Indices ◽

The Mean ◽

Standard Deviations

We introduce a model for the evolution of species triggered by the generation of novel features and their exhaustive combination with other available traits. Under the assumption that innovations are rare, we obtain a bursty branching process of speciations. Analysis of the trees representing the branching history reveals structures qualitatively different from those of random processes. For a tree with n leaves generated by the introduced model, the average distance of leaves from the root scales as (log n)^2, compared with log n for random branching. The mean values and standard deviations for the tree shape indices depth (Sackin index) and imbalance (Colless index) of the model are compatible with those of real phylogenetic trees from databases. Earlier models, such as Aldous' branching (AB) model, show a larger deviation from data with respect to the shape indices.
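The two shape indices named in this abstract have simple standard definitions, sketched here for illustration: the Sackin index is the sum of the depths of all leaves, and the Colless index is the sum, over internal nodes, of the absolute difference between the leaf counts of the two child subtrees. The nested-tuple tree encoding and the function name `shape_indices` are assumptions of this example, and the paper may use normalized variants of both indices.

```python
def shape_indices(tree):
    """Return (n_leaves, sackin, colless) for a rooted binary tree.

    A tree is either a leaf (any non-tuple value) or a 2-tuple (left, right).
    """
    def walk(node, depth):
        if not isinstance(node, tuple):
            # A leaf contributes its depth to Sackin and nothing to Colless.
            return 1, depth, 0
        left, right = node
        nl, sl, cl = walk(left, depth + 1)
        nr, sr, cr = walk(right, depth + 1)
        # The imbalance at this internal node is |nl - nr|.
        return nl + nr, sl + sr, cl + cr + abs(nl - nr)
    return walk(tree, 0)

caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(shape_indices(caterpillar))  # (4, 9, 3)
print(shape_indices(balanced))     # (4, 8, 0)
```

On four leaves the caterpillar is the most imbalanced shape and the fully balanced tree the least, which is why both indices are larger for the first example.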


Species Selection Regime and Phylogenetic Tree Shape

Systematic Biology ◽

10.1093/sysbio/syz076 ◽

2020 ◽

Vol 69 (4) ◽

pp. 774-794 ◽

Cited By ~ 2

Author(s):

G Anthony Verboom ◽

Florian C Boucher ◽

David D Ackerly ◽

Lara M Wootton ◽

William A Freyman

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Gaussian Function ◽

Major Change ◽

Cape Floristic Region ◽

Species Selection ◽

Diversification Rate ◽

Extinction Rate ◽

Tree Shape ◽

Selection Regime

Species selection, the effect of heritable traits in generating between-lineage diversification rate differences, provides a valuable conceptual framework for understanding the relationship between traits, diversification, and phylogenetic tree shape. An important challenge, however, is that the nature of real diversification landscapes (curves or surfaces which describe the propensity of species-level lineages to diversify as a function of one or more traits) remains poorly understood. Here, we present a novel, time-stratified extension of the QuaSSE model in which speciation/extinction rate is specified as a static or temporally shifting Gaussian or skewed-Gaussian function of the diversification trait. We then use simulations to show that the generally imbalanced nature of real phylogenetic trees, as well as their generally greater than expected frequency of deep branching events, are typical outcomes when diversification is treated as a dynamic, trait-dependent process. Focusing on four basic models (Gaussian-speciation with and without background extinction; skewed-speciation; Gaussian-extinction), we also show that particular features of the species selection regime produce distinct tree shape signatures and that, consequently, a combination of tree shape metrics has the potential to reveal the species selection regime under which a particular lineage diversified. We evaluate this idea empirically by comparing the phylogenetic trees of plant lineages diversifying within climatically and geologically stable environments of the Greater Cape Floristic Region, with those of lineages diversifying in environments that have experienced major change through the Late Miocene-Pliocene. Consistent with our expectations, the trees of lineages diversifying in a dynamic context are less balanced, show a greater concentration of branching events close to the present, and display stronger diversification rate-trait correlations. We suggest that species selection plays an important role in shaping phylogenetic trees but recognize the need for an explicit probabilistic framework within which to assess the likelihoods of alternative diversification scenarios as explanations of a particular tree shape. [Cape flora; diversification landscape; environmental change; gamma statistic; species selection; time-stratified QuaSSE model; trait-dependent diversification; tree imbalance.]


Computing nearest neighbour interchange distances between ranked phylogenetic trees

Journal of Mathematical Biology ◽

10.1007/s00285-021-01567-5 ◽

2021 ◽

Vol 82 (1-2) ◽

Author(s):

Lena Collienne ◽

Alex Gavryushkin

Keyword(s):

Cancer Research ◽

Computational Complexity ◽

Phylogenetic Tree ◽

Shortest Path ◽

Phylogenetic Trees ◽

Shortest Paths ◽

Nearest Neighbour ◽

Tree Inference ◽

Subtree Prune And Regraft ◽

Comparison Algorithms

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).


Predicting the Impact of Describing New Species on Phylogenetic Patterns

Integrative Organismal Biology ◽

10.1093/iob/obz028 ◽

2019 ◽

Vol 1 (1) ◽

Cited By ~ 1

Author(s):

D C Blackburn ◽

G Giribet ◽

D E Soltis ◽

E L Stanley

Keyword(s):

New Species ◽

Phylogenetic Trees ◽

Branch Length ◽

Length Variation ◽

Tree Shape ◽

Branch Lengths ◽

Taxonomic History ◽

Ecological Patterns ◽

The Impact ◽

Incomplete Sampling

Although our inventory of Earth’s biodiversity remains incomplete, we still require analyses using the Tree of Life to understand evolutionary and ecological patterns. Because incomplete sampling may bias our inferences, we must evaluate how future additions of newly discovered species might impact analyses performed today. We describe an approach that uses taxonomic history and phylogenetic trees to characterize the impact of past species discoveries on phylogenetic knowledge using patterns of branch-length variation, tree shape, and phylogenetic diversity. This provides a framework for assessing the relative completeness of taxonomic knowledge of lineages within a phylogeny. To demonstrate this approach, we use recent large phylogenies for amphibians, reptiles, flowering plants, and invertebrates. Well-known clades exhibit a decline in the mean and range of branch lengths that are added each year as new species are described. With increased taxonomic knowledge over time, deep lineages of well-known clades become known such that most recently described new species are added close to the tips of the tree, reflecting changing tree shape over the course of taxonomic history. The same analyses reveal other clades to be candidates for future discoveries that could dramatically impact our phylogenetic knowledge. Our work reveals that species are often added non-randomly to the phylogeny over multiyear time-scales in a predictable pattern of taxonomic maturation. Our results suggest that we can make informed predictions about how new species will be added across the phylogeny of a given clade, thus providing a framework for accommodating unsampled undescribed species in evolutionary analyses.


Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.


Analysis of SARS-CoV-2 nucleocapsid protein sequence variations in ASEAN countries

Medical Journal of Indonesia ◽

10.13181/mji.oa.215304 ◽

2021 ◽

Author(s):

Mochammad Rajasa Mukti Negara ◽

Ita Krissanti ◽

Gita Widya Pradini

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Protein Sequences ◽

Reference Sequence ◽

N Protein ◽

Asean Country ◽

Sequence Variations ◽

Complete Sequences ◽

Asean Countries ◽

Global Initiative

BACKGROUND: Nucleocapsid (N) protein is one of the four structural proteins of SARS-CoV-2; it is known to be more conserved than the spike protein and is highly immunogenic. This study aimed to analyze the variation of the SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS: Complete sequences of SARS-CoV-2 N protein from each ASEAN country were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), while the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped into clades according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned by MUSCLE, and phylogenetic trees were built using MEGA-X software based on the nucleotide and translated amino acid (AA) sequences. RESULTS: 98 isolates of complete N protein genes from ASEAN countries were analyzed. The nucleotides of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to AA substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of the nucleotide and AA sequences show similar branches. Nucleotide variations in clade O (C28311T), clade GR (28881–28883 GGG>AAC), and clade GRY (28881–28883 GGG>AAC and C28977T) lead to specific branches corresponding to the clade within both trees. CONCLUSIONS: The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clades O, GR, and GRY, which formed specific branches in the phylogenetic tree.


INFERRING PHYLOGENETIC RELATIONSHIPS AVOIDING FORBIDDEN ROOTED TRIPLETS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006001709 ◽

2006 ◽

Vol 04 (01) ◽

pp. 59-74 ◽

Cited By ~ 20

Author(s):

YING-JUN HE ◽

TRINH N. D. HUYNH ◽

JESPER JANSSON ◽

WING-KIN SUNG

Keyword(s):

Approximation Algorithms ◽

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Evolutionary History ◽

Phylogenetic Network ◽

Evolutionary Relationships ◽

Large Set ◽

Tree Network ◽

History Of ◽

Overlapping Sets

To construct a phylogenetic tree or phylogenetic network for describing the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is by merging a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set [Formula: see text] and none of the rooted triplets in another given set [Formula: see text]. Although NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.
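To make the constraint in this abstract concrete, the following minimal sketch checks whether a given rooted tree is consistent with every triplet in a required set and with none in a forbidden set. It relies on the standard definition that a tree displays the rooted triplet ab|c when the lowest common ancestor of a and b lies strictly below that of a and c. It is only a checker for trees, not the construction algorithms of the paper, and the nested-tuple encoding and the names `leaf_paths`, `displays` and `satisfies` are assumptions of this example.

```python
def leaf_paths(tree, path=(), out=None):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if out is None:
        out = {}
    if not isinstance(tree, tuple):
        out[tree] = path
    else:
        for i, child in enumerate(tree):
            leaf_paths(child, path + (i,), out)
    return out

def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    d = 0
    for x, y in zip(p, q):
        if x != y:
            break
        d += 1
    return d

def displays(paths, triplet):
    """True if the tree displays the rooted triplet ab|c, given as (a, b, c)."""
    a, b, c = triplet
    return lca_depth(paths[a], paths[b]) > lca_depth(paths[a], paths[c])

def satisfies(tree, required, forbidden):
    """True if the tree displays all required and no forbidden triplets."""
    paths = leaf_paths(tree)
    return (all(displays(paths, t) for t in required)
            and not any(displays(paths, t) for t in forbidden))

tree = (("a", "b"), ("c", "d"))
print(satisfies(tree, [("a", "b", "c")], [("a", "c", "b")]))  # True
print(satisfies(tree, [("a", "c", "b")], []))                 # False
```

A brute-force search over candidate trees could call `satisfies` as its acceptance test, although that would only be practical for very small leaf sets.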


Visualizing Speciation in Artificial Cichlid Fish

Artificial Life ◽

10.1162/artl.2006.12.2.243 ◽

2006 ◽

Vol 12 (2) ◽

pp. 243-257 ◽

Cited By ~ 3

Author(s):

Ross Clement

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Cichlid Fish ◽

Natural Consequence ◽

Open Problems ◽

Visualization System ◽

Low Level ◽

Wide Range ◽

Level Information ◽

History Of

The Cichlid Speciation Project (CSP) is an ALife simulation system for investigating open problems in the speciation of African cichlid fish. The CSP can be used to perform a wide range of experiments that show that speciation is a natural consequence of certain biological systems. A visualization system capable of extracting the history of speciation from low-level trace data and creating a phylogenetic tree has been implemented. Unlike previous approaches, this visualization system presents a concrete trace of speciation, rather than a summary of low-level information from which the viewer can make subjective decisions on how speciation progressed. The phylogenetic trees are a more objective visualization of speciation, and enable automated collection and summarization of the results of experiments. The visualization system is used to create a phylogenetic tree from an experiment that models sympatric speciation.


The mean number of alleles in multigene families

Advances in Applied Probability ◽

10.1017/s0001867800024149 ◽

1992 ◽

Vol 24 (01) ◽

pp. 1-19 ◽

Cited By ~ 1

Author(s):

G. A. Watterson

Keyword(s):

Probability Distribution ◽

Gene Conversion ◽

Random Sample ◽

Multigene Families ◽

Explicit Formulas ◽

Large Samples ◽

The Mean

The paper considers a random sample of r chromosomes, each having n genes subject to intrachromosomal gene conversion and mutation. The probability distribution and moments for the number of alleles present are investigated when the number, k, of possible alleles at each locus is either finite or infinite. Explicit formulas are given for the mean numbers of alleles on r = 1, 2, or 3 chromosomes, which simplify previously known results. For fixed r, in the infinitely-many-alleles case, the mean number increases asymptotically like rθ log(n) as n→∞, where θ is a mutation parameter. But results for large samples remain elusive.


Treeio: An R Package for Phylogenetic Tree Input and Output with Richly Annotated and Associated Data

Molecular Biology and Evolution ◽

10.1093/molbev/msz240 ◽

2019 ◽

Vol 37 (2) ◽

pp. 599-603 ◽

Cited By ~ 25

Author(s):

Li-Gen Wang ◽

Tommy Tsan-Yuk Lam ◽

Shuangbin Xu ◽

Zehan Dai ◽

Lang Zhou ◽

...

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

R Package ◽

External Data ◽

Input And Output ◽

Evolutionary Context ◽

Tree Data ◽

Downstream Analysis ◽

Different Sources ◽

Associated Data

Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.
