New substitution models for rooting phylogenetic trees

The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made.


Quartet Compatibility and the Quartet Graph

The Electronic Journal of Combinatorics ◽

10.37236/827 ◽

2008 ◽

Vol 15 (1) ◽

Cited By ~ 6

Author(s):

Stefan Grünewald ◽

Peter J. Humphries ◽

Charles Semple

Keyword(s):

Phylogenetic Tree ◽

Evolutionary Biology ◽

Phylogenetic Trees ◽

Chordal Graphs ◽

Minimum Number ◽

Edge Colourings ◽

Computational Difficulty ◽

Published Result

A collection ${\cal P}$ of phylogenetic trees is compatible if there exists a single phylogenetic tree that displays each of the trees in ${\cal P}$. Despite its computational difficulty, determining the compatibility of ${\cal P}$ is a fundamental task in evolutionary biology. Characterizations in terms of chordal graphs have been previously given for this problem as well as for the closely-related problems of (i) determining if ${\cal P}$ is definitive and (ii) determining if ${\cal P}$ identifies a phylogenetic tree. In this paper, we describe new characterizations of each of these problems in terms of edge colourings. Furthermore, making use of the tools that underlie these new characterizations, we also determine the minimum number of quartets required to identify an arbitrary phylogenetic tree, thus correcting a previously published result.
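The notion of a tree "displaying" a quartet can be illustrated with a small sketch. It is not the edge-colouring characterization from the paper, and it does not decide compatibility of an arbitrary collection. It only checks whether one given candidate tree displays each quartet in a list, using the standard fact that a tree displays the quartet ab|cd exactly when the path from a to b shares no vertex with the path from c to d. The tree, the leaf names, and the function names are all hypothetical.

```python
from collections import deque

def path(adj, src, dst):
    """Return the set of vertices on the unique src-dst path in a tree."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    vertices = set()
    node = dst
    while node is not None:  # walk back from dst to src
        vertices.add(node)
        node = parent[node]
    return vertices

def displays(adj, quartet):
    """True if the tree displays the quartet ((a, b), (c, d)), i.e. ab|cd."""
    (a, b), (c, d) = quartet
    return path(adj, a, b).isdisjoint(path(adj, c, d))

def displays_all(adj, quartets):
    """True if the tree displays every quartet in the collection."""
    return all(displays(adj, q) for q in quartets)

# Hypothetical five-leaf tree with internal vertices u, v, w.
edges = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"),
         ("v", "w"), ("d", "w"), ("e", "w")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

print(displays(adj, (("a", "b"), ("c", "d"))))  # True
print(displays(adj, (("a", "c"), ("b", "d"))))  # False
```

A collection of quartets that is displayed by some single tree is compatible by definition; finding such a tree in general is the computationally difficult part that this sketch does not attempt.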


Revticulate: An R framework for interaction with RevBayes

10.32942/osf.io/rsf85 ◽

2021 ◽

Author(s):

Caleb P. Charpentier ◽

April Wright

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Probability Distributions ◽

Reproducible Research ◽

Document Preparation ◽

Hierarchical Bayesian Models ◽

Data Types ◽

Phylogenetic Methods ◽

Rapid Generation ◽

R Functions

1: Phylogenetic methods are increasingly complex. Researchers need to make many choices about how to model different aspects of the data appropriately. It is increasingly common to deploy hierarchical Bayesian models in which different data types may be described by different processes. This necessitates tools to help users understand model assumptions more clearly. 2: We describe the package Revticulate, which provides an R-based interface to the software RevBayes. RevBayes is a Bayesian phylogenetics program that implements an R-like computing language, but does not interface with R itself. Revticulate was designed to allow communication between an R session, with all of its associated capabilities such as plotting and simulation, and a RevBayes session. 3: Revticulate can be used to copy objects from RevBayes into R. We provide several usage examples demonstrating how objects, such as random variables drawn from probability distributions and phylogenetic trees, can be generated in RevBayes. We then show how these objects can be used with R's phylogenetic ecosystem to plot a phylogenetic tree, or with base R functions to simulate the behavior of a particular probability. 4: Revticulate is broadly useful software. Revticulate can be used alongside popular document preparation packages, such as Knitr and pkgdown, to generate attractive reports, tutorials, and websites. This means that researchers who are looking to communicate their work in RevBayes can do that very easily using Revticulate, enabling rapid generation of reproducible research outputs.


Darwin's two competing phylogenetic trees: marsupials as ancestors or sister taxa?

Archives of Natural History ◽

10.3366/anh.2012.0091 ◽

2012 ◽

Vol 39 (2) ◽

pp. 217-233 ◽

Cited By ~ 4

Author(s):

J. David Archibald

Keyword(s):

Fossil Record ◽

Evolutionary Biology ◽

Phylogenetic Trees ◽

Charles Darwin ◽

The Other ◽

Charles Lyell ◽

Single Origin ◽

Molecular Studies ◽

Richard Owen ◽

First Time

Studies of the origin and diversification of major groups of plants and animals are contentious topics in current evolutionary biology. This includes the study of the timing and relationships of the two major clades of extant mammals – marsupials and placentals. Molecular studies concerned with marsupial and placental origin and diversification can be at odds with the fossil record. Such studies are, however, not a recent phenomenon. Over 150 years ago Charles Darwin weighed two alternative views on the origin of marsupials and placentals. Less than a year after the publication of On the origin of species, Darwin outlined these in a letter to Charles Lyell dated 23 September 1860. The letter concluded with two competing phylogenetic diagrams. One showed marsupials as ancestral to both living marsupials and placentals, whereas the other showed a non-marsupial, non-placental as being ancestral to both living marsupials and placentals. These two diagrams are published here for the first time. These are the only such competing phylogenetic diagrams that Darwin is known to have produced. In addition to examining the question of mammalian origins in this letter and in other manuscript notes discussed here, Darwin confronted the broader issue as to whether major groups of animals had a single origin (monophyly) or were the result of “continuous creation” as advocated for some groups by Richard Owen. Charles Lyell had held similar views to those of Owen, but it is clear from correspondence with Darwin that he was beginning to accept the idea of monophyly of major groups.


Computing nearest neighbour interchange distances between ranked phylogenetic trees

Journal of Mathematical Biology ◽

10.1007/s00285-021-01567-5 ◽

2021 ◽

Vol 82 (1-2) ◽

Author(s):

Lena Collienne ◽

Alex Gavryushkin

Keyword(s):

Cancer Research ◽

Computational Complexity ◽

Phylogenetic Tree ◽

Shortest Path ◽

Phylogenetic Trees ◽

Shortest Paths ◽

Nearest Neighbour ◽

Tree Inference ◽

Subtree Prune And Regraft ◽

Comparison Algorithms

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $\mathbf{NP}$-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $\mathbf{NP}$-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).


Novel metric for hyperbolic phylogenetic tree embeddings

Biology Methods and Protocols ◽

10.1093/biomethods/bpab006 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Hirotaka Matsumoto ◽

Takahiro Mimori ◽

Tsukasa Fukunaga

Keyword(s):

Phylogenetic Tree ◽

Hyperbolic Space ◽

Nearest Neighbor ◽

Cancer Genomics ◽

Geometric Approach ◽

Artificial Intelligence Research ◽

Phylogenetic Methods ◽

Novel Approach ◽

Multiple Trees ◽

Euclidean Embeddings

Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space can better represent a hierarchical structure compared to Euclidean space, and can therefore be useful for describing and analyzing a phylogenetic tree. In this study, we developed a novel metric that considers the characteristics of a phylogenetic tree for representation in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings, and Euclidean embeddings, and confirmed that our method could be used to more precisely reconstruct evolutionary distance. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we proposed a novel approach based on our metric to integrate multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of adopting a geometric approach for further advancing the applications of phylogenetic methods.
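For orientation, the sketch below computes the standard distance in the Poincaré ball model, one common model of hyperbolic space used for general hyperbolic embeddings. It is not the novel metric proposed in the paper, which the abstract does not specify. The function name and the sample points are illustrative assumptions.

```python
import math

def poincare_distance(u, v):
    """Standard Poincare-ball distance between two points inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))

origin = (0.0, 0.0)
near_centre = (0.5, 0.0)
near_boundary = (0.99, 0.0)

print(poincare_distance(origin, near_centre))    # equals ln(3), about 1.0986
print(poincare_distance(origin, near_boundary))  # grows quickly near the boundary
```

Distances grow without bound as points approach the boundary of the ball, which is the property that lets a tree's exponentially growing number of leaves be spread out with low distortion.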


Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
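The "length" of a tree in this sense is the minimum number of substitutions needed to explain the sequences on it. As a rough illustration, the sketch below scores fixed candidate trees with Fitch's algorithm, a standard small-parsimony technique. It is not the tree-building or verification procedure described in the paper, and the four-taxon alignment, taxon names, and function names are made up for the example.

```python
def fitch(tree, states):
    """Fitch's algorithm for one site.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping leaf name -> character at this site
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_length(tree, alignment):
    """Sum the per-site Fitch scores over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total

# Hypothetical toy alignment (nucleotides), three sites, four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}

tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 2
print(parsimony_length(tree_2, alignment))  # 4
```

Here tree_1 is the shorter of the two candidates. Proving that a tree is minimal over all possible topologies, which is the problem the paper addresses, requires searching or bounding the whole tree space rather than scoring a few candidates.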


Genome-scale reconstructions to assess metabolic phylogeny and organism clustering

PLoS ONE ◽

10.1371/journal.pone.0240953 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0240953

Author(s):

Christian Schulz ◽

Eivind Almaas

Keyword(s):

Phylogenetic Trees ◽

Metabolic Networks ◽

Sulfur Metabolism ◽

Phylogenetic Analyses ◽

Tree Of Life ◽

Significant Heterogeneity ◽

Metabolic Reaction ◽

High Quality ◽

Conserved Genes ◽

Genome Scale

Approaches for systematizing information on relatedness between organisms are important in biology. Phylogenetic analyses based on sets of highly conserved genes are currently the basis for the Tree of Life. Genome-scale metabolic reconstructions contain high-quality information regarding the metabolic capability of an organism and are typically restricted to metabolically active enzyme-encoding genes. While there are many tools available to generate draft reconstructions, expert-level knowledge is still required to generate and manually curate high-quality genome-scale metabolic models and to fill gaps in their reaction networks. Here, we use the tool AutoKEGGRec to construct 975 genome-scale metabolic draft reconstructions encoded in the KEGG database without further curation. The organisms are selected across all three domains, and their metabolic networks serve as the basis for generating phylogenetic trees. We find that, using all reactions encoded, these metabolism-based comparisons give rise to a phylogenetic tree with close similarity to the Tree of Life. While this tree is quite robust to reasonable levels of noise in the metabolic reaction content of an organism, we find significant heterogeneity in how much noise an organism may tolerate before it is incorrectly placed in the tree. Furthermore, by using the protein sequences for particular metabolic functions and pathway sets, such as central carbon, nitrogen, and sulfur metabolism, as the basis for the organism comparisons, we generate highly specific phylogenetic trees. We believe the generation of phylogenetic trees based on metabolic reaction content, in particular when focused on specific functions and pathways, could aid the identification of functionally important metabolic enzymes and be of value for genome-scale metabolic modellers and enzyme engineers.
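One simple way to compare organisms by reaction content is a set-based distance. The sketch below uses the Jaccard distance between reaction sets and builds a pairwise distance table that could be passed to any distance-based tree method. The abstract does not state which measure the authors used, so the choice of Jaccard distance is an assumption, and the organism names and reaction identifiers are hypothetical placeholders.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of reaction identifiers."""
    union = a | b
    if not union:
        return 0.0  # two empty reaction sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def pairwise_distances(reaction_sets):
    """Return {(org_1, org_2): distance} for every unordered pair of organisms."""
    return {
        (x, y): jaccard_distance(reaction_sets[x], reaction_sets[y])
        for x, y in combinations(sorted(reaction_sets), 2)
    }

# Hypothetical reaction content of three draft reconstructions.
reaction_sets = {
    "org_a": {"R1", "R2", "R3", "R4"},
    "org_b": {"R1", "R2", "R3", "R5"},
    "org_c": {"R1", "R6", "R7", "R8"},
}

for pair, dist in pairwise_distances(reaction_sets).items():
    print(pair, round(dist, 3))
# ('org_a', 'org_b') 0.4
# ('org_a', 'org_c') 0.857
# ('org_b', 'org_c') 0.857
```

In this toy case org_a and org_b would be grouped together first, since they share most of their reactions.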


Analysis of SARS-CoV-2 nucleocapsid protein sequence variations in ASEAN countries

Medical Journal of Indonesia ◽

10.13181/mji.oa.215304 ◽

2021 ◽

Author(s):

Mochammad Rajasa Mukti Negara ◽

Ita Krissanti ◽

Gita Widya Pradini

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Protein Sequences ◽

Reference Sequence ◽

N Protein ◽

Asean Country ◽

Sequence Variations ◽

Complete Sequences ◽

Asean Countries ◽

Global Initiative

BACKGROUND: Nucleocapsid (N) protein is one of four structural proteins of SARS-CoV-2; it is known to be more conserved than the spike protein and is highly immunogenic. This study aimed to analyze the variation of the SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS: Complete sequences of SARS-CoV-2 N protein from each ASEAN country were obtained from the Global Initiative on Sharing All Influenza Data (GISAID), while the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped by clade according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned by MUSCLE, and phylogenetic trees were built using MEGA-X software based on the nucleotide and translated amino acid (AA) sequences. RESULTS: 98 isolates of complete N protein genes from ASEAN countries were analyzed. The nucleotides of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to AA substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of the nucleotide and AA sequences show similar branches. Nucleotide variations in clade O (C28311T); clade GR (28881–28883 GGG>AAC); and clade GRY (28881–28883 GGG>AAC and C28977T) lead to specific branches corresponding to the clade within both trees. CONCLUSIONS: The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clades O, GR, and GRY, which formed specific branches in the phylogenetic tree.


Evolution: medicine’s most basic science

Oxford Textbook of Medicine ◽

10.1093/med/9780199204854.003.020102_update_002 ◽

2010 ◽

pp. 12-15 ◽

Cited By ~ 4

Author(s):

Randolph M. Nesse ◽

Richard Dawkins

Keyword(s):

Population Genetics ◽

Basic Science ◽

Evolutionary Biology ◽

Phylogenetic Trees ◽

Evolutionary Methods

The role of evolutionary biology as a basic science for medicine has been expanding rapidly. Some evolutionary methods are already widely applied in medicine, such as population genetics and methods for analysing phylogenetic trees. Newer applications come from seeking evolutionary as well as proximate explanations for disease. ...


Evolution: Medicine’s most basic science

Oxford Textbook of Medicine ◽

10.1093/med/9780198746690.003.0008 ◽

2020 ◽

pp. 39-42

Author(s):

Randolph M. Nesse ◽

Richard Dawkins

Keyword(s):

Population Genetics ◽

Natural Selection ◽

Basic Science ◽

Evolutionary Biology ◽

Phylogenetic Trees ◽

The Body ◽

Clinical Implications ◽

Trade Offs ◽

The Cost

The role of evolutionary biology as a basic science for medicine is expanding rapidly. Some evolutionary methods are already widely applied in medicine, such as population genetics and methods for analysing phylogenetic trees. Newer applications come from seeking evolutionary as well as proximate explanations for disease. Traditional medical research is restricted to proximate studies of the body’s mechanism, but separate evolutionary explanations are needed for why natural selection has left many aspects of the body vulnerable to disease. There are six main possibilities: mismatch, infection, constraints, trade-offs, reproduction at the cost of health, and adaptive defences. Like other basic sciences, evolutionary biology has limited direct clinical implications, but it provides essential research methods, encourages asking new questions that foster a deeper understanding of disease, and provides a framework that organizes the facts of medicine.
