On Sackin's original proposal: The variance of the leaves' depths as a phylogenetic balance index

2019 ◽  
Author(s):  
Tomás Martínez Coronado ◽  
Arnau Mir ◽  
Francesc Rossello ◽  
Lucía Rotger

Abstract Background: The Sackin index S of a rooted phylogenetic tree, defined as the sum of its leaves' depths, is one of the most popular balance indices in phylogenetics, and Sackin's 1972 paper is usually cited as the source for this index. However, what Sackin actually proposed in his paper as a measure of the imbalance of a rooted tree was not the sum of its leaves' depths, but their "variation". This proposal was later implemented as the variance of the leaves' depths by Kirkpatrick and Slatkin, who moreover posed the problem of finding a closed formula for its expected value under the Yule model. Nowadays, Sackin's original proposal seems to have passed into oblivion in the phylogenetics literature, replaced by the index bearing his name, which, in fact, was introduced a decade later by Sokal. Results: In this paper we study the properties of the variance of the leaves' depths, V, as a balance index. Firstly, we prove that the rooted trees with n leaves and maximum V value are exactly the combs with n leaves. But although V achieves its minimum value on every space BT_n of bifurcating rooted phylogenetic trees with n < 184 leaves at the so-called "maximally balanced trees" with n leaves, this property fails for almost every n ≥ 184. We then provide an algorithm that finds in O(n) time the trees in BT_n with minimum V value.
Secondly, we obtain closed formulas for the expected V value of a bifurcating rooted tree with any number n of leaves under the Yule and the uniform models and, as a by-product of the computations leading to these formulas, we also obtain closed formulas for the variance of the Sackin index and the total cophenetic index of a bifurcating rooted tree, as well as of their covariance, under the uniform model, thus filling this gap in the literature. Conclusions: The phylogenetics crowd has been wise in preferring as a balance index the sum S(T) of the leaves' depths of a phylogenetic tree T over their variance V(T), because the latter does not seem to capture correctly the notion of balance of large bifurcating rooted trees. But for bifurcating trees up to 183 leaves, V is a valid and useful balance index.
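
The two indices discussed in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the abstract: trees are encoded as nested tuples with strings as leaves, V is taken as the population variance of the leaves' depths (dividing by n), and the helper names (`leaf_depths`, `sackin`, `depth_variance`, `comb`, `balanced`) are hypothetical. It is not the authors' O(n) algorithm for finding the trees with minimum V; it only evaluates S and V on a comb and on a maximally balanced tree.

```python
def leaf_depths(tree, depth=0):
    """Return the list of depths of all leaves of a nested-tuple tree."""
    if not isinstance(tree, tuple):      # a leaf is any non-tuple label
        return [depth]
    depths = []
    for child in tree:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def sackin(tree):
    """Sackin index S: the sum of the leaves' depths."""
    return sum(leaf_depths(tree))


def depth_variance(tree):
    """V: variance of the leaves' depths (assumed here to divide by n)."""
    depths = leaf_depths(tree)
    mean = sum(depths) / len(depths)
    return sum((d - mean) ** 2 for d in depths) / len(depths)


def comb(n):
    """Comb (caterpillar) with n leaves: each internal node has a leaf child."""
    tree = "x1"
    for i in range(2, n + 1):
        tree = (tree, f"x{i}")
    return tree


def balanced(n, start=1):
    """Maximally balanced tree: split the n leaves as evenly as possible."""
    if n == 1:
        return f"x{start}"
    left = (n + 1) // 2
    return (balanced(left, start), balanced(n - left, start + left))


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, sackin(comb(n)), depth_variance(comb(n)),
              sackin(balanced(n)), depth_variance(balanced(n)))
```

For n = 4 the comb has leaf depths 3, 3, 2, 1 (S = 9), while the maximally balanced tree has all four leaves at depth 2 (S = 8, V = 0), matching the intuition that both indices grow with imbalance on small trees.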

2020 ◽  
Author(s):  
Tomás Martínez Coronado ◽  
Arnau Mir ◽  
Francesc Rossello ◽  
Lucía Rotger

Abstract Background. The Sackin index S of a rooted phylogenetic tree, defined as the sum of its leaves' depths, is one of the most popular balance indices in phylogenetics, and Sackin's 1972 paper is usually cited as the source for this index. However, what Sackin actually proposed in his paper as a measure of the imbalance of a rooted tree was not the sum of its leaves' depths, but their "variation". This proposal was later implemented as the variance of the leaves' depths by Kirkpatrick and Slatkin in 1993, where they also posed the problem of finding a closed formula for its expected value under the Yule model. Nowadays, Sackin's original proposal seems to have passed into oblivion in the phylogenetics literature, replaced by the index bearing his name, which, in fact, was introduced a decade later by Sokal. Results. In this paper we study the properties of the variance of the leaves' depths, V, as a balance index. Firstly, we prove that the rooted trees with n leaves and maximum V value are exactly the combs with n leaves. But although V achieves its minimum value on every space of bifurcating rooted phylogenetic trees with at most 183 leaves at the so-called "maximally balanced trees" with n leaves, this property fails for almost every n larger than 184. We then provide an algorithm that finds the bifurcating rooted trees with n leaves and minimum V value in quasilinear time. Secondly, we obtain closed formulas for the expected V value of a bifurcating rooted tree with any number n of leaves under the Yule and the uniform models and, as a by-product of the computations leading to these formulas, we also obtain closed formulas for the variance under the uniform model of the Sackin index and the total cophenetic index of a bifurcating rooted tree, as well as of their covariance, thus filling this gap in the literature.


2021 ◽  
Vol 82 (1-2) ◽  
Author(s):  
Lena Collienne ◽  
Alex Gavryushkin

Abstract Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $\mathbf{NP}$-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $\mathbf{NP}$-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).


1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.


Author(s):  
Mochammad Rajasa Mukti Negara ◽  
Ita Krissanti ◽  
Gita Widya Pradini

BACKGROUND Nucleocapsid (N) protein is one of four structural proteins of SARS-CoV-2, which is known to be more conserved than spike protein and is highly immunogenic. This study aimed to analyze the variation of the SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS Complete sequences of SARS-CoV-2 N protein from each ASEAN country were obtained from Global Initiative on Sharing All Influenza Data (GISAID), while the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped into clades according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned by MUSCLE, and phylogenetic trees were built using MEGA-X software based on the nucleotide and translated AA sequences. RESULTS A total of 98 isolates of complete N protein genes from ASEAN countries were analyzed. The nucleotides of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to amino acid (AA) substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of nucleotide and AA sequences show similar branches. Nucleotide variations in clade O (C28311T); clade GR (28881–28883 GGG>AAC); and clade GRY (28881–28883 GGG>AAC and C28977T) lead to specific branches corresponding to the clade within both trees. CONCLUSIONS The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clades O, GR, and GRY, which formed specific branches in the phylogenetic tree.


2006 ◽  
Vol 04 (01) ◽  
pp. 59-74 ◽  
Author(s):  
YING-JUN HE ◽  
TRINH N. D. HUYNH ◽  
JESPER JANSSON ◽  
WING-KIN SUNG

To construct a phylogenetic tree or phylogenetic network for describing the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is by merging a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set [Formula: see text] and none of the rooted triplets in another given set [Formula: see text]. Although NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.
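
The consistency condition described in this abstract (a tree must display every rooted triplet in one set and none in another) can be sketched in Python as follows. This is only a checker for a given tree, not one of the authors' exact or approximation algorithms. The nested-tuple tree encoding, the representation of a triplet ab|c as the tuple `(a, b, c)`, and the function names are assumptions made for illustration.

```python
def clusters(tree):
    """Return the leaf sets below every node; the last entry is the node's own."""
    if not isinstance(tree, tuple):          # a leaf is any non-tuple label
        return [frozenset([tree])]
    result = []
    below = frozenset()
    for child in tree:
        child_clusters = clusters(child)
        result.extend(child_clusters)
        below |= child_clusters[-1]          # child's own cluster comes last
    result.append(below)
    return result


def displays(tree, triplet):
    """True if the tree displays ab|c: some cluster holds a and b but not c."""
    a, b, c = triplet
    return any(a in cl and b in cl and c not in cl for cl in clusters(tree))


def consistent(tree, required, forbidden):
    """True if the tree displays all required and no forbidden triplets."""
    return (all(displays(tree, t) for t in required)
            and not any(displays(tree, t) for t in forbidden))


if __name__ == "__main__":
    tree = ((("a", "b"), "c"), "d")
    print(displays(tree, ("a", "b", "c")))   # ab|c
    print(displays(tree, ("a", "c", "b")))   # ac|b
    print(consistent(tree, [("a", "b", "c"), ("a", "c", "d")],
                     [("a", "d", "b")]))
```

The check relies on the fact that a rooted tree displays ab|c exactly when the lowest common ancestor of a and b lies strictly below that of a and c, which is equivalent to some node's leaf set containing a and b but not c.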


2006 ◽  
Vol 12 (2) ◽  
pp. 243-257 ◽  
Author(s):  
Ross Clement

The Cichlid Speciation Project (CSP) is an ALife simulation system for investigating open problems in the speciation of African cichlid fish. The CSP can be used to perform a wide range of experiments that show that speciation is a natural consequence of certain biological systems. A visualization system capable of extracting the history of speciation from low-level trace data and creating a phylogenetic tree has been implemented. Unlike previous approaches, this visualization system presents a concrete trace of speciation, rather than a summary of low-level information from which the viewer can make subjective decisions on how speciation progressed. The phylogenetic trees are a more objective visualization of speciation, and enable automated collection and summarization of the results of experiments. The visualization system is used to create a phylogenetic tree from an experiment that models sympatric speciation.


2019 ◽  
Vol 37 (2) ◽  
pp. 599-603 ◽  
Author(s):  
Li-Gen Wang ◽  
Tommy Tsan-Yuk Lam ◽  
Shuangbin Xu ◽  
Zehan Dai ◽  
Lang Zhou ◽  
...  

Abstract Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous associated data to a single tree file, including BEAST-compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.


2013 ◽  
Vol 10 (3) ◽  
pp. 16-30 ◽  
Author(s):  
José Ignacio Requeno ◽  
José Manuel Colom

Summary Model checking, a generic and formal paradigm stemming from computer science based on temporal logics, has been proposed for the study of biological properties that emerge from the labeling of the states defined over the phylogenetic tree. This strategy allows us to use generic software tools already present in the industry. However, the performance of traditional model checking is penalized when scaling the system for large phylogenies. To this end, two strategies are presented here. The first one consists of partitioning the phylogenetic tree into a set of subgraphs each one representing a subproblem to be verified so as to speed up the computation time and distribute the memory consumption. The second strategy is based on uncoupling the information associated to each state of the phylogenetic tree (mainly, the DNA sequence) and exporting it to an external tool for the management of large information systems. The integration of all these approaches outperforms the results of monolithic model checking and helps us to execute the verification of properties in a real phylogenetic tree.


Zootaxa ◽  
2011 ◽  
Vol 2771 (1) ◽  
pp. 41 ◽  
Author(s):  
SARP KAYA ◽  
BATTAL CIPLAK

Among the Anatolian Tettigoniinae (Orthoptera, Tettigoniidae) the genera Anterastes, Koroglus, Sureyaella and Rhacocleis are distinguishable from the others by the presence of one pair of spurs on the apico-ventral end of the hind tibiae. The last two can be easily distinguished from the others by several distinct features, but the separation of the first two from each other is problematic. A new species described here provides an opportunity to re-evaluate their taxonomy. The new species Anterastes antecessor sp. n. is described based on morphology, male calling song and genetic data. The taxonomy of Anterastes and Koroglus is rectified based on phylogenetic hypotheses obtained from representative 16S rDNA haplotypes. Sureyaella bella, Parapholidoptera signata and Bolua turkiyae are used as outgroups in different combinations to obtain a more stable phylogeny. Although analyses with different outgroups suggested the same topology, the phylogenetic tree with outgroups Parapholidoptera signata and Bolua turkiyae yielded the highest bootstrap support for the branches. Phylogenetic trees suggested the following relationships for the ingroup species; (A. antecessor sp. n. + ((Koroglus disparalatus + A. uludaghensis) + (A. turcicus + (A. niger + (A. ucari + A. babadaghi))) + ((A. tolunayi + (A. serbicus + A. antitauricus + A. burri)))). Considering the phylogenetic hypotheses and characters used in previous publications, Koroglus is put in synonymy with Anterastes, and a new combination is suggested for the only species of the former, Anterastes disparalatus comb. n. A short remark is given about the characters used in the generic taxonomy of the group.

