Phylogeny estimation
Recently Published Documents

Total documents: 65 (five years: 17)
H-index: 24 (five years: 3)

2021
Author(s): Robert J Asher, Martin R Smith

Abstract An unprecedented amount of evidence now illuminates the phylogeny of living mammals and birds on the Tree of Life. We use this tree to measure phylogenetic value of data typically used in paleontology (bones and teeth) from six datasets derived from five published studies. We ask three interrelated questions: 1) Can these data adequately reconstruct known parts of the Tree of Life? 2) Is accuracy generally similar for studies using morphology, or do some morphological datasets perform better than others? 3) Does the loss of non-fossilizable data cause taxa to occur in misleadingly basal positions? Adding morphology to DNA datasets usually increases congruence of resulting topologies to the well corroborated tree, but this varies among morphological datasets. Extant taxa with a high proportion of missing morphological characters can greatly reduce phylogenetic resolution when analyzed together with fossils. Attempts to ameliorate this by deleting extant taxa missing morphology are prone to decreased accuracy due to long-branch artefacts. We find no evidence that fossilization causes extinct taxa to incorrectly appear at or near topologically basal branches. Morphology comprises the evidence held in common by living taxa and fossils, and phylogenetic analysis of fossils greatly benefits from inclusion of molecular and morphological data sampled for living taxa, whatever methods are used for phylogeny estimation.


2021, Vol 16 (1)
Author(s): Xilin Yu, Thien Le, Sarah A. Christensen, Erin K. Molloy, Tandy Warnow

Abstract One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on GitHub at https://github.com/yuxilin51/GreedyRFS.
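To make the RFS criterion concrete, the following is a minimal Python sketch of how a candidate supertree could be scored. It is not Exact-RFS-2 and is not taken from the authors' repository; it only illustrates the quantity being minimized, under the usual definition that the RFS score is the sum, over the source trees, of the Robinson-Foulds distance between each source tree and the supertree restricted to that source tree's leaves. The nested-tuple tree encoding and the function names (leaves, clades, bipartitions, rf_distance, rfs_score) are assumptions made for this illustration.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels of a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set below every internal node."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def bipartitions(tree: Tree, restrict_to: Optional[Iterable[str]] = None) -> Set[frozenset]:
    """Non-trivial bipartitions of the tree (treated as unrooted),
    optionally restricted to a subset of its leaves."""
    all_leaves = leaves(tree)
    keep = all_leaves if restrict_to is None else all_leaves & frozenset(restrict_to)
    result = set()
    for clade in clades(tree):
        side_a = clade & keep
        side_b = keep - side_a
        # A bipartition is informative only if both sides have at least two leaves.
        if len(side_a) >= 2 and len(side_b) >= 2:
            result.add(frozenset([side_a, side_b]))
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson-Foulds distance: bipartitions present in exactly one tree."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


def rfs_score(supertree: Tree, source_trees: Iterable[Tree]) -> int:
    """Sum of RF distances between each source tree and the restricted supertree."""
    return sum(
        len(bipartitions(supertree, leaves(src)) ^ bipartitions(src))
        for src in source_trees
    )


candidate = (("A", "B"), ("C", ("D", "E")))
sources = [(("A", "B"), ("C", "D")), (("A", "D"), ("C", "E"))]
print(rfs_score(candidate, sources))
```

In this toy example the first source tree agrees with the candidate and the second conflicts with it, so the score reflects only the second tree. An exact method such as the one described in the abstract searches for a supertree minimizing this score; the sketch only evaluates a given candidate.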


2021
Author(s): Xilin Yu, Thien Le, Sarah A. Christensen, Erin K. Molloy, Tandy Warnow

Abstract One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on GitHub at github.com/yuxilin51/GreedyRFS.


2021
Author(s): Andreas Rempel, Roland Wittler

Abstract Summary: SANS serif is novel software for alignment-free, whole-genome-based phylogeny estimation that follows a pangenomic approach to efficiently calculate a set of splits in a phylogenetic tree or network. Availability and Implementation: Implemented in C++ and supported on Linux, macOS, and Windows. The source code is freely available for download at https://gitlab.ub.uni-bielefeld.de/gi/[email protected]
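The distinction between splits that fit "a phylogenetic tree or network" can be illustrated with the standard pairwise compatibility test: two splits of the same taxon set are compatible when at least one of the four pairwise intersections of their sides is empty, and a set of pairwise compatible splits can be displayed on a single tree. The Python sketch below shows only that general test. It is not SANS serif's algorithm, and the function names (is_compatible, is_tree_like) and the toy taxa are assumptions made for illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable


def is_compatible(side1: Iterable[str], side2: Iterable[str], taxa: FrozenSet[str]) -> bool:
    """Each split is given by one of its sides; the other side is the complement.
    Two splits are compatible if some pair of sides does not overlap."""
    a1 = frozenset(side1)
    a2 = frozenset(side2)
    b1 = taxa - a1
    b2 = taxa - a2
    return any(not (x & y) for x in (a1, b1) for y in (a2, b2))


def is_tree_like(splits: Iterable[Iterable[str]], taxa: FrozenSet[str]) -> bool:
    """True if every pair of splits is compatible, i.e. the splits fit one tree."""
    return all(is_compatible(s, t, taxa) for s, t in combinations(list(splits), 2))


taxa = frozenset("ABCDE")
print(is_tree_like([{"A", "B"}, {"D", "E"}], taxa))  # AB|CDE and DE|ABC
print(is_tree_like([{"A", "B"}, {"A", "C"}], taxa))  # AB|CDE and AC|BDE
```

The first pair of splits can sit on one tree; the second pair conflicts, so representing both requires a network rather than a tree.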


Author(s): Siarhei A. Dabravolski

Abstract Background: Nitroreductases are a family of evolutionarily related proteins catalyzing the reduction of nitro-substituted compounds. Nitroreductases are widespread enzymes, but nearly all modern research and practical application have been concentrated on the bacterial proteins, mainly nitroreductases of Escherichia coli. The main aim of this study is to describe the phylogenetic distribution of the nitroreductases in the photosynthetic eukaryotes (Viridiplantae) to highlight their structural similarity and areas for future research and application. Results: This study suggests that homologs of nitroreductase proteins are also widely present in Viridiplantae. Maximum likelihood phylogenetic tree reconstruction and comparison of the structural models suggest a close evolutionary relationship between cyanobacterial and Viridiplantae nitroreductases. Conclusions: This study provides the first attempt to understand the evolution of the nitroreductase protein family in Viridiplantae. Our phylogeny estimation and the preservation of chloroplast/mitochondrial localization indicate that the plant nitroreductases originated from the cyanobacterial endosymbiont. The high level of structural similarity suggests that function is conserved as well. Directions for future research and industrial application of the Viridiplantae nitroreductases are discussed.


Author(s): Vladimir Smirnov, Tandy Warnow

Abstract Phylogeny estimation is a major step in many biological studies, and has many well known challenges. With the dropping cost of sequencing technologies, biologists now have increasingly large datasets available for use in phylogeny estimation. Here we address the challenge of estimating a tree given large datasets with a combination of full-length sequences and fragmentary sequences, which can arise due to a variety of reasons, including sample collection, sequencing technologies, and analytical pipelines. We compare two basic approaches: (1) computing an alignment on the full dataset and then computing a maximum likelihood tree on the alignment, or (2) constructing an alignment and tree on the full length sequences and then using phylogenetic placement to add the remaining sequences (which will generally be fragmentary) into the tree. We explore these two approaches on a range of simulated datasets, each with 1000 sequences and varying in rates of evolution, and two biological datasets. Our study shows some striking performance differences between methods, especially when there is substantial sequence length heterogeneity and high rates of evolution. We find in particular that using UPP to align sequences and RAxML to compute a tree on the alignment provides the best accuracy, substantially outperforming trees computed using phylogenetic placement methods. We also find that FastTree has poor accuracy on alignments containing fragmentary sequences. Overall, our study provides insights into the literature comparing different methods and pipelines for phylogenetic estimation, and suggests directions for future method development. [Phylogeny estimation, sequence length heterogeneity, phylogenetic placement.]


2019
Author(s): Tasfia Zahin, Md. Hasin Abrar, Mizanur Rahman, Tahrina Tasnim, Md. Shamsuzzoha Bayzid, ...

Abstract Phylogenetic analysis, i.e., construction of an accurate phylogenetic tree from genomic sequences of a set of species, is one of the main challenges in bioinformatics. The popular approaches to this require aligning each pair of sequences to calculate pairwise distances or aligning all the sequences to construct a multiple sequence alignment. The computational complexity and difficulties in getting accurate alignments have led to the development of alignment-free methods to estimate phylogenies. However, the alignment-free approaches focus on computing distances between species and do not utilize statistical approaches for phylogeny estimation. Herein, we present a simple alignment-free method for phylogeny construction based on contiguous sub-sequences of length k, termed k-mers. The presence or absence of these k-mers is used to construct a phylogeny using a maximum likelihood approach. The results suggest our method is competitive with other alignment-free approaches, while outperforming them in some cases.
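The k-mer presence/absence encoding described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: it only builds the binary matrix (one row per taxon, one column per distinct k-mer) that a maximum likelihood program for binary characters could then take as input. The function names (kmers, presence_absence_matrix), the toy sequences, and the choice of k are assumptions made for the example; the likelihood step itself is not shown.

```python
from typing import Dict, List, Set, Tuple


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of contiguous sub-sequences of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_absence_matrix(
    sequences: Dict[str, str], k: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a binary character matrix: 1 if the taxon contains the k-mer, else 0."""
    per_taxon = {name: kmers(seq.upper(), k) for name, seq in sequences.items()}
    # Columns are all distinct k-mers seen in any taxon, in sorted order.
    columns = sorted(set().union(*per_taxon.values()))
    matrix = {
        name: [1 if kmer in found else 0 for kmer in columns]
        for name, found in per_taxon.items()
    }
    return columns, matrix


toy = {"taxon1": "ACGT", "taxon2": "CGTA"}  # hypothetical toy sequences
columns, matrix = presence_absence_matrix(toy, k=2)
print(columns)
for name, row in matrix.items():
    print(name, row)
```

Each row of the matrix plays the role of a sequence of binary characters, so no alignment of the original genomic sequences is needed before tree inference.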

