phylogeny estimation Latest Research Papers

Abstract An unprecedented amount of evidence now illuminates the phylogeny of living mammals and birds on the Tree of Life. We use this tree to measure phylogenetic value of data typically used in paleontology (bones and teeth) from six datasets derived from five published studies. We ask three interrelated questions: 1) Can these data adequately reconstruct known parts of the Tree of Life? 2) Is accuracy generally similar for studies using morphology, or do some morphological datasets perform better than others? 3) Does the loss of non-fossilizable data cause taxa to occur in misleadingly basal positions? Adding morphology to DNA datasets usually increases congruence of resulting topologies to the well corroborated tree, but this varies among morphological datasets. Extant taxa with a high proportion of missing morphological characters can greatly reduce phylogenetic resolution when analyzed together with fossils. Attempts to ameliorate this by deleting extant taxa missing morphology are prone to decreased accuracy due to long-branch artefacts. We find no evidence that fossilization causes extinct taxa to incorrectly appear at or near topologically basal branches. Morphology comprises the evidence held in common by living taxa and fossils, and phylogenetic analysis of fossils greatly benefits from inclusion of molecular and morphological data sampled for living taxa, whatever methods are used for phylogeny estimation.


Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00189-2 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Xilin Yu ◽

Thien Le ◽

Sarah A. Christensen ◽

Erin K. Molloy ◽

Tandy Warnow

Keyword(s):

Optimization Problems ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Tree Of Life ◽

Divide And Conquer ◽

Mcmc Methods ◽

Supertree Method ◽

Phylogeny Estimation ◽

Source Form ◽

Life On Earth

Abstract One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on Github at https://github.com/yuxilin51/GreedyRFS.
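As an illustration of the RFS criterion named in this abstract, the sketch below scores a candidate supertree by summing Robinson-Foulds distances between each input tree and the supertree restricted to that tree's leaf set. It is not Exact-RFS-2 and does not search for an optimal supertree; it only evaluates the objective. The nested-tuple tree encoding and the function names (`bipartitions`, `rf_distance`, `restrict`, `rfs_score`) are assumptions made for this example, and the candidate supertree is assumed to contain every leaf of every input tree.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial splits of the tree, viewed as unrooted.

    Each split is stored as the side that does not contain a fixed anchor
    leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits (single leaf on one side) and the empty root split.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson-Foulds distance: number of splits found in exactly one tree."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("RF distance needs trees on the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


def restrict(tree: Tree, keep: FrozenSet[str]) -> Optional[Tree]:
    """Prune leaves outside `keep` and suppress nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def rfs_score(supertree: Tree, inputs: Iterable[Tree]) -> int:
    """Sum of RF distances between each input tree and the restricted supertree."""
    return sum(rf_distance(restrict(supertree, leaf_set(t)), t) for t in inputs)


if __name__ == "__main__":
    candidate = ((("A", "B"), "C"), ("D", "E"))
    sources = [(("A", "B"), ("C", "D")), (("A", "D"), ("C", "E"))]
    print(rfs_score(candidate, sources))
```

In this toy input the first source tree agrees with the candidate and the second conflicts with it on one split per tree, so a lower-scoring candidate would be preferred under the criterion.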


Using Robinson-Foulds Supertrees in Divide-and-Conquer Phylogeny Estimation

10.21203/rs.3.rs-174421/v1 ◽

2021 ◽

Author(s):

Xilin Yu ◽

Thien Le ◽

Sarah A. Christensen ◽

Erin K. Molloy ◽

Tandy Warnow

Keyword(s):

Optimization Problems ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Tree Of Life ◽

Divide And Conquer ◽

Greedy Heuristic ◽

Mcmc Methods ◽

Np Hard ◽

Phylogeny Estimation ◽

Source Form

Abstract One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on Github at github.com/yuxilin51/GreedyRFS.


SANS serif: alignment-free, whole-genome based phylogenetic reconstruction

10.1101/2020.12.31.424643 ◽

2021 ◽

Author(s):

Andreas Rempel ◽

Roland Wittler

Keyword(s):

Phylogenetic Tree ◽

Source Code ◽

Phylogenetic Reconstruction ◽

Whole Genome ◽

Link Type ◽

Alignment Free ◽

Phylogeny Estimation

Abstract Summary SANS serif is a novel software for alignment-free, whole-genome based phylogeny estimation that follows a pangenomic approach to efficiently calculate a set of splits in a phylogenetic tree or network. Availability and Implementation Implemented in C++ and supported on Linux, MacOS, and Windows. The source code is freely available for download at https://gitlab.ub.uni-bielefeld.de/gi/[email protected]


CD-MAWS: An Alignment-free Phylogeny Estimation Method Using Cosine Distance on Minimal Absent Word Sets

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3136792 ◽

2021 ◽

pp. 1-1

Author(s):

Naser Anjum ◽

Raian Latif Nabil ◽

Rakibul Islam Rafi ◽

Shamsuzzoha Bayzid ◽

M. Saifur Rahman

Keyword(s):

Estimation Method ◽

Alignment Free ◽

Phylogeny Estimation ◽

Cosine Distance


Evolutionary aspects of the Viridiplantae nitroreductases

Journal of Genetic Engineering and Biotechnology ◽

10.1186/s43141-020-00073-3 ◽

2020 ◽

Vol 18 (1) ◽

Cited By ~ 1

Author(s):

Siarhei A. Dabravolski

Keyword(s):

Structural Similarity ◽

Future Research ◽

Reconstruction Method ◽

Structural Level ◽

Bacterial Proteins ◽

Photosynthetic Eukaryotes ◽

Phylogeny Estimation ◽

High Level ◽

Related Proteins ◽

Evolutionary Aspects

Abstract Background Nitroreductases are a family of evolutionarily related proteins catalyzing the reduction of nitro-substituted compounds. Nitroreductases are widespread enzymes, but nearly all modern research and practical application have been concentrated on the bacterial proteins, mainly nitroreductases of Escherichia coli. The main aim of this study is to describe the phylogenic distribution of the nitroreductases in the photosynthetic eukaryotes (Viridiplantae) to highlight their structural similarity and areas for future research and application. Results This study suggests that homologs of nitroreductase proteins are widely presented also in Viridiplantae. Maximum likelihood phylogenetic tree reconstruction method and comparison of the structural models suggest close evolutional relation between cyanobacterial and Viridiplantae nitroreductases. Conclusions This study provides the first attempt to understand the evolution of nitroreductase protein family in Viridiplantae. Our phylogeny estimation and preservation of the chloroplasts/mitochondrial localization indicate the evolutional origin of the plant nitroreductases from the cyanobacterial endosymbiont. A defined high level of the similarity on the structural level suggests conservancy also for the functions. Directions for the future research and industrial application of the Viridiplantae nitroreductases are discussed.


Phylogeny Estimation Given Sequence Length Heterogeneity

Systematic Biology ◽

10.1093/sysbio/syaa058 ◽

2020 ◽

Cited By ~ 1

Author(s):

Vladimir Smirnov ◽

Tandy Warnow

Keyword(s):

Method Development ◽

Large Datasets ◽

Full Length ◽

Sequence Length ◽

Phylogenetic Placement ◽

Sequencing Technologies ◽

Rates Of Evolution ◽

Biological Studies ◽

Phylogeny Estimation ◽

Estimation Sequence

Abstract Phylogeny estimation is a major step in many biological studies, and has many well known challenges. With the dropping cost of sequencing technologies, biologists now have increasingly large datasets available for use in phylogeny estimation. Here we address the challenge of estimating a tree given large datasets with a combination of full-length sequences and fragmentary sequences, which can arise due to a variety of reasons, including sample collection, sequencing technologies, and analytical pipelines. We compare two basic approaches: (1) computing an alignment on the full dataset and then computing a maximum likelihood tree on the alignment, or (2) constructing an alignment and tree on the full length sequences and then using phylogenetic placement to add the remaining sequences (which will generally be fragmentary) into the tree. We explore these two approaches on a range of simulated datasets, each with 1000 sequences and varying in rates of evolution, and two biological datasets. Our study shows some striking performance differences between methods, especially when there is substantial sequence length heterogeneity and high rates of evolution. We find in particular that using UPP to align sequences and RAxML to compute a tree on the alignment provides the best accuracy, substantially outperforming trees computed using phylogenetic placement methods. We also find that FastTree has poor accuracy on alignments containing fragmentary sequences. Overall, our study provides insights into the literature comparing different methods and pipelines for phylogenetic estimation, and suggests directions for future method development. [Phylogeny estimation, sequence length heterogeneity, phylogenetic placement.]


Advancing Divide-and-Conquer Phylogeny Estimation using Robinson-Foulds Supertrees

10.1101/2020.05.16.099895 ◽

2020 ◽

Cited By ~ 1

Author(s):

Xilin Yu ◽

Thien Le ◽

Sarah A. Christensen ◽

Erin K. Molloy ◽

Tandy Warnow

Keyword(s):

Optimization Problems ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Tree Of Life ◽

Divide And Conquer ◽

Greedy Heuristic ◽

Mcmc Methods ◽

Supertree Method ◽

Phylogeny Estimation ◽

Source Form

Abstract One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS. Exact-RFS-2 and GreedyRFS are available in open source form on Github at github.com/yuxilin51/GreedyRFS.


An Alignment-free Method for Phylogeny Estimation using Maximum Likelihood

10.1101/2019.12.13.875526 ◽

2019 ◽

Author(s):

Tasfia Zahin ◽

Md. Hasin Abrar ◽

Mizanur Rahman ◽

Tahrina Tasnim ◽

Md. Shamsuzzoha Bayzid ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Computational Complexity ◽

Maximum Likelihood ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Alignment Free ◽

Phylogeny Estimation ◽

Statistical Approaches

Abstract Phylogenetic analysis, i.e., construction of an accurate phylogenetic tree from genomic sequences of a set of species, is one of the main challenges in bioinformatics. The popular approaches to this require aligning each pair of sequences to calculate pairwise distances or aligning all the sequences to construct a multiple sequence alignment. The computational complexity and difficulties in getting accurate alignments have led to the development of alignment-free methods to estimate phylogenies. However, the alignment-free approaches focus on computing distances between species and do not utilize statistical approaches for phylogeny estimation. Herein, we present a simple alignment-free method for phylogeny construction based on contiguous sub-sequences of length k, termed k-mers. The presence or absence of these k-mers is used to construct a phylogeny using a maximum likelihood approach. The results suggest our method is competitive with other alignment-free approaches, while outperforming them in some cases.
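The k-mer presence/absence encoding described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`kmers`, `presence_absence`), the choice to uppercase sequences, and the alphabetical column order are all assumptions made for the example. The output is a binary character matrix with one row per taxon and one column per observed k-mer, which is the kind of input a maximum likelihood method for binary characters would consume.

```python
from typing import Dict, List, Set, Tuple


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of contiguous substrings of length k in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def presence_absence(
    sequences: Dict[str, str], k: int
) -> Tuple[List[str], Dict[str, str]]:
    """Build a binary matrix: '1' if a taxon contains the k-mer, else '0'.

    Columns are every k-mer seen in any taxon, in alphabetical order
    (an arbitrary but reproducible choice made for this sketch).
    """
    per_taxon = {name: kmers(seq, k) for name, seq in sequences.items()}
    columns = sorted(set().union(*per_taxon.values()))
    matrix = {
        name: "".join("1" if kmer in found else "0" for kmer in columns)
        for name, found in per_taxon.items()
    }
    return columns, matrix


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = {"taxon1": "ACGT", "taxon2": "ACGA"}
    columns, matrix = presence_absence(toy, k=2)
    print(columns)
    for name, row in matrix.items():
        print(name, row)
```

In practice the number of distinct k-mers grows quickly with k and genome size, so a real pipeline would likely filter columns (for example, dropping k-mers present in all taxa), a step this sketch leaves out.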


phylogeny estimation
Recently Published Documents


Re-evaluating Deep Neural Networks for Phylogeny Estimation: The Issue of Taxon Sampling

Phylogenetic Signal and Bias in Paleontology

Using Robinson-Foulds supertrees in divide-and-conquer phylogeny estimation

Using Robinson-Foulds Supertrees in Divide-and-Conquer Phylogeny Estimation

SANS serif: alignment-free, whole-genome based phylogenetic reconstruction

CD-MAWS: An Alignment-free Phylogeny Estimation Method Using Cosine Distance on Minimal Absent Word Sets

Evolutionary aspects of the Viridiplantae nitroreductases

Phylogeny Estimation Given Sequence Length Heterogeneity

Advancing Divide-and-Conquer Phylogeny Estimation using Robinson-Foulds Supertrees

An Alignment-free Method for Phylogeny Estimation using Maximum Likelihood
