Two C++ Libraries for Counting Trees on a Phylogenetic Terrace

2017 ◽  
Author(s):  
R. Biczok ◽  
P. Bozsoky ◽  
P. Eisenmann ◽  
J. Ernst ◽  
T. Ribizel ◽  
...  

Abstract Motivation: The presence of terraces in phylogenetic tree space, that is, a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al. (2011). However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report whether inferred phylogenies reside on a terrace. We believe this is due to the unavailability of an efficient library implementation to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace, and (iii) enumerate all trees on a terrace. Results: In our bioinformatics programming practical we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating the trees on a terrace. Both implementations yield exactly the same results, are more than one order of magnitude faster, and require one order of magnitude less memory than a previous third-party Python implementation. Availability: The source codes are available under GNU GPL at https://github.com/[email protected]
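The counting step can be illustrated with a minimal sketch of a SUPERB-style recursion. It is not the API of either C++ library, and it assumes a simplified input: a set of leaf labels plus rooted triplet constraints `(a, b, c)`, read as "a and b are closer to each other than either is to c". The function names `components` and `count_trees` are hypothetical. The idea is that every valid root split must keep each connected component of the constraint graph on one side, so the count is a sum over bipartitions of those components.

```python
from itertools import combinations


def components(leaves, triplets):
    """Connected components of the graph joining a and b for every triplet ab|c."""
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in triplets:
        parent[find(a)] = find(b)

    groups = {}
    for x in leaves:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def count_trees(leaves, triplets):
    """Count rooted binary trees on `leaves` that display every triplet ab|c."""
    leaves = frozenset(leaves)
    if len(leaves) <= 2:
        return 1
    # Only constraints whose three leaves all lie in this subproblem still apply.
    active = [t for t in triplets if leaves.issuperset(t)]
    comps = components(leaves, active)
    if len(comps) == 1:
        return 0  # no admissible root split: the constraints conflict
    first, rest = comps[0], comps[1:]
    total = 0
    # Pin the first component to the left side so that each unordered
    # bipartition is visited once; k < len(rest) keeps the right side non-empty.
    for k in range(len(rest)):
        for chosen in combinations(rest, k):
            left = first.union(*chosen)
            right = leaves - left
            total += count_trees(left, active) * count_trees(right, active)
    return total
```

With no constraints the recursion reproduces the number of rooted binary trees on n leaves (3 for three leaves, 15 for four), and each added constraint shrinks the count. The sketch has no memoization, so it is exponential in the number of components; it is meant to show the recursion, not the performance characteristics reported above.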

2020 ◽  
Vol 69 (5) ◽  
pp. 1016-1032 ◽  
Author(s):  
Chi Zhang ◽  
John P Huelsenbeck ◽  
Fredrik Ronquist

Abstract Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as “convergence”) and in estimating the correct proportions of the different types of them (known as “mixing”). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable albeit rough approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical data sets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these data sets, ranging in size from 357 to 934 taxa and from 1740 to 5681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing, that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account. [Bayesian phylogenetic inference; MCMC; parsimony; tree proposal.]
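As an illustration of how parsimony scores might drive a proposal distribution, the sketch below scores candidate trees with Fitch's algorithm and turns the scores into proposal probabilities. The exponential weighting and the `weight` parameter are assumptions chosen for the example and are not taken from the abstract; the function names and the nested-tuple tree encoding are likewise hypothetical.

```python
import math


def fitch_site(tree, states):
    """Fitch pass for one site; returns (state set at this node, changes below it).

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets imply one extra change at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony(tree, alignment):
    """Total Fitch parsimony score of `tree` over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def proposal_probs(candidates, alignment, weight=1.0):
    """Proposal probabilities proportional to exp(-weight * parsimony score).

    The exponential form is an assumption made for this sketch.
    """
    scores = [parsimony(t, alignment) for t in candidates]
    best = min(scores)  # subtract the minimum for numerical stability
    weights = [math.exp(-weight * (s - best)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

Because such a proposal is not symmetric, a Metropolis-Hastings sampler using it would also need the probability of the reverse move when computing the acceptance ratio.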


2019 ◽  
Author(s):  
Chi Zhang ◽  
John P. Huelsenbeck ◽  
Fredrik Ronquist

Abstract Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as ‘convergence’) and in estimating the correct proportions of the different types of them (known as ‘mixing’). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable albeit rough approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical datasets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these datasets, ranging in size from 357 to 934 taxa and from 1,740 to 5,681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing, that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account.


2019 ◽  
Vol 69 (2) ◽  
pp. 280-293 ◽  
Author(s):  
Chris Whidden ◽  
Brian C Claywell ◽  
Thayer Fisher ◽  
Andrew F Magee ◽  
Mathieu Fourment ◽  
...  

Abstract Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, “likelihood” of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method “phylogenetic topographer” (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.
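The search strategy described in this abstract can be sketched in a few lines, under several simplifying assumptions: the sketch is single-threaded, uses an ordinary Python set in place of the nonblocking hash table, and treats topologies as opaque hashable keys. The names `explore`, `neighbors`, `log_like`, and `threshold` are hypothetical and do not come from the PT software.

```python
from collections import deque


def explore(starts, neighbors, log_like, threshold):
    """Map out topologies whose log-likelihood is within `threshold` of the best seen.

    starts    : iterable of hashable topology keys (e.g. local maxima)
    neighbors : function returning the topologies one rearrangement away
    log_like  : function returning the optimized log-likelihood of a topology
    """
    starts = list(starts)
    visited = set(starts)  # guarantees each topology is evaluated at most once
    queue = deque(starts)
    best = float("-inf")
    kept = {}
    while queue:
        topo = queue.popleft()
        ll = log_like(topo)
        best = max(best, ll)
        if ll < best - threshold:
            continue  # too far below the best topology: do not expand further
        kept[topo] = ll
        for nxt in neighbors(topo):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    # `best` may have risen during the search, so filter once more at the end.
    return {t: ll for t, ll in kept.items() if ll >= best - threshold}
```

On a toy landscape where topologies are the integers 0 to 9, neighbors differ by one, and the log-likelihood is `-abs(i - 5)`, starting at 5 with a threshold of 2 keeps topologies 3 through 7 and stops expanding at 2 and 8.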


2002 ◽  
Vol 51 (5) ◽  
pp. 740-753 ◽  
Author(s):  
Richard E. Miller ◽  
Thomas R. Buckley ◽  
Paul S. Manos

2016 ◽  
Vol 1 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Basant K. Tiwary

Background/Aims: A recent duplication of the gene encoding SLIT-ROBO Rho GTPase-activating protein 2 (SRGAP2) in the primate lineage has been proposed to be associated with the human-specific extraordinary development of intelligence. There is no report regarding the role of the SRGAP2 gene in the expression of neural traits indicating intelligence in mammals. Methods: A phylogenetic tree of the SRGAP2 gene from 11 mammals was reconstructed using MrBayes. The evolution of neural traits along the branches of the phylogenetic tree was modeled in the BayesTraits, and the dN/dS ratio (i.e. the ratio between the number of nonsynonymous substitutions per nonsynonymous site and the number of synonymous substitutions per synonymous site) was estimated using the codon-based maximum likelihood method (CODEML) in PAML (phylogenetic analysis by maximum likelihood). Results: Two neural traits, namely brain mass and the number of cortical neurons, showed statistical dependency on the underlying evolutionary history of the SRGAP2 gene in mammals. A significant positive correlation between the increase in cortical neurons and the rate of nucleotide substitutions in the SRGAP2 gene was observed concomitantly with a significant negative correlation between the increase in cortical neurons and the rate of nonsynonymous substitutions in the gene. The SRGAP2 gene appears to be under intense pressure of purifying selection in all mammalian lineages under stringent functional constraint. Conclusion: This work indicates a key role of the SRGAP2 gene in the rapid expansion of neurons in the brain cortex, thereby facilitating the evolution of remarkable intelligence in mammals.


2020 ◽  
Vol 37 (12) ◽  
pp. 3632-3641
Author(s):  
Alina F Leuchtenberger ◽  
Stephen M Crotty ◽  
Tamara Drucks ◽  
Heiko A Schmidt ◽  
Sebastian Burgstaller-Muehlbacher ◽  
...  

Abstract Maximum likelihood and maximum parsimony are two key methods for phylogenetic tree reconstruction. Under certain conditions, each of these two methods can perform more or less efficiently, resulting in unresolved or disputed phylogenies. We show that a neural network can distinguish between four-taxon alignments that were evolved under conditions susceptible to either long-branch attraction or long-branch repulsion. When likelihood and parsimony methods are discordant, the neural network can provide insight as to which tree reconstruction method is best suited to the alignment. When applied to the contentious case of Strepsiptera evolution, our method shows robust support for the current scientific view, that is, it places Strepsiptera with beetles, distant from flies.

