Markov Katana: a Novel Method for Bayesian Resampling of Parameter Space Applied to Phylogenetic Trees

2018 ◽  
Author(s):  
Stephen T. Pollard ◽  
Kenji Fukushima ◽  
Zhengyuan O. Wang ◽  
Todd A. Castoe ◽  
David D. Pollock

ABSTRACT Phylogenetic inference requires a means to search phylogenetic tree space. This is usually achieved using progressive algorithms that propose and test small alterations in the current tree topology and branch lengths. Current programs search tree topology space using branch-swapping algorithms, but proposals do not discriminate well between swaps likely to succeed or fail. When applied to datasets with many taxa, the huge number of possible topologies slows these programs dramatically. To overcome this, we developed a statistical approach for proposal generation in Bayesian analysis, and evaluated its applicability for the problem of searching phylogenetic tree space. The general idea of the approach, which we call ‘Markov katana’, is to make proposals based on a heuristic algorithm using bootstrapped subsets of the data. Such proposals induce an unintended sampling distribution that must be determined and removed to generate posterior estimates, but the cost of this extra step can in principle be small compared to the added value of more efficient parameter exploration in Markov chain Monte Carlo analyses. Our prototype application uses the simple neighbor-joining distance heuristic on data subsets to propose new reasonably likely phylogenetic trees (including topologies and branch lengths). The evolutionary model used to generate distances in our prototype was far simpler than the more complex model used to evaluate the likelihood of phylogenies based on the full dataset. This prototype implementation indicates that the Markov katana approach could be easily incorporated into existing phylogenetic search programs and may prove a useful alternative in conjunction with existing methods. The general features of this statistical approach may also prove useful in disciplines other than phylogenetics. We demonstrate that this method can be used to efficiently estimate a Bayesian posterior.
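The abstract notes that heuristic proposals induce a sampling distribution that must be removed to recover the posterior. The abstract does not say how the authors do this. One textbook way to correct for a known proposal distribution is the Hastings ratio in an independence Metropolis-Hastings sampler, sketched below on a toy discrete state space. The names `independence_mh`, `target`, and `proposal` are hypothetical, and the three states merely stand in for trees; this is an illustration of the general idea, not the Markov katana implementation.

```python
import random

def independence_mh(target, proposal, n_steps, rng):
    """Independence Metropolis-Hastings on a finite state space.

    target   -- dict state -> unnormalized posterior weight (assumed > 0)
    proposal -- dict state -> proposal probability (assumed > 0), playing the
                role of the biased heuristic proposal distribution
    """
    states = list(proposal)
    weights = [proposal[s] for s in states]
    current = rng.choices(states, weights=weights)[0]
    samples = []
    for _ in range(n_steps):
        candidate = rng.choices(states, weights=weights)[0]
        # Hastings ratio: dividing by the proposal probabilities removes the
        # bias that the proposal distribution would otherwise leave behind.
        ratio = (target[candidate] * proposal[current]) / (
            target[current] * proposal[candidate]
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        samples.append(current)
    return samples

# Toy usage: the proposal favours "c", but the target favours "a".
target = {"a": 0.5, "b": 0.3, "c": 0.2}
proposal = {"a": 0.2, "b": 0.3, "c": 0.5}
samples = independence_mh(target, proposal, 20000, random.Random(0))
frequencies = {s: samples.count(s) / len(samples) for s in target}
```

With enough steps, `frequencies` should approach `target` rather than `proposal`, which is the sense in which the proposal's own distribution has been removed.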


2019 ◽  
Vol 69 (2) ◽  
pp. 280-293 ◽  
Author(s):  
Chris Whidden ◽  
Brian C Claywell ◽  
Thayer Fisher ◽  
Andrew F Magee ◽  
Mathieu Fourment ◽  
...  

Abstract Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, “likelihood” of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method “phylogenetic topographer” (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.
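The PT strategy described above (start from local maxima, expand through local rearrangements, never revisit a topology, and stop expanding below a likelihood threshold) can be sketched as a threshold-limited graph search. The sketch below is single-threaded, uses an ordinary Python dict in place of the nonblocking hash table, and runs on a toy state space of integers. The function names, the `delta` parameter, and the toy `neighbors` and `log_lik` functions are assumptions for illustration and are not taken from the paper.

```python
from collections import deque
import math

def threshold_explore(starts, neighbors, log_lik, delta):
    """Visit each state at most once; expand only states whose log-likelihood
    is within `delta` of the best value observed so far."""
    visited = {}      # state -> log-likelihood (stand-in for the shared hash table)
    queue = deque()
    for state in starts:
        if state not in visited:
            visited[state] = log_lik(state)
            queue.append(state)
    best = max(visited.values())
    while queue:
        state = queue.popleft()
        if visited[state] < best - delta:
            continue  # below threshold: recorded, but not explored further
        for nxt in neighbors(state):
            if nxt in visited:
                continue  # never evaluate the same state twice
            visited[nxt] = log_lik(nxt)
            best = max(best, visited[nxt])
            queue.append(nxt)
    return {s: ll for s, ll in visited.items() if ll >= best - delta}

def normalized_weights(kept):
    """Normalize likelihoods over the kept set (a proxy for posterior mass)."""
    top = max(kept.values())
    raw = {s: math.exp(ll - top) for s, ll in kept.items()}
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}

# Toy usage: integer "topologies" on a line with a single peak at 3.
kept = threshold_explore(
    starts=[3],
    neighbors=lambda n: [n - 1, n + 1],
    log_lik=lambda n: -abs(n - 3),
    delta=2,
)
weights = normalized_weights(kept)
```

In a real setting the states would be canonical encodings of tree topologies, `neighbors` would enumerate local rearrangements, and `log_lik` would optimize branch lengths before returning the likelihood.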



A phylogenetic tree is a pictorial representation of the evolutionary relationships between organisms, and it is an important tool for analyzing biological data. Tree-building methods fall into two classes: distance-based and character-based. Phylogenetic trees are used for the comparative analysis of any organism, including humans, animals, bacteria, viruses, and fungi. In this paper we compare 12 different nucleotide sequences of Azotobacter species, with linear DNA of at most 999 bp, using a substitution model and a phylogenetic model. Two different models, p-distance and Jukes-Cantor, are used to assess the efficiency of the UPGMA and neighbour-joining methods in evaluating the similarity and dissimilarity of bacterial species. This paper contributes to the reconciliation of Azotobacter species by producing a phylogram with informative branch lengths. This in turn helps to analyze and understand various expressive characters of Azotobacter in the field of agriculture.
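For readers unfamiliar with the two distance measures named above, the following minimal sketch computes the p-distance (the proportion of differing sites) and the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), for a pair of aligned sequences. The function names and the two short example sequences are made up for illustration, and skipping sites with gaps or ambiguous bases is an assumption of this sketch rather than something stated in the abstract.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: only sites where both sequences have A, C, G or T are compared.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor distance from a p-distance; infinite when p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy usage with two made-up 8-site sequences that differ at one site.
p = p_distance("ACGTACGT", "ACGTACGA")
d = jukes_cantor(p)
```

The corrected distance `d` is always at least as large as `p`, because the Jukes-Cantor model accounts for multiple substitutions at the same site.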



2021 ◽  
Vol 82 (1-2) ◽  
Author(s):  
Lena Collienne ◽  
Alex Gavryushkin

Abstract Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is $\mathbf{NP}$-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to $\mathbf{NP}$-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).



2019 ◽  
Vol 1 (1) ◽  
Author(s):  
D C Blackburn ◽  
G Giribet ◽  
D E Soltis ◽  
E L Stanley

Abstract Although our inventory of Earth’s biodiversity remains incomplete, we still require analyses using the Tree of Life to understand evolutionary and ecological patterns. Because incomplete sampling may bias our inferences, we must evaluate how future additions of newly discovered species might impact analyses performed today. We describe an approach that uses taxonomic history and phylogenetic trees to characterize the impact of past species discoveries on phylogenetic knowledge using patterns of branch-length variation, tree shape, and phylogenetic diversity. This provides a framework for assessing the relative completeness of taxonomic knowledge of lineages within a phylogeny. To demonstrate this approach, we use recent large phylogenies for amphibians, reptiles, flowering plants, and invertebrates. Well-known clades exhibit a decline in the mean and range of branch lengths that are added each year as new species are described. With increased taxonomic knowledge over time, deep lineages of well-known clades become known such that most recently described new species are added close to the tips of the tree, reflecting changing tree shape over the course of taxonomic history. The same analyses reveal other clades to be candidates for future discoveries that could dramatically impact our phylogenetic knowledge. Our work reveals that species are often added non-randomly to the phylogeny over multiyear time-scales in a predictable pattern of taxonomic maturation. Our results suggest that we can make informed predictions about how new species will be added across the phylogeny of a given clade, thus providing a framework for accommodating unsampled undescribed species in evolutionary analyses.



1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.



Author(s):  
Mochammad Rajasa Mukti Negara ◽  
Ita Krissanti ◽  
Gita Widya Pradini

BACKGROUND Nucleocapsid (N) protein is one of four structural proteins of SARS-CoV-2, which is known to be more conserved than spike protein and is highly immunogenic. This study aimed to analyze the variation of the SARS-CoV-2 N protein sequences in ASEAN countries, including Indonesia. METHODS Complete sequences of SARS-CoV-2 N protein from each ASEAN country were obtained from Global Initiative on Sharing All Influenza Data (GISAID), while the reference sequence was obtained from GenBank. All sequences collected from December 2019 to March 2021 were grouped by clade according to GISAID, and two representative isolates were chosen from each clade for the analysis. The sequences were aligned by MUSCLE, and phylogenetic trees were built using MEGA-X software based on the nucleotide and translated AA sequences. RESULTS 98 isolates of complete N protein genes from ASEAN countries were analyzed. The nucleotides of all isolates were 97.5% conserved. Of 31 nucleotide changes, 22 led to amino acid (AA) substitutions; thus, the AA sequences were 94.5% conserved. The phylogenetic trees of nucleotide and AA sequences show similar branches. Nucleotide variations in clade O (C28311T); clade GR (28881–28883 GGG>AAC); and clade GRY (28881–28883 GGG>AAC and C28977T) lead to specific branches corresponding to the clade within both trees. CONCLUSIONS The N protein sequences of SARS-CoV-2 across ASEAN countries are highly conserved. Most isolates were closely related to the reference sequence originating from China, except the isolates representing clade O, GR, and GRY which formed specific branches in the phylogenetic tree.



2019 ◽  
Vol 60 (10) ◽  
pp. 2141-2151 ◽  
Author(s):  
Kota Ishibashi ◽  
Ian Small ◽  
Toshiharu Shikanai

Abstract Amborella trichopoda is placed close to the base of the angiosperm lineage (basal angiosperm). By genome-wide RNA sequencing, we identified 184 C-to-U RNA editing sites in the plastid genome of Amborella. This number is much higher than that observed in other angiosperms including maize (44 sites), rice (39 sites) and grape (115 sites). Despite the high frequency of RNA editing, the biased distribution of RNA editing sites in the genome, target codon preference and nucleotide preference adjacent to the edited cytidine are similar to those in other angiosperms, suggesting a common editing machinery. Consistent with this idea, the Amborella nuclear genome encodes 2–3 times more of the E- and DYW-subclass members of pentatricopeptide repeat proteins responsible for RNA editing site recognition in plant organelles. Among 165 editing sites in plastid protein coding sequences in Amborella, 100 sites were conserved in at least one of 38 species selected to represent key branching points of the angiosperm phylogenetic tree. We assume these 100 sites represent at least a subset of the sites in the plastid editotype of ancestral angiosperms. We then mapped the loss and gain of editing sites on the phylogenetic tree of angiosperms. Our results support the idea that the evolution of angiosperms has led to the loss of RNA editing sites in plastids.



2006 ◽  
Vol 04 (01) ◽  
pp. 59-74 ◽  
Author(s):  
YING-JUN HE ◽  
TRINH N. D. HUYNH ◽  
JESPER JANSSON ◽  
WING-KIN SUNG

To construct a phylogenetic tree or phylogenetic network for describing the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is by merging a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set [Formula: see text] and none of the rooted triplets in another given set [Formula: see text]. Although NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.
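To make the consistency condition above concrete, the sketch below checks whether a given rooted tree displays every triplet in a required set and none in a forbidden set. A rooted triplet ab|c is displayed when the lowest common ancestor of a and b lies strictly below the lowest common ancestor of a and c (and of b and c). The nested-tuple tree encoding and all function names are assumptions made for this illustration. The code only verifies a candidate tree; it does not construct one and is not one of the paper's algorithms.

```python
def leaf_paths(tree, prefix=()):
    """Map each leaf label to its path of child indices from the root.
    Internal nodes are tuples; anything else is treated as a leaf label."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths

def lca_depth(paths, x, y):
    """Depth of the lowest common ancestor = length of the shared path prefix."""
    depth = 0
    for step_x, step_y in zip(paths[x], paths[y]):
        if step_x != step_y:
            break
        depth += 1
    return depth

def displays(paths, triplet):
    """True if the tree displays the rooted triplet (a, b, c), read as ab|c."""
    a, b, c = triplet
    ab = lca_depth(paths, a, b)
    return ab > lca_depth(paths, a, c) and ab > lca_depth(paths, b, c)

def consistent(tree, required, forbidden):
    """True if all required triplets are displayed and no forbidden one is."""
    paths = leaf_paths(tree)
    return all(displays(paths, t) for t in required) and not any(
        displays(paths, t) for t in forbidden
    )

# Toy usage on the tree ((a, b), (c, d)).
tree = (("a", "b"), ("c", "d"))
ok = consistent(tree, required=[("a", "b", "c")], forbidden=[("a", "c", "b")])
```

Here `ok` is `True`, because the toy tree groups a with b before c and does not group a with c before b.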



2006 ◽  
Vol 12 (2) ◽  
pp. 243-257 ◽  
Author(s):  
Ross Clement

The Cichlid Speciation Project (CSP) is an ALife simulation system for investigating open problems in the speciation of African cichlid fish. The CSP can be used to perform a wide range of experiments that show that speciation is a natural consequence of certain biological systems. A visualization system capable of extracting the history of speciation from low-level trace data and creating a phylogenetic tree has been implemented. Unlike previous approaches, this visualization system presents a concrete trace of speciation, rather than a summary of low-level information from which the viewer can make subjective decisions on how speciation progressed. The phylogenetic trees are a more objective visualization of speciation, and enable automated collection and summarization of the results of experiments. The visualization system is used to create a phylogenetic tree from an experiment that models sympatric speciation.



2020 ◽  
Vol 16 (3) ◽  
pp. 1043-1059
Author(s):  
Jeanne Rezsöhazy ◽  
Hugues Goosse ◽  
Joël Guiot ◽  
Fabio Gennaretti ◽  
Etienne Boucher ◽  
...  

Abstract. Tree-ring archives are one of the main sources of information to reconstruct climate variations over the last millennium with annual resolution. The links between tree-ring proxies and climate have usually been estimated using statistical approaches, assuming linear and stationary relationships. Both assumptions may be inadequate, but this issue can be overcome by ecophysiological modelling based on mechanistic understanding. In this respect, the model MAIDEN (Modeling and Analysis In DENdroecology) simulating tree-ring growth from daily temperature and precipitation, considering carbon assimilation and allocation in forest stands, may constitute a valuable tool. However, the lack of local meteorological data and the limited characterization of tree species traits can complicate the calibration and validation of such a complex model, which may hamper palaeoclimate applications. The goal of this study is to test the applicability of the MAIDEN model in a palaeoclimate context using as a test case tree-ring observations covering the 20th century from 21 Eastern Canadian taiga sites and 3 European sites. More specifically, we investigate the model sensitivity to parameter calibration and to the quality of climatic inputs, and we evaluate the model performance using a validation procedure. We also examine the added value of using MAIDEN in palaeoclimate applications compared to a simpler tree-growth model, i.e. VS-Lite. A Bayesian calibration of the most sensitive model parameters provides good results at most of the selected sites with high correlations between simulated and observed tree growth. Although MAIDEN is found to be sensitive to the quality of the climatic inputs, simple bias correction and downscaling techniques of these data improve significantly the performance of the model. The split-sample validation of MAIDEN gives encouraging results but requires long tree ring and meteorological series to give robust results. 
We also highlight a risk of overfitting in the calibration of model parameters that increases with short series. Finally, MAIDEN has shown higher calibration and validation correlations in most cases compared to VS-Lite. Nevertheless, this latter model turns out to be more stable over calibration and validation periods. Our results provide a protocol for the application of MAIDEN to potentially any site with tree-ring width data in the extratropical region.


