Phylogenetic Reconstruction Methods: An Overview

Author(s):  
Alexandre De Bruyn ◽  
Darren P. Martin ◽  
Pierre Lefeuvre
2018 ◽  
Vol 44 (1) ◽  
pp. 20
Author(s):  
Eloiza Teles Caldart ◽  
Helena Mata ◽  
Cláudio Wageck Canal ◽  
Ana Paula Ravazzolo

Background: Phylogenetic analyses are an essential part of the exploratory assessment of nucleic acid and amino acid sequences. Particularly in virology, they are able to delineate the evolution and epidemiology of disease etiologic agents and/or the evolutionary path of their hosts. The objective of this review is to help researchers who want to use phylogenetic analyses as a tool in virology and molecular epidemiology studies by presenting the most commonly used methodologies and describing the importance of the different techniques, their particular vocabulary and some examples of their use in virology. Review: This article starts by presenting basic concepts of molecular epidemiology and molecular evolution, emphasizing their relevance in the context of viral infectious diseases. It presents a section on the vocabulary relevant to the subject, bringing readers to the minimum level of knowledge needed throughout this literature review. Within its main subject, the text explains what a molecular phylogenetic analysis is, starting from a multiple alignment of nucleotide or amino acid sequences. The different software packages used to perform multiple alignments may apply different algorithms. To build a phylogeny based on amino acid or nucleotide sequences, it is necessary to produce a data matrix based on a model for nucleotide or amino acid replacement, also called an evolutionary model. There are a number of evolutionary models available, varying in complexity according to the number of parameters (transition, transversion, GC content, nucleotide position in the codon, among others). Some papers presented herein provide techniques that can be used to choose evolutionary models. After the model is chosen, the next step is to opt for a phylogenetic reconstruction method that best fits the available data and the selected model. Here we present the most common reconstruction methods currently used, describing their principles, advantages and disadvantages. Distance methods, for example, are simpler and faster; however, they do not provide reliable estimates when the sequences are highly divergent. The accuracy of analyses with probabilistic methods (maximum likelihood and Bayesian inference) strongly depends on how well the actual data fit the chosen evolutionary model. Finally, we also explore topology confidence tests, especially the most widely used one, the bootstrap. To assist the reader, this review presents figures to explain specific situations discussed in the text and numerous examples of previously published scientific articles in virology that demonstrate the importance of the techniques discussed herein, as well as their judicious use. Conclusion: The DNA sequence is not only a record of phylogeny and divergence times, but also keeps signs of how the evolutionary process has shaped its history and of the time elapsed in the evolutionary process of the population. Analyses of genomic sequences by molecular phylogeny have demonstrated a broad spectrum of applications. It is important to note that, for the different available data and different purposes of phylogenies, reconstruction methods and evolutionary models should be wisely chosen. This review provides a theoretical basis for the choice of evolutionary models and phylogenetic reconstruction methods best suited to each situation. In addition, it presents examples of diverse applications of molecular phylogeny in virology.
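The points about distance methods and the bootstrap can be illustrated with a minimal sketch. It assumes the Jukes-Cantor (JC69) model as one simple example of an evolutionary model; the review does not single out this model, and the function names are hypothetical. The corrected distance grows without bound as the proportion of differing sites approaches 0.75, which is one way to see why distance estimates become unreliable for highly divergent sequences. The bootstrap step only resamples alignment columns; a tree would still have to be rebuilt from each replicate.

```python
import math
import random


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ; sites with a gap are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(seq_a, seq_b):
    """JC69-corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # Saturation: the correction is undefined, so the estimate is unbounded.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    `alignment` maps a sequence name to an aligned sequence; all sequences
    are assumed to have the same length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}
```

In a full analysis, the chosen reconstruction method would be rerun on many such replicates, and the support for a branch would be the fraction of replicate trees that contain it.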


2010 ◽  
Vol 20 (supp01) ◽  
pp. 1511-1532 ◽  
Author(s):  
S. POMPEI ◽  
E. CAGLIOTI ◽  
V. LORETO ◽  
F. TRIA

Phylogenetic methods have recently been rediscovered in several interesting areas, including immunodynamics, epidemiology and many branches of evolutionary dynamics. In many interesting cases the reconstruction of a correct phylogeny is blurred by high mutation rates and/or horizontal transfer events. As a consequence, a divergence arises between the true evolutionary distances and the distances between pairs of taxa as inferred from the available data, making phylogenetic reconstruction a challenging problem. Mathematically, this divergence translates into the non-additivity of the actual distances between taxa, and the quest for new algorithms able to cope efficiently with these effects is wide open. In distance-based reconstruction methods, two properties of additive distances have been extensively exploited as antagonist criteria to drive phylogeny reconstruction: on the one hand, a local property of quartets (sets of four taxa in a tree), the four-point condition; on the other hand, a recently proposed formula that expresses the tree length as a function of the distances between taxa, Pauplin's formula. A deeper comprehension of the effects of non-additivity on the inspiring principles of the existing reconstruction algorithms is thus of paramount importance. In this paper we present a comparative analysis of the performance of the most important distance-based phylogenetic algorithms. We focus in particular on the dependence of their performance on two main sources of non-additivity: back-mutation processes and horizontal transfer processes. The comparison is carried out in the framework of a set of generative algorithms for phylogenies that incorporate non-additivity in a tunable way.
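The four-point condition mentioned above can be checked directly on a distance matrix. The sketch below is an illustration only, not the authors' code: the function names and the dict-of-dicts matrix layout are assumptions. For a quartet a, b, c, e, the three pairwise sums d(a,b)+d(c,e), d(a,c)+d(b,e) and d(a,e)+d(b,c) are computed; under additivity the two largest are equal, so their difference serves as a simple measure of non-additivity.

```python
from itertools import combinations


def four_point_violation(d, a, b, c, e):
    """Gap between the two largest of the three pairwise sums for one quartet.

    `d` is a symmetric dict-of-dicts distance matrix (an assumed layout).
    The gap is zero when the quartet satisfies the four-point condition.
    """
    sums = sorted([d[a][b] + d[c][e],
                   d[a][c] + d[b][e],
                   d[a][e] + d[b][c]])
    return sums[2] - sums[1]


def is_additive(d, taxa, tol=1e-9):
    """True if every quartet of taxa satisfies the condition within `tol`."""
    return all(four_point_violation(d, *quartet) <= tol
               for quartet in combinations(taxa, 4))


def symmetric_matrix(pairs):
    """Build a dict-of-dicts matrix from {(x, y): distance} entries."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d
```

As a usage example, distances read off a tree with pendant branches A:1, B:2, C:3, D:4 and an internal branch of length 1 separating {A, B} from {C, D} pass the check, while lowering d(A, C) from 5 to 4 (as saturation from back-mutations might do) produces a violation of 1.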


2020 ◽  
Vol 37 (9) ◽  
pp. 2747-2762 ◽  
Author(s):  
Guénola Drillon ◽  
Raphaël Champeimont ◽  
Francesco Oteri ◽  
Gilles Fischer ◽  
Alessandra Carbone

Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently from the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint issued from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, named partial splits, respectively, supporting the two block adjacencies defining the breakpoint. Considering all partial splits issued from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping sister genomes minimizing genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two data sets of 13 vertebrates and 21 yeast genomes by using up to 130,000 and 179,000 breakpoints, respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data PhyChro reconstructs phylogenies perfectly in almost all cases, and shows the highest accuracy compared with other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in <15 min.


Mycologia ◽  
2006 ◽  
Vol 98 (6) ◽  
pp. 937-948 ◽  
Author(s):  
Jean-Marc Moncalvo ◽  
R. Henrik Nilsson ◽  
Brenda Koster ◽  
Susie M. Dunham ◽  
Torsten Bernauer ◽  
...  

Diachronica ◽  
2013 ◽  
Vol 30 (2) ◽  
pp. 143-170 ◽  
Author(s):  
François Barbançon ◽  
Steven N. Evans ◽  
Luay Nakhleh ◽  
Don Ringe ◽  
Tandy Warnow

This paper reports a simulation study comparing and evaluating the performance of different linguistic phylogeny reconstruction methods on model datasets for which the true trees are known. UPGMA performed least well, then (in ascending order) neighbor joining, the method of Gray & Atkinson and finally maximum parsimony. Weighting characters greatly improves the accuracy of maximum parsimony and maximum compatibility if the characters with high weights exhibit low homoplasy.
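The effect of character weighting on maximum parsimony can be seen in a toy example. The sketch below is not the study's method: it assumes Fitch's small-parsimony counting on rooted binary trees written as nested tuples, and the four languages, the three characters and the weights are all made up for illustration. One low-homoplasy character supports grouping A with B, while two homoplastic characters support grouping A with C.

```python
def fitch_changes(tree, states):
    """Fitch pass for one character on a rooted binary tree.

    `tree` is a leaf name (str) or a pair of subtrees; `states` maps each
    leaf name to its character state. Returns (state set at this node,
    minimum number of state changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def weighted_parsimony(tree, characters, weights):
    """Sum of per-character change counts, each multiplied by its weight."""
    total = 0.0
    for states, weight in zip(characters, weights):
        _, changes = fitch_changes(tree, states)
        total += weight * changes
    return total


# Hypothetical data: character 0 is reliable, characters 1 and 2 are homoplastic.
characters = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
]
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))
```

With equal weights the score favors tree_ac (4 against 5). With weights [3, 1, 1], which give the low-homoplasy character more influence, tree_ab scores 7 against 8 and is preferred.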


2016 ◽  
Author(s):  
Thijs Janzen ◽  
Rampal S. Etienne

Geographic isolation that drives speciation is often assumed to slowly increase over time, for instance through the formation of rivers, the formation of mountains or the movement of tectonic plates. Cyclic changes in connectivity between areas may occur with the advancement and retraction of glaciers, with water level fluctuations in seas between islands or in lakes that have an uneven bathymetry. These habitat dynamics may act as a driver of allopatric speciation and propel local diversity. Here we present a parsimonious model of the interaction between cyclical (but not necessarily periodic) changes in the environment and speciation, and provide an ABC-SMC method to infer the rates of allopatric and sympatric speciation from a phylogenetic tree. We apply our approach to the posterior sample of an updated phylogeny of the Lamprologini, a tribe of cichlid fish from Lake Tanganyika where such cyclic changes in water level have occurred. We find that water level changes play a crucial role in driving diversity in Lake Tanganyika. We note that if we apply our analysis to the Most Credible Consensus (MCC) tree, we do not find evidence for water level changes influencing diversity in the Lamprologini, suggesting that the MCC tree is a misleading representation of the true species tree. Furthermore, we note that the signature of habitat dynamics is found in the posterior sample despite the fact that this sample was constructed using a species tree prior that ignores habitat dynamics. However, in other cases this species tree prior might erase this signature. Hence we argue that in order to improve inference of the effect of habitat dynamics on biodiversity, phylogenetic reconstruction methods should include tree priors that explicitly take into account such dynamics.


2018 ◽  
Vol 3 ◽  
pp. 33 ◽  
Author(s):  
John A. Lees ◽  
Michelle Kendall ◽  
Julian Parkhill ◽  
Caroline Colijn ◽  
Stephen D. Bentley ◽  
...  

Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined 'true tree' using a realistic evolutionary model. We built phylogenies from these data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. For the most accurate tree, use of either RAxML or IQ-TREE with an alignment of variable sites produced by mapping to a reference genome is best. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.
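Comparing a reconstructed tree to a known true tree can be sketched as follows. The abstract does not name its two measures, so the Robinson-Foulds distance is used here purely as an assumed example of a topology measure, and the nested-tuple tree format and function names are hypothetical. Each internal branch is treated as a split of the leaves into two groups, and the distance counts the splits found in only one of the two trees.

```python
def _collect_clades(tree, out):
    """Return the leaf set under `tree`, appending each internal node's set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial leaf bipartitions of a tree, ignoring where it is rooted.

    Each split is stored as the side that does not contain a fixed anchor leaf,
    so the same branch gets the same representation in differently rooted trees.
    """
    clades = []
    all_leaves = _collect_clades(tree, clades)
    anchor = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf against the rest) and the root.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return all_leaves, result


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    leaves_a, splits_a = splits(tree_a)
    leaves_b, splits_b = splits(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf set")
    return len(splits_a ^ splits_b)
```

A distance of zero means the two trees share every internal branch; measures of branch-length accuracy would need a separate comparison.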


2005 ◽  
Vol 103 (2) ◽  
pp. 171-192 ◽  
Author(s):  
Luay Nakhleh ◽  
Tandy Warnow ◽  
Don Ringe ◽  
Steven N. Evans

2020 ◽  
Author(s):  
Chao Zhang ◽  
Andrey V. Bzikadze ◽  
Yana Safonova ◽  
Siavash Mirarab

Affinity maturation (AM) of antibodies through somatic hypermutations (SHMs) enables the immune system to evolve to recognize diverse pathogens. The accumulation of SHMs leads to the formation of clonal trees of antibodies produced by B cells that have evolved from a common naive B cell. Recent advances in high-throughput sequencing have enabled deep scans of antibody repertoires, paving the way for reconstructing clonal trees. However, it is not clear if clonal trees, which capture micro-evolutionary time scales, can be reconstructed using traditional phylogenetic reconstruction methods with adequate accuracy. In fact, several clonal tree reconstruction methods have been developed to fix supposed shortcomings of phylogenetic methods. Nevertheless, no consensus has been reached regarding the relative accuracy of these methods, partially because evaluation is challenging. Benchmarking the performance of existing methods and developing better methods would both benefit from realistic models of clonal tree evolution specifically designed for emulating B cell evolution. In this paper, we propose a model of B cell clonal tree evolution and use this model to benchmark several existing clonal tree reconstruction methods. Our model, designed to be extensible, has several features: by evolving the clonal tree and sequences simultaneously, it allows modeling selective pressure due to changes in affinity binding; it enables scalable simulations of millions of cells; it enables several rounds of infection by an evolving pathogen; and it models the building of memory. In addition, we suggest a set of metrics for comparing clonal trees and for measuring their properties. Our benchmarking results show that while maximum likelihood phylogenetic reconstruction methods can fail to capture key features of clonal tree expansion if applied naively, a very simple postprocessing of their results, in which very short branches are contracted, leads to inferences that are better than those of alternative methods.
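The postprocessing step described above, contracting very short branches, might look like the following minimal sketch. It is an assumption-laden illustration, not the authors' implementation: the dict-based tree layout, the function name and the threshold are hypothetical, only internal branches are contracted, and the removed length is added to the children's branches so that root-to-leaf path lengths are preserved.

```python
def contract_short_branches(node, threshold):
    """Return a copy of the tree with short internal branches removed.

    A node is a dict with keys "name", "length" (branch leading to the node)
    and "children" (list of nodes). When an internal child's branch is shorter
    than `threshold`, its own children are attached directly to the current
    node, which can turn a binary tree into a multifurcating one.
    """
    new_children = []
    for child in node.get("children", []):
        child = contract_short_branches(child, threshold)  # fresh copy
        if child["children"] and child["length"] < threshold:
            for grandchild in child["children"]:
                # Keep the path length from the current node unchanged.
                grandchild["length"] += child["length"]
                new_children.append(grandchild)
        else:
            new_children.append(child)
    return {
        "name": node.get("name"),
        "length": node.get("length", 0.0),
        "children": new_children,
    }
```

The input tree is left unmodified, so the contracted and original trees can be compared side by side. Choosing the threshold is left open here; the abstract does not state a value.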

