An experimental study comparing linguistic phylogenetic reconstruction methods

Diachronica ◽  
2013 ◽  
Vol 30 (2) ◽  
pp. 143-170 ◽  
Author(s):  
François Barbançon ◽  
Steven N. Evans ◽  
Luay Nakhleh ◽  
Don Ringe ◽  
Tandy Warnow

This paper reports a simulation study comparing and evaluating the performance of different linguistic phylogeny reconstruction methods on model datasets for which the true trees are known. UPGMA performed least well, then (in ascending order) neighbor joining, the method of Gray & Atkinson and finally maximum parsimony. Weighting characters greatly improves the accuracy of maximum parsimony and maximum compatibility if the characters with high weights exhibit low homoplasy.
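The effect of character weighting on maximum parsimony can be illustrated with a small sketch. It is not the authors' simulation code: it assumes rooted binary trees written as nested tuples and characters given as dictionaries from taxon name to state, and it uses the standard Fitch small-parsimony count. The names `fitch_changes` and `weighted_parsimony` and the toy data are hypothetical.

```python
def fitch_changes(tree, states):
    """Return (possible root states, minimum changes) for one character.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_changes(tree[0], states)
    right_set, right_changes = fitch_changes(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change at this node.
        return common, left_changes + right_changes
    # The children disagree: one extra change is needed on this node.
    return left_set | right_set, left_changes + right_changes + 1


def weighted_parsimony(tree, characters, weights):
    """Sum of per-character Fitch change counts, each scaled by its weight."""
    return sum(w * fitch_changes(tree, ch)[1]
               for ch, w in zip(characters, weights))


# Toy data (assumed): t1 plays the role of the true tree. The first
# character is homoplasy-free on t1; the other two conflict with it.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
chars = [
    {"A": 1, "B": 1, "C": 0, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 0},
]

print(weighted_parsimony(t1, chars, [1, 1, 1]))  # 5
print(weighted_parsimony(t2, chars, [1, 1, 1]))  # 4: unweighted prefers t2
print(weighted_parsimony(t1, chars, [3, 1, 1]))  # 7: weighted prefers t1
print(weighted_parsimony(t2, chars, [3, 1, 1]))  # 8
```

In this toy case the unweighted score favours the wrong tree, while giving the low-homoplasy character a higher weight restores the intended one. This matches the direction of the effect the abstract describes, though not its magnitude.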

2018 ◽  
Vol 44 (1) ◽  
pp. 20
Author(s):  
Eloiza Teles Caldart ◽  
Helena Mata ◽  
Cláudio Wageck Canal ◽  
Ana Paula Ravazzolo

Background: Phylogenetic analyses are an essential part of the exploratory assessment of nucleic acid and amino acid sequences. Particularly in virology, they can delineate the evolution and epidemiology of disease etiologic agents and/or the evolutionary path of their hosts. The objective of this review is to help researchers who want to use phylogenetic analyses as a tool in virology and molecular epidemiology studies by presenting the most commonly used methodologies and describing the importance of the different techniques, their peculiar vocabulary and some examples of their use in virology.

Review: This article starts by presenting basic concepts of molecular epidemiology and molecular evolution, emphasizing their relevance in the context of viral infectious diseases. It presents a section on the vocabulary relevant to the subject, bringing readers to the minimum level of knowledge needed throughout this literature review. Within its main subject, the text explains what a molecular phylogenetic analysis is, starting from a multiple alignment of nucleotide or amino acid sequences. The different software packages used to perform multiple alignments may apply different algorithms. To build a phylogeny based on amino acid or nucleotide sequences, it is necessary to produce a data matrix based on a model for nucleotide or amino acid replacement, also called an evolutionary model. A number of evolutionary models are available, varying in complexity according to the number of parameters (transition, transversion, GC content, nucleotide position in the codon, among others). Some papers presented herein provide techniques that can be used to choose evolutionary models. After the model is chosen, the next step is to opt for a phylogenetic reconstruction method that best fits the available data and the selected model. Here we present the most common reconstruction methods currently used, describing their principles, advantages and disadvantages. Distance methods, for example, are simpler and faster, but they do not provide reliable estimates when the sequences are highly divergent. The accuracy of the analysis with model-based methods (neighbour joining, maximum likelihood and Bayesian inference) strongly depends on the adherence of the actual data to the chosen evolutionary model. Finally, we also explore topology confidence tests, especially the most used one, the bootstrap. To assist the reader, this review presents figures to explain specific situations discussed in the text and numerous examples of previously published scientific articles in virology that demonstrate the importance of the techniques discussed herein, as well as their judicious use.

Conclusion: The DNA sequence is not only a record of phylogeny and divergence times, but also retains signs of how the evolutionary process has shaped its history and of the time elapsed in the evolutionary process of the population. Analyses of genomic sequences by molecular phylogeny have demonstrated a broad spectrum of applications. It is important to note that, for the different available data and different purposes of phylogenies, reconstruction methods and evolutionary models should be wisely chosen. This review provides a theoretical basis for the choice of evolutionary models and phylogenetic reconstruction methods best suited to each situation. In addition, it presents examples of diverse applications of molecular phylogeny in virology.
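Two of the ideas in this review, model-corrected distances and bootstrap resampling, can be sketched briefly. The example below is an illustration and is not taken from the review: it assumes the Jukes-Cantor (JC69) model, one of the simplest evolutionary models, and an alignment stored as a dictionary of equal-length strings. The function names are hypothetical.

```python
import math
import random


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a, b):
    """JC69 corrected distance: d = -3/4 * ln(1 - 4p/3).

    The correction diverges as p approaches 0.75 (saturation), which is one
    way to see why distance estimates degrade for highly divergent sequences.
    """
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment (assumed data).
aln = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGAACGA"}
print(jc69_distance(aln["s1"], aln["s2"]))
replicate = bootstrap_alignment(aln, random.Random(0))
print(replicate)
```

In a full bootstrap analysis, a tree would be rebuilt from each replicate and the support of a clade reported as the fraction of replicate trees that contain it. The tree-building step is omitted here.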


2010 ◽  
Vol 20 (supp01) ◽  
pp. 1511-1532 ◽  
Author(s):  
S. POMPEI ◽  
E. CAGLIOTI ◽  
V. LORETO ◽  
F. TRIA

Phylogenetic methods have recently been rediscovered in several interesting areas, including immunodynamics, epidemiology and many branches of evolutionary dynamics. In many interesting cases the reconstruction of a correct phylogeny is blurred by high mutation rates and/or horizontal transfer events. As a consequence, a divergence arises between the true evolutionary distances and the distances between pairs of taxa as inferred from the available data, making phylogenetic reconstruction a challenging problem. Mathematically, this divergence translates into the non-additivity of the actual distances between taxa, and the quest for new algorithms able to cope efficiently with these effects is wide open. In distance-based reconstruction methods, two properties of additive distances have been extensively exploited as antagonist criteria to drive phylogeny reconstruction: on the one hand, a local property of quartets (sets of four taxa in a tree), the four-point condition; on the other hand, a recently proposed formula that expresses the tree length as a function of the distances between taxa, Pauplin's formula. A deeper understanding of the effects of non-additivity on the principles that inspire the existing reconstruction algorithms is thus of paramount importance. In this paper we present a comparative analysis of the performance of the most important distance-based phylogenetic algorithms. We focus in particular on how their performance depends on two main sources of non-additivity: back-mutation processes and horizontal transfer processes. The comparison is carried out in the framework of a set of generative algorithms for phylogenies that incorporate non-additivity in a tunable way.
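The four-point condition mentioned above can be checked directly on a distance matrix. The sketch below is an illustration rather than the paper's code. It uses the standard statement of the condition: for every quartet, the two largest of the three pairwise sums d(i,j)+d(k,l), d(i,k)+d(j,l), d(i,l)+d(j,k) are equal. The function names and the example matrices are assumptions.

```python
from itertools import combinations


def four_point_violation(d, quartet):
    """Gap between the two largest pairwise sums for one quartet.

    The gap is zero when the quartet satisfies the four-point condition.
    """
    i, j, k, l = quartet
    sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
    return sums[2] - sums[1]


def is_additive(d, tol=1e-9):
    """True if every quartet of taxa satisfies the four-point condition."""
    return all(four_point_violation(d, q) <= tol
               for q in combinations(range(len(d)), 4))


# Distances read off the tree ((0,1),(2,3)) with leaf branch lengths
# 1, 2, 3, 4 and an internal branch of length 5 (assumed example).
additive = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_additive(additive))  # True

# Shortening one observed distance, as a back-mutation might, breaks it.
perturbed = [row[:] for row in additive]
perturbed[0][2] = perturbed[2][0] = 7
print(is_additive(perturbed))                        # False
print(four_point_violation(perturbed, (0, 1, 2, 3)))  # 2
```

The size of the violation gives a rough, quartet-level measure of how far an inferred distance matrix is from being tree-like.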


2020 ◽  
Vol 37 (9) ◽  
pp. 2747-2762 ◽  
Author(s):  
Guénola Drillon ◽  
Raphaël Champeimont ◽  
Francesco Oteri ◽  
Gilles Fischer ◽  
Alessandra Carbone

Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently from the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint issued from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, named partial splits, respectively, supporting the two block adjacencies defining the breakpoint. Considering all partial splits issued from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping sister genomes minimizing genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two data sets of 13 vertebrates and 21 yeast genomes by using up to 130,000 and 179,000 breakpoints, respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data PhyChro reconstructs phylogenies perfectly in almost all cases, and shows the highest accuracy compared with other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in <15 min.
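To make the notion of a synteny breakpoint concrete, the sketch below counts adjacencies that are present in one gene order but absent from another. It is a deliberately simplified illustration and not PhyChro's partial-split algorithm: it assumes a single chromosome per genome, unsigned genes, and no duplicated genes. The function names are hypothetical.

```python
def adjacencies(order):
    """Unordered pairs of neighbouring genes along one chromosome."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoint_count(a, b):
    """Number of adjacencies of genome `a` that are not adjacencies of `b`.

    Genes missing from either genome are dropped first, so that genomes
    with different gene contents can still be compared.
    """
    shared = set(a) & set(b)
    a_shared = [g for g in a if g in shared]
    b_shared = [g for g in b if g in shared]
    return len(adjacencies(a_shared) - adjacencies(b_shared))


# Toy gene orders (assumed data): genome_b carries an inversion of g3-g4.
genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g1", "g2", "g4", "g3", "g5"]
print(breakpoint_count(genome_a, genome_b))  # 2
print(breakpoint_count(genome_a, genome_a))  # 0
```

A pairwise count of this kind could feed any distance-based tree builder. The method described in the abstract goes further by recording which genomes support each side of a breakpoint.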


2017 ◽  
Vol 51 (s38) ◽  
Author(s):  
Alexei S. Kassian

This paper deals with the problem of linguistic homoplasy (parallel or backward development), how it can be detected, what kinds of linguistic homoplasy can be distinguished and which varieties of the phenomenon are the most deleterious for the reconstruction of language phylogeny. It is proposed that language phylogeny reconstruction should consist of two main stages. Firstly, a strict consensus tree should be built on the basis of high-quality input data elaborated with the help of the main phylogenetic methods (such as Neighbor-joining, Bayesian MCMC, and Maximum parsimony), and ancestral character states, allowing us to reveal a certain number of homoplastic characters. Secondly, after the detected instances of homoplasy are eliminated from the input matrix, the consensus tree is to be compiled again. It is expected that after homoplastic optimization it will be possible to better resolve individual “problem clades”, and generally the homoplasy-optimized phylogeny should be more robust than the tree constructed initially. The proposed procedure is tested on the 110-item Swadesh wordlists of the Lezgian and Tsezic groups. The Lezgian and Tsezic results generally support theoretical expectations. The MLN (minimal lateral network) method, currently implemented in the LingPy software, is a helpful tool for the detection of linguistic homoplasy.
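The strict consensus step in the first stage can be sketched as an intersection of clade sets: a clade is kept only if every input tree contains it. The example below is a minimal illustration, not the paper's workflow or the LingPy API. It assumes rooted trees written as nested tuples of language names, and the function names and the tree shapes are hypothetical.

```python
def collect_clades(tree, out):
    """Add the leaf set of every internal node of `tree` to `out`.

    Returns the leaf set of `tree` itself as a frozenset.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def strict_consensus_clades(trees):
    """Clades present in every input tree (the strict consensus)."""
    clade_sets = []
    for tree in trees:
        found = set()
        collect_clades(tree, found)
        clade_sets.append(found)
    return set.intersection(*clade_sets)


# Two hypothetical trees, e.g. from two different reconstruction methods.
# The grouping below is invented for illustration only.
tree_nj = ((("Lezgi", "Tabasaran"), "Agul"), "Udi")
tree_mp = ((("Lezgi", "Tabasaran"), "Udi"), "Agul")

for clade in sorted(strict_consensus_clades([tree_nj, tree_mp]), key=len):
    print(sorted(clade))
# ['Lezgi', 'Tabasaran']
# ['Agul', 'Lezgi', 'Tabasaran', 'Udi']
```

Clades on which the methods disagree collapse into polytomies in the consensus. These are the "problem clades" that the second stage, after removal of homoplastic characters, aims to resolve.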


Mycologia ◽  
2006 ◽  
Vol 98 (6) ◽  
pp. 937-948 ◽  
Author(s):  
Jean-Marc Moncalvo ◽  
R. Henrik Nilsson ◽  
Brenda Koster ◽  
Susie M. Dunham ◽  
Torsten Bernauer ◽  
...  

2007 ◽  
Vol 8 (1) ◽  
pp. 472 ◽  
Author(s):  
Srinath Sridhar ◽  
Fumei Lam ◽  
Guy E Blelloch ◽  
R Ravi ◽  
Russell Schwartz
