The Prevalence and Impact of Model Violations in Phylogenetic Analysis

2018 ◽  
Author(s):  
Suha Naser-Khdour ◽  
Bui Quang Minh ◽  
Wenqi Zhang ◽  
Eric Stone ◽  
Robert Lanfear

Abstract In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic datasets. We show that many partitions (39.5%) reject the SRH assumptions, and that for most datasets, the topologies of trees inferred from all partitions differ significantly from those inferred using the subset of partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. They also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available at https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).

2019 ◽  
Vol 11 (12) ◽  
pp. 3341-3352 ◽  
Author(s):  
Suha Naser-Khdour ◽  
Bui Quang Minh ◽  
Wenqi Zhang ◽  
Eric A Stone ◽  
Robert Lanfear

Abstract In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that rejects the SRH assumptions, to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available in https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org).
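As an illustration of the kind of statistic a matched-pairs test of symmetry is built on, the following is a minimal sketch of the classical Bowker test for a single pair of aligned sequences. It is not the authors' implementation and does not show how the maximal tests choose which pair to report; the function name `bowker_symmetry`, the restriction to the `ACGT` alphabet, and the choice to count degrees of freedom only over non-empty cell pairs are assumptions made for this example.

```python
from itertools import combinations


def bowker_symmetry(seq_x, seq_y, alphabet="ACGT"):
    """Bowker's test of symmetry for one pair of aligned sequences.

    Returns (statistic, degrees_of_freedom). Sites where either sequence
    has a character outside `alphabet` (gaps, ambiguity codes) are skipped.
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")

    index = {state: i for i, state in enumerate(alphabet)}
    k = len(alphabet)

    # Divergence matrix: n[i][j] counts sites with state i in x and j in y.
    n = [[0] * k for _ in range(k)]
    for x, y in zip(seq_x, seq_y):
        if x in index and y in index:
            n[index[x]][index[y]] += 1

    statistic = 0.0
    df = 0
    for i, j in combinations(range(k), 2):
        total = n[i][j] + n[j][i]
        if total > 0:  # assumption: empty cell pairs add no degree of freedom
            statistic += (n[i][j] - n[j][i]) ** 2 / total
            df += 1
    return statistic, df


if __name__ == "__main__":
    stat, df = bowker_symmetry("AAAAACCG", "CCCCAACG")
    print(f"Bowker statistic = {stat:.3f} on {df} degree(s) of freedom")
```

Under the null hypothesis of symmetry, the statistic is compared against a chi-square distribution with the returned degrees of freedom; a small p-value indicates that the pair rejects symmetry.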


2019 ◽  
Author(s):  
Bui Quang Minh ◽  
Heiko Schmidt ◽  
Olga Chernomor ◽  
Dominik Schrempf ◽  
Michael Woodhams ◽  
...  

Abstract IQ-TREE (http://www.iqtree.org) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.


2017 ◽  
Author(s):  
R. Biczok ◽  
P. Bozsoky ◽  
P. Eisenmann ◽  
J. Ernst ◽  
T. Ribizel ◽  
...  

Abstract Motivation: The presence of terraces in phylogenetic tree space, that is, a potentially large number of distinct tree topologies that have exactly the same analytical likelihood score, was first described by Sanderson et al. (2011). However, popular software tools for maximum likelihood and Bayesian phylogenetic inference do not yet routinely report whether inferred phylogenies reside on a terrace. We believe this is due to the unavailability of an efficient library implementation to (i) determine if a tree resides on a terrace, (ii) calculate how many trees reside on a terrace, and (iii) enumerate all trees on a terrace. Results: In our bioinformatics programming practical we developed two efficient and independent C++ implementations of the SUPERB algorithm by Constantinescu and Sankoff (1995) for counting and enumerating the trees on a terrace. Both implementations yield exactly the same results, are more than one order of magnitude faster, and require one order of magnitude less memory than a previous third-party Python implementation. Availability: The source codes are available under GNU GPL at https://github.com/[email protected]


2020 ◽  
Vol 37 (5) ◽  
pp. 1530-1534 ◽  
Author(s):  
Bui Quang Minh ◽  
Heiko A Schmidt ◽  
Olga Chernomor ◽  
Dominik Schrempf ◽  
Michael D Woodhams ◽  
...  

Abstract IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.


2020 ◽  
Author(s):  
Andrew F. Magee ◽  
Sarah K. Hilton ◽  
William S. DeWitt

Abstract Likelihood-based phylogenetic inference posits a probabilistic model of character state change along branches of a phylogenetic tree. These models typically assume statistical independence of sites in the sequence alignment. This is a restrictive assumption that facilitates computational tractability, but ignores how epistasis, the effect of genetic background on mutational effects, influences the evolution of functional sequences. We consider the effect of using a misspecified site-independent model on the accuracy of Bayesian phylogenetic inference in the setting of pairwise-site epistasis. Previous work has shown that as alignment length increases, tree reconstruction accuracy also increases. Here, we present a simulation study demonstrating that accuracy increases with alignment size even if the additional sites are epistatically coupled. We introduce an alignment-based test statistic that is a diagnostic for pairwise epistasis and can be used in posterior predictive checks.


2020 ◽  
Vol 190 (1) ◽  
pp. 79-113
Author(s):  
Bryan M Gee

Abstract Trematopids are a clade of terrestrial Permo-Carboniferous temnospondyl amphibians. The intrarelationships of this clade are poorly known. This is largely attributable to a substantial disparity in size between type specimens, which range from the small-bodied lectotype of Mattauschia laticeps (< 4 cm skull length) to the large-bodied holotype of Acheloma cumminsi (> 15 cm skull length). Inferred correlation of size disparity with ontogenetic disparity has led previous workers either to omit taxa in phylogenetic analyses or to forgo an analysis altogether. Here, I take a specimen-level approach and multiple subsampling permutations to explore the phylogeny of the Trematopidae as a case study for assessing the effects of ontogenetic disparity on phylogenetic reconstruction in temnospondyls. The various analyses provide evidence that ontogenetic disparity confounds the phylogenetic inference of trematopids but without a directional bias. Tree topologies of most permutations are poorly resolved and weakly supported, reflecting character conflict that results from the inability of the analyses to differentiate retained plesiomorphies from juvenile features. These findings urge caution in the interpretation of phylogenetic analyses for which ontogenetic disparity exists, but is unaccounted for, and provide a strong impetus for more directed exploration of the interplay of ontogeny and phylogeny across Temnospondyli.


2018 ◽  
Author(s):  
Alexey M. Kozlov ◽  
Diego Darriba ◽  
Tomáš Flouri ◽  
Benoit Morel ◽  
Alexandros Stamatakis

Abstract Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture, and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared to RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. Availability: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/[email protected]


2014 ◽  
Author(s):  
James B Pease ◽  
Matthew W. Hahn

In clades of closely related taxa, discordant genealogies due to incomplete lineage sorting (ILS) can complicate the detection of introgression. The D-statistic (a.k.a. the ABBA/BABA test) was proposed to infer introgression in the presence of ILS for a four-taxon clade. However, the original D-statistic cannot be directly applied to a symmetric five-taxon phylogeny, and the direction of introgression cannot be inferred for any tree topology. Here we explore the issues associated with previous methods for adapting the D-statistic to a larger tree topology, and propose new “DFOIL” tests to infer both the taxa involved in and the direction of introgressions for a symmetric five-taxon phylogeny. Using theory and simulations, we find that previous modifications of the D-statistic to five-taxon phylogenies incorrectly identify both the pairs of taxa exchanging migrants as well as the direction of introgression. The DFOIL statistics are shown to overcome this deficiency and to correctly determine the direction of introgressions. The DFOIL tests are relatively simple and computationally inexpensive to calculate, and can be easily applied to various phylogenomic datasets. In addition, our general approach to the problem of introgression detection could be adapted to larger tree topologies and other models of sequence evolution.
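For orientation, the original four-taxon D-statistic mentioned above can be sketched in a few lines. This is a minimal illustration of the standard ABBA/BABA count for taxa ordered (((P1, P2), P3), O), not the DFOIL tests themselves; the function name `d_statistic`, the toy sequences, and the decision to skip sites containing anything other than A, C, G, or T are assumptions made for this example.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Four-taxon D-statistic for the tree (((P1, P2), P3), O).

    Returns (D, n_abba, n_baba), where D = (ABBA - BABA) / (ABBA + BABA).
    The outgroup state is treated as ancestral ("A"); a state shared by P3
    and exactly one of P1/P2 is treated as derived ("B").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    n_abba = 0
    n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Assumption: ignore sites with gaps or ambiguity codes.
        if any(state not in "ACGT" for state in (a, b, c, o)):
            continue
        if a == o and b == c and b != o:
            n_abba += 1  # P2 and P3 share the derived state
        elif b == o and a == c and a != o:
            n_baba += 1  # P1 and P3 share the derived state

    total = n_abba + n_baba
    if total == 0:
        return float("nan"), n_abba, n_baba
    return (n_abba - n_baba) / total, n_abba, n_baba


if __name__ == "__main__":
    d, abba, baba = d_statistic("ACAA", "CACA", "CCCA", "AAAA")
    print(f"ABBA = {abba}, BABA = {baba}, D = {d:.3f}")
```

With ILS alone, ABBA and BABA patterns are expected in equal numbers, so D should be close to zero; an excess of one pattern is the signal interpreted as introgression. Assessing significance (for example by block resampling) is outside the scope of this sketch.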


2013 ◽  
Vol 63 (Pt_11) ◽  
pp. 4266-4270 ◽  
Author(s):  
Anna Tomova ◽  
Iva Tomova ◽  
Evgenia Vasileva-Tonkova ◽  
Irina Lazarkevich ◽  
Margarita Stoilova-Disheva ◽  
...  

A novel psychrotolerant, strictly aerobic, non-motile, rod-shaped bacterial strain, designated IM13T, was isolated from a sample taken from prehistoric guano paintings in Magura Cave, northwest Bulgaria and subjected to a polyphasic taxonomic study. Strain IM13T formed yellow colonies on LB agar plates and was Gram-staining-negative, heterotrophic and alkalitolerant. It grew optimally at pH 7.5 and 30 °C in the absence of NaCl. Phylogenetic analysis of the whole 16S rRNA gene revealed that strain IM13T branched with representatives of the genus Myroides with sequence similarity of 93–94 % with other species of the genus. The novel isolate contained iso-C15 : 0 (49.1 %), iso-C17 : 1ω9c (18.2 %) and iso-C17 : 0 3-OH (14.0 %) as dominant fatty acids. The DNA G+C content of strain IM13T was 33.5 mol%. Based on phylogenetic inference and phenotypic characteristics, it was concluded that strain IM13T represents a novel species of the genus Myroides, for which the name Myroides guanonis sp. nov. is proposed. The type strain is IM13T (= DSM 26542T = NBIMCC 8736T).

