A test statistic to quantify treelikeness in phylogenetics

2021 ◽  
Author(s):  
Caitlin Cherryh ◽  
Bui Quang Minh ◽  
Rob Lanfear

Abstract Most phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness
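As a rough illustration of the tree proportion described in this abstract, the sketch below treats a network as a set of weighted splits and a tree as a subset of those splits, then returns the share of total split weight carried by the tree. The split representation, the helper names `canonical` and `tree_proportion`, and the toy weights are all assumptions made for illustration; the abstract does not specify how the network or the tree are obtained, and this is not the authors' implementation.

```python
from typing import Dict, FrozenSet, Iterable, Set

Split = FrozenSet[str]  # one side of a bipartition of the taxa


def canonical(side: Iterable[str], taxa: Set[str]) -> Split:
    """Return a canonical side of a split: the side that excludes the
    alphabetically first taxon (hypothetical convention for this sketch)."""
    side = frozenset(side)
    reference = min(taxa)
    return side if reference not in side else frozenset(taxa) - side


def tree_proportion(network_weights: Dict[Split, float],
                    tree_splits: Iterable[Split]) -> float:
    """Fraction of the network's total split weight found in the tree."""
    total = sum(network_weights.values())
    if total <= 0:
        raise ValueError("network must have positive total split weight")
    in_tree = sum(network_weights.get(s, 0.0) for s in set(tree_splits))
    return in_tree / total


# Toy example (made-up weights): two conflicting splits on four taxa.
taxa = {"A", "B", "C", "D"}
network = {
    canonical({"A", "B"}, taxa): 3.0,  # split AB|CD
    canonical({"A", "C"}, taxa): 1.0,  # split AC|BD, conflicts with AB|CD
}
tree = [canonical({"C", "D"}, taxa)]   # a tree containing only AB|CD
proportion = tree_proportion(network, tree)
```

Under these assumptions a value near 1 would indicate that almost all of the network's weight is tree-like, and lower values would indicate conflicting signal.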

2019 ◽  
Author(s):  
Jiafan Zhu ◽  
Xinhao Liu ◽  
Huw A. Ogilvie ◽  
Luay K. Nakhleh

Abstract Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting (ILS). However, these methods can only handle a small number of loci from a handful of genomes. In this paper, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological data sets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).


2018 ◽  
Author(s):  
Kunal Arekar ◽  
Abhijna Parigi ◽  
K. Praveen Karanth

Abstract Evolutionary studies have traditionally relied on concatenation-based methods to reconstruct relationships from multiple markers. However, due to limitations of concatenation analyses, recent studies have proposed coalescent-based methods to address evolutionary questions. Results from these methods tend to diverge from each other under situations where there is incomplete lineage sorting or hybridization. Here we used concatenation as well as multispecies coalescent (MSC) methods to understand the evolutionary origin of the capped and golden langur (CG) lineage. Previous molecular studies have retrieved conflicting phylogenies, with the mitochondrial tree grouping the CG lineage with a largely Indian genus, Semnopithecus, while nuclear markers support their affinities with a Southeast Asian genus, Trachypithecus. However, as pointed out by others, the use of nuclear copies of mitochondrial DNA in the above studies might have generated the discordance. Because of this discordance, the phylogenetic position of the CG lineage has been much debated in recent times. In this study, we have used nine nuclear and eight mitochondrial markers. The concatenated nuclear as well as the mitochondrial dataset recovered congruent relationships where the CG lineage was sister to Trachypithecus. However, nuclear species trees estimated using different MSC methods were incongruent with the above result, suggesting the presence of incomplete lineage sorting (ILS)/hybridisation. Furthermore, the CG lineage is morphologically intermediate between Semnopithecus and Trachypithecus. Based on this evidence, we argue that the CG lineage evolved through hybridisation between Semnopithecus and Trachypithecus. Finally, we reason that both concatenation and coalescent methods should be used in conjunction for a better understanding of various evolutionary hypotheses.


2021 ◽  
Author(s):  
Sarah Lutteropp ◽  
Céline Scornavacca ◽  
Alexey M. Kozlov ◽  
Benoit Morel ◽  
Alexandros Stamatakis

Abstract Phylogenetic networks are used to represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity. Hence, current tools can only analyze small data sets. We present NetRAX, a tool for maximum likelihood inference of phylogenetic networks in the absence of incomplete lineage sorting. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of “displayed trees”. NetRAX can infer maximum likelihood phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in BIC score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8,000 sites, 30 taxa, and 3 reticulations completes within a few minutes on a standard laptop. Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX.


Author(s):  
Amanda Patsis ◽  
Rick P. Overson ◽  
Krissa A. Skogen ◽  
Norman J. Wickett ◽  
Matthew G. Johnson ◽  
...  

Oenothera sect. Pachylophus has proven to be a valuable system in which to study plant-insect coevolution and the drivers of variation in floral morphology and scent. Current species circumscriptions based on morphological characteristics suggest that the section consists of five species, one of which is subdivided into five subspecies. Previous attempts to understand species (and subspecies) relationships at a molecular level have been largely unsuccessful due to high levels of incomplete lineage sorting and limited phylogenetic signal from slowly evolving gene regions. In the present study, target enrichment was used to sequence 322 conserved protein-coding nuclear genes from 50 individuals spanning the geographic range of Oenothera sect. Pachylophus, with species trees inferred using concatenation and coalescent-based methods. Our findings concur with previous research in suggesting that O. psammophila and O. harringtonii are nested within a paraphyletic Oenothera cespitosa. By contrast, our results show clearly that the two annual species (O. cavernae and O. brandegeei) did not arise from the O. cespitosa lineage, but rather from a common ancestor of Oenothera sect. Pachylophus. Budding speciation as a result of edaphic specialization appears to best explain the evolution of the narrow endemic species O. harringtonii and O. psammophila. Complete understanding of possible introgression among subspecies of O. cespitosa will require broader sampling across the full geographical and ecological ranges of these taxa.


2019 ◽  
Vol 12 (1) ◽  
pp. 3615-3634 ◽  
Author(s):  
Guangshuai Liu ◽  
Huanxin Zhang ◽  
Chao Zhao ◽  
Honghai Zhang

Abstract Adaptation to a wide range of pathogenic environments is a major aspect of the ecological adaptations of vertebrates during evolution. Toll-like receptors (TLRs) are ancient membrane-bound sensors in animals and are best known for their roles in detecting and defending against invading pathogenic microorganisms. To understand the evolutionary history of the vertebrate TLR gene family, we first traced the origin of single-cysteine cluster TLRs that share the same protein architecture with vertebrate TLRs in early-branching animals and then analyzed all members of the TLR family in over 200 species covering all major vertebrate clades. Our results indicate that although the emergence of single-cysteine cluster TLRs predates the separation of bilaterians and cnidarians, most vertebrate TLR members originated shortly after vertebrate emergence. Phylogenetic analyses divided 1,726 vertebrate TLRs into 8 subfamilies, and TLR3 may represent the most ancient subfamily that emerged before the branching of deuterostomes. Our analysis reveals that purifying selection predominated in the evolution of all vertebrate TLRs, with mean dN/dS (ω) values ranging from 0.082 for TLR21 in birds to 0.434 for TLR11 in mammals. However, we did observe patterns of positive selection acting on specific codons (527 of 60,294 codons across all vertebrate TLRs, 8.7‰), which are significantly concentrated in ligand-binding extracellular domains and suggest host–pathogen coevolutionary interactions. Additionally, we found stronger positive selection acting on nonviral compared with viral TLRs, indicating the more essential nonredundant function of viral TLRs in host immunity. Taken together, our findings provide comprehensive insight into the complex evolutionary processes of the vertebrate TLR gene family, involving gene duplication, pseudogenization, purification, and positive selection.


2019 ◽  
Vol 69 (3) ◽  
pp. 593-601 ◽  
Author(s):  
Christopher Blair ◽  
Cécile Ané

Abstract Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). However, the MSC assumes no gene flow post divergence, calling for additional methods that can accommodate this limitation. Explicit phylogenetic network methods have emerged, which can simultaneously account for ILS and gene flow by representing evolutionary history as a directed acyclic graph. In this point of view, we highlight some of the strengths and limitations of phylogenetic networks and argue that tree-based inference should not be blindly abandoned in favor of networks simply because they represent more parameter rich models. Attention should be given to model selection of reticulation complexity, and the most robust conclusions regarding evolutionary history are likely obtained when combining tree- and network-based inference.


2021 ◽  
Vol 288 (1943) ◽  
pp. 20202934
Author(s):  
Jiaming Hu ◽  
Michael V. Westbury ◽  
Junxia Yuan ◽  
Zhen Zhang ◽  
Shungang Chen ◽  
...  

Cave hyenas (genus Crocuta) are extinct bone-cracking carnivores from the family Hyaenidae and are generally split into two taxa that correspond to a European/Eurasian and an (East) Asian lineage. They are close relatives of the extant African spotted hyenas, the only extant member of the genus Crocuta. Cave hyenas inhabited a wide range across Eurasia during the Pleistocene, but became extinct at the end of the Late Pleistocene. Using genetic and genomic datasets, previous studies have proposed different scenarios about the evolutionary history of Crocuta. However, causes of the extinction of cave hyenas are widely speculative and samples from China are severely understudied. In this study, we assembled near-complete mitochondrial genomes from two cave hyenas from northeastern China dating to 20 240 and 20 253 calBP, representing the youngest directly dated fossils of Crocuta in Asia. Phylogenetic analyses suggest a monophyletic clade of these two samples within a deeply diverging mitochondrial haplogroup of Crocuta. Bayesian analyses suggest that the split of this Asian cave hyena mitochondrial lineage from their European and African relatives occurred approximately 1.85 Ma (95% CI 1.62–2.09 Ma), which is broadly concordant with the earliest Eurasian Crocuta fossil dating to approximately 2 Ma. Comparisons of mean genetic distance indicate that cave hyenas harboured higher genetic diversity than extant spotted hyenas, brown hyenas and aardwolves, but this is probably at least partially due to the fact that their mitochondrial lineages do not represent a monophyletic group, although this is also true for extant spotted hyenas. Moreover, the joint female effective population size of Crocuta (both cave hyenas and extant spotted hyenas) has sustained two declines during the Late Pleistocene.
Combining this mitochondrial phylogeny, previous nuclear findings and fossil records, we discuss the possible relationship of fossil Crocuta in China and the extinction of cave hyenas.


Author(s):  
Jianhua Wang ◽  
Guan-Zhu Han

Abstract The origin and deep history of retroviruses remain mysterious and contentious, largely because the diversity of retroviruses is incompletely understood. Here, we report the discovery of lokiretroviruses, a novel major lineage of retroviruses, within the genomes of a wide range of vertebrates (at least 137 species), including lampreys, ray-finned fishes, lobe-finned fishes, amphibians, and reptiles. Lokiretroviruses share a similar genome architecture with known retroviruses, but display some unique features. Interestingly, lokiretrovirus Env proteins share detectable similarity with fusion glycoproteins of viruses within the Mononegavirales order, blurring the boundary between retroviruses and negative sense single-stranded RNA viruses. Phylogenetic analyses based on reverse transcriptase demonstrate that lokiretroviruses are sister to all the retroviruses sampled to date, providing a crucial nexus for studying the deep history of retroviruses. Comparing congruence between host and virus phylogenies suggests lokiretroviruses mainly underwent cross-species transmission. Moreover, we find that retroviruses replaced their ribonuclease H and integrase domains multiple times during their evolutionary course, revealing the importance of domain shuffling in the evolution of retroviruses. Overall, our findings greatly expand our views of the diversity of retroviruses, and provide novel insights into the origin and complex evolutionary history of retroviruses.


2019 ◽  
Vol 35 (14) ◽  
pp. i370-i378 ◽  
Author(s):  
Jiafan Zhu ◽  
Xinhao Liu ◽  
Huw A Ogilvie ◽  
Luay K Nakhleh

Abstract Motivation: Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes. Results: In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. Availability and implementation: We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet). Supplementary information: Supplementary data are available at Bioinformatics online.
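The Hitting Set formulation mentioned in this abstract can be illustrated with a generic greedy heuristic: repeatedly pick the element that appears in the most not-yet-hit sets. This is a textbook sketch, not the heuristic implemented in PhyloNet; the function name `greedy_hitting_set` and the toy input are assumptions, and the abstract does not say how trinets map onto the sets and elements.

```python
from typing import Hashable, Iterable, List, Set


def greedy_hitting_set(collection: Iterable[Iterable[Hashable]]) -> List[Hashable]:
    """Greedy approximation: return elements such that every non-empty
    input set contains at least one chosen element."""
    # Empty sets can never be hit, so they are dropped up front.
    remaining: List[Set[Hashable]] = [set(s) for s in collection if s]
    chosen: List[Hashable] = []
    while remaining:
        # Count how many still-unhit sets each element would hit.
        counts = {}
        for s in remaining:
            for element in s:
                counts[element] = counts.get(element, 0) + 1
        # Sorting first makes tie-breaking deterministic
        # (assumes the elements are mutually comparable).
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Toy input: three overlapping sets of integer labels.
hits = greedy_hitting_set([{1, 2}, {2, 3}, {3, 4}])
```

The greedy choice does not guarantee a minimum hitting set, but it is simple and is a common baseline when an exact solution is too costly.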

