INFERRING PHYLOGENETIC RELATIONSHIPS AVOIDING FORBIDDEN ROOTED TRIPLETS

2006 ◽  
Vol 04 (01) ◽  
pp. 59-74 ◽  
Author(s):  
YING-JUN HE ◽  
TRINH N. D. HUYNH ◽  
JESPER JANSSON ◽  
WING-KIN SUNG

Constructing a phylogenetic tree or phylogenetic network to describe the evolutionary history of a set of species is a well-studied problem in computational biology. One previously proposed method to infer a phylogenetic tree/network for a large set of species is to merge a collection of known smaller phylogenetic trees on overlapping sets of species so that no (or as little as possible) branching information is lost. However, little work has been done so far on inferring a phylogenetic tree/network from a specified set of trees when, in addition, certain evolutionary relationships among the species are known to be highly unlikely. In this paper, we consider the problem of constructing a phylogenetic tree/network which is consistent with all of the rooted triplets in a given set [Formula: see text] and none of the rooted triplets in another given set [Formula: see text]. Although the problem is NP-hard in the general case, we provide some efficient exact and approximation algorithms for a number of biologically meaningful variants of the problem.
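
To make the consistency condition concrete, the following is a minimal sketch of how one might check a candidate rooted tree against a set of required triplets and a set of forbidden triplets. It is not the authors' algorithm: it only verifies a given tree and does not construct one, and it covers trees, not networks. The nested-tuple tree encoding and the names `clusters`, `displays`, and `is_valid` are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

Tree = Union[str, tuple]        # a leaf label, or a tuple of child subtrees
Triplet = Tuple[str, str, str]  # (a, b, c) encodes the rooted triplet ab|c


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set below every node, in post-order (root last)."""
    found: List[FrozenSet[str]] = []

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


def displays(tree: Tree, triplet: Triplet) -> bool:
    """True if the tree is consistent with ab|c.

    That is the case when all three leaves occur in the tree and some node
    has a and b below it but not c.
    """
    a, b, c = triplet
    all_clusters = clusters(tree)
    if not {a, b, c} <= all_clusters[-1]:  # last entry is the root's leaf set
        return False
    return any(a in s and b in s and c not in s for s in all_clusters)


def is_valid(tree: Tree, required: Iterable[Triplet],
             forbidden: Iterable[Triplet]) -> bool:
    """True if the tree displays every required and no forbidden triplet."""
    return (all(displays(tree, t) for t in required)
            and not any(displays(tree, t) for t in forbidden))


example = ((("a", "b"), "c"), "d")
print(is_valid(example, [("a", "b", "c")], [("a", "c", "b")]))
```

The check recomputes the clusters for every triplet, which is fine for small examples; a real implementation would precompute them or use lowest-common-ancestor queries.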

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Rosanne Wallin ◽  
Leo van Iersel ◽  
Steven Kelk ◽  
Leen Stougie

Abstract
Background: Rooted phylogenetic networks are used to display complex evolutionary history involving so-called reticulation events, such as genetic recombination. Various methods have been developed to construct such networks, using for example a multiple sequence alignment or multiple phylogenetic trees as input data. Coronaviruses are known to recombine frequently, but rooted phylogenetic networks have not yet been used extensively to describe their evolutionary history. Here, we created a workflow to compare the evolutionary history of SARS-CoV-2 with other SARS-like viruses using several rooted phylogenetic network inference algorithms. This workflow includes filtering noise from sets of phylogenetic trees by contracting edges based on branch length and bootstrap support, followed by resolution of multifurcations. We explored the running times of the network inference algorithms, the impact of filtering on the properties of the produced networks, and attempted to derive biological insights regarding the evolution of SARS-CoV-2 from them.
Results: The network inference algorithms are capable of constructing rooted phylogenetic networks for coronavirus data, although running-time limitations require restricting such datasets to a relatively small number of taxa. Filtering generally reduces the number of reticulations in the produced networks and increases their temporal consistency. Taxon bat-SL-CoVZC45 emerges as a major and structural source of discordance in the dataset. The tested algorithms often indicate that SARS-CoV-2/RaTG13 is a tree-like clade, with possibly some reticulate activity further back in their history. A smaller number of constructed networks posit SARS-CoV-2 as a possible recombinant, although this might be a methodological artefact arising from the interaction of bat-SL-CoVZC45 discordance and the optimization criteria used.
Conclusion: Our results demonstrate that as part of a wider workflow and with careful attention paid to running time, rooted phylogenetic network algorithms are capable of producing plausible networks from coronavirus data. These networks partly corroborate existing theories about SARS-CoV-2, and partly produce new avenues for exploration regarding the location and significance of reticulate activity within the wider group of SARS-like viruses. Our workflow may serve as a model for pipelines in which phylogenetic network algorithms can be used to analyse different datasets and test different hypotheses.
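
The filtering step described in the abstract (contracting edges based on branch length and bootstrap support) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' workflow: the `Node` class, the `contract` function, and the threshold values are hypothetical, and the sketch simply discards the length of a contracted edge without redistributing it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None   # leaf label; None for internal nodes
    length: float = 0.0          # length of the edge to the parent
    support: float = 100.0       # bootstrap support of the edge to the parent
    children: List["Node"] = field(default_factory=list)


def contract(node: Node, min_length: float, min_support: float) -> Node:
    """Return a copy of the tree with weak internal edges contracted.

    An internal edge is treated as weak if it is shorter than min_length or
    has bootstrap support below min_support. Contracting it attaches the
    child's children directly to the parent, creating a multifurcation.
    """
    new_children: List[Node] = []
    for child in node.children:
        child = contract(child, min_length, min_support)
        is_internal = bool(child.children)
        weak = is_internal and (child.length < min_length
                                or child.support < min_support)
        if weak:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.length, node.support, new_children)


# Hypothetical input: a very short internal edge above leaves a and b.
root = Node(children=[
    Node(length=0.0001, support=95.0,
         children=[Node("a", 0.1), Node("b", 0.2)]),
    Node("c", 0.3),
])
filtered = contract(root, min_length=0.001, min_support=70.0)
print([child.name for child in filtered.children])
```

Leaf edges are never contracted here, so the taxon set is unchanged; the resulting multifurcations would still need to be resolved in a later step, as the abstract notes.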


2019 ◽  
pp. 214-249
Author(s):  
Glenn-Peter Sætre ◽  
Mark Ravinet

How can genetics and genomics be used to understand the evolutionary history of organisms? This chapter focuses on such methods. First, the field of phylogenetics is introduced, as a way to visualize and quantify the evolutionary relationships among species. The chapter outlines how we go from aligning DNA sequence data to building gene trees and we argue that “tree-thinking” is fundamentally important for understanding evolution. The chapter also goes beyond phylogenetic trees to focus on phylogeography, i.e. the understanding of evolutionary relationships in a spatial context. More recently, the explosion of genomic data from ancient and modern human populations has made this an extremely exciting field which is transforming our understanding of our own evolutionary history. Before that, though, the chapter reviews how modern phylogenetics has arisen from historical efforts to classify life on Earth.


Author(s):  
Remie Janssen ◽  
Pengyu Liu

Phylogenetic networks represent evolutionary history of species and can record natural reticulate evolutionary processes such as horizontal gene transfer and gene recombination. This makes phylogenetic networks a more comprehensive representation of evolutionary history compared to phylogenetic trees. Stochastic processes for generating random trees or networks are important tools in evolutionary analysis, especially in phylogeny reconstruction where they can be utilized for validation or serve as priors for Bayesian methods. However, as more network generators are developed, there is a lack of discussion or comparison for different generators. To bridge this gap, we compare a set of phylogenetic network generators by profiling topological summary statistics of the generated networks over the number of reticulations and comparing the topological profiles.


2019 ◽  
Vol 8 (1) ◽  
pp. 32
Author(s):  
Manuel Villalobos-Cid ◽  
Francisco Salinas ◽  
Eduardo I. Kessi-Pérez ◽  
Matteo De Chiara ◽  
Gianni Liti ◽  
...  

Massive sequencing projects executed in Saccharomyces cerevisiae have revealed in detail its population structure. The recent “1002 yeast genomes project” has become the most complete catalogue of yeast genetic diversity and a powerful resource to analyse the evolutionary history of genes affecting specific phenotypes. In this work, we selected 22 nitrogen-associated genes and analysed the sequence information from the 1011 strains of the “1002 yeast genomes project”. We constructed a total evidence (TE) phylogenetic tree using concatenated information, which showed a 27% topology similarity with the reference (REF) tree of the “1002 yeast genomes project”. We also generated individual phylogenetic trees for each gene and compared their topologies, identifying genes with similar topologies (suggesting a shared evolutionary history). Furthermore, we pruned the constructed phylogenetic trees to compare the REF tree topology versus the TE tree and the individual gene trees, considering each phylogenetic cluster/subcluster within the population, and observed genes with cluster/subcluster topologies of high similarity to the REF tree. Finally, we used the pruned versions of the phylogenetic trees to compare four strains considered as representatives of S. cerevisiae clean lineages, observing that, for 15 genes, the cluster topologies match the REF tree 100%, which supports the view that these strains represent the main lineages of the yeast population. Altogether, our results showed the potential of tree topology comparison for exploring the evolutionary history of a specific group of genes.


2019 ◽  
Author(s):  
Nadia Tahiri

Each gene has its own evolutionary history which can substantially differ from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or hybridization events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree, or Tree of Life, that represents the main patterns of vertical descent. Here, we present a new efficient method for inferring single or multiple consensus trees and supertrees for a given set of phylogenetic trees (i.e. additive trees or X-trees). The output of the traditional tree consensus methods is a unique consensus tree or supertree. Here, we show how Machine Learning (ML) models, based on some interesting properties of the Robinson and Foulds topological distance, can be used to partition a given set of trees into one (when the data are homogeneous) or multiple (when the data are heterogeneous) cluster(s) of trees. We adapt the popular Accuracy, Precision, Sensitivity, and F1 scores to tree clustering. Special attention is paid to the relevant, but very challenging, problem of inferring alternative supertrees that are built from phylogenies defined on different, but mutually overlapping, sets of species. The use of an approximate objective function in clustering makes the new method faster than the existing tree clustering techniques and thus suitable for the analysis of large genomic datasets.
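
As a reminder of what the Robinson and Foulds distance measures, here is a minimal sketch of its rooted, cluster-based variant for two trees on the same leaf set. This is an assumption-laden illustration: the classical definition uses bipartitions of unrooted trees, the abstract does not say which variant the method uses, and the nested-tuple encoding and the names `cluster_set` and `rf_distance` are hypothetical.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of child subtrees


def cluster_set(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def rf_distance(first: Tree, second: Tree) -> int:
    """Count the clusters that occur in exactly one of the two trees."""
    return len(cluster_set(first) ^ cluster_set(second))


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(rf_distance(caterpillar, balanced))
```

A matrix of such pairwise distances over a set of gene trees is the kind of input a clustering step could work from; the specific properties of the distance exploited by the method are not spelled out in the abstract.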


2021 ◽  
Author(s):  
Caitlin Cherryh ◽  
Bui Quang Minh ◽  
Rob Lanfear

Abstract Most phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness
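
Read literally, the tree proportion is a ratio of edge weights. The sketch below illustrates that ratio under the assumption that the network is given as weighted splits (bipartitions of the taxa) and the tree as a set of splits; the abstract does not state how either is obtained, and the names `normalise` and `tree_proportion` are hypothetical rather than taken from the authors' scripts.

```python
from typing import Dict, FrozenSet, Iterable, Set

Split = FrozenSet[str]  # one canonical side of a bipartition of the taxa


def normalise(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Pick a canonical side of a split: the one without the smallest taxon."""
    all_taxa = frozenset(taxa)
    one_side = frozenset(side)
    anchor = min(all_taxa)
    return all_taxa - one_side if anchor in one_side else one_side


def tree_proportion(network_weights: Dict[Split, float],
                    tree_splits: Set[Split]) -> float:
    """Share of the network's total split weight carried by splits in the tree."""
    total = sum(network_weights.values())
    if total == 0:
        raise ValueError("network has no edge weight")
    kept = sum(weight for split, weight in network_weights.items()
               if split in tree_splits)
    return kept / total


taxa = "abcd"
# Hypothetical network with two conflicting splits: ab|cd and ac|bd.
network = {normalise("ab", taxa): 3.0, normalise("ac", taxa): 1.0}
tree = {normalise("cd", taxa)}  # the tree contains only ab|cd
print(tree_proportion(network, tree))
```

A value of 1.0 would mean every weighted split in the network is also in the tree, which is the fully treelike case; the parametric bootstrap mentioned in the abstract is not covered by this sketch.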


2019 ◽  
Vol 11 (9) ◽  
pp. 2531-2541 ◽  
Author(s):  
Valeria Mateo-Estrada ◽  
Lucía Graña-Miraglia ◽  
Gamaliel López-Leal ◽  
Santiago Castillo-Ramírez

Abstract The Gram-negative Acinetobacter genus has several species of clear medical relevance. Many fully sequenced genomes belonging to the genus have been published in recent years; however, there has not been a recent attempt to infer the evolutionary history of Acinetobacter with that vast amount of information. Here, through a phylogenomic approach, we established the most up-to-date view of the evolutionary relationships within this genus and highlighted several cases of poor classification, especially for the very closely related species within the Acinetobacter calcoaceticus–Acinetobacter baumannii complex (Acb complex). Furthermore, we determined appropriate phylogenetic markers for this genus and showed that concatenation of the top 13 gives a very decent reflection of the evolutionary relationships for the genus Acinetobacter. The intersection between our top markers and previously defined universal markers is very small. In general, our study shows that, although there seem to be hardly any universal markers, bespoke phylogenomic approaches can be used to infer the phylogeny of different bacterial genera. We expect that ad hoc phylogenomic approaches will be the standard in the years to come and will provide enough information to resolve intricate evolutionary relationships like those observed in the Acb complex.


2006 ◽  
Vol 12 (2) ◽  
pp. 243-257 ◽  
Author(s):  
Ross Clement

The Cichlid Speciation Project (CSP) is an ALife simulation system for investigating open problems in the speciation of African cichlid fish. The CSP can be used to perform a wide range of experiments that show that speciation is a natural consequence of certain biological systems. A visualization system capable of extracting the history of speciation from low-level trace data and creating a phylogenetic tree has been implemented. Unlike previous approaches, this visualization system presents a concrete trace of speciation, rather than a summary of low-level information from which the viewer can make subjective decisions on how speciation progressed. The phylogenetic trees are a more objective visualization of speciation, and enable automated collection and summarization of the results of experiments. The visualization system is used to create a phylogenetic tree from an experiment that models sympatric speciation.


2011 ◽  
Vol 09 (06) ◽  
pp. 729-747 ◽  
Author(s):  
MD. SHAIK SADI ◽  
FEI-CHING KUO ◽  
JOSHUA W. K. HO ◽  
MICHAEL A. CHARLESTON ◽  
T. Y. CHEN

Many phylogenetic inference programs are available to infer evolutionary relationships among taxa using aligned sequences of characters, typically DNA or amino acids. These programs are often used to infer the evolutionary history of species. However, in most cases it is impossible to systematically verify the correctness of the tree returned by these programs, as the correct evolutionary history is generally unknown and unknowable. In addition, it is nearly impossible to verify whether any non-trivial tree is correct in accordance with the specification of the often complicated search and scoring algorithms. This difficulty is known as the oracle problem of software testing: there is no oracle that we can use to verify the correctness of the returned tree. This makes it very challenging to test the correctness of any phylogenetic inference program. Here, we demonstrate how to apply a simple software testing technique, called Metamorphic Testing, to alleviate the oracle problem in testing phylogenetic inference programs. We have used both real and randomly generated test inputs to evaluate the effectiveness of metamorphic testing, and found that metamorphic testing can detect failures effectively in faulty phylogenetic inference programs with both types of test inputs.
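
The idea behind metamorphic testing can be sketched with one plausible relation: reordering the input sequences should not change the inferred result. The abstract does not list the relations the authors used, so this relation, the toy "inference programs" (which return only the closest pair of taxa as a stand-in for a tree), and all names below are assumptions for illustration.

```python
import itertools
from typing import Callable, Dict, FrozenSet

Alignment = Dict[str, str]                       # taxon name -> aligned sequence
Program = Callable[[Alignment], FrozenSet[str]]  # toy stand-in for tree inference


def hamming(first: str, second: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    return sum(p != q for p, q in zip(first, second))


def closest_pair(alignment: Alignment) -> FrozenSet[str]:
    """Toy program: return the two taxa with the smallest Hamming distance."""
    pairs = itertools.combinations(sorted(alignment), 2)
    best = min(pairs, key=lambda pair: (
        hamming(alignment[pair[0]], alignment[pair[1]]), pair))
    return frozenset(best)


def faulty_closest_pair(alignment: Alignment) -> FrozenSet[str]:
    """Seeded fault: ignores the distances and groups the first two inputs."""
    return frozenset(list(alignment)[:2])


def reorder_relation_holds(program: Program, alignment: Alignment) -> bool:
    """Metamorphic relation: every input order must give the same output.

    No oracle for the correct answer is needed; only outputs are compared.
    """
    source_output = program(alignment)
    for order in itertools.permutations(alignment):
        follow_up = {name: alignment[name] for name in order}
        if program(follow_up) != source_output:
            return False
    return True


data = {"a": "AAAA", "b": "AAAT", "c": "TTTT"}
print(reorder_relation_holds(closest_pair, data))
print(reorder_relation_holds(faulty_closest_pair, data))
```

Trying every permutation is only feasible for tiny inputs; with realistic alignments one would sample a few random reorderings instead.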


2009 ◽  
Vol 75 (16) ◽  
pp. 5410-5416 ◽  
Author(s):  
Gabriele Margos ◽  
Stephanie A. Vollmer ◽  
Muriel Cornet ◽  
Martine Garnier ◽  
Volker Fingerle ◽  
...  

ABSTRACT Analysis of Lyme borreliosis (LB) spirochetes, using a novel multilocus sequence analysis scheme, revealed that OspA serotype 4 strains (a rodent-associated ecotype) of Borrelia garinii were sufficiently genetically distinct from bird-associated B. garinii strains to deserve species status. We suggest that OspA serotype 4 strains be raised to species status and named Borrelia bavariensis sp. nov. The rooted phylogenetic trees provide novel insights into the evolutionary history of LB spirochetes.

