Probabilities of tree topologies with temporal constraints and diversification shifts

2018 ◽  
Author(s):  
Gilles Didier

Abstract Dating the tree of life is a task far more complicated than only determining the evolutionary relationships between species. It is therefore of interest to develop approaches able to deal with undated phylogenetic trees. The main result of this work is a method to compute probabilities of undated phylogenetic trees under piecewise-constant birth-death-sampling models by constraining some of the divergence times to belong to given time intervals and by allowing diversification shifts on certain clades. The computation is quite fast since its time complexity is quadratic in the size of the tree topology and linear in the number of time constraints and of “pieces” in the model. The interest of this computation method is illustrated with three applications, namely, to compute the exact distribution of the divergence times of a tree topology with temporal constraints, to directly sample the divergence times of a tree topology, and to test for a diversification shift at a given clade.

2018 ◽  
Author(s):  
Gilles Didier ◽  
Michel Laurin

Abstract Being given a phylogenetic tree of both extant and extinct taxa in which the fossil ages are the only temporal information (namely, in which divergence times are considered unknown), we provide a method to compute the exact probability distribution of any divergence time of the tree with regard to any speciation (cladogenesis), extinction and fossilization rates under the Fossilized-Birth-Death model. We use this new method to obtain a probability distribution for the age of Amniota (the synapsid/sauropsid or bird/mammal divergence), one of the most-frequently used dating constraints. Our results suggest an older age (between about 322 and 340 Ma) than has been assumed by most studies that have used this constraint (which typically assumed a best estimate around 310-315 Ma) and provide, for the first time, a method to compute the shape of the probability density for this divergence time.


2020 ◽  
Vol 69 (6) ◽  
pp. 1068-1087 ◽  
Author(s):  
Gilles Didier ◽  
Michel Laurin

Abstract Being given a phylogenetic tree of both extant and extinct taxa in which the fossil ages are the only temporal information (namely, in which divergence times are considered unknown), we provide a method to compute the exact probability distribution of any divergence time of the tree with regard to any speciation (cladogenesis), extinction, and fossilization rates under the Fossilized Birth–Death model. We use this new method to obtain a probability distribution for the age of Amniota (the synapsid/sauropsid or bird/mammal divergence), one of the most-frequently used dating constraints. Our results suggest an older age (between about 322 and 340 Ma) than has been assumed by most studies that have used this constraint (which typically assumed a best estimate around 310–315 Ma) and provide, for the first time, a method to compute the shape of the probability density for this divergence time. [Divergence times; fossil ages; fossilized birth–death model; probability distribution.]


2021 ◽  
Author(s):  
Aintzane Santaquiteria ◽  
Alexandre C Siqueira ◽  
Emanuell Duarte-Ribeiro ◽  
Giorgio Carnevale ◽  
William White ◽  
...  

Abstract The charismatic trumpetfishes, goatfishes, dragonets, flying gurnards, seahorses, and pipefishes encompass a recently defined yet extraordinarily diverse clade of percomorph fishes—the series Syngnatharia. This group is widely distributed in tropical and warm-temperate regions, with a great proportion of its extant diversity occurring in the Indo-Pacific. Because most syngnatharians feature long-range dispersal capabilities, tracing their biogeographic origins is challenging. Here, we applied an integrative phylogenomic approach to elucidate the evolutionary biogeography of syngnatharians. We built upon a recently published phylogenomic study that examined ultraconserved elements by adding 62 species (total 169 species) and one family (Draconettidae), to cover ca. 25% of the species diversity and all 10 families in the group. We inferred a set of time-calibrated trees and conducted ancestral range estimations. We also examined the sensitivity of these analyses to phylogenetic uncertainty (estimated from multiple genomic subsets), area delimitation, and biogeographic models that include or exclude the jump-dispersal parameter (j). Of the three factors examined, we found that the j parameter has the strongest effect in ancestral range estimates, followed by number of areas defined, and tree topology and divergence times. After accounting for these uncertainties, our results reveal that syngnatharians originated in the ancient Tethys Sea ca. 87 Ma (84–94 Ma; Late Cretaceous) and subsequently occupied the Indo-Pacific. Throughout syngnatharian history, multiple independent lineages colonized the eastern Pacific (6–8 times) and the Atlantic (6–14 times) from their center of origin, with most events taking place following an east-to-west route prior to the closure of the Tethys Seaway ca. 12–18 Ma. Ultimately, our study highlights the importance of accounting for different factors generating uncertainty in macroevolutionary and biogeographic inferences.


2007 ◽  
Vol 70 (2) ◽  
pp. 635-640
Author(s):  
Saralees Nadarajah ◽  
Samuel Kotz

2016 ◽  
Author(s):  
Arshan Nasir ◽  
Kyung Mo Kim ◽  
Gustavo Caetano-Anollés

In a recent eLetter and associated preprint, Harish, Abroi, Gough and Kurland criticized our structural phylogenomic methods, which support the early cellular origin of viruses. Their claims include the argument that the rooting of our trees is artifactual and distorted by small genome (proteome) size. Here we uncover their aprioristic reasoning, which mingles with misunderstandings and misinterpretations of cladistic methodology. To demonstrate, we labeled the phylogenetic positions of the smallest proteomes in our phylogenetic trees and confirm that the smallest genomes were neither attracted towards the root nor caused any distortions in the four-supergroup tree of life. Their results therefore stem from confusing outgroups with ancestors and handpicking problematic taxa to distort tree reconstruction. In doing so, they ignored the details of our rooting method, taxa sampling rationale, the plethora of evidence given in our study supporting the ancient origin of the viral supergroup and also recent literature on viral evolution. Indeed, our tree of life uncovered many viral monophyletic groups consistent with ICTV classifications and showed remarkable evolutionary tracings of virion morphotypes onto a revealing tree topology.


Author(s):  
Gal Horesh ◽  
Grace Blackwell ◽  
Gerry Tonkin-Hill ◽  
Jukka Corander ◽  
Eva Heinz ◽  
...  

Abstract Escherichia coli is a highly diverse organism which includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10,000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialised pathovars of E. coli. We provide these data in a number of easily accessible formats which can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasises our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.
Author Notes
All supporting data have been provided within the article or through supplementary data files. All supporting code is provided in the git repository https://github.com/ghoresh11/ecoli_genome_collection.
Significance as a BioResource to the community
As of today, there are more than 140,000 E. coli genomes available on public databases. While data is widely available, collating the data and extracting meaningful information from it often requires multiple steps, computational resources and expert knowledge. Here, we collate a high quality and comprehensive set of over 10,000 E. coli genomes, isolated from human hosts, into a set of manageable files that offer an accessible and usable snapshot of the currently available genome data, linked to a minimal data quality standard. The data provided includes a detailed synopsis of the main lineages present, including their antimicrobial and virulence profiles, their complete gene content, and all the associated metadata for each genome. This includes a database which enables the user to compare newly sequenced isolates against the assembled genomes. Additionally, we provide a searchable index which allows the user to query any DNA sequence against the assemblies of the collection. This collection paves the path for many future studies, including those investigating the differences between E. coli lineages, following the evolution of different genes in the E. coli pan-genome and exploring the dynamics of horizontal gene transfer in this important organism.
Data Summary
The complete aggregated metadata of 10,146 high quality genomes isolated from human hosts (doi.org/10.6084/m9.figshare.12514883, File F1).
A PopPUNK database which can be used to query any genome and examine its context relative to this collection (Deposited to doi.org/10.6084/m9.figshare.12650834).
A BIGSI index of all the genomes which can be used to easily and quickly query the genomes for any DNA sequence of 61 bp or longer (Deposited to doi.org/10.6084/m9.figshare.12666497).
Description and complete profiling of the 50 largest lineages which represent the majority of publicly available human-isolated E. coli genomes (doi.org/10.6084/m9.figshare.12514883, File F2).
Phylogenetic trees of representative genomes of these lineages, presented in this manuscript, are also provided (doi.org/10.6084/m9.figshare.12514883, Files tree_500.nwk and tree_50.nwk).
The complete pan-genome of the 50 largest lineages which includes:
A FASTA file containing a single representative sequence of each gene of the gene pool (doi.org/10.6084/m9.figshare.12514883, File F3).
Complete gene presence-absence across all isolates (doi.org/10.6084/m9.figshare.12514883, File F4).
The frequency of each gene within each of the lineages (doi.org/10.6084/m9.figshare.12514883, File F5).
The representative sequences from each lineage for all the genes (doi.org/10.6084/m9.figshare.12514883, File F6).


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6271 ◽  
Author(s):  
Gabriel A. Vieira ◽  
Francisco Prosdocimi

The advent of Next Generation Sequencing has reduced sequencing costs and increased genomic projects from a huge amount of organismal taxa, generating an unprecedented amount of genomic datasets publicly available. Often, only a tiny fraction of outstanding relevance of the genomic data produced by researchers is used in their works. This fact allows the data generated to be recycled in further projects worldwide. The assembly of complete mitogenomes is frequently overlooked though it is useful to understand evolutionary relationships among taxa, especially those presenting poor mtDNA sampling at the level of genera and families. This is exactly the case for ants (Hymenoptera:Formicidae) and more specifically for the subfamily Pseudomyrmecinae, a group of arboreal ants with several cases of convergent coevolution without any complete mitochondrial sequence available. In this work, we assembled, annotated and performed comparative genomics analyses of 14 new complete mitochondria from Pseudomyrmecinae species relying solely on public datasets available from the Sequence Read Archive (SRA). We used all complete mitogenomes available for ants to study the gene order conservation and also to generate two phylogenetic trees using both (i) concatenated set of 13 mitochondrial genes and (ii) the whole mitochondrial sequences. Even though the tree topologies diverged subtly from each other (and from previous studies), our results confirm several known relationships and generate new evidences for sister clade classification inside Pseudomyrmecinae clade. We also performed a synteny analysis for Formicidae and identified possible sites in which nucleotidic insertions happened in mitogenomes of pseudomyrmecine ants. Using a data mining/bioinformatics approach, the current work increased the number of complete mitochondrial genomes available for ants from 15 to 29, demonstrating the unique potential of public databases for mitogenomics studies. 
The wide applications of mitogenomes in research and presence of mitochondrial data in different public dataset types makes the “no budget mitogenomics” approach ideal for comprehensive molecular studies, especially for subsampled taxa.


2018 ◽  
Author(s):  
Gabriel A Vieira ◽  
Francisco Prosdocimi

The advent of Next Generation Sequencing has reduced sequencing costs and increased genomic projects from a huge amount of organismal taxa, generating an unprecedented amount of genomic datasets publicly available. Often, only a tiny fraction of outstanding relevance of the genome data produced by researchers is used in their works. This fact allows the data generated to be recycled in further projects worldwide. The assembly of complete mitogenomes is frequently overlooked though it is useful to understand evolutionary relationships among taxa, especially those presenting poor mtDNA sampling at the level of genera and families. This is exactly the case for ants (Hymenoptera:Formicidae) and more specifically for the subfamily Pseudomyrmecinae, a group of arboreal ants with several cases of convergent coevolution without any complete mitochondrial sequence available. In this work, we assembled, annotated and performed comparative genomics analyses of 14 new complete mitochondria from Pseudomyrmecinae species relying solely on public datasets available from the Sequence Read Archive (SRA). We used all complete mitogenomes available for ants to study the gene order conservation and also to generate two phylogenetic trees using both (i) concatenated set of 13 mitochondrial genes and (ii) the whole mitochondrial sequences. Even though the tree topologies diverged subtly from each other (and from previous studies), our results confirm several known relationships and generate new evidences for sister clade classification inside Pseudomyrmecinae clade. We also performed a synteny analysis for Formicidae and identified possible sites in which nucleotidic insertions happened in mitogenomes of pseudomyrmecine ants. Using a data mining/bioinformatics approach, the current work increased the number of complete mitochondrial genomes available for ants from 15 to 29, demonstrating the unique potential of public databases for mitogenomics studies. 
The wide applications of mitogenomes in research and presence of mitochondrial data in different public dataset types makes the “no budget mitogenomics” approach ideal for comprehensive molecular studies, especially for subsampled taxa.


2018 ◽  
Author(s):  
Stephen T. Pollard ◽  
Kenji Fukushima ◽  
Zhengyuan O. Wang ◽  
Todd A. Castoe ◽  
David D. Pollock

Abstract Phylogenetic inference requires a means to search phylogenetic tree space. This is usually achieved using progressive algorithms that propose and test small alterations in the current tree topology and branch lengths. Current programs search tree topology space using branch-swapping algorithms, but proposals do not discriminate well between swaps likely to succeed or fail. When applied to datasets with many taxa, the huge number of possible topologies slows these programs dramatically. To overcome this, we developed a statistical approach for proposal generation in Bayesian analysis, and evaluated its applicability for the problem of searching phylogenetic tree space. The general idea of the approach, which we call ‘Markov katana’, is to make proposals based on a heuristic algorithm using bootstrapped subsets of the data. Such proposals induce an unintended sampling distribution that must be determined and removed to generate posterior estimates, but the cost of this extra step can in principle be small compared to the added value of more efficient parameter exploration in Markov chain Monte Carlo analyses. Our prototype application uses the simple neighbor-joining distance heuristic on data subsets to propose new reasonably likely phylogenetic trees (including topologies and branch lengths). The evolutionary model used to generate distances in our prototype was far simpler than the more complex model used to evaluate the likelihood of phylogenies based on the full dataset. This prototype implementation indicates that the Markov katana approach could be easily incorporated into existing phylogenetic search programs and may prove a useful alternative in conjunction with existing methods. The general features of this statistical approach may also prove useful in disciplines other than phylogenetics. We demonstrate that this method can be used to efficiently estimate a Bayesian posterior.
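The remark that a biased proposal distribution "must be determined and removed" can be illustrated with the generic Hastings correction used in Metropolis-Hastings sampling. The sketch below is not the Markov katana method and does not involve trees: it assumes a toy target over three discrete states and a fixed, deliberately skewed independence proposal, and the names `metropolis_hastings`, `target_weights` and `proposal_probs` are hypothetical. It only shows that dividing out the proposal probabilities in the acceptance ratio recovers the target distribution.

```python
import random


def metropolis_hastings(target_weights, proposal_probs, steps, seed=0):
    """Sample a discrete target using a biased independence proposal.

    target_weights: unnormalised target probabilities, one per state.
    proposal_probs: probabilities with which each state is proposed
                    (an assumption of this toy example; all must be > 0).
    Returns the empirical visit frequency of each state.
    """
    rng = random.Random(seed)
    states = list(range(len(target_weights)))
    current = 0
    counts = [0] * len(states)
    for _ in range(steps):
        candidate = rng.choices(states, weights=proposal_probs)[0]
        # Hastings ratio: the target ratio multiplied by the reverse/forward
        # proposal ratio, which cancels the bias introduced by the proposal.
        ratio = (target_weights[candidate] * proposal_probs[current]) / (
            target_weights[current] * proposal_probs[candidate]
        )
        if rng.random() < min(1.0, ratio):
            current = candidate
        counts[current] += 1
    return [c / steps for c in counts]


# Target proportional to 1 : 2 : 7, proposal skewed toward state 0.
frequencies = metropolis_hastings([1, 2, 7], [0.4, 0.3, 0.3], steps=100_000)
print(frequencies)  # close to 0.1, 0.2, 0.7 up to Monte Carlo noise
```

Without the `proposal_probs` factors in `ratio`, the chain would converge to a distribution distorted by the proposal rather than to the target.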


2019 ◽  
Author(s):  
Bowen Shi ◽  
Shan Shi ◽  
Junhua Wu ◽  
Musheng Chen

In this paper, we propose a new stereo matching algorithm to measure the correlation between two rectified image patches. The difficulty near objects' boundaries and in textureless areas is a widely discussed issue in local correlation-based algorithms, and most approaches focus on the cost aggregation step to solve the problem. We analyze the inherent limitations of sum of absolute differences (SAD) and sum of squared differences (SSD), then propose a new difference computation method to restrain the noise near objects' boundaries and enlarge the intensity variations in textureless areas. The proposed algorithm can effectively deal with these problems and generate more accurate disparity maps than SAD and SSD without increasing time complexity. Furthermore, as shown by experiments, the algorithm can also be applied in some SAD-based and SSD-based algorithms to achieve better results than the original.
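For readers unfamiliar with the two baseline costs named in this abstract, the following is a minimal sketch of SAD and SSD and of a one-dimensional disparity search along a rectified scanline. It does not implement the authors' new difference measure, which the abstract does not specify. The function names, the flat-list patch representation and the toy intensity rows are assumptions made for illustration.

```python
def sad(patch_a, patch_b):
    """Sum of absolute differences between two equal-length intensity patches."""
    if len(patch_a) != len(patch_b):
        raise ValueError("patches must have the same size")
    return sum(abs(a - b) for a, b in zip(patch_a, patch_b))


def ssd(patch_a, patch_b):
    """Sum of squared differences between two equal-length intensity patches."""
    if len(patch_a) != len(patch_b):
        raise ValueError("patches must have the same size")
    return sum((a - b) ** 2 for a, b in zip(patch_a, patch_b))


def best_disparity(left_row, right_row, x, half, max_disp, cost):
    """Return the disparity d in [0, max_disp] minimising the matching cost.

    The window of radius `half` centred at column x in the left row is
    compared with the window centred at column x - d in the right row.
    Assumes the left window lies fully inside the row.
    """
    left_patch = left_row[x - half : x + half + 1]
    best_d, best_cost = None, None
    for d in range(max_disp + 1):
        start = x - d - half
        if start < 0:  # the right window would leave the image
            break
        right_patch = right_row[start : start + 2 * half + 1]
        c = cost(left_patch, right_patch)
        if best_cost is None or c < best_cost:
            best_d, best_cost = d, c
    return best_d


# Toy scanlines: the bright feature sits two pixels further left in the right image.
left_row = [0, 0, 5, 9, 5, 0, 0, 0]
right_row = [5, 9, 5, 0, 0, 0, 0, 0]

print(sad([10, 12, 11, 13], [9, 15, 11, 10]))  # 7
print(ssd([10, 12, 11, 13], [9, 15, 11, 10]))  # 19
print(best_disparity(left_row, right_row, x=3, half=1, max_disp=3, cost=sad))  # 2
```

Both costs depend only on raw intensity differences, so in a uniform (textureless) region every candidate disparity yields nearly the same cost, which is the weakness the abstract refers to.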

