Choice of species affects phylogenetic stability of deep nodes: an empirical example in Terrabacteria

Abstract. Motivation: The promise of higher phylogenetic stability through increased dataset sizes within tree of life (TOL) reconstructions has not been fulfilled. Among the many possible causes are changes in species composition (taxon sampling) that could influence the phylogenetic accuracy of the methods by altering the relative weight of the evolutionary histories of each individual species. This effect would be stronger in clades that are represented by few lineages, which is common in many prokaryote phyla. Indeed, phyla with fewer taxa showed the most discordance among recent TOL studies. We implemented an approach to systematically test how the identity of taxa among a larger dataset and the number of taxa included affected the accuracy of phylogenetic reconstruction. Results: Utilizing an empirical dataset within Terrabacteria, we found that even within scenarios consisting of the same number of taxa, the species used strongly affected phylogenetic stability. Furthermore, we found that trees with fewer species were more dissimilar to the tree produced from the full dataset. These results hold even when the tree is composed of many phyla and only one of them is being altered. Thus, the effect of taxon sampling in one group does not seem to be buffered by the presence of many other clades, making this issue relevant even to very large datasets. Our results suggest that a systematic evaluation of phylogenetic stability through taxon resampling is advisable even for very large datasets. Availability and implementation: https://github.com/BlabOaklandU/PATS.git. Supplementary information: Supplementary data are available at Bioinformatics online.

Using taxon resampling to identify species with contrasting phylogenetic signals: an empirical example in Terrabacteria

10.1101/369264 ◽

2018 ◽

Author(s):

Ashley A. Superson ◽

Doug Phelan ◽

Allyson Dekovich ◽

Fabia U. Battistuzzi

Keyword(s):

Phylogenetic Reconstruction ◽

Relative Weight ◽

Large Datasets ◽

Supplementary Information ◽

Systematic Evaluation ◽

Individual Species ◽

Taxon Sampling ◽

Full Dataset ◽

Very Large Datasets ◽

Dataset Size

Abstract. Motivation: The promise of higher phylogenetic stability through increasing dataset size within Tree of Life (TOL) reconstructions has not been fulfilled, especially for deep nodes. Among the many causes proposed are changes in species composition (taxon sampling) that could influence the phylogenetic accuracy of the methods by altering the relative weight of the evolutionary histories of each individual species. This effect would be stronger in clades that are represented by few lineages, which is common in many prokaryote phyla. Indeed, phyla with fewer taxa showed the most discordance among recent TOL studies. Thus, we implemented an approach to systematically test how the number of taxa and the identity of those taxa among a larger dataset affected the accuracy of phylogenetic reconstruction. Results: We utilized an empirical dataset of 766 fully sequenced proteomes for phyla within Terrabacteria as a reference for subsampled datasets that differed in both number of species and composition of species. After evaluating the backbone of the trees produced as well as the internal nodes, we found that trees with fewer species were more dissimilar to the tree produced from the full dataset. Further, we found that even within scenarios consisting of the same number of taxa, the species used strongly affected phylogenetic stability. These results hold even when the tree is composed of many phyla and only one of them is being altered. Thus, the effect of taxon sampling in one group does not seem to be buffered by the presence of many other clades, making this issue relevant even to very large datasets. Our results suggest that a systematic evaluation of phylogenetic stability through taxon resampling is advisable even for very large datasets. [email protected] Supplementary information: Supplementary text and figures are available on the journal's website.
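
The resampling design described in this abstract (subsets that differ in the number and identity of species within one phylum, each compared against the full-dataset tree) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the nested-tuple encoding of rooted trees, and the clade-difference count used as a dissimilarity measure are all assumptions made for the example, since the abstract does not name the metric that was used.

```python
import random


def resample_one_phylum(taxa_by_phylum, target, sizes, replicates, seed=0):
    """Build taxon sets in which only the `target` phylum is subsampled.

    Returns {size: [taxon list, ...]}; every other phylum is kept whole.
    (Hypothetical helper, not part of any published pipeline.)
    """
    rng = random.Random(seed)  # fixed seed so the draws are repeatable
    fixed = [sp for ph, spp in sorted(taxa_by_phylum.items())
             if ph != target for sp in spp]
    pool = sorted(taxa_by_phylum[target])
    out = {}
    for k in sizes:
        if not 0 < k <= len(pool):
            raise ValueError(f"cannot draw {k} taxa from {len(pool)}")
        # Same size k, different species identity in each replicate.
        out[k] = [sorted(fixed + rng.sample(pool, k)) for _ in range(replicates)]
    return out


def leaf_names(tree):
    """Leaf labels of a rooted tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_names(child) for child in tree))


def clades(tree, keep):
    """Non-trivial clades of `tree` after restricting it to the taxa in `keep`."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]) & keep
        leaves = frozenset().union(*(walk(child) for child in node))
        # Skip single leaves, emptied clades, and the root clade.
        if 1 < len(leaves) < len(keep):
            found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_distance(full_tree, sub_tree):
    """Count clades present in one tree but not the other (shared taxa only).

    Assumed stand-in for "dissimilarity to the full-dataset tree".
    """
    shared = frozenset(leaf_names(full_tree) & leaf_names(sub_tree))
    return len(clades(full_tree, shared) ^ clades(sub_tree, shared))
```

In such a setup, each taxon list returned by `resample_one_phylum` would be passed to whatever tree-building method is in use, and the resulting tree compared to the full-dataset tree with `clade_distance`. The spread of distances across replicates of the same size then reflects the effect of species identity alone.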

ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization

Bioinformatics ◽

10.1093/bioinformatics/btz211 ◽

2019 ◽

Vol 35 (20) ◽

pp. 3961-3969 ◽

Cited By ~ 9

Author(s):

John Yin ◽

Chao Zhang ◽

Siavash Mirarab

Keyword(s):

Relative Efficiency ◽

Graphics Processing Units ◽

Large Datasets ◽

Supplementary Information ◽

Gene Trees ◽

Multiple Cores ◽

Dynamic Programing ◽

Very Large Datasets ◽

Speed Up ◽

Graphics Processing

Abstract. Motivation: Evolutionary histories can change from one part of the genome to another. The potential for discordance between the gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is quickly growing. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind the data growth trends and is not able to analyze the largest available datasets in a reasonable time. Results: ASTRAL uses dynamic programing and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism and also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can achieve speedups of up to 158× compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in <2 days. Availability and implementation: ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP. Supplementary information: Supplementary data are available at Bioinformatics online.

Unsupervised dimensionality reduction for very large datasets: Are we going to the right direction?

Knowledge-Based Systems ◽

10.1016/j.knosys.2020.105777 ◽

2020 ◽

Vol 196 ◽

pp. 105777

Author(s):

Jadson Jose Monteiro Oliveira ◽

Robson Leonardo Ferreira Cordeiro

Keyword(s):

Dimensionality Reduction ◽

Large Datasets ◽

Very Large Datasets ◽

The Right

Pairwise likelihood inference for spatial regressions estimated on very large datasets

Spatial Statistics ◽

10.1016/j.spasta.2013.10.001 ◽

2014 ◽

Vol 7 ◽

pp. 21-39 ◽

Cited By ~ 10

Author(s):

Giuseppe Arbia

Keyword(s):

Large Datasets ◽

Likelihood Inference ◽

Pairwise Likelihood ◽

Very Large Datasets

Scalable computation of streamlines on very large datasets

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09 ◽

10.1145/1654059.1654076 ◽

2009 ◽

Cited By ~ 40

Author(s):

Dave Pugmire ◽

Hank Childs ◽

Christoph Garth ◽

Sean Ahern ◽

Gunther H. Weber

Keyword(s):

Large Datasets ◽

Very Large Datasets ◽

Scalable Computation

Distributed processing of very large datasets with DataCutter

Parallel Computing ◽

10.1016/s0167-8191(01)00099-0 ◽

2001 ◽

Vol 27 (11) ◽

pp. 1457-1478 ◽

Cited By ~ 118

Author(s):

Michael D Beynon ◽

Tahsin Kurc ◽

Umit Catalyurek ◽

Chialin Chang ◽

Alan Sussman ◽

...

Keyword(s):

Distributed Processing ◽

Large Datasets ◽

Very Large Datasets

Multidimensional Scaling With Very Large Datasets

Journal of Computational and Graphical Statistics ◽

10.1080/10618600.2018.1470001 ◽

2018 ◽

Vol 27 (4) ◽

pp. 935-939 ◽

Cited By ~ 2

Author(s):

Emmanuel Paradis

Keyword(s):

Multidimensional Scaling ◽

Large Datasets ◽

Very Large Datasets

Automated single particle detection and tracking for large microscopy datasets

Royal Society Open Science ◽

10.1098/rsos.160225 ◽

2016 ◽

Vol 3 (5) ◽

pp. 160225 ◽

Cited By ~ 11

Author(s):

Rhodri S. Wilson ◽

Lei Yang ◽

Alison Dun ◽

Annya M. Smyth ◽

Rory R. Duncan ◽

...

Keyword(s):

Single Molecule ◽

Single Particle ◽

Image Data ◽

Ground Truth ◽

Detection Algorithm ◽

Large Datasets ◽

Single Particle Tracking ◽

Synthetic Image ◽

Particle Detection ◽

Very Large Datasets

Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets is now essential to understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells with very high temporal rates. Our analysis of the dynamics of very large cohorts of tens of thousands of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.

The SAR Model for Very Large Datasets: A Reduced Rank Approach

Econometrics ◽

10.3390/econometrics3020317 ◽

2015 ◽

Vol 3 (2) ◽

pp. 317-338 ◽

Cited By ~ 12

Author(s):

Sandy Burden ◽

Noel Cressie ◽

David Steel

Keyword(s):

Large Datasets ◽

Reduced Rank ◽

Very Large Datasets ◽

Sar Model

Ammonites phylogenetic analysis: state of the art and new prospects

BSGF - Earth Sciences Bulletin ◽

10.2113/175.5.507 ◽

2004 ◽

Vol 175 (5) ◽

pp. 507-512 ◽

Cited By ~ 10

Author(s):

Isabelle Rouget ◽

Pascal Neige ◽

Jean-Louis Dommergues

Keyword(s):

Phylogenetic Reconstruction ◽

Cladistic Analysis ◽

Relative Weight ◽

Morphological Characters ◽

Quality Of Data ◽

Phylogenetic Studies ◽

Evolutionary Concepts ◽

Phylogenetic Hypotheses ◽

Definition Of ◽

Iterative Evolution

Abstract. Two main types of data are available to resolve phylogenies using fossil data: (1) stratigraphic ordering of taxa, and (2) morphological characters. In most phylogenetic studies dealing with ammonites, authors have given priority to the stratigraphic distribution of taxa. This practice is classically justified by the fact that the ammonite fossil record is frequently outstandingly good. In practice, the level of integration of stratigraphic and morphologic information in a single analysis depends on the confidence that authors have in the quality of data. Besides, many evolutionary concepts, which could differ over time and between authors (e.g. anagenesis, cladogenesis, iterative evolution), are added to these data to help infer phylogenetic relationships. As a result, phylogenetic hypotheses are based on eclectic methods which depend on the relative weight given to stratigraphic and morphologic information as well as on the evolutionary concepts used. The validity of relationships proposed by previous authors is not dealt with in this paper. Instead, our goal is to draw attention to two problems that these eclectic methods may cause: (1) ammonite systematics is poorly formalised and (2) phylogenetic hypotheses as they are classically constructed are not rigorously testable. During the last 10 years, cladistic analysis has been applied to ammonites but is still unpopular among ammonitologists. However, studies have consistently shown that cladistics is not as unsuited a tool for ammonite phylogenetic reconstruction as is widely believed. Moreover, classical works open new questions about ammonite phylogeny and, in particular, help to reappraise our view on the definition of morphological characters and their phylogenetic significance.
