evolutionary placement algorithm
Recently Published Documents

EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences

Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. To achieve this, phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or pplacer, are being increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Here we present EPA-ng, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and pplacer. EPA-ng can be executed on standard shared memory systems as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-ng, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3,748 taxa in just under 7 hours, using 2,048 cores. Our performance assessment shows that EPA-ng outperforms RAxML-EPA and pplacer by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-ng scales well up to 3,520 cores. EPA-ng is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng
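To put the Tara Oceans figures quoted in the abstract into perspective, the following back-of-the-envelope sketch converts them into an average per-core throughput. It assumes a run time of exactly 7 hours, whereas the abstract says "just under", so the result slightly underestimates the real rate. The function name is illustrative and not part of EPA-ng.

```python
def reads_per_core_second(n_reads: int, hours: float, cores: int) -> float:
    """Average number of reads placed per core per second."""
    core_seconds = hours * 3600 * cores
    return n_reads / core_seconds


# Figures from the abstract; 7.0 h is an upper bound on the actual run time.
throughput = reads_per_core_second(1_000_000_000, 7.0, 2048)
print(f"{throughput:.1f} reads per core-second")
```

Under these assumptions the run averages roughly 19 reads per core per second.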

Taxonomy assignment approach determines the efficiency of identification of OTUs in marine nematodes

Royal Society Open Science ◽ 10.1098/rsos.170315 ◽ 2017 ◽ Vol 4 (8) ◽ pp. 170315 ◽ Cited By ~ 16

Author(s): Oleksandr Holovachov ◽ Quiterie Haenel ◽ Sarah J. Bourlat ◽ Ulf Jondelius

Keyword(s): Reference Data ◽ Marine Nematodes ◽ Marine Nematode ◽ Reference Dataset ◽ Operational Taxonomic Units ◽ Reference Databases ◽ Placement Algorithm ◽ Evolutionary Placement Algorithm ◽ Taxonomic Categories ◽ Assignment Approach

Precision and reliability of barcode-based biodiversity assessment can be affected at several steps during acquisition and analysis of data. Identification of operational taxonomic units (OTUs) is one of the crucial steps in the process and can be accomplished using several different approaches, namely, alignment-based, probabilistic, tree-based and phylogeny-based. The number of identified sequences in the reference databases affects the precision of identification. This paper compares the identification of marine nematode OTUs using alignment-based, tree-based and phylogeny-based approaches. Because the nematode reference dataset is limited in its taxonomic scope, OTUs can only be assigned to higher taxonomic categories, families. The phylogeny-based approach using the evolutionary placement algorithm provided the largest number of positively assigned OTUs and was least affected by erroneous sequences and limitations of reference data, compared to alignment-based and tree-based approaches.

Phylogeny-aware Identification and Correction of Taxonomically Mislabeled Sequences

10.1101/042200 ◽ 2016

Author(s): Alexey M. Kozlov ◽ Jiajie Zhang ◽ Pelin Yilmaz ◽ Frank Oliver Glöckner ◽ Alexandros Stamatakis

Keyword(s): Phylogenetic Signal ◽ Simulated Data ◽ High Accuracy ◽ Reference Sequence ◽ Manual Curation ◽ Reference Databases ◽ Placement Algorithm ◽ Evolutionary Placement Algorithm ◽ Taxonomic Annotation

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labour-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (“mislabels”) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity / 91.7% precision) as well as correction (94.9% sensitivity / 89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
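The sensitivity and precision figures quoted in the abstract follow the standard definitions, which can be sketched as below. The counts are hypothetical and chosen only for illustration; they are not taken from the SATIVA evaluation, and the function names are not part of SATIVA.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual mislabels that were flagged (also called recall)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged sequences that really are mislabels."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts: 90 mislabels correctly flagged, 10 correct labels
# wrongly flagged, 5 mislabels missed.
tp, fp, fn = 90, 10, 5
sens = sensitivity(tp, fn)
prec = precision(tp, fp)
print(f"sensitivity = {sens:.3f}, precision = {prec:.3f}")
```

A method can score high on one measure and low on the other, which is why the abstract reports both for identification and for correction.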

Osmunda pulchella sp. nov. from the Jurassic of Sweden - reconciling molecular and fossil evidence in the phylogeny of Osmundaceae

10.1101/005777 ◽ 2014

Author(s): Benjamin Bomfleur ◽ Guido W Grimm ◽ Stephen McLoughlin

Keyword(s): Molecular Data ◽ Molecular Evidence ◽ Data Sets ◽ Diagnostic Features ◽ Systematic Classification ◽ Fossil Species ◽ Network Analyses ◽ Placement Algorithm ◽ Morphological And Molecular Data ◽ Evolutionary Placement Algorithm

The systematic classification of Osmundaceae has long remained controversial. Recent molecular data indicate that Osmunda is paraphyletic, and needs to be separated into Osmundastrum and Osmunda s. str. Here we describe an exquisitely preserved Jurassic Osmunda rhizome (O. pulchella sp. nov.) that combines diagnostic features of Osmundastrum and Osmunda, calling molecular evidence for paraphyly into question. We assembled a new morphological matrix based on rhizome anatomy, and used network analyses to establish phylogenetic relationships between fossil and extant members of modern Osmundaceae. We re-analysed the original molecular data to evaluate root-placement support. Finally, we integrated morphological and molecular data-sets using the evolutionary placement algorithm. Osmunda pulchella and five additional, newly identified Jurassic Osmunda species show anatomical character suites intermediate between Osmundastrum and Osmunda. Molecular evidence for paraphyly is ambiguous: a previously unrecognized signal from spacer sequences favours an alternative root placement that would resolve Osmunda s.l. as monophyletic. Our evolutionary placement analysis identifies fossil species as ancestral members of modern genera and subgenera. Altogether, the seemingly conflicting evidence from morphological, anatomical, molecular, and palaeontological data can be elegantly reconciled under the assumption that Osmunda is indeed monophyletic; the recently proposed root-placement in Osmundaceae, based solely on molecular data, likely results from un- or misinformative out-group signals.
