evolutionary placement algorithm
Recently Published Documents


2018
Author(s):
Pierre Barbera
Alexey M. Kozlov
Lucas Czech
Benoit Morel
Diego Darriba
...  

Abstract: Next Generation Sequencing (NGS) technologies have led to a ubiquity of molecular sequence data. This data avalanche is particularly challenging in metagenetics, which focuses on taxonomic identification of sequences obtained from diverse microbial environments. To achieve this, phylogenetic placement methods determine how these sequences fit into an evolutionary context. Previous implementations of phylogenetic placement algorithms, such as the Evolutionary Placement Algorithm (EPA) included in RAxML, or pplacer, are increasingly used for this purpose. However, due to the steady progress in NGS technologies, the current implementations face substantial scalability limitations. Here we present EPA-ng, a complete reimplementation of the EPA that is substantially faster, offers a distributed memory parallelization, and integrates concepts from both RAxML-EPA and pplacer. EPA-ng can be executed on standard shared memory systems as well as on distributed memory systems (e.g., computing clusters). To demonstrate the scalability of EPA-ng, we placed 1 billion metagenetic reads from the Tara Oceans Project onto a reference tree with 3,748 taxa in just under 7 hours, using 2,048 cores. Our performance assessment shows that EPA-ng outperforms RAxML-EPA and pplacer by up to a factor of 30 in sequential execution mode, while attaining comparable parallel efficiency on shared memory systems. We further show that the distributed memory parallelization of EPA-ng scales well up to 3,520 cores. EPA-ng is available under the AGPLv3 license: https://github.com/Pbdas/epa-ng
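
To put the scalability figure in perspective, the quoted numbers can be turned into an approximate placement rate. The following is a back-of-the-envelope sketch, not a measurement from the paper: it rounds "just under 7 hours" up to exactly 7 hours, so the resulting rates are lower bounds, and the function name `throughput` is only an illustrative choice.

```python
# Figures quoted in the abstract. "Just under 7 hours" is rounded up to 7 h
# here (an assumption), so the computed rates are lower bounds.
READS = 1_000_000_000
HOURS = 7
CORES = 2_048


def throughput(reads: int, hours: float, cores: int) -> tuple[float, float]:
    """Return (reads per second overall, reads per second per core)."""
    seconds = hours * 3600
    total_rate = reads / seconds
    return total_rate, total_rate / cores


total_rate, per_core_rate = throughput(READS, HOURS, CORES)
print(f"overall:  {total_rate:,.0f} reads/s")    # overall:  39,683 reads/s
print(f"per core: {per_core_rate:.1f} reads/s")  # per core: 19.4 reads/s
```

Under these assumptions the run averages at least roughly 40,000 placed reads per second overall, or about 19 reads per second per core.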


2017
Vol 4 (8)
pp. 170315
Author(s):
Oleksandr Holovachov
Quiterie Haenel
Sarah J. Bourlat
Ulf Jondelius

Precision and reliability of barcode-based biodiversity assessment can be affected at several steps during acquisition and analysis of data. Identification of operational taxonomic units (OTUs) is one of the crucial steps in the process and can be accomplished using several different approaches, namely, alignment-based, probabilistic, tree-based and phylogeny-based. The number of identified sequences in the reference databases affects the precision of identification. This paper compares the identification of marine nematode OTUs using alignment-based, tree-based and phylogeny-based approaches. Because the nematode reference dataset is limited in its taxonomic scope, OTUs can only be assigned to higher taxonomic categories (families). The phylogeny-based approach using the evolutionary placement algorithm provided the largest number of positively assigned OTUs and was least affected by erroneous sequences and limitations of reference data, compared to alignment-based and tree-based approaches.


2016
Author(s):
Alexey M. Kozlov
Jiajie Zhang
Pelin Yilmaz
Frank Oliver Glöckner
Alexandros Stamatakis

Abstract: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labour-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (“mislabels”) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity / 91.7% precision) as well as correction (94.9% sensitivity / 89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
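
For readers unfamiliar with the accuracy measures quoted above, the sketch below shows how sensitivity and precision are conventionally computed from counts of true positives, false positives, and false negatives. The counts used here are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the SATIVA evaluation, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual mislabels that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged sequences that are real mislabels: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only (not from the paper):
# 90 mislabels correctly flagged, 10 correct labels wrongly flagged,
# 5 mislabels missed.
tp, fp, fn = 90, 10, 5

print(f"sensitivity: {sensitivity(tp, fn):.3f}")  # sensitivity: 0.947
print(f"precision:   {precision(tp, fp):.3f}")    # precision:   0.900
```

High sensitivity means few mislabels go undetected, while high precision means few correctly labeled sequences are flagged by mistake.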


2014
Author(s):
Benjamin Bomfleur
Guido W Grimm
Stephen McLoughlin

The systematic classification of Osmundaceae has long remained controversial. Recent molecular data indicate that Osmunda is paraphyletic, and needs to be separated into Osmundastrum and Osmunda s. str. Here we describe an exquisitely preserved Jurassic Osmunda rhizome (O. pulchella sp. nov.) that combines diagnostic features of Osmundastrum and Osmunda, calling molecular evidence for paraphyly into question. We assembled a new morphological matrix based on rhizome anatomy, and used network analyses to establish phylogenetic relationships between fossil and extant members of modern Osmundaceae. We re-analysed the original molecular data to evaluate root-placement support. Finally, we integrated morphological and molecular data-sets using the evolutionary placement algorithm. Osmunda pulchella and five additional, newly identified Jurassic Osmunda species show anatomical character suites intermediate between Osmundastrum and Osmunda. Molecular evidence for paraphyly is ambiguous: a previously unrecognized signal from spacer sequences favours an alternative root placement that would resolve Osmunda s.l. as monophyletic. Our evolutionary placement analysis identifies fossil species as ancestral members of modern genera and subgenera. Altogether, the seemingly conflicting evidence from morphological, anatomical, molecular, and palaeontological data can be elegantly reconciled under the assumption that Osmunda is indeed monophyletic; the recently proposed root-placement in Osmundaceae—based solely on molecular data—likely results from un- or misinformative out-group signals.

