reference databases
Recently Published Documents


Total documents: 229 (five years: 122)
H-index: 21 (five years: 7)

2022 ◽  
Vol 8 ◽  
Author(s):  
Katharina Kniesz ◽  
Anna Maria Jażdżewska ◽  
Pedro Martínez Arbizu ◽  
Terue Cristina Kihara

Hydrothermal vent areas have drawn increasing interest since they were discovered in 1977. Because of chemoautotrophic bacteria, they possess high abundances of vent-endemic species as well as many non-vent species around the fields. During the survey conducted by the Bundesanstalt für Geowissenschaften und Rohstoffe (Federal Institute for Geosciences and Natural Resources, BGR) to identify inactive polymetallic sulfide deposits along the Central and Southeast Indian Ridges, the INDEX project studied the scavenging amphipod community at three newly discovered hydrothermal fields. A sample consisting of 463 representatives of Amphipoda (Malacostraca: Crustacea) was collected by means of baited traps in active and inactive vents of three different sites and subsequently studied by both morphological and genetic methods. Molecular methods included the analysis of two mitochondrial genes (cytochrome c oxidase subunit I [COI] and 16S rRNA) and one nuclear gene (18S rRNA). Using six delimitation methods, 22 molecular operational taxonomic units (MOTUs) belonging to 12 genera and 10 families were defined. The existence of potential species complexes was noted for the representatives of the genus Paralicella. The inactive site, where 19 species were found, showed higher species richness than the active one, where only 10 taxa were recorded. Seven genera, Ambasiopsis, Cleonardo, Eurythenes, Parandania, Pseudonesimus, Tectovalopsis, and Valettiopsis, were observed only at inactive sites, whereas Haptocallisoma was collected exclusively at active ones. The species Abyssorchomene distinctus (Birstein and Vinogradov, 1960), Hirondellea brevicaudata Chevreux, 1910, and Hirondellea guyoti Barnard and Ingram, 1990, have been previously reported from vent sites in the Atlantic or Pacific oceans. The present study provides the first report of Eurythenes magellanicus (H. Milne Edwards, 1848) and five other already described species in the Indian Ocean.
The addition of 356 sequences strongly increases the number of amphipod barcodes in reference databases and provides for the first time COI barcodes for Cleonardo neuvillei Chevreux, 1908, Haptocallisoma abyssi (Oldevig, 1959), Hirondellea guyoti, Tectovalopsis fusilus Barnard and Ingram, 1990, and the genera Haptocallisoma, Pseudonesimus, and Valettiopsis.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261548
Author(s):  
Benjamin Voigt ◽  
Oliver Fischer ◽  
Christian Krumnow ◽  
Christian Herta ◽  
Piotr Wojciech Dabrowski

Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient’s sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself, and the diagnosis only takes place at the data analysis stage, where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the databases used, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: how can one determine whether there is truly no evidence of a pathogen in the data, or whether the pathogen’s genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on the sequences of coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.


2021 ◽  
Vol 5 ◽  
Author(s):  
Alexis Canino ◽  
Agnès Bouchez ◽  
Christophe Laplace-Treyture ◽  
Isabelle Domaizon ◽  
Frédéric Rimet

Methods for biomonitoring of freshwater phytoplankton are evolving rapidly with eDNA-based methods, offering great complementarity with microscopy. Metabarcoding approaches have been more commonly used over the last years, with a continuous increase in the amount of data generated. Depending on the researchers and the way they assigned barcodes to species (bioinformatic pipelines and molecular reference databases), the taxonomic assignment obtained for HTS DNA reads might vary. This is also true for traditional taxonomic studies by microscopy, with regular adjustments of the classification and taxonomy. For those reasons (leading to non-homogeneous taxonomies), gap analyses and comparisons between studies become even more challenging, and the curation processes to find potential consensus names are time-consuming. Here, we present a web-based application (Phytool), developed with ShinyApp (RStudio), that aims to make the harmonisation of taxonomy easier and more efficient, using a complete and up-to-date taxonomy reference database for freshwater microalgae. Phytool allows users to homogenise and update freshwater phytoplankton taxonomical names from sequence files and data tables directly uploaded in the application. It also gathers barcodes from curated references in a user-friendly way in which it is possible to search for specific organisms. All the data provided are downloadable, with the possibility to apply filters in order to select only the required taxa and fields (e.g. specific taxonomic ranks). The main goal is to make the connection between microscopy, molecular biology, and taxonomy accessible to a broad range of users through different ready-to-use functions. This study estimates that only 25% of species of freshwater phytoplankton in Phytobs are associated with a barcode. We plead for an increased effort to enrich reference databases by coupling taxonomy and molecular methods. Phytool should make this crucial work more efficient.
The application is available at https://caninuzzo.shinyapps.io/phytool_v1/
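
The kind of name harmonisation described above can be illustrated with a minimal Python sketch. It is not Phytool's implementation: the function names, the normalisation rule, and the synonym table (with placeholder taxon names) are all assumptions made for illustration.

```python
def normalise(name: str) -> str:
    """Collapse whitespace and apply 'Genus species' capitalisation."""
    parts = name.strip().split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def harmonise(names, synonyms):
    """Map each raw name to an accepted name, or to None if it is unknown.

    `synonyms` is a hypothetical table {outdated name: accepted name};
    a real reference database would supply this mapping.
    """
    lookup = {normalise(old): new for old, new in synonyms.items()}
    accepted = set(synonyms.values())
    result = {}
    for raw in names:
        key = normalise(raw)
        if key in lookup:
            result[raw] = lookup[key]   # outdated synonym: replace it
        elif key in accepted:
            result[raw] = key           # already an accepted name
        else:
            result[raw] = None          # no match: needs manual curation
    return result


# Placeholder names, not real taxa.
table = {"Oldgenus alpha": "Newgenus alpha"}
print(harmonise(["  oldgenus  ALPHA", "Newgenus alpha", "Unknown taxon"], table))
```

Names that map to `None` are the ones a curator would still need to resolve by hand.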


Author(s):  
Jörg Rau ◽  
Tobias Eisenberg ◽  
Christine Wind ◽  
Ingrid Huber ◽  
Melanie Pavlovic ◽  
...  

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is widely used to identify microorganisms. Recently, new applications such as identification of the animal species from meat, milk or fish are emerging. Standards for the validation of species identifications are still missing. Now, the § 64 LFGB working group “MALDI-TOF”, established at the Federal Office of Consumer Protection and Food Safety, has compiled a guideline for the validation of species identifications. This guideline is intended for single laboratories as well as for lab networks and shows practical ways for validation of qualitative MALDI-TOF-MS methods. The special opportunities of the technology, in particular the use of extended reference databases and of collections of well-documented individual spectra for validation, have been taken into account in the guideline presented.


2021 ◽  
Vol 17 (11) ◽  
pp. e1009581
Author(s):  
Michael S. Robeson ◽  
Devon R. O’Rourke ◽  
Benjamin D. Kaehler ◽  
Michal Ziemski ◽  
Matthew R. Dillon ◽  
...  

Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.


2021 ◽  
Author(s):  
G. Péter

Zygosaccharomyces species are among the most problematic food spoilage yeasts. The two most infamous species are Zygosaccharomyces bailii and Zygosaccharomyces rouxii, although they may also play a positive role during the production of some fermented foods. DNA sequence-based yeast identification, aided by freely available reference databases of barcoding DNA sequences, has boosted the description rate of novel yeast species in the last two decades. The genus Zygosaccharomyces has been considerably expanded as well. In particular, the number of extremely osmotolerant Zygosaccharomyces species, related to Z. rouxii and regularly found in high-sugar foods, has grown. A brief account of recent developments in the taxonomy and biodiversity of this important food-associated genus is given in this review.


2021 ◽  
Vol 14 (6) ◽  
pp. 1667
Author(s):  
Anurag Renduchintala ◽  
Shantanu Madaboosi ◽  
Mick Gordinier ◽  
Nicholas Mischel ◽  
Richard Weiner

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Audrey Giraud-Gatineau ◽  
Gaetan Texier ◽  
Pierre-Edouard Fournier ◽  
Didier Raoult ◽  
Hervé Chaudet

Background: For the purpose of epidemiological surveillance, the Hospital University Institute Méditerranée Infection has operated since 2013 a system named MIDaS, based on the systematic collection of routine activity materials, including MALDI-TOF spectra, and results. The objective of this paper is to present the pipeline we use for processing MALDI-TOF spectra during epidemiological surveillance in order to disclose proteinic cues that may suggest the existence of epidemic processes, as a complement to incidence surveillance. It is illustrated by the analysis of an alarm observed for Streptococcus pneumoniae. Methods: The MALDI-TOF spectra analysis process looks for clusters of spectra that are close both in time and in protein profile. This process relies on several specific methods aiming at contrasting and clustering the spectra, presenting the results graphically for easy epidemiological interpretation, and determining the discriminating spectra peaks with their possible identification using reference databases. Results: The use of this pipeline in the case of an alarm issued for Streptococcus pneumoniae has made it possible to reveal a cluster of spectra with close proteinic and temporal distances, characterized by the presence of three discriminant peaks (5228.8, 5917.8, and 8974.3 m/z) and the absence of peak 4996.9 m/z. A further investigation on UniProt KB showed that peak 5228.8 is possibly an OxaA protein and that the absent peak may be a transposase. Conclusion: This example shows that this pipeline may support a quasi-real-time identification and characterization of clusters that provide essential information on a potentially epidemic situation. It brings valuable information for epidemiological sensemaking and for deciding on the continuation of the epidemiological investigation, in particular the involvement of additional costly resources to confirm or invalidate the alarm. Clinical trials registration: NCT03626987.
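
The peak signature reported in the Results (three peaks present, one absent) can be checked against a peak list with a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names and the matching tolerance of 2.0 m/z are assumptions, and only the four peak positions come from the abstract.

```python
# Peak positions (m/z) taken from the abstract.
PRESENT_PEAKS = (5228.8, 5917.8, 8974.3)
ABSENT_PEAK = 4996.9


def has_peak(peaks, target, tol=2.0):
    """Return True if any observed peak lies within `tol` m/z of `target`.

    The tolerance is an assumed value chosen for illustration.
    """
    return any(abs(mz - target) <= tol for mz in peaks)


def matches_signature(spectrum, tol=2.0):
    """True if all three discriminant peaks are present and 4996.9 is absent."""
    peaks = list(spectrum)  # allow any iterable, including generators
    all_present = all(has_peak(peaks, p, tol) for p in PRESENT_PEAKS)
    return all_present and not has_peak(peaks, ABSENT_PEAK, tol)


# Made-up peak lists, for demonstration only.
print(matches_signature([3000.0, 5228.5, 5918.0, 8974.3]))  # signature matched
print(matches_signature([4997.0, 5228.5, 5918.0, 8974.3]))  # excluded by 4996.9
```

A presence/absence rule like this only screens individual spectra; finding the cluster in the first place requires the contrasting and clustering steps described in the Methods.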


2021 ◽  
Vol 12 ◽  
Author(s):  
Valérian Lupo ◽  
Mick Van Vlierberghe ◽  
Hervé Vanderschuren ◽  
Frédéric Kerff ◽  
Denis Baurain ◽  
...  

Contaminating sequences in public genome databases are a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single method of detection, which represents a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among National Center for Biotechnology Information Reference Sequence Database (RefSeq) bacterial genomes. First, we applied the most popular solution, CheckM, which is based on gene markers. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a k-folds algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple methods of detection while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.


Author(s):  
Victor Trevino ◽  
Mariel Oyervides ◽  
Genaro A. Ramírez-Correa ◽  
Lourdes Garza
