ncbi taxonomy
Recently Published Documents


2021 ◽  
Vol 6 (4) ◽  
pp. 3-18
Author(s):  
Н. В. Данцюк ◽  
Э. С. Челебиева ◽  
Г. С. Минюк
This article describes the specialized working collection of carotenogenic microalgae maintained by the Department of Animal Physiology and Biochemistry of the Federal Research Center "A. O. Kovalevsky Institute of Biology of the Southern Seas of RAS" (FRC IBSS). The collection was established within the institute's basic and applied research programs to study mechanisms of stress tolerance in eurybiont and extremophilic unicellular phototrophs and to identify commercially significant sources of ketocarotenoids of the astaxanthin group, which are highly valuable for medicine and the food industry. The collection comprises 44 strains of microalgae of varied taxonomic and ecological specialization with a pronounced capacity for hypersynthesis of secondary carotenoids and lipids under extreme external conditions (desiccation, acute starvation, high light intensity, temperature and salinity, exposure to toxicants, etc.). The stock is replenished mainly through targeted exchange of carotenogenic species with leading Russian and foreign microalgae collections and through the authors' own field sampling in the Black Sea coastal zones of Crimea and the Caucasus. Most strains in the collection belong to two orders of the class Chlorophyceae, Chlamydomonadales (25 strains) and Sphaeropleales (15 strains), because secondary carotenogenesis is most widespread in these orders. Among them, inhabitants of ephemeral freshwater bodies, aerophilic microalgae and soil microalgae predominate. All strains are maintained as algologically pure cultures under controlled conditions on agarized mineral media.
The descriptions of the collection's entries include the following: (a) the current taxonomic status of the species, verified against updated data from the depositing collections and the algological databases AlgaeBase and NCBI Taxonomy Browser; (b) the basionym and known synonyms of the species; (c) the date and source of the strain's accession to the collection; (d) the surname of the author who isolated the strain, and the geographic location and biotope from which it was isolated; (e) the strain number in NCBI (if available); (f) the nutrient medium on which the strain is maintained in the FRC IBSS collection. The significance of the collection for morphobiological and physiological-biochemical studies of growth characteristics, secondary carotenogenesis and the biotechnological potential of green microalgae is analyzed.


2021 ◽  
Author(s):  
Anupam Gautam ◽  
Hendrik Felderhoff ◽  
Caner Bagci ◽  
Daniel H Huson

In microbiome analysis, one main approach is to align metagenomic sequencing reads against a protein-reference database such as NCBI-nr, and then to perform taxonomic and functional binning based on the alignments. This approach is embodied, for example, in the standard DIAMOND+MEGAN analysis pipeline, which first aligns reads against NCBI-nr using DIAMOND and then performs taxonomic and functional binning using MEGAN. Here we propose the use of the AnnoTree protein database, rather than NCBI-nr, in such alignment-based analyses to determine the prokaryotic content of metagenomic samples. We demonstrate a 2-fold speedup over the usage of the prokaryotic part of NCBI-nr, and increased assignment rates, in particular, assigning twice as many reads to KEGG. In addition to binning to the NCBI taxonomy, MEGAN now also bins to the GTDB taxonomy.


2021 ◽  
Author(s):  
Boqian Wang ◽  
Jianglin Zhou ◽  
Yuan Jin ◽  
Mingda Hu ◽  
Yunxiang Zhao ◽  
...  

Taxonomic research on the bacterial kingdom is important for a deeper understanding of bacteria, and it can draw on conserved genes, 16S rRNA, protein domains, and so on. Among these, methods based on protein domains have a direct relationship with phenotype. However, these methods still lack analysis of their biological significance, model evaluation, and comparison of taxonomy results. To this end, we propose a complete framework that standardizes the process for the taxonomy problem based on protein functional domains. By applying it to the bacterial kingdom and comparing the results with the NCBI taxonomy, we identify the most appropriate method for each step of the framework and evaluate models according to their biological significance. Finally, taxonomy suggestions and recommendations are proposed based on the phylogenetic tree generated by the framework with the most appropriate combination.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tetsu Sakamoto ◽  
J. Miguel Ortega

Abstract Background NCBI Taxonomy is the main taxonomic source for several bioinformatics tools and databases since all organisms with sequence accessions deposited on INSDC are organized in its hierarchical structure. Despite the extensive use and application of this data source, an alternative representation of data as a table would facilitate the use of information for processing bioinformatics data. To do so, since some taxonomic-ranks are missing in some lineages, an algorithm might propose provisional names for all taxonomic-ranks. Results To address this issue, we developed an algorithm that takes the tree structure from NCBI Taxonomy and generates a hierarchically complete taxonomic table, maintaining its compatibility with the original tree. The procedures performed by the algorithm consist of attempting to assign a taxonomic-rank to an existing clade or “no rank” node when possible, using its name as part of the created taxonomic-rank name (e.g. Ord_Ornithischia) or interpolating parent nodes when needed (e.g. Cla_of_Ornithischia), both examples given for the dinosaur Brachylophosaurus lineage. The new hierarchical structure was named Taxallnomy because it contains names for all taxonomic-ranks, and it contains 41 hierarchical levels corresponding to the 41 taxonomic-ranks currently found in the NCBI Taxonomy database. From Taxallnomy, users can obtain the complete taxonomic lineage with 41 nodes of all taxa available in the NCBI Taxonomy database, without any hazard to the original tree information. In this work, we demonstrate its applicability by embedding taxonomic information of a specified rank into a phylogenetic tree and by producing metagenomics profiles. Conclusion Taxallnomy applies to any bioinformatics analyses that depend on the information from NCBI Taxonomy. Taxallnomy is updated periodically but with a distributed PERL script users can generate it locally using NCBI Taxonomy as input. 
All Taxallnomy resources are available at http://bioinfo.icb.ufmg.br/taxallnomy.
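
The gap-filling idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' PERL implementation: the five-rank list, the function name `complete_lineage`, the toy input ranks and the `_in_` fallback naming are assumptions made for illustration, and the step that assigns ranks to "no rank" nodes is left out. Only the `Cla_of_Ornithischia` naming pattern is taken from the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical, shortened rank list (the abstract describes 41 ranks).
RANKS: List[str] = ["class", "order", "family", "genus", "species"]


def complete_lineage(lineage: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return one name per rank in RANKS, creating provisional names for gaps.

    `lineage` holds (rank, name) pairs; ranks that are not in RANKS
    (for example "no rank") are ignored in this simplified sketch.
    """
    known = {rank: name for rank, name in lineage if rank in RANKS}
    table: Dict[str, str] = {}
    for i, rank in enumerate(RANKS):
        if rank in known:
            table[rank] = known[rank]
            continue
        prefix = rank[:3].capitalize()
        # Prefer the nearest ranked node below the gap as the name source.
        below = next((known[r] for r in RANKS[i + 1:] if r in known), None)
        if below is not None:
            table[rank] = f"{prefix}_of_{below}"
            continue
        # Otherwise borrow from the nearest ranked node above the gap
        # (the "_in_" pattern is an assumption of this sketch).
        above = next((known[r] for r in reversed(RANKS[:i]) if r in known), None)
        table[rank] = f"{prefix}_in_{above}" if above else f"{prefix}_unknown"
    return table
```

Calling `complete_lineage([("order", "Ornithischia"), ("genus", "Brachylophosaurus")])` fills the class, family and species columns with provisional names while leaving the two supplied names untouched, so every lineage yields a row with the same columns.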


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Lindy Edwards ◽  
Rebecca Jackson ◽  
James A Overton ◽  
Randi Vita ◽  
Nina Blazeska ◽  
...  

Abstract The Immune Epitope Database (IEDB) freely provides experimental data regarding immune epitopes to the scientific public. The main users of the IEDB are immunologists who can easily use our web interface to search for peptidic epitopes via their simple single-letter codes. For example, ‘A’ stands for ‘alanine’. Similarly, users can easily navigate the IEDB’s simplified NCBI taxonomy hierarchy to locate proteins from specific organisms. However, some epitopes are non-peptidic, such as carbohydrates, lipids, chemicals and drugs, and it is harder to name them consistently and to search for them, which makes access to their data more problematic for immunologists. Therefore, we set out to improve access to non-peptidic epitope data in the IEDB by simplifying the non-peptidic hierarchy used in our search interfaces. Here, we present these efforts and their outcomes. Database URL:  http://www.iedb.org/


Author(s):  
Takeru Nakazato

DNA barcoding technology has become widely used by biodiversity and molecular biology researchers to identify species and analyze their phylogeny. Recently, DNA metabarcoding and environmental DNA (eDNA) technologies have been developed by expanding the concept of DNA barcoding. These techniques analyze the diversity and quantity of organisms within an environment by detecting biogenic DNA in water and soil. They are particularly popular for monitoring fish species living in rivers and lakes (Takahara et al. 2012). BOLD Systems (Barcode of Life Database systems, Ratnasingham and Hebert 2007) is a database for DNA barcoding that archives 8.5 million barcodes (as of August 2020) along with the voucher specimens from which the DNA barcode sequences are derived, with metadata including taxonomy, country of collection, and the museum holding the voucher (e.g. https://www.boldsystems.org/index.php/Public_RecordView?processid=TRIBS054-16). Many barcoding data are also submitted to GenBank (Sayers et al. 2020), a database for DNA sequences managed by NCBI (National Center for Biotechnology Information, US). The number of DNA barcode records, i.e. COI (cytochrome c oxidase I) gene records for animals, has grown significantly (Porter and Hajibabaei 2018). BOLD imports DNA barcoding data from GenBank, and many DNA barcoding records in GenBank are also assigned BOLD IDs. Nevertheless, we have to refer to both BOLD and GenBank data when performing DNA barcoding. I have previously investigated the registration of DNA barcoding data in GenBank, especially the association with BOLD, using insects and flowering plants as examples (Nakazato 2019). Here, I surveyed the number of species covered by BOLD and GenBank. I used fish data as an example because eDNA research is particularly focused on fish. I downloaded all GenBank files for vertebrates from NCBI FTP (File Transfer Protocol) sites (as of November 2019). Of the GenBank fish entries, 86,958 (7.3%) were assigned BOLD identifiers (IDs). 
The NCBI taxonomy database has registrations for 39,127 species of fish, and 20,987 scientific names at the species level (i.e., excluding names that included sp., cf. or aff.). GenBank entries with BOLD IDs covered 11,784 species (30.1%) and 8,665 species-level names (41.3%). I also obtained the complete "specimens and sequences combined data" for fish from BOLD Systems (as of November 2019). In BOLD, 273,426 entries are registered as fish. Of these, 211,589 entries were assigned GenBank IDs, i.e. had values in the “genbank_accession” column, and 121,748 entries were imported from GenBank, i.e. had the description "Mined from GenBank, NCBI" in the "institution_storing" column. The BOLD data covered 18,952 fish species and 15,063 species-level names, but 35,500 entries were assigned no species-level names and 22,123 entries lacked even family-level names. At the species level, 8,067 names co-occurred in GenBank and BOLD, with 6,997 BOLD-specific names and 599 GenBank-specific names. GenBank has 425,732 fish entries with voucher IDs, of which 340,386 were not assigned a BOLD ID. Of these 340,386 entries, 43,872 are registrations for COI genes, which could be candidates for DNA barcodes. These candidates include 4,201 species that are not included in BOLD; adding these data would enable us to identify 19,863 fish to the species level. For researchers, it would be very useful if both BOLD and GenBank DNA barcoding data could be searched in one place. For this purpose, it is necessary to integrate data from the two databases. Many biodiversity data are recorded based on the Darwin Core standard, while DNA sequencing data are sometimes integrated or cross-linked by RDF (Resource Description Framework). 
Integrating these data may not be technically difficult, but the two databases reference different species data, EoL (The Encyclopedia of Life) for BOLD and the NCBI taxonomy for GenBank, and the differences between these taxonomic systems make it difficult to match records by scientific name. GenBank has fields for the latitude and longitude of the specimens sampled, and Porter and Hajibabaei (2018) argue that this information should be enhanced. However, this information may be better described in the specimen and occurrence databases. The integration of barcoding data with the specimen and occurrence data will solve these problems. Most importantly, it will save the researcher from having to register the same information in multiple databases. In the field of biodiversity, DNA barcode sequences may have been the only gene sequences to receive attention and use. The museomics community regards museum-preserved specimens as rich resources for DNA studies because their biodiversity information can accompany the extraction and analysis of their DNA (Nakazato 2018). GenBank is useful for biodiversity studies due to its low rate of mislabelling (Leray et al. 2019). In the future, we will be working with a variety of DNA, including genomes from museum specimens as well as DNA barcoding. This will require more integrated use of biodiversity information and DNA sequence data. This integration is also of interest to molecular biologists and bioinformaticians.
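
The name-overlap counts and coverage percentages reported in this abstract come down to set operations and simple ratios. The following Python sketch shows one way such a comparison could be computed; the function names and the toy name sets are hypothetical, and it does not reproduce the author's actual pipeline.

```python
from typing import Dict, Set


def compare_name_sets(bold: Set[str], genbank: Set[str]) -> Dict[str, int]:
    """Count names shared by both sources and names unique to each."""
    return {
        "shared": len(bold & genbank),        # names found in both sources
        "bold_only": len(bold - genbank),     # BOLD-specific names
        "genbank_only": len(genbank - bold),  # GenBank-specific names
    }


def coverage_percent(covered: int, total: int) -> float:
    """Share of `total` that is covered, as a percentage with one decimal."""
    return round(100.0 * covered / total, 1)


# Toy placeholder names, not real records from either database.
bold_names = {"Species A", "Species B", "Species C"}
genbank_names = {"Species B", "Species C", "Species D"}
overlap = compare_name_sets(bold_names, genbank_names)
```

With the figures quoted in the abstract, `coverage_percent(11784, 39127)` gives 30.1 and `coverage_percent(8665, 20987)` gives 41.3, matching the reported percentages.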


Author(s):  
Marcos Zárate ◽  
Paula Zermoglio ◽  
John Wieczorek ◽  
Anabela Plos ◽  
Renato Mazzanti

Scientists frequently collect biological and environmental information over years and store it in database systems to answer their own research questions without exposing it in repositories that make it easy to find and retrieve. While in recent years the community working on biodiversity informatics has made significant strides by creating common shared vocabularies such as the Darwin Core (DwC, Wieczorek et al. 2012) and publishing mechanisms such as the Integrated Publishing Toolkit (IPT, Robertson et al. 2014), integration is largely limited to the aggregation of datasets, and full interoperability has still not been achieved. In this context, the Semantic Web (SW) aims to represent information in such a way that, in addition to serving human-centered display purposes, it can be used autonomously by machines for integration and reuse across applications. From the biodiversity informatics point of view, interoperability and links among data sources would allow integration of information that is otherwise disconnected, enabling scientists to answer broader questions. These considerations provide strong motivation to build a web application with semantic interoperability that may answer questions such as the following: (Q1) Is it possible to complement taxonomic, bibliographic and environmental information of a particular species without relying on specific Application Programming Interfaces (APIs)? (Q2) How to relate occurrences of species with environmental variables within a specific region? (Q3) What are the bibliographic references associated with a given species? 
With questions such as these in mind, we present the design of a proof-of-concept application: Linked Open Biodiversity Data (LOBD). LOBD uses Linked Data (LD) (Heath and Bizer 2011) to complement species occurrence information previously extracted from GBIF and converted to Resource Description Framework (RDF) (Zárate et al. 2020) with information about the taxa in question from different RDF datasets, such as Wikidata, NCBI Taxonomy, Springer Nature SciGraph and OpenCitation corpus. A simplified view of the architecture is shown in Fig. 1. To achieve semantic interoperability, we use the SPARQL query language, which allows us not to depend on specific APIs to retrieve information. The application consists of three modules: General information, where the Wikidata endpoint is used to retrieve additional information about the selected species, including links to other databases and information about the species extracted from National Center for Biotechnology Information (NCBI) Taxonomy; Bibliography, where all publications related to the species are retrieved and extracted from OpenCitation; and Environment, where users can plot species on a map and add layers related to marine regions as well as environmental layers (e.g., temperature, salinity, etc.). 
For the development of the application, we use the Shiny framework for R; access to SPARQL endpoints is done through the SPARQL package, marine regions are obtained from marineregion.org, and the environmental layers are extracted from Bio-ORACLE. The data used for this article were collected by the Center for the Study of Marine Systems at the National Patagonian Sci-Tech Centre (CCT CENPAT-CONICET), and are published and available through the GBIF network. Linked Data is a powerful tool for scientists, as it enables new approaches to biodiversity informatics that can help to address data integration challenges. Users would benefit from complementing the current prevalent use of vocabularies that are not ontologically defined (like DwC) for sharing biodiversity data. Although this application is a proof of concept, it shows that with little effort, it is possible to achieve greater interoperability between datasets that were not initially represented as LD.
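
To make the SPARQL-based lookup described above more concrete, here is a minimal Python sketch that only builds a query string for the Wikidata endpoint. It is an illustration, not the authors' R code or their actual query: the function name `build_taxon_query` and the example species are assumptions, while `P225` (taxon name) and `P685` (NCBI taxonomy ID) are Wikidata property identifiers.

```python
def build_taxon_query(taxon_name: str) -> str:
    """Build a SPARQL query that looks up a taxon by its scientific name
    and, when present, returns its NCBI taxonomy ID from Wikidata."""
    # Escape backslashes and double quotes so the name is a valid literal.
    safe = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?taxon ?ncbiId WHERE {\n"
        f'  ?taxon wdt:P225 "{safe}" .\n'           # P225: taxon name
        "  OPTIONAL { ?taxon wdt:P685 ?ncbiId . }\n"  # P685: NCBI taxonomy ID
        "}\n"
        "LIMIT 10"
    )


# Arbitrary example species chosen for illustration only.
query = build_taxon_query("Mirounga leonina")
```

The resulting string could be sent to the public Wikidata SPARQL endpoint (https://query.wikidata.org/sparql) with any HTTP or SPARQL client; the sketch stops at query construction so that it runs without network access.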


Author(s):  
Tetsu Sakamoto ◽  
J. Miguel Ortega

Abstract NCBI Taxonomy is the main taxonomic source for several bioinformatics tools and databases since all organisms with sequence accessions deposited on INSDC are organized in its hierarchical structure. Despite the extensive use and application of this data source, taking advantage of its taxonomic tree could be challenging because (1) some taxonomic ranks are missing in some lineages and (2) some nodes in the tree do not have a taxonomic rank assigned (referred to as “no rank”). To address this issue, we developed an algorithm that takes the tree structure from NCBI Taxonomy and generates a hierarchically complete taxonomic tree. The procedures performed by the algorithm consist of attempting to assign a taxonomic rank to “no rank” nodes and of creating/deleting nodes throughout the tree. The algorithm also creates a name for the new nodes by borrowing the names from its ranked child or, if there is no child, from its ranked parent node. The new hierarchical structure was named taxallnomy and it contains 33 hierarchical levels corresponding to the 33 taxonomic ranks currently used in the NCBI Taxonomy database. From taxallnomy, users can obtain the complete taxonomic lineage with 33 nodes of all taxa available in the NCBI Taxonomy database. Taxallnomy is applicable to several bioinformatics analyses that depend on NCBI Taxonomy data. In this work, we demonstrate its applicability by embedding taxonomic information of a specified rank into a phylogenetic tree and by making metagenomics profiles. The taxallnomy algorithm was written in PERL and all its resources are available at bioinfo.icb.ufmg.br/taxallnomy. Database URL: http://bioinfo.icb.ufmg.br/taxallnomy


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Conrad L Schoch ◽  
Stacy Ciufo ◽  
Mikhail Domrachev ◽  
Carol L Hotton ◽  
Sivakumar Kannan ◽  
...  

Abstract The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy

