Contribution to a reference library of DNA barcodes for Colombian freshwater fishes

Author(s):  
Manuela Mejía Estrada ◽  
Luz Fernanda Jiménez-Segura ◽  
Iván Soto Calderón

DNA barcoding was proposed in response to the mismatch between the small number of taxonomists and the large number of species. The method requires the construction of reference collections of DNA sequences that represent existing biodiversity. Freshwater fishes are key indicators for understanding biogeography around the world. Colombia, with 1610 species of freshwater fishes, is the second richest country in the world in this group. However, genetic information on these species remains limited. This contribution to a reference library of DNA barcodes for Colombian freshwater fishes highlights the importance of biological collections and seeks to strengthen inventories and taxonomy of such collections in future studies. This dataset contributes to the knowledge on the DNA barcodes and occurrence records of 96 species of freshwater fishes from Colombia. Thirty-nine of the species represented in this dataset are additions to the BOLD public databases. Forty-nine specimens were collected in the Atrato basin and 708 in the Magdalena-Cauca basin during the period 2010 to 2020. Two species (Loricariichthys brunneus and Poecilia sphenops) are considered exotic to the Atrato, Cauca and Magdalena basins, and four species (Oncorhynchus mykiss, Oreochromis niloticus, Parachromis friedrichsthalii and Xiphophorus helleri) are exotic to Colombian hydrogeographic regions. All specimens are deposited in the CIUA collection at the University of Antioquia and have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database. The distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).

2022 ◽  
Vol 10 ◽  
Author(s):  
Manuela Mejía-Estrada ◽  
Luz Fernanda Jiménez-Segura ◽  
Marcela Hernández-Zapata ◽  
Iván Soto Calderón

The Barcode of Life initiative was originally motivated by the large number of species, taxonomic difficulties and the limited number of expert taxonomists. Colombia has 1,610 freshwater fish species and comprises the second largest diversity of this group in the world. As genetic information continues to be limited, we constructed a reference collection of DNA sequences of Colombian freshwater fishes deposited in the Ichthyology Collection of the University of Antioquia (CIUA). This work joins the multiple efforts made in the country to contribute to the knowledge of genetic diversity, in order to strengthen the inventories of biological collections and facilitate the solution of taxonomic issues in the future. This study contributes to the knowledge on the DNA barcodes and occurrence records of 96 species of Colombian freshwater fishes. Fifty-seven of the species represented in this dataset were already available in the Barcode of Life Data System (BOLD System), while 39 are new to the BOLD System. Forty-nine specimens were collected in the Atrato River Basin and 708 in the Magdalena-Cauca Basin during the period 2010-2020. Two species (Loricariichthys brunneus (Hancock, 1828) and Poecilia sphenops Valenciennes, 1846) are considered exotic to the Atrato, Cauca and Magdalena Basins and four species (Oncorhynchus mykiss (Walbaum, 1792), Oreochromis niloticus (Linnaeus, 1758), Parachromis friedrichsthalii (Heckel, 1840) and Xiphophorus helleri Heckel, 1848) are exotic to the Colombian hydrogeographic regions. All specimens are deposited in CIUA and have their DNA barcodes made publicly available in the BOLD online database. The geographical distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


2019 ◽  
Author(s):  
Jeremy R. deWaard ◽  
Sujeevan Ratnasingham ◽  
Evgeny V. Zakharov ◽  
Alex V. Borisenko ◽  
Dirk Steinke ◽  
...  

The reliable taxonomic identification of organisms through DNA sequence data requires a well-parameterized library of curated reference sequences. However, it is estimated that just 15% of described animal species are represented in public sequence repositories. To begin to address this deficiency, we provide DNA barcodes for 1,500,003 animal specimens collected from 23 terrestrial and aquatic ecozones at sites across Canada, a nation that comprises 7% of the planet’s land surface. In total, 14 phyla, 43 classes, 163 orders, 1123 families, 6186 genera, and 64,264 Barcode Index Numbers (BINs; a proxy for species) are represented. Species-level taxonomy was available for 38% of the specimens, but higher proportions were assigned to a genus (69.5%) and a family (99.9%). Voucher specimens and DNA extracts are archived at the Centre for Biodiversity Genomics where they are available for further research. The corresponding sequence and taxonomic data can be accessed through the Barcode of Life Data System, GenBank, the Global Biodiversity Information Facility, and the Global Genome Biodiversity Network Data Portal.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Jeremy R. deWaard ◽  
Sujeevan Ratnasingham ◽  
Evgeny V. Zakharov ◽  
Alex V. Borisenko ◽  
Dirk Steinke ◽  
...  

The reliable taxonomic identification of organisms through DNA sequence data requires a well-parameterized library of curated reference sequences. However, it is estimated that just 15% of described animal species are represented in public sequence repositories. To begin to address this deficiency, we provide DNA barcodes for 1,500,003 animal specimens collected from 23 terrestrial and aquatic ecozones at sites across Canada, a nation that comprises 7% of the planet’s land surface. In total, 14 phyla, 43 classes, 163 orders, 1123 families, 6186 genera, and 64,264 Barcode Index Numbers (BINs; a proxy for species) are represented. Species-level taxonomy was available for 38% of the specimens, but higher proportions were assigned to a genus (69.5%) and a family (99.9%). Voucher specimens and DNA extracts are archived at the Centre for Biodiversity Genomics where they are available for further research. The corresponding sequence and taxonomic data can be accessed through the Barcode of Life Data System, GenBank, the Global Biodiversity Information Facility, and the Global Genome Biodiversity Network Data Portal.


2020 ◽  
Vol 8 ◽  
Author(s):  
Sonia Ferreira ◽  
José Manuel Tierno de Figueroa ◽  
Filipa Martins ◽  
Joana Verissimo ◽  
Lorenzo Quaglietta ◽  
...  

The use of DNA barcoding allows unprecedented advances in biodiversity assessments and monitoring schemes of freshwater ecosystems; nevertheless, it requires the construction of comprehensive reference collections of DNA sequences that represent the existing biodiversity. Plecoptera are considered particularly good ecological indicators and one of the most endangered groups of insects, but very limited information on their DNA barcodes is available in public databases. Currently, fewer than 50% of the Iberian species are represented in BOLD. The dataset "The InBIO Barcoding Initiative Database: contribution to the knowledge on DNA barcodes of Iberian Plecoptera" contains records of 71 specimens of Plecoptera. All specimens have been morphologically identified to species level and belong to 29 species in total. This dataset contributes to the knowledge on the DNA barcodes and distribution of Plecoptera from the Iberian Peninsula and is one of the IBI database public releases that make available genetic and distribution data for a series of taxa. The species represented in this dataset correspond to an addition to public databases of 17 species and 21 BINs. Fifty-eight specimens were collected in Portugal and 18 in Spain during the period of 2004 to 2018. All specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources and their DNA barcodes are publicly available in the Barcode of Life Data System (BOLD) online database. The distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


2020 ◽  
Vol 8 ◽  
Author(s):  
Sonia Ferreira ◽  
Rui Andrade ◽  
Ana Gonçalves ◽  
Pedro Sousa ◽  
Joana Paupério ◽  
...  

The InBIO Barcoding Initiative (IBI) Diptera 01 dataset contains records of 203 specimens of Diptera. All specimens have been morphologically identified to species level, and belong to 154 species in total. The species represented in this dataset correspond to about 10% of continental Portugal dipteran species diversity. All specimens were collected north of the Tagus river in Portugal. Sampling took place from 2014 to 2018, and specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. This dataset contributes to the knowledge on the DNA barcodes and distribution of 154 species of Diptera from Portugal and is the first of the planned IBI database public releases, which will make available genetic and distribution data for a series of taxa. All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Jeremy Miller ◽  
Yanell Braumuller ◽  
Puneet Kishor ◽  
David Shorthouse ◽  
Mariya Dimitrova ◽  
...  

A vast amount of biodiversity data is reported in the primary taxonomic literature. In the past, we have demonstrated the use of semantic enhancement to extract data from taxonomic literature and make it available to a network of databases (Miller et al. 2015). For technical reasons, semantic enhancement of taxonomic literature is most efficient when customized according to the format of a particular journal. This journal-based approach captures and disseminates data on whatever taxa happen to be published therein. But if we want to extract all treatments on a particular taxon of interest, these are likely to be spread across multiple journals. Fortunately, the GoldenGATE Imagine document editor (Sautter 2019) is flexible enough to parse most taxonomic literature. Tyrannosaurus rex is an iconic dinosaur with broad public appeal, as well as the subject of more than a century of scholarship. The Naturalis Biodiversity Center recently acquired a specimen that has become a major attraction in the public exhibit space. For most species on earth, the primary taxonomic literature contains nearly everything that is known about them. Every described species on earth is the subject of one or more taxonomic treatments. A taxon-based approach to semantic enhancement can mobilize all this knowledge using the network of databases and resources that comprise the modern biodiversity informatics infrastructure. When a particular species is of special interest, a taxon-based approach to semantic enhancement can be a powerful tool for scholarship and communication. In light of this, we resolved to semantically enhance all taxonomic treatments on T. rex. Our objective was to make these treatments and associated data available for the broad range of stakeholders who might have an interest in this animal, including professional paleontologists, the curious public, and museum exhibits and public communications personnel.
Among the routine parsing and data sharing activities in the Plazi workflow (Agosti and Egloff 2009), taxonomic treatments, as well as cited figures, are deposited in the Biodiversity Literature Repository (BLR), and occurrence records are shared with the Global Biodiversity Information Facility (GBIF). Treatment citations were enhanced with hyperlinks to the cited treatment on TreatmentBank, and specimen citations were linked to their entries on public-facing collections databases. We used the OpenBiodiv biodiversity knowledge graph (Senderov et al. 2017) to discover other taxa mentioned together with T. rex, and to create a timeline of T. rex research to evaluate the impact of individual researchers and specimen repositories on T. rex research. We contributed treatment links to WikiData, and queried WikiData to discover identifiers to different platforms holding data about T. rex. We used bloodhound-tracker.net to disambiguate human agents, like collectors, identifiers, and authors. We evaluate the adequacy of the fields currently available to extract data from taxonomic treatments, and make recommendations for future standards.


ZooKeys ◽  
2018 ◽  
Vol 751 ◽  
pp. 129-146 ◽  
Author(s):  
Robert Mesibov

A total of ca 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names, with names changed in two to three times as many records by one aggregator alone compared to records with names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found after processing in some fields, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
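
The recommended check of original against processed records can be sketched in a few lines. This is a minimal illustration, not the audit procedure used in the study: the function name `audit_records`, the record identifiers and the placeholder taxon names are all hypothetical, and only the field names (`occurrenceID`, `scientificName`, `typeStatus`) are standard Darwin Core terms.

```python
from typing import Dict, List

Record = Dict[str, str]


def audit_records(
    original: List[Record],
    processed: List[Record],
    fields: List[str],
    key: str = "occurrenceID",
) -> Dict[str, Dict[str, int]]:
    """Count, per field, records whose value changed or was lost after processing."""
    processed_by_id = {rec[key]: rec for rec in processed}
    report = {f: {"changed": 0, "lost": 0} for f in fields}
    for rec in original:
        after = processed_by_id.get(rec[key])
        if after is None:
            continue  # record missing from the processed set; not counted here
        for f in fields:
            before_val = rec.get(f, "").strip()
            after_val = after.get(f, "").strip()
            if before_val and not after_val:
                report[f]["lost"] += 1      # value present before, empty after
            elif before_val != after_val:
                report[f]["changed"] += 1   # value replaced by a different one
    return report


# Hypothetical toy records with placeholder names, not data from the audit.
original = [
    {"occurrenceID": "rec-1", "scientificName": "Aus bus", "typeStatus": "holotype"},
    {"occurrenceID": "rec-2", "scientificName": "Aus cus", "typeStatus": ""},
    {"occurrenceID": "rec-3", "scientificName": "Dus eus", "typeStatus": "paratype"},
]
processed = [
    {"occurrenceID": "rec-1", "scientificName": "Xus bus", "typeStatus": "holotype"},
    {"occurrenceID": "rec-2", "scientificName": "Aus cus", "typeStatus": ""},
    {"occurrenceID": "rec-3", "scientificName": "Dus eus", "typeStatus": ""},
]
report = audit_records(original, processed, ["scientificName", "typeStatus"])
print(report)
```

In this toy case the first record keeps its type status while its name is replaced, which is the kind of mismatch the abstract describes as causing confusion about which name a type belongs to.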


Author(s):  
Javier Molina ◽  
Peggy Newman ◽  
David Martin ◽  
Vicente Ruiz Jurado

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two leading infrastructures serving the biodiversity community. In 2020, the ALA’s occurrence records management systems reached end of life after more than 10 years of operation, and the ALA embarked on a project to replace them. Significant overlap exists in the function of the ALA and GBIF data ingestion pipeline systems. Instead of the ALA developing new systems from scratch, we initiated a project to better align the two infrastructures. The collaboration brings benefits such as the improved reuse of modules and an overall reduction in development and operation costs. The ALA recently replaced its occurrence ingestion system with GBIF pipelines infrastructure and shared code. This is the first milestone of the ALA’s broader Core Infrastructure Project. Its benefits include a more reliable, performant and scalable system, demonstrated by the ability to ingest more and larger datasets while reducing infrastructure operational costs by more than 40% compared to the previous system. The new system is a key building block for an improved ingestion framework that is being developed within the ALA. The collaboration between the ALA and GBIF development teams will result in more consistent outputs from their respective processing pipelines. It will also allow the broader collective expertise of both infrastructure communities to inform future development and direction. The ALA’s adoption of GBIF pipelines will pave the way for the Living Atlases community to adopt GBIF systems and also contribute to them. In this talk we will introduce the project, share insights on how the GBIF and ALA teams worked together, and finally delve into details about the technical implementation and benefits.


Author(s):  
Takeru Nakazato

DNA barcoding technology has become widely used by biodiversity and molecular biology researchers to identify species and analyze their phylogeny. Recently, DNA metabarcoding and environmental DNA (eDNA) technology have been developed by extending the concept of DNA barcoding. These techniques analyze the diversity and quantity of organisms within an environment by detecting biogenic DNA in water and soil. They are particularly popular for monitoring fish species living in rivers and lakes (Takahara et al. 2012). BOLD Systems (Barcode of Life Database systems, Ratnasingham and Hebert 2007) is a database for DNA barcoding, archiving 8.5 million barcodes (as of August 2020) along with the voucher specimen from which each DNA barcode sequence is derived, with taxonomy, country of collection, and holding museum recorded as metadata (e.g. https://www.boldsystems.org/index.php/Public_RecordView?processid=TRIBS054-16). Many barcoding data are also submitted to GenBank (Sayers et al. 2020), a database for DNA sequences managed by NCBI (National Center for Biotechnology Information, US). The number of DNA barcode records, i.e. COI (cytochrome c oxidase I) gene records for animals, has grown significantly (Porter and Hajibabaei 2018). BOLD imports DNA barcoding data from GenBank, and many DNA barcoding records in GenBank are also assigned BOLD IDs. However, we still have to refer to both BOLD and GenBank data when performing DNA barcoding. I have previously investigated the registration of DNA barcoding data in GenBank, especially the association with BOLD, using insects and flowering plants as examples (Nakazato 2019). Here, I surveyed the number of species covered by BOLD and GenBank. I used fish data as an example because eDNA research is particularly focused on fish. I downloaded all GenBank files for vertebrates from NCBI FTP (File Transfer Protocol) sites (as of November 2019). Of the GenBank fish entries, 86,958 (7.3%) were assigned BOLD identifiers (IDs).
The NCBI taxonomy database has registrations for 39,127 species of fish, and 20,987 scientific names at the species level (i.e., excluding names that included sp., cf. or aff.). GenBank entries with BOLD IDs covered 11,784 species (30.1%) and 8,665 species-level names (41.3%). I also obtained the whole "specimens and sequences combined data" for fish from BOLD Systems (as of November 2019). In BOLD, there are 273,426 entries that are registered as fish. Of these entries, 211,589 were assigned GenBank IDs, i.e. with values in the “genbank_accession” column, and 121,748 were imported from GenBank, i.e. with the description "Mined from GenBank, NCBI" in the "institution_storing" column. The BOLD data covered 18,952 fish species and 15,063 species-level names, but 35,500 entries were assigned no species-level names and 22,123 entries lacked even family-level names. At the species level, 8,067 names co-occurred in GenBank and BOLD, with 6,997 BOLD-specific names and 599 GenBank-specific names. GenBank has 425,732 fish entries with voucher IDs, of which 340,386 were not assigned a BOLD ID. Of these 340,386 entries, 43,872 are registrations for COI genes, which could be candidates for DNA barcodes. These candidates include 4,201 species that are not included in BOLD, thus adding these data will enable us to identify 19,863 fish to the species level. For researchers, it would be very useful if both BOLD and GenBank DNA barcoding data could be searched in one place. For this purpose, it is necessary to integrate data from the two databases. A lot of biodiversity data are recorded based on the Darwin Core standard while DNA sequencing data are sometimes integrated or cross-linked by RDF (Resource Description Framework).
It may not be technically difficult to integrate these data, but the two databases reference different species data, the EoL (The Encyclopedia of Life) for BOLD and the NCBI taxonomy for GenBank, and the differences in taxonomic systems make it difficult to match records by scientific name. GenBank has fields for the latitude and longitude of the specimens sampled, and Porter and Hajibabaei (2018) argue that this information should be enhanced. However, this information may be better described in the specimen and occurrence databases. The integration of barcoding data with the specimen and occurrence data will solve these problems. Most importantly, it will save the researcher from having to register the same information in multiple databases. In the field of biodiversity, DNA barcode sequences may have been the only gene sequences that received attention and use. The museomics community regards museum-preserved specimens as rich resources for DNA studies because their biodiversity information can accompany the extraction and analysis of their DNA (Nakazato 2018). GenBank is useful for biodiversity studies due to its low rate of mislabelling (Leray et al. 2019). In the future, we will be working with a variety of DNA, including genomes from museum specimens as well as DNA barcoding. This will require more integrated use of biodiversity information and DNA sequence data. This integration is also of interest to molecular biologists and bioinformaticians.
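
The name comparison described in this abstract (dropping names that contain sp., cf. or aff., then counting names shared between the two databases and names specific to each) can be sketched as below. This is only an illustration under stated assumptions: the helper names `species_level_names`, `overlap` and `percent` are hypothetical, the two short name lists are toy inputs rather than database contents, and the actual survey may have filtered names differently. The percentage helper is applied to the species counts quoted in the abstract.

```python
import re
from typing import Dict, Iterable, Set

# Matches the open-nomenclature qualifiers named in the abstract: sp., cf., aff.
_OPEN_NOMENCLATURE = re.compile(r"\b(sp|cf|aff)\.", re.IGNORECASE)


def species_level_names(names: Iterable[str]) -> Set[str]:
    """Keep only names without sp., cf. or aff. qualifiers."""
    return {
        n.strip()
        for n in names
        if n.strip() and not _OPEN_NOMENCLATURE.search(n)
    }


def overlap(bold: Iterable[str], genbank: Iterable[str]) -> Dict[str, Set[str]]:
    """Split species-level names into shared, BOLD-specific and GenBank-specific."""
    b, g = species_level_names(bold), species_level_names(genbank)
    return {"shared": b & g, "bold_only": b - g, "genbank_only": g - b}


def percent(part: int, whole: int) -> float:
    """Coverage as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Toy name lists for illustration only (not real database contents).
bold_names = ["Oncorhynchus mykiss", "Oreochromis niloticus",
              "Poecilia sphenops", "Poecilia sp."]
genbank_names = ["Oncorhynchus mykiss", "Xiphophorus helleri",
                 "Oreochromis cf. niloticus"]
result = overlap(bold_names, genbank_names)
print({k: sorted(v) for k, v in result.items()})

# Coverage figures recomputed from the counts quoted in the abstract.
print(percent(11784, 39127))  # species covered by GenBank entries with BOLD IDs
print(percent(8665, 20987))   # species-level names covered
```

Exact string matching of this kind is the step that the differing taxonomic backbones make unreliable, since the same species may be registered under different names in each database.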


Author(s):  
David Mitchell ◽  
Thomas Orrell

The Integrated Taxonomic Information System (ITIS) provides a regularly updated, global database that currently contains over 868,000 scientific names and their hierarchy. The program exists to communicate a comprehensive taxonomy of global species across 7 kingdoms that enables biodiversity information to be discovered, indexed, and connected across all human endeavors. ITIS partners with taxonomists and experts across the world to assemble scientific names and their taxonomic relationships, and then distributes that data through publicly available software. A single taxon may be represented by multiple scientific names, so ITIS makes it a priority to provide synonymy. Linking valid or accepted names with their subjective and objective synonyms is a key component of name translation and increases the precision of searches and organization of information. ITIS and its partner Species2000 create the Catalogue of Life (CoL) checklist that provides quality scientific name data for over 2.2M species. The CoL is the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Providing automated open access to complete, current, literature-referenced, and expert-validated taxonomic information enables biological data management systems, and is elemental to enhancing the utility of the amassed scientific data across the world. Fully leveraging this information for the public good is crucial for empowering the global digital society to confront the most pressing social and environmental challenges.
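
The role of synonymy in name translation can be illustrated with a small lookup. This sketch does not use ITIS software or any ITIS interface: the table `SYNONYM_TO_ACCEPTED` is a hand-made example with two well-known fish synonyms, and the function `resolve_name` is a hypothetical helper.

```python
from typing import Dict, Optional

# Hand-made illustrative table (synonym -> accepted name); not ITIS output.
SYNONYM_TO_ACCEPTED: Dict[str, str] = {
    "Salmo gairdneri": "Oncorhynchus mykiss",
    "Tilapia nilotica": "Oreochromis niloticus",
}
ACCEPTED_NAMES = set(SYNONYM_TO_ACCEPTED.values())


def resolve_name(name: str) -> Optional[str]:
    """Translate a name to its accepted form, or return None if unknown."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned in ACCEPTED_NAMES:
        return cleaned
    return SYNONYM_TO_ACCEPTED.get(cleaned)


print(resolve_name("Salmo  gairdneri"))     # synonym translated to accepted name
print(resolve_name("Oncorhynchus mykiss"))  # accepted name returned unchanged
print(resolve_name("Aus bus"))              # placeholder name, not in the table
```

Searching by the resolved name rather than the raw string is what lets records filed under an older synonym be found together with records filed under the accepted name.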

