Survey of Species Covered by DNA Barcoding Data in BOLD and GenBank for Integration of Data for Museomics

Author(s):  
Takeru Nakazato

DNA barcoding technology has become widely used by biodiversity and molecular biology researchers to identify species and analyze their phylogeny. Recently, DNA metabarcoding and environmental DNA (eDNA) technologies have been developed by extending the concept of DNA barcoding. These techniques analyze the diversity and quantity of organisms within an environment by detecting biogenic DNA in water and soil. They are particularly popular for monitoring fish species living in rivers and lakes (Takahara et al. 2012). BOLD Systems (Barcode of Life Database systems, Ratnasingham and Hebert 2007) is a database for DNA barcoding that archives 8.5 million barcodes (as of August 2020) together with metadata on the voucher specimen from which each barcode sequence was derived, including taxonomy, country of collection, and the museum holding the voucher (e.g. https://www.boldsystems.org/index.php/Public_RecordView?processid=TRIBS054-16). Many barcoding data are also submitted to GenBank (Sayers et al. 2020), a database of DNA sequences managed by NCBI (National Center for Biotechnology Information, US). The number of DNA barcode records, i.e. records of the COI (cytochrome c oxidase I) gene for animals, has grown significantly (Porter and Hajibabaei 2018). BOLD imports DNA barcoding data from GenBank, and many DNA barcoding records in GenBank are also assigned BOLD IDs. Nevertheless, we still have to refer to both BOLD and GenBank data when performing DNA barcoding. I have previously investigated the registration of DNA barcoding data in GenBank, especially the association with BOLD, using insects and flowering plants as examples (Nakazato 2019). Here, I surveyed the number of species covered by BOLD and GenBank. I used fish data as an example because eDNA research is particularly focused on fish. I downloaded all GenBank files for vertebrates from NCBI FTP (File Transfer Protocol) sites (as of November 2019). Of the GenBank fish entries, 86,958 (7.3%) were assigned BOLD identifiers (IDs).
The NCBI taxonomy database has registrations for 39,127 species of fish and 20,987 scientific names at the species level (i.e., excluding names that include sp., cf. or aff.). GenBank entries with BOLD IDs covered 11,784 species (30.1%) and 8,665 species-level names (41.3%). I also obtained the whole "specimens and sequences combined data" for fish from BOLD Systems (as of November 2019). In BOLD, 273,426 entries are registered as fish. Of these, 211,589 entries were assigned GenBank IDs, i.e. had values in the "genbank_accession" column, and 121,748 entries were imported from GenBank, i.e. had the description "Mined from GenBank, NCBI" in the "institution_storing" column. The BOLD data covered 18,952 fish species and 15,063 species-level names, but 35,500 entries had no species-level name and 22,123 entries lacked even a family-level name. At the species level, 8,067 names co-occurred in GenBank and BOLD, with 6,997 BOLD-specific names and 599 GenBank-specific names. GenBank has 425,732 fish entries with voucher IDs, of which 340,386 were not assigned a BOLD ID. Of these 340,386 entries, 43,872 are registrations for COI genes, which could be candidates for DNA barcodes. These candidates include 4,201 species that are not included in BOLD; adding these data would therefore enable us to identify 19,863 fish to the species level. For researchers, it would be very useful if both BOLD and GenBank DNA barcoding data could be searched in one place. For this purpose, it is necessary to integrate data from the two databases. Much biodiversity data is recorded according to the Darwin Core standard, while DNA sequencing data are sometimes integrated or cross-linked using RDF (Resource Description Framework).
It may not be technically difficult to integrate these data, but the two databases reference different species data, EoL (the Encyclopedia of Life) for BOLD and the NCBI taxonomy for GenBank, and the differences between these taxonomic systems make it difficult to match records by scientific name. GenBank has fields for the latitude and longitude at which specimens were sampled, and Porter and Hajibabaei (2018) argue that this information should be enhanced. However, this information may be better described in specimen and occurrence databases. Integrating barcoding data with specimen and occurrence data will solve these problems. Most importantly, it will save researchers from having to register the same information in multiple databases. In the field of biodiversity, attention may have focused almost exclusively on DNA barcode sequences as the gene sequences of interest. The museomics community regards museum-preserved specimens as rich resources for DNA studies because their biodiversity information can accompany the extraction and analysis of their DNA (Nakazato 2018). GenBank is useful for biodiversity studies due to its low rate of mislabelling (Leray et al. 2019). In the future, we will be working with a variety of DNA, including genomes from museum specimens as well as DNA barcodes. This will require more integrated use of biodiversity information and DNA sequence data. This integration is also of interest to molecular biologists and bioinformaticians.
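The name-level comparison described in this abstract (dropping names that contain sp., cf. or aff., then counting names shared between the two databases and names specific to each) can be sketched as follows. This is a minimal illustration, not the author's pipeline: the function names, the regular expression, and the toy input lists are all assumptions, and real GenBank and BOLD exports would first need parsing and taxonomic reconciliation.

```python
import re

# Assumed rule: a name containing "sp.", "cf." or "aff." is not treated as a
# species-level name, following the exclusion described in the abstract.
_QUALIFIER = re.compile(r"\b(sp|cf|aff)\.")


def is_species_level(name: str) -> bool:
    """Return True for a binomial-like name without an open-nomenclature qualifier."""
    return len(name.split()) >= 2 and _QUALIFIER.search(name) is None


def compare_names(genbank_names, bold_names):
    """Split species-level names into shared, BOLD-only and GenBank-only sets."""
    genbank = {n.strip() for n in genbank_names if is_species_level(n)}
    bold = {n.strip() for n in bold_names if is_species_level(n)}
    return {
        "shared": genbank & bold,
        "bold_only": bold - genbank,
        "genbank_only": genbank - bold,
    }


# Toy input (hypothetical records, not data from the survey).
genbank_example = ["Cyprinus carpio", "Danio rerio", "Oryzias cf. latipes", "Salmo sp. A"]
bold_example = ["Cyprinus carpio", "Gadus morhua", "Danio sp."]
result = compare_names(genbank_example, bold_example)
```

Exact string matching is the simplest possible choice here; as the abstract notes, differences between the taxonomic systems behind BOLD and GenBank mean that matching by name alone is unlikely to be sufficient in practice.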

Genome ◽  
2016 ◽  
Vol 59 (9) ◽  
pp. 641-660 ◽  
Author(s):  
Daniel H. Janzen ◽  
Winnie Hallwachs

The 37-year ongoing inventory of the estimated 15 000 species of Lepidoptera living in the 125 000 terrestrial hectares of Area de Conservacion Guanacaste, northwestern Costa Rica, has DNA barcode documented 11 000+ species, and the simultaneous inventory of at least 6000+ species of wild-caught caterpillars, plus 2700+ species of parasitoids. The inventory began with Victorian methodologies and species-level perceptions, but it was transformed in 2004 by the full application of DNA barcoding for specimen identification and species discovery. This tropical inventory of an extraordinarily species-rich and complex multidimensional trophic web has relied upon the sequencing services provided by the Canadian Centre for DNA Barcoding, and the informatics support from BOLD, the Barcode of Life Data Systems, major tools developed by the Centre for Biodiversity Genomics at the Biodiversity Institute of Ontario, and available to all through couriers and the internet. As biodiversity information flows from these many thousands of undescribed and often look-alike species through their transformations to usable product, we see that DNA barcoding, firmly married to our centuries-old morphology-, ecology-, microgeography-, and behavior-based ways of taxonomizing the wild world, has made possible what was impossible before 2004. We can now work with all the species that we find, as recognizable species-level units of biology. In this essay, we touch on some of the details of the mechanics of actually using DNA barcoding in an inventory.


2016 ◽  
Vol 371 (1702) ◽  
pp. 20160025 ◽  
Author(s):  
Xin Zhou ◽  
Paul B. Frandsen ◽  
Ralph W. Holzenthal ◽  
Clare R. Beet ◽  
Kristi R. Bennett ◽  
...  

DNA barcoding was intended as a means to provide species-level identifications through associating DNA sequences from unknown specimens to those from curated reference specimens. Although barcodes were not designed for phylogenetics, they can be beneficial to the completion of the Tree of Life. The barcode database for Trichoptera is relatively comprehensive, with data from every family, approximately two-thirds of the genera, and one-third of the described species. Most Trichoptera, as with most of life's species, have never been subjected to any formal phylogenetic analysis. Here, we present a phylogeny with over 16 000 unique haplotypes as a working hypothesis that can be updated as our estimates improve. We suggest a strategy of implementing constrained tree searches, which allow larger datasets to dictate the backbone phylogeny, while the barcode data fill out the tips of the tree. We also discuss how this phylogeny could be used to focus taxonomic attention on ambiguous species boundaries and hidden biodiversity. We suggest that systematists continue to differentiate between ‘Barcode Index Numbers’ (BINs) and ‘species’ that have been formally described. Each has utility, but they are not synonyms. We highlight examples of integrative taxonomy, using both barcodes and morphology for species description. This article is part of the themed issue ‘From DNA barcodes to biomes’.


2020 ◽  
Vol 21 (8) ◽  
Author(s):  
Viet The Ho ◽  
Minh Phuong Nguyen

Abstract. Ho VT, Nguyen MP. 2020. An in silico approach for evaluation of rbcL and matK loci for DNA barcoding of Cucurbitaceae family. Biodiversitas 21: 3879-3885. DNA barcodes have been used intensively to discriminate between species in the Cucurbitaceae family. The main aim of this study is to evaluate the effectiveness of the rbcL and matK loci for 16 species of the Cucurbitaceae family using an in silico approach. For the analysis, sequences were first retrieved from NCBI and their sequence parameters were calculated. The sequences were then aligned, a phylogenetic tree was constructed, and species resolution ability was examined. The data obtained show that resolving capacity varies among species. The rbcL region is suitable for distinguishing five species, namely S. edule, M. cochinchinensis, L. aegyptiaca, C. melo, and C. pepo, whereas the matK locus is better suited to a different set of five species: M. balsamina, M. cochinchinensis, M. charantia, S. edule, and C. sativus. The resolving power improves sharply when the rbcL + matK combination is analyzed, with up to nine species resolved: C. lanatus, B. hispida, C. melo, C. sativus, C. pepo, C. argyrosperma, L. aegyptiaca, S. edule, and M. cochinchinensis. Therefore, combining the rbcL and matK loci may improve the assessment of genetic relatedness at the species level among members of the Cucurbitaceae family. The information obtained could be important for choosing suitable DNA barcode loci for phylogenetic study of this crop family.


Genome ◽  
2017 ◽  
Vol 60 (3) ◽  
pp. 248-259 ◽  
Author(s):  
Derek S. Sikes ◽  
Matthew Bowser ◽  
John M. Morton ◽  
Casey Bickford ◽  
Sarah Meierotto ◽  
...  

Climate change may result in ecological futures with novel species assemblages, trophic mismatch, and mass extinction. Alaska has a limited taxonomic workforce to address these changes. We are building a DNA barcode library to facilitate a metabarcoding approach to monitoring non-marine arthropods. Working with the Canadian Centre for DNA Barcoding, we obtained DNA barcodes from recently collected and authoritatively identified specimens in the University of Alaska Museum (UAM) Insect Collection and the Kenai National Wildlife Refuge collection. We submitted tissues from 4776 specimens, of which 81% yielded DNA barcodes representing 1662 species and 1788 Barcode Index Numbers (BINs), of primarily terrestrial, large-bodied arthropods. This represents 84% of the species available for DNA barcoding in the UAM Insect Collection. There are now 4020 Alaskan arthropod species represented by DNA barcodes, after including all records in Barcode of Life Data Systems (BOLD) of species that occur in Alaska — i.e., 48.5% of the 8277 Alaskan, non-marine-arthropod, named species have associated DNA barcodes. An assessment of the identification power of the library in its current state yielded fewer species-level identifications than expected, but the results were not discouraging. We believe we are the first to deliberately begin development of a DNA barcode library of the entire arthropod fauna for a North American state or province. Although far from complete, this library will become increasingly valuable as more species are added and costs to obtain DNA sequences fall.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5013 ◽  
Author(s):  
Lijuan Wang ◽  
Zhihao Wu ◽  
Mengxia Liu ◽  
Wei Liu ◽  
Wenxi Zhao ◽  
...  

Rongcheng Bay is a coastal bay of the Northern Yellow Sea, China. To investigate and monitor the fish resources in Rongcheng Bay, 187 specimens from 41 different species belonging to 28 families in nine orders were DNA-barcoded using the mitochondrial cytochrome c oxidase subunit I gene (COI). Most of the fish species could be discriminated using this COI sequence with the exception of Cynoglossus joyneri and Cynoglossus lighti. The average GC% content of the 41 fish species was 47.3%. The average Kimura 2-parameter genetic distances within the species, genera, families, and orders were 0.21%, 5.28%, 21.30%, and 23.63%, respectively. Our results confirmed that the use of combined morphological and DNA barcoding identification methods facilitated fish species identification in Rongcheng Bay, and also established a reliable DNA barcode reference library for these fish. DNA barcodes will contribute to future efforts to achieve better monitoring, conservation, and management of fisheries in this area.
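The Kimura 2-parameter (K2P) distances reported in this abstract are pairwise distances that treat transitions and transversions separately. The sketch below applies the standard K2P formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. It is an illustration only: the function name `k2p_distance` and the decision to skip sites with gaps or ambiguous bases are assumptions, and the abstract does not say which software the authors used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumed convention, similar to pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Averaging this value over all pairs within a species, genus, family, or order would give summary figures of the kind quoted in the abstract, although the exact averaging scheme used there is not stated.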


Genome ◽  
2006 ◽  
Vol 49 (7) ◽  
pp. 851-854 ◽  
Author(s):  
Mehrdad Hajibabaei ◽  
Gregory AC Singer ◽  
Donal A Hickey

DNA barcoding has been recently promoted as a method for both assigning specimens to known species and for discovering new and cryptic species. Here we test both the potential and the limitations of DNA barcodes by analysing a group of well-studied organisms, the primates. Our results show that DNA barcodes provide enough information to efficiently identify and delineate primate species, but that they cannot reliably uncover many of the deeper phylogenetic relationships. Our conclusion is that these short DNA sequences do not contain enough information to build reliable molecular phylogenies or define new species, but that they can provide efficient sequence tags for assigning unknown specimens to known species. As such, DNA barcoding provides enormous potential for use in global biodiversity studies. Key words: DNA barcoding, species identification, primate, biodiversity.


2020 ◽  
Vol 8 ◽  
Author(s):  
Sonia Ferreira ◽  
Rui Andrade ◽  
Ana Gonçalves ◽  
Pedro Sousa ◽  
Joana Paupério ◽  
...  

The InBIO Barcoding Initiative (IBI) Diptera 01 dataset contains records of 203 specimens of Diptera. All specimens have been morphologically identified to species level, and belong to 154 species in total. The species represented in this dataset correspond to about 10% of continental Portugal dipteran species diversity. All specimens were collected north of the Tagus river in Portugal. Sampling took place from 2014 to 2018, and specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. This dataset contributes to the knowledge on the DNA barcodes and distribution of 154 species of Diptera from Portugal and is the first of the planned IBI database public releases, which will make available genetic and distribution data for a series of taxa. All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Dudu Özkum Yavuz ◽  
Mustapha Bulama-Modu

Aims: To review phytomedicinal research on the endemic plants of Northern Cyprus and to assess their DNA barcoding status. Study Design: A review. Methodology: This work reviewed available and accessible original articles on phytomedicinal investigations in the EBSCO, Ovid MEDLINE®, PubMed®, ScienceDirect™, Scopus® and Web of Science™ databases, and DNA barcodes of the endemic plants of Northern Cyprus in BOLD System, MMDBD version 1.5 and GenBank®, until May 2020. Keyword searches related to phytochemistry, biological activity, DNA barcoding and DNA sequences were used; the data obtained were evaluated, and information that did not meet the inclusion criteria was excluded. We believe that this information would tentatively help researchers to ethically explore these plants for their medicinal and aromatic potential. Results: Only 6 of the 20 endemic plants of Northern Cyprus had been phytopharmaceutically investigated, while DNA sequences of 5 were found to be deposited in the publicly accessible databases, accounting for 30% and 25% of the total plants respectively. Conclusion: Endemism is related to uniqueness in features, including phytomedicinal features, so the endemic plants of Northern Cyprus may hold many such features. However, the results of this review showed that only a few have been investigated for their medicinal properties; hence the need to study their pharmacological properties and to barcode them comprehensively for proper authentication, detection of adulteration, and quality control.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Kannika Thongkhao ◽  
Veerachai Pongkittiphan ◽  
Thatree Phadungcharoen ◽  
Chayapol Tungphatthong ◽  
Santhosh Kumar J. Urumarudappa ◽  
...  

Abstract. Cyanthillium cinereum (L.) H.Rob. is one of the most popular herbal smoking cessation aids currently used in Thailand, and its adulteration with Emilia sonchifolia (L.) DC. is often found in the herbal market. Therefore, the quality of the raw material must be considered. This work aimed to integrate macro- and microscopic, chemical and genetic authentication strategies to differentiate C. cinereum raw material from its adulterant. Morphological differences between C. cinereum and E. sonchifolia were easily recognized at the leaf base. For microscopic characteristics, trichome and pappus features were different between the two plants. HPTLC profiles showed a distinct band that could be used to unambiguously differentiate C. cinereum from E. sonchifolia. Four triterpenoid compounds, β-amyrin, taraxasterol, lupeol, and betulin, were identified from the distinct HPTLC band of C. cinereum. The core DNA barcode regions rbcL, matK, ITS and psbA-trnH provided species-level resolution to differentiate the two plants. Taken together, the integration of macroscopic and microscopic characterization, phytochemical analysis by HPTLC and DNA barcoding distinguished C. cinereum from E. sonchifolia. The signatures of C. cinereum obtained here can help manufacturers to improve the quality control of C. cinereum raw material in commercialized smoking cessation products.


2014 ◽  
Vol 104 (4) ◽  
pp. 486-493 ◽  
Author(s):  
Y.J. Wang ◽  
Z.H. Li ◽  
S.F. Zhang ◽  
Z. Varadínová ◽  
F. Jiang ◽  
...  

Abstract. Several species of the genus Cryptolestes Ganglbauer, 1899 (Coleoptera: Laemophloeidae) are commonly found in stored products. In this study, five species of Cryptolestes, with almost worldwide distribution, were obtained from laboratories in China, Czech Republic and the USA: Cryptolestes ferrugineus (Stephens, 1831), Cryptolestes pusillus (Schönherr, 1817), Cryptolestes turcicus (Grouvelle, 1876), Cryptolestes pusilloides (Steel & Howe, 1952) and Cryptolestes capensis (Waltl, 1834). Molecular identification based on a 658 bp fragment from the mitochondrial DNA cytochrome c oxidase subunit I (COI) was adopted to overcome some problems of morphological identification of Cryptolestes species. The utility of COI sequences as DNA barcodes in discriminating the five Cryptolestes species was evaluated on adults and larvae by analysing Kimura 2-parameter distances, phylogenetic tree and haplotype networks. The results showed that molecular approaches based on DNA barcodes were able to accurately identify these species. This is the first study using DNA barcoding to identify Cryptolestes species and the gathered DNA sequences will complement the biological barcode database.

