DNA barcodes for UK freshwater arthropod species: coverage, quality and implications

2021 ◽  
Vol 4 ◽  
Author(s):  
Liz Davidson

DNA-based identification methods have been shown to offer high detection capability at lower cost than traditional methods, and they can detect species that traditional methods might miss (e.g. rare species, cryptic species, larval stages). The success of DNA-based identification depends on the ‘DNA barcodes’ of target species being present in a barcode reference database. To use DNA-based identification methods to assess and monitor UK freshwater arthropods for biodiversity and ecological quality assessments, comprehensive reference databases must be available. Incomplete reference databases leave many sequences derived from metabarcoding unassigned to species. Two current projects aim to create collections of high-quality sequences from expertly identified specimens of UK species: the Darwin Tree of Life project aims to sequence the genomes of all eukaryotic species in Britain and Ireland, and FreshBase aims to create a genomic reference collection for UK freshwater invertebrates. The Barcode of Life Data System (BOLD) is one of the main reference databases for animal barcodes. Prioritising the sequencing of UK freshwater arthropod species that are not yet represented in BOLD would enable more complete identification of UK freshwater biodiversity using metabarcoding and would enable the development of primers to target specific arthropod groups or species. We analysed the coverage of UK freshwater arthropod species in BOLD. Our analyses show that coverage varies between taxonomic groups and that large proportions of the sequences in some orders are stored only privately in BOLD. Analyses of intra- and inter-specific variation in sequences stored in BOLD show that misidentifications or errors can reduce the barcode gap in some species, which could make it difficult to accurately identify sequences derived from metabarcoding. Representation in BOLD by specimens from the UK is extremely low, and analyses show that high geographic variation in sequences in some species could be important for accurate DNA-based identification of UK species. Our results have implications for prioritising the sequencing of UK freshwater arthropods and for the quality control of stored sequences, in order to reduce the occurrence of misidentifications and errors that could impact the accuracy of DNA-based identification.
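
The effect of a misidentified record on the barcode gap can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and uses a simple p-distance (proportion of differing sites); the function names, the species labels and the toy sequences are hypothetical and are not taken from the study or from BOLD. The gap is taken here as the minimum inter-specific distance minus the maximum intra-specific distance.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records, species):
    """Return (max_intra, min_inter, gap) for one species.

    records: list of (species_label, aligned_sequence) tuples.
    gap = min_inter - max_intra; a value at or below zero means no barcode gap.
    """
    own = [seq for label, seq in records if label == species]
    others = [seq for label, seq in records if label != species]
    if not own or not others:
        raise ValueError("need at least one sequence inside and outside the species")
    # A species with a single sequence has no intra-specific pairs: use 0.0.
    max_intra = max((p_distance(a, b) for a, b in combinations(own, 2)), default=0.0)
    min_inter = min(p_distance(a, b) for a in own for b in others)
    return max_intra, min_inter, min_inter - max_intra


# Toy, made-up aligned fragments (not real BOLD records).
clean = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "TTGTACGAAC"),
]
# The same data plus one species_B sequence mislabelled as species_A.
with_error = clean + [("species_A", "TTGTACGAAC")]

gap_clean = barcode_gap(clean, "species_A")
gap_error = barcode_gap(with_error, "species_A")
print("clean:", gap_clean)
print("with mislabelled record:", gap_error)
```

In this toy case a single mislabelled record raises the maximum intra-specific distance and drops the minimum inter-specific distance to zero, so the gap becomes negative. Real analyses would normally use an alignment step and a substitution model rather than raw p-distances.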

2021 ◽  
Vol 4 ◽  
Author(s):  
François Keck ◽  
Florian Altermatt

Reference databases of taxonomically assigned sequences are a key element of DNA-based identification of organisms. Accurate and complete reference databases are necessary to associate a correct taxonomic name with the sequences obtained in metabarcoding studies. Today, many research projects using DNA metabarcoding include the development of a custom reference database, often derived from large repositories like GenBank. At the same time, many projects are focussing on the development of ready-to-use databases validated by experts and targeting specific markers and taxonomic groups. While mainstream tools such as spreadsheet software may be suitable for managing small databases, they quickly become insufficient as the amount of data increases and validation operations become more complex. There is a clear need for user-friendly and powerful tools to manipulate biological sequences and manage reference databases. The R language, which is free software and has already been adopted by many researchers for their analyses, is highly suitable for developing such tools. In this talk, we will outline the approach we recommend for handling small- to middle-sized reference databases, which currently still account for the majority of projects. We will argue that a simple tabular approach, in which each sequence constitutes an observation, may be the most adequate. While such a single table may be less flexible and less optimized than relational databases or more complex data structures, it is easy to maintain and allows the direct use of modern dataframe-centric tools. We will specifically present and discuss two R packages that can be used jointly to make reference database development more accessible and more reproducible. First, we will briefly introduce bioseq (Keck 2020), which is dedicated to biological sequence manipulation and analysis. The package implements classes and functions to make analyses of complex datasets including DNA, RNA or protein sequences as simple as possible. The strength of bioseq is that it provides standard and more advanced functions for low-level operations through a simple and consistent programming interface. Then we will present refdb, which has been developed as an environment for semi-automatic and assisted construction of reference databases. The refdb package is a reference database manager offering a set of powerful functions to import, organize, clean, filter, audit and export the data. We will outline how these two packages together can speed up reference database generation and handling, and contribute to standardization and repeatability in metabarcoding studies.
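
The single-table idea, with one row per sequence, can be sketched in a few lines. The example below is written in Python with the standard library only and is not the refdb or bioseq API; the column names, the function names, the 10% ambiguity threshold and the toy records are all assumptions made for illustration. It shows three of the operations named above (filter, audit, export) applied to the same flat table.

```python
from collections import defaultdict

# One row per sequence; the column names and records are hypothetical toy data.
table = [
    {"id": "r1", "species": "Gammarus pulex", "marker": "COI", "sequence": "ACGTACGTAC"},
    {"id": "r2", "species": "Gammarus pulex", "marker": "COI", "sequence": "ACGTNNNNNN"},
    {"id": "r3", "species": "Asellus aquaticus", "marker": "COI", "sequence": "ACGTACGTAC"},
    {"id": "r4", "species": "Baetis rhodani", "marker": "COI", "sequence": "TTGTACGAAC"},
]


def filter_ambiguous(rows, max_fraction=0.1):
    """Keep rows whose share of non-ACGT characters is at most max_fraction."""
    kept = []
    for row in rows:
        seq = row["sequence"].upper()
        ambiguous = sum(base not in "ACGT" for base in seq)
        if seq and ambiguous / len(seq) <= max_fraction:
            kept.append(row)
    return kept


def audit_conflicts(rows):
    """Map each sequence shared by more than one species to the sorted species names."""
    species_by_seq = defaultdict(set)
    for row in rows:
        species_by_seq[row["sequence"].upper()].add(row["species"])
    return {seq: sorted(names) for seq, names in species_by_seq.items() if len(names) > 1}


def to_fasta(rows):
    """Export rows as FASTA text with the id and species in the header line."""
    return "".join(f">{row['id']} {row['species']}\n{row['sequence']}\n" for row in rows)


cleaned = filter_ambiguous(table)
conflicts = audit_conflicts(cleaned)
fasta = to_fasta(cleaned)
print(conflicts)
print(fasta)
```

Because every operation takes a list of rows and returns either rows or a report, the steps can be chained in any order, which is the practical appeal of the flat layout over a relational schema for small databases.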


2021 ◽  
Vol 4 ◽  
Author(s):  
Cristina Claver ◽  
Oriol Canals ◽  
Naiara Rodriguez-Ezpeleta

Environmental DNA (eDNA) metabarcoding, the process of sequencing DNA collected from the environment to produce biodiversity inventories, is increasingly being applied to assess fish diversity and distribution in marine environments. Yet the successful application of this technique relies heavily on accurate and complete reference databases for taxonomic assignment. The most used markers for fish eDNA metabarcoding studies are the cytochrome c oxidase subunit 1 (COI), 16S ribosomal RNA (16S), 12S ribosomal RNA (12S) and cytochrome b (cyt b) genes, whose sequences are usually retrieved from GenBank, the largest DNA sequence database and a worldwide public resource for genetic studies. Thus, the completeness and accuracy of GenBank are critical for deriving reliable estimates from fish eDNA metabarcoding data. Here, we have i) compiled the checklist of European marine fishes, ii) performed a gap analysis of the four genes and, within COI and 12S, also of the most used barcodes for fish, and iii) developed a workflow to detect potentially incorrect records in GenBank. We found that of the 1965 species in the checklist (1761 Actinopterygii, 189 Elasmobranchii, 9 Holocephali, 4 Petromyzonti and 2 Myxini), about 70% have sequences for COI, whereas fewer have sequences for 12S, 16S and cyt b (45-55%). Among the species for which COI and 12S sequences are available, about 60% and 40%, respectively, have sequences covering the most used barcodes. The analysis of pairwise distances between sequences revealed pairs belonging to the same species with markedly low similarity and pairs belonging to different high-level taxonomic groups (class, order) with markedly high similarity. In light of this further confirmation that GenBank contains a substantial number of incorrect records, we propose a method for identifying and removing spurious sequences to create reliable and accurate reference databases for eDNA metabarcoding.
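
A gap analysis of this kind reduces to asking, for each marker, which checklist species have at least one reference sequence. The following Python sketch shows that calculation under simple assumptions: records are (species, marker) pairs with names already harmonised to the checklist, and the function name, the four-species checklist and the records are invented for illustration rather than taken from GenBank or from the study.

```python
def marker_coverage(checklist, records, markers):
    """Return {marker: (fraction_covered, missing_species)} for checklist species.

    checklist: iterable of species names.
    records: iterable of (species, marker) pairs, one per reference sequence.
    """
    checklist = set(checklist)
    if not checklist:
        raise ValueError("checklist must not be empty")
    covered = {marker: set() for marker in markers}
    for species, marker in records:
        # Records for species outside the checklist or for other markers are ignored.
        if species in checklist and marker in covered:
            covered[marker].add(species)
    return {
        marker: (len(found) / len(checklist), sorted(checklist - found))
        for marker, found in covered.items()
    }


# Toy checklist and records, invented for illustration (not real GenBank content).
checklist = ["Gadus morhua", "Solea solea", "Scomber scombrus", "Raja clavata"]
records = [
    ("Gadus morhua", "COI"),
    ("Gadus morhua", "12S"),
    ("Solea solea", "COI"),
    ("Scomber scombrus", "COI"),
    ("Scomber scombrus", "16S"),
    ("Thunnus albacares", "COI"),  # not on the toy checklist
]
markers = ["COI", "12S", "16S", "cyt b"]

coverage = marker_coverage(checklist, records, markers)
for marker in markers:
    fraction, missing = coverage[marker]
    print(f"{marker}: {fraction:.0%} covered, missing: {missing}")
```

Returning the list of missing species alongside the fraction makes the output directly usable as a sequencing priority list.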


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1736
Author(s):  
Zengchong Yang ◽  
Xiucheng Liu ◽  
Bin Wu ◽  
Ren Liu

Previous studies on the Lamb wave touchscreen (LWT) assumed that an unknown touch had parameters consistent with the acoustic fingerprints in the reference database. This study investigated, for the first time, the adaptability of the LWT to variations in touch force and touch area. The databases of acoustic fingerprints were collected automatically with an experimental LWT prototype employing three transmitter–receiver pairs. A self-adaptively updated weight coefficient for the transmitter–receiver pairs in use improved the accuracy of the localization model, which was established with a learning method. The performance of the improved method in locating single- and two-touch actions with reference databases of different parameters was carefully evaluated. The robustness of the LWT to variation in touch force varied with the touch area. Moreover, it was feasible to locate large-area touch actions with reference databases of small touch areas, as long as the unknown touch and the reference databases met the condition of equivalent averaged stress.


2021 ◽  
Vol 168 (6) ◽  
Author(s):  
Ann Bucklin ◽  
Katja T. C. A. Peijnenburg ◽  
Ksenia N. Kosobokova ◽  
Todd D. O’Brien ◽  
Leocadio Blanco-Bercial ◽  
...  

Characterization of species diversity of zooplankton is key to understanding, assessing, and predicting the function and future of pelagic ecosystems throughout the global ocean. The marine zooplankton assemblage, including only metazoans, is highly diverse and taxonomically complex, with an estimated ~28,000 species of 41 major taxonomic groups. This review provides a comprehensive summary of DNA sequences for the barcode region of mitochondrial cytochrome oxidase I (COI) for identified specimens. The foundation of this summary is the MetaZooGene Barcode Atlas and Database (MZGdb), a new open-access data and metadata portal that is linked to NCBI GenBank and BOLD data repositories. The MZGdb provides enhanced quality control and tools for assembling COI reference sequence databases that are specific to selected taxonomic groups and/or ocean regions, with associated metadata (e.g., collection georeferencing, verification of species identification, molecular protocols), and tools for statistical analysis, mapping, and visualization. To date, over 150,000 COI sequences for ~5600 described species of marine metazoan plankton (including holo- and meroplankton) are available via the MZGdb portal. This review uses the MZGdb as a resource for summaries of COI barcode data and metadata for important taxonomic groups of marine zooplankton and selected regions, including the North Atlantic, Arctic, North Pacific, and Southern Oceans. The MZGdb is designed to provide a foundation for analysis of species diversity of marine zooplankton based on DNA barcoding and metabarcoding for assessment of marine ecosystems and rapid detection of the impacts of climate change.


Author(s):  
Hatice Çiğdem ZAĞRA ◽  
Sibel ÖZDEN

Aim: This study comparatively evaluates the potential of orthophoto images obtained by terrestrial laser scanning at an urban scale, using the "Old Lapseki Finds Life Project", prepared with terrestrial laser scanning technologies, and the "Enez Historical City Square Project", prepared with traditional methods. Method: Street improvement projects of 29.210 m2 in Lapseki and 29.214 m2 in Enez, both designed at an urban scale, were evaluated and compared using descriptive statistics based on different parameters. Results: Terrestrial laser (point cloud) technologies were found to be 99,9% accurate compared to traditional methods, to save 83,08% of the time and to reduce the workforce by 80%. In addition, terrestrial laser scanning technologies were found to accelerate project processes compared to traditional methods. Conclusion: This study evaluates the efficiency, in survey work, of laser scanning technologies, which are essentially reverse engineering applications, for architectural restoration projects, determination of the current condition and damage, architectural documentation of structures and preparation of three-dimensional models. Orthophoto images obtained by terrestrial laser scanning in architectural relief-restoration-restitution projects were observed to have potential worth using at different stages of a project.


2017 ◽  
Author(s):  
Zhemin Zhou ◽  
Nina Luhmann ◽  
Nabil-Fareed Alikhan ◽  
Christopher Quince ◽  
Mark Achtman

Exploring the genetic diversity of microbes within the environment through metagenomic sequencing first requires classifying the sequencing reads into taxonomic groups. Current methods compare these sequencing data with existing biased and limited reference databases. Several recent evaluation studies demonstrate that current methods either lack sufficient sensitivity for species-level assignments or suffer from false positives, overestimating the number of species in the metagenome. Both are especially problematic for the identification of low-abundance microbial species, e.g. detecting pathogens in ancient metagenomic samples. We present a new method, SPARSE, which improves taxonomic assignments of metagenomic reads. SPARSE balances existing biased reference databases by grouping reference genomes into similarity-based hierarchical clusters, implemented as an efficient incremental data structure. SPARSE assigns reads to these clusters using a probabilistic model, which specifically penalizes non-specific mappings of reads from unknown sources and hence reduces false-positive assignments. Our evaluation on simulated datasets from two recent evaluation studies demonstrated the improved precision of SPARSE in comparison to other methods for species-level classification. In a third simulation, our method successfully differentiated multiple co-existing Escherichia coli strains from the same sample. In real archaeological datasets, SPARSE identified ancient pathogens with ≤ 0.02% abundance, consistent with published findings that required additional sequencing data. In these datasets, other methods either missed targeted pathogens or reported non-existent ones. SPARSE and all evaluation scripts are available at https://github.com/zheminzhou/SPARSE.


Author(s):  
Nicole Foster ◽  
Kor-jent Dijk ◽  
Ed Biffin ◽  
Jennifer Young ◽  
Vicki Thomson ◽  
...  

A proliferation of environmental DNA (eDNA) research has increased the reliance on reference sequence databases to assign unknown DNA sequences to known taxa. Without comprehensive reference databases, DNA extracted from environmental samples cannot be correctly assigned to taxa, limiting the use of this genetic information to identify organisms in unknown sample mixtures. For animals, standard metabarcoding practices involve amplification of the mitochondrial cytochrome c oxidase subunit 1 (CO1) region, which is universally amplifiable across the majority of animal taxa. This region, however, does not work well as a DNA barcode for plants and fungi, and there is no similar universal single barcode locus with the same species resolution. Generating reference sequences has therefore been more difficult, and several loci have been suggested for use in parallel to reach species-level identification. For this reason, we developed a multi-gene targeted capture approach to generate reference DNA sequences for plant taxa across 20 target chloroplast gene regions in a single assay. We successfully compiled a reference database for 93 temperate coastal plants including seagrasses, mangroves, and saltmarshes/samphires. We demonstrate the importance of a comprehensive reference database in preventing species from going undetected in eDNA studies. We also investigate how using multiple chloroplast gene regions affects the ability to discriminate between taxa.


NeoBiota ◽  
2021 ◽  
Vol 70 ◽  
pp. 151-165
Author(s):  
Francesco Zangaro ◽  
Benedetta Saccomanno ◽  
Eftychia Tzafesta ◽  
Fabio Bozzeda ◽  
Valeria Specchia ◽  
...  

The biodiversity of the Mediterranean Sea is currently threatened by the introduction of Non-Indigenous Species (NIS). Monitoring the distribution of NIS is therefore of utmost importance for preserving its ecosystems. A promising approach for the identification of species and the assessment of biodiversity is the use of DNA barcoding, as well as DNA and eDNA metabarcoding. Currently, the main limitation in the use of genomic data for species identification is the incompleteness of the DNA barcode databases. In this research, we assessed the availability of DNA barcodes in the main reference libraries for the most up-to-date inventory of 665 confirmed NIS in the Mediterranean Sea, with a special focus on the cytochrome oxidase I (COI) barcode and primers. The results of this study show that there are no barcodes for 33.18% of the species in question, and that 45.30% of the 382 species with a COI barcode have no primers publicly available. This highlights the importance of directing scientific efforts to fill the barcode gap for specific taxonomic groups in order to support the effective application of the eDNA technique for investigating the occurrence and distribution of NIS in the Mediterranean Sea.


Zootaxa ◽  
2008 ◽  
Vol 1691 (1) ◽  
pp. 67 ◽  
Author(s):  
M. ALEX SMITH

The 5' end (Folmer or Barcode region) of cytochrome c oxidase 1 (CO1) has been proposed as the gene region of choice for a standardized animal DNA barcode (Hebert et al. 2003). Concerns have been raised regarding the decision to utilize this particular mitochondrial gene region as a barcode. Nevertheless, widely divergent taxonomic groups have reported success using CO1 for both species identification and discovery. The utility of CO1 for barcoding amphibians was questioned early on (Vences et al. 2005) and concerns for this group were reported widely (Waugh 2007), although some considered that the reporting of the concerns outstripped the data that had been analyzed at that point (Smith et al. 2008). Indeed, CO1 data for a small group of Holarctic amphibians were no more difficult to generate or to analyze than for other groups where we have utilized the technique.


2020 ◽  
Vol 11 ◽  
Author(s):  
Paul E. Smith ◽  
Sinead M. Waters ◽  
Ruth Gómez Expósito ◽  
Hauke Smidt ◽  
Ciara A. Carberry ◽  
...  

Our understanding of complex microbial communities, such as those residing in the rumen, has drastically advanced through the use of high throughput sequencing (HTS) technologies. Indeed, with the use of barcoded amplicon sequencing, it is now cost effective and computationally feasible to identify individual rumen microbial genera associated with ruminant livestock nutrition, genetics, performance and greenhouse gas production. However, across all disciplines of microbial ecology, there is currently little reporting of the use of internal controls for validating HTS results. Furthermore, there is little consensus on the most appropriate reference database for analyzing rumen microbiota amplicon sequencing data. Therefore, in this study, a synthetic rumen-specific sequencing standard was used to assess the effects of database choice on results obtained from rumen microbial amplicon sequencing. Four DADA2 reference training sets (RDP, SILVA, GTDB, and RefSeq + RDP) were compared to assess their ability to correctly classify sequences included in the rumen-specific sequencing standard. In addition, two thresholds of phylogenetic bootstrapping, 50 and 80, were applied to investigate the effect of increasing stringency. Sequence classification differences were apparent amongst the databases. For example, the classification of Clostridium differed between all databases, highlighting the need for a consistent approach to nomenclature amongst different reference databases. It is hoped that the effect of database choice on taxonomic classification observed in this study will encourage research groups across various microbial disciplines to develop and routinely use their own microbiome-specific reference standard to validate analysis pipelines and database choice.
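
The effect of raising the bootstrap threshold from 50 to 80 can be illustrated with a small Python sketch. It shows the general idea of a minimum-bootstrap cut-off, in which ranks whose support falls below the threshold are left unassigned, and it is not DADA2's implementation. The function name, the lineage and the support values are hypothetical and were chosen only to make the two thresholds give different results.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def apply_min_boot(taxa, bootstraps, min_boot):
    """Blank out every rank from the first one whose bootstrap is below min_boot."""
    assigned = []
    truncated = False
    for name, support in zip(taxa, bootstraps):
        if support < min_boot:
            truncated = True
        # Once one rank fails, all finer ranks are also left unassigned.
        assigned.append(None if truncated else name)
    return assigned


# Hypothetical classifier output for one sequence; the support values are invented.
taxa = ["Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Clostridiaceae", "Clostridium"]
bootstraps = [100, 98, 95, 90, 72, 61]

at_50 = apply_min_boot(taxa, bootstraps, 50)
at_80 = apply_min_boot(taxa, bootstraps, 80)
print(dict(zip(RANKS, at_50)))
print(dict(zip(RANKS, at_80)))
```

With these invented values the sequence keeps its genus-level name at a threshold of 50 but is resolved only to order at 80, which is the kind of trade-off between resolution and stringency that a sequencing standard of known composition makes it possible to measure.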

