Mind the gap-analysis! – How complete are DNA barcode reference libraries for monitoring-relevant aquatic species in Europe?

2021 ◽  
Vol 4 ◽  
Author(s):  
Hannah Weigand

Molecular species identification with DNA metabarcoding can potentially accelerate, streamline and standardise biomonitoring routines. It is currently being tested how this new technique can be implemented for the European Water Framework Directive (WFD) and the European Marine Strategy Framework Directive (MSFD). To connect the results from DNA metabarcoding with the current monitoring routines, an extensive, high-quality DNA barcode reference database is required. Hence, a gap-analysis of the Barcode of Life Data Systems (BOLD) was performed as part of the EU-COST Action DNAqua-Net (Weigand et al. 2019) and updated in 2021. It aimed to analyse the completeness of BOLD for species on the national WFD monitoring lists and for marine species on the ERMS (European Register of Marine Species) and AMBI (AZTI Marine Biotic Index) lists. The data were supplemented by MitoFish for freshwater fish and Diat.barcode for diatoms. Several thousand species were included in the gap-analysis, although not all countries currently apply species-level data for all WFD biological quality elements. The barcode coverage of the different taxonomic groups varied strongly, with high levels (> 80%) for fish and freshwater vascular plants, and low levels (< 15%) for diatoms and freshwater platyhelminths. As a general pattern, species monitored by several countries had a higher coverage than those monitored by only a single country. The gap-analysis additionally focused on the availability of metadata (e.g., geographical origin of the specimen or determiner name) for the barcodes. We therefore analysed whether the data were stored publicly (with access to metadata) or privately (without access to metadata) in BOLD, or whether they were mined from GenBank (metadata are potentially available but not easy to access). Although public data were stored for many species (43% of freshwater macroinvertebrates and 21% of AMBI marine species), the proportion of species without public metadata was not negligible (22% of freshwater macroinvertebrates and 22% of AMBI marine species). Another issue that emerged from the gap-analysis was that several deposited barcodes were identified by reverse taxonomy (RT), i.e., specimens were identified molecularly via their DNA barcodes, and the barcodes themselves were then stored in BOLD with the associated species names. This can be problematic, as originally misidentified samples can lead to false RT identifications, making the data appear more trustworthy than they actually are. For the analysed freshwater macroinvertebrates, 39% of all barcodes and 65% of all public data originated from RT, impacting 11% of all monitored species. As the information about RT is only available for publicly stored data, the real impact of RT might be even higher.
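The coverage figures reported in a gap-analysis of this kind can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `barcode_coverage`, the record layout, and the placeholder species names are all assumptions made for illustration, and the toy data are invented. The sketch counts, per taxonomic group, how many checklist species have at least one barcode and how many of those have no publicly stored record.

```python
from collections import defaultdict


def barcode_coverage(checklist, records):
    """Summarise barcode coverage per taxonomic group.

    checklist: dict mapping species name -> taxonomic group (hypothetical layout)
    records:   iterable of (species name, storage) pairs, where storage is
               "public", "private" or "genbank" (hypothetical labels)
    """
    # Collect the storage types seen for every species with a barcode.
    storages = defaultdict(set)
    for species, storage in records:
        storages[species].add(storage)

    totals = defaultdict(lambda: {"species": 0, "barcoded": 0, "public": 0})
    for species, group in checklist.items():
        counts = totals[group]
        counts["species"] += 1
        if species in storages:
            counts["barcoded"] += 1
            if "public" in storages[species]:
                counts["public"] += 1

    summary = {}
    for group, counts in totals.items():
        summary[group] = {
            **counts,
            # Share of checklist species with at least one barcode.
            "coverage": counts["barcoded"] / counts["species"],
            # Share of checklist species barcoded only privately or via GenBank.
            "no_public_metadata": (counts["barcoded"] - counts["public"])
            / counts["species"],
        }
    return summary


# Invented toy data with placeholder names, for illustration only.
checklist = {
    "fish_sp_A": "fish",
    "fish_sp_B": "fish",
    "diatom_sp_A": "diatoms",
    "diatom_sp_B": "diatoms",
}
records = [
    ("fish_sp_A", "public"),
    ("fish_sp_A", "genbank"),
    ("fish_sp_B", "private"),
    ("diatom_sp_B", "genbank"),
]
stats = barcode_coverage(checklist, records)
for group, values in stats.items():
    print(group, values["coverage"], values["no_public_metadata"])
```

A species counts as having public metadata as soon as one of its records is public, which is one possible reading of the public/private/GenBank distinction described above.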

2019 ◽  
Author(s):  
Hannah Weigand ◽  
Arne J. Beermann ◽  
Fedor Čiampor ◽  
Filipe O. Costa ◽  
Zoltán Csabai ◽  
...  

Effective identification of species using short DNA fragments (DNA barcoding and DNA metabarcoding) requires reliable sequence reference libraries of known taxa. Both taxonomically comprehensive coverage and content quality are important for sufficient accuracy. For aquatic ecosystems in Europe, reliable barcode reference libraries are particularly important if molecular identification tools are to be implemented in biomonitoring and reports in the context of the EU Water Framework Directive (WFD) and the Marine Strategy Framework Directive (MSFD). We analysed gaps in the two most important reference databases, Barcode of Life Data Systems (BOLD) and NCBI GenBank, with a focus on the taxa most frequently used in WFD and MSFD. Our analyses show that coverage varies strongly among taxonomic groups, and among geographic regions. In general, groups that were actively targeted in barcode projects (e.g. fish, true bugs, caddisflies and vascular plants) are well represented in the barcode libraries, while others have fewer records (e.g. marine molluscs, ascidians, and freshwater diatoms). We also found that species monitored in several countries are often represented by barcodes in reference libraries, while species monitored in a single country frequently lack sequence records. A large proportion of species (up to 50%) in several taxonomic groups are only represented by private data in BOLD. Our results have implications for the future strategy to fill existing gaps in barcode libraries, especially if DNA metabarcoding is to be used in the monitoring of European aquatic biota under the WFD and MSFD. For example, missing species relevant to monitoring in multiple countries should be prioritized. We also discuss why a strategy for quality control and quality assurance of barcode reference libraries is needed and recommend future steps to ensure full utilization of metabarcoding in aquatic biomonitoring.


2020 ◽  
Author(s):  
Marie Hoffmann ◽  
Michael T. Monaghan ◽  
Knut Reinert

Motivation: DNA metabarcoding is a commonly applied technique used to infer the species composition of environmental samples. These samples can comprise hundreds of organisms that can be closely or very distantly related in the taxonomic tree of life. DNA metabarcoding combines polymerase chain reaction (PCR) and next-generation sequencing (NGS), whereby a short, homologous sequence of DNA is amplified and sequenced from all members of the community. Sequences are then taxonomically identified based on their match to a reference database. Ideally, each species of interest would have a unique DNA barcode. This short, variable sequence needs to be flanked by relatively conserved regions that can be used as primer binding sites. Appropriate PCR primer pairs would match a broad evolutionary range of taxa, such that only a few are needed to achieve high taxonomic coverage. At the same time, however, the DNA barcodes between primer pairs should differ enough to let us distinguish between species, which improves resolution. This poses an interesting optimization problem. More specifically: given a set of references ℛ = {R1, R2, …, Rm}, the problem is to find a primer set P that balances high taxonomic coverage against high resolution. This goal can be captured by filtering for frequent primers and ranking by coverage or variation, i.e. the number of unique barcodes. Here we present the software PriSeT, an offline primer discovery tool that is capable of processing large libraries and is robust against mislabeled or low-quality references. It tackles the computationally expensive steps with linear runtime filters and efficient encodings. Results: We first evaluated PriSeT on references (mostly 18S rRNA genes) from 19 clades covering eukaryotic organisms that are typical for freshwater plankton samples. PriSeT recovered several published primer sets as well as additional, more chemically suitable primer sets. For these new sets, we compared frequency, taxon coverage, and amplicon variation with published primer sets. For 11 clades we found de novo primer pairs that cover more taxa than the published ones, and for six clades de novo primers resulted in greater sequence (i.e., DNA barcode) variation. We also applied PriSeT to 19 SARS-CoV-2 genomes and computed 114 new primer pairs with the additional constraint that the sequences have no co-occurrences in other taxa. These primer sets would be suitable for empirical testing. Availability: https://github.com/mariehoffmann/[email protected]
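The two ranking criteria named in the abstract, coverage and variation (the number of unique barcodes), can be made concrete with a small Python sketch. This is not PriSeT's algorithm: it uses naive exact string matching rather than linear-time filters, ignores melting temperature and other chemical constraints, and assumes the reverse primer is supplied as its binding site on the forward strand. The function name `evaluate_primer_pair` and the toy sequences are hypothetical.

```python
def evaluate_primer_pair(references, fwd, rev_site):
    """Score one primer pair against a set of reference sequences.

    references: dict mapping taxon name -> reference sequence (forward strand)
    fwd:        forward primer sequence
    rev_site:   reverse primer binding site, written on the forward strand
                (an assumption that avoids reverse-complement handling here)

    Returns (coverage, variation, barcodes): the number of taxa amplified,
    the number of distinct barcodes among them, and the barcodes themselves.
    """
    barcodes = {}
    for taxon, seq in references.items():
        start = seq.find(fwd)
        if start == -1:
            continue  # forward primer does not bind
        inner_start = start + len(fwd)
        end = seq.find(rev_site, inner_start)
        if end == -1:
            continue  # no reverse site downstream of the forward primer
        # The barcode is the region between the two primer sites.
        barcodes[taxon] = seq[inner_start:end]
    coverage = len(barcodes)
    variation = len(set(barcodes.values()))
    return coverage, variation, barcodes


# Invented toy references, for illustration only.
references = {
    "taxon_A": "TTACGTAAAACCGATT",
    "taxon_B": "TTACGTAGAACCGATT",
    "taxon_C": "GGACGTAAAACCGAGG",
    "taxon_D": "TTACCTAAAACCGATT",  # forward primer site is mutated
}
coverage, variation, barcodes = evaluate_primer_pair(references, "ACGT", "CCGA")
print(coverage, variation)
```

In this toy case the pair amplifies three of four taxa, but two of them share the same barcode, so coverage and variation pull in different directions. That tension is the trade-off the abstract describes.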


2021 ◽  
Vol 4 ◽  
Author(s):  
Valeria Specchia ◽  
Francesco Zangaro ◽  
Eftychia Tzafesta ◽  
Maurizio Pinna

DNA metabarcoding is a promising, innovative approach for species identification and ecosystem biomonitoring. The applicability of this tool depends first on the coverage of the DNA sequence reference libraries. We performed a gap analysis of available DNA barcodes in the international databases using the aquatic macroinvertebrate species checklist of the Apulia region in southeast Italy. Our analyses show that 42% of the 1546 examined species do not have representative DNA barcodes in the reference libraries, indicating the importance of working toward their completeness and of directing this effort toward specific taxonomic groups, in particular at the local/regional level. The DNA barcode coverage also varies among taxonomic groups and among aquatic ecosystem types in which a large number of species are rare. We also analyzed the DNA barcode reference libraries for the primer set used to barcode species. Primers were reported for only 52% of the examined barcoded species, indicating the importance of uploading this information to the databases for a more extensive use of DNA metabarcoding. We also highlighted the opportunity to develop combinations of primers useful at the regional level. We tested the application of single-species DNA barcoding to a lagoon ecosystem (the lagoon named “Aquatina di Frigole” in the Apulia region); such ecosystems are richer in humic substances than other aquatic environments, and DNA metabarcoding remains underexplored in them.


Author(s):  
Nicole Foster ◽  
Kor-jent Dijk ◽  
Ed Biffin ◽  
Jennifer Young ◽  
Vicki Thomson ◽  
...  

The proliferation of environmental DNA (eDNA) research has increased the reliance on reference sequence databases to assign unknown DNA sequences to known taxa. Without comprehensive reference databases, DNA extracted from environmental samples cannot be correctly assigned to taxa, limiting the use of this genetic information to identify organisms in unknown sample mixtures. For animals, standard metabarcoding practices involve amplification of the mitochondrial cytochrome c oxidase subunit 1 (CO1) region, which is amplifiable across the majority of animal taxa. This region, however, does not work well as a DNA barcode for plants and fungi, and there is no similar universal single barcode locus with the same species resolution. Generating reference sequences has therefore been more difficult, and several loci have been suggested for use in parallel to reach species-level identification. For this reason, we developed a multi-gene targeted capture approach to generate reference DNA sequences for plant taxa across 20 target chloroplast gene regions in a single assay. We successfully compiled a reference database for 93 temperate coastal plants including seagrasses, mangroves, and saltmarshes/samphires. We demonstrate the importance of a comprehensive reference database to prevent species going undetected in eDNA studies. We also investigate how using multiple chloroplast gene regions impacts the ability to discriminate between taxa.


2021 ◽  
Vol 4 ◽  
Author(s):  
Frederic Rimet ◽  
Teofana Chonova ◽  
Gilles Gassiole ◽  
Maria Kahlert ◽  
François Keck ◽  
...  

Diatoms (Bacillariophyta) are ubiquitous microalgae with a huge taxonomic diversity that changes with environmental conditions. This makes them excellent ecological indicators for various ecosystems and ecological questions (ecotoxicology, biomonitoring, paleo-environmental reconstruction, etc.). Current standardized methodologies for diatoms are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. DNA metabarcoding has been proposed as a way to avoid these flaws, enabling the sequencing of a large quantity of barcodes from natural samples. A taxonomic identity is given to these barcodes by comparing their sequences to a barcoding reference library. However, to identify environmental sequences correctly, the reference database should contain a representative number of reference sequences to ensure a good coverage of diatom diversity. Moreover, the reference database needs to be carefully taxonomically curated by experts, as its content has an obvious impact on species detection. Diat.barcode is an open-access library for diatoms linking diatom taxonomic identities to rbcL barcode sequences (a chloroplast marker suitable for species-level identification of diatoms), which has been maintained since 2012. Data are accumulated from three sources: (1) the NCBI nucleotide database, (2) unpublished sequencing data of culture collections and, more recently, (3) environmental sequences. Since 2017, an international network of experts in diatom taxonomy has curated this library. The latest version of the database (version 9.2) includes 8066 entries that correspond to more than 280 different genera and 1490 different species. In addition to the taxonomic information, morphological features (e.g. biovolumes, chloroplasts, etc.), life-forms (mobility, colony-type) and ecological features (taxa preferences to pollution) are given. 
The database can be downloaded from the website (www6.inrae.fr/carrtel-collection/Barcoding-database/) or directly through the R package diatbarcode. Ready-to-use files for commonly used metabarcoding pipelines (Mothur and DADA2) are also available.


2021 ◽  
Vol 4 ◽  
Author(s):  
Andreia Mortágua ◽  
Marco Teixeira ◽  
Manuela Sales ◽  
Maria Feio ◽  
Salomé Almeida

The European Water Framework Directive (2000/60/EC) includes biological assessment of water bodies, which has been implemented for many years. Indicator organisms such as diatoms respond to geological and hydrological features of rivers by modifying their structure. Therefore, when implementing the WFD, it was necessary to establish type-specific reference conditions to be able to measure the deviations of sampled communities due to anthropogenic impact. HTS-related eDNA metabarcoding has been developed to complement or even replace traditional approaches because of its rapid, low-cost and highly accurate identification of communities for assessment of rivers’ ecological status (e.g. Mortágua et al., 2019; Pérez-Burillo et al., 2020), and has proved to provide even more in-depth information about biological elements. The use of this information without assignment to species is being addressed, since it eliminates the limiting factor of reference database incompleteness and may provide new ecological information (e.g. Feio et al., 2020; Rivera et al., 2020). Since the WFD requires the establishment of reference conditions for each water body type, implementing eDNA methods will make it essential to review, confirm or reformulate, and perhaps create new typologies. Accordingly, the aim of this study is to analyze diatom communities from different typologies of Portuguese rivers based on DNA metabarcoding data and to compare them with the current typology system. To do so, we will verify the consistency of the biological groups included in each type, validate the molecular data, and analyze the correspondence of OTU/ISU/ESV to the environmental characteristics of rivers. A total of 154 sampling sites were selected from central Portugal and northern Portugal in 2017 and 2019. The biofilm was collected for morphological identification and DNA sequencing of diatoms. Reference sites were selected for 4 river types (mountain, littoral, small and medium-large northern rivers) based on a set of pressure information (water quality, hydromorphology, land use and riparian zones). Diatom inventories were obtained from molecular and morphological analysis. DNA sequences were processed with the Mothur software, using two bioinformatic strategies to obtain the final ISU and OTU tables, while ESVs were processed with the DADA2 package in R. For the morphological approach, diatom valves were identified and counted under a light microscope. We expect the results to validate the molecular data for each typology, with or without assignment to species, and to show whether it is necessary to establish new typologies for future use of the molecular approach in ecological assessment of rivers.
Directive, W. F. (2000). Water Framework Directive. Journal reference OJ L, 327, 1-73.
Feio, M. J., Serra, S. R., Mortágua, A., Bouchez, A., Rimet, F., Vasselon, V., & Almeida, S. F. P. (2020). A taxonomy-free approach based on machine learning to assess the quality of rivers with diatoms. Science of the Total Environment, 722, 137900. https://doi.org/10.1016/j.scitotenv.2020.137900
Mortágua, A., Vasselon, V., Oliveira, R., Elias, C., Chardon, C., Bouchez, A., ... & Almeida, S. F. P. (2019). Applicability of DNA metabarcoding approach in the bioassessment of Portuguese rivers using diatoms. Ecological Indicators, 106, 105470. https://doi.org/10.1016/j.ecolind.2019.105470
Pérez-Burillo, J., Trobajo, R., Vasselon, V., Rimet, F., Bouchez, A., & Mann, D. G. (2020). Evaluation and sensitivity analysis of diatom DNA metabarcoding for WFD bioassessment of Mediterranean rivers. Science of the Total Environment, 727, 138445. https://doi.org/10.1016/j.scitotenv.2020.138445
Rivera, S. F., Vasselon, V., Bouchez, A., & Rimet, F. (2020). Diatom metabarcoding applied to large scale monitoring networks: Optimization of bioinformatics strategies using Mothur software. Ecological Indicators, 109, 105775. https://doi.org/10.1016/j.ecolind.2019.105775


Author(s):  
Aija Staffans ◽  
Heli Rantanen ◽  
Pilvi Nummi

The Internet is shaking up the expertise and production of knowledge in the planning institution. Digital citizens are searching for information from different places, combining formal and informal sources without apology, and are debating and speaking out on matters. Public planning organisations will be fully stretched to adapt their practices and services to meet these demands. This chapter will present the research results of a project that embarked on gathering and combining local information and knowledge on urban planning on Internet forums. Interactive applications were also developed for these forums to support public participation in ongoing land use planning and development projects in the City of Espoo, Finland. The research results demonstrate how fragmented local, place-based knowledge is, how difficult it is to combine informal and formal information in urban planning, and how inaccessible public data systems still are.


Genome ◽  
2019 ◽  
Vol 62 (3) ◽  
pp. 160-169 ◽  
Author(s):  
Wieland Meyer ◽  
Laszlo Irinyi ◽  
Minh Thuy Vi Hoang ◽  
Vincent Robert ◽  
Dea Garcia-Hermoso ◽  
...  

With new or emerging fungal infections, human and animal fungal pathogens are a growing threat worldwide. Current diagnostic tools are slow, non-specific at the species and subspecies levels, and require specific morphological expertise to accurately identify pathogens from pure cultures. DNA barcodes are easily amplified, universal, short species-specific DNA sequences, which enable rapid identification by comparison with a well-curated reference sequence collection. The primary fungal DNA barcode, the ITS region, was introduced in 2012 and is now routinely used in diagnostic laboratories. However, the ITS region only accurately identifies around 75% of all medically relevant fungal species, which has prompted the development of a secondary barcode to increase the resolution power and suitability of DNA barcoding for fungal disease diagnostics. The translational elongation factor 1α (TEF1α) was selected in 2015 as a secondary fungal DNA barcode, but it has not been implemented in practice because of the absence of a reference database. Here, we have established a quality-controlled reference database for the secondary barcode that, together with the ISHAM-ITS database, forms the ISHAM barcode database, available online at http://its.mycologylab.org/ . We encourage the mycology community to contribute actively.


Author(s):  
Christer Erséus ◽  
Sebastian Kvist

Intra- and interspecific variation in a 658 bp long part of the cytochrome c oxidase subunit 1 (COI) gene of the mitochondrial genome, i.e. a suggested ‘DNA barcode’, was assessed in four north-west European species of the marine tubificid genus Tubificoides: T. benedii, T. amplivasatus, T. heterochaetus and T. kozloffi. Mean within-species genetic distance ranged from 0.10% (T. amplivasatus) to 0.14% (T. benedii), and between-species distance from 19.3% to 22.9%. For T. benedii and T. amplivasatus, material collected in two separate areas, The Sound between Denmark and Sweden and the Koster area about 330 km to the north along the Swedish west coast, showed a geographically random distribution of COI haplotypes, suggesting that each of these two species forms a continuous population in southern Scandinavia. We conclude that the COI gene is suitable as a barcode marker for the secure identification of these species, at least within the area investigated. Tubificoides heterochaetus is reported for the first time from Denmark.
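The contrast between within-species and between-species distances can be sketched in Python as follows. The abstract does not state which distance model was used, so the uncorrected p-distance (proportion of differing sites between two aligned sequences) is an assumption here. The helper names and the short toy sequences are hypothetical, and real COI barcodes would be about 658 bp long.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def mean_distances(samples):
    """Return (mean within-species, mean between-species) p-distance.

    samples: list of (species name, aligned sequence) pairs
    """
    within, between = [], []
    for (sp_1, seq_1), (sp_2, seq_2) in combinations(samples, 2):
        target = within if sp_1 == sp_2 else between
        target.append(p_distance(seq_1, seq_2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Invented 10 bp toy sequences with placeholder species names.
samples = [
    ("species_1", "AAAAAAAAAA"),
    ("species_1", "AAAAAAAAAT"),
    ("species_2", "CCCCAAAAAA"),
    ("species_2", "CCCCAAAAAA"),
]
mean_within, mean_between = mean_distances(samples)
print(f"within: {mean_within:.2%}, between: {mean_between:.2%}")
```

A large gap between the two means, as reported for Tubificoides, is what makes a marker usable for identification.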


2016 ◽  
Author(s):  
Jonathan A Coddington ◽  
Ingi Agnarsson ◽  
Ren-Chung Cheng ◽  
Klemen Čandek ◽  
Amy Driskell ◽  
...  

The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best-case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families, taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level identifications. We ran BLAST queries of each sequence against the entire library and retained the top ten hits for each, resulting in 8160 hits. The percent sequence identity was reported from these hits (PIdent, range 75-100%). Accurate identification (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values > 95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all identifications were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades.
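The heuristic thresholds reported above (genus at PIdent > 95, family at PIdent ≥ 91) translate into a simple decision rule. The Python sketch below is only an illustration of that rule, not the authors' code. The function name `assign_rank` and the placeholder taxon names are hypothetical, and the thresholds were derived for spiders and may not transfer to other groups.

```python
# Heuristic thresholds as reported in the abstract (derived for spiders).
GENUS_THRESHOLD = 95.0   # genus-level assignment requires PIdent > 95
FAMILY_THRESHOLD = 91.0  # family-level assignment requires PIdent >= 91


def assign_rank(pident, hit_genus, hit_family):
    """Assign a query to the genus or family of its best hit.

    pident:     percent sequence identity of the best hit (0-100)
    hit_genus:  genus of the best-hit reference sequence
    hit_family: family of the best-hit reference sequence

    Returns a (rank, taxon) tuple; taxon is None when no rank is assigned.
    """
    if pident > GENUS_THRESHOLD:
        return ("genus", hit_genus)
    if pident >= FAMILY_THRESHOLD:
        return ("family", hit_family)
    return ("unassigned", None)


# Placeholder taxon names, for illustration only.
for identity in (97.2, 95.0, 91.0, 90.9):
    print(identity, assign_rank(identity, "Genus_X", "Family_Y"))
```

Note the asymmetry at the boundaries: a hit at exactly 95 falls back to family level, while a hit at exactly 91 still qualifies for family-level assignment.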

