Development and validation of DNA metabarcoding COI primers for aquatic invertebrates using the R package "PrimerMiner"

Mock Community ◽

Freshwater Invertebrate ◽

Dna Metabarcoding ◽

High Base

1) DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. However, typically used barcoding genes, such as the cytochrome c oxidase subunit I (COI) region for animals, are highly variable. Thus, different taxa in communities under study are often not amplified equally well, and some might even remain undetected due to primer bias. To reduce these problems, optimized metabarcoding primers for the typical communities found in certain geographic regions and/or ecosystems are necessary. 2) We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from the BOLD and NCBI databases for specified target taxonomic groups and then applies sequence clustering to reduce biases introduced by the different number of available sequences per species. We downloaded COI data for the 15 most relevant freshwater invertebrate groups for stream ecosystem assessment and, based on these data, developed four primer sets with high base degeneracy. Primer performance was tested by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, we used PrimerMiner to evaluate the developed primers against other metabarcoding primers in silico. 3) The developed primers varied in amplification efficiency and the number of detected taxa, yet all retrieved more taxa than standard Folmer barcoding primers. Additionally, the BF/BR primers amplified taxa very consistently, with the BF2+BR2 and BF2+BR1 primer combinations showing better amplification than a previously tested ribosomal marker (16S). Except for BF1+BR1, all BF/BR primer combinations detected all 42 insect taxa present in the mock community samples. In silico evaluation of the developed primers demonstrates their suitability for metabarcoding of non-aquatic insect samples. 4) With PrimerMiner we provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation.
Our sequence datasets generated with the newly developed metabarcoding primers demonstrate that the design of optimized primers with high base degeneracy is superior to classical markers and enables us to detect almost 100% of animal taxa present in a sample using the standard COI barcoding gene. Therefore, the PrimerMiner package and the developed primers are useful beyond biodiversity assessment in aquatic ecosystems.
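To make "high base degeneracy" concrete, the following minimal Python sketch counts how many distinct oligonucleotides a degenerate primer encodes. The function name `degeneracy` and the example primer are hypothetical and are not taken from PrimerMiner or from the BF/BR primer set; only the standard IUPAC nucleotide codes are assumed.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer.

    Each position contributes a factor equal to the number of bases its
    IUPAC code allows, so the total is the product over all positions.
    Codes outside the table (for example inosine, "I") raise KeyError.
    """
    return prod(len(IUPAC[code]) for code in primer.upper())


if __name__ == "__main__":
    # Illustrative primer only: N (4 bases) x R (2) x Y (2) = 16 variants.
    print(degeneracy("ACNGGRTAYC"))
```

A higher value means the primer mix covers more binding-site variants, at the cost of each individual oligonucleotide being present at a lower concentration.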

PrimerMiner: an R package for development and in silico validation of DNA metabarcoding primers

10.7287/peerj.preprints.2352 ◽

2016 ◽

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Dna Barcode ◽

R Package ◽

Primer Design ◽

Reference Sequence ◽

Gene Marker ◽

Sequence Alignments ◽

Variable Binding ◽

1) DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. Its success is often limited by variable primer binding sites that introduce amplification biases. Thus, the development of optimized primers for the communities or taxa under study in a certain geographic region and/or ecosystem is of critical importance. However, no tool is currently available for obtaining and processing reference sequence data in bulk to serve as a backbone for primer design. 2) We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from the BOLD and NCBI databases for specified target taxonomic groups and then clusters the sequences into operational taxonomic units (OTUs) to reduce biases introduced by the different number of available sequences per species. Additionally, PrimerMiner offers functionality to evaluate primers in silico, which is in our opinion more realistic than the strategy employed by ecoPCR, another software available for that purpose. 3) We used PrimerMiner to download cytochrome c oxidase subunit I (COI) sequences for 15 important freshwater invertebrate groups relevant for ecosystem assessment. By processing COI markers from both databases, we were able to increase the amount of reference data 249-fold on average compared to using complete mitochondrial genomes alone. Furthermore, we visualized the generated OTU sequence alignments and describe how to evaluate primers in silico using PrimerMiner. 4) With PrimerMiner we provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation. The OTU-based reference alignments generated with PrimerMiner can be used for manual primer design or processed with bioinformatic tools for primer development.
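The idea of clustering sequences so that heavily sequenced species do not dominate the reference alignment can be sketched as follows. This is a simplified greedy centroid clustering in Python, not PrimerMiner's actual implementation: it assumes the sequences are already aligned to equal length, and the 97% identity threshold and all function names are assumptions made for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Assign each sequence to the first cluster whose centroid it matches.

    The first member of every cluster acts as its centroid. A sequence that
    matches no existing centroid at or above the threshold starts a new cluster.
    """
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def representatives(seqs: list[str], threshold: float = 0.97) -> list[str]:
    """One representative (the centroid) per cluster."""
    return [cluster[0] for cluster in greedy_cluster(seqs, threshold)]


if __name__ == "__main__":
    # Five copies of one haplotype and a single copy of another:
    # after clustering, each contributes exactly one representative.
    reads = ["AAAAAAAAAA"] * 5 + ["CCCCCCCCCC"]
    print(representatives(reads, threshold=0.8))
```

Because each cluster contributes a single representative, a species with hundreds of database entries weighs no more in the alignment than one with a handful.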

PrimerMiner: an R package for development and in silico validation of DNA metabarcoding primers

10.7287/peerj.preprints.2352v1 ◽

2016 ◽

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Dna Barcode ◽

R Package ◽

Primer Design ◽

Reference Sequence ◽

Gene Marker ◽

Sequence Alignments ◽

Variable Binding ◽

1) DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. Its success is often limited by variable primer binding sites that introduce amplification biases. Thus, the development of optimized primers for the communities or taxa under study in a certain geographic region and/or ecosystem is of critical importance. However, no tool is currently available for obtaining and processing reference sequence data in bulk to serve as a backbone for primer design. 2) We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from the BOLD and NCBI databases for specified target taxonomic groups and then clusters the sequences into operational taxonomic units (OTUs) to reduce biases introduced by the different number of available sequences per species. Additionally, PrimerMiner offers functionality to evaluate primers in silico, which is in our opinion more realistic than the strategy employed by ecoPCR, another software available for that purpose. 3) We used PrimerMiner to download cytochrome c oxidase subunit I (COI) sequences for 15 important freshwater invertebrate groups relevant for ecosystem assessment. By processing COI markers from both databases, we were able to increase the amount of reference data 249-fold on average compared to using complete mitochondrial genomes alone. Furthermore, we visualized the generated OTU sequence alignments and describe how to evaluate primers in silico using PrimerMiner. 4) With PrimerMiner we provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation. The OTU-based reference alignments generated with PrimerMiner can be used for manual primer design or processed with bioinformatic tools for primer development.

Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment

10.7287/peerj.preprints.2044v5 ◽

2017 ◽

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Human Impacts ◽

Biodiversity Loss ◽

Freshwater Ecosystems ◽

Mock Community ◽

Sequence Alignments ◽

Freshwater Invertebrate ◽

A central challenge in the present era of biodiversity loss is to assess and manage human impacts on freshwater ecosystems. Macroinvertebrates are an important group for bioassessment, as many taxa show specific responses to environmental conditions. However, generating accurate macroinvertebrate inventories based on larval morphology is difficult and error-prone. Here, DNA metabarcoding provides new opportunities. Its potential to accurately identify invertebrates in bulk samples to the species level has been demonstrated in several case studies. However, DNA-based identification is often limited by primer bias, potentially leading to taxa in the sample remaining undetected. Thus, the success of DNA metabarcoding as an emerging technique for bioassessment critically relies on carefully evaluating primers. We used the R package PrimerMiner to obtain and process cytochrome c oxidase I (COI) sequence data for the 15 most globally relevant freshwater invertebrate groups for stream assessment. Using these sequence alignments, we developed four primer combinations optimized for freshwater macrozoobenthos. All primers were evaluated by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, popular metabarcoding primers from the literature and the developed primers were tested in silico against the 15 relevant invertebrate groups. The developed primers varied in amplification efficiency and the number of detected taxa, yet all detected more taxa than standard ‘Folmer’ barcoding primers. Two new primer combinations showed more consistent amplification than a previously tested ribosomal marker (16S) and detected all 42 insect taxa present in the mock community samples. In silico evaluation revealed critical design flaws in some commonly used primers from the literature. We demonstrate a reliable strategy to develop optimized primers using the tool PrimerMiner.
The developed primers detected almost all taxa present in the mock samples, and we argue that high base degeneracy is necessary to decrease primer bias, as confirmed by experimental results and in silico primer evaluation. We further demonstrate that some primers currently used in metabarcoding studies may not be suitable for amplification of freshwater macroinvertebrates. Therefore, careful primer evaluation and more region- or ecosystem-specific primers are needed before DNA metabarcoding can be used for routine bioassessment of freshwater ecosystems.
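In silico primer evaluation of the kind described above can be illustrated with a minimal Python sketch that checks a degenerate primer against a set of binding-site sequences. This is an unweighted mismatch count and is not PrimerMiner's scoring scheme; the function names, the `max_mismatches` parameter and the example sequences are hypothetical. Binding sites are assumed to be given in the same orientation as the primer, so for a reverse primer the template region would need to be reverse-complemented first.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Count positions where the site base is not covered by the primer's code."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(primer.upper(), site.upper())
    )


def fraction_matching(primer: str, sites: list[str], max_mismatches: int = 0) -> float:
    """Share of binding sites with at most `max_mismatches` mismatches."""
    accepted = sum(mismatches(primer, site) <= max_mismatches for site in sites)
    return accepted / len(sites)


if __name__ == "__main__":
    # Illustrative sequences only: R covers A and G, so two of four sites match.
    sites = ["ACGT", "ACAT", "ACCT", "ACTT"]
    print(fraction_matching("ACRT", sites))
```

Replacing the `R` with `N` in this toy example would raise the matching fraction to 1.0, which is the trade-off behind the argument for higher base degeneracy.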

Validation and development of freshwater invertebrate metabarcoding COI primers for Environmental Impact Assessment

10.7287/peerj.preprints.2044v4 ◽

2017 ◽

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Human Impacts ◽

Biodiversity Loss ◽

Freshwater Ecosystems ◽

Mock Community ◽

Sequence Alignments ◽

Freshwater Invertebrate ◽

A central challenge in the present era of biodiversity loss is to assess and manage human impacts on freshwater ecosystems. Macroinvertebrates are an ideal group for Environmental Impact Assessment (EIA). However, generating accurate macroinvertebrate inventories based on larval morphology is difficult and error-prone. Here, DNA metabarcoding provides new opportunities. Its potential to accurately identify invertebrates in bulk samples at the species level has been demonstrated in several case studies. However, DNA-based identification is often limited by primer bias, potentially leading to taxa in the sample remaining undetected. Thus, the success of DNA metabarcoding as an emerging technique for EIA critically relies on carefully evaluating primers. We used the R package PrimerMiner to obtain and process cytochrome c oxidase I (COI) sequence data for the 15 most globally relevant freshwater invertebrate groups in EIAs. Using these sequence alignments, we developed four primer combinations optimized for freshwater macrozoobenthos. All primers were evaluated by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, popular metabarcoding primers from the literature and the developed primers were tested in silico against the 15 relevant invertebrate groups. The developed primers varied in amplification efficiency and the number of detected taxa, yet all detected more taxa than standard ‘Folmer’ barcoding primers. Two new primer combinations showed more consistent amplification than a previously tested ribosomal marker (16S) and detected all 42 insect taxa present in the mock community samples. In silico evaluation revealed critical design flaws in some commonly used primers from the literature. We demonstrate a reliable strategy to develop optimized primers using the tool PrimerMiner.
The developed primers detected almost all taxa present in the mock samples, and we argue that high base degeneracy is necessary to decrease primer bias, as confirmed by experimental results and in silico primer evaluation. We further demonstrate that some primers currently used in metabarcoding studies may not be suitable for amplification of insect and freshwater taxa. Therefore, careful primer evaluation and more region- or ecosystem-specific primers are needed before DNA metabarcoding can be used for routine EIA of freshwater ecosystems.

Validation and development of freshwater invertebrate metabarcoding COI primers for Environmental Impact Assessment

10.7287/peerj.preprints.2044v3 ◽

2016 ◽

Cited By ~ 1

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Human Impacts ◽

Biodiversity Loss ◽

Freshwater Ecosystems ◽

Mock Community ◽

Sequence Alignments ◽

Freshwater Invertebrate ◽

Assessing and managing human impacts on freshwater ecosystems is a central challenge in the present era of biodiversity loss. Macroinvertebrates are an ideal group for Environmental Impact Assessment (EIA). However, generating accurate macroinvertebrate inventories based on larval morphology is difficult and error-prone. Here, DNA metabarcoding provides new opportunities. Its potential to accurately identify invertebrates in bulk samples at the species level has been demonstrated in several case studies. However, DNA-based identification is often limited by primer bias, potentially leading to taxa in the sample remaining undetected. Thus, the success of DNA metabarcoding as an emerging technique for EIA critically relies on carefully evaluated primers. We used the R package PrimerMiner to obtain and process cytochrome c oxidase I (COI) sequence data for the 15 most globally relevant freshwater invertebrate groups in EIAs. Using these sequence alignments, we developed four primer combinations optimized for freshwater macrozoobenthos. All primers were evaluated by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, popular metabarcoding primers from the literature and the primers developed here were tested in silico against the 15 relevant invertebrate groups. The developed primers varied in amplification efficiency and the number of detected taxa, yet all retrieved more taxa than standard ‘Folmer’ barcoding primers. Two new primer combinations showed more consistent amplification than a previously tested ribosomal marker (16S) and detected all 42 insect taxa present in the mock community samples. In silico evaluation revealed critical design flaws in some commonly used primers from the literature. We demonstrate a reliable strategy to develop optimized primers using the tool PrimerMiner.
The developed primers detected almost all taxa present in the mock samples, and we argue that high base degeneracy is necessary to decrease primer bias, as confirmed by experimental results and in silico primer evaluation. We further demonstrate that some primers currently used in metabarcoding studies are likely not suitable for amplification of insect and freshwater taxa. Thus, careful primer evaluation and more region- or ecosystem-specific primers might be needed before DNA metabarcoding can be used for routine EIA of freshwater ecosystems.

Validation and development of COI metabarcoding primers for freshwater macroinvertebrate bioassessment

10.7287/peerj.preprints.2044 ◽

2017 ◽

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Human Impacts ◽

Biodiversity Loss ◽

Freshwater Ecosystems ◽

Mock Community ◽

Sequence Alignments ◽

Freshwater Invertebrate ◽

A central challenge in the present era of biodiversity loss is to assess and manage human impacts on freshwater ecosystems. Macroinvertebrates are an important group for bioassessment, as many taxa show specific responses to environmental conditions. However, generating accurate macroinvertebrate inventories based on larval morphology is difficult and error-prone. Here, DNA metabarcoding provides new opportunities. Its potential to accurately identify invertebrates in bulk samples to the species level has been demonstrated in several case studies. However, DNA-based identification is often limited by primer bias, potentially leading to taxa in the sample remaining undetected. Thus, the success of DNA metabarcoding as an emerging technique for bioassessment critically relies on carefully evaluating primers. We used the R package PrimerMiner to obtain and process cytochrome c oxidase I (COI) sequence data for the 15 most globally relevant freshwater invertebrate groups for stream assessment. Using these sequence alignments, we developed four primer combinations optimized for freshwater macrozoobenthos. All primers were evaluated by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, popular metabarcoding primers from the literature and the developed primers were tested in silico against the 15 relevant invertebrate groups. The developed primers varied in amplification efficiency and the number of detected taxa, yet all detected more taxa than standard ‘Folmer’ barcoding primers. Two new primer combinations showed more consistent amplification than a previously tested ribosomal marker (16S) and detected all 42 insect taxa present in the mock community samples. In silico evaluation revealed critical design flaws in some commonly used primers from the literature. We demonstrate a reliable strategy to develop optimized primers using the tool PrimerMiner.
The developed primers detected almost all taxa present in the mock samples, and we argue that high base degeneracy is necessary to decrease primer bias, as confirmed by experimental results and in silico primer evaluation. We further demonstrate that some primers currently used in metabarcoding studies may not be suitable for amplification of freshwater macroinvertebrates. Therefore, careful primer evaluation and more region- or ecosystem-specific primers are needed before DNA metabarcoding can be used for routine bioassessment of freshwater ecosystems.

coil: an R package for cytochrome c oxidase I (COI) DNA barcode data cleaning, translation, and error evaluation

Genome ◽

10.1139/gen-2019-0206 ◽

2020 ◽

Vol 63 (6) ◽

pp. 291-305 ◽

Cited By ~ 5

Author(s):

Cameron M. Nugent ◽

Tyler A. Elliott ◽

Sujeevan Ratnasingham ◽

Sarah J. Adamowicz

Keyword(s):

Cytochrome C ◽

Cytochrome C Oxidase ◽

Sequence Data ◽

Dna Barcode ◽

R Package ◽

Data Generation ◽

Error Identification ◽

Reading Frame ◽

Biological Contaminants ◽

And Function

Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods utilized for data generation and curation, leading to varying levels of success in the separation of biological variation from experimental error. The 5′ region of cytochrome c oxidase subunit I (COI-5P) is the most common barcode gene for animals, with conserved structure and function that allows for biologically informed error identification. Here, we present coil ( https://CRAN.R-project.org/package=coil ), an R package for the pre-processing and frameshift error assessment of COI-5P animal barcode and metabarcode sequence data. The package contains functions for placement of barcodes into a common reading frame, accurate translation of sequences to amino acids, and highlighting insertion and deletion errors. The analysis of 10 000 barcode sequences of varying quality demonstrated how coil can place barcode sequences in reading frame and distinguish sequences containing indel errors from error-free sequences with greater than 97.5% accuracy. Package limitations were tested through the analysis of COI-5P sequences from the plant and fungal kingdoms as well as the analysis of potential contaminants: nuclear mitochondrial pseudogenes and Wolbachia COI-5P sequences. Results demonstrated that coil is a strong technical error identification method but is not reliable for detecting all biological contaminants.
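The notion of placing a COI barcode in reading frame and flagging likely frameshifts can be illustrated with a deliberately simple Python sketch. It substitutes a plain stop-codon heuristic for whatever method coil uses internally, and it is not the coil API: the function names are hypothetical, and the choice of NCBI translation table 5 (invertebrate mitochondrial) is an assumption that suits many invertebrate barcodes but not all animals.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial code), with codons
# enumerated in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence starting at offset `frame` (0, 1 or 2).

    Codons containing ambiguous bases are rendered as "X"; a trailing
    partial codon is ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_frame(seq: str) -> tuple[int, str]:
    """Return the frame with the fewest stop codons and its translation.

    Ties are resolved in favour of the lowest frame offset. A best frame
    that still contains stop codons hints at an indel or a non-target
    sequence.
    """
    stops, frame = min((translate(seq, f).count("*"), f) for f in range(3))
    return frame, translate(seq, frame)


if __name__ == "__main__":
    # Illustrative sequence only: frame 0 starts with the stop codon TAA.
    print(best_frame("TAAATGGCT"))
```

A heuristic of this kind catches only errors that introduce stop codons, so sequences such as pseudogenes with an intact reading frame would pass unnoticed.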

Estimating intraspecific genetic diversity from community DNA metabarcoding data

10.7287/peerj.preprints.3269v4 ◽

2018 ◽

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Biological Diversity ◽

Intraspecific Diversity ◽

Great Promise ◽

Data Sets ◽

Mock Community ◽

Data Set ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Primer Sets

Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While COI haplotype information is limited in resolution, it is nevertheless useful in a phylogeographic context, helping to formulate hypotheses on taxon dispersal. Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotypes from freshwater macroinvertebrate metabarcoding data sets. This novel approach was added to the R package "JAMP" and can be applied to cytochrome c oxidase subunit I (COI) amplicon datasets. We tested our haplotyping method by sequencing i) a single-species mock community composed of 31 individuals with different haplotypes spanning three orders of magnitude in biomass and ii) 18 monitoring samples, each amplified with four different primer sets and two PCR replicates. Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly those from the small specimens. In the monitoring samples, the different primer sets detected 177 to 200 OTUs, containing on average 2.40 to 3.30 haplotypes per OTU. Population structures were consistent between replicates, and similar between primer pairs, depending on the primer length. A closer look at abundant taxa in the data set revealed various population genetic patterns, e.g. Taeniopteryx nebulosa and Hydropsyche pellucidula with a difference in north-south haplotype distribution, while Oulimnius tuberculatus and Asellus aquaticus display no clear population pattern but differ in genetic diversity. Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate monitoring samples using metabarcoding data. It needs to be stressed that at this point metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, for a high number of species intraspecific diversity was recovered, identifying potentially isolated populations and potential taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding data sets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about biological diversity but also underlying genetic diversity.
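The abundance-based filtering step mentioned in the Methods can be sketched in Python as a relative-abundance cut-off applied to the sequence variants of one sample. This is an illustration only and not the JAMP implementation; the function name, the `min_rel_abundance` parameter and its 1% default are assumptions, and a real workflow would apply denoising before this step.

```python
from collections import Counter


def filter_haplotypes(reads: list[str], min_rel_abundance: float = 0.01) -> dict[str, int]:
    """Keep sequence variants at or above a relative-abundance threshold.

    `reads` holds the (already denoised) sequences of one sample. Variants
    whose share of all reads falls below `min_rel_abundance` are discarded
    as likely sequencing or PCR errors.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {
        seq: n
        for seq, n in counts.items()
        if n / total >= min_rel_abundance
    }


if __name__ == "__main__":
    # Illustrative reads only: the singleton variant falls below a 5% cut-off.
    reads = ["AAAA"] * 90 + ["AAAT"] * 9 + ["AAAC"] * 1
    print(filter_haplotypes(reads, min_rel_abundance=0.05))
```

The trade-off reported in the Results is visible even in this toy form: raising the threshold removes more spurious variants but also drops genuine haplotypes from specimens that contribute few reads.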

Estimating intraspecific genetic diversity from community DNA metabarcoding data

10.7287/peerj.preprints.3269v3 ◽

2018 ◽

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Biological Diversity ◽

Intraspecific Diversity ◽

Great Promise ◽

Data Sets ◽

Mock Community ◽

Data Set ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Primer Sets

Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While COI haplotype information is limited in resolution, it is nevertheless useful in a phylogeographic context, helping to formulate hypotheses on taxon dispersal. Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotypes from freshwater macroinvertebrate metabarcoding data sets. This novel approach was added to the R package "JAMP" and can be applied to cytochrome c oxidase subunit I (COI) amplicon datasets. We tested our haplotyping method by sequencing i) a single-species mock community composed of 31 individuals with different haplotypes spanning three orders of magnitude in biomass and ii) 18 monitoring samples, each amplified with four different primer sets and two PCR replicates. Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly those from the small specimens. In the monitoring samples, the different primer sets detected 177 to 200 OTUs, containing on average 2.40 to 3.30 haplotypes per OTU. Population structures were consistent between replicates, and similar between primer pairs, depending on the primer length. A closer look at abundant taxa in the data set revealed various population genetic patterns, e.g. Taeniopteryx nebulosa and Hydropsyche pellucidula with a difference in north-south haplotype distribution, while Oulimnius tuberculatus and Asellus aquaticus display no clear population pattern but differ in genetic diversity. Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate monitoring samples using metabarcoding data. It needs to be stressed that at this point metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, for a high number of species intraspecific diversity was recovered, identifying potentially isolated populations and potential taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding data sets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about biological diversity but also underlying genetic diversity.