PriSeT: Efficient De Novo Primer Discovery

Abstract

Motivation: DNA metabarcoding is a widely used technique for inferring the species composition of environmental samples. These samples can comprise hundreds of organisms that may be closely or very distantly related in the taxonomic tree of life. DNA metabarcoding combines polymerase chain reaction (PCR) and next-generation sequencing (NGS), whereby a short, homologous sequence of DNA is amplified and sequenced from all members of the community. Sequences are then taxonomically identified based on their match to a reference database. Ideally, each species of interest would have a unique DNA barcode. This short, variable sequence needs to be flanked by relatively conserved regions that can be used as primer binding sites. Appropriate PCR primer pairs would match a broad evolutionary range of taxa, so that only a few pairs are needed to achieve high taxonomic coverage. At the same time, however, the DNA barcodes enclosed by a primer pair should differ between species so that species can be distinguished, which improves resolution. This poses an interesting optimization problem. More specifically: given a set of references ℛ = {R1, R2, …, Rm}, the problem is to find a primer set P that balances high taxonomic coverage against high resolution. This goal can be captured by filtering for frequent primers and ranking by coverage or variation, i.e., the number of unique barcodes. Here we present the software PriSeT, an offline primer discovery tool that is capable of processing large libraries and is robust against mislabeled or low-quality references. It tackles the computationally expensive steps with linear-runtime filters and efficient encodings.

Results: We first evaluated PriSeT on references (mostly 18S rRNA genes) from 19 clades covering eukaryotic organisms that are typical for freshwater plankton samples. PriSeT recovered several published primer sets as well as additional, more chemically suitable primer sets. For these new sets, we compared frequency, taxon coverage, and amplicon variation with published primer sets. For 11 clades we found de novo primer pairs that cover more taxa than the published ones, and for six clades de novo primers resulted in greater sequence (i.e., DNA barcode) variation. We also applied PriSeT to 19 SARS-CoV-2 genomes and computed 114 new primer pairs with the additional constraint that the sequences have no co-occurrences in other taxa. These primer sets would be suitable for empirical testing.

Availability: https://github.com/mariehoffmann/[email protected]
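The filter-then-rank idea described in the abstract can be illustrated with a toy sketch. This is not PriSeT's implementation: the function names, the fixed primer length `k`, and the exact-match model are assumptions made for illustration, the reverse primer is represented by its forward-strand binding site rather than its reverse complement, and chemical constraints such as melting temperature are ignored.

```python
from collections import defaultdict


def frequent_kmers(refs, k, min_frac):
    """Map each k-mer found in at least min_frac of the references
    to {reference id: first position of the k-mer}."""
    occurrences = defaultdict(dict)
    for ref_id, seq in refs.items():
        for i in range(len(seq) - k + 1):
            # Keep only the first occurrence per reference.
            occurrences[seq[i:i + k]].setdefault(ref_id, i)
    needed = min_frac * len(refs)
    return {kmer: pos for kmer, pos in occurrences.items() if len(pos) >= needed}


def rank_pairs(refs, k, min_frac, min_len, max_len):
    """Return (coverage, unique barcodes, forward site, reverse site) tuples,
    sorted by coverage first and barcode variation second."""
    frequent = frequent_kmers(refs, k, min_frac)
    results = []
    for fwd, fwd_pos in frequent.items():
        for rev, rev_pos in frequent.items():
            if fwd == rev:
                continue
            covered = 0
            barcodes = set()
            # Only references containing both sites can be amplified.
            for ref_id in fwd_pos.keys() & rev_pos.keys():
                start = fwd_pos[ref_id] + k   # barcode starts after forward site
                end = rev_pos[ref_id]         # and ends before reverse site
                if min_len <= end - start <= max_len:
                    covered += 1
                    barcodes.add(refs[ref_id][start:end])
            if covered:
                results.append((covered, len(barcodes), fwd, rev))
    results.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return results


# Hypothetical toy references: shared flanks, variable 4-base barcode.
refs = {
    "A": "ACGTAC" + "TTTT" + "GGCATG",
    "B": "ACGTAC" + "TTAT" + "GGCATG",
    "C": "ACGTAC" + "TTTT" + "GGCATG",
}
ranking = rank_pairs(refs, k=6, min_frac=1.0, min_len=4, max_len=4)
```

In this toy input all three references share both flanking sites, so the single candidate pair covers three references but yields only two distinct barcodes, which shows how coverage and resolution can diverge.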


Design and Validation of Four New Primers for Next-Generation Sequencing To Target the 18S rRNA Genes of Gastrointestinal Ciliate Protozoa

Applied and Environmental Microbiology ◽

10.1128/aem.01644-14 ◽

2014 ◽

Vol 80 (17) ◽

pp. 5515-5521 ◽

Cited By ~ 22

Author(s):

Suzanne L. Ishaq ◽

André-Denis G. Wright

Keyword(s):

18S Rrna ◽

18S Rrna Gene ◽

Rrna Genes ◽

Rrna Gene ◽

Content Type ◽

Hypervariable Regions ◽

Blast Database ◽

Primer Sets ◽

18S Rrna Genes ◽

Ciliate Protozoa

Abstract: Four new primers and one published primer were used to PCR amplify hypervariable regions within the protozoal 18S rRNA gene to determine which primer pair provided the best identification and statistical analysis. PCR amplicons of 394 to 498 bases were generated from three primer sets, sequenced using Roche 454 pyrosequencing with Titanium, and analyzed using the BLAST database (NCBI) and MOTHUR version 1.29. The protozoal diversity of rumen contents from moose in Alaska was assessed. In the present study, primer set 1, P-SSU-316F and GIC758R (amplicon of 482 bases), gave the best representation of diversity using BLAST classification, and the set amplified Entodinium simplex and Ostracodinium spp., which were not amplified by the other two primer sets. Primer set 2, GIC1080F and GIC1578R (amplicon of 498 bases), had similar BLAST results and a slightly higher percentage of sequences that were identified with a higher sequence identity. Primer sets 1 and 2 are recommended for use in ruminants. However, primer set 1 may be inadequate to determine protozoal diversity in nonruminants. The amplicons created by primer set 1 were indistinguishable for certain species within the genera Bandia, Blepharocorys, Polycosta, and Tetratoxum and between Hemiprorodon gymnoprosthium and Prorodonopsis coli, none of which are normally found in the rumen.


Mind the gap-analysis! – How complete are DNA barcode reference libraries for monitoring-relevant aquatic species in Europe?

ARPHA Conference Abstracts ◽

10.3897/aca.4.e65473 ◽

2021 ◽

Vol 4 ◽

Author(s):

Hannah Weigand

Keyword(s):

Gap Analysis ◽

General Pattern ◽

Dna Barcode ◽

Marine Species ◽

Biotic Index ◽

Reference Database ◽

Data Systems ◽

Freshwater Macroinvertebrates ◽

Public Data ◽

Dna Metabarcoding

Molecular species identification with DNA metabarcoding can potentially accelerate, streamline and standardise biomonitoring routines. It is currently being tested how this new technique can be implemented for the European Water Framework Directive (WFD) and the European Marine Strategy Framework Directive (MSFD). To connect the results from DNA metabarcoding with the current monitoring routines, an extensive, high-quality DNA barcode reference database is required. Hence, a gap-analysis of the Barcode of Life Data Systems (BOLD) was performed as part of the EU-COST Action DNAqua-Net (Weigand et al. 2019), which was updated in 2021. It aimed to analyse the completeness of BOLD for species on the national WFD monitoring lists and for marine species on the ERMS (European Register of Marine Species) and AMBI (AZTI Marine Biotic Index) lists. The data were supplemented by MitoFish for freshwater fish and Diat.barcode for diatoms. Several thousand species were included in the gap-analysis, although not all countries currently apply species-level data for all WFD biological quality elements. The barcode coverage of the different taxonomic groups varied strongly, with high levels (> 80%) for fish and freshwater vascular plants, and low levels (< 15%) for diatoms and freshwater plathelminths. As a general pattern, species monitored by several countries had a higher coverage than those monitored by only a single country. The gap-analysis additionally focused on the availability of metadata (e.g., geographical origin of the specimen or determiner name) for the barcodes. Hence, we analysed whether the data were stored publicly (with access to metadata) or privately (without access to metadata) in BOLD, or whether the data were mined from GenBank (metadata are potentially available but not easy to access). Although public data were stored for many species (43% of freshwater macroinvertebrates and 21% of AMBI marine species), the proportion of species without public metadata was not negligible (22% of freshwater macroinvertebrates and 22% of AMBI marine species). Another issue that emerged from the gap-analysis was that several deposited barcodes were identified by reverse taxonomy (RT), i.e., specimens were molecularly identified via their DNA barcodes and the barcodes themselves are stored in BOLD with the associated species name. This can be problematic, as originally misidentified samples can lead to false RT-identifications, making the data appear more trustworthy than they actually are. For the analysed freshwater macroinvertebrates, 39% of all barcodes and 65% of all public data originated from RT, impacting 11% of all monitored species. As the information about RT is only available for publicly stored data, the real impact of RT might be even higher.


Detection of Macrobenthos Species With Metabarcoding Is Consistent in Bulk DNA but Dependent on Body Size and Sclerotization in eDNA From the Ethanol Preservative

Frontiers in Marine Science ◽

10.3389/fmars.2021.637858 ◽

2021 ◽

Vol 8 ◽

Author(s):

Sofie Derycke ◽

Sara Maes ◽

Laure Van den Bulcke ◽

Joran Vanhollebeke ◽

Jan Wittoeck ◽

...

Keyword(s):

Body Size ◽

Reference Database ◽

Biodiversity Monitoring ◽

Marine Monitoring ◽

Diversity Patterns ◽

The North ◽

Dna Metabarcoding ◽

The North Sea ◽

Primer Sets ◽

Marine Macrobenthos

DNA metabarcoding is a promising method to increase the cost and time efficiency of marine monitoring. While substantial evidence exists that bulk DNA samples adequately reflect diversity patterns of marine macrobenthos, the potential of eDNA in the ethanol preservative of benthic samples for biodiversity monitoring remains largely unexplored. We investigated species detection in bulk DNA and eDNA from the ethanol preservative in samples from four distinct macrobenthic communities in the North Sea. Bulk DNA and eDNA were extracted with different extraction kits and five COI primer sets were tested. Despite the availability of a nearly complete reference database, at most 22% of the amplicon sequence variants (ASVs) were assigned taxonomy at the phylum level. However, the unassigned ASVs represented only a small fraction of the total reads (13%). The Leray primer set outperformed the four other primer sets in the number of non-chimeric reads and species detected, and in the recovery of beta diversity patterns. Community composition differed significantly between bulk DNA and eDNA samples, but both sample types were able to differentiate the four communities. The probability of detecting a species in the eDNA from the ethanol preservative was significantly lower than for bulk DNA for macrobenthos species of small to medium body size and for species with chitin or CaCO3 in their cuticle. Detection in the bulk DNA samples was not affected by the investigated morphological traits, indicating that monitoring of macrobenthos species will be most robust when using bulk DNA as the template for metabarcoding.


Long-read DNA metabarcoding of ribosomal rRNA in the analysis of fungi from aquatic environments

10.1101/283127 ◽

2018 ◽

Cited By ~ 4

Author(s):

Felix Heeger ◽

Elizabeth C. Bourne ◽

Christiane Baschien ◽

Andrey Yurkov ◽

Boyke Bunk ◽

...

Keyword(s):

De Novo ◽

Error Rates ◽

Rrna Genes ◽

Taxonomic Resolution ◽

Sequencing Error ◽

Rrna Gene ◽

Dna Metabarcoding ◽

Long Read ◽

Reference Sequences ◽

Ribosomal Rrna

Abstract: DNA metabarcoding is now widely used to study prokaryotic and eukaryotic microbial diversity. Technological constraints have limited most studies to marker lengths of ca. 300-600 bp. Longer sequencing reads of several thousand bp are now possible with third-generation sequencing. The increased marker lengths provide greater taxonomic resolution and enable the use of phylogenetic methods of classification, but longer reads may be subject to higher rates of sequencing error and chimera formation. In addition, most well-established bioinformatics tools for DNA metabarcoding were originally designed for short reads and are therefore not suitable. Here we used Pacific Biosciences circular consensus sequencing (CCS) to DNA-metabarcode environmental samples using a ca. 4,500 bp marker that included most of the eukaryote ribosomal SSU and LSU rRNA genes and the ITS spacer region. We developed a long-read analysis pipeline that reduced error rates to levels comparable to short-read platforms. Validation using fungal isolates and a mock community indicated that our pipeline detected 98% of chimeras de novo, i.e., even in the absence of reference sequences. We recovered 947 OTUs from water and sediment samples in a natural lake, 848 of which could be classified to phylum, 486 to family, 397 to genus and 330 to species. By allowing for the simultaneous use of three global databases (Unite, SILVA, RDP LSU), long-read DNA metabarcoding provided better taxonomic resolution than any single marker. We foresee the use of long reads enabling the cross-validation of reference sequences and the synthesis of ribosomal rRNA gene databases. The universal nature of the rRNA operon and our recovery of >100 non-fungal OTUs indicate that long-read DNA metabarcoding holds promise for the study of eukaryotic diversity more broadly.


DNA barcoding to support conservation: species identification, genetic structure and biogeography of fishes in the Murray-Darling River Basin, Australia

Marine and Freshwater Research ◽

10.1071/mf11027 ◽

2011 ◽

Vol 62 (8) ◽

pp. 887 ◽

Cited By ~ 22

Author(s):

Christopher M. Hardy ◽

Mark Adams ◽

Dean R. Jerry ◽

Leon N. Court ◽

Matthew J. Morgan ◽

...

Keyword(s):

Freshwater Fish ◽

Introduced Species ◽

Native Species ◽

Captive Breeding ◽

Dna Barcode ◽

River System ◽

Active Management ◽

Rrna Genes ◽

Gut Contents ◽

18S Rrna Genes

Freshwater fish stocks worldwide are under increasing threat from overfishing, disease, pollution and competition from introduced species. In the Murray-Darling Basin (MDB), the largest river system of Australia, more than half the native species are listed as rare or endangered. Active management is required to counteract reduction in population sizes, prevent local extinctions and maintain genetic diversity. We describe the first comprehensive set of DNA barcodes able to discriminate between all 58 native and introduced species of freshwater fish recorded in the MDB. These barcodes also distinguish populations from those in adjacent basins, with estimated separation times as recent as 0.1 million years ago. We demonstrate the feasibility of using DNA fingerprinting of ribosomal RNA (12S and 18S rRNA) genes and mitochondrial DNA control region (mtDNA CR) sequences to identify species from eggs, larvae, tissues and predator gut contents, as well as to differentiate populations, morphologically cryptic species and hybrids. The DNA barcode resource will enhance capacity in many areas of fish conservation biology that can benefit from improved knowledge of genetic provenance. These include captive breeding and restocking programs, life history studies and ecological research into the interactions between populations of native and exotic species.


DNA Metabarcoding Methods for the Study of Marine Benthic Meiofauna: A Review

Frontiers in Marine Science ◽

10.3389/fmars.2021.730063 ◽

2021 ◽

Vol 8 ◽

Author(s):

Romy Gielings ◽

Maria Fais ◽

Diego Fontaneto ◽

Simon Creer ◽

Filipe Oliveira Costa ◽

...

Keyword(s):

Rrna Genes ◽

Biodiversity Monitoring ◽

Standard Tool ◽

Dna Metabarcoding ◽

High Species Richness ◽

Pre Treatment ◽

18S Rrna Genes ◽

Key Aspects ◽

Response To Environmental Change

Meiofaunal animals, roughly between 0.045 and 1 mm in size, are ubiquitous and ecologically important inhabitants of benthic marine ecosystems. Their high species richness and rapid response to environmental change make them promising targets for ecological and biomonitoring studies. However, diversity patterns of benthic marine meiofauna remain poorly known due to challenges in species identification using classical morphological methods. DNA metabarcoding is a powerful tool to overcome this limitation. Here, we review DNA metabarcoding approaches used in studies on marine meiobenthos with the aim of helping researchers make informed decisions when implementing DNA metabarcoding in meiofaunal biodiversity monitoring. We found that the applied methods vary greatly between researchers and studies, and concluded that further explicit comparisons of protocols are needed to apply DNA metabarcoding as a standard tool for assessing benthic meiofaunal community composition. Key aspects that require additional consideration include: (1) comparability of sample pre-treatment methods; (2) integration of different primers and molecular markers for both the mitochondrial cytochrome c oxidase subunit I (COI) and the nuclear 18S rRNA genes to maximize taxon recovery; (3) precise and standardized description of sampling methods to allow for comparison and replication; and (4) evaluation and testing of bioinformatic pipelines to enhance comparability between studies. By enhancing comparability between the various approaches currently used for the different aspects of the analyses, DNA metabarcoding will improve the long-term integrative potential for surveying and biomonitoring marine benthic meiofauna.


Development and validation of DNA metabarcoding COI primers for aquatic invertebrates using the R package "PrimerMiner"

10.7287/peerj.preprints.2044v1 ◽

2016 ◽

Cited By ~ 1

Author(s):

Vasco Elbrecht ◽

Florian Leese

Keyword(s):

In Silico ◽

Sequence Data ◽

Dna Barcode ◽

R Package ◽

Gene Marker ◽

Amplification Efficiency ◽

Mock Community ◽

Dna Metabarcoding ◽

Primer Sets ◽

High Base

1) DNA metabarcoding is a powerful tool to assess biodiversity by amplifying and sequencing a standardized gene marker region. However, typically used barcoding genes, such as the cytochrome c oxidase subunit I (COI) region for animals, are highly variable. Thus, different taxa in communities under study are often not amplified equally well and some might even remain undetected due to primer bias. To reduce these problems, optimized region- and/or ecosystem-specific metabarcoding primers are necessary.

2) We developed the R package PrimerMiner, which batch downloads DNA barcode gene sequences from BOLD and NCBI databases for specified target taxa and then applies sequence clustering to reduce biases introduced by differing numbers of available sequences per species. To design primers targeted for freshwater invertebrates, we downloaded COI data for the 15 most important invertebrate groups relevant for stream ecosystem assessment. Four primer sets with high base degeneracy were developed and their performance tested by sequencing ten mock community samples, each consisting of 52 freshwater invertebrate taxa. Additionally, we evaluated the developed primers against other metabarcoding primers in silico using PrimerMiner.

3) Amplification and sequencing were successful for all ten mock community samples with the four different primer combinations. The developed primers varied in amplification efficiency and number of taxa detected, but all primer sets detected more taxa than standard Folmer barcoding primers. Additionally, the BF/BR primers amplified taxa very consistently, and the BF2+BR2 and BF2+BR1 primer combinations did so even more consistently than a previously tested ribosomal marker (16S). Except for the BF1+BR1 primer combination, all BF/BR primers detected all 42 insect taxa present in the mock samples. In silico evaluation of the developed primers showed that they are also likely to work very well on other, non-aquatic invertebrate samples.

4) With PrimerMiner, we here provide a useful tool to obtain relevant sequence data for targeted primer development and evaluation. Our sequence datasets generated with the newly developed metabarcoding primers demonstrate that optimized primers with high base degeneracy are superior to classical markers and enable us to detect almost 100% of animal taxa present in a sample using the standard COI barcoding gene. Therefore, the PrimerMiner package and primers developed using this tool are useful beyond assessment of biodiversity in aquatic ecosystems.
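To make the notions of base degeneracy and in silico primer evaluation concrete, the following is a minimal sketch of matching a degenerate primer against template sequences. It is not PrimerMiner's code (PrimerMiner is an R package): the function names, the toy primer and the taxa are hypothetical, and only the IUPAC ambiguity codes themselves are standard.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct plain sequences a degenerate primer represents."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def taxa_matched(primer, sequences, max_mismatches=0):
    """Return the taxa with at least one binding site within the mismatch limit."""
    hits = set()
    for taxon, seq in sequences.items():
        for i in range(len(seq) - len(primer) + 1):
            if mismatches(primer, seq[i:i + len(primer)]) <= max_mismatches:
                hits.add(taxon)
                break  # one site per taxon is enough
    return hits


# Hypothetical toy templates differing at one position of the binding site.
sequences = {
    "taxon1": "GGACATGG",
    "taxon2": "GGACGTGG",
    "taxon3": "GGACCTGG",
}
exact_hits = taxa_matched("ACAT", sequences)
degenerate_hits = taxa_matched("ACRT", sequences)
tolerant_hits = taxa_matched("ACRT", sequences, max_mismatches=1)
```

Replacing one base with the ambiguity code R doubles the degeneracy and lets the toy primer match a second taxon, which is the trade-off that highly degenerate primers exploit to reduce primer bias.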


A preliminary survey of lichen associated eukaryotes using pyrosequencing

The Lichenologist ◽

10.1017/s0024282911000648 ◽

2011 ◽

Vol 44 (1) ◽

pp. 137-146 ◽

Cited By ~ 46

Author(s):

Scott T. BATES ◽

Donna BERG-LYONS ◽

Christian L. LAUBER ◽

William A. WALTERS ◽

Rob KNIGHT ◽

...

Keyword(s):

Bacterial Diversity ◽

Recent Work ◽

18S Rrna ◽

High Abundance ◽

Rrna Genes ◽

Preliminary Survey ◽

Lichenicolous Fungi ◽

18S Rrna Genes ◽

Eukaryotic Organisms ◽

Symbiotic Organisms

Abstract: Although various eukaryotic organisms, such as arthropods, endolichenic/lichenicolous fungi, and nematodes, have been isolated from lichens, the diversity and structure of eukaryotic communities associated with lichen thalli have not been well studied. In addressing this knowledge gap, we used bar-coded pyrosequencing of 18S rRNA genes to survey eukaryotes associated with thalli of three different lichen species. In addition to revealing an expected high abundance of lichen biont-related 18S genes, sequences recovered in our survey showed that non-biont fungi from the Ascomycota also have a substantial presence in these thalli. Our samples additionally harboured fungi representing phyla (Blastocladiomycota, Chytridiomycota) that have not been isolated previously from lichens; however, their very low abundance indicates an incidental presence. The recovery of Alveolata, Metazoa, and Rhizaria sequences, along with recent work revealing the considerable bacterial diversity in these same samples, suggests lichens function as minute ecosystems in addition to being symbiotic organisms.


Design of new universal PCR primers for detecting 18S rRNA genes of bacterivorous protozoa in the environment

Journal of Japan Society of Civil Engineers Ser G (Environmental Research) ◽

10.2208/jscejer.74.iii_239 ◽

2018 ◽

Vol 74 (7) ◽

pp. III_239-III_245

Author(s):

Kanji NAKAMURA ◽

Haruka OKUDA

Keyword(s):

18S Rrna ◽

Pcr Primers ◽

Rrna Genes ◽

18S Rrna Genes


Phylogeny of hymenolepidids (Cestoda: Cyclophyllidea) from mammals: sequences of 18S rRNA and COI genes confirm major clades revealed by the 28S rRNA analyses

Journal of Helminthology ◽

10.1017/s0022149x21000110 ◽

2021 ◽

Vol 95 ◽

Author(s):

B. Neov ◽

G.P. Vasileva ◽

G. Radoslavov ◽

P. Hristov ◽

D.T.J. Littlewood ◽

...

Keyword(s):

Ribosomal Rna ◽

18S Rrna ◽

Phylogenetic Analyses ◽

The Other ◽

Rrna Genes ◽

Host Switching ◽

28S Rrna ◽

Mitochondrial Cytochrome ◽

Rapid Radiation ◽

18S Rrna Genes

Abstract: The aim of the study is to test a hypothesis for the phylogenetic relationships among mammalian hymenolepidid tapeworms, based on partial (D1–D3) nuclear 28S ribosomal RNA (rRNA) genes, by estimating new molecular phylogenies for the group based on partial mitochondrial cytochrome c oxidase I (COI) and nuclear 18S rRNA genes, as well as a combined analysis using all three genes. New sequences of COI and 18S rRNA genes were obtained for Coronacanthus integrus, C. magnihamatus, C. omissus, C. vassilevi, Ditestolepis diaphana, Lineolepis scutigera, Spasskylepis ovaluteri, Staphylocystis tiara, S. furcata, S. uncinata, Vaucherilepis trichophorus and Neoskrjabinolepis sp. The phylogenetic analyses confirmed the major clades identified by Haukisalmi et al. (Zoologica Scripta 39: 631–641, 2010): Ditestolepis clade, Hymenolepis clade, Rodentolepis clade and Arostrilepis clade. While the Ditestolepis clade is associated with soricids, the structure of the other three clades suggests multiple evolutionary events of host switching between shrews and rodents. Two of the present analyses (18S rRNA and COI genes) show that the basal relationships of the four mammalian clades branch at the same polytomy with several hymenolepidids from birds (both terrestrial and aquatic). This may indicate a rapid radiation of the group, with multiple colonization events of mammalian hosts by avian parasites.
