Composition and distribution of fish environmental DNA in an Adirondack watershed

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10539
Author(s):  
Robert S. Cornman ◽  
James E. McKenna, Jr. ◽  
Jennifer A. Fike

Background Environmental DNA (eDNA) surveys are appealing options for monitoring aquatic biodiversity. While factors affecting eDNA persistence, capture and amplification have been heavily studied, watershed-scale surveys of fish communities, and our confidence in them, need further exploration. Methods We characterized fish eDNA compositions using rapid, low-volume filtering with replicate and control samples scaled for a single Illumina MiSeq flow cell, using the mitochondrial 12S ribosomal RNA locus for taxonomic profiling. Our goals were to determine: (1) spatiotemporal variation in eDNA abundance, (2) the filtrate needed to achieve strong sequencing libraries, (3) the taxonomic resolution of 12S ribosomal sequences in the study environment, (4) the portion of the expected fish community detectable by 12S sequencing, (5) biases in species recovery, (6) correlations between eDNA compositions and catch per unit effort (CPUE) and (7) the extent to which eDNA profiles reflect major watershed features. Our bioinformatic approach included (1) estimation of sequencing error from unambiguous mappings and simulation of taxonomic assignment error under various mapping criteria; (2) binning of species based on inferred assignment error rather than by taxonomic rank; and (3) visualization of mismatch distributions to facilitate discovery of distinct haplotypes attributed to the same reference. Our approach was implemented within the St. Regis River, NY, USA, which supports tribal and recreational fisheries and has been a target of restoration activities. We used a large record of St. Regis-specific observations to validate our assignments. Results We found that 300 mL drawn through 25-mm cellulose nitrate filters yielded greater than 5 ng/µL DNA at most sites in summer, which was an approximate threshold for generating strong sequencing libraries in our hands.
Using inferred sequence error rates, we binned 12S references for 110 species on a state checklist into 85 single-species bins and seven multispecies bins. Of 48 bins observed by capture survey in the St. Regis, we detected eDNA consistent with 40, with an additional four detections flagged as potential contaminants. Sixteen unobserved species detected by eDNA ranged from plausible to implausible based on distributional data, whereas six observed species had no 12S reference sequence. Summed log-ratio compositions of eDNA-detected taxa correlated with log(CPUE) (Pearson’s R = 0.655, P < 0.001). Shifts in eDNA composition of several taxa and a genotypic shift in channel catfish (Ictalurus punctatus) coincided with the Hogansburg Dam, NY, USA. In summary, a simple filtering apparatus operated by field crews without prior expertise gave useful summaries of eDNA composition with minimal evidence of field contamination. 12S sequencing achieved useful taxonomic resolution despite the short marker length, and data exploration with standard bioinformatic tools clarified taxonomic uncertainty and sources of error.
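
The reported correlation between summed log-ratio eDNA compositions and log(CPUE) can be illustrated with a minimal sketch. The abstract does not say which log-ratio transform was used, so the centered log-ratio (CLR), the pseudocount of 0.5, and all numbers below are assumptions made only for illustration; they are not the study's data or pipeline.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's read counts.

    The pseudocount (an assumption) avoids log(0) for undetected taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: rows are sites, columns are taxa.
reads = [
    [120, 30, 0, 5],
    [80, 60, 10, 0],
    [200, 10, 5, 15],
]
# Hypothetical catch per unit effort for the same four taxa.
cpue = [4.0, 1.5, 0.2, 0.4]

# Transform each site, then sum each taxon's CLR value across sites.
per_site = [clr(row) for row in reads]
summed = [sum(col) for col in zip(*per_site)]

log_cpue = [math.log(v) for v in cpue]
r = pearson_r(summed, log_cpue)
print(round(r, 3))
```

Working in log-ratios rather than raw read proportions keeps the comparison from being driven by the constant-sum constraint of sequencing data, which is one common reason for choosing this kind of transform.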

2018 ◽  
Author(s):  
Felix Heeger ◽  
Elizabeth C. Bourne ◽  
Christiane Baschien ◽  
Andrey Yurkov ◽  
Boyke Bunk ◽  
...  

Abstract DNA metabarcoding is now widely used to study prokaryotic and eukaryotic microbial diversity. Technological constraints have limited most studies to marker lengths of ca. 300-600 bp. Longer sequencing reads of several thousand bp are now possible with third-generation sequencing. The increased marker lengths provide greater taxonomic resolution and enable the use of phylogenetic methods of classification, but longer reads may be subject to higher rates of sequencing error and chimera formation. In addition, most well-established bioinformatics tools for DNA metabarcoding were originally designed for short reads and are therefore not suitable. Here we used Pacific Biosciences circular consensus sequencing (CCS) to DNA-metabarcode environmental samples using a ca. 4,500 bp marker that included most of the eukaryote ribosomal SSU and LSU rRNA genes and the ITS spacer region. We developed a long-read analysis pipeline that reduced error rates to levels comparable to short-read platforms. Validation using fungal isolates and a mock community indicated that our pipeline detected 98% of chimeras de novo, i.e., even in the absence of reference sequences. We recovered 947 OTUs from water and sediment samples in a natural lake, 848 of which could be classified to phylum, 486 to family, 397 to genus and 330 to species. By allowing for the simultaneous use of three global databases (Unite, SILVA, RDP LSU), long-read DNA metabarcoding provided better taxonomic resolution than any single marker. We foresee the use of long reads enabling the cross-validation of reference sequences and the synthesis of ribosomal RNA gene databases. The universal nature of the rRNA operon and our recovery of >100 non-fungal OTUs indicate that long-read DNA metabarcoding holds promise for the study of eukaryotic diversity more broadly.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Kelley Paskov ◽  
Jae-Yoon Jung ◽  
Brianna Chrisman ◽  
Nate T. Stockham ◽  
Peter Washington ◽  
...  

Abstract Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates because they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites.
Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
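
To make the notion of a Mendelian error concrete, the sketch below checks whether a child's genotype at a biallelic site could have been inherited from the two parental genotypes, and reports the fraction of sites that fail the check. It only counts the directly observable errors; the method described above goes further and derives per-sample precision and recall from such observations. The genotype encoding (number of alternate alleles: 0, 1 or 2) and all function names are assumptions for illustration.

```python
def transmittable_alleles(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this genotype can pass on.

    Genotypes are encoded as the count of alternate alleles: 0, 1 or 2.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def is_mendelian_consistent(mother, father, child):
    """True if the child genotype can arise from one allele per parent."""
    return any(
        m + f == child
        for m in transmittable_alleles(mother)
        for f in transmittable_alleles(father)
    )

def mendelian_error_rate(trios):
    """Fraction of (mother, father, child) genotype triples that are inconsistent."""
    errors = sum(not is_mendelian_consistent(*trio) for trio in trios)
    return errors / len(trios)

# Hypothetical genotype calls at four sites for one trio.
sites = [(0, 0, 0), (0, 0, 1), (1, 1, 2), (2, 2, 1)]
print(mendelian_error_rate(sites))
```

Only some sequencing errors show up this way (an error that turns one consistent genotype combination into another stays invisible), which is why the observed Mendelian errors are described above as a fraction of all sequencing errors.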


Animals ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 3186
Author(s):  
Eunkyung Choi ◽  
Sun Hee Kim ◽  
Seung Jae Lee ◽  
Euna Jo ◽  
Jinmu Kim ◽  
...  

Trematomus loennbergii Regan, 1913, is an evolutionarily important marine fish species distributed in the Antarctic Ocean. However, its genome has not been studied to date. In the present study, whole genome sequencing was performed using next-generation sequencing (NGS) technology to characterize its genome and develop genomic microsatellite markers. The 25-mer frequency distribution was estimated to be the best, and the genome size was predicted to be 815,042,992 bp. The heterozygosity, average rate of read duplication, and sequencing error rates were 0.536%, 0.724%, and 0.292%, respectively. These data were used to analyze microsatellite markers, and a total of 2,264,647 repeat motifs were identified. The most frequent repeat motif was di-nucleotide with 87.00% frequency, followed by tri-nucleotide (10.45%), tetra-nucleotide (1.94%), penta-nucleotide (0.34%), and hexa-nucleotide (0.27%). The AC repeat motif was the most abundant motif among di-nucleotides and among all repeat motifs. Among microsatellite markers, 181 markers were selected and PCR technology was used to validate several markers. A total of 15 markers produced only one band. In summary, these results provide a good basis for further studies, including evolutionary biology studies and population genetics of Antarctic fish species.


Author(s):  
Nicole Foster ◽  
Kor-jent Dijk ◽  
Ed Biffin ◽  
Jennifer Young ◽  
Vicki Thomson ◽  
...  

A proliferation in environmental DNA (eDNA) research has increased the reliance on reference sequence databases to assign unknown DNA sequences to known taxa. Without comprehensive reference databases, DNA extracted from environmental samples cannot be correctly assigned to taxa, limiting the use of this genetic information to identify organisms in unknown sample mixtures. For animals, standard metabarcoding practices involve amplification of the mitochondrial Cytochrome-c oxidase subunit 1 (CO1) region, which is universally amplifiable across the majority of animal taxa. This region, however, does not work well as a DNA barcode for plants and fungi, and there is no similar universal single barcode locus that has the same species resolution. Therefore, generating reference sequences has been more difficult, and it has been suggested that several loci be used in parallel to reach species-level identification. For this reason, we developed a multi-gene targeted capture approach to generate reference DNA sequences for plant taxa across 20 target chloroplast gene regions in a single assay. We successfully compiled a reference database for 93 temperate coastal plants including seagrasses, mangroves, and saltmarshes/samphires. We demonstrate the importance of a comprehensive reference database to prevent species going undetected in eDNA studies. We also investigate how using multiple chloroplast gene regions impacts the ability to discriminate between taxa.


Plants ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1126
Author(s):  
Robert Korzeniewicz ◽  
Marlena Baranowska ◽  
Hanna Kwaśna ◽  
Gniewko Niedbała ◽  
Jolanta Behnke-Borowczyk

So far, there have been no studies on fungal communities in Prunus serotina (black cherry) wood. Our objectives were to characterize fungal communities from P. serotina wood and to evaluate effects of glyphosate (Glifocyd 360 SL) used on P. serotina stumps on abundance, species richness and diversity of those communities. In August 2016, in the Podanin Forest District, stumps of black cherry trees left after felling were treated with the herbicide. Control stumps were treated with water. Wood discs were cut from the surface of the stumps in May and July–August 2017. Eight treatment combinations (2 herbicide treatments × 2 disc sizes × 2 sample times) were tested. Sub-samples were pooled and ground in a cryogenic mill. Environmental DNA was extracted with a Plant Genomic DNA Purification Kit. The ITS1, 5.8S rDNA region was used to identify fungal species, using primers ITS1FI2 5′-GAACCWGCGGARGGATCA-3′ and 5.8S 5′-CGCTGCGTTCTTCATCG-3′. The amplicons were sequenced using the Illumina system. The results were subjected to bioinformatic analysis. Sequences were compared with reference sequences from the NCBI database using the BLASTn 2.8.0 algorithm. Abundance of fungi was defined as the number of Operational Taxonomic Units (OTUs), and diversity as the number of species in a sample. Differences between the number of OTUs and taxa were analyzed using the chi-squared test (χ²). Diversity in microbial communities was compared using diversity indices. A total of 54,644 OTUs were obtained. Culturable fungi produced 49,808 OTUs (91.15%), fungi not known from culture had 2571 OTUs (4.70%), non-fungal organisms had 1333 (2.44%) and organisms with no reference sequence in NCBI, 934 OTUs (1.71%). The total number of taxa ranged from 120 to 319.
Fungi in stump wood were significantly more abundant in July–August than in May, in stumps >5 cm diameter than in stumps <5 cm diameter, in glyphosate-treated than in untreated stumps when sampled in May, and in untreated than in glyphosate-treated stumps when sampled in July–August. Species richness was significantly greater in July–August than in May, and in stumps >5 cm diameter than in stumps <5 cm diameter, either treated or untreated, depending on size. Herbicides can therefore affect the abundance and diversity of fungal communities in deciduous tree wood. The greater frequency of Ascomycota in herbicide-treated than in untreated stumps indicates their greater tolerance of glyphosate.


Genes ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 50
Author(s):  
Axel Barlow ◽  
Stefanie Hartmann ◽  
Javier Gonzalez ◽  
Michael Hofreiter ◽  
Johanna L. A. Paijmans

A standard practise in palaeogenome analysis is the conversion of mapped short read data into pseudohaploid sequences, frequently by selecting a single high-quality nucleotide at random from the stack of mapped reads. This controls for biases due to differential sequencing coverage, but it does not control for differential rates and types of sequencing error, which are frequently large and variable in datasets obtained from ancient samples. These errors have the potential to distort phylogenetic and population clustering analyses, and to mislead tests of admixture using D statistics. We introduce Consensify, a method for generating pseudohaploid sequences, which controls for biases resulting from differential sequencing coverage while greatly reducing error rates. The error correction is derived directly from the data itself, without the requirement for additional genomic resources or simplifying assumptions such as contemporaneous sampling. For phylogenetic and population clustering analysis, we find that Consensify is less affected by artefacts than methods based on single read sampling. For D statistics, Consensify is more resistant to false positives and appears to be less affected by biases resulting from different laboratory protocols than other frequently used methods. Although Consensify is developed with palaeogenomic data in mind, it is applicable for any low to medium coverage short read datasets. We predict that Consensify will be a useful tool for future studies of palaeogenomes.
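
The contrast between single-read sampling and a consensus-style call can be sketched as follows. The abstract does not spell out the rule Consensify applies, so the version below (sample three bases from the read stack at a position, keep a base only if at least two of them agree, otherwise emit N) is an assumption for illustration, as are the function names and the toy pileup.

```python
import random
from collections import Counter

def consensus_call(bases, n_sample=3, min_agree=2, rng=random):
    """Call one position from the stack of mapped bases.

    Returns 'N' when coverage is below n_sample or when the sampled
    bases do not reach the agreement threshold (both values are
    assumptions of this sketch).
    """
    if len(bases) < n_sample:
        return "N"
    sampled = rng.sample(bases, n_sample)
    base, count = Counter(sampled).most_common(1)[0]
    return base if count >= min_agree else "N"

def pseudohaploid(pileup, seed=0):
    """Build a pseudohaploid sequence from per-position base stacks."""
    rng = random.Random(seed)
    return "".join(consensus_call(column, rng=rng) for column in pileup)

# Hypothetical pileup: one list of observed bases per reference position.
pileup = [list("AAAA"), list("AC"), list("ACG"), list("GGGT")]
print(pseudohaploid(pileup))
```

Sampling a fixed number of bases at every position keeps the call independent of how deep the coverage is, while the agreement requirement filters out many isolated sequencing errors that single-read sampling would pass through.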


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Ian Salter ◽  
Mourits Joensen ◽  
Regin Kristiansen ◽  
Petur Steingrund ◽  
Poul Vestergaard

Abstract Environmental DNA (eDNA) has emerged as a powerful approach for studying marine fisheries and has the potential to negate some of the drawbacks of trawl surveys. However, successful applications in oceanic waters have to date been largely focused on qualitative descriptions of species inventories. Here we conducted a quantitative eDNA survey of Atlantic cod (Gadus morhua) in oceanic waters and compared it with results obtained from a standardized demersal trawl survey. Detection of eDNA originating from Atlantic cod was highly concordant (80%) with trawl catches. We observed significantly positive correlations between the regional integrals of Atlantic cod biomass (kg) and eDNA quantities (copies) (R² = 0.79, P = 0.003) and between sampling effort-normalised Catch Per Unit Effort (kg hr⁻¹) and eDNA concentrations (copies L⁻¹) (R² = 0.71, P = 0.008). These findings extend the potential application of environmental DNA to regional biomass assessments of commercially important fish stocks in the ocean.


2012 ◽  
Vol 13 (1) ◽  
pp. 185 ◽  
Author(s):  
Xin Victoria ◽  
Natalie Blades ◽  
Jie Ding ◽  
Razvan Sultana ◽  
Giovanni Parmigiani

2021 ◽  
Vol 4 ◽  
Author(s):  
Haris Zafeiropoulos ◽  
Laura Gargan ◽  
Christina Pavloudi ◽  
Evangelos Pafilis ◽  
Jens Carlsson

Environmental DNA (eDNA) metabarcoding has been commonly used in recent years (Jeunen et al. 2019) for the identification of the species composition of environmental samples. By making use of genetic markers anchored in conserved gene regions, universally present across the species of large taxonomic groups, eDNA metabarcoding exploits both extra- and intra-cellular DNA fragments for biodiversity assessment. However, there is not a truly “universal” marker gene that is capable of amplifying all species across different taxa (Kress et al. 2015). The mitochondrial cytochrome C oxidase subunit I gene (COI) has many of the desirable properties of a “universal” marker and has been widely used for assessing species identity in Eukaryotes, especially metazoans (Andújar et al. 2018). However, a great number of COI Operational Taxonomic Units (OTUs) and/or Amplicon Sequence Variants (ASVs) retrieved from such studies do not match reference sequences and are often referred to as “dark matter” (Deagle et al. 2014). The aim of this study was to discover the origins and identities of these COI dark matter sequences. We built a reference phylogenetic tree that included as much COI sequence information from across the tree of life as possible. An overview of the steps followed is presented in Fig. 1a. Briefly, the Midori reference 2 database was used to retrieve eukaryote sequences (183,330 species). In addition, the API of the BOLD database was used as the source for the corresponding Bacteria (559 genera) and Archaea (41 genera) sequences. Consensus sequences at the family level were constructed from each of these three initial COI datasets. The COI-oriented reference phylogenetic tree of life was then built by using 1,240 consensus sequences, with more than 80% of those coming from eukaryotic taxa. Phylogeny-based taxonomic assignment was then used to place query sequences.
Three sets of OTUs from marine and freshwater samples, retrieved during in-house metabarcoding experiments, were placed in the reference tree: a) all sequences, b) sequences assigned to Eukaryotes and c) unassigned sequences (Fig. 1b). It is clear that a large proportion of sequences targeting the COI region of Eukaryotes actually represents bacterial branches in the phylogenetic tree (Fig. 1b). We conclude that COI metabarcoding studies targeting Eukaryotes may come with a great bias derived from amplification and sequencing of bacterial taxa, depending on the primer pair used. However, for the time being, publicly available bacterial COI sequences are far too few to represent the bacterial variability; thus, a reliable taxonomic identification of them is not possible. We suggest that bacterial COI sequences should be included in the reference databases used for the taxonomic assignment of OTUs/ASVs in COI-based eukaryote metabarcoding studies, so that amplified bacterial sequences can be identified and excluded as non-target sequences. Further, the approach presented here allows researchers to better understand the unknown unknowns and shed light on the dark matter of their metabarcoding sequence data.


2021 ◽  
Author(s):  
Barış Ekim ◽  
Bonnie Berger ◽  
Rayan Chikhi

DNA sequencing data continues to progress towards longer reads with increasingly lower sequencing error rates. We focus on the problem of assembling such reads into genomes, which poses challenges in terms of accuracy and computational resources when using cutting-edge assembly approaches, e.g. those based on overlapping reads using minimizer sketches. Here, we introduce the concept of minimizer-space sequencing data analysis, where the minimizers rather than DNA nucleotides are the atomic tokens of the alphabet. By projecting DNA sequences into ordered lists of minimizers, our key idea is to enumerate what we call k-min-mers, which are k-mers over a larger alphabet consisting of minimizer tokens. Our approach, mdBG or minimizer-dBG, achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without much loss of accuracy. We demonstrate three use cases of mdBG: human genome assembly, metagenome assembly, and the representation of large pangenomes. For assembly, we implemented mdBG in software we call rust-mdbg, resulting in ultra-fast, low-memory and highly contiguous assembly of PacBio HiFi reads. A human genome is assembled in under 10 minutes using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 minutes using 1 GB RAM. For pangenome graphs, we newly allow a graphical representation of a collection of 661,405 bacterial genomes as an mdBG and successfully search it (in minimizer-space) for anti-microbial resistance (AMR) genes. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics and pangenomics.
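
The idea of projecting a sequence into minimizer space and then enumerating k-min-mers can be sketched in a few lines. The abstract does not specify how minimizers are selected, so the rule used here (keep every substring of length l whose hash falls below a density threshold) and the choice of hash function are assumptions for illustration; this is not the rust-mdbg implementation.

```python
import hashlib

def _hash64(lmer):
    """Deterministic 64-bit hash of a short DNA string."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minimizer_tokens(seq, l=4, density=0.3):
    """Ordered list of l-mers selected as minimizer tokens.

    An l-mer is kept when its hash is below density * 2**64, so roughly
    a `density` fraction of positions is selected (an assumed rule).
    """
    threshold = density * 2**64
    return [
        seq[i:i + l]
        for i in range(len(seq) - l + 1)
        if _hash64(seq[i:i + l]) < threshold
    ]

def k_min_mers(tokens, k=3):
    """All runs of k consecutive minimizer tokens (k-mers over tokens)."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]

# Hypothetical read; with density=1.0 every l-mer becomes a token.
read = "ACGTACGT"
tokens = minimizer_tokens(read, l=4, density=1.0)
print(tokens)
print(k_min_mers(tokens, k=3))
```

Because each k-min-mer is a tuple of tokens rather than a string of nucleotides, a de Bruijn graph built from them has far fewer nodes than a nucleotide-level graph over the same reads, which is the intuition behind the speed and memory gains described above.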

