Estimating intraspecific genetic diversity from community DNA metabarcoding data

10.7287/peerj.preprints.3269v3 ◽

2018 ◽

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Biological Diversity ◽

Intraspecific Diversity ◽

Great Promise ◽

Data Sets ◽

Mock Community ◽

Data Set ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Primer Sets

Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While COI haplotype information is limited in resolution, it is nevertheless useful in a phylogeographic context, helping to formulate hypotheses on taxon dispersal. Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotypes from freshwater macroinvertebrate metabarcoding data sets. This novel approach was added to the R package "JAMP" and can be applied to Cytochrome c oxidase subunit I (COI) amplicon datasets. We tested our haplotyping method by sequencing i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates. Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly from the small specimens. In the monitoring samples, the different primer sets detected 177-200 OTUs, each containing an average of 2.40 to 3.30 haplotypes per OTU. Population structures were consistent between replicates and similar between primer pairs, with resolution depending on primer length. A closer look at abundant taxa in the data set revealed various population genetic patterns, e.g. Taeniopteryx nebulosa and Hydropsyche pellucidula showed a north-south difference in haplotype distribution, while Oulimnius tuberculatus and Asellus aquaticus displayed no clear population pattern but differed in genetic diversity. Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate monitoring samples using metabarcoding data. It needs to be stressed that at this point metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and potential taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding data sets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about biological diversity but also underlying genetic diversity.
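
The denoising step borrowed from microbial research can be illustrated with a minimal Python sketch in the spirit of the UNOISE algorithm (UNOISE3 is one widely used denoiser of this kind). It assumes equal-length, already quality-filtered and dereplicated amplicons, given as a mapping from sequence to read count, and applies an abundance-skew rule: a less abundant sequence at Hamming distance d from a more abundant centroid is treated as a sequencing error if its abundance ratio is at most 1/2^(αd+1). The function name `unoise_like` and the defaults `alpha=2` and `min_size=8` are illustrative assumptions; this is not the JAMP implementation, and chimera removal is omitted.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length amplicons")
    return sum(x != y for x, y in zip(a, b))


def unoise_like(counts: Dict[str, int], alpha: float = 2.0,
                min_size: int = 8) -> Dict[str, int]:
    """UNOISE-style greedy denoising (illustrative sketch, not JAMP's code).

    counts: dereplicated sequence -> read count.
    Returns centroid sequence -> summed reads of itself and its putative
    error variants. alpha and min_size are assumed, illustrative defaults.
    """
    # Discard rare sequences, then visit the rest from most to least abundant.
    ordered = sorted((s for s, n in counts.items() if n >= min_size),
                     key=lambda s: counts[s], reverse=True)
    centroids: Dict[str, int] = {}
    for seq in ordered:
        parent = None
        for centroid in centroids:
            d = hamming(seq, centroid)
            # Abundance-skew test against the centroid's own (unmerged) count.
            if counts[seq] / counts[centroid] <= 1 / 2 ** (alpha * d + 1):
                parent = centroid
                break
        if parent is None:
            centroids[seq] = counts[seq]       # keep as a new haplotype
        else:
            centroids[parent] += counts[seq]   # fold error reads into parent
    return centroids


if __name__ == "__main__":
    toy = {"ACGTACGTAC": 1000, "ACGTACGTAA": 20,
           "TCGTACGTAA": 300, "ACGTACGTTT": 5}
    print(unoise_like(toy))  # -> {'ACGTACGTAC': 1020, 'TCGTACGTAA': 300}
```

With these toy counts, the 20-read variant one mismatch away from the 1,000-read sequence is absorbed (20/1000 ≤ 1/8), the 300-read variant two mismatches away survives as a separate haplotype (300/1000 > 1/32), and the 5-read sequence is dropped by `min_size`.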

Assessing intraspecific genetic diversity from community DNA metabarcoding data

10.7287/peerj.preprints.3269v2 ◽

2018 ◽

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Biological Diversity ◽

Intraspecific Diversity ◽

Great Promise ◽

Data Sets ◽

Mock Community ◽

Data Set ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Primer Sets

Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotypes from freshwater macroinvertebrate metabarcoding data sets. This novel approach is implemented in the R package "JAMP" and can be applied to Cytochrome c oxidase subunit I (COI) amplicon datasets. We tested our haplotyping method by sequencing i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates. Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly from the small specimens. In the monitoring samples, the different primer sets detected 177-200 OTUs, each containing an average of 2.40 to 3.30 haplotypes per OTU. Population structures were consistent between replicates and similar between primer pairs, with resolution depending on primer length. A closer look at abundant taxa in the data set revealed various population genetic patterns, e.g. Taeniopteryx nebulosa and Hydropsyche pellucidula showed a north-south difference in haplotype distribution, while Oulimnius tuberculatus and Asellus aquaticus displayed no clear population pattern but differed in genetic diversity. Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate samples using metabarcoding data. It needs to be stressed that at this point metabarcoding-informed haplotyping is not capable of capturing the full diversity present in bulk samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, intraspecific diversity is recovered for a high number of species, identifying potentially isolated populations and potential taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding data sets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about biological diversity but also underlying genetic diversity.
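
The additional abundance-based filtering can be sketched in Python under stated assumptions: haplotype read counts are available per PCR replicate, and a haplotype is kept only if it reaches a minimum share of a replicate's reads in at least `min_reps` replicates. The function name and both thresholds (`min_rel_abund`, `min_reps`) are hypothetical placeholders, not JAMP's parameters; raising them corresponds to the more rigorous filtering described in the abstract, which removes noise but also low-abundance true haplotypes.

```python
from collections import Counter
from typing import Dict, List, Set


def filter_haplotypes(replicates: List[Dict[str, int]],
                      min_rel_abund: float = 0.001,
                      min_reps: int = 2) -> Set[str]:
    """Keep haplotypes reaching `min_rel_abund` of a replicate's reads in at
    least `min_reps` PCR replicates (hypothetical thresholds)."""
    passes: Counter = Counter()
    for counts in replicates:
        total = sum(counts.values())
        if total == 0:
            continue  # an empty replicate cannot support any haplotype
        for hap, n in counts.items():
            if n / total >= min_rel_abund:
                passes[hap] += 1
    return {hap for hap, k in passes.items() if k >= min_reps}


if __name__ == "__main__":
    rep1 = {"hapA": 900, "hapB": 95, "hapC": 5}
    rep2 = {"hapA": 880, "hapB": 110, "hapD": 10}
    print(sorted(filter_haplotypes([rep1, rep2], min_rel_abund=0.01)))
    # -> ['hapA', 'hapB']
```

Here `hapC` falls below 1% of its replicate's reads and `hapD` passes in only one replicate, so both are discarded.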

Estimating intraspecific genetic diversity from community DNA metabarcoding data

10.7287/peerj.preprints.3269 ◽

2018 ◽

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Biological Diversity ◽

Intraspecific Diversity ◽

Great Promise ◽

Data Sets ◽

Mock Community ◽

Data Set ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Primer Sets

Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While COI haplotype information is limited in resolution, it is nevertheless useful in a phylogeographic context, helping to formulate hypotheses on taxon dispersal. Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotypes from freshwater macroinvertebrate metabarcoding data sets. This novel approach was added to the R package "JAMP" and can be applied to Cytochrome c oxidase subunit I (COI) amplicon datasets. We tested our haplotyping method by sequencing i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates. Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly from the small specimens. In the monitoring samples, the different primer sets detected 177-200 OTUs, each containing an average of 2.40 to 3.30 haplotypes per OTU. Population structures were consistent between replicates and similar between primer pairs, with resolution depending on primer length. A closer look at abundant taxa in the data set revealed various population genetic patterns, e.g. Taeniopteryx nebulosa and Hydropsyche pellucidula showed a north-south difference in haplotype distribution, while Oulimnius tuberculatus and Asellus aquaticus displayed no clear population pattern but differed in genetic diversity. Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate monitoring samples using metabarcoding data. It needs to be stressed that at this point metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and potential taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding data sets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about biological diversity but also underlying genetic diversity.
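
Evaluating a mock community comes down to comparing detected haplotypes with the known, expected ones. The Python sketch below counts expected haplotypes found, expected haplotypes missed and unexpected haplotypes across a hypothetical sweep of minimum read-count thresholds. The function names, haplotype labels and toy read counts are invented for illustration and are not data from the study; they only mimic the trade-off described above, where stricter filtering removes noise but also loses haplotypes of small specimens.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def evaluate_mock(detected: Set[str], expected: Set[str]) -> Dict[str, int]:
    """Compare detected haplotypes with the known mock-community haplotypes."""
    return {
        "expected_found": len(detected & expected),
        "expected_missed": len(expected - detected),
        "unexpected": len(detected - expected),
    }


def threshold_sweep(counts: Dict[str, int], expected: Set[str],
                    thresholds: Iterable[int]) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Apply increasingly strict minimum read counts and score each result."""
    for t in thresholds:
        detected = {hap for hap, n in counts.items() if n >= t}
        yield t, evaluate_mock(detected, expected)


if __name__ == "__main__":
    # Toy data: A-C are true haplotypes (C from a small specimen); X, Y are noise.
    reads = {"A": 500, "B": 40, "C": 3, "X": 12, "Y": 2}
    for t, score in threshold_sweep(reads, {"A", "B", "C"}, [1, 10, 50]):
        print(t, score)
```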

Estimating intraspecific genetic diversity from community DNA metabarcoding data

PeerJ ◽

10.7717/peerj.4644 ◽

2018 ◽

Vol 6 ◽

pp. e4644 ◽

Cited By ~ 51

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Large Scale ◽

High Throughput Sequencing ◽

Intraspecific Diversity ◽

Great Promise ◽

Mock Community ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Primer Sets ◽

Haplotype Information

Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, usually requiring reads to be clustered into operational taxonomic units (OTUs), losing information on intraspecific diversity in the process. While Cytochrome c oxidase subunit I (COI) haplotype information is limited in resolving intraspecific diversity, it is nevertheless often useful, e.g. in a phylogeographic context, helping to formulate hypotheses on taxon distribution and dispersal. Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotype information from freshwater macroinvertebrate metabarcoding datasets. This novel approach was added to the R package “JAMP” and can be applied to COI amplicon datasets. We tested our haplotyping method by sequencing (i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and (ii) 18 monitoring samples each amplified with four different primer sets and two PCR replicates. Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes, but can also discard expected haplotypes, mainly from the small specimens. In the monitoring samples, the different primer sets detected 177–200 OTUs, each containing an average of 2.40–3.30 haplotypes per OTU. The derived intraspecific diversity data showed population structures that were consistent between replicates and similar between primer pairs, but resolution depended on the primer length. A closer look at abundant taxa in the dataset revealed various population genetic patterns, e.g. the stonefly Taeniopteryx nebulosa and the caddisfly Hydropsyche pellucidula showed a distinct north–south cline with respect to haplotype distribution, while the beetle Oulimnius tuberculatus and the isopod Asellus aquaticus displayed no clear population pattern but differed in genetic diversity. Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate metabarcoding data. It needs to be stressed that at this point this metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and loss of sequence variants with low abundance. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and taxa for further, more detailed phylogeographic investigation. While we are currently lacking large-scale metabarcoding datasets to fully take advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that not only seek information about species diversity but also underlying genetic diversity.
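
The "average haplotypes per OTU" statistic can be computed from a table assigning each retained haplotype to an OTU. The minimal Python sketch below assumes such a mapping already exists (for example from clustering the haplotypes into OTUs); the function name and the toy haplotype and OTU labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def mean_haplotypes_per_otu(hap_to_otu: Dict[str, str]) -> float:
    """Average number of distinct haplotypes assigned to each OTU."""
    per_otu: Dict[str, Set[str]] = defaultdict(set)
    for hap, otu in hap_to_otu.items():
        per_otu[otu].add(hap)
    if not per_otu:
        raise ValueError("no haplotypes supplied")
    return sum(len(haps) for haps in per_otu.values()) / len(per_otu)


if __name__ == "__main__":
    assignment = {"h1": "OTU1", "h2": "OTU1", "h3": "OTU1",
                  "h4": "OTU2", "h5": "OTU3", "h6": "OTU3"}
    print(mean_haplotypes_per_otu(assignment))  # -> 2.0
```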

Assessing intraspecific genetic diversity from community DNA metabarcoding data

10.7287/peerj.preprints.3269v1 ◽

2017 ◽

Author(s):

Vasco Elbrecht ◽

Ecaterina Edith Vamos ◽

Dirk Steinke ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Data Analysis ◽

Species Composition ◽

Ecological Monitoring ◽

Intraspecific Diversity ◽

Data Sets ◽

Composition Data ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity

DNA metabarcoding provides species composition data for entire communities, yet information on intraspecific diversity is usually lost during data analysis. The capacity to infer intraspecific genetic diversity within whole communities would, however, represent a leap forward for ecological monitoring and conservation. We developed an amplicon-based sequence denoising approach that allows the identification of haplotypes from metabarcoding data sets and demonstrate its power with two freshwater macroinvertebrate data sets.

Can metabarcoding resolve intraspecific genetic diversity changes to environmental stressors? A test case using river macrozoobenthos

Metabarcoding and Metagenomics ◽

10.3897/mbmg.4.51925 ◽

2020 ◽

Vol 4 ◽

Author(s):

Vera Marie Alida Zizka ◽

Martina Weiss ◽

Florian Leese

Keyword(s):

Genetic Diversity ◽

Species Diversity ◽

Haplotype Diversity ◽

Environmental Stressors ◽

Intraspecific Diversity ◽

River Systems ◽

Opposite Pattern ◽

Dna Metabarcoding ◽

Intraspecific Genetic Diversity ◽

Promising Source

Genetic diversity is the most basal level of biodiversity and determines the evolutionary capacity of species to adapt to changing environments, yet it is typically neglected in routine biomonitoring and stressor impact assessment. For a comprehensive analysis of stressor impacts on genetic diversity, it is necessary to assess genetic variants simultaneously in many individuals and species. Such an assessment is not straightforward and is usually limited to one or a few focal species. Nowadays, however, species diversity can be assessed by analysing thousands of individuals of a community simultaneously with DNA metabarcoding. Recent bioinformatic advances also allow for the extraction of exact sequence variants (ESVs or haplotypes) in addition to Operational Taxonomic Units (OTUs). Using this new capability, we evaluated whether analysing intraspecific mitochondrial diversity in addition to species diversity can provide insights into the responses of stream macrozoobenthic communities to environmental stressors. For this purpose, we analysed macroinvertebrate bulk samples from three German river systems with different stressor levels using DNA metabarcoding. While OTU and haplotype numbers were negatively correlated with stressor impact, this association was less clear when studying haplotype diversity across all taxa. However, stressor responses were found for sensitive EPT (Ephemeroptera, Plecoptera, Trichoptera) taxa and for taxa exceedingly resistant to organic stress. Haplotype number per OTU and haplotype diversity of sensitive taxa increased with ecosystem quality and stability, while the opposite pattern was detected for pollution-resistant taxa. However, this pattern was less prominent than expected given the strong differences in stressor intensity between sites. To compare genetic diversity among communities in river systems, we focused on OTUs that were present in all systems.
As OTU composition differed strongly between rivers, this led to the exclusion of a high number of OTUs, especially in diverse river systems of good quality, which potentially diminished the observed increase in intraspecific diversity. To better understand responses of intraspecific genetic diversity to environmental stressors, for example in river ecosystems, it would be important to increase OTU overlap between compared sites, e.g. by sampling a narrower stressor gradient, and to perform calibrated studies controlling for the number of individuals and their haplotypes. Nevertheless, this pioneering study shows that the extraction of haplotypes from DNA metabarcoding datasets is a promising source of information for simultaneously assessing intraspecific diversity changes in response to environmental impacts for a metacommunity.
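
Two calculations underlie this kind of comparison: restricting the analysis to OTUs shared by all river systems, and computing a haplotype diversity index per OTU. The Python sketch below uses a set intersection for the former and Nei's unbiased haplotype diversity, H = n/(n − 1) · (1 − Σ p_i²), for the latter. Whether the study used exactly this estimator is an assumption; note that the formula expects counts of individuals per haplotype, which metabarcoding read counts do not provide directly. The function names and toy inputs are hypothetical.

```python
from typing import Dict, List, Set


def nei_haplotype_diversity(counts: List[int]) -> float:
    """Nei's unbiased haplotype diversity from individuals per haplotype
    (assumed estimator; reads are not individuals)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_p2)


def shared_otus(otus_by_river: Dict[str, Set[str]]) -> Set[str]:
    """OTUs present in every river system, the basis for between-river comparison."""
    if not otus_by_river:
        return set()
    return set.intersection(*otus_by_river.values())


if __name__ == "__main__":
    rivers = {"river1": {"otu1", "otu2", "otu3"},
              "river2": {"otu2", "otu3"},
              "river3": {"otu3", "otu4"}}
    print(shared_otus(rivers))                            # -> {'otu3'}
    print(round(nei_haplotype_diversity([5, 3, 2]), 4))   # -> 0.6889
```

The toy river sets show why strict overlap can be costly: three of the four OTUs are excluded because each is missing from at least one river.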

Mimicry Embedding Facilitates Advanced Neural Network Training for Image-Based Pathogen Detection

mSphere ◽

10.1128/msphere.00836-20 ◽

2020 ◽

Vol 5 (5) ◽

Author(s):

Artur Yakimovich ◽

Moona Huttunen ◽

Jerzy Samolej ◽

Barbara Clough ◽

Nagisa Yoshida ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Network Evolution ◽

Great Promise ◽

Data Sets ◽

Imaging Data ◽

Data Set ◽

Novel Strategy

ABSTRACT The use of deep neural networks (DNNs) for analysis of complex biomedical images shows great promise but is hampered by a lack of large verified data sets for rapid network evolution. Here, we present a novel strategy, termed “mimicry embedding,” for rapid application of neural network architecture-based analysis of pathogen imaging data sets. Embedding of a novel host-pathogen data set, such that it mimics a verified data set, enables efficient deep learning using high expressive capacity architectures and seamless architecture switching. We applied this strategy across various microbiological phenotypes, from superresolved viruses to in vitro and in vivo parasitic infections. We demonstrate that mimicry embedding enables efficient and accurate analysis of two- and three-dimensional microscopy data sets. The results suggest that transfer learning from pretrained network data may be a powerful general strategy for analysis of heterogeneous pathogen fluorescence imaging data sets. IMPORTANCE In biology, the use of deep neural networks (DNNs) for analysis of pathogen infection is hampered by a lack of large verified data sets needed for rapid network evolution. Artificial neural networks detect handwritten digits with high precision thanks to large data sets, such as MNIST, that allow nearly unlimited training. Here, we developed a novel strategy we call mimicry embedding, which allows artificial intelligence (AI)-based analysis of variable pathogen-host data sets. We show that deep learning can be used to detect and classify single pathogens based on small differences.
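
The core idea, reshaping a new image data set so that it resembles a verified one, can be illustrated with a deliberately simple Python/NumPy sketch. It assumes the target format is MNIST-like (a 28 × 28 single-channel image scaled to [0, 1]) and uses nearest-neighbour resampling plus min-max normalisation. This is a hypothetical preprocessing step, not the authors' mimicry-embedding procedure, and `embed_like_mnist` is an invented name.

```python
import numpy as np


def embed_like_mnist(img: np.ndarray, size: int = 28) -> np.ndarray:
    """Hypothetical sketch: map a 2-D fluorescence crop onto an MNIST-like
    size x size float32 array scaled to [0, 1]."""
    if img.ndim != 2:
        raise ValueError("expects a single-channel 2-D image")
    h, w = img.shape
    # Nearest-neighbour resampling: pick evenly spaced source rows/columns.
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    small = img[np.ix_(rows, cols)].astype(np.float64)
    # Min-max normalisation; a constant image maps to all zeros.
    lo, hi = small.min(), small.max()
    if hi > lo:
        small = (small - lo) / (hi - lo)
    else:
        small = np.zeros_like(small)
    return small.astype(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    crop = rng.integers(0, 4096, size=(512, 640))  # e.g. a 12-bit microscopy crop
    out = embed_like_mnist(crop)
    print(out.shape, out.dtype, float(out.min()), float(out.max()))
```

A real pipeline would likely crop around each pathogen and preserve more structure than nearest-neighbour sampling does; the sketch only shows the format-matching step.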

Environmental DNA analysis shows high potential as a tool for estimating intraspecific genetic diversity in a wild fish population

10.1101/829770 ◽

2019 ◽

Cited By ~ 1

Author(s):

Satsuki Tsuji ◽

Atsushi Maruyama ◽

Masaki Miya ◽

Masayuki Ushio ◽

Hirotoshi Sato ◽

...

Keyword(s):

Genetic Diversity ◽

Water Sample ◽

Sanger Sequencing ◽

Large Scale ◽

High Throughput Sequencing ◽

Dna Analysis ◽

Environmental Dna ◽

Intraspecific Diversity ◽

Survey Method ◽

Intraspecific Genetic Diversity

Environmental DNA (eDNA) analysis has recently been used as a new tool for estimating intraspecific diversity. However, whether known haplotypes contained in a sample can be detected correctly using eDNA-based methods has so far been examined only in an aquarium experiment. Here, we tested whether the haplotypes of Ayu fish (Plecoglossus altivelis altivelis) detected in a capture survey could also be detected in a field-derived eDNA sample containing various haplotypes at low concentrations together with foreign substances. A water sample and Ayu specimens collected from a river on the same day were analysed by eDNA analysis and Sanger sequencing, respectively. The 10 L water sample was divided among 20 filters, and 15 PCR replicates were performed for each filter. After high-throughput sequencing, denoising was performed using two of the most widely used denoising packages, UNOISE3 and DADA2. Of the 42 haplotypes obtained from Sanger sequencing of 96 specimens, 38 (UNOISE3) and 41 (DADA2) haplotypes were detected by eDNA analysis. With DADA2, every haplotype carried by at least two specimens, except one, was detected in all filter replicates. This study showed that eDNA analysis for evaluating intraspecific genetic diversity provides results comparable to those of large-scale, capture-based conventional methods, suggesting that it could become a more efficient survey method for investigating intraspecific genetic diversity in the field.
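
The detection figures in this abstract translate into the following rates (simple arithmetic on the reported counts; the replicate total follows from 20 filters with 15 PCR replicates each):

$$
20 \times 15 = 300 \ \text{PCR replicates}, \qquad
\frac{38}{42} \approx 90.5\% \ (\text{UNOISE3}), \qquad
\frac{41}{42} \approx 97.6\% \ (\text{DADA2})
$$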

Assessing strengths and weaknesses of DNA metabarcoding based macroinvertebrate identification for routine stream monitoring

10.7287/peerj.preprints.2759 ◽

2017 ◽

Author(s):

Vasco Elbrecht ◽

Edith Vamos ◽

Kristian Meissner ◽

Jukka Aroviita ◽

Florian Leese

Keyword(s):

Large Scale ◽

Monitoring Program ◽

Ecological Status ◽

Taxonomic Resolution ◽

Great Promise ◽

Morphological Identification ◽

Full Potential ◽

Stream Monitoring ◽

Dna Metabarcoding ◽

Primer Sets

1) DNA metabarcoding holds great promise for the assessment of macroinvertebrates in stream ecosystems. However, few large-scale studies have compared the performance of DNA metabarcoding with that of routine morphological identification. 2) We performed metabarcoding using four primer sets on macroinvertebrate samples from 18 stream sites across Finland. The samples were collected in 2013 and identified based on morphology as part of a Finnish stream monitoring program. Specimens were morphologically classified, following standardised protocols, to the lowest taxonomic level for which identification was feasible in the routine national monitoring. 3) DNA metabarcoding identified more than twice as many taxa as the morphology-based protocol, and also yielded a higher taxonomic resolution. For each sample, we detected more taxa by metabarcoding than by the morphological method, and all four primer sets performed comparably well. Sequence read abundance and the number of specimens per taxon (a proxy for biomass) were significantly correlated in each sample, although the adjusted R² values were low. With a few exceptions, the ecological status assessment metrics calculated from morphological and DNA metabarcoding datasets were similar. Given the recent reduction in sequencing costs, metabarcoding is currently approximately as expensive as morphology-based identification. 4) Using samples obtained in the field, we demonstrated that DNA metabarcoding can achieve assessment results comparable to those of current protocols relying on morphological identification. Thus, metabarcoding represents a feasible and reliable method to identify macroinvertebrates in stream bioassessment, and offers a powerful advantage over morphological identification by providing identifications for taxonomic groups that are unfeasible to identify in routine protocols.
To unlock the full potential of DNA metabarcoding for ecosystem assessment, however, it will be necessary to address key problems with current laboratory protocols and reference databases.
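
The adjusted R² mentioned in this abstract penalises R² for the number of predictors p: R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1). For a single predictor, such as specimen number explaining read abundance, a minimal Python sketch follows. Whether and how the data were transformed (e.g. log-scaling) before regression is not stated here, so the inputs are left untransformed; the function name and example values are invented.

```python
from typing import Sequence


def adjusted_r2_simple(x: Sequence[float], y: Sequence[float]) -> float:
    """Adjusted R^2 of an ordinary least-squares fit y ~ x (one predictor)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    r2 = sxy ** 2 / (sxx * syy)  # for one predictor, R^2 equals Pearson r squared
    p = 1                        # number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    specimens = [1, 2, 3, 4, 5]   # invented toy values
    reads = [2, 1, 4, 3, 5]
    print(round(adjusted_r2_simple(specimens, reads), 2))  # -> 0.52
```

In the toy data R² is 0.64, and the adjustment for n = 5 lowers it to 0.52; with many taxa per sample the penalty shrinks, so low adjusted values mainly reflect a weak fit.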

Analysis of Genetic Diversity in Chrysopogon aciculatus Using Intersimple Sequence Repeat and Sequence-related Amplified Polymorphism Markers

HortScience ◽

10.21273/hortsci.51.8.972 ◽

2016 ◽

Vol 51 (8) ◽

pp. 972-979 ◽

Cited By ~ 2

Author(s):

Xinyi Zhang ◽

Li Liao ◽

Zhiyong Wang ◽

Changjun Bai ◽

Jianxiu Liu

Keyword(s):

Genetic Diversity ◽

Cluster Analysis ◽

Molecular Genetic ◽

Issr Markers ◽

Data Sets ◽

Sequence Repeat ◽

Data Set ◽

Srap Markers ◽

Molecular Genetic Diversity ◽

High Level

Molecular genetic diversity and relationships among 86 Chrysopogon aciculatus (Retz.) Trin. accessions were assessed using intersimple sequence repeat (ISSR) and sequence-related amplified polymorphism (SRAP) markers. Twenty-five ISSR markers generated 283 amplification bands, of which 266 were polymorphic. In addition, 576 polymorphic bands were detected from 627 bands amplified using 30 SRAP primers. Both marker types revealed a high level of genetic diversity, with ISSR markers showing a higher proportion of polymorphic loci (PPL; 94%) than SRAP markers (91.87%). The ISSR and SRAP data were significantly correlated (r = 0.8023). Cluster analysis of the separate ISSR and SRAP data sets clustered the accessions into three groups, which generally were consistent with geographic provenance. Cluster analysis of the combined ISSR and SRAP data set revealed four major groups similar to those based solely on ISSR or SRAP markers. The findings demonstrate that ISSR and SRAP markers are reliable and effective tools for analysis of genetic diversity in C. aciculatus.
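
The proportion of polymorphic loci follows directly from the band counts reported in this abstract:

$$
\mathrm{PPL} = \frac{\text{polymorphic bands}}{\text{total bands}} \times 100\%, \qquad
\mathrm{PPL}_{\mathrm{ISSR}} = \frac{266}{283} \times 100\% \approx 93.99\% \ (\approx 94\%), \qquad
\mathrm{PPL}_{\mathrm{SRAP}} = \frac{576}{627} \times 100\% \approx 91.87\%
$$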