Estimating intraspecific genetic diversity from community DNA metabarcoding data
Background. DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors in high-throughput sequencing instruments are fairly common, so reads usually have to be clustered into operational taxonomic units (OTUs), which discards information on intraspecific diversity. While COI haplotype information is limited in resolution, it is nevertheless useful in a phylogeographic context, helping to formulate hypotheses on taxon dispersal.
Methods. This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotypes from freshwater macroinvertebrate metabarcoding data sets. This novel approach was added to the R package "JAMP" and can be applied to cytochrome c oxidase subunit I (COI) amplicon data sets. We tested our haplotyping method by sequencing i) a single-species mock community composed of 31 individuals with different haplotypes spanning three orders of magnitude in biomass and ii) 18 monitoring samples, each amplified with four different primer sets and two PCR replicates.
Results. We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removed most unexpected haplotypes, but could also discard expected haplotypes, mainly those from the small specimens. In the monitoring samples, the different primer sets detected 177 to 200 OTUs, each containing on average 2.40 to 3.30 haplotypes. Population structures were consistent between replicates, and similar between primer pairs, depending on the primer length. A closer look at abundant taxa in the data set revealed various population genetic patterns: for example, Taeniopteryx nebulosa and Hydropsyche pellucidula showed a north-south difference in haplotype distribution, while Oulimnius tuberculatus and Asellus aquaticus displayed no clear population pattern but differed in genetic diversity.
Discussion. We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate monitoring samples using metabarcoding data. It needs to be stressed that, at this point, metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, due to variation in specimen size, primer bias and the loss of sequence variants with low abundance. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and candidate taxa for further, more detailed phylogeographic investigation. While we currently lack large-scale metabarcoding data sets to take full advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that seek information not only about biological diversity but also about the underlying genetic diversity.
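The abundance-based filtering described in the Methods can be illustrated with a minimal Python sketch. It is not the JAMP implementation: the function name `filter_haplotypes`, the input layout and both thresholds are hypothetical, chosen only to show the general idea of keeping a denoised haplotype when it reaches a minimum relative abundance in enough PCR replicates of a sample. The denoising step itself is assumed to have run beforehand.

```python
from collections import defaultdict


def filter_haplotypes(counts, min_rel_abundance=0.001, min_replicates=2):
    """Keep haplotypes that are sufficiently abundant in enough replicates.

    counts: {replicate_name: {haplotype_id: read_count}} for one sample.
    Both thresholds are illustrative assumptions, not JAMP defaults.
    Returns {haplotype_id: total_read_count} for the retained haplotypes.
    """
    passing = defaultdict(int)  # replicates in which the haplotype passes
    totals = defaultdict(int)   # reads summed over all replicates

    for hap_counts in counts.values():
        depth = sum(hap_counts.values())
        if depth == 0:
            continue  # skip replicates without reads
        for hap, n in hap_counts.items():
            totals[hap] += n
            if n / depth >= min_rel_abundance:
                passing[hap] += 1

    return {hap: totals[hap] for hap, k in passing.items() if k >= min_replicates}


# Toy read counts for one sample with two PCR replicates (invented numbers).
example = {
    "rep_A": {"hap1": 900, "hap2": 95, "hap3": 5},
    "rep_B": {"hap1": 850, "hap2": 150},
}
kept = filter_haplotypes(example, min_rel_abundance=0.01, min_replicates=2)
print(kept)
```

Raising `min_rel_abundance` in such a scheme removes more low-abundance variants, which mirrors the trade-off reported in the Results: stricter settings discard more unexpected haplotypes but can also lose genuine haplotypes from small specimens.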