De novo species delimitation in metabarcoding datasets using ecology and phylogeny

2017 ◽  
Author(s):  
Caitlin Potter ◽  
Cuong Q Tang ◽  
Vera Fonseca ◽  
Delphine Lallias ◽  
John M Gaspar ◽  
...  

Background: Metabarcoding studies allow a wide variety of taxa to be analysed simultaneously in a fraction of the time taken by morphological identification, but currently metabarcoding studies must rely on sequence similarity-based methodologies to delimit operational taxonomic units (OTUs). Similarity-based OTU clustering methodologies can lead to inaccurate estimates of diversity, species’ distributions or responses to change, meaning that there is a critical need for methods to delimit species in metabarcoding datasets. Methods: We introduce SNAPhy (Species delimitation using Niche And PHYlogeny), a novel approach which utilises ecological and phylogenetic information to delimit de novo OTUs in metabarcoding datasets and avoids the problems associated with current OTU clustering methods. Sequencing reads are first divided into ecological groups based on co-occurrence, thereby reducing data complexity and facilitating the use of evolutionary and phylogenetic models (e.g. BEAST and GMYC) to delimit species-level groupings within discrete ecologically informed phylogenies. The utility of SNAPhy is demonstrated using an 18S rDNA nuclear small subunit (nSSU) dataset representing replicated samples taken along the entire length of an estuarine salinity gradient, and SNAPhy is then compared to existing OTU clustering methods. Results: All of the OTU clustering methods compared yielded different numbers of OTUs and a different taxonomic distribution of OTUs, which we suggest is due to the taxon differences that are known to exist in the degree of intraspecific divergence. SNAPhy and UCLUST (with a 98% similarity threshold) gave the most plausible numbers of OTUs, especially within the Nematoda. Additionally, the degree of variation within nematode OTUs delimited by SNAPhy lies within the range of variation in deeply metabarcoded individuals. 
Discussion: SNAPhy avoids the static clustering threshold problems associated with current OTU clustering methods and instead focuses on genuine biological diversity delimited according to a general lineage species concept. We suggest that the SNAPhy approach should play a crucial role in future sequencing-based biodiversity assessment by providing more accurate estimates of species diversity and distributions than current methods, thereby enabling more accurate impact assessments and better informing managerial decisions.


2019 ◽  
Vol 7 (11) ◽  
pp. 493 ◽  
Author(s):  
Zhan ◽  
Li ◽  
Xu

Metabarcoding and high-throughput sequencing methods have greatly improved our understanding of protist diversity. Although the V4 region of small subunit ribosomal DNA (SSU-V4 rDNA) is the most widely used marker in DNA metabarcoding of eukaryotic microorganisms, doubts have recently been raised about its suitability. Here, using the widely distributed ciliate genus Pseudokeronopsis as an example, we assessed the potential of SSU-V4 rDNA and four other nuclear and mitochondrial markers for species delimitation and phylogenetic reconstruction. Our studies revealed that SSU-V4 rDNA is too conserved to distinguish species: sequence similarity thresholds of 97% and 99% detected only one and three OTUs, respectively, from seven species. On the basis of a comparative analysis of the present and previously published data, we propose a multilocus marker combining the nuclear 5.8S rDNA with the internal transcribed spacer regions (ITS1-5.8S-ITS2) and the hypervariable D2 region of large subunit rDNA (LSU-D2) as an ideal barcode, in preference to the mitochondrial cytochrome c oxidase subunit 1 gene, and ITS1-5.8S-ITS2 as a candidate metabarcoding marker for ciliates. Furthermore, the compensating base change and tree-based criteria of ITS2 and LSU-D2 were useful in complementing the DNA barcoding and metabarcoding methods by providing secondary structure and phylogenetic evidence.
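
The dependence of OTU counts on the similarity threshold, as reported in this abstract, can be illustrated with a toy greedy clustering. This is only a minimal sketch, not the procedure used in the study or in tools such as UCLUST or USEARCH: it assumes pre-aligned sequences of equal length, measures identity as the fraction of matching positions, and uses hypothetical names (`identity`, `greedy_cluster`) and made-up sequences.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    The result depends on input order, as in other greedy approaches.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for k, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[k].append(s)
                break
        else:
            # No centroid was similar enough: start a new cluster.
            centroids.append(s)
            clusters.append([s])
    return clusters


# Made-up toy sequences differing by 0, 1 and 2 positions from the first one.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"]
for t in (0.8, 0.9, 1.0):
    print(t, len(greedy_cluster(toy, t)))
```

Raising the threshold splits the same three sequences into more clusters, which is the general behaviour behind a conserved marker yielding few OTUs at 97% and more at 99%.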


2014 ◽  
Vol 35 (2) ◽  
pp. 243-260 ◽  
Author(s):  
Sahar Khodami ◽  
Pedro Martinez Arbizu ◽  
Sabine Stöhr ◽  
Silke Laakmann

Abstract Brittle stars (Echinodermata: Ophiuroidea) comprise over 2,000 species, all of which inhabit marine environments and can be abundant in the deep sea. Morphological plasticity in number and shape of skeletal parts, as well as variable colors, can complicate correct species identification. Consequently, DNA sequence analysis can play an important role in species identification. In this study we compared the genetic variability of the mitochondrial cytochrome c oxidase subunit I gene (COI) and the nuclear small subunit ribosomal DNA (SSU, 18S rDNA) to morphological identification of 66 specimens of 11 species collected from the North Atlantic in Icelandic waters. In addition, two species delimitation tools, Automatic Barcode Gap Discovery (ABGD) and the Generalized Mixed Yule Coalescent (GMYC) method, were applied to test species hypotheses. The analysis of both gene fragments successfully discriminated between species and provided new insights into some morphological species hypotheses. Although the SSU region is less divergent than COI, it is helpful to use it as a complementary fragment to the barcoding gene.


2015 ◽  
Author(s):  
Dominik Forster ◽  
Micah Dunthorn ◽  
Thorsten Stoeck ◽  
Frédéric Mahé

Discovery of novel diversity in high-throughput sequencing (HTS) studies is a central task in environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on novel diversity discovery, we clustered an environmental marine protist HTS dataset of protist reads together with accessions from the taxonomically curated PR2 reference database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as novel by USEARCH and Swarm were actually highly similar to reference accessions. Using graph theory we found additional novel diversity within OTUs that would have gone unnoticed without further using their underlying network topologies. Our results suggest that novel diversity inferred from clustering approaches requires further validation, whereas graph theory provides a powerful tool for microbial ecology and the analyses of environmental HTS datasets.


2021 ◽  
Vol 4 ◽  
Author(s):  
Kálmán Tapolczai ◽  
François Keck ◽  
Valentin Vasselon ◽  
Géza Selmeczy ◽  
Maria Kahlert ◽  
...  

Diatom biomonitoring and ecological studies can greatly benefit from DNA metabarcoding compared to conventional microscopical analysis by potentially providing more reliable and accurate data in a cost- and time-efficient way. A conventional strategy for the bioinformatic treatment of sequencing data involves the clustering of quality filtered sequences into Operational Taxonomic Units (OTUs) based on a global sequence similarity, and their assignment to taxonomy using a reference library. Then, the obtained species lists of the successfully assigned taxa are used for subsequent analyses or quality index calculation. However, the high diversity of bioinformatic methods and parameters makes inter-study comparison difficult, especially because OTUs are specific to a given study. Clustering sequences into OTUs aims to reduce the biasing effect of sequencing artefacts and to reach an approximate species level delimitation at the price of potentially grouping together sequences with different ecology. A similar bias occurs when sequences that differ from each other by their ecological preference are assigned to the same taxa. The incompleteness of reference libraries can further introduce a bias by not taking into account unassigned sequences, thus losing the ecological information they possess. In order to overcome these biases, our studies tested new approaches on de novo developed diatom indices based on periphytic samples collected from streams in France and Hungary. Index development was performed with the leave-one-out cross validation (LOOCV) technique by building a model on a training dataset containing n-1 samples and testing it on the remaining test sample. Test values were correlated with a reference environmental gradient. The model was based on the calculation of optimum and tolerance of taxonomic units along the reference gradient and a modified Zelinka-Marvan diatom index equation.
Taxonomic units tested in the studies were morphospecies, OTUs (95% similarity threshold), Individual Sequence Units (ISUs, via minimal bioinformatic quality filtering) and Exact Sequence Variants (ESVs, via DADA2 denoising algorithm). The “clustering-free” approach (ISU- and ESV-based indices) performed better than the OTU-based one, providing a fine taxonomic resolution where ecological differences between genetically close sequence variants could be detected. Thus, these indices are better suited to standardized and comparable routine bioassessment. The “taxonomy-free” approach revealed the ecological preferences for those molecular taxonomic units (ISUs/ESVs) that otherwise either (i) would have been assigned to the same taxa due to genetic similarity, or (ii) would not have been recognized because of their absence from the reference libraries. However, we also found that taxonomic information cannot be neglected in ecological studies when the presence of organisms under particular environmental conditions is to be explained or interpreted, e.g. via the traits they possess. New types of clustering methods are welcome in the future of biomonitoring, where the delimitation of taxonomic units should be refined with greater emphasis on their ecology rather than on morphological or genetic criteria.
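
The index-development loop described in this abstract (optimum and tolerance per taxonomic unit, a Zelinka-Marvan type index, and leave-one-out cross validation) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the "modified" equation, so the code assumes the classical form sum(a·s·v) / sum(a·v) with the optimum as s and the inverse tolerance as the weight v. The function names, the `min_tol` floor and the toy data are all hypothetical.

```python
from math import sqrt


def optimum_tolerance(abund, gradient):
    """Abundance-weighted mean (optimum) and SD (tolerance) along the gradient."""
    total = sum(abund)
    if total == 0:
        return None  # unit absent from the training samples
    opt = sum(a * g for a, g in zip(abund, gradient)) / total
    var = sum(a * (g - opt) ** 2 for a, g in zip(abund, gradient)) / total
    return opt, sqrt(var)


def zm_index(sample, optima, tolerances, min_tol=0.1):
    """Zelinka-Marvan type index; weight v = 1 / tolerance is an assumption.

    min_tol is an arbitrary floor that keeps units seen at a single
    gradient value from receiving an unbounded weight.
    """
    num = den = 0.0
    for unit, a in sample.items():
        if unit not in optima or a == 0:
            continue  # units unseen in training carry no information
        v = 1.0 / max(tolerances[unit], min_tol)
        num += a * optima[unit] * v
        den += a * v
    return num / den if den else None


def loocv(samples, gradient):
    """Train on n-1 samples, predict the index of the held-out sample."""
    preds = []
    for i in range(len(samples)):
        train = [s for j, s in enumerate(samples) if j != i]
        train_g = [g for j, g in enumerate(gradient) if j != i]
        optima, tols = {}, {}
        for u in {u for s in train for u in s}:
            res = optimum_tolerance([s.get(u, 0) for s in train], train_g)
            if res is not None:
                optima[u], tols[u] = res
        preds.append(zm_index(samples[i], optima, tols))
    return preds


# Made-up abundances of two units (e.g. ESVs) along a made-up gradient.
samples = [{"A": 10}, {"A": 6, "B": 4}, {"A": 2, "B": 8}, {"B": 10}]
gradient = [1.0, 2.0, 3.0, 4.0]
print(loocv(samples, gradient))
```

The held-out predictions would then be correlated with the reference gradient values to judge the index, as the abstract describes; that final step is omitted here.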


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1692 ◽  
Author(s):  
Dominik Forster ◽  
Micah Dunthorn ◽  
Thorsten Stoeck ◽  
Frédéric Mahé

Discovery of novel diversity in high-throughput sequencing studies is an important aspect in environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on the discovery of novel diversity, we clustered an environmental marine high-throughput sequencing dataset of protist amplicons together with reference sequences from the taxonomically curated Protist Ribosomal Reference (PR2) database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The potentially novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and in the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as potentially novel by USEARCH and Swarm were more than 97% similar to references of PR2. Using shortest path analyses on sequence similarity network OTUs and Swarm OTUs we found additional novel diversity within OTUs that would have gone unnoticed without further exploiting their underlying network topologies. These results demonstrate that graph theory provides powerful tools for microbial ecology and the analysis of environmental high-throughput sequencing datasets. Furthermore, sequence similarity networks were most accurate in delineating novel diversity from previously discovered diversity.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ryo Matsuzaki ◽  
Shigekatsu Suzuki ◽  
Haruyo Yamaguchi ◽  
Masanobu Kawachi ◽  
Yu Kanesaki ◽  
...  

Abstract Background: Pyrenoids are protein microcompartments composed mainly of Rubisco that are localized in the chloroplasts of many photosynthetic organisms. Pyrenoids contribute to the CO2-concentrating mechanism. This organelle has been lost many times during algal/plant evolution, including with the origin of land plants. The molecular basis of the evolutionary loss of pyrenoids is a major topic in evolutionary biology. Recently, it was hypothesized that pyrenoid formation is controlled by the hydrophobicity of the two helices on the surface of the Rubisco small subunit (RBCS), but the relationship between hydrophobicity and pyrenoid loss during the evolution of closely related algal/plant lineages has not been examined. Here, we focused on the Reticulata group of the unicellular green algal genus Chloromonas, within which pyrenoids are present in some species but absent in closely related species. Results: Based on de novo transcriptome analysis and Sanger sequencing of cloned reverse transcription-polymerase chain reaction products, rbcS sequences were determined from 11 strains of two pyrenoid-lacking and three pyrenoid-containing species of the Reticulata group. We found that the hydrophobicity of the RBCS helices was roughly correlated with the presence or absence of pyrenoids within the Reticulata group and that a decrease in the hydrophobicity of the RBCS helices may have primarily caused pyrenoid loss during the evolution of this group. Conclusions: Although we suggest that the observed correlation may only exist for the Reticulata group, this is still an interesting study that provides novel insight into a potential mechanism determining initial evolutionary steps of gain and loss of the pyrenoid.


Toxins ◽  
2018 ◽  
Vol 10 (9) ◽  
pp. 359 ◽  
Author(s):  
Maria Romero-Gutiérrez ◽  
Carlos Santibáñez-López ◽  
Juana Jiménez-Vargas ◽  
Cesar Batista ◽  
Ernesto Ortiz ◽  
...  

To understand the diversity of scorpion venom, RNA from venomous glands from a sawfinger scorpion, Serradigitus gertschi, of the family Vaejovidae, was extracted and used for transcriptomic analysis. A total of 84,835 transcripts were assembled after Illumina sequencing. From those, 119 transcripts were annotated and found to putatively code for peptides or proteins that share sequence similarities with the previously reported venom components of other species. In accordance with sequence similarity, the transcripts were classified as potentially coding for 37 ion channel toxins; 17 host defense peptides; 28 enzymes, including phospholipases, hyaluronidases, metalloproteases, and serine proteases; nine protease inhibitor-like peptides; 10 peptides of the cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 protein superfamily; seven La1-like peptides; and 11 sequences classified as “other venom components”. A mass fingerprint performed by mass spectrometry identified 204 components with molecular masses varying from 444.26 Da to 12,432.80 Da, plus several higher molecular weight proteins whose precise masses were not determined. The LC-MS/MS analysis of a tryptic digestion of the soluble venom resulted in the de novo determination of 16,840 peptide sequences, 24 of which matched sequences predicted from the translated transcriptome. The database presented here increases our general knowledge of the biodiversity of venom components from neglected non-buthid scorpions.


mSystems ◽  
2018 ◽  
Vol 3 (5) ◽  
Author(s):  
Sean Ting-Shyang Wei ◽  
Yu-Wei Wu ◽  
Tzong-Huei Lee ◽  
Yi-Shiang Huang ◽  
Cheng-Yu Yang ◽  
...  

ABSTRACT The 2,3-seco pathway, the pathway for anaerobic cholesterol degradation, has been established in the denitrifying betaproteobacterium Sterolibacterium denitrificans. However, knowledge of how microorganisms respond to cholesterol at the community level is elusive. Here, we applied mesocosm incubation and 16S rRNA sequencing to reveal that, in denitrifying sludge communities, three betaproteobacterial operational taxonomic units (OTUs) with low (94% to 95%) 16S rRNA sequence similarity to Stl. denitrificans are cholesterol degraders and members of the rare biosphere. Metatranscriptomic and metabolite analyses show that these degraders adopt the 2,3-seco pathway to sequentially catalyze the side chain and sterane of cholesterol and that two molybdoenzymes (steroid C25 dehydrogenase and 1-testosterone dehydrogenase/hydratase) are crucial for these bioprocesses, respectively. The metatranscriptome further suggests that these betaproteobacterial degraders display chemotaxis and motility toward cholesterol and that FadL-like transporters may be the key components for substrate uptake. Also, these betaproteobacteria are capable of transporting micronutrients and synthesizing cofactors essential for cellular metabolism and cholesterol degradation; however, the required cobalamin is possibly provided by cobalamin-de novo-synthesizing gamma-, delta-, and betaproteobacteria via the salvage pathway. Overall, our results indicate that the ability to degrade cholesterol in sludge communities is reserved for certain rare biosphere members and that C25 dehydrogenase can serve as a biomarker for sterol degradation in anoxic environments.
IMPORTANCE Steroids are ubiquitous and abundant natural compounds that display recalcitrance. Biodegradation via sludge communities in wastewater treatment plants is the primary removal process for steroids. To date, compared to studies for aerobic steroid degradation, the knowledge of anaerobic degradation of steroids has been based on only a few model organisms. Due to the increase of anthropogenic impacts, steroid inputs may affect microbial diversity and functioning in ecosystems. Here, we first investigated microbial functional responses to cholesterol, the most abundant steroid in sludge, at the community level. Our metagenomic and metatranscriptomic analyses revealed that the capacities for cholesterol approach, uptake, and degradation are unique traits of certain low-abundance betaproteobacteria, indicating the importance of the rare biosphere in bioremediation. Apparent expression of genes involved in cofactor de novo synthesis and salvage pathways suggests that these micronutrients play important roles for cholesterol degradation in sludge communities.

