Diat.barcode: a DNA tool to decipher diatom communities for the evaluation environmental pressures

2021 ◽  
Vol 4 ◽  
Author(s):  
Frederic Rimet ◽  
Teofana Chonova ◽  
Gilles Gassiole ◽  
Maria Kahlert ◽  
François Keck ◽  
...  

Diatoms (Bacillariophyta) are ubiquitous microalgae with a huge taxonomic diversity that changes with environmental conditions. This makes them excellent ecological indicators for various ecosystems and ecological questions (ecotoxicology, biomonitoring, paleo-environmental reconstruction, etc.). Current standardized methodologies for diatoms are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. DNA metabarcoding has been proposed as a way to avoid these flaws, enabling the sequencing of a large quantity of barcodes from natural samples. A taxonomic identity is given to these barcodes by comparing their sequences to a barcoding reference library. However, to identify environmental sequences correctly, the reference database should contain enough reference sequences to ensure good coverage of diatom diversity. Moreover, the reference database needs to be carefully curated by expert taxonomists, as its content has a direct impact on species detection. Diat.barcode is an open-access library that links diatom taxonomic identities to rbcL barcode sequences (a chloroplast marker suitable for species-level identification of diatoms) and has been maintained since 2012. Data are accumulated from three sources: (1) the NCBI nucleotide database, (2) unpublished sequencing data of culture collections and, more recently, (3) environmental sequences. Since 2017, an international network of experts in diatom taxonomy has curated this library. The latest version of the database (version 9.2) includes 8066 entries that correspond to more than 280 different genera and 1490 different species. In addition to the taxonomic information, morphological features (e.g. biovolumes, chloroplasts), life-forms (mobility, colony type) and ecological features (pollution preferences of taxa) are given.
The database can be downloaded from the website (www6.inrae.fr/carrtel-collection/Barcoding-database/) or directly through the R package diatbarcode. Ready-to-use files for commonly used metabarcoding pipelines (Mothur and DADA2) are also available.
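
The identification step described in this abstract, in which an environmental barcode receives a taxonomic identity by comparison with a reference library, can be sketched roughly as follows. This is only a toy illustration, not the Diat.barcode workflow or the classifiers used by Mothur or DADA2: the sequences, taxon labels, identity threshold and function names are all hypothetical, and sequences are assumed to be already aligned to the same length.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def assign_taxon(query, reference, min_identity=0.9):
    """Return (taxon, identity) for the best reference hit.

    The taxon is None when the best hit falls below min_identity,
    which mimics a barcode that the library cannot identify.
    """
    best_taxon, best_score = None, 0.0
    for taxon, seq in reference.items():
        score = identity(query, seq)
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_identity:
        return None, best_score
    return best_taxon, best_score


# Hypothetical toy library: made-up 20 bp fragments, not real rbcL barcodes.
REFERENCE = {
    "Taxon A": "ATGCATGCATGCATGCATGC",
    "Taxon B": "ATGCATGCTTTTATGCATGC",
}

# One query close to "Taxon A" and one that matches nothing well.
hit = assign_taxon("ATGCATGCATGCATGCATGA", REFERENCE)
miss = assign_taxon("G" * 20, REFERENCE)
```

The second query shows why library coverage matters: a barcode from a taxon that is absent from the reference stays unassigned, however good the sequencing is.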

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Frédéric Rimet ◽  
Evgenuy Gusev ◽  
Maria Kahlert ◽  
Martyn G. Kelly ◽  
Maxim Kulikovskiy ◽  
...  

Abstract Diatoms (Bacillariophyta) are ubiquitous microalgae which produce a siliceous exoskeleton and which make a major contribution to the productivity of oceans and freshwaters. They display a huge diversity, which makes them excellent ecological indicators of aquatic ecosystems. Usually, diatoms are identified using characteristics of their exoskeleton morphology. DNA barcoding is an alternative, and the use of high-throughput sequencing enables the rapid analysis of many environmental samples at a lower cost than analyses under the microscope. However, to identify environmental sequences correctly, an expertly curated reference library is needed. Several curated libraries for protists exist; none, however, is dedicated to diatoms. Diat.barcode is an open-access library dedicated to diatoms which has been maintained since 2012. Data come from two sources: (1) the NCBI nucleotide database and (2) unpublished sequencing data of culture collections. Since 2017, several experts have collaborated to curate this library for rbcL, a chloroplast marker suitable for species-level identification of diatoms. For the latest version of the database (version 7), 605 of the 3482 taxonomical names originally assigned by the authors of the rbcL sequences were modified after curation. The database is accessible at https://www6.inra.fr/carrtel-collection_eng/Barcoding-database.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Yixin Kong ◽  
Ariangela Kozik ◽  
Cindy H. Nakatsu ◽  
Yava L. Jones-Hall ◽  
Hyonho Chun

Abstract A latent factor model for count data is widely used to deconvolute mixed signals in biological data, as exemplified by sequencing data from transcriptome or microbiome studies. With pure samples available, such as single-cell transcriptome data, the accuracy of the estimates can be much improved. However, this advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF.
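
To show what a multiplicative updating rule looks like in practice, here is a minimal sketch of standard non-negative matrix factorization with the classical Lee and Seung updates for the squared-error objective. It is not the zero-inflated model proposed in this abstract and not the iNMF package: the toy count matrix, the function names and the choice of objective are assumptions made only for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors


# Hypothetical toy count matrix with many zeros (rows: features, columns: samples).
V = [[5, 0, 3, 0],
     [4, 0, 2, 0],
     [0, 6, 0, 1],
     [0, 3, 0, 2]]

W, H, errors = nmf(V, rank=2)
```

Because each update multiplies the current value by a non-negative ratio, the factors stay non-negative without any explicit constraint handling, which is the main appeal of this family of rules.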


2019 ◽  
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Abstract Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable on many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high-throughput sequencing data from The Cancer Genome Atlas.
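
The univariate ternary digitization described in this abstract can be illustrated with a deliberately simplified sketch. It assumes that the baseline range of a feature is just the minimum and maximum observed in the baseline samples; the divergence package estimates the baseline differently, and every name and value below is hypothetical.

```python
def baseline_interval(values):
    """Assumed baseline range: min and max over the baseline samples."""
    return min(values), max(values)


def digitize(profile, baseline):
    """Map each feature of a profile to -1, 0 or 1.

    profile:  dict mapping feature name -> value in the sample of interest
    baseline: dict mapping feature name -> list of values in baseline samples
    Returns -1 below the baseline range, 1 above it, 0 inside it.
    """
    codes = {}
    for feature, value in profile.items():
        low, high = baseline_interval(baseline[feature])
        if value < low:
            codes[feature] = -1
        elif value > high:
            codes[feature] = 1
        else:
            codes[feature] = 0
    return codes


# Hypothetical expression values for three genes.
BASELINE = {
    "gene1": [2.0, 2.5, 3.0],
    "gene2": [10.0, 12.0, 11.0],
    "gene3": [0.5, 0.7, 0.6],
}
SAMPLE = {"gene1": 2.7, "gene2": 15.0, "gene3": 0.1}

codes = digitize(SAMPLE, BASELINE)
```

Collapsing the ternary code to a binary one only requires replacing -1 and 1 with a single "divergent" value.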


2020 ◽  
Author(s):  
Maxim Ivanov ◽  
Albin Sandelin ◽  
Sebastian Marquardt

Abstract Background: The quality of gene annotation determines the interpretation of results obtained in transcriptomic studies. The growing amount of genome sequence information calls for experimental and computational pipelines for de novo transcriptome annotation. Ideally, gene and transcript models should be called from a limited set of key experimental data. Results: We developed TranscriptomeReconstructoR, an R package which implements a pipeline for automated transcriptome annotation. It relies on integrating features from independent and complementary datasets: i) full-length RNA-seq for detection of splicing patterns and ii) high-throughput 5' and 3' tag sequencing data for accurate definition of gene borders. The pipeline can also take a nascent RNA-seq dataset to supplement the called gene model with transient transcripts. We reconstructed de novo the transcriptional landscape of wild type Arabidopsis thaliana seedlings as a proof-of-principle. A comparison to the existing transcriptome annotations revealed that our gene model is more accurate and comprehensive than the two most commonly used community gene models, TAIR10 and Araport11. In particular, we identify thousands of transient transcripts missing from the existing annotations. Our new annotation promises to improve the quality of A. thaliana genome research. Conclusions: Our proof-of-concept data suggest a cost-efficient strategy for rapid and accurate annotation of complex eukaryotic transcriptomes. We combine the choice of library preparation methods and sequencing platforms with the dedicated computational pipeline implemented in the TranscriptomeReconstructoR package. The pipeline only requires prior knowledge of the reference genomic DNA sequence, but not the transcriptome. The package seamlessly integrates with Bioconductor packages for downstream analysis.


2021 ◽  
Vol 4 ◽  
Author(s):  
Andreia Mortágua ◽  
Marco Teixeira ◽  
Manuela Sales ◽  
Maria Feio ◽  
Salomé Almeida

The European Water Framework Directive (WFD, 2000/60/EC) includes biological assessment of water bodies, which has been implemented for many years. Indicator organisms such as diatoms respond to geological and hydrological features of rivers by modifying their community structure. Therefore, when implementing the WFD, it was necessary to establish type-specific reference conditions to be able to measure the deviations of sampled communities due to anthropogenic impact. eDNA metabarcoding based on high-throughput sequencing (HTS) has been developed to complement or even replace traditional approaches, because it identifies communities rapidly, at low cost and with high accuracy for the assessment of rivers’ ecological status (e.g. Mortágua et al., 2019; Pérez-Burillo et al., 2020), and it has proved to provide even more in-depth information about biological elements. The use of this information without assignment to species is also being explored, since it removes the limitation of reference database incompleteness and may provide new ecological information (e.g. Feio et al., 2020; Rivera et al., 2020). Since the WFD requires the establishment of reference conditions for each water body type, implementing eDNA methods will make it essential to review, confirm or reformulate existing typologies, and perhaps to create new ones. The aim of this study is therefore to analyze diatom communities from different typologies of Portuguese rivers based on DNA metabarcoding data and to compare them with the current typology system. To do so, we will verify the consistency of the biological groups included in each type, validate the molecular data, and analyze the correspondence of OTU/ISU/ESV to environmental characteristics of rivers. A total of 154 sampling sites were selected from central and northern Portugal in 2017 and 2019. The biofilm was collected for morphological identification and DNA sequencing of diatoms.
Reference sites were selected for 4 river types (mountain, littoral, small and medium-large northern rivers) based on a set of pressure information (water quality, hydromorphology, land use and riparian zones). Diatom inventories were obtained from molecular and morphological analysis. DNA sequences were processed with Mothur software using two bioinformatic strategies to obtain the final ISU and OTU tables, while ESVs were processed with the DADA2 R package. For the morphological approach, diatom valves were identified and counted under the light microscope. We expect the results to validate the molecular data for each typology, with or without assignment to species, and to show whether it is necessary to establish new typologies for future use of the molecular approach in ecological assessment of rivers.
Directive, W. F. (2000). Water Framework Directive. Journal reference OJL, 327, 1-73.
Feio, M. J., Serra, S. R., Mortágua, A., Bouchez, A., Rimet, F., Vasselon, V., & Almeida, S. F. P. (2020). A taxonomy-free approach based on machine learning to assess the quality of rivers with diatoms. Science of the Total Environment, 722, 137900. https://doi.org/10.1016/j.scitotenv.2020.137900
Mortágua, A., Vasselon, V., Oliveira, R., Elias, C., Chardon, C., Bouchez, A., ... & Almeida, S. F. P. (2019). Applicability of DNA metabarcoding approach in the bioassessment of Portuguese rivers using diatoms. Ecological Indicators, 106, 105470. https://doi.org/10.1016/j.ecolind.2019.105470
Pérez-Burillo, J., Trobajo, R., Vasselon, V., Rimet, F., Bouchez, A., & Mann, D. G. (2020). Evaluation and sensitivity analysis of diatom DNA metabarcoding for WFD bioassessment of Mediterranean rivers. Science of the Total Environment, 727, 138445. https://doi.org/10.1016/j.scitotenv.2020.138445
Rivera, S. F., Vasselon, V., Bouchez, A., & Rimet, F. (2020). Diatom metabarcoding applied to large scale monitoring networks: Optimization of bioinformatics strategies using Mothur software. Ecological Indicators, 109, 105775. https://doi.org/10.1016/j.ecolind.2019.105775


2019 ◽  
Author(s):  
Anthony Federico ◽  
Stefano Monti

ABSTRACT Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR. Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.


Author(s):  
Yura Drach ◽  
Zvenysvala Mamchur

This article studies the bryophytes of the upper reaches of the Western Bug River, which is physically and geographically located within Male Polissya, partly Roztochia, and to a minor extent in the Gologoro-Voronyatsky denudo-structural hills. Based on our survey, a list of the bryophytes has been compiled for the first time. Ecological features, substrate preferences and life forms of the bryophytes have been analysed. According to the ecological features, subheliophytes (30.9%) and hemisciophytes (30.9%) predominate in the spectrum of heliomorphs; mesophytes (29.7%), hygromesophytes (21.2%) and xeromesophytes (19.4%) in the spectrum of hydromorphs; and cold-tolerant species (59.4%) in the spectrum of thermomorphs. Based on the analysis of the substrate preferences of the bryophytes, the following groups were identified: epigeans (116 species), epixils (56 species), epiphytes (46 species), epiliths (43 species) and aquatic species (22 species). The prevailing life forms are turf (30.3%), rough mat (18.2%), weft (15.2%), tuft (10.3%) and smooth mat (9.7%). Three species that are officially recognised as rare and 16 species that are recognised as regionally rare have been found. In the group of bryophytes associated with wetland ecosystems, 2 officially rare and 6 regionally rare species were found in the study area. Given the large areas of drained land in Lviv Region, these species are of particular value, especially in the context of biodiversity conservation and protection of valuable natural areas in accordance with the Development Strategy of Lviv Region by 2027.


2019 ◽  
Author(s):  
Hsin-Nan Lin ◽  
Yaw-Ling Lin ◽  
Wen-Lian Hsu

ABSTRACT Characterizing the taxonomic diversity of a microbial community is very important for understanding the roles of microorganisms. Next generation sequencing (NGS) provides great potential for investigating a microbial community and has led to metagenomic studies. NGS generates DNA fragment sequences directly from microorganism samples, and it requires analysis tools to identify microbial species (or taxonomic composition) and estimate their relative abundance in the studied community. However, only a few tools can achieve strain-level identification, and most tools estimate microbial abundances simply from read counts. An evaluation study on metagenomic analysis tools concluded that the predicted abundance differed significantly from the true abundance. In this study, we present StrainPro, a novel metagenomic analysis tool which is highly accurate both at characterizing microorganisms at strain level and at estimating their relative abundances. A unique feature of StrainPro is that it identifies representative sequence segments from reference genomes. We generate three simulated datasets using known strain sequences and another three simulated datasets using unknown strain sequences. We compare the performance of StrainPro with seven existing tools. The results show that StrainPro not only identifies metagenomes with high precision and recall, but is also highly robust even when the metagenomes are not included in the reference database. Moreover, StrainPro estimates the relative abundance with high accuracy. We demonstrate that there is a strong positive linear relationship between observed and predicted abundances.


2021 ◽  
Author(s):  
Renato R. M. Oliveira ◽  
Raissa L S Silva ◽  
Gisele L. Nunes ◽  
Guilherme Oliveira

DNA metabarcoding is an emerging monitoring method capable of assessing biodiversity from environmental samples (eDNA). The growing volume of Next-Generation Sequencing data has required advances in computational tools. Tools for DNA metabarcoding analysis, such as MOTHUR, QIIME, Obitools, and mBRAVE, have been widely used in ecological studies. However, some difficulties are encountered when custom databases are needed. Here we present PIMBA, a PIpeline for MetaBarcoding Analysis, which allows the use of customized databases, as well as other reference databases used by the software tools mentioned above. PIMBA is an open-source and user-friendly pipeline that consolidates all analyses in just three command lines.

