LotuS2: An ultrafast and highly accurate tool for amplicon sequencing analysis

Background: Amplicon sequencing is an established and cost-efficient method for profiling microbiomes. However, many available tools for processing these data require both bioinformatics skills and high computational power to handle big datasets. Furthermore, only a few tools support long-read amplicon data analysis. To bridge this gap, we developed the LotuS2 (Less OTU Scripts 2) pipeline, enabling user-friendly, resource-friendly, and versatile analysis of raw amplicon sequences. Results: In LotuS2, six different sequence clustering algorithms as well as extensive pre- and post-processing options allow for flexible data analysis by both experts, who can fully adjust parameters, and novices, for whom defaults are provided for different scenarios. We benchmarked three independent gut and soil datasets, on which LotuS2 was on average 29 times faster than other pipelines, yet better reproduced the alpha- and beta-diversity of technical replicate samples. Further benchmarking on a mock community with known taxa composition showed that, compared to the other pipelines, LotuS2 recovered a higher fraction of correctly identified genera and species (98% and 57%, respectively). At ASV/OTU level, precision and F-score were highest for LotuS2, as was the fraction of correctly reconstructed 16S sequences. Conclusion: LotuS2 is a lightweight and user-friendly pipeline that is fast, precise and streamlined. High data usage rates and reliability enable high-throughput microbiome analysis in minutes. Availability: LotuS2 is available from GitHub, conda or via a Galaxy web interface, documented at http://lotus2.earlham.ac.uk/.
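The precision and F-score mentioned for the mock community can be illustrated with a minimal sketch. It assumes the standard set-based definitions, where a detected taxon counts as a true positive if it is part of the known composition. The function name and the example genera are hypothetical and are not taken from the LotuS2 benchmark.

```python
def precision_recall_f1(expected, detected):
    """Return (precision, recall, F1) of detected taxa against a known composition."""
    expected, detected = set(expected), set(detected)
    true_pos = len(expected & detected)  # taxa that are both known and detected
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical mock community and pipeline output (illustration only).
mock = {"Bacillus", "Escherichia", "Lactobacillus", "Pseudomonas"}
found = {"Bacillus", "Escherichia", "Lactobacillus", "Staphylococcus"}
print(precision_recall_f1(mock, found))
```

A pipeline that reports many spurious taxa lowers precision, while one that misses true taxa lowers recall; the F-score balances the two.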


NanoRTax, a real-time pipeline for taxonomic and diversity analysis of nanopore 16S rRNA amplicon sequencing data

10.21203/rs.3.rs-938802/v1 ◽

2021 ◽

Author(s):

Héctor Rodriguez-Perez ◽

Laura Ciuffreda ◽

Carlos Flores

Keyword(s):

16S rRNA ◽

Real Time ◽

Amplicon Sequencing ◽

Sequencing Data ◽

Real Time Analysis ◽

16S rRNA Amplicon Sequencing ◽

Oxford Nanopore ◽

Long Read ◽

Cost Efficient ◽

User Friendly

Abstract: The study of microbial communities and their applications has benefited from advances in sequencing techniques and bioinformatics tools. Oxford Nanopore Technologies long-read nanopore sequencing provides a portable and cost-efficient platform for sequencing assays, opening the possibility of its application outside specialized environments and of real-time data analysis. To complement the existing efficient library preparation protocol with a streamlined analytic workflow, we present NanoRTax, a Nextflow pipeline for nanopore 16S rRNA amplicon data that features state-of-the-art taxonomic classification tools and real-time capability. The pipeline is paired with a web-based visual interface to enable user-friendly inspection of the experiment in progress.


CoMA – an intuitive and user-friendly pipeline for amplicon-sequencing data analysis

PLoS ONE ◽

10.1371/journal.pone.0243241 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243241

Author(s):

Sebastian Hupfauf ◽

Mohammad Etemadi ◽

Marina Fernández-Delgado Juárez ◽

María Gómez-Brandón ◽

Heribert Insam ◽

...

Keyword(s):

Operating System ◽

Data Analysis ◽

Amplicon Sequencing ◽

Sequencing Data ◽

Taxonomic Assignment ◽

Benchmark Test ◽

Next Generation Sequencing (NGS) ◽

User Friendly ◽

NGS Data ◽

Mock Communities

In recent years, there has been a veritable boost in next-generation sequencing (NGS) of gene amplicons in biological and medical studies. Huge amounts of data are produced and need to be analyzed adequately. Various online and offline analysis tools are available; however, most of them require extensive expertise in computer science or bioinformatics, and often a Linux-based operating system. Here, we introduce “CoMA–Comparative Microbiome Analysis” as a free and intuitive analysis pipeline for amplicon-sequencing data, compatible with any common operating system. The tool offers data pre-processing, quality checking, clustering to operational taxonomic units (OTUs), taxonomic assignment, data post-processing, data visualization, and statistical appraisal. The workflow produces publication-ready graphics, as well as output files in standardized formats (e.g. tab-delimited OTU table, BIOM, Newick tree) that can be used for more sophisticated analyses. The CoMA output was validated by a benchmark test using three mock communities with different sample characteristics (primer set, amplicon length, diversity). The performance was compared with that of Mothur, QIIME and QIIME2-DADA2, popular packages for NGS data analysis. Furthermore, the functionality of CoMA is demonstrated on a practical example, investigating microbial communities from three different soils (grassland, forest, swamp). All tools performed well in the benchmark test and were able to reveal the majority of all genera in the mock communities. For the soil samples as well, the results of CoMA were congruent with those of the other pipelines, in particular when looking at the key microbial players.


dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology

10.1101/2020.05.17.095679 ◽

2020 ◽

Author(s):

Christina Weißbecker ◽

Beatrix Schnabel ◽

Anna Heintz-Buschart

Keyword(s):

High Performance ◽

Expert Knowledge ◽

Amplicon Sequencing ◽

Marker Genes ◽

Sequencing Analysis ◽

Sequencing Data ◽

Hand Off ◽

Sequencing Platforms ◽

Computational Resources ◽

User Friendly

Abstract: Background: Amplicon sequencing of phylogenetic marker genes, e.g. 16S, 18S or ITS rRNA sequences, is still the most commonly used method to determine the composition of microbial communities. Microbial ecologists often have expert knowledge of their biological question and of data analysis in general, and most research institutes have the computational infrastructure to run the bioinformatics command-line tools and workflows for amplicon sequencing analysis, but the required bioinformatics skills often limit the efficient and up-to-date use of computational resources. Results: dadasnake wraps pre-processing of sequencing reads, delineation of exact sequence variants using the favorably benchmarked, widely used DADA2 algorithm, taxonomic classification and post-processing of the resultant tables, and hand-off in standard formats, into a user-friendly, one-command Snakemake pipeline. The suitability of the provided default configurations is demonstrated using mock-community data from bacteria and archaea, as well as fungi. Conclusions: By use of Snakemake, dadasnake makes efficient use of high-performance computing infrastructures. Easy user configuration guarantees flexibility of all steps, including the processing of data from multiple sequencing platforms. dadasnake facilitates easy installation via conda environments. dadasnake is available at https://github.com/a-h-b/dadasnake.


Justifying the data analytical choice in single case research in relation to the expected data pattern

10.31234/osf.io/2b9mu ◽

2019 ◽

Author(s):

Rumen Manolov

Keyword(s):

Data Analysis ◽

Visual Analysis ◽

Multilevel Models ◽

Single Case ◽

Analytical Techniques ◽

Real Data ◽

Small Scale ◽

Data User ◽

Single Case Research ◽

User Friendly

The lack of consensus regarding the most appropriate analytical techniques for single-case experimental design data requires justifying the choice of any specific analytical option. The current text mentions some of the arguments, provided by methodologists and statisticians, in favor of several analytical techniques. Additionally, a small-scale literature review is performed in order to explore if and how applied researchers justify the analytical choices that they make. The review suggests that certain practices are not sufficiently explained. In order to improve the reporting of data analytical decisions, it is proposed to choose and justify the data analytical approach prior to gathering the data. As a possible justification for the data analysis plan, we propose using as a basis the expected data pattern (specifically, the expectation about an improving baseline trend and about the immediate or progressive nature of the intervention effect). Although there are multiple alternatives for single-case data analysis, the current text focuses on visual analysis and multilevel models and illustrates an application of these analytical options with real data. User-friendly software is also developed.


Enalos Suite of Tools: Enhancing Cheminformatics and Nanoinformatics through KNIME

Current Medicinal Chemistry ◽

10.2174/0929867327666200727114410 ◽

2020 ◽

Vol 27 (38) ◽

pp. 6523-6535 ◽

Cited By ~ 3

Author(s):

Antreas Afantitis ◽

Andreas Tsoumanis ◽

Georgia Melagraki

Keyword(s):

Data Analysis ◽

Virtual Screening ◽

In Silico ◽

Model Development ◽

In Silico Analysis ◽

Material Design ◽

Efficient Solutions ◽

Biological Data ◽

Cost Efficient ◽

Silico Analysis

Drug discovery as well as (nano)material design projects demand the in silico analysis of large datasets of compounds with their corresponding properties/activities, as well as the retrieval and virtual screening of more structures in an effort to identify new potent hits. This is a demanding procedure for which various tools with different input and output formats must be combined. To automate the required data analysis, we have developed tools that facilitate a variety of important tasks and allow the construction of workflows that simplify the handling, processing and modeling of cheminformatics data and provide time- and cost-efficient solutions that are reproducible and easier to maintain. We therefore develop and present a toolbox of >25 processing modules, Enalos+ nodes, that provide useful operations within the KNIME platform for users interested in the nanoinformatics and cheminformatics analysis of chemical and biological data. With a user-friendly interface, Enalos+ nodes provide a broad range of important functionalities, including data mining and retrieval from large available databases and tools for robust and predictive model development and validation. Enalos+ nodes are available through KNIME as add-ins and offer valuable tools for extracting useful information and analyzing experimental and virtual screening results in a chem- or nanoinformatics framework. On top of that, in an effort to (i) allow big data analysis through Enalos+ KNIME nodes, (ii) accelerate time-demanding computations performed within Enalos+ KNIME nodes and (iii) propose new time- and cost-efficient nodes integrated within the Enalos+ toolbox, we have investigated and verified the advantage of GPU calculations within the Enalos+ nodes. Demonstration data sets, tutorials and educational videos allow the user to easily understand the functions of the nodes that can be applied for in silico analysis of data.


Retinitis pigmentosa is associated with shifts in the gut microbiome

Scientific Reports ◽

10.1038/s41598-021-86052-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Oksana Kutsyr ◽

Lucía Maestre-Carballa ◽

Mónica Lluesma-Gomez ◽

Manuel Martinez-Garcia ◽

Nicolás Cuenca ◽

...

Keyword(s):

Retinitis Pigmentosa ◽

Gut Microbiome ◽

Functional Decline ◽

Retinal Disease ◽

Photoreceptor Cells ◽

Amplicon Sequencing ◽

rRNA Gene ◽

Sequencing Data ◽

Alpha And Beta Diversity ◽

Potential Biomarker

Abstract: The gut microbiome is known to influence the pathogenesis and progression of neurodegenerative diseases. However, there has been relatively little focus upon the implications of the gut microbiome in retinal diseases such as retinitis pigmentosa (RP). Here, we investigated changes in gut microbiome composition linked to RP, by assessing both retinal degeneration and gut microbiome in the rd10 mouse model of RP as compared to control C57BL/6J mice. In rd10 mice, retinal responsiveness to flashlight stimuli and visual acuity were deteriorated relative to those observed in age-matched control mice. This functional decline in dystrophic animals was accompanied by photoreceptor loss, morphologic anomalies in photoreceptor cells and retinal reactive gliosis. Furthermore, 16S rRNA gene amplicon sequencing data showed a microbial gut dysbiosis with differences in alpha and beta diversity at the genus, species and amplicon sequence variant (ASV) levels between dystrophic and control mice. Remarkably, four ASVs that are fairly common in the healthy gut microbiome, belonging to Rikenella spp., Muribaculaceae spp., Prevotellaceae UCG-001 spp., and Bacilli spp., were absent in the gut microbiome of retinal disease mice, while Bacteroides caecimuris was significantly enriched in mice with RP. The results indicate that retinal degenerative changes in RP are linked to relevant gut microbiome changes. The findings suggest that microbiome shifts could be considered as a potential biomarker and therapeutic target for retinal degenerative diseases.
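Alpha and beta diversity can be made concrete with a small sketch. The abstract does not state which indices were used, so the Shannon index (alpha diversity, within a sample) and Bray-Curtis dissimilarity (beta diversity, between samples) are assumed here purely as common examples; the function names and count vectors are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' (natural log) of one sample's taxon counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples with the same taxon order."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    denom = sum(a) + sum(b)
    if denom == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical ASV counts for two samples (illustration only).
control = [40, 30, 20, 10]
dystrophic = [70, 0, 25, 5]
print(shannon_index(control), shannon_index(dystrophic))
print(bray_curtis(control, dystrophic))
```

A Bray-Curtis value of 0 means identical composition and 1 means no shared taxa.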


Ultra-accurate microbial amplicon sequencing with synthetic long reads

Microbiome ◽

10.1186/s40168-021-01072-3 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Benjamin J. Callahan ◽

Dmitry Grinevich ◽

Siddhartha Thakur ◽

Michael A. Balamotis ◽

Tuval Ben Yehezkel

Keyword(s):

Microbial Community ◽

16S rRNA ◽

Amplicon Sequencing ◽

Species Level ◽

Full Length ◽

16S rRNA Genes ◽

rRNA Genes ◽

Strain Identification ◽

Long Reads ◽

Long Read

Abstract: Background: Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. Methods: Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. Results: LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. Conclusions: The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.
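To put the reported per-base error rate of 0.005% into perspective, the following sketch estimates how many errors a single read would carry. It assumes errors are independent and uniformly distributed along the read, which is a simplification not stated in the abstract. The read length of 1500 bp is an assumed approximate length for a full-length 16S rRNA gene; 6000 bp corresponds to the 6 kb upper bound mentioned above.

```python
def expected_errors(read_length, per_base_error_rate):
    """Expected number of erroneous bases in one read."""
    return read_length * per_base_error_rate


def prob_error_free(read_length, per_base_error_rate):
    """Probability that a read contains no errors, assuming independent errors."""
    return (1.0 - per_base_error_rate) ** read_length


rate = 0.005 / 100  # 0.005% per base, as reported in the abstract

for length in (1500, 6000):  # assumed 16S length and the 6 kb upper bound
    print(length, expected_errors(length, rate), prob_error_free(length, rate))
```

Under these assumptions, most full-length 16S reads would be entirely error-free, which is what makes single-nucleotide strain discrimination plausible.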


Connecting structure to function with the recovery of over 1000 high-quality metagenome-assembled genomes from activated sludge using long-read sequencing

Nature Communications ◽

10.1038/s41467-021-22203-2 ◽

2021 ◽

Vol 12 (1) ◽

Cited By ~ 2

Author(s):

Caitlin M. Singleton ◽

Francesca Petriglieri ◽

Jannie M. Kristensen ◽

Rasmus H. Kirkegaard ◽

Thomas Y. Michaelsen ◽

...

Keyword(s):

16S rRNA ◽

Wastewater Treatment Plants ◽

In Situ Hybridisation ◽

Amplicon Sequencing ◽

rRNA Genes ◽

Fluorescence In Situ Hybridisation ◽

Sequencing Data ◽

High Quality ◽

16S rRNA Amplicon Sequencing ◽

Long Read

Abstract: Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.


Identification of Agricultural Management Zones Through Clustering Algorithms with Thermal and Multispectral Satellite Imagery

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488517400062 ◽

2017 ◽

Vol 25 (Suppl. 1) ◽

pp. 121-140

Author(s):

R. B. Arango ◽

A. M. Campos ◽

E. F. Combarro ◽

E. R. Canas ◽

I. Díaz

Keyword(s):

Satellite Imagery ◽

Precision Agriculture ◽

Clustering Algorithms ◽

Economic Benefits ◽

Agricultural Management ◽

Landsat 8 ◽

Special Equipment ◽

Management Zones ◽

NDVI Index ◽

Cost Efficient

Precision Agriculture entails the appropriate management of the inherent variability of soil and crops, resulting in an increase of economic benefits and a reduction of environmental impact. However, site-specific treatments require maps of the soil variability to identify areas of land that share similar properties. In order to produce these maps, we propose a cost-efficient method that combines clustering algorithms with publicly available satellite imagery. The method does not require exploring the parcels with any special equipment or taking samples of the soil for laboratory analysis. The proposed method was tested in a case study for three vineyard parcels with topographical dissimilarities. The study compares different spectral and thermal bands from the Landsat 8 satellite as well as vegetation and moisture indices to determine which one produces the best clustering. The experimental results seem promising for identification of agricultural management zones. The findings suggest that thermal bands produce better clustering than those based on the NDVI index.
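The NDVI index used as a comparison baseline follows the standard definition NDVI = (NIR - Red) / (NIR + Red). The sketch below computes it per pixel; for Landsat 8 OLI, the near-infrared and red bands are band 5 and band 4, respectively. The function name and reflectance values are hypothetical, and the clustering step of the proposed method is not reproduced here.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir and red are surface reflectances (Landsat 8 OLI band 5 and band 4).
    Returns 0.0 when both reflectances are zero to avoid division by zero.
    """
    denom = nir + red
    if denom == 0:
        return 0.0
    return (nir - red) / denom


# Hypothetical reflectance pairs (NIR, red) for three pixels.
pixels = [(0.50, 0.10), (0.30, 0.25), (0.05, 0.04)]
print([round(ndvi(nir, red), 3) for nir, red in pixels])
```

Values close to 1 indicate dense green vegetation, while values near 0 typically correspond to bare soil or built surfaces.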


Microdiversity and phylogeographic diversification of bacterioplankton in pelagic freshwater systems revealed through long-read amplicon sequencing

Microbiome ◽

10.1186/s40168-020-00974-y ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Yusuke Okazaki ◽

Shohei Fujinaga ◽

Michaela M. Salcher ◽

Cristiana Callieri ◽

Atsushi Tanaka ◽

...

Keyword(s):

16S rRNA ◽

Regional Scale ◽

Scale Up ◽

Amplicon Sequencing ◽

Freshwater Ecosystems ◽

16S rRNA Genes ◽

rRNA Genes ◽

rRNA Gene ◽

Metagenomic Sequencing ◽

Long Read

Abstract: Background: Freshwater ecosystems are inhabited by members of cosmopolitan bacterioplankton lineages despite the disconnected nature of these habitats. The lineages are delineated based on > 97% 16S rRNA gene sequence similarity, but their intra-lineage microdiversity and phylogeography, which are key to understanding the eco-evolutional processes behind their ubiquity, remain unresolved. Here, we applied long-read amplicon sequencing targeting nearly full-length 16S rRNA genes and the adjacent ribosomal internal transcribed spacer sequences to reveal the intra-lineage diversities of pelagic bacterioplankton assemblages in 11 deep freshwater lakes in Japan and Europe. Results: Our single nucleotide-resolved analysis, which was validated using shotgun metagenomic sequencing, uncovered 7–101 amplicon sequence variants for each of the 11 predominant bacterial lineages and demonstrated sympatric, allopatric, and temporal microdiversities that could not be resolved through conventional approaches. Clusters of samples with similar intra-lineage population compositions were identified, which consistently supported genetic isolation between Japan and Europe. At a regional scale (up to hundreds of kilometers), dispersal between lakes was unlikely to be a limiting factor, and environmental factors or genetic drift were potential determinants of population composition. The extent of microdiversification varied among lineages, suggesting that highly diversified lineages (e.g., Iluma-A2 and acI-A1) achieve their ubiquity by containing a consortium of genotypes specific to each habitat, while less diversified lineages (e.g., CL500-11) may be ubiquitous due to a small number of widespread genotypes. The lowest extent of intra-lineage diversification was observed among the dominant hypolimnion-specific lineage (CL500-11), suggesting that their dispersal among lakes is not limited despite the hypolimnion being a more isolated habitat than the epilimnion. Conclusions: Our novel approach complemented the limited resolution of short-read amplicon sequencing and limited sensitivity of the metagenome assembly-based approach, and highlighted the complex ecological processes underlying the ubiquity of freshwater bacterioplankton lineages. To fully exploit the performance of the method, its relatively low read throughput is the major bottleneck to be overcome in the future.
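The > 97% similarity threshold used to delineate lineages can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length with "-" as the gap character, and it counts a gap against a base as a mismatch; the abstract does not specify how similarity was computed, so this definition, the function names and the toy sequences are all assumptions.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def same_lineage(seq_a, seq_b, threshold=97.0):
    """True if identity exceeds the threshold (97% assumed from the abstract)."""
    return percent_identity(seq_a, seq_b) > threshold


# Toy aligned fragments (hypothetical, far shorter than a real 16S rRNA gene).
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Sequences that fall in the same lineage under this threshold can still differ at several positions, which is the microdiversity that single nucleotide-resolved analysis aims to capture.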
