rANOMALY: AmplicoN wOrkflow for Microbial community AnaLYsis

F1000Research, 2021, Vol 10, pp. 7
Author(s): Sebastien Theil, Etienne Rifa

Bioinformatic tools for marker gene sequencing data analysis are evolving continuously and rapidly, so integrating the most recent techniques and tools is challenging. We present an R package for the analysis of 16S and ITS amplicon-based sequencing data. This workflow is built from several R functions and automatically processes data from FASTQ sequence files through to diversity and differential analysis with statistical validation. The main purpose of this package is to automate bioinformatic analysis, ensure reproducibility between projects, and be flexible enough to quickly integrate new bioinformatic tools or statistical methods. rANOMALY is an easy-to-install, customizable R package that characterizes microbial communities at the amplicon sequence variant (ASV) level. It integrates the main advances of the latest bioinformatic methods, such as better sequence tracking, decontamination using control samples, use of multiple reference databases for taxonomic annotation, all the main ecological analyses (for which we propose advanced statistical tests), and a differential analysis cross-validated by four different methods. Our package produces publication-ready figures, and all of its outputs are designed to be integrated into R Markdown code to produce automated reports.
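rANOMALY is described as cross-validating its differential analysis with four methods. The abstract does not say how the four results are combined; one plausible reading is a consensus vote, sketched below in Python. The method names (`method_a` to `method_d`), the ASV identifiers and the `min_methods` threshold are hypothetical placeholders, not rANOMALY's API (the package itself is written in R).

```python
from collections import Counter


def consensus_features(calls: dict[str, set[str]], min_methods: int = 2) -> set[str]:
    """Return features flagged as differentially abundant by at least
    `min_methods` of the methods in `calls`.

    `calls` maps a (hypothetical) method name to the set of feature IDs that
    the method reported as significant.
    """
    if min_methods < 1:
        raise ValueError("min_methods must be at least 1")
    votes = Counter()
    for features in calls.values():
        votes.update(set(features))  # each method votes at most once per feature
    return {feature for feature, n in votes.items() if n >= min_methods}


# Hypothetical results from four differential-abundance methods.
calls = {
    "method_a": {"ASV_1", "ASV_2", "ASV_5"},
    "method_b": {"ASV_1", "ASV_2"},
    "method_c": {"ASV_1", "ASV_3"},
    "method_d": {"ASV_2", "ASV_1"},
}
print(sorted(consensus_features(calls, min_methods=3)))  # prints ['ASV_1', 'ASV_2']
```

Raising `min_methods` trades sensitivity for robustness: a feature reported by all four methods is less likely to be an artifact of any single method's assumptions.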

2018
Author(s): Maziyar Baran Pouyan, Dennis Kostka

Abstract
Motivation: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around the discovery or characterization of types or sub-types of cells, and therefore obtaining accurate cell–cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches explicitly address this task. Furthermore, the abundance and type of noise present in scRNA-seq datasets suggest that applying generic methods, or methods developed for bulk RNA-seq data, is likely suboptimal.
Results: Here we present RAFSIL, a random-forest-based approach to learn cell–cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure in which feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks such as dimension reduction, visualization and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach, yielding a useful tool that improves the analysis of scRNA-seq data.
Availability and Implementation: The RAFSIL R package is available online at www.kostkalab.net/software.html
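RAFSIL learns cell–cell similarities with random forests. A common way to turn a fitted forest into a similarity is Breiman's proximity: the fraction of trees in which two samples land in the same terminal leaf. The Python sketch below computes that quantity from a precomputed leaf-index matrix (for example, the array returned by scikit-learn's `RandomForestClassifier.apply`). It assumes RAFSIL's similarity is proximity-like and does not reproduce its feature-construction step or exact procedure; the toy leaf indices are invented.

```python
import numpy as np


def rf_proximity(leaves: np.ndarray) -> np.ndarray:
    """Proximity matrix from leaf assignments.

    `leaves` has shape (n_samples, n_trees); entry [i, t] is the index of the
    leaf that sample i reaches in tree t. Proximity[i, j] is the fraction of
    trees in which samples i and j share a leaf.
    """
    leaves = np.asarray(leaves)
    n_trees = leaves.shape[1]
    # Compare every pair of samples tree by tree: shape (n, n, n_trees).
    # This is O(n^2 * n_trees) memory, fine for a sketch but not for large data.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.sum(axis=2) / n_trees


# Three cells, two trees (toy leaf indices).
leaves = np.array([[0, 1],
                   [0, 2],
                   [3, 2]])
prox = rf_proximity(leaves)
dissimilarity = 1.0 - prox  # a distance usable for clustering or embedding
print(prox)
```

Cells 1 and 2 share a leaf in the first tree only, so their proximity is 0.5; cells 1 and 3 never share a leaf, so theirs is 0.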


2020
Author(s): Ellen S. Cameron, Philip J. Schmidt, Benjamin J.-M. Tremblay, Monica B. Emelko, Kirsten M. Müller

Abstract
The application of amplicon sequencing in water research provides a rapid and sensitive technique for microbial community analysis in a variety of environments, ranging from freshwater lakes to water and wastewater treatment plants. It has revolutionized our ability to study DNA collected from environmental samples by eliminating the challenges associated with lab cultivation and taxonomic identification. DNA sequencing data consist of discrete counts of sequence reads, the total number of which is the library size. Samples may have different library sizes, and thus a normalization technique is required to compare them meaningfully. Rarefying, the process of randomly subsampling sequences from a sample library to a selected normalized library size, is one such normalization technique. However, rarefying has been criticized because data can be omitted through the exclusion of either excess sequences or entire samples, depending on the rarefied library size selected. Although it has been suggested that rarefying should be avoided altogether, we propose that repeatedly rarefying enables (i) characterization of the variation introduced to diversity analyses by this random subsampling and (ii) selection of smaller library sizes where necessary to incorporate all samples in the analysis. Rarefying may be a statistically valid normalization technique, but researchers should evaluate their data to make appropriate decisions regarding library size selection and subsampling type. The impact of normalized library size selection and of rarefying with or without replacement in diversity analyses was evaluated herein.
Highlights
▪ Amplicon sequencing technology for environmental water samples is reviewed
▪ Sequencing data must be normalized to allow comparison in diversity analyses
▪ Rarefying normalizes library sizes by subsampling from observed sequences
▪ Criticisms of data loss through rarefying can be resolved by rarefying repeatedly
▪ Rarefying repeatedly characterizes errors introduced by subsampling sequences
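Rarefying draws a fixed number of reads from each sample's library, either without replacement (each observed read used at most once) or with replacement. The Python sketch below, which assumes NumPy's `Generator` API and uses observed ASV richness as the diversity metric, repeats the subsampling many times to expose the spread that random subsampling introduces. The function names, the iteration count and the toy counts are illustrative assumptions, not the authors' code.

```python
import numpy as np


def rarefy(counts, depth, rng, replace=False):
    """Subsample one sample's count vector to `depth` reads."""
    counts = np.asarray(counts, dtype=np.int64)
    if replace:
        # With replacement: reads drawn independently in proportion to counts,
        # so `depth` may even exceed the library size.
        return rng.multinomial(depth, counts / counts.sum())
    if depth > counts.sum():
        raise ValueError("depth exceeds library size; sample would be dropped")
    # Without replacement: multivariate hypergeometric draw of `depth` reads.
    return rng.multivariate_hypergeometric(counts, depth)


def repeated_richness(counts, depth, n_iter=100, replace=False, seed=0):
    """Observed richness (number of ASVs with >0 reads) over repeated rarefying."""
    rng = np.random.default_rng(seed)
    return np.array([
        np.count_nonzero(rarefy(counts, depth, rng, replace))
        for _ in range(n_iter)
    ])


sample = [500, 200, 50, 10, 3, 1, 1]  # toy ASV counts, library size 765
rich = repeated_richness(sample, depth=100, n_iter=200)
print(f"richness mean={rich.mean():.2f}, min={rich.min()}, max={rich.max()}")
```

A single rarefied table would report only one of these richness values; the repeated draws show how much of an apparent difference between samples could come from subsampling alone.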


Author(s): Chi Liu, Yaoming Cui, Xiangzhen Li, Minjie Yao

Abstract
Microbial community ecology studies that use high-throughput sequencing, especially amplicon sequencing, produce large amounts of data. After the initial bioinformatic analysis of amplicon sequencing data, the subsequent statistics and data mining based on operational taxonomic unit (OTU) and taxonomic assignment tables are still complicated and time-consuming. To address this problem, we present ‘microeco’, an integrated R package that serves as an analysis pipeline for microbial community and environmental data. The package is built on the R6 class system and combines a series of commonly used and advanced approaches in microbial community ecology research. It includes classes for data preprocessing, taxa abundance plotting, Venn diagrams, alpha diversity analysis, beta diversity analysis, differential abundance testing and indicator taxon analysis, environmental data analysis, null model analysis, network analysis and functional analysis. Each class provides a set of approaches that are easily accessible to users. Compared with other R packages in the microbial ecology field, microeco is fast, flexible and modular, and provides powerful and convenient tools for researchers. The microeco package can be installed from CRAN (The Comprehensive R Archive Network) or GitHub (https://github.com/ChiLiubio/microeco).
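microeco bundles alpha and beta diversity analyses among its classes. As a language-neutral illustration of what such analyses compute (not microeco's R implementation), the Python sketch below derives the Shannon and Gini–Simpson indices for one sample and the Bray–Curtis dissimilarity between two samples from raw counts. The function names and toy counts are assumptions for illustration.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity sum|x_i - y_i| / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same taxa")
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))


even = [25, 25, 25, 25]
print(round(shannon(even), 4))                      # prints 1.3863 (ln 4)
print(gini_simpson(even))                           # prints 0.75
print(round(bray_curtis([1, 2, 3], [3, 2, 1]), 4))  # prints 0.3333 (4/12)
```

Alpha indices summarize one sample at a time, whereas Bray–Curtis compares pairs of samples; a full beta diversity analysis computes it for every pair and passes the resulting matrix to an ordination or a permutation test.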


Author(s): A. S. Glotov, P. Yu. Kozyulina, E. S. Vashukova, R. A. Illarionov, N. O. Yurkina, ...

Aim. To study changes in piRNA levels in the plasma and serum of pregnant women at different stages of gestation.
Material and Methods. A total of 42 plasma and serum samples were obtained from seven women with a physiological singleton pregnancy without obstetric or gynecological pathology. Samples were collected at three time points, corresponding to 8–13, 18–25 and 30–35 weeks of pregnancy. To assess the spectrum and levels of piRNA by next-generation sequencing (NGS), genome-wide sequencing of small RNAs was carried out. Sequencing data were analyzed with the GeneGlobe Data Analysis Center web application, and differential expression was assessed with the DESeq2 R package.
Results and Discussion. piRNAs accounted for 2.29%, 2.61% and 4.16% of all small RNAs in plasma and 7.29%, 7.02% and 10.82% in serum during the first, second and third trimesters, respectively. The levels of the following piRNAs in plasma increased from the first to the third trimester: piR 000765, piR 020326, piR 019825, piR 020497, piR 015026, piR 001312 and piR 017716. The levels of piR 000765, piR 020326, piR 019825, piR 015026, piR 020497, piR 001312, piR 017716 and piR 004153 were significantly higher in serum than in plasma, whereas only one molecule, piR 018849, was more abundant in plasma.
Conclusion. This pilot work lays a basis for understanding piRNA expression in the plasma and serum of pregnant women and may become the foundation for the search for biomarkers of various pregnancy complications.
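DESeq2, which the study used for differential expression, corrects for differences in sequencing depth with size factors estimated by the median-of-ratios method before fitting its negative binomial model. The Python sketch below reproduces only that normalization step on a toy features-by-samples matrix; it omits DESeq2's dispersion estimation and testing, and the toy counts are invented for illustration.

```python
import numpy as np


def size_factors(counts: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factors for a features x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Features with a zero in any sample have a geometric mean of zero; skip them.
    keep = np.all(counts > 0, axis=1)
    if not keep.any():
        raise ValueError("every feature has at least one zero count")
    log_counts = np.log(counts[keep])
    log_geo_means = log_counts.mean(axis=1)           # per-feature pseudo-reference
    log_ratios = log_counts - log_geo_means[:, None]  # each sample vs reference
    return np.exp(np.median(log_ratios, axis=0))      # one factor per sample


# Toy matrix: rows are small-RNA features, columns are samples;
# sample 2 is sequenced exactly twice as deeply as sample 1.
counts = np.array([[10, 20],
                   [30, 60],
                   [5, 10],
                   [0, 4]])
sf = size_factors(counts)
normalized = counts / sf  # counts on a common scale
print(sf)
```

Taking the median of ratios, rather than dividing by total counts, keeps a few very abundant or strongly changing features from dominating the depth estimate.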


Author(s): Lauren V. Alteio, Joana Séneca, Alberto Canarini, Roey Angel, Ksenia Guseva, ...

Microbial community analysis via marker gene amplicon sequencing has become a routine method in soil research. In this perspective, we discuss technical challenges and limitations of amplicon sequencing studies in soil and present statistical and experimental approaches that can help address the spatio-temporal complexity of soil and the high diversity of organisms therein. We illustrate the impact of compositionality on the interpretation of relative abundance data and discuss the effects of sample replication on statistical power in soil community analysis. Additionally, we argue for the need for increased study reproducibility and data availability, as well as for complementary techniques that generate deeper ecological insights into the roles of microbes in soil ecosystems. At this stage, we call upon researchers and specialized soil journals to consider the current state of data analysis, interpretation and availability in order to improve the rigor of future studies.
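The compositionality problem can be seen in a small numerical sketch: relative abundances are closed to a constant sum, so a bloom of one taxon lowers the relative abundance of all others even when their absolute abundances do not change. The Python example below uses invented absolute counts and also applies the centered log-ratio (CLR) transform, one commonly used log-ratio approach; it is an illustration, not a method prescribed by the authors.

```python
import math


def closure(abundances):
    """Convert absolute abundances to relative abundances summing to 1."""
    total = sum(abundances)
    return [a / total for a in abundances]


def clr(abundances):
    """Centered log-ratio: log(x_i) minus the mean log; requires x_i > 0."""
    logs = [math.log(a) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical absolute cell counts for taxa A, B, C in two soil samples.
before = [100, 100, 100]
after = [400, 100, 100]  # only taxon A grew

print(closure(before))  # B makes up one third of the community
print(closure(after))   # B now makes up one sixth, although it did not change
# CLR differences are log-ratios: B vs C is log(100/100) = 0 before and after.
print(clr(after)[1] - clr(after)[2])
```

Because CLR values depend only on ratios between taxa, they are the same whether computed from absolute or relative abundances, which is why log-ratio methods are often recommended for compositional data.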


2017
Author(s): Mengbiao Guo, Jing Yang, Yu lung Lau, Wanling Yang

Abstract
Whole exome and targeted sequencing play a major role in the diagnosis of Mendelian diseases, but analyzing these data involves many complicated tools, and comprehensively understanding the analysis results is difficult.
Here, we report RETA, an R package that provides one-stop analysis of these data together with a comprehensive, interactive and easy-to-understand report with many advanced visualization features. It helps clinicians and scientists alike to better analyze and interpret this type of sequencing data for disease diagnosis.
Availability and implementation: https://github.com/reta-s/reta/


2021
Author(s): Christian Vogeley, Thach Nguyen, Selina Woeste, Jean Krutmann, Thomas Haarmann-Stemmann, ...

Genome-wide analysis of transcriptomes offers extensive insights into the molecular mechanisms underlying the physiology of all known species and can help uncover those that are still hidden. Oxford Nanopore Technologies (ONT) sequencing has recently emerged as a fast, miniaturized, portable and cost-effective alternative to next-generation sequencing (NGS). However, RNA-seq data analysis software that exploits ONT portability and allows scientists to easily analyze ONT data anywhere without bioinformatic expertise is not widely available. We developed Duesselpore™, an easy-to-follow deep sequencing workflow that runs as a local web server and allows ONT data to be analyzed anywhere without additional bioinformatic tools or an internet connection. Duesselpore™ output includes differentially expressed genes and further downstream analyses, such as variance heatmaps, disease and gene ontology plots, and gene concept network plots, and it exports customized pathways for different cellular processes. We validated Duesselpore™ by analyzing the transcriptomic changes induced by PCB126, a dioxin-like PCB and potent aryl hydrocarbon receptor (AhR) agonist, in human HaCaT keratinocytes, a well-characterized model system. Duesselpore™ was developed specifically to analyze ONT data, but we also implemented NGS data analysis. Duesselpore™ is compatible with Microsoft Windows and Mac operating systems and allows convenient, reliable and cost-effective analysis of ONT and NGS data.


2022, Vol 23 (1)
Author(s): Renmao Tian, Behzad Imanian

Abstract
Background: Amplicon sequencing of marker genes such as 16S rDNA has been widely used to survey and characterize microbial communities. However, the complex data analyses have required many manual interventions, often leading to inconsistent results.
Results: Here, we have developed a pipeline, amplicon sequence analysis pipeline 2 (ASAP 2), that automates these processes without the usual manual inspections and user intervention, for instance in detecting barcode orientation, selecting the high-quality region of reads, determining the resampling depth and more. The pipeline integrates all the analytical processes, such as importing data, demultiplexing, summarizing read profiles, quality trimming, denoising, removing chimeric sequences and making the feature table. The pipeline accepts multiple input formats, including multiplexed or demultiplexed, paired-end or single-end, barcode inside or outside, and raw or intermediate data (e.g. a feature table). The outputs include taxonomic classification, alpha/beta diversity, community composition, ordination analysis and statistical tests. ASAP 2 supports merging multiple sequencing runs, which helps integrate and compare data from different sources (public databases and collaborators).
Conclusions: Our pipeline minimizes hands-on intervention and runs amplicon sequence variant (ASV)-based amplicon sequencing analysis automatically and consistently. Our web server assists researchers who have no access to a high-performance computer (HPC) or have limited bioinformatics skills. The pipeline and web server can be accessed at https://github.com/tianrenmaogithub/asap2 and https://hts.iit.edu/asap2, respectively.
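ASAP 2 supports merging multiple sequencing runs. Because ASVs are exact sequences, feature tables from different runs can in principle be combined by using the sequence itself as the key. The Python sketch below illustrates that idea with plain dictionaries; the table layout (sequence mapped to per-sample counts), the toy sequences and the function name are assumptions, not ASAP 2's implementation, and it presumes both runs were processed with the same primers and trimming so that identical taxa yield identical sequences.

```python
from collections import defaultdict


def merge_feature_tables(*tables):
    """Merge ASV tables of the form {sequence: {sample_id: count}}.

    Counts for the same sequence and sample are summed; sequences or samples
    missing from a run simply contribute nothing.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for seq, per_sample in table.items():
            for sample, count in per_sample.items():
                merged[seq][sample] += count
    # Convert nested defaultdicts back to plain dicts for a clean result.
    return {seq: dict(samples) for seq, samples in merged.items()}


run1 = {"ACGTTGCA": {"S1": 12, "S2": 3}, "ACGTTGCC": {"S1": 5}}
run2 = {"ACGTTGCA": {"S3": 7}, "TTGCAACG": {"S3": 2}}
merged = merge_feature_tables(run1, run2)
print(merged["ACGTTGCA"])  # prints {'S1': 12, 'S2': 3, 'S3': 7}
```

With OTUs clustered separately per run, the same key would not identify the same taxon across runs; exact-sequence ASVs avoid that re-clustering step.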

