Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results

2020 ◽  
Vol 11 ◽  
Author(s):  
Alejandro Abdala Asbun ◽  
Marc A. Besseling ◽  
Sergio Balzano ◽  
Judith D. L. van Bleijswijk ◽  
Harry J. Witte ◽  
...  

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format, together with representative sequences. Cascabel is highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing where possible. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL.
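To make the text-format output described above more concrete, the following minimal sketch converts a tab-separated OTU count table into per-sample relative abundances. It is not part of Cascabel: the function name `relative_abundance`, the `#OTU ID` header, and the layout (one OTU per row, one sample per column) are assumptions chosen for illustration, and the actual files produced by the pipeline may differ.

```python
import csv
import io


def relative_abundance(otu_table_text):
    """Turn a tab-separated OTU count table into per-sample relative abundances.

    Assumed (hypothetical) layout: the first row is a header whose first cell
    labels the OTU column and whose remaining cells are sample names; every
    following row holds an OTU identifier followed by one count per sample.
    """
    reader = csv.reader(io.StringIO(otu_table_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]

    counts = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        counts[row[0]] = [float(value) for value in row[1:]]

    # Column sums give the total read count of each sample.
    totals = [sum(column) for column in zip(*counts.values())]

    return {
        otu: {
            sample: (count / total if total else 0.0)
            for sample, count, total in zip(samples, values, totals)
        }
        for otu, values in counts.items()
    }


# Invented toy data, not real Cascabel output.
EXAMPLE = "#OTU ID\tsampleA\tsampleB\nOTU_1\t30\t0\nOTU_2\t10\t5\n"
table = relative_abundance(EXAMPLE)
```

Normalizing by the per-sample total is one common way to compare samples sequenced at different depths; other normalizations (for example rarefaction) are equally plausible next steps.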

2019 ◽  
Author(s):  
Alejandro Abdala Asbun ◽  
Marc A Besseling ◽  
Sergio Balzano ◽  
Judith van Bleijswijk ◽  
Harry Witte ◽  
...  

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene rather than the entire genome, the number of reads needed per sample is lower than that required for metagenome sequencing, making marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a flexible and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) and a representative sequence tree. Our pipeline allows customizing the analyses by offering several choices for most of the steps, for example different OTU generating methods. The pipeline can make use of multiple computing nodes and scales from personal computers to computing servers. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL and licensed under GNU GPLv3.


2015 ◽  
pp. 27-43 ◽  
Author(s):  
Rui Yamaguchi ◽  
Seiya Imoto ◽  
Satoru Miyano

2019 ◽  
Vol 24 (3) ◽  
pp. 213-223 ◽  
Author(s):  
Raimo Franke ◽  
Bettina Hinkelmann ◽  
Verena Fetz ◽  
Theresia Stradal ◽  
Florenz Sasse ◽  
...  

Mode of action (MoA) identification of bioactive compounds is often a challenging and time-consuming task. We used a label-free kinetic profiling method based on an impedance readout to monitor the time-dependent cellular response profiles for the interaction of bioactive natural products and other small molecules with mammalian cells. Such approaches have rarely been used so far because of the lack of data mining tools that properly capture the characteristics of the impedance curves. We developed a data analysis pipeline for the xCELLigence Real-Time Cell Analysis detection platform to process the data, assess and score their reproducibility, and provide rank-based MoA predictions for a reference set of 60 bioactive compounds. The method can reveal additional, previously unknown targets, as exemplified by the identification of tubulin-destabilizing activities of the RNA synthesis inhibitor actinomycin D and the effects on DNA replication of vioprolide A. The data analysis pipeline is based on the statistical programming language R and is available to the scientific community through a GitHub repository.
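The idea of a rank-based prediction can be illustrated with a small sketch: a query response curve is compared against a set of reference curves, and the references are ordered by similarity. This is only an illustration under stated assumptions. The published pipeline is written in R and its actual scoring method is not described in the abstract; the use of Pearson correlation, the function names `pearson` and `rank_references`, and the toy curves below are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_references(query, references):
    """Return (name, score) pairs sorted from most to least similar to the query.

    Similarity is Pearson correlation here, an assumption made for this
    sketch rather than the metric used by the published pipeline.
    """
    scored = [(name, pearson(query, curve)) for name, curve in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented time-course values standing in for normalized impedance curves.
references = {
    "compound_X": [1.0, 0.8, 0.5, 0.2],  # response decays over time
    "compound_Y": [1.0, 1.1, 1.2, 1.3],  # response keeps rising
}
query = [0.9, 0.7, 0.4, 0.1]

ranking = rank_references(query, references)
```

In this toy case the decaying reference ranks first because its shape matches the query, while the rising reference receives a negative score and ranks last.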


2016 ◽  
Vol 7 ◽  
Author(s):  
Li Guo ◽  
Kelly S. Allen ◽  
Greg Deiulio ◽  
Yong Zhang ◽  
Angela M. Madeiras ◽  
...  

ChemInform ◽  
2003 ◽  
Vol 34 (21) ◽  
Author(s):  
Muenevver Koekueer ◽  
Fionn Murtagh ◽  
Norman D. McMillan ◽  
Sven Riedel ◽  
Brian O'Rourke ◽  
...  

2022 ◽  
Author(s):  
Andreas B Diendorfer ◽  
Kseniya Khamina ◽  
Marianne Pultar

miND is an NGS data analysis pipeline for small RNA sequencing data. In this protocol, the pipeline is set up and run on an AWS EC2 instance with example data from a public repository. Please see the publication on F1000 for more details on the pipeline and how to use it.


2015 ◽  
Vol 31 (19) ◽  
pp. 3198-3206 ◽  
Author(s):  
Chalini D. Wijetunge ◽  
Isaam Saeed ◽  
Berin A. Boughton ◽  
Jeffrey M. Spraggins ◽  
Richard M. Caprioli ◽  
...  

2015 ◽  
Vol 10 ◽  
pp. BMI.S25132 ◽  
Author(s):  
Jun-ichi Satoh ◽  
Yoshihiro Kino ◽  
Shumpei Niida

Background: Alzheimer's disease (AD) is the most common cause of dementia with no curative therapy currently available. Establishment of sensitive and non-invasive biomarkers that promote an early diagnosis of AD is crucial for the effective administration of disease-modifying drugs. MicroRNAs (miRNAs) mediate posttranscriptional repression of numerous target genes. Aberrant regulation of miRNA expression is implicated in AD pathogenesis, and circulating miRNAs serve as potential biomarkers for AD. However, data analysis of numerous AD-specific miRNAs derived from small RNA-sequencing (RNA-Seq) is often laborious. Methods: To identify circulating miRNA biomarkers for AD, we reanalyzed a publicly available small RNA-Seq dataset, composed of blood samples derived from 48 AD patients and 22 normal control (NC) subjects, by a simple web-based miRNA data analysis pipeline that combines omiRas and DIANA miRPath. Results: By using omiRas, we identified 27 miRNAs expressed differentially between the two groups, including upregulation in AD of miR-26b-3p, miR-28-3p, miR-30c-5p, miR-30d-5p, miR-148b-5p, miR-151a-3p, miR-186-5p, miR-425-5p, miR-550a-5p, miR-1468, miR-4781-3p, miR-5001-3p, and miR-6513-3p and downregulation in AD of let-7a-5p, let-7e-5p, let-7f-5p, let-7g-5p, miR-15a-5p, miR-17-3p, miR-29b-3p, miR-98-5p, miR-144-5p, miR-148a-3p, miR-502-3p, miR-660-5p, miR-1294, and miR-3200-3p. DIANA miRPath indicated that miRNA-regulated pathways potentially downregulated in AD are linked with neuronal synaptic functions, while those upregulated in AD are implicated in cell survival and cellular communication. Conclusions: The simple web-based miRNA data analysis pipeline helps us to effortlessly identify candidates for miRNA biomarkers and pathways of AD from the complex small RNA-Seq data.

