SHAMAN: a user-friendly website for metataxonomic analysis from raw reads to statistical analysis

Comparing the composition of microbial communities among groups of interest (e.g., patients vs healthy individuals) is a central aspect of microbiome research. It typically involves sequencing, data processing, statistical analysis and graphical representation of the detected signatures. Such an analysis is normally performed with a set of different applications that require specific expertise for installation and data processing and, in some cases, programming skills. Here, we present SHAMAN, an interactive web application we developed to facilitate the use of (i) a bioinformatic workflow for metataxonomic analysis and (ii) reliable statistical modelling, and (iii) to provide one of the largest panels of interactive visualizations among the options currently available. SHAMAN is specifically designed for non-expert users who may benefit from an integrated version of the different analytic steps underlying a proper metagenomic analysis. The application is freely accessible at http://shaman.pasteur.fr/, and may also work as a standalone application with a Docker container (aghozlane/shaman), conda and R. The source code is written in R and is available at https://github.com/aghozlane/shaman. Using two datasets (a mock community sequencing and published 16S metagenomic data), we illustrate the strengths of SHAMAN in quickly performing a complete metataxonomic analysis.

SHAMAN: a user-friendly website for metataxonomic analysis from raw reads to statistical analysis

10.21203/rs.2.23213/v1 ◽

2020 ◽

Author(s):

Stevenn Volant ◽

Pierre Lechat ◽

Perrine Woringer ◽

Laurence Motreff ◽

Christophe Malabat ◽

...

Keyword(s):

Statistical Analysis ◽

Data Processing ◽

Web Application ◽

Graphical Representation ◽

Statistical Modelling ◽

Metagenomic Data ◽

Sequencing Data ◽

Microbiome Research ◽

Interactive Visualizations ◽

User Friendly

Abstract Background: Comparing the composition of microbial communities among groups of interest (e.g., patients vs healthy individuals) is a central aspect of microbiome research. It typically involves sequencing, data processing, statistical analysis and graphical representation of the detected signatures. Such an analysis is normally performed with a set of different applications that require specific expertise for installation and data processing and, in some cases, programming skills. Results: Here, we present SHAMAN, an interactive web application we developed to facilitate the use of (i) a bioinformatic workflow for metataxonomic analysis and (ii) reliable statistical modelling, and (iii) to provide one of the largest panels of interactive visualizations among the options currently available. SHAMAN is specifically designed for non-expert users who may benefit from an integrated version of the different analytic steps underlying a proper metagenomic analysis. The application is freely accessible at http://shaman.pasteur.fr/, and may also work as a standalone application with a Docker container (aghozlane/shaman), conda and R. The source code is written in R and is available at https://github.com/aghozlane/shaman. Using two datasets (a mock community sequencing and published 16S rRNA metagenomic data), we illustrate the strengths of SHAMAN in quickly performing a complete metataxonomic analysis. Conclusions: We aim with SHAMAN to provide the scientific community with a platform that simplifies reproducible quantitative analysis of metagenomic data.

re-Searcher: GUI-based bioinformatics tool for simplified genomics data mining of VCF files

PeerJ ◽

10.7717/peerj.11333 ◽

2021 ◽

Vol 9 ◽

pp. e11333

Author(s):

Daniyar Karabayev ◽

Askhat Molkenov ◽

Kaiyrgali Yerulanuly ◽

Ilyas Kabimoldayev ◽

Asset Daniyarov ◽

...

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Data Types ◽

Standard Format ◽

Standard Data ◽

Additional Information ◽

Link Type ◽

Sequencing Platforms ◽

User Friendly

Background: High-throughput sequencing platforms generate massive amounts of high-dimensional genomic data that are available for analysis. Modern, user-friendly bioinformatics tools for the analysis and interpretation of genomics data have become essential for working with sequencing data. Different standard data types and file formats have been developed to store and analyze sequence and genomics data. Variant Call Format (VCF) is the most widespread genomics file type and standard format containing genomic information and variants of sequenced samples. Results: Existing tools for processing VCF files usually lack an intuitive graphical interface and offer only a command-line interface, which may be challenging to use for the broader biomedical community interested in genomics data analysis. re-Searcher addresses this problem by pre-processing VCF files in chunks so as not to overload the computer's RAM. The tool can be used as a standalone, user-friendly, multiplatform GUI application as well as a web application (https://nla-lbsb.nu.edu.kz). The software, including source code, tested VCF files and additional information, is publicly available in the GitHub repository (https://github.com/LabBandSB/re-Searcher).
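
The chunked pre-processing idea can be illustrated with a minimal Python sketch. This is not re-Searcher's actual code: the function names `iter_vcf_chunks` and `count_variants_per_chrom`, the default chunk size, and the per-chromosome count are all assumptions chosen for illustration. The sketch only shows how a VCF can be read a fixed number of records at a time, so that memory use depends on the chunk size and not on the file size.

```python
from typing import Dict, Iterable, Iterator, List


def iter_vcf_chunks(
    lines: Iterable[str], chunk_size: int = 1000
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` parsed VCF records (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    columns: List[str] = []
    chunk: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and '##' meta-information lines
        if line.startswith("#"):
            # Header line, e.g. "#CHROM POS ID REF ALT QUAL FILTER INFO"
            columns = line.lstrip("#").split("\t")
            continue
        if not columns:
            raise ValueError("data line found before the #CHROM header line")
        chunk.append(dict(zip(columns, line.split("\t"))))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh list so the previous chunk can be freed
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def count_variants_per_chrom(
    lines: Iterable[str], chunk_size: int = 1000
) -> Dict[str, int]:
    """Aggregate a simple statistic chunk by chunk (illustrative only)."""
    counts: Dict[str, int] = {}
    for chunk in iter_vcf_chunks(lines, chunk_size):
        for record in chunk:
            counts[record["CHROM"]] = counts.get(record["CHROM"], 0) + 1
    return counts
```

Because an open text file is itself a lazy iterable of lines, passing a file handle to `count_variants_per_chrom` would keep only one chunk of records in memory at a time.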

PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens

10.1101/147462 ◽

2017 ◽

Author(s):

Philipp N. Spahn ◽

Tyler Bath ◽

Ryan J. Weiss ◽

Jihoon Kim ◽

Jeffrey D. Esko ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Sequencing Data ◽

Bioinformatic Tools ◽

Link Type ◽

Screening Experiments ◽

Independent Analysis ◽

Wide Range ◽

Set Up ◽

Sequence Quality

Abstract Background: Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. With the increased popularity of such screens, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either by limited functionality or by the requirement for bioinformatic expertise. Results: To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable to analyze a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. Conclusions: PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions, documentation and test datasets. The source code is available at https://github.com/LewisLabUCSD/PinAPL-Py.
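
To give a feel for what an sgRNA enrichment/depletion step computes, here is a deliberately simplified Python sketch. It does not reproduce PinAPL-Py's statistical models, which the abstract does not detail. It assumes a swapped-in, generic approach: read counts are normalized to counts per million and guides are ranked by log2 fold change between a control and a treatment sample. The function names, the pseudocount and the example guide names are all hypothetical.

```python
import math
from typing import Dict, List


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw sgRNA read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {guide: 1e6 * n / total for guide, n in counts.items()}


def log2_fold_changes(
    control: Dict[str, int],
    treatment: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Log2 fold change per guide; the pseudocount (an assumption) avoids log(0)."""
    cpm_control = counts_per_million(control)
    cpm_treatment = counts_per_million(treatment)
    return {
        guide: math.log2(
            (cpm_treatment.get(guide, 0.0) + pseudocount)
            / (cpm_control[guide] + pseudocount)
        )
        for guide in cpm_control
    }


def rank_guides(lfc: Dict[str, float]) -> List[str]:
    """Order guides from most enriched to most depleted."""
    return sorted(lfc, key=lfc.get, reverse=True)


# Illustrative counts for two made-up guides
control_counts = {"sgA": 100, "sgB": 100}
treatment_counts = {"sgA": 300, "sgB": 100}
ranking = rank_guides(log2_fold_changes(control_counts, treatment_counts))
```

A real analysis would additionally need replicate handling, significance testing and aggregation of guides to genes, none of which this sketch attempts.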

An Extensive Meta-Metagenomic Search Identifies SARS-CoV-2-Homologous Sequences in Pangolin Lung Viromes

mSphere ◽

10.1128/msphere.00160-20 ◽

2020 ◽

Vol 5 (3) ◽

Cited By ~ 9

Author(s):

Lamia Wahba ◽

Nimit Jain ◽

Andrew Z. Fire ◽

Massa J. Shoura ◽

Karen L. Artiles ◽

...

Keyword(s):

Nucleic Acid ◽

High Speed ◽

High Throughput Sequencing ◽

Biological Significance ◽

Metagenomic Data ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Link Type ◽

Recent Emergence

ABSTRACT In numerous instances, tracking the biological significance of a nucleic acid sequence can be augmented through the identification of environmental niches in which the sequence of interest is present. Many metagenomic data sets are now available, with deep sequencing of samples from diverse biological niches. While any individual metagenomic data set can be readily queried using web-based tools, meta-searches through all such data sets are less accessible. In this brief communication, we demonstrate such a meta-metagenomic approach, examining close matches to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in all high-throughput sequencing data sets in the NCBI Sequence Read Archive accessible with the “virome” keyword. In addition to the homology to bat coronaviruses observed in descriptions of the SARS-CoV-2 sequence (F. Wu, S. Zhao, B. Yu, Y. M. Chen, et al., Nature 579:265–269, 2020, https://doi.org/10.1038/s41586-020-2008-3; P. Zhou, X. L. Yang, X. G. Wang, B. Hu, et al., Nature 579:270–273, 2020, https://doi.org/10.1038/s41586-020-2012-7), we note a strong homology to numerous sequence reads in metavirome data sets generated from the lungs of deceased pangolins reported by Liu et al. (P. Liu, W. Chen, and J. P. Chen, Viruses 11:979, 2019, https://doi.org/10.3390/v11110979). While analysis of these reads indicates the presence of a similar viral sequence in pangolin lung, the similarity is not sufficient to either confirm or rule out a role for pangolins as an intermediate host in the recent emergence of SARS-CoV-2. In addition to the implications for SARS-CoV-2 emergence, this study illustrates the utility and limitations of meta-metagenomic search tools in effective and rapid characterization of potentially significant nucleic acid sequences. IMPORTANCE Meta-metagenomic searches allow for high-speed, low-cost identification of potentially significant biological niches for sequences of interest.

Iroki: automatic customization and visualization of phylogenetic trees

10.1101/106138 ◽

2017 ◽

Cited By ~ 7

Author(s):

Ryan M. Moore ◽

Amelia O. Harrison ◽

Sean M. McAllister ◽

Shawn W. Polson ◽

K. Eric Wommack

Keyword(s):

Web Application ◽

Phylogenetic Trees ◽

Evolutionary History ◽

Phylogenetic Analyses ◽

Community Diversity ◽

Sequencing Data ◽

Link Type ◽

Large Trees ◽

History Of ◽

Microbial Groups

ABSTRACT Phylogenetic trees are an important analytical tool for evaluating community diversity and evolutionary history. In the case of microorganisms, the decreasing cost of sequencing has enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. However, phylogenetic analyses of these types of datasets create complex trees that can be challenging to interpret. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree. Yet, manual customization is time-consuming and error prone, and programs designed to assist in batch tree customization often require programming experience or complicated file formats for annotation. Iroki, a user-friendly web interface for tree visualization, addresses these issues by providing automatic customization of large trees based on metadata contained in tab-separated text files. Iroki’s utility for exploring biological and ecological trends in sequencing data was demonstrated through a variety of microbial ecology applications in which trees with hundreds to thousands of leaf nodes were customized according to extensive collections of metadata. The Iroki web application and documentation are available at https://www.iroki.net or through the VIROME portal (http://virome.dbi.udel.edu). Iroki’s source code is released under the MIT license and is available at https://github.com/mooreryan/iroki.

CRISPRAnalyzeR: Interactive analysis, annotation and documentation of pooled CRISPR screens

10.1101/109967 ◽

2017 ◽

Cited By ~ 12

Author(s):

Jan Winter ◽

Marc Schwering ◽

Oliver Pelz ◽

Benedikt Rauscher ◽

Tianzuo Zhan ◽

...

Keyword(s):

Web Application ◽

Systematic Investigation ◽

Cellular Processes ◽

Interactive Analysis ◽

Link Type ◽

External Data ◽

Versatile Tool ◽

Interactive Visualizations ◽

Meta Information ◽

Data Tables

Abstract Pooled CRISPR/Cas9 screens are a powerful and versatile tool for the systematic investigation of cellular processes in a variety of organisms. Such screens generate large amounts of data that present a new challenge to analyze and interpret. Here, we developed a web application to analyze, document and explore pooled CRISPR/Cas9 screens using a unified single workflow. The end-to-end analysis pipeline features eight different hit calling strategies based on state-of-the-art methods, including DESeq2, MAGeCK, edgeR, sgRSEA, Z-Ratio, Mann-Whitney test, ScreenBEAM and BAGEL. Results can be compared with interactive visualizations and data tables. CRISPRAnalyzeR integrates meta-information from 26 external data resources, providing a wide array of options for the annotation and documentation of screens. The application was developed with user experience in mind, requiring no previous knowledge in bioinformatics. All modern operating systems are supported. Availability and online documentation: The source code, a pre-configured Docker application, sample data and documentation can be found on our GitHub page (http://www.github.com/boutroslab/CRISPRAnalyzeR). A tutorial video can be found at http://www.crispr-analyzer.org.

Iroki: automatic customization and visualization of phylogenetic trees

PeerJ ◽

10.7717/peerj.8584 ◽

2020 ◽

Vol 8 ◽

pp. e8584 ◽

Cited By ~ 5

Author(s):

Ryan M. Moore ◽

Amelia O. Harrison ◽

Sean M. McAllister ◽

Shawn W. Polson ◽

K. Eric Wommack

Keyword(s):

Web Application ◽

Phylogenetic Trees ◽

Evolutionary History ◽

Phylogenetic Analyses ◽

Community Diversity ◽

Sequencing Data ◽

Link Type ◽

Large Trees ◽

History Of ◽

Microbial Groups

Phylogenetic trees are an important analytical tool for evaluating community diversity and evolutionary history. In the case of microorganisms, the decreasing cost of sequencing has enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. However, phylogenetic analyses of these types of datasets create complex trees that can be challenging to interpret. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree. Yet, manual customization is time-consuming and error prone, and programs designed to assist in batch tree customization often require programming experience or complicated file formats for annotation. Iroki, a user-friendly web interface for tree visualization, addresses these issues by providing automatic customization of large trees based on metadata contained in tab-separated text files. Iroki’s utility for exploring biological and ecological trends in sequencing data was demonstrated through a variety of microbial ecology applications in which trees with hundreds to thousands of leaf nodes were customized according to extensive collections of metadata. The Iroki web application and documentation are available at https://www.iroki.net or through the VIROME portal http://virome.dbi.udel.edu. Iroki’s source code is released under the MIT license and is available at https://github.com/mooreryan/iroki.

myCircos: Facilitating the Creation and Use of Circos Plots Online

10.1101/052605 ◽

2016 ◽

Author(s):

Caroline Labelle ◽

Geneviève Boucher ◽

Sébastien Lemieux

Keyword(s):

User Interface ◽

Web Application ◽

Graphical Representation ◽

Source Code ◽

Command Line ◽

Genomic Information ◽

Link Type ◽

Intuitive User Interface ◽

The Creation

Abstract Circos plots were designed to display large amounts of processed genomic information on a single graphical representation. The creation of such plots remains challenging for less technical users as the leading tool requires command-line proficiency. Here, we introduce myCircos, a web application that facilitates the generation of Circos plots by providing an intuitive user interface, adding interactive functionalities to the representation and providing persistence of previous requests. myCircos is available at: http://mycircos.iric.ca. Non-registered users can explore the application through the Guest user. Source code (for local server installation) is available upon request.

Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons

10.1101/230474 ◽

2017 ◽

Cited By ~ 1

Author(s):

Kemal Eren ◽

Steven Weaver ◽

Robert Ketteringham ◽

Morné Valentyn ◽

Melissa Laird Smith ◽

...

Keyword(s):

Web Application ◽

Evolutionary Dynamics ◽

Full Length ◽

Viral Population ◽

Sequencing Data ◽

Link Type ◽

Long Read ◽

Web App ◽

Client Side ◽

Hiv 1

Abstract Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV-1 env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster) and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV-1 env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV-1 env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.

Aneuvis: Web-based exploration of numerical chromosomal variation in single cells

10.1101/459735 ◽

2018 ◽

Author(s):

Daniel G Piqué ◽

Grasiella A Andriani ◽

Elaine Maggi ◽

Samuel E Zimmerman ◽

John M Greally ◽

...

Keyword(s):

Web Application ◽

Single Cells ◽

Treatment Resistance ◽

Experimental Treatment ◽

Whole Genome Sequencing Data ◽

Chromosomal Variation ◽

Sequencing Data ◽

Link Type ◽

Molecular Features ◽

Chromosomal Variability

Abstract Motivation: Aberrations in chromosomal copy number are one of the most common molecular features observed in cancer. Quantifying the degree of numerical chromosomal variation in single cells across a population of cells is of interest to researchers studying whole chromosomal instability (W-CIN). W-CIN, a state of high numerical chromosomal variation, contributes to treatment resistance in cancer. Results: Here, we introduce aneuvis, a web application that allows users to determine whether numerical chromosomal variation exists between experimental treatment groups. The web interface allows users to upload molecular cytogenetic or processed whole-genome sequencing data in a cell-by-chromosome matrix format and automatically generates visualizations and summary statistics that reflect the degree of numeric chromosomal variability. Aneuvis is the first user-friendly web application to help researchers identify the genetic and environmental perturbations that promote numerical chromosomal variation. Availability and Implementation: Aneuvis is freely available as a web application at https://dpique.shinyapps.io/aneuvis/. Website implemented using Shiny version 1.0.5 with all major browsers supported. All source code for the application is available at https://github.com/dpique/aneuvis.
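
As a rough illustration of the cell-by-chromosome matrix format, the Python sketch below computes two simple summaries from such a matrix. It is not aneuvis's implementation (aneuvis is a Shiny application, and the abstract does not specify its statistics). The function name `summarize_copy_numbers`, the assumed euploid copy number of 2, and the two summary measures are illustrative assumptions only.

```python
from typing import Dict, List, Union

Summary = Dict[str, Union[float, Dict[str, float]]]


def summarize_copy_numbers(
    matrix: List[List[int]],
    chromosomes: List[str],
    euploid: int = 2,
) -> Summary:
    """Summarize a cell-by-chromosome copy-number matrix (hypothetical helper).

    Each row is one cell; each column is the copy number of one chromosome.
    """
    if not matrix:
        raise ValueError("matrix must contain at least one cell")
    for row in matrix:
        if len(row) != len(chromosomes):
            raise ValueError("every row needs one value per chromosome")
    n_cells = len(matrix)
    # A cell counts as aneuploid if any chromosome deviates from `euploid`.
    aneuploid_cells = sum(
        1 for row in matrix if any(copies != euploid for copies in row)
    )
    # Per chromosome: fraction of cells whose copy number deviates.
    per_chromosome = {
        chrom: sum(1 for row in matrix if row[j] != euploid) / n_cells
        for j, chrom in enumerate(chromosomes)
    }
    return {
        "fraction_aneuploid_cells": aneuploid_cells / n_cells,
        "fraction_deviating_per_chromosome": per_chromosome,
    }


# Four made-up cells scored for three chromosomes
example = summarize_copy_numbers(
    [[2, 2, 2], [3, 2, 2], [2, 1, 2], [2, 2, 2]],
    ["chr1", "chr2", "chr3"],
)
```

Comparing such summaries between treatment groups would still require an appropriate statistical test, which is outside the scope of this sketch.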
