DOSCHEDA: a web application for interactive chemoproteomics data analysis

PeerJ Computer Science ◽

10.7717/peerj-cs.129 ◽

2017 ◽

Vol 3 ◽

pp. e129 ◽

Cited By ~ 2

Author(s):

Bruno Contrino ◽

Eric Miele ◽

Ronald Tomlinson ◽

M. Paola Castaldi ◽

Piero Ricchiuto

Keyword(s):

Data Analysis ◽

Protein Interactions ◽

Web Application ◽

Main Tool ◽

Bioinformatics Analyses ◽

User Input ◽

R Language ◽

Link Type ◽

Input Dataset ◽

Programming Knowledge

Background: Mass spectrometry (MS)-based chemoproteomics has recently become a main tool for identifying and quantifying cellular target protein interactions with ligands/drugs in drug discovery. The complexity of these new data types requires scientists with limited computational background to perform systematic data quality controls and to visualize the results of the analysis to enable rapid decision making. To date, there are no readily accessible platforms specifically designed for chemoproteomics data analysis. Results: We developed a Shiny-based web application named DOSCHEDA (Down Stream Chemoproteomics Data Analysis) to assess the quality of chemoproteomics experiments, to filter peptide intensities based on linear correlations between replicates, and to perform statistical analysis based on the experimental design. To increase its accessibility, DOSCHEDA is designed to be used with minimal user input and does not require programming knowledge. Typical inputs are protein fold changes or peptide intensities obtained from Proteome Discoverer, MaxQuant or similar software. DOSCHEDA aggregates results from bioinformatics analyses performed on the input dataset into a dynamic interface that encompasses interactive graphics and enables customized output reports. Conclusions: DOSCHEDA is implemented entirely in the R language. It can be launched on any system with R installed, including Windows, Mac OS and Linux distributions. DOSCHEDA is hosted on a shiny-server at https://doscheda.shinyapps.io/doscheda and is also available as a Bioconductor package (http://www.bioconductor.org/).
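The abstract mentions filtering peptide intensities by linear correlation between replicates. The following is a minimal sketch of that general idea, not DOSCHEDA's implementation: the function names, the data layout (one intensity profile per replicate) and the cutoff `min_r=0.8` are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def filter_peptides(peptides, min_r=0.8):
    """Keep peptides whose two replicate profiles correlate at or above min_r.

    peptides maps a peptide name to (replicate 1 intensities, replicate 2 intensities).
    min_r is a hypothetical cutoff chosen for this example.
    """
    return {name: reps for name, reps in peptides.items() if pearson(*reps) >= min_r}


# Made-up intensities across four conditions, two replicates each.
example = {
    "PEPTIDE_A": ([1.0, 0.8, 0.5, 0.2], [1.1, 0.7, 0.6, 0.1]),  # replicates agree
    "PEPTIDE_B": ([1.0, 0.8, 0.5, 0.2], [0.3, 0.9, 0.2, 0.8]),  # replicates disagree
}
kept = filter_peptides(example)
print(sorted(kept))
```

A stricter or looser cutoff trades retained peptides against replicate agreement; the appropriate value depends on the experiment.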

Pavian: Interactive analysis of metagenomics data for microbiomics and pathogen identification

10.1101/084715 ◽

2016 ◽

Cited By ~ 25

Author(s):

Florian P. Breitwieser ◽

Steven L. Salzberg

Keyword(s):

Web Application ◽

Disease Diagnosis ◽

Supplementary Information ◽

Special Focus ◽

Web Browser ◽

R Language ◽

Interactive Analysis ◽

Link Type ◽

Metagenomics Data ◽

Flow Diagrams

Summary: Pavian is a web application for exploring metagenomics classification results, with a special focus on infectious disease diagnosis. Pinpointing pathogens in metagenomics classification results is often complicated by host and laboratory contaminants as well as many non-pathogenic microbiota. With Pavian, researchers can analyze, display and transform results from the Kraken and Centrifuge classifiers using interactive tables, heatmaps and flow diagrams. Pavian also provides an alignment viewer for validation of matches to a particular genome. Availability and implementation: Pavian is implemented in the R language and based on the Shiny framework. It can be hosted on Windows, Mac OS X and Linux systems, and used with any contemporary web browser. It is freely available under a GPL-3 license from http://github.com/fbreitwieser/pavian. Furthermore, a Docker image is provided at https://hub.docker.com/r/florianbw/[email protected] Supplementary information: Supplementary data is available at Bioinformatics online.

Annot: a Django-based sample, reagent, and experiment metadata tracking system

BMC Bioinformatics ◽

10.1186/s12859-019-3147-0 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Elmar Bucher ◽

Cheryl J. Claunch ◽

Derrick Hee ◽

Rebecca L. Smith ◽

Kaylyn Devlin ◽

...

Keyword(s):

Data Analysis ◽

Web Application ◽

Large Scale ◽

Tracking System ◽

Controlled Vocabulary ◽

File Format ◽

Robust Solution ◽

Microarray Study ◽

Link Type ◽

Experimental Protocols

Background: In biological experiments, comprehensive experimental metadata tracking, which comprises experiment, reagent, and protocol annotation with controlled vocabulary from established ontologies, remains a challenge, especially when the experiment involves multiple laboratory scientists who execute different steps of the protocol. Here we describe Annot, a novel web application designed to provide a flexible solution for this task. Results: Annot enforces the use of controlled vocabulary for sample and reagent annotation while enabling robust investigation, study, and protocol tracking. The cornerstone of Annot’s implementation is a JSON syntax-compatible file format, which can capture detailed metadata for all aspects of complex biological experiments. Data stored in this JSON file format can easily be ported into spreadsheet or data frame files that can be loaded into R (https://www.r-project.org/) or Pandas, Python’s data analysis library (https://pandas.pydata.org/). Annot is implemented in Python 3 and utilizes the Django web framework, PostgreSQL, Nginx, and Debian. It is deployed via Docker and supports all major browsers. Conclusions: Annot offers a robust solution to annotate samples, reagents, and experimental protocols for established assays where multiple laboratory scientists are involved. Further, it provides a framework to store and retrieve metadata for data analysis and integration, and therefore ensures that data generated in different experiments can be integrated and jointly analyzed. This type of solution to metadata tracking can enhance the utility of large-scale datasets, which we demonstrate here with a large-scale microenvironment microarray study.
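The abstract notes that JSON-formatted metadata can be ported into spreadsheet or data frame files. As a rough illustration of that kind of conversion, the sketch below flattens nested JSON records into CSV using only the Python standard library. The record layout and field names (`sample`, `reagent`, `conc_ng_ml`) are invented for the example and are not Annot's actual file format.

```python
import csv
import io
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def json_to_csv(json_text):
    """Convert a JSON array of (possibly nested) objects into CSV text."""
    records = [flatten(r) for r in json.loads(json_text)]
    # Union of all keys, sorted so the column order is deterministic.
    columns = sorted({column for r in records for column in r})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical metadata records; the second one lacks a concentration.
sample_json = (
    '[{"sample": "s1", "reagent": {"name": "EGF", "conc_ng_ml": 10}},'
    ' {"sample": "s2", "reagent": {"name": "HGF"}}]'
)
csv_text = json_to_csv(sample_json)
print(csv_text)
```

The resulting CSV text can be written to a file and read with any spreadsheet program or data frame library; missing fields become empty cells.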

ProfPPIdb: pairs of physical protein-protein interactions predicted for entire proteomes

10.1101/332510 ◽

2018 ◽

Author(s):

Linh Tran ◽

Tobias Hamp ◽

Burkhard Rost

Keyword(s):

Experimental Data ◽

Protein Interactions ◽

Web Application ◽

Domain Knowledge ◽

Prediction Method ◽

Model Organisms ◽

Evolutionary Information ◽

Support Vector ◽

Protein Protein Interactions ◽

Link Type

Motivation: Protein-protein interactions (PPIs) play a key role in many cellular processes. Most annotations of PPIs mix experimental and computational data. The mix optimizes coverage, but obfuscates the annotation origin. Some resources excel at focusing on reliable experimental data. Here, we focused on new pairs of interacting proteins for several model organisms based solely on sequence-based prediction methods. Results: We extracted reliable experimental data about which proteins interact (binary) for eight diverse model organisms from public databases, namely from Escherichia coli, Schizosaccharomyces pombe, Plasmodium falciparum, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, and for the previously used Homo sapiens and Saccharomyces cerevisiae. Those data were the basis for developing a PPI prediction method for each model organism. The method used evolutionary information through a profile-kernel Support Vector Machine (SVM). With the resulting eight models, we predicted all possible protein pairs in each organism and made the top predictions available through a web application. Almost all of the PPIs made available were predicted between proteins that have not been observed in any interaction, in particular for less well-studied organisms. Thus, our work complements existing resources and is particularly helpful for designing experiments because of its uniqueness. Experimental annotations and computational predictions are strongly influenced by the fact that some proteins have many partners and others few. To optimize machine learning, recent methods explicitly ignore such network structure and rely either on domain knowledge or on sequence-only methods. Our approach is independent of domain knowledge and leverages evolutionary information. The database interface representing our results is accessible from https://rostlab.org/services/ppipair/.
The data can also be downloaded from https://figshare.com/collections/ProfPPI-DB/4141784.

AB0210 ACREULAR: AN R PACKAGE FOR THE CALCULATION AND VISUALISATION OF ACR/EULAR RELATED RHEUMATOID ARTHRITIS MEASURES

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.2326 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1405.1-1406

Author(s):

F. Morton ◽

J. Nijjar ◽

C. Goodyear ◽

D. Porter

Keyword(s):

Rheumatoid Arthritis ◽

Functional Status ◽

Rheumatic Diseases ◽

Web Application ◽

R Package ◽

Diagnostic Classification ◽

Microsoft Excel ◽

Link Type ◽

Large Joint ◽

Programming Skills

Background: The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) individually and collaboratively have produced/recommended diagnostic classification, response and functional status criteria for a range of different rheumatic diseases. While there are a number of different resources available for performing these calculations individually, currently there are no tools available that we are aware of to easily calculate these values for whole patient cohorts. Objectives: To develop a new software tool, which will enable both data analysts and also researchers and clinicians without programming skills to calculate ACR/EULAR related measures for a number of different rheumatic diseases. Methods: Criteria that had been developed by ACR and/or EULAR that had been approved for the diagnostic classification, measurement of treatment response and functional status in patients with rheumatoid arthritis were identified. Methods were created using the R programming language to allow the calculation of these criteria, which were incorporated into an R package. Additionally, an R/Shiny web application was developed to enable the calculations to be performed via a web browser using data presented as CSV or Microsoft Excel files. Results: acreular is a freely available, open source R package (downloadable from https://github.com/fragla/acreular) that facilitates the calculation of ACR/EULAR related RA measures for whole patient cohorts. Measures, such as the ACR/EULAR (2010) RA classification criteria, can be determined using precalculated values for each component (small/large joint counts, duration in days, normal/abnormal acute-phase reactants, negative/low/high serology classification) or by providing “raw” data (small/large joint counts, onset/assessment dates, ESR/CRP and CCP/RF laboratory values). Other measures, including EULAR response and ACR20/50/70 response, can also be calculated by providing the required information.
The accompanying web application is included as part of the R package but is also externally hosted at https://fragla.shinyapps.io/shiny-acreular. This enables researchers and clinicians without any programming skills to easily calculate these measures by uploading either a Microsoft Excel or CSV file containing their data. Furthermore, the web application allows the incorporation of additional study covariates, enabling the automatic calculation of multigroup comparative statistics and the visualisation of the data through a number of different plots, both of which can be downloaded. Figure 1. The Data tab following the upload of data. Criteria are calculated by selecting the appropriate checkbox. Figure 2. A density plot of DAS28 scores grouped by ACR/EULAR 2010 RA classification. Statistical analysis has been performed and shows a significant difference in DAS28 score between the two groups. Conclusion: The acreular R package facilitates the easy calculation of ACR/EULAR RA related disease measures for whole patient cohorts. Calculations can be performed either from within R or by using the accompanying web application, which also enables the graphical visualisation of data and the calculation of comparative statistics. We plan to further develop the package by adding additional RA related criteria and by adding ACR/EULAR related measures for other rheumatic disorders. Disclosure of Interests: Fraser Morton: None declared, Jagtar Nijjar Shareholder of: GlaxoSmithKline plc, Consultant of: Janssen Pharmaceuticals UK, Employee of: GlaxoSmithKline plc, Paid instructor for: Janssen Pharmaceuticals UK, Speakers bureau: Janssen Pharmaceuticals UK, AbbVie, Carl Goodyear: None declared, Duncan Porter: None declared
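The precalculated components listed in the abstract (joint counts, duration in days, acute-phase reactants, serology class) map onto the point system of the published 2010 ACR/EULAR classification criteria, where a total of 6 or more out of 10 classifies a patient as having definite RA. The sketch below illustrates that scoring in Python. It is not the acreular API (acreular is an R package); the function and argument names are hypothetical, and the criteria's entry conditions (at least one joint with clinical synovitis not better explained by another disease) are assumed to have been checked already.

```python
# Points for serology (RF / anti-CCP) classification.
SEROLOGY_POINTS = {"negative": 0, "low": 2, "high": 3}


def joint_points(small_joints, large_joints):
    """Joint involvement domain (0 to 5 points)."""
    if small_joints >= 1 and small_joints + large_joints > 10:
        return 5  # more than 10 joints, at least one of them small
    if small_joints >= 4:
        return 3  # 4 to 10 small joints
    if small_joints >= 1:
        return 2  # 1 to 3 small joints
    if large_joints >= 2:
        return 1  # 2 to 10 large joints
    return 0      # at most one large joint


def ra_score(small_joints, large_joints, serology, apr_abnormal, duration_days):
    """Total 2010 ACR/EULAR score from precalculated components."""
    score = joint_points(small_joints, large_joints)
    score += SEROLOGY_POINTS[serology]
    score += 1 if apr_abnormal else 0          # abnormal CRP or ESR
    score += 1 if duration_days >= 42 else 0   # symptoms for 6 weeks or longer
    return score


def is_definite_ra(*components):
    """A total score of 6 or more classifies as definite RA."""
    return ra_score(*components) >= 6


# Hypothetical patient: 5 small joints, high-positive serology,
# abnormal acute-phase reactants, 50 days of symptoms.
print(ra_score(5, 0, "high", True, 50), is_definite_ra(5, 0, "high", True, 50))
```

Applying the function row by row to a table of patients gives the cohort-level calculation the abstract describes.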

Reproducible and accessible analysis of transposon insertion sequencing in Galaxy for qualitative essentiality analyses

BMC Microbiology ◽

10.1186/s12866-021-02184-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Delphine Larivière ◽

Laura Wickham ◽

Kenneth Keiler ◽

Anton Nekrutenko ◽

Keyword(s):

Data Analysis ◽

Promoter Sequence ◽

Entire Genome ◽

Link Type ◽

Transposon Insertion ◽

Control Procedures ◽

Reproducible Analysis ◽

Using Data ◽

Transposon Insertion Sequencing ◽

The Impact

Background: Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet, the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such “problem area” is the analysis of Transposon Insertion Sequencing (TIS) data. TIS allows probing of almost the entire genome of a microorganism by introducing random insertions of transposon-derived constructs. The impact of the insertions on survival and growth under specific conditions provides precise information about genes affecting specific phenotypic characteristics. A wide array of tools has been developed to analyze TIS data. Among the variety of options available, it is often difficult to identify which one can provide a reliable and reproducible analysis. Results: Here we sought to understand the challenges and propose reliable practices for the analysis of TIS experiments. Using data from two recent TIS studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome. Particular attention was paid to quality control procedures, such as determining the optimal tool parameters for the analysis and removal of contamination. Conclusions: Our work provides an assessment of the currently available tools for TIS data analysis. It offers ready-to-use workflows that can be invoked by anyone in the world using our public Galaxy platform (https://usegalaxy.org). To lower the entry barriers, we have also developed interactive tutorials explaining details of TIS data analysis procedures at https://bit.ly/gxy-tis.

RiboA: a web application to identify ribosome A-site locations in ribosome profiling data

BMC Bioinformatics ◽

10.1186/s12859-021-04068-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Danying Shao ◽

Nabeel Ahmed ◽

Nishant Soni ◽

Edward P. O’Brien

Keyword(s):

Integer Programming ◽

Web Application ◽

Stop Codon ◽

Ribosome Profiling ◽

Programming Method ◽

Analysis Tool ◽

Site Location ◽

Link Type ◽

Wide Range ◽

A Site

Background: Translation is a fundamental process in gene expression. Ribosome profiling is a method that enables the study of transcriptome-wide translation. A fundamental technical challenge in analyzing Ribo-Seq data is identifying the A-site location on ribosome-protected mRNA fragments. Identification of the A-site is essential, as it is at this location on the ribosome that a codon is translated into an amino acid. Incorrect assignment of a read to the A-site can lead to a lower signal-to-noise ratio and loss of correlations necessary to understand the molecular factors influencing translation. Therefore, an easy-to-use and accurate analysis tool is needed to identify the A-site locations. Results: We present RiboA, a web application that identifies the most accurate A-site location on a ribosome-protected mRNA fragment and generates the A-site read density profiles. It uses an Integer Programming method that reflects the biological fact that the A-site of actively translating ribosomes is generally located between the second codon and stop codon of a transcript, and utilizes a wide range of mRNA fragment sizes in and around the coding sequence (CDS). The web application is containerized with Docker, and it can be easily ported across platforms. Conclusions: The Integer Programming method that RiboA utilizes is the most accurate in identifying the A-site on Ribo-Seq mRNA fragments compared to other methods. RiboA makes it easier for the community to use this method via a user-friendly and portable web application. In addition, RiboA supports reproducible analyses by tracking all the input datasets and parameters, and it provides enhanced visualization to facilitate scientific exploration. RiboA is available as a web service at https://a-site.vmhost.psu.edu/. The code is publicly available at https://github.com/obrien-lab/aip_web_docker under the MIT license.

LandScape: a web application for interactive genomic summary visualization

10.1101/866087 ◽

2019 ◽

Author(s):

Wenlong Jia ◽

Hechen Li ◽

Shiying Li ◽

Shuaicheng Li

Keyword(s):

Genetic Information ◽

Web Application ◽

Genomic Research ◽

File Format ◽

Data Types ◽

Web Based ◽

Link Type ◽

Level Data ◽

Real Time Visualization ◽

Information Landscape

Summary: Visualizing integrated-level data from genomic research remains a challenge, as it requires sufficient coding skills and experience. Here, we present LandScapeoviz, a web-based application for interactive and real-time visualization of summarized genetic information. LandScape utilizes a well-designed file format that is capable of handling various data types, and offers a series of built-in functions to customize the appearance, explore results, and export high-quality diagrams that are available for publication. Availability and implementation: LandScape is deployed at bio.oviz.org/demo-project/analyses/landscape for online use. Documentation and demo data are freely available on this website and GitHub (github.com/Nobel-Justin/Oviz-Bio-demo). [email protected]

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 [email protected] Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.

ORFhunteR: an accurate approach for the automatic identification and annotation of open reading frames in human mRNA molecules

10.1101/2021.02.05.429963 ◽

2021 ◽

Author(s):

Vasily V. Grinev ◽

Mikalai M. Yatskou ◽

Victor V. Skakun ◽

Maryna K. Chepeleva ◽

Petr V. Nazarov

Keyword(s):

Single Molecule ◽

Web Application ◽

R Package ◽

Nucleotide Sequences ◽

Open Reading Frames ◽

Classification Model ◽

Automatic Identification ◽

Large Set ◽

Link Type ◽

Reading Frames

Motivation: Modern methods of whole transcriptome sequencing accurately recover nucleotide sequences of RNA molecules present in cells and allow for determining their quantitative abundances. The coding potential of such molecules can be estimated using open reading frame (ORF) finding algorithms, implemented in a number of software packages. However, these algorithms show somewhat limited accuracy, are intended for single-molecule analysis and do not allow selecting proper ORFs in the case of long mRNAs containing multiple ORF candidates. Results: We developed a computational approach, a corresponding machine learning model and a package dedicated to automatic identification of the ORFs in large sets of human mRNA molecules. It is based on vectorization of nucleotide sequences into features, followed by classification using a random forest. The predictive model was validated on sets of human mRNA molecules from the NCBI RefSeq and Ensembl databases and demonstrated almost 95% accuracy in detecting true ORFs. The developed methods and pre-trained classification model were implemented in a powerful ORFhunteR computational tool that performs an automatic identification of true ORFs among large sets of human mRNA molecules. Availability and implementation: The developed open-source R package ORFhunteR is available for the community at GitHub repository (https://github.com/rfctbio-bsu/ORFhunteR), from Bioconductor (https://bioconductor.org/packages/devel/bioc/html/ORFhunteR.html) and as a web application (http://orfhunter.bsu.by).
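To make the "multiple ORF candidates" problem concrete, the sketch below enumerates candidate ORFs in a single mRNA sequence by scanning for an ATG start codon followed in frame by a stop codon. This is only a naive candidate-listing step written in Python for illustration; it is not ORFhunteR's code, and it omits the feature vectorization and random forest classification the abstract describes for choosing the true ORF. The function name and the example sequence are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_candidates(seq):
    """Return (start, end) pairs, 0-based and end-exclusive, for every
    ATG that is followed in the same frame by a stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    candidates = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                candidates.append((start, pos + 3))
                break
    return candidates


# Toy sequence with two ATGs; only the first has an in-frame stop codon.
toy = "CCATGAAATGGTAACC"
orfs = find_orf_candidates(toy)
print(orfs, [toy[s:e] for s, e in orfs])
```

On real transcripts this kind of scan typically returns many overlapping candidates, which is why a downstream selection step is needed.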

COVID-Align: Accurate online alignment of hCoV-19 genomes using a profile HMM

10.1101/2020.05.25.114884 ◽

2020 ◽

Cited By ~ 2

Author(s):

Frédéric Lemoine ◽

Luc Blassel ◽

Jakub Voznica ◽

Olivier Gascuel

Keyword(s):

Daily Basis ◽

Supplementary Information ◽

Summary Statistics ◽

Evolutionary Novelty ◽

Bioinformatics Analyses ◽

Link Type ◽

Sequencing Quality ◽

User Friendly ◽

Profile Hmm ◽

New Mutations

Motivation: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1,000 genomes requires less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/[email protected], [email protected] Supplementary information: Supplementary information is available at Bioinformatics online.
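The summary statistics mentioned in the abstract (numbers of mutations and indels) can be illustrated with a small Python sketch that compares one aligned genome against an aligned reference. This is an assumption-laden illustration, not COVID-Align's code: the function name is hypothetical, gaps are assumed to be written as "-", "N" is treated as an unknown base rather than a mutation, and each contiguous run of gaps is counted as one indel event.

```python
def alignment_stats(ref, query):
    """Count substitutions, insertion events and deletion events in a
    pairwise alignment given as two equal-length gapped strings."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have the same length")
    substitutions = insertions = deletions = 0
    previous = "match"
    for r, q in zip(ref.upper(), query.upper()):
        if r == "-" and q == "-":
            continue  # column gapped in both sequences carries no information
        if r == "-":
            state = "insertion"   # base present only in the query
        elif q == "-":
            state = "deletion"    # base present only in the reference
        else:
            state = "match"
            if r != q and q != "N":
                substitutions += 1
        # A new event starts only when the gap state changes.
        if state != previous:
            if state == "insertion":
                insertions += 1
            elif state == "deletion":
                deletions += 1
        previous = state
    return {"substitutions": substitutions, "insertions": insertions, "deletions": deletions}


# Toy alignment: one substitution, one 1-base insertion, one 2-base deletion.
stats = alignment_stats("ACGT-ACGTAC", "ACCTTAC--AC")
print(stats)
```

Unusually high counts for a single genome may point to sequencing or assembly problems rather than genuine novelty, which is the kind of triage the abstract alludes to.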
