Pout2Prot: an efficient tool to create protein (sub)groups from Percolator output files

The protein inference problem is complicated in metaproteomics due to the presence of homologous proteins from closely related species. Nevertheless, this process is vital to assign taxonomy and functions to identified proteins of microbial species, a task for which specialized tools such as Prophane have been developed. We here present Pout2Prot, which takes Percolator Output (.pout) files from multiple experiments and creates protein (sub)group output files (.tsv) that can be used directly with Prophane. Pout2Prot offers different grouping strategies, allows distinction between sample categories and replicates for multiple files, and uses a weighted spectral count for protein (sub)groups to reflect (sub)group abundance. Pout2Prot is available as a web application at https://pout2prot.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the Apache License 2.0 and is available at https://github.com/compomics/pout2prot.
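The weighted spectral count can be illustrated with a small Python sketch. It assumes, for illustration only, that a spectrum (PSM) shared by n protein (sub)groups contributes 1/n to each of them. This weighting rule, the input layout (a mapping from PSM IDs to the (sub)groups they map to) and the function name `weighted_spectral_counts` are hypothetical. They are not Pout2Prot's documented API or necessarily its exact scheme.

```python
from collections import defaultdict
from fractions import Fraction


def weighted_spectral_counts(psm_to_groups):
    """Split each PSM's single count equally over the (sub)groups it maps to.

    Hypothetical helper for illustration; assumes an equal 1/n split.
    psm_to_groups: dict mapping a PSM identifier to a set of (sub)group names.
    Returns a dict mapping each (sub)group name to its weighted spectral count.
    """
    counts = defaultdict(Fraction)  # Fraction() == 0, exact arithmetic
    for psm, groups in psm_to_groups.items():
        if not groups:
            continue  # unassigned PSMs contribute nothing
        share = Fraction(1, len(groups))
        for group in groups:
            counts[group] += share
    return dict(counts)


if __name__ == "__main__":
    psms = {
        "psm1": {"groupA"},                      # unique to groupA
        "psm2": {"groupA", "groupB"},            # shared: 1/2 to each
        "psm3": {"groupA", "groupB", "groupC"},  # shared: 1/3 to each
    }
    for group, count in sorted(weighted_spectral_counts(psms).items()):
        print(group, float(count))
```

Splitting shared spectra this way keeps the total weighted count equal to the number of assigned PSMs, so abundant (sub)groups are not inflated by double counting.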

dbHT-Trans: An Efficient Tool for Filtering the Protein-Encoding Transcripts Assembled by RNA-Seq According to Search for Homologous Proteins

Journal of Computational Biology, 2016, Vol 23 (1), pp. 1-9. DOI: 10.1089/cmb.2015.0137

Author(s): Feilong Deng, Shi-Yi Chen

Keyword(s): RNA-Seq, Homologous Proteins, Efficient Tool, Protein Encoding

A Bayesian Approach to Protein Inference Problem in Shotgun Proteomics

Journal of Computational Biology, 2009, Vol 16 (8), pp. 1183-1193. DOI: 10.1089/cmb.2009.0018

Author(s): Yong Fuga Li, Randy J. Arnold, Yixue Li, Predrag Radivojac, Quanhu Sheng, ...

Keyword(s): Bayesian Approach, Shotgun Proteomics, Inference Problem, Protein Inference

Golden Mutagenesis: An efficient multi-site saturation mutagenesis approach by Golden Gate cloning with automated primer design

2018. DOI: 10.1101/453621

Author(s): Pascal Püllmann, Chris Ulpinnis, Sylvestre Marillonnet, Ramona Gruetzner, Steffen Neumann, ...

Keyword(s): Web Application, PCR Amplification, Restriction Enzymes, Primer Design, Golden Gate, Efficient Tool, Multiple Gene, Reading Frame, Cloning Technique, Golden Gate Cloning

Site-directed methods for the generation of genetic diversity are essential tools in the field of directed enzyme evolution. The Golden Gate cloning technique has proven to be an efficient tool for a variety of cloning setups. The use of restriction enzymes that cut outside their recognition site allows the assembly of multiple gene fragments obtained by PCR amplification without altering the open reading frame of the reconstituted gene. We have developed a protocol, termed Golden Mutagenesis, that allows the rapid, straightforward, reliable and inexpensive construction of mutagenesis libraries. One to five amino acid positions within a coding sequence can be altered simultaneously using a protocol that can be performed within one day. To facilitate the implementation of this technique, a software library and web application were developed for automated primer design and for the graphical evaluation of randomization success based on the sequencing results. This enables facile primer design and makes Golden Mutagenesis accessible also to laboratories that are not specialized in molecular biology.
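Because Golden Gate assembly relies on Type IIS enzymes that cut outside their recognition site, each fragment must be free of unintended internal sites for the enzyme used. The Python sketch below checks both strands for BsaI sites (recognition sequence GGTCTC, cutting 1 nt downstream on the top strand and 5 nt downstream on the bottom strand, which leaves a 4-nt overhang). BsaI is chosen only as an example enzyme. The function names are hypothetical and not part of the Golden Mutagenesis software library.

```python
BSAI_SITE = "GGTCTC"  # BsaI recognition sequence on the top strand (example enzyme)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_sites(seq, site=BSAI_SITE):
    """Return sorted (position, strand) tuples for every occurrence of `site`.

    Positions are 0-based on the top strand. A "-" hit means the reverse
    complement of the site was found, i.e. the enzyme binds the bottom strand.
    BsaI's site is not palindromic, so the two searches never double-count.
    """
    seq = seq.upper()
    forward = site.upper()
    hits = []
    for target, strand in ((forward, "+"), (reverse_complement(forward), "-")):
        start = seq.find(target)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(target, start + 1)
    return sorted(hits)


def overhang_after_forward_site(seq, site_start, site=BSAI_SITE):
    """Return the 4-nt overhang left by a top-strand BsaI site at site_start.

    BsaI cuts 1 nt after its site on the top strand and 5 nt after it on the
    bottom strand (GGTCTC(1/5)), so the overhang spans the 4 nt in between.
    """
    cut_top = site_start + len(site) + 1
    return seq.upper()[cut_top:cut_top + 4]
```

For example, `find_sites("AAGGTCTCAAATGCCC")` reports one top-strand site at position 2, and the resulting overhang is `"AATG"`. Hits at unintended positions would need to be removed before assembly.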

ProteinLasso: A Lasso regression approach to protein inference problem in shotgun proteomics

Computational Biology and Chemistry, 2013, Vol 43, pp. 46-54. DOI: 10.1016/j.compbiolchem.2012.12.008

Author(s): Ting Huang, Haipeng Gong, Can Yang, Zengyou He

Keyword(s): Shotgun Proteomics, Lasso Regression, Inference Problem, Protein Inference, Regression Approach

Annotating eukaryotic and toxin-specific signal peptides using Razor

2020. DOI: 10.1101/2020.11.30.405613

Author(s): Bikash K. Bhandari, Paul P. Gardner, Chun Shen Lim

Keyword(s): Signal Peptide, Web Application, Protein Transport, Protein Translocation, Recombinant Protein Expression, Disease Diagnosis, Signal Peptides, Specific Signal, Link Type, Command Line Tool

Motivation: Signal peptides are responsible for protein transport and secretion and are ubiquitous across all forms of life. The annotation of signal peptides is important for understanding protein translocation and toxin secretion, for optimising recombinant protein expression, and for disease diagnosis and metagenomics. Results: Here we explore the features of these signal sequences across eukaryotes. We find that different kingdoms have characteristic distributions of signal peptide residues. Additionally, the signal peptides of secretory toxins share common features across kingdoms. We leverage these subtleties to build Razor, a simple yet powerful tool for annotating signal peptides, which additionally predicts toxin- and fungal-specific signal peptides based on the first 23 N-terminal residues. Finally, we demonstrate the usability of Razor by scanning all reviewed sequences from UniProt: Razor is able to identify toxins using their signal peptide sequences only. Strikingly, we discover that many defensive proteins across kingdoms harbour a toxin-like signal peptide; some of these defensive proteins have emerged through convergent evolution, e.g. the defensin and defensin-like protein families, and phospholipase families. Availability and implementation: Razor is available as a web application (https://tisigner.com/razor) and a command-line tool (https://github.com/Gardner-BinfLab/Razor).
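Razor's predictions are based on the first 23 N-terminal residues. As a hedged illustration of how such a fixed-length window can be turned into model input, the Python sketch below one-hot encodes residues 1 to 23 into a 23 × 20 feature vector. The encoding and the function names are assumptions for illustration. They are not Razor's actual feature set or code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 23  # N-terminal window length stated for Razor


def n_terminal_window(sequence, window=WINDOW):
    """Return the first `window` residues (upper-cased), or None if too short."""
    seq = sequence.upper()
    return seq[:window] if len(seq) >= window else None


def one_hot(window_seq):
    """Flatten a residue window into 0/1 features ordered (position, residue).

    Hypothetical encoding: non-standard residues (e.g. X, U) leave their
    position all zeros.
    """
    features = []
    for residue in window_seq:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features
```

Such vectors could be passed to any classifier. The choice of model and features in Razor itself is described by its authors, not by this sketch.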

Decomposing metabolite set activity levels with PALS

2020. DOI: 10.1101/2020.06.07.138974

Author(s): Karen McLuskey, Joe Wandy, Isabel Vincent, Justin J.J. van der Hooft, Simon Rogers, ...

Keyword(s): Metabolic Pathways, Web Application, Enrichment Analysis, Activity Level, Activity Levels, Experimental Conditions, Metabolomics Data, Command Line Tool, Plant Data, Project Data

Motivation: Related metabolites can be grouped into metabolite sets in many ways, for example through their participation in a series of chemical reactions (forming metabolic pathways), or based on fragmentation spectral similarities and shared chemical substructures. Understanding how such metabolite sets change across samples can be very useful in interpreting complex metabolomics data. However, many of the available tools suitable for the enrichment analysis of metabolite sets are based on simple methods that handle the missing features inherent in untargeted metabolomics measurements poorly, and they can be difficult to integrate into existing applications. Results: We present PALS (Pathway Activity Level Scoring), a Python library, command-line tool and web application that ranks significantly changing metabolite sets across different experimental conditions. As example applications, PALS is used to analyse metabolites grouped as pathways and by common MS/MS fragmentation structures. A comparison of PALS with two other commonly used methods (ORA and GSEA) is also given and reveals that PALS is more robust to missing peaks and noisy data than the alternatives. We report results from using PALS to analyse pathways from a study of Human African Trypanosomiasis. Finally, we report how PALS used tandem MS fragmentation structures to reveal enriched metabolite sets between clades in Rhamnaceae plant data and in American Gut Project data. Availability: PALS is freely available from our project website at https://pals.glasgowcompbio.org/. It can be imported as a Python library, run as a stand-alone tool or used as a web application.
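PALS is compared against over-representation analysis (ORA). For reference, ORA is commonly computed as an upper-tail hypergeometric test. Given N measured metabolites, of which n are significantly changing, it asks how likely it is to see at least k changing metabolites in a set of size K by chance. The Python sketch below illustrates only this comparator method, not PALS scoring, and the function name is hypothetical.

```python
from math import comb


def ora_p_value(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k) for over-representation.

    N: total number of measured metabolites (background)
    K: number of those metabolites that belong to the metabolite set
    n: number of metabolites flagged as significantly changing
    k: number of changing metabolites that fall inside the set
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total
```

Because ORA only looks at membership counts of already-thresholded metabolites, missing peaks directly shrink k. This is one reason the abstract reports PALS as more robust to missing data.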

MegaGO: a fast yet powerful approach to assess functional similarity across meta-omics data sets

2020. DOI: 10.1101/2020.11.16.384834

Author(s): Pieter Verschaffelt, Tim Van Den Bossche, Wassim Gabriel, Michał Burdukiewicz, Alessio Soggiu, ...

Keyword(s): Web Application, Functional Similarity, Data Sets, Link Type, Large Sets, Powerful Approach, Command Line Tool, Complete Set, User Friendly, GO Terms

The study of microbiomes has gained in importance over the past few years and has led to the fields of metagenomics, metatranscriptomics and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute functional similarity between two data sets. MegaGO is highly performant: each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.
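To make "semantic similarity between GO terms" concrete, here is a minimal Python sketch of one common approach. It uses Lin similarity, based on the information content of the most informative common ancestor in the ontology DAG, and combines pairwise scores into a set-level score with a best-match average. Whether this matches MegaGO's exact measure is not stated in the abstract and is an assumption here. The toy ontology and the annotation probabilities are invented for illustration; real use would load the GO graph and annotation frequencies.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a DAG, like GO).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "transport": {"root"},
    "carb_metabolism": {"metabolism"},
    "glycolysis": {"carb_metabolism"},
    "sugar_transport": {"transport", "carb_metabolism"},
}

# Hypothetical probabilities p(t) that a protein is annotated with t or a descendant.
PROB = {
    "root": 1.0,
    "metabolism": 0.5,
    "transport": 0.4,
    "carb_metabolism": 0.2,
    "glycolysis": 0.05,
    "sugar_transport": 0.1,
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: -log p(t)."""
    return -math.log(PROB[term])


def lin(t1, t2):
    """Lin similarity: 2 * IC(MICA) / (IC(t1) + IC(t2))."""
    denom = ic(t1) + ic(t2)
    if denom == 0:
        return 1.0  # both terms are the root
    mica_ic = max(ic(t) for t in ancestors(t1) & ancestors(t2))
    return 2 * mica_ic / denom


def bma(set_a, set_b):
    """Best-match average of pairwise Lin similarities between two term sets."""
    best_a = [max(lin(a, b) for b in set_b) for a in set_a]
    best_b = [max(lin(a, b) for a in set_a) for b in set_b]
    return (sum(best_a) / len(best_a) + sum(best_b) / len(best_b)) / 2
```

Here `lin("glycolysis", "sugar_transport")` is positive because both terms share the informative ancestor `carb_metabolism`, even though exact term matching would score them as unrelated.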

Radiator: a cloud-based framework for deploying re-usable bioinformatics tools

2019. DOI: 10.1101/614594

Author(s): Emily K.W. Lo, Remy M. Schwab, Zak Burke, Patrick Cahan

Keyword(s): User Interfaces, Web Application, Web Applications, Cloud Provider, Web Based, Lightweight Framework, Link Type, Bioinformatics Tools, Command Line Tool, Amazon Web Services

Summary: Accessibility and usability of compute-intensive bioinformatics tools can be increased with simplified web-based graphical user interfaces. However, deploying such tools as web applications presents additional barriers, including the complexity of developing a usable interface, network latency in transferring large datasets, and cost, all of which we encountered in developing a web-based version of our command-line tool CellNet. Learning and generalizing from this experience, we have devised a lightweight framework, Radiator, to facilitate deploying bioinformatics tools as web applications. To achieve reproducibility, usability, consistent accessibility, throughput, and cost-efficiency, Radiator is designed to be deployed on the cloud. Here, we describe the internals of Radiator and how to use it. Availability and Implementation: Code for Radiator and the CellNet web application is freely available at https://github.com/pcahan1 under the MIT license. The CellNet WebApp, Radiator, and Radiator-derived applications can be launched through public Amazon Machine Images from the cloud provider Amazon Web Services (AWS) (https://aws.amazon.com/).

FlashFry: a fast and flexible tool for large-scale CRISPR target design

2017. DOI: 10.1101/189068

Author(s): Aaron McKenna, Jay Shendure

Keyword(s): Web Application, Large Scale, Java Virtual Machine, Flexible Tool, Lightweight Framework, Methods Development, Large Numbers, Genome Wide, Command Line Tool, Target Design

FlashFry is a fast and flexible command-line tool for characterizing large numbers of CRISPR target sequences. While several CRISPR web applications exist, genome-wide knockout studies, noncoding deletion scans, and other large-scale studies or methods development projects require a simple and lightweight framework that can quickly discover and score thousands of candidate guides targeting an arbitrary DNA sequence. With FlashFry, users can specify an unconstrained number of mismatches to putative off-targets, richly annotate discovered sites, and tag potential guides with commonly used on-target and off-target scoring metrics. FlashFry runs at speeds comparable to widely used genome-wide sequence aligners, and output is provided as an easy-to-manipulate text file. Availability: FlashFry is written in Scala and bundled as a stand-alone JAR file, easily run on any system with an installed Java virtual machine (JVM). The tool is freely licensed under version 3 of the GPL, and code, documentation, and tutorials are available on the GitHub page: http://aaronmck.github.io/FlashFry/
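Scoring putative off-targets starts with counting mismatches between a guide and candidate sites. The following is a minimal brute-force Python sketch of that idea. It assumes SpCas9-style 20-nt protospacers followed by an NGG PAM and scans the forward strand only. FlashFry is built for far higher throughput than this naive scan, and the function names here are hypothetical, not FlashFry's API.

```python
import re


def candidate_sites(seq, protospacer_len=20):
    """Yield (position, protospacer) for each site followed by an NGG PAM.

    Forward strand only. The lookahead lets overlapping sites all be reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % protospacer_len)
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1)


def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def off_targets(guide, seq, max_mismatches=3):
    """Return (position, site, mismatch count) for sites within max_mismatches of guide."""
    guide = guide.upper()
    hits = []
    for pos, site in candidate_sites(seq, len(guide)):
        mm = mismatches(guide, site)
        if mm <= max_mismatches:
            hits.append((pos, site, mm))
    return hits
```

Raising `max_mismatches` mirrors the "unconstrained number of mismatches" option described above, at the cost of many more reported sites.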

VIQoR: a web service for Visually supervised protein Inference and protein Quantification

2021. DOI: 10.1101/2021.06.01.446512

Author(s): Vasileios Tsiamis, Veit Schwammle

Keyword(s): Quantitative Analysis, Web Service, Quantitative Data, Missing Values, Interactive Visualization, Weighted Average, Protein Quantification, Inference Problem, Protein Inference, Concentration Changes

Motivation: In quantitative bottom-up mass spectrometry (MS)-based proteomics, the reliable estimation of protein concentration changes from peptide quantifications between different biological samples is essential. This estimation is not a single task but comprises two processes: protein inference and protein abundance summarization. Furthermore, due to the high complexity of proteomics data and the associated uncertainty about the performance of these processes, there is a demand for comprehensive visualization methods able to integrate protein with peptide quantitative data, including their post-translational modifications. However, a suitable tool that provides post-identification quantitative analysis of proteins with simultaneous interactive visualization has been lacking. Results: In this article, we present VIQoR, a user-friendly web service that accepts peptide quantitative data from both labeled and label-free experiments and performs relative protein quantification, along with interactive visualization modules, including the novel VIQoR plot. We implemented two parsimonious algorithms to solve the protein inference problem, while protein summarization is performed by a well-established factor analysis algorithm called fast-FARMS followed by a weighted average summarization function that minimizes the effect of missing values. In addition, summarization is optimized by the so-called Global Correlation Indicator (GCI). We test the tool on three publicly available ground truth datasets and demonstrate the ability of the protein inference algorithms to handle degenerate peptides. We furthermore show that GCI increases the accuracy of the quantitative analysis in data sets with replicated design. Availability and implementation: VIQoR is accessible at http://computproteomics.bmb.sdu.dk:8192/app_direct/VIQoR/. The source code is available at https://bitbucket.org/vtsiamis/viqor/.
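The abstract does not specify VIQoR's two parsimonious algorithms. For orientation, here is a minimal Python sketch of one common parsimony heuristic. It repeatedly selects the protein that explains the most not-yet-explained peptides until every observed peptide is covered, which is a greedy approximation to minimum set cover. Degenerate (shared) peptides are then attributed to the proteins already selected. This is a generic illustration, not VIQoR's code, and the function name is hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Return a small list of proteins whose peptides explain all observed peptides.

    protein_to_peptides: dict mapping protein ID -> set of peptide sequences.
    Ties are broken by protein ID so the result is deterministic.
    """
    uncovered = set().union(*protein_to_peptides.values())
    selected = []
    while uncovered:
        # max() returns the first maximal element, i.e. the smallest ID on ties.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        selected.append(best)
        uncovered -= protein_to_peptides[best]
    return selected
```

With `{"P1": {"a", "b", "c"}, "P2": {"b", "c"}, "P3": {"c", "d"}}`, the sketch selects `["P1", "P3"]`. `P2` is dropped because its peptides are fully explained by `P1`, so the degenerate peptide `c` does not force an extra protein into the result.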
