MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction

Nan Li; Richard I. Ainsworth; Meixin Wu; Bo Ding; Wei Wang

doi:10.1093/bioinformatics/btv666

MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction

Bioinformatics, 2015, Vol 32 (6), pp. 940-942. doi:10.1093/bioinformatics/btv666. Cited By ~ 5

Author(s): Nan Li; Richard I. Ainsworth; Meixin Wu; Bo Ding; Wei Wang

Keyword(s): Small Molecules; Supplementary Information; Protein Recognition; Peptide Ligand; Ligand Interaction; Post Translational Modifications; Interaction Prediction; Recognition Specificity; Automated Pipeline; User Friendly

Abstract Motivation: MIEC-SVM is a structure-based method for predicting protein recognition specificity. Here, we present an automated MIEC-SVM pipeline providing an integrated and user-friendly workflow for construction and application of the MIEC-SVM models. This pipeline can handle standard amino acids, amino acids with post-translational modifications (PTMs), and small molecules. Moreover, multi-threading and support for Sun Grid Engine (SGE) are implemented to significantly boost computational efficiency. Availability and implementation: The program is available at http://wanglab.ucsd.edu/MIEC-SVM. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


MetumpX—a metabolomics support package for untargeted mass spectrometry

Bioinformatics, 2019, Vol 36 (5), pp. 1647-1648. doi:10.1093/bioinformatics/btz765. Cited By ~ 1

Author(s): Bilal Wajid; Hasan Iqbal; Momina Jamil; Hafsa Rafique; Faria Anwar

Keyword(s): Mass Spectrometry; Data Analysis; Small Molecules; Software Package; Life Sciences; Supplementary Information; Supplementary Data; Software Packages; Develop Software; User Friendly

Abstract Motivation: Metabolomics is a data analysis and interpretation field that aims to study the functions of small molecules within the organism. Consequently, metabolomics requires life-science researchers to be comfortable downloading, installing and scripting software that is mostly not user friendly and lacks basic GUIs. As researchers struggle with these skills, there is a dire need for software packages that can automatically install software pipelines, shortening the learning curve for building software workstations. Therefore, this paper presents MetumpX, a software package that eases the installation of 103 software tools by automatically resolving their individual dependencies while allowing users to choose which software works best for them. Results: MetumpX is an Ubuntu-based software package that facilitates easy download and installation of 103 tools spread across the standard metabolomics pipeline. As far as the authors know, MetumpX is the only solution of its kind where the focus lies on automating the development of software workstations. Availability and implementation: https://github.com/hasaniqbal777/MetumpX-bin. Supplementary information: Supplementary data are available at Bioinformatics online.


iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features

Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa702. Cited By ~ 1

Author(s): Dan Zhang; Zhao-Chun Xu; Wei Su; Yu-He Yang; Hao Lv; ...

Keyword(s): Protein Carbonylation; Supplementary Information; Sequence Information; Biological Processes; Feature Selection Technique; Post Translational Modifications; Proposed Model; Feature Encoding; Feature Encoding Scheme; User Friendly

Abstract Motivation: Protein carbonylation is one of the most important oxidative stress-induced post-translational modifications and is generally characterized by stability, irreversibility and relatively early formation. It plays a significant role in orchestrating various biological processes and has already been shown to be related to many diseases. However, the experimental technologies for identifying carbonylation sites are not only costly and time consuming, but also incapable of processing a large number of proteins at a time. Thus, rapidly and effectively identifying carbonylation sites by computational methods will provide key clues for analyzing the occurrence and development of diseases. Results: In this study, we developed a predictor called iCarPS to identify carbonylation sites based on sequence information. A novel feature encoding scheme, residues conical coordinates combined with their physicochemical properties, was proposed to formulate carbonylated and non-carbonylated protein samples. To remove potentially redundant features and improve prediction performance, a feature selection technique was used. The accuracy and robustness of iCarPS were demonstrated by experiments on training and independent datasets. Comparison with other published methods showed that the proposed method performs strongly in carbonylation site identification. Availability and implementation: Based on the proposed model, a user-friendly webserver and a software package were constructed, which can be freely accessed at http://lin-group.cn/server/iCarPS. Supplementary information: Supplementary data are available at Bioinformatics online.


Clin-mNGS: Automated Pipeline for Pathogen Detection from Clinical Metagenomic Data

Current Bioinformatics, 2020, Vol 15. doi:10.2174/1574893615999200608130029

Author(s): Akshatha Prasanna; Vidya Niranjan

Keyword(s): Antimicrobial Resistance; High Performance; Pathogen Detection; Bacterial Species; Workflow Management; Metagenomic Data; Antimicrobial Resistance Genes; Culture Independent; Automated Pipeline; User Friendly

Background: Since bacteria are the earliest known organisms, there has been significant interest in their variety and biology, particularly as they concern human health. Recent advances in metagenomic sequencing (mNGS), a culture-independent sequencing technology, have accelerated developments in clinical microbiology and our understanding of pathogens. Objective: For the implementation of mNGS in routine clinical practice to become feasible, a practical and scalable strategy for the study of mNGS data is essential. This study presents a robust automated pipeline to analyze clinical metagenomic data for pathogen identification and classification. Method: The proposed Clin-mNGS pipeline is an integrated, open-source, scalable, reproducible, and user-friendly framework scripted using the Snakemake workflow management software. The implementation avoids the hassle of manually installing and configuring multiple command-line tools and their dependencies. The approach directly screens pathogens from clinical raw reads and generates consolidated reports for each sample. Results: The pipeline is demonstrated using publicly available data and is tested on a desktop Linux system and a high-performance cluster. The study compares variability in results from different tools and versions. The tool versions are user modifiable. The pipeline outputs quality checks, filtered reads, host subtraction, assembled contigs, assembly metrics, relative abundances of bacterial species, antimicrobial resistance genes, plasmid finding, and virulence factor identification. The results obtained from the pipeline are evaluated based on sensitivity and positive predictive value. Conclusion: Clin-mNGS is an automated Snakemake pipeline validated for the analysis of microbial clinical metagenomics reads to perform taxonomic classification and antimicrobial resistance prediction.
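The two evaluation metrics named in this abstract, sensitivity and positive predictive value (PPV), can be illustrated with a minimal Python sketch. It assumes that evaluation compares the set of species a pipeline reports against a known truth set; the function name `sensitivity_and_ppv` and the example species lists are hypothetical and are not taken from the Clin-mNGS paper.

```python
def sensitivity_and_ppv(expected, detected):
    """Return (sensitivity, PPV) for detected taxa against a known truth set."""
    expected, detected = set(expected), set(detected)
    tp = len(expected & detected)  # true positives: expected and reported
    fn = len(expected - detected)  # false negatives: expected but missed
    fp = len(detected - expected)  # false positives: reported but not expected
    # Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical example data (an assumption for illustration only).
truth = {"Escherichia coli", "Klebsiella pneumoniae",
         "Staphylococcus aureus", "Pseudomonas aeruginosa"}
called = {"Escherichia coli", "Klebsiella pneumoniae",
          "Staphylococcus aureus", "Enterococcus faecalis"}

sens, ppv = sensitivity_and_ppv(truth, called)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this made-up case three of four expected species are recovered and one reported species is spurious, so both metrics come out at 0.75.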


Global mapping of protein–metabolite interactions in Saccharomyces cerevisiae reveals that Ser-Leu dipeptide regulates phosphoglycerate kinase activity

Communications Biology, 2021, Vol 4 (1). doi:10.1038/s42003-021-01684-3

Author(s): Marcin Luzarowski; Rubén Vicente; Andrei Kiselev; Mateusz Wagner; Dennis Schlossarek; ...

Keyword(s): Saccharomyces Cerevisiae; Small Molecules; Phosphoglycerate Kinase; Glycolytic Enzyme; Diauxic Shift; Cellular Processes; Targeted Analysis; Global Mapping; Biochemical Approach; User Friendly

Abstract Protein–metabolite interactions are of crucial importance for all cellular processes but remain understudied. Here, we applied a biochemical approach named PROMIS to address the complexity of the protein–small molecule interactome in the model yeast Saccharomyces cerevisiae. By doing so, we provide a unique dataset, which can be queried for interactions between 74 small molecules and 3982 proteins using a user-friendly interface available at https://promis.mpimp-golm.mpg.de/yeastpmi/. By interpolating PROMIS with the list of predicted protein–metabolite interactions, we provided experimental validation for 225 binding events. Remarkably, of the 74 small molecules co-eluting with proteins, 36 were proteogenic dipeptides. Targeted analysis of a representative dipeptide, Ser-Leu, revealed numerous protein interactors comprising chaperones, proteasomal subunits, and metabolic enzymes. We could further demonstrate that Ser-Leu binding increases the activity of the glycolytic enzyme phosphoglycerate kinase (Pgk1). Consistent with the binding analysis, Ser-Leu supplementation leads to acute metabolic changes and delays the timing of the diauxic shift. Supported by the dipeptide accumulation analysis, our work attests to the role of Ser-Leu as a metabolic regulator at the interface of protein degradation and central metabolism.


Ribo-ODDR: Oligo design pipeline for experiment-specific rRNA depletion in ribo-seq

Bioinformatics, 2021. doi:10.1093/bioinformatics/btab171

Author(s): Ferhat Alkan; Joana Silva; Eric Pintó Barberà; William J Faller

Keyword(s): Ribosome Profiling; Supplementary Information; Experimental Conditions; Computational Framework; RNA Translation; rRNA Depletion; Selection For; Nucleotide Resolution; User Friendly; Oligo Design

Abstract Motivation: Ribosome profiling (Ribo-seq) has revolutionized the study of RNA translation by providing information on ribosome positions across all translated RNAs at nucleotide resolution. Yet several technical limitations restrict the sequencing depth of such experiments, the most common of which is the overabundance of rRNA fragments. Various strategies can be employed to tackle this issue, including the use of commercial rRNA depletion kits. However, as these kits are designed for more standardized RNA-seq experiments, they may perform suboptimally in Ribo-seq. To overcome this, it is possible to use custom biotinylated oligos complementary to the most abundant rRNA fragments; however, no computational framework currently exists to aid the design of optimal oligos. Results: Here, we first show that a major confounding issue is that the rRNA fragments generated via Ribo-seq vary significantly with differing experimental conditions, suggesting that a “one-size-fits-all” approach may be inefficient. Therefore, we developed Ribo-ODDR, an oligo design pipeline integrated with a user-friendly interface that assists in oligo selection for efficient experiment-specific rRNA depletion. Ribo-ODDR uses preliminary data to identify the most abundant rRNA fragments and calculates the rRNA depletion efficiency of potential oligos. We experimentally show that Ribo-ODDR-designed oligos outperform commercially available kits and lead to a significant increase in rRNA depletion in Ribo-seq. Availability: Ribo-ODDR is freely accessible at https://github.com/fallerlab/Ribo-ODDR. Supplementary information: Supplementary data are available at Bioinformatics online.
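To make the idea of scoring a candidate oligo by its depletion efficiency concrete, here is a minimal Python sketch. It is not Ribo-ODDR's actual scoring scheme, which the abstract does not describe. It assumes, purely for illustration, that rRNA fragments from pilot data are given as (start, end, read count) intervals on one rRNA, and that a fragment counts as depletable when it overlaps the oligo target by at least `min_overlap` nucleotides. The function name, the overlap threshold and the example numbers are all hypothetical.

```python
def depletion_fraction(fragments, oligo_start, oligo_end, min_overlap=10):
    """Estimate the share of rRNA reads that one oligo could capture.

    fragments: iterable of (start, end, read_count), half-open coordinates
               on a single rRNA (hypothetical input format).
    A fragment is treated as depleted when it overlaps the oligo target
    region [oligo_start, oligo_end) by at least min_overlap nucleotides.
    """
    total = 0
    depleted = 0
    for start, end, count in fragments:
        total += count
        # Length of the shared interval; negative when the two are disjoint.
        overlap = min(end, oligo_end) - max(start, oligo_start)
        if overlap >= min_overlap:
            depleted += count
    return depleted / total if total else 0.0


# Hypothetical pilot data: three rRNA fragment species with read counts.
frags = [(100, 130, 5000), (120, 150, 3000), (400, 430, 2000)]

# A candidate oligo targeting positions 105-130 on the same rRNA.
print(depletion_fraction(frags, 105, 130))
```

Ranking candidate oligos by such a fraction across several pilot samples is one simple way to see why fragment profiles that differ between experimental conditions would favor different oligos.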


CPVA: a web-based metabolomic tool for chromatographic peak visualization and annotation

Bioinformatics, 2020, Vol 36 (12), pp. 3913-3915. doi:10.1093/bioinformatics/btaa200

Author(s): Hemi Luan; Xingen Jiang; Fenfen Ji; Zhangzhang Lan; Zongwei Cai; ...

Keyword(s): False Positive; Supplementary Information; Liquid Chromatography Mass Spectrometry; Targeted Metabolomics; Metabolomics Data; Web Based; Tremendous Amount; Chromatographic Peaks; User Friendly

Abstract Motivation: Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous number of metabolite signals in complex biological samples. However, many popular software tools commonly detect false-positive peaks in the datasets as metabolite signals, resulting in unreliable measurements. Results: To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify peak background noise and contaminants, reducing false-positive and redundant peak calling and thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation: CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information: Supplementary data are available at Bioinformatics online.


Streamlining the GIS to CAD Workflow for Automated Pipeline Alignment Sheet Generation

Volume 2: Pipeline Safety Management Systems; Project Management, Design, Construction, and Environmental Issues; Strain Based Design; Risk and Reliability; Northern, Offshore, and Production Pipelines, 2020. doi:10.1115/ipc2020-9673

Author(s): Gyanendra Gurung; Kshama Roy

Keyword(s): Information System; Geographic Information System; Spatial Analysis; Cost Effective; Systematic Errors; Effective Solution; GIS Integration; Automated Pipeline; Central Database; User Friendly

Abstract The use of Geographic Information Systems (GIS) in managing pipeline databases and automating routine engineering processes has become standard practice in the pipeline industry. While maintaining a central database provides security, integrity, and easy management of data throughout the pipeline’s lifecycle, GIS enables spatial analysis of pipeline data in addition to streamlining access to and visualization of results. One of the major benefits of GIS integration lies in the ease of automating alignment sheet generation for pipelines. This paper introduces a simplified pipeline alignment sheet generation workflow that uses GIS datasets to produce highly customizable alignment sheets in AutoCAD, a much-preferred format in the pipeline industry. By utilizing existing GIS and AutoCAD features to generate the alignment sheet, the need to write complicated geo-processing or plotting algorithms is minimized, which in turn reduces the risk of introducing systematic errors. This robust and user-friendly workflow not only ensures safety but also leads to a cost-effective solution.


MetaADEDB 2.0: a comprehensive database on adverse drug events

Bioinformatics, 2020. doi:10.1093/bioinformatics/btaa973

Author(s): Zhuohang Yu; Zengrui Wu; Weihua Li; Guixia Liu; Yun Tang

Keyword(s): Safety Assessment; Adverse Drug Events; Adverse Event Reporting System; Adverse Event Reporting; Supplementary Information; Online Database; Web Interface; Drug Discovery And Development; Comprehensive Information; User Friendly

Abstract Summary: MetaADEDB is an online database we developed to integrate comprehensive information on adverse drug events (ADEs). The first version of MetaADEDB was released in 2013 and has been widely used by researchers. However, it has not been updated for more than seven years. Here, we report its second version, which collects more and newer data from the U.S. FDA Adverse Event Reporting System (FAERS) and the Canada Vigilance Adverse Reaction Online Database, in addition to the original three sources. The new version consists of 744 709 drug–ADE associations between 8498 drugs and 13 193 ADEs, an increase of over 40% in drug–ADE associations compared to the previous version. We also developed a new, user-friendly web interface for data search and analysis. We hope that MetaADEDB 2.0 will provide a useful tool for drug safety assessment and related studies in drug discovery and development. Availability and implementation: The database is freely available at: http://lmmd.ecust.edu.cn/metaadedb/. Supplementary information: Supplementary data are available at Bioinformatics online.


PathScore: a web tool for identifying altered pathways in cancer data

2016. doi:10.1101/067090. Cited By ~ 2

Author(s): Stephen G. Gaffney; Jeffrey P. Townsend

Keyword(s): Web Application; Somatic Mutations; Supplementary Information; Web Tool; Cancer Data; Link Type; Novel Approach; Supplementary Material; User Friendly; Pathway Effect

Abstract Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact: [email protected] Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


COVID-Align: Accurate online alignment of hCoV-19 genomes using a profile HMM

2020. doi:10.1101/2020.05.25.114884. Cited By ~ 2

Author(s): Frédéric Lemoine; Luc Blassel; Jakub Voznica; Olivier Gascuel

Keyword(s): Daily Basis; Supplementary Information; Summary Statistics; Evolutionary Novelty; Bioinformatics Analyses; Link Type; Sequencing Quality; User Friendly; Profile HMM; New Mutations

Abstract Motivation: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1,000 genomes requires less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/[email protected], [email protected] Supplementary information: Supplementary information is available at Bioinformatics online.
