PPaxe: easy extraction of protein occurrence and interactions from the scientific literature

2018 ◽  
Vol 35 (14) ◽  
pp. 2523-2524 ◽  
Author(s):  
S Castillo-Lara ◽  
J F Abril

Abstract. Motivation: Protein–protein interactions (PPIs) are essential for building models that explain many biological processes. Although several databases hold many of these interactions, exploring them, selecting those relevant for a given subject and contextualizing them can be a difficult task for researchers. Extracting PPIs directly from the scientific literature can help provide such context, as the sentences describing these interactions can give researchers additional insight. Results: We have developed PPaxe, a Python module and a web application that allows users to extract PPIs and protein occurrence from a given set of PubMed and PubMed Central articles. It presents the results of the analysis in different ways to help researchers export, filter and analyze the results easily. Availability and implementation: PPaxe web demo is freely available at https://compgen.bio.ub.edu/PPaxe. All the software can be downloaded from https://compgen.bio.ub.edu/PPaxe/download, including a command-line version and docker containers for an easy installation. Supplementary information: Supplementary data are available at Bioinformatics online.

Author(s):  
Sunil Nagpal ◽  
Bhusan K Kuntal ◽  
Sharmila S Mande

Abstract. Motivation: Venn diagrams are frequently used to compare the composition of datasets (e.g. datasets containing lists of proteins and genes). Network diagrams constructed from such datasets are usually generated from a ‘list of edges’, popularly known as an edge-list. An edge-list and the corresponding network are, however, composed of two elements, namely edges (e.g. protein–protein interactions) and nodes (e.g. proteins). Researchers often use individual lists of edges and nodes to compare the composition of biological networks using existing Venn diagram tools. However, specialized analysis workflows are required to compare nodes as well as edges. In addition, different tools or graph libraries are needed to visualize specific edges of interest (e.g. protein–protein interactions that are present across all networks, shared between a subset of networks, or exclusive to a selected network). Further, these results often need to be exported as publication-worthy network diagram(s), particularly for small networks. Results: We introduce a server-independent JavaScript framework, NetSets.js, that integrates popular Venn and network diagrams in a single application. We also present a free-to-use, intuitive web application built on NetSets.js and specifically designed to perform both compositional comparisons (e.g. identifying common/exclusive edges or nodes) and interactive, user-defined visualizations of networks (for the identified common/exclusive interactions across multiple networks) using simple edge-lists. The tool also enables connection to the Cytoscape desktop application using the Netsets-Cyapp. We demonstrate the utility of our tool using real-world biological networks (microbiome, gene interaction, multiplex and protein–protein interaction networks). Availability and implementation: http://web.rniapps.net/netsets (freely available for academic use). Supplementary information: Supplementary data are available at Bioinformatics online.
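The compositional comparison described in this abstract (common versus exclusive edges and nodes across several edge-lists) can be illustrated with plain set operations. The following is a minimal Python sketch, not the NetSets.js API: the function names, the network names and the protein identifiers are all hypothetical, and edges are assumed to be undirected.

```python
from itertools import chain


def normalize_edge(a, b):
    """Assumption: edges are undirected, so (A, B) and (B, A) are the same edge."""
    return tuple(sorted((a, b)))


def split_common_exclusive(sets_by_name):
    """Return (items present in every set, items found in exactly one named set)."""
    common = set.intersection(*sets_by_name.values())
    exclusive = {}
    for name, items in sets_by_name.items():
        others = set().union(*(s for n, s in sets_by_name.items() if n != name))
        exclusive[name] = items - others
    return common, exclusive


def compare_networks(edge_lists):
    """Compare named edge-lists at both the edge level and the node level."""
    edges = {name: {normalize_edge(a, b) for a, b in pairs}
             for name, pairs in edge_lists.items()}
    # Nodes are derived from the edges, so isolated nodes are not represented.
    nodes = {name: set(chain.from_iterable(es)) for name, es in edges.items()}
    common_edges, exclusive_edges = split_common_exclusive(edges)
    common_nodes, exclusive_nodes = split_common_exclusive(nodes)
    return {
        "common_edges": common_edges,
        "exclusive_edges": exclusive_edges,
        "common_nodes": common_nodes,
        "exclusive_nodes": exclusive_nodes,
    }


# Hypothetical toy networks, for illustration only.
networks = {
    "net1": [("P1", "P2"), ("P2", "P3"), ("P3", "P4")],
    "net2": [("P2", "P1"), ("P3", "P4"), ("P4", "P5")],
}
result = compare_networks(networks)
print(sorted(result["common_edges"]))
print(result["exclusive_edges"])
```

Comparing nodes and edges separately matters because two networks can share all of their nodes while differing in how those nodes are connected, as the toy example shows for `net1`.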


Author(s):  
Emma H Gail ◽  
Anup D Shah ◽  
Ralf B Schittenhelm ◽  
Chen Davidovich

Abstract. Summary: Unbiased detection of protein–protein and protein–RNA interactions within ribonucleoprotein complexes is enabled through crosslinking followed by mass spectrometry. Yet, different methods detect different types of molecular interactions and therefore require the use of different software packages with limited compatibility. We present crisscrosslinkeR, an R package that maps both protein–protein and protein–RNA interactions detected by different types of approaches for crosslinking with mass spectrometry. crisscrosslinkeR produces output files that are compatible with visualization using popular software packages for the generation of publication-quality figures. Availability and implementation: crisscrosslinkeR is a free and open-source package, available through GitHub: github.com/egmg726/crisscrosslinker. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Nils Kurzawa ◽  
André Mateus ◽  
Mikhail M Savitski

Abstract. Summary: Rtpca is an R package implementing methods for inferring protein–protein interactions (PPIs) based on thermal proteome profiling experiments of a single condition or in a differential setting via an approach called thermal proximity coaggregation. It offers user-friendly tools to explore datasets for their PPI predictive performance and easily integrates with available R packages. Availability and implementation: Rtpca is available from Bioconductor (https://bioconductor.org/packages/Rtpca). Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 17 (4) ◽  
pp. 271-286
Author(s):  
Chang Xu ◽  
Limin Jiang ◽  
Zehua Zhang ◽  
Xuyao Yu ◽  
Renhai Chen ◽  
...  

Background: Protein-Protein Interactions (PPIs) play a key role in various biological processes. Many methods have been developed to predict protein-protein interactions and protein interaction networks. However, many existing applications are limited because they rely on a large number of homologous proteins and interaction marks. Methods: In this paper, we propose a novel integrated learning approach (RF-Ada-DF) with a sequence-based feature representation for identifying protein-protein interactions. Our method first constructs a sequence-based feature vector to represent each pair of proteins, via Multivariate Mutual Information (MMI) and Normalized Moreau-Broto Autocorrelation (NMBAC). Then, we feed the 638-dimensional features into an integrated learning model that distinguishes interacting pairs from non-interacting pairs. This integrated model embeds Random Forest in the AdaBoost framework and turns weak classifiers into a single strong classifier. We also employ double fault detection in order to suppress over-adaptation during the training process. Results: To evaluate the performance of our method, we conduct several comprehensive tests for PPI prediction. On the H. pylori dataset, our method achieves 88.16% accuracy and 87.68% sensitivity, an increase in accuracy of 0.57%. On the S. cerevisiae dataset, our method achieves 95.77% accuracy and 93.36% sensitivity, an increase in accuracy of 0.76%. On the human dataset, our method achieves 98.16% accuracy and 96.80% sensitivity, an increase in accuracy of 0.6%. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. The datasets and code are available at https://github.com/guofei-tju/RF-Ada-DF.git.
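For readers comparing the figures above, accuracy and sensitivity follow the standard definitions based on confusion-matrix counts. The sketch below uses those textbook definitions in Python; the counts are hypothetical and are not taken from the paper or its datasets.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all protein pairs classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of truly interacting pairs that were predicted as interacting."""
    return tp / (tp + fn)


# Hypothetical counts for illustration only (not results from RF-Ada-DF).
tp, tn, fp, fn = 90, 85, 15, 10
acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
print(f"accuracy = {acc:.2%}, sensitivity = {sens:.2%}")
```

Because sensitivity ignores true negatives, a method can score high accuracy on a dataset dominated by non-interacting pairs while still missing many real interactions, which is why both numbers are reported.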


Author(s):  
Qianmu Yuan ◽  
Jianwen Chen ◽  
Huiying Zhao ◽  
Yaoqi Zhou ◽  
Yuedong Yang

Abstract. Motivation: Protein–protein interactions (PPI) play crucial roles in many biological processes, and identifying PPI sites is an important step for mechanistic understanding of diseases and design of novel drugs. Since experimental approaches for PPI site identification are expensive and time-consuming, many computational methods have been developed as screening tools. However, these methods are mostly based on features of neighboring residues in sequence, and are thus limited in capturing spatial information. Results: We propose GraphPPIS (deep Graph convolutional network for Protein–Protein-Interacting Site prediction), a deep graph-based framework in which the PPI site prediction problem is converted into a graph node classification task and solved by deep learning using the initial residual and identity mapping techniques. We showed that a deeper architecture (up to eight layers) allows significant performance improvement over other sequence-based and structure-based methods by more than 12.5% and 10.5% on AUPRC and MCC, respectively. Further analyses indicated that the interacting sites predicted by GraphPPIS are more spatially clustered and closer to the native ones even when false-positive predictions are made. The results highlight the importance of capturing spatially neighboring residues for interacting site prediction. Availability and implementation: The datasets, the pre-computed features, and the source code along with the pre-trained models of GraphPPIS are available at https://github.com/biomed-AI/GraphPPIS. The GraphPPIS web server is freely available at https://biomed.nscc-gz.cn/apps/GraphPPIS. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (19) ◽  
pp. 4846-4853 ◽  
Author(s):  
Yan Wang ◽  
Miguel Correa Marrero ◽  
Marnix H Medema ◽  
Aalt D J van Dijk

Abstract. Motivation: Polyketide synthases (PKSs) are enzymes that generate diverse molecules of great pharmaceutical importance, including a range of clinically used antimicrobials and antitumor agents. Many polyketides are synthesized by cis-AT modular PKSs, which are organized in assembly lines, in which multiple enzymes line up in a specific order. This order is defined by specific protein–protein interactions (PPIs). The unique modular structure and catalytic mechanism of these assembly lines make their products predictable and have also spurred combinatorial biosynthesis studies to produce novel polyketides using synthetic biology. However, predicting the interactions of PKSs, and thereby inferring the order of their assembly line, is still challenging, especially for cases in which this order is not reflected by the ordering of the PKS-encoding genes in the genome. Results: Here, we introduce PKSpop, which uses a coevolution-based PPI algorithm to infer protein order in PKS assembly lines. Our method accurately predicts protein orders (93% accuracy). Additionally, we identify new residue pairs that are key in determining interaction specificity, and show that coevolution of N- and C-terminal docking domains of PKSs is significantly more predictive for PPIs than coevolution between ketosynthase and acyl carrier protein domains. Availability and implementation: The code is available on http://www.bif.wur.nl/ (under ‘Software’). Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (21) ◽  
pp. 4525-4527 ◽  
Author(s):  
Alex X Lu ◽  
Taraneh Zarin ◽  
Ian S Hsu ◽  
Alan M Moses

Abstract. Summary: We introduce YeastSpotter, a web application for the segmentation of yeast microscopy images into single cells. YeastSpotter is user-friendly and generalizable, reducing the computational expertise required for this critical preprocessing step in many image analysis pipelines. Availability and implementation: YeastSpotter is available at http://yeastspotter.csb.utoronto.ca/. Code is available at https://github.com/alexxijielu/yeast_segmentation. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (10) ◽  
pp. 3263-3265 ◽  
Author(s):  
Lucas Czech ◽  
Pierre Barbera ◽  
Alexandros Stamatakis

Abstract. Summary: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (21) ◽  
pp. 4405-4407 ◽  
Author(s):  
Steven Monger ◽  
Michael Troup ◽  
Eddie Ip ◽  
Sally L Dunwoodie ◽  
Eleni Giannoulatou

Abstract. Motivation: In silico prediction tools are essential for identifying variants which create or disrupt cis-splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants. Results: We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer. Availability and implementation: Spliceogen is available as a command line tool which accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github.com/VCCRI/Spliceogen). SNV databases with prediction scores are also available, covering all possible SNVs at all genomic positions within all Gencode-annotated multi-exon transcripts. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Michael Milton ◽  
Natalie Thorne

Abstract. Summary: aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation: The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information: Supplementary data are available at Bioinformatics online.

