Algorithms for protein interaction networks

Background: Identifying protein functions is important for many biological applications. Since experimental functional characterization of proteins is time-consuming and costly, accurate and efficient computational methods for predicting protein functions are in great demand for generating testable hypotheses to guide large-scale experiments. Results: Here, we propose Graph2GO, a multi-modal graph-based representation learning model that can integrate heterogeneous information, including multiple types of interaction networks (sequence similarity network and protein-protein interaction network) and protein features (amino acid sequence, subcellular location, and protein domains), to predict protein functions on the Gene Ontology. Comparing Graph2GO to BLAST as a baseline model and to two popular protein function prediction methods (Mashup and deepNF), we demonstrate that our model can achieve state-of-the-art performance. We show the robustness of our model by testing on multiple species. We also provide a web server supporting function query and downstream analysis on the fly. Conclusions: Graph2GO is the first model that has utilized attributed network representation learning methods to model both interaction networks and protein features for predicting protein functions, and it achieved promising performance. Our model can be easily extended to include more protein features to further improve the performance. Graph2GO is also applicable to other scenarios involving biological networks, and the learned latent representations can be used as feature inputs for machine learning tasks in various downstream analyses.

Analysis of Protein-Protein Interaction Networks through Computational Approaches

Protein and Peptide Letters ◽

10.2174/0929866526666191105142034 ◽

2020 ◽

Vol 27 (4) ◽

pp. 265-278 ◽

Cited By ~ 1

Author(s):

Ying Han ◽

Liang Cheng ◽

Weiju Sun

Keyword(s):

Protein Interaction ◽

Biological Networks ◽

Interaction Networks ◽

Computational Techniques ◽

Cellular Functions ◽

Protein Protein Interaction ◽

Comprehensive Information ◽

Protein Interaction Prediction ◽

Or Gene ◽

Experimental Findings

The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at the protein or gene level can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine interaction data to obtain comprehensive information about biological pathways. By efficiently integrating experimental findings on protein-protein interactions (PPIs) with computational prediction techniques, researchers have gained much valuable data on PPIs, including several advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. Here we review and list the computational methods, databases, and tools for PPI prediction.

Universal Screening Methods and Applications of ThermoFluor®

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057106292746 ◽

2006 ◽

Vol 11 (7) ◽

pp. 854-863 ◽

Cited By ~ 124

Author(s):

Maxwell D. Cummings ◽

Michael A. Farnum ◽

Marina I. Nelen

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Protein Unfolding ◽

Direct Detection ◽

Functional Characterization ◽

Screening Methods ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Bacterial Enzyme ◽

Research Problems

The genomics revolution has unveiled a wealth of poorly characterized proteins. Scientists are often able to produce milligram quantities of proteins for which function is unknown or hypothetical, based only on very distant sequence homology. Broadly applicable tools for functional characterization are essential to the illumination of these orphan proteins. An additional challenge is the direct detection of inhibitors of protein-protein interactions (and allosteric effectors). Both of these research problems are relevant to, among other things, the challenge of finding and validating new protein targets for drug action. Screening collections of small molecules has long been used in the pharmaceutical industry as one method of discovering drug leads. Screening in this context typically involves a function-based assay. Given a sufficient quantity of a protein of interest, significant effort may still be required for functional characterization, assay development, and assay configuration for screening. Increasingly, techniques are being reported that facilitate screening for specific ligands for a protein of unknown function. Such techniques also allow for function-independent screening with better characterized proteins. ThermoFluor®, a screening instrument based on monitoring ligand effects on temperature-dependent protein unfolding, can be applied when protein function is unknown. This technology has proven useful in the decryption of an essential bacterial enzyme and in the discovery of a series of inhibitors of a cancer-related protein-protein interaction. The authors review some of the tools relevant to these research problems in drug discovery and describe their experiences with two different proteins.

deepNF: Deep network fusion for protein function prediction

10.1101/223339 ◽

2017 ◽

Cited By ~ 2

Author(s):

Vladimir Gligorijević ◽

Meet Barot ◽

Richard Bonneau

Keyword(s):

Protein Function ◽

Large Scale ◽

Protein Function Prediction ◽

Predictive Performance ◽

Substantial Improvement ◽

Function Prediction ◽

Interaction Networks ◽

Highly Nonlinear ◽

High Level ◽

String Networks

The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF

iProteinDB: an integrative database of Drosophila post-translational modifications

10.1101/386268 ◽

2018 ◽

Cited By ~ 2

Author(s):

Yanhui Hu ◽

Richelle Sopko ◽

Verena Chung ◽

Romain A. Studer ◽

Sean D. Landry ◽

...

Keyword(s):

Protein Interactions ◽

Protein Function ◽

Large Scale ◽

Model Organisms ◽

General Strategy ◽

Post Translational Modification ◽

Post Translational Modifications ◽

Functional Sites ◽

Evolutionarily Conserved ◽

And Function

Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing stability, protein interactions, activity and localization, and is critical in many signaling pathways. The best characterized PTM is phosphorylation, whereby a phosphate is added to an acceptor residue, commonly serine, threonine or tyrosine. As proteins are often phosphorylated at multiple sites, identifying those sites that are important for function is a challenging problem. Considering that many phosphorylation sites may be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy to identify the putative functional sites with regard to regulation and function. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from Drosophila embryos collected from six closely related species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrates, and manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any Drosophila protein and identify predicted functional phosphosites based on a comparative analysis of data from closely related Drosophila species. Further, iProteinDB enables comparison of PTM data from Drosophila to that of orthologous proteins from other model organisms, including human, mouse, rat, Xenopus laevis, Danio rerio, and Caenorhabditis elegans.

Mining Protein Interactome Networks to Measure Interaction Reliability and Select Hub Proteins

Computational Knowledge Discovery for Bioinformatics Research ◽

10.4018/978-1-4666-1785-8.ch013 ◽

2013 ◽

pp. 222-238

Author(s):

Young-Rae Cho ◽

Aidong Zhang

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Functional Characterization ◽

Flow Simulation ◽

Protein Protein Interactions ◽

Systematic Analysis ◽

Graph Theoretic ◽

Interactome Network ◽

Protein Interactome ◽

Hub Proteins

High-throughput techniques enable large-scale detection of protein-protein interactions. From a genome-scale perspective, the resulting interaction data set is structured into an interactome network. Since the interaction evidence represents functional linkage, various graph-theoretic computational approaches have been applied to interactome networks for functional characterization. However, these data are generally unreliable, and typical genome-wide interactome networks have complex connectivity. In this paper, the authors explore systematic analysis of protein interactome networks and propose a $k$-round signal flow simulation algorithm to measure interaction reliability from the connection patterns of the interactome networks. This algorithm quantitatively characterizes functional links between proteins by simulating the propagation of information signals through complex connections. In doing so, it efficiently estimates the strength of alternative paths for each interaction. The authors also present an algorithm for mining the complex interactome network structure. The algorithm restructures the network by hierarchical ordering of nodes, and this restructuring reveals hub proteins in the interactome networks. This paper demonstrates that two rounds of simulation accurately score interaction reliability in terms of ontological correlation and functional consistency. Finally, the authors validate that the selected structural hubs represent functional core proteins.
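
The idea of scoring an interaction by the strength of its alternative paths can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes a simple rule: a unit signal starts at one endpoint, each node splits its signal equally among its neighbors in every round, the direct edge under test is ignored, and the score is the total signal that reaches the other endpoint within `rounds` rounds. The function names and the toy network are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def alternative_path_score(adj, source, target, rounds=2):
    """Signal reaching `target` from `source` in at most `rounds` steps.

    Assumption (not from the source): equal splitting among neighbors,
    and the direct source-target edge is skipped so that only
    alternative paths contribute.
    """
    signal = {source: 1.0}
    arrived = 0.0
    for _ in range(rounds):
        next_signal = defaultdict(float)
        for node, amount in signal.items():
            neighbors = [
                n for n in adj.get(node, ())
                if {node, n} != {source, target}
            ]
            if not neighbors:
                continue
            share = amount / len(neighbors)
            for n in neighbors:
                if n == target:
                    arrived += share  # absorbed at the target
                else:
                    next_signal[n] += share
        signal = next_signal
    return arrived


# Toy network: A-B is supported by the path A-C-B, C-D has no support.
toy_adj = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
score_ab = alternative_path_score(toy_adj, "A", "B", rounds=2)
score_cd = alternative_path_score(toy_adj, "C", "D", rounds=2)
```

Under these assumptions the A-B interaction receives a positive score because a two-step detour through C exists, while C-D scores zero because D has no other neighbor.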

INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity

Nucleic Acids Research ◽

10.1093/nar/gkv523 ◽

2015 ◽

Vol 43 (W1) ◽

pp. W134-W140 ◽

Cited By ~ 52

Author(s):

Damiano Piovesan ◽

Manuel Giollo ◽

Emanuela Leonardi ◽

Carlo Ferrari ◽

Silvio C.E. Tosatto

Keyword(s):

Protein Function ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Function Prediction ◽

Interaction Networks

A look back at the quality of Protein Function Prediction tools in CAFA

10.7287/peerj.preprints.27161 ◽

2018 ◽

Author(s):

Morteza Pourreza Shahri ◽

Madhusudan Srinivasan ◽

Diane Bimczok ◽

Upulee Kanewala ◽

Indika Kahanda

Keyword(s):

Protein Function ◽

Large Scale ◽

Computational Models ◽

Protein Function Prediction ◽

Function Prediction ◽

Test Case ◽

Test Cases ◽

Metamorphic Testing ◽

Main Challenge ◽

Scale Experiment

The Critical Assessment of protein Function Annotation algorithms (CAFA) is a large-scale experiment for assessing computational models for automated function prediction (AFP). The models presented in CAFA have shown excellent promise in terms of prediction accuracy, but quality assurance has received relatively little attention. The main challenge in conducting systematic testing of AFP software is the lack of a test oracle, which determines whether a test case passes or fails; unfortunately, the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. Metamorphic testing (MT) is a technique for testing programs that face the oracle problem using metamorphic relations (MRs). An MR determines whether a test has passed or failed by specifying how the output should change in response to a specific change made to the input. In this work, we use MT to test nine CAFA2 AFP tools by defining a set of MRs that apply input transformations at the protein level. According to our initial testing, we observe that several tools fail all the test cases and two tools pass all the test cases on different GO ontologies.
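
A minimal sketch of how a metamorphic relation can stand in for a test oracle is given below. The two relations shown (input-order invariance, and stability of existing predictions when an unrelated protein is added) are illustrative assumptions, not the MRs defined in the paper. `toy_predictor` and `batch_dependent_predictor` are hypothetical stand-ins for AFP tools, and their sequence rules are arbitrary placeholders with no biological meaning.

```python
def toy_predictor(proteins):
    """Hypothetical AFP tool: maps {protein_id: sequence} to {protein_id: set of GO terms}."""
    predictions = {}
    for pid, seq in proteins.items():
        terms = set()
        if "KR" in seq:  # arbitrary placeholder rule
            terms.add("GO:0003677")
        if seq.count("L") >= 3:  # arbitrary placeholder rule
            terms.add("GO:0016020")
        predictions[pid] = terms
    return predictions


def batch_dependent_predictor(proteins):
    """Deliberately faulty tool: output depends on how many proteins are submitted."""
    predictions = toy_predictor(proteins)
    if len(proteins) % 2 == 0:
        predictions = {pid: set() for pid in predictions}
    return predictions


def mr_order_invariance(predict, proteins):
    """MR: reordering the input proteins must not change any prediction."""
    reordered = dict(reversed(list(proteins.items())))
    return predict(proteins) == predict(reordered)


def mr_added_protein(predict, proteins, extra_id, extra_seq):
    """MR: adding one more protein must not change predictions for the others."""
    base = predict(proteins)
    extended = predict({**proteins, extra_id: extra_seq})
    return all(extended[pid] == base[pid] for pid in proteins)


sample = {"P1": "MKRLLL", "P2": "MAAAG"}
good_order = mr_order_invariance(toy_predictor, sample)
good_added = mr_added_protein(toy_predictor, sample, "P3", "MLKRA")
bad_added = mr_added_protein(batch_dependent_predictor, sample, "P3", "MLKRA")
```

Neither relation requires knowing the correct GO terms for any protein. A violation such as the one produced by `batch_dependent_predictor` signals a fault even though no expected output was ever specified.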

Mining Protein Interactome Networks to Measure Interaction Reliability and Select Hub Proteins

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2010070102 ◽

2010 ◽

Vol 1 (3) ◽

pp. 20-35

Author(s):

Young-Rae Cho ◽

Aidong Zhang

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Functional Characterization ◽

Flow Simulation ◽

Data Set ◽

Systematic Analysis ◽

Core Proteins ◽

Interactome Network ◽

Protein Interactome ◽

Hub Proteins

High-throughput techniques enable large-scale detection of protein-protein interactions. From a genome-scale perspective, the resulting interaction data set is structured into an interactome network. Since the interaction evidence represents functional linkage, various graph-theoretic computational approaches have been applied to interactome networks for functional characterization. However, these data are generally unreliable, and typical genome-wide interactome networks have complex connectivity. In this paper, the authors explore systematic analysis of protein interactome networks and propose a $k$-round signal flow simulation algorithm to measure interaction reliability from the connection patterns of the interactome networks. This algorithm quantitatively characterizes functional links between proteins by simulating the propagation of information signals through complex connections. In doing so, it efficiently estimates the strength of alternative paths for each interaction. The authors also present an algorithm for mining the complex interactome network structure. The algorithm restructures the network by hierarchical ordering of nodes, and this restructuring reveals hub proteins in the interactome networks. This paper demonstrates that two rounds of simulation accurately score interaction reliability in terms of ontological correlation and functional consistency. Finally, the authors validate that the selected structural hubs represent functional core proteins.

Clustering analysis of tumor metabolic networks

BMC Bioinformatics ◽

10.1186/s12859-020-03564-9 ◽

2020 ◽

Vol 21 (S10) ◽

Author(s):

Ichcha Manipur ◽

Ilaria Granata ◽

Lucia Maddalena ◽

Mario R. Guarracino

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Protein Interactions ◽

Biological Networks ◽

Clustering Analysis ◽

Large Scale ◽

Metabolic Networks ◽

Computational Time ◽

Expression Data ◽

Metabolic Models

Background: Biological networks are representative of the diverse molecular interactions that occur within cells. Some of the commonly studied biological networks model protein-protein interactions, gene regulation, and metabolic pathways. Among these, metabolic networks are probably the most studied, as they directly influence all physiological processes. Exploration of biochemical pathways using multigraph representation is important in understanding complex regulatory mechanisms. Feature extraction and clustering of these networks enable grouping of samples obtained from different biological specimens. Clustering techniques separate networks depending on their mutual similarity. Results: We present a clustering analysis of tissue-specific metabolic networks for single samples from three primary tumor sites: breast, lung, and kidney. The metabolic networks were obtained by integrating genome-scale metabolic models with gene expression data. We performed network simplification to reduce the time needed to compute network distances. We showed empirically that network clustering can characterize groups of patients in multiple conditions. Conclusions: We provide a computational methodology to explore and characterize the metabolic landscape of tumors, thus providing a general methodology to integrate analytic metabolic models with gene expression data. This method represents a first attempt at clustering large-scale metabolic networks. Moreover, this approach makes it possible to obtain valuable information on the effects of different conditions on the overall metabolism.
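
The general pattern of computing pairwise network distances and then grouping samples can be sketched as follows. The abstract does not say which distance or clustering method the authors used, so this example substitutes two simple, clearly different choices as assumptions: Jaccard distance between undirected edge sets, and single-linkage grouping at a fixed threshold. The sample names and metabolite edges are made up for illustration.

```python
def jaccard_distance(edges_a, edges_b):
    """1 - |shared edges| / |all edges|, treating edges as undirected."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_networks(networks, threshold):
    """Group networks whose distance is <= threshold (single linkage, union-find)."""
    names = list(networks)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if jaccard_distance(networks[x], networks[y]) <= threshold:
                parent[find(x)] = find(y)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical per-sample networks (edges between metabolite labels).
sample_networks = {
    "s1": [("glc", "g6p"), ("g6p", "f6p"), ("f6p", "fbp")],
    "s2": [("glc", "g6p"), ("g6p", "f6p"), ("f6p", "pyr")],
    "s3": [("cit", "icit"), ("icit", "akg")],
}
clusters = cluster_networks(sample_networks, threshold=0.6)
```

Here `s1` and `s2` share two of four distinct edges and fall into one group, while `s3` shares none and stays alone. Reducing each network to a smaller edge set before this step is one way simplification can lower the cost of the pairwise distance computation.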
