Graph2GO: a multi-modal attributed network embedding method for inferring protein functions

Abstract Background Identifying protein functions is important for many biological applications. Since experimental functional characterization of proteins is time-consuming and costly, accurate and efficient computational methods for predicting protein functions are in great demand for generating testable hypotheses to guide large-scale experiments. Results Here, we propose Graph2GO, a multi-modal graph-based representation learning model that integrates heterogeneous information, including multiple types of interaction networks (a sequence similarity network and a protein-protein interaction network) and protein features (amino acid sequence, subcellular location, and protein domains), to predict protein functions in terms of Gene Ontology (GO) terms. We compared Graph2GO with BLAST, as a baseline model, and with two popular protein function prediction methods (Mashup and deepNF), and demonstrated that our model achieves state-of-the-art performance. We show the robustness of our model by testing it on multiple species. We also provide a web server that supports function queries and downstream analysis on the fly. Conclusions Graph2GO is the first model to use attributed network representation learning to model both interaction networks and protein features for predicting protein functions, and it achieves promising performance. Our model can easily be extended to include more protein features to further improve performance. In addition, Graph2GO is applicable to other scenarios involving biological networks, and the learned latent representations can be used as feature inputs for machine learning tasks in various downstream analyses.
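The abstract does not spell out Graph2GO's encoder, so the sketch below assumes one common form of attributed network embedding: a graph-convolution layer that mixes each protein's feature vector with those of its interaction partners through the symmetrically normalized adjacency matrix D^-1/2 (A + I) D^-1/2, followed by an inner-product decoder that scores candidate edges. All function names, the toy graph, and the identity weights are hypothetical; a real model would learn the weights and use a matrix library rather than nested lists.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph-convolution layer: ReLU(A_norm X W)."""
    h = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in h]

def inner_product_decoder(z):
    """Edge scores sigmoid(z_i . z_j) reconstructed from the embeddings."""
    n = len(z)
    return [[1 / (1 + math.exp(-sum(a * b for a, b in zip(z[i], z[j])))) for j in range(n)]
            for i in range(n)]

# Toy example: three proteins in a chain P0 - P1 - P2, two feature columns each.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the layer only propagates features
z = gcn_layer(adj, features, weights)
scores = inner_product_decoder(z)
```

P0 has no second feature of its own, yet z[0][1] equals 1/sqrt(6), about 0.41: that is the share of its partner P1's feature propagated by the layer, which is how network structure and protein attributes end up in one representation.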

Algorithms for protein interaction networks

Biochemical Society Transactions

10.1042/bst0330530

2005

Vol 33 (3)

pp. 530-534

Cited By ~ 3

Author(s):

M. Lappe

L. Holm

Keyword(s):

Protein Interactions

Biological Networks

Protein Function

Large Scale

Sequence Similarity

Functional Characterization

Interaction Networks

Computational Techniques

Interaction Patterns

Main Challenge

The functional characterization of all genes and their gene products is the main challenge of the postgenomic era. Recent experimental and computational techniques have enabled the study of interactions among all proteins on a large scale. In this paper, approaches will be presented to exploit interaction information for the inference of protein structure, function, signalling pathways and ultimately entire interactomes. Interaction networks can be modelled as graphs, which represent gene function in terms of protein interactions. Since the architecture of biological networks differs distinctly from that of random networks, these functional maps contain a signal that can be used for predictive purposes. Protein function and structure can be predicted by matching interaction patterns, without the requirement of sequence similarity. Moving on to a higher-level definition of protein function, the question arises of how to decompose complex networks into meaningful subsets. An algorithm will be demonstrated that extracts whole signal-transduction pathways from noisy graphs derived from text-mining the biological literature. Finally, an algorithmic strategy is formulated that enables the proteomics community to build a reliable scaffold of the interactome in a fraction of the time required by uncoordinated efforts.
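Matching whole interaction patterns is more involved than can be shown briefly, but the underlying signal, that interaction partners tend to share function regardless of sequence similarity, can be illustrated with the simplest network-based predictor, guilt by association: a protein is assigned the functions most common among its annotated partners. This is a swapped-in baseline, not the pattern-matching algorithm from the paper; the dictionaries and protein names below are hypothetical.

```python
from collections import Counter

def predict_by_neighbors(graph, annotations, protein, top_k=1):
    """Rank functions by how many annotated interaction partners of `protein` carry them.

    graph: {protein: list of interaction partners}
    annotations: {protein: list of known function labels}
    """
    votes = Counter()
    for partner in graph.get(protein, []):
        votes.update(annotations.get(partner, []))
    return [term for term, _ in votes.most_common(top_k)]

graph = {"query": ["A", "B", "C"], "A": ["query"], "B": ["query"], "C": ["query"]}
annotations = {"A": ["DNA repair"], "B": ["DNA repair", "transport"], "C": ["signalling"]}
print(predict_by_neighbors(graph, annotations, "query"))  # ['DNA repair']
```

No sequence is consulted: "DNA repair" wins because two of the three partners carry it.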

deepNF: Deep network fusion for protein function prediction

10.1101/223339

2017

Cited By ~ 2

Author(s):

Vladimir Gligorijević

Meet Barot

Richard Bonneau

Keyword(s):

Protein Function

Large Scale

Protein Function Prediction

Predictive Performance

Substantial Improvement

Function Prediction

Interaction Networks

Highly Nonlinear

High Level

String Networks

Abstract The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most existing approaches for network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders, to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. Availability: deepNF is freely available at https://github.com/VGligorijevic/deepNF
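The architecture described above, with network-specific layers that feed a single shared bottleneck, can be sketched as the encoder half of a multimodal autoencoder. The sketch assumes sigmoid units, tiny layer sizes, and untrained random weights; the mirrored decoder and the reconstruction training that deepNF relies on are omitted, and every name is hypothetical rather than taken from the deepNF repository.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights (one row of n_in inputs per output unit) and zero biases."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def dense(x, layer):
    """Fully connected layer with sigmoid activation."""
    weights, bias = layer
    return [1 / (1 + math.exp(-(sum(xi * w for xi, w in zip(x, row)) + b)))
            for row, b in zip(weights, bias)]

def multimodal_encode(network_rows, branch_layers, bottleneck_layer):
    """Encode one protein: a separate layer per network type, then a shared bottleneck."""
    fused = []
    for row, layer in zip(network_rows, branch_layers):
        fused.extend(dense(row, layer))  # network-specific hidden units
    return dense(fused, bottleneck_layer)  # joint low-dimensional representation

rng = random.Random(0)
n_proteins, branch_dim, bottleneck_dim = 5, 4, 3
branches = [init_layer(n_proteins, branch_dim, rng) for _ in range(2)]  # two network types
bottleneck = init_layer(2 * branch_dim, bottleneck_dim, rng)
# One protein's rows in two hypothetical networks (e.g. co-expression and experimental PPI).
rows = [[0.0, 1.0, 0.0, 0.0, 1.0], [0.2, 0.0, 0.7, 0.0, 0.0]]
embedding = multimodal_encode(rows, branches, bottleneck)
```

After training, the bottleneck output for each protein would be the feature vector handed to a downstream function classifier; keeping the branches separate lets each network type be transformed on its own before fusion.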

INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity

Nucleic Acids Research

10.1093/nar/gkv523

2015

Vol 43 (W1)

pp. W134-W140

Cited By ~ 52

Author(s):

Damiano Piovesan

Manuel Giollo

Emanuela Leonardi

Carlo Ferrari

Silvio C.E. Tosatto

Keyword(s):

Protein Function

Sequence Similarity

Protein Function Prediction

Function Prediction

Interaction Networks

Predicting protein functions from redundancies in large-scale protein interaction networks

Proceedings of the National Academy of Sciences

10.1073/pnas.2132527100

2003

Vol 100 (22)

pp. 12579-12583

Cited By ~ 194

Author(s):

M. P. Samanta

S. Liang

Keyword(s):

Protein Interaction

Large Scale

Protein Interaction Networks

Interaction Networks

Protein Functions

Deep Annotation of Protein Function across Diverse Bacteria from Mutant Phenotypes

10.1101/072470

2016

Cited By ~ 21

Author(s):

Morgan N. Price

Kelly M. Wetmore

R. Jordan Waters

Mark Callaghan

Jayashree Ray

...

Keyword(s):

Protein Function

Large Scale

Hypothetical Proteins

Data Set

Protein Coding

Bacterial Proteins

Genome Wide

Protein Functions

Mutant Phenotypes

Related Proteins

Summary The function of nearly half of all protein-coding genes identified in bacterial genomes remains unknown. To systematically explore the functions of these proteins, we generated saturated transposon mutant libraries from 25 diverse bacteria and assayed mutant phenotypes across hundreds of distinct conditions. From 3,903 genome-wide mutant fitness assays, we obtained 14.9 million gene phenotype measurements and identified a mutant phenotype for 8,487 proteins with previously unknown functions. The majority of these hypothetical proteins (57%) had phenotypes that were either specific to a few conditions or similar to that of another gene, thus enabling us to make informed predictions of protein function. For 1,914 of these hypothetical proteins, the functional associations are conserved across related proteins from different bacteria, which confirms that these associations are genuine. This comprehensive catalogue of experimentally annotated protein functions also enables the targeted exploration of specific biological processes. For example, sensitivity to a DNA-damaging agent revealed 28 known families of DNA repair proteins and 11 putative novel families. Across all sequenced bacteria, 14% of proteins that lack detailed annotations have an ortholog with a functional association in our data set. Our study demonstrates the utility and scalability of high-throughput genetics for large-scale annotation of bacterial proteins and provides a vast compendium of experimentally determined protein functions across diverse bacteria.
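The abstract infers function when a hypothetical protein's phenotype is similar to that of another gene. A common way to quantify that similarity is the Pearson correlation between two genes' fitness profiles across conditions (often called cofitness); the sketch below assumes that measure, and the gene names and fitness values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length fitness profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def most_cofit_partner(fitness, gene):
    """Return the other gene whose fitness profile correlates best with `gene`'s."""
    others = [g for g in fitness if g != gene]
    return max(others, key=lambda g: pearson(fitness[gene], fitness[g]))

# Hypothetical fitness values for three genes across four conditions.
fitness = {
    "hypothetical_1": [-2.0, 0.1, -1.5, 0.0],
    "repair_gene": [-2.2, 0.0, -1.4, 0.1],
    "unrelated_gene": [0.3, -1.0, 0.2, -0.8],
}
print(most_cofit_partner(fitness, "hypothetical_1"))  # repair_gene
```

The hypothetical gene is sick in the same two conditions as "repair_gene", so the correlation pairs them and suggests a shared role.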

NPF: Network propagation for protein function prediction

10.21203/rs.3.rs-16452/v2

2020

Author(s):

Bihai Zhao

Zhihong Zhang

Meiping Jiang

Sai Hu

Yingchun Luo

...

Keyword(s):

Protein Interaction

Protein Function

Cross Validation

Function Prediction

Protein Interaction Networks

Functional Similarity

Interaction Networks

Omics Data

Protein Functions

Network Propagation

Abstract Background: The accurate annotation of protein functions is of great significance for elucidating the phenomena of life, disease treatment and new drug development. Various methods have been developed to facilitate the prediction of functions by combining protein interaction networks (PINs) with multi-omics data. However, how to make full use of multiple biological data to improve the performance of function annotation remains a challenge. Results: We presented NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners whose functions are similar to those of target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity through propagation. We have verified the great potential of NPF for accurately inferring protein functions. Comprehensive evaluation indicates that NPF achieved higher performance than competing methods in terms of leave-one-out cross-validation and ten-fold cross-validation. Conclusions: We demonstrated that network propagation combined with multi-omics data can not only discover more partners with similar functions, but also effectively overcome the constraints of the "small-world" property of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.
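NPF's abstract does not give its propagation rule. A standard network propagation scheme is random walk with restart (RWR), sketched below under that assumption: scores diffuse from annotated seed proteins into their neighborhood and decay with network distance. The restart probability of 0.5, the toy graph, and the function name are illustrative choices, and the multi-omics similarity and module detection that NPF adds are left out.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate scores from seed proteins over an undirected graph.

    adj: {protein: list of neighbors}; seeds: proteins known to have the function.
    At each step a walker jumps back to the seeds with probability `restart`,
    otherwise it moves to a uniformly chosen neighbor. Returns stationary scores.
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p

# Path graph A - B - C - D with A as the only annotated seed.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(adj, {"A"})
```

The scores converge to 26/45, 14/45, 4/45 and 1/45 for A to D, so non-seed proteins are ranked by how strongly they are connected to the seeds rather than by a hard neighbor cutoff.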

NPF: Network propagation for protein function prediction

10.21203/rs.3.rs-16452/v1

2020

Author(s):

Bihai Zhao

Zhihong Zhang

Meiping Jiang

Sai Hu

Yingchun Luo

...

Keyword(s):

Protein Interaction

Protein Function

Function Prediction

Biological Data

Protein Interaction Networks

Functional Similarity

Interaction Networks

Omics Data

Protein Functions

Network Propagation

Abstract Background: The accurate annotation of protein functions is of great significance for elucidating the phenomena of life, disease treatment and new drug development. Various methods have been developed to facilitate the prediction of functions by combining protein interaction networks (PINs) with multi-omics data. However, how to make full use of multiple biological data to improve the performance of function annotation remains a challenge. Results: We presented NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners whose functions are similar to those of target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity through propagation. We have verified the great potential of NPF for accurately inferring protein functions. Comprehensive evaluation indicates that NPF achieved higher performance than competing methods in terms of leave-one-out cross-validation and ten-fold cross-validation. Conclusions: We demonstrated that network propagation combined with multi-omics data can not only discover more partners with similar functions, but also effectively overcome the constraints of the "small-world" property of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.

NetQuilt: Deep Multispecies Network-based Protein Function Prediction using Homology-informed Network Similarity

Bioinformatics

10.1093/bioinformatics/btab098

2021

Author(s):

Meet Barot

Vladimir Gligorijević

Kyunghyun Cho

Richard Bonneau

Keyword(s):

Biological Networks

Protein Function

Functional Annotation

Sequence Similarity

Function Prediction

Supplementary Information

Learning Sequence

Network Information

PPI Networks

Multiple Species

Abstract Motivation Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. Results In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared with two network-based methods, a deep learning sequence-based method, and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We demonstrate that our approach performs well even when a species has no network information available: when an organism's PPI network is left out, our multispecies method can still make predictions for the left-out organism with good performance. Availability The code is freely available at https://github.com/nowittynamesleft/NetQuilt Supplementary information Supplementary data are available at Bioinformatics online.
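NetQuilt's meta-network is built from IsoRank similarity scores. The sketch below assumes the usual IsoRank recurrence: the score of a cross-species protein pair (i, j) is alpha times the scores of their neighbor pairs, each divided by those neighbors' degrees, plus (1 - alpha) times a normalized sequence similarity, iterated toward a fixed point. The value alpha = 0.8, the toy networks, and all names are hypothetical, and the dictionary-based loop favors clarity over proteome-scale efficiency.

```python
def isorank(adj1, adj2, seq_sim, alpha=0.8, iters=100):
    """IsoRank-style scores for all protein pairs (i in species 1, j in species 2).

    adj1, adj2: {protein: list of neighbors}, assumed to have no isolated proteins.
    seq_sim: {(i, j): sequence similarity}; missing pairs count as 0.
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    total = sum(seq_sim.get(p, 0.0) for p in pairs) or 1.0
    e = {p: seq_sim.get(p, 0.0) / total for p in pairs}  # normalized sequence prior
    r = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iters):
        nxt = {}
        for i, j in pairs:
            # neighbor pair (u, v) splits its score over its deg(u) * deg(v) neighbor pairs
            topo = sum(r[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                       for u in adj1[i] for v in adj2[j])
            nxt[(i, j)] = alpha * topo + (1 - alpha) * e[(i, j)]
        norm = sum(nxt.values())
        r = {p: val / norm for p, val in nxt.items()}
    return r

# Two toy three-protein chains with uniform sequence similarity: topology alone decides.
net1 = {"a1": ["b1"], "b1": ["a1", "c1"], "c1": ["b1"]}
net2 = {"a2": ["b2"], "b2": ["a2", "c2"], "c2": ["b2"]}
uniform = {(i, j): 1.0 for i in net1 for j in net2}
scores = isorank(net1, net2, uniform)
best_pair = max(scores, key=scores.get)
```

With no sequence signal, the two hub proteins b1 and b2 form the highest-scoring pair (7/27 at the fixed point), showing how IsoRank lets network position supplement sequence similarity when relating proteins across species.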

Wei2GO: weighted sequence similarity-based protein function prediction

10.1101/2020.04.24.059501

2020

Author(s):

Maarten J.M.F Reijnders

Keyword(s):

Gene Ontology

Open Source

Protein Function

Large Scale

Sequence Similarity

Protein Function Prediction

Function Prediction

Computational Time

Web Servers

Weighted Sequence

Abstract Background Protein function prediction is an important part of bioinformatics and genomics studies. Many different predictors are available; however, most of them are web servers rather than open-source, locally installable software. Such local versions are necessary to perform large-scale genomics studies because of the limitations imposed by web servers, such as queues, prediction speed, and the updatability of databases. Methods This paper describes Wei2GO: an open-source, Python-based protein function prediction software that uses weighted sequence similarity. It uses DIAMOND and HMMScan sequence alignment searches against the UniProtKB and Pfam databases, respectively, transfers Gene Ontology terms from the reference protein to the query protein, and uses a weighting algorithm to calculate a score for the Gene Ontology annotations. Results Wei2GO is compared against the Argot2 and Argot2.5 web servers, which use a similar concept, and against DeepGOPlus, which acts as a reference. Wei2GO shows an increase in performance according to precision-recall curves, Fmax scores, and Smin scores for the biological process and molecular function ontologies. Computational time compared with Argot2 and Argot2.5 is reduced from several hours to several minutes. Availability Wei2GO is written in Python 3 and can be found at https://gitlab.com/mreijnders/Wei2GO
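Wei2GO is evaluated with Fmax, among other measures. The sketch below computes the protein-centric Fmax as defined in the CAFA assessments: at each score threshold t, precision is averaged over proteins with at least one prediction scoring at or above t, recall is averaged over all benchmark proteins, and Fmax is the largest harmonic mean of the two over all thresholds. The 0.01 threshold grid and the toy data are assumptions, and this is not Wei2GO's own evaluation code.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein: {GO term: score}}
    truth: {protein: set of true GO terms}, one entry per benchmark protein.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {g for g, s in predictions.get(protein, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with a prediction at this t
                precisions.append(tp / len(predicted))
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# One protein: GO:1 is correct, GO:3 is a false positive, GO:2 is missed.
preds = {"P1": {"GO:1": 0.9, "GO:3": 0.4}}
gold = {"P1": {"GO:1", "GO:2"}}
print(fmax(preds, gold))
```

At low thresholds both terms are predicted (precision 0.5, recall 0.5, F = 0.5); above 0.4 only GO:1 remains (precision 1.0, recall 0.5), which raises F to 2/3, so Fmax rewards the threshold that best trades off the two.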

Application of dynamic expansion tree for finding large network motifs in biological networks

PeerJ

10.7717/peerj.6917

2019

Vol 7

pp. e6917

Cited By ~ 1

Author(s):

Sabyasachi Patra

Anjali Mohapatra

Keyword(s):

Biological Networks

Protein Function

Large Scale

Network Motif

Graph Isomorphism

Interaction Network

Motif Finding

Network Motifs

Large Network

Scalable Network

Network motifs play an important role in the structural analysis of biological networks. Identification of such network motifs leads to many important applications, such as understanding the modularity and the large-scale structure of biological networks, classifying networks into super-families, and annotating protein function. However, identifying large network motifs is a challenging task because it involves the graph isomorphism problem. Although this problem has been studied extensively in the literature using different computational approaches, there is still considerable scope for improvement. Motivated by these challenges, an efficient and scalable network motif finding algorithm using a dynamic expansion tree is proposed. The novelty of the proposed algorithm is that it avoids computationally expensive graph isomorphism tests and overcomes the space limitation of the static expansion tree (SET), which enables it to find large motifs. In this algorithm, the embeddings corresponding to a child node of the expansion tree are obtained from the embeddings of its parent node, either by adding a vertex or by adding an edge. This process does not involve any graph isomorphism check. The time complexities of vertex addition and edge addition are O(n) and O(1), respectively. The growth of a dynamic expansion tree (DET) depends on the availability of patterns in the target network. Pruning branches in the DET significantly reduces the space requirement compared with the SET. The proposed algorithm has been tested on a protein–protein interaction network obtained from the MINT database, and it identifies large network motifs faster than most existing motif finding algorithms.
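The step described above, deriving a child node's embeddings directly from its parent's without any isomorphism test, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' DET implementation: an embedding is assumed to be a tuple whose k-th entry is the target vertex matched to pattern vertex k, the target network is an adjacency-set dictionary, and pruning, canonical labeling, and frequency counting are omitted. The function names are hypothetical.

```python
def add_vertex(embeddings, target_adj, anchor):
    """Child embeddings when a new pattern vertex is attached to pattern vertex `anchor`.

    Extending one embedding scans the neighbors of a single target vertex,
    so it costs O(n) in the number of target vertices.
    """
    children = []
    for emb in embeddings:
        used = set(emb)
        for w in target_adj[emb[anchor]]:
            if w not in used:
                children.append(emb + (w,))
    return children

def add_edge(embeddings, target_adj, i, j):
    """Child embeddings when an edge between existing pattern vertices i and j is added.

    Each parent embedding is kept or dropped with one set lookup, i.e. O(1) per embedding.
    """
    return [emb for emb in embeddings if emb[j] in target_adj[emb[i]]]

# Target network: triangle 1-2-3 with a pendant vertex 4 attached to 3.
target_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
edges = [(u, v) for u in target_adj for v in target_adj[u]]  # embeddings of a single edge
paths = add_vertex(edges, target_adj, anchor=1)  # pattern 0-1-2 (a path)
triangles = add_edge(paths, target_adj, 0, 2)    # close the path into a triangle
```

The 8 ordered edge embeddings grow into 10 path embeddings and then filter down to 6 triangle embeddings, all on vertices {1, 2, 3}: the single triangle is reported once per vertex ordering (3! = 6), which a full motif finder would collapse when counting frequencies.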
