PICKLE 3.0: enriching the human meta-database with the mouse protein interactome extended via mouse–human orthology

Abstract Summary: The PICKLE 3.0 upgrade enriches this human protein–protein interaction (PPI) meta-database with the mouse protein interactome. Experimental PPI data between mouse genetic entities are rather limited; however, they are substantially complemented by PPIs between mouse and human genetic entities. The relational scheme of PICKLE 3.0 has been amended to exploit the Mouse Genome Informatics mouse–human ortholog gene pair collection, enabling (i) the extension through orthology of the mouse interactome with potentially valid PPIs between mouse entities, based on the experimental PPIs between mouse and human entities, and (ii) the comparison between mouse and human PPI networks. Interestingly, 43.5% of the experimental mouse PPIs lack a corresponding PPI by orthology in human, an inconsistency in need of further investigation. Overall, as primary mouse PPI datasets show considerably limited overlap, PICKLE 3.0 provides a unique comprehensive representation of the mouse protein interactome. Availability and implementation: PICKLE can be queried and downloaded at http://www.pickle.gr. Supplementary information: Supplementary data are available at Bioinformatics online.
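The orthology-based extension described in this abstract can be illustrated with a minimal sketch. It is not PICKLE's relational scheme: the function names, the dictionary-based ortholog maps and the placeholder identifiers (`mA`, `hA`, ...) are all assumptions made for illustration. The first function projects mouse–human PPIs onto mouse–mouse candidates; the second checks whether a mouse PPI has a counterpart by orthology in a human PPI set.

```python
from itertools import product


def extend_by_orthology(mouse_human_ppis, human_to_mouse):
    """Infer candidate mouse-mouse PPIs from experimental mouse-human PPIs.

    mouse_human_ppis: iterable of (mouse_gene, human_gene) pairs.
    human_to_mouse: dict mapping a human gene to a set of mouse orthologs.
    """
    inferred = set()
    for mouse_gene, human_gene in mouse_human_ppis:
        # Human partners without a known mouse ortholog contribute nothing.
        for ortholog in human_to_mouse.get(human_gene, ()):
            # Store each undirected pair in a canonical (sorted) order.
            inferred.add(tuple(sorted((mouse_gene, ortholog))))
    return inferred


def lacks_human_counterpart(mouse_ppi, mouse_to_human, human_ppis):
    """Return True if no orthologous human pair of this mouse PPI is in human_ppis."""
    a, b = mouse_ppi
    candidates = product(mouse_to_human.get(a, ()), mouse_to_human.get(b, ()))
    return not any(tuple(sorted(pair)) in human_ppis for pair in candidates)


# Toy data with placeholder identifiers (hypothetical, not taken from PICKLE).
human_to_mouse = {"hA": {"mA"}, "hB": {"mB"}}
mouse_to_human = {"mA": {"hA"}, "mB": {"hB"}, "mC": {"hC"}}
mouse_human_ppis = [("mA", "hB"), ("mC", "hA"), ("mB", "hX")]
human_ppis = {("hA", "hB")}

inferred = extend_by_orthology(mouse_human_ppis, human_to_mouse)
missing = {ppi for ppi in inferred
           if lacks_human_counterpart(ppi, mouse_to_human, human_ppis)}
```

Storing pairs in sorted order treats interactions as undirected, so the same PPI reported in either orientation is counted once.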

Protein–protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique

Bioinformatics ◽

10.1093/bioinformatics/bty995 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2395-2402 ◽

Cited By ~ 39

Author(s):

Xiaoying Wang ◽

Bin Yu ◽

Anjun Ma ◽

Cheng Chen ◽

Bingqiang Liu ◽

...

Keyword(s):

Ensemble Learning ◽

Protein Interaction ◽

Feature Selection Method ◽

Feature Space ◽

Supplementary Information ◽

Sequence Profile ◽

Protein Protein Interaction ◽

Interaction Sites ◽

Imbalance Problem ◽

Ppi Networks

Abstract Motivation: The prediction of protein–protein interaction (PPI) sites is key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task given the abundance of sequences and the class imbalance among samples. Results: A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and SMOTE was applied to oversample interface residues in the feature space to address the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and select a feature subset. Finally, Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were fed into EL-SMURF to predict PPI sites. Validation of EL-SMURF on two independent datasets showed 77.1% and 77.7% accuracy, which were 6.2–15.7% and 6.1–18.9% higher than other existing tools, respectively. Availability and implementation: The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/. Supplementary information: Supplementary data are available at Bioinformatics online.
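Two of the steps named in this abstract, sliding-window feature extraction and SMOTE-style oversampling, can be sketched in plain Python. This is an illustrative simplification and not the EL-SMURF code (which is at the repository linked above): the function names, the zero-padding at sequence ends and the default `k=2` are assumptions.

```python
import random


def sliding_windows(per_residue_features, half_width):
    """Concatenate each residue's features with those of its neighbours.

    Positions beyond the sequence ends are zero-padded (an assumption).
    """
    n_features = len(per_residue_features[0])
    pad = [[0.0] * n_features for _ in range(half_width)]
    padded = pad + list(per_residue_features) + pad
    windows = []
    for i in range(len(per_residue_features)):
        window = padded[i:i + 2 * half_width + 1]
        windows.append([value for residue in window for value in residue])
    return windows


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Each synthetic point lies between a randomly chosen minority sample and
    one of its k nearest minority neighbours (needs at least two samples).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy example: three residues with one feature each, window of 3 residues.
windows = sliding_windows([[1.0], [2.0], [3.0]], half_width=1)

# Toy minority class lying on the diagonal x = y.
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_points = smote_like_oversample(minority, n_new=5)
```

Because synthetic points are interpolated between existing minority samples, they stay inside the region already occupied by that class, which is the core idea of SMOTE.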

Selection maintains protein interactome resilience in the long-term evolution experiment with Escherichia coli

10.1101/2021.01.20.427477 ◽

2021 ◽

Author(s):

Rohan Maddamsetti

Keyword(s):

Experimental Evolution ◽

Long Term Evolution ◽

Rate Of Change ◽

Network Resilience ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Term Evolution ◽

Protein Interactome ◽

Evolution Experiment

Abstract Most cellular functions are carried out by a dynamic network of interacting proteins. An open question is whether the network properties of protein interactomes represent phenotypes under natural selection. One proposal is that protein interactomes have evolved to be resilient, such that they tend to maintain connectivity when proteins are removed from the network. This hypothesis predicts that interactome resilience should be maintained during long-term experimental evolution. I tested this prediction by modeling the evolution of protein-protein interaction (PPI) networks in Lenski’s long-term evolution experiment with Escherichia coli (LTEE). In this test, I removed proteins affected by nonsense, insertion, deletion, and transposon mutations in evolved LTEE strains, and measured the resilience of the resulting networks. I compared the rate of change of network resilience in each LTEE population to the rate of change of network resilience for corresponding randomized networks. The evolved PPI networks are significantly more resilient than networks in which random proteins have been deleted. Moreover, the evolved networks are generally more resilient than networks in which the random deletion of proteins was restricted to those disrupted in LTEE. These results suggest that evolution in the LTEE has favored PPI networks that are, on average, more resilient than expected from the genetic variation across the evolved populations. My findings therefore support the hypothesis that selection maintains protein interactome resilience over evolutionary time. Significance Statement: Understanding how protein-protein interaction (PPI) networks evolve is a central goal of evolutionary systems biology. One property that has been hypothesized to be important for PPI network evolution is resilience, which means that networks tend to maintain connectivity even after many nodes (proteins in this case) have been removed. This hypothesis predicts that PPI network resilience should be maintained during long-term experimental evolution. Consistent with this prediction, I found that the PPI networks that evolved over 50,000 generations of Lenski’s long-term evolution experiment with E. coli are more resilient than expected by chance.
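The comparison described in this abstract, removing specific proteins and contrasting the result with random removals, can be sketched as follows. The connectivity measure used here (fraction of remaining nodes in the largest connected component) is a simple stand-in chosen for illustration; the preprint's actual resilience statistic may differ, and all function names and the toy network are assumptions.

```python
import random


def largest_component_fraction(edges, nodes, removed=()):
    """Fraction of the remaining nodes that sit in the largest connected component."""
    kept = set(nodes) - set(removed)
    if not kept:
        return 0.0
    adjacency = {n: set() for n in kept}
    for a, b in edges:
        if a in kept and b in kept:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), 0
    for start in kept:
        if start in seen:
            continue
        # Depth-first traversal of one connected component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(kept)


def random_removal_baseline(edges, nodes, n_removed, n_trials=1000, seed=0):
    """Mean connectivity after removing n_removed nodes chosen uniformly at random."""
    rng = random.Random(seed)
    node_list = sorted(nodes)
    scores = [
        largest_component_fraction(edges, nodes, rng.sample(node_list, n_removed))
        for _ in range(n_trials)
    ]
    return sum(scores) / len(scores)


# Toy path network a-b-c-d-e (hypothetical, not LTEE data).
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]

after_end_removal = largest_component_fraction(edges, nodes, removed=["e"])
after_middle_removal = largest_component_fraction(edges, nodes, removed=["c"])
baseline = random_removal_baseline(edges, nodes, n_removed=1)
```

In this toy network, removing an end node leaves the rest connected, while removing the middle node splits it in two; comparing an observed removal against the random baseline mirrors the logic of the test described above.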

HPOLabeler: improving prediction of human protein–phenotype associations by learning to rank

Bioinformatics ◽

10.1093/bioinformatics/btaa284 ◽

2020 ◽

Vol 36 (14) ◽

pp. 4180-4188

Author(s):

Lizhi Liu ◽

Xiaodi Huang ◽

Hiroshi Mamitsuka ◽

Shanfeng Zhu

Keyword(s):

Large Scale ◽

Nearest Neighbor ◽

Learning To Rank ◽

Supplementary Information ◽

Human Phenotype ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Naive Method ◽

Temporal Validation ◽

Ranking Problems

Abstract Motivation: Annotating human proteins by abnormal phenotypes has become an important topic. Human Phenotype Ontology (HPO) is a standardized vocabulary of phenotypic abnormalities encountered in human diseases. As of November 2019, fewer than 4000 proteins have been annotated with HPO. A computational approach for accurately predicting protein–HPO associations would therefore be valuable; however, no method outperformed a simple Naive approach in the second Critical Assessment of Functional Annotation, 2013–2014 (CAFA2). Results: We present HPOLabeler, which is able to use a wide variety of evidence, such as protein–protein interaction (PPI) networks, Gene Ontology, InterPro, trigram frequency and HPO term frequency, in the framework of learning to rank (LTR). LTR has proven powerful for solving large-scale, multi-label ranking problems in bioinformatics. Given an input protein, LTR outputs a ranked list of HPO terms from a series of input scores given to the candidate HPO terms by component learning models (logistic regression, nearest neighbor and a Naive method), which are trained on the multiple sources of evidence. We evaluate HPOLabeler extensively through two main experiments, cross-validation and temporal validation, in which HPOLabeler significantly outperformed all component models and competing methods, including the current state-of-the-art method. We further found that (i) PPI is the most informative for prediction among diverse data sources and (ii) the low prediction performance in temporal validation might be caused by incomplete annotation of new proteins. Availability and implementation: http://issubmission.sjtu.edu.cn/hpolabeler/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Systems Biology Inferring edge function in protein-protein interaction networks

10.1101/321984 ◽

2018 ◽

Cited By ~ 2

Author(s):

Daniel Esposito ◽

Joseph Cursons ◽

Melissa Davis

Keyword(s):

Protein Interaction ◽

Supplementary Information ◽

Human Interactome ◽

High Coverage ◽

Protein Protein Interaction ◽

Cellular Processes ◽

Regulatory Processes ◽

Ppi Networks ◽

Biological Phenomena ◽

Biological Insight

Abstract Motivation: Post-translational modifications (PTMs) regulate many key cellular processes. Numerous studies have linked the topology of protein-protein interaction (PPI) networks to many biological phenomena such as key regulatory processes and disease. However, these methods fail to give insight into the functional nature of these interactions. On the other hand, pathways are commonly used to gain biological insight into the function of PPIs in the context of cascading interactions, sacrificing the coverage of networks for rich functional annotations on each PPI. We present a machine learning approach that uses Gene Ontology, InterPro and Pfam annotations to infer the edge functions in PPI networks, allowing us to combine the high coverage of networks with the information richness of pathways. Results: An ensemble method combining Logistic Regression and Random Forest classifiers, trained on a high-quality set of annotated interactions with a total of 18 unique labels, achieves a high average F1 score of 0.88 despite not taking advantage of multi-label dependencies. When applied to the human interactome, our method confidently classifies 62% of interactions at a probability of 0.7 or higher. Availability: Software and data are available at https://github.com/DavisLaboratory/pyPPI Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

A novel method for data fusion over entity-relation graphs and its application to protein–protein interaction prediction

Bioinformatics ◽

10.1093/bioinformatics/btab092 ◽

2021 ◽

Author(s):

Daniele Raimondi ◽

Jaak Simm ◽

Adam Arany ◽

Yves Moreau

Keyword(s):

Data Fusion ◽

Protein Interactions ◽

Protein Function ◽

Scientific Progress ◽

Supplementary Information ◽

Sources Of Information ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Proteome Level ◽

Clear Shift

Abstract Motivation: Modern bioinformatics is facing increasingly complex problems, and we are rapidly approaching an era in which the ability to seamlessly integrate heterogeneous sources of information will be crucial for scientific progress. Here, we present a novel non-linear data fusion framework that generalizes the conventional matrix factorization paradigm, allowing inference over arbitrary entity-relation graphs, and we apply it to the prediction of protein–protein interactions (PPIs). Improving our knowledge of PPI networks at the proteome scale is crucial to understand protein function, physiological and disease states and cell life in general. Results: We devised three data fusion-based models for the proteome-level prediction of PPIs, and we show that our method outperforms state-of-the-art approaches on common benchmarks. Moreover, we investigate its predictions on newly published PPIs, showing that this new data has a clear shift in its underlying distributions, and we thus train and test our models on this extended dataset. Supplementary information: Supplementary data are available at Bioinformatics online.

Brief Survey of Biological Network Alignment and a Variant with Incorporation of Functional Annotations

Current Bioinformatics ◽

10.2174/1574893612666171020103747 ◽

2018 ◽

Vol 14 (1) ◽

pp. 4-10

Author(s):

Fang Jing ◽

Shao-Wu Zhang ◽

Shihua Zhang

Keyword(s):

Gene Ontology ◽

Topological Structure ◽

Metabolic Networks ◽

Biological Network ◽

Genomic Sequence ◽

Network Alignment ◽

Future Directions ◽

Functional Annotations ◽

Protein Protein Interaction ◽

Ppi Networks

Background: Biological network alignment has been widely studied in the context of protein-protein interaction (PPI) networks, metabolic networks and others in bioinformatics. The topological structure of networks and genomic sequence are generally used by existing methods for achieving this task. Objective and Method: Here we briefly survey the methods generally used for this task and introduce a variant with incorporation of functional annotations based on similarity in Gene Ontology (GO). Making full use of GO information is beneficial to provide insights into precise biological network alignment. Results and Conclusion: We analyze the effect of incorporation of GO information to network alignment. Finally, we make a brief summary and discuss future directions about this topic.

Short loop functional commonality identified in leukaemia proteome highlights crucial protein sub-networks

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab010 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Sun Sook Chung ◽

Joseph C F Ng ◽

Anna Laddach ◽

N Shaun B Thomas ◽

Franca Fraternali

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Interaction Network ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Short Loop ◽

New Strategy ◽

Loop Network ◽

Protein Protein Interaction Network

Abstract Direct drug targeting of mutated proteins in cancer is not always possible and efficacy can be nullified by compensating protein–protein interactions (PPIs). Here, we establish an in silico pipeline to identify specific PPI sub-networks containing mutated proteins as potential targets, which we apply to mutation data of four different leukaemias. Our method is based on extracting cyclic interactions of a small number of proteins topologically and functionally linked in the Protein–Protein Interaction Network (PPIN), which we call short loop network motifs (SLM). We uncover a new property of PPINs named ‘short loop commonality’ to measure indirect PPIs occurring via common SLM interactions. This detects ‘modules’ of PPI networks enriched with annotated biological functions of proteins containing mutation hotspots, exemplified by FLT3 and other receptor tyrosine kinase proteins. We further identify functional dependency or mutual exclusivity of short loop commonality pairs in large-scale cellular CRISPR–Cas9 knockout screening data. Our pipeline provides a new strategy for identifying new therapeutic targets for drug discovery.
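The idea of extracting cyclic interactions among a small number of proteins can be illustrated by enumerating three-protein loops in an undirected interaction list. This is only a sketch under assumptions: restricting the loop length to three, the function name and the placeholder protein identifiers are illustrative choices, and the paper's functional-linkage criteria and its short loop commonality measure are not reproduced here.

```python
from itertools import combinations


def three_protein_loops(edges):
    """Return every set of three mutually interacting proteins as a sorted tuple."""
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    loops = set()
    for node, neighbours in adjacency.items():
        # Two neighbours of the same node close a loop if they also interact.
        for u, v in combinations(sorted(neighbours), 2):
            if v in adjacency[u]:
                loops.add(tuple(sorted((node, u, v))))
    return loops


# Toy interaction list with placeholder identifiers (hypothetical).
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
loops = three_protein_loops(edges)
```

Each loop is stored as a sorted tuple, so the same cycle discovered from any of its three proteins is recorded once.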

In vivo interactome profiling by enzyme‐catalyzed proximity labeling

Cell & Bioscience ◽

10.1186/s13578-021-00542-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yangfan Xu ◽

Xianqun Fan ◽

Yang Hu

Keyword(s):

State Of The Art ◽

Catalytic Efficiency ◽

Biological Processes ◽

Protein Protein Interaction ◽

Current State ◽

Protein Interactome ◽

Potential Applications ◽

Protein Protein Interaction Networks ◽

Temporal And Spatial

Abstract Enzyme-catalyzed proximity labeling (PL) combined with mass spectrometry (MS) has emerged as a revolutionary approach to reveal protein-protein interaction networks, dissect complex biological processes, and characterize the subcellular proteome in a more physiological setting than before. The enzymatic tags are being upgraded to improve temporal and spatial resolution and obtain faster catalytic dynamics and higher catalytic efficiency. In vivo application of PL integrated with other state-of-the-art techniques has recently been adapted in live animals and plants, allowing questions to be addressed that were previously inaccessible. It is timely to summarize the current state of PL-dependent interactome studies and their potential applications. We will focus on in vivo uses of newer versions of PL and highlight critical considerations for successful in vivo PL experiments that will provide novel insights into the protein interactome in the context of human diseases.

Identification of candidate biomarkers of liver hydatid disease via microarray profiling, bioinformatics analysis, and machine learning

Journal of International Medical Research ◽

10.1177/0300060521993980 ◽

2021 ◽

Vol 49 (3) ◽

pp. 030006052199398

Author(s):

Jinwu Peng ◽

Zhili Duan ◽

Yamin Guo ◽

Xiaona Li ◽

Xiaoqin Luo ◽

...

Keyword(s):

Random Forest ◽

Hydatid Disease ◽

Characteristic Curve ◽

Receiver Operator Characteristic Curve ◽

Random Forest Model ◽

Hepatic Hydatid Disease ◽

Forest Model ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Microarray Profiling

Objectives: Liver echinococcosis is a severe zoonotic disease caused by Echinococcus (tapeworm) infection, which is epidemic in the Qinghai region of China. Here, we aimed to explore biomarkers and establish a predictive model for the diagnosis of liver echinococcosis. Methods: Microarray profiling followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analysis was performed in liver tissue from patients with liver hydatid disease and from healthy controls from the Qinghai region of China. A protein–protein interaction (PPI) network and random forest model were established to identify potential biomarkers and predict the occurrence of liver echinococcosis, respectively. Results: Microarray profiling identified 1152 differentially expressed genes (DEGs), including 936 upregulated genes and 216 downregulated genes. Several previously unreported biological processes and signaling pathways were identified. The FCGR2B and CTLA4 proteins were identified by the PPI networks and random forest model. The random forest model based on FCGR2B and CTLA4 reliably predicted the occurrence of liver hydatid disease, with an area under the receiver operator characteristic curve of 0.921. Conclusion: Our findings give new insight into gene expression in patients with liver echinococcosis from the Qinghai region of China, improving our understanding of hepatic hydatid disease.

CATH functional families predict functional sites in proteins

Bioinformatics ◽

10.1093/bioinformatics/btaa937 ◽

2020 ◽

Author(s):

Sayoni Das ◽

Harry M Scholes ◽

Neeladri Sen ◽

Christine Orengo

Keyword(s):

Functional Characterization ◽

Functional Site ◽

Training Data ◽

Supplementary Information ◽

Conserved Residues ◽

Functional Sites ◽

Protein Protein Interaction ◽

Evolutionary Features ◽

Functional Families

Abstract Motivation: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results: FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation: https://github.com/UCL/cath-funsite-predictor. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
