Impact of low-confidence interactions on computational identification of protein complexes

Protein complexes are the cornerstones of most biological processes. Identifying them is crucial to understanding the principles of cellular organization and has several important applications, including disease diagnosis. Several computational techniques have been developed to identify protein complexes from protein–protein interaction (PPI) data (equivalently, from PPI networks). These PPI data contain a significant number of false positives, which is a bottleneck in identifying protein complexes correctly. Gene ontology (GO)-based semantic similarity measures can be used to assign a confidence score to each PPI, and PPIs with low confidence scores are highly likely to be false positives. In this paper, we systematically study the impact of low-confidence PPIs on the performance of complex detection methods using GO-based semantic similarity measures. We consider five state-of-the-art complex detection algorithms and nine GO-based similarity measures in the evaluation. We find that the performance of each complex detection algorithm improves significantly after PPIs with low similarity scores are filtered out. We also observe that the percentage improvement and the filtration percentage (of low-confidence PPIs) are highly correlated.
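The filtering step described in this abstract can be illustrated with a minimal sketch. It assumes each PPI already carries a GO-based similarity score in [0, 1]; the protein names, the scores, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def filter_low_confidence(
    scores: Dict[Edge, float], threshold: float
) -> Tuple[List[Edge], float]:
    """Keep PPIs scoring at or above `threshold`.

    Returns the retained edges and the filtration percentage,
    i.e. the share of edges that were removed.
    """
    kept = [edge for edge, score in scores.items() if score >= threshold]
    removed = len(scores) - len(kept)
    filtered_pct = 100.0 * removed / len(scores) if scores else 0.0
    return kept, filtered_pct


# Hypothetical toy network: edge -> GO-based semantic similarity score.
ppi_scores = {
    ("P1", "P2"): 0.91,
    ("P1", "P3"): 0.15,
    ("P2", "P3"): 0.62,
    ("P3", "P4"): 0.08,
}

kept_edges, filtered_pct = filter_low_confidence(ppi_scores, threshold=0.5)
print(kept_edges, filtered_pct)
```

The retained edges would then be passed to a complex detection algorithm in place of the raw network; how the threshold is chosen is left open here.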

A SURVEY OF COMPUTATIONAL METHODS FOR PROTEIN COMPLEX PREDICTION FROM PROTEIN INTERACTION NETWORKS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001230002x ◽

2013 ◽

Vol 11 (02) ◽

pp. 1230002 ◽

Cited By ~ 72

Author(s):

SRIGANESH SRIHARI ◽

HON WAI LEONG

Keyword(s):

Computational Methods ◽

High Throughput ◽

Protein Complexes ◽

Detection Methods ◽

Physical Interaction ◽

Protein Complex Prediction ◽

PPI Networks ◽

History Of ◽

Complex Detection ◽

Key Aspects

Complexes of physically interacting proteins are one of the fundamental functional units responsible for driving key biological mechanisms within the cell. Their identification is therefore necessary to understand not only complex formation but also the higher-level organization of the cell. With the advent of "high-throughput" techniques in molecular biology, a significant amount of physical interaction data has been cataloged from organisms such as yeast, which has in turn fueled computational approaches to systematically mine complexes from the network of physical interactions among proteins (PPI network). In this survey, we review, classify and evaluate some of the key computational methods developed to date for the identification of protein complexes from PPI networks. We present two insightful taxonomies that reflect how these methods have evolved over the years toward improving automated complex prediction. We also discuss some open challenges facing accurate reconstruction of complexes, the crucial ones being the high proportion of errors and noise in current high-throughput datasets and some key aspects overlooked by current complex detection methods. We hope this review will not only help to condense the history of computational complex detection for easy reference but also provide valuable insights to drive further research in this area.

Identifying Protein Complexes from PPI Networks Using GO Semantic Similarity

2011 IEEE International Conference on Bioinformatics and Biomedicine ◽

10.1109/bibm.2011.52 ◽

2011 ◽

Cited By ~ 1

Author(s):

Jian Wang ◽

Dong Xie ◽

Hongfei Lin ◽

Zhihao Yang ◽

Yijia Zhang

Keyword(s):

Semantic Similarity ◽

Protein Complexes ◽

PPI Networks

A Novel Computational Framework to Predict the Impact of a Point Mutation on PDZ Domain Classification

10.1101/244251 ◽

2018 ◽

Author(s):

Muhammad Moinuddin ◽

Wasim Aftab ◽

Adnan Memic

Keyword(s):

Usher Syndrome ◽

PDZ Domain ◽

Point Mutations ◽

Similarity Measures ◽

Bigram Frequency ◽

Computational Techniques ◽

PDZ Domains ◽

Computational Framework ◽

Class 1 ◽

The Impact

PDZ domains represent one of the most common protein homology regions and play key roles in several diseases. Point mutations (PM) in the amino acid primary sequence of PDZ domains can alter domain functions by affecting, for example, downstream phosphorylation, a pivotal process in biology. Our goal in the present study was to introduce a novel approach to investigate how point mutations within Class 1, Class 2 and Class 1–2 PDZ domains could change binding with their partner ligands and hence affect their classification. We focused on features in PDZ domains of various species, including human, rat and mouse. However, our work represents a generic computational framework that could be used to analyze PM in any given PDZ sequence. We adopted two different approaches to investigate the impact of PM. In the first approach, we developed a statistical model using bigram frequencies of amino acids and employed six different similarity measures to contrast the bigram frequency distribution of a wild-type sequence with those of its point mutants. In the second approach, we developed a statistical method that incorporates the impact of the bigram frequency history associated with each mutational site, which we call history weighted conditional change in probabilities. In this PM study, we observed that the history weighted method performs best among all the methods studied at picking up sites in a PDZ domain where a PM could flip the class. We anticipate that this method will be a step forward for computational techniques that unveil PDZ domain point mutants that strongly affect protein-ligand binding, specificity and affinity. We hope that this and future studies could aid therapy for diseases in which PDZ mutations have been implicated as the main drivers, such as Usher syndrome.
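The first approach in this abstract (bigram frequencies compared between a wild type and its point mutants) can be sketched as follows. The abstract does not name its six similarity measures, so total variation distance is used here purely as an assumed stand-in, and the six-residue sequence is a hypothetical toy example rather than a real PDZ domain.

```python
from collections import Counter
from typing import Dict


def bigram_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each overlapping residue pair in `seq`."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two bigram distributions.

    Assumed measure for illustration; not one confirmed by the source.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def point_mutant(seq: str, pos: int, residue: str) -> str:
    """Return `seq` with the residue at 0-based index `pos` replaced."""
    return seq[:pos] + residue + seq[pos + 1:]


wild_type = "GLGFSI"                      # hypothetical toy sequence
mutant = point_mutant(wild_type, 2, "A")  # replace the third residue
distance = total_variation(
    bigram_frequencies(wild_type), bigram_frequencies(mutant)
)
print(mutant, round(distance, 3))
```

A single substitution changes at most two bigrams, so on short sequences the distance is dominated by the bigrams flanking the mutated site. The history weighted method from the second approach is not defined in the abstract and is not sketched here.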

Denoising Protein–Protein interaction network via variational graph auto-encoder for protein complex detection

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720020400107 ◽

2020 ◽

Vol 18 (03) ◽

pp. 2040010 ◽

Cited By ~ 1

Author(s):

Heng Yao ◽

Jihong Guan ◽

Tianying Liu

Keyword(s):

Protein Interaction ◽

Protein Complex ◽

Protein Complexes ◽

Empirical Evaluation ◽

High Rate ◽

Detection Methods ◽

Convolutional Network ◽

Protein Protein Interaction ◽

Protein Complex Detection ◽

Complex Detection

Identifying protein complexes is an important issue in computational biology, as it benefits the understanding of cellular functions and the design of drugs. In the past decades, many computational methods have been proposed that mine dense subgraphs in Protein–Protein Interaction Networks (PINs). However, the high rate of false positive/negative interactions in PINs prevents accurate detection of complexes directly from the raw PINs. In this paper, we propose a denoising approach for protein complex detection using a variational graph auto-encoder. First, we embed a PIN into a vector space with a stacked graph convolutional network (GCN), then decide which interactions in the PIN are credible. If the probability of an interaction being credible is less than a threshold, we delete the interaction. In this way, we reconstruct a reliable PIN. Following that, we detect protein complexes in the reconstructed PIN using several typical detection methods, including CPM, Coach, DPClus, GraphEntropy, IPCA and MCODE, and compare the results with those obtained directly from the original PIN. We conduct the empirical evaluation on four yeast PPI datasets (Gavin, Krogan, DIP and Wiphi) and two human PPI datasets (Reactome and Reactomekb), against two yeast complex benchmarks (CYC2008 and MIPS) and three human complex benchmarks (REACT, REACT_uniprotkb and CORE_COMPLEX_human), respectively. Experimental results show that with the reconstructed PINs obtained by our denoising approach, complex detection performance is clearly boosted, in most cases by over 5%, sometimes even by 200%. Furthermore, we compare our approach with two existing denoising methods (RWS and RedNemo) while varying different matching rates on separate complex distributions. Our results show that in most cases (over 2/3), the proposed approach outperforms the existing methods.

Identifying Hierarchical and Overlapping Protein Complexes Based on Essential Protein-Protein Interactions and “Seed-Expanding” Method

BioMed Research International ◽

10.1155/2014/838714 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12

Author(s):

Jun Ren ◽

Wei Zhou ◽

Jianxin Wang

Keyword(s):

Protein Interactions ◽

Time Complexity ◽

Protein Complexes ◽

Hierarchical Organization ◽

Experimental Results ◽

Functional Enrichment ◽

Detection Methods ◽

PPI Network ◽

Protein Protein Interactions ◽

PPI Networks

Much evidence has demonstrated that protein complexes are overlapping and hierarchically organized in PPI networks. Meanwhile, the large size of PPI networks requires complex detection methods to have low time complexity. Up to now, few methods can identify overlapping and hierarchical protein complexes in a PPI network quickly. In this paper, a novel method, called MCSE, is proposed based on λ-module and “seed-expanding.” First, it chooses seeds as essential PPIs or edges with high edge clustering values. Then, it identifies protein complexes by expanding each seed to a λ-module. MCSE is suitable for large PPI networks because of its low time complexity. MCSE can identify overlapping protein complexes naturally because a protein can be visited by different seeds. MCSE uses the parameter λ_th to control the range of seed expanding and can detect a hierarchical organization of protein complexes by tuning the value of λ_th. Experimental results on S. cerevisiae show that this hierarchical organization is similar to that of known complexes in the MIPS database. The experimental results also show that MCSE outperforms previous competing algorithms, such as CPM, CMC, Core-Attachment, Dpclus, HC-PIN, MCL, and NFC, in terms of functional enrichment and matching with known protein complexes.
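The seed-selection step mentioned in this abstract ("edges with high edge clustering values") can be illustrated with a short sketch. The abstract does not give a formula, so the code assumes one commonly used edge clustering coefficient: the number of common neighbours of the two endpoints divided by min(deg(u) - 1, deg(v) - 1). The toy network is hypothetical, and the λ-module expansion is not shown because the abstract does not define it.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Assumed definition: triangles through (u, v) over the maximum possible."""
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if max_triangles <= 0:
        return 0.0  # an endpoint of degree 1 cannot close any triangle
    return len(adj[u] & adj[v]) / max_triangles


# Hypothetical toy network: a triangle A-B-C with a pendant node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)

# Rank candidate seed edges from highest to lowest clustering value.
ranked = sorted(edges, key=lambda e: edge_clustering(adj, *e), reverse=True)
print(ranked)
```

Edges inside the triangle rank ahead of the pendant edge, which matches the intuition that seeds should sit in densely connected regions.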

Information Content-Based Gene Ontology Semantic Similarity Approaches: Toward a Unified Framework Theory

BioMed Research International ◽

10.1155/2013/292063 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 31

Author(s):

Gaston K. Mazandu ◽

Nicola J. Mulder

Keyword(s):

Gene Ontology ◽

Information Content ◽

Semantic Similarity ◽

Experimental Evaluation ◽

Similarity Measures ◽

Mathematical Framework ◽

Unified Framework ◽

The Impact ◽

Unified Description ◽

Similarity Scores

Several approaches have been proposed for computing term information content (IC) and semantic similarity scores within the gene ontology (GO) directed acyclic graph (DAG). These approaches contributed to improving protein analyses at the functional level. Considering the recent proliferation of these approaches, a unified theory in a well-defined mathematical framework is necessary in order to provide a theoretical basis for validating these approaches. We review the existing IC-based ontological similarity approaches developed in the context of biomedical and bioinformatics fields to propose a general framework and unified description of all these measures. We have conducted an experimental evaluation to assess the impact of IC approaches, different normalization models, and correction factors on the performance of a functional similarity metric. Results reveal that considering only parents or only children of terms when assessing information content or semantic similarity scores negatively impacts the approach under consideration. This study produces a unified framework for current and future GO semantic similarity measures and provides theoretical basics for comparing different approaches. The experimental evaluation of different approaches based on different term information content models paves the way towards a solution to the issue of scoring a term’s specificity in the GO DAG.
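As a concrete instance of an IC-based measure of the kind this abstract reviews, the sketch below computes annotation-based information content, IC(t) = -log p(t), and Resnik similarity (the IC of the most informative common ancestor) on a toy DAG. This is one well-known family of measures, not the unified framework the paper proposes; the term names and annotation counts are hypothetical.

```python
import math
from typing import Dict, Set

# Hypothetical toy DAG: term -> set of direct parents.
parents: Dict[str, Set[str]] = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Hypothetical number of genes annotated directly to each term.
direct_counts = {"root": 0, "a": 1, "b": 2, "c": 3, "d": 2}


def ancestors(term: str) -> Set[str]:
    """All ancestors of `term` in the DAG, including the term itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# An annotation to a term also counts for every ancestor of that term.
cumulative = {t: 0 for t in parents}
for term, n in direct_counts.items():
    for anc in ancestors(term):
        cumulative[anc] += n

total = cumulative["root"]
ic = {t: -math.log(cumulative[t] / total) for t in parents}


def resnik(t1: str, t2: str) -> float:
    """IC of the most informative common ancestor of t1 and t2."""
    return max(ic[t] for t in ancestors(t1) & ancestors(t2))


print(round(resnik("c", "d"), 4))
```

Terms "c" and "d" share the ancestors "a" and "root"; "a" is the more specific of the two, so its IC becomes the similarity score.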

The impact of protein interaction networks’ characteristics on computational complex detection methods

Journal of Theoretical Biology ◽

10.1016/j.jtbi.2017.12.002 ◽

2018 ◽

Vol 439 ◽

pp. 141-151 ◽

Cited By ~ 7

Author(s):

Xiaoxia Liu ◽

Zhihao Yang ◽

Ziwei Zhou ◽

Yuanyuan Sun ◽

Hongfei Lin ◽

...

Keyword(s):

Protein Interaction ◽

Protein Interaction Networks ◽

Interaction Networks ◽

Detection Methods ◽

Complex Detection ◽

The Impact

Predicting False Positives of Protein-Protein Interaction Data by Semantic Similarity Measures

Current Bioinformatics ◽

10.2174/1574893611308030009 ◽

2013 ◽

Vol 8 (3) ◽

pp. 339-346 ◽

Cited By ~ 12

Author(s):

George Montanez ◽

Young-Rae Cho

Keyword(s):

Semantic Similarity ◽

Protein Interaction ◽

Similarity Measures ◽

False Positives ◽

Protein Interaction Data ◽

Interaction Data ◽

Protein Protein Interaction

Computationally Derived Adaptive Inspirational Stimuli for Real-Time Design Support During Concept Generation

Volume 7: 31st International Conference on Design Theory and Methodology ◽

10.1115/detc2019-98188 ◽

2019 ◽

Author(s):

Kosa Goucher-Lambert ◽

Joshua T. Gyory ◽

Kenneth Kotovsky ◽

Jonathan Cagan

Keyword(s):

Real Time ◽

Semantic Similarity ◽

Semantic Analysis ◽

A Priori ◽

Similarity Measures ◽

Design Activity ◽

Concept Generation ◽

Experimental Conditions ◽

Final Design ◽

The Impact

Design activity can be supported using inspirational stimuli (e.g., analogies, patents, etc.), by helping designers overcome impasses or in generating solutions with more positive characteristics during ideation. Design researchers typically generate inspirational stimuli a priori in order to investigate their impact. However, for a chosen stimulus to possess maximal utility, it should automatically reflect the current and ongoing progress of the designer. In this work, designers receive computationally selected inspirational stimuli midway through an ideation session in response to the state of their current solution. Sourced from a broad database of related example solutions, the semantic similarity between the content of the current design and concepts within the database determines which potential stimulus is received. Designers receive a particular stimulus based on three experimental conditions: a semantically near stimulus, a semantically far stimulus, or no stimulus (control). Results indicate that adaptive inspirational stimuli can be determined using Latent Semantic Analysis (LSA) and that semantic similarity measures are a promising approach for real-time monitoring of the design process. The ability to achieve differentiable near vs. far stimuli was validated using both semantic cosine similarity values and participant self-response ratings. As a further contribution, this work also explores the impact of different types of adaptive inspirational stimuli on design outcomes. Here, near inspirational stimuli increase the feasibility of design solutions. Results also demonstrate the significant impact of the overall inspirational stimulus innovativeness on final design outcomes, which may be greater than differences across individual sub-dimensions.
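The near/far selection described in this abstract can be sketched with cosine similarity over semantic vectors. The sketch assumes the LSA vectors have already been computed, and it assumes "near" means the highest cosine similarity and "far" the lowest; the abstract does not state the exact selection rule. All vectors and stimulus names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical semantic vector for the designer's current solution.
current_design = [1.0, 0.0, 1.0]

# Hypothetical database of example solutions in the same vector space.
database: Dict[str, List[float]] = {
    "s1": [1.0, 0.1, 0.9],
    "s2": [0.0, 1.0, 0.0],
    "s3": [0.5, 0.5, 0.5],
}

similarities = {name: cosine(current_design, vec) for name, vec in database.items()}
near_stimulus = max(similarities, key=similarities.get)  # assumed "near" rule
far_stimulus = min(similarities, key=similarities.get)   # assumed "far" rule
print(near_stimulus, far_stimulus)
```

In a live session, the current design vector would be recomputed from the designer's latest text before each selection.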

Adaptive Inspirational Design Stimuli: Using Design Output to Computationally Search for Stimuli That Impact Concept Generation

Journal of Mechanical Design ◽

10.1115/1.4046077 ◽

2020 ◽

Vol 142 (9) ◽

Author(s):

Kosa Goucher-Lambert ◽

Joshua T. Gyory ◽

Kenneth Kotovsky ◽

Jonathan Cagan

Keyword(s):

Semantic Similarity ◽

Semantic Analysis ◽

Similarity Measures ◽

Design Activity ◽

Concept Generation ◽

Experimental Conditions ◽

Final Design ◽

Design Solutions ◽

Design Innovation ◽

The Impact

Design activity can be supported using inspirational stimuli (e.g., analogies, patents) by helping designers overcome impasses or in generating solutions with more positive characteristics during ideation. Design researchers typically generate inspirational stimuli a priori in order to investigate their impact. However, for a chosen stimulus to possess maximal utility, it should automatically reflect the current and ongoing progress of the designer. In this work, designers receive computationally selected inspirational stimuli midway through an ideation session in response to the contents of their current solution. Sourced from a broad database of related example solutions, the semantic similarity between the content of the current design and concepts within the database determines which potential stimulus is received. Designers receive a particular stimulus based on three experimental conditions: a semantically near stimulus, a semantically far stimulus, or no stimulus (control). Results indicate that adaptive inspirational stimuli can be determined using latent semantic analysis (LSA) and that semantic similarity measures are a promising approach for real-time monitoring of the design process. The ability to achieve differentiable near versus far stimuli was validated using both semantic cosine similarity values and participant self-response ratings. As a further contribution, this work also explores the impact of different types of adaptive inspirational stimuli on design outcomes using a newly introduced “design innovation” measure. The design innovation measure mathematically captures the overall goodness of a design concept by uniquely combining expert ratings across easier-to-evaluate subdimensions of feasibility, usefulness, and novelty.
While results demonstrate that near inspirational stimuli increase the feasibility of design solutions, they also show the significant impact of the overall inspirational stimulus innovativeness on final design outcomes. In fact, participants are more likely to generate innovative final design solutions when given innovative inspirational stimuli, regardless of their experimental condition.