GO Semantic Similarity-Based False Positive Reduction of Protein-Protein Interactions

Global studies of protein–protein interactions are crucial to both elucidating gene function and producing an integrated view of the workings of living cells. High-throughput studies of the yeast interactome have been performed using both genetic and biochemical screens. Despite their size, the overlap between these experimental datasets is very limited. This could be due to each approach sampling only a small fraction of the total interactome. Alternatively, a large proportion of the data from these screens may represent false-positive interactions. We have used the Genome Information Management System (GIMS) to integrate interactome datasets with transcriptome and protein annotation data and have found significant evidence that the proportion of false-positive results is high. Not all high-throughput datasets are similarly contaminated, and the tandem affinity purification (TAP) approach appears to yield a high proportion of reliable interactions for which corroborating evidence is available. From our integrative analyses, we have generated a set of verified interactome data for yeast.

GO2Vec: transforming GO terms and proteins to vector representations via graph embeddings

BMC Genomics ◽

10.1186/s12864-019-6272-2 ◽

2019 ◽

Vol 20 (S9) ◽

Cited By ~ 3

Author(s):

Xiaoshi Zhong ◽

Rama Kaalia ◽

Jagath C. Rajapakse

Keyword(s):

Information Content ◽

Semantic Similarity ◽

Protein Interactions ◽

Large Scale ◽

Functional Similarity ◽

Experimental Results ◽

Graph Embeddings ◽

Protein Protein Interactions ◽

Vector Representations ◽

Go Terms

Abstract Background: Semantic similarity between Gene Ontology (GO) terms is a fundamental measure for many bioinformatics applications, such as determining functional similarity between genes or proteins. Most previous research exploited information content to estimate the semantic similarity between GO terms; more recently, some research has exploited word embeddings to learn vector representations for GO terms from a large-scale corpus. In this paper, we propose a novel method, named GO2Vec, that exploits graph embeddings to learn vector representations for GO terms from the GO graph. GO2Vec combines the information from both the GO graph and GO annotations, and its learned vectors can be applied to a variety of bioinformatics applications, such as calculating functional similarity between proteins and predicting protein-protein interactions. Results: We conducted two kinds of experiments to evaluate the quality of GO2Vec: (1) functional similarity between proteins on the Collaborative Evaluation of GO-based Semantic Similarity Measures (CESSM) dataset and (2) prediction of protein-protein interactions on the Yeast and Human datasets from the STRING database. Experimental results demonstrate the effectiveness of GO2Vec over the information content-based measures and the word embedding-based measures. Conclusion: Our experimental results demonstrate the effectiveness of using graph embeddings to learn vector representations from undirected GO and GOA graphs. Our results also demonstrate that GO annotations provide useful information for computing the similarity between GO terms and between proteins.
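To make the idea of comparing proteins through GO term vectors concrete, here is a minimal Python sketch. It assumes each GO term already has a learned vector and scores two proteins by cosine similarity combined with a best-match average. The abstract does not say how GO2Vec aggregates term-level scores into a protein-level score, so the best-match average is an assumption (it is a commonly used aggregation in GO-based similarity). The term identifiers and vector values are hypothetical toy data, not GO2Vec output.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match_average(terms_a, terms_b, vectors):
    """Protein-level similarity from term-level cosine scores.

    For every term of one protein, take its best-scoring term in the other
    protein, then average those best scores over both directions.
    Assumption: this aggregation is illustrative, not taken from the paper.
    """
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(cosine(vectors[s], vectors[t]) for t in terms_b) for s in terms_a]
    best_b = [max(cosine(vectors[t], vectors[s]) for s in terms_a) for t in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical two-dimensional term vectors (real embeddings would be much larger).
TOY_VECTORS = {
    "TERM_A": [1.0, 0.0],
    "TERM_B": [0.0, 1.0],
    "TERM_C": [1.0, 1.0],
}

if __name__ == "__main__":
    protein_1 = ["TERM_A", "TERM_B"]  # hypothetical annotation sets
    protein_2 = ["TERM_C"]
    print(best_match_average(protein_1, protein_2, TOY_VECTORS))
```

A score near 1 means every term of each protein has a close counterpart in the other; thresholding such a score is one simple way to turn functional similarity into an interaction prediction.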

Predicting shrimp protein-protein interactions and gene ontology terms using association rule and semantic similarity calculation

2014 International Computer Science and Engineering Conference (ICSEC) ◽

10.1109/icsec.2014.6978208 ◽

2014 ◽

Author(s):

Sirintra Vaiwsri ◽

Anuphap Prachumwat ◽

Sudsanguan Ngamsuriyaroj ◽

Ananta Srisuphab

Keyword(s):

Gene Ontology ◽

Semantic Similarity ◽

Protein Interactions ◽

Association Rule ◽

Protein Protein Interactions ◽

Similarity Calculation

A semantic similarity based methodology for predicting protein-protein interactions: Evaluation with P53-interacting kinases

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2020.103579 ◽

2020 ◽

Vol 111 ◽

pp. 103579

Author(s):

Steven Cox ◽

Xialan Dong ◽

Ruhi Rai ◽

Laura Christopherson ◽

Weifan Zheng ◽

...

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Protein Protein Interactions

Selection of GO-Based Semantic Similarity Measures through AMDE for Predicting Protein-Protein Interactions

Swarm, Evolutionary, and Memetic Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-27242-4_7 ◽

2011 ◽

pp. 55-62

Author(s):

Anirban Mukhopadhyay ◽

Moumita De ◽

Ujjwal Maulik

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Similarity Measures ◽

Protein Protein Interactions ◽

Selection Of

Discovering novel protein–protein interactions by measuring the protein semantic similarity from the biomedical literature

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014420086 ◽

2014 ◽

Vol 12 (06) ◽

pp. 1442008 ◽

Cited By ~ 4

Author(s):

Jung-Hsien Chiang ◽

Jiun-Huang Ju

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Similarity Measures ◽

Biomedical Literature ◽

Biological Research ◽

Protein Protein Interactions ◽

Automated Identification ◽

Learning Classifier ◽

Novel Method ◽

Novel Protein

Protein–protein interactions (PPIs) are involved in the majority of biological processes. Identification of PPIs is therefore one of the key aims of biological research. Although there are many databases of PPIs, many other unidentified PPIs could be buried in the biomedical literature. Therefore, automated identification of PPIs from biomedical literature repositories could be used to discover otherwise hidden interactions. Search engines, such as Google, have been successfully applied to measure the relatedness among words. Inspired by such approaches, we propose a novel method to identify PPIs through semantic similarity measures among protein mentions. We define six semantic similarity measures as features based on the page counts retrieved from the MEDLINE database. A machine learning classifier, Random Forest, is trained using the above features. The proposed approach achieves an averaged micro-F of 71.28% and an averaged macro-F of 64.03% over five PPI corpora, an improvement over the results of using only the conventional co-occurrence feature (averaged micro-F of 68.79% and averaged macro-F of 60.49%). A relation-word reinforcement further improves the averaged micro-F to 71.3% and the averaged macro-F to 65.12%. Comparing the results of the current work with other studies on the AIMed corpus (ranging from 77.58% to 85.1% in micro-F and from 62.18% to 76.27% in macro-F), we show that the proposed approach achieves a micro-F of 81.88% and a macro-F of 64.01% without the use of sophisticated feature extraction. Finally, we manually examine the newly discovered PPI pairs based on a literature review, and the results suggest that our approach could extract novel protein–protein interactions.
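The abstract does not name its six page-count measures, so the following Python sketch only illustrates the general technique with four standard co-occurrence measures: Jaccard, Dice, pointwise mutual information, and the Normalized Google Distance. Whether any of these match the paper's features is an assumption. The function names, the argument convention (`fx` and `fy` are page counts for each protein name, `fxy` the count for both together, `n` the total number of indexed documents), and the example counts are all hypothetical.

```python
import math


def jaccard(fx, fy, fxy):
    """Share of pages mentioning either protein that mention both."""
    union = fx + fy - fxy
    return fxy / union if union > 0 else 0.0


def dice(fx, fy, fxy):
    """Dice coefficient on page counts."""
    total = fx + fy
    return 2 * fxy / total if total > 0 else 0.0


def pmi(fx, fy, fxy, n):
    """Pointwise mutual information in bits; -inf when any count is zero."""
    if min(fx, fy, fxy) <= 0:
        return float("-inf")
    return math.log2((fxy * n) / (fx * fy))


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance (lower means more related).

    Returns +inf when the names never co-occur or a count is zero.
    """
    if min(fx, fy, fxy) <= 0:
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(lx, ly)
    if denominator <= 0:
        return 0.0  # degenerate case: a name appears in every document
    return (max(lx, ly) - lxy) / denominator


if __name__ == "__main__":
    # Hypothetical counts, not real MEDLINE query results.
    fx, fy, fxy, n = 100, 50, 25, 1000
    features = [jaccard(fx, fy, fxy), dice(fx, fy, fxy), pmi(fx, fy, fxy, n), ngd(fx, fy, fxy, n)]
    print(features)
```

Each candidate protein pair would yield one such feature vector, which could then be passed to a classifier such as the Random Forest mentioned above.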

Semantic Similarity-based Validation of Human Protein-Protein Interactions

2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05) ◽

10.1109/csbw.2005.122 ◽

2006 ◽

Author(s):

Xiang Guo ◽

Hai Hu ◽

M.N. Liebman ◽

C.D. Shriver

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Human Protein ◽

Protein Protein Interactions

Combining Semantic Similarity and GO Enrichment for Computation of Functional Similarity

10.1101/155689 ◽

2017 ◽

Author(s):

Wenting Liu ◽

Jianjun Liu ◽

Jagath C. Rajapakse

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Similarity Measures ◽

Functional Similarity ◽

Protein Protein Interactions ◽

Go Enrichment ◽

Sequence Homologies ◽

Benchmark Datasets ◽

Novel Method ◽

Go Terms

Abstract Functional similarity between genes is widely used in many bioinformatics applications including detecting molecular pathways, finding co-expressed genes, predicting protein-protein interactions, and prioritization of candidate genes. Methods evaluating functional similarity of genes are mostly based on semantic similarity of gene ontology (GO) terms. Though there are hundreds of functional similarity measures available in the literature, none of them considers the enrichment of the GO terms by the querying gene pair. We propose a novel method to incorporate GO enrichment into the existing functional similarity measures. Our experiments show that the inclusion of gene enrichment significantly improves the performance of 44 widely used functional similarity measures, especially in the prediction of sequence homologies, gene expression correlations, and protein-protein interactions. Software availability: The software (python code) and all the benchmark datasets evaluation (R script) are available at https://gitlab.com/liuwt/EnrichFunSim.

A Collection of Benchmark Data Sets for Knowledge Graph-based Similarity in the Biomedical Domain

Database ◽

10.1093/database/baaa078 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Carlota Cardoso ◽

Rita T Sousa ◽

Sebastian Köhler ◽

Catia Pesquita

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Knowledge Graph ◽

Data Sets ◽

Biomedical Domain ◽

Protein Protein Interactions ◽

Data Set ◽

Human Phenotype ◽

Benchmark Data ◽

Gene Similarity

Abstract The ability to compare entities within a knowledge graph is a cornerstone technique for several applications, ranging from the integration of heterogeneous data to machine learning. It is of particular importance in the biomedical domain, where semantic similarity can be applied to the prediction of protein–protein interactions, associations between diseases and genes, cellular localization of proteins, among others. In recent years, several knowledge graph-based semantic similarity measures have been developed, but building a gold standard data set to support their evaluation is non-trivial. We present a collection of 21 benchmark data sets that aim at circumventing the difficulties in building benchmarks for large biomedical knowledge graphs by exploiting proxies for biomedical entity similarity. These data sets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, and explore proxy similarities calculated based on protein sequence similarity, protein family similarity, protein–protein interactions and phenotype-based gene similarity. Data sets have varying sizes and cover four different species at different levels of annotation completion. For each data set, we also provide semantic similarity computations with state-of-the-art representative measures. Database URL: https://github.com/liseda-lab/kgsim-benchmark.
