Functional geometry of protein interactomes

2019 ◽  
Vol 35 (19) ◽  
pp. 3727-3734 ◽  
Author(s):  
Noël Malod-Dognin ◽  
Nataša Pržulj

Abstract

Motivation: Protein–protein interactions (PPIs) are usually modeled as networks. These networks have extensively been studied using graphlets, small induced subgraphs capturing the local wiring patterns around nodes in networks. They revealed that proteins involved in similar functions tend to be similarly wired. However, such simple models can only represent pairwise relationships and cannot fully capture the higher-order organization of protein interactomes, including protein complexes.

Results: To model the multi-scale organization of these complex biological systems, we utilize simplicial complexes from computational geometry. The question is how to mine these new representations of protein interactomes to reveal additional biological information. To address this, we define simplets, a generalization of graphlets to simplicial complexes. By using simplets, we define a sensitive measure of similarity between simplicial complex representations that allows for clustering them according to their data types better than clustering them by using other state-of-the-art measures, e.g. spectral distance, or facet distribution distance. We model human and baker’s yeast protein interactomes as simplicial complexes that capture PPIs and protein complexes as simplices. On these models, we show that our newly introduced simplet-based methods cluster proteins by function better than the clustering methods that use the standard PPI networks, uncovering the new underlying functional organization of the cell. We demonstrate the existence of the functional geometry in the protein interactome data and the superiority of our simplet-based methods to effectively mine for new biological information hidden in the complexity of the higher-order organization of protein interactomes.

Availability and implementation: Codes and datasets are freely available at http://www0.cs.ucl.ac.uk/staff/natasa/Simplets/.

Supplementary information: Supplementary data are available at Bioinformatics online.
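
As a rough illustration of the simplicial-complex representation this abstract describes, the sketch below stores a complex as a list of facets (maximal simplices) and counts its simplices per dimension. It is not the authors' simplet code: the function names `faces` and `simplices_per_dimension` and the toy protein labels are hypothetical, chosen only for this example.

```python
from itertools import combinations
from collections import Counter


def faces(facets):
    """Return every non-empty face of the complex generated by the facets."""
    closed = set()
    for facet in facets:
        members = sorted(facet)
        # Every non-empty subset of a facet is itself a simplex of the complex.
        for size in range(1, len(members) + 1):
            closed.update(frozenset(c) for c in combinations(members, size))
    return closed


def simplices_per_dimension(facets):
    """Count simplices by dimension, where dimension = number of vertices - 1."""
    return Counter(len(face) - 1 for face in faces(facets))


# Hypothetical toy data: one three-protein complex plus one pairwise interaction.
toy_facets = [{"A", "B", "C"}, {"C", "D"}]
counts = simplices_per_dimension(toy_facets)
```

Here the three-protein complex contributes a 2-simplex (a filled triangle) together with all of its edges and vertices, which is the higher-order information a plain PPI network cannot record.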

2020 ◽  
Vol 34 (10) ◽  
pp. 2050090
Author(s):  
Pengli Lu ◽  
JingJuan Yu

Essential proteins play a crucial role in the life of the cell. Identifying essential proteins not only advances drug-target discovery but also contributes to understanding the mechanisms of biological evolution. Many researchers have sought to discover essential proteins from the topological structure of protein networks combined with biological information, but the accuracy of identification still needs to be improved. In this paper, we propose a method that integrates the clustering coefficient within protein complexes with topological properties to determine the essentiality of proteins. First, we define the In-clustering coefficient (IC) to describe the properties of protein complexes. Then we propose a new method, the complex edge and node clustering (CENC) coefficient, to identify essential proteins. Different Protein–Protein Interaction (PPI) networks of Saccharomyces cerevisiae, MIPS and DIP are used as experimental data. Experiments with a logistic regression model show that CENC improves the identification of essential proteins compared with the existing methods DC, BC, EC, SC, LAC, NC and the recent UC method.
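
The abstract does not give the formulas for IC or CENC, so the sketch below shows only the standard local clustering coefficient of a node, the textbook quantity that clustering-based measures of this kind build on. The function name `local_clustering` and the toy network are hypothetical, and nothing here should be read as the authors' method.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Standard local clustering coefficient: the fraction of pairs of
    neighbours of `node` that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours there are no pairs to connect.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical undirected toy network stored as symmetric adjacency sets.
toy_network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1"},
}
```

In this toy network only one of the three neighbour pairs of `P1` is connected, so its coefficient is one third, while `P2` sits in a closed triangle and scores 1.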


2020 ◽  
Vol 176 (3-4) ◽  
pp. 349-384
Author(s):  
Domenico Cantone ◽  
Marianna Nicolosi-Asmundo ◽  
Daniele Francesco Santamaria

In this paper we consider the most common TBox and ABox reasoning services for the description logic $\mathcal{DL}\langle 4LQS^{R,\times}\rangle(\mathbf{D})$ ($\mathcal{DL}_{\mathbf{D}}^{4,\times}$, for short) and prove their decidability via a reduction to the satisfiability problem for the set-theoretic fragment $4LQS^{R}$. $\mathcal{DL}_{\mathbf{D}}^{4,\times}$ is a very expressive description logic. It combines the high scalability and efficiency of rule languages such as the Semantic Web Rule Language (SWRL) with the expressivity of description logics. In fact, among other features, it supports Boolean operations on concepts and roles, role constructs such as the product of concepts and role chains on the left-hand side of inclusion axioms, role properties such as transitivity, symmetry, reflexivity, and irreflexivity, and data types. We further provide a KE-tableau-based procedure for the main TBox and ABox reasoning tasks for the description logic $\mathcal{DL}_{\mathbf{D}}^{4,\times}$. Our algorithm is based on a variant of the KE-tableau system for sets of universally quantified clauses, where the KE-elimination rule is generalized in such a way as to incorporate the γ-rule. The novel system, called KEγ-tableau, turns out to be an improvement of the system introduced in [1] and of standard first-order KE-tableaux [2]. Suitable benchmark test sets executed on C++ implementations of the three mentioned systems show that in several cases the KEγ-tableau-based reasoner performs up to about 400% better than the other two systems.


2018 ◽  
Vol 35 (15) ◽  
pp. 2535-2544 ◽  
Author(s):  
Dipan Shaw ◽  
Hao Chen ◽  
Tao Jiang

Abstract

Motivation: Isoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from being desirable primarily due to the lack of labeled training data. To improve the performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer the knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to different expression distributions associated with different gene ontology terms.

Results: We evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristics curve, our method acquired at least 26% improvement and in terms of area under the precision-recall curve, it acquired at least 10% improvement over the state-of-the-art methods. In addition, we also study the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions.

Availability and implementation: https://github.com/dls03/DeepIsoFun/

Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3431-3438
Author(s):  
Ziyi Li ◽  
Zhenxing Guo ◽  
Ying Cheng ◽  
Peng Jin ◽  
Hao Wu

Abstract

Motivation: In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder the cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy.

Results: We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods.

Availability and implementation: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST.

Supplementary information: Supplementary data are available at Bioinformatics online.


2013 ◽  
Vol 2013 ◽  
pp. 1-19
Author(s):  
Wai-Yuan Tan ◽  
Hong Zhou

To incorporate biologically observed epidemics into multistage models of carcinogenesis, in this paper we have developed new stochastic models for human cancers. We have further incorporated genetic segregation of cancer genes into these models to derive generalized mixture models for cancer incidence. Based on these models we have developed a generalized Bayesian approach to estimate the parameters and to predict cancer incidence via Gibbs sampling procedures. We have applied these models to fit and analyze the SEER data of human eye cancers from NCI/NIH. Our results indicate that the models not only provide a logical avenue to incorporate biological information but also fit the data much better than other models. These models would not only provide more insight into human cancers but also offer useful guidance for their prevention and control and for the prediction of future cancer cases.

