Identifying protein complexes from heterogeneous biological data

2013 ◽  
Vol 81 (11) ◽  
pp. 2023-2033 ◽  
Author(s):  
Min Wu ◽  
Zhipeng Xie ◽  
Xiaoli Li ◽  
Chee‐Keong Kwoh ◽  
Jie Zheng


2019 ◽  
Vol 20 (S18) ◽  
Author(s):  
Wenxiang Zhang ◽  
Xiujuan Lei (IEEE member) ◽  
Chen Bian

Background: Identifying cancer genes is an urgent task because it helps us understand the mechanisms of biochemical processes at the biomolecular level and facilitates the development of bioinformatics. Although many methods for identifying cancer genes have been proposed recently, most of them draw on only a few types of biological data and therefore give insufficient consideration to the variety of factors that relate genes to diseases.
Results: In this paper, we propose a two-round random walk algorithm to identify cancer genes based on multiple biological data (TRWR-MB), including a protein-protein interaction (PPI) network, a pathway network, a microRNA similarity network, an lncRNA similarity network, a cancer similarity network and protein complexes. In the first round, all cancer nodes, cancer-related genes, cancer-related microRNAs and cancer-related lncRNAs associated with any of the cancers are used as seed nodes, and a random walker walks on a quadruple-layer heterogeneous network constructed from the multiple biological data. The first round aims to select the k top-scoring potential cancer genes. In the second round, the genes, microRNAs and lncRNAs associated with a specific cancer in the corresponding cancer class are used as seed nodes, and the walker walks on a new quadruple-layer heterogeneous network constructed from lncRNAs, microRNAs, cancers and the selected potential cancer genes. After both walks finish, we combine the results of the two rounds of random walk with restart (RWR) into a ranking score for experimental analysis. As a result, a higher area under the receiver operating characteristic curve (AUC) is obtained. In addition, case studies on identifying new cancer genes are presented.
Conclusion: In summary, TRWR-MB integrates multiple biological data to identify cancer genes by analyzing the relationship between genes and cancer from a variety of biomolecular perspectives.
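
The core operation behind both rounds, a random walk with restart from a set of seed nodes followed by a top-k selection, can be sketched roughly as follows. This is a minimal single-layer illustration and not the TRWR-MB implementation: the toy graph, the restart probability of 0.7, and the names `random_walk_with_restart` and `top_k` are all assumptions made for the example, and the multi-layer network construction described in the abstract is not reproduced.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities of a walker that restarts at the seeds.

    adj   : dict mapping node -> list of neighbours (undirected, unweighted;
            every node is assumed to have at least one neighbour)
    seeds : iterable of seed nodes (hypothetical stand-ins for known cancer-related nodes)
    """
    nodes = sorted(adj)
    seeds = set(seeds)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


def top_k(scores, k, exclude=()):
    """Return the k highest-scoring nodes, skipping those in `exclude`."""
    ranked = sorted((n for n in scores if n not in exclude),
                    key=lambda n: (-scores[n], n))
    return ranked[:k]


# Toy path graph A - B - C - D with A as the only seed (illustrative data only).
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy, seeds=["A"])
candidates = top_k(scores, 2, exclude={"A"})
```

In this sketch, nodes closer to the seed receive higher scores, which is the property a first-round candidate selection would rely on.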


2011 ◽  
Vol 135-136 ◽  
pp. 602-608
Author(s):  
Ya Meng ◽  
Xue Qun Shang ◽  
Miao Miao ◽  
Miao Wang

Mining biologically significant functional modules has attracted considerable attention recently. However, protein-protein interaction (PPI) networks and other biological data are generally uncertain in practice because of noise, incompleteness and inaccuracy. In this paper, we focus on PPI data with uncertainties to explore interesting protein complexes. We extend known graph concepts to uncertain graphs and use the resulting concepts to develop a depth-first algorithm that mines protein complexes in a simple uncertain graph. Our experiments use protein complexes from the MIPS database as the standard for assessing the results. The experimental results indicate that our algorithm performs well in terms of coverage and precision. The results are also assessed against Gene Ontology (GO) annotations, and this evaluation shows that the proteins within most of the mined complexes are highly similar. Finally, several experiments test the scalability of our algorithm, and their results are reported.
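
To make the notion of an uncertain graph concrete, the sketch below scores a candidate complex by its expected density, i.e. the sum of edge probabilities divided by the number of possible protein pairs. The abstract does not state which extended graph concepts the authors define, so this measure, the function name `expected_density`, and the toy probabilities are assumptions chosen only for illustration.

```python
from itertools import combinations


def expected_density(edge_prob, nodes):
    """Expected edge density of the subgraph induced by `nodes`.

    edge_prob : dict mapping frozenset({u, v}) -> probability that the interaction exists
    nodes     : iterable of proteins in the candidate complex
    """
    pairs = list(combinations(sorted(set(nodes)), 2))
    if not pairs:
        return 0.0
    # By linearity of expectation, the expected number of edges is the sum
    # of the individual edge probabilities; missing pairs count as 0.
    expected_edges = sum(edge_prob.get(frozenset(pair), 0.0) for pair in pairs)
    return expected_edges / len(pairs)


# Toy uncertain PPI network (illustrative probabilities only).
edge_prob = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.8,
    frozenset(("P1", "P3")): 0.7,
    frozenset(("P3", "P4")): 0.2,
}
dense = expected_density(edge_prob, ["P1", "P2", "P3"])
sparse = expected_density(edge_prob, ["P1", "P2", "P3", "P4"])
```

A depth-first search could, for instance, keep extending a candidate set only while such a score stays above a chosen threshold.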


2015 ◽  
Vol 2015 ◽  
pp. 1-9
Author(s):  
Peng Liu ◽  
Lei Yang ◽  
Daming Shi ◽  
Xianglong Tang

A method for predicting protein-protein interactions based on detected protein complexes is proposed to repair deficient interactions derived from high-throughput biological experiments. Protein complexes are pruned and decomposed into small parts using the adaptive k-cores method in order to predict the protein-protein interactions associated with the complexes. The proposed method adapts to protein complexes that differ in structure and in the number and size of their nodes in a protein-protein interaction network. From the different complex sets detected by various algorithms, we obtain different prediction sets of protein-protein interactions. The reliability of the predicted interaction sets is supported by statistical tests and by direct confirmation against biological data. The overlap between these predictions and those of approaches that predict interactions from cliques is small. Similarly, the overlaps among the predicted interaction sets derived from the various complex sets are also small. Thus, every predicted set of interactions may complement the original network data and improve its quality. Moreover, the proposed method replenishes the protein-protein interactions associated with protein complexes using only the network topology.
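
For readers unfamiliar with k-cores, the standard k-core of a graph can be computed by repeatedly peeling off nodes whose degree is below k, as sketched below. This shows only the textbook k-core, not the adaptive variant proposed in the paper, whose details are not given in the abstract; the function name `k_core` and the toy complex are assumptions.

```python
def k_core(adj, k):
    """Return the node set of the k-core: the maximal subgraph whose nodes all have degree >= k.

    adj : dict mapping node -> set of neighbours (undirected, no self-loops)
    """
    alive = set(adj)
    degree = {n: len(adj[n]) for n in alive}
    # Start with every node that already violates the degree constraint.
    queue = [n for n in alive if degree[n] < k]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue  # already removed via an earlier queue entry
        alive.remove(n)
        # Removing n lowers the degree of each surviving neighbour.
        for m in adj[n]:
            if m in alive:
                degree[m] -= 1
                if degree[m] < k:
                    queue.append(m)
    return alive


# Toy complex: a triangle A-B-C with a two-node tail C-D-E (illustrative only).
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
core2 = k_core(toy, 2)
```

Here the 2-core keeps the triangle and drops the loosely attached tail, which is the kind of pruning the abstract refers to.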


2019 ◽  
Author(s):  
Maxime Folschette ◽  
Vincent Legagneux ◽  
Arnaud Poret ◽  
Lokmane Chebouba ◽  
Carito Guziolowski ◽  
...  

Background: Integrating genome-wide gene expression patient profiles with regulatory knowledge is a challenging task because of the inherent heterogeneity, noise and incompleteness of biological data. On the computational side, several solvers for logic programs perform extremely well on decision problems in combinatorial search domains. The challenge is then how to process biological knowledge so that it can be fed to these solvers and yield insights in a biological study. This requires formalizing the biological knowledge to give it a precise interpretation; currently, very few pathway databases offer this possibility.
Results: The presented work proposes an automatic pipeline that extracts regulatory knowledge from pathway databases and generates novel computational predictions about the state of expression or activity of biological molecules. We applied it in the context of hepatocellular carcinoma (HCC) progression and evaluated the precision and stability of these computational predictions. Our working base is a graph of 3,383 nodes and 13,771 edges extracted from the KEGG database, into which we integrate 209 genes differentially expressed between low and high aggressive HCC across 294 patients. Our computational model predicts the shifts of expression of 146 initially non-observed biological components. Our predictions were validated at 88% using a larger experimental dataset and cross-validation techniques. In particular, we focus on the protein complex predictions and show for the first time that NFKB1/BCL-3 complexes are activated in aggressive HCC. Despite the large dimension of the reconstructed models, our analyses of the computational predictions reveal a well-constrained region where KEGG regulatory knowledge constrains the gene expression of several biomolecules. Such regions can offer interesting windows for experimentally perturbing such complex systems.
Conclusion: This new pipeline allows biologists to develop their own predictive models based on a list of genes. It facilitates the identification of new regulatory biomolecules using knowledge graphs and predictive computational methods. Our workflow is implemented as an automatic Python pipeline, which is publicly available at https://github.com/LokmaneChebouba/key-pipe and includes all the data used in this paper as testing data.


2019 ◽  
Vol 2019 ◽  
pp. 1-17 ◽  
Author(s):  
Jinxiong Zhang ◽  
Cheng Zhong ◽  
Hai Xiang Lin ◽  
Mian Wang

Identifying protein complexes is very important for revealing the underlying mechanisms of biological processes. Many computational methods have been developed to identify protein complexes from static protein-protein interaction (PPI) networks. Recently, researchers have begun to consider the dynamics of protein-protein interactions. Dynamic PPI networks are closer to the reality of the cell system, so it is expected that more protein complexes can be accurately identified from them. In this paper, we use the undulating degree above the base level of gene expression, instead of the gene expression level itself, to construct dynamic temporal PPI networks. We then convert the dynamic temporal PPI networks into dynamic Temporal Interval Protein Interaction Networks (TI-PINs) and propose a novel method to accurately identify more protein complexes from the constructed TI-PINs. Because they preserve continuous interactions within each temporal interval, the constructed TI-PINs contain more dynamic information for accurately identifying more protein complexes. Our proposed identification method uses multisource biological data to judge whether the joint colocalization condition, the joint coexpression condition, and the expanding cluster condition are satisfied; this ensures that the identified protein complexes have the features of colocalization, coexpression, and functional homogeneity. The experimental results on yeast data sets demonstrate that the constructed TI-PINs yield better identification of protein complexes than five existing dynamic PPI networks, and that our proposed identification method accurately finds more protein complexes than four other methods.


2017 ◽  
Author(s):  
Paola Pesántez-Cabrera ◽  
Ananth Kalyanaraman

Methods to efficiently uncover and extract community structures are required in a number of biological applications where networked data and their interactions can be modeled as graphs, and where observing tightly-knit groups of vertices ("communities") can offer insights into the structural and functional building blocks of the underlying network. Classical applications of community detection have largely focused on unipartite networks, i.e., graphs built out of a single type of object. However, due to the increased availability of biological data from various sources, there is now an increasing need to handle heterogeneous networks, which are built out of multiple types of objects. In this paper, we address the problem of identifying communities in biological bipartite networks, i.e., networks where interactions are observed between two different types of objects (e.g., genes and diseases, drugs and protein complexes, plants and pollinators, hosts and pathogens). Toward detecting communities in such bipartite networks, we make the following contributions: i) (metric) we propose a variant of bipartite modularity; ii) (algorithms) we present an efficient algorithm called biLouvain that implements a set of heuristics toward fast and precise community detection in bipartite networks; and iii) (experiments) we present a thorough experimental evaluation of our algorithm, including a comparison with other state-of-the-art methods for identifying communities in bipartite networks. Experimental results show that our biLouvain algorithm identifies communities of comparable or better quality (as measured by bipartite modularity) than existing methods, while reducing the time-to-solution by between one and four orders of magnitude.
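
As background for the metric, the sketch below computes Barber's classical bipartite modularity, Q = (1/m) Σ (A_ij − k_i d_j / m) over cross-type pairs (i, j) placed in the same community, where m is the number of edges and k_i, d_j are node degrees. This is the standard definition, not the variant proposed by the authors, which the abstract does not specify; the function name and the toy gene-disease data are assumptions.

```python
def barber_modularity(edges, community):
    """Barber's bipartite modularity of a community assignment.

    edges     : list of (left_node, right_node) pairs without duplicates;
                left and right node names are assumed to be disjoint
    community : dict mapping every node -> community label
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    left = {u for u, _ in edges}
    right = {v for _, v in edges}
    edge_set = set(edges)
    q = 0.0
    for u in left:
        for v in right:
            if community[u] == community[v]:
                observed = 1.0 if (u, v) in edge_set else 0.0
                expected = degree[u] * degree[v] / m  # null-model expectation
                q += observed - expected
    return q / m


# Toy gene-disease network with two obvious blocks (illustrative only).
edges = [("g1", "d1"), ("g2", "d1"), ("g3", "d2"), ("g4", "d2")]
two_blocks = {"g1": 0, "g2": 0, "d1": 0, "g3": 1, "g4": 1, "d2": 1}
one_block = {node: 0 for node in two_blocks}
q_split = barber_modularity(edges, two_blocks)
q_merged = barber_modularity(edges, one_block)
```

A Louvain-style heuristic would repeatedly move or merge nodes whenever doing so increases such a score.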


2021 ◽  
Author(s):  
Iain Johnston ◽  
Kamaludin Dingle ◽  
Sam F Greenbury ◽  
Chico Q. Camargo ◽  
Jonathan P K Doye ◽  
...  

Engineers routinely design systems to be modular and symmetric in order to increase robustness to perturbations and to facilitate alterations at a later date. Biological structures also frequently exhibit modularity and symmetry, but the origin of such trends is much less well understood. It can be tempting to assume -- by analogy to engineering design -- that symmetry and modularity arise from natural selection. But evolution, unlike engineers, cannot plan ahead, and so these traits must also afford some immediate selective advantage which is hard to reconcile with the breadth of systems where symmetry is observed. Here we introduce an alternative non-adaptive hypothesis based on an algorithmic picture of evolution. It suggests that symmetric structures preferentially arise not just due to natural selection, but also because they require less specific information to encode, and are therefore much more likely to appear as phenotypic variation through random mutations. Arguments from algorithmic information theory can formalise this intuition, leading to the prediction that many genotype-phenotype maps are exponentially biased towards phenotypes with low descriptional complexity. A preference for symmetry is a special case of this bias towards compressible descriptions. We test these predictions with extensive biological data, showing that protein complexes, RNA secondary structures, and a model gene-regulatory network all exhibit the expected exponential bias towards simpler (and more symmetric) phenotypes. Lower descriptional complexity also correlates with higher mutational robustness, which may aid the evolution of complex modular assemblies of multiple components.
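
The link between symmetry and compressibility can be illustrated with a crude proxy: the size of a description after general-purpose compression. The abstract does not say which complexity estimator the authors use, so the zlib-based measure, the function name `compressed_size`, and the toy strings below are assumptions made purely for illustration.

```python
import random
import zlib


def compressed_size(description: str) -> int:
    """Bytes needed to store `description` after zlib compression (a rough complexity proxy)."""
    return len(zlib.compress(description.encode("ascii"), 9))


# A highly regular, repeating description of 200 characters ...
regular = "abcd" * 50

# ... and an irregular one of the same length and alphabet.
rng = random.Random(0)  # fixed seed so the example is reproducible
irregular = "".join(rng.choice("abcd") for _ in range(200))

regular_size = compressed_size(regular)
irregular_size = compressed_size(irregular)
```

The repeating string compresses to far fewer bytes than the irregular one, which mirrors the idea that symmetric structures need less information to specify.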


Author(s):  
E. H. Egelman ◽  
X. Yu

The RecA protein of E. coli has been shown to mediate genetic recombination, regulate its own synthesis, control the expression of other genes, act as a specific protease, form a helical polymer and have an ATPase activity, among other observed properties. The unusual filament formed by the RecA protein on DNA has not previously been shown to exist outside of bacteria. Within this filament, the 36 Å pitch of B-form DNA is extended to about 95 Å, the pitch of the RecA helix. We have now established that similar nucleo-protein complexes are formed by bacteriophage and yeast proteins, and available evidence suggests that this structure is universal across all of biology, including humans. Thus, understanding the function of the RecA protein will reveal basic mechanisms, in existence in all organisms, that are at the foundation of general genetic recombination and repair.
Recombination at this moment is assuming an importance far greater than just pure biology. The association between chromosomal rearrangements and neoplasms has become stronger and stronger, and these rearrangements are most likely products of the recombinatory apparatus of the normal cell. Further, damage to DNA appears to be a major cause of cancer.


Author(s):  
C.A. Mannella ◽  
K.F. Buttle ◽  
K.A. O‘Farrell ◽  
A. Leith ◽  
M. Marko

Early transmission electron microscopy of plastic-embedded, thin-sectioned mitochondria indicated that there are numerous junctions between the outer and inner membranes of this organelle. More recent studies have suggested that the mitochondrial membrane contacts may be the site of protein complexes engaged in specialized functions, e.g., import of mitochondrial precursor proteins, adenine nucleotide channeling, and even intermembrane signalling. It has been suggested that the intermembrane contacts may be sites of membrane fusion involving non-bilayer lipid domains in the two membranes. However, despite growing interest in the nature and function of intramitochondrial contact sites, little is known about their structure.
We are using electron microscopic tomography with the Albany HVEM to determine the internal organization of mitochondria. We have reconstructed a 0.6-μm section through an isolated, plastic-embedded rat-liver mitochondrion by combining 123 projections collected by tilting (+/- 70°) around two perpendicular tilt axes. The resulting 3-D image has confirmed the basic inner-membrane organization inferred from lower-resolution reconstructions obtained from single-axis tomography.


Author(s):  
L. T. Germinario ◽  
J. Blackwell ◽  
J. Frank

This report describes the use of digital correlation and averaging methods [1,2] for the reconstruction of high-dose electron micrographs of the chitin-protein complex from the Megarhyssa ovipositor. Electron microscopy of uranyl acetate-stained insect cuticle has demonstrated a hexagonal array of unstained chitin monofibrils, 2.4−3.0 nm in diameter, in a stained protein matrix [3,4]. Optical diffraction indicated a hexagonal lattice with a = 5.1-8.3 nm [3]. A particularly well-ordered complex is found in the ovipositor of the ichneumon fly Megarhyssa: the small-angle x-ray data give a = 7.25 nm, and the wide-angle pattern shows that the protein consists of subunits arranged in a 6₁ helix, with an axial repeat of 3.06 nm [5].

