An adaptive refinement for community detection methods for disease module identification in biological networks using novel metric based on connectivity, conductance & modularity

Author(s):  
Raghvendra Mall ◽  
Ehsan Ullah ◽  
Khalid Kunji ◽  
Halima Bensmail ◽  
Michele Ceccarelli
F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1286
Author(s):  
Dimitri Perrin ◽  
Guido Zuccon

Biological networks are highly modular and contain a large number of clusters, which are often associated with a specific biological function or disease. Identifying these clusters, or modules, is therefore valuable, but it is not trivial. In this article we propose a recursive method based on the Louvain algorithm for community detection and the PageRank algorithm for authoritativeness weighting in networks. PageRank is used to initialise the weights of nodes in the biological network; the Louvain algorithm with the Newman-Girvan criterion for modularity is then applied to the network to identify modules. Any identified module with more than k nodes is further processed by recursively applying PageRank and Louvain, until no module contains more than k nodes (where k is a parameter of the method, no greater than 100). This method is evaluated on a heterogeneous set of six biological networks from the Disease Module Identification DREAM Challenge. Empirical findings suggest that the method is effective in identifying a large number of significant modules, although with substantial variability across restarts of the method.


2019 ◽  
Vol 10 ◽  
Author(s):  
Beethika Tripathi ◽  
Srinivasan Parthasarathy ◽  
Himanshu Sinha ◽  
Karthik Raman ◽  
Balaraman Ravindran

2019 ◽  
Author(s):  
Hongzhu Cui ◽  
Suhas Srinivasan ◽  
Dmitry Korkin

AbstractProgress in high-throughput -omics technologies moves us one step closer to the datacalypse in life sciences. In spite of the already generated volumes of data, our knowledge of the molecular mechanisms underlying complex genetic diseases remains limited. Increasing evidence shows that biological networks are essential, albeit not sufficient, for the better understanding of these mechanisms. The identification of disease-specific functional modules in the human interactome can provide a more focused insight into the mechanistic nature of the disease. However, carving a disease network module from the whole interactome is a difficult task. In this paper, we propose a computational framework, DIMSUM, which enables the integration of genome-wide association studies (GWAS), functional effects of mutations, and protein-protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest impact. Comparison against state-of-the-art seed-based module detection methods shows that our approach could yield modules that are biologically more relevant and have stronger association with the studied disease. We expect for our method to become a part of the common toolbox for disease module analysis, facilitating discovery of new disease markers.


2017 ◽  
Author(s):  
Raghvendra Mall ◽  
Ehsan Ullah ◽  
Khalid Kunjia ◽  
Halima Bensmail

AbstractMotivationBiological networks unravel the inherent structure of molecular interactions which can lead to discovery of driver genes and meaningful pathways especially in cancer context. Often due to gene mutations, the gene expression undergoes changes and the corresponding gene regulatory network sustains some amount of localized re-wiring. The ability to identify significant changes in the interaction patterns caused by the progression of the disease can lead to the revelation of novel relevant signatures.MethodsThe task of identifying differential sub-networks in paired biological networks (A:control,B:case) can be re-phrased as one of finding dense communities in a single noisy differential topological (DT) graph constructed by taking absolute difference between the topological graphs of A and B. In this paper, we propose a fast two-stage approach, namely Differential Community Detection (DCD), to identify differential sub-networks as differential communities in a de-noised version of the DT graph. In the first stage, we iteratively re-order the nodes of the DT graph to determine approximate block diagonals present in the DT adjacency matrix using neighbourhood information of the nodes and Jaccard similarity. In the second stage, the ordered DT adjacency matrix is traversed along the diagonal to remove all the edges associated with a node, if that node has no immediate edges within a window. We then apply community detection methods on this de-noised DT graph to discover differential sub-networks as communities.ResultsOur proposed DCD approach can effectively locate differential sub-networks in several simulated paired random-geometric networks and various paired scale-free graphs with different power-law exponents. The DCD approach easily outperforms community detection methods applied on the original noisy DT graph and recent statistical techniques in simulation studies. We applied DCD method on two real datasets: a) Ovarian cancer dataset to discover differential DNA co-methylation sub-networks in patients and controls; b) Glioma cancer dataset to discover the difference between the regulatory networks of IDH-mutant and IDH-wild-type. We demonstrate the potential benefits of DCD for finding network-inferred bio-markers/pathways associated with a trait of interest.ConclusionThe proposed DCD approach overcomes the limitations of previous statistical techniques and the issues associated with identifying differential sub-networks by use of community detection methods on the noisy DT graph. This is reflected in the superior performance of the DCD method with respect to various metrics like Precision, Accuracy, Kappa and Specificity. The code implementing proposed DCD method is available at https://sites.google.com/site/ raghvendramallmlresearcher/codes.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1042 ◽  
Author(s):  
Gilles Didier ◽  
Alberto Valdeolivas ◽  
Anaïs Baudot

The identification of communities, or modules, is a common operation in the analysis of large biological networks. The Disease Module Identification DREAM challenge established a framework to evaluate clustering approaches in a biomedical context, by testing the association of communities with GWAS-derived common trait and disease genes. We implemented here several extensions of the MolTi software that detects communities by optimizing multiplex (and monoplex) network modularity. In particular, MolTi now runs a randomized version of the Louvain algorithm, can consider edge and layer weights, and performs recursive clustering. On simulated networks, the randomization procedure clearly improves the detection of communities. On the DREAM challenge benchmark, the results strongly depend on the selected GWAS dataset and enrichment p-value threshold. However, the randomization procedure, as well as the consideration of weighted edges and layers generally increases the number of trait and disease community detected. The new version of MolTi and the scripts used for the DMI DREAM challenge are available at: https://github.com/gilles-didier/MolTi-DREAM.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 378 ◽  
Author(s):  
Raghvendra Mall ◽  
Ehsan Ullah ◽  
Khalid Kunji ◽  
Michele Ceccarelli ◽  
Halima Bensmail

Disease processes are usually driven by several genes interacting in molecular modules or pathways leading to the disease. The identification of such modules in gene or protein networks is the core of computational methods in biomedical research. With this pretext, the Disease Module Identification (DMI) DREAM Challenge was initiated as an effort to systematically assess module identification methods on a panel of 6 diverse genomic networks. In this paper, we propose a generic refinement method based on ideas of merging and splitting the hierarchical tree obtained from any community detection technique for constrained DMI in biological networks. The only constraint was that size of community is in the range [3, 100]. We propose a novel model evaluation metric, called F-score, computed from several unsupervised quality metrics like modularity, conductance and connectivity to determine the quality of a graph partition at given level of hierarchy. We also propose a quality measure, namely Inverse Confidence, which ranks and prune insignificant modules to obtain a curated list of candidate disease modules (DM) for biological network. The predicted modules are evaluated on the basis of the total number of unique candidate modules that are associated with complex traits and diseases from over 200 genome-wide association study (GWAS) datasets. During the competition, we identified 42 modules, ranking 15th at the official false detection rate (FDR) cut-off of 0.05 for identifying statistically significant DM in the 6 benchmark networks. However, for stringent FDR cut-offs 0.025 and 0.01, the proposed method identified 31 (rank 9) and 16 DMIs (rank 10) respectively. From additional analysis, our proposed approach detected a total of 44 DM in the networks in comparison to 60 for the winner of DREAM Challenge. Interestingly, for several individual benchmark networks, our performance was better or competitive with the winner.


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 933 ◽  
Author(s):  
Cui ◽  
Srinivasan ◽  
Korkin

Rapid progress in high-throughput -omics technologies moves us one step closer to the datacalypse in life sciences. In spite of the already generated volumes of data, our knowledge of the molecular mechanisms underlying complex genetic diseases remains limited. Increasing evidence shows that biological networks are essential, albeit not sufficient, for the better understanding of these mechanisms. The identification of disease-specific functional modules in the human interactome can provide a more focused insight into the mechanistic nature of the disease. However, carving a disease network module from the whole interactome is a difficult task. In this paper, we propose a computational framework, Discovering most IMpacted SUbnetworks in interactoMe (DIMSUM), which enables the integration of genome-wide association studies (GWAS) and functional effects of mutations into the protein–protein interaction (PPI) network to improve disease module detection. Specifically, our approach incorporates and propagates the functional impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on PPIs to implicate the genes that are most likely influenced by the disruptive mutations, and to identify the module with the greatest functional impact. Comparison against state-of-the-art seed-based module detection methods shows that our approach could yield modules that are biologically more relevant and have stronger association with the studied disease. We expect for our method to become a part of the common toolbox for the disease module analysis, facilitating the discovery of new disease markers.


2020 ◽  
Author(s):  
Chao Li ◽  
Kun He ◽  
Guang shuai Liu ◽  
John E. Hopcroft

Abstract BackgroundDiscovering functional modules in protein-protein interaction networks through optimization remains a longstanding challenge in Biology. Traditional algorithms simply consider strong protein complexes found in the original network by optimizing some metric, which may cause obstacles for discovering weak and hidden complexes that are overshadowed by strong complexes. Additionally, protein complexes have not only different densities but also various ranges of scales, making them extremely difficult to be detected. We address these issues and propose a hierarchical hidden community detection approach to predict protein complexes of various strengths and scales accurately. ResultsWe propose a meta-method called HirHide (Hierarchical Hidden Community Detection). It is the first combination of hierarchical structure with hidden structure, which provides a new perspective for finding protein complexes of various strengths and scales. We compare the performance of several standard community detection methods with their HirHide versions. Experimental results show that the HirHide versions achieve better performance and sometimes even significantly outperform the baselines. ConclusionsHirHide can adopt any standard community detection method as the base algorithm and enable it to discover hidden hierarchical communities as well as boosting the detection of strong hierarchical communities. Some biological networks are too complex for standard community detection algorithms to produce a positive performance. Most of the time, a better choice is to choose a corresponding algorithm based on the characteristics of a specific biological network. Under these circumstances, HirHide has clear advantages because of its flexibility. At the same time, according to the natural hierarchy of cells, organelle, intracellular compound etc., hierarchical structure with hidden structure is in line with the characteristics of the data itself, thus helping researchers to study biological interactions more deeply.


Author(s):  
Michael Banf

Here we present a fast and highly scalable community structure preserving network module detection that recursively integrates graph sparsification and clustering. Our algorithm, called SparseClust, participated in the most recent DREAM community challenge on disease module identification, an open competition to comprehensively assess module identification methods across a wide range of biological networks.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1042 ◽  
Author(s):  
Gilles Didier ◽  
Alberto Valdeolivas ◽  
Anaïs Baudot

The identification of communities, or modules, is a common operation in the analysis of large biological networks. The Disease Module Identification DREAM challenge established a framework to evaluate clustering approaches in a biomedical context, by testing the association of communities with GWAS-derived common trait and disease genes. We implemented here several extensions of the MolTi software that detects communities by optimizing multiplex (and monoplex) network modularity. In particular, MolTi now runs a randomized version of the Louvain algorithm, can consider edge and layer weights, and performs recursive clustering. On simulated networks, the randomization procedure clearly improves the detection of communities. On the DREAM challenge benchmark, the results strongly depend on the selected GWAS dataset and enrichment p-value threshold. However, the randomization procedure, as well as the consideration of weighted edges and layers generally increases the number of trait and disease community detected. The new version of MolTi and the scripts used for the DMI DREAM challenge are available at: https://github.com/gilles-didier/MolTi-DREAM.


Sign in / Sign up

Export Citation Format

Share Document