On flexible cohesive subgraph mining

2021 ◽  
Author(s):  
Dandan Liu ◽  
Zhaonian Zou
Keyword(s):  
Author(s):  
Jagannadha Rao D. B.

This paper addresses this issue and devises a new method for frequent subgraph mining that retrieves from the database the valuable information that has captured users' attention. It proposes the recurrent-Gaston (R-Gaston) algorithm, which enhances the existing Gaston algorithm for frequent subgraph mining. The support for R-Gaston is defined using measures based on the frequency and page-duration parameters. R-Gaston is simulated on the weblog and MSNBC databases. It mined 184 structures in 1282 ms on the MSNBC database and 60 structures in 75 ms on the weblog database.


2020 ◽  
Vol 14 (4) ◽  
pp. 573-585
Author(s):  
Guimu Guo ◽  
Da Yan ◽  
M. Tamer Özsu ◽  
Zhe Jiang ◽  
Jalal Khalil

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) in which each vertex v ∈ V_g connects to at least a γ fraction of the other vertices (i.e., ⌈γ · (|V_g| - 1)⌉ vertices) in g. The quasi-clique is one of the most natural definitions of dense structures and is useful for finding communities in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that directly using G-thinker results in the straggler problem due to (i) the drastic load imbalance among different tasks and (ii) the difficulty of predicting the task running time. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by utilizing a novel timeout strategy to effectively decompose long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges in a 16-node cluster.
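
The degree condition in the definition above can be checked directly. The following is a minimal sketch of that check only, not of the Quick algorithm or of G-thinker; the function name `is_quasi_clique` and the adjacency-dictionary representation are assumptions made for illustration.

```python
import math

def is_quasi_clique(adj, vertices, gamma):
    """Check the gamma-quasi-clique degree condition on an induced subgraph.

    adj      -- dict mapping each vertex to the set of its neighbours
                (hypothetical representation, undirected graph assumed)
    vertices -- the candidate vertex set V_g
    gamma    -- minimum degree threshold in (0, 1]
    """
    vs = set(vertices)
    if not vs:
        return False
    # Each vertex must reach at least ceil(gamma * (|V_g| - 1)) others in V_g.
    required = math.ceil(gamma * (len(vs) - 1))
    return all(len(adj.get(v, set()) & (vs - {v})) >= required for v in vs)

# A 4-cycle a-b-c-d-a: every vertex has 2 neighbours among the other 3.
cycle = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
```

With this input, `is_quasi_clique(cycle, "abcd", 0.6)` holds because ⌈0.6 · 3⌉ = 2, while a threshold of 0.7 requires 3 neighbours per vertex and fails. Testing a single candidate set is cheap; the expense mentioned in the abstract comes from searching over the many candidate sets.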


2018 ◽  
Vol 8 (1) ◽  
pp. 194-209 ◽  
Author(s):  
Büsra Güvenoglu ◽  
Belgin Ergenç Bostanoglu

Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in data mining, so graph mining is one of its most popular subdivisions. Subgraphs that are encountered in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database; using this information, data can be classified, clustered and indexed. The purpose of this survey is to (i) examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) discuss the performance of the algorithms relative to each other and the challenges they have recently faced.


Distributed systems play a vital role in Frequent Subgraph Mining (FSM) for extracting frequent subgraphs from large graph databases. They help to reduce memory requirements and computational costs, and to increase data security, by distributing resources across distributed sites, which may be homogeneous or heterogeneous. In this paper, we focus on the data-complexity problems that arise in centralized systems and address them using the MapReduce framework. We propose a MapReduce-based Optimized Frequent Subgraph Mining (MOFSM) algorithm for large graph databases. We also compare our algorithm with existing methods on four real-world standard datasets to verify that it offers a better solution with respect to performance and scalability. Such algorithms extract subgraphs in distributed systems, which is important in real-world applications such as computer vision, social network analysis, bioinformatics, and financial and transportation networks.
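
The abstract does not describe how MOFSM works, so the following is not that algorithm. It is only a small sketch of the general map/reduce pattern for counting support over a partitioned graph database, restricted to single labelled edges as the simplest possible pattern. The function names, the edge-list representation and the `>=` comparison against the threshold are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

def map_partition(graphs):
    """Map step: for one partition, count how many graphs contain each edge pattern.

    graphs -- list of graphs, each given as a list of (label, label) edges
              (hypothetical representation, undirected edges assumed)
    """
    counts = Counter()
    for edges in graphs:
        # Sort the labels so (A, B) and (B, A) are the same pattern; the set
        # ensures a graph contributes at most once to each pattern's support.
        patterns = {tuple(sorted(edge)) for edge in edges}
        counts.update(patterns)
    return counts

def reduce_counts(left, right):
    """Reduce step: merge the partial support counts of two partitions."""
    return left + right

def frequent_edges(partitions, min_support):
    """Return edge patterns whose total support reaches min_support."""
    total = reduce(reduce_counts, map(map_partition, partitions), Counter())
    return {p: c for p, c in total.items() if c >= min_support}

# Two partitions holding three small graphs in total.
partitions = [
    [[("A", "B"), ("B", "C")], [("B", "A")]],
    [[("A", "B"), ("C", "D")]],
]
result = frequent_edges(partitions, min_support=2)
```

Here `result` contains only the edge pattern `("A", "B")`, with support 3. In an actual MapReduce deployment, each call to `map_partition` would run on a separate site and the framework would perform the merge; larger patterns also need candidate generation and isomorphism checks, which this sketch omits.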

