On flexible cohesive subgraph mining

This paper addresses this issue and devises a new method for frequent subgraph mining that retrieves from the database the information that captured users' attention. It proposes the recurrent-Gaston (R-Gaston) algorithm, which enhances the existing Gaston algorithm. Support for R-Gaston is defined using measures based on the frequency and page duration parameters. The proposed R-Gaston is evaluated in simulation on the weblog and MSNBC databases. It mined 184 structures with an execution time of 1282 ms on the MSNBC database, and 60 structures with an execution time of 75 ms on the weblog database.

Scalable mining of maximal quasi-cliques

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436916 ◽

2020 ◽

Vol 14 (4) ◽

pp. 573-585

Author(s):

Guimu Guo ◽

Da Yan ◽

M. Tamer Özsu ◽

Zhe Jiang ◽

Jalal Khalil

Keyword(s):

Graph Mining ◽

State Of The Art ◽

Minimum Degree ◽

The Other ◽

Dense Subgraph ◽

Load Imbalance ◽

Subgraph Mining ◽

Execution Engine ◽

Clique Algorithm ◽

Dense Subgraph Mining

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) where each vertex v ∈ V_g connects to at least a γ fraction of the other vertices (i.e., ⌈γ · (|V_g| - 1)⌉ vertices) in g. Quasi-clique is one of the most natural definitions for dense structures useful in finding communities in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that directly using G-thinker results in the straggler problem due to (i) the drastic load imbalance among different tasks and (ii) the difficulty of predicting the task running time. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by utilizing a novel timeout strategy to effectively decompose long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges in a 16-node cluster.
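
The γ-quasi-clique condition defined at the start of this abstract can be checked directly for a given vertex set. The following is a minimal Python sketch of that check only, not of the Quick algorithm or of G-thinker. The function name `is_quasi_clique`, the adjacency-dictionary input format, and the choice to test the subgraph induced by the vertex set are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil


def is_quasi_clique(vertices, adj, gamma):
    """Check the gamma-quasi-clique degree condition on an induced subgraph.

    vertices: iterable of vertex ids forming the candidate set V_g
    adj:      dict mapping each vertex to the set of its neighbours (undirected)
    gamma:    minimum degree threshold, 0 <= gamma <= 1
    (Hypothetical helper; the input format is an assumption.)
    """
    vs = set(vertices)
    # Required degree inside g: ceil(gamma * (|V_g| - 1)).
    # Fraction(str(gamma)) avoids floating-point error before rounding up.
    need = ceil(Fraction(str(gamma)) * (len(vs) - 1))
    # Every vertex must have at least `need` neighbours inside the set.
    return all(len(adj.get(v, set()) & vs) >= need for v in vs)


# A 4-cycle: every vertex has 2 of the 3 other vertices as neighbours.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
print(is_quasi_clique({1, 2, 3, 4}, cycle, 0.5))  # needs ceil(1.5) = 2 neighbours
print(is_quasi_clique({1, 2, 3, 4}, cycle, 0.8))  # needs ceil(2.4) = 3 neighbours
```

The check itself is cheap, running in time proportional to the degrees of the vertices in the set. The expense noted in the abstract comes from searching over candidate vertex sets for maximal ones, which this sketch does not attempt.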

Subgraph Mining

Encyclopedia of Social Network Analysis and Mining ◽

10.1007/978-1-4939-7131-2_101289 ◽

2018 ◽

pp. 3025-3025

Keyword(s):

Subgraph Mining

A qualitative survey on frequent subgraph mining

Open Computer Science ◽

10.1515/comp-2018-0018 ◽

2018 ◽

Vol 8 (1) ◽

pp. 194-209 ◽

Cited By ~ 1

Author(s):

Büsra Güvenoglu ◽

Belgin Ergenç Bostanoglu

Keyword(s):

Data Mining ◽

Graph Mining ◽

Research Area ◽

Heterogeneous Data ◽

Graph Representation ◽

Frequent Subgraph Mining ◽

Subgraph Mining ◽

Frequent Subgraph ◽

Input Type ◽

Frequent Subgraphs

Data mining is a popular research area that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in data mining, so graph mining is one of its most popular subdivisions. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database, and this information can be used to classify, cluster and index the data. The purpose of this survey is to (i) examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation; (ii) categorize the algorithms according to general attributes such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation; and (iii) discuss the performance of the algorithms relative to each other and the challenges they currently face.
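
To make the frequency calculation phase concrete, here is a minimal Python sketch that computes the support of the smallest possible patterns, single labeled edges, over a database of graphs. It is not taken from any surveyed algorithm. The function name `frequent_edges`, the graph encoding, and the use of "at least `min_support` graphs" as the threshold test are assumptions made for illustration. Larger patterns would additionally require candidate generation and subgraph isomorphism tests, which are omitted here.

```python
from collections import Counter


def frequent_edges(database, min_support):
    """Return single-edge patterns contained in at least min_support graphs.

    database: list of graphs, each a pair (vertex_labels, edges) where
              vertex_labels maps vertex id -> label and edges is a list of
              (u, v, edge_label) tuples. (Assumed encoding.)
    """
    support = Counter()
    for vertex_labels, edges in database:
        patterns = set()  # a pattern counts once per graph, however often it occurs
        for u, v, edge_label in edges:
            # Sort endpoint labels so that undirected edges compare equal.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            patterns.add((a, edge_label, b))
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


g1 = ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (0, 2, "double")])
g2 = ({0: "C", 1: "O"}, [(0, 1, "single")])
g3 = ({0: "C", 1: "N"}, [(0, 1, "single")])
print(frequent_edges([g1, g2, g3], min_support=2))
# {('C', 'single', 'O'): 2}
```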

Map Reduce Based Optimized Frequent Subgraph Mining Algorithm for Large Graph Database

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c6141.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 3131-3139

Keyword(s):

Distributed System ◽

Real World ◽

Transportation Network ◽

Vital Role ◽

Graph Database ◽

Frequent Subgraph Mining ◽

Large Graph ◽

Subgraph Mining ◽

Frequent Subgraph ◽

Centralized System

Distributed systems play a vital role in Frequent Subgraph Mining (FSM) for extracting frequent subgraphs from large graph databases. They help reduce memory requirements and computational costs, and increase data security, by distributing resources across distributed sites, which may be homogeneous or heterogeneous. In this paper, we focus on the data complexity problem that arises in centralized systems and address it using the MapReduce framework. We propose a MapReduce-based Optimized Frequent Subgraph Mining (MOFSM) algorithm for large graph databases. We also compare our algorithm with existing methods on four real-world standard datasets to verify that it offers a better solution with respect to performance and scalability. Such algorithms extract subgraphs in distributed systems, which is important in real-world applications such as computer vision, social network analysis, bioinformatics, and financial and transportation networks.
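
The general map and reduce pattern behind distributed support counting can be sketched in plain Python. This is not the MOFSM algorithm, whose details are not given in the abstract. It only shows how per-partition counts of single-edge patterns might be merged. The names `map_partition`, `reduce_counts` and `frequent_patterns`, and the encoding of a graph as a list of labeled edges, are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def map_partition(partition):
    """Map step: local support counts for one partition of the graph database.

    Each graph is a list of (label_u, edge_label, label_v) tuples. (Assumed encoding.)
    """
    local = Counter()
    for edges in partition:
        # Normalise endpoint order; the set counts a pattern once per graph.
        local.update({(min(a, b), e, max(a, b)) for a, e, b in edges})
    return local


def reduce_counts(left, right):
    """Reduce step: merge the support counts of two partitions."""
    return left + right


def frequent_patterns(partitions, min_support):
    total = reduce(reduce_counts, map(map_partition, partitions), Counter())
    return {p: s for p, s in total.items() if s >= min_support}


partitions = [
    [[("A", "x", "B")], [("B", "x", "A"), ("A", "y", "A")]],  # site 1: two graphs
    [[("A", "x", "B")]],                                      # site 2: one graph
]
print(frequent_patterns(partitions, min_support=2))
# {('A', 'x', 'B'): 3}
```

Because each graph lives in exactly one partition, summing local counts gives the global support. In a real MapReduce deployment the map calls would run on separate sites instead of in one process.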
