Distributed Frequent Subgraph Mining Using Gaston and MapReduce

Author(s):  
Jagannadha Rao D. B.

This paper addresses this issue and devises a new frequent subgraph mining method to retrieve valuable information from the database, namely the content that captured users' attention. It proposes the recurrent-Gaston (R-Gaston) algorithm, which enhances the existing Gaston algorithm for frequent subgraph mining. The method defines the support for R-Gaston using measures based on two parameters: frequency and page duration. R-Gaston was simulated on the weblog and MSNBC databases. On the MSNBC database it mined 184 structures with an execution time of 1282 ms, and on the weblog database it mined 60 structures with an execution time of 75 ms.
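The abstract says only that support combines frequency and page duration; it does not give the formula. The sketch below is a hypothetical illustration of one way to mix the two for a single page-transition edge in weblog session graphs. The session format, the weight `alpha`, and the function names `transitions` and `weighted_support` are all assumptions for illustration, not the R-Gaston definition.

```python
# Hypothetical weblog sessions: each is an ordered list of (page, seconds_on_page).
def transitions(session):
    """Directed page-to-page edges of one session graph (consecutive visits)."""
    pages = [page for page, _ in session]
    return set(zip(pages, pages[1:]))


def weighted_support(edge, sessions, alpha=0.5):
    """Hypothetical support mixing frequency and page duration.

    NOT the R-Gaston formula (the abstract does not state it); alpha is an
    assumed weight between the frequency term and the duration term.
    """
    if not sessions:
        return 0.0
    containing = [s for s in sessions if edge in transitions(s)]
    freq = len(containing) / len(sessions)  # fraction of sessions with this transition
    total_time = sum(sec for s in sessions for _, sec in s)
    # Time spent on the edge's two pages, counted only in sessions containing the edge.
    pattern_time = sum(sec for s in containing for page, sec in s if page in edge)
    duration = pattern_time / total_time if total_time else 0.0
    return alpha * freq + (1 - alpha) * duration


sessions = [
    [("home", 10), ("news", 20), ("sports", 30)],
    [("home", 5), ("news", 15)],
    [("home", 8), ("weather", 12)],
]
# Frequency term 2/3, duration term 50/100, so the mixed support is about 0.583.
print(round(weighted_support(("home", "news"), sessions), 3))
```

With `alpha=1.0` this reduces to plain frequency-based support; lowering `alpha` lets transitions between pages where users linger rank higher even if they occur in fewer sessions.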

2018, Vol 8 (1), pp. 194-209
Author(s):
Büsra Güvenoglu, Belgin Ergenç Bostanoglu

Abstract: Data mining is a popular research area, studied by many researchers, that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used in data mining to represent large heterogeneous data, so graph mining is one of its most popular subfields. Subgraphs that occur in a database more often than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can provide important information about a database, which can then be used to classify, cluster and index the data. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation; (ii) to categorize the algorithms by general attributes such as input type, dynamicity of graphs, result type, underlying algorithmic approach, algorithmic design and graph representation; and (iii) to discuss the algorithms' performance relative to each other and the challenges they have recently faced.
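To make the two phases named above concrete, here is a minimal sketch of the first level of frequent subgraph discovery: generating single-edge candidate patterns and calculating their transaction-based support (the fraction of database graphs containing the pattern). The graph format and the function names are hypothetical, and this is a generic illustration rather than any specific surveyed algorithm; real miners then extend frequent patterns edge by edge.

```python
from collections import Counter

# A labeled graph is (node_labels, edges): node_labels maps node id -> label,
# edges is a list of (u, v, edge_label) for an undirected graph.


def edge_patterns(graph):
    """Canonical single-edge patterns (candidates of size 1) in one graph."""
    node_labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        a, b = sorted((node_labels[u], node_labels[v]))  # so u-v and v-u match
        patterns.add((a, edge_label, b))
    return patterns


def frequent_edges(database, min_support):
    """Keep patterns whose transaction-based support meets min_support."""
    counts = Counter()
    for graph in database:
        # Using a set means each graph adds at most 1 to a pattern's count.
        counts.update(edge_patterns(graph))
    n = len(database)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


database = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
    ({0: "C", 1: "C"}, [(0, 1, "double")]),
    ({0: "O", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "double")]),
]
print(frequent_edges(database, min_support=0.5))
```

Only the C-O single-bond edge appears in at least half of the three graphs, so it is the only pattern kept at a threshold of 0.5.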


Distributed systems play a vital role in Frequent Subgraph Mining (FSM), which extracts frequent subgraphs from large graph databases. By distributing resources across sites, which may be homogeneous or heterogeneous, they help reduce memory requirements and computational costs and increase data security. In this paper, we use the MapReduce framework to address the data-complexity problems that arise in centralized systems. We propose a MapReduce-based Optimized Frequent Subgraph Mining (MOFSM) algorithm for large graph databases. We compare our algorithm with existing methods on four real-world standard datasets to verify that it provides a better solution in terms of performance and scalability. Such algorithms extract subgraphs in distributed systems, which is important in real-world applications such as computer vision, social network analysis, bioinformatics, and financial and transportation networks.
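The abstract does not describe MOFSM's internals. As a hedged illustration of the general MapReduce pattern it builds on, the sketch below simulates one map, shuffle and reduce round in plain Python to count single-edge pattern support across data partitions held at different sites. It is not MOFSM and does not use the Hadoop API; the graph format and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical graph format: (node_labels, [(u, v, edge_label), ...]).


def map_partition(partition):
    """Map phase: emit (pattern, 1) once per graph that contains the pattern."""
    for node_labels, edges in partition:
        seen = set()
        for u, v, edge_label in edges:
            a, b = sorted((node_labels[u], node_labels[v]))
            seen.add((a, edge_label, b))
        for pattern in seen:
            yield pattern, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, num_graphs, min_support):
    """Reduce phase: sum per-pattern counts and apply the global threshold."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {k: c for k, c in totals.items() if c / num_graphs >= min_support}


def mapreduce_frequent_edges(partitions, min_support):
    num_graphs = sum(len(p) for p in partitions)
    mapped = chain.from_iterable(map_partition(p) for p in partitions)
    return reduce_phase(shuffle(mapped), num_graphs, min_support)


partitions = [  # two "sites", each holding part of the graph database
    [
        ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
        ({0: "C", 1: "C"}, [(0, 1, "double")]),
    ],
    [({0: "O", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "double")])],
]
print(mapreduce_frequent_edges(partitions, min_support=0.5))
```

The threshold is applied only after the reduce phase, on global counts: a pattern that looks infrequent within one partition can still be frequent across the whole database, so filtering locally in the map phase could drop valid results.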

