A Graph Mining Approach for Ranking and Discovering the Interesting Frequent Subgraph Patterns

Author(s):  
Saif Ur Rehman ◽  
Kexing Liu ◽  
Tariq Ali ◽  
Asif Nawaz ◽  
Simon James Fong

Abstract: Graph mining is a well-established research field that has lately attracted considerable attention from research communities. It makes it possible to process, analyze, and discover significant knowledge from graph data. One of the most challenging tasks in graph mining is frequent subgraph mining (FSM). FSM applies data mining algorithms to extract interesting, unexpected, and useful graph patterns from graphs. It has been applied in many domains, such as graphical data management and knowledge discovery, social network analysis, bioinformatics, and security. In this context, a large number of techniques have been suggested to deal with graph data. These techniques fall into two primary categories: (i) Apriori-based FSM approaches and (ii) pattern growth-based FSM approaches. Extensive research is available in both categories. However, FSM approaches still face several challenges: enormous numbers of frequent subgraph patterns (FSPs); no suitable mechanism for applying ranking at the appropriate level during the FSP discovery process; extraction of repetitive and duplicate FSPs; user involvement in supplying the support threshold value; and generation of a large number of subgraph candidates. The aim of this research is therefore to cope with the enormous number of FSPs, avoid duplicate discovery of FSPs, and rank the discovered patterns. To address these challenges, a new FSM framework, A RAnked Frequent pattern-growth Framework (A-RAFF), is proposed. A-RAFF addresses these challenges by introducing a new ranking measure called FSP-Rank, which effectively reduces duplicate patterns and the overall number of frequent patterns. The effectiveness of the proposed techniques is validated by extensive experimental analysis on different benchmark and synthetic graph datasets. Our experiments consistently yield promising empirical results, confirming the superiority and practical feasibility of the proposed FSM framework.

2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Saif Ur Rehman ◽  
Sohail Asghar ◽  
Yan Zhuang ◽  
Simon Fong

Due to the rapid development of Internet technology and new scientific advances, the number of applications that model data as graphs is increasing, because graphs are highly expressive and can model complicated structures. Graph mining is a well-explored area of research that is gaining popularity in the data mining community. A graph is a general model for representing data and has been used in many domains, such as cheminformatics, web information management systems, computer networks, and bioinformatics, to name a few. In graph mining, frequent subgraph discovery is a challenging task. Frequent subgraph mining is concerned with discovering those subgraphs that have frequent or multiple instances within a given graph dataset. A large number of frequent subgraph mining algorithms have been proposed in the literature; these include FSG, AGM, gSpan, CloseGraph, SPIN, Gaston, and MoFa. The objective of this research work is to perform a quantitative comparison of the techniques listed above. Their performance has been evaluated through a number of experiments based on three different state-of-the-art graph datasets. This work will provide a base for anyone who is working to design a new frequent subgraph discovery technique.


2018 ◽  
Vol 8 (1) ◽  
pp. 194-209 ◽  
Author(s):  
Büsra Güvenoglu ◽  
Belgin Ergenç Bostanoglu

Abstract: Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in the field of data mining, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that are encountered in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database; using this information, data can be classified, clustered, and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design, and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.
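The notion of a subgraph being "frequent" with respect to a user-defined threshold can be illustrated with the simplest possible patterns, single labeled edges. The following is a minimal sketch rather than any of the surveyed algorithms: it assumes a transaction setting (support is the number of database graphs containing the pattern), treats a pattern as frequent when its support is at least the threshold, and uses hypothetical names (`edge_patterns`, `frequent_edges`) and a made-up toy database.

```python
from collections import Counter

def edge_patterns(graph):
    """Return the set of distinct labeled-edge patterns in one graph.

    graph: (labels, edges), where labels maps node id -> node label and
    edges is an iterable of (u, v, edge_label) for an undirected graph.
    """
    labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Sort endpoint labels so that A-B and B-A are the same pattern.
        a, b = sorted((labels[u], labels[v]))
        patterns.add((a, edge_label, b))
    return patterns

def frequent_edges(database, min_support):
    """Keep patterns contained in at least `min_support` graphs (assumed convention)."""
    support = Counter()
    for graph in database:
        # A set is used per graph, so each graph counts once per pattern.
        support.update(edge_patterns(graph))
    return {p: s for p, s in support.items() if s >= min_support}

# Toy database of three small labeled graphs (illustrative data only).
database = [
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "single"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "single")]),
    ({0: "C", 1: "N", 2: "N"}, [(0, 1, "single"), (1, 2, "double")]),
]
result = frequent_edges(database, min_support=2)
print(result)
```

In this toy database the C-O and C-N single-bond edges each occur in two graphs and survive the threshold, while the N-N double-bond edge occurs in only one graph and is discarded. Larger patterns would then be grown from the surviving edges in the candidate generation phase.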


To overcome the challenges of managing the rapid growth of social graphs, massive distributed graph mining systems have been developed, such as Pregel, Giraph, Hama, GraphLab, PowerLab, etc. The approach common to all of these systems is to divide the entire graph dataset into smaller partitions and to adopt the "think like a vertex" programming model to support iterative graph computation. In this paper, we implement the Optimized Frequent Subgraph Mining algorithm in the Giraph framework and make a comparative study with different existing distributed systems. To enhance the flexibility and performance of the novel method, we apply different optimization techniques together with tuning of different run-time limits. We also investigate how performance can be improved by the Giraph distributed system, which plays a vital role in social graphs such as LinkedIn, Twitter, Facebook, etc. The graph input, output, cluster setup, and hardware configuration play vital roles in optimizing the performance of our proposed algorithm.
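The "think like a vertex" model can be sketched in a few lines. The example below is a single-process illustration in plain Python, not Giraph's API and not the paper's algorithm: each vertex holds a value, reads the messages sent to it in the previous superstep, and sends messages to its neighbours only when its value changes. The task chosen here (labeling connected components by the smallest vertex id) and the function name `vertex_centric_components` are assumptions made for illustration.

```python
def vertex_centric_components(adjacency):
    """Label each vertex with the smallest vertex id in its connected component.

    adjacency: dict mapping vertex -> list of neighbours (undirected graph).
    """
    value = {v: v for v in adjacency}      # per-vertex state
    inbox = {v: [] for v in adjacency}
    # Superstep 0: every vertex sends its own id to its neighbours.
    for v in adjacency:
        for n in adjacency[v]:
            inbox[n].append(value[v])
    # Later supersteps run until no messages are in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in adjacency}
        for v, messages in inbox.items():
            if not messages:
                continue                   # vertex stays halted this superstep
            best = min(messages)
            if best < value[v]:
                value[v] = best            # state changed: notify neighbours
                for n in adjacency[v]:
                    outbox[n].append(best)
        inbox = outbox
    return value

# Two components: {1, 2, 3} and {4, 5}.
adjacency = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = vertex_centric_components(adjacency)
print(components)
```

In a distributed system the vertices would be partitioned across workers and the inbox and outbox exchange would happen over the network at superstep boundaries; the per-vertex logic stays the same.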


2017 ◽  
Vol 17 (1) ◽  
pp. 3-15
Author(s):  
J. Demetrovics ◽  
H. M. Quang ◽  
N. V. Anh ◽  
V. D. Thi

Abstract: Graph mining has been a major area of interest within the field of data mining in recent years. A key aspect of graph mining is frequent subgraph mining. Central to the entire discipline of frequent subgraph mining is the concept of subgraph isomorphism. One major issue in early subgraph isomorphism research concerns computational complexity: in general, the subgraph isomorphism problem is NP-complete. Previous studies of frequent subgraph mining have not resolved the NP-completeness of subgraph isomorphism. In this paper, we propose a new algorithm that can deal with this problem. The proposed algorithm can solve subgraph isomorphism in polynomial time in some settings. Moreover, the new algorithm is proved theoretically to be more effective than previous approaches to closed frequent subgraph mining.
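To make the cost of subgraph isomorphism concrete, here is a brute-force check in Python. It is a sketch for illustration only and is not the algorithm proposed in the paper: it tries every injective mapping of pattern nodes to target nodes, so its running time grows factorially with graph size. The function name `has_subgraph` and the graph encoding (a label dictionary plus a set of `frozenset` edges) are assumptions.

```python
from itertools import permutations

def has_subgraph(pattern, target):
    """Return True if `target` contains a subgraph isomorphic to `pattern`.

    Each graph is (labels, edges): labels maps node -> label, and edges is a
    set of frozenset({u, v}) for an undirected simple graph.
    """
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    p_nodes = list(p_labels)
    # Try every injective assignment of pattern nodes to target nodes.
    for image in permutations(t_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Node labels must be preserved by the mapping.
        if any(p_labels[n] != t_labels[mapping[n]] for n in p_nodes):
            continue
        # Every pattern edge must map onto an edge of the target.
        if all(frozenset(mapping[n] for n in e) in t_edges for e in p_edges):
            return True
    return False

# Target is the labeled path A - B - C (illustrative data only).
target = ({1: "A", 2: "B", 3: "C"}, {frozenset({1, 2}), frozenset({2, 3})})
edge_ab = ({"x": "A", "y": "B"}, {frozenset({"x", "y"})})
edge_ac = ({"x": "A", "y": "C"}, {frozenset({"x", "y"})})
print(has_subgraph(edge_ab, target), has_subgraph(edge_ac, target))
```

The A-B edge is found in the target, while the A-C edge is not, because nodes 1 and 3 are not adjacent. Practical miners avoid this exhaustive search through pruning, canonical codes, or restrictions on the graph class.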


2012 ◽  
Vol 28 (1) ◽  
pp. 75-105 ◽  
Author(s):  
Chuntao Jiang ◽  
Frans Coenen ◽  
Michele Zito

Abstract: Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
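One common way to avoid reporting duplicate candidates is to map every candidate to a canonical form, so that isomorphic candidates compare equal. The sketch below is an assumption-laden illustration, not a method taken from the survey: it computes a canonical code by brute force over all node orderings, which is only practical for very small graphs, and the name `canonical_form` is hypothetical.

```python
from itertools import permutations

def canonical_form(labels, edges):
    """Return a code that is identical for isomorphic labeled graphs.

    labels: dict node -> label (labels must be mutually comparable, e.g. str).
    edges: iterable of (u, v) pairs for an undirected simple graph.
    """
    best = None
    for order in permutations(labels):
        position = {node: i for i, node in enumerate(order)}
        label_seq = tuple(labels[node] for node in order)
        edge_seq = tuple(sorted(
            tuple(sorted((position[u], position[v]))) for u, v in edges
        ))
        code = (label_seq, edge_seq)
        # Keep the lexicographically smallest code over all orderings.
        if best is None or code < best:
            best = code
    return best

# g1 and g2 are the same path A - B - B under different node names;
# g3 has the A node in the middle and is a different pattern.
g1 = ({1: "A", 2: "B", 3: "B"}, [(1, 2), (2, 3)])
g2 = ({"x": "B", "y": "B", "z": "A"}, [("y", "z"), ("x", "y")])
g3 = ({1: "A", 2: "B", 3: "B"}, [(1, 2), (1, 3)])

unique = {canonical_form(*g) for g in (g1, g2, g3)}
print(len(unique))
```

Storing canonical codes in a set collapses g1 and g2 into a single candidate while keeping g3 separate.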


The review article discusses the possibilities of using fractal mathematical analysis to solve scientific and applied problems of modern biology and medicine. The authors show that only this approach, which belongs to the field of nonlinear mechanics, makes it possible to quantify the chaotic component of the structure and function of living systems. This is a priori important additional information and expands, in particular, the possibilities for diagnosis, differential diagnosis, and prediction of the course of physiological and pathological processes. A number of examples demonstrate the specific advantages of using fractal analysis for these purposes. It can be concluded that the expanded use of fractal analysis methods in the research work of medical and biological specialists is promising.

