Frequent Subgraph Mining Based Collaboration Pattern Analysis for Wikipedia

2019, Vol 48 (2), pp. 195-210
Author(s): Zhonghu Zuo, Chunhong Zhang, Xiaosheng Tang, Zheng Hu, Yuqian Tang

Online knowledge collaborations, in which distributed members without hierarchies self-organize to create valuable content, are prevalent in many open production systems such as Wikipedia, GitHub and social networks. While many existing studies from network science have analysed the general interactive behavioural patterns embedded in these systems, how the collaborations influence the achievement outcomes has not been thoroughly investigated. In this paper, we mine the collaboration patterns from a micro perspective to better understand the relationships between the collaboration among participants and the quality of Wikipedia articles. In particular, the subgraphs contained in the collaboration networks derived from the Wikipedia revision histories are taken as the fundamental units for analysing collaboration diversity in terms of subgraph properties such as size and topology. In contrast to the predefined static motifs adopted by previous works, the collaboration subgraphs are found directly from the Wikipedia dataset by the frequent subgraph mining algorithm GRAMI, which is able to capture the real dynamic collaboration patterns. Moreover, the relationships between the co-authors in the subgraphs are also distinguished to further explore the collaboration patterns. The experiments exhibit the statistical properties of the collaboration subgraphs and their efficiency as metrics for article quality assessment. We conclude that a small group of editors with relatively frequent, fixed collaboration patterns contributes more to excellent article quality than the level of professional expertise of arbitrary individuals in the collaboration group. This discovery confirms the common insight about collaboration that many heads are always better than one, and concretely suggests a potential explanation for the increasing prevalence and success of online knowledge collaborations.

Author(s): Jagannadha Rao D. B.

This paper addresses this issue and devises a new method for frequent subgraph mining to retrieve from the database the valuable information that captured the users' attention. It proposes the recurrent-Gaston (R-Gaston) algorithm, which enhances the existing Gaston algorithm for the frequent subgraph mining process. The method defines the support for R-Gaston using measures based on the frequency and page duration parameters. The proposed R-Gaston is simulated on the weblog and MSNBC databases. It attains 184 mined structures and an execution time of 1282 ms on the MSNBC database, and 60 mined structures and an execution time of 75 ms on the weblog database.


2018, Vol 8 (1), pp. 194-209
Author(s): Büsra Güvenoglu, Belgin Ergenç Bostanoglu

Data mining is a popular research area that focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in this field, which makes graph mining one of the most popular subdivisions of data mining. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database, and this information can be used to classify, cluster and index the data. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.
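To make the notion of a user-defined support threshold concrete, the following minimal Python sketch counts in how many graphs of a small database each labelled edge occurs and keeps the patterns that reach the threshold. It covers only single-edge patterns, which is typically where candidate generation starts. The toy database `GRAPH_DB`, the function name `frequent_edges` and the parameter `min_support` are illustrative assumptions and are not taken from any of the surveyed algorithms.

```python
from collections import Counter

# Hypothetical toy database: each graph is a list of undirected edges,
# and each edge is a pair of vertex labels.
GRAPH_DB = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("A", "C")],
    [("B", "A"), ("C", "D")],
]


def frequent_edges(graph_db, min_support):
    """Return {edge_pattern: support} for single-edge patterns.

    Support is the number of graphs that contain the pattern at least once;
    only patterns with support >= min_support are returned.
    """
    support = Counter()
    for edges in graph_db:
        # Sorting the labels gives one canonical form per undirected edge;
        # the set ensures each graph contributes at most 1 to a pattern.
        patterns = {tuple(sorted(edge)) for edge in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


if __name__ == "__main__":
    print(frequent_edges(GRAPH_DB, min_support=2))
```

With the toy data above, only the edge between labels A and B appears in at least two graphs. Full algorithms would then extend such frequent patterns into larger candidates and repeat the frequency calculation, which requires subgraph isomorphism checks that this sketch deliberately omits.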


Distributed systems play a vital role in Frequent Subgraph Mining (FSM) for extracting frequent subgraphs from a large graph database. They help to reduce memory requirements and computational costs, and to increase data security, by distributing resources across distributed sites, which may be homogeneous or heterogeneous. In this paper, we address the data-complexity problems that arise in a centralized system by using the MapReduce framework. We propose a MapReduce-based Optimized Frequent Subgraph Mining (MOFSM) algorithm for large graph databases. We also compare our algorithm with existing methods on four real-world standard datasets to verify that it offers a better solution with respect to performance and scalability. Such algorithms extract subgraphs in a distributed system, which is important in real-world applications such as computer vision, social network analysis, bioinformatics, and financial and transportation networks.
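The general shape of a MapReduce-style support count can be sketched in plain Python as below. This is not the MOFSM algorithm, whose details the abstract does not give; it is an in-process illustration, under the assumption that the graph database is split into partitions, each mapper emits one pair per single-edge pattern per graph, and a reducer sums the counts. The names `map_phase`, `reduce_phase` and `partitions` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Mapper: for every graph in a partition, emit (edge_pattern, 1) once."""
    for edges in partition:
        # Canonical form for undirected edges; set avoids double counting
        # the same pattern within one graph.
        for pattern in {tuple(sorted(edge)) for edge in edges}:
            yield pattern, 1


def reduce_phase(pairs, min_support):
    """Reducer: sum the counts per pattern and keep the frequent ones."""
    totals = defaultdict(int)
    for pattern, count in pairs:
        totals[pattern] += count
    return {p: c for p, c in totals.items() if c >= min_support}


# Hypothetical data: two partitions, each holding two small graphs.
partitions = [
    [[("A", "B"), ("B", "C")], [("A", "B")]],
    [[("B", "A"), ("C", "D")], [("C", "D")]],
]

# In a real cluster the mappers run on separate sites; here they are chained.
emitted = chain.from_iterable(map_phase(p) for p in partitions)
frequent = reduce_phase(emitted, min_support=2)

if __name__ == "__main__":
    print(frequent)
```

Because each mapper only needs its own partition, no site has to hold the whole database in memory, which is the memory and cost benefit the abstract attributes to distribution.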

