An Approach for Interesting Subgraph Mining from Web Log Data Using W-Gaston Algorithm

Author(s):  
N. Jayalakshmi ◽  
P. Padmaja ◽  
G. Jaya Suma

Graph-Based Data Mining (GBDM) is an emerging research topic concerned with retrieving essential information from graph databases. Many algorithms exist for finding frequent patterns in a graph database. One such algorithm, GASTON, uses frequency-based support to discover frequent patterns. However, the discovery phase of GASTON is time-consuming, and the algorithm ignores the pages that captured users' interest. This paper proposes the Weighted-Gaston (W-Gaston) algorithm, a modification of the existing Gaston algorithm. Four interestingness measures based on frequency, entropy, and page duration are developed for retrieving interesting subgraphs. The proposed measures are four types of support: (1) support based on page duration (W-Support), (2) support based on entropy (E-Support), (3) support based on page duration and entropy (WE-Support), and (4) support based on frequency, page duration, and entropy (FWE-Support). The proposed work is simulated on the MSNBC and weblog databases. The experimental results show that the proposed algorithm performs well compared with the existing algorithms.
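
The abstract does not give formulas for the four supports, so the following Python sketch is purely illustrative. It contrasts GASTON-style frequency support with one assumed duration-weighted support (called W-Support here only by analogy). The Session model, the edge-subset occurrence test (a simplification of real subgraph isomorphism), and the W-Support formula are all hypothetical, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session model: directed page transitions plus dwell times."""
    edges: frozenset  # set of (from_page, to_page) tuples
    duration: dict    # page -> seconds spent on that page


def occurs(pattern, session):
    # Simplification: a pattern (a set of edges) occurs in a session if all of
    # its edges are present. GASTON itself uses subgraph isomorphism.
    return pattern <= session.edges


def frequency_support(sessions, pattern):
    """Frequency-based support: fraction of sessions containing the pattern."""
    return sum(occurs(pattern, s) for s in sessions) / len(sessions)


def w_support(sessions, pattern):
    """ASSUMED duration-weighted support: seconds spent on the pattern's pages
    in sessions that contain it, divided by total seconds over all sessions."""
    pages = {page for edge in pattern for page in edge}
    total = sum(sum(s.duration.values()) for s in sessions)
    on_pattern = sum(
        sum(s.duration.get(p, 0) for p in pages)
        for s in sessions
        if occurs(pattern, s)
    )
    return on_pattern / total if total else 0.0


# Hypothetical toy sessions.
sessions = [
    Session(frozenset({("home", "news"), ("news", "sports")}),
            {"home": 10, "news": 50, "sports": 40}),
    Session(frozenset({("home", "news")}), {"home": 5, "news": 15}),
    Session(frozenset({("home", "weather")}), {"home": 30, "weather": 50}),
]
p1 = frozenset({("home", "news")})
p2 = frozenset({("news", "sports")})
print(frequency_support(sessions, p1), w_support(sessions, p1))  # ~0.667, 0.4
print(frequency_support(sessions, p2), w_support(sessions, p2))  # ~0.333, 0.45
```

In this toy data the news-to-sports transition is less frequent than home-to-news but scores higher under the assumed W-Support, which is the kind of user-interest signal the abstract says frequency alone misses.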

2017 ◽  
Vol 79 (7) ◽  
Author(s):  
Chayanan Nawapornanan ◽  
Sarun Intakosum ◽  
Veera Boonjing

Share-frequent pattern mining is more practical than traditional frequent itemset mining because it can reflect useful knowledge such as the total costs and profits of patterns. Mining share-frequent patterns has become one of the most important research issues in data mining. However, previous algorithms generate a large number of candidates and spend much time generating and testing useless candidates during mining. This paper proposes a new, efficient method for discovering share-frequent patterns. The method reduces the number of candidates by generating candidates only from patterns with a high transaction-measure value. The downward closure property of transaction-measure-value patterns guarantees the correctness of the proposed method. Experimental results on dense and sparse datasets show that the proposed method is very efficient in terms of execution time. It also reduces the number of useless candidates generated during mining by at least 70%.
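
To illustrate why a transaction-measure-value bound helps, here is a minimal level-wise sketch in Python. It assumes the common share-mining definitions: each transaction maps items to a measure value (for example quantity or profit); share(X) is the measure contributed by X's own items in transactions containing X, divided by the total measure of the database; and tmv(X) sums the whole measure of those transactions. Because share(X) <= tmv(X) / total and tmv never grows for supersets, candidates whose tmv bound falls below the threshold can be pruned safely. This is a generic illustration, not the paper's exact algorithm; the data and function names are hypothetical.

```python
from itertools import combinations


def mine_share_frequent(db, min_share):
    """Level-wise share-frequent mining with a tmv-based pruning bound.

    db: list of transactions, each a dict mapping item -> measure value.
    Returns {itemset: share} for every itemset whose share >= min_share.
    """
    total = sum(sum(t.values()) for t in db)  # total measure of the database
    tmv = [sum(t.values()) for t in db]       # measure value of each transaction

    def tmv_of(itemset):
        # Bound numerator: whole measure of the transactions containing itemset.
        return sum(tmv[j] for j, t in enumerate(db) if itemset.issubset(t))

    def share(itemset):
        # Measure contributed by the itemset's own items where it occurs.
        local = sum(t[i] for t in db if itemset.issubset(t) for i in itemset)
        return local / total

    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items
             if tmv_of(frozenset([i])) / total >= min_share]
    result, k = {}, 1
    while level:
        for itemset in level:
            s = share(itemset)
            if s >= min_share:
                result[itemset] = s
        # Join itemsets of the current size into candidates one item larger ...
        k += 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # ... and keep only those whose tmv bound can still reach min_share.
        level = [c for c in candidates if tmv_of(c) / total >= min_share]
    return result


# Hypothetical toy database (item -> measure value per transaction).
DB = [
    {"a": 1, "b": 3, "c": 2},
    {"a": 2, "c": 1},
    {"b": 4, "d": 2},
    {"a": 1, "b": 1, "c": 4, "d": 1},
]
for itemset, s in sorted(mine_share_frequent(DB, 0.35).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(s, 3))
```

The toy data also shows why share itself cannot drive pruning: {a, b} has share 6/22 (about 0.27), below the threshold, yet its superset {a, b, c} reaches 12/22 (about 0.55).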


2013 ◽  
Vol 760-762 ◽  
pp. 1896-1901 ◽  
Author(s):  
Chuan Qi Chen

Fuzzy clustering analysis is a clustering approach based on optimizing an objective (cost) function, using calculus to find its optimum. In fuzzy clustering, a sample no longer belongs to a single class; instead, it belongs to each class with a certain degree of membership. In this paper, sequential pattern mining of Web logs is used to gain knowledge of how users interact with the Web information space, grouping visitors who share the same browsing pattern. The paper presents a Web log data mining analysis based on an improved fuzzy clustering algorithm. The experiment demonstrates that the improved algorithm has better scalability.
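
The membership idea can be made concrete with standard fuzzy c-means (FCM), which alternates between recomputing cluster centers as u^m-weighted means and recomputing memberships from relative distances. The sketch below is plain FCM, not the paper's improved variant (which the abstract does not describe); the fuzzifier m = 2, the stopping rule, and the two-feature session vectors in the usage example are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (Bezdek). Returns (centers, u) where u[j][i] is
    the degree of membership of point j in cluster i (each row sums to 1)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        u.append([v / sum(row) for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / sum(w)
                            for d in range(dim)])
        # Membership update: u_ji = 1 / sum_k (dist_ji / dist_jk)^(2 / (m - 1)).
        new_u = []
        for j in range(n):
            dists = [math.dist(points[j], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Point coincides with a center: split membership among such centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[i] / dk) ** (2 / (m - 1)) for dk in dists)
                              for i in range(c)])
        change = max(abs(new_u[j][i] - u[j][i]) for j in range(n) for i in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u


# Hypothetical session features: normalized time on news vs. sports pages.
sessions = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05), (0.0, 1.0), (0.1, 0.9), (0.05, 0.95)]
centers, u = fuzzy_c_means(sessions, c=2)
for point, row in zip(sessions, u):
    print(point, [round(x, 3) for x in row])
```

Each session receives a membership degree in both clusters rather than a hard label; a crisp grouping, if needed, can be read off as the cluster with the highest membership.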


2005 ◽  
Vol 2 (1) ◽  
pp. 103-118 ◽  
Author(s):  
Zengyou He ◽  
Xiaofei Xu ◽  
Zhexue Huang ◽  
Shengchun Deng

An outlier in a dataset is an observation or point that is considerably dissimilar to, or inconsistent with, the remainder of the data. Detecting such outliers is important for many applications and has recently attracted much attention in the data mining research community. In this paper, we present a new method that detects outliers by discovering frequent patterns (or frequent itemsets) in the dataset. Outliers are defined as the data transactions whose itemsets contain fewer frequent patterns. We define a measure called FPOF (Frequent Pattern Outlier Factor) to detect outlier transactions and propose the FindFPOF algorithm to discover them. The experimental results show that our approach outperforms existing methods at identifying interesting outliers.
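
A minimal Python sketch of the FPOF idea, assuming the definition FPOF(t) = (sum of support(X) over frequent itemsets X contained in t) / (number of frequent itemsets), with support expressed as a fraction of transactions. Frequent itemsets are mined here with a simple Apriori-style loop, and the top-k outliers are taken to be the transactions with the lowest FPOF. The helper names and toy data are illustrative, not taken from the paper.

```python
from itertools import combinations


def frequent_itemsets(db, min_support):
    """Apriori-style mining; support is the fraction of transactions containing X."""
    n = len(db)
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    freq, k = {}, 1
    while level:
        supports = {x: sum(1 for t in db if x <= t) / n for x in level}
        current = [x for x, s in supports.items() if s >= min_support]
        freq.update((x, supports[x]) for x in current)
        k += 1
        level = list({a | b for a, b in combinations(current, 2) if len(a | b) == k})
    return freq


def fpof_scores(db, min_support):
    """Assumed FPOF(t): summed support of frequent itemsets contained in t,
    divided by the total number of frequent itemsets."""
    fps = frequent_itemsets(db, min_support)
    if not fps:
        return [0.0] * len(db)
    return [sum(s for x, s in fps.items() if x <= t) / len(fps) for t in db]


def find_fpof(db, min_support, top_k):
    """Indices of the top_k transactions with the lowest FPOF (likely outliers)."""
    scores = fpof_scores(db, min_support)
    return sorted(range(len(db)), key=lambda j: scores[j])[:top_k]


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"x", "y"}]
print(fpof_scores(db, 0.4))
print(find_fpof(db, 0.4, top_k=1))  # [4]
```

With a minimum support of 0.4 there are seven frequent itemsets, all drawn from {a, b, c}; transaction 4 ({x, y}) contains none of them, so its FPOF is 0 and it is reported first.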

