Mining interestingness measures for string pattern mining

2012 ◽  
Vol 25 (1) ◽  
pp. 45-50 ◽  
Author(s):  
M. Baena-García ◽  
R. Morales-Bueno
Author(s):  
Pradeep Kumar ◽  
Raju S. Bapi ◽  
P. Radha Krishna

Interestingness measures play an important role in finding frequently occurring patterns, regardless of the kind of patterns being mined. In this work, we propose a variation of the AprioriAll algorithm, which is commonly used for sequence pattern mining. The proposed variation applies an interest measure at every step of candidate generation to reduce the number of candidates, thereby reducing time and space cost. The proposed algorithm derives patterns that qualify and are of greater interest to the user. By using the interest measure, the algorithm limits the size of the candidate set whenever it is produced, giving more importance to the user's interest in obtaining the desired patterns.
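
The idea of filtering candidates by an interest measure at every level can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified level-wise sequence miner (candidates are grown by appending one item, rather than by the AprioriAll join), and the names `mine`, `interest`, and `min_interest` are hypothetical. The abstract does not say which interest measure is used, so the sketch takes it as a user-supplied function.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern: Pattern, db: List[Sequence[str]]) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in db) / len(db)


def mine(
    db: List[Sequence[str]],
    min_support: float,
    interest: Callable[[Pattern], float],  # hypothetical user-supplied measure
    min_interest: float,
    max_len: int = 5,
) -> Dict[Pattern, float]:
    """Level-wise mining that prunes each candidate set by support AND interest.

    Pruning on interest only preserves completeness if the measure is
    anti-monotone; otherwise it is a heuristic that trades recall for speed.
    """
    items = sorted({x for s in db for x in s})
    level: List[Pattern] = [(i,) for i in items]
    result: Dict[Pattern, float] = {}
    for _ in range(max_len):
        kept: List[Pattern] = []
        for cand in level:
            sup = support(cand, db)
            if sup >= min_support and interest(cand) >= min_interest:
                kept.append(cand)
                result[cand] = sup
        if not kept:
            break
        # Only surviving patterns are extended, so the next candidate set shrinks.
        level = [p + (i,) for p in kept for i in items]
    return result
```

With a measure that favours patterns starting with a particular item, for example, every level only extends the patterns the user cares about, which is where the reduction in candidates comes from.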


Author(s):  
Tetsushi Matsui ◽  
Takeaki Uno ◽  
Juzoh Umemori ◽  
Tsuyoshi Koide

2020 ◽  
Vol 10 (6) ◽  
pp. 1991
Author(s):  
Kerstin Neubarth ◽  
Darrell Conklin

A core issue of computational pattern mining is the identification of interesting patterns. When mining music corpora organized into classes of songs, patterns may be of interest because they are characteristic, describing prevalent properties of classes, or because they are discriminant, capturing distinctive properties of classes. Existing work in computational music corpus analysis has focused on discovering discriminant patterns. This paper studies characteristic patterns, investigating the behavior of different pattern interestingness measures in balancing coverage and discriminability of classes in top-k pattern mining and in individual top-ranked patterns. Characteristic pattern mining is applied to the collection of Native American music by Frances Densmore, and the discovered patterns are shown to be supported by Densmore’s own analyses.


2016 ◽  
Vol 44 (1) ◽  
pp. 74-90 ◽  
Author(s):  
Dilip Singh Sisodia ◽  
Vijay Khandal ◽  
Riya Singhal

The prediction of users’ browsing behaviours is essential for putting appropriate information on the web. The browsing behaviours are stored as navigational patterns in web server logs. These weblogs are used to predict the frequently accessed patterns of web users, which can be used to predict user behaviour and to collect business intelligence. However, owing to the exponentially increasing weblog size, existing implementations of frequent-pattern-mining algorithms often take too much time and generate too many redundant patterns. This article introduces the most interesting pattern-based parallel FP-growth (MIP-PFP) algorithm. MIP-PFP is an improved implementation of the parallel FP-growth algorithm, implemented on the Apache Spark platform for extracting frequent patterns from huge weblogs. Experiments were performed on openly available National Aeronautics and Space Administration (NASA) weblog data to test the effectiveness of the MIP-PFP algorithm. The results were compared with existing implementations of the PFP algorithm. The results suggest that the MIP-PFP algorithm running on Apache Spark reduced the execution time by a factor of more than 10. The effect of the length of the sequences used as input to the MIP-PFP algorithm was also evaluated with different interestingness parameters, including support, confidence, lift, leverage, cosine, and conviction. The experimental results show that only sequences of length greater than three produced very low support values under these interestingness measures.
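
For reference, the six interestingness parameters named above can be computed from simple co-occurrence counts. The sketch below uses the standard textbook definitions for a rule A -> B; the abstract does not state the exact formulas used with MIP-PFP, so treat these as an assumption, and the function name `rule_measures` as hypothetical.

```python
import math
from typing import Dict


def rule_measures(n: int, n_a: int, n_b: int, n_ab: int) -> Dict[str, float]:
    """Standard interestingness measures for a rule A -> B.

    n    -- total number of transactions (or sessions)
    n_a  -- transactions containing A
    n_b  -- transactions containing B
    n_ab -- transactions containing both A and B
    Assumes n, n_a and n_b are all greater than zero.
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    confidence = n_ab / n_a
    return {
        "support": p_ab,
        "confidence": confidence,
        # Ratio of observed co-occurrence to that expected under independence.
        "lift": p_ab / (p_a * p_b),
        # Difference between observed and expected co-occurrence.
        "leverage": p_ab - p_a * p_b,
        "cosine": p_ab / math.sqrt(p_a * p_b),
        # Undefined (taken as infinite) when the rule never fails.
        "conviction": math.inf if confidence == 1 else (1 - p_b) / (1 - confidence),
    }


# Example: 100 sessions, A in 40, B in 50, both in 30.
m = rule_measures(n=100, n_a=40, n_b=50, n_ab=30)
```

A lift above 1 and a positive leverage both indicate that A and B co-occur more often than independence would predict; conviction grows without bound as confidence approaches 1.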


Data sharing among organizations is a general trend in several areas, such as business development and marketing. Some of the sensitive rules that ought to be kept private may be disclosed, and such disclosure of sensitive patterns may affect the profits of the organization that owns the data. The sensitive rules must therefore be protected before the data is shared. In this paper, to provide secure data sharing, the sensitive rules, which are found using a frequent pattern tree, are perturbed first. Here the sensitive set of rules is perturbed by substitution. This kind of substitution reduces the risk and increases the utility of the dataset compared with other methods. The experiment is done on a real-world dataset. Results show that the proposed work is better than various previous approaches on the basis of the evaluation parameters.

