Frequent Itemsets as Descriptors of Textual Records

New Descriptors of Textual Records: Getting Help from Frequent Itemsets

Vietnam Journal of Computer Science ◽

10.1142/s2196888820500207 ◽

2020 ◽

Vol 07 (04) ◽

pp. 355-372

Author(s):

Ayoub Bokhabrine ◽

Ismaïl Biskri ◽

Nadia Ghazzali

Keyword(s):

Social Activity ◽

Numerical Data ◽

Frequent Itemsets ◽

Clustering Methods ◽

Support Set ◽

Textual Data ◽

Different Types ◽

Lexical Quality ◽

Textual Records

The analysis of numerical data, whether structured, semi-structured, or raw, is of paramount importance in many sectors of economic, scientific, or simply social activity. The extraction of association rules depends on the lexical quality of the text and on the minimum support set by the user. In this paper, we implement a platform named “IDETEX” that extracts itemsets from textual data and uses them in experiments with different types of clustering methods, such as k-Medoids and hierarchical clustering. The experiments conducted demonstrate the potential of the proposed approach for defining similarity between segments.
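The idea of describing text segments by frequent itemsets can be illustrated with a minimal sketch. It is not the IDETEX implementation: the whitespace tokenization, the brute-force enumeration limited to itemsets of size `max_size`, the Jaccard similarity, and all function names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(segments, min_support, max_size=2):
    """Return {itemset: support} for word sets found in >= min_support segments.

    Brute-force enumeration (assumption: small inputs, itemsets up to max_size).
    """
    counts = Counter()
    for segment in segments:
        words = sorted(set(segment.lower().split()))
        for size in range(1, max_size + 1):
            for combo in combinations(words, size):
                counts[frozenset(combo)] += 1
    return {iset: c for iset, c in counts.items() if c >= min_support}


def describe(segment, itemsets):
    """Descriptor of a segment: the frequent itemsets it fully contains."""
    words = set(segment.lower().split())
    return {iset for iset in itemsets if iset <= words}


def jaccard(a, b):
    """Jaccard similarity of two descriptor sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


segments = [
    "frequent itemsets describe text",
    "frequent itemsets cluster text",
    "cluster text records",
    "heart disease medical data",
]
itemsets = frequent_itemsets(segments, min_support=2)
descriptors = [describe(s, itemsets) for s in segments]
sim_01 = jaccard(descriptors[0], descriptors[1])
sim_02 = jaccard(descriptors[0], descriptors[2])
sim_03 = jaccard(descriptors[0], descriptors[3])
print(sim_01, sim_02, sim_03)
```

A pairwise similarity matrix built this way could then be passed to a clustering method such as k-Medoids or hierarchical clustering. Raising `min_support` shrinks the descriptor vocabulary and makes segments look less alike.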


An Efficient Approach of Extracting Frequent Itemsets from Large Data Using HDFS Framework

International Journal on Communications Antenna and Propagation (IRECAP) ◽

10.15866/irecap.v7i6.13354 ◽

2017 ◽

Vol 7 (6) ◽

pp. 529

Author(s):

Prajakta G. Kulkarni ◽

S. R. Khonde

Keyword(s):

Large Data ◽

Frequent Itemsets ◽

Efficient Approach


Predicting Heart-Diseases from Medical Dataset Through Frequent Itemsets Using Improved Algorithm

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i8.325331 ◽

2018 ◽

Vol 6 (8) ◽

pp. 325-331

Author(s):

V. Vijayalakshmi

Keyword(s):

Heart Diseases ◽

Frequent Itemsets ◽

Medical Dataset ◽

Improved Algorithm


Frequent itemsets grouping algorithm based on Hash list

Journal of Computer Applications ◽

10.3724/sp.j.1087.2013.03045 ◽

2013 ◽

Vol 33 (11) ◽

pp. 3045-3048

Author(s):

Hongmei WANG ◽

Ming HU

Keyword(s):

Frequent Itemsets ◽

Grouping Algorithm


A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling-based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to compute but also satisfies the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it guarantees the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator significantly improves the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
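For readers unfamiliar with KMV synopses, the following sketch shows the classical construction and the standard intersection estimator. It is not the novel estimator proposed in the article. It assumes that each item is summarized by the set of identifiers of the records containing it, and that the frequency of an itemset is the size of the intersection of those sets. The hash function and all names are illustrative choices.

```python
import hashlib


def unit_hash(key):
    """Map a key to a pseudo-uniform value in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(keys, k):
    """Keep the k smallest distinct hash values of the given keys."""
    return sorted({unit_hash(x) for x in keys})[:k]


def estimate_distinct(synopsis, k):
    """Classical KMV distinct-count estimate: (k - 1) / (k-th smallest hash)."""
    if len(synopsis) < k:
        return float(len(synopsis))  # fewer than k values seen: count is exact
    return (k - 1) / synopsis[k - 1]


def estimate_intersection(syn_a, syn_b, k):
    """Classical KMV estimate of |A ∩ B| from two synopses built with the same k."""
    set_a, set_b = set(syn_a), set(syn_b)
    union = sorted(set_a | set_b)[:k]
    if len(union) < k:
        return float(len(set_a & set_b))  # both synopses are complete: exact
    shared = sum(1 for h in union if h in set_a and h in set_b)
    return (shared / k) * (k - 1) / union[k - 1]


# Hypothetical data: identifiers of the records that contain each item.
ids_bread = range(0, 50)
ids_milk = range(25, 75)
k = 256
syn_bread = kmv_synopsis(ids_bread, k)
syn_milk = kmv_synopsis(ids_milk, k)
freq_bread_milk = estimate_intersection(syn_bread, syn_milk, k)
print(freq_bread_milk)
```

With fewer than `k` distinct identifiers per item, as in this toy input, the synopses are complete and the result is exact. On larger streams the estimate becomes approximate, with an error that decreases as `k` grows.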
