NUCLEAR: An Efficient Methods for Mining Frequent Itemsets and Generators from Closed Frequent Itemsets

Frequent itemset (FI) mining is an interesting data mining task. Instead of mining the FIs directly from the data, it is often preferable to mine only the closed frequent itemsets (CFIs) first and then extract the FIs from each CFI. However, some algorithms require the generators of each CFI in order to extract the FIs, which incurs extra cost. In this paper, we introduce an effective algorithm, called NUCLEAR, which derives the FIs from the lattice of CFIs without needing the generators. It can also enumerate the generators in a similar fashion. Experimental results show that NUCLEAR is effective compared to previous studies; in particular, the time for extracting the FIs is usually much smaller than that for mining the CFIs.
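To make the CFI-to-FI step concrete, here is a minimal brute-force sketch. It is not the NUCLEAR algorithm, whose lattice-based enumeration the abstract does not detail. It relies only on the standard fact that every frequent itemset is a subset of some closed frequent itemset and that its support equals the largest support among its closed supersets. The function name and the toy data are illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets_from_closed(closed):
    """Expand closed frequent itemsets into all frequent itemsets.

    closed: dict mapping frozenset(items) -> support.
    Returns a dict mapping every frequent itemset -> support.
    """
    frequent = {}
    for cfi, support in closed.items():
        items = sorted(cfi)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                # The support of an itemset is the maximum support
                # among the closed itemsets that contain it.
                if support > frequent.get(key, 0):
                    frequent[key] = support
    return frequent


# Toy closed itemsets (hypothetical data): a:4, ab:3, abc:2.
closed = {frozenset("a"): 4, frozenset("ab"): 3, frozenset("abc"): 2}
for itemset, support in sorted(
    frequent_itemsets_from_closed(closed).items(),
    key=lambda kv: (len(kv[0]), sorted(kv[0])),
):
    print("".join(sorted(itemset)), support)
```

This version revisits the same subset once for every closed superset, which is exactly the kind of redundant work a lattice-aware method would be expected to avoid.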

Mining Frequent Itemsets Over Uncertain Database using Matrix

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f3824.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2048-2052

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Apriori Algorithm ◽

Computation Complexity ◽

Support Count ◽

Frequent Items ◽

Uncertain Database ◽

Modern Era ◽

Mining Frequent Itemsets

Many algorithms exist for finding frequent itemsets in huge databases, with the Apriori algorithm serving as the basis for them all. In the UApriori algorithm, each item's existential probability is compared against a given support count: items that meet or exceed it are called frequent items, and the rest are called infrequent items. This paper introduces a matrix technique on top of UApriori that reduces the execution time and computational complexity of finding frequent itemsets in an uncertain transactional database. In the modern era, the volume of data is increasing exponentially, and highly optimized algorithms are needed to process such large amounts of data in less time. The proposed algorithm can be used in data mining to retrieve frequent itemsets from large databases with very low computational complexity.
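The abstract does not describe the matrix layout, so the following is only a sketch under stated assumptions: rows are transactions, columns are items, and each cell holds the item's existential probability in that transaction. It also assumes the usual expected-support definition for uncertain data (sum over transactions of the product of the item probabilities, treating items as independent). All names and numbers are hypothetical.

```python
from math import prod


def expected_support(matrix, columns):
    """Expected support of the itemset given by column indices."""
    return sum(prod(row[c] for c in columns) for row in matrix)


def frequent_items(matrix, item_names, min_esup):
    """Single items whose expected support reaches the threshold."""
    return [
        name
        for c, name in enumerate(item_names)
        if expected_support(matrix, [c]) >= min_esup
    ]


item_names = ["bread", "milk", "eggs"]
# Assumed layout: one row per transaction, one column per item.
matrix = [
    [0.9, 0.5, 0.0],
    [0.8, 0.5, 0.2],
    [0.0, 1.0, 0.3],
]

print(frequent_items(matrix, item_names, min_esup=1.0))
print(expected_support(matrix, [0, 1]))  # itemset {bread, milk}
```

Storing the probabilities in a matrix means each candidate itemset can be evaluated by column lookups alone, without rescanning the original transactions.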

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling-based methods, such as the popular reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
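As background for the synopsis idea, the sketch below shows only the basic KMV distinct-count estimator: keep the k smallest hash values of the keys and estimate the number of distinct keys as (k - 1) divided by the k-th smallest value. It does not reproduce the itemset estimator proposed in the article. The use of MD5 as the hash, the helper names, and the sample keys are all assumptions made for illustration.

```python
import hashlib


def hash01(value):
    """Map a value to a pseudo-uniform number in [0, 1)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(keys, k):
    """Keep the k smallest distinct hash values of the keys."""
    return sorted({hash01(key) for key in keys})[:k]


def estimate_distinct(synopsis, k):
    """Estimate the number of distinct keys from a KMV synopsis (k >= 2)."""
    if len(synopsis) < k:
        # Fewer than k distinct keys were seen, so the count is exact.
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


# Repeated keys (e.g. one customer with several records) count once.
small = kmv_synopsis(["u1", "u2", "u3", "u1"], k=8)
print(estimate_distinct(small, k=8))

large = kmv_synopsis(range(10_000), k=256)
print(round(estimate_distinct(large, k=256)))
```

Because a key always hashes to the same value, repeated transactions for the same key do not inflate the synopsis, which is the property that makes this family of synopses a fit for multi-transaction streams.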

Security and Verification of Server Data Using Frequent Itemset Mining in Ecommerce

International Journal of Synthetic Emotions ◽

10.4018/ijse.2017010103 ◽

2017 ◽

Vol 8 (1) ◽

pp. 31-43

Author(s):

Zuber Shaikh ◽

Antara Mohadikar ◽

Rachana Nayak ◽

Rohith Padamadan

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Graphical Password ◽

Itemset Mining ◽

Frequent Item ◽

Data Mining Algorithms ◽

Shoulder Surfing ◽

Mining Algorithms ◽

Frequent Item Sets

Frequent itemsets are sets of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has omitted any of them from its returned result. This helps the client detect an untrusted server and makes the system more efficient by reducing time. In the authentication process, CaRP serves as both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201357 ◽

2020 ◽

pp. 1-16

Author(s):

Rui Sun ◽

Meng Han ◽

Chunyan Zhang ◽

Mingyao Shen ◽

Shiyu Du

Keyword(s):

Data Mining ◽

Search Space ◽

Experimental Results ◽

Effective Algorithm ◽

Memory Usage ◽

Utility Value ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, it can only mine itemsets with positive items, and itemsets are missed when it is applied to data with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically increasing the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
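The threshold-raising idea can be illustrated with a small brute-force sketch. This is not THN: it enumerates every itemset and omits the pruning, transaction merging, and projection the abstract mentions. It assumes that the threshold starts at 0, that only itemsets with positive total utility are of interest, and that an item's utility in a transaction is quantity times unit profit, where the profit may be negative. All names and data are hypothetical.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, transactions, profits):
    """Sum quantity * profit over transactions containing the whole itemset."""
    total = 0
    for quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * profits[item] for item in itemset)
    return total


def top_k_itemsets(transactions, profits, k):
    """Return the k highest-utility itemsets and the final threshold."""
    items = sorted(profits)
    heap = []      # min-heap holding the best k (utility, itemset) pairs
    min_util = 0   # assumed starting threshold
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profits)
            if utility <= 0 or utility < min_util:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, itemset))
            if len(heap) == k:
                # Raise the threshold to the k-th best utility seen so far.
                min_util = heap[0][0]
    return sorted(heap, reverse=True), min_util


profits = {"a": 5, "b": -2, "c": 3}  # item b has negative unit profit
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 2}]
print(top_k_itemsets(transactions, profits, k=2))
```

Once k itemsets have been found, the threshold only ever moves upward, so later candidates are rejected against a progressively stricter bound without the user having to choose one in advance.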

FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases

Computing Letters ◽

10.1163/1574040054861285 ◽

2005 ◽

Vol 1 (3) ◽

pp. 129-135

Author(s):

Jun Luo ◽

Sanguthevar Rajasekaran

Keyword(s):

Data Mining ◽

Association Rules ◽

Fast Algorithm ◽

Frequent Itemsets ◽

Experimental Results ◽

Important Data ◽

Computational Performance ◽

Large Databases ◽

Intersection Operation ◽

Better Than

Association rule mining is an important data mining problem that has been studied extensively. In this paper, a simple but Fast algorithm for Intersecting attributes lists using hash Tables (FIT) is presented. FIT is designed for efficiently computing all the frequent itemsets in large databases. It deploys an idea similar to Eclat but has much better computational performance than Eclat for two reasons: 1) FIT makes fewer comparisons in total for each intersection operation between two attribute lists, and 2) FIT significantly reduces the total number of intersection operations. Our experimental results demonstrate that the performance of FIT is much better than that of the Eclat and Apriori algorithms.
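For readers unfamiliar with the Eclat-style approach that FIT builds on, the following minimal sketch mines frequent itemsets by intersecting transaction-ID lists. It uses Python sets, which are hash-based, but it does not implement FIT's specific optimizations, which the abstract does not spell out. The function name and toy data are assumptions.

```python
def eclat(tidlists, min_support):
    """Mine frequent itemsets from a vertical layout.

    tidlists: dict mapping item -> set of transaction IDs containing it.
    Returns a dict mapping itemset tuples -> support.
    """
    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # one intersection operation
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidlists.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


tidlists = {"a": {1, 2, 3, 4}, "b": {1, 2, 3}, "c": {2, 3, 5}, "d": {5}}
for itemset, support in eclat(tidlists, min_support=2).items():
    print(itemset, support)
```

Each candidate costs one intersection, so both the cost per intersection and the number of intersections drive the runtime, which are the two quantities the abstract says FIT reduces.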

Effective algorithm of mining frequent itemsets for association rules

Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826) ◽

10.1109/icmlc.2004.1382001 ◽

2005 ◽

Author(s):

Pei-Qi Liu ◽

Zeng-Zhi Li ◽

Yin-Liang Zhao

Keyword(s):

Association Rules ◽

Frequent Itemsets ◽

Effective Algorithm ◽

Mining Frequent Itemsets

Research into the Algorithm of Frequent Pattern Mining Based on across Linker

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.195-196.984 ◽

2012 ◽

Vol 195-196 ◽

pp. 984-986

Author(s):

Ming Ru Zhao ◽

Yuan Sun ◽

Jian Guo ◽

Ping Ping Dong

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemsets ◽

Frequent Pattern ◽

Apriori Algorithm ◽

Important Data ◽

Classical Algorithm ◽

Frequent Itemsets Mining ◽

Mining Frequent Itemsets

Frequent itemset mining is an important data mining task and a central theme in data mining research. The Apriori algorithm is one of the most important algorithms for mining frequent itemsets. However, Apriori scans the database too many times, so its efficiency is relatively low. This paper therefore studies a frequent itemset mining algorithm based on an across linker. Compared with the classical algorithm, the improved algorithm shows clear advantages.
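The repeated-scan cost of classical Apriori can be seen in a short level-wise sketch that counts one full pass over the transactions per candidate size. This illustrates the classical baseline only, not the across-linker method, which the abstract does not describe. The function name and sample transactions are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori; returns (frequent itemsets, number of scans)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent, scans, size = {}, 0, 1
    while candidates:
        scans += 1  # one full pass over the database per level
        counts = {
            c: sum(1 for t in transactions if c <= t) for c in candidates
        }
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent, scans


transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"},
]
frequent, scans = apriori(transactions, min_support=3)
print(len(frequent), "frequent itemsets found in", scans, "scans")
```

The number of passes grows with the size of the longest candidate, which is the inefficiency the abstract attributes to Apriori.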

A novel collaborative filtering algorithm by bit mining frequent itemsets

10.7287/peerj.preprints.26444 ◽

2018 ◽

Author(s):

Loc Nguyen ◽

Minh-Phung T. Do

Keyword(s):

Collaborative Filtering ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Storage Space ◽

Model Based ◽

Collaborative Filtering Algorithm ◽

Speed Up ◽

Mining Frequent Itemsets ◽

Bitwise Operations ◽

Process Mode

Collaborative filtering (CF) is a popular technique in recommendation research. Concretely, the items recommended to a user are determined by surveying her/his communities. There are two main CF approaches: memory-based and model-based. I propose a new model-based CF algorithm that mines frequent itemsets from a rating database. Items that belong to frequent itemsets are then recommended to the user. My CF algorithm gives an immediate response because the mining task is performed in offline process mode. I also propose another algorithm, called Roller, for improving the process of mining frequent itemsets. The Roller algorithm is based on the heuristic assumption “The larger the support of an item is, the more likely it is that this item will occur in some frequent itemset”. It is modeled on the task of whitewashing, in which a roller is rolled over a wall in such a way that it picks up frequent itemsets. Moreover, I provide enhanced techniques such as bit representation, bit matching and bit mining in order to speed up the recommendation process. These techniques take advantage of bitwise operations (AND, NOT) to reduce storage space and make the algorithms run faster.
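One common way to use bitwise operations for support counting is sketched below. It assumes a layout in which each item is stored as an integer bitset with one bit per transaction, so the support of an itemset is the number of set bits after AND-ing its items' bitsets. The abstract does not specify the author's exact bit representation, so this layout and all names are assumptions.

```python
def to_bitsets(transactions, items):
    """Build one integer bitset per item; bit t is set if transaction t has it."""
    bitsets = {item: 0 for item in items}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            bitsets[item] |= 1 << t
    return bitsets


def support(bitsets, itemset):
    """Count transactions containing every item of a non-empty itemset."""
    iterator = iter(itemset)
    mask = bitsets[next(iterator)]
    for item in iterator:
        mask &= bitsets[item]  # bitwise AND keeps shared transactions only
    return bin(mask).count("1")


transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bitsets = to_bitsets(transactions, items="abc")
print(support(bitsets, "ab"))
print(support(bitsets, "abc"))
```

Each item costs one bit per transaction, and an intersection is a single integer AND, which is consistent with the storage and speed benefits the abstract claims for bitwise operations.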

SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

The Journal of Supercomputing ◽

10.1007/s11227-020-03190-5 ◽

2020 ◽

Vol 76 (10) ◽

pp. 7619-7634 ◽

Cited By ~ 2

Author(s):

Wen Xiao ◽

Juan Hu

Keyword(s):

Data Mining ◽

Data Processing ◽

Sliding Window ◽

Frequent Itemsets ◽

Streaming Data ◽

Frequent Itemset ◽

Apache Spark ◽

Itemset Mining ◽

Mining Algorithm ◽

Vertical Data

Finding frequent itemsets in continuous streaming data is an important data mining task which is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytics engine for massive data processing which has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm, named SWEclat, for mining frequent itemsets over massive streaming data. The algorithm uses a sliding window to process streaming data and uses a vertical data structure to store the dataset in the sliding window. The algorithm is implemented on Apache Spark and uses Spark RDDs to store the streaming data and the dataset in vertical data format, so that these RDDs can be divided into partitions for distributed processing. Experimental results show that the SWEclat algorithm has good acceleration, parallel scalability and load balancing.
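The combination of a sliding window with a vertical layout can be sketched in a single process, without Spark. The class below keeps the most recent transactions in a queue and maintains an item-to-transaction-ID map that is updated as transactions enter and leave the window. It is an illustration under stated assumptions, not the SWEclat implementation: the class name, the count-based window, and the absence of partitioning are all choices made here.

```python
from collections import defaultdict, deque


class SlidingWindowTidlists:
    """Vertical (item -> transaction IDs) index over the last N transactions."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()             # (tid, items) in arrival order
        self.tidlists = defaultdict(set)  # item -> tids inside the window
        self.next_tid = 0

    def add(self, items):
        tid = self.next_tid
        self.next_tid += 1
        items = frozenset(items)
        self.window.append((tid, items))
        for item in items:
            self.tidlists[item].add(tid)
        if len(self.window) > self.window_size:
            # Evict the oldest transaction and remove its ID from the index.
            old_tid, old_items = self.window.popleft()
            for item in old_items:
                self.tidlists[item].discard(old_tid)
                if not self.tidlists[item]:
                    del self.tidlists[item]

    def support(self, itemset):
        tids = [self.tidlists.get(item, set()) for item in itemset]
        return len(set.intersection(*tids)) if tids else 0


window = SlidingWindowTidlists(window_size=3)
for transaction in (["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "c"]):
    window.add(transaction)
print(window.support(["a", "b"]))  # the oldest transaction has been evicted
```

In a distributed setting, the item-to-ID map is the structure one would partition across workers; here it lives in one dictionary to keep the example self-contained.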

Apriori, Association Rules, Data Mining, Frequent Itemsets Mining (FIM), Parallel Computing

Fourth International Conference on Software Engineering Research, Management and Applications (SERA'06) ◽

10.1109/sera.2006.17 ◽

2006 ◽

Cited By ~ 1

Author(s):

M. Yoshikawa ◽

H. Terai

Keyword(s):

Data Mining ◽

Parallel Computing ◽

Association Rules ◽

Frequent Itemsets ◽

Frequent Itemsets Mining ◽

Mining Frequent Itemsets
