NUCLEAR: An Efficient Methods for Mining Frequent Itemsets and Generators from Closed Frequent Itemsets

2021 ◽  
Vol 7 (2) ◽  
Author(s):  
Huy Quang Pham, Duc Tran, Ninh Bao Duong, Philippe Fournier-Viger, Alioune Ngom

Frequent itemset (FI) mining is an interesting data mining task. Instead of mining the FIs directly from the data, it is preferable to mine only the closed frequent itemsets (CFIs) first and then extract the FIs from each CFI. However, some algorithms require the generators of each CFI in order to extract the FIs, which incurs an extra cost. In this paper, we introduce an effective algorithm, called NUCLEAR, which can induce the FIs from the lattice of CFIs without the need for generators. It can also enumerate the generators in a similar fashion. Experimental results show that NUCLEAR is effective compared to previous studies; in particular, the time for extracting the FIs is usually much smaller than that for mining the CFIs.
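As an illustration of why CFIs are sufficient, the following minimal sketch expands a set of closed itemsets into all frequent itemsets. It relies only on the standard property that every frequent itemset is a subset of some closed frequent itemset and takes the largest support among the closed itemsets containing it. This is a brute-force enumeration, not the NUCLEAR algorithm (which, per the abstract, works on the lattice of CFIs); the function name `frequent_from_closed` and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_from_closed(closed):
    """Expand closed frequent itemsets into all frequent itemsets.

    `closed` maps frozenset -> support. The support of a frequent itemset is
    the largest support among the closed itemsets that contain it.
    """
    result = {}
    for cfi, support in closed.items():
        items = sorted(cfi)
        # Enumerate every non-empty subset of this closed itemset.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                if support > result.get(key, 0):
                    result[key] = support
    return result


# Hypothetical toy input: the closed itemsets of the database
# {a,b,c}, {a,b,c}, {a,c}, {c} at a minimum support of 2.
closed = {
    frozenset("c"): 4,
    frozenset("ac"): 3,
    frozenset("abc"): 2,
}
fis = frequent_from_closed(closed)
for itemset, support in sorted(fis.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The subset enumeration is exponential in the size of each closed itemset, so this form is only suitable for small examples.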

In data mining, many algorithms exist for finding frequent itemsets in huge databases, and the Apriori algorithm is the basis of them all. In the Uapriori algorithm, each item's existential probability is compared against a given support count: if it is greater than or equal to that count, the item is considered frequent; otherwise, it is considered infrequent. This paper introduces a matrix technique on top of the Uapriori algorithm, which reduces the execution time and computational complexity of finding frequent itemsets in an uncertain transactional database. In the modern era, the volume of data is increasing exponentially, and highly optimized algorithms are needed to process such large amounts of data in less time. The proposed algorithm can be used in data mining to retrieve frequent itemsets from a large database with very low computational complexity.
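A minimal sketch of support counting over uncertain data follows. It assumes the commonly used expected-support model, in which each item in a transaction carries an independent existential probability and an itemset's expected support is the sum, over transactions, of the product of those probabilities. The abstract does not spell out this formula or the matrix technique, so the model, the function name `expected_support`, and the toy data are all assumptions.

```python
from math import prod


def expected_support(itemset, transactions):
    """Expected support under the (assumed) independent-probability model."""
    total = 0.0
    for t in transactions:
        # A transaction contributes only if it contains every item.
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total


# Hypothetical uncertain database: item -> existential probability.
transactions = [
    {"a": 0.5, "b": 1.0},
    {"a": 1.0, "b": 0.5, "c": 0.4},
    {"b": 0.8},
]

min_expected_support = 1.0  # assumed threshold
for candidate in (["a"], ["b"], ["c"], ["a", "b"]):
    value = expected_support(candidate, transactions)
    print(candidate, round(value, 3), value >= min_expected_support)
```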


2021 ◽  
Vol 16 (2) ◽  
pp. 1-30
Author(s):  
Guangtao Wang ◽  
Gao Cong ◽  
Ying Zhang ◽  
Zhen Hai ◽  
Jieping Ye

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging, since sampling-based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopsis of each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
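For readers unfamiliar with KMV synopses, the sketch below shows only the basic building block: hash every key into the unit interval, keep the k smallest distinct hash values, and estimate the number of distinct keys as (k - 1) divided by the k-th smallest value. It is not the itemset estimator proposed in the article. The helper names, the choice of SHA-1 as the hash, and the requirement k >= 2 are assumptions made for illustration.

```python
import hashlib


def unit_hash(value):
    """Map a value deterministically to a float in the unit interval via SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(values, k):
    """Return the k smallest distinct hash values, sorted ascending."""
    return sorted({unit_hash(v) for v in values})[:k]


def estimate_distinct(synopsis, k):
    """Basic KMV estimate (assumes k >= 2): (k - 1) / (k-th smallest hash).

    If fewer than k distinct hashes were seen, the synopsis holds all of
    them, so its length is the exact distinct count.
    """
    if len(synopsis) < k:
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


# Hypothetical stream of customer keys with many repeats.
stream = [i % 10_000 for i in range(50_000)]
k = 256
print(estimate_distinct(kmv_synopsis(stream, k), k))
```

A larger k trades memory for accuracy; the relative error of this basic estimator shrinks roughly with the square root of k.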


2017 ◽  
Vol 8 (1) ◽  
pp. 31-43
Author(s):  
Zuber Shaikh ◽  
Antara Mohadikar ◽  
Rachana Nayak ◽  
Rohith Padamadan

Frequent itemsets refer to a set of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has missed any of them in its returned result. This helps the client detect an untrusted server, and the system becomes more efficient by reducing time. In the authentication process, CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.


2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, it can only mine itemsets with positive items, and itemsets are missed when it is applied to data with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically increasing the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection techniques. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
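The idea of automatically raising the minimum utility threshold can be sketched generically with a min-heap of the k best utilities found so far: once k candidates are known, the smallest of them becomes the threshold, and it can only grow. This is a common top-k pattern and not THN's specific strategy, which the abstract does not detail. The class name `TopKThreshold` and the initial threshold of negative infinity are assumptions.

```python
import heapq


class TopKThreshold:
    """Keep the k highest utilities seen so far and expose the current threshold."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap holding at most k utilities

    @property
    def min_util(self):
        # Assumption: until k candidates are known, nothing is pruned.
        return self.heap[0] if len(self.heap) == self.k else float("-inf")

    def offer(self, utility):
        """Record a candidate utility; return True if it enters the top k."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, utility)
            return True
        if utility > self.heap[0]:
            # Replace the weakest of the current top k, raising the threshold.
            heapq.heapreplace(self.heap, utility)
            return True
        return False


# Hypothetical utilities of itemsets discovered in order (may be negative).
tracker = TopKThreshold(k=2)
for utility in (5, -3, 8, 1):
    accepted = tracker.offer(utility)
    print(utility, accepted, tracker.min_util)
```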


2005 ◽  
Vol 1 (3) ◽  
pp. 129-135
Author(s):  
Jun Luo ◽  
Sanguthevar Rajasekaran

Association rule mining is an important data mining problem that has been studied extensively. In this paper, a simple but Fast algorithm for Intersecting attributes lists using hash Tables (FIT) is presented. FIT is designed for efficiently computing all the frequent itemsets in large databases. It deploys an idea similar to Eclat but has much better computational performance than Eclat for two reasons: 1) FIT makes fewer comparisons in total for each intersection operation between two attribute lists, and 2) FIT significantly reduces the total number of intersection operations. Our experimental results demonstrate that the performance of FIT is much better than that of the Eclat and Apriori algorithms.
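The Eclat-style idea referred to above can be sketched as follows: store, for each item, the list of transaction ids that contain it, and obtain the support of an itemset by intersecting those lists. The sketch uses a hash set for the membership probes, which is one plausible reading of "using hash tables"; it is not the FIT algorithm itself, and the function names `vertical` and `intersect` are hypothetical.

```python
from collections import defaultdict


def vertical(transactions):
    """Build the vertical layout: item -> ascending list of transaction ids."""
    lists = defaultdict(list)
    for tid, items in enumerate(transactions):
        for item in items:
            lists[item].append(tid)
    return lists


def intersect(tids_a, tids_b):
    """Intersect two tid lists by probing a hash set built from the shorter one."""
    if len(tids_a) > len(tids_b):
        tids_a, tids_b = tids_b, tids_a
    lookup = set(tids_a)
    # One hash probe per element of the longer list; order is preserved.
    return [tid for tid in tids_b if tid in lookup]


# Hypothetical toy database.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
lists = vertical(transactions)
ab = intersect(lists["a"], lists["b"])
print(ab, len(ab))  # tid list and support of {a, b}
```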


2012 ◽  
Vol 195-196 ◽  
pp. 984-986
Author(s):  
Ming Ru Zhao ◽  
Yuan Sun ◽  
Jian Guo ◽  
Ping Ping Dong

Frequent itemset mining is an important data mining task and a focal theme in data mining research. The Apriori algorithm is one of the most important algorithms for mining frequent itemsets. However, the Apriori algorithm scans the database too many times, so its efficiency is relatively low. This paper therefore studies a frequent itemset mining algorithm based on an across linker. Compared with the classical algorithm, the improved algorithm has obvious advantages.


2018 ◽  
Author(s):  
Loc Nguyen ◽  
Minh-Phung T. Do

Collaborative filtering (CF) is a popular technique in recommendation research. Concretely, the items recommended to a user are determined by surveying her/his communities. There are two main CF approaches: memory-based and model-based. I propose a new model-based CF algorithm that mines frequent itemsets from the rating database; items that belong to frequent itemsets are then recommended to the user. My CF algorithm gives an immediate response because the mining task is performed in offline process mode. I also propose another algorithm, called Roller, for improving the process of mining frequent itemsets. The Roller algorithm is based on the heuristic assumption "The larger the support of an item is, the more likely it is that this item will occur in some frequent itemset". It is modeled on the task of whitewashing, in which a roller is rolled on a wall in such a way that it picks up frequent itemsets. Moreover, I provide enhanced techniques such as bit representation, bit matching and bit mining in order to speed up the recommendation process. These techniques take advantage of bitwise operations (AND, NOT) so as to reduce storage space and make the algorithms run faster.
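A minimal sketch of a bit representation for support counting follows: each item gets one integer whose bit i is set when transaction i contains the item, and the support of an itemset is the number of bits that survive a bitwise AND. This illustrates the general bitwise idea only, not the Roller algorithm or the author's bit matching and bit mining techniques. The function names and the toy data are hypothetical.

```python
def to_bitsets(transactions):
    """One Python int per item; bit i is set when transaction i contains the item."""
    bits = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits


def support(itemset, bits):
    """AND the item bitsets together and count the surviving bits.

    Assumes a non-empty itemset; unknown items count as all-zero bitsets.
    """
    itemset = list(itemset)
    mask = bits.get(itemset[0], 0)
    for item in itemset[1:]:
        mask &= bits.get(item, 0)
    return bin(mask).count("1")


# Hypothetical toy database (e.g., one transaction per user).
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
bits = to_bitsets(transactions)
print(support(["a", "b"], bits))
```

Python integers are arbitrary-precision, so one integer per item works for any number of transactions, although a packed array would be more memory-efficient for very large databases.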


2020 ◽  
Vol 76 (10) ◽  
pp. 7619-7634
Author(s):  
Wen Xiao ◽  
Juan Hu

Finding frequent itemsets in continuous streaming data is an important data mining task that is widely used in network monitoring, Internet of Things data analysis and so on. In the era of big data, it is necessary to develop a distributed frequent itemset mining algorithm to meet the needs of massive streaming data processing. Apache Spark is a unified analytics engine for massive data processing that has been successfully used in many data mining fields. In this paper, we propose a distributed algorithm, named SWEclat, for mining frequent itemsets over massive streaming data. The algorithm uses a sliding window to process streaming data and a vertical data structure to store the dataset in the sliding window. It is implemented on Apache Spark and uses Spark RDDs to store the streaming data and the dataset in vertical format, so that these RDDs can be divided into partitions for distributed processing. Experimental results show that the SWEclat algorithm has good speedup, parallel scalability and load balancing.
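The combination of a sliding window with a vertical data layout can be sketched on a single machine as follows: the window keeps the most recent transactions, each item maps to the set of transaction ids currently in the window that contain it, and support is the size of the intersection of those sets. This sketch does not use Spark or RDDs and is not the SWEclat algorithm; the class name `SlidingWindowTidsets` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowTidsets:
    """Vertical (item -> tid set) view of the last `size` transactions."""

    def __init__(self, size):
        self.size = size
        self.window = deque()            # (tid, items) pairs currently in the window
        self.tidsets = defaultdict(set)  # item -> tids of window transactions containing it
        self.next_tid = 0

    def add(self, items):
        """Append a transaction, evicting the oldest one if the window is full."""
        items = frozenset(items)
        if len(self.window) == self.size:
            old_tid, old_items = self.window.popleft()
            for item in old_items:
                self.tidsets[item].discard(old_tid)
        tid = self.next_tid
        self.next_tid += 1
        self.window.append((tid, items))
        for item in items:
            self.tidsets[item].add(tid)

    def support(self, itemset):
        """Support of `itemset` within the current window."""
        sets = [self.tidsets.get(item, set()) for item in itemset]
        # By convention, the empty itemset is supported by every transaction.
        return len(set.intersection(*sets)) if sets else len(self.window)


# Hypothetical stream of four transactions with a window of three.
w = SlidingWindowTidsets(size=3)
for transaction in ({"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}):
    w.add(transaction)
print(w.support({"a"}), w.support({"a", "b"}))
```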

