scholarly journals A Comprehensive Survey of Frequent Itemsets Mining on Transactional Database with Weighted Items

Author(s):  
Thanh Huan Phan ◽  
Hoài Bắc Lê

In 1993, Agrawal et al. proposed the first algorithm for mining traditional frequent itemset on binarytransactional database with unweighted items - This algorithmis essential in finding hindden relationships among items inyour data. Until 1998, with the development of various typesof transactional database - some researchers have proposed afrequent itemsets mining algorithms on transactional databasewith weighted items (the importance/meaning/value of itemsis different) - It provides more pieces of knowledge thantraditional frequent itemsets mining. In this article, the authors present a survey of frequent itemsets mining algorithmson transactional database with weighted items over the pasttwenty years. This research helps researchers to choose theright technical solution when it comes to scale up in big datamining. Finally, the authors give their recommendations anddirections for their future research.

2017 ◽  
Vol 8 (1) ◽  
pp. 31-43
Author(s):  
Zuber Shaikh ◽  
Antara Mohadikar ◽  
Rachana Nayak ◽  
Rohith Padamadan

Frequent itemsets refer to a set of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. Intended method will implement a basic idea of completeness verification and authentication approach in which the client will uses a set of frequent item sets as the evidence, and checks whether the server has missed any frequent item set as evidence in its returned result. It will help client detect untrusted server and system will become much more efficiency by reducing time. In authentication process CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.


2021 ◽  
Vol 11 (3) ◽  
pp. 208-218
Author(s):  
Sadeq Darrab ◽  
◽  
David Broneske ◽  
Gunter Saake

Data mining is the process of extracting useful unknown knowledge from large datasets. Frequent itemset mining is the fundamental task of data mining that aims at discovering interesting itemsets that frequently appear together in a dataset. However, mining infrequent (rare) itemsets may be more interesting in many real-life applications such as predicting telecommunication equipment failures, genetics, medical diagnosis, or anomaly detection. In this paper, we survey up-to-date methods of rare itemset mining. The main goal of this survey is to provide a comprehensive overview of the state-of-the-art algorithms of rare itemset mining and its applications. The main contributions of this survey can be summarized as follows. In the first part, we define the task of rare itemset mining by explaining key concepts and terminology, motivation examples, and comparisons with underlying concepts. Then, we highlight the state-of-art methods for rare itemsets mining. Furthermore, we present variations of the task of rare itemset mining to discuss limitations of traditional rare itemset mining algorithms. After that, we highlight the fundamental applications of rare itemset mining. In the last, we point out research opportunities and challenges for rare itemset mining for future research.


Author(s):  
Hanaa Ibrahim Abu Zahra ◽  
Shaker El-Sappagh ◽  
Tarek Ahmef El Shishtawy

Most frequent itemset mining algorithms (FIMA) discover hidden relationships from unrelated items. They find the most frequent itemsets depending only on the frequency of the item's existence in the dataset. These algorithms give all items the same importance, and neglect the differences in importance of the items. They assume the full certainty of data, but in most cases, real word data may be uncertain. As a result, the data could be incomplete and/or imprecise. These two problems are the most common challenges that face FIMA algorithms. Some new algorithms proposed some solutions to face these two issues separately. In other words, some algorithms handle item importance only, and others handle uncertainty only. Few algorithms dealt with the two issues together. In this article, the single scan for weighted itemsets over the uncertain database (SSU-Wfim) is proposed. It depends on the single scan frequent itemsets algorithm (SS_FIM), and enhances it to deal with weighted items in an uncertain database. SSU_WFIM deals with the uncertainty of data by giving each item in a transaction an additional value to indicate occurrence likelihood. It gives the items different values to define the weight of them. It uses a table called Ptable to save the items and their probability values. This table is used to generate all possible candidates itemsets. The results indicate the high performance in aspects of runtime, memory consumption and scalability of SSU-Wfim comparing with the UApriori algorithm. The proposed algorithm saves time and memory with a percentage exceeds 70% for all tested datasets.


2021 ◽  
Vol 7 ◽  
pp. e385
Author(s):  
Saood Iqbal ◽  
Abdul Shahid ◽  
Muhammad Roman ◽  
Zahid Khan ◽  
Shaha Al-Otaibi ◽  
...  

Frequently used items mining is a significant subject of data mining studies. In the last ten years, due to innovative development, the quantity of data has grown exponentially. For frequent Itemset (FIs) mining applications, it imposes new challenges. Misconceived information may be found in recent algorithms, including both threshold and size based algorithms. Threshold value plays a central role in generating frequent itemsets from the given dataset. Selecting a support threshold value is very complicated for those unaware of the dataset’s characteristics. The performance of algorithms for finding FIs without the support threshold is, however, deficient due to heavy computation. Therefore, we have proposed a method to discover FIs without the support threshold, called Top-k frequent itemsets mining (TKFIM). It uses class equivalence and set-theory concepts for mining FIs. The proposed procedure does not miss any FIs; thus, accurate frequent patterns are mined. Furthermore, the results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). It is found that the proposed TKFIM has outperformed the results of these approaches in terms of execution and performance, achieving 92.70, 35.87, 28.53, and 81.27 percent gain on Top-k miner using Chess, Mushroom, and Connect and T1014D100K datasets, respectively. Similarly, it has achieved a performance gain of 97.14, 100, 78.10, 99.70 percent on BOMO using Chess, Mushroom, Connect, and T1014D100K datasets, respectively. Therefore, it is argued that the proposed procedure may be adopted on a large dataset for better performance.


ARM is a significant area of knowledge mining which enables association rules which are essential for decision making. Frequent itemset mining has a challenge against large datasets. As going on the dataset size increases the burden and time to discover rules will increase. In this paper the ARM algorithms with tree structures like FP-tree, FIN with POC tree and PPC tree are discussed for reducing overheads and time consuming. These algorithms use highly competent data structures for mining frequent itemsets from the database. FIN uses nodeset a unique and novel data structure to extract frequent itemsets and POC tree to store frequent itemset information. These techniques are extremely helpful in the marketing fields. The proposed and implemented techniques reveal that they have improved about performance by means of time and efficiency


2021 ◽  
Vol 50 (4) ◽  
pp. 627-644
Author(s):  
Shariq Bashir ◽  
Daphne Teck Ching Lai

Approximate frequent itemsets (AFI) mining from noisy databases are computationally more expensive than traditional frequent itemset mining. This is because the AFI mining algorithms generate large number of candidate itemsets. This article proposes an algorithm to mine AFIs using pattern growth approach. The major contribution of the proposed approach is it mines core patterns and examines approximate conditions of candidate AFIs directly with single phase and two full scans of database. Related algorithms apply Apriori-based candidate generation and test approach and require multiple phases to obtain complete AFIs. First phase generates core patterns, and second phase examines approximate conditions of core patterns. Specifically, the article proposes novel techniques that how to map transactions on approximate FP-tree, and how to mine AFIs from the conditional patterns of approximate FP-tree. The approximate FP-tree maps transactions on shared branches when the transactions share a similar set of items. This reduces the size of databases and helps to efficiently compute the approximate conditions of candidate itemsets. We compare the performance of our algorithm with the state of the art AFI mining algorithms on benchmark databases. The experiments are analyzed by comparing the processing time of algorithms and scalability of algorithms on varying database size and transaction length. The results show pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.


2022 ◽  
Vol 54 (9) ◽  
pp. 1-35
Author(s):  
Lázaro Bustio-Martínez ◽  
René Cumplido ◽  
Martín Letras ◽  
Raudel Hernández-León ◽  
Claudia Feregrino-Uribe ◽  
...  

In data mining, Frequent Itemsets Mining is a technique used in several domains with notable results. However, the large volume of data in modern datasets increases the processing time of Frequent Itemset Mining algorithms, making them unsuitable for many real-world applications. Accordingly, proposing new methods for Frequent Itemset Mining to obtain frequent itemsets in a realistic amount of time is still an open problem. A successful alternative is to employ hardware acceleration using Graphics Processing Units (GPU) and Field Programmable Gates Arrays (FPGA). In this article, a comprehensive review of the state of the art of Frequent Itemsets Mining hardware acceleration is presented. Several approaches (FPGA and GPU based) were contrasted to show their weaknesses and strengths. This survey gathers the most relevant and the latest research efforts for improving the performance of Frequent Itemsets Mining regarding algorithms advances and modern development platforms. Furthermore, this survey organizes the current research on Frequent Itemsets Mining from the hardware perspective considering the source of the data, the development platform, and the baseline algorithm.


2021 ◽  
Vol 16 (2) ◽  
pp. 1-30
Author(s):  
Guangtao Wang ◽  
Gao Cong ◽  
Ying Zhang ◽  
Zhen Hai ◽  
Jieping Ye

The streams where multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different time. Itemset frequency estimation on such streams is very challenging since sampling based methods, such as the popularly used reservoir sampling, cannot be used. In this article, we propose a novel k -Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Comparing to the existing estimator, our method is not only more accurate and efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator with existing frequent itemset mining (FIM) algorithm (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show our estimator can significantly improve on the accuracy for both estimating itemset frequency and FIM compared to the existing estimators.


Sign in / Sign up

Export Citation Format

Share Document