frequent itemsets
Recently Published Documents


Total documents: 1010 (five years: 147)
H-index: 38 (five years: 4)

2022 · Vol 54 (9) · pp. 1-35
Author(s): Lázaro Bustio-Martínez, René Cumplido, Martín Letras, Raudel Hernández-León, Claudia Feregrino-Uribe, ...

In data mining, Frequent Itemset Mining is a technique used in several domains with notable results. However, the large volume of data in modern datasets increases the processing time of Frequent Itemset Mining algorithms, making them unsuitable for many real-world applications. Accordingly, developing new methods that obtain frequent itemsets in a realistic amount of time is still an open problem. A successful alternative is hardware acceleration using Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs). This article presents a comprehensive review of the state of the art in hardware acceleration of Frequent Itemset Mining. Several FPGA- and GPU-based approaches are contrasted to show their strengths and weaknesses. The survey gathers the most relevant and most recent research efforts to improve the performance of Frequent Itemset Mining through algorithmic advances and modern development platforms. Furthermore, it organizes current research on Frequent Itemset Mining from the hardware perspective, considering the source of the data, the development platform, and the baseline algorithm.


2022
Author(s): Shwetha Rai, Geetha M., Preetham Kumar, Giridhar B., ...

2021 · Vol 50 (4) · pp. 627-644
Author(s): Shariq Bashir, Daphne Teck Ching Lai

Mining approximate frequent itemsets (AFIs) from noisy databases is computationally more expensive than traditional frequent itemset mining, because AFI mining algorithms generate a large number of candidate itemsets. This article proposes an algorithm that mines AFIs using a pattern-growth approach. Its major contribution is that it mines core patterns and examines the approximate conditions of candidate AFIs directly, in a single phase with two full scans of the database. Related algorithms use an Apriori-based generate-and-test approach and require multiple phases to obtain the complete set of AFIs: the first phase generates core patterns, and the second examines their approximate conditions. Specifically, the article proposes novel techniques for mapping transactions onto an approximate FP-tree and for mining AFIs from the conditional patterns of that tree. The approximate FP-tree maps transactions onto shared branches when they share a similar set of items. This reduces the size of the database and helps to compute the approximate conditions of candidate itemsets efficiently. We compare the performance of our algorithm with state-of-the-art AFI mining algorithms on benchmark databases, analyzing processing time and scalability as database size and transaction length vary. The results show that the pattern-growth approach mines AFIs in less processing time than related Apriori-based algorithms.
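The abstract does not spell out the approximate conditions it checks. The sketch below assumes a commonly used formulation with a row tolerance `eps_r` (the fraction of the itemset's items a supporting transaction may miss) and a column tolerance `eps_c` (the fraction of supporting transactions in which each item may be missing). It is a simplified brute-force test of a single candidate, not the paper's approximate FP-tree method, and the function name and parameters are hypothetical.

```python
def is_approx_frequent(itemset, db, min_sup, eps_r, eps_c):
    """Simplified AFI test under an assumed definition (not the paper's method).

    Row condition: a transaction counts as supporting `itemset` if it contains
    at least a (1 - eps_r) fraction of the itemset's items.
    Column condition: every item must appear in at least a (1 - eps_c)
    fraction of those supporting transactions.
    """
    k = len(itemset)
    # Transactions that miss at most an eps_r fraction of the items.
    rows = [t for t in db if len(itemset & t) >= (1 - eps_r) * k]
    if len(rows) < min_sup:
        return False
    # Each item must be present in enough of the supporting transactions.
    need = (1 - eps_c) * len(rows)
    return all(sum(item in t for t in rows) >= need for item in itemset)


# Toy data: {a, b, c} occurs exactly once, but three more rows miss only one item.
db = [frozenset(s) for s in ["abc", "ab", "ac", "bc", "d"]]
print(is_approx_frequent(frozenset("abc"), db, min_sup=3, eps_r=0.34, eps_c=0.25))  # True
print(is_approx_frequent(frozenset("abc"), db, min_sup=3, eps_r=0.0, eps_c=0.25))   # False
```

With `eps_r = 0` the row condition becomes exact containment, so {a, b, c} has support 1 and fails; tolerating one missing item lets the noisy rows count toward its support.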


Author(s): Majid Seyfi, Richi Nayak, Yue Xu, Shlomo Geva

We tackle the problem of discriminative itemset mining. Given a set of datasets, we want to find the itemsets that are frequent in the target dataset and have much higher frequencies there than in the other datasets. Such itemsets are very useful for discriminating between datasets. We demonstrate that this problem has important applications and, at the same time, is very challenging. We present the DISSparse algorithm, a mining method that uses two determinative heuristics based on the sparsity of discriminative itemsets, which form a small subset of the frequent itemsets. We prove that the DISSparse algorithm is sound and complete. We experimentally investigate the performance of DISSparse on a range of datasets, evaluating its efficiency and stability and demonstrating that it is substantially faster than the baseline method.
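To make the problem statement concrete, here is a brute-force sketch. It assumes that "much higher frequency" means the itemset's relative support in the target dataset is at least `ratio` times its relative support in the other datasets pooled together; `discriminative_itemsets`, `min_sup`, `ratio`, and `max_len` are hypothetical names. It enumerates every candidate, so it only illustrates the definition; DISSparse uses its sparsity-based heuristics to prune this search.

```python
from itertools import combinations


def support(itemset, db):
    """Relative support: fraction of transactions in db containing itemset."""
    return sum(itemset <= t for t in db) / len(db)


def discriminative_itemsets(target, others, min_sup, ratio, max_len=3):
    """Brute-force sketch (not DISSparse) under an assumed definition.

    Keeps itemsets whose relative support in `target` is at least `min_sup`
    and at least `ratio` times their relative support in `others`.
    """
    items = sorted(set().union(*target))
    result = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup_t = support(s, target)
            if sup_t < min_sup:
                continue
            sup_o = support(s, others)
            # An itemset absent from the other datasets is treated as discriminative.
            if sup_o == 0 or sup_t / sup_o >= ratio:
                result[s] = sup_t
    return result


target = [frozenset(s) for s in ["ab", "ab", "abc", "c"]]
others = [frozenset(s) for s in ["a", "c", "bc", "ac"]]
res = discriminative_itemsets(target, others, min_sup=0.5, ratio=2.0)
print(sorted("".join(sorted(s)) for s in res))  # ['ab', 'b']
```

Here {a} is frequent in the target (0.75) but not discriminative, because its support in the other data (0.5) makes the ratio only 1.5.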


2021 · Vol 11 (21) · pp. 10399
Author(s): Yalong Zhang, Wei Yu, Qiuqin Zhu, Xuan Ma, Hisakazu Ogura

In association rule mining, all frequent itemsets are found first, and the confidence of each association rule is then calculated from the supports of the frequent itemsets. Because every non-empty subset of a frequent itemset is itself frequent, all frequent itemsets can be obtained simply by finding all maximal frequent itemsets (MFIs), that is, frequent itemsets none of whose supersets is frequent. This study proposes an algorithm named right-hand side expanding (RHSE) that finds all MFIs exactly. First, an Expanding Operation is designed that, starting from any given frequent itemset, adds items according to certain rules to form supersets of that itemset, all of which are MFIs. Next, this operation is applied taking each frequent 1-itemset in turn as the starting point, until all MFIs have been found. Owing to the design of the Expanding Operation, every MFI is found, and the path to each MFI is unique, which avoids redundant work in both time and space. Because it avoids computational redundancy while finding all MFIs, the algorithm runs fast and is applicable to big data consisting of large volumes of high-dimensional transactions. Finally, a detailed experimental report on 10 open standard transaction sets is given, including results on big data with millions of transactions.
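The property this abstract relies on (every non-empty subset of a frequent itemset is frequent) can be checked on a toy database. The sketch below finds frequent itemsets by brute force, keeps the maximal ones, and regenerates the full family from the MFIs alone. It is not RHSE's Expanding Operation, and all function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(db, min_count):
    """Brute-force, level-wise enumeration of all frequent itemsets."""
    items = sorted(set().union(*db))
    freq = set()
    for k in range(1, len(items) + 1):
        level = {frozenset(c) for c in combinations(items, k)
                 if sum(frozenset(c) <= t for t in db) >= min_count}
        if not level:
            break  # anti-monotonicity: no frequent k-itemsets means none larger
        freq |= level
    return freq


def maximal(freq):
    """MFIs: frequent itemsets that have no frequent proper superset."""
    return {s for s in freq if not any(s < other for other in freq)}


def expand(mfis):
    """Regenerate every frequent itemset as a non-empty subset of some MFI."""
    out = set()
    for m in mfis:
        for k in range(1, len(m) + 1):
            out |= {frozenset(c) for c in combinations(sorted(m), k)}
    return out


db = [frozenset(s) for s in ["abc", "abc", "ab", "bd", "cd"]]
F = frequent_itemsets(db, min_count=2)
mfis = maximal(F)
print(sorted("".join(sorted(m)) for m in mfis))  # ['abc', 'd']
print(expand(mfis) == F)  # True
```

The MFIs recover which itemsets are frequent but not their support counts, so computing rule confidence still requires counting the supports of the regenerated itemsets.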


2021
Author(s): Naomie Sandra Noumi Sandji, Djamal Abdoul Nasser Seck

The general purpose of this paper is to propose a distributed method for extracting frequent closed itemsets in the context of big data. The goal is to extract frequent closed itemsets efficiently, since frequent closed itemsets form a basis for the frequent itemsets. To achieve this goal, we extend the Galois lattice (or concept lattice) technique to this context. Galois lattices are an efficient means of extracting closed itemsets, which are in turn a useful route to generating frequent itemsets. We therefore propose Dist Frequent Next Neighbour, a distributed version of the Frequent Next Neighbour concept lattice construction algorithm, which considerably reduces extraction time by parallelizing the computation of frequent concepts (closed itemsets).
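The closed itemsets that a concept lattice enumerates can be illustrated with the Galois closure operator: the closure of an itemset is the set of items common to every transaction that contains it, and an itemset is closed when it equals its closure. The brute-force sketch below is not Dist Frequent Next Neighbour (it neither builds the lattice nor distributes any work), and its function names are hypothetical. It also shows why closed itemsets are a lossless basis: the support of any frequent itemset equals the largest support among its closed supersets.

```python
from itertools import combinations


def frequent_closed_itemsets(db, min_count):
    """Brute-force sketch (not Dist Frequent Next Neighbour); assumes min_count >= 1.

    An itemset is closed when it equals its Galois closure, i.e. the set of
    items shared by all transactions that contain it.
    """
    items = sorted(set().union(*db))
    closed = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            extent = [t for t in db if x <= t]  # transactions containing x
            if len(extent) < min_count:
                continue
            if frozenset.intersection(*extent) == x:  # closure(x) == x
                closed[x] = len(extent)
    return closed


def support_from_closed(x, closed):
    """Support of a frequent itemset x (x must be frequent): max over closed supersets."""
    return max(count for c, count in closed.items() if x <= c)


db = [frozenset(s) for s in ["abc", "abc", "ab", "bd", "cd"]]
closed = frequent_closed_itemsets(db, min_count=2)
print(sorted(("".join(sorted(c)), n) for c, n in closed.items()))
# [('ab', 3), ('abc', 2), ('b', 4), ('c', 3), ('d', 2)]
print(support_from_closed(frozenset("a"), closed))  # 3 (recovered from {a, b})
```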


2021 · Ahead of print
Author(s): Jianfang Qi, Xin Mou, Yue Li, Xiaoquan Chu, Weisong Mu

Purpose: In most applied studies, conventional frequent itemset mining ignores the fact that “transactions” belonging to different customers differ in their relative benefit or significance, and so it fails to obtain some association rules that have lower support but come from higher-value consumers. Because not all customers are financially attractive to firms, their value must be determined and their transactions weighted accordingly. The purpose of this study is to propose a novel consumer preference mining method, based on conventional frequent itemset mining, that can discover more rules from high-value consumers.
Design/methodology/approach: The authors extend the conventional association rule problem by associating an “annual purchase amount” – “price preference” (AP) weight with each consumer to reflect the consumer’s contribution to a market. A novel consumer preference mining method, the AP-weclat algorithm, is then proposed by introducing the AP weight into the weclat algorithm to discover frequent itemsets with higher value.
Findings: Experimental results on survey data reveal that, compared with the weclat algorithm, AP-weclat lets some association rules with low support but a large contribution to a market pass the screening, by assigning different weights to consumers during frequent itemset generation. The method also provides valuable preference combinations for practitioners to refer to.
Originality/value: This study is the first to introduce the AP-weclat algorithm for discovering frequent itemsets from transactions by considering the AP weight. The AP-weclat algorithm could also be applied to other markets.
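The effect described in the Findings can be reproduced in miniature with a weighted variant of Eclat's vertical tid-set search. This sketch assumes weighted support is the total weight of the transactions containing an itemset divided by the total weight of all transactions, and it uses arbitrary per-transaction weights in place of the paper's AP weight, whose exact formula the abstract does not give. `weighted_eclat` is a hypothetical name, not the published AP-weclat implementation.

```python
def weighted_eclat(db, weights, min_wsup):
    """Eclat-style depth-first search with weighted support (illustrative sketch).

    weights[i] is a hypothetical stand-in for the AP weight of the consumer
    who made transaction i. Weighted support of X = total weight of the
    transactions containing X / total weight of all transactions.
    """
    total = sum(weights)
    # Vertical layout: item -> set of ids of transactions containing it.
    tidsets = {}
    for tid, t in enumerate(db):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def dfs(prefix, candidates):
        # candidates: list of (item, tid-set of prefix plus item), in item order.
        for i, (item, tids) in enumerate(candidates):
            wsup = sum(weights[j] for j in tids) / total
            # With non-negative weights, supersets cannot gain weight, so prune here.
            if wsup < min_wsup:
                continue
            itemset = prefix | {item}
            result[itemset] = wsup
            # Extend only with later items; intersect tid-sets to get their supports.
            dfs(itemset, [(other, tids & otids) for other, otids in candidates[i + 1:]])

    dfs(frozenset(), sorted(tidsets.items()))
    return result


db = [frozenset(s) for s in ["ab", "ab", "ac", "bc"]]
weights = [1, 1, 6, 2]  # hypothetical consumer weights; transaction 2 is high-value
res = weighted_eclat(db, weights, min_wsup=0.5)
print(sorted(("".join(sorted(s)), round(w, 2)) for s, w in res.items()))
# [('a', 0.8), ('ac', 0.6), ('c', 0.8)]
```

With these weights, {a, c} passes a 0.5 threshold even though it occurs in only one of four transactions (unweighted support 0.25), because that transaction belongs to a heavily weighted consumer, while {a, b} (unweighted support 0.5) is screened out.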


2021 · Vol 11 (19) · pp. 8971
Author(s): Yalong Zhang, Wei Yu, Xuan Ma, Hisakazu Ogura, Dongfen Ye

The solution space of frequent itemset mining generally grows exponentially because of the high-dimensional attributes of big data. However, the premise of big data association rule analysis is mining the frequent itemsets in high-dimensional transaction sets. Traditional, classical algorithms such as Apriori and FP-Growth, as well as their derivatives, are impractical for big data analysis in such an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm is proposed to mine the frequent itemsets of high-dimensional data. First, all frequent 2-itemsets are generated by scanning the transaction sets; new items are then added to them, and the resulting itemsets serve as the individuals of the evolving population. The algorithm searches for maximal frequent itemsets so as to cover as many non-empty subsets as possible, because every non-empty subset of a frequent itemset is itself frequent. During the run, lethal gene fragments in individuals are recorded and eliminated so that the individuals can recover. Finally, the set of Pareto optimal solutions for the frequent itemsets is obtained: all non-empty subsets of these solutions are frequent itemsets, and all of their supersets are infrequent. Experiments demonstrate the practicability and validity of the proposed algorithm on big data.
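The abstract does not list its objectives. Assuming the two objectives are support count and itemset length (both maximized), with infrequent itemsets treated as infeasible, the final Pareto filtering step could look like the sketch below. It omits the evolutionary operators and the lethal-gene handling entirely, and its names and candidate population are hypothetical.

```python
def dominates(f, g):
    """f dominates g: no worse on every (maximized) objective, better on one."""
    return all(x >= y for x, y in zip(f, g)) and any(x > y for x, y in zip(f, g))


def pareto_front(candidates, db, min_count):
    """Return frequent candidates not dominated on (support count, length)."""
    scored = {}
    for s in candidates:
        count = sum(s <= t for t in db)
        if count >= min_count:  # infrequent itemsets are treated as infeasible
            scored[s] = (count, len(s))
    return {s for s, f in scored.items()
            if not any(dominates(g, f) for g in scored.values())}


db = [frozenset(s) for s in ["abc", "abc", "ab", "bd", "cd"]]
# A hypothetical final population of candidate itemsets.
candidates = [frozenset(s) for s in ["a", "b", "ab", "abc", "d", "bd"]]
front = pareto_front(candidates, db, min_count=2)
print(sorted("".join(sorted(s)) for s in front))  # ['ab', 'abc', 'b']
```

{b, d} is discarded as infrequent, and {a} and {d} drop out because other frequent candidates match or beat them on both objectives; the surviving front trades support against length.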

