Novel strategies for hardware acceleration of frequent itemset mining with the apriori algorithm

Evaluation of Frequent Itemset Mining Algorithms-Apriori and FP Growth

Frequent itemset mining (FIM) is an essential task for retrieving frequently occurring patterns, correlations, events, or associations in a transactional database. Understanding such frequent patterns helps in making sound decisions in critical situations. Multiple algorithms have been proposed for finding such patterns; however, their time and space complexity increases rapidly with the number of items in a dataset, so it is necessary to analyze their efficiency on different datasets. The aim of this paper is to evaluate the performance of two frequent itemset mining algorithms, Apriori and Frequent Pattern (FP) growth, by comparing their features. This study shows that the FP-growth algorithm is more efficient than the Apriori algorithm for frequent pattern mining and rule generation.
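To make the comparison concrete, the level-wise Apriori search can be sketched in a few lines of Python. This is a minimal textbook-style sketch, not the implementation evaluated in the paper: the function name `apriori`, the absolute support count `min_sup`, and the toy basket data are illustrative assumptions. Note how the count step rescans every transaction once per level; FP-growth avoids this by compressing the database into an FP-tree and mining it without candidate generation.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support_count} for every itemset with count >= min_sup."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items in one pass.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup}
    result = dict(frequent)

    k = 1
    while frequent:
        prev = set(frequent)
        candidates = set()
        # Join step: merge pairs of frequent k-itemsets that differ by one item.
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune step (downward closure): every k-subset must be frequent.
            if all(frozenset(sub) in prev for sub in combinations(union, k)):
                candidates.add(union)
        # Count step: one full database scan per level.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_sup}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    for itemset, count in sorted(apriori(baskets, min_sup=2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With `min_sup=2`, the candidate `{bread, milk, butter}` is pruned without being counted because its subset `{milk, butter}` is infrequent, which is the downward-closure property at work.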


Frequent Itemset Mining in a Unique Scan using Transaction Database

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (5), pp. 612-617, 2020. DOI: 10.35940/ijitee.e2477.039520

Keyword(s): Vital Role, Frequent Itemsets, Frequent Itemset, Frequent Itemset Mining, Experimental Result, Apriori Algorithm, Itemset Mining, Pull Out, Benchmark Database, Transaction Database

In recent years, frequent itemset mining (FIM) has come to play a vital role in data mining tasks. This paper explores FIM over transaction data to extract hidden patterns. Two main limitations of the Apriori algorithm are addressed: first, it scans the complete database in every pass to compute the support of every generated itemset; second, it is sensitive to variations of the user-defined minimum support (min_sup) threshold. The proposed methodology, Frequent Itemset Mining in a Unique Scan (FIMUS), needs only a single scan of the transaction database to extract frequent itemsets. A distinctive feature is that it generates a fixed number of candidate itemsets, independent of the min_sup threshold, which reduces the execution time for huge databases. FIMUS is compared with the Apriori algorithm on dense benchmark databases. The experimental results confirm the scalability of FIMUS.
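FIMUS itself is not described here in enough detail to reproduce. As a hedged illustration of the general goal of avoiding Apriori's repeated scans, the sketch below swaps in a different, well-known technique: a vertical (Eclat-style) layout that reads the database once into per-item transaction-ID sets and then computes supports by set intersection, so any `min_sup` threshold can be answered from memory. Unlike FIMUS as summarized above, the number of itemsets explored here still depends on `min_sup`. All function names are hypothetical.

```python
def build_tidlists(transactions):
    """Single pass: map each item to the set of IDs of transactions that contain it."""
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support(itemset, tidlists):
    """Support of a non-empty itemset = size of the intersection of its tid-lists."""
    sets = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*sets))


def eclat(tidlists, min_sup):
    """Depth-first search over tid-lists; no further database scans are needed."""
    result = {}

    def extend(prefix, prefix_tids, tail):
        for i, (item, tids) in enumerate(tail):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_sup:
                new_prefix = prefix | {item}
                result[new_prefix] = len(new_tids)
                # Only items after this one are tried, so each itemset is visited once.
                extend(new_prefix, new_tids, tail[i + 1:])

    extend(frozenset(), set(), sorted(tidlists.items()))
    return result


if __name__ == "__main__":
    db = [{"bread", "milk"}, {"bread", "butter"},
          {"bread", "milk", "butter"}, {"milk"}]
    tids = build_tidlists(db)                         # the only pass over the data
    print(len(eclat(tids, 2)), len(eclat(tids, 1)))   # two thresholds, no rescan
```

The trade-off is memory: the tid-lists hold a full copy of the database in vertical form, which is what makes rescans unnecessary.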


FPGA/GPU-based Acceleration for Frequent Itemsets Mining: A Comprehensive Review

ACM Computing Surveys, Vol 54 (9), pp. 1-35, 2022. DOI: 10.1145/3472289

Author(s): Lázaro Bustio-Martínez, René Cumplido, Martín Letras, Raudel Hernández-León, Claudia Feregrino-Uribe, ...

Keyword(s): Graphics Processing Units, Hardware Acceleration, Frequent Itemsets, Frequent Itemset, Frequent Itemset Mining, Comprehensive Review, Development Platform, Itemset Mining, Modern Development, Frequent Itemsets Mining

In data mining, Frequent Itemsets Mining is a technique used in several domains with notable results. However, the large volume of data in modern datasets increases the processing time of Frequent Itemset Mining algorithms, making them unsuitable for many real-world applications. Accordingly, proposing new methods for Frequent Itemset Mining that obtain frequent itemsets in a realistic amount of time is still an open problem. A successful alternative is to employ hardware acceleration using Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). In this article, a comprehensive review of the state of the art in Frequent Itemsets Mining hardware acceleration is presented. Several approaches (FPGA- and GPU-based) are contrasted to show their strengths and weaknesses. This survey gathers the most relevant and most recent research efforts to improve the performance of Frequent Itemsets Mining with respect to algorithmic advances and modern development platforms. Furthermore, it organizes current research on Frequent Itemsets Mining from the hardware perspective, considering the source of the data, the development platform, and the baseline algorithm.
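One frequently used building block in GPU and FPGA designs for FIM is a bitmap (vertical) encoding of the database, which reduces support counting to a bitwise AND followed by a population count. Each candidate is independent, so the work maps naturally onto parallel threads or hardware pipelines. The CPU-only Python sketch below shows just what that kernel computes, using Python integers as arbitrary-length bitmaps; it is not any specific accelerator from the review, and the helper names are hypothetical.

```python
def build_bitmaps(transactions):
    """Vertical bitmap encoding: bit `tid` of bitmaps[item] is 1 if transaction tid has item."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(x):
    """Number of set bits (GPUs and many CPUs provide a dedicated instruction for this)."""
    return bin(x).count("1")


def bitmap_support(itemset, bitmaps, n_transactions):
    """Support = popcount(AND of the items' bitmaps)."""
    acc = (1 << n_transactions) - 1  # all-ones mask: every transaction qualifies initially
    for item in itemset:
        acc &= bitmaps.get(item, 0)
    return popcount(acc)


def count_candidates(candidates, bitmaps, n_transactions):
    """Each candidate is counted independently: the loop an accelerator would parallelize."""
    return {c: bitmap_support(c, bitmaps, n_transactions) for c in candidates}


if __name__ == "__main__":
    db = [{"bread", "milk"}, {"bread", "butter"},
          {"bread", "milk", "butter"}, {"milk"}]
    bm = build_bitmaps(db)
    cands = [frozenset({"bread", "milk"}), frozenset({"milk", "butter"})]
    print(count_candidates(cands, bm, len(db)))
```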


Frequent Itemset Mining A Metadata Based Approach for Knowledge Discovery

International Journal of Computer Sciences and Engineering, Vol 6 (3), pp. 316-320, 2018. DOI: 10.26438/ijcse/v6i3.316320

Author(s): Basavaraj A. Goudannavar, Prashant Bhat

Keyword(s): Knowledge Discovery, Frequent Itemset, Frequent Itemset Mining, Itemset Mining


Inverse Frequent Itemset Mining Based on FP-Tree

Journal of Software, Vol 19 (2), pp. 338-350, 2008. DOI: 10.3724/sp.j.1001.2008.00338. Cited by ~2

Author(s): Yu-Hong GUO

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Itemset Mining


A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data, Vol 16 (2), pp. 1-30, 2021. DOI: 10.1145/3465238

Author(s): Guangtao Wang, Gao Cong, Ying Zhang, Zhen Hai, Jieping Ye

Keyword(s): Frequency Estimation, Frequent Itemsets, Frequent Itemset, Experimental Results, Closure Property, Frequent Itemset Mining, Itemset Mining, Minimum Value, Downward Closure, Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice; e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging because sampling-based methods, such as the popular reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
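The sketch below shows the standard KMV synopsis with its textbook distinct-count and set-intersection estimators, not the novel estimator proposed in the article (which, unlike the textbook intersection estimate, is stated to preserve downward closure). With k kept values, the distinct-count estimate is (k - 1) / U_k, where U_k is the k-th smallest hash in [0, 1). For illustration only, it assumes that an itemset's frequency is the number of distinct keys (e.g. customers) linked to every item in the itemset; the class and function names, the SHA-1-based hash, and the toy stream are hypothetical choices.

```python
import hashlib
import heapq


def unit_hash(key):
    """Map a key to a pseudo-uniform value in [0, 1) derived from SHA-1."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 bits: exact as a float


class KMV:
    """k-Minimum Values synopsis: retains the k smallest distinct hash values seen."""

    def __init__(self, k):
        self.k = k
        self._heap = []      # max-heap of kept hashes (stored negated)
        self._kept = set()   # same hashes, for O(1) duplicate checks

    def add(self, key):
        h = unit_hash(key)
        if h in self._kept:
            return                                    # repeated key: no change
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:                      # smaller than current k-th minimum
            evicted = -heapq.heapreplace(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def values(self):
        return sorted(self._kept)


def estimate_distinct(syn):
    """(k - 1) / U_k; exact count if fewer than k distinct keys were seen."""
    vals = syn.values()
    if len(vals) < syn.k:
        return float(len(vals))
    return (syn.k - 1) / vals[-1]


def estimate_intersection(syn_a, syn_b):
    """Textbook KMV estimate of |A ∩ B| for synopses built with the same k and hash."""
    k = syn_a.k
    a, b = set(syn_a.values()), set(syn_b.values())
    combined = sorted(a | b)[:k]                      # KMV synopsis of A ∪ B
    if len(combined) < k:                             # both inputs exact: intersect directly
        return float(len(a & b))
    k_both = sum(1 for v in combined if v in a and v in b)
    return (k_both / k) * (k - 1) / combined[-1]


if __name__ == "__main__":
    # Hypothetical multi-transaction stream of (customer, basket) records.
    stream = [("c1", {"a", "b"}), ("c2", {"a"}), ("c1", {"c"}),
              ("c3", {"a", "b"}), ("c2", {"b"})]
    synopses = {}
    for customer, basket in stream:
        for item in basket:
            synopses.setdefault(item, KMV(k=64)).add(customer)
    print(estimate_intersection(synopses["a"], synopses["c"]))
```

Each item's synopsis stays at most k values no matter how long the stream grows, which is the bounded-size property the abstract relies on.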


Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases

Mathematics, Vol 9 (4), pp. 450, 2021. DOI: 10.3390/math9040450

Author(s): Gergely Honti, János Abonyi

Keyword(s): Climate Change, Extraction Process, Knowledge Extraction, Frequent Itemset, Frequent Itemset Mining, Underlying Structure, Multilayer Network, Interdisciplinary Science, Academic Knowledge, Itemset Mining

Triplestores, or resource description framework (RDF) stores, are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method utilises frequent itemset mining (FIM) of the subjects, predicates and objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the sciences of sustainability and climate change are structured using the Microsoft Academic Knowledge Graph. In the case study, FIM forms networks of disciplines to reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases, which can be simultaneously analysed as layers of a multilayer network.
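The abstract does not say exactly how RDF triples become transactions. One plausible reading, sketched below under that assumption, groups the objects of a single predicate by subject, so each subject (e.g. a paper) yields one transaction of objects (e.g. its disciplines), and then turns frequent pairs into weighted edges of one network layer. The predicate name `hasField`, the toy triples, and the function names are hypothetical, and only 2-itemsets are mined for brevity.

```python
from collections import Counter, defaultdict
from itertools import combinations


def triples_to_transactions(triples, predicate):
    """Group the objects of one predicate by subject: one transaction per subject."""
    baskets = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            baskets[subject].add(obj)
    return list(baskets.values())


def cooccurrence_layer(transactions, min_sup):
    """Frequent 2-itemsets become weighted, undirected edges of one network layer."""
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):  # sorted: each edge has one canonical key
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_sup}


if __name__ == "__main__":
    triples = [  # hypothetical toy data
        ("paper1", "hasField", "FieldA"), ("paper1", "hasField", "FieldB"),
        ("paper2", "hasField", "FieldA"), ("paper2", "hasField", "FieldB"),
        ("paper2", "hasField", "FieldC"), ("paper3", "hasField", "FieldC"),
        ("paper3", "hasAuthor", "author1"),
    ]
    layer = cooccurrence_layer(triples_to_transactions(triples, "hasField"), min_sup=2)
    print(layer)
```

Running the same two steps with a different predicate would yield another layer over the same subjects, which is one way such layers could be stacked into a multilayer network.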

