Frequent Itemset Mining in a Unique Scan using Transaction Database

In recent years, frequent itemset mining (FIM) has emerged as a vital data mining task. This paper investigates FIM over transaction data in order to extract hidden patterns from it. It addresses two main limitations of the Apriori algorithm: first, it scans the complete database on every pass to compute the support of every itemset generated; second, it is sensitive to variation in the user-defined minimum support (min_sup) threshold. The proposed method, Frequent Itemset Mining in Unique Scan (FIMUS), needs only a single scan of the transaction database to extract frequent itemsets. A distinguishing feature is that it generates a fixed number of candidate itemsets, independent of the min_sup threshold, which reduces execution time on huge databases. FIMUS is compared with the Apriori algorithm on benchmark dense databases. The experimental results confirm the scalability of FIMUS.
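The first Apriori limitation named in this abstract, one full database pass per candidate level, can be made concrete with a minimal sketch. This is a textbook-style Apriori written for illustration, not the FIMUS method, whose internals the abstract does not describe. The function name `apriori` and the `scans` counter are assumptions introduced here.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return (frequent itemsets with support counts, number of counting passes)."""
    transactions = [frozenset(t) for t in transactions]
    scans = 0
    frequent = {}
    # Level-1 candidates: every single item that appears in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # One full pass over the database per level: the repeated-scan cost.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent, scans

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent, scans = apriori(transactions, min_sup=3)
```

On this toy data the number of passes changes with `min_sup`, which also illustrates the threshold sensitivity that the abstract mentions.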

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging, since sampling-based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopsis for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator significantly improves accuracy, both for estimating itemset frequency and for FIM, compared to the existing estimators.
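For readers unfamiliar with KMV synopses, the following is a minimal sketch of the classic construction: hash every key into [0, 1), keep the k smallest hash values, and estimate the number of distinct keys as (k - 1) divided by the k-th smallest value. The intersection function uses the standard textbook KMV intersection estimate, not the novel estimator proposed in the article. It assumes one synopsis per item, built over the keys (e.g., customers) whose records contain that item. All names (`unit_hash`, `KMV`, `estimate_intersection`) are hypothetical.

```python
import hashlib

def unit_hash(key):
    """Map a key to a pseudo-random float in [0, 1) (hypothetical helper)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class KMV:
    """Keeps the k smallest distinct hash values seen so far."""

    def __init__(self, k):
        self.k = k
        self.values = set()

    def add(self, key):
        self.values.add(unit_hash(key))
        if len(self.values) > self.k:
            # Drop the largest value so only the k smallest remain.
            self.values.remove(max(self.values))

    def estimate(self):
        """Estimated number of distinct keys added."""
        if len(self.values) < self.k:
            # Synopsis not full: the count is exact (barring hash collisions).
            return float(len(self.values))
        return (self.k - 1) / max(self.values)

def estimate_intersection(a, b):
    """Classic KMV estimate of the number of keys present in both synopses."""
    k = min(a.k, b.k)
    union = sorted(a.values | b.values)[:k]
    if len(union) < k:
        # Neither synopsis is full, so both hold every hash value they saw.
        return float(len(a.values & b.values))
    in_both = sum(1 for v in union if v in a.values and v in b.values)
    return (in_both / k) * (k - 1) / union[-1]
```

Duplicate keys hash to the same value, so repeated transactions from the same key do not inflate the synopsis, which is what makes it suitable for the multi-transaction setting.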

Adaptive Apriori Algorithm for frequent itemset mining

2016 International Conference System Modeling & Advancement in Research Trends (SMART) ◽

10.1109/sysmart.2016.7894480 ◽

2016 ◽

Cited By ~ 3

Author(s):

Shubhangi D. Patil ◽

Ratnadeep R. Deshmukh ◽

D.K. Kirange

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Apriori Algorithm ◽

Itemset Mining

Novel strategies for hardware acceleration of frequent itemset mining with the apriori algorithm

10.1109/fpl.2009.5272494 ◽

2009 ◽

Cited By ~ 8

Author(s):

David W. Thoni ◽

Alfred Strey

Keyword(s):

Hardware Acceleration ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Apriori Algorithm ◽

Itemset Mining

Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.103 ◽

2017 ◽

Vol 2 (2) ◽

pp. 57-62

Author(s):

Padmanathan Anantharaman ◽

H.V. Ramakrishan

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Programming Model ◽

Hybrid Approach ◽

Processing Technique ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Dataset Size

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases, forcing IT organizations into costly upgrades to expensive databases and data warehouse hardware appliances. An enormous amount of data is also being explored through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data, with its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in a transactional database, but as the dataset size increases, traditional frequent itemset mining can no longer handle it. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost, which reduces execution efficiency. This paper proposes a new pre-processing technique in which k-means is applied before the BigFIM algorithm. The resulting ClustBigFIM uses a hybrid approach: the k-means algorithm generates clusters from huge datasets, and Apriori and Eclat then mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of ClustBigFIM is increased by applying k-means clustering as a pre-processing step before the BigFIM algorithm.

An Efficient Method for Frequent Itemset Mining on Temporal Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1953162 ◽

2019 ◽

pp. 558-568

Author(s):

Fathima Sherin T K ◽

Anish Kumar B.

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Edge Density ◽

Time Interval ◽

Related Data ◽

Itemset Mining ◽

A Value

Frequent itemset mining (FIM) is a data mining task that extracts frequent itemsets from a database. Existing methods for finding frequent itemsets assume that datasets are static and that the induced rules apply throughout the entire dataset. This is not the case when the data is temporal, that is, when it contains time-related information that changes the mining results. Patterns may hold throughout the data or only in specific intervals. To bound these time intervals, frequent itemset mining with time cubes is proposed, which handles time hierarchies in the mining process. In this way, patterns that occur periodically, within a time interval, or both can be recognized. This paper therefore focuses on developing an efficient algorithm to mine frequent itemsets and their associated time intervals from a transactional database by extending the Apriori algorithm, using support and density as a new threshold. Density is proposed to handle the problem of overestimated time spans and to ensure the validity of the patterns found. As an extension of the existing framework, the density rate and minimum threshold, which were previously user-specified parameters, are generated dynamically. In addition, computation time is compared between mining with and without partitioning the dataset, which shows that computation time is lower with partitioning.

Hp-Apriori: Horizontal parallel-apriori algorithm for frequent itemset mining from big data

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA) ◽

10.1109/icbda.2017.8078825 ◽

2017 ◽

Author(s):

Mohammad-Hossein Nadimi-Shahraki ◽

Mehdi Mansouri

Keyword(s):

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Apriori Algorithm ◽

Itemset Mining

Frequent itemset mining: technique to improve eclat based algorithm

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i6.pp5471-5478 ◽

2019 ◽

Vol 9 (6) ◽

pp. 5471-5478

Author(s):

Mahadi Man ◽

Masita Abdul Jalil

Keyword(s):

Processing Time ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Experimental Result ◽

Memory Usage ◽

Enhancement Technique ◽

Itemset Mining ◽

Mining Technique ◽

Main Challenge ◽

Memory Utilization

In frequent itemset mining, the main challenge is to discover relationships between data in a transactional or relational database. Various algorithms have been introduced to mine frequent itemsets. Eclat-based algorithms are among the prominent algorithms used for frequent itemset mining. Various studies have built on the Eclat algorithm, producing variants such as Tidset, dEclat, Sortdiffset and Postdiffset. The algorithm has been improved over time. However, physical memory utilization and processing time remain the main problems in this process. This paper reviews and compares various Eclat-based algorithms for frequent itemset mining and proposes an enhancement technique for Eclat-based algorithms to reduce processing time and memory usage. The experimental results show some improvement in processing time and memory utilization in frequent itemset mining.
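As background for the variants named above, here is a minimal sketch of the basic tidset form of Eclat: the database is stored vertically (item to set of transaction ids), and the support of an itemset is the size of the intersection of its items' tidsets. The last function shows the diffset idea used by dEclat, where only the difference between two tidsets is stored. This is a generic illustration, not the enhancement proposed in the paper, and all function names are assumptions.

```python
def vertical(transactions):
    """Convert horizontal transactions into item -> set of transaction ids."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets

def eclat(prefix, items, min_sup, out):
    """Depth-first search over (item, tidset) pairs that share a prefix."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Extend the itemset with each later item by intersecting tidsets.
        suffix = []
        for other, other_tids in items[i + 1:]:
            shared = tids & other_tids
            if len(shared) >= min_sup:
                suffix.append((other, shared))
        eclat(itemset, suffix, min_sup, out)

def mine(transactions, min_sup):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    tidsets = vertical(transactions)
    items = sorted(((i, t) for i, t in tidsets.items() if len(t) >= min_sup),
                   key=lambda pair: pair[0])
    out = {}
    eclat((), items, min_sup, out)
    return out

def support_via_diffset(tids_px, tids_py):
    """dEclat-style support: sup(PXY) = sup(PX) - |t(PX) - t(PY)|."""
    diffset = tids_px - tids_py
    return len(tids_px) - len(diffset)
```

On dense data the diffset is often much smaller than the tidset it replaces, which is the usual motivation for diffset-based variants.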

Frequent Itemset Mining Using Improved Apriori Algorithm with MapReduce

2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA) ◽

10.1109/iccubea.2017.8463915 ◽

2017 ◽

Cited By ~ 1

Author(s):

Seema A. Tribhuvan ◽

Nitin R. Gavai ◽

Bharti P. Vasgi

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Apriori Algorithm ◽

Itemset Mining

Enhancing the Performance of Large-scale Profitable Itemset Mining using Efficient Data Structures

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8151.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 1768-1772

Keyword(s):

Data Structures ◽

Large Scale ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Second Phase ◽

Itemset Mining ◽

Efficient Data ◽

Profit Value ◽

Efficient Data Structures

The process of extracting the most frequently bought items from a transactional database is termed frequent itemset mining. Although it gives an idea of the best-selling itemsets, the method fails to identify the most profitable items in the database. It is not uncommon for frequent itemsets and profitable itemsets to have minimal intersection, and the process of extracting the most profitable itemsets is termed Greater Profitable Itemset (GPI) mining. Among the various approaches to mining GPI, [7] proposed a two-phase algorithm to optimize the regeneration of GPI when the profit value of any item changes. It consisted of keeping track of the pruned items in the first phase and using them to efficiently regenerate GPI in the second phase. This paper proposes an enhancement to the way these changes are tracked by storing the pruned itemsets according to their constituent items, unlike the earlier algorithm, which stored records iteration-wise. Storing the itemsets according to their constituent items ensures that only the required items are retrieved. In contrast, the earlier algorithm would fetch all the items pruned in any iteration, regardless of their relevance. By fetching only relevant itemsets, the proposed method would significantly reduce the computational requirements.
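The difference between the two storage layouts described above can be sketched as follows. This is only an illustration of the general idea (an index keyed by constituent item versus records grouped by iteration); the abstract does not specify the paper's actual data structures, so the class and method names here are hypothetical.

```python
from collections import defaultdict

class PrunedByItem:
    """Index pruned itemsets under each of their constituent items."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, itemset):
        fs = frozenset(itemset)
        for item in fs:
            self._index[item].add(fs)

    def affected_by(self, item):
        # Direct lookup: only itemsets containing the changed item are touched.
        return set(self._index.get(item, ()))

class PrunedByIteration:
    """Store pruned itemsets grouped by the iteration that pruned them."""

    def __init__(self):
        self._rounds = defaultdict(list)

    def add(self, iteration, itemset):
        self._rounds[iteration].append(frozenset(itemset))

    def affected_by(self, item):
        # Every stored record must be fetched and checked for relevance.
        return {s for sets in self._rounds.values() for s in sets if item in s}

by_item = PrunedByItem()
by_round = PrunedByIteration()
for iteration, itemset in [(1, {"a", "b"}), (1, {"c"}), (2, {"a", "c", "d"})]:
    by_item.add(itemset)
    by_round.add(iteration, itemset)
```

Both layouts return the same itemsets when an item's profit value changes; the item-keyed index simply avoids scanning records that do not contain the item.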

Evaluation of Frequent Itemset Mining Algorithms-Apriori and FP Growth

International Journal of Engineering Technology and Management Sciences ◽

10.46647/ijetms.2020.v04i06.001 ◽

2020 ◽

Vol 4 (6) ◽

pp. 1-4

Author(s):

Jismy Joseph ◽

Kesavaraj G

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemset ◽

Frequent Pattern ◽

Frequent Itemset Mining ◽

Frequent Patterns ◽

Apriori Algorithm ◽

Itemset Mining ◽

Mining Algorithms ◽

Time And Space Complexity

Nowadays, frequent itemset mining (FIM) is an essential task for retrieving frequently occurring patterns, correlations, events or associations in a transactional database. Understanding such frequent patterns helps in making substantial decisions in decisive situations. Multiple algorithms have been proposed for finding such patterns; however, their time and space complexity increases rapidly with the number of items in a dataset. It is therefore necessary to analyze the efficiency of these algorithms on different datasets. The aim of this paper is to evaluate the performance of two frequent itemset mining algorithms, Apriori and Frequent Pattern (FP) growth, by comparing their features. The study shows that the FP-growth algorithm is more efficient than the Apriori algorithm for generating rules and mining frequent patterns.
