A data mining proxy approach for efficient frequent itemset mining

Jeffrey Xu Yu; Zhiheng Li; Guimei Liu

doi:10.1007/s00778-007-0047-0

Frequent Itemset Mining in Data Mining: A Survey

International Journal of Computer Applications ◽

10.5120/ijca2016909219 ◽

2016 ◽

Vol 139 (9) ◽

pp. 15-18 ◽

Cited By ~ 3

Author(s):

Rana Ishita ◽

Amit Rathod

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining


Postdiffset: an Eclat-like algorithm for frequent itemset mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.28.12911 ◽

2018 ◽

Vol 7 (2.28) ◽

pp. 197

Author(s):

W A.W.A. Bakar ◽

M A. Jalil ◽

M Man ◽

Z Abdullah ◽

F Mohd

Keyword(s):

Data Mining ◽

Association Rule ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Underlying Structure ◽

Data Format ◽

Itemset Mining ◽

Data Formats ◽

Vertical Data ◽

Mining Algorithms

Frequent itemset mining is a major field within data mining because it deals with sets of items that commonly occur together in database transactions. Originating from market basket analysis, frequent itemset generation can lead to the formulation of association rules that capture correlations or patterns. Association rule mining remains one of the most prominent areas in data mining; it aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either horizontal or vertical data formats. Both formats have been widely discussed, with a few example algorithms given for each. Horizontal approaches suffer from generating many candidates and from multiple database scans, which contribute to higher memory consumption. Vertical approaches were established to improve on the horizontal approach, and the Eclat algorithm is one example that uses the vertical data format. Motivated by its ‘fast intersection’, in this paper we review and analyze the fundamental Eclat algorithm and Eclat variants such as tidset, diffset, and sortdiffset. Continuing this line of Eclat extensions, we propose postdiffset, a new Eclat variant that uses the tidset format in the first loop and the diffset format in later loops. We report the execution time of postdiffset to indicate that some improvement has been achieved in frequent itemset mining.
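The difference between the tidset and diffset formats mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' postdiffset implementation; the toy transactions and the function names (`to_vertical`, `support_by_tidset`, `support_by_diffset`) are hypothetical. It relies only on the standard identity support(XY) = support(X) − |t(X) − t(Y)|, where t(X) is the set of transaction ids containing X.

```python
from itertools import combinations

# Hypothetical toy database (transaction id -> items); not taken from the paper.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "d"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}


def to_vertical(db):
    """Convert the horizontal format (tid -> items) to vertical (item -> tidset)."""
    vertical = {}
    for tid, items in db.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def support_by_tidset(tids_x, tids_y):
    """Support of XY as the size of the tidset intersection."""
    return len(tids_x & tids_y)


def support_by_diffset(tids_x, tids_y):
    """Support of XY from the diffset: tids that contain X but not Y."""
    diffset = tids_x - tids_y
    return len(tids_x) - len(diffset)


vertical = to_vertical(transactions)
pair_support = {}
for x, y in combinations(sorted(vertical), 2):
    by_tidset = support_by_tidset(vertical[x], vertical[y])
    by_diffset = support_by_diffset(vertical[x], vertical[y])
    # Both formats must agree on the support count.
    assert by_tidset == by_diffset
    pair_support[(x, y)] = by_tidset
```

Both functions return the same count; the diffset only stores the transactions that are lost when an itemset is extended, which tends to be smaller than the full tidset on dense data.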


An Efficient Method for Frequent Itemset Mining on Temporal Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1953162 ◽

2019 ◽

pp. 558-568

Author(s):

Fathima Sherin T K ◽

Anish Kumar B.

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Edge Density ◽

Time Interval ◽

Related Data ◽

Itemset Mining ◽

A Value

Frequent itemset mining (FIM) is a data mining task that extracts frequent itemsets from a database. Existing methods for finding frequent itemsets assume that datasets are static and that the discovered rules hold across the entire dataset. This is not the case when data is temporal, that is, when it contains time-related information that changes the mining results. Patterns may hold throughout the data or only during specific intervals. To restrict mining to time intervals, frequent itemset mining with time cubes is proposed to handle time hierarchies in the mining process. In this way, patterns that occur periodically, within a time interval, or both can be recognized. This paper therefore focuses on developing an efficient algorithm that mines frequent itemsets and their associated time intervals from a transactional database by extending the Apriori algorithm, using support and density as an additional threshold. Density is proposed to deal with the overestimated timespan problem and to ensure the validity of the patterns found. As an extension of the current framework, the density rate and minimum threshold, which were previously user-specified parameters, are generated dynamically. In addition, computation time is compared between mining with and without partitioning the dataset, showing that computation time is lower with the partitioning technique.


Trust-but-Verify: Verifying Result Correctness of Outsourced Frequent Itemset Mining in Data-Mining-As-a-Service Paradigm

IEEE Transactions on Services Computing ◽

10.1109/tsc.2015.2436387 ◽

2016 ◽

Vol 9 (1) ◽

pp. 18-32 ◽

Cited By ~ 6

Author(s):

Boxiang Dong ◽

Ruilin Liu ◽

Hui Wendy Wang

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Service Paradigm ◽

Result Correctness


Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

10.21203/rs.3.rs-935690/v1 ◽

2021 ◽

Author(s):

Martha ◽

Ramdas Vankdothu ◽

Hameed Mohd Abdul ◽

Rekha Gangula

Keyword(s):

Data Mining ◽

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

New Paradigm ◽

Rule Mining ◽

Data Intensive ◽

Itemset Mining ◽

Real World Datasets ◽

Mining Algorithms

The revolution in technology for storing and processing big data has led to data-intensive computing as a new paradigm. To extract valuable and precise knowledge from big data, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
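The Eclat recursion and the equivalence classes that this abstract refers to can be sketched on a single machine as follows. This is a plain Python illustration under stated assumptions, not RDD-Eclat and not any of EclatV1 to EclatV5: it does not use Spark, and the names `eclat`, `mine`, `example_db`, and `result` are hypothetical.

```python
def eclat(prefix, items, min_support, out):
    """Depth-first Eclat step.

    `items` is a list of (item, tidset) pairs that all extend `prefix`,
    i.e. one prefix-based equivalence class.
    """
    for i, (item, tids) in enumerate(items):
        new_prefix = prefix + (item,)
        out[new_prefix] = len(tids)
        # Build the equivalence class of itemsets sharing `new_prefix`
        # by intersecting tidsets with the remaining items.
        suffix = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                suffix.append((other, common))
        if suffix:
            eclat(new_prefix, suffix, min_support, out)


def mine(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    # Vertical format: item -> set of transaction ids.
    vertical = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)
    # Keep only frequent single items, in a fixed (alphabetical) order.
    frequent = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    out = {}
    eclat((), frequent, min_support, out)
    return out


# Hypothetical toy data, with min_support given as an absolute count.
example_db = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = mine(example_db, min_support=2)
```

Each equivalence class built in the inner loop depends only on the tidsets passed into it, which is the property that makes class-level partitioning across workers plausible in a distributed setting.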


A Weighted Frequent Itemset Mining Algorithm for Intelligent Decision in Smart System

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195518 ◽

2019 ◽

pp. 249-255

Author(s):

A. Kowsalya ◽

S. Uma Parameswari ◽

N. Kokila

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Accurate Information ◽

Smart System ◽

Itemset Mining ◽

Intelligent Decision ◽

Key Factor ◽

The Many ◽

Day By Day

Identifying frequent itemsets is a challenging task in data mining because data is growing day by day in all fields. Accurately analyzing itemsets in data such as market baskets is a key factor in improving the economic strategy of marketing management. Frequent itemset mining, as a prerequisite of association rule analysis, is one of the most important research fields in data mining. Weighted frequent itemset mining in uncertain databases takes both the probability of occurrence and the importance of items into account in order to discover frequent itemsets of great importance to users. However, much data is inconsistent because of incomplete fields in the collected data, which makes it harder to reliably predict accurate information from data with many fields. Existing research has developed many techniques and algorithms to provide a stable prediction procedure, but obtaining 100% accurate results from a collected dataset has still not been achieved. In this thesis, the proposed system analyzes a dataset with Apriori and weighted Downwards Frequency Itemset Mining (WDFIM) under various parameters. The parameters analyzed are minimum support, confidence level, and time consumption, and WDFIM produces more accurate results than the Apriori algorithm.


Data Mining Proxy: Serving Large Number of Users for Efficient Frequent Itemset Mining

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-540-24775-3_56 ◽

2004 ◽

pp. 458-463 ◽

Cited By ~ 1

Author(s):

Zhiheng Li ◽

Jeffrey Xu Yu ◽

Hongjun Lu ◽

Yabo Xu ◽

Guimei Liu

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining


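Implementasi Algoritma FP-Growth Untuk Strategi Pemasaran Ritel Hidroponik (Studi Kasus : PT. HAB) [Implementation of the FP-Growth Algorithm for a Hydroponic Retail Marketing Strategy (Case Study: PT. HAB)]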

Jurnal Buana Informatika ◽

10.24002/jbi.v10i1.1746 ◽

2019 ◽

Vol 10 (1) ◽

pp. 11

Author(s):

Adi Nugroho Susanto Putro ◽

Richardus Indra Gunawan

Keyword(s):

Data Mining ◽

Open Source ◽

Association Rule ◽

Research Community ◽

Frequent Itemset ◽

Frequent Pattern ◽

Frequent Itemset Mining ◽

Interestingness Measure ◽

Itemset Mining ◽

Pattern Growth

The vegetable business has grown significantly in recent years. One way to continuously produce high-quality vegetables is cultivation with a hydroponic system [1]. The hydroponic business has good prospects but also a weakness: because the produce is fresh and free of chemicals and preservatives, hydroponic vegetables and fruit do not keep for long. If they are not sold promptly, losses result. Data mining is the process of finding interesting patterns or information in selected data using particular techniques or methods. Apriori is one of the ten most influential algorithms in the research community. Since the Apriori algorithm was first introduced, there have been many efforts to design more efficient frequent itemset mining algorithms. The most prominent improvement over Apriori is a method called FP-Growth (frequent pattern growth), which eliminates candidate generation [2]. This study proposes an implementation of the FP-Growth algorithm with the open source software Weka to help analyze and design a hydroponic retail product catalog that encourages fruit and vegetables to be sold together. In determining association rules, there are interestingness measures, namely support and confidence. Using a minimum support of 0.05 and a minimum confidence of 0.9, this study produced 21 rules that can be used as a marketing strategy for PT. HAB. Keywords: FP-Growth algorithm, marketing strategy, hydroponic retail.
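The two interestingness measures named in this abstract, support and confidence, can be sketched as below. The baskets are hypothetical and are not the PT. HAB data, the helper names are illustrative, and the study itself used Weka's FP-Growth rather than this code; only the thresholds 0.05 and 0.9 come from the abstract.

```python
# Hypothetical baskets; not the dataset used in the study.
baskets = [
    {"lettuce", "tomato"},
    {"lettuce", "tomato", "basil"},
    {"lettuce"},
    {"tomato", "basil"},
    {"lettuce", "tomato"},
]

MIN_SUPPORT = 0.05     # threshold reported in the abstract
MIN_CONFIDENCE = 0.9   # threshold reported in the abstract


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from transaction counts."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base


def rule_passes(antecedent, consequent, transactions):
    """True if the rule meets both the support and confidence thresholds."""
    combined = set(antecedent) | set(consequent)
    return (
        support(combined, transactions) >= MIN_SUPPORT
        and confidence(antecedent, consequent, transactions) >= MIN_CONFIDENCE
    )


basil_to_tomato = rule_passes({"basil"}, {"tomato"}, baskets)
lettuce_to_tomato = rule_passes({"lettuce"}, {"tomato"}, baskets)
```

In this toy data the rule basil → tomato is kept, because every basket with basil also contains tomato, while lettuce → tomato is rejected because its confidence falls below 0.9.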


A study of frequent itemset mining techniques

International Journal of Engineering & Technology ◽

10.14419/ijet.v6i4.8300 ◽

2017 ◽

Vol 6 (4) ◽

pp. 141

Author(s):

Sachin Sharma ◽

Shaveta Bhatia

Keyword(s):

Data Mining ◽

Association Rules ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Minimum Threshold ◽

Minimum Support ◽

Itemset Mining ◽

Frequent Item ◽

Frequent Item Sets ◽

Different Sources

Frequent itemset mining is a crucial and expensive task for industry today. It is the task of mining information from different sources and a key approach in data mining. Frequent itemsets that satisfy the minimum threshold can be discovered, and association rules are then extracted from them. The association rules are affected by the minimum support value entered by the user and may be considered positive or negative. There may also be other association rules that involve rare itemsets. Researchers have used various methods for generating association rules. In this paper, our aim is to study the various techniques for generating association rules.


Condensed Representations for Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch040 ◽

2011 ◽

pp. 207-211 ◽

Cited By ~ 2

Author(s):

Jean-Francois Boulicaut

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Future Trends ◽

Itemset Mining ◽

Typical Data ◽

Condensed Representations ◽

Transactional Data ◽

Research Domain

Condensed representations were proposed in Mannila and Toivonen (1996) as a useful concept for the optimization of typical data-mining tasks. They appear as a key concept within the inductive database framework (Boulicaut et al., 1999; de Raedt, 2002; Imielinski & Mannila, 1996). This article introduces this research domain, its achievements in the context of frequent itemset mining (FIM) from transactional data, and its future trends.
