A data mining proxy approach for efficient frequent itemset mining

2007 ◽  
Vol 17 (4) ◽  
pp. 947-970 ◽  
Author(s):  
Jeffrey Xu Yu ◽  
Zhiheng Li ◽  
Guimei Liu
2016 ◽  
Vol 139 (9) ◽  
pp. 15-18 ◽  
Author(s):  
Rana Ishita ◽  
Amit Rathod

2018 ◽  
Vol 7 (2.28) ◽  
pp. 197
Author(s):  
W A.W.A. Bakar ◽  
M A. Jalil ◽  
M Man ◽  
Z Abdullah ◽  
F Mohd

Frequent itemset mining is a major field within data mining because it deals with the usual, regularly occurring sets of items in database transactions. Originating from market basket analysis, frequent itemset generation can lead to the formulation of association rules that reveal correlations or patterns. Association rule mining remains one of the most prominent areas in data mining; it aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either horizontal or vertical data formats. Both formats have been widely discussed, with a few example algorithms given for each. Horizontal approaches suffer from heavy candidate generation and multiple database scans, which contribute to higher memory consumption. Vertical approaches were established to improve on the horizontal approach, and the Eclat algorithm is one example of the vertical data format. Motivated by its ‘fast intersection’, in this paper we review and analyze the fundamental Eclat and Eclat variants such as tidset, diffset, and sortdiffset. Building on the vertical data format and as a continuation of the Eclat extensions, we propose a postdiffset algorithm, a new member of the Eclat variants that uses the tidset format in the first loop and the diffset format in later loops. We present the execution-time performance of postdiffset to indicate that some improvement has been achieved in frequent itemset mining.
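
The tidset and diffset ideas mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' postdiffset implementation: the toy transactions and the function names (`to_vertical`, `eclat`, `mine`, `diffset_support`) are assumptions made for illustration. The sketch converts a horizontal database to the vertical format, mines frequent itemsets by intersecting tidsets, and shows the standard diffset identity, support(PXY) = support(PX) - |t(PX) - t(PY)|.

```python
# Hypothetical toy database in horizontal format: transaction id -> items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "d"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}


def to_vertical(db):
    """Convert horizontal transactions to the vertical format: item -> tidset."""
    vertical = {}
    for tid, items in db.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, candidates, min_support, out):
    """Depth-first search that extends `prefix` by intersecting tidsets."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support is the size of the tidset
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            shared = tids & other_tids  # the 'fast intersection' step
            if len(shared) >= min_support:
                extensions.append((other, shared))
        eclat(itemset, extensions, min_support, out)


def mine(db, min_support):
    """Return {itemset tuple: absolute support} for all frequent itemsets."""
    vertical = to_vertical(db)
    frequent = sorted(
        ((item, tids) for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    out = {}
    eclat((), frequent, min_support, out)
    return out


def diffset_support(tids_px, tids_py):
    """Support of PXY from a diffset: |t(PX)| minus |t(PX) - t(PY)|."""
    return len(tids_px) - len(tids_px - tids_py)


result = mine(transactions, min_support=3)
vertical = to_vertical(transactions)
```

With a minimum support of 3 transactions, `result` holds the frequent itemsets of this toy database, and `diffset_support(vertical["a"], vertical["c"])` gives the same support for {a, c} as the tidset intersection. Diffsets tend to pay off when tidsets are large and overlap heavily, since only the differences need to be stored.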


Author(s):  
Fathima Sherin T K ◽  
Anish Kumar B.

Frequent itemset mining (FIM) is a data mining task that extracts frequent itemsets from a database. Existing methods for finding frequent itemsets assume that datasets are static and that the discovered rules apply throughout the whole dataset. However, this is not the case when the data is temporal, that is, when it contains time-related information that changes the mining results. Patterns may occur throughout the dataset or only at specific intervals. To bound these time intervals, frequent itemset mining with time cubes is proposed to handle time hierarchies in the mining process. In this way, patterns that occur periodically, within a time interval, or both can be recognized. This paper therefore focuses mainly on developing an efficient algorithm to mine frequent itemsets and their associated time intervals from a transactional database by extending the Apriori algorithm, using support together with density as an additional threshold. Density is proposed to deal with the overestimated timespan problem and to ensure the validity of the patterns found. As an extension of the existing framework, the density rate and minimum threshold, which were previously user-specified parameters, are generated dynamically here. In addition, computation time is compared between the partitioned and the unpartitioned dataset, showing that computation time is lower with partitioning.


2021 ◽  
Author(s):  
Martha ◽  
Ramdas Vankdothu ◽  
Hameed Mohd Abdul ◽  
Rekha Gangula

The revolution in technology for storing and processing big data has led to data-intensive computing as a new paradigm. To find valuable and precise knowledge in big data, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of the existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. After EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.


Author(s):  
A. Kowsalya ◽  
S. Uma Parameswari ◽  
N. Kokila

Identifying frequent itemsets is a challenging task in data mining because data volumes grow day by day in all fields. Accurately analyzing itemsets in data such as market baskets is a key factor in improving the economic strategy of marketing management. Frequent itemset mining, an essential step of association rule analysis, is one of the most important research fields in data mining. Weighted frequent itemset mining in uncertain databases takes both the existence probability and the importance of items into account in order to discover frequent itemsets of great importance to users. However, much data is inconsistent because of incomplete fields in the collected records. This reduces the stability of predictions on data that has many fields. Existing research has developed many techniques and algorithms to provide a stable prediction procedure, but obtaining 100% accurate data from a collected dataset has not yet been achieved. In this thesis, the proposed system analyzes a dataset under various parameters using Apriori and weighted Downwards Frequency Itemset Mining (WDFIM). The parameters analyzed are minimum support, confidence level, and time consumption, and WDFIM produces more accurate results than the Apriori algorithm.


2019 ◽  
Vol 10 (1) ◽  
pp. 11
Author(s):  
Adi Nugroho Susanto Putro ◽  
Richardus Indra Gunawan

Business in vegetable crops has grown significantly in recent years. One way to continuously produce high-quality vegetables is cultivation with a hydroponic system [1]. The hydroponic business has good prospects but also a weakness: because the produce is fresh and free of chemicals and preservatives, hydroponic vegetables and fruit do not keep for long. If they are not sold quickly, the result is a loss. Data mining is the process of finding interesting patterns or information in selected data using particular techniques or methods. Apriori is one of the ten most influential algorithms in the research community. Since the Apriori algorithm was first introduced, there have been many efforts to design more efficient frequent itemset mining algorithms. The most notable improvement on Apriori is a method called FP-Growth (frequent pattern growth), which eliminates candidate generation [2]. This study proposes an implementation of the FP-Growth algorithm with the open-source software Weka to help analyze and design a hydroponic retail product catalog that encourages fruit and vegetables to be sold together. In determining association rules, there are interestingness measures, namely support and confidence. Using a minimum support of 0.05 and a minimum confidence of 0.9, this study produces 21 rules that can be used as a marketing strategy for PT. HAB. Keywords: FP-Growth algorithm, marketing strategy, hydroponic retail.
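
The support and confidence thresholds quoted in this abstract can be made concrete with a small Python sketch. The baskets, product names, and function names (`support`, `confidence`, `passes`) below are hypothetical and are not taken from the study's data or from Weka; only the threshold values 0.05 and 0.9 come from the abstract. Support is the fraction of transactions that contain an itemset, and the confidence of a rule A -> C is the number of transactions containing both A and C divided by the number containing A.

```python
# Hypothetical baskets of hydroponic produce, for illustration only.
baskets = [
    {"lettuce", "tomato"},
    {"lettuce", "tomato", "basil"},
    {"lettuce"},
    {"tomato", "basil"},
    {"lettuce", "tomato"},
]


def count(db, itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in db if itemset <= basket)


def support(db, itemset):
    """Relative support: fraction of transactions containing `itemset`."""
    return count(db, itemset) / len(db)


def confidence(db, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return count(db, antecedent | consequent) / count(db, antecedent)


def passes(db, antecedent, consequent, min_support=0.05, min_confidence=0.9):
    """Check a rule against the thresholds quoted in the abstract."""
    return (
        support(db, antecedent | consequent) >= min_support
        and confidence(db, antecedent, consequent) >= min_confidence
    )


basil_rule_ok = passes(baskets, {"basil"}, {"tomato"})
lettuce_rule_ok = passes(baskets, {"lettuce"}, {"tomato"})
```

On these toy baskets the rule basil -> tomato has confidence 1.0 and passes both thresholds, while lettuce -> tomato has confidence 0.75 and is rejected by the 0.9 minimum confidence even though its support is well above 0.05.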


2017 ◽  
Vol 6 (4) ◽  
pp. 141
Author(s):  
Sachin Sharma ◽  
Shaveta Bhatia

Frequent itemset mining is among the most crucial and expensive tasks for industry today. It is the task of mining information from different sources and a key approach in data mining. Frequent itemsets that satisfy the minimum threshold can be discovered, and association rules are then extracted from them. The association rules, which are affected by the minimum support value entered by the user, may be positive or negative. There may also be other association rules that involve rare itemsets. Researchers have used various methods for generating association rules. In this paper, our aim is to study various techniques for generating association rules.


Author(s):  
Jean-Francois Boulicaut

Condensed representations were proposed in Mannila and Toivonen (1996) as a useful concept for optimizing typical data-mining tasks. The concept is central to the inductive database framework (Boulicaut et al., 1999; de Raedt, 2002; Imielinski & Mannila, 1996), and this article introduces this research domain, its achievements in the context of frequent itemset mining (FIM) from transactional data, and its future trends.

