A study of frequent itemset mining techniques

Frequent itemset mining is among the most crucial and expensive tasks for industry today. It is a key approach in data mining for extracting information from different sources. It discovers the frequent itemsets that satisfy a minimum support threshold, and association rules are then extracted from those itemsets. Association rules are affected by the minimum support value entered by the user and may be classified as positive or negative. Other association rules may involve rare itemsets. Researchers have used various methods to generate association rules. In this paper, our aim is to study these techniques.


Security and Verification of Server Data Using Frequent Itemset Mining in Ecommerce

International Journal of Synthetic Emotions ◽

10.4018/ijse.2017010103 ◽

2017 ◽

Vol 8 (1) ◽

pp. 31-43

Author(s):

Zuber Shaikh ◽

Antara Mohadikar ◽

Rachana Nayak ◽

Rohith Padamadan

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Graphical Password ◽

Itemset Mining ◽

Frequent Item ◽

Data Mining Algorithms ◽

Shoulder Surfing ◽

Mining Algorithms ◽

Frequent Item Sets

Frequent itemsets are sets of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that proofs and verification objects must be customized for each data mining algorithm. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has omitted any of them from its returned result. This helps the client detect an untrusted server and makes the system more efficient by reducing time. For authentication, CaRP serves as both a captcha and a graphical password scheme. CaRP addresses several security problems together, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.
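The completeness check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `missing_evidence` and the sample itemsets are hypothetical, and the abstract does not say how the client obtains or proves its evidence itemsets, so that step is left out.

```python
def missing_evidence(evidence_itemsets, server_result):
    """Return the evidence itemsets that the server's result omits.

    evidence_itemsets: itemsets the client already knows to be frequent
                       (how they are obtained is outside this sketch).
    server_result:     itemsets returned by the mining server.
    """
    returned = {frozenset(s) for s in server_result}
    return [set(e) for e in evidence_itemsets if frozenset(e) not in returned]


# Hypothetical example: the server silently drops the itemset {"c"}.
evidence = [{"a", "b"}, {"c"}]
server_result = [{"a", "b"}, {"a"}]
missing = missing_evidence(evidence, server_result)
```

A non-empty `missing` list signals that the returned result is incomplete; an empty list only shows that the evidence was covered, not that the whole result is correct.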


Hash based Approach for Mining Frequent Item Sets from Transactional Databases

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.34.19214 ◽

2018 ◽

Vol 7 (3.34) ◽

pp. 309

Author(s):

UMohan Srinivas ◽

Ch Anuradha ◽

Dr P. Sri Rama Chandra Murty

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Minimum Threshold ◽

Result Section ◽

Itemset Mining ◽

Novel Approach ◽

Large Databases ◽

Transactional Databases ◽

Efficient Level ◽

Frequent Item Sets

Frequent itemset mining has become popular for extracting hidden patterns from transactional databases. Among the several approaches, the Apriori algorithm is a basic approach that follows a candidate generate-and-test strategy. Although it is an efficient level-wise approach, it has two limitations: (i) several passes are required to check the support of candidate itemsets, and (ii) it tends towards more candidate itemsets and is affected by variations in the minimum threshold. A novel approach is proposed to tackle these limitations: one-pass Hash-based Frequent Itemset Mining (HFIM) for deriving frequent patterns. HFIM maintains candidate itemsets dynamically and independently of the minimum threshold, which limits the number of scans over the database to one. In this paper, HFIM is compared with Apriori on standard datasets. The results show that HFIM outperforms Apriori on large databases.
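To make the baseline concrete, here is a minimal sketch of the level-wise generate-and-test strategy that Apriori follows. It illustrates the textbook idea only, not the HFIM method or the authors' implementation; the function name `apriori`, the absolute threshold `min_count`, and the sample baskets are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise generate-and-test; returns {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Generate: join frequent (k-1)-itemsets and keep size-k unions
        # whose (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Test: one more pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent_itemsets = apriori(baskets, min_count=2)
```

Each iteration of the `while` loop rescans every transaction, which is the multiple-pass cost named as limitation (i).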


Implementasi Data Mining Menggunakan Algoritma Apriori Pada Penjualan Suku Cadang Motor (Implementation of Data Mining Using the Apriori Algorithm on Motorcycle Spare Parts Sales)

Jurnal Ilmu Komputer ◽

10.24843/jik.2021.v14.i02.p07 ◽

2021 ◽

Vol 14 (2) ◽

pp. 125

Author(s):

Ainul Mardiaha ◽

Yulia Yulia

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

A Priori ◽

Spare Parts ◽

Frequent Itemset ◽

Sales Data ◽

Minimum Support ◽

Relationship Of

This research was carried out to assist the owners of the Candra Motor workshop in managing data and archives of motorcycle parts sales by applying the Apriori data mining algorithm. Data mining is an operation that uses a particular technique or method to look for patterns in selected data. One year of sales data covering 15 items was selected and processed with the Apriori algorithm. The Apriori algorithm derives association rules from data to determine the associative relationships within item combinations. It determines the frequent 1-itemsets, 2-itemsets, and 3-itemsets so that association rules can be obtained from the selected data. To obtain the frequent itemsets, the selected data must meet the minimum support and minimum confidence requirements. This study uses a minimum support of ≥ 7 (0.583) and a minimum confidence of 90%. Several association rules were obtained, and the manual calculation and the WEKA software produced the same results. Under the minimum support and minimum confidence requirements, the best-selling spare parts are inner tubes, Yamaha oil and MPX oil.
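The two thresholds can be illustrated with a short calculation. The ratio 0.583 equals 7/12, which suggests 12 transactions, though the abstract does not state this, so treat it as an assumption. The item names come from the abstract, but the transactions below are invented for illustration and are not the study's data; the helper names are hypothetical as well.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t))


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    base = support_count(transactions, antecedent)
    return both / base if base else 0.0


# Invented data: 12 transactions (assumed, e.g. one per month).
sales = (
    [{"inner tube", "Yamaha oil"}] * 7
    + [{"inner tube"}] * 2
    + [{"MPX oil"}] * 3
)

pair = {"inner tube", "Yamaha oil"}
pair_count = support_count(sales, pair)              # absolute support
pair_support = round(pair_count / len(sales), 3)     # relative support
conf_oil_to_tube = confidence(sales, {"Yamaha oil"}, {"inner tube"})
conf_tube_to_oil = confidence(sales, {"inner tube"}, {"Yamaha oil"})
```

With this invented data the pair just meets the support threshold (7 of 12), yet only the rule Yamaha oil → inner tube reaches 90% confidence; the reverse rule falls short because two transactions contain an inner tube without Yamaha oil.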


Frequent Itemset Mining in Data Mining: A Survey

International Journal of Computer Applications ◽

10.5120/ijca2016909219 ◽

2016 ◽

Vol 139 (9) ◽

pp. 15-18 ◽

Cited By ~ 3

Author(s):

Rana Ishita ◽

Amit Rathod

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining


Partition based Single Scan Method for Mining Frequent Item Sets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9237.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4917-4922

Keyword(s):

Unique Feature ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Minimum Support ◽

Itemset Mining ◽

Highly Sensitive ◽

Support Threshold ◽

Hidden Patterns ◽

The Cost ◽

Frequent Item Sets

The concept and limitations of frequent itemset mining (FIM) are explored in this paper, for the purpose of extracting unknown hidden patterns as itemsets from a transactional database. Since candidate generation and support calculation are the major tasks in FIM, its major limitations are addressed: (i) a huge number of possible frequent itemsets are generated as candidates at each pass, (ii) the database is scanned at each pass to calculate the support of the generated itemsets, and (iii) the generated itemsets are highly sensitive to the minimum support threshold. SS-FIM, a single-scan algorithm, deals with these limitations; however, several unnecessary itemsets are hashed into its buckets. To overcome this, a partition-based approach is proposed in this paper. The proposed approach, PSSFIM, takes a single scan of the database to identify frequent itemsets. Its unique feature is that the number of candidate itemsets generated is independent of the minimum support. It admits into the hash only those candidates that can possibly be frequent, which reduces the cost of verifying the support of the generated candidates. It is compared with SS-FIM and Apriori on standard datasets. The results show that PSSFIM compares favourably with SS-FIM and Apriori.


Postdiffset: an Eclat-like algorithm for frequent itemset mining

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.28.12911 ◽

2018 ◽

Vol 7 (2.28) ◽

pp. 197

Author(s):

W A.W.A. Bakar ◽

M A. Jalil ◽

M Man ◽

Z Abdullah ◽

F Mohd

Keyword(s):

Data Mining ◽

Association Rule ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Underlying Structure ◽

Data Format ◽

Itemset Mining ◽

Data Formats ◽

Vertical Data ◽

Mining Algorithms

Frequent itemset mining is a major field within data mining. It deals with the regular co-occurrence of sets of items in database transactions. Originating from market basket analysis, frequent itemset generation leads to the formulation of association rules that capture correlations or patterns. Association rule mining remains one of the most prominent areas in data mining; it aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases. The underlying structure of association rule mining algorithms is based on either a horizontal or a vertical data format. Both formats have been widely discussed, with a few example algorithms for each. Horizontal approaches suffer from heavy candidate generation and multiple database scans, which contribute to higher memory consumption. Vertical approaches were established to improve on the horizontal approach. Eclat is one example of an algorithm that uses the vertical data format. Motivated by its ‘fast intersection’, in this paper we review and analyze the fundamental Eclat and Eclat variants such as tidset, diffset, and sortdiffset. Building on the vertical data format and continuing the Eclat extensions, we propose postdiffset, a new Eclat variant that uses the tidset format in the first loop and the diffset format in later loops. We present the execution-time performance of postdiffset, which indicates that some improvement has been achieved in frequent itemset mining.
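The difference between the tidset and diffset formats can be shown on a toy database. This is a sketch of the two standard support computations for a pair of items, not the postdiffset algorithm itself; the function names and transactions are hypothetical.

```python
def vertical(transactions):
    """Convert horizontal transactions to the vertical format: item -> tidset."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def support_by_tidset(tidsets, x, y):
    """Support of {x, y} as the size of the tidset intersection."""
    return len(tidsets[x] & tidsets[y])


def support_by_diffset(tidsets, x, y):
    """Support of {x, y} as support(x) minus the size of the diffset.

    The diffset holds the transactions that contain x but not y.
    """
    diffset = tidsets[x] - tidsets[y]
    return len(tidsets[x]) - len(diffset)


# Hypothetical transactions; the list index serves as the transaction id.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
tidsets = vertical(transactions)
sup_tid = support_by_tidset(tidsets, "a", "b")
sup_diff = support_by_diffset(tidsets, "a", "b")
```

Both routes give the same support. The diffset stores only the transaction ids that drop out, which tends to be the smaller set when items co-occur often, as in dense data.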


An Efficient Method for Frequent Itemset Mining on Temporal Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1953162 ◽

2019 ◽

pp. 558-568

Author(s):

Fathima Sherin T K ◽

Anish Kumar B.

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Edge Density ◽

Time Interval ◽

Related Data ◽

Itemset Mining ◽

A Value

Frequent itemset mining (FIM) is a data mining technique for extracting frequent itemsets from a database. Existing methods for finding frequent itemsets assume that datasets are static and that the discovered rules apply throughout the whole dataset. This is not the case when the data is temporal, that is, when it contains time-related information that changes the mining results. Patterns may occur throughout the data or only in specific intervals. To bound these time intervals, frequent itemset mining with a time cube is proposed to handle the arrangement of time in the mining process. In this way, patterns that occur periodically, within a time interval, or both can be recognized. This paper therefore focuses on developing an efficient algorithm to mine frequent itemsets and their associated time intervals from a transactional database by extending an earlier algorithm based on support, with density as an additional threshold. Density is proposed to handle the overestimated time period problem and to ensure the validity of the patterns found. As an extension of the existing framework, the density rate and minimum threshold, previously user-specified parameters, are generated dynamically. In addition, computation time is compared between partitioned and unpartitioned datasets, showing that it is lower with the partitioning technique.


Trust-but-Verify: Verifying Result Correctness of Outsourced Frequent Itemset Mining in Data-Mining-As-a-Service Paradigm

IEEE Transactions on Services Computing ◽

10.1109/tsc.2015.2436387 ◽

2016 ◽

Vol 9 (1) ◽

pp. 18-32 ◽

Cited By ~ 6

Author(s):

Boxiang Dong ◽

Ruilin Liu ◽

Hui Wendy Wang

Keyword(s):

Data Mining ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Service Paradigm ◽

Result Correctness


Association Rule Mining Algorithms for Big Data using RDD-ECLAT Algorithms

10.21203/rs.3.rs-935690/v1 ◽

2021 ◽

Author(s):

Martha ◽

Ramdas Vankdothu ◽

Hameed Mohd Abdul ◽

Rekha Gangula

Keyword(s):

Data Mining ◽

Big Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

New Paradigm ◽

Rule Mining ◽

Data Intensive ◽

Itemset Mining ◽

Real World Datasets ◽

Mining Algorithms

The revolution in technology for storing and processing big data has led to data-intensive computing as a new paradigm. Finding valuable and precise knowledge in big data requires efficient and scalable data mining techniques. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.


Mining Association Rules: A Case Study on Benchmark Dense Data

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v3.i3.pp546-553 ◽

2016 ◽

Vol 3 (3) ◽

pp. 546 ◽

Cited By ~ 2

Author(s):

Mustafa Bin Man ◽

Wan Aezwani Wan Abu Bakar ◽

Zailani Abdullah ◽

Masita@Masila Abd Jalil ◽

Tutut Herawan

Keyword(s):

Association Rules ◽

Association Rule ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Data Repository ◽

Rule Mining ◽

Itemset Mining ◽

Major Attention ◽

Performance Results

Data mining is the process of discovering knowledge and previously unknown patterns from large amounts of data. Association rule mining (ARM) has been a trend, in which new pattern analyses can be discovered to support important predictions about an issue. Since its introduction, frequent itemset mining has received major attention among researchers, and various efficient and sophisticated algorithms have been proposed for it. Among the best-known algorithms are Apriori and FP-Growth. In this paper, we explore these algorithms and compare their results in generating association rules on benchmark dense datasets. The datasets are taken from the frequent itemset mining data repository. The two algorithms are implemented in RapidMiner 5.3.007 and their performance results are compared. FP-Growth is found to be the better algorithm under the support-confidence framework.
