Mining Association Rules: A Case Study on Benchmark Dense Data

Author(s):  
Mustafa Bin Man ◽  
Wan Aezwani Wan Abu Bakar ◽  
Zailani Abdullah ◽  
Masita@Masila Abd Jalil ◽  
Tutut Herawan

Data mining is the process of discovering knowledge and previously unknown patterns in large amounts of data. Association rule mining (ARM) has become popular because newly discovered patterns can support important predictions across many problem domains. Since frequent itemset mining was first introduced, it has received major attention from researchers, and various efficient and sophisticated algorithms have been proposed for it. Among the best-known algorithms are Apriori and FP-Growth. In this paper, we explore these algorithms and compare their results in generating association rules from benchmark dense datasets. The datasets are taken from the frequent itemset mining data repository. The two algorithms are implemented in Rapid Miner 5.3.007, and their performance results are compared. FP-Growth is found to be the better algorithm under the support-confidence framework.
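The support-confidence framework mentioned in the abstract can be illustrated with a minimal sketch. The toy transactions and the helper names `support` and `confidence` are assumptions made for illustration; they do not come from the paper or from Rapid Miner.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A ∪ C) / support(A); returns 0.0 if the antecedent never occurs."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(joint, transactions) / antecedent_support


# Hypothetical toy data, not one of the benchmark datasets.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Rule {bread} -> {milk}: 2 of 4 transactions contain both items,
# and 3 of 4 contain bread.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A rule is reported only when both values meet user-chosen minimum thresholds. Apriori and FP-Growth differ in how they find the frequent itemsets, not in how these two measures are defined.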

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Chongjing Sun ◽  
Yan Fu ◽  
Junlin Zhou ◽  
Hui Gao

Frequent itemset mining is the important first step of association rule mining, which discovers interesting patterns in massive data. There are increasing concerns about privacy in frequent itemset mining, and several approaches have been proposed to address them. In this paper, we introduce a personalized privacy problem, in which different attributes may need different levels of privacy protection. To solve this problem, we give a personalized privacy-preserving method based on the randomized response technique. By providing different privacy levels for different attributes, this method achieves higher accuracy in frequent itemset mining than the traditional method, which provides the same privacy level for all attributes. Finally, our experimental results show that our method yields better frequent itemset mining results while preserving personalized privacy.
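To make the randomized response idea concrete, here is a minimal sketch of the classic bit-flipping scheme with a separate retention probability per attribute. It is a generic illustration, not the paper's method: the function names, the retention probabilities, and the synthetic data are all assumptions.

```python
import random


def perturb(row, keep_prob, rng):
    """Keep bit j with probability keep_prob[j]; otherwise flip it."""
    return [bit if rng.random() < p else 1 - bit
            for bit, p in zip(row, keep_prob)]


def estimate_support(perturbed_rows, j, p):
    """Estimate the true fraction of 1s in column j from perturbed data.

    If pi is the true fraction, the observed fraction is
    p * pi + (1 - p) * (1 - pi), which is inverted here (requires p != 0.5).
    """
    observed = sum(r[j] for r in perturbed_rows) / len(perturbed_rows)
    return (observed - (1 - p)) / (2 * p - 1)


# Hypothetical setup: attribute 0 is less sensitive (kept 90% of the time),
# attribute 1 is more sensitive (kept only 70% of the time).
rng = random.Random(0)
keep_prob = [0.9, 0.7]
rows = [[1 if i % 10 < 3 else 0, 1 if i % 10 < 6 else 0]
        for i in range(20000)]          # true supports: 0.3 and 0.6
noisy = [perturb(r, keep_prob, rng) for r in rows]
estimates = [estimate_support(noisy, j, p) for j, p in enumerate(keep_prob)]
print(estimates)
```

A retention probability closer to 0.5 hides more but makes the estimate noisier, which is why giving less sensitive attributes a higher retention probability can improve overall accuracy.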


2015 ◽  
Vol 78 (2-2) ◽  
Author(s):  
Wan Aezwani Wan Abu Bakar ◽  
Md. Yazid Md. Saman ◽  
Zailani Abdullah ◽  
Masita@Masila Abd Jalil ◽  
Tutut Herawan

Data mining (DM) is the process of discovering knowledge and previously unknown patterns in large amounts of data. Association rule mining has become popular because newly discovered patterns can support important predictions across many problem domains. In this article, we compare the Apriori and FP-Growth algorithms in generating association rules from benchmark data taken from the frequent itemset mining data repository. Experiments with the two algorithms are run in Rapid Miner 5.3.007, and the performance results are compared. The results confirm those of previous work.


2021 ◽  
Vol 11 (1) ◽  
pp. 18-37
Author(s):  
Mehmet Bicer ◽  
Daniel Indictor ◽  
Ryan Yang ◽  
Xiaowen Zhang

Association rule mining is a common technique for discovering interesting frequent patterns in data acquired in various application domains. The search space explodes combinatorially as the size of the data increases. Furthermore, the introduction of new data can invalidate old frequent patterns and introduce new ones. Hence, while finding association rules efficiently is an important problem, maintaining and updating them is also crucial. Several algorithms have been introduced to find association rules efficiently; one of them is Apriori. There are also algorithms written to update or maintain existing association rules; update with early pruning (UWEP) is one such algorithm. In this paper, the authors propose that under certain conditions it is preferable to use an incremental algorithm rather than the classic Apriori algorithm. They also propose new implementation techniques and improvements to the original UWEP paper in an algorithm they call UWEP2. These include the use of memoization and lazy evaluation to reduce scans of the dataset.


2018 ◽  
Vol 7 (2.28) ◽  
pp. 197
Author(s):  
W A.W.A. Bakar ◽  
M A. Jalil ◽  
M Man ◽  
Z Abdullah ◽  
F Mohd

Frequent itemset mining is a major field within data mining because it deals with the usual, regular occurrences of sets of items in database transactions. Originating from market basket analysis, frequent itemset generation can lead to the formulation of association rules that capture correlations or patterns. Association rule mining remains one of the most prominent areas in data mining; it aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases. Association rule mining algorithms are built on either horizontal or vertical data formats. Both formats have been widely discussed, with a few example algorithms given for each. Horizontal approaches suffer from heavy candidate generation and multiple database scans, which contribute to higher memory consumption. Vertical approaches were established to improve on the horizontal approach, and Eclat is one example of an algorithm that uses the vertical database format. Motivated by its ‘fast intersection’, in this paper we review and analyze the fundamental Eclat algorithm and Eclat variants such as tidset, diffset, and sortdiffset. Building on the vertical data format and continuing the Eclat extensions, we propose postdiffset, a new Eclat variant that uses the tidset format in the first loop and the diffset format in later loops. We present the execution-time performance of postdiffset to show that some improvement has been achieved in frequent itemset mining.
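The difference between the tidset and diffset representations can be sketched on a toy database. This is only an illustration of the two standard support computations, not an implementation of postdiffset; the data and the helper name `vertical_format` are assumptions.

```python
def vertical_format(transactions):
    """Map each item to its tidset: the ids of the transactions containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


# Hypothetical toy database in horizontal format.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
t = vertical_format(transactions)

# Tidset approach: support(ab) is the size of the intersection.
support_ab_tidset = len(t["a"] & t["b"])

# Diffset approach: d(ab) = t(a) - t(b), support(ab) = support(a) - |d(ab)|.
d_ab = t["a"] - t["b"]
support_ab_diffset = len(t["a"]) - len(d_ab)

# One level deeper, diffsets are combined directly:
# d(abc) = d(ac) - d(ab), support(abc) = support(ab) - |d(abc)|.
d_ac = t["a"] - t["c"]
d_abc = d_ac - d_ab
support_abc_diffset = support_ab_diffset - len(d_abc)

print(support_ab_tidset, support_ab_diffset, support_abc_diffset)
```

Both routes give the same counts. Diffsets tend to be smaller than tidsets on dense data, which is the usual motivation for switching to them after the first level.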


2021 ◽  
Author(s):  
Martha ◽  
Ramdas Vankdothu ◽  
Hameed Mohd Abdul ◽  
Rekha Gangula

The revolution in technology for storing and processing big data has led to data-intensive computing as a new paradigm. Finding valuable and precise knowledge in big data requires efficient and scalable data mining techniques. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This work addresses the problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of existing Spark-based frequent itemset mining algorithms and to efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for the distributed computing environment of Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant. After EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 differ slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.


Author(s):  
Wenbin Zhou ◽  
Xuhui Xia ◽  
Zelin Zhang ◽  
Lei Wang

Understanding the potential relationship between service demands and remanufacturing services (RMS) is essential for making accurate RMS planning decisions and for improving efficiency and benefit. In traditional association rule mining methods, the large number of candidate sets reduces mining efficiency, and the results are not easy for customers to understand. Therefore, a mining method based on a binary particle swarm optimization ant colony algorithm is proposed to discover association rules between service demands and remanufacturing services. This method preprocesses the RMS records, converts them into a binary matrix, and uses the improved ant colony algorithm to mine the maximum frequent itemset. Because the particle swarm algorithm determines the initial pheromone concentration of the ant colony, it avoids the blindness of the ant colony, effectively enhances the search ability of the algorithm, and makes association rule mining faster and more accurate. Finally, a set of historical RMS records for a straightening machine is used to test the validity and feasibility of the method by extracting valid association rules to guide the design of an RMS scheme for straightening machine parts.
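The preprocessing step described above, converting records into a binary matrix, can be sketched as follows. This covers only the encoding and a support count over it, not the particle swarm or ant colony search. The record contents and function names are hypothetical.

```python
def to_binary_matrix(records):
    """Encode each record as a 0/1 row over the sorted list of all items."""
    items = sorted({item for record in records for item in record})
    matrix = [[1 if item in record else 0 for item in items]
              for record in records]
    return items, matrix


def itemset_count(matrix, columns):
    """Number of rows that have a 1 in every one of the given columns."""
    return sum(all(row[c] for c in columns) for row in matrix)


# Hypothetical records pairing a service demand (d*) with services (s*).
records = [
    {"d1", "s1"},
    {"d1", "s2"},
    {"d1", "s1", "s2"},
]
items, matrix = to_binary_matrix(records)
count_d1_s1 = itemset_count(matrix, [items.index("d1"), items.index("s1")])
print(items, matrix, count_d1_s1)
```

With this encoding, a candidate itemset is simply a set of column indices, which is a convenient form for a binary search heuristic to manipulate.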


2019 ◽  
Vol 10 (1) ◽  
pp. 11
Author(s):  
Adi Nugroho Susanto Putro ◽  
Richardus Indra Gunawan

The vegetable business has grown significantly in recent years. One way to continuously produce high-quality vegetables is cultivation with a hydroponic system [1]. The hydroponic business has good prospects, but it also has a weakness: because the produce is fresh and free of chemicals and preservatives, hydroponic vegetables and fruit do not keep for long. If they are not sold quickly, losses result. Data mining is the process of searching for interesting patterns or information in selected data using particular techniques or methods. Apriori is one of the ten most influential algorithms in the research community. Since the Apriori algorithm was first introduced, there have been many efforts to design more efficient frequent itemset mining algorithms. The most prominent improvement on Apriori is a method called FP-Growth (frequent pattern growth), which eliminates candidate generation [2]. This study proposes implementing the FP-Growth algorithm with the open-source software Weka to help analyze and design a hydroponic retail product catalog that encourages fruit and vegetables to be sold together. Association rules are determined using interestingness measures, namely support and confidence. Using a minimum support of 0.05 and a minimum confidence of 0.9, this study produces 21 rules that can be used as a marketing strategy for PT. HAB. Keywords: FP-Growth Algorithm, Marketing Strategy, Hydroponic Retail.


2017 ◽  
Vol 6 (4) ◽  
pp. 141
Author(s):  
Sachin Sharma ◽  
Shaveta Bhatia

Frequent item set mining is the most crucial and expensive task for the industry today. It is the task of mining information from different sources and a key approach in data mining. Frequent item sets that satisfy the minimum threshold can be discovered, and association rules are then extracted from them. The association rules are affected by the minimum support value entered by the user and may be considered positive or negative. There may also be other association rules that involve rare item sets. Researchers have used various methods for generating association rules. In this paper, our aim is to study the various techniques for generating association rules.


2020 ◽  
Vol 54 (3) ◽  
pp. 365-382
Author(s):  
Praveen Kumar Gopagoni ◽  
Mohan Rao S K

Purpose: Association rule mining generates patterns and correlations from a database, which requires a long scanning time, and the computational cost of generating the rules is quite high. In addition, the candidate rules generated by traditional association rule mining pose a major challenge in terms of time and space, and the process is lengthy. To tackle the issues of the existing methods and to render the privacy rules, the paper proposes grid-based privacy association rule mining. Design/methodology/approach: The primary intention of the research is to design and develop a distributed elephant herding optimization (EHO) for grid-based privacy association rule mining from a database. The proposed rule generation method proceeds in two steps. In the first step, the rules are generated using the apriori algorithm, which is an effective association rule mining algorithm. In general, the extraction of association rules from the input database is based on confidence and support; in the proposed model these are replaced with new terms, namely probability-based confidence and holo-entropy. In the second step, the generated rules are passed to the grid-based privacy rule mining, which produces privacy-dependent rules based on a novel optimization algorithm and grid-based fitness. The novel optimization algorithm is developed by integrating the distributed concept into the EHO algorithm. Findings: The method is evaluated on databases taken from the Frequent Itemset Mining Dataset Repository, namely the retail, chess, T10I4D100K and T40I10D100K databases, to demonstrate the effectiveness of the distributed grid-based privacy association rule mining. The proposed method outperformed the existing methods by offering a higher degree of privacy and utility. Moreover, the distributed nature of the association rule mining facilitates parallel processing and generates the privacy rules without much computational burden. The rate of hiding capacity, the rate of information preservation and the rate of false rules generated for the proposed method are found to be 0.4468, 0.4488 and 0.0654, respectively, which is better than the existing rule mining methods. Originality/value: Data mining is performed in a distributed manner through grids that subdivide the input data, and the rules are framed using apriori-based association mining, a modification of the standard apriori in which holo-entropy and probability-based confidence replace support and confidence. The mined rules do not assure privacy, and hence grid-based privacy rules are employed that use adaptive elephant herding optimization (AEHO) to generate the privacy rules. The AEHO inherits the adaptive nature in the standard EHO, which renders the global optimal solution.

