Minimum Threshold Determination Method based on Dataset Characteristics in Association Rule Mining

Author(s):  
Erna Hikmawati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Abstract: The process of extracting data to obtain useful information is known as data mining. One of the most promising and widely used techniques for this extraction is association rule mining, which identifies interesting relationships between sets of items in a dataset and predicts associative behavior for new data. The first step in association rule mining is determining the frequent itemsets that will take part in rule formation. In this step, a threshold known as the minimum support is used to eliminate items that do not belong in a frequent itemset. The threshold also plays an important role in determining the number of rules generated, and setting it wrongly can cause association rule mining to yield no rules at all. Currently, the minimum support value is chosen by the user, a challenge that becomes worse when the user is unfamiliar with the characteristics of the dataset. This study proposes a method for determining the minimum support value from the characteristics of the dataset. The method also uses certain criteria as thresholds, so that the resulting rules adapt to the needs of the user. The results show that, for 6 of 8 datasets, the minimum threshold value determined by this method yielded rules with a lift ratio > 1.

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Erna Hikmawati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Abstract: Association rule mining is a widely used data mining technique. It identifies interesting relationships between sets of items in a dataset and predicts associative behavior for new data. Before rules are formed, the items that will be involved, called the frequent itemsets, must be determined. In this step, a threshold known as the minimum support is used to eliminate items that do not belong in a frequent itemset. The threshold also plays an important role in determining the number of rules generated, and setting it wrongly can cause association rule mining to yield no rules. Currently, users choose the minimum support value arbitrarily, a problem that becomes worse when the user is unfamiliar with the characteristics of the dataset. This consumes a lot of memory and time, because rule formation is repeated until the desired number of rules is found. In the adaptive support model, the minimum support value is determined from the average and total number of items in each transaction, as well as their support values. The proposed method also uses certain criteria as thresholds, so that the resulting rules match user needs. The minimum support value in the proposed method is the average utility value divided by the total number of transactions. Experiments were carried out on 8 datasets with different characteristics to determine the association rules. The proposed adaptive support method was tested with 2 basic association rule algorithms, Apriori and FP-growth. The tests were repeated to determine the highest and lowest minimum support values. The results showed that 6 out of 8 datasets produced minimum and maximum support values for both the Apriori and FP-growth algorithms. This means that the proposed adaptive support value is able to generate rules and, in terms of quality, yields a lift ratio > 1. The dataset characteristics obtained from the experimental results can be used as a factor in determining the minimum threshold value.
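The quantities named in this abstract can be illustrated with a minimal Python sketch. The toy transactions are invented for illustration, and `adaptive_min_support` is only a literal reading of "average utility value divided by the total number of transactions": the abstract does not say how per-item utility is computed, so the utility values passed in are placeholders and the function should not be taken as the authors' implementation.

```python
from statistics import mean

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Lift of the rule antecedent -> consequent.

    A value above 1 means the two sides co-occur more often than they
    would if they were independent. Assumes both sides have non-zero support.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))

def adaptive_min_support(utilities, n_transactions):
    # ASSUMPTION: literal reading of the abstract's formula; how each
    # utility value is derived is not specified there.
    return mean(utilities) / n_transactions

# Hypothetical toy data, not taken from the paper.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
)]

rule_lift = lift({"butter"}, {"bread"}, transactions)
print(f"lift(butter -> bread) = {rule_lift:.2f}")
```

On this toy data the rule butter -> bread has a lift above 1, which is the quality criterion the abstract reports for 6 of the 8 datasets.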


2020 ◽  
Vol 7 (2) ◽  
pp. 135-148
Author(s):  
Didi Supriyadi

The level of competition and the complexity of sales problems in retail require every retail company to be able to compete with others. One way to do so is through more accurate and effective sales-related decision making. Useful information can be extracted from the large volume of retail sales transaction data. One method for uncovering this information is association rule mining. Association rule mining is a data mining method that focuses on transaction patterns by extracting associations or relationships between events. Shopping baskets at computerized retail companies are the best way to provide scientific decision-recommendation support, by determining the relationships between items purchased together in each transaction. The FP-growth algorithm is used to find the sets of data that occur most often (frequent itemsets) in a group of data. With a minimum support of 0.1%, this study produced 116,457 rules at a minimum confidence of 60%, 84,086 rules at a minimum confidence of 70%, and 48,623 rules at a minimum confidence of 80%, from 22,191 processed records. These rules can be used for product marketing strategy. At a minimum support of 0.1%, the higher the minimum confidence, the fewer rules are produced.
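The inverse relationship between minimum confidence and rule count can be seen in a small Python sketch. It is not the study's FP-growth pipeline: the transactions are hypothetical, only one-to-one rules are considered, and no minimum support filter is applied, so it shows the trend rather than the reported numbers.

```python
from itertools import permutations

def count_rules(transactions, min_confidence):
    """Count one-to-one rules A -> B whose confidence meets the threshold."""
    items = sorted(set().union(*transactions))
    count = 0
    for a, b in permutations(items, 2):
        n_a = sum(a in t for t in transactions)            # transactions with A
        n_ab = sum(a in t and b in t for t in transactions)  # with both A and B
        if n_a and n_ab / n_a >= min_confidence:
            count += 1
    return count

# Hypothetical toy baskets, not the retail data used in the study.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

# Sweep the same three confidence levels the abstract reports.
counts = [count_rules(baskets, c) for c in (0.6, 0.7, 0.8)]
print(counts)
```

Because a rule that passes a higher confidence threshold also passes every lower one, the counts can only stay equal or shrink as the threshold rises.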


2022 ◽  
Vol 1 ◽  
Author(s):  
Agostinetto Giulia ◽  
Sandionigi Anna ◽  
Bruno Antonia ◽  
Pescini Dario ◽  
Casiraghi Maurizio

Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in many fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised machine learning procedure, for extracting patterns (in this work, groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, which makes removing spurious information and visualizing results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, by applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists integrate ARM into microbiome applications. With this work, we aimed to create a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.


Author(s):  
Claudio Haruo Yamamoto ◽  
Maria Cristina Ferreira de Oliveira ◽  
Solange Oliveira Rezende

Miners face many challenges when dealing with association rule mining tasks, such as defining proper parameters for the algorithm, handling sets of rules so large that exploration becomes difficult and uncomfortable, and understanding complex rules containing many items. In order to tackle these problems, many researchers have been investigating visual representations and information visualization techniques to assist association rule mining. In this chapter, an overview is presented of the many approaches found in literature. First, the authors introduce a classification of the different approaches that rely on visual representations, based on the role played by the visualization technique in the exploration of rule sets. Current approaches typically focus on model viewing, that is visualizing rule content, namely antecedent and consequent in a rule, and/or different interest measure values associated to it. Nonetheless, other approaches do not restrict themselves to aiding exploration of the final rule set, but propose representations to assist miners along the rule extraction process. One such approach is a methodology the authors have been developing that supports visually assisted selective generation of association rules based on identifying clusters of similar itemsets. They introduce this methodology and a quantitative evaluation of it. Then, they present a case study in which it was employed to extract rules from a real and complex dataset. Finally, they identify some trends and issues for further developments in this area.


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
AA Izang ◽  
SO Kuyoro ◽  
OD Alao ◽  
RU Okoro ◽  
OA Adesegun

Association rule mining (ARM) is an aspect of data mining that has revolutionized predictive modelling, paving the way for data mining to become the recommended method for business owners to evaluate organizational performance. Market basket analysis (MBA), a useful modelling technique in data mining, is often used to analyze customer buying patterns. Choosing the right ARM algorithm for MBA is somewhat difficult, as the performance of most algorithms is determined by characteristics such as the amount of data used, application domain, time variation, and customers' preferences. Hence this study examines four ARM algorithms used in MBA systems for improved business decisions. One million, one hundred and twelve thousand (1,112,000) transactional records were extracted from Babcock University Superstore. The dataset was processed with the Frequent Pattern Growth (FP Growth), Apriori, Association Outliers and Supervised Association Rule ARM algorithms. The outputs were compared using minimum support threshold, confidence level and execution time as metrics. The results showed that FP Growth had a minimum support threshold of 0.011 and a confidence level of 0.013, Apriori 0.019 and 0.022, Association Outliers 0.026 and 0.294, and Supervised Association Rule 0.032 and 0.212, respectively. The FP Growth and Apriori ARM algorithms performed better than Association Outliers and Supervised Association Rule when the minimum support and confidence thresholds were both set to 0.1. The study concluded by recommending a hybrid ARM algorithm for building MBA applications. The outcome of this study, when adopted by business ventures, will lead to improved business decisions, thereby helping to achieve customer retention. Keywords: Association rule mining, Business ventures, Data mining, Market basket analysis, Transactional data.


2019 ◽  
Vol 8 (S2) ◽  
pp. 9-12
Author(s):  
R. Smeeta Mary ◽  
K. Perumal

Finding frequent itemsets is one of the most essential topics in data mining. Data mining helps identify the best knowledge for different decision makers. Frequent itemset generation is the prerequisite for, and the most time-consuming step of, association rule mining. In this paper we propose a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method for finding frequent itemsets without the need to generate candidate itemsets. The approach can be implemented using a horizontal representation of the transaction dataset and by allocating prime values. It explores all the frequent itemsets present in the input and identifies the maximum frequent itemset according to the support. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm attains a major time improvement over both algorithms.
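The abstract does not say how the prime values are used. One common reading, sketched below in Python purely as an assumption, is to assign each item a distinct prime and store each transaction (one row per transaction, i.e. a horizontal layout) as the product of its items' primes, so that an itemset is contained in a transaction exactly when its product divides the transaction's product. The function names and data are hypothetical, and the sketch covers only the encoding and support counting, not the paper's candidate-free search or its distributed execution.

```python
from math import prod

def assign_primes(items):
    """Map each distinct item to a distinct prime (trial division)."""
    primes, candidate = [], 2
    while len(primes) < len(items):
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return dict(zip(sorted(items), primes))

def encode(itemset, prime_of):
    """Encode a set of items as the product of their primes."""
    return prod(prime_of[i] for i in itemset)

def support_count(itemset, encoded_transactions, prime_of):
    """Number of transactions whose code is divisible by the itemset's code."""
    code = encode(itemset, prime_of)
    return sum(t % code == 0 for t in encoded_transactions)

# Hypothetical toy data; each transaction is a set, so no item repeats.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
prime_of = assign_primes(set().union(*transactions))
encoded = [encode(t, prime_of) for t in transactions]

print(prime_of)
print(support_count({"bread", "milk"}, encoded, prime_of))
```

Containment checks reduce to a single modulo operation, although the products grow quickly with the number of distinct items, which is a practical limit of this encoding.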


2019 ◽  
Vol 10 (1) ◽  
pp. 173-188
Author(s):  
Uci Baetulloh ◽  
Acep Irham Gufroni ◽  
Rianto

Sales transaction data for internet-quota starter-pack SIM cards can serve as a reference for gauging the sales levels of products marketed by several mobile telecommunications operators. Such data need not only be archived as company sales reports; it can also be analyzed and turned into information that supports the development of product marketing strategy. The aim of this study is to find association rules for the combinations of mobile operator products that sell best in the East Priangan (Priangan Timur) sales region, covering the Ciamis, Garut and Tasikmalaya clusters. The Apriori algorithm computation for these association rules proceeds through three iterations of candidate k-itemset generation. With a minimum support of 35% and a minimum confidence of 80%, the analysis produced 9 best final association rules for the Ciamis cluster, 21 final association rules for the Tasikmalaya cluster, and 7 final association rules for the Garut cluster. Across the three sales regions, the products most frequently sold at outlets were internet-quota cards from XL together with Telkomsel, and Indosat together with Telkomsel. The results can therefore be used to help decision makers improve product sales.
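A level-wise Apriori pass with up to three candidate-generation iterations, using the thresholds quoted in this abstract (35% support, 80% confidence), might look like the following Python sketch. The transactions are invented stand-ins rather than the study's sales data, and the function names are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support, max_k=3):
    """Return {frequent itemset: support}, built level by level up to size max_k."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    for k in range(1, max_k + 1):
        kept = {}
        for cand in level:
            sup = sum(cand <= t for t in transactions) / n
            if sup >= min_support:
                kept[cand] = sup
        if not kept:
            break
        frequent.update(kept)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        level = {a | b for a, b in combinations(kept, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = {c for c in level
                 if all(frozenset(s) in kept for s in combinations(c, k))}
    return frequent

def rules(frequent, min_confidence):
    """Rules antecedent -> consequent whose confidence meets the threshold."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                conf = sup / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, conf))
    return out

# Hypothetical toy transactions, not the operators' sales data.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
)]
frequent = apriori(transactions, min_support=0.35, max_k=3)
found = rules(frequent, min_confidence=0.80)
for ante, cons, conf in found:
    print(sorted(ante), "->", sorted(cons), f"confidence={conf:.2f}")
```

On this toy data the third iteration yields no surviving candidate, because the prune step discards the only 3-itemset whose 2-item subsets are not all frequent.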

