Minimum threshold determination method based on dataset characteristics in association rule mining

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Erna Hikmawati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Abstract Association rule mining is a widely used data mining technique. It identifies interesting relationships between sets of items in a dataset and predicts associative behavior for new data. Before rules are formed, the items that will be involved, called the frequent itemsets, must be determined. In this step, a threshold known as the minimum support is used to eliminate items that do not belong to a frequent itemset. The threshold also plays an important role in determining the number of rules generated, and a poorly chosen threshold can cause association rule mining to fail to produce any rules. Currently, users choose the minimum support value arbitrarily, which is especially difficult for a user who is unfamiliar with the characteristics of the dataset. It also consumes a lot of memory and time, because the rule formation process is repeated until the desired number of rules is found. In the adaptive support model, the value of minimum support is determined based on the average and total number of items in each transaction, as well as their support values. The proposed method also uses certain criteria as thresholds, so that the resulting rules match user needs. The minimum support value in the proposed method is obtained by dividing the average utility value by the total number of transactions. Experiments were carried out on 8 datasets with different characteristics to determine the association rules. The proposed adaptive support method was tested with 2 basic association rule algorithms, namely Apriori and FP-growth. The test was repeated to determine the highest and lowest minimum support values. The results showed that 6 out of 8 datasets produced minimum and maximum support values for the Apriori and FP-growth algorithms. This means that the proposed adaptive support value is able to generate rules of good quality, as it produces rules with a lift ratio > 1. The dataset characteristics obtained from the experimental results can be used as a factor to determine the minimum threshold value.
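The quantities named in this abstract (support, lift ratio, and a minimum support derived from dataset averages) can be illustrated with a minimal Python sketch. The abstract does not define "utility", so the sketch assumes, purely for illustration, that an item's utility is its occurrence count; the toy transactions and the function name `average_based_min_support` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Lift of antecedent -> consequent; a value > 1 indicates positive association."""
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))

def average_based_min_support(transactions):
    """ASSUMPTION: an item's 'utility' is its occurrence count (not from the paper).
    The threshold is the average utility divided by the number of transactions."""
    counts = Counter(item for t in transactions for item in t)
    average_utility = sum(counts.values()) / len(counts)
    return average_utility / len(transactions)

min_support = average_based_min_support(transactions)
print(f"derived minimum support: {min_support:.2f}")
print(f"lift(butter -> bread): {lift({'butter'}, {'bread'}, transactions):.2f}")
```

Under this assumption the threshold follows from the data alone, so no user-chosen value is needed; whether the resulting rules are useful would still be judged by a quality measure such as lift.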

2021 ◽  
Author(s):  
Erna Hikmawati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Abstract The process of extracting data to obtain useful information is known as data mining. One of the promising and widely used techniques for this extraction process is association rule mining. This technique is used to identify interesting relationships between sets of items in a dataset and predict associative behavior for new data. The first step in association rule mining is the determination of the frequent itemsets that will be involved in the rule formation process. In this step, a threshold known as the minimum support is used to eliminate items that do not belong to a frequent itemset. The threshold also plays an important role in determining the number of rules generated, and a poorly chosen threshold can cause association rule mining to fail to produce any rules. Currently, the minimum support value is determined by the user, which is especially difficult for a user who is unfamiliar with the characteristics of the dataset. In this study, a method was proposed to determine the minimum support value based on the characteristics of the dataset. The method also uses certain criteria as thresholds, which leads to rules that are better adapted to the needs of the user. The results of this study showed that 6 out of 8 datasets obtained rules with a lift ratio > 1 using the minimum threshold value determined through this method.


2020 ◽  
Vol 7 (2) ◽  
pp. 135-148
Author(s):  
Didi Supriyadi

The level of competition and the complexity of sales problems in retail require every retail company to be able to compete with others. One way to do so is to make more accurate and effective sales-related decisions. Useful information can be extracted from the large volume of a retail company's sales transaction data. One method for extracting this information is association rule mining, a data mining method that focuses on transaction patterns by extracting associations or relationships between events. Market basket analysis in a computerized retail company is the best way to support decision recommendations scientifically, by determining the relationships between items purchased together in each transaction. The FP-growth algorithm is used to determine the most frequently occurring sets of items (frequent itemsets) in a group of data. With a minimum support of 0.1%, this study produced 116457 rules at a minimum confidence of 60%, 84086 rules at a minimum confidence of 70%, and 48623 rules at a minimum confidence of 80%, from 22191 processed data records. These rules can be used for product marketing strategy. At a minimum support of 0.1%, the larger the minimum confidence value, the fewer rules are produced.
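The pattern reported in this abstract, fewer rules as the minimum confidence rises while the minimum support stays fixed, can be reproduced on a toy dataset. The sketch below is a brute-force enumeration in Python rather than FP-growth, and the transactions, the 30% support threshold and the helper names are hypothetical. Exact fractions are used so that comparisons at the threshold are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
    {"a"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rule enumeration; fine for a toy dataset, not for real data."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            joint = count(itemset, transactions)
            if joint == 0 or Fraction(joint, n) < min_support:
                continue  # itemset is not frequent
            # Split the frequent itemset into antecedent -> consequent.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    antecedent = frozenset(lhs)
                    confidence = Fraction(joint, count(antecedent, transactions))
                    if confidence >= min_confidence:
                        rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Rule counts at 60%, 70% and 80% minimum confidence, minimum support fixed at 30%.
counts = {
    pct: len(mine_rules(transactions, Fraction(3, 10), Fraction(pct, 100)))
    for pct in (60, 70, 80)
}
print(counts)
```

Because raising the confidence threshold can only discard rules, the counts never increase from one threshold to the next.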


2022 ◽  
Vol 1 ◽  
Author(s):  
Agostinetto Giulia ◽  
Sandionigi Anna ◽  
Bruno Antonia ◽  
Pescini Dario ◽  
Casiraghi Maurizio

Boosted by the exponential growth of microbiome-based studies, analyzing microbiome patterns is now a hot topic with applications in different fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, in order to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised machine learning procedure, to extract patterns (in this work, intended as groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, making the removal of spurious information and the visualization of results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists to integrate ARM in microbiome applications. With this work, we aimed at creating a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
AA Izang ◽  
SO Kuyoro ◽  
OD Alao ◽  
RU Okoro ◽  
OA Adesegun

Association rule mining (ARM) is an aspect of data mining that has revolutionized the area of predictive modelling, paving the way for data mining techniques to become the recommended method for business owners to evaluate organizational performance. Market basket analysis (MBA), a useful modeling technique in data mining, is often used to analyze customer buying patterns. Choosing the right ARM algorithm to use in MBA is somewhat difficult, as the performance of most algorithms is determined by characteristics such as the amount of data used, the application domain, time variation, and customers' preferences. Hence this study examines four ARM algorithms used in MBA systems for improved business decisions. One million, one hundred and twelve thousand (1,112,000) transactional data records were extracted from Babcock University Superstore. The Frequent Pattern Growth, Apriori, Association Outliers and Supervised Association Rule ARM algorithms were applied to the dataset. The outputs were compared using minimum support threshold, confidence level and execution time as metrics. The results showed that FP Growth has a minimum support threshold of 0.011 and a confidence level of 0.013, Apriori 0.019 and 0.022, Association Outliers 0.026 and 0.294, while Supervised Association Rule has 0.032 and 0.212 respectively. The FP Growth and Apriori ARM algorithms performed better than Association Outliers and Supervised Association Rule when the minimum support and confidence thresholds were both set to 0.1. The study concluded by recommending a hybrid ARM algorithm for building MBA applications. The outcome of this study, when adopted by business ventures, will lead to improved business decisions, thereby helping to achieve customer retention. Keywords: Association rule mining, Business ventures, Data mining, Market basket analysis, Transactional data.


2019 ◽  
Vol 8 (S2) ◽  
pp. 9-12
Author(s):  
R. Smeeta Mary ◽  
K. Perumal

Finding frequent itemsets is one of the essential topics in data mining. Data mining helps identify the best knowledge for different decision makers. Frequent itemset generation is the precondition for association rule mining and its most time-consuming step. In this paper we suggest a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method to find frequent itemsets without the need to generate candidate itemsets. The proposed approach can be implemented using a horizontal representation of transaction datasets and by allocating prime values. It explores all the frequent itemsets present in the input, and the maximum frequent itemset is identified according to the support. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm achieves a major time improvement over both algorithms.
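The abstract gives no detail on how prime values are allocated, so the following Python sketch shows only one common way such an encoding can work, under stated assumptions: each item receives a distinct prime, each transaction in the horizontal layout is stored as the product of its items' primes, and an itemset is contained in a transaction exactly when the transaction's product is divisible by the itemset's product. The data and the names `first_primes`, `prime_of` and `support_count` are hypothetical. The sketch does not reproduce the authors' candidate-free search or its distributed execution.

```python
from math import prod

def first_primes(n):
    """Return the first n primes by trial division (adequate for small item sets)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

# Toy transactions in horizontal layout (hypothetical data).
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

# ASSUMPTION: one distinct prime per item, assigned in alphabetical order.
items = sorted({item for t in transactions for item in t})
prime_of = dict(zip(items, first_primes(len(items))))

# One integer per transaction: the product of the primes of its distinct items.
encoded = [prod(prime_of[i] for i in set(t)) for t in transactions]

def support_count(itemset):
    """Count transactions whose encoded product is divisible by the itemset's product."""
    key = prod(prime_of[i] for i in set(itemset))
    return sum(code % key == 0 for code in encoded)

print(support_count(["bread", "milk"]))
```

The appeal of this encoding is that a containment test becomes a single modulo operation. The products grow quickly with the number of items, which Python's arbitrary-precision integers tolerate but fixed-width integers would not.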


2019 ◽  
Vol 10 (1) ◽  
pp. 173-188
Author(s):  
Uci Baetulloh ◽  
Acep Irham Gufroni ◽  
Rianto

Sales transaction data for internet-quota starter cards can serve as a reference for gauging the sales levels of products marketed by several mobile telecommunications operators. Such data need not serve only as an archive of the company's sales reports; it can also be analyzed and turned into information that supports the development of product marketing strategies. The aim of this study is to find association rules for the combinations of mobile operator products that sell best in the Priangan Timur sales region, which covers the Ciamis, Garut and Tasikmalaya clusters. The Apriori algorithm computation for these association rules proceeds through three iterations of candidate k-itemset generation. With a minimum support of 35% and a minimum confidence of 80%, the analysis produced 9 best final association rules for the Ciamis cluster, 21 final association rules for the Tasikmalaya cluster and 7 final association rules for the Garut cluster. Across the three sales regions, the products most often sold together at outlets are internet-quota cards from XL with Telkomsel and from Indosat with Telkomsel. These results can therefore help decision makers improve product sales.


2021 ◽  
Vol 48 (4) ◽  
Author(s):  
Hafiz I. Ahmad ◽  
◽  
Alex T. H. Sim ◽  
Roliana Ibrahim ◽  
Mohammad Abrar ◽  
...  

Association rule mining (ARM) is used for discovering frequent itemsets for interesting relationships of associative and correlative behaviors within the data. This gives new insights of great value, both commercial and academic. The traditional ARM techniques discover interesting association rules based on a predefined minimum support threshold. However, there is no known standard of an exact definition of minimum support and providing an inappropriate minimum support value may result in missing important rules. In addition, most of the rules discovered by these traditional ARM techniques refer to already known knowledge. To address these limitations of the minimum support threshold in ARM techniques, this study proposes an algorithm to mine interesting association rules without minimum support using predicate logic and a property of a proposed interestingness measure (g measure). The algorithm scans the database and uses g measure’s property to search for interesting combinations. The selected combinations are mapped to pseudo-implications and inference rules of logic are used on the pseudo-implications to produce and validate the predicate rules. Experimental results of the proposed technique show better performance against state-of-the-art classification techniques, and reliable predicate rules are discovered based on the reliability differences of the presence and absence of the rule’s consequence.


2019 ◽  
Vol 7 (2) ◽  
pp. 143-152
Author(s):  
Lusa Indah Prahartiwi ◽  
Wulan Dari

Abstract For decades, retail chains and department stores sold their products without using the transactional data generated by their sales as a source of knowledge. Abundant data availability, the need for information (or knowledge) to support decision making and create business solutions, and infrastructure support in the field of information technology gave rise to data mining technology. Data mining finds interesting patterns in databases, such as association rules, correlations, sequences, classifiers and many more, of which association rules are one of the most popular problems. Association rule mining is a data mining method used to extract useful patterns between data items. In this research, the Apriori algorithm was applied to find frequent itemsets in association rule mining. The data were processed using the Tanagra tool. The dataset used was the Supermarket dataset, consisting of 12 attributes and 108,131 transactions. The experiments produced an association rule from the combination of the itemsets beer wine spirit-frozen foods and snack foods as a frequent itemset, with a support value of 15.489% and a confidence value of 83.719%. The lift ratio obtained was 2.47766, which means that the association rule is beneficial. Keywords: Apriori, Association Rule Mining.
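The three figures reported in this abstract are linked by the standard definitions of confidence and lift. As a worked check, writing the rule generically as X → Y (the abstract does not say which items form the antecedent), the implied supports of the two sides can be recovered approximately:

```latex
\begin{aligned}
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
  &&\Longrightarrow&
  \operatorname{supp}(X) &= \frac{0.15489}{0.83719} \approx 0.1850, \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  &&\Longrightarrow&
  \operatorname{supp}(Y) &= \frac{0.83719}{2.47766} \approx 0.3379.
\end{aligned}
```

Under these definitions the consequent appears in roughly 34% of all transactions but in about 84% of those containing the antecedent, which is what a lift of about 2.48 expresses.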


2021 ◽  
Vol 10 (1) ◽  
pp. 73
Author(s):  
Muhammad Firyanul Rizky ◽  
I Gusti Agung Gede Arya Kadyanan

Ubud market is one of the largest art markets in Bali. It hosts many local Balinese souvenir traders and craftspeople, most of whose livelihoods depend on buying and selling local souvenirs. Since the Covid-19 pandemic arrived in April 2020, Ubud market traders have started to close their businesses, hoping for economic recovery in the future. The author traces the record of souvenir sales transactions in Ubud market to find the last sales pattern before the traders closed their businesses, as input for future marketing strategies. Sales transaction data is meaningless if it goes unused. To obtain useful information about the best-selling products at Ubud Market from the transaction database, the author uses the Apriori algorithm. The study determined the final rules. For 2-itemset combinations: if buying Manik-Manik Craft, then also buying Barong Shirt, with the highest confidence of 70% and a minimum support of 28%. For 3-itemset combinations: if buying Celuk Silver and Barong Shirt, then also buying Manik-Manik Craft, with the highest confidence of 37.5% and a minimum support of 12%. Based on these rules, the 3 best-selling souvenir products in March 2020 were Barong Shirt, Manik-Manik Craft and Celuk Silver. Keywords: Apriori Algorithm, Data Mining, Sales Analysis, Association Rule Mining, Ubud Market.

