An Enhanced Approach to Mine Maximal Frequent Itemset using Maximal Frequent Itemset Prima Algorithm (MFIPA)

Finding frequent itemsets is one of the most essential topics in data mining. Data mining helps decision makers identify the most useful knowledge. Frequent itemset generation is the prerequisite for association rule mining and its most time-consuming step. In this paper we propose a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method for finding frequent itemsets without the need to generate candidate itemsets. The approach can be implemented using a horizontal representation of the transaction dataset and by allocating prime values. It explores all the frequent itemsets present in the input and identifies the maximal frequent itemset according to the support. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm attains a major time improvement over both algorithms.
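The abstract does not say how the prime values are allocated. One plausible reading, assumed here purely for illustration, is that each distinct item receives its own prime and each transaction in the horizontal layout is stored as the product of its items' primes, so that itemset containment becomes a divisibility test. The function names `first_primes`, `encode` and `support_count` are hypothetical and are not taken from the paper.

```python
from math import prod

def first_primes(n):
    """Return the first n primes by trial division (fine for small catalogs)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def encode(transactions):
    """Assign one prime per item and encode each transaction as a product."""
    items = sorted({item for t in transactions for item in t})
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = [prod(prime_of[i] for i in set(t)) for t in transactions]
    return prime_of, codes

def support_count(itemset, prime_of, codes):
    """Count transactions whose code is divisible by the itemset's code."""
    key = prod(prime_of[i] for i in set(itemset))
    return sum(1 for c in codes if c % key == 0)

# Toy data (assumed): three items, four transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
]
prime_of, codes = encode(transactions)
pair_count = support_count({"bread", "butter"}, prime_of, codes)
```

Python integers are unbounded, so the products cannot overflow, but they grow quickly with transaction length; a practical implementation would need to account for that.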

Existing data mining algorithms have mostly been implemented for centralized environments, but large-scale databases exist in distributed form. The distributed data mining algorithm FDM and its improved variants suffer from two problems: frequent itemsets are lost, and the network communication cost is too high. This paper proposes an association rule mining algorithm based on distributed data (ARADD). ARADD includes a mapping mark array mechanism, which not only preserves the integrity of the frequent itemsets but also reduces the cost of network communication. Experiments demonstrate the efficiency of the algorithm.

Due to the popularity of knowledge discovery and data mining, in practice as well as among academic and corporate professionals, association rule mining is receiving increasing attention. Data mining technology is applied to analyze data in databases. This paper puts forward a new method suited to the design of distributed databases.

Materials are the substances that humans use to manufacture machines, components, devices and other products. Association rules originated in the field of data mining, where they are used to find associations between itemsets in large amounts of data. Apriori is a breadth-first algorithm that obtains the frequent itemsets, those whose support is greater than the minimum support, by repeatedly scanning the database. This paper presents the construction of a materials science information model based on association rule mining. Experiments on data sets show that the proposed algorithm is effective and reasonable.
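The breadth-first, scan-per-level behaviour of Apriori can be sketched as follows. This is a minimal textbook-style version, not the model built in the paper; the function name `apriori`, the use of an absolute count threshold `min_count` with a "greater than or equal" test, and the toy transactions are all assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items with one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into k-item candidates.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # One more database scan counts the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

# Toy data (assumed): five transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(toy, min_count=3)
```

On this toy input every single item and every pair reaches the threshold, while the triple {a, b, c} appears only twice and is dropped at level 3.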

Abstract: For decades, retail chains and department stores have been selling their products without using the transactional data generated by their sales as a source of knowledge. Abundant data availability, the need for information (or knowledge) to support decision making and create business solutions, and infrastructure support in the field of information technology together gave birth to data mining technology. Data mining discovers interesting patterns in databases, such as association rules, correlations, sequences, classifiers and many more, and association rule mining is one of the most popular of these problems. Association rule mining is a data mining method used to extract useful patterns between data items. In this research, the Apriori algorithm was applied to find frequent itemsets in association rule mining. The data were processed with the Tanagra tool. The dataset used was the Supermarket dataset, consisting of 12 attributes and 108,131 transactions. The experiment yielded an association rule combining the items beer wine spirit-frozen foods and snack foods as a frequent itemset, with a support value of 15.489% and a confidence value of 83.719%. The lift ratio obtained was 2.47766; a value greater than 1 means that the association rule is beneficial.
Keywords: Apriori, Association Rule Mining.
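Support, confidence and lift can be computed on a toy basket list as sketched below. The sketch assumes the standard definitions (confidence is the support of both sides divided by the support of the antecedent, and lift is the confidence divided by the support of the consequent); whether Tanagra computes lift in exactly this way is an assumption. The baskets and the names `support` and `rule_metrics` are illustrative and are not taken from the Supermarket dataset.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) of the rule antecedent => consequent.

    Assumes both sides occur at least once, otherwise a division by zero occurs.
    """
    both = support(set(antecedent) | set(consequent), transactions)
    conf = both / support(antecedent, transactions)
    lift = conf / support(consequent, transactions)
    return both, conf, lift

# Toy baskets (assumed data, eight transactions).
baskets = [
    {"beer", "snacks"},
    {"beer", "snacks", "frozen"},
    {"beer", "frozen"},
    {"snacks"},
    {"frozen"},
    {"beer", "snacks"},
    {"milk"},
    {"milk", "frozen"},
]
sup, conf, lift = rule_metrics({"beer"}, {"snacks"}, baskets)
```

Under the same standard definition, the reported confidence of 83.719% and lift of 2.47766 would correspond to a consequent support of roughly 0.83719 / 2.47766 ≈ 33.8%; this figure is derived here and is not stated in the abstract.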

Association rule mining is one of the most important and well-researched techniques of data mining. Its key procedure is finding frequent itemsets, and the frequent itemsets are easily obtained from the maximum frequent itemsets, so finding maximum frequent itemsets is one of the most important strategies in association rule mining. This paper introduces algorithms for mining maximum frequent itemsets based on a compression matrix. The approach obtains all maximum frequent itemsets simply by removing a set of rows and columns from the transaction matrix, which is easily programmed as a recursive algorithm. The new algorithm optimizes the known matrix-based association rule mining algorithms proposed by researchers in recent years, greatly reducing time and space complexity and substantially improving the efficiency of association rule mining.
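The abstract does not state which rows and columns are removed. A minimal sketch of one commonly used pair of pruning rules, assumed here and not confirmed by the paper, works on a 0/1 transaction matrix (rows are transactions, columns are items): drop a column when its item occurs in fewer than `min_count` transactions, and drop a row when it contains fewer than `k` items and so cannot support any k-itemset. The function name `compress` and the toy matrix are hypothetical.

```python
def compress(matrix, items, min_count, k):
    """Repeatedly drop infrequent item columns and rows with fewer than k ones."""
    rows = [list(r) for r in matrix]
    items = list(items)
    changed = True
    while changed and rows and items:
        changed = False
        # Column rule: an infrequent item cannot belong to a frequent itemset.
        keep = [j for j in range(len(items))
                if sum(r[j] for r in rows) >= min_count]
        if len(keep) < len(items):
            items = [items[j] for j in keep]
            rows = [[r[j] for j in keep] for r in rows]
            changed = True
        # Row rule: a transaction with fewer than k items supports no k-itemset.
        kept_rows = [r for r in rows if sum(r) >= k]
        if len(kept_rows) < len(rows):
            rows = kept_rows
            changed = True
    return rows, items

# Toy matrix (assumed): five transactions over items a, b, c, d.
items = ["a", "b", "c", "d"]
matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
small_rows, small_items = compress(matrix, items, min_count=2, k=2)
```

Removing the two short rows lowers the count of item d below the threshold, which is why the two rules are applied repeatedly until nothing changes.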

The level of competition and the complexity of sales problems in the retail sector require every retail company to be able to compete with other companies. One way to do so is to make more accurate and effective sales-related decisions. Useful information can be extracted from the large volume of a retail company's sales transaction data. One method for uncovering this information is association rule mining. Association rule mining is a data mining method that focuses on transaction patterns by extracting associations or relationships between events. Shopping baskets at computerized retail companies are the best means of providing scientifically grounded decision recommendations, by determining the relationships between items purchased together in each transaction. The FP-growth algorithm is used to determine the sets of items that occur most often (frequent itemsets) in a collection of data. With a minimum support of 0.1%, this study produced 116457 rules at a minimum confidence of 60%, 84086 rules at a minimum confidence of 70%, and 48623 rules at a minimum confidence of 80%, from 22191 processed records. These rules can be used for product marketing strategy. At a minimum support of 0.1%, the higher the minimum confidence, the fewer rules are produced.

A huge amount of data is generated every minute on the internet. This data is of no use until we can extract useful information from it. Data mining is the process of extracting useful information or knowledge from this huge amount of data so that it can be used for various purposes. Discovering association rules is one of the most important data mining tasks. Association rules are expressed in IF-THEN form. The left part of the rule, the IF part, is called the antecedent and defines the condition; the right part, the THEN part, is called the consequent and defines the result. In this paper, we present an overview and comparison of the Apriori, Apriori PT and Frequent Itemsets algorithms of the association component in the Tanagra tool. We analyzed performance in terms of execution time and memory used for different numbers of instances, support values and rule lengths on the Spambase dataset. The results show that when the support value is increased, Apriori PT takes the least execution time and Apriori takes the least memory. When the number of instances is reduced, Frequent Itemsets performs best in both memory and execution time. When the rule length is increased, Apriori performs better than Apriori PT and Frequent Itemsets.

The discovery of association rules showing conditions of data co-occurrence has attracted the most attention in data mining. An example of an association rule is the rule "the customer who bought bread and butter also bought milk," expressed by $T(\text{bread}, \text{butter}) \Rightarrow T(\text{milk})$. Let $I = \{x_1, x_2, \ldots, x_m\}$ be a set of (data) items, called the domain; let $D$ be a collection of records (transactions), where each record, $T$, has a unique identifier and contains a subset of items in $I$. We define an itemset to be a set of items drawn from $I$ and call an itemset containing $k$ items a $k$-itemset. The support of itemset $X$, denoted by $\sigma(X/D)$, is the ratio of the number of records (in $D$) containing $X$ to the total number of records in $D$. An association rule is an implication rule $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The confidence of $X \Rightarrow Y$ is the ratio of $\sigma(X \cup Y/D)$ to $\sigma(X/D)$, indicating the percentage of records containing $X$ that also contain $Y$. Based on the user-specified minimum support (minsup) and confidence (minconf), the following statements are true: an itemset $X$ is frequent if $\sigma(X/D) > \text{minsup}$, and an association rule $X \Rightarrow Y$ is strong if $X \cup Y$ is frequent and $\sigma(X \cup Y/D) / \sigma(X/D) \ge \text{minconf}$. The problem of mining association rules is to find all strong association rules, which can be divided into two subproblems: 1. Find all the frequent itemsets. 2. Generate all strong rules from all frequent itemsets. Because the second subproblem is relatively straightforward (we can solve it by extracting every subset from an itemset and examining the ratio of its support), most of the previous studies (Agrawal, Imielinski, & Swami, 1993; Agrawal, Mannila, Srikant, Toivonen, & Verkamo, 1996; Park, Chen, & Yu, 1995; Savasere, Omiecinski, & Navathe, 1995) emphasized developing efficient algorithms for the first subproblem. This article introduces two important techniques for association rule mining: (a) finding N most frequent itemsets and (b) mining multiple-level association rules.
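The second subproblem, generating strong rules from the frequent itemsets, can be sketched as follows. This is a minimal illustration and not code from the cited studies; the function name `strong_rules` and the toy counts are assumptions. It takes support counts rather than ratios (the confidence is the same either way, since the number of records cancels) and relies on every subset of a frequent itemset also being present in the input map.

```python
from itertools import combinations

def strong_rules(count, minconf):
    """Return (X, Y, confidence) for every rule X => Y with confidence >= minconf.

    `count` maps each frequent itemset (a frozenset) to its support count and
    must contain every non-empty subset of each itemset it lists.
    """
    rules = []
    for itemset, n_both in count.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset X as the antecedent.
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                conf = n_both / count[x]
                if conf >= minconf:
                    rules.append((x, itemset - x, conf))
    return rules

# Toy counts (assumed): bread in 4 records, butter in 3, both together in 3.
count = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}
rules = strong_rules(count, minconf=0.8)
```

Here bread => butter has confidence 3/4 and is rejected, while butter => bread has confidence 3/3 and is kept.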

Association rule mining is one of the most important and well-researched techniques of data mining. Its key procedure is finding frequent itemsets. In this paper, a new matrix-based algorithm for mining frequent itemsets is introduced. Frequent itemsets are obtained by efficiently compressing the transaction matrix with a new strategy. The new algorithm optimizes the known matrix-based frequent itemset mining algorithms proposed by researchers in recent years, greatly reducing time and space complexity. It is especially practical when the frequent itemsets are of high degree.

