A Scalable Vertical Model for Mining Association Rules

2004 ◽  
Vol 03 (04) ◽  
pp. 317-329 ◽  
Author(s):  
Imad Rahal ◽  
Dongmei Ren ◽  
William Perrizo

Association rule mining (ARM) is the data-mining process of finding all association rules in datasets that match user-defined measures of interest such as support and confidence. Usually, ARM proceeds by mining all frequent itemsets — a step known to be very computationally intensive — from which rules are then derived in a straightforward manner. In general, mining all frequent itemsets prunes the search space by using the downward closure (or anti-monotonicity) property of support, which states that no itemset can be frequent unless all of its subsets are frequent. A large number of papers have addressed the problem of ARM, but few have focused on scalability over very large datasets (i.e. datasets containing a very large number of transactions). In this paper, we propose a new model for representing data and mining frequent itemsets that is based on the P-tree technology for compression and faster logical operations over vertically structured data, and on set enumeration trees for fast itemset enumeration. Experimental results presented hereinafter show significant improvements for our approach over large datasets when compared to other contemporary approaches in the literature.
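To make the vertical idea concrete, here is a minimal bit-vector sketch (plain Python integers, not the authors' compressed P-trees): each item is stored as a bit vector over the transactions, and the support of an itemset is obtained by AND-ing the vectors of its items and counting the set bits. The toy transactions are hypothetical.

```python
# Minimal vertical (bit-vector) support counting; illustrative only,
# not the P-tree structure described in the paper.
from functools import reduce

transactions = [            # hypothetical toy data
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

items = sorted(set().union(*transactions))

# One bit vector (stored as a Python int) per item: bit i is set if
# transaction i contains the item.
bitvec = {
    item: sum(1 << i for i, t in enumerate(transactions) if item in t)
    for item in items
}

def support(itemset):
    """Relative support of an itemset via bitwise AND of its item vectors."""
    v = reduce(lambda x, y: x & y, (bitvec[i] for i in itemset))
    return bin(v).count("1") / len(transactions)

print(support({"a", "c"}))  # 0.75 on the toy data above
```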

ARM is a significant area of knowledge mining that produces association rules essential for decision making. Frequent itemset mining is challenging on large datasets: as the dataset size increases, so do the burden and the time needed to discover rules. In this paper, ARM algorithms based on tree structures such as the FP-tree and FIN with the POC tree and PPC tree are discussed as ways to reduce overhead and running time. These algorithms use highly efficient data structures for mining frequent itemsets from the database. FIN uses the nodeset, a unique and novel data structure, to extract frequent itemsets and the POC tree to store frequent itemset information. These techniques are extremely helpful in marketing applications. The proposed and implemented techniques show improved performance in terms of time and efficiency.
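As a point of reference for the tree-based algorithms discussed above, the following is a minimal FP-tree construction sketch; it is illustrative only and does not reproduce FIN's nodesets or the POC/PPC tree encodings, and the toy transactions are hypothetical.

```python
# Minimal FP-tree construction: prune infrequent items, order the rest by
# descending frequency, and insert each transaction as a prefix path.
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fp_tree(transactions, min_support_count):
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support_count}

    root = Node(None, None)
    for t in transactions:
        # keep only frequent items, ordered by descending frequency
        ordered = sorted((i for i in t if i in frequent),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, node)
            node = node.children[item]
    return root

tree = build_fp_tree([{"a", "b"}, {"a", "c"}, {"a", "b", "c"}], 2)
```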


2014 ◽  
Vol 685 ◽  
pp. 575-578 ◽
Author(s):  
Guang Jiang Wang ◽  
Shi Guo Jin

Association rule mining is an important data mining method, and finding frequent itemsets is its key step. The process of association rule mining roughly falls into two steps: the first step is to find all frequent itemsets in the dataset; the second step is to derive the association rules from those frequent itemsets. This paper analyzes the information collected from nodes in a wireless sensor network and its management, and presents an application of association rule mining technology to the collection and management of wireless sensor network nodes.
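The second step mentioned above (deriving rules from already mined frequent itemsets) can be sketched as follows; the supports dictionary is hypothetical toy data, and the routine simply checks every antecedent/consequent split of each frequent itemset against a confidence threshold.

```python
# Rule generation from frequent itemsets: for each itemset, try every
# non-empty proper subset as an antecedent and keep rules whose confidence
# (support of the itemset / support of the antecedent) meets min_conf.
from itertools import combinations

# hypothetical output of step 1: itemset -> support
supports = {
    frozenset("a"): 0.6,
    frozenset("b"): 0.7,
    frozenset("ab"): 0.5,
}

def rules_from_frequent(supports, min_conf):
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = sup / supports[antecedent]
                if conf >= min_conf:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules

print(rules_from_frequent(supports, 0.7))  # a -> b and b -> a on the toy data
```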


Kybernetes ◽  
2018 ◽  
Vol 47 (3) ◽  
pp. 441-457 ◽  
Author(s):  
Cheng-Hsiung Weng ◽  
Tony Cheng-Kui Huang

Purpose: Customer lifetime value (CLV) scoring is highly effective when applied to marketing databases. Some researchers have extended the traditional association rule problem by associating a weight with each item in a transaction. However, studies of association rule mining have considered the relative benefits or significance of “items” rather than “transactions” belonging to different customers. Because not all customers are financially attractive to firms, it is crucial that their profitability be determined and that transactions be weighted according to CLV. This study aims to discover association rules from the CLV perspective.
Design/methodology/approach: This study extended the traditional association rule problem by allowing the association of a CLV weight with a transaction to reflect the interest and intensity of customer values. Furthermore, the authors proposed a new algorithm, frequent itemsets of CLV weight (FICLV), to discover frequent itemsets from CLV-weighted transactions.
Findings: Experimental results from the survey data indicate that the proposed FICLV algorithm can discover valuable frequent itemsets. Moreover, the frequent itemsets identified using the FICLV algorithm outperform those discovered through conventional approaches for predicting customer purchasing itemsets in the coming period.
Originality/value: This study is the first to introduce an approach for discovering frequent itemsets from transactions by considering CLV.
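The following is a hedged sketch of the underlying idea of CLV-weighted support (not the FICLV algorithm itself): each transaction carries its customer's CLV weight, and an itemset's weighted support is the total weight of the transactions containing it divided by the total weight of all transactions. All names and numbers are hypothetical.

```python
# CLV-weighted support: transactions from high-value customers count more.
transactions = [          # hypothetical (items, clv_weight) pairs
    ({"a", "b"}, 0.9),
    ({"a"},      0.2),
    ({"b", "c"}, 0.5),
]

def clv_weighted_support(itemset, transactions):
    total = sum(w for _, w in transactions)
    covered = sum(w for items, w in transactions if itemset <= items)
    return covered / total

print(clv_weighted_support({"a", "b"}, transactions))  # 0.9 / 1.6 = 0.5625
```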


2021 ◽  
Vol 48 (4) ◽  
Author(s):  
Hafiz I. Ahmad ◽  
Alex T. H. Sim ◽  
Roliana Ibrahim ◽  
Mohammad Abrar ◽  
...  

Association rule mining (ARM) is used for discovering frequent itemsets that reveal interesting relationships of associative and correlative behaviors within the data. This gives new insights of great value, both commercial and academic. Traditional ARM techniques discover interesting association rules based on a predefined minimum support threshold. However, there is no standard, exact definition of minimum support, and providing an inappropriate minimum support value may result in missing important rules. In addition, most of the rules discovered by these traditional ARM techniques refer to already known knowledge. To address these limitations of the minimum support threshold in ARM techniques, this study proposes an algorithm to mine interesting association rules without minimum support, using predicate logic and a property of a proposed interestingness measure (the g measure). The algorithm scans the database and uses the g measure’s property to search for interesting combinations. The selected combinations are mapped to pseudo-implications, and inference rules of logic are applied to the pseudo-implications to produce and validate the predicate rules. Experimental results of the proposed technique show better performance against state-of-the-art classification techniques, and reliable predicate rules are discovered based on the reliability differences between the presence and absence of the rule’s consequent.


Author(s):  
Maybin Muyeba ◽  
M. Sulaiman Khan ◽  
Frans Coenen

A novel approach is presented for effectively mining weighted fuzzy association rules (ARs). The authors address the issue of invalidation of the downward closure property (DCP) in weighted association rule mining, where each item is assigned a weight according to its significance with respect to some user-defined criteria. Most works on weighted association rule mining do not address the downward closure property, while some make assumptions in order to validate it. This chapter generalizes the weighted association rule mining problem to binary and fuzzy attributes under weighted settings. The methodology follows an Apriori approach but employs a T-tree data structure to improve the efficiency of counting itemsets. The authors’ approach avoids pre- and post-processing, as opposed to most weighted association rule mining algorithms, thus eliminating the extra steps during rule generation. The chapter presents experimental results on both synthetic and real data sets and a discussion on evaluating the proposed approach.
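A small numeric illustration of how item weights can invalidate the downward closure property (the weighting scheme below is a common textbook formulation, not necessarily the authors'): with weighted support defined as raw support times the mean item weight, a superset containing a heavy item can outscore its own subset, so pruning by the subset's score would wrongly discard it.

```python
# Why weighted support breaks anti-monotonicity: the superset ("a", "b")
# scores higher than its subset ("a",) because "b" carries a large weight.
weights = {"a": 0.1, "b": 0.9}             # hypothetical item weights
support = {("a",): 0.6, ("a", "b"): 0.5}   # hypothetical raw supports

def weighted_support(itemset):
    return support[itemset] * sum(weights[i] for i in itemset) / len(itemset)

print(weighted_support(("a",)))      # 0.06
print(weighted_support(("a", "b")))  # 0.25 -> superset outscores its subset
```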


2018 ◽  
Vol 189 ◽  
pp. 10012 ◽  
Author(s):  
Ming Yin ◽  
Wenjie Wang ◽  
Yang Liu ◽  
Dan Jiang

FP-Growth is an association rule mining algorithm based on the frequent pattern tree (FP-Tree), which does not need to generate a large number of candidate sets. However, constructing the FP-Tree requires two scans of the original transaction database, and generating frequent itemsets requires recursive mining of the FP-Tree. In addition, the algorithm cannot work effectively when the dataset is dense. To address this algorithm's large memory usage and low time-effectiveness, this paper proposes an improved algorithm based on an adjacency table, using a hash table to store the adjacency table, which considerably reduces lookup time. The experimental results show that the improved algorithm performs well, especially for mining frequent itemsets in dense data sets.
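One plausible reading of the adjacency-table idea, sketched below under that assumption (the paper's exact structure is not reproduced): keep, for every item, a hash table of co-occurring items and their counts, so that pair supports become a hash lookup rather than a tree traversal. The data is hypothetical.

```python
# Item-to-item adjacency counts stored in nested hash tables (dicts).
from collections import defaultdict
from itertools import combinations

def build_adjacency(transactions):
    adjacency = defaultdict(lambda: defaultdict(int))
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            adjacency[a][b] += 1   # number of transactions containing both a and b
    return adjacency

adj = build_adjacency([{"a", "b", "c"}, {"a", "c"}, {"b", "c"}])
print(adj["a"]["c"])   # 2 transactions contain both a and c
```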


A huge amount of data is generated every minute on the internet. This data is of no use unless we can extract useful information from it. Data mining is the process of extracting useful information or knowledge from this huge amount of data, which can then be used for various purposes. Discovering association rules is one of the most important data mining tasks. Association rules take the IF-THEN form: the left-hand part (IF) is called the antecedent, which defines the condition, and the right-hand part (THEN) is called the consequent, which defines the result. In this paper, we present an overview and comparison of the Apriori, Apriori PT and Frequent Itemsets algorithms of the association component in the Tanagra Tool. We analyze performance based on execution time and memory used for different numbers of instances, support values and rule lengths on the Spambase dataset. The results show that when the support value is increased, Apriori PT takes less execution time and Apriori takes less memory space. When the number of instances is reduced, Frequent Itemsets performs well in both memory and execution time. When the rule length is increased, the Apriori algorithm performs better than Apriori PT and Frequent Itemsets.


Author(s):  
Hong Shen

The discovery of association rules showing conditions of data co-occurrence has attracted the most attention in data mining. An example of an association rule is the rule “the customer who bought bread and butter also bought milk,” expressed by T(bread, butter) ⇒ T(milk). Let I = {x1, x2, …, xm} be a set of (data) items, called the domain; let D be a collection of records (transactions), where each record, T, has a unique identifier and contains a subset of items in I. We define an itemset to be a set of items drawn from I and refer to an itemset containing k items as a k-itemset. The support of itemset X, denoted by s(X/D), is the ratio of the number of records (in D) containing X to the total number of records in D. An association rule is an implication rule X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The confidence of X ⇒ Y is the ratio of s(X ∪ Y/D) to s(X/D), indicating the percentage of records containing X that also contain Y. Based on the user-specified minimum support (minsup) and minimum confidence (minconf), the following statements hold: an itemset X is frequent if s(X/D) ≥ minsup, and an association rule X ⇒ Y is strong if X ∪ Y is frequent and s(X ∪ Y/D)/s(X/D) ≥ minconf. The problem of mining association rules is to find all strong association rules, which can be divided into two subproblems: 1. Find all the frequent itemsets. 2. Generate all strong rules from all frequent itemsets. Because the second subproblem is relatively straightforward (we can solve it by extracting every subset from an itemset and examining the ratio of its support), most of the previous studies (Agrawal, Imielinski, & Swami, 1993; Agrawal, Mannila, Srikant, Toivonen, & Verkamo, 1996; Park, Chen, & Yu, 1995; Savasere, Omiecinski, & Navathe, 1995) emphasized developing efficient algorithms for the first subproblem. This article introduces two important techniques for association rule mining: (a) finding the N most frequent itemsets and (b) mining multiple-level association rules.
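A short worked example of the definitions above, on hypothetical toy data: s(X/D) is the fraction of records containing X, and the confidence of X ⇒ Y is s(X ∪ Y/D)/s(X/D).

```python
# Support and confidence computed directly from the definitions above.
D = [{"bread", "butter", "milk"},
     {"bread", "butter"},
     {"bread", "milk"},
     {"butter", "milk"}]

def s(X):
    """Support s(X/D): fraction of records in D that contain X."""
    return sum(1 for T in D if X <= T) / len(D)

X, Y = {"bread", "butter"}, {"milk"}
print(s(X))              # 0.5
print(s(X | Y) / s(X))   # confidence of {bread, butter} => {milk}: 0.5
```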


2011 ◽  
Vol 1 (2) ◽  
Author(s):  
Venkatapathy Umarani ◽  
Muthusamy Punithavalli

The discovery of association rules is an important and challenging data mining task. Most of the existing algorithms for finding association rules require multiple passes over the entire database, and the I/O overhead incurred is extremely high for very large databases. An obvious approach to reducing the complexity of association rule mining is sampling. In recent times, several sampling-based approaches have been developed for speeding up the process of association rule mining. A proficient progressive sampling-based approach is presented for mining association rules from large databases. At first, frequent itemsets are mined from an initial sample and, subsequently, the negative border is computed from the mined frequent itemsets. Based on the support computed for the midpoint itemset in the sorted negative border, the sample size is either increased or association rules are mined from the sample. In this paper, we present an extensive analysis of the progressive sampling-based approach on different real-life datasets and, in addition, evaluate its performance against the well-known association rule mining algorithm, Apriori. The experimental results show that the accuracy and computation time of the progressive sampling-based approach are effectively improved when mining association rules from real-life datasets.
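A hedged control-flow sketch of the progressive sampling loop described above: the helper functions are brute-force stand-ins for illustration, and the stopping rule based on the negative-border midpoint is one plausible reading of the abstract, not the paper's actual implementation. All data and parameters are hypothetical.

```python
# Progressive sampling: mine a sample, compute its negative border, and use
# the midpoint itemset's support to decide whether to enlarge the sample.
from itertools import combinations
from random import sample as draw

def frequent_itemsets(data, minsup, max_len=3):
    """Brute-force: all itemsets up to max_len with support >= minsup."""
    items = sorted(set().union(*data))
    freq = {}
    for k in range(1, max_len + 1):
        for c in map(frozenset, combinations(items, k)):
            sup = sum(1 for t in data if c <= t) / len(data)
            if sup >= minsup:
                freq[c] = sup
    return freq

def negative_border(data, freq, max_len=3):
    """Infrequent itemsets whose proper subsets are all frequent, sorted by support."""
    items = sorted(set().union(*data))
    border = []
    for k in range(1, max_len + 1):
        for c in map(frozenset, combinations(items, k)):
            if c in freq:
                continue
            if all(frozenset(s) in freq for s in combinations(c, k - 1) if s):
                sup = sum(1 for t in data if c <= t) / len(data)
                border.append((c, sup))
    return sorted(border, key=lambda b: b[1])

def progressive_mine(database, minsup, initial_size, grow_factor=2):
    n = initial_size
    while True:
        smp = draw(database, min(n, len(database)))
        freq = frequent_itemsets(smp, minsup)
        border = negative_border(smp, freq)
        if not border or n >= len(database):
            return freq
        mid_itemset, _ = border[len(border) // 2]
        mid_sup = sum(1 for t in database if mid_itemset <= t) / len(database)
        if mid_sup >= minsup:
            n *= grow_factor   # the sample missed a frequent itemset: enlarge it
        else:
            return freq        # sample looks representative: mine rules from freq

db = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}] * 25
print(len(progressive_mine(db, 0.4, initial_size=10)))
```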

