Closed-Itemset Incremental-Mining Problem

Author(s):  
Luminita Dumitriu

Association rules, introduced by Agrawal, Imielinski and Swami (1993), provide useful means to discover associations in data. The problem of mining association rules in a database is defined as finding all the association rules that hold with more than a user-given minimum support threshold and a user-given minimum confidence threshold. According to Agrawal, Imielinski and Swami, this problem is solved in two steps: 1. Find all frequent itemsets in the database. 2. For each frequent itemset I, generate all the association rules I′ ⇒ I \ I′, where I′ ⊂ I.
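
As a small illustration of step 2, the sketch below enumerates every rule I′ ⇒ I \ I′ for a single frequent itemset and keeps those whose confidence supp(I)/supp(I′) meets the threshold. This is only a minimal Python sketch; the item names and support values are hypothetical, not taken from the article.

    from itertools import combinations

    def rules_from_itemset(itemset, support, min_conf):
        """Generate rules I' => I \\ I' from one frequent itemset I.
        `support` maps frozensets to already-mined support values
        (the values used below are illustrative only)."""
        I = frozenset(itemset)
        rules = []
        for r in range(1, len(I)):                 # every non-empty proper subset I'
            for antecedent in combinations(I, r):
                A = frozenset(antecedent)
                conf = support[I] / support[A]     # conf(I' => I \ I') = supp(I) / supp(I')
                if conf >= min_conf:
                    rules.append((set(A), set(I - A), conf))
        return rules

    # Hypothetical supports for a toy two-item example.
    support = {frozenset({'a'}): 0.6, frozenset({'b'}): 0.5,
               frozenset({'a', 'b'}): 0.4}
    print(rules_from_itemset({'a', 'b'}, support, min_conf=0.7))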

Author(s):  
Weigang Huo ◽  
Xingjie Feng ◽  
Zhiyuan Zhang

Keeping the generated fuzzy frequent itemsets up to date and discovering new fuzzy frequent itemsets are challenging problems in dynamic databases. In this paper, the classical H-struct structure is extended to mining fuzzy frequent itemsets. The extended H-mine algorithm can use any t-norm operator to calculate the support of a fuzzy itemset. Two FP-tree-based structures, the Initial-FP-tree and the New-FP-tree, are built to maintain the fuzzy frequent itemsets in the original database and in the newly inserted transactions, respectively. Incremental mining of fuzzy frequent itemsets is achieved by breadth-first traversal of the Initial-FP-tree and the New-FP-tree. All fuzzy frequent itemsets in the updated database can be obtained by traversing the Initial-FP-tree. Experiments on real datasets show that the proposed approach runs faster than the batch extended H-mine algorithm. Compared with the existing algorithm for incremental mining of fuzzy frequent itemsets, the proposed approach is superior in terms of execution time, and its memory cost is lower when the minimum support threshold is low.
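
The claim that any t-norm can be plugged in to score a fuzzy itemset can be illustrated with the small sketch below. It is not the extended H-mine algorithm itself; the membership values and the choice of the minimum and product t-norms are assumptions made for the example.

    from functools import reduce

    def fuzzy_support(transactions, itemset, tnorm=min):
        """Support of a fuzzy itemset: for each transaction, combine the
        items' membership degrees with a t-norm, then average over the
        database. `transactions` is a list of dicts item -> degree in [0, 1]."""
        total = 0.0
        for t in transactions:
            degrees = [t.get(item, 0.0) for item in itemset]
            total += reduce(tnorm, degrees)        # t-norm combines the degrees
        return total / len(transactions)

    # Toy fuzzy transactions (illustrative membership values only).
    db = [{'milk.high': 0.8, 'bread.high': 0.6},
          {'milk.high': 0.4, 'bread.high': 0.9},
          {'milk.high': 0.0, 'bread.high': 0.7}]

    print(fuzzy_support(db, ['milk.high', 'bread.high'], tnorm=min))                 # minimum t-norm
    print(fuzzy_support(db, ['milk.high', 'bread.high'], tnorm=lambda a, b: a * b))  # product t-norm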


This paper explores the concept and limitations of frequent itemset mining (FIM), whose purpose is to extract previously unknown hidden patterns, in the form of itemsets, from a transactional database. Since candidate generation and support counting are the major tasks in FIM, its main limitations are: (i) a huge number of potentially frequent itemsets is generated as candidates at each pass; (ii) the database is scanned at each pass to compute the support of the generated itemsets; and (iii) the generated itemsets are highly sensitive to the minimum support threshold. SS-FIM, a single-scan algorithm, addresses these limitations, but it hashes many unnecessary itemsets into its buckets. To overcome this, a partition-based approach is proposed in this paper. The proposed approach, PSSFIM, requires a single scan of the database to identify frequent itemsets. A unique feature of PSSFIM is that the number of candidate itemsets generated is independent of the minimum support; only candidates that can possibly be frequent are hashed, which reduces the cost of verifying the support of the generated candidates. PSSFIM is compared with SS-FIM and Apriori on standard datasets, and the results show that it outperforms both.
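
One reason a partition-based pass can bound the candidate pool is the classical partition property: an itemset that is frequent in the whole database must be locally frequent in at least one partition. The sketch below is not the PSSFIM algorithm, only a minimal illustration of that property; the data, the support ratio, and the cap on itemset size are assumptions for the example.

    from itertools import combinations
    from collections import Counter

    def local_candidates(partition, min_sup_ratio, max_size=2):
        """Itemsets locally frequent in one partition (candidate pool only)."""
        counts = Counter()
        for t in partition:
            for k in range(1, max_size + 1):
                for s in combinations(sorted(t), k):
                    counts[s] += 1
        threshold = min_sup_ratio * len(partition)
        return {s for s, c in counts.items() if c >= threshold}

    def partition_candidates(partitions, min_sup_ratio):
        """Union of locally frequent itemsets: every globally frequent
        itemset must appear here, so only these need a global support check."""
        cands = set()
        for p in partitions:
            cands |= local_candidates(p, min_sup_ratio)
        return cands

    # Toy database split into two partitions (illustrative transactions).
    p1 = [{'a', 'b'}, {'a', 'c'}, {'a', 'b', 'c'}]
    p2 = [{'b', 'c'}, {'a', 'b'}]
    print(partition_candidates([p1, p2], min_sup_ratio=0.5))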


2021 ◽  
Vol 48 (4) ◽  
Author(s):  
Hafiz I. Ahmad ◽  
◽  
Alex T. H. Sim ◽  
Roliana Ibrahim ◽  
Mohammad Abrar ◽  
...  

Association rule mining (ARM) is used for discovering frequent itemsets that reveal interesting associative and correlative relationships within data. This gives new insights of great value, both commercial and academic. Traditional ARM techniques discover interesting association rules based on a predefined minimum support threshold. However, there is no standard way to define an exact minimum support, and providing an inappropriate minimum support value may result in missing important rules. In addition, most of the rules discovered by these traditional ARM techniques refer to already known knowledge. To address these limitations of the minimum support threshold in ARM techniques, this study proposes an algorithm to mine interesting association rules without minimum support, using predicate logic and a property of a proposed interestingness measure (the g measure). The algorithm scans the database and uses the g measure's property to search for interesting combinations. The selected combinations are mapped to pseudo-implications, and inference rules of logic are applied to the pseudo-implications to produce and validate the predicate rules. Experimental results of the proposed technique show better performance than state-of-the-art classification techniques, and reliable predicate rules are discovered based on the difference in reliability between the presence and absence of the rule's consequence.


2008 ◽  
pp. 3222-3234
Author(s):  
Yun Sing Koh ◽  
Nathan Rountree ◽  
Richard O’Keefe

Discovering association rules efficiently is an important data mining problem. We define sporadic rules as those with low support but high confidence; for example, a rare association of two symptoms indicating a rare disease. To find such rules using the well-known Apriori algorithm, minimum support has to be set very low, producing a large number of trivial frequent itemsets. To alleviate this problem, we propose a new method of discovering sporadic rules without having to produce all other rules above the minimum support threshold. The new method, called Apriori-Inverse, is a variation of the Apriori algorithm that uses the notion of maximum support instead of minimum support to generate candidate itemsets. Candidate itemsets of interest to us fall below a maximum support value but above a minimum absolute support value. Rules above maximum support are considered frequent rules, which are of no interest to us, whereas rules that occur by chance fall below the minimum absolute support value. We define two classes of sporadic rule: perfectly sporadic rules (those that consist only of items falling below maximum support) and imperfectly sporadic rules (those that may contain items over the maximum support threshold). This article is an expanded version of Koh and Rountree (2005).
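
A minimal sketch of the inverse idea follows; it is not Koh and Rountree's full algorithm and covers only the perfectly sporadic case, where every item must fall below the maximum support. Items above the maximum support never enter candidate generation, while a minimum absolute support count keeps out chance co-occurrences. The data and thresholds are illustrative assumptions.

    from collections import Counter
    from itertools import combinations

    def sporadic_seed_items(transactions, max_sup, min_abs_sup):
        """Items below the maximum support ratio but at or above the
        minimum absolute support count."""
        n = len(transactions)
        counts = Counter(item for t in transactions for item in t)
        return {i for i, c in counts.items()
                if c >= min_abs_sup and c / n < max_sup}

    def candidate_pairs(transactions, max_sup, min_abs_sup):
        """First level of Apriori-Inverse-style growth: pairs built only
        from sporadic seed items, kept if they clear the absolute floor."""
        seeds = sporadic_seed_items(transactions, max_sup, min_abs_sup)
        counts = Counter()
        for t in transactions:
            for pair in combinations(sorted(set(t) & seeds), 2):
                counts[pair] += 1
        return {p: c for p, c in counts.items() if c >= min_abs_sup}

    # Toy data: 'x' is too frequent, so it never appears in any candidate.
    db = [{'x', 'a', 'b'}, {'x', 'a', 'b'}, {'x', 'c'}, {'x', 'a'},
          {'x', 'b'}, {'x', 'c'}, {'x'}, {'x'}, {'x'}, {'x', 'a', 'b'}]
    print(candidate_pairs(db, max_sup=0.5, min_abs_sup=2))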


Author(s):  
Fatimah Audah Md. Zaki ◽  
Nurul Fariza Zulkurnain

Mining frequent itemsets from a large dataset has a major drawback: the explosive number of itemsets requires an additional mining process to filter out the interesting ones. As a solution, the concept of the closed frequent itemset was introduced, a lossless and condensed representation of all frequent itemsets and their corresponding supports. Unfortunately, many algorithms are not memory-efficient, since they require storing closed itemsets in main memory for duplication checks. This paper presents BFF, a scalable algorithm for discovering closed frequent itemsets from high-dimensional data. Unlike many well-known algorithms, BFF traverses the search tree in a breadth-first manner, resulting in minimal memory use and shorter running time. Tests conducted on a number of microarray datasets show that the performance of this algorithm improves significantly as the support threshold decreases, which is crucial for generating more interesting rules.
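
A closed frequent itemset is one with no proper superset of identical support. The brute-force filter below is not the BFF algorithm; it only spells out the definition that any duplication check must enforce, using toy support values.

    def closed_itemsets(frequent):
        """Keep itemsets that have no proper superset with equal support.
        `frequent` maps frozensets to supports (toy numbers below)."""
        closed = {}
        for itemset, sup in frequent.items():
            if not any(itemset < other and sup == other_sup
                       for other, other_sup in frequent.items()):
                closed[itemset] = sup
        return closed

    frequent = {frozenset({'a'}): 5,
                frozenset({'b'}): 4,
                frozenset({'a', 'b'}): 4}   # {'b'} is not closed: {'a','b'} has equal support
    print(closed_itemsets(frequent))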


2013 ◽  
Vol 411-414 ◽  
pp. 386-389 ◽  
Author(s):  
Tian Tian Xu ◽  
Xiang Jun Dong

Negative frequent itemsets (NFIS), such as (a1 a2 ¬a3 a4), play an important role in real applications because valuable negative association rules can be mined from them. In one of our previous works, we proposed a method, named e-NFIS, to mine NFIS from positive frequent itemsets (PFIS). However, e-NFIS uses only a single minimum support, which implicitly assumes that all items in the database are of the same nature or of similar frequencies. This is often not the case in real-life applications, so many methods have been proposed to mine frequent itemsets with multiple minimum supports; these methods allow users to assign different minimum supports to different items. However, they mine only PFIS and do not consider negative ones. In this paper, we therefore propose a new method, named e-msNFIS, to mine NFIS from PFIS based on multiple minimum supports. e-msNFIS contains three steps: 1) using existing methods to mine PFIS with multiple minimum supports; 2) using the same method as in e-NFIS to generate negative candidate itemsets (NCIS) from the PFIS obtained in step 1; 3) calculating the support of these NCIS using only the support of the PFIS, and then obtaining the NFIS. Experimental results show that e-msNFIS is efficient.
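
Step 3, deriving the support of a candidate with negated items purely from positive supports, can be illustrated with inclusion-exclusion; this is a generic identity, not necessarily the paper's exact formula, and the support values below are made up.

    from itertools import combinations

    def negative_support(pos_items, neg_items, supp):
        """Support of an itemset with negated items, computed only from
        positive supports by inclusion-exclusion:
            supp(X, not-Y) = sum over S subset of Y of (-1)^|S| * supp(X union S).
        `supp` maps frozensets of positive items to supports."""
        X, Y = frozenset(pos_items), list(neg_items)
        total = 0.0
        for k in range(len(Y) + 1):
            for S in combinations(Y, k):
                total += (-1) ** k * supp[X | frozenset(S)]
        return total

    # Illustrative positive supports for supp(a1 a2 ¬a3).
    supp = {frozenset({'a1', 'a2'}): 0.30,
            frozenset({'a1', 'a2', 'a3'}): 0.12}
    print(negative_support({'a1', 'a2'}, {'a3'}, supp))   # 0.30 - 0.12 = 0.18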


2013 ◽  
Vol 321-324 ◽  
pp. 2578-2582
Author(s):  
Qian Zhang

This paper examined the application of the Apriori algorithm to extracting association rules in data mining, using sample data on student enrollments. It studied data mining techniques for the extraction of association rules, analyzed the correlation between specialties and the characteristics of admitted students, and evaluated the algorithm for mining association rules with a minimum support of 30% and a minimum confidence of 40%.


2021 ◽  
Vol 14 (2) ◽  
pp. 125
Author(s):  
Ainul Mardiaha ◽  
Yulia Yulia

This research was carried out to simplify and assist the Candra Motor workshop owner in managing data and records of motorcycle spare-part sales by applying the Apriori data mining algorithm. Data mining uses particular techniques or methods to look for patterns in selected data. One year of sales data covering 15 items was selected and processed with the Apriori algorithm. The Apriori algorithm mines association rules to determine the associative relationships of item combinations. It determines the frequent 1-itemsets, 2-itemsets, and 3-itemsets, from which the association rules can be obtained from the previously selected data. To obtain the frequent itemsets, each selected item combination must meet the minimum support and minimum confidence requirements. This study used a minimum support of at least 7 transactions (0.583) and a minimum confidence of 90%. Several association rules were obtained, and the manual calculation of the association rules and the calculation using the WEKA software produced the same results. By fulfilling the minimum support and minimum confidence requirements, the best-selling spare parts were found to be inner tubes, Yamaha oil, and MPX oil.
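
The level-wise counting of frequent 1-, 2- and 3-itemsets described above can be sketched as follows. This brute-force version skips Apriori's candidate pruning from the previous level for brevity, and the transactions, item names and the minimum support count are illustrative, not the workshop's actual data or thresholds.

    from collections import Counter
    from itertools import combinations

    def frequent_k_itemsets(transactions, k, min_sup_count):
        """Count all k-item combinations per transaction and keep those
        meeting the minimum support count (level-wise, as in Apriori)."""
        counts = Counter()
        for t in transactions:
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
        return {c: n for c, n in counts.items() if n >= min_sup_count}

    # Toy sales transactions with a scaled-down support count of 2.
    sales = [{'inner tube', 'Yamaha oil'}, {'inner tube', 'MPX oil'},
             {'inner tube', 'Yamaha oil', 'MPX oil'}, {'Yamaha oil'},
             {'inner tube', 'Yamaha oil'}]

    for k in (1, 2, 3):
        print(k, frequent_k_itemsets(sales, k, min_sup_count=2))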

