scholarly journals TKFIM: Top-K frequent itemset mining technique based on equivalence classes

2021 ◽  
Vol 7 ◽  
pp. e385
Author(s):  
Saood Iqbal ◽  
Abdul Shahid ◽  
Muhammad Roman ◽  
Zahid Khan ◽  
Shaha Al-Otaibi ◽  
...  

Frequently used items mining is a significant subject of data mining studies. In the last ten years, due to innovative development, the quantity of data has grown exponentially. For frequent Itemset (FIs) mining applications, it imposes new challenges. Misconceived information may be found in recent algorithms, including both threshold and size based algorithms. Threshold value plays a central role in generating frequent itemsets from the given dataset. Selecting a support threshold value is very complicated for those unaware of the dataset’s characteristics. The performance of algorithms for finding FIs without the support threshold is, however, deficient due to heavy computation. Therefore, we have proposed a method to discover FIs without the support threshold, called Top-k frequent itemsets mining (TKFIM). It uses class equivalence and set-theory concepts for mining FIs. The proposed procedure does not miss any FIs; thus, accurate frequent patterns are mined. Furthermore, the results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). It is found that the proposed TKFIM has outperformed the results of these approaches in terms of execution and performance, achieving 92.70, 35.87, 28.53, and 81.27 percent gain on Top-k miner using Chess, Mushroom, and Connect and T1014D100K datasets, respectively. Similarly, it has achieved a performance gain of 97.14, 100, 78.10, 99.70 percent on BOMO using Chess, Mushroom, Connect, and T1014D100K datasets, respectively. Therefore, it is argued that the proposed procedure may be adopted on a large dataset for better performance.

Author(s):  
Eka Karyawati ◽  
Edi Winarko

Frequent patterns (itemsets) discovery is an important problem in associative classification rule mining.  Differents approaches have been proposed such as the Apriori-like, Frequent Pattern (FP)-growth, and Transaction Data Location (Tid)-list Intersection algorithm. This paper focuses on surveying and comparing the state of the art associative classification techniques with regards to the rule generation phase of associative classification algorithms.  This phase includes frequent itemsets discovery and rules mining/extracting methods to generate the set of class association rules (CARs).  There are some techniques proposed to improve the rule generation method.  A technique by utilizing the concepts of discriminative power of itemsets can reduce the size of frequent itemset.  It can prune the useless frequent itemsets. The closed frequent itemset concept can be utilized to compress the rules to be compact rules.  This technique may reduce the size of generated rules.  Other technique is in determining the support threshold value of the itemset. Specifying not single but multiple support threshold values with regard to the class label frequencies can give more appropriate support threshold value.  This technique may generate more accurate rules. Alternative technique to generate rule is utilizing the vertical layout to represent dataset.  This method is very effective because it only needs one scan over dataset, compare with other techniques that need multiple scan over dataset.   However, one problem with these approaches is that the initial set of tid-lists may be too large to fit into main memory. It requires more sophisticated techniques to compress the tid-lists.


2020 ◽  
Vol 1 (3) ◽  
pp. 1-7
Author(s):  
Sarbani Dasgupta ◽  
Banani Saha

In data mining, Apriori technique is generally used for frequent itemsets mining and association rule learning over transactional databases. The frequent itemsets generated by the Apriori technique provides association rules which are used for finding trends in the database. As the size of the database increases, sequential implementation of Apriori technique will take a lot of time and at one point of time the system may crash. To overcome this problem, several algorithms for parallel implementation of Apriori technique have been proposed. This paper gives a comparative study on various parallel implementation of Apriori technique .It also focuses on the advantages of using the Map Reduce technology, the latest technology used in parallelization of large dataset mining.


Author(s):  
Luminita Dumitriu

Association rules, introduced by Agrawal, Imielinski and Swami (1993), provide useful means to discover associations in data. The problem of mining association rules in a database is defined as finding all the association rules that hold with more than a user-given minimum support threshold and a user-given minimum confidence threshold. According to Agrawal, Imielinski and Swami, this problem is solved in two steps: 1. Find all frequent itemsets in the database. 2. For each frequent itemset I, generate all the association rules I’ÞI\I’, where I’ÌI.


Author(s):  
Fatimah Audah Md. Zaki ◽  
Nurul Fariza Zulkurnain

<p>Mining frequent itemsets from large dataset has a major drawback in which the explosive number of itemsets requires additional mining process which might filter the interesting ones. Therefore, as the solution, the concept of closed frequent itemset was introduced that is lossless and condensed representation of all the frequent itemsets and their corresponding supports.  Unfortunately, many algorithms are not memory-efficient since it requires the storage of closed itemsets in main memory for duplication checks. This paper presents BFF, a scalable algorithm for discovering closed frequent itemsets from high-dimensional data. Unlike many well-known algorithms, BFF traverses the search tree in breadth-first manner resulted to a minimum use of memory and less running time. The tests conducted on a number of microarray datasets show that the performance of this algorithm improved significantly as the support threshold decreases which is crucial in generating more interesting rules.</p>


Frequent Itemset mining (FIM) concept and limitations are explored in this paper, for the purpose of extracting unknown hidden patterns as itemsets from the transactional database. Since candidate generation and support calculations are the major tasks in FIM, the major limitations of FIM are tackled, (i) huge possible frequent itemsets are generated as candidates at each pass (ii) Data base scan at each pass to calculate the support of the generated itemsets (iii) generated itemsets are highly sensitive to the minimum support threshold. SS-FIM a single scan algorithm is to deal with the above limitations. However, several unnecessary itemsets are being hashed in the buckets. To overcome the limitations, a partition based approach is proposed in this paper. The proposed approach, PSSFIM, takes single scan of the database to identify frequent itemsets. The unique feature of PSSFIM allow to generate size of candidate itemsets independent on the minimum support. It allows the candidates in hash that are possible for frequent, which intuitively reduces the cost in terms of verifying the support of generated candidates. It is compared with SS-FIM and Apriori with the standard datasets. The results show that the PSSFIM is good at the comparison of SS-FIM and Apriori.


2021 ◽  
Vol 48 (4) ◽  
Author(s):  
Hafiz I. Ahmad ◽  
◽  
Alex T. H. Sim ◽  
Roliana Ibrahim ◽  
Mohammad Abrar ◽  
...  

Association rule mining (ARM) is used for discovering frequent itemsets for interesting relationships of associative and correlative behaviors within the data. This gives new insights of great value, both commercial and academic. The traditional ARM techniques discover interesting association rules based on a predefined minimum support threshold. However, there is no known standard of an exact definition of minimum support and providing an inappropriate minimum support value may result in missing important rules. In addition, most of the rules discovered by these traditional ARM techniques refer to already known knowledge. To address these limitations of the minimum support threshold in ARM techniques, this study proposes an algorithm to mine interesting association rules without minimum support using predicate logic and a property of a proposed interestingness measure (g measure). The algorithm scans the database and uses g measure’s property to search for interesting combinations. The selected combinations are mapped to pseudo-implications and inference rules of logic are used on the pseudo-implications to produce and validate the predicate rules. Experimental results of the proposed technique show better performance against state-of-the-art classification techniques, and reliable predicate rules are discovered based on the reliability differences of the presence and absence of the rule’s consequence.


2021 ◽  
Vol 50 (4) ◽  
pp. 627-644
Author(s):  
Shariq Bashir ◽  
Daphne Teck Ching Lai

Approximate frequent itemsets (AFI) mining from noisy databases are computationally more expensive than traditional frequent itemset mining. This is because the AFI mining algorithms generate large number of candidate itemsets. This article proposes an algorithm to mine AFIs using pattern growth approach. The major contribution of the proposed approach is it mines core patterns and examines approximate conditions of candidate AFIs directly with single phase and two full scans of database. Related algorithms apply Apriori-based candidate generation and test approach and require multiple phases to obtain complete AFIs. First phase generates core patterns, and second phase examines approximate conditions of core patterns. Specifically, the article proposes novel techniques that how to map transactions on approximate FP-tree, and how to mine AFIs from the conditional patterns of approximate FP-tree. The approximate FP-tree maps transactions on shared branches when the transactions share a similar set of items. This reduces the size of databases and helps to efficiently compute the approximate conditions of candidate itemsets. We compare the performance of our algorithm with the state of the art AFI mining algorithms on benchmark databases. The experiments are analyzed by comparing the processing time of algorithms and scalability of algorithms on varying database size and transaction length. The results show pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.


Author(s):  
Thanh Huan Phan ◽  
Hoài Bắc Lê

In 1993, Agrawal et al. proposed the first algorithm for mining traditional frequent itemset on binarytransactional database with unweighted items - This algorithmis essential in finding hindden relationships among items inyour data. Until 1998, with the development of various typesof transactional database - some researchers have proposed afrequent itemsets mining algorithms on transactional databasewith weighted items (the importance/meaning/value of itemsis different) - It provides more pieces of knowledge thantraditional frequent itemsets mining. In this article, the authors present a survey of frequent itemsets mining algorithmson transactional database with weighted items over the pasttwenty years. This research helps researchers to choose theright technical solution when it comes to scale up in big datamining. Finally, the authors give their recommendations anddirections for their future research.


Sign in / Sign up

Export Citation Format

Share Document