closed frequent itemsets
Recently Published Documents


TOTAL DOCUMENTS: 59 (FIVE YEARS 9)

H-INDEX: 7 (FIVE YEARS 0)

2021 ◽  
Vol 11 (1) ◽  
pp. 01-11
Author(s):  
Youssef Fakir ◽  
Chaima Ahle Touateb ◽  
Rachid Elayachi

In the last decade, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analysed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. Nevertheless, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose an algorithm for mining closed itemsets that constructs an it-tree. This algorithm is compared with the DCI (direct counting & intersect) algorithm in terms of minimum support and computing time. CHARM is not memory-efficient: it needs to store all closed itemsets in memory. The lower the minimum support is, the more frequent closed itemsets there are, so the amount of memory used by CHARM increases.
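To make the notion of a closed frequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm proposed in the paper, nor CHARM or DCI; the function name `closed_frequent_itemsets` and the toy data layout (a list of item sets, with an absolute support count as threshold) are assumptions made for illustration. It relies only on the standard definition: an itemset is closed when it equals the set of items shared by all transactions that contain it.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force illustration: return {closed itemset: support count}.

    Hypothetical helper, exponential in the number of frequent items;
    suitable for toy inputs only.
    """
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent_items = sorted(i for i, t in tidsets.items() if len(t) >= min_support)
    closed = {}
    for size in range(1, len(frequent_items) + 1):
        for combo in combinations(frequent_items, size):
            # Support of the itemset = size of the intersection of its items' tidsets.
            tids = set.intersection(*(tidsets[i] for i in combo))
            if len(tids) < min_support:
                continue
            # Closure = items common to every transaction that supports the itemset.
            closure = frozenset.intersection(
                *(frozenset(transactions[t]) for t in tids)
            )
            closed[closure] = len(tids)
    return closed
```

Because every frequent itemset is mapped to its closure, several itemsets collapse onto the same dictionary key, which illustrates why the closed representation is smaller than the full set of frequent itemsets.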


2021 ◽  
Vol 5 (2) ◽  
pp. 359-368
Author(s):  
Gede Aditra Pradnyana ◽  
Arif Djunaidy

Document clustering based on frequent itemsets can be regarded as a new method of document clustering that aims to overcome the curse of dimensionality of the items produced by the documents being clustered. The Maximum Capturing (MC) technique is a document clustering algorithm based on frequent itemsets that is capable of producing better clustering quality than other similar algorithms. However, since the MC technique employs frequent itemsets, it still suffers from several weaknesses: item redundancy that may still cause the curse of dimensionality, difficulty in determining the minimum support value for a set of documents to be clustered, and no weighting of the items in the resulting frequent itemsets. To cope with these weaknesses, this research develops a document clustering algorithm based on weighted top-k closed frequent itemsets, called the Weighted Maximum Capturing (WMC) algorithm. The proposed algorithm uses the frequent pattern tree algorithm to mine closed frequent itemsets from a set of documents without specifying the minimum support value of the items to be generated. Experimental results showed an improvement in clustering accuracy: the average F-measure was 0.713 and the average purity was 0.721, with improvement ratios of 1.4% for F-measure and 2% for purity. The scalability test, by contrast, showed a very significant improvement: the WMC algorithm requires an average computing time of 623.77 minutes, 518.05 minutes faster than the average computing time required by the MC algorithm.


2021 ◽  
Vol 7 (2) ◽  
Author(s):  
Huy Quang Pham ◽  
Duc Tran ◽  
Ninh Bao Duong ◽  
Philippe Fournier-Viger ◽  
Alioune Ngom

Frequent itemset (FI) mining is an interesting data mining task. Instead of mining the FIs directly from the data, it is preferable to mine only the closed frequent itemsets (CFIs) first and then extract the FIs from each CFI. However, some algorithms require the generators of each CFI in order to extract the FIs, which incurs an extra cost. In this paper, we introduce an effective algorithm, called NUCLEAR, which can induce the FIs from the lattice of CFIs without needing the generators. It can also enumerate the generators in a similar fashion. Experimental results showed that NUCLEAR is effective compared with previous studies; in particular, the time for extracting the FIs is usually much smaller than the time for mining the CFIs.
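The idea of recovering FIs from CFIs can be sketched as follows. This is not NUCLEAR and does not use the CFI lattice; it is a naive illustration of the underlying property that every frequent itemset is a subset of some closed frequent itemset, and that its support equals the largest support among its closed supersets. The function name `frequent_from_closed` and the input layout (a dict from `frozenset` to support count) are assumptions.

```python
from itertools import combinations


def frequent_from_closed(closed):
    """Expand {closed itemset: support} into {frequent itemset: support}.

    Hypothetical helper: enumerates every non-empty subset of every CFI,
    so it is exponential in the size of the largest CFI.
    """
    frequent = {}
    for cfi, support in closed.items():
        items = sorted(cfi)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                # A subset's support is the largest support among its closed supersets.
                frequent[key] = max(frequent.get(key, 0), support)
    return frequent
```

No pass over the original transactions is needed in this step, which is consistent with the abstract's observation that extracting the FIs is usually much cheaper than mining the CFIs.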


Author(s):  
Youssef Fakir ◽  
Chaima Ahle Touate ◽  
Rachid Elayachi ◽  
Mohamed Fakir

In the last decade, the amount of data collected in various computer science applications has grown considerably. These large volumes of data need to be analysed in order to extract useful hidden knowledge. This work focuses on association rule extraction, one of the most popular techniques in data mining. Nevertheless, the number of extracted association rules is often very high, and many of them are redundant. In this paper, we propose an algorithm for mining closed itemsets that constructs an it-tree. This algorithm is compared with the DCI (direct counting & intersect) algorithm in terms of minimum support and computing time. CHARM is not memory-efficient: it needs to store all closed itemsets in memory. The lower the minimum support is, the more frequent closed itemsets there are, so the amount of memory used by CHARM increases.


2019 ◽  
Vol 892 ◽  
pp. 157-167
Author(s):  
Fatimah Audah Md Zaki ◽  
Nurul Fariza Zulkurnain

Mining closed frequent itemsets requires the algorithm to mine the frequent itemsets and then determine their closures. The efficiency of closure computation is very important, as it determines the total mining time and the required memory. Over the years, many closure computation methods have been proposed to achieve these goals. However, to the best of our knowledge, there is no suitable method that can be adapted for algorithms that enumerate the rowset lattice, which is effective for biological datasets. Therefore, this paper proposes a method for computing closure and compares it with the method used in the BVBUC algorithm. Finally, BVBUC_I is proposed, and the performance of these algorithms was evaluated using two synthetic datasets and three real datasets. The results of these tests demonstrated the efficiency of the proposed method.
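As a rough illustration of what a closure computation over rowsets involves, the sketch below stores each item's supporting rows as an integer bitmask and derives the closure of an itemset from the rows it covers. It is not the method proposed in the paper and makes no claim about how BVBUC or BVBUC_I work internally; the names `build_row_masks` and `closure` are hypothetical.

```python
def build_row_masks(transactions):
    """Map each item to an int whose bit r is set when row r contains the item."""
    masks = {}
    for row, items in enumerate(transactions):
        for item in items:
            masks[item] = masks.get(item, 0) | (1 << row)
    return masks


def closure(itemset, masks, n_rows):
    """Return (closure of itemset, support count) using bitwise row intersection."""
    rows = (1 << n_rows) - 1  # start with every row selected
    for item in itemset:
        rows &= masks.get(item, 0)  # keep only rows containing this item
    # An item is in the closure when it occurs in every supporting row.
    # If no row supports the itemset, every item qualifies by convention.
    closed = {item for item, mask in masks.items() if rows & mask == rows}
    return closed, bin(rows).count("1")
```

Representing rowsets as bitmasks turns each intersection into a single bitwise AND, which is one common way to keep closure checks cheap when the number of rows is small relative to the number of items.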


2019 ◽  
Vol 4 (3) ◽  
pp. 237
Author(s):  
Maliha Momtaz ◽  
Abu Ahmed Ferdaus ◽  
Chowdhury Farhan Ahmed ◽  
Mohammad Samiullah

2019 ◽  
Vol 4 (3) ◽  
pp. 237
Author(s):  
Abu Ahmed Ferdaus ◽  
Mohammad Samiullah ◽  
Chowdhury Farhan Ahmed ◽  
Maliha Momtaz

Algorithms ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 194
Author(s):  
Yaron Gonen ◽  
Ehud Gudes ◽  
Kirill Kandalov

The Map-Reduce (MR) framework has become a popular framework for developing new parallel algorithms for Big Data. Developing efficient algorithms for mining big data and distributed databases has become an important problem. In this paper, we focus on algorithms that produce association rules and frequent itemsets. After reviewing the most recent algorithms that perform this task within the MR framework, we present two new algorithms: one for producing closed frequent itemsets, and one for producing frequent itemsets when the database is updated and new data is added to the old database. Both algorithms include novel optimizations that are suited to the MR framework as well as to other parallel architectures. A detailed experimental evaluation shows the effectiveness and advantages of the algorithms over existing methods when it comes to large distributed databases.

