Towards Scalable Algorithm for Closed Itemset Mining in High-Dimensional Data

Fatimah Audah Md. Zaki; Nurul Fariza Zulkurnain

doi:10.11591/ijeecs.v8.i2.pp487-494

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v8.i2.pp487-494 ◽

2017 ◽

Vol 8 (2) ◽

pp. 487-494

Author(s):

Fatimah Audah Md. Zaki ◽

Nurul Fariza Zulkurnain

Keyword(s):

High Dimensional Data ◽

Search Tree ◽

Frequent Itemsets ◽

Main Memory ◽

Frequent Itemset ◽

High Dimensional ◽

Major Drawback ◽

Scalable Algorithm ◽

Support Threshold ◽

Closed Frequent Itemset

Mining frequent itemsets from large datasets has a major drawback: the explosive number of itemsets requires an additional mining process to filter out the interesting ones. As a solution, the concept of the closed frequent itemset was introduced, which is a lossless and condensed representation of all the frequent itemsets and their corresponding supports. Unfortunately, many algorithms are not memory-efficient, since they require closed itemsets to be stored in main memory for duplication checks. This paper presents BFF, a scalable algorithm for discovering closed frequent itemsets from high-dimensional data. Unlike many well-known algorithms, BFF traverses the search tree in a breadth-first manner, resulting in minimal memory use and less running time. Tests conducted on a number of microarray datasets show that the performance of this algorithm improves significantly as the support threshold decreases, which is crucial for generating more interesting rules.
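
To make the notion of a closed frequent itemset concrete, the following is a minimal brute-force sketch in Python. It is not the BFF algorithm from the abstract, whose details are not given here; the function names and the toy transactions are assumptions chosen for illustration. An itemset is treated as closed when no proper superset has the same support.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets by brute force (toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with equal support."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in frequent.items())
    }

# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

On this toy input, seven itemsets are frequent but only three are closed, and the support of every frequent itemset can be read off as the largest support among the closed itemsets that contain it, which is what makes the representation lossless.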

Class Association Rule Pada Metode Associative Classification

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.5207 ◽

2011 ◽

Vol 5 (3) ◽

pp. 17

Author(s):

Eka Karyawati ◽

Edi Winarko

Keyword(s):

Threshold Value ◽

Frequent Itemsets ◽

Main Memory ◽

Frequent Itemset ◽

Frequent Pattern ◽

Classification Rule ◽

Rule Generation ◽

Associative Classification ◽

Support Threshold ◽

Data Location

Frequent pattern (itemset) discovery is an important problem in associative classification rule mining. Different approaches have been proposed, such as Apriori-like, Frequent Pattern (FP)-growth, and Transaction Data Location (Tid)-list intersection algorithms. This paper surveys and compares state-of-the-art associative classification techniques with regard to the rule generation phase of associative classification algorithms. This phase includes frequent itemset discovery and the rule mining/extraction methods that generate the set of class association rules (CARs). Several techniques have been proposed to improve rule generation. One technique uses the discriminative power of itemsets to prune useless frequent itemsets, which reduces the number of frequent itemsets. The closed frequent itemset concept can be used to compress the rules into compact rules, which may reduce the number of generated rules. Another technique concerns the choice of the support threshold: specifying multiple support thresholds according to the class label frequencies, rather than a single one, can give more appropriate threshold values and may generate more accurate rules. An alternative technique uses a vertical layout to represent the dataset. This method is very effective because it needs only one scan over the dataset, compared with other techniques that need multiple scans. However, one problem with this approach is that the initial set of tid-lists may be too large to fit into main memory, so more sophisticated techniques are required to compress the tid-lists.
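
The vertical layout mentioned above can be sketched as follows. This is an illustrative Python example under assumed names (`build_tidlists`, `itemset_support`) and toy data, not code from the surveyed paper: one pass over the dataset builds a tid-list per item, after which the support of any non-empty itemset is the size of the intersection of its items' tid-lists.

```python
from collections import defaultdict

def build_tidlists(transactions):
    """One pass over the data: map each item to the ids of transactions containing it."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)

def itemset_support(itemset, tidlists):
    """Support of a non-empty itemset: size of the intersection of its tid-lists."""
    tids = set.intersection(*(tidlists.get(item, set()) for item in itemset))
    return len(tids)

# Hypothetical toy database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
tidlists = build_tidlists(transactions)
pair_support = itemset_support({"bread", "milk"}, tidlists)
triple_support = itemset_support({"bread", "milk", "butter"}, tidlists)
```

The memory concern raised in the abstract is visible here: `tidlists` holds one transaction id per item occurrence, so it grows with the total size of the dataset.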

Closed-Itemset Incremental-Mining Problem

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch029 ◽

2011 ◽

pp. 150-153

Author(s):

Luminita Dumitriu

Keyword(s):

Association Rules ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Incremental Mining ◽

Minimum Support ◽

Confidence Threshold ◽

Support Threshold ◽

Mining Association Rules

Association rules, introduced by Agrawal, Imielinski and Swami (1993), provide useful means to discover associations in data. The problem of mining association rules in a database is defined as finding all the association rules that hold with more than a user-given minimum support threshold and a user-given minimum confidence threshold. According to Agrawal, Imielinski and Swami, this problem is solved in two steps: 1. Find all frequent itemsets in the database. 2. For each frequent itemset I, generate all the association rules I′ ⇒ I \ I′, where I′ ⊂ I.
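
Step 2 can be illustrated with a short Python sketch. The helper names, the toy transactions, and the confidence threshold are assumptions for illustration; the sketch assumes the itemset passed in is frequent, so every antecedent has non-zero support. Confidence is computed as the support of I divided by the support of the antecedent I′.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_from_itemset(itemset, transactions, min_confidence):
    """Generate rules I' => I minus I' for each non-empty proper subset I' of I."""
    itemset = frozenset(itemset)
    whole = support(itemset, transactions)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            confidence = whole / support(antecedent, transactions)
            if confidence >= min_confidence:
                rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical toy database; {"a", "b"} occurs in half of the transactions.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
rules = rules_from_itemset({"a", "b"}, transactions, min_confidence=0.8)
```

With these inputs, the rule b ⇒ a passes the threshold (every transaction containing b also contains a), while a ⇒ b has confidence 2/3 and is discarded.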

Contorting high dimensional data for efficient main memory KNN processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data - SIGMOD '03 ◽

10.1145/872757.872815 ◽

2003 ◽

Cited By ~ 25

Author(s):

Bin Cui ◽

Beng Chin Ooi ◽

Jianwen Su ◽

Kian-Lee Tan

Keyword(s):

High Dimensional Data ◽

Main Memory ◽

High Dimensional

Partition based Single Scan Method for Mining Frequent Item Sets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9237.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4917-4922

Keyword(s):

Unique Feature ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Minimum Support ◽

Itemset Mining ◽

Highly Sensitive ◽

Support Threshold ◽

Hidden Patterns ◽

The Cost ◽

Frequent Item Sets

This paper explores the concept and limitations of frequent itemset mining (FIM), whose purpose is to extract unknown hidden patterns as itemsets from a transactional database. Since candidate generation and support calculation are the major tasks in FIM, the paper tackles three major limitations: (i) a huge number of possible frequent itemsets is generated as candidates at each pass; (ii) the database is scanned at each pass to calculate the support of the generated itemsets; and (iii) the generated itemsets are highly sensitive to the minimum support threshold. SS-FIM is a single-scan algorithm that deals with these limitations; however, it hashes several unnecessary itemsets into the buckets. To overcome this, a partition-based approach is proposed. The proposed approach, PSSFIM, takes a single scan of the database to identify frequent itemsets. Its unique feature is that the number of candidate itemsets generated is independent of the minimum support. It admits into the hash only those candidates that can possibly be frequent, which reduces the cost of verifying the support of generated candidates. PSSFIM is compared with SS-FIM and Apriori on standard datasets, and the results show that it performs well in comparison with both.

TKFIM: Top-K frequent itemset mining technique based on equivalence classes

PeerJ Computer Science ◽

10.7717/peerj-cs.385 ◽

2021 ◽

Vol 7 ◽

pp. e385

Author(s):

Saood Iqbal ◽

Abdul Shahid ◽

Muhammad Roman ◽

Zahid Khan ◽

Shaha Al-Otaibi ◽

...

Keyword(s):

State Of The Art ◽

Threshold Value ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Large Dataset ◽

Mining Technique ◽

Support Threshold ◽

Frequent Itemsets Mining ◽

And Performance ◽

The Given

Frequent itemset mining is a significant subject of data mining research. Over the last ten years, technological development has caused the quantity of data to grow exponentially, which imposes new challenges on frequent itemset (FI) mining applications. Recent algorithms, both threshold-based and size-based, may produce misleading information. The threshold value plays a central role in generating frequent itemsets from a given dataset, and selecting a support threshold is very difficult for those unaware of the dataset's characteristics. However, algorithms that find FIs without a support threshold perform poorly because of heavy computation. We therefore propose a method for discovering FIs without a support threshold, called Top-k frequent itemsets mining (TKFIM). It uses class equivalence and set-theory concepts for mining FIs. The proposed procedure does not miss any FIs; thus, accurate frequent patterns are mined. The results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). The proposed TKFIM outperforms these approaches in terms of execution and performance, achieving gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T1014D100K datasets, respectively. Similarly, it achieves performance gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T1014D100K datasets, respectively. It is therefore argued that the proposed procedure may be adopted on large datasets for better performance.
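
The idea of asking for the k most frequent itemsets instead of fixing a support threshold can be sketched as below. This is a brute-force Python illustration with assumed names and toy data; it does not implement TKFIM's equivalence-class technique, and it enumerates every itemset, so it is only usable on very small inputs.

```python
import heapq
from itertools import combinations

def top_k_itemsets(transactions, k):
    """Return the k itemsets with the highest support counts (brute force)."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = set(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count:
                counts[combo] = count
    # The user chooses k; no support threshold is needed.
    return heapq.nlargest(k, counts.items(), key=lambda pair: pair[1])

# Hypothetical toy database.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"a", "c"}]
top1 = top_k_itemsets(transactions, 1)
top3 = top_k_itemsets(transactions, 3)
```

The heavy computation noted in the abstract shows up directly: without a threshold to prune on, this naive version has to count every candidate before it can rank them.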

Multi-Objective Optimization for High-Dimensional Maximal Frequent Itemset Mining

Applied Sciences ◽

10.3390/app11198971 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8971

Author(s):

Yalong Zhang ◽

Wei Yu ◽

Xuan Ma ◽

Hisakazu Ogura ◽

Dongfen Ye

Keyword(s):

Big Data ◽

Optimal Solution ◽

Solution Space ◽

Frequent Itemsets ◽

Frequent Itemset ◽

High Dimensional ◽

Lethal Gene ◽

Multi Objective Optimization ◽

Multi Objective ◽

Evolution Algorithms

The solution space of frequent itemsets generally grows exponentially because of the high-dimensional attributes of big data. However, big data association rule analysis rests on mining the frequent itemsets in high-dimensional transaction sets. Classical algorithms such as Apriori and FP-Growth, as well as their derivatives, are impractical for big data analysis in such an explosive solution space because of their huge consumption of storage space and running time. A multi-objective optimization algorithm is proposed to mine the frequent itemsets of high-dimensional data. First, all frequent 2-itemsets are generated by scanning the transaction sets; new items are then added to them, and the resulting itemsets serve as the objects of population evolution. The algorithm searches for maximal frequent itemsets in order to cover more non-void subsets, because every non-void subset of a frequent itemset is itself frequent. During the run, lethal gene fragments in individuals are recorded and eliminated so that individuals may resurge. Finally, the Pareto-optimal solution set of frequent itemsets is obtained: all non-void subsets of these solutions are frequent itemsets, and all of their supersets are non-frequent. The practicability and validity of the proposed algorithm on big data are demonstrated by experiments.
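
The property that motivates searching for maximal frequent itemsets can be shown with a small Python sketch. It does not reproduce the evolutionary algorithm described above; the function names and the example collection of frequent itemsets are assumptions for illustration. A maximal itemset is a frequent itemset with no frequent proper superset, and every frequent itemset can be recovered as a non-void subset of some maximal one.

```python
from itertools import combinations

def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    frequent = [frozenset(s) for s in frequent]
    return {s for s in frequent if not any(s < other for other in frequent)}

def expand(maximal):
    """Recover all frequent itemsets as non-void subsets of the maximal ones."""
    recovered = set()
    for itemset in maximal:
        for size in range(1, len(itemset) + 1):
            recovered.update(frozenset(c) for c in combinations(itemset, size))
    return recovered

# Hypothetical collection of frequent itemsets, assumed to be downward closed.
frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
            {"a", "b", "c"}, {"d"}]
maximal = maximal_itemsets(frequent)
recovered = expand(maximal)
```

Here eight frequent itemsets are summarized by two maximal ones. Unlike closed itemsets, the maximal representation recovers which itemsets are frequent but not their individual supports.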

Indexing high-dimensional data for main-memory similarity search

Information Systems ◽

10.1016/j.is.2010.05.001 ◽

2010 ◽

Vol 35 (7) ◽

pp. 825-843 ◽

Cited By ~ 3

Author(s):

Xiaohui Yu ◽

Junfeng Dong

Keyword(s):

Similarity Search ◽

High Dimensional Data ◽

Main Memory ◽

High Dimensional

The Research of Generation Algorithm of Frequent Itemsets in High-Dimensional Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.710.127 ◽

2015 ◽

Vol 710 ◽

pp. 127-131

Author(s):

Qing Chao Jiang

Keyword(s):

Association Rules ◽

High Efficiency ◽

High Dimensional Data ◽

Frequent Itemsets ◽

Boolean Matrix ◽

High Dimensional ◽

Rule Mining ◽

Key Factor ◽

And Performance ◽

The Times

In association rule mining, the generation of frequent itemsets is a key factor that influences the efficiency and performance of the algorithm. As data dimensionality increases, traditional association rule mining algorithms can no longer meet the demands of high-dimensional data mining. On the basis of the Apriori algorithm, we put forward the Split Matrix_Apriori algorithm in this paper. By generating the Boolean matrix of the database, Split Matrix_Apriori reduces the number of database scans needed to generate the frequent itemsets. By adopting a grouping strategy on the Boolean matrix, the algorithm maintains high efficiency on high-dimensional data. Split Matrix_Apriori thus improves the efficiency of association rule mining significantly.
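
How a Boolean matrix avoids repeated database scans can be sketched in Python as follows. This illustrates only the matrix representation; the grouping (split) strategy of the proposed algorithm is not described in enough detail here to reproduce, and the function names and toy data are assumptions. The database is scanned once to build the matrix, and every later support count is a logical AND over the selected columns.

```python
def to_boolean_matrix(transactions, items):
    """Single database scan: one row per transaction, one column per item."""
    return [[item in t for item in items] for t in transactions]

def support_count(matrix, columns):
    """Count the rows in which every selected column is True."""
    return sum(all(row[c] for c in columns) for row in matrix)

# Hypothetical toy database and item ordering.
items = ["a", "b", "c"]
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
matrix = to_boolean_matrix(transactions, items)

support_a = support_count(matrix, [0])      # itemset {a}
support_ab = support_count(matrix, [0, 1])  # itemset {a, b}
```

Once `matrix` exists, candidate itemsets of any size are counted from it without touching the original transactions again.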

Frequent Itemset Mining in High Dimensional Data: A Review

Lecture Notes in Electrical Engineering - Computational Science and Technology ◽

10.1007/978-981-13-2622-6_32 ◽

2018 ◽

pp. 325-334

Author(s):

Fatimah Audah Md. Zaki ◽

Nurul Fariza Zulkurnain

Keyword(s):

High Dimensional Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

High Dimensional ◽

Itemset Mining

Large Sample Covariance Matrices and High-Dimensional Data Analysis

10.1017/cbo9781107588080 ◽

2015 ◽

Cited By ~ 26

Author(s):

Jianfeng Yao ◽

Shurong Zheng ◽

Zhidong Bai

Keyword(s):

Data Analysis ◽

High Dimensional Data ◽

Covariance Matrices ◽

High Dimensional ◽

Large Sample ◽

Sample Covariance Matrices ◽

Sample Covariance ◽

High Dimensional Data Analysis
