Class Association Rules in the Associative Classification Method

Eka Karyawati; Edi Winarko

doi:10.22146/ijccs.5207

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.5207 ◽

2011 ◽

Vol 5 (3) ◽

pp. 17

Author(s):

Eka Karyawati ◽

Edi Winarko

Keyword(s):

Threshold Value ◽

Frequent Itemsets ◽

Main Memory ◽

Frequent Itemset ◽

Frequent Pattern ◽

Classification Rule ◽

Rule Generation ◽

Associative Classification ◽

Support Threshold ◽

Data Location

Frequent pattern (itemset) discovery is an important problem in associative classification rule mining. Different approaches have been proposed, such as the Apriori-like, Frequent Pattern (FP)-growth, and Transaction ID (Tid)-list Intersection algorithms. This paper surveys and compares state-of-the-art associative classification techniques with regard to the rule generation phase of associative classification algorithms. This phase includes frequent itemset discovery and the rule mining/extraction methods that generate the set of class association rules (CARs). Several techniques have been proposed to improve rule generation. A technique that utilizes the discriminative power of itemsets can reduce the number of frequent itemsets by pruning the useless ones. The closed frequent itemset concept can be used to compress the rules into a compact rule set, which may reduce the number of generated rules. Another technique concerns determining the support threshold of an itemset. Specifying multiple support thresholds based on the class label frequencies, rather than a single one, can give more appropriate threshold values and may generate more accurate rules. An alternative rule generation technique uses a vertical layout to represent the dataset. This method is very effective because it needs only one scan over the dataset, whereas other techniques need multiple scans. However, one problem with this approach is that the initial set of tid-lists may be too large to fit into main memory, so more sophisticated techniques are required to compress the tid-lists.
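
The vertical (tid-list) layout described above can be illustrated with a small sketch. It assumes each row is a set of items plus one class label. One scan builds the tid-lists, and the support and confidence of every candidate class association rule then come from set intersections alone. The names `build_tidlists` and `mine_cars` and the toy data are hypothetical. For brevity, the enumeration skips Apriori-style pruning, so this is not any of the surveyed algorithms as published.

```python
from itertools import combinations

# Hypothetical helper names; toy data for illustration only.
def build_tidlists(rows):
    """One pass over horizontal rows (items, class_label): map every item and
    every class label to the set of row ids (tids) in which it occurs."""
    item_tids, class_tids = {}, {}
    for tid, (items, label) in enumerate(rows):
        for item in items:
            item_tids.setdefault(item, set()).add(tid)
        class_tids.setdefault(label, set()).add(tid)
    return item_tids, class_tids

def mine_cars(rows, min_sup, min_conf, max_len=2):
    """Return CARs (itemset, label, support, confidence) using only tid-list
    intersections after the single scan. Every itemset up to max_len is
    enumerated; a real miner would prune candidates level by level."""
    item_tids, class_tids = build_tidlists(rows)
    n = len(rows)
    cars = []
    for k in range(1, max_len + 1):
        for itemset in combinations(sorted(item_tids), k):
            # tids of the itemset = intersection of its items' tid-lists
            tids = set.intersection(*(item_tids[i] for i in itemset))
            if not tids:
                continue
            for label, ctids in class_tids.items():
                hits = tids & ctids          # rows matching both body and class
                support = len(hits) / n
                confidence = len(hits) / len(tids)
                if support >= min_sup and confidence >= min_conf:
                    cars.append((itemset, label, support, confidence))
    return cars

rows = [({"a", "b"}, "yes"), ({"a", "c"}, "yes"),
        ({"a", "b", "c"}, "no"), ({"b", "c"}, "no")]
cars = mine_cars(rows, min_sup=0.25, min_conf=0.6)
```

On this toy data `cars` holds four rules, including `(('b', 'c'), 'no', 0.5, 1.0)`. Only the last two rows contain both b and c, and both are labelled no.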

Towards Scalable Algorithm for Closed Itemset Mining in High-Dimensional Data

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v8.i2.pp487-494 ◽

2017 ◽

Vol 8 (2) ◽

pp. 487

Author(s):

Fatimah Audah Md. Zaki ◽

Nurul Fariza Zulkurnain

Keyword(s):

High Dimensional Data ◽

Search Tree ◽

Frequent Itemsets ◽

Main Memory ◽

Frequent Itemset ◽

High Dimensional ◽

Major Drawback ◽

Scalable Algorithm ◽

Support Threshold ◽

Closed Frequent Itemset

Mining frequent itemsets from large datasets has a major drawback: the explosive number of itemsets requires an additional mining process to filter out the interesting ones. As a solution, the concept of closed frequent itemsets was introduced, a lossless and condensed representation of all the frequent itemsets and their corresponding supports. Unfortunately, many algorithms are not memory-efficient, since they must store the closed itemsets in main memory for duplicate checks. This paper presents BFF, a scalable algorithm for discovering closed frequent itemsets from high-dimensional data. Unlike many well-known algorithms, BFF traverses the search tree in a breadth-first manner, resulting in minimal memory use and less running time. Tests conducted on a number of microarray datasets show that the performance of this algorithm improves significantly as the support threshold decreases, which is crucial for generating more interesting rules.
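
To make the closed-itemset definition concrete, here is a brute-force sketch. It is not BFF and does not traverse a search tree breadth-first. It mines all frequent itemsets, keeps only the closed ones, and recovers any frequent itemset's support from them. Function names and data are illustrative assumptions, and `min_count` is an absolute support count.

```python
from itertools import combinations

# Hypothetical helper names; brute force, for illustrating the definition only.
def frequent_itemsets(transactions, min_count):
    """Level-wise brute force: count every k-itemset until none is frequent."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        level = {}
        for cand in combinations(items, k):
            c = frozenset(cand)
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                level[c] = count
        if not level:
            break  # no frequent k-itemset, so no frequent (k+1)-itemset either
        freq.update(level)
    return freq

def closed_itemsets(freq):
    """Keep itemsets with no proper superset of equal support. Checking only
    frequent supersets suffices: an equal-support superset is frequent too."""
    return {s: c for s, c in freq.items()
            if not any(s < t and c == ct for t, ct in freq.items())}

def support_from_closed(itemset, closed):
    """Lossless recovery for a frequent itemset: max support over closed supersets."""
    return max((c for s, c in closed.items() if itemset <= s), default=0)

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
closed = closed_itemsets(frequent_itemsets(transactions, min_count=2))
```

On the toy data, seven frequent itemsets reduce to three closed ones: {c}:3, {a,b}:3 and {a,b,c}:2. The support of {a,c}, which is 2, is still recoverable from {a,b,c}.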

TKFIM: Top-K frequent itemset mining technique based on equivalence classes

PeerJ Computer Science ◽

10.7717/peerj-cs.385 ◽

2021 ◽

Vol 7 ◽

pp. e385

Author(s):

Saood Iqbal ◽

Abdul Shahid ◽

Muhammad Roman ◽

Zahid Khan ◽

Shaha Al-Otaibi ◽

...

Keyword(s):

State Of The Art ◽

Threshold Value ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Large Dataset ◽

Mining Technique ◽

Support Threshold ◽

Frequent Itemsets Mining ◽

And Performance ◽

The Given

Frequent itemset mining is a significant subject of data mining studies. In the last ten years, owing to innovative developments, the quantity of data has grown exponentially, which imposes new challenges on frequent itemset (FI) mining applications. Misconceived information may be found in recent algorithms, both threshold-based and size-based. The threshold value plays a central role in generating frequent itemsets from a given dataset. Selecting a support threshold is very complicated for those unaware of the dataset’s characteristics. However, algorithms that find FIs without a support threshold perform poorly because of heavy computation. We therefore propose a method to discover FIs without a support threshold, called Top-k frequent itemsets mining (TKFIM). It uses equivalence classes and set-theory concepts to mine FIs. The proposed procedure does not miss any FIs, so accurate frequent patterns are mined. The results are compared with state-of-the-art techniques such as Top-k miner and Build Once and Mine Once (BOMO). TKFIM outperformed these approaches in terms of execution and performance. It achieved gains of 92.70, 35.87, 28.53, and 81.27 percent over Top-k miner on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. Similarly, it achieved performance gains of 97.14, 100, 78.10, and 99.70 percent over BOMO on the Chess, Mushroom, Connect, and T10I4D100K datasets, respectively. It is therefore argued that the proposed procedure may be adopted on large datasets for better performance.
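
Mining the k most frequent itemsets without a user-chosen threshold can be sketched with two pieces. The first is a search over prefix-based equivalence classes of tidsets (an Eclat-style search). The second is a size-k min-heap whose smallest support acts as a threshold that rises as mining proceeds. This is a generic illustration under our own assumptions, not the TKFIM procedure itself: ties are broken arbitrarily and the function name is hypothetical.

```python
import heapq

def top_k_itemsets(transactions, k):
    """Return up to k (support, itemset) pairs with the highest supports.
    Hypothetical helper: Eclat-style DFS over prefix equivalence classes,
    pruned by the smallest support currently held in a size-k min-heap."""
    if k <= 0:
        return []
    tidsets = {}                       # vertical layout: item -> set of tids
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    heap = []                          # min-heap of (support, itemset)

    def offer(itemset, support):
        # True if the itemset enters the current top-k. Otherwise no superset
        # can (supersets never have higher support), so the caller prunes.
        if len(heap) < k:
            heapq.heappush(heap, (support, itemset))
            return True
        if support > heap[0][0]:
            heapq.heapreplace(heap, (support, itemset))
            return True
        return False

    def explore(prefix, members):
        # members: (item, tidset of prefix + item) pairs, i.e. one equivalence class
        for i, (item, tids) in enumerate(members):
            itemset = prefix + (item,)
            if not offer(itemset, len(tids)):
                continue
            child = []
            for other, other_tids in members[i + 1:]:
                common = tids & other_tids     # tidset of itemset + other
                if common:
                    child.append((other, common))
            explore(itemset, child)

    explore((), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return sorted(heap, key=lambda e: (-e[0], e[1]))

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
top = top_k_itemsets(transactions, k=4)
```

With `k=4`, `top` holds the four support-3 itemsets a, ab, b and c. The search prunes {b,c} without extending it, because its support (2) cannot beat the heap minimum at that point.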

Frequent Itemset Mining Using LP-Growth Algorithm Based on Multiple Minimum Support Threshold Value (Multiple Item Support Frequent Pattern Growth)

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8046 ◽

2019 ◽

Vol 16 (4) ◽

pp. 1365-1372

Author(s):

M Sinthuja ◽

N Puviarasan ◽

P Aruna

Keyword(s):

Threshold Value ◽

Frequent Itemset ◽

Frequent Pattern ◽

Frequent Itemset Mining ◽

Minimum Support ◽

Itemset Mining ◽

Multiple Item ◽

Pattern Growth ◽

Support Threshold

Closed-Itemset Incremental-Mining Problem

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch029 ◽

2011 ◽

pp. 150-153

Author(s):

Luminita Dumitriu

Keyword(s):

Association Rules ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Incremental Mining ◽

Minimum Support ◽

Confidence Threshold ◽

Support Threshold ◽

Mining Association Rules

Association rules, introduced by Agrawal, Imielinski and Swami (1993), provide a useful means to discover associations in data. The problem of mining association rules in a database is defined as finding all the association rules that hold with more than a user-given minimum support threshold and a user-given minimum confidence threshold. According to Agrawal, Imielinski and Swami, this problem is solved in two steps: 1. Find all frequent itemsets in the database. 2. For each frequent itemset $I$, generate all the association rules $I' \Rightarrow I \setminus I'$, where $I' \subset I$.
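
The two-step procedure can be sketched in a few lines. Step 1 is a level-wise search for frequent itemsets. Step 2 generates a rule from every non-empty proper subset I' of each frequent itemset I, with confidence support(I)/support(I'). Function names, `min_count` (an absolute count) and the toy market-basket data are assumptions for illustration. The candidate join omits Apriori's subset-pruning step, which costs extra counting but not correctness.

```python
from itertools import combinations

# Hypothetical helper names; toy data for illustration only.
def level_wise_frequent(transactions, min_count):
    """Step 1: level-wise search. Candidates of size k+1 are unions of two
    frequent k-itemsets (Apriori's subset-pruning step is omitted)."""
    freq = {}
    level = [frozenset([i]) for i in sorted(set().union(*transactions))]
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        freq.update(current)
        keys = list(current)
        level = list({a | b for a in keys for b in keys if len(a | b) == len(a) + 1})
    return freq

def generate_rules(freq, min_conf):
    """Step 2: for each frequent I, emit I' => (I minus I') for every non-empty
    proper subset I' whose confidence support(I) / support(I') passes min_conf."""
    rules = []
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                antecedent = frozenset(subset)
                conf = count / freq[antecedent]   # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
rules = generate_rules(level_wise_frequent(transactions, min_count=2), min_conf=0.6)
```

Here the frequent 2-itemsets are {bread, milk} and {bread, butter}, which yield four rules. Among them, butter ⇒ bread has confidence 1.0.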

Finding Similar Documents Using Frequent Pattern Mining Methods

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488519500041 ◽

2019 ◽

Vol 27 (01) ◽

pp. 73-96 ◽

Cited By ~ 1

Author(s):

Mohammad Karim Sohrabi ◽

Hossein Azgomi

Keyword(s):

Similarity Search ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemset ◽

Frequent Pattern ◽

Search Problem ◽

High Quality ◽

Massive Datasets ◽

Dynamic Selection ◽

Support Threshold

Various problems arise in mining massive datasets, among which finding similar documents can be pinpointed. The shingling method converts this problem into a set-based problem. Some existing methods use min-hashing to compress the results derived from the shingling method and then exploit the LSH method to find candidate pairs for similarity search among all pairs of documents. In this paper, an Apriori-based method is proposed for finding similar documents based on the frequent itemset mining approach. To this end, the Apriori algorithm is modified and customized for the similarity search problem. The method's most important advantages are modeling the similarity search problem as a frequent pattern mining problem, using a modified version of Apriori, and dynamically selecting the minimum support threshold. These lead to its appropriate execution time and high-quality results. The proposed method finds similar documents in less time than the combined method and the MCVM method because it generates fewer candidate pairs. Furthermore, experimental results show the high quality of the proposed method's answers.
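
One way to read "modeling the similarity search problem as a frequent pattern mining problem" is to transpose the shingle sets. Each shingle becomes a transaction whose items are the documents containing it, so a frequent 2-itemset is a document pair sharing many shingles. The sketch below follows that assumed modeling with a fixed `min_support`; the paper selects the threshold dynamically, which is not reproduced here. Candidates are then verified with Jaccard similarity. Character 3-shingles, function names and documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helper names; assumed modeling, not the paper's exact method.
def shingles(text, k=3):
    """Set of character k-grams (k-shingles) of a document."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def candidate_pairs(docs, k=3, min_support=2):
    doc_shingles = {name: shingles(text, k) for name, text in docs.items()}
    # Transpose: each shingle becomes a "transaction" of the documents containing it.
    transactions = {}
    for name, sh in doc_shingles.items():
        for s in sh:
            transactions.setdefault(s, set()).add(name)
    # Support of a document pair (a 2-itemset) = number of shingles it shares.
    pair_support = Counter()
    for members in transactions.values():
        for pair in combinations(sorted(members), 2):
            pair_support[pair] += 1
    # Verify frequent pairs with exact Jaccard similarity of their shingle sets.
    results = {}
    for (a, b), sup in pair_support.items():
        if sup >= min_support:
            sa, sb = doc_shingles[a], doc_shingles[b]
            results[(a, b)] = len(sa & sb) / len(sa | sb)
    return results

docs = {"d1": "abcdef", "d2": "abcdeg", "d3": "xyzxyz"}
pairs = candidate_pairs(docs)
```

Only ("d1", "d2") shares at least two shingles (abc, bcd and cde), so it is the single candidate, with Jaccard similarity 3/5 = 0.6.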

Partition based Single Scan Method for Mining Frequent Item Sets

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9237.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4917-4922

Keyword(s):

Unique Feature ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Minimum Support ◽

Itemset Mining ◽

Highly Sensitive ◽

Support Threshold ◽

Hidden Patterns ◽

The Cost ◽

Frequent Item Sets

This paper explores the concept and limitations of frequent itemset mining (FIM), whose purpose is to extract unknown hidden patterns as itemsets from a transactional database. Since candidate generation and support calculation are the major tasks in FIM, the paper tackles FIM's major limitations: (i) a huge number of possible frequent itemsets is generated as candidates at each pass; (ii) the database is scanned at each pass to calculate the support of the generated itemsets; and (iii) the generated itemsets are highly sensitive to the minimum support threshold. SS-FIM, a single-scan algorithm, deals with these limitations; however, it hashes several unnecessary itemsets into the buckets. To overcome this, a partition-based approach is proposed in this paper. The proposed approach, PSSFIM, takes a single scan of the database to identify frequent itemsets. A unique feature of PSSFIM is that the size of the generated candidate set is independent of the minimum support. It admits into the hash only candidates that can possibly be frequent, which intuitively reduces the cost of verifying the support of the generated candidates. PSSFIM is compared with SS-FIM and Apriori on standard datasets, and the results show that PSSFIM performs well in comparison with both.
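
The single-scan idea behind SS-FIM, as we understand it, is to hash every subset of each transaction into a counter during one pass. The minimum support is applied only afterwards, so the hashed candidates do not depend on the threshold. The sketch below illustrates only that baseline; PSSFIM's partitioning is not reproduced. The function name is hypothetical, and `max_len` is an assumed optional cap, since the number of subsets grows exponentially with transaction length.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helper; illustrates the single-scan baseline only.
def single_scan_fim(transactions, min_count, max_len=None):
    counts = Counter()
    for t in transactions:                 # the only pass over the data
        items = sorted(t)
        limit = len(items) if max_len is None else min(max_len, len(items))
        for r in range(1, limit + 1):
            for subset in combinations(items, r):
                counts[subset] += 1        # hash every subset, frequent or not
    # The threshold is applied only after the scan.
    return {s: c for s, c in counts.items() if c >= min_count}

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
frequent = single_scan_fim(transactions, min_count=3)
```

Changing `min_count` here changes only the final filter, not what was hashed. That is also why this baseline wastes memory on subsets that can never be frequent.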

IHAC: Incorporating Heuristics for Efficient Rule Generation & Rule Selection in Associative Classification

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500106 ◽

2021 ◽

Vol 20 (01) ◽

pp. 2150010

Author(s):

Parashu Ram Pal ◽

Pankaj Pathak ◽

Shkurte Luma-Osmani

Keyword(s):

Association Rules ◽

Search Space ◽

Classification Rule ◽

Rule Generation ◽

Classification Methods ◽

Associative Classification ◽

Rule Mining ◽

Rule Selection ◽

Speed Up ◽

Experimental Findings

Association rule mining and classification rule mining are both significant techniques for knowledge discovery in massive databases stored in different geographic locations around the world. Combining the two has produced class association rule mining, or associative classification, methods, which in many cases have shown higher prediction accuracy than conventional classifiers. Motivated by this, we propose a new approach, IHAC (Incorporating Heuristics for efficient rule generation & rule selection in Associative Classification). First, it processes the database to reduce the search space, and then it explicitly explores potent class association rules from the optimised database. It also blends rule generation and classifier building to speed up the overall classifier construction cycle. Experimental findings show that IHAC performs better than other associative classification methods.
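
The rule selection and classifier-building stage that IHAC speeds up can be illustrated with a simplified CBA-style procedure. CBA is a different, earlier associative classifier; IHAC's heuristics are not described here and are not reproduced. Rules are ranked by confidence, then support, then antecedent length. A rule is kept if it correctly classifies at least one still-uncovered training row, and a default class handles rows that no rule matches. The rule tuple layout and function names are hypothetical, and CBA's error-based truncation of the rule list is omitted.

```python
# Hypothetical helper names; simplified CBA-style selection, not IHAC.
def rank_rules(rules):
    """rules: (antecedent frozenset, label, support, confidence) tuples.
    Precedence: higher confidence, then higher support, then shorter body."""
    return sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0])))

def build_classifier(rules, rows):
    """rows: (item set, label) training pairs."""
    selected, remaining = [], list(rows)
    for antecedent, label, _sup, _conf in rank_rules(rules):
        covered = [r for r in remaining if antecedent <= r[0]]
        # Keep the rule only if it correctly classifies an uncovered row.
        if any(r[1] == label for r in covered):
            selected.append((antecedent, label))
            remaining = [r for r in remaining if not antecedent <= r[0]]
        if not remaining:
            break
    # Default class: majority label of uncovered rows (all rows if none remain).
    labels = [r[1] for r in (remaining or rows)]
    default = max(sorted(set(labels)), key=labels.count)
    return selected, default

def predict(classifier, items):
    selected, default = classifier
    for antecedent, label in selected:     # first matching rule wins
        if antecedent <= items:
            return label
    return default

rows = [({"a", "b"}, "yes"), ({"a", "c"}, "yes"),
        ({"a", "b", "c"}, "no"), ({"b", "c"}, "no")]
rules = [(frozenset({"a"}), "yes", 0.5, 2 / 3), (frozenset({"b"}), "no", 0.5, 2 / 3),
         (frozenset({"c"}), "no", 0.5, 2 / 3), (frozenset({"b", "c"}), "no", 0.5, 1.0)]
clf = build_classifier(rules, rows)
```

On this data, {b,c} ⇒ no and {a} ⇒ yes together cover every training row, so the remaining rules are never added.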

DETERMINING FREQUENTLY OCCURRING PATTERNS IN FERTILIZER SALES USING THE FP-GROWTH ALGORITHM

INFORMATIKA ◽

10.36723/juri.v9i2.97 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1

Author(s):

Chandra Eri Firman

Keyword(s):

Data Mining ◽

Association Rule ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Pattern ◽

Pattern Growth

Association rules are obtained by analysing sales transactions. The analysis of sales transactions aims to design an effective strategy by utilising transaction data on the fertilizer products purchased by consumers. Association rule mining is a data mining technique for finding relationships between items in a dataset; here these relationships are determined using the FP-Growth algorithm. Frequent Pattern Growth (FP-Growth) is one alternative algorithm that can be used to determine the sets of data that occur most frequently (frequent itemsets) in a dataset. The FP-Growth algorithm builds a tree to search for frequent itemsets. The confidence values of the generated rules were calculated using Rapidminer-studio 7.3.0. Keywords: Data Mining, Association Rule, FP-Growth, Product Sales
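
The tree-based search in FP-Growth can be sketched compactly. First, build an FP-tree whose paths share prefixes of frequency-ordered items. Then, for each item, collect its conditional pattern base (the prefix paths above its nodes, weighted by node count) and recurse on it. This is a textbook-style sketch under our own assumptions, not the RapidMiner workflow used in the paper: transactions are passed as (items, weight) pairs so that conditional bases can reuse the same builder, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical names; textbook-style FP-Growth sketch.
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return the header table (item -> nodes) and item supports."""
    support = defaultdict(int)
    for items, w in weighted_transactions:
        for i in items:
            support[i] += w
    frequent = {i: s for i, s in support.items() if s >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, w in weighted_transactions:
        # Frequent items only, most frequent first (ties by name), so paths share prefixes.
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += w
    return header, frequent

def fp_growth(weighted_transactions, min_count, suffix=()):
    header, frequent = build_tree(weighted_transactions, min_count)
    results = {}
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        results[tuple(sorted(suffix + (item,)))] = frequent[item]
        # Conditional pattern base: prefix path above every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        results.update(fp_growth(base, min_count, suffix + (item,)))
    return results

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
```

Each itemset is emitted exactly once: it is produced while processing its last item in path order. The recursion therefore never revisits the original transactions.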

A tree-projection-based algorithm for multi-label recurrent-item associative-classification rule generation

Data & Knowledge Engineering ◽

10.1016/j.datak.2007.05.006 ◽

2008 ◽

Vol 64 (1) ◽

pp. 171-197 ◽

Cited By ~ 24

Author(s):

Rafal Rak ◽

Lukasz Kurgan ◽

Marek Reformat

Keyword(s):

Classification Rule ◽

Rule Generation ◽

Associative Classification

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice; e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging because sampling-based methods, such as the popularly used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopsis for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared with the existing estimator, our method is more accurate and more efficient to calculate, and it also follows the downward-closure property. These properties enable our new estimator to be incorporated into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms. We prove that it can guarantee the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator significantly improves the accuracy of both itemset frequency estimation and FIM compared with the existing estimators.
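
For background, a conventional KMV synopsis keeps the k smallest hash values of the distinct keys it sees. The quantity (k-1)/U(k), with U(k) the k-th smallest value, estimates the number of distinct keys. The fraction of the combined k smallest values that are present in every synopsis scales this into an intersection estimate. The sketch assumes an item's synopsis summarizes the distinct keys (e.g., customers) whose transactions contain that item. It shows only this standard estimator, not the novel estimator proposed in the article. Names, the hash choice (first 8 bytes of SHA-1) and the data are illustrative assumptions.

```python
import hashlib
import heapq

# Hypothetical names; conventional KMV estimator, not the article's estimator.
def unit_hash(key):
    """Map a key to a pseudo-uniform float in [0, 1) via SHA-1 (assumed choice)."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class KMV:
    def __init__(self, k):
        self.k = k
        self._heap = []            # negated values: a max-heap of the k smallest hashes
        self._members = set()

    def add(self, key):
        h = unit_hash(key)
        if h in self._members:
            return                 # repeated key: a KMV synopsis counts distinct keys
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            evicted = -heapq.heappushpop(self._heap, -h)   # drop the largest kept hash
            self._members.discard(evicted)
            self._members.add(h)

    def values(self):
        return sorted(-v for v in self._heap)

    def estimate(self):
        vals = self.values()
        if len(vals) < self.k:
            return float(len(vals))        # fewer than k distinct keys: exact
        return (self.k - 1) / vals[-1]     # (k-1) / U(k)

def intersection_estimate(synopses):
    """Estimated number of keys present in every synopsis' key set."""
    k = min(s.k for s in synopses)
    sets = [set(s.values()) for s in synopses]
    smallest = sorted(set().union(*sets))[:k]
    if len(smallest) < k:                  # every synopsis is exact
        return float(len(set.intersection(*sets)))
    shared = sum(1 for v in smallest if all(v in s for s in sets))
    return (shared / k) * (k - 1) / smallest[-1]

item_a, item_b = KMV(1024), KMV(1024)
for key in range(10000):
    item_a.add(key)              # keys whose transactions contain item a
for key in range(5000, 15000):
    item_b.add(key)              # keys whose transactions contain item b
```

With these synthetic keys, the true counts are 10000 keys for item a and 5000 keys shared by both items. The estimates should land within a few percent of these values for k = 1024, though the exact figures depend on the hash.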