Coverage-Based Classification Using Association Rule Mining

2020 ◽  
Vol 10 (20) ◽  
pp. 7013
Author(s):  
Jamolbek Mattiev ◽  
Branko Kavsek

Building accurate and compact classifiers in real-world applications is one of the crucial tasks in data mining nowadays. In this paper, we propose a new method that can reduce the number of class association rules produced by classical class association rule classifiers, while maintaining an accurate classification model that is comparable to the ones generated by state-of-the-art classification algorithms. More precisely, we propose a new associative classifier that selects “strong” class association rules based on overall coverage of the learning set. The advantage of the proposed classifier is that it generates significantly fewer rules on bigger datasets compared to traditional classifiers while maintaining the classification accuracy. We also discuss how the overall coverage of such classifiers affects their classification accuracy. Experiments measuring classification accuracy, number of classification rules and other relevance measures such as precision, recall and F-measure on 12 real-life datasets from the UCI ML repository (Dua, D.; Graff, C. UCI Machine Learning Repository. Irvine, CA: University of California, 2019) show that our method is comparable to 8 other well-known rule-based classification algorithms. It achieved the second-highest average accuracy (84.9%) and the best result in terms of average number of rules among all classification methods. Although it did not achieve the best classification accuracy, our method proved to produce compact and understandable classifiers by exhaustively searching the entire example space.
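The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common way to select rules by coverage of the learning set: a greedy database-coverage pass in the style of classical associative classifiers. The `Rule` type, the ranking key (confidence, then support, then shorter antecedent) and the toy data are assumptions made for illustration, not the authors' method.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset  # items that must all be present in an example
    label: str             # class predicted by the rule
    confidence: float
    support: float


def select_by_coverage(rules, examples):
    """Greedily keep rules that correctly cover still-uncovered examples.

    `examples` is a list of (items, label) pairs. Rules are visited from
    strongest to weakest (assumed ranking); a rule is kept only if it
    covers at least one example that no earlier rule has covered.
    """
    ranked = sorted(
        rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent))
    )
    uncovered = set(range(len(examples)))
    selected = []
    for rule in ranked:
        if not uncovered:
            break  # the whole learning set is covered
        hits = {
            i for i in uncovered
            if rule.antecedent <= examples[i][0] and examples[i][1] == rule.label
        }
        if hits:
            selected.append(rule)
            uncovered -= hits
    return selected, uncovered


# Hypothetical toy learning set and candidate rules.
examples = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a", "c"}), "yes"),
    (frozenset({"b", "c"}), "no"),
    (frozenset({"c"}), "no"),
]
rules = [
    Rule(frozenset({"a"}), "yes", 1.0, 0.5),
    Rule(frozenset({"a", "b"}), "yes", 1.0, 0.25),
    Rule(frozenset({"c"}), "no", 0.67, 0.5),
]
selected, uncovered = select_by_coverage(rules, examples)
```

In this toy run the second rule is dropped because the first, more general rule already covers every example it would cover, which is how a coverage criterion keeps the final rule list short.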

Author(s):  
Jamolbek Mattiev ◽  
Branko Kavsek

Huge amounts of data are being collected and analyzed nowadays. With popular rule-learning algorithms, the number of rules discovered on those “big” datasets can easily exceed thousands. To produce compact, understandable and accurate classifiers, such rules have to be grouped and pruned, so that only a reasonable number of them are presented to the end user for inspection and further analysis. In this paper, we propose new methods that are able to reduce the number of class association rules produced by “classical” class association rule classifiers, while maintaining an accurate classification model that is comparable to the ones generated by state-of-the-art classification algorithms. More precisely, we propose new associative classifiers, called DC, DDC and CDC, that use distance-based agglomerative hierarchical clustering as a post-processing step to reduce the number of rules; in the rule-selection step, each algorithm uses a different strategy (based on database coverage and cluster center). Experiments performed on selected datasets from the UCI ML repository show that our methods learn classifiers containing significantly fewer rules than state-of-the-art rule-learning algorithms on datasets with a larger number of examples. On the other hand, the classification accuracy of the proposed classifiers is not significantly different from that of state-of-the-art rule learners on most of the datasets.


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

Associative Classification (AC) or Class Association Rule (CAR) mining is a very efficient method for the classification problem. It can build comprehensible classification models in the form of a list of simple IF-THEN classification rules from the available data. In this paper, we present a new and improved discrete version of the Crow Search Algorithm (CSA), called NDCSA-CAR, to mine the Class Association Rules. The goal of this article is to improve the data classification accuracy and the simplicity of classifiers. The authors applied the proposed NDCSA-CAR algorithm to eleven benchmark datasets and compared its results with traditional algorithms and recent well-known rule-based classification algorithms. The experimental results show that the proposed algorithm outperformed other rule-based approaches in all evaluated criteria.


Author(s):  
Carson K.-S. Leung ◽  
Fan Jiang ◽  
Edson M. Dela Cruz ◽  
Vijay Sekar Elango

Collaborative filtering uses data mining and analysis to develop a system that helps users make appropriate decisions in real-life applications by removing redundant information and providing valuable information to users. Data mining aims to extract from data implicit, previously unknown and potentially useful information, such as association rules that reveal relationships between frequently co-occurring patterns in the antecedent and consequent parts of the rules. This chapter presents an algorithm called CF-Miner for collaborative filtering with an association rule miner. The CF-Miner algorithm first constructs bitwise data structures to capture important contents in the data. It then finds frequent patterns from the bitwise structures. Based on the mined frequent patterns, the algorithm forms association rules. Finally, the algorithm ranks the mined association rules to recommend appropriate merchandise products, goods or services to users. Evaluation results show the effectiveness of CF-Miner in using association rule mining in collaborative filtering.
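The abstract gives no detail on CF-Miner's bitwise structures, so the sketch below only illustrates the general idea of bitmap-based support counting: each item gets one integer whose bit `t` is set when transaction `t` contains the item, and the support of an itemset is the popcount of the bitwise AND of its items' bitmaps. All function names and the sample transactions are hypothetical.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Map each item to an int whose bit t is 1 if transaction t has the item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count the transactions containing every item by AND-ing the bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def frequent_pairs(transactions, min_count):
    """Return {pair: support count} for item pairs meeting min_count."""
    bitmaps = build_bitmaps(transactions)
    n = len(transactions)
    frequent_items = sorted(
        item for item in bitmaps
        if support_count((item,), bitmaps, n) >= min_count
    )
    result = {}
    for pair in combinations(frequent_items, 2):
        count = support_count(pair, bitmaps, n)
        if count >= min_count:
            result[pair] = count
    return result


# Hypothetical purchase histories, one set of items per user.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
pairs = frequent_pairs(transactions, min_count=2)
```

Because the bitmaps are built in a single pass, every later support query is a handful of integer AND operations rather than another scan of the data.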


2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Siti Sakira Kamaruddin ◽  
Yuhanis Yusof ◽  
Husniza Husni ◽  
Mohammad Hayel Al Refai

This paper presents text classification using a modified Multi-Class Association Rule Method. The method is based on Associative Classification, which combines classification with association rule discovery. Although previous work showed that Associative Classification produces better classification accuracy than typical classifiers, studies applying Associative Classification to text classification are limited because of the high dimensionality of text data, which results in an exponential number of generated classification rules. To overcome this problem, the modified Multi-Class Association Rule Method was enhanced in two stages. In stage one, the frequent patterns are represented using a proposed vertical data format to reduce the text dimensionality problem. In stage two, the generated rules are pruned using a proposed Partial Rule Match to reduce the number of generated rules. The proposed method was tested on a text classification problem, and the results show that it performed better than the existing method in terms of classification accuracy and number of generated rules.


2014 ◽  
Vol 8 (1) ◽  
pp. 303-307 ◽  
Author(s):  
Zhonglin Zhang ◽  
Zongcheng Liu ◽  
Chongyu Qiao

A method of tendency mining in dynamic association rules based on a compatibility feature vector SVM classifier is proposed. First, the set of class association rules (CARs) is mined using tendency mining in dynamic association rules. Second, the SVM algorithm is used to construct a classifier based on the compatibility feature vector to classify the obtained CARs, exploiting the advantages of SVM when dealing with highly complex data. The model is constructed by judging the rules' weights. Finally, the method is compared with traditional methods with respect to mining accuracy. The method addresses the problem of high time complexity and achieves higher accuracy than the traditional methods, which helps make the mining of dynamic association rules more accurate and effective. Analysis of the final results shows that the method has lower complexity and higher classification accuracy.


Information ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 386
Author(s):  
Şahan Yoruç Selçuk ◽  
Perin Ünal ◽  
Özlem Albayrak ◽  
Moez Jomâa

Digital twins, virtual representations of real-life physical objects or processes, are becoming widely used in many different industrial sectors. One of the main uses of digital twins is predictive maintenance, and these technologies are being adapted to various new applications and datatypes in many industrial processes. The aim of this study was to propose a methodology to generate synthetic vibration data using a digital twin model and a predictive maintenance workflow, consisting of preprocessing, feature engineering, and classification model training, to classify faulty and healthy vibration data for state estimation. To assess the success of the proposed workflow, the mentioned steps were applied to a publicly available vibration dataset and the synthetic data from the digital twin, using five different state-of-the-art classification algorithms. For several of the classification algorithms, the accuracy result for the classification of healthy and faulty data achieved on the public dataset reached approximately 86%, and on the synthetic data, approximately 98%. These results showed the great potential for the proposed methodology, and future work in the area.


2013 ◽  
Vol 9 (1) ◽  
pp. 1-27 ◽  
Author(s):  
Harihar Kalia ◽  
Satchidananda Dehuri ◽  
Ashish Ghosh

Association rule mining is one of the fundamental tasks of data mining. The conventional association rule mining algorithms, which use crisp sets, are meant for handling Boolean data. However, in real life quantitative data are voluminous and need careful attention for discovering knowledge. Therefore, to extract association rules from quantitative data, the dataset at hand must be partitioned into intervals and then converted into Boolean type. This conversion may suffer from the sharp boundary problem. Hence, fuzzy association rules were developed to solve this problem by handling quantitative data using fuzzy sets. In this paper, the authors present an updated survey of fuzzy association rule mining procedures along with a discussion and relevant pointers for further research.
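To illustrate the contrast between a crisp interval and a fuzzy set, here is a minimal sketch under stated assumptions: a triangular membership function and a fuzzy support defined as the average, over all records, of the minimum membership across the items (the min t-norm). The survey covers many variants; the function names, the "middle-aged" set and its breakpoints (20, 30, 40) are hypothetical choices for this example.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_support(records, fuzzy_items):
    """Average over records of the minimum membership across all items.

    `fuzzy_items` is a list of (attribute name, membership function) pairs.
    """
    total = sum(
        min(membership(record[attr]) for attr, membership in fuzzy_items)
        for record in records
    )
    return total / len(records)


def middle_aged(age):
    # Hypothetical fuzzy set: fully "middle-aged" at 30, not at all by 20 or 40.
    return triangular(age, 20, 30, 40)


def crisp_middle_aged(age):
    # Crisp counterpart: membership jumps from 0 to 1 at the interval edges.
    return 1.0 if 25 <= age < 35 else 0.0


records = [{"age": 25}, {"age": 30}, {"age": 35}, {"age": 40}]
support = fuzzy_support(records, [("age", middle_aged)])
```

With the crisp interval, a 34-year-old counts fully and a 35-year-old not at all; with the fuzzy set, both contribute partially, which is how the sharp boundary is softened.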



2018 ◽  
Vol 14 (2) ◽  
pp. 37-59 ◽  
Author(s):  
Ahmet Cumhur Öztürk ◽  
Belgin Ergenç

This article describes how association rule mining is used for extracting relations between items in transactional databases and is beneficial for decision-making. However, association rule mining can pose a threat to the privacy of the knowledge when the data is shared without hiding the confidential association rules of the data owner. One way to hide an association rule in a database is to conceal the itemsets (co-occurring items) from which the sensitive association rules are generated. These sensitive itemsets are sanitized by itemset hiding processes. Most of the existing solutions consider single support thresholds and assume that the databases are static, which is not true in real life. In this article, the authors propose a novel itemset hiding algorithm that is designed for the dynamic database environment and considers multiple itemset support thresholds. The algorithm's performance is compared with that of two dynamic algorithms on six different databases. Findings show that the proposed dynamic algorithm is more efficient in terms of execution time and information loss and is guaranteed to hide all sensitive itemsets.


2011 ◽  
Vol 1 (2) ◽  
Author(s):  
Venkatapathy Umarani ◽  
Muthusamy Punithavalli

The discovery of association rules is an important and challenging data mining task. Most of the existing algorithms for finding association rules require multiple passes over the entire database, and the I/O overhead incurred is extremely high for very large databases. An obvious approach to reduce the complexity of association rule mining is sampling. In recent times, several sampling-based approaches have been developed for speeding up the process of association rule mining. A proficient progressive sampling-based approach is presented for mining association rules from large databases. At first, frequent itemsets are mined from an initial sample and subsequently, the negative border is computed from the mined frequent itemsets. Based on the support computed for the midpoint itemset in the sorted negative border, either the sample size is increased or association rules are mined from the sample. In this paper, we have presented an extensive analysis of the progressive sampling-based approach with different real-life datasets and, in addition, the performance of the approach is evaluated against the well-known association rule mining algorithm, Apriori. The experimental results show that the accuracy and computation time of the progressive sampling-based approach are effectively improved in mining association rules from the real-life datasets.
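The negative border mentioned above is, in its standard definition, the set of itemsets that are not frequent themselves but whose proper subsets are all frequent. The sketch below computes it from a downward-closed collection of frequent itemsets; it does not reproduce the paper's sorting of the border, its midpoint test, or its sample-size schedule, which the abstract does not specify. The function name and the toy data are hypothetical.

```python
def negative_border(frequent, items):
    """Return the minimal infrequent itemsets.

    `frequent` must be downward closed: every subset of a frequent itemset
    is also in the collection. The empty set is treated as frequent, so
    infrequent single items belong to the border.
    """
    frequent = {frozenset(f) for f in frequent} | {frozenset()}
    border = set()
    for itemset in frequent:
        for item in items:
            if item in itemset:
                continue
            candidate = itemset | {item}
            if candidate in frequent:
                continue
            # Keep the candidate only if removing any one item yields a
            # frequent itemset, i.e. all of its proper subsets are frequent.
            if all(candidate - {x} in frequent for x in candidate):
                border.add(candidate)
    return border


# Hypothetical result of mining an initial sample over items a, b, c, d.
items = ["a", "b", "c", "d"]
frequent = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]
border = negative_border(frequent, items)
```

Here the border is `{d}` and `{b, c}`: these are the itemsets closest to being frequent, so checking their support is a cheap way to judge whether the current sample is large enough.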

