A Novel Efficient Classification Algorithm Based on Class Association Rules

2011 ◽  
Vol 135-136 ◽  
pp. 106-110
Author(s):  
Shou Juan Zhang ◽  
Quan Zhou

This paper proposes a novel classification algorithm based on class association rules. First, the algorithm mines frequent items and rules in a single phase. It then ranks the rules that pass the support and confidence thresholds with a global sorting method that uses a series of parameters: confidence, support, antecedent cardinality, class distribution frequency, item row order and rule antecedent length. The classifier is built from rule items that do not overlap in the training phase and for which each training instance is covered by only a single rule. Experimental results on 8 datasets from the UCI ML Repository show that the proposed algorithm is highly competitive with the C4.5, CBA, CMAR and CPAR algorithms in terms of classification accuracy and efficiency. The algorithm offers a usable associative classification technique for data mining.
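The threshold filtering and global ranking described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` class, the `rank_rules` function, and the tie-break order (higher confidence, then higher support, then shorter antecedent) are assumptions made for illustration, and the remaining ranking parameters listed in the abstract are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule: antecedent items -> class label."""
    antecedent: frozenset
    label: str
    support: float
    confidence: float


def rank_rules(rules, min_support, min_confidence):
    """Keep rules that pass both thresholds, then sort them globally."""
    kept = [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]
    # Assumed order: higher confidence first, then higher support,
    # then shorter antecedent. The paper's exact order is not given here.
    return sorted(
        kept,
        key=lambda r: (-r.confidence, -r.support, len(r.antecedent)),
    )


rules = [
    Rule(frozenset({"a", "b"}), "yes", 0.30, 0.90),
    Rule(frozenset({"a"}), "yes", 0.30, 0.90),
    Rule(frozenset({"c"}), "no", 0.40, 0.80),
    Rule(frozenset({"d"}), "no", 0.01, 0.95),  # fails the support threshold
]
ranked = rank_rules(rules, min_support=0.05, min_confidence=0.5)
for r in ranked:
    print(sorted(r.antecedent), "->", r.label, r.support, r.confidence)
```

In this toy input the two rules tied on confidence and support are separated by antecedent length, and the rule with support 0.01 is dropped before ranking.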

Author(s):  
Ling Zhou ◽  
Stephen Yau

Association rule mining among frequent items has been extensively studied in data mining research. In recent years, however, there has been increasing demand for mining infrequent items (such as rare but expensive items). Because interesting relationships among infrequent items have not been discussed much in the literature, the authors of this chapter propose two simple, practical and effective schemes to mine association rules among rare items. Their algorithms can also be applied to frequent items with bounded length. Experiments are performed on the well-known IBM synthetic database. The authors' schemes compare favorably to Apriori and FP-growth under the conditions evaluated. In addition, they explore quantitative association rule mining among infrequent items in transactional databases by associating quantities of items; some interesting examples are drawn to illustrate the significance of such mining.


2014 ◽  
Vol 4 (4) ◽  
pp. 61-72
Author(s):  
Saed A. Muqasqas ◽  
Qasem A. Al Radaideh ◽  
Bilal A. Abul-Huda

Data classification, one of the main tasks of data mining, has an important role in many fields. Classification techniques differ mainly in the accuracy of their models, which depends on the method adopted during the learning phase. Several researchers have attempted to enhance classification accuracy by combining different classification methods in the same learning process, resulting in a hybrid classifier. In this paper, the authors propose and build a hybrid classifier technique based on the Naïve Bayes and C4.5 classifiers. The main goal of the proposed model is to reduce the complexity of the NBTree technique, which is a well-known hybrid classification technique, and to improve the overall classification accuracy. Thirty-six UCI datasets were used in the evaluation. Results show that the proposed technique significantly outperforms the NBTree technique and some other classifiers proposed in the literature in terms of classification accuracy. The proposed classification approach yields an overall average accuracy of 85.70% over the 36 datasets.


Computation ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 99
Author(s):  
Pannapa Changpetch ◽  
Apasiri Pitpeng ◽  
Sasiprapa Hiriote ◽  
Chumpol Yuangyai

In this study, we designed a framework that combines three techniques, namely the classification tree, association rules analysis (ASA), and the naïve Bayes classifier, to improve the performance of the latter. A classification tree was used to discretize quantitative predictors into categories, and ASA was used to generate interactions in a fully realized way, as discretized variables and interactions are key to improving the classification accuracy of the naïve Bayes classifier. We applied our methodology to three medical datasets to demonstrate the efficacy of the proposed method. The results showed that our methodology outperformed the existing techniques for all the illustrated datasets. Although our focus here was on medical datasets, our proposed methodology is equally applicable to datasets in many other areas.


2016 ◽  
Vol 2016 ◽  
pp. 1-14
Author(s):  
Stamatis Karlos ◽  
Nikos Fazakis ◽  
Sotiris Kotsiantis ◽  
Kyriakos Sgarbas

Classification is one of the most important tasks of data mining and has been adopted by several modern applications. The shortage of labeled data in the majority of these applications has shifted interest towards semisupervised methods. Under such schemes, collected unlabeled data combined with a clearly smaller set of labeled examples leads to similar or even better classification accuracy than supervised algorithms, which use labeled examples exclusively during the training phase. This paper presents a novel approach for improving semisupervised classification using the Cascade Classifier technique. The main characteristic of the Cascade Classifier strategy is the use of a base classifier to enlarge the feature space by adding either the predicted class or the class probability distribution of the initial data. The second-level classifier is supplied with the new dataset and produces the decision for each instance. In this work, a self-trained NB∇C4.5 classifier algorithm is presented, which combines the characteristics of Naive Bayes as a base classifier with the speed of C4.5 for final classification. We performed an in-depth comparison with other well-known semisupervised classification methods on standard benchmark datasets and found that the presented technique achieves better accuracy in most cases.
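The feature-space augmentation step of the Cascade Classifier strategy can be sketched as follows. This is only an illustration under stated assumptions: `augment_with_base_output` and `toy_base_predict_proba` are hypothetical names, the toy function stands in for a trained base classifier such as Naive Bayes, and the second-level classifier and the self-training loop are not shown.

```python
def augment_with_base_output(rows, base_predict_proba):
    """Append the base classifier's class-probability vector to each row."""
    return [list(row) + list(base_predict_proba(row)) for row in rows]


def toy_base_predict_proba(row):
    """Hypothetical stand-in for a trained base classifier.

    Returns [P(negative), P(positive)] from a fixed threshold on the
    first feature; a real base model would be fitted on labeled data.
    """
    p_pos = 0.75 if row[0] > 0.5 else 0.25
    return [1.0 - p_pos, p_pos]


rows = [[0.9, 1.0], [0.1, 0.0]]
augmented = augment_with_base_output(rows, toy_base_predict_proba)
print(augmented)
```

Each augmented row keeps its original features and gains one column per class. A second-level learner (C4.5 in the abstract) would then be trained on the augmented rows instead of the original ones.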


EP Europace ◽  
2020 ◽  
Vol 22 (Supplement_1) ◽  
Author(s):  
W R Chiou ◽  
M C Hsieh ◽  
H N Chuang ◽  
C C Huang ◽  
J Y Chuang ◽  
...  

Background: Novel oral anticoagulants (NOAC) are important in preventing thromboembolism in atrial fibrillation (AF) patients. Bleeding risk has traditionally been evaluated with the HAS-BLED score. Data mining is a relatively new discipline that has sprung up at the confluence of several other disciplines, driven primarily by the growth of large databases. Purpose: This study aimed to find a useful predictive model by data mining to assess the bleeding risk of rivaroxaban, an antithrombotic drug, in AF patients. The seven parameters of the HAS-BLED score were used to predict the effect of rivaroxaban on bleeding tendency in AF patients, which may help clinicians choose appropriate treatments to avoid complications from bleeding events and reduce the incidence of health damage. Methods: In a multicenter retrospective study, we identified patients with AF who were treated with rivaroxaban for more than 1 month between December 1, 2011 and November 30, 2016. After preprocessing, the resulting data were used for training and testing of data mining models. This study evaluated four models: association rules, neural networks, Bayesian classification, and decision trees. Result: Of the 872 enrolled cases, 432 were in any of the bleeding groups and 432 were in the non-bleeding randomized control group. After comparing the overall classification accuracy, omission error and over-prediction error, the decision tree proved to be the most accurate model for bleeding prediction. Its overall classification accuracy is 77%, the omission error is 15%, the over-prediction error is 21.9%, and the AUC score is 0.84. The results show that the model has good discriminative ability and visibility of decision rules. Conclusion: Among several data mining models, the decision tree proved to be the most accurate model for bleeding prediction.
The conclusion of this study can be used as a reference for supporting decision making before anticoagulation treatment, and it suggests future research to compare the efficacy of bleeding prediction between the HAS-BLED score and the decision tree.

Data mining comparison

Model                     Omission error  Commission error  Overall accuracy  AUC score  Ranking
Decision tree             15.0%           21.90%            77.00%            0.84       1
Association rules         16.8%           27.20%            76.50%            0.81       2
Neural networks           12.0%           26.40%            78.20%            0.83       3
Bayesian classification   16.1%           27.50%            76.50%            0.83       4


2014 ◽  
Vol 1 (1) ◽  
pp. 339-342
Author(s):  
Mirela Danubianu ◽  
Dragos Mircea Danubianu

Speech therapy can be viewed as a business in the logopaedic area that aims to offer services for correcting language. Proper treatment of speech impairments improves the efficiency of therapy, so therapists must continuously learn how to adjust their therapy methods to the patient's characteristics. Using Information and Communication Technology in this area has allowed a lot of data to be collected regarding various aspects of treatment. These data can be used in a data mining process to find useful and usable patterns and models that help therapists improve their specific education. Clustering, classification or association rules can provide unexpected information that helps to complete the therapist's knowledge and to adapt the therapy to the patient's needs.

