A review of associative classification mining

Associative classification mining is a promising approach in data mining that uses association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares the state-of-the-art associative classification techniques with regard to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.


Evaluating pattern restrictions for associative classifiers

Intelligent Data Analysis ◽

10.3233/ida-200011 ◽

2020 ◽

Vol 24 ◽

pp. 105-122

Author(s):

González-Méndez Andy ◽

Martín Diana ◽

Morales Eduardo ◽

García-Borroto Milton

Keyword(s):

Association Rule ◽

Experimental Comparison ◽

Rule Discovery ◽

Classification Models ◽

Associative Classification ◽

Association Rule Discovery ◽

Pattern Recognition Approach ◽

Associative Classifiers ◽

The Impact ◽

First Time

Associative classification is a pattern recognition approach that integrates classification and association rule discovery to build accurate classification models. These models are formed by a collection of contrast patterns that fulfill some restrictions. In this paper, we introduce an experimental comparison of the impact of different restrictions on classification accuracy. To the best of our knowledge, this is the first time that such an analysis has been performed, and it yields some interesting findings about how restrictions affect the classification results. Contrasting these results with previously published papers, we found that their conclusions could be unintentionally biased by the restrictions they used. We found, for example, that the jumping restriction could severely damage pattern quality in the presence of dataset noise. We also found that the minimal support restriction affects the accuracy of two associative classifiers differently, so deciding which one is best depends on the support value. This paper opens some interesting lines of research, mainly in the creation of new restrictions and new pattern types by joining different restrictions.


MAC: A Multiclass Associative Classification Algorithm

Journal of Information & Knowledge Management ◽

10.1142/s0219649212500116 ◽

2012 ◽

Vol 11 (02) ◽

pp. 1250011 ◽

Cited By ~ 23

Author(s):

Neda Abdelhamid ◽

Aladdin Ayesh ◽

Fadi Thabtah ◽

Samad Ahmadi ◽

Wael Hadi

Keyword(s):

Data Mining ◽

Rule Induction ◽

Classification Systems ◽

Data Repository ◽

Traditional Learning ◽

Data Sets ◽

Learning Approaches ◽

Associative Classification ◽

Data Mining Approach ◽

Association Rule Discovery

Associative classification (AC) is a data mining approach that uses association rule discovery methods to build classification systems (classifiers). Several research studies reveal that AC normally generates more accurate classifiers than classic classification data mining approaches such as rule induction, probabilistic methods and decision trees. This paper proposes a new multiclass AC algorithm called MAC. The proposed algorithm employs a novel method for building the classifier that normally reduces the resulting classifier size, so that end users can more easily understand and maintain it. Experiments on 19 different data sets from the UCI data repository, using different common AC and traditional learning approaches, have been conducted with reference to classification accuracy and the number of rules derived. The results show that the proposed algorithm derives classifiers with higher predictive accuracy than rule induction (RIPPER) and decision tree (C4.5) algorithms and is highly competitive with a known AC algorithm named MCAR. Furthermore, MAC produces fewer rules than MCAR for most of the data sets considered in the experiments, both in normal circumstances (standard support and confidence thresholds) and in severe circumstances (low support and confidence thresholds).


Associative Classification Approaches: Review and Comparison

Journal of Information & Knowledge Management ◽

10.1142/s0219649214500270 ◽

2014 ◽

Vol 13 (03) ◽

pp. 1450027 ◽

Cited By ~ 23

Author(s):

Neda Abdelhamid ◽

Fadi Thabtah

Keyword(s):

Association Rule ◽

Rule Learning ◽

Learning Rule ◽

Data Representation ◽

Associative Classification ◽

Data Mining Approach ◽

Association Rule Discovery ◽

Multi Class Classification ◽

Rule Pruning ◽

New Research

Associative classification (AC) is a promising data mining approach that integrates classification and association rule discovery to build classification models (classifiers). In the last decade, several AC algorithms have been proposed, such as Classification Based on Association (CBA), Classification based on Predicted Association Rule (CPAR), Multi-class Classification using Association Rule (MCAR), Live and Let Live (L3) and others. These algorithms use different procedures for rule learning, rule sorting, rule pruning, classifier building and class allocation for test cases. This paper sheds light on and critically compares common AC algorithms with reference to the abovementioned procedures. Moreover, data representation formats in AC mining are discussed along with potential new research directions.


A New Classification Based on Association Algorithm

Journal of Information & Knowledge Management ◽

10.1142/s0219649210002486 ◽

2010 ◽

Vol 09 (01) ◽

pp. 55-64 ◽

Cited By ~ 12

Author(s):

Fadi Thabtah ◽

Qazafi Mahmood ◽

Lee McCluskey ◽

Hussein Abdel-Jaber

Keyword(s):

Data Mining ◽

Association Rule ◽

Prediction Method ◽

Computational Time ◽

Rule Discovery ◽

Associative Classification ◽

Classification Problems ◽

Association Rule Discovery ◽

Similar Class ◽

Class Labels

Associative classification is a branch of data mining that employs association rule discovery methods in classification problems. In this paper, we introduce a novel data mining method called Looking at the Class (LC), which can be utilised within the associative classification approach. Unlike known associative classification algorithms such as Classification based on Association rule (CBA), which combine disjoint itemsets regardless of their class labels in the training phase, our method joins only itemsets with similar class labels. This avoids many unnecessary itemset combinations during the learning step, and consequently yields large savings in computational time and memory. Moreover, a new prediction method that utilises multiple rules to make the prediction decision is also developed in this paper. The experimental results on different UCI datasets reveal that the LC algorithm outperformed CBA with respect to classification accuracy, memory usage, and execution time on most of the datasets considered.
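
The class-aware join described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `join_ignoring_class` and `join_same_class`, the representation of a rule item as an `(itemset, class_label)` pair, and the toy data are all assumptions made for illustration. It only shows why restricting the join to itemsets that share a class label produces fewer candidates.

```python
from itertools import combinations


def join_ignoring_class(ruleitems):
    """Class-agnostic join: merge any two k-itemsets that differ in one item.

    ruleitems: list of (frozenset_of_items, class_label) pairs of equal size k.
    Returns the set of merged (k+1)-itemsets, regardless of class label.
    """
    candidates = set()
    for (items_a, _), (items_b, _) in combinations(ruleitems, 2):
        merged = items_a | items_b
        if len(merged) == len(items_a) + 1:
            candidates.add(merged)
    return candidates


def join_same_class(ruleitems):
    """Class-aware join (hypothetical sketch of the idea in the abstract):
    merge two k-itemsets only when they carry the same class label."""
    candidates = set()
    for (items_a, cls_a), (items_b, cls_b) in combinations(ruleitems, 2):
        if cls_a != cls_b:
            continue  # skip pairs whose class labels differ
        merged = items_a | items_b
        if len(merged) == len(items_a) + 1:
            candidates.add((merged, cls_a))
    return candidates


# Toy frequent 1-itemsets with their class labels (made-up data).
ruleitems = [
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "yes"),
    (frozenset({"c"}), "no"),
    (frozenset({"d"}), "no"),
]

all_pairs = join_ignoring_class(ruleitems)    # six candidate 2-itemsets
same_class = join_same_class(ruleitems)       # two candidate 2-itemsets
```

On this toy input the class-agnostic join yields six candidates while the class-aware join yields two, which is the kind of reduction in candidate generation the abstract attributes to LC.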


Rule Preference Effect in Associative Classification Mining

Journal of Information & Knowledge Management ◽

10.1142/s0219649206001281 ◽

2006 ◽

Vol 05 (01) ◽

pp. 13-20 ◽

Cited By ~ 8

Author(s):

Fadi Thabtah

Keyword(s):

Ranking Method ◽

Benchmark Problems ◽

Data Sets ◽

Associative Classification ◽

Rule Mining ◽

Ranking Methods ◽

Associative Classifiers ◽

The Impact ◽

Distribution Frequency

Classification based on association rule mining, also known as associative classification, is a promising approach in data mining that builds accurate classifiers. In this paper, a rule ranking process within the associative classification approach is investigated. Specifically, two common rule ranking methods in associative classification are compared with reference to their impact on accuracy. We also propose a new rule ranking procedure that adds more tie-breaking conditions to the existing methods in order to reduce random rule selection. In particular, our method looks at the class distribution frequency associated with the tied rules and favours those that are associated with the majority class. We compare the impact of the proposed rule ranking method and two other methods presented in associative classification on 14 highly dense classification data sets. Our results indicate the effectiveness of the proposed rule ranking method on the quality of the resulting classifiers for the majority of the benchmark problems we consider. This provides evidence that adding more appropriate constraints to break ties between rules positively affects the predictive power of the resulting associative classifiers.
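
The tie-breaking idea in this abstract can be sketched as a composite sort key. The following Python example is an illustration under stated assumptions, not the paper's procedure: the abstract does not list the earlier ranking criteria, so confidence, support and antecedent length are assumed here, and the `Rule` class and `rank_rules` function are hypothetical names. Only the final criterion, preferring the rule whose class is more frequent in the training data, comes from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # attribute-value items in the rule body
    label: str             # predicted class
    confidence: float
    support: float


def rank_rules(rules, training_labels):
    """Sort rules best-first.

    Assumed criteria, in order: higher confidence, higher support,
    shorter antecedent. Remaining ties are broken in favour of the
    rule whose class occurs more often in the training labels.
    """
    class_freq = Counter(training_labels)
    return sorted(
        rules,
        key=lambda r: (
            -r.confidence,
            -r.support,
            len(r.antecedent),
            -class_freq[r.label],
        ),
    )


# Made-up example: the first two rules tie on every assumed criterion.
rules = [
    Rule(frozenset({"x=1"}), "minority", 0.90, 0.20),
    Rule(frozenset({"y=2"}), "majority", 0.90, 0.20),
    Rule(frozenset({"z=3"}), "minority", 0.95, 0.10),
]
training_labels = ["majority"] * 7 + ["minority"] * 3

ranked = rank_rules(rules, training_labels)
ranked_labels = [r.label for r in ranked]
```

Without the last key component, the order of the two tied rules would depend on their input order; with it, the rule predicting the majority class is placed first among the tied pair.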


Feature and Language Selection in Temporal Symbolic Regression for Interpretable Air Quality Modelling

Algorithms ◽

10.3390/a14030076 ◽

2021 ◽

Vol 14 (3) ◽

pp. 76

Author(s):

Estrella Lucena-Sánchez ◽

Guido Sciavicco ◽

Ionel Eduard Stan

Keyword(s):

Air Quality ◽

Fundamental Problem ◽

Multivariate Time Series ◽

Quality Data ◽

Data Sets ◽

Air Quality Modelling ◽

Regression Problem ◽

Car Traffic ◽

Pollutant Concentrations ◽

Language Selection

Air quality modelling that relates meteorological, car traffic, and pollution data is a fundamental problem, approached in several different ways in the recent literature. In particular, a set of such data sampled at a specific location and during a specific period of time can be seen as a multivariate time series, and modelling the values of the pollutant concentrations can be seen as a multivariate temporal regression problem. In this paper, we propose a new method for symbolic multivariate temporal regression, and we apply it to several data sets that contain real air quality data from the city of Wrocław (Poland). Our experiments show that our approach is superior to classical approaches, especially symbolic ones, both in statistical performance and in the interpretability of the results.


Extraction of knowledge on protein–protein interaction by association rule discovery

Bioinformatics ◽

10.1093/bioinformatics/18.5.705 ◽

2002 ◽

Vol 18 (5) ◽

pp. 705-714 ◽

Cited By ~ 53

Author(s):

T. Oyama ◽

K. Kitano ◽

K. Satou ◽

T. Ito

Keyword(s):

Protein Interaction ◽

Association Rule ◽

Rule Discovery ◽

Protein Protein Interaction ◽

Association Rule Discovery


Associative Classification of Mammograms Based on Parallel Mining of Image Blocks

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237212500470 ◽

2012 ◽

Vol 24 (06) ◽

pp. 513-524

Author(s):

Mohsen Alavash Shooshtari ◽

Keivan Maghooli ◽

Kambiz Badie

Keyword(s):

Association Rules ◽

Classification Systems ◽

Classification Model ◽

Automated Classification ◽

Associative Classification ◽

Classification Problems ◽

Association Rules Mining ◽

Parallel Mining ◽

Transactional Databases ◽

Unique Decision

One of the main objectives of data mining, as a promising multidisciplinary field in computer science, is to provide a classification model to be used for decision support purposes. In the medical imaging domain, mammogram classification is a difficult diagnostic task which calls for the development of automated classification systems. Associative classification, as a special case of association rules mining, has been adopted in classification problems for years. In this paper, an associative classification framework based on parallel mining of image blocks is proposed for mammogram discrimination. Specifically, association rules mining is applied to a commonly used mammography image database to classify digital mammograms into three categories, namely normal, benign and malignant. To do so, images are first preprocessed, and then features are extracted from non-overlapping image blocks and discretized for rule discovery. Association rules are then discovered through parallel mining of the transactional databases that correspond to the image blocks, and are finally used within a unique decision-making scheme to predict the class of unknown samples. Experiments are conducted to assess the effectiveness of the proposed framework. Results show that the proposed framework proved successful in terms of accuracy, precision, and recall, and suggest that the framework could be used as the core of any future associative classifier to support mammogram discrimination.


Spatial and temporal distribution of slip for the 1992 Landers, California, earthquake

Bulletin of the Seismological Society of America ◽

10.1785/bssa0840030668 ◽

1994 ◽

Vol 84 (3) ◽

pp. 668-691 ◽

Cited By ~ 44

Author(s):

David J. Wald ◽

Thomas H. Heaton

Keyword(s):

Strike Slip ◽

Quality Data ◽

Data Sets ◽

Rupture Velocity ◽

Frequency Range ◽

Fault Surface ◽

Landers Earthquake ◽

Finite Fault ◽

Multiple Data Sets ◽

Fault Length

We have determined a source rupture model for the 1992 Landers earthquake (Mw 7.2) compatible with multiple data sets, spanning a frequency range from zero to 0.5 Hz. Geodetic survey displacements, near-field and regional strong motions, broadband teleseismic waveforms, and surface offset measurements have been used explicitly to constrain both the spatial and temporal slip variations along the model fault surface. Our fault parameterization involves a variable-slip, multiple-segment, finite-fault model which treats the diverse data sets in a self-consistent manner, allowing them to be inverted both independently and in unison. The high-quality data available for the Landers earthquake provide an unprecedented opportunity for direct comparison of rupture models determined from independent data sets that sample both a wide frequency range and a diverse spatial station orientation with respect to the earthquake slip and radiation pattern. In all models, consistent features include the following: (1) similar overall dislocation patterns and amplitudes with seismic moments of 7 to 8 × 10^26 dyne-cm (seismic potency of 2.3 to 2.7 km^3); (2) very heterogeneous, unilateral strike slip distributed over a fault length of 65 km and over a width of at least 15 km, though slip is limited to shallower regions in some areas; (3) a total rupture duration of 24 sec and an average rupture velocity of 2.7 km/sec; and (4) substantial variations of slip with depth relative to measured surface offsets. The extended rupture length and duration of the Landers earthquake also allowed imaging of the propagating rupture front with better resolution than for those of prior shorter-duration, strike-slip events. Our imaging allows visualization of the rupture evolution, including local differences in slip durations and variations in rupture velocity. Rupture velocity decreases markedly at shallow depths, as well as near regions of slip transfer from one fault segment to the next, as rupture propagates northwestward along the multiply segmented fault length. The rupture front slows as it reaches the northern limit of the Johnson Valley/Landers faults where slip is transferred to the southern Homestead Valley fault; an abrupt acceleration is apparent following the transfer. This process is repeated, and is more pronounced, as slip is again passed from the northern Homestead Valley fault to the Emerson fault. Although the largest surface offsets were observed at the northern end of the rupture, our modeling indicates that substantial rupture was also relatively shallow (less than 10 km) in this region.
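
The moment, magnitude and potency figures quoted in this abstract can be cross-checked with a short calculation. The Python sketch below is not taken from the paper: it assumes the standard moment magnitude relation Mw = (2/3) log10(M0) - 10.7 with M0 in dyne-cm, and the definition of seismic potency as moment divided by rigidity. Pairing the lower moment with the lower potency (and the upper with the upper) is also an assumption, and the function names are hypothetical.

```python
import math


def moment_magnitude(m0_dyne_cm):
    """Moment magnitude from seismic moment in dyne-cm (standard relation)."""
    return (2.0 / 3.0) * math.log10(m0_dyne_cm) - 10.7


def implied_rigidity(m0_dyne_cm, potency_km3):
    """Rigidity in dyne/cm^2 implied by moment = rigidity * potency."""
    potency_cm3 = potency_km3 * 1e15  # 1 km^3 = 1e15 cm^3
    return m0_dyne_cm / potency_cm3


# Values quoted in the abstract: moment 7 to 8 x 10^26 dyne-cm,
# potency 2.3 to 2.7 km^3 (lower/upper pairing is assumed).
mw_low = moment_magnitude(7e26)
mw_high = moment_magnitude(8e26)
mu_low = implied_rigidity(7e26, 2.3)
mu_high = implied_rigidity(8e26, 2.7)
```

Both ends of the quoted moment range round to a magnitude of 7.2, matching the value given for the earthquake, and the implied rigidity is close to 3 × 10^11 dyne/cm^2 under the assumed pairing.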
