An Exploration And Validation of Visual Factors in Understanding Classification Rule Sets

Author(s):  
Jun Yuan ◽  
Oded Nov ◽  
Enrico Bertini
2020 ◽  
Vol 27 (2) ◽  
pp. 353-374
Author(s):  
RICARDO P. BEAUSOLEIL

This paper presents an application of the Tabu Search algorithm to association rule mining. We focus specifically on classification rule mining, often called associative classification, where the consequent of each rule is a class label. Our approach searches for a rule set that is handled as a single individual. A tabu search algorithm is used to search for Pareto-optimal rule sets with respect to evaluation criteria such as accuracy and complexity. We first apply the Apriori algorithm to mine association rules and then use a multiobjective tabu search to select rules. We report experimental results that examine the effect of our multiobjective rule selection on several well-known benchmark data sets from the UCI machine learning repository.
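The search above treats a whole rule set as one solution and keeps the Pareto-optimal ones with respect to accuracy and complexity. The sketch below is only a hedged illustration of the Pareto-dominance test that such a search relies on. It does not implement the authors' tabu search. The candidate names and objective values are invented, and complexity is assumed to be measured as the number of rules.

```python
from typing import List, Tuple

# (name, accuracy, number_of_rules): accuracy is maximised, rule count minimised.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    _, acc_a, size_a = a
    _, acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates (input order kept)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


if __name__ == "__main__":
    # Hypothetical rule sets that a search might have visited.
    pool = [("A", 0.82, 12), ("B", 0.80, 5), ("C", 0.78, 7),
            ("D", 0.85, 20), ("E", 0.80, 6)]
    for name, acc, size in pareto_front(pool):
        print(f"{name}: accuracy={acc:.2f}, rules={size}")
```

In this toy pool, C and E are dropped because B is at least as accurate and uses fewer rules. A, B and D remain as genuine trade-offs between accuracy and size.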


2013 ◽  
Vol 20 (05) ◽  
pp. 644-652
Author(s):  
ATTIYA KANWAL ◽  
SAHAR FAZAL ◽  
SOHAIL ASGHAR ◽  
Muhammad Naeem

Background: The pandemic of metabolic disorders is accelerating in the urbanized world, posing a huge burden to health and the economy. The key precursor of most metabolic disorders is diabetes mellitus. A newly discovered form of diabetes is Maturity-Onset Diabetes of the Young (MODY). MODY is a monogenic form of diabetes that is inherited as an autosomal dominant disorder. To date, 11 different MODY genes have been reported. Objective: This study aims to discover subgroups in the biological text documents related to these genes in a public domain database. Data Source: The data set was obtained from PubMed. Period: September-December 2011. Materials and Methodology: The APRIORI-SD subgroup discovery algorithm is used to discover subgroups. The well-known association rule learning algorithm APRIORI is first modified into the classification rule learning algorithm APRIORI-C. APRIORI-C generates rules from the discretized data set with the minimum support set to 0.42% and no confidence threshold. In total, 580 rules are generated at this support. APRIORI-C is then further adapted into APRIORI-SD. Results: Experimental results demonstrate that APRIORI discovers substantially smaller rule sets, in which each rule has higher support and significance. The rules obtained by APRIORI-C are ordered by weighted relative accuracy. Conclusion: Only the first 66 rules are ordered, as they cover the relations among all 11 MODY genes. These 66 rules are further organized into 11 different subgroups. Evaluation of the obtained results against the literature shows that APRIORI-SD is a competitive subgroup discovery algorithm. All of the associations among the genes proved to be true.
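The rules above are ordered by weighted relative accuracy (WRAcc). Under its standard definition, WRAcc(Class ← Cond) = p(Cond) · (p(Class | Cond) − p(Class)). This rewards rules that cover many examples and whose class distribution differs from the overall one. The sketch below is a minimal illustration using plain counts. It omits any example re-weighting that a full subgroup discovery algorithm may apply. The attribute `term`, its values and the rule names are hypothetical and are not drawn from the study's PubMed data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Example = Dict[str, str]


@dataclass
class Rule:
    name: str
    condition: Callable[[Example], bool]  # antecedent Cond
    target: str                           # consequent Class


def wracc(rule: Rule, data: List[Example], label_key: str = "class") -> float:
    """Weighted relative accuracy: p(Cond) * (p(Class | Cond) - p(Class))."""
    if not data:
        return 0.0
    covered = [ex for ex in data if rule.condition(ex)]
    if not covered:
        return 0.0
    n = len(data)
    p_cond = len(covered) / n
    p_class = sum(ex[label_key] == rule.target for ex in data) / n
    p_class_given_cond = sum(ex[label_key] == rule.target for ex in covered) / len(covered)
    return p_cond * (p_class_given_cond - p_class)


def order_by_wracc(rules: List[Rule], data: List[Example]) -> List[Rule]:
    """Sort rules from highest to lowest WRAcc."""
    return sorted(rules, key=lambda r: wracc(r, data), reverse=True)


if __name__ == "__main__":
    # Hypothetical discretised records.
    data = [
        {"term": "insulin", "class": "yes"}, {"term": "insulin", "class": "yes"},
        {"term": "insulin", "class": "no"}, {"term": "liver", "class": "no"},
        {"term": "liver", "class": "no"}, {"term": "liver", "class": "yes"},
        {"term": "kidney", "class": "no"}, {"term": "kidney", "class": "no"},
    ]
    rules = [
        Rule("r1", lambda ex: ex["term"] == "insulin", "yes"),
        Rule("r2", lambda ex: ex["term"] == "liver", "yes"),
        Rule("r3", lambda ex: ex["term"] == "kidney", "no"),
    ]
    for r in order_by_wracc(rules, data):
        print(r.name, round(wracc(r, data), 4))
```

Rule `r2` receives a negative score because the examples it covers are less likely to be "yes" than the data set as a whole. Ordering by WRAcc therefore pushes it to the end.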


Author(s):  
Silvia Chiusano ◽  
Paolo Garza

In this chapter, the authors present a comparative study of five well-known classification rule pruning methods. The aim is to understand their theoretical foundations and their impact on classification accuracy, percentage of pruned rules, model size, and the number of misclassified and unclassified data. Moreover, they analyze the characteristics of both the selected and the pruned rule sets in terms of information content. A large set of experiments was run to empirically evaluate the effect of the pruning methods, both individually and in combination.
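The abstract does not name the five pruning methods it compares. One widely used pruning technique in associative classification is database coverage. In this approach, rules are ranked, and a rule is kept only if it correctly classifies at least one training record that no earlier kept rule has covered. The sketch below shows that idea only as an example of the kind of method being compared. It is not necessarily implemented as in the chapter. The ranking order (confidence, then support, then shorter antecedent) is a common convention that we assume here, and the records and rules are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], str]  # (items present in the record, class label)


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    label: str
    confidence: float
    support: float

    def covers(self, items: FrozenSet[str]) -> bool:
        return self.antecedent <= items


def database_coverage_prune(rules: List[Rule], data: List[Record]) -> List[Rule]:
    """Keep a rule only if it correctly classifies at least one still-uncovered record."""
    # Rank by confidence, then support, then shorter antecedent (assumed convention).
    ranked = sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))
    remaining = list(data)
    kept: List[Rule] = []
    for rule in ranked:
        if not remaining:
            break  # every training record is already covered
        covered = [rec for rec in remaining if rule.covers(rec[0])]
        if any(label == rule.label for _, label in covered):
            kept.append(rule)
            # Records covered by a selected rule leave the pool.
            remaining = [rec for rec in remaining if not rule.covers(rec[0])]
    return kept


if __name__ == "__main__":
    data = [(frozenset({"a", "b"}), "x"), (frozenset({"a"}), "x"),
            (frozenset({"b", "c"}), "y"), (frozenset({"c"}), "y")]
    rules = [Rule(frozenset({"a"}), "x", 1.0, 0.5),
             Rule(frozenset({"c"}), "y", 1.0, 0.5),
             Rule(frozenset({"a", "b"}), "x", 1.0, 0.25),
             Rule(frozenset({"b"}), "y", 0.5, 0.25)]
    for r in database_coverage_prune(rules, data):
        print(sorted(r.antecedent), "->", r.label)
```

In the toy run, the two general rules cover every record, so the more specific rule `{a, b} -> x` and the weaker rule `{b} -> y` are pruned.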


Data Mining ◽  
2011 ◽  
pp. 191-208 ◽  
Author(s):  
Rafael S. Parpinelli ◽  
Heitor S. Lopes ◽  
Alex A. Freitas

This work proposes a rule discovery algorithm called Ant-Miner (Ant Colony-Based Data Miner). The goal of Ant-Miner is to extract classification rules from data. The algorithm is based on recent research on the behavior of real ant colonies as well as on some data mining concepts. We compare the performance of Ant-Miner with that of the well-known C4.5 algorithm on six public domain data sets. The results provide evidence that: (a) Ant-Miner is competitive with C4.5 with respect to predictive accuracy; and (b) the rule sets discovered by Ant-Miner are simpler (smaller) than those discovered by C4.5.
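In ant-colony rule miners, the pheromone trail on each candidate term (attribute = value) is reinforced according to the quality of the rules that used it. The sketch below follows our reading of the original Ant-Miner description, which we state here as an assumption. Rule quality is sensitivity × specificity. Every term starts with pheromone 1 / (total number of attribute values). Terms used in a rule are increased by τ·Q, and all trails are then normalised. The probabilistic, heuristic-guided rule construction step is omitted, and the attribute names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Term = Tuple[str, str]  # (attribute, value)


def rule_quality(tp: int, fp: int, fn: int, tn: int) -> float:
    """Q = sensitivity * specificity = TP/(TP+FN) * TN/(FP+TN)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (fp + tn) if fp + tn else 0.0
    return sensitivity * specificity


def initial_pheromone(domains: Dict[str, List[str]]) -> Dict[Term, float]:
    """Every term starts with 1 / (total number of attribute values)."""
    total = sum(len(values) for values in domains.values())
    return {(attr, v): 1.0 / total for attr, values in domains.items() for v in values}


def update_pheromone(tau: Dict[Term, float], rule_terms: Set[Term],
                     q: float) -> Dict[Term, float]:
    """Reinforce terms used in the rule by tau * Q, then normalise all trails."""
    reinforced = {t: (v + v * q if t in rule_terms else v) for t, v in tau.items()}
    total = sum(reinforced.values())
    # Normalisation lowers unused terms relative to used ones (acts as evaporation).
    return {t: v / total for t, v in reinforced.items()}


if __name__ == "__main__":
    tau = initial_pheromone({"outlook": ["sunny", "rain"], "windy": ["yes", "no"]})
    q = rule_quality(tp=8, fp=2, fn=2, tn=8)
    tau = update_pheromone(tau, {("outlook", "sunny")}, q)
    for term, value in sorted(tau.items()):
        print(term, round(value, 4))
```

Because the trails are renormalised after each update, there is no separate evaporation parameter. Terms that are never used in good rules simply lose probability mass over time.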


2004 ◽  
Vol 29 (4) ◽  
pp. 635-674 ◽  
Author(s):  
Elena Baralis ◽  
Silvia Chiusano

Author(s):  
H. Y. Gu ◽  
H. T. Li ◽  
Z. Y. Liu ◽  
C. Y. Shao

A classification rule set, which consists of features and decision rules, is important for land cover classification. In GEOBIA, features and decision rules are often selected through an iterative trial-and-error approach; however, this is time-consuming and has poor versatility. This study puts forward a rule set building method for land cover classification based on human knowledge and machine learning. Machine learning is used to build rule sets efficiently, overcoming the iterative trial-and-error approach. Human knowledge is used to address the insufficient use of prior knowledge in existing machine learning methods and to improve the versatility of the rule sets. A two-step workflow is introduced. First, an initial rule set is built based on Random Forest and a CART decision tree. Second, the initial rules are analyzed and validated based on human knowledge, using statistical confidence intervals to determine their thresholds. The test site is located in the city of Potsdam. We utilised TOP, DSM and ground truth data. The results show that the method can determine a rule set for land cover classification semi-automatically, and that there are static features for different land cover classes.
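The abstract says that rule thresholds are set using statistical confidence intervals, but it does not give the construction. The sketch below assumes one simple option: a normal-approximation range of mean ± z·s computed from one class's training values of one feature. Strictly speaking, this is a range for individual values rather than a confidence interval for the mean, which would use s/√n. The choice is ours and is not the authors' method. The feature name `ndsm_height`, the class "building" and the sample values are hypothetical.

```python
import statistics
from typing import List, Tuple


def interval_bounds(values: List[float], z: float = 1.96) -> Tuple[float, float]:
    """Rule bounds mean +/- z * s from one class's training values of one feature.

    Assumption: normal approximation. This bounds individual values; a
    confidence interval for the mean would use s / sqrt(n) instead.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, needs >= 2 values
    return mean - z * s, mean + z * s


def rule_fires(value: float, bounds: Tuple[float, float]) -> bool:
    """True if the feature value falls inside the rule's threshold range."""
    low, high = bounds
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical nDSM heights (m) of training objects labelled "building".
    heights = [8.0, 9.5, 10.0, 11.0, 12.5]
    low, high = interval_bounds(heights)
    print(f"IF {low:.2f} <= ndsm_height <= {high:.2f} THEN building")
    print(rule_fires(10.0, (low, high)), rule_fires(2.0, (low, high)))
```

In a workflow like the one described, such bounds could be compared with the split values proposed by the CART tree, and the data-driven threshold could then be kept or adjusted by an analyst.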


2002 ◽  
Vol 7 (1) ◽  
pp. 31-42
Author(s):  
J. Šaltytė ◽  
K. Dučinskas

This paper investigates the Bayesian classification rule used to classify observations of (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices. The influence of observed data augmentation on the Bayesian risk is examined for three different, widely applicable nonlinear spatial correlation models. An explicit expression of the Bayesian risk for the classification of augmented data is derived. The models are compared numerically in terms of the variability of the Bayesian risk for the first-order neighbourhood scheme.
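For two Gaussian populations with a common covariance matrix and 0-1 loss, the Bayesian risk reduces to the misclassification probability of the linear Bayes rule. This probability depends only on the priors and the Mahalanobis distance Δ between the means. The sketch below computes that textbook quantity to show what the paper's risk expressions generalise. It does not model the spatial data augmentation or the specific correlation models studied in the paper. The function names are our own.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bayes_error(delta: float, pi1: float = 0.5) -> float:
    """Bayes error for two Gaussian populations with common covariance.

    delta: Mahalanobis distance between the two means (> 0).
    pi1:   prior probability of population 1; pi2 = 1 - pi1.
    The rule assigns x to population 1 when W(x) > k = log(pi2 / pi1),
    where W ~ N(+delta^2/2, delta^2) under population 1 and
    W ~ N(-delta^2/2, delta^2) under population 2.
    """
    pi2 = 1.0 - pi1
    k = math.log(pi2 / pi1)
    err1 = normal_cdf((k - delta ** 2 / 2.0) / delta)   # P(assign 2 | from 1)
    err2 = normal_cdf((-k - delta ** 2 / 2.0) / delta)  # P(assign 1 | from 2)
    return pi1 * err1 + pi2 * err2


if __name__ == "__main__":
    # Single observation with means 0 and 2 and unit variance: delta = |0 - 2| / 1.
    for d in (1.0, 2.0, 3.0):
        print(d, round(bayes_error(d), 4))
```

With equal priors, the expression collapses to Φ(−Δ/2). For example, means of 0 and 2 with unit variance give Δ = 2 and an error of about 0.159.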


2001 ◽  
Vol 6 (2) ◽  
pp. 15-28 ◽  
Author(s):  
K. Dučinskas ◽  
J. Šaltytė

We consider the problem of classifying a realisation of a stationary univariate Gaussian random field into one of two populations with different means and different factorised covariance matrices. In this case, the optimal classification rule, in the sense of minimum probability of misclassification, is associated with a non-linear (quadratic) discriminant function. The unknown means and covariance matrices of the feature vector components are estimated from spatially correlated training samples using the maximum likelihood approach, assuming the spatial correlations to be known. An explicit formula for the Bayes error rate and the first-order asymptotic expansion of the expected error rate associated with the quadratic plug-in discriminant function are presented. A set of numerical calculations is performed for the spherical spatial correlation function, and two different spatial sampling designs are compared.
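Two ingredients of this setup can be sketched: the spherical correlation function and the quadratic discriminant that arises when the two populations have different variances. We read "factorised" as Σ_l = σ_l²·C, where C is a spatial correlation matrix with unit diagonal. This is an assumption on our part. Under it, a single site has variance σ_l² in population l. The sketch uses known parameters, so it involves no plug-in estimation and no asymptotic expansion. It places sites on a line purely for illustration, and the function names are our own.

```python
import math
from typing import List


def spherical_correlation(h: float, a: float) -> float:
    """Spherical correlation with range a > 0: 1 - 1.5(h/a) + 0.5(h/a)^3 if h < a, else 0."""
    if h >= a:
        return 0.0
    r = h / a
    return 1.0 - 1.5 * r + 0.5 * r ** 3


def correlation_matrix(sites: List[float], a: float) -> List[List[float]]:
    """Correlation matrix C for sites on a line (distance = |s - t|)."""
    return [[spherical_correlation(abs(s - t), a) for t in sites] for s in sites]


def quadratic_discriminant(z: float, mu1: float, var1: float,
                           mu2: float, var2: float, pi1: float = 0.5) -> float:
    """log(pi1 * f1(z)) - log(pi2 * f2(z)) for two univariate normals.

    Positive values assign z to population 1. When var1 != var2 the
    decision boundary is quadratic in z. The -0.5*log(2*pi) terms cancel.
    """
    pi2 = 1.0 - pi1
    log_f1 = -0.5 * math.log(var1) - (z - mu1) ** 2 / (2.0 * var1)
    log_f2 = -0.5 * math.log(var2) - (z - mu2) ** 2 / (2.0 * var2)
    return log_f1 - log_f2 + math.log(pi1 / pi2)


if __name__ == "__main__":
    for row in correlation_matrix([0.0, 1.0, 3.0], a=2.0):
        print(row)
    # Population 2 has the larger variance, so far-out values go to it.
    for z in (0.0, 1.5, 10.0):
        print(z, round(quadratic_discriminant(z, 0.0, 1.0, 3.0, 4.0), 3))
```

The spherical function reaches exactly zero at the range `a`. As a result, sites farther apart than `a` are uncorrelated, which is one reason the choice of spatial sampling design matters.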


2019 ◽  
Author(s):  
Sawyer Reid Stippa ◽
George Petropoulos ◽  
Leonidas Toulios ◽  
Prashant K. Srivastava

Archaeological site mapping is important both for understanding history and for protecting sites from excavation during development activities. Because archaeological sites generally spread over large areas, high spatial resolution remote sensing imagery is becoming increasingly applicable worldwide. The main objective of this study was to map the land cover of the Itanos area of Crete and its changes, with a specific focus on detecting the landscape’s archaeological features. Six satellite images were acquired from the Pleiades and WorldView-2 satellites over a period of 3 years. In addition, digital photography of two known archaeological sites was used for validation. An Object Based Image Analysis (OBIA) classification was subsequently developed using the five acquired satellite images. Two rule sets were created: one using the four standard bands common to both satellites, and another for the two WorldView-2 images that also included their four extra bands. Validation of the thematic maps produced from the classification scenarios confirmed a difference in accuracy among the five images. Comparing the 4-band rule set with the 8-band rule set showed a slight increase in classification accuracy when the extra bands were used. The resulting classifications showed a good level of accuracy, exceeding 70%. Yet separating the archaeological sites from open spaces with little or no vegetation proved challenging, mainly because of the high spectral similarity between rocks and archaeological ruins. The spatial resolution of the satellite data allowed larger archaeological sites to be defined accurately, but smaller areas of interest remained difficult to distinguish. The digital photography data provided a very good 3D representation of the archaeological sites and also helped to validate the satellite-derived classification maps.
All in all, our study provides further evidence that high resolution imagery may allow archaeological sites to be located, but only where the archaeological features are of a suitable size.
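The abstract reports classification accuracies above 70% but does not state which metric was used. Overall accuracy and Cohen's kappa computed from an error (confusion) matrix are the usual measures in remote sensing. Per-class producer's accuracy shows which classes are confused, such as ruins versus bare rock. The sketch below computes these measures. The class names and counts in the demo are hypothetical and are not results from the study.

```python
from typing import List

Matrix = List[List[int]]  # rows = reference class, columns = predicted class


def overall_accuracy(cm: Matrix) -> float:
    """Share of all reference samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def producers_accuracy(cm: Matrix) -> List[float]:
    """Per-class share of reference samples that were labelled correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def cohen_kappa(cm: Matrix) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(len(cm))]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical counts for vegetation, bare ground/rock, archaeological ruins.
    cm = [[40, 5, 5],
          [4, 30, 16],
          [1, 9, 20]]
    print(f"OA={overall_accuracy(cm):.3f}, kappa={cohen_kappa(cm):.3f}")
    print("producer's accuracy:", [round(a, 3) for a in producers_accuracy(cm)])
```

In the toy matrix, most errors fall between bare ground/rock and ruins. This mirrors the kind of spectral confusion the study describes, and producer's accuracy makes it visible even when overall accuracy looks acceptable.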


2021 ◽  
Vol 11 (6) ◽  
pp. 2511
Author(s):  
Julian Hatwell ◽  
Mohamed Medhat Gaber ◽  
R. Muhammad Atif Azad

This research presents Gradient Boosted Tree High Importance Path Snippets (gbt-HIPS), a novel, heuristic method for explaining gradient boosted tree (GBT) classification models by extracting a single classification rule (CR) from the ensemble of decision trees that make up the GBT model. This CR contains the most statistically important boundary values of the input space as antecedent terms. The CR represents a hyper-rectangle of the input space inside which the GBT model very reliably assigns all instances the same class label as the explanandum instance. In a benchmark test using nine data sets and five competing state-of-the-art methods, gbt-HIPS offered the best trade-off between coverage (0.16–0.75) and precision (0.85–0.98). Unlike competing methods, gbt-HIPS is also demonstrably guarded against under- and over-fitting. A further distinguishing feature of our method is that, unlike much prior work, our explanations also provide counterfactual detail in accordance with widely accepted recommendations for what makes a good explanation.
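The rule described above is a hyper-rectangle evaluated by coverage and precision. The sketch below assumes the definitions common in the rule-based explanation literature. Coverage is the fraction of instances that fall inside the hyper-rectangle. Precision is the fraction of covered instances to which the model assigns the explanandum's label. The sketch does not implement the gbt-HIPS path-snippet extraction. The toy model, the feature `income` and the interval convention (lower, upper], chosen to resemble tree splits of the form x ≤ t, are all assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, float]
Box = Dict[str, Tuple[float, float]]  # feature -> (lower, upper); math.inf for open ends


def in_box(x: Instance, box: Box) -> bool:
    """True if x lies inside the hyper-rectangle, using lower < value <= upper."""
    return all(lo < x[f] <= hi for f, (lo, hi) in box.items())


def coverage_precision(box: Box, data: List[Instance],
                       model: Callable[[Instance], str],
                       target_label: str) -> Tuple[float, float]:
    """Coverage = share of data inside the box; precision = share of covered
    instances that the model labels with target_label."""
    covered = [x for x in data if in_box(x, box)]
    coverage = len(covered) / len(data) if data else 0.0
    if not covered:
        return coverage, 0.0
    precision = sum(model(x) == target_label for x in covered) / len(covered)
    return coverage, precision


if __name__ == "__main__":
    # Hypothetical stand-in for a trained GBT model.
    def toy_model(x: Instance) -> str:
        return "high" if x["income"] > 50 else "low"

    data = [{"income": v} for v in [10.0, 30.0, 45.0, 55.0, 60.0, 80.0, 20.0, 90.0]]
    rule = {"income": (40.0, math.inf)}  # IF income > 40 THEN "high"
    print(coverage_precision(rule, data, toy_model, "high"))
```

Widening the box raises coverage but admits more instances the model labels differently, which lowers precision. This is the trade-off the benchmark numbers above summarise.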

