A Probabilistic Classifier System and Its Application in Data Mining

The article presents a new Classifier System (CS) framework for classification tasks called BYP CS (BaYesian Predictive Classifier System). Rather than focusing on high accuracy, the proposed CS approach addresses a well-posed Data Mining goal: uncovering the low-uncertainty patterns of dependence that appear frequently in the data. To attain this goal, BYP CS relies on a substantial amount of probabilistic machinery, which brings its representation language closer to related methods in statistics and machine learning. On the practical side, the new algorithm is observed to yield stable learning of compact populations that still retain respectable predictive power. Furthermore, the emerging rules self-organize in interesting ways, sometimes providing unexpected solutions to certain benchmark problems.
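
The abstract does not spell out BYP CS's probabilistic machinery. The code below is a purely illustrative sketch of what a "low-uncertainty" rule could mean. It assumes each rule counts how often its prediction is correct on the examples it matches and places a Beta prior on that rate. The posterior mean then serves as a predictive probability, and the posterior standard deviation serves as an uncertainty measure. The `RuleStats` class, the uniform Beta(1, 1) prior, and the counts are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Hypothetical per-rule counters: matched examples and correct predictions."""
    correct: int = 0
    matched: int = 0
    alpha0: float = 1.0  # Beta prior pseudo-count for "correct" (assumed uniform prior)
    beta0: float = 1.0   # Beta prior pseudo-count for "incorrect"

    def update(self, was_correct: bool) -> None:
        # Record one matched example and whether the rule's prediction was right.
        self.matched += 1
        self.correct += int(was_correct)

    def posterior_params(self) -> tuple[float, float]:
        # Conjugate Beta-Bernoulli update: add observed successes and failures.
        a = self.alpha0 + self.correct
        b = self.beta0 + (self.matched - self.correct)
        return a, b

    def predictive_prob(self) -> float:
        # Posterior predictive probability that the next matched example is predicted correctly.
        a, b = self.posterior_params()
        return a / (a + b)

    def posterior_std(self) -> float:
        # Standard deviation of the Beta(a, b) posterior; small values mean low uncertainty.
        a, b = self.posterior_params()
        var = a * b / ((a + b) ** 2 * (a + b + 1))
        return math.sqrt(var)


rule = RuleStats()
for outcome in [True] * 18 + [False] * 2:
    rule.update(outcome)

print(round(rule.predictive_prob(), 4))
print(round(rule.posterior_std(), 4))
```

Under these assumptions, a rule with 18 correct predictions out of 20 matches has the posterior Beta(19, 3), so its predictive probability is 19/22. A rule matched fewer times would have a wider posterior even at the same success rate.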

Predicting Browsers and Purchasers of Hotel Websites

Cornell Hospitality Quarterly, Vol 54 (1), pp. 38-48, 2012. DOI: 10.1177/1938965512468225. Cited by ~16.

Author(s): Edmond H. C. Wu, Rob Law, Brianda Jiang

Keyword(s): Data Mining, Hong Kong, Predictive Power, Decision Rules, Customer Segmentation, Marketing Strategies, Weight Of Evidence, Marketing Resources, Hotel Websites, Analytical Power

A study of the online browsing and purchasing habits of some 1,400 outbound travelers in Hong Kong demonstrates the analytical power of weight-of-evidence (WOE) data mining. The WOE approach allows analysts to identify and transform the variables with the most predictive power for tourists' online preferences and decisions. The study found that just over one-third of the respondents browsed hotel-related websites, and about half of those browsers had booked a room on those sites. Browsers in Hong Kong tended to be young, well educated, and well traveled. Those who used the hotel websites for purchases were, of course, part of the browser group and were likewise relatively well educated. However, one unexpected variable set purchasers apart: the length of their most recent trip. One possible reason is that long-haul tourists want to be sure of their accommodations. Alternatively, the effect may reflect hotels' free-night offers. Model-based customer segmentation and decision rules can help hospitality practitioners manage their marketing resources and activities effectively and strengthen information-based marketing strategies for attracting target customers.
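
WOE is commonly defined per bin as the log of the share of events in that bin divided by the share of non-events in that bin. The information value (IV) sums (share of events minus share of non-events) times WOE over all bins. The abstract does not give the study's binning or sign convention, so the code below is only a generic sketch. The function name `woe_iv`, the 0.5 smoothing pseudo-count, and the toy age bands are assumptions for illustration.

```python
import math
from collections import Counter


def woe_iv(bins, labels, smoothing=0.5):
    """Compute weight of evidence per bin and the total information value.

    bins:      bin label of each respondent (e.g. a hypothetical age band)
    labels:    1 for the event of interest (e.g. booked online), 0 otherwise
    smoothing: assumed pseudo-count added to every cell so empty cells do not give log(0)
    """
    events = Counter(b for b, y in zip(bins, labels) if y == 1)
    non_events = Counter(b for b, y in zip(bins, labels) if y == 0)
    categories = sorted(set(bins))
    k = len(categories)
    total_events = sum(events.values()) + smoothing * k
    total_non = sum(non_events.values()) + smoothing * k

    woe, iv = {}, 0.0
    for c in categories:
        # Share of all events and of all non-events that fall into this bin.
        p_event = (events[c] + smoothing) / total_events
        p_non = (non_events[c] + smoothing) / total_non
        woe[c] = math.log(p_event / p_non)
        # Each bin contributes (difference in shares) * WOE to the information value.
        iv += (p_event - p_non) * woe[c]
    return woe, iv


# Hypothetical toy data: 1 = booked a hotel online, 0 = did not.
bins = ["18-29"] * 4 + ["30-44"] * 4 + ["45+"] * 4
labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
woe, iv = woe_iv(bins, labels)
print(woe, round(iv, 3))
```

With these toy counts, the 18-29 band gets a positive WOE because events are over-represented there. The 30-44 band gets zero, and the 45+ band gets the mirror-image negative value. Sign conventions differ across sources, and some define WOE as the log of non-events over events.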

Development of a soldering quality classifier system using a hybrid data mining approach

Expert Systems with Applications, Vol 39 (5), pp. 5727-5738, 2012. DOI: 10.1016/j.eswa.2011.11.097. Cited by ~14.

Author(s): Tsung-Nan Tsai

Keyword(s): Data Mining, Classifier System, Data Mining Approach, Hybrid Data

Estimating student ability and problem difficulty using item response theory (IRT) and TrueSkill

Information Discovery and Delivery, Vol 47 (2), pp. 67-75, 2019. DOI: 10.1108/idd-08-2018-0030. Cited by ~1.

Author(s): Youngjin Lee

Keyword(s): Data Mining, Item Response Theory, Problem Solving, Learning Environment, Item Response, Predictive Power, Response Theory, Content Type, Computer Based Learning, Computer Based

Purpose: The purpose of this paper is to investigate an efficient means of estimating the ability of students solving problems in a computer-based learning environment.

Design/methodology/approach: Item response theory (IRT) and TrueSkill were applied to simulated and real problem-solving data to estimate the ability of students solving homework problems in a massive open online course (MOOC). Based on the estimated ability, data mining models were developed to predict whether students can correctly solve homework and quiz problems in the MOOC. The predictive power of the IRT- and TrueSkill-based data mining models was compared in terms of the area under the receiver operating characteristic curve (AUC).

Findings: The correlation between students' ability estimated from IRT and from TrueSkill was strong. In addition, the IRT- and TrueSkill-based data mining models showed comparable predictive power when the data included a large number of students. When the data included only a small number of students, IRT failed to estimate students' ability and could not predict their problem-solving performance, whereas TrueSkill did not experience such problems.

Originality/value: Estimating students' ability is critical for determining the most appropriate time to provide instructional scaffolding in a computer-based learning environment. The findings of this study suggest that TrueSkill can be an efficient means of estimating the ability of students solving problems in a computer-based learning environment, regardless of the number of students.
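
In the one-parameter (Rasch) IRT model, the probability that a student with ability theta answers an item of difficulty b correctly is 1 / (1 + exp(-(theta - b))). The sketch below is not the paper's code. It assumes ability and difficulty estimates are already available, turns them into predicted probabilities, and scores those predictions with AUC. Here AUC is computed as the probability that a randomly chosen correct response receives a higher score than a randomly chosen incorrect one, with ties counting as one half. All student, item, and outcome values are made up, and TrueSkill estimation itself is not shown.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """One-parameter logistic (Rasch) IRT model: P(correct | ability theta, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical estimates: student abilities and item difficulties on the same logit scale.
abilities = {"s1": 1.2, "s2": -0.3, "s3": 0.4}
difficulties = {"hw1": -0.5, "hw2": 0.8}

# Hypothetical observed outcomes (1 = solved correctly).
responses = [("s1", "hw1", 1), ("s1", "hw2", 1), ("s2", "hw1", 1),
             ("s2", "hw2", 0), ("s3", "hw1", 1), ("s3", "hw2", 0)]

scores = [rasch_prob(abilities[s], difficulties[i]) for s, i, _ in responses]
labels = [y for _, _, y in responses]
print(round(auc(scores, labels), 3))
```

The pairwise loop is quadratic in the number of responses. That cost is fine for illustration, but a sort-based rank computation scales better on MOOC-sized data.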

Combining accuracy and success-rate to improve the performance of eXtended Classifier System (XCS) for data-mining and control applications

Engineering Applications of Artificial Intelligence, Vol 26 (8), pp. 1930-1935, 2013. DOI: 10.1016/j.engappai.2013.04.004. Cited by ~3.

Author(s): M. Shariat Panahi, A. Karkhaneh Yousefi, M. Khorshidi

Keyword(s): Data Mining, Success Rate, Control Applications, Classifier System, And Control

Robust on-line neural learning classifier system for data stream classification tasks

Soft Computing, Vol 18 (8), pp. 1441-1461, 2014. DOI: 10.1007/s00500-014-1233-9. Cited by ~5.

Author(s): Andreu Sancho-Asensio, Albert Orriols-Puig, Elisabet Golobardes

Keyword(s): Data Stream, Learning Classifier System, Stream Classification, Data Stream Classification, Classifier System, Learning Classifier, Neural Learning, On Line, Classification Tasks

Filter Variable Selection Algorithm Using Risk Ratios for Dimensionality Reduction of Healthcare Data for Classification

Processes, Vol 7 (4), pp. 222, 2019. DOI: 10.3390/pr7040222. Cited by ~4.

Author(s): Bodur, Atsa'am

Keyword(s): Data Mining, Variable Selection, Feature Space, Selection Methods, Selection Algorithm, Fisher Score, Healthcare Data, Classification Tasks, Risk Ratios, Variable Ranking

This research developed and tested a filter algorithm that reduces the feature space of healthcare datasets. The algorithm binarizes the dataset, separately evaluates the risk ratio of each predictor with respect to the response, and outputs ratios that represent the association between each predictor and the class attribute. The strength of this association gives the importance rank of the corresponding predictor in determining the outcome. Using random forest and logistic regression classification, the performance of the developed algorithm was compared against the regsubsets and varImp functions, which are unsupervised methods of variable selection. The proposed algorithm was likewise compared with the supervised Fisher score and Pearson's correlation feature selection methods. Several datasets were used in the experiments, and in the majority of cases the predictors selected by the new algorithm outperformed those selected by the existing algorithms. The proposed filter algorithm is therefore a reliable alternative for variable ranking in data mining classification tasks with a dichotomous response.
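
For a binary predictor x and a binary response y, the risk ratio is P(y = 1 | x = 1) / P(y = 1 | x = 0). A value of 1 means no association. The abstract does not detail the paper's binarization step or exactly how ratios are turned into ranks. The sketch below therefore makes several assumptions. It assumes predictors are already 0/1 and applies a 0.5 continuity correction to every cell. It then ranks predictors by |ln RR|, so strong protective and strong risk factors rank alike. The functions `risk_ratio` and `rank_predictors` and the toy columns are hypothetical.

```python
import math


def risk_ratio(x, y, smoothing=0.5):
    """Risk ratio P(y=1 | x=1) / P(y=1 | x=0) for a binary predictor x and binary response y.

    A pseudo-count (continuity correction, an assumption here) is added to each cell
    of the 2x2 table so that empty cells do not cause division by zero.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) + smoothing  # x=1, event
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0) + smoothing  # x=1, no event
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) + smoothing  # x=0, event
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0) + smoothing  # x=0, no event
    return (a / (a + b)) / (c / (c + d))


def rank_predictors(data, y):
    """Rank binarized predictors by |log RR|, i.e. distance from 'no association' (RR = 1)."""
    scores = {name: abs(math.log(risk_ratio(col, y))) for name, col in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical binarized healthcare data: 1 = condition present / outcome occurred.
y = [1, 1, 1, 1, 0, 0, 0, 0]
data = {
    "smoker":   [1, 1, 1, 0, 0, 0, 0, 0],  # positively associated with the outcome
    "male":     [1, 0, 1, 0, 1, 0, 1, 0],  # no association
    "exercise": [0, 0, 0, 1, 1, 1, 1, 0],  # negatively associated
}
print(rank_predictors(data, y))
```

Ranking on the absolute log ratio is one design choice among several. Ranking on the raw ratio would instead place protective factors (RR below 1) at the bottom.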

Swarm intelligence for data mining classification tasks: an experimental study using medical decision problems

Swarm Intelligence - Volume 3: Applications, pp. 403-428, 2018. DOI: 10.1049/pbce119h_ch14.

Author(s): Jose A. Saez, Emilio Corchado

Keyword(s): Data Mining, Experimental Study, Swarm Intelligence, Decision Problems, Medical Decision, Classification Tasks

Blog Data Mining: The Predictive Power of Sentiments

Data Mining for Business Applications, pp. 183-195, 2008. DOI: 10.1007/978-0-387-79420-4_13. Cited by ~7.

Author(s): Yang Liu, Xiaohui Yu, Xiangji Huang, Aijun An

Keyword(s): Data Mining, Predictive Power

Approach to Clustering with Variance-Based XCS

Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol 21 (5), pp. 885-894, 2017. DOI: 10.20965/jaciii.2017.p0885. Cited by ~1.

Author(s): Caili Zhang, Takato Tatsumi, Masaya Nakata, Keiki Takadama, ...

Keyword(s): Data Mining, Knowledge Discovery, Real World, Input Data, Bench Mark, Learning Classifier System, Classifier System, Learning Classifier, Real World Problems

This paper presents an approach to clustering that extends the variance-based Learning Classifier System (XCS-VR). In real-world problems, the ability to combine similar rules is crucial in knowledge discovery and data mining. Conventionally, XCS-VR can acquire generalized rules, but it cannot derive more general rules from them. The proposed approach (called XCS-VRc) accomplishes this by integrating similar generalized rules. To validate the proposed approach, we designed a benchmark problem to examine whether XCS-VRc can cluster both the generalized and the more generalized features in the input data. XCS-VRc proved to be more efficient than both XCS and the conventional XCS-VR.
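
The abstract does not describe how XCS-VRc decides that two rules are similar or how it integrates them. The code below is therefore only a generic illustration of merging interval-based rules, not the XCS-VRc mechanism. It assumes each rule is a box with one (lower, upper) interval per input dimension. Two rules count as similar when their intervals overlap or nearly touch in every dimension, and the merged rule is their bounding box, which is more general than either. The `Rule` class, the `similar`, `merge`, and `integrate` functions, and the `tol` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical interval rule: one (lower, upper) bound per input dimension."""
    bounds: tuple[tuple[float, float], ...]


def similar(r1: Rule, r2: Rule, tol: float = 0.05) -> bool:
    # Assumed criterion: in every dimension the intervals overlap
    # or are separated by a gap of at most `tol`.
    return all(lo1 <= hi2 + tol and lo2 <= hi1 + tol
               for (lo1, hi1), (lo2, hi2) in zip(r1.bounds, r2.bounds))


def merge(r1: Rule, r2: Rule) -> Rule:
    # The merged rule covers the bounding box of both conditions, i.e. it is more general.
    return Rule(tuple((min(lo1, lo2), max(hi1, hi2))
                      for (lo1, hi1), (lo2, hi2) in zip(r1.bounds, r2.bounds)))


def integrate(rules: list[Rule], tol: float = 0.05) -> list[Rule]:
    """Repeatedly merge a pair of similar rules until no similar pair remains."""
    rules = list(rules)
    merged = True
    while merged:
        merged = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                if similar(rules[i], rules[j], tol):
                    new_rule = merge(rules[i], rules[j])
                    del rules[j]  # delete the higher index first so index i stays valid
                    del rules[i]
                    rules.append(new_rule)
                    merged = True
                    break
            if merged:
                break
    return rules


rules = [
    Rule(((0.0, 0.2), (0.0, 0.2))),
    Rule(((0.22, 0.4), (0.1, 0.3))),  # nearly touches the first rule
    Rule(((0.8, 1.0), (0.8, 1.0))),   # far from both
]
print(integrate(rules))
```

Each merge removes one rule from the list, so the loop always terminates. Whether the result is a good clustering depends entirely on the assumed similarity criterion.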

Learning classifier system ensemble for data mining

Proceedings of the 2005 workshops on Genetic and evolutionary computation - GECCO '05, 2005. DOI: 10.1145/1102256.1102268. Cited by ~5.

Author(s): Yang Gao, Joshua Zhexue Huang, Hongqiang Rong, Daqian Gu

Keyword(s): Data Mining, Learning Classifier System, Classifier System, Learning Classifier
