A Probabilistic Classifier System and Its Application in Data Mining

2006, Vol. 14 (2), pp. 183-221
Author(s): Jorge Muruzábal

The article introduces a new Classifier System (CS) framework for classification tasks called BYP CS (BaYesian Predictive Classifier System). Rather than focusing on high accuracy, the proposed CS approach addresses a well-posed Data Mining goal: uncovering the low-uncertainty patterns of dependence that often manifest in the data. To attain this goal, BYP CS relies on a fair amount of probabilistic machinery, which brings its representation language closer to related methods of interest in statistics and machine learning. In practice, the new algorithm was observed to yield stable learning of compact populations that still retain a respectable amount of predictive power. Furthermore, the emerging rules self-organize in interesting ways, sometimes providing unexpected solutions to certain benchmark problems.
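The abstract does not spell out BYP CS's internal machinery. As a hedged illustration of what "Bayesian predictive" rule evaluation can look like, the sketch below assumes that each rule keeps class counts for the training examples it matches and turns them into a Dirichlet-multinomial posterior predictive distribution. The entropy of that distribution then stands in as a measure of how uncertain a rule's prediction is. The function names, the symmetric prior `alpha`, the two toy rules, and the entropy criterion are all assumptions for illustration, not the article's method.

```python
import math
from collections import Counter

def predictive_distribution(labels, classes, alpha=1.0):
    """Posterior predictive class probabilities under an assumed symmetric Dirichlet(alpha) prior."""
    counts = Counter(labels)
    total = len(labels) + alpha * len(classes)
    return {c: (counts[c] + alpha) / total for c in classes}

def entropy(dist):
    """Shannon entropy in bits; a lower value means a more certain (lower-uncertainty) prediction."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

classes = ["A", "B"]
# Class labels of the training examples matched by two hypothetical rules
rule_1 = ["A"] * 18 + ["B"] * 2   # strongly dominated by class A
rule_2 = ["A"] * 10 + ["B"] * 10  # evenly split between classes

for name, matched in [("rule_1", rule_1), ("rule_2", rule_2)]:
    dist = predictive_distribution(matched, classes)
    print(name, {c: round(p, 3) for c, p in dist.items()}, round(entropy(dist), 3))
```

Under these assumptions, a rule whose matched examples are dominated by one class has lower predictive entropy, which is the kind of low-uncertainty pattern the abstract describes.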

2012, Vol. 54 (1), pp. 38-48
Author(s): Edmond H. C. Wu, Rob Law, Brianda Jiang

A study of the online browsing and purchasing habits of some 1,400 outbound travelers in Hong Kong demonstrates the analytical power of weight-of-evidence (WOE) data mining. The WOE approach allows analysts to identify and transform the variables with the most predictive power for tourists’ online preferences and decisions. The study found that just over one-third of the respondents browsed hotel-related websites, and about half of those browsers had booked a room on those sites. Browsers in Hong Kong tended to be young, well educated, and well traveled. Those who used the hotel websites for purchases were, of course, part of the browser group, and were likewise relatively well educated. However, one unexpected variable distinguished those who used the websites to purchase a hotel room: the length of their most recent trip. One possible reason is that long-haul tourists want to be sure of their accommodations; alternatively, this may reflect hotels’ free-night offers. Model-based customer segmentation and decision rules can help hospitality practitioners manage their marketing resources and activities effectively, and strengthen information-based marketing strategies for attracting target customers.
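WOE itself is a standard transformation, best known from credit scoring. A minimal sketch, assuming a binary outcome (booked online vs. did not) and a predictor that has already been binned: each bin's WOE is the natural log of its share of all events divided by its share of all non-events, and the information value (IV) sums the share differences weighted by WOE. The trip-length bins and counts below are hypothetical toy numbers, not the study's data, and the 0.5 smoothing term is an assumption added to avoid taking the log of zero.

```python
import math

def weight_of_evidence(bins, smoothing=0.5):
    """Per-bin WOE and total information value (IV) for a binned predictor.

    bins maps a bin label to (events, non_events). `smoothing` is added to every
    cell (an assumption) so that empty cells do not produce log(0).
    """
    k = len(bins)
    total_events = sum(e for e, _ in bins.values()) + smoothing * k
    total_non = sum(n for _, n in bins.values()) + smoothing * k
    woe, iv = {}, 0.0
    for label, (events, non_events) in bins.items():
        p_event = (events + smoothing) / total_events  # share of all events in this bin
        p_non = (non_events + smoothing) / total_non   # share of all non-events in this bin
        woe[label] = math.log(p_event / p_non)
        iv += (p_event - p_non) * woe[label]
    return woe, iv

# Hypothetical counts: (booked online, did not book) per trip-length bin
trip_length = {"1-3 days": (10, 90), "4-7 days": (30, 70), "8+ days": (60, 40)}
woe, iv = weight_of_evidence(trip_length)
print({b: round(w, 3) for b, w in woe.items()}, round(iv, 3))

# Transform raw (binned) observations into their WOE values for a downstream model
encoded = [woe[b] for b in ["8+ days", "1-3 days", "4-7 days"]]
```

Some texts define WOE with the ratio inverted (non-events over events), which flips the sign of each WOE value but leaves the IV unchanged.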


2019, Vol. 47 (2), pp. 67-75
Author(s): Youngjin Lee

Purpose: The purpose of this paper is to investigate an efficient means of estimating the ability of students solving problems in a computer-based learning environment.

Design/methodology/approach: Item response theory (IRT) and TrueSkill were applied to simulated and real problem-solving data to estimate the ability of students solving homework problems in a massive open online course (MOOC). Based on the estimated ability, data mining models were developed to predict whether students could correctly solve homework and quiz problems in the MOOC. The predictive power of the IRT- and TrueSkill-based data mining models was compared in terms of the area under the receiver operating characteristic curve (AUC).

Findings: The correlation between students’ ability estimated from IRT and from TrueSkill was strong. In addition, the IRT- and TrueSkill-based data mining models showed comparable predictive power when the data included a large number of students. When the data included only a small number of students, however, IRT failed to estimate students’ ability and could not predict their problem-solving performance, whereas TrueSkill did not experience such problems.

Originality/value: Estimating students’ ability is critical for determining the most appropriate time to provide instructional scaffolding in a computer-based learning environment. The findings of this study suggest that TrueSkill can be an efficient means of estimating the ability of students solving problems in a computer-based learning environment, regardless of the number of students.
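The abstract names neither the IRT variant nor the data mining models used. As a hedged sketch, the code below assumes a one-parameter (Rasch) IRT model, in which the probability of a correct answer depends only on the gap between a student's ability `theta` and an item's difficulty `b`, and computes AUC directly from its definition: the probability that a randomly chosen correct attempt receives a higher score than a randomly chosen incorrect one. The attempt data are made-up values, and TrueSkill estimation itself is not shown.

```python
import math

def rasch_probability(theta, b):
    """1PL (Rasch) IRT: probability of a correct answer for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical (ability, difficulty, solved?) triples; abilities could come from IRT or TrueSkill
attempts = [(1.2, 0.0, 1), (0.3, 0.5, 1), (-0.8, 0.2, 0),
            (0.6, 0.0, 0), (1.5, 1.0, 1), (-1.0, -0.5, 1)]
scores = [rasch_probability(theta, b) for theta, b, _ in attempts]
labels = [y for _, _, y in attempts]
print(round(auc(scores, labels), 3))  # → 0.625
```

The pairwise loop is quadratic in the number of attempts; for real data, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.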


Processes, 2019, Vol. 7 (4), pp. 222
Author(s): Bodur, Atsa’am

This research developed and tested a filter algorithm that reduces the feature space in healthcare datasets. The algorithm binarizes the dataset, evaluates the risk ratio of each predictor with respect to the response separately, and outputs ratios that represent the association between each predictor and the class attribute. The value of each ratio is translated into the importance rank of the corresponding predictor in determining the outcome. Using Random Forest and logistic regression classification, the performance of the developed algorithm was compared against the regsubsets and varImp functions, which are unsupervised methods of variable selection. The proposed algorithm was also compared with the supervised Fisher score and Pearson’s correlation feature selection methods. Several datasets were used in the experiments, and in most cases the predictors selected by the new algorithm outperformed those selected by the existing algorithms. The proposed filter algorithm is therefore a reliable alternative for variable ranking in data mining classification tasks with a dichotomous response.
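The abstract describes the pipeline only at a high level: binarize, compute a per-predictor risk ratio (RR) against the response, then rank. A minimal sketch under stated assumptions: continuous predictors are binarized at their median, a 0.5 continuity correction keeps empty cells from causing division by zero, and predictors are ranked by |ln RR| so that strongly protective (RR < 1) and strongly harmful (RR > 1) associations both rank highly. All three choices, the helper names, and the `age`/`bmi` toy data are hypothetical rather than the authors' specification.

```python
import math

def binarize(column, threshold=None):
    """Hypothetical binarization rule: 1 if value > threshold (default: column median), else 0."""
    if threshold is None:
        ordered = sorted(column)
        mid = len(ordered) // 2
        threshold = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > threshold else 0 for v in column]

def risk_ratio(x, y, eps=0.5):
    """RR = P(y=1 | x=1) / P(y=1 | x=0); eps is added to every 2x2 cell to avoid division by zero."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))  # predictor present, outcome present
    b = pairs.count((1, 0))  # predictor present, outcome absent
    c = pairs.count((0, 1))  # predictor absent, outcome present
    d = pairs.count((0, 0))  # predictor absent, outcome absent
    return ((a + eps) / (a + b + 2 * eps)) / ((c + eps) / (c + d + 2 * eps))

def rank_predictors(data, y):
    """Rank predictors by |ln RR|, i.e. by distance from 'no association' (RR = 1)."""
    scores = {name: abs(math.log(risk_ratio(binarize(col), y))) for name, col in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: binary outcome and two continuous predictors
y = [1, 1, 1, 0, 0, 0, 1, 0]
data = {
    "age": [70, 65, 80, 30, 40, 35, 60, 45],
    "bmi": [22, 30, 25, 28, 21, 27, 24, 26],
}
for name, score in rank_predictors(data, y):
    print(name, round(score, 3))
```

Ranking on |ln RR| rather than on RR itself treats RR = 0.5 and RR = 2 as equally informative; a study interested only in risk-increasing predictors might sort on RR directly instead.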


Author(s): Yang Liu, Xiaohui Yu, Xiangji Huang, Aijun An

Author(s): Caili Zhang, Takato Tatsumi, Masaya Nakata, Keiki Takadama, ...

This paper presents a clustering approach that extends the variance-based Learning Classifier System (XCS-VR). In real-world problems, the ability to combine similar rules is crucial for knowledge discovery and data mining. XCS-VR can acquire generalized rules, but it cannot derive more general rules from them. The proposed approach, called XCS-VRc, accomplishes this by integrating similar generalized rules. To validate it, we designed a benchmark problem that examines whether XCS-VRc can cluster both the generalized and the more generalized features in the input data. XCS-VRc proved to be more efficient than both XCS and the conventional XCS-VR.

