information coefficient
Recently Published Documents


TOTAL DOCUMENTS: 108 (FIVE YEARS 46)
H-INDEX: 11 (FIVE YEARS 3)

2021, Vol ahead-of-print (ahead-of-print)
Author(s): Zhibin Xiong, Jun Huang

Purpose Ensemble models that combine multiple base classifiers are widely used to improve prediction performance in credit risk evaluation. However, selecting base classifiers arbitrarily is problematic. This paper develops a framework for selecting base classifiers to improve the overall classification performance of an ensemble model.
Design/methodology/approach Base classifier selection is treated as a feature selection problem in which the output of each base classifier is considered a feature. The proposed method, correlation-based classifier selection using the maximum information coefficient (MIC-CCS), selects the features (classifiers) by nonlinear optimization programming that seeks to optimize the relationship between the accuracy and the diversity of the base classifiers, based on MIC.
Findings The empirical results show that ensemble models perform better than stand-alone classifiers. The ensemble model based on MIC-CCS outperforms both the ensembles with unselected base classifiers and the ensembles built with traditional forward and backward selection methods. In addition, the ensemble model performs better when correlation is measured with MIC than when it is measured with the Pearson correlation coefficient.
Research limitations/implications The study provides an alternative way to select base classifiers that differ significantly from one another and can therefore provide complementary information. Because the selected classifiers also have good predictive capability, the classification performance of the ensemble model improves.
Originality/value This paper introduces MIC into the correlation-based selection process to better capture nonlinear and nonfunctional relationships in a complex credit data structure, and it constructs a novel nonlinear programming model for base classifier selection that has not been used in other studies.
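
The accuracy-versus-diversity idea in this abstract can be illustrated with a small sketch. It is not the authors' MIC-CCS method: the abstract describes a nonlinear optimization program, whereas the code below swaps in a simple greedy heuristic, and it uses normalized mutual information between predicted labels as a stand-in for MIC. The function names, the penalty weight `lam`, and the toy predictions are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(a):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(a)
    return -sum((c / n) * math.log(c / n) for c in Counter(a).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )


def dependence(a, b):
    """Normalized mutual information in [0, 1]; a stand-in for MIC, not MIC itself."""
    h = min(entropy(a), entropy(b))
    return mutual_information(a, b) / h if h > 0 else 0.0


def accuracy(pred, y):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def select_classifiers(preds, y, k, lam=0.5):
    """Greedily pick k classifiers (assumed heuristic, not the paper's program).

    Each candidate is scored by its accuracy minus lam times its mean
    dependence on the classifiers already chosen, so accurate but redundant
    classifiers are penalized.
    """
    chosen = []
    remaining = set(preds)

    def score(name):
        if chosen:
            redundancy = sum(dependence(preds[name], preds[c]) for c in chosen) / len(chosen)
        else:
            redundancy = 0.0
        return accuracy(preds[name], y) - lam * redundancy

    while remaining and len(chosen) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: true labels and the predictions of three base classifiers.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
predictions = {
    "A": [1, 0, 1, 0, 1, 0, 1, 1],  # one error
    "B": [1, 0, 1, 0, 1, 0, 1, 1],  # identical to A, so fully redundant
    "C": [0, 1, 1, 0, 1, 0, 1, 0],  # two errors, but made on different samples
}
selected = select_classifiers(predictions, y_true, k=2)
print(selected)
```

In this toy case the slightly less accurate classifier C is preferred over B as the second pick, because B duplicates A and adds no complementary information.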


2021
Author(s): Shuliang Wang, Tisinee Surapunt

Abstract A Bayesian network (BN) is a probabilistic inference model that describes explicit cause-and-effect relationships and can be used to examine the complex system of rice prices under data uncertainty. However, discovering the optimal structure among a super-exponential number of graphs in the search space is an NP-hard problem. This paper proposes the Bayesian maximal information coefficient (BMIC), which uncovers causal correlations in a large dataset from a random system by integrating a probabilistic graphical model (PGM) and the maximal information coefficient (MIC) with Bayesian linear regression (BLR). First, MIC captures strong dependence between the predictor variables and the target variable, which reduces the number of variables for BN structure learning in the PGM. Second, BLR assigns edge orientations in the graph according to a posterior probability distribution, which matches the BN's need, via Bayes' theorem, for a conditional probability distribution of each node given its parents. Third, the Bayesian information criterion (BIC) serves as an indicator of how well a model explains its data. The proposed method obtains the highest BIC score when compared with the two traditional learning algorithms. Finally, BMIC is applied to a large dataset on Thai rice prices to identify changes in causality for the paddy price of Jasmine rice. The experimental results show that BMIC returns directional relationships that give clues to the cause(s) and effect(s) of the paddy price, with a better heuristic search.
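
The first step described above, using MIC to keep only predictors that depend strongly on the target, can be sketched roughly as follows. This is a crude approximation under stated assumptions, not the BMIC procedure and not the reference MIC algorithm: it only tries equal-frequency grids rather than optimizing the grid placement, so it gives a lower-bound style estimate. The function names, the grid budget exponent `alpha`, the threshold, and the synthetic variables are all hypothetical.

```python
import math
import random
from collections import Counter


def equal_freq_bins(values, n_bins):
    """Assign each value to one of n_bins bins of (nearly) equal size, by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y]))
        for (x, y), c in pab.items()
    )


def approx_mic(x, y, alpha=0.6):
    """Crude MIC-style score in [0, 1].

    Searches grids with nx * ny <= n ** alpha, bins both variables by
    equal frequency, and keeps the largest mutual information normalized
    by log(min(nx, ny)). True MIC also optimizes where the grid lines go.
    """
    budget = len(x) ** alpha
    best = 0.0
    nx = 2
    while nx * 2 <= budget:
        ny = 2
        while nx * ny <= budget:
            mi = mutual_information(equal_freq_bins(x, nx), equal_freq_bins(y, ny))
            best = max(best, mi / math.log(min(nx, ny)))
            ny += 1
        nx += 1
    return best


def screen_predictors(data, target, threshold=0.3):
    """Keep predictors whose approximate MIC with the target reaches the threshold."""
    scores = {
        name: approx_mic(column, data[target])
        for name, column in data.items()
        if name != target
    }
    return {name: s for name, s in scores.items() if s >= threshold}


# Hypothetical synthetic data: one predictor drives the target nonlinearly,
# the other is independent noise.
rng = random.Random(0)
x_quadratic = [rng.uniform(-1.0, 1.0) for _ in range(200)]
x_noise = [rng.uniform(-1.0, 1.0) for _ in range(200)]
data = {
    "x_quadratic": x_quadratic,
    "x_noise": x_noise,
    "target": [v * v for v in x_quadratic],
}
kept = screen_predictors(data, "target")
print(sorted(kept))
```

The quadratic relationship has near-zero linear correlation with the target, which is the kind of dependence a grid-based score is meant to pick up; the surviving variables would then be passed on to structure learning.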

