Bayesian Maximal Information Coefficient (BMIC) to Reason Novel Trends in Large Datasets
Abstract A Bayesian network (BN) is a probabilistic inference model that describes explicit cause-and-effect relationships, and it can be used to examine a complex system such as the rice price under data uncertainty. However, discovering the optimal structure among a super-exponential number of graphs in the search space is an NP-hard problem. In this paper, the Bayesian maximal information coefficient (BMIC) is proposed to uncover causal correlations in a large dataset from a random system by integrating a probabilistic graphical model (PGM) and the maximal information coefficient (MIC) with Bayesian linear regression (BLR). First, MIC captures the strong dependences between predictor variables and a target variable, which reduces the number of variables for BN structure learning in the PGM. Second, BLR assigns edge orientations in the graph based on the resulting posterior probability distribution. This matches what a BN requires: by Bayes' theorem, a conditional probability distribution for each node given its parents. Third, the Bayesian information criterion (BIC) is used as an indicator of how well the model explains its data, to ensure correctness. The proposed method obtains the highest BIC score compared with the two traditional learning algorithms. Finally, BMIC is applied to a large dataset on Thai rice prices to discover causal correlations by identifying causality changes in the paddy price of Jasmine rice. The experimental results show that the proposed BMIC returns directional relationships that give clues for identifying the cause(s) and effect(s) on the paddy price, with a better heuristic search.
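As an illustration of the BIC scoring step, the following minimal sketch scores one node of a linear-Gaussian network with and without a candidate parent. It is not the paper's implementation: the function name `gaussian_bic`, the synthetic data, and the higher-is-better convention (log-likelihood minus a complexity penalty) are assumptions made here for illustration only.

```python
import numpy as np

def gaussian_bic(y, X):
    """BIC score (higher is better) of a linear-Gaussian model y ~ X.

    Hypothetical helper for illustration; X has shape (n, p), p may be 0.
    """
    n = len(y)
    # Design matrix with an intercept column; intercept only if no parents.
    A = np.column_stack([np.ones(n), X]) if X.size else np.ones((n, 1))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / n  # maximum-likelihood noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = A.shape[1] + 1  # regression coefficients plus the noise variance
    return loglik - 0.5 * k * np.log(n)

# Synthetic data (assumed): y depends linearly on x1.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)

score_with_parent = gaussian_bic(y, x1.reshape(-1, 1))
score_without = gaussian_bic(y, np.empty((n, 0)))
print(score_with_parent > score_without)
```

Under this convention, the structure in which `x1` is a parent of `y` receives the higher score, because the gain in log-likelihood outweighs the penalty for the extra parameter.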