Mutual information-based multi-output tree learning algorithm

2021 ◽  
Vol 25 (6) ◽  
pp. 1525-1545
Author(s):  
Hyun-Seok Kang ◽  
Chi-Hyuck Jun

A tree model with low time complexity can support the application of artificial intelligence to industrial systems. Variable selection-based tree learning algorithms are more time-efficient than existing Classification and Regression Tree (CART) algorithms. To the best of our knowledge, no previous work has addressed categorical input variables in variable selection-based multi-output tree learning. Moreover, for multi-output regression trees, the conventional variable selection-based algorithm is not suitable for large datasets. We propose a mutual information-based multi-output tree learning algorithm that consists of variable selection and split optimization. The proposed method discretizes each variable into 2–4 clusters using k-means and selects the splitting variable by computing mutual information on the discretized variables. This variable selection component has relatively low time complexity and can be applied regardless of the output dimension and type. The proposed split optimization component is more efficient than an exhaustive search. The performance of the proposed tree learning algorithm is similar to or better than that of a multi-output version of the CART algorithm on a specific dataset. In addition, on a large dataset, the time complexity of the proposed algorithm is significantly lower than that of a CART algorithm.
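The variable selection step described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the quantile-based k-means initialization, the fixed choice of k = 3, and the rule of summing mutual information over the output columns are all assumptions made for illustration, and categorical inputs and split optimization are left out.

```python
import math
from collections import Counter


def kmeans_1d(values, k, iters=50):
    """Cluster scalar values into k groups with plain Lloyd iterations.

    Centers start at evenly spaced quantiles (an assumption; the abstract
    does not say how k-means is initialized). Returns one label per value.
    """
    s = sorted(values)
    centers = [s[int((i + 0.5) * len(s) / k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mutual_information(a, b):
    """Mutual information (in nats) between two equally long label sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def select_split_variable(x_cols, y_cols, k=3):
    """Score each input column by its summed MI with the discretized outputs.

    Summing over outputs is a hypothetical aggregation rule chosen here.
    Returns (index of the best column, list of scores).
    """
    y_disc = [kmeans_1d(col, k) for col in y_cols]
    scores = []
    for col in x_cols:
        x_disc = kmeans_1d(col, k)
        scores.append(sum(mutual_information(x_disc, y) for y in y_disc))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: x0 tracks both outputs, x1 is unrelated to them.
x0 = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0]
x1 = [1.0, 5.0, 9.0, 1.1, 5.1, 9.1, 1.2, 5.2, 9.2]
y0 = [1.0, 1.1, 0.9, 4.0, 4.1, 3.9, 8.0, 8.1, 7.9]
y1 = [10.0, 11.0, 10.5, 20.0, 21.0, 20.5, 30.0, 31.0, 30.5]

best, scores = select_split_variable([x0, x1], [y0, y1])
print(best, scores)
```

On this toy data the informative column x0 is selected. Because every variable is reduced to a handful of cluster labels first, the scoring pass only has to count label pairs, which is where the low cost of this kind of selection comes from.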

Diversity ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 502
Author(s):  
Yang-Liang Gu ◽  
Qi Huang ◽  
Lei Xu ◽  
Eric Zeus Rizo ◽  
Miguel Alonso ◽  
...  

In deserts, pond cladocerans face harsh conditions such as low and erratic rainfall, high evaporation, and highly variable salinity, and their species richness is limited. The few species present can rely on ephippia, or resting eggs, to disperse by wind in such habitats. Thus, environmental selection is assumed to play a major role in community assembly, especially at a fine spatial scale. Located in Inner Mongolia, the Ulan Buh desert has plenty of temporary water bodies and a few permanent lakes filled by groundwater. To determine species diversity and the role of environmental selection in community assembly in such a harsh environment, we sampled 37 sand ponds in June 2012. Fourteen species of Cladocera were found in total, including six pelagic species, eight littoral species, and two benthic species. These cladocerans were mainly temperate and cosmopolitan fauna. Our classification and regression tree model showed that conductivity, dissolved oxygen, and pH were the main factors correlated with species richness in the sand ponds. Spatial analysis using a PCNM model demonstrated a broad-scale spatial structure in the cladoceran communities. Conductivity was the most significant environmental variable explaining cladoceran community variation. Two species, Moina cf. brachiata and Ceriodaphnia reticulata, occurred commonly, with an overlap at intermediate conductivity. Our results therefore support the view that environmental selection plays a major role in structuring cladoceran communities in deserts.


2020 ◽  
Vol 39 (5) ◽  
pp. 6073-6087
Author(s):  
Meltem Yontar ◽  
Özge Hüsniye Namli ◽  
Seda Yanik

Customer behavior prediction has recently been gaining importance in the banking sector, as in other sectors. This study proposes a model to predict whether credit card users will pay their debts. Using the proposed model, the risk of non-payment can be predicted and the necessary actions can be taken in time. To predict customers' payment status for the following month, we use Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART) and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes 10,713 customer records obtained from a well-known bank in Taiwan. These records consist of customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount and amount of credit card payments. We apply cross-validation and hold-out methods to divide our dataset into training and test sets. We then evaluate the algorithms with the proposed performance metrics. We also optimize the parameters of the algorithms to improve prediction performance. The results show that the model built with the CART algorithm, one of the decision tree algorithms, predicts customers' payment status for the next month with high accuracy (about 86%). When the algorithm parameters are optimized, classification accuracy and performance increase.
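The two ways of dividing a dataset into training and test sets mentioned in this abstract, hold-out and cross-validation, can be sketched as index splits. The abstract does not report the test fraction or the number of folds, so the 30% hold-out share, the five folds, the seed, and the function names below are assumptions for illustration only.

```python
import random


def holdout_split(n, test_fraction=0.3, seed=0):
    """Shuffle the indices 0..n-1 once and cut them into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def kfold_splits(n, k=5, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


n_records = 10713  # dataset size reported in the abstract
train_idx, test_idx = holdout_split(n_records)
print(len(train_idx), len(test_idx))

for fold_no, (tr, te) in enumerate(kfold_splits(n_records)):
    print(fold_no, len(tr), len(te))
```

A hold-out split gives a single accuracy estimate from one partition, while the k-fold variant averages over k partitions at k times the training cost.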

