An Explainable Bayesian Decision Tree Algorithm

Author(s):  
Giuseppe Nuti ◽  
Lluís Antoni Jiménez Rugama ◽  
Andreea-Ingrid Cross

Bayesian Decision Trees provide a probabilistic framework that reduces the instability of Decision Trees while maintaining their explainability. While Markov Chain Monte Carlo methods are typically used to construct Bayesian Decision Trees, here we provide a deterministic Bayesian Decision Tree algorithm that eliminates the sampling and does not require a pruning step. This algorithm generates the greedy-modal tree (GMT), which is applicable to both regression and classification problems. We tested the algorithm on various benchmark classification data sets and obtained accuracies similar to those of other known techniques. Furthermore, we show that we can statistically analyze how the GMT was derived from the data and demonstrate this analysis with a financial example. Notably, the GMT allows for a technique that provides simpler, explainable models, which is often a prerequisite for applications in finance or the medical industry.
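To make the idea of a deterministic, sampling-free Bayesian split decision concrete, the following is a minimal sketch for binary classification. It assumes a Beta(a, b) prior on the class probability in each leaf and compares the marginal likelihood of the labels with and without a split; the function names, the uniform prior defaults, and the omission of any prior over split locations are illustrative assumptions, not necessarily the formulation used in the paper.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of one ordered sequence of n binary labels
    containing k positives, under a Beta(a, b) prior (assumed here)."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def log_split_gain(left, right, a=1.0, b=1.0):
    """Log Bayes factor of 'split' versus 'no split'.

    left and right are (positives, total) pairs for the two child nodes.
    A positive value means the data favour the split, so a greedy
    procedure would stop growing a branch when no candidate is positive.
    """
    (k_left, n_left), (k_right, n_right) = left, right
    with_split = log_marginal(k_left, n_left, a, b) + log_marginal(k_right, n_right, a, b)
    without_split = log_marginal(k_left + k_right, n_left + n_right, a, b)
    return with_split - without_split


if __name__ == "__main__":
    # Hypothetical counts: a split that separates the classes perfectly,
    # and one that leaves both children with the parent's class mix.
    print(log_split_gain((5, 5), (0, 5)))
    print(log_split_gain((2, 4), (3, 6)))
```

In this sketch the stopping rule comes from the model comparison itself (no split is made when the gain is not positive), which is one way a tree can be grown without a separate pruning step.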

Classification problems involving high-dimensional data with a small number of observations are becoming more common, especially in microarray data. Prediction accuracy is essential when handling sensitive data, particularly in the medical field, and for this the stability of the selected features must also be evaluated. Therefore, this paper proposes a new evaluation measure that incorporates both the stability of the selected feature subsets and the accuracy of the prediction. A Booster in the feature selection algorithm helps to achieve this. The proposed work handles both structured and unstructured data, using convolutional neural network based multimodal disease prediction and a decision tree algorithm, respectively. The algorithm is tested on a heart disease dataset retrieved from the UCI repository, and the analysis shows improved prediction accuracy.


Author(s):  
Muhamad Hasbullah Bin Mohd Razali ◽  
Rizauddin Bin Saian ◽  
Yap Bee Wah ◽  
Ku Ruhana Ku-Mahamud

Ant-tree-miner (ATM) has an advantage over the conventional decision tree algorithm in terms of feature selection. However, real-world applications commonly involve the imbalanced class problem, where the classes have different importance. This condition prevents the entropy-based heuristic of the existing ATM algorithm from developing effective decision boundaries because of its bias towards the dominant class. Consequently, the induced decision trees are dominated by the majority class and lack predictive ability on the rare class. This study proposes an enhanced algorithm called hellinger-ant-tree-miner (HATM), inspired by the ant colony optimization (ACO) metaheuristic, for imbalanced learning using a decision tree classification algorithm. The proposed algorithm was compared with the existing algorithm, ATM, on nine (9) publicly available imbalanced data sets. A simulation study reveals the superiority of HATM when the sample size increases with a skewed class (Imbalanced Ratio < 50%). Experimental results demonstrate that the performance of the existing algorithm, measured by BACC, has been improved owing to the class skew-insensitiveness of the Hellinger distance. The statistical significance test shows that HATM has a higher mean BACC score than ATM.
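The skew-insensitivity mentioned above can be illustrated with the Hellinger distance as it is commonly defined for scoring a candidate split in two-class problems. The sketch below is an assumption-laden illustration: the function name and the example counts are hypothetical, and how HATM combines this score with the ant colony heuristic is not described in the abstract.

```python
import math


def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the per-branch distributions of the
    positive (rare) class and the negative (majority) class.

    pos_counts[j] and neg_counts[j] are the numbers of positive and
    negative examples falling into branch j of a candidate split.
    The result lies in [0, sqrt(2)]; larger means better separation.
    """
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present at the node")
    squared = 0.0
    for p, n in zip(pos_counts, neg_counts):
        # Each class is normalised by its own total, so the class
        # ratio (the skew) never enters the score.
        squared += (math.sqrt(p / pos_total) - math.sqrt(n / neg_total)) ** 2
    return math.sqrt(squared)


if __name__ == "__main__":
    # Hypothetical counts: the same split evaluated with the majority
    # class ten times larger gives the same score.
    print(hellinger_split_score([8, 2], [30, 70]))
    print(hellinger_split_score([8, 2], [300, 700]))
```

Because each class is normalised separately, multiplying the majority-class counts by a constant leaves the score unchanged, whereas an entropy-based criterion would shift towards the dominant class.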


2021 ◽  
Vol 1802 (3) ◽  
pp. 032090
Author(s):  
Qi Xia ◽  
Yu Wang ◽  
Jian Zhou ◽  
Shengqing Pei ◽  
Zhiqiang Geng ◽  
...  

2020 ◽  
pp. 276-292
Author(s):  
Ivana Podhorska ◽  
Jaromir Vrbka ◽  
George Lazaroiu ◽  
Maria Kovacova

The issue of enterprise financial distress is a topical and interdisciplinary subject for the economic community. Bankruptcy is one of the major externalities of today’s modern economies, which cannot be avoided even with every effort. Where there are investment opportunities, there are individuals and businesses willing to assume financial obligations and the resulting risks in order to maintain and develop their standard of living or their economic activities. The decision tree algorithm is one of the most intuitive data mining methods that can be used for financial distress prediction. A systematization of literary sources and approaches shows that decision trees form part of the innovations in financial management. The main purpose of the research is to examine the possibility of applying a decision tree algorithm to create a prediction model that can be used in economic practice. The paper's main aim is to create a comprehensive prediction model of enterprise financial distress based on decision trees, under the conditions of emerging markets. The methods are based on the decision tree, with emphasis on the CART algorithm. The emerging markets comprised 17 countries: Slovak Republic, Czech Republic, Poland, Hungary, Romania, Bulgaria, Lithuania, Latvia, Estonia, Slovenia, Croatia, Serbia, Russia, Ukraine, Belarus, Montenegro, and Macedonia. The research focuses on the possibilities of implementing a decision tree algorithm to create a prediction model under the conditions of emerging markets. The data covered 2,359,731 enterprises from emerging markets (30% of the total amount), divided into prosperous enterprises (1,802,027) and non-prosperous enterprises (557,704), and were obtained from the Amadeus database. The input variables for the model were 24 financial indicators, 3 dummy variables, and the countries' GDP data for the years 2015 and 2016. For model creation, 80% of the enterprises formed the training sample and 20% the test sample. The model correctly classified 93.2% of enterprises in both the training and the test sample. Correct classification of non-prosperous enterprises was 83.5% in both samples. The research yields a new model for identifying bankrupt enterprises. The created prediction model can be considered sufficiently suitable for classifying enterprises in emerging markets. Keywords: prediction model, decision tree, emerging markets.
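The abstract names CART as the algorithm. For classification, CART conventionally selects splits by the reduction in Gini impurity, which can be sketched as follows. The function names and the example counts are hypothetical, and the abstract does not state which impurity measure or settings the authors actually used.

```python
def gini(counts):
    """Gini impurity of a node given its per-class counts,
    e.g. [prosperous, non_prosperous]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(left, right):
    """Impurity reduction achieved by splitting a node into two children.

    left and right are per-class count lists in the same class order.
    CART-style growth picks the candidate split with the largest gain.
    """
    parent = [l + r for l, r in zip(left, right)]
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    weighted_children = (n_left / n) * gini(left) + (n_right / n) * gini(right)
    return gini(parent) - weighted_children


if __name__ == "__main__":
    # Hypothetical node with 5 prosperous and 5 non-prosperous enterprises.
    print(gini([5, 5]))               # impurity of a balanced node: 0.5
    print(gini_gain([5, 0], [0, 5]))  # perfectly separating split: 0.5
    print(gini_gain([2, 2], [3, 3]))  # split that changes nothing
```

A split on a financial indicator would be scored by counting prosperous and non-prosperous enterprises on each side of a threshold and passing those counts to `gini_gain`.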


2021 ◽  
Vol 2 (01) ◽  
pp. 20-28
Author(s):  
Bahzad Charbuty ◽  
Adnan Abdulazeez

Decision tree classifiers are regarded as one of the most well-known methods for representing classifiers in data classification. Researchers from various fields and backgrounds, such as machine learning, pattern recognition, and statistics, have considered the problem of building a decision tree from available data. The use of decision tree classifiers has been proposed in many ways in fields such as medical disease analysis, text classification, user smartphone classification, images, and many more. This paper provides a detailed treatment of decision trees. Furthermore, the specifics of the papers considered, such as the algorithms/approaches used, datasets, and outcomes achieved, are evaluated and outlined comprehensively. In addition, all of the approaches analyzed are discussed to illustrate the themes of the authors and identify the most accurate classifiers. Finally, the uses of different types of datasets are discussed and their findings are analyzed.


2011 ◽  
Vol 403-408 ◽  
pp. 1002-1007
Author(s):  
Chandra Chandra ◽  
P. Ajitha

Current classification algorithms require large amounts of data to be stored persistently in memory for long periods of time. Diverse classification techniques have already been proposed in the literature for both conventional and distributed environments. Mining decision trees in a distributed environment can handle large amounts of data, but at a high communication cost. A new distributed communication decision tree algorithm is proposed here that reduces the communication cost of transmitting data in a distributed and heterogeneous environment.

