A statistical-heuristic feature selection criterion for decision tree induction

This paper concerns feature selection for computational analysis in authenticating works of art. The various features designed and extracted from art work in art forgery detection or the identification of the characteristics of art work style are valuable only when they have a meaningful influence on a given task such as classification. This paper presents features applicable to authenticating the painting style of Piet Mondrian and demonstrates meaningful features by using two supervised learning algorithms, a decision tree induction algorithm C4.5 and the Feature Generating Machine (FGM), both of which are used to select important features in the course of learning.

Download Full-text

Model Feature Selection dalam Penentuan Parameter Pengelompokan Kompetensi SDM IG

KREA-TIF ◽

10.32832/kreatif.v7i2.2696 ◽

2019 ◽

Vol 7 (2) ◽

pp. 80

Author(s):

Budi Susetyo ◽

Puspa Eosina ◽

Immas Nurhayati ◽

Indupurnahayu Indupurnahayu

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Binary Tree ◽

Subset Selection ◽

Decision Tree Induction

<em>Industri geospasial memiliki prospek bisnis yang berkembang pesat di Indonesia, khususnya di sektor swasta. Untuk mengetahui seberapa besar potensi sumberdaya manusia sesuai dengan kompetensi bidang informasi geospasial tersebut dibutuhkan survey dan analisis terkait parameter beberapa parameter kompetensi. Tujuan penelitian ini adalah mencari pengukuran parameter yang paling mempengaruhi pengelompokan kompetensi sumberdaya manusia bidang informasi geospasial. Penelitian ini menggunakan data profil yang telah diolah menjadi 5 kategori index yaitu WEI, EFI, ENI, CFI, dan CPI. dengan jumlah sampel 46 data. Metode yang digunakan adalah k-means clustering untuk pembentukan cluster kompetensi yang selanjutnya dibandingkan di antara 4 ,5 dan 6 cluster. Evaluasi cluster yang dipilih adalah menggunakan Mean intercluster dissimilarity dengan rumus jarak Euclidean. Dihasilkan bahwa pengelompokan paling optimal adalah 4 cluster dengan nilai intercluster terbesar, yaitu 0.45699. Fature subset selection dilakukan terhadap data yang sudah membentuk 4 cluster untuk melihat parameter yang paling berpengaruh. Untuk hal ini, digunakan metode Decision Tree Induction dengan skema Binary Tree. Diperoleh nilai Impurity terkecil pada atribut EFI, yaitu sebesar 0.6857 yang menunjukkan bahwa atribut EFI adalah parameter yang paling berpengaruh dalam menentukan label sebuah data.</em>

Download Full-text

Cost-Sensitive Decision Tree Induction for Feature Selection and Sequential Minimal Optimisation for Classification: CSattribSelectorC4.5()

International Journal Of Data Mining And Emerging Technologies ◽

10.5958/j.2249-3220.3.2.008 ◽

2013 ◽

Vol 3 (2) ◽

pp. 58 ◽

Cited By ~ 1

Author(s):

Ankit B. Desai ◽

Jalpesh Vasa

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Decision Tree Induction

Download Full-text

Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning

Information Sciences ◽

10.1016/j.ins.2003.03.019 ◽

2004 ◽

Vol 163 (1-3) ◽

pp. 103-122 ◽

Cited By ~ 55

Author(s):

William H. Hsu

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Bayesian Network ◽

Network Structure ◽

Structure Learning ◽

Decision Tree Induction ◽

Bayesian Network Structure ◽

Bayesian Network Structure Learning ◽

Variable Ordering

Download Full-text

A branch & bound algorithm to determine optimal bivariate splits for oblique decision tree induction

Applied Intelligence ◽

10.1007/s10489-021-02281-x ◽

2021 ◽

Author(s):

Ferdinand Bollwein ◽

Stephan Westphal

Keyword(s):

Decision Tree ◽

Feature Space ◽

Classification Problems ◽

Decision Tree Induction ◽

Single Attribute ◽

Global Optimal ◽

The Individual ◽

Tree Building ◽

Very High ◽

Multiclass Classification Problems

AbstractUnivariate decision tree induction methods for multiclass classification problems such as CART, C4.5 and ID3 continue to be very popular in the context of machine learning due to their major benefit of being easy to interpret. However, as these trees only consider a single attribute per node, they often get quite large which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is very high such that heuristics have to be applied to determine local optimal solutions. In this work, we introduce an effective branch and bound procedure to determine global optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts due to their ability of adapting better to the underlying data and capturing interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits despite the fact that we are focusing on two attributes only.

Download Full-text

Embedded Feature Construction in Fuzzy Decision Tree Induction for High Energy Physics Classification

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc42975.2020.9283103 ◽

2020 ◽

Author(s):

Noelie Cherrier ◽

Jean-Philippe Poli ◽

Maxime Defurne ◽

Franck Sabatie

Keyword(s):

Decision Tree ◽

High Energy Physics ◽

High Energy ◽

Feature Construction ◽

Fuzzy Decision ◽

Fuzzy Decision Tree ◽

Decision Tree Induction ◽

Energy Physics

Download Full-text

Diabetes disease prediction using decision tree for feature selection

Journal of Physics Conference Series ◽

10.1088/1742-6596/1964/6/062116 ◽

2021 ◽

Vol 1964 (6) ◽

pp. 062116

Author(s):

Jayakumar Sadhasivam ◽

V Muthukumaran ◽

J Thimmia Raja ◽

Rose Bindu Joseph ◽

Meram Munirathanam ◽

...

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Disease Prediction

Download Full-text

A Practical Tutorial for Decision Tree Induction

ACM Computing Surveys ◽

10.1145/3429739 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-38

Author(s):

Víctor Adrián Sosa Hernández ◽

Raúl Monroy ◽

Miguel Angel Medina-Pérez ◽

Octavio Loyola-González ◽

Francisco Herrera

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Machine Learning Techniques ◽

Evaluation Measures ◽

Decision Tree Induction ◽

Learning Techniques ◽

Tree Models ◽

Evaluation Measure ◽

Main Components ◽

Support Decision Making

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10× 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.

Download Full-text