Decision tree classification: Ranking journals using IGIDI

2019 ◽  
Vol 46 (3) ◽  
pp. 325-339
Author(s):  
Muhammad Shaheen ◽  
Tanveer Zafar ◽  
Sajid Ali Khan

Selecting the attribute to place at a given position in a decision tree (e.g. at its root) is an important decision. Many attribute selection measures, such as Information Gain, Gini Index and Entropy, have been developed for this purpose. The suitability of an attribute generally depends on the diversity of its values, its relevance and its dependency. Different attribute selection measures use different criteria to assess this suitability. The Diversity Index is a classical statistical measure of the diversity of values and, to our knowledge, has never been used as an attribute selection method. In this article, we propose a novel attribute selection method for decision tree classification. In the proposed scheme, the average of Information Gain, Gini Index and Diversity Index is used to assign a weight to each attribute, and the attribute with the highest average value is selected for classification. We empirically tested the proposed algorithm on several data sets of scientific journals and conferences, developed a web-based application named JC-Rank that uses the algorithm, and compared its results with those of existing decision tree classification algorithms.
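The averaging scheme can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a plain arithmetic mean of the three measures, Simpson's index for diversity, and the Gini split impurity flipped (1 - impurity) so that higher is better for all three terms.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split_subsets(values, labels):
    # group the class labels by each distinct attribute value
    for v in set(values):
        yield [l for x, l in zip(values, labels) if x == v]

def info_gain(values, labels):
    n = len(labels)
    return entropy(labels) - sum(
        len(s) / n * entropy(s) for s in split_subsets(values, labels))

def gini_score(values, labels):
    n = len(labels)
    impurity = sum(
        len(s) / n * (1 - sum((c / len(s)) ** 2 for c in Counter(s).values()))
        for s in split_subsets(values, labels))
    return 1 - impurity  # flipped so higher = purer split

def diversity_index(values):
    # Simpson's Diversity Index over the attribute's own values
    n = len(values)
    return 1 - sum((c / n) ** 2 for c in Counter(values).values())

def igidi(values, labels):
    # average of the three measures; the attribute with the highest
    # igidi score would be chosen for the split
    return (info_gain(values, labels) + gini_score(values, labels)
            + diversity_index(values)) / 3
```

An attribute that splits the classes cleanly and has diverse values scores higher than a constant attribute, so it would be selected first.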

2021 ◽  
pp. 1-10
Author(s):  
Chao Dong ◽  
Yan Guo

The wide application of artificial intelligence across fields has accelerated the exploration of the information hidden in large amounts of data. Data mining methods are increasingly applied to higher education management, and decision tree classification, with its high classification accuracy, intuitive decision results and strong generalization ability, is a particularly suitable data analysis method for this task. Addressing the sensitivity of data processing and decision tree classification to noisy data, this paper proposes a variable precision rough set attribute selection criterion based on a scale function, which considers both the weighted approximation accuracy of an attribute and the number of its values. This improves robustness to noisy data, reduces bias in attribute selection and raises classification accuracy. In addition, a suppression factor threshold, support and confidence are introduced into the pre-pruning process, which simplifies the tree structure. Comparative experiments on standard data sets show that the improved algorithm outperforms other decision tree algorithms and can effectively realize differentiated classification for higher education management.
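The support- and confidence-based pre-pruning can be sketched as a stopping rule: a node is not split further when it covers too small a fraction of the training set (support) or is already dominated by one class (confidence). The thresholds below are hypothetical; the paper does not publish its exact values or its suppression-factor formulation.

```python
from collections import Counter

def should_prune(node_labels, total_n, min_support=0.05, min_confidence=0.95):
    """Pre-pruning check for one node of a growing decision tree.

    node_labels -- class labels of the training records reaching this node
    total_n     -- size of the full training set
    Thresholds are illustrative assumptions, not the paper's values.
    """
    support = len(node_labels) / total_n  # fraction of all records here
    confidence = Counter(node_labels).most_common(1)[0][1] / len(node_labels)
    return support < min_support or confidence >= min_confidence
```

A node that is 99% one class is turned into a leaf, as is a node holding a negligible share of the data, keeping the tree compact.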


2018 ◽  
Vol 7 (3.12) ◽  
pp. 344
Author(s):  
Jayesh Deep Dubey ◽  
Deepak Arora ◽  
Pooja Khanna

Analysis of EEG data is one of the most important parts of a Brain Computer Interface (BCI) system, because EEG data carries a substantial amount of information that can be used to study and improve the system. One problem with EEG analysis is the large volume of data produced, not all of which is useful; identifying the relevant data is therefore important. The objective of this study is to evaluate the performance of a Random Forest classifier on motor movement EEG data while reducing the number of electrodes considered in recording and analysis, so that less data is produced and only relevant electrodes are retained. The data set used is the Physionet motor movement/imagery data, which consists of EEG recordings obtained with 64 electrodes. These 64 electrodes were ranked by their information gain with respect to the class using the Info Gain attribute selection algorithm, then divided into four lists: List 1 contains the top 18 ranked electrodes, and each subsequent list adds the next 15 in ranked order, so Lists 2, 3 and 4 contain the top 33, 48 and 64 electrodes respectively. The accuracy of the Random Forest classifier on each list was compared with its accuracy on List 4, which contains all 64 electrodes. Because accuracy was nearly identical for Lists 3 and 4, the additional electrodes in List 4 were rejected. This method reduced the electrode count from 64 to 48 with an average decrease of only 0.9% in classifier accuracy, which can substantially reduce the time and effort required for EEG analysis.
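The ranking-and-subsetting step can be sketched as below. This is a minimal illustration assuming plain information gain over discretized channel values; the function names are ours, not the study's pipeline.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(column, labels):
    n = len(labels)
    remainder = 0.0
    for v in set(column):
        subset = [l for x, l in zip(column, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def rank_electrodes(X, y):
    """Return electrode (column) indices sorted by descending information
    gain, mirroring the Info Gain attribute ranking used in the study."""
    gains = {j: info_gain([row[j] for row in X], y) for j in range(len(X[0]))}
    return sorted(gains, key=gains.get, reverse=True)

def select_top_k(X, ranking, k):
    # keep only the k highest-ranked electrodes of each recording
    keep = set(ranking[:k])
    return [[v for j, v in enumerate(row) if j in keep] for row in X]
```

The retained subset (e.g. the top 48 of 64 channels) would then be fed to the Random Forest classifier in place of the full recording.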


2020 ◽  
Vol 39 (2) ◽  
pp. 1639-1648
Author(s):  
Peng Wang ◽  
Ningchao Zhang

To overcome the poor accuracy and high complexity of current classification algorithms for imbalanced data sets, this paper proposes a decision tree classification algorithm for imbalanced data based on random forest. Wavelet packet decomposition is used to denoise the data, and the SNM algorithm is combined with RFID to remove redundant records from the data sets. The processed imbalanced data sets are then classified with a random forest: using a bootstrap resampling method with certain constraints, the majority- and minority-class samples of each sample subset are drawn, CART is used to train on each subset, and a decision tree is constructed. The final classification result is obtained by voting over the CART decision trees. Experimental results show that the proposed algorithm achieves high classification accuracy with low complexity and is a feasible classification algorithm for imbalanced data sets.
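The constrained resampling and voting steps can be sketched as follows. This is a minimal illustration under one plausible reading of the constraint, namely that each tree's bootstrap subset draws equally from every class; the paper's exact constraint and the CART training itself are not reproduced here.

```python
import random
from collections import Counter

def balanced_bootstrap(X, y, n_per_class, seed=0):
    """Draw n_per_class samples (with replacement) from every class,
    so each tree's training subset is class-balanced."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    picked = [rng.choice(idx)
              for idx in by_class.values()
              for _ in range(n_per_class)]
    return [X[i] for i in picked], [y[i] for i in picked]

def forest_vote(tree_predictions):
    """Majority vote across trees; tree_predictions[t][i] is the label
    tree t assigns to sample i."""
    return [Counter(col).most_common(1)[0][0]
            for col in zip(*tree_predictions)]
```

Each balanced subset would be handed to a CART learner, and `forest_vote` combines the per-tree predictions into the ensemble's output.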


2004 ◽  
Vol 03 (01) ◽  
pp. 1-7
Author(s):  
B. Chandra ◽  
Gaurav Saxena

The paper proposes a new selection measure for classification using decision trees in data mining. Various algorithms have been proposed in the past for decision tree classification, e.g. ID3, CART and SLIQ, using selection measures such as Gain, Gain Ratio and the Gini Index. However, none of the selection measures developed so far takes the balancing of trees into account. This paper proposes a new selection measure that also accounts for tree balance, which helps improve classification accuracy. The performance of the original SLIQ algorithm, C5 and the algorithm using the new selection measure (which considers accuracy as well as a balance factor) was measured in terms of classification accuracy on three real-life data sets.
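The paper does not give the formula for its balance-aware measure, so the following is only a hypothetical illustration of the idea: weight a split's gain by how evenly it divides the records, so that lopsided trees are penalized. Both the `balance_factor` definition and the blend weight `alpha` are assumptions.

```python
def balance_factor(child_sizes):
    # 1.0 for a perfectly even split, 0.0 when one child takes everything
    n, k = sum(child_sizes), len(child_sizes)
    ideal = n / k
    return 1 - sum(abs(s - ideal) for s in child_sizes) / (2 * n * (1 - 1 / k))

def balanced_gain(gain, child_sizes, alpha=0.5):
    # hypothetical blend of split quality (gain) and split balance
    return alpha * gain + (1 - alpha) * balance_factor(child_sizes)
```

With such a blend, a split that is marginally less pure but far more even can outrank a highly skewed one, yielding shallower, more balanced trees.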


Author(s):  
Tsehay Admassu Assegie ◽  
Pramod Sekharan Nair

Handwritten digit recognition is an area of machine learning in which a machine is trained to identify handwritten digits. One way to achieve this is with a decision tree classification model: a machine learning approach that uses predefined labels from known past data to predict the classes of future data whose labels are unknown. In this paper we use the standard Kaggle digits data set to recognize handwritten digits with a decision tree classification approach, and we evaluate the accuracy of the model on each digit from 0 to 9.
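A minimal sketch of this workflow, with two stated substitutions: scikit-learn's bundled digits data stands in for the Kaggle set, and `DecisionTreeClassifier` with default parameters stands in for the paper's model, whose settings are not given.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# load 8x8 digit images flattened to 64 features each
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pred = clf.predict(X_test)
print("overall accuracy:", round(accuracy_score(y_test, pred), 3))

# per-digit accuracy, mirroring the paper's evaluation of each digit 0-9
for digit in range(10):
    mask = y_test == digit
    print(digit, round(accuracy_score(y_test[mask], pred[mask]), 3))
```

The per-digit loop makes visible which digits the tree confuses most, which is the kind of breakdown the paper reports.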


2019 ◽  
Vol 5 ◽  
pp. 147-152
Author(s):  
Subik Shrestha ◽  
Laxman Paudel

Hidden patterns may exist that relate the information provided by applicants during the loan application process to the status of their loan repayment. This paper focuses on finding such patterns by building a decision tree from the data provided during the application process. Eleven attributes were collected for 564 loan applicants of Garima Bikas Bank Ltd. A decision tree model of depth 6 was built by calculating the entropy and information gain at each split and selecting the feature with the highest information gain.
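The split-selection loop described above can be sketched as a small ID3-style builder: at each node, compute the information gain of every remaining feature, split on the best one, and stop at the depth limit. A minimal sketch with illustrative data; the bank's actual attributes and depth-6 tree are not reproduced.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels):
    """Index of the feature with the highest information gain."""
    base, n = entropy(labels), len(labels)
    def gain(j):
        remainder = 0.0
        for v in set(r[j] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[j] == v]
            remainder += len(sub) / n * entropy(sub)
        return base - remainder
    return max(range(len(rows[0])), key=gain)

def build_tree(rows, labels, depth=0, max_depth=6):
    # leaf: depth limit reached or node is pure -> majority class
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    j = best_feature(rows, labels)
    node = {'feature': j, 'children': {}}
    for v in set(r[j] for r in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[j] == v]
        srows, slabels = zip(*sub)
        node['children'][v] = build_tree(list(srows), list(slabels),
                                         depth + 1, max_depth)
    return node
```

On a toy applicant table where one attribute perfectly predicts repayment, the builder places that attribute at the root, exactly the behaviour the paper relies on.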


Author(s):  
Ricardo Timarán Pereira

Abstract Decision tree classification is the most widely used and popular model because it is simple and easy to understand. The costliest step of the algorithm is computing, at each node, the value of the measure that selects the attribute with the greatest power to classify over the set of values of the class attribute. Computing this measure does not require the data themselves, only statistics about the number of records in which the condition attributes combine with the class attribute. Decision tree classification algorithms include ID-3, C4.5, SPRINT and SLIQ; however, none of these is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain and Describe Classifier, implemented in the SQL Select clause with the SQL primitives Mate by, Entro(), Gain() and Describe Classification Rules. These facilitate the calculation of Information Gain, the construction of the decision tree, and the tight coupling of the algorithm with a DBMS.
Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
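The abstract's key observation, that information gain needs only record counts, not the records themselves, can be illustrated as follows: given the contingency counts a SQL `GROUP BY attribute, class` aggregation would return, the gain is fully determined. A minimal sketch; the Mate-tree operators themselves are not reproduced.

```python
import math
from collections import Counter

def gain_from_counts(counts):
    """Information gain from contingency counts alone.

    counts maps (attribute_value, class_label) -> number of records,
    i.e. the result of a 'SELECT attr, class, COUNT(*) ... GROUP BY
    attr, class' style aggregation inside the DBMS.
    """
    n = sum(counts.values())

    def H(freqs):
        total = sum(freqs)
        return -sum(f / total * math.log2(f / total) for f in freqs if f)

    class_totals = Counter()
    value_groups = {}
    for (v, c), k in counts.items():
        class_totals[c] += k
        value_groups.setdefault(v, []).append(k)

    remainder = sum(sum(g) / n * H(g) for g in value_groups.values())
    return H(class_totals.values()) - remainder
```

Because only these aggregated counts cross the database boundary, the computation couples naturally with a DBMS, which is the point of implementing the operators as SQL primitives.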

