Entropy based C4.5-SHO algorithm with information gain optimization in data mining

2021 ◽  
Vol 7 ◽  
pp. e424
Author(s):  
G Sekhar Reddy ◽  
Suneetha Chittineni

Information efficiency is gaining importance in both the development and application sectors of information technology. Data mining is a computer-assisted process of massive data investigation that extracts meaningful information from datasets. The mined information is used in decision-making to understand the behavior of each attribute. Therefore, a new classification algorithm is introduced in this paper to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the gain of given datasets. The optimal weights for the information gain are updated based on SHO. Further, the dataset is partitioned into two classes based on quadratic entropy calculation and information gain. Decision tree gain optimization is the main aim of our proposed C4.5-SHO method. The robustness of the proposed method is evaluated on various datasets and compared with classifiers such as ID3 and CART. The accuracy and the area under the receiver operating characteristic curve are estimated and compared with existing algorithms such as ant colony optimization, particle swarm optimization and cuckoo search.
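For readers unfamiliar with the quantities that C4.5-style methods optimize, the following minimal Python sketch computes Shannon entropy, a quadratic entropy (taken here as 1 - sum of squared class probabilities, i.e. the Gini impurity; the paper's exact definition may differ), and the information gain of a candidate split. The SHO-based weighting described in the abstract is not reproduced, and the toy labels are purely illustrative.

```python
import numpy as np

def shannon_entropy(labels):
    """Shannon entropy H(S) = -sum(p_i * log2(p_i)) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def quadratic_entropy(labels):
    """Quadratic entropy 1 - sum(p_i^2), identical to the Gini impurity."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(labels, subsets, entropy=shannon_entropy):
    """Gain = H(S) - sum(|S_v|/|S| * H(S_v)) over the subsets of a split."""
    n = len(labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - weighted

# Toy example: class labels before and after a binary split (illustrative only).
labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])
left, right = labels[:4], labels[4:]
print("Shannon gain:  ", information_gain(labels, [left, right]))
print("Quadratic gain:", information_gain(labels, [left, right], quadratic_entropy))
```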

Soft computing is dedicated to decision making. In this domain, a number of techniques are used for prediction, classification, categorization, optimization, and information extraction; among them, rule mining is one of the essential methodologies. "IF-THEN-ELSE" rules can classify or predict an event in the real world. This is the basic rule-based learning concept, and it is frequently used in data mining applications for decision making and machine learning. Several supervised learning approaches are available for rule mining; in this context, the decision tree is a helpful algorithm. The algorithm works on a data-splitting strategy using entropy and information gain, and the data are mapped into a tree structure from which "IF-THEN-ELSE" rules are developed. In this work, an application of rule-based learning is presented for the recycling of water in a distillation unit. Using the designed experimental still plant, different attributes are collected together with the observed distillate yield and instantaneous efficiency. The observed data are learned with the C4.5 decision tree algorithm, which also predicts the distillate yield and instantaneous efficiency. Finally, "IF-THEN-ELSE" rules are prepared to classify and predict the required parameters. The experimental results demonstrate that the proposed C4.5 approach provides higher accuracy than similar state-of-the-art techniques, improving accuracy by up to 5-9%.
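As a rough illustration of how IF-THEN-ELSE rules fall out of an entropy-based decision tree, the sketch below trains scikit-learn's tree with the entropy criterion and prints the learned rules, where each root-to-leaf path reads as one rule. The feature names and data are placeholders standing in for the authors' still-plant measurements (which are not available), and scikit-learn's CART-style tree is used as a stand-in for C4.5.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Placeholder attributes standing in for still-plant observations
# (e.g. radiation, ambient temperature, water temperature); values are synthetic.
feature_names = ["radiation", "ambient_temp", "water_temp"]
X = rng.uniform(0, 1, size=(200, 3))
# Synthetic target: "high yield" when radiation plus water temperature is large.
y = ((X[:, 0] + X[:, 2]) > 1.1).astype(int)

# The entropy criterion mirrors the information-gain splitting used by C4.5.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Each printed root-to-leaf path corresponds to an IF ... THEN ... ELSE rule.
print(export_text(tree, feature_names=feature_names))
```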


Author(s):  
Mohammad Vahid Sebt ◽  
Elahe Komijani ◽  
Shiva S. Ghasemi

Nowadays, the banking system is known as one of the inherent sectors of customer relationship management systems. Its main advantage is to redesign a more responsive organization to satisfy customers. The banking system aims to improve the structure of organizations to provide better customer service through a set of automated and integrated processes, with the final goal of collecting and reprocessing the personal information of customers. To handle this problem, a number of new data mining techniques provide powerful tools to explore customers' information for customer relationship management. Accordingly, customer classification and coordination in the banking system are among the main challenges today. These reasons motivate this study to apply a combination of a neural network with the C4.5 decision tree and the k-nearest neighbor method as a variant of a core boosting methodology with a maximal strategy. To validate the proposed solution approach, a case study of Ansar Bank in Iran is used. The results show that the proposed method achieves a competitive accuracy of 95% for customer classification and outperforms the existing C4.5 decision tree, neural network, Naive Bayes and KNN methods by 1.04%. The main finding of this research is an algorithm with an error rate of 1.9% and a squared error of 0.72%, the best performance among the methods from the literature.
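The paper's exact boosting and "maximal" combination strategy is not spelled out in the abstract, so the sketch below only shows one generic way to combine the three base learners it names (decision tree, k-nearest neighbor, neural network) via soft voting; the data are synthetic stand-ins for the non-public Ansar Bank customer records.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank's customer data (not publicly available).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Soft-voting combination of the three base learners named in the abstract;
# the study's own boosting/"maximal" strategy is not reproduced here.
ensemble = VotingClassifier(
    estimators=[
        ("c45_like_tree", DecisionTreeClassifier(criterion="entropy", random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)),
    ],
    voting="soft",
)

print("CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```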


2013 ◽  
Vol 380-384 ◽  
pp. 1469-1472
Author(s):  
Gui Jun Shan

Partition methods for real-valued data play an extremely important role in decision tree algorithms in data mining and machine learning, because decision tree algorithms require attribute values to be discrete. In this paper, we propose a novel partition method for real-valued data in decision trees using a statistical criterion. The method constructs a statistical criterion to find accurate merging intervals. In addition, we present a heuristic partition algorithm to achieve the desired partition result, with the aim of improving the performance of decision tree algorithms. Empirical experiments on UCI real-valued data show that the new algorithm generates a better partition scheme and improves the classification accuracy of the C4.5 decision tree compared with existing algorithms.
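The abstract does not specify which statistical criterion drives the interval merging, so the sketch below illustrates the general idea with a well-known alternative, ChiMerge-style discretization: adjacent intervals whose chi-square statistic stays below a threshold are merged. The threshold and toy data are illustrative assumptions, not the paper's method.

```python
import numpy as np

def chi2_of_adjacent(counts_a, counts_b):
    """Chi-square statistic for the class-count vectors of two adjacent intervals."""
    observed = np.vstack([counts_a, counts_b]).astype(float)
    row = observed.sum(axis=1, keepdims=True)
    col = observed.sum(axis=0, keepdims=True)
    expected = row * col / observed.sum()
    expected[expected == 0] = 1e-9          # avoid division by zero
    return float(((observed - expected) ** 2 / expected).sum())

def chimerge(values, labels, threshold=4.6, n_classes=2):
    """Merge adjacent intervals whose chi-square statistic is below threshold."""
    order = np.argsort(values)
    values, labels = np.asarray(values)[order], np.asarray(labels)[order]
    # Start with one interval per distinct value: [lower_bound, class_counts].
    intervals = []
    for v in np.unique(values):
        counts = np.bincount(labels[values == v], minlength=n_classes)
        intervals.append([v, counts])
    merged = True
    while merged and len(intervals) > 1:
        chis = [chi2_of_adjacent(intervals[i][1], intervals[i + 1][1])
                for i in range(len(intervals) - 1)]
        i = int(np.argmin(chis))
        if chis[i] < threshold:
            intervals[i][1] = intervals[i][1] + intervals[i + 1][1]
            del intervals[i + 1]
        else:
            merged = False
    return [iv[0] for iv in intervals]       # lower bounds of the final intervals

# Toy continuous attribute and binary class labels (illustrative only).
vals = [1, 2, 3, 7, 8, 9, 11, 23, 37, 39, 45, 46, 59]
labs = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
print("Interval lower bounds:", chimerge(vals, labs))
```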


2012 ◽  
Vol 457-458 ◽  
pp. 754-757
Author(s):  
Hong Yan Zhao

Decision tree technology, the principal technique for classification and prediction in data mining, derives classification rules, expressed as a decision tree, from a set of unordered examples that carry no explicit rules. Against the background of the decision tree concept, the C4.5 algorithm, and decision tree construction, the C4.5 decision tree algorithm was applied to the analysis of students' scores with the purpose of improving teaching quality.


2020 ◽  
Vol 27 (3) ◽  
pp. 29-43
Author(s):  
Sihem Oujdi ◽  
Hafida Belbachir ◽  
Faouzi Boufares

Using data mining techniques on spatial data is more complex than on classical data. To extract useful patterns, spatial data mining algorithms must deal with data represented as a stack of thematic layers and consider, in addition to the object of interest itself, its neighbors linked through implicit spatial relations. Classification by decision trees, combined with visualization tools, is a convenient decision-support approach for spatial data analysis. The purpose of this paper is to provide and evaluate an alternative spatial classification algorithm that supports the thematic-layered data organization: an adaptation of the C4.5 decision tree algorithm to spatial data, named S-C4.5, inspired by the SCART and spatial ID3 algorithms and adopting the spatial join index. Our work concerns both the data organization and the algorithm adaptation. Decision tree construction was evaluated on a traffic accident dataset and benchmarked on both computation time and memory consumption under different experimental settings: studying the phenomenon by a single other phenomenon and then by multiple other phenomena, including one or more spatial relations. The different approaches show balanced trade-offs between memory usage and computation time.
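The S-C4.5 implementation itself is not reproduced here; the short GeoPandas sketch below only illustrates the spatial join index idea the paper builds on: materializing the implicit "intersects" relation between a target layer (accidents) and a thematic layer (roads) as an explicit table whose attributes can then feed a decision tree. The tiny layers and column names are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# Tiny illustrative layers standing in for the paper's thematic layers;
# the real traffic accident dataset is not reproduced here.
accidents = gpd.GeoDataFrame(
    {"accident_id": [1, 2], "severity": ["minor", "severe"]},
    geometry=[Point(0.0, 0.0), Point(5.0, 5.0)],
)
roads = gpd.GeoDataFrame(
    {"road_category": ["highway", "street"]},
    geometry=[LineString([(-1, 0), (1, 0)]), LineString([(4, 5), (6, 5)])],
)

# Materialize a spatial join index: each accident is paired with the roads it
# intersects, turning the implicit spatial relation into an explicit table whose
# joined attributes can be used as inputs to a C4.5-style decision tree.
join_index = gpd.sjoin(accidents, roads, how="left", predicate="intersects")
print(join_index[["accident_id", "severity", "road_category"]])
```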


2020 ◽  
Vol 8 (2) ◽  
pp. 23-39
Author(s):  
Hadi Khalilia ◽  
Thaer Sammar ◽  
Yazeed Sleet

Data mining is an important field that has been widely used in different domains; one of these is educational data mining. In this study, we apply machine learning models to data obtained from Palestine Technical University-Kadoorie (PTUK) in Tulkarm for students in the department of computer engineering and applied computing. Students in both fields study the same major courses: C++ and Java. Therefore, we focused on these courses to predict students' performance, measured by GPA in the major. Many techniques are used in the educational data mining field; we applied three models commonly used in this field to the obtained data: a decision tree with the information gain measure, a decision tree with the Gini index measure, and a naive Bayes model. We used these models because they are efficient and fast in data classification and prediction. The results suggest that the decision tree with the information gain measure outperforms the other models, with an accuracy of 0.66. We also took a closer look at key features used to train the models: the students' branch of study at school, their field of study at the university, and whether or not they hold a scholarship. These features influence the prediction; for example, the accuracy of the decision tree with the information gain measure increases to 0.71 when applied to the subset of students who studied in the scientific branch at high school. This study is important for both the students and the higher management of PTUK, as the university will be able to make predictions about student performance. In the experiments carried out, the models' predictions were in line with actual expectations.
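A minimal sketch of the model comparison described above, assuming synthetic data in place of the PTUK student records (which are not public): the three models from the study are evaluated with cross-validated accuracy, using scikit-learn's entropy and Gini criteria for the two decision trees and Gaussian naive Bayes for the third model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student records (features such as high-school
# branch, field of study, and scholarship status are not reproduced here).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=1)

models = {
    "tree (information gain)": DecisionTreeClassifier(criterion="entropy", random_state=1),
    "tree (Gini index)": DecisionTreeClassifier(criterion="gini", random_state=1),
    "naive Bayes": GaussianNB(),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```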


2012 ◽  
Vol 532-533 ◽  
pp. 1685-1690 ◽  
Author(s):  
Zhi Kang Luo ◽  
Huai Ying Sun ◽  
De Wang

This paper presents an improved version of the SPRINT algorithm. The original SPRINT algorithm is a scalable, parallelizable decision tree algorithm that is popular in the data mining and machine learning communities. To improve its efficiency, we first select the candidate splitting attributes and identify the best one by computing the information gain ratio of each attribute; we then calculate the best split point of that attribute only. Because this avoids many calculations on the other attributes, the improved algorithm effectively reduces the computation.
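The sketch below illustrates the described idea in simplified form: score each attribute by gain ratio (here using a median split as a rough proxy, which is an assumption, since the abstract does not say how the per-attribute score is obtained), then search candidate split points only for the winning attribute.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(values, labels, threshold):
    """Information gain ratio of the binary split: values <= threshold."""
    left, right = labels[values <= threshold], labels[values > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    gain = entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))
    split_info = entropy(np.concatenate([np.zeros(len(left)), np.ones(len(right))]))
    return gain / split_info if split_info > 0 else 0.0

def best_attribute_then_best_split(X, y):
    """Pick the attribute with the highest gain ratio (scored at its median
    split, a simplification), then scan split points for that attribute only."""
    medians = np.median(X, axis=0)
    ratios = [gain_ratio(X[:, j], y, medians[j]) for j in range(X.shape[1])]
    j = int(np.argmax(ratios))
    candidates = np.unique(X[:, j])[:-1]     # thresholds between sorted values
    best_t = max(candidates, key=lambda t: gain_ratio(X[:, j], y, t))
    return j, best_t

# Toy data: two attributes, binary labels (illustrative only).
rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 2))
y = (X[:, 0] > 0.5).astype(int)
print(best_attribute_then_best_split(X, y))
```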


2007 ◽  
Vol 4 (1) ◽  
pp. 183-197 ◽  
Author(s):  
Efstathios Kirkos ◽  
Charalambos Spathis ◽  
Alexandros Nanopoulos ◽  
Yannis Manolopoulos

Data mining methods can be used to help auditors issue their opinions. Many of these methods have not yet been tested for the purpose of discriminating cases of qualified opinions. In this study, we employ three data mining classification techniques to develop models capable of identifying qualified auditors' reports. The techniques used are the C4.5 decision tree, a multilayer perceptron neural network, and a Bayesian belief network. The sample contains 450 publicly listed, non-financial UK and Irish firms. The input vector is composed of one qualitative and several quantitative variables. The three developed models are compared in terms of their performance. Additionally, variables that are associated with qualified reports and can be used as indicators are also revealed. The results of this study can be useful to internal and external auditors and to company decision-makers.
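A minimal sketch of this kind of model comparison, assuming synthetic features in place of the firms' financial ratios (which are not available here): cross-validated accuracy for a C4.5-like tree and a multilayer perceptron, with Gaussian naive Bayes used only as a simple stand-in for the Bayesian belief network, which would require a dedicated probabilistic graphical model library.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 450-firm sample of one qualitative and several
# quantitative variables; the real UK/Irish data is not reproduced here.
X, y = make_classification(n_samples=450, n_features=9, weights=[0.7, 0.3],
                           random_state=3)

models = {
    "C4.5-like decision tree": DecisionTreeClassifier(criterion="entropy", random_state=3),
    "multilayer perceptron": make_pipeline(StandardScaler(),
                                           MLPClassifier(max_iter=1000, random_state=3)),
    # GaussianNB is only a stand-in for the study's Bayesian belief network.
    "naive Bayes (BBN stand-in)": GaussianNB(),
}

for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
```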

