APPLICATION OF HYBRID GA-PSO TO IMPROVE THE PERFORMANCE OF DECISION TREE C5.0

Kursor ◽  
2020 ◽  
Vol 10 (4) ◽  
Author(s):  
Achmad Zain Nur ◽  
Hadi Suyono ◽  
Muhammad Aswin

Data mining is the process of extracting knowledge from large, high-dimensional data in order to support decision making. Problems in the data mining process often arise when processing high-dimensional data. The solution proposed here is to apply a hybrid genetic algorithm and particle swarm optimization (HGAPSO) method to improve the performance of the C5.0 decision tree classification model, so that decisions on classification data can be made quickly, precisely and accurately. In this study, three datasets were taken from the University of California, Irvine (UCI) machine learning repository: lymphography, vehicle, and wine. The HGAPSO algorithm combined with C5.0 decision tree testing achieved the best accuracy on high-dimensional data: the lymphography and vehicle datasets obtained accuracies of 83.78% and 71.54%, respectively. The wine dataset's accuracy was 0.56% lower than that of the conventional method because its dimensionality is smaller than that of the lymphography and vehicle datasets.
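As a rough illustration of the approach outlined in this abstract, the sketch below runs a binary GA-PSO style feature search around a decision tree on the UCI wine data. The particle encoding, the PSO and GA parameters, and the use of scikit-learn's CART DecisionTreeClassifier in place of C5.0 (which has no scikit-learn implementation) are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of hybrid GA-PSO feature selection around a decision tree.
# Assumptions: binary feature masks as particles, a binary-PSO velocity update,
# uniform crossover with the global best plus bit-flip mutation as the GA step,
# and scikit-learn's CART DecisionTreeClassifier standing in for C5.0.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X, y = load_wine(return_X_y=True)
n_particles, n_iters, n_feats = 20, 20, X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of a decision tree on the selected features."""
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=5).mean()

pos = rng.integers(0, 2, (n_particles, n_feats))          # binary feature masks
vel = rng.uniform(-1.0, 1.0, (n_particles, n_feats))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    # PSO step: inertia + cognitive + social terms, sigmoid-thresholded (binary PSO).
    vel = np.clip(0.7 * vel
                  + 1.5 * rng.random(vel.shape) * (pbest - pos)
                  + 1.5 * rng.random(vel.shape) * (gbest - pos), -4.0, 4.0)
    pos = (rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
    # GA step: uniform crossover with the global best, then bit-flip mutation.
    cross = rng.random(pos.shape) < 0.3
    pos = np.where(cross, gbest, pos)
    pos ^= (rng.random(pos.shape) < 0.02).astype(int)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest))
print("best cross-validated accuracy:", round(pbest_fit.max(), 4))
```

The loop prints the retained feature indices and the best cross-validated accuracy found; results from this sketch will naturally differ from the figures reported in the abstract.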

2018 ◽  
Vol 7 (2.7) ◽  
pp. 515
Author(s):  
Aaluri Seenu ◽  
M Kameswara Rao

In a distributed data mining environment, protecting individual data and patterns is a major issue because of high dimensionality and data size. A distributed data mining framework can help find the essential decision-making patterns in distributed data. Privacy preserving data mining (PPDM) has emerged as a main research area for data confidentiality and knowledge sharing between communicating parties. Because distributed data about individuals is stored by third parties, it can be misused in digital networks. Most of the decision patterns generated by machine learning models for business organizations, industries and individuals therefore have to be encoded before they are publicly shared or published. As the amount of data collected from different sources grows exponentially, the time taken to preserve the patterns with traditional privacy preserving data mining models also increases, owing to computationally expensive attribute selection measures and noise in the distributed data. Moreover, filling sparse values with conventional models is inefficient and infeasible for privacy preserving models. In this paper, a novel privacy-preserving classification model is designed and implemented for large datasets. In this model, a filter-based privacy preserving approach using an improved decision tree classifier preserves the decision patterns through the IPPDM-KPABE model. Experimental results show that the proposed model has higher computational efficiency than the traditional privacy preserving model on high-dimensional datasets.
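The abstract gives no implementation details, but the general flow it describes can be sketched as follows: filter-based attribute selection, a decision-tree learner, and encoding of the exported decision patterns before they are shared. Symmetric Fernet encryption stands in here for the key-policy attribute-based encryption implied by IPPDM-KPABE, and the dataset and selector are illustrative choices only.

```python
# Hedged illustration of the flow described above, not the paper's IPPDM-KPABE
# implementation: a filter-based attribute selection step, a decision-tree
# learner, and encryption of the exported decision patterns before sharing.
# Fernet symmetric encryption stands in for key-policy attribute-based
# encryption, and the dataset and selector are illustrative choices.
from cryptography.fernet import Fernet
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

# Filter step: keep the attributes with the highest mutual information.
selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
X_sel = selector.transform(X)

# Learn decision patterns on the reduced attribute set.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_sel, y)
rules = export_text(tree)

# Encode the patterns before they are handed to a third party.
key = Fernet.generate_key()
token = Fernet(key).encrypt(rules.encode())
print("encrypted pattern size:", len(token), "bytes")

# An authorised party holding the key recovers the decision rules.
assert Fernet(key).decrypt(token).decode() == rules
```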


2014 ◽  
Vol 538 ◽  
pp. 460-464
Author(s):  
Xue Li

Based on the inter-correlation and permeability among disciplines, the author applies information science to cognitive linguistics to provide a new perspective for the study of foreign languages. The correlation between self-efficacy and four factors, namely anxiety, learning strategies, motivation and learners' past achievement, is analyzed by means of data mining, and the extent to which these factors affect self-efficacy in language learning is explored. The paper employs the decision tree facility in SPSS Clementine, adopting the C5.0 decision tree algorithm to analyze the data. The study yields the following results. Increased anxiety is bound to weaken learners' motivation over time, and such learners clearly have low self-efficacy. Employing strategies is very important in foreign language learning: ignoring learning strategies may result in unplanned learning and unsatisfactory achievement despite greater effort, and self-efficacy in foreign language learning may be weakened accordingly. Learners' past achievement is a reference dimension in measuring self-efficacy, with weaker influence.
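A minimal sketch of this kind of analysis on purely hypothetical data is given below: a decision tree predicts a binary self-efficacy level from the four factors named in the abstract. SPSS Clementine's C5.0 is proprietary, so scikit-learn's CART DecisionTreeClassifier is used as a stand-in, and the labelling rule for the synthetic data is an assumption made only so the example runs.

```python
# Minimal sketch on hypothetical data: a decision tree predicting a binary
# self-efficacy level from the four factors named in the abstract. The data,
# the labelling rule and the thresholds are illustrative assumptions; CART
# stands in for the proprietary C5.0 algorithm in SPSS Clementine.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 300
anxiety = rng.uniform(1, 5, n)              # 5-point scale
strategies = rng.uniform(1, 5, n)
motivation = rng.uniform(1, 5, n)
past_achievement = rng.uniform(40, 100, n)  # exam score

# Hypothetical labelling: self-efficacy is "high" when anxiety is low and
# strategy use, motivation and past achievement are high.
score = (-0.8 * anxiety + 0.6 * strategies + 0.5 * motivation
         + 0.02 * past_achievement)
self_efficacy = (score > np.median(score)).astype(int)

features = ["anxiety", "strategies", "motivation", "past_achievement"]
X = np.column_stack([anxiety, strategies, motivation, past_achievement])
X_tr, X_te, y_tr, y_te = train_test_split(X, self_efficacy, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(export_text(tree, feature_names=features))
print("test accuracy:", round(tree.score(X_te, y_te), 3))
```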


2011 ◽  
Vol 403-408 ◽  
pp. 1804-1807
Author(s):  
Ning Zhao ◽  
Shao Hua Dong ◽  
Qing Tian

In order to optimize the scheduling of electric resistance welded (ERW) tube production, the paper describes data cleaning, data extraction and transformation in detail and defines the sample-attribute datasets, based on an analysis of the ERW welded tube production process. A decision tree method is then used to mine the data and summarize scheduling rules, which are validated with an example.
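A hedged sketch of that workflow is shown below: a handful of hypothetical ERW tube production records are cleaned and transformed, then a decision tree mines them and its branches are printed as if-then scheduling rules. All column names, categories and values are illustrative assumptions, not data from the paper.

```python
# Hedged sketch of the workflow: clean and transform a handful of hypothetical
# ERW tube production records, then mine scheduling rules with a decision tree
# and print them as if-then statements. Column names, categories and values
# are illustrative assumptions, not data from the paper.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

records = pd.DataFrame({
    "tube_diameter_mm":  [219, 219, 325, None, 406, 325, 219, 406],
    "wall_thickness_mm": [6.4, 7.1, 8.0, 6.4, 9.5, 8.0, 7.1, 9.5],
    "steel_grade":       ["X52", "X52", "X60", "X60", "X70", "X60", "X52", "X70"],
    "priority":          ["rush", "normal", "normal", "rush", "normal", "rush", "normal", "rush"],
    "line_assigned":     ["A", "A", "B", "B", "B", "B", "A", "B"],
})

# Data cleaning and transformation: drop incomplete rows, encode categories.
clean = records.dropna()
X = pd.get_dummies(clean.drop(columns="line_assigned"))
y = clean["line_assigned"]

# Mine scheduling rules and summarise them as readable branches.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```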


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has attracted growing attention from data mining researchers, with its reputation rising steadily across practical domains such as product marketing, fraud detection, medical diagnosis and fault detection. Outlier detection in high-dimensional data poses exceptional challenges for data mining experts because of the curse of dimensionality and the resemblance of distant and adjoining points. Traditional algorithms and techniques perform outlier detection on the full feature space. Such customary methodologies concentrate largely on low-dimensional data and hence prove ineffective at discovering anomalies in datasets comprising a high number of dimensions. Digging out the anomalies present in a high-dimensional dataset becomes a very difficult and tiresome job when all subspace projections need to be explored. Data points in high-dimensional data come to behave like similar observations because of an intrinsic property of such data: the contrast between the distances separating observations vanishes as the number of dimensions grows towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. The technique opens a new breadth of research towards resolving the inherent problems of high-dimensional data in which outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
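The paper's exact scoring is not reproduced here, but the general idea of embedding a deviation measure into a density-based detector can be sketched as follows: each observation receives a deviation score computed across all dimensions, which is blended with a Local Outlier Factor score by rank averaging. The dataset, the deviation measure and the rank-averaging step are assumptions chosen for illustration.

```python
# Hedged sketch of the general idea, not the paper's exact method: a per-point
# deviation score across all dimensions is blended with a density-based score
# (Local Outlier Factor) by rank averaging. The dataset (UCI handwritten
# digits, 64 dimensions), the deviation measure and the blending are assumptions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)
Z = StandardScaler().fit_transform(X)

# Deviation component: mean absolute z-score of each observation.
deviation = np.abs(Z).mean(axis=1)

# Density component: LOF (negative_outlier_factor_ is larger for inliers,
# so negate it to obtain an outlier score).
lof = LocalOutlierFactor(n_neighbors=20).fit(Z)
density_score = -lof.negative_outlier_factor_

def rank(v):
    """Map scores to [0, 1] ranks so the two components are comparable."""
    return np.argsort(np.argsort(v)) / (len(v) - 1)

# Embed the deviation findings into the density-based score.
combined = 0.5 * rank(deviation) + 0.5 * rank(density_score)
print("top candidate outliers:", np.argsort(combined)[-5:])
```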

