Nearest Neighbour-Based Fuzzy-Rough Feature Selection

Author(s):  
Richard Jensen,
Neil Mac Parthaláin
2021, Vol. ahead-of-print (ahead-of-print)
Author(s):  
Renuka Devi D. ◽  
Sasikala S.

Purpose: The purpose of this paper is to improve the accuracy of classifying streaming big data sets while reducing processing time. Social analytics of this kind can contribute to society by delivering inferred decisions at the right time. The work targets the streaming nature of Twitter data sets.

Design/methodology/approach: Analysing ever-growing Twitter data with conventional methods is a demanding task, so MapReduce (MR) is used for fast analytics. An online feature selection (OFS) accelerated bat algorithm (ABA) and an ensemble incremental deep multiple layer perceptron (EIDMLP) classifier are proposed for feature selection and classification. Three Twitter data sets from varied categories (product, service and emotions) are investigated.

Findings: The proposed model is compared with the Particle Swarm Optimization (PSO), Accelerated Particle Swarm Optimization (APSO) and accelerated simulated annealing and mutation operator (ASAMO) feature selection algorithms, and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding Tree (HT) and fuzzy minimal consistent class subset coverage with the k-nearest neighbour (FMCCSC-KNN). The proposed approach achieves accuracies of 99%, 99.48% and 98.9% on the three data sets, with processing times of 0.0034, 0.0024 and 0.0053 seconds, respectively.

Originality/value: A novel framework is proposed for feature selection and classification. The work is compared with the authors' previously developed classifiers and with other state-of-the-art feature selection and classification algorithms.
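The abstract above describes a wrapper-style pipeline: a bat-algorithm-based online feature selector feeding an incremental neural classifier. The sketch below is a loose, illustrative analogue of that idea, not the authors' OFS-ABA/EIDMLP implementation: it runs a simplified binary bat-style search over feature masks, scoring each mask with an incrementally trained scikit-learn MLP on synthetic data standing in for TF-IDF features of tweets. All data, parameters and names here are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a TF-IDF-style representation of streamed tweets.
X, y = make_classification(n_samples=600, n_features=80, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
classes = np.unique(y_tr)

def fitness(mask):
    """Accuracy of an incrementally trained MLP restricted to the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
    for start in range(0, len(X_tr), 100):        # feed the data as mini-batches
        batch = slice(start, start + 100)
        clf.partial_fit(X_tr[batch][:, mask], y_tr[batch], classes=classes)
    return accuracy_score(y_te, clf.predict(X_te[:, mask]))

# Simplified bat-style search over binary feature masks.
n_bats, n_iter, n_feat = 8, 10, X.shape[1]
pos = rng.random((n_bats, n_feat))                # continuous bat positions in [0, 1]
vel = np.zeros_like(pos)
masks = pos > 0.5
scores = np.array([fitness(m) for m in masks])
best = masks[scores.argmax()].copy()
best_score = scores.max()

for _ in range(n_iter):
    freq = rng.random((n_bats, 1))                # random pulse frequencies
    vel += (pos - best) * freq                    # standard bat velocity update
    pos = np.clip(pos + vel, 0.0, 1.0)
    masks = rng.random(pos.shape) < pos           # stochastic binarisation
    scores = np.array([fitness(m) for m in masks])
    if scores.max() > best_score:
        best_score = scores.max()
        best = masks[scores.argmax()].copy()

print(f"selected {best.sum()}/{n_feat} features, hold-out accuracy = {best_score:.3f}")
```

The wrapper design evaluates each candidate feature subset by retraining the classifier, which is expensive but lets the search optimise the end metric directly; the paper's use of MapReduce and incremental learning is aimed precisely at keeping that evaluation cost manageable on streaming data.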


Author(s):  
Afdhalul Ihsan ◽  
Ednawati Rainarli

Text classification is the process of grouping documents into categories based on similarity. Obstacles in text classification include the large number of words appearing in the text and the presence of words that occur only rarely (sparse words). A way to address this problem is to perform feature selection. Several filter-based feature selection methods exist, including Chi-Square, Information Gain, the Genetic Algorithm and Particle Swarm Optimization (PSO); Aghdam's research shows that PSO is the best among them. This study examined PSO for optimizing the performance of the k-Nearest Neighbour (k-NN) algorithm in categorizing news articles. k-NN is simple and easy to implement, and with appropriate features it is a reliable algorithm. The PSO algorithm is used to select keywords (term features), and the documents are then classified using k-NN. The testing process consists of three stages: tuning the k-NN parameter, tuning the PSO parameters and measuring performance. The parameter tuning determines the number of neighbours used in k-NN and the number of PSO particles, while the performance testing compares k-NN with and without PSO. The optimal number of neighbours is 9, with 50 particles. The testing showed that k-NN with PSO, using 50% fewer terms, achieved 20 per cent better accuracy than k-NN without PSO. Although the PSO process did not always find the optimal conditions, the k-NN method can still produce better accuracy. In this way, the k-NN method can work better at grouping news articles, especially Indonesian-language news articles.
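As an illustration of the PSO-plus-k-NN wrapper described above, the following sketch runs a standard binary PSO over term-selection masks, scoring each mask with cross-validated k-NN accuracy (k = 9, as reported in the study). The dataset is a synthetic stand-in for a TF-IDF matrix of news articles, and the swarm size and PSO coefficients are illustrative assumptions, not the study's exact settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in for a TF-IDF term matrix of news articles (docs x terms).
X, y = make_classification(n_samples=300, n_features=100, n_informative=15,
                           random_state=42)

def fitness(mask):
    """Cross-validated k-NN accuracy on the selected term subset."""
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=9)     # k = 9, as reported in the study
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

n_particles, n_iter, n_terms = 20, 15, X.shape[1]
vel = np.zeros((n_particles, n_terms))
masks = rng.random((n_particles, n_terms)) > 0.5  # random initial term subsets
pbest = masks.copy()
pbest_fit = np.array([fitness(m) for m in masks])
gbest = pbest[pbest_fit.argmax()].copy()
gbest_fit = pbest_fit.max()

w, c1, c2 = 0.7, 1.5, 1.5                         # inertia and acceleration weights
for _ in range(n_iter):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = (w * vel
           + c1 * r1 * (pbest.astype(float) - masks)
           + c2 * r2 * (gbest.astype(float) - masks))
    prob = 1.0 / (1.0 + np.exp(-vel))             # sigmoid transfer function
    masks = rng.random(vel.shape) < prob          # sample new binary positions
    fit = np.array([fitness(m) for m in masks])
    improved = fit > pbest_fit
    pbest[improved] = masks[improved]
    pbest_fit[improved] = fit[improved]
    if fit.max() > gbest_fit:
        gbest_fit = fit.max()
        gbest = masks[fit.argmax()].copy()

print(f"kept {gbest.sum()}/{n_terms} terms, CV accuracy = {gbest_fit:.3f}")
```

A fitness function that penalises subset size (for example, accuracy minus a small term-count term) would also encourage the roughly 50% term reduction reported in the abstract.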


A deep learning system, Long Short-Term Memory (LSTM), is incorporated for the classification of differentially expressed genes that cause certain abnormalities in the human body. The LSTM is employed along with the K-Nearest Neighbour (KNN) algorithm to achieve precise classification. The feature selection process plays a vital role, as some existing algorithms tend to neglect the features of concern, and the classification in turn leads to an enhanced prediction method. The K-Nearest Neighbour method is used to filter the degree of correlation between each value and the target value. This hybrid algorithm has clear leverage over existing methods. The work is supported by a feature selection stage that combines Principal Component Analysis and the Chi-square test. This hybrid approach provides good feature selection, which aids the seamless flow of the process towards classification and prediction. The eigenvalues and eigenvectors are computed, which leads to the identification of the principal components. The Chi-square test is implemented to calculate feature scores; the features are ranked by these scores, and the highest-scoring ones are taken forward for training. The algorithms employed in this work have a clear advantage over Bayesian networks, which are prone to errors within the layers that may cause values to explode or vanish. The accuracy of the classification and prediction achieved is unsurpassed when compared to existing methods.
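A minimal sketch of the hybrid feature-selection stage described above, assuming scikit-learn and synthetic data: chi-square scores rank the features, PCA extracts the leading principal components via eigen-decomposition, and a simple k-NN classifier stands in for the LSTM/KNN hybrid. None of this is the paper's implementation; the dataset, sizes and the final classifier choice are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a gene-expression matrix (samples x genes).
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=1)
X = MinMaxScaler().fit_transform(X)          # chi2 requires non-negative values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Step 1: the Chi-square test scores each gene against the class label;
# keep the highest-scoring genes.
selector = SelectKBest(chi2, k=100).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Step 2: PCA computes the eigenvalues/eigenvectors of the covariance matrix of
# the selected genes and projects onto the leading principal components.
pca = PCA(n_components=10).fit(X_tr_sel)
X_tr_pc, X_te_pc = pca.transform(X_tr_sel), pca.transform(X_te_sel)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Step 3: classify on the reduced representation. A k-NN classifier is used here
# as a lightweight stand-in for the LSTM + KNN hybrid described in the text.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr_pc, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te_pc)))
```

Applying the chi-square ranking before PCA keeps the eigen-decomposition small and focused on the genes most associated with the target, which mirrors the two-stage selection order described in the abstract.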

