Nearest Neighbour-Based Fuzzy-Rough Feature Selection

Author(s):  
Richard Jensen,
Neil Mac Parthaláin
2021, Vol. ahead-of-print (ahead-of-print)
Author(s):  
Renuka Devi D. ◽  
Sasikala S.

Purpose: The purpose of this paper is to improve the accuracy of classifying streaming big data sets while reducing processing time. Social analytics of this kind can contribute to society by delivering inferred decisions at the right time. The work targets the streaming nature of Twitter data sets.

Design/methodology/approach: Analysing ever-growing Twitter data with conventional methods is a demanding task, so MapReduce (MR) is used for fast analytics. An online feature selection (OFS) accelerated bat algorithm (ABA) and an ensemble incremental deep multiple layer perceptron (EIDMLP) classifier are proposed for feature selection and classification. Three Twitter data sets from varied categories (product, service and emotions) are investigated.

Findings: The proposed model is compared with the Particle Swarm Optimization (PSO), Accelerated Particle Swarm Optimization (APSO) and accelerated simulated annealing and mutation operator (ASAMO) feature selection algorithms, and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding Tree (HT) and fuzzy minimal consistent class subset coverage with the k-nearest neighbour (FMCCSC-KNN). The proposed approach achieves accuracies of 99%, 99.48% and 98.9% on the three data sets, with processing times of 0.0034, 0.0024 and 0.0053 seconds, respectively.

Originality/value: A novel framework is proposed for feature selection and classification. The work is compared with the authors' previously developed classifiers and with other state-of-the-art feature selection and classification algorithms.
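The abstract above describes a wrapper-style pipeline: a bat-algorithm-based online feature selector feeding an incremental neural classifier. The sketch below is a loose, illustrative analogue of that idea, not the authors' OFS-ABA/EIDMLP implementation: it runs a simplified binary bat-style search over feature masks, scoring each mask with an incrementally trained scikit-learn MLP on synthetic data standing in for TF-IDF features of tweets. All data, parameters and names here are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for a TF-IDF-style representation of streamed tweets.
X, y = make_classification(n_samples=600, n_features=80, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
classes = np.unique(y_tr)

def fitness(mask):
    """Accuracy of an incrementally trained MLP restricted to the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)
    for start in range(0, len(X_tr), 100):        # feed the data as mini-batches
        batch = slice(start, start + 100)
        clf.partial_fit(X_tr[batch][:, mask], y_tr[batch], classes=classes)
    return accuracy_score(y_te, clf.predict(X_te[:, mask]))

# Simplified bat-style search over binary feature masks.
n_bats, n_iter, n_feat = 8, 10, X.shape[1]
pos = rng.random((n_bats, n_feat))                # continuous bat positions in [0, 1]
vel = np.zeros_like(pos)
masks = pos > 0.5
scores = np.array([fitness(m) for m in masks])
best = masks[scores.argmax()].copy()
best_score = scores.max()

for _ in range(n_iter):
    freq = rng.random((n_bats, 1))                # random pulse frequencies
    vel += (pos - best) * freq                    # standard bat velocity update
    pos = np.clip(pos + vel, 0.0, 1.0)
    masks = rng.random(pos.shape) < pos           # stochastic binarisation
    scores = np.array([fitness(m) for m in masks])
    if scores.max() > best_score:
        best_score = scores.max()
        best = masks[scores.argmax()].copy()

print(f"selected {best.sum()}/{n_feat} features, hold-out accuracy = {best_score:.3f}")
```

The wrapper design evaluates each candidate feature subset by retraining the classifier, which is expensive but lets the search optimise the end metric directly; the paper's use of MapReduce and incremental learning is aimed precisely at keeping that evaluation cost manageable on streaming data.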


Author(s):  
Afdhalul Ihsan ◽  
Ednawati Rainarli

Text classification is the process of grouping documents into categories based on similarity. Obstacles in text classification include the large number of words appearing in the text and the presence of words that occur only rarely (sparse words). A way to address this problem is to perform feature selection. Several filter-based feature selection methods exist, including Chi-Square, Information Gain, the Genetic Algorithm and Particle Swarm Optimization (PSO); Aghdam's research shows that PSO is the best among them. This study examined PSO for optimizing the performance of the k-Nearest Neighbour (k-NN) algorithm in categorizing news articles. k-NN is simple and easy to implement, and with appropriate features it is a reliable algorithm. The PSO algorithm is used to select keywords (term features), and the documents are then classified using k-NN. The testing process consists of three stages: tuning the k-NN parameter, tuning the PSO parameters and measuring performance. The parameter tuning determines the number of neighbours used in k-NN and the number of PSO particles, while the performance testing compares k-NN with and without PSO. The optimal number of neighbours is 9, with 50 particles. The testing showed that k-NN with PSO, using 50% fewer terms, achieved 20 per cent better accuracy than k-NN without PSO. Although the PSO process did not always find the optimal conditions, the k-NN method can still produce better accuracy. In this way, the k-NN method can work better at grouping news articles, especially Indonesian-language news articles.
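As an illustration of the PSO-plus-k-NN wrapper described above, the following sketch runs a standard binary PSO over term-selection masks, scoring each mask with cross-validated k-NN accuracy (k = 9, as reported in the study). The dataset is a synthetic stand-in for a TF-IDF matrix of news articles, and the swarm size and PSO coefficients are illustrative assumptions, not the study's exact settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in for a TF-IDF term matrix of news articles (docs x terms).
X, y = make_classification(n_samples=300, n_features=100, n_informative=15,
                           random_state=42)

def fitness(mask):
    """Cross-validated k-NN accuracy on the selected term subset."""
    if mask.sum() == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=9)     # k = 9, as reported in the study
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

n_particles, n_iter, n_terms = 20, 15, X.shape[1]
vel = np.zeros((n_particles, n_terms))
masks = rng.random((n_particles, n_terms)) > 0.5  # random initial term subsets
pbest = masks.copy()
pbest_fit = np.array([fitness(m) for m in masks])
gbest = pbest[pbest_fit.argmax()].copy()
gbest_fit = pbest_fit.max()

w, c1, c2 = 0.7, 1.5, 1.5                         # inertia and acceleration weights
for _ in range(n_iter):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = (w * vel
           + c1 * r1 * (pbest.astype(float) - masks)
           + c2 * r2 * (gbest.astype(float) - masks))
    prob = 1.0 / (1.0 + np.exp(-vel))             # sigmoid transfer function
    masks = rng.random(vel.shape) < prob          # sample new binary positions
    fit = np.array([fitness(m) for m in masks])
    improved = fit > pbest_fit
    pbest[improved] = masks[improved]
    pbest_fit[improved] = fit[improved]
    if fit.max() > gbest_fit:
        gbest_fit = fit.max()
        gbest = masks[fit.argmax()].copy()

print(f"kept {gbest.sum()}/{n_terms} terms, CV accuracy = {gbest_fit:.3f}")
```

A fitness function that penalises subset size (for example, accuracy minus a small term-count term) would also encourage the roughly 50% term reduction reported in the abstract.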


A deep learning system, Long Short-Term Memory (LSTM), is incorporated for the classification of differentially expressed genes that cause certain abnormalities in the human body. The LSTM is employed along with the K-Nearest Neighbour (KNN) algorithm to achieve precise classification. The feature selection process plays a vital role, as some existing algorithms tend to neglect the features of concern, and the classification in turn leads to an enhanced prediction method. The K-Nearest Neighbour method is used to filter the degree of correlation between each value and the target value. This hybrid algorithm has clear leverage over existing methods. The work is supported by a feature selection stage that combines Principal Component Analysis and the Chi-square test. This hybrid approach provides good feature selection, which aids the seamless flow of the process towards classification and prediction. The eigenvalues and eigenvectors are computed, which leads to the identification of the principal components. The Chi-square test is implemented to calculate feature scores; the features are ranked by these scores, and the highest-scoring ones are taken forward for training. The algorithms employed in this work have a clear advantage over Bayesian networks, which are prone to errors within the layers that may cause values to explode or vanish. The accuracy of the classification and prediction achieved is unsurpassed when compared to existing methods.
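A minimal sketch of the hybrid feature-selection stage described above, assuming scikit-learn and synthetic data: chi-square scores rank the features, PCA extracts the leading principal components via eigen-decomposition, and a simple k-NN classifier stands in for the LSTM/KNN hybrid. None of this is the paper's implementation; the dataset, sizes and the final classifier choice are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a gene-expression matrix (samples x genes).
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=1)
X = MinMaxScaler().fit_transform(X)          # chi2 requires non-negative values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Step 1: the Chi-square test scores each gene against the class label;
# keep the highest-scoring genes.
selector = SelectKBest(chi2, k=100).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Step 2: PCA computes the eigenvalues/eigenvectors of the covariance matrix of
# the selected genes and projects onto the leading principal components.
pca = PCA(n_components=10).fit(X_tr_sel)
X_tr_pc, X_te_pc = pca.transform(X_tr_sel), pca.transform(X_te_sel)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Step 3: classify on the reduced representation. A k-NN classifier is used here
# as a lightweight stand-in for the LSTM + KNN hybrid described in the text.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr_pc, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te_pc)))
```

Applying the chi-square ranking before PCA keeps the eigen-decomposition small and focused on the genes most associated with the target, which mirrors the two-stage selection order described in the abstract.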

