Classification of Hadith Topic of Indonesian Translation Using K-Nearest Neighbor and Chi-Square

Ghinaa Zain Nabiilah; Said Al Faraby; Mahendra Dwifebri Purbolaksono

doi:10.21108/ijoict.v7i2.573

International Journal on Information and Communication Technology (IJoICT) ◽

10.21108/ijoict.v7i2.573 ◽

2021 ◽

Vol 7 (2) ◽

pp. 11-22

Author(s):

Ghinaa Zain Nabiilah ◽

Said Al Faraby ◽

Mahendra Dwifebri Purbolaksono

Keyword(s):

Feature Selection ◽

Everyday Life ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Way Of Life ◽

Chi Square ◽

The Law ◽

Test Scenarios

Alongside the Qur'an, hadith is the main guide to life for Muslims and can be applied in everyday life. Hadith contains the words and deeds of the Prophet Muhammad, which are used as a source of Islamic law. Many readers, especially Muslims, are therefore interested in studying hadith. However, the large number of hadiths makes them difficult to read for people who are still unfamiliar with Islam. We therefore conducted a study to classify hadith text by type of teaching, so that readers can more easily get an overview, or another reference, when reading and searching for hadith by type of teaching. This study uses KNN for classification and chi-square for feature selection. We also carried out several test scenarios, including a modified stopword removal step in preprocessing and experiments with the value of k for KNN, to determine the best performance. The best performance was obtained with k = 7 on KNN, without chi-square and with the modified stopword removal, giving a Hamming loss of 0.1042, or about 89.58% of the data correctly classified.
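
The two figures reported in this abstract are tied together by the definition of Hamming loss: it is the fraction of individual label assignments that are wrong, so 1 - 0.1042 = 0.8958. The following is a minimal sketch, assuming a multi-label setup in which each hadith carries a binary indicator vector over teaching types; the function name and the toy labels are invented for illustration and are not taken from the paper.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots that differ between truth and prediction."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += (t != p)
    return wrong / total

# Hypothetical toy data: 4 documents, 3 binary labels each.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

loss = hamming_loss(y_true, y_pred)  # 2 wrong slots out of 12
correct_fraction = 1 - loss          # share of label slots predicted correctly
```

Because the loss counts label slots rather than whole documents, a document with one wrong label out of several still contributes mostly correct slots.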

An Ensemble-Based Feature Selection and Classification of Gene Expression using Support Vector Machine, K-Nearest Neighbor, Decision Tree

2019 International Conference on Communication and Electronics Systems (ICCES) ◽

10.1109/icces45898.2019.9002041 ◽

2019 ◽

Author(s):

Anu J Nair ◽

Rizwana Rasheed ◽

KM Maheeshma ◽

LS Aiswarya ◽

K R Kavitha

Keyword(s):

Gene Expression ◽

Support Vector Machine ◽

Feature Selection ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor

Product Review Based Customer Sentiment Analysis using an Ensemble of mRMR and Forest Optimization Algorithm (FOA)

International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2022010107 ◽

2022 ◽

Vol 13 (1) ◽

pp. 0-0

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Optimization Algorithm ◽

Nearest Neighbor ◽

Hybrid Approach ◽

Support Vector ◽

K Nearest Neighbor ◽

Feature Selection Technique ◽

Feature Selection Problem

This research addresses the feature selection problem for sentiment classification using an ensemble-based classifier. It proposes a hybrid feature selection approach, mRMR-FOA, that combines the minimum redundancy and maximum relevance (mRMR) technique with the Forest Optimization Algorithm (FOA). Before FOA was applied to sentiment analysis, it was used as a feature selection technique on 10 different classification datasets publicly available in the UCI Machine Learning Repository. Classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes (NB) were used in the ensemble-based algorithm on these datasets. mRMR-FOA is then applied to Blitzer's dataset (customer reviews of electronic products) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.
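
The abstract does not say how the ensemble combines its three classifiers. One common option is hard majority voting, sketched below purely as an assumption; the function name and the prediction lists are hypothetical and do not reproduce the paper's method or data.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count;
        # on a tie, the label encountered first among the votes wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on four reviews.
knn_pred = ["pos", "neg", "pos", "neg"]
nb_pred = ["pos", "pos", "neg", "neg"]
svm_pred = ["neg", "neg", "pos", "neg"]

ensemble = majority_vote([knn_pred, nb_pred, svm_pred])
```

With an odd number of classifiers and two classes, a tie cannot occur, which is one reason three-member ensembles are a convenient choice for binary sentiment labels.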

A Comparative Study of Feature Selection Techniques for Bat Algorithm in Various Applications

MATEC Web of Conferences ◽

10.1051/matecconf/201815006006 ◽

2018 ◽

Vol 150 ◽

pp. 06006 ◽

Cited By ~ 2

Author(s):

Rozlini Mohamed ◽

Munirah Mohd Yusof ◽

Noorhaniza Wahidi

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Information Gain ◽

Bat Algorithm ◽

Classification Performance ◽

Finite Sample ◽

K Nearest Neighbor ◽

Chi Square ◽

Finite Sample Size ◽

Feature Selection Techniques

Feature selection is the process of selecting the best features from the large number of features in a dataset. The difficulty lies in selecting a subset that performs better under a given classifier. To produce better classification results, feature selection has been applied in many classification works as a preprocessing step, in which only a subset of features is used rather than all the features of a dataset. This procedure not only removes irrelevant features but in some cases also increases classification performance, owing to the finite sample size. In this study, Chi-Square (CH), Information Gain (IG), and Bat Algorithm (BA) are used to obtain feature subsets on fourteen well-known datasets from various applications. To measure the performance of the selected features, three benchmark classifiers are used: k-Nearest Neighbor (kNN), Naïve Bayes (NB), and Decision Tree (DT). The paper then analyzes the performance of all classifiers with feature selection in terms of accuracy, sensitivity, F-Measure, and ROC. The objective of the study is to identify the best-performing feature selection techniques among conventional and heuristic techniques in various applications.
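
As an illustration of chi-square filtering, the sketch below scores binary features against a binary class using the standard 2x2 chi-square statistic, N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), and ranks them. It assumes binary features and two classes; the feature names and counts are hypothetical and are not drawn from the paper's fourteen datasets.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: feature present, in class     b: feature present, not in class
    c: feature absent,  in class     d: feature absent,  not in class
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant feature or class carries no information
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts for two binary features over 100 samples.
tables = {
    "feature_a": (40, 10, 10, 40),  # strongly associated with the class
    "feature_b": (25, 25, 25, 25),  # independent of the class
}
scores = {name: chi_square_2x2(*t) for name, t in tables.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A filter of this kind keeps the highest-scoring features and discards the rest before any classifier is trained, which is what distinguishes it from wrapper or heuristic searches such as the Bat Algorithm.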

Feature Selection and Classification of Microarray Data using MapReduce based ANOVA and K-Nearest Neighbor

Procedia Computer Science ◽

10.1016/j.procs.2015.06.035 ◽

2015 ◽

Vol 54 ◽

pp. 301-310 ◽

Cited By ~ 16

Author(s):

Mukesh Kumar ◽

Nitish Kumar Rath ◽

Amitav Swain ◽

Santanu Kumar Rath

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Nearest Neighbor ◽

K Nearest Neighbor

Feature Selection Using Genetic Algorithms for the Generation of a Recognition and Classification of Children Activities Model Using Environmental Sound

Mobile Information Systems ◽

10.1155/2020/8617430 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Antonio García-Dominguez ◽

Carlos E. Galván-Tejada ◽

Laura A. Zanella-Calzada ◽

Hamurabi Gamboa-Rosales ◽

Jorge I. Galván-Tejada ◽

...

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Recursive Partitioning ◽

Feature Model ◽

K Nearest Neighbor ◽

Audio Signals ◽

Environmental Sound ◽

Learning Techniques ◽

Original Size

In the area of recognition and classification of children's activities, numerous works have been proposed that make use of different data sources. Most of them use sensors embedded in children's garments. This work proposes using environmental sound data to generate a model for recognizing and classifying children's activities through automatic learning techniques, optimized for application on mobile devices. First, a genetic algorithm is used for feature selection, reducing the original size of the dataset, which is an important aspect when working with the limited resources of a mobile device. To evaluate this process, five different classification methods are applied: k-nearest neighbor (k-NN), nearest centroid (NC), artificial neural networks (ANNs), random forest (RF), and recursive partitioning trees (Rpart). Finally, the models obtained are compared on accuracy to identify the classification method that performs best in developing a model that identifies children's activity from audio signals. According to the results, the best performance is achieved by the five-feature model developed through RF, with an accuracy of 0.92, which allows us to conclude that it is possible to automatically classify children's activity from a reduced set of features with significant accuracy.

Feature Selection Approach based on Firefly Algorithm and Chi-square

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i4.pp2338-2350 ◽

2018 ◽

Vol 8 (4) ◽

pp. 2338 ◽

Cited By ~ 1

Author(s):

Emad Mohamed Mashhour ◽

Enas M. F. El Houby ◽

Khaled Tawfik Wassif ◽

Akram I. Salah

Keyword(s):

Feature Selection ◽

Discriminant Analysis ◽

Classification Accuracy ◽

Firefly Algorithm ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Chi Square ◽

Fitness Functions ◽

Selection Approach ◽

Feature Selection Approach

Dimensionality is a well-known challenge for most classifiers when datasets have an unbalanced number of samples and features. Features may contain unreliable data, which may lead the classification process to produce undesirable results. Feature selection is considered a solution to this kind of problem. In this paper, an enhanced firefly algorithm is proposed as a feature selection solution for reducing dimensionality and picking the most informative features for classification. The main purpose of the proposed model is to improve classification accuracy by using the features it selects, thereby decreasing classification errors. The firefly is modeled by simulating firefly position with a cell chi-square value, which changes after every move, and firefly intensity by calculating a set of different fitness functions as a weight for each feature. K-nearest neighbor and discriminant analysis are used as classifiers to test the proposed firefly algorithm in selecting features. Experimental results showed that the proposed enhanced algorithm, based on the firefly algorithm with chi-square and different fitness functions, can provide better results than others. The results also showed that reducing the dataset is useful for gaining higher classification accuracy.

Classification of Fish Species with Image Data Using K-Nearest Neighbor

International Journal of Computer and Information System (IJCIS) ◽

10.29040/ijcis.v2i2.33 ◽

2021 ◽

Vol 2 (2) ◽

pp. 54-58

Author(s):

Kaharuddin Kaharuddin ◽

Eka Wahyu Sholeha

Keyword(s):

Computer Science ◽

Everyday Life ◽

Fish Species ◽

Nearest Neighbor ◽

Image Data ◽

Test Results ◽

Shape Features ◽

K Nearest Neighbor ◽

The World

Classification is a technique that many of us encounter in everyday life. Classification science is also growing and is being applied to various types of data and cases. In computer science, classification has been developed to facilitate human work; one example of its application is classifying the world's fish species. The number of fish species is so large that many people are still sometimes unable to distinguish them. This study therefore classifies fish species using the K-Nearest Neighbor method. The data cover 4 types of fish and total 160 samples. The purpose of the study was to test the K-Nearest Neighbor method for classifying fish species based on color, texture, and shape features. Based on the test results, the highest accuracy, 77.50%, is obtained with K = 7, and the second-highest, 76.88%, with K = 10. It can be concluded that the K-Nearest Neighbor method classifies reasonably well, but further work could add variables, add more data, and use other types of fish.
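
The role of K in this abstract can be seen in a minimal k-nearest-neighbor sketch: the query is labeled by a majority vote among its K closest training samples. The code assumes Euclidean distance over numeric feature vectors; the function name, the two-dimensional features, and the class labels are hypothetical stand-ins for the paper's color, texture, and shape features.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D feature vectors for two fish classes.
train_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
                (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

label = knn_predict(train_points, train_labels, query=(1.1, 1.0), k=3)
```

Trying several values of `k` on held-out data and keeping the one with the highest accuracy is the kind of comparison the abstract reports for K = 7 and K = 10.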

Comparative between optimization feature selection by using classifiers algorithms on spam email

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i6.pp5479-5485 ◽

2019 ◽

Vol 9 (6) ◽

pp. 5479

Author(s):

Ghada Rawashdeh ◽

Rabiei Mamat ◽

Zuriana Binti Abu Bakar ◽

Noor Hafhizah Abd Rahim

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Harmony Search ◽

High Growth ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Spam Filter ◽

Simulating Annealing

Spam mail has become a rising phenomenon in a world that has recently witnessed high growth in the volume of emails. This indicates the need to develop an effective spam filter. At present, classification algorithms for text mining are used to classify emails. This paper describes and evaluates the effectiveness of three popular classifiers combined with optimization-based feature selection methods, such as genetic algorithm, harmony search, particle swarm optimization, and simulated annealing. The research compares the K-Nearest Neighbor (KNN), Naïve Bayesian (NB), and Support Vector Machine (SVM) classifiers on spam classification without feature selection, and also seeks to enhance the reliability of feature selection by proposing optimization-based feature selection to reduce the number of unimportant features.

Land Cover/Land Use Classification of Urban Areas

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001498000300 ◽

1998 ◽

Vol 12 (04) ◽

pp. 475-489 ◽

Cited By ~ 12

Author(s):

Jukka Heikkonen ◽

Aristide Varfis

Keyword(s):

Land Use ◽

Feature Extraction ◽

Feature Selection ◽

Land Cover ◽

Urban Areas ◽

Nearest Neighbor ◽

Self Organizing Map ◽

Land Use Classification ◽

K Nearest Neighbor

This paper proposes a method for remote sensing based land cover/land use classification of urban areas. The method consists of four main stages: feature extraction, feature coding, feature selection, and classification. In the feature extraction stage, statistical, textural, and Gabor features are computed within local image windows of different sizes and orientations to provide a wide variety of potential features for the classification. The features are then encoded and normalized by means of the Self-Organizing Map algorithm. For feature selection, a CART (Classification and Regression Trees) based algorithm was developed to select a subset of features for each class within the classification scheme at hand. The selected subset of features is not tied to any specific classifier. Any classifier capable of representing possibly skewed and multi-modal feature distributions can be employed, such as a multi-layer perceptron (MLP) or k-nearest neighbor (k-NN). The paper reports experiments in land cover/land use classification with Landsat TM and ERS-1 SAR images gathered over the city of Lisbon to show the potential of the proposed method.

Feature Selection Methods for Predicting Household Food Insecurity

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2382.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1560-1568

Keyword(s):

Feature Selection ◽

Food Insecurity ◽

Nearest Neighbor ◽

Support Vector ◽

Household Food Insecurity ◽

Selection Methods ◽

K Nearest Neighbor ◽

Chi Square ◽

Household Food ◽

Specific Subset

Feature selection is a dimension reduction method that selects a specific subset of appropriate features from the original features by removing unnecessary and redundant features that offer no benefit in classification or prediction. In this paper, feature selection was conducted using three kinds of methods, namely filter-based, wrapper-based, and embedded-based, to predict household food insecurity from household income, consumption, and expenditure (HICE) survey data. To implement these feature selection methods, we proposed a new hybrid method that integrates the filter-based feature selection methods, namely feature importance, univariate (chi-square), and correlation coefficient. To validate the efficiency of the proposed feature selection methods, we used five classification algorithms: K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB).
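
Of the three filter criteria named in this abstract, the correlation coefficient is the simplest to show in isolation. The sketch below ranks features by the absolute Pearson correlation with the target and keeps the top k. It covers only that one criterion, not the paper's hybrid method; the function names, column names, and values are hypothetical and do not come from the HICE survey.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no linear relationship
    return cov / math.sqrt(var_x * var_y)

def select_top_k(features, target, k):
    """Rank features by |correlation| with the target and keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical columns; names and values are invented for illustration.
features = {
    "f1": [1, 2, 3, 4],
    "f2": [1, 0, 1, 0],
    "f3": [3, 3, 1, 2],
}
target = [0, 0, 1, 1]  # hypothetical binary outcome

selected, scores = select_top_k(features, target, k=2)
```

Taking the absolute value matters here: a strongly negative correlation is as informative for prediction as a strongly positive one.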
