Classification of Hadith Topic of Indonesian Translation Using K-Nearest Neighbor and Chi-Square

Author(s): Ghinaa Zain Nabiilah, Said Al Faraby, Mahendra Dwifebri Purbolaksono

Hadith is, besides the Qur'an, the main guide to life for Muslims and can be applied in everyday life. Hadith also contains the words and deeds of the Prophet Muhammad, which are used as a source of Islamic law. Therefore, many readers, especially Muslims, are interested in studying hadith. However, the large number of hadiths makes them difficult to read, particularly for readers who are still unfamiliar with Islam. We therefore conducted a study to classify hadith texts by type of teaching, so that readers have an overview or an additional reference that makes reading and searching for hadith by type of teaching easier. This study uses KNN as the classifier and chi-square for feature selection. We also ran several test scenarios, including a modified stopword removal step in preprocessing and different values of k for KNN, to determine the best configuration. The best performance was obtained with k = 7 on KNN, without chi-square and with the modified stopword removal, giving a Hamming loss of 0.1042, i.e. about 89.58% of the labels were predicted correctly.
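The abstract reports its result as a Hamming loss and converts it to a percentage of correct predictions (1 - 0.1042 = 0.8958). A minimal sketch of that metric for binary label indicators follows; the three label columns and the toy predictions are hypothetical and are not taken from the study.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots predicted incorrectly (0.0 = perfect)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    wrong = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("every sample must have the same number of labels")
        wrong += sum(t != p for t, p in zip(t_row, p_row))  # count mismatched labels
        total += len(t_row)
    return wrong / total


# Hypothetical example: 3 hadiths, 3 binary teaching-type labels (names illustrative only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss = {loss:.4f}, labels correct = {1 - loss:.2%}")
```

One of the nine label slots is wrong here, so the loss is 1/9; the same subtraction turns the reported 0.1042 into 89.58%.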

2022, Vol 13 (1), pp. 0-0

This research addresses the feature selection problem for sentiment classification with an ensemble-based classifier. It uses a hybrid of the minimum redundancy maximum relevance (mRMR) technique and the Forest Optimization Algorithm (FOA), i.e. mRMR-FOA, for feature selection. Before applying FOA to sentiment analysis, it was evaluated as a feature selection technique on 10 classification datasets publicly available in the UCI Machine Learning Repository. Classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes (NB) were combined in the ensemble-based algorithm for these datasets. mRMR-FOA is applied to Blitzer's dataset (customer reviews of electronic products) to select the significant features. Sentiment classification performance improved by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.
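The mRMR half of mRMR-FOA can be sketched as a greedy loop that scores each candidate by its mutual information with the class minus its mean mutual information with the features already chosen (the "difference" form of mRMR). This is an assumption-laden illustration for discrete features only; the FOA search and the ensemble are not shown, and the toy features are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """I(X;Y) in nats for two equally long sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Greedy mRMR, difference form: relevance minus mean redundancy with the picks so far."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lowest remaining index
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discrete features: f1 is an exact copy of f0, f2 is unrelated to y.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(mrmr([f0, f1, f2], y, k=2))
```

Because f1 duplicates f0, its redundancy penalty outweighs its relevance, so the second pick is f2; this is the "minimum redundancy" behaviour, and a wrapper such as FOA would then refine the candidate subset.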


2018, Vol 150, pp. 06006
Author(s): Rozlini Mohamed, Munirah Mohd Yusof, Noorhaniza Wahidi

Feature selection is the process of selecting the best features from a dataset with a large number of features; the difficulty is to find a subset that performs better with a given classifier. To produce better classification results, feature selection has been applied in many classification works as a preprocessing step, where only a subset of the features is used rather than all features of a dataset. This procedure not only removes irrelevant features but, in some cases, can also increase classification performance because of the finite sample size. In this study, Chi-Square (CH), Information Gain (IG), and the Bat Algorithm (BA) are used to obtain feature subsets for fourteen well-known datasets from various applications. To measure the performance of the selected features, three benchmark classifiers are used: k-Nearest Neighbor (kNN), Naïve Bayes (NB), and Decision Tree (DT). The paper then analyzes the performance of all classifiers with feature selection in terms of accuracy, sensitivity, F-measure, and ROC. The objective of this study is to determine which feature selection technique, conventional or heuristic, performs best across various applications.
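The two conventional filters named above, Chi-Square and Information Gain, both score one feature at a time against the class label. A minimal sketch for discrete features follows (the Bat Algorithm, a heuristic wrapper, is not shown); the toy features are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence) -> float:
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence, labels: Sequence) -> float:
    """IG = H(C) - sum over values v of P(F=v) * H(C | F=v), for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [c for f, c in zip(feature, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature: Sequence, labels: Sequence) -> float:
    """Pearson chi-square statistic of the (feature value x class) contingency table."""
    n = len(labels)
    f_counts, c_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for v, fv in f_counts.items():
        for c, cc in c_counts.items():
            expected = fv * cc / n  # count expected if feature and class were independent
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


y = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical: matches the class exactly
noise = [1, 0, 1, 0, 1, 0, 1, 0]        # hypothetical: independent of the class
for name, f in [("informative", informative), ("noise", noise)]:
    print(name, round(chi_square(f, y), 3), round(information_gain(f, y), 3))
```

Ranking features by either score and keeping the top ones gives the filter subsets that are then passed to kNN, NB, or DT.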


2015, Vol 54, pp. 301-310
Author(s): Mukesh Kumar, Nitish Kumar Rath, Amitav Swain, Santanu Kumar Rath

2020, Vol 2020, pp. 1-12
Author(s): Antonio García-Dominguez, Carlos E. Galván-Tejada, Laura A. Zanella-Calzada, Hamurabi Gamboa-Rosales, Jorge I. Galván-Tejada, ...

In the area of recognizing and classifying children's activities, numerous works have been proposed that use different data sources; most of them rely on sensors embedded in children's garments. In this work, environmental sound data are proposed for building a model that recognizes and classifies children's activities through machine learning techniques, optimized for use on mobile devices. First, a genetic algorithm is used for feature selection, reducing the original size of the dataset, which matters given the limited resources of a mobile device. To evaluate this process, five classification methods are applied: k-nearest neighbor (k-NN), nearest centroid (NC), artificial neural networks (ANNs), random forest (RF), and recursive partitioning trees (Rpart). Finally, the resulting models are compared by accuracy to identify the classification method that performs best for identifying children's activity from audio signals. According to the results, the best performance is achieved by the five-feature model built with RF, with an accuracy of 0.92, which supports the conclusion that children's activity can be classified automatically from a reduced set of features with high accuracy.
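Genetic-algorithm feature selection encodes each candidate subset as a bit mask and evolves a population of masks toward higher fitness. The sketch below uses tournament selection, uniform crossover, bit-flip mutation, and elitism; these operators are assumptions, not the authors' configuration. In a real wrapper the fitness would be a classifier's validation accuracy (for example nearest centroid or k-NN) minus a penalty for large subsets; `toy_fitness` and `USEFUL` are hypothetical stand-ins.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Evolve binary feature masks; the best mask always survives (elitism)."""
    rng = random.Random(seed)
    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]

    def tournament(pop: List[Mask]) -> Mask:
        a, b = rng.sample(pop, 2)  # pick two masks at random, keep the fitter one
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children: List[Mask] = [best]
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # uniform crossover: each gene comes from either parent with probability 0.5
            child = tuple(g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2))
            # bit-flip mutation
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


USEFUL = {0, 2}  # hypothetical: pretend only features 0 and 2 carry signal


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in USEFUL) - 0.1 * sum(mask)  # reward signal, penalize size


best_mask, best_fit = ga_feature_selection(5, toy_fitness)
print(best_mask, best_fit)
```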


Author(s): Emad Mohamed Mashhour, Enas M. F. El Houby, Khaled Tawfik Wassif, Akram I. Salah

Dimensionality is a well-known challenge for most classifiers when datasets have an unbalanced number of samples and features. Features may contain unreliable data, which can lead the classification process to produce undesirable results. Feature selection is considered a solution for this kind of problem. In this paper, an enhanced firefly algorithm is proposed as a feature selection solution for reducing dimensionality and picking the most informative features for classification. The main purpose of the proposed model is to improve classification accuracy by using the features it selects, thereby reducing classification errors. In this research, the firefly position is simulated by a cell chi-square value that changes after every move, and the firefly intensity is simulated by calculating a set of different fitness functions as a weight for each feature. K-nearest neighbor and discriminant analysis are used as classifiers to test the proposed firefly algorithm in selecting features. Experimental results showed that the proposed enhanced algorithm, based on the firefly algorithm with chi-square and different fitness functions, provides better results than the others. The results also showed that reducing the dataset is useful for gaining higher classification accuracy.
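The core of any firefly algorithm is the move rule: a dimmer firefly i moves toward a brighter firefly j with an attractiveness that decays with distance, plus a small random step. The sketch below shows the commonly cited continuous form, x_i + beta0 * exp(-gamma * r^2) * (x_j - x_i) + alpha * (rand - 0.5); it is not the paper's enhanced variant, which maps positions to cell chi-square values and brightness to a set of fitness functions. Parameter defaults are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def firefly_move(
    xi: Sequence[float],
    xj: Sequence[float],
    beta0: float = 1.0,
    gamma: float = 1.0,
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Move firefly i toward a brighter firefly j (standard continuous update)."""
    if rng is None:
        rng = random.Random()
    r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))  # squared Euclidean distance
    beta = beta0 * math.exp(-gamma * r2)            # attractiveness decays with distance
    # attraction toward j plus a uniform random step in [-alpha/2, alpha/2)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5) for a, b in zip(xi, xj)]


# With alpha = 0 the move is deterministic: firefly i covers exp(-1) of the gap to j.
moved = firefly_move([0.0, 0.0], [1.0, 0.0], alpha=0.0, rng=random.Random(42))
print(moved)
```

A full iteration applies this move for every pair (i, j) where j is brighter than i, so fireflies cluster around high-fitness regions.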


Author(s): Kaharuddin Kaharuddin, Eka Wahyu Sholeha

Classification is a technique that many of us encounter in everyday life. Classification science keeps growing and is being applied to many types of data and cases, and in computer science it has been developed to facilitate human work. One example of its application is classifying the world's fish species: there are so many species that many people still find them hard to distinguish. Therefore, this study classifies fish species using the K-Nearest Neighbor method. The data cover 4 types of fish, with 160 samples in total. The purpose of this study was to test the K-Nearest Neighbor method for classifying fish species based on color, texture, and shape features. Based on the test results, the highest accuracy, 77.50%, is obtained with K = 7, and the second-highest, 76.88%, with K = 10. Based on these results, it can be concluded that the K-Nearest Neighbor method has a reasonably good classification ability, which could be improved by adding variables, adding more data, and using other types of fish.
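K-Nearest Neighbor, as used above, labels a new sample by majority vote among the K closest training samples. The sketch below uses Euclidean distance and K = 7; the three-number feature vectors standing in for color, texture, and shape descriptors, and the species names, are hypothetical. In practice the features should be scaled to comparable ranges first.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 7) -> str:
    """Majority vote among the k training samples closest to `query` (Euclidean)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # on a tie, most_common keeps first-seen order, i.e. the label with the nearest member wins
    return votes.most_common(1)[0][0]


# Hypothetical (color, texture, shape) descriptors for two species.
train = [
    ([0.10, 0.20, 0.30], "species_a"),
    ([0.12, 0.22, 0.28], "species_a"),
    ([0.08, 0.18, 0.33], "species_a"),
    ([0.11, 0.25, 0.31], "species_a"),
    ([0.80, 0.70, 0.60], "species_b"),
    ([0.82, 0.68, 0.62], "species_b"),
    ([0.78, 0.72, 0.58], "species_b"),
    ([0.85, 0.66, 0.61], "species_b"),
]
print(knn_predict(train, [0.12, 0.20, 0.30], k=7))
```

With K = 7 the vote here is 4 to 3, which shows why larger K values smooth predictions but can let a distant class nearly outvote the local one.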


Author(s): Ghada Rawashdeh, Rabiei Mamat, Zuriana Binti Abu Bakar, Noor Hafhizah Abd Rahim

Spam mail has become a growing phenomenon in a world that has recently witnessed strong growth in the volume of email. This indicates the need to develop an effective spam filter. At present, text mining classification algorithms are used to classify emails. This paper describes and evaluates the effectiveness of three popular classifiers combined with optimization-based feature selection methods such as the genetic algorithm, harmony search, particle swarm optimization, and simulated annealing. The research compares the K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Support Vector Machine (SVM) classifiers for spam classification without feature selection, and it also enhances the reliability of feature selection by proposing optimization-based feature selection to reduce the number of unimportant features.
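Of the three classifiers compared, Naïve Bayes is the simplest to write down for bag-of-words email text. The sketch below is a multinomial Naive Bayes with add-one (Laplace) smoothing; the choice of the multinomial variant, the tiny corpus, and the labels are assumptions for illustration, and the optimization-based feature selection step is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


class MultinomialNB:
    """Multinomial Naive Bayes over word tokens with add-one (Laplace) smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior: Dict[str, float] = {}
        self.log_lik: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            total = sum(counts.values()) + len(self.vocab)  # smoothed denominator
            self.log_lik[c] = {w: math.log((counts[w] + 1) / total) for w in self.vocab}
        return self

    def predict(self, doc: List[str]) -> str:
        def score(c: str) -> float:
            # log P(c) + sum of log P(w | c); words unseen in training are ignored
            return self.log_prior[c] + sum(self.log_lik[c][w] for w in doc if w in self.vocab)

        return max(self.classes, key=score)


# Hypothetical toy corpus.
docs = [
    ["win", "money", "now"],
    ["win", "prize", "now"],
    ["meeting", "tomorrow", "agenda"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["win", "money"]), model.predict(["meeting", "agenda"]))
```

Feature selection would shrink `vocab` before fitting, which is where the genetic, harmony search, PSO, or annealing search would plug in.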


Author(s): Jukka Heikkonen, Aristide Varfis

This paper proposes a method for remote-sensing-based land cover/land use classification of urban areas. The method consists of four main stages: feature extraction, feature coding, feature selection, and classification. In the feature extraction stage, statistical, textural, and Gabor features are computed within local image windows of different sizes and orientations to provide a wide variety of potential features for the classification. The features are then encoded and normalized by means of the Self-Organizing Map algorithm. For feature selection, a CART (Classification and Regression Trees) based algorithm was developed to select a subset of features for each class within the classification scheme at hand. The selected subset of features is not tied to any specific classifier: any classifier capable of representing possibly skewed and multi-modal feature distributions can be employed, such as a multi-layer perceptron (MLP) or k-nearest neighbor (k-NN). The paper reports land cover/land use classification experiments with Landsat TM and ERS-1 SAR images gathered over the city of Lisbon to show the potential of the proposed method.
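CART chooses splits by the decrease in Gini impurity, so one simple way to rank features CART-style is to score each feature by its best single-threshold Gini decrease. This is a simplified stand-in, not the authors' algorithm; for per-class selection as described above, the labels could be converted one-vs-rest (`label == cls`) before ranking. The data and class names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def gini(labels: Sequence) -> float:
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values: Sequence[float], labels: Sequence) -> Tuple[float, float]:
    """Best single-threshold Gini decrease for one feature and its threshold."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best_gain, best_thr = 0.0, float("nan")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_gain, best_thr = gain, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_thr


def rank_features(X: List[List[float]], y: Sequence) -> List[int]:
    """Feature indices ordered by best Gini decrease, highest first."""
    gains = [best_split_gain([row[j] for row in X], y)[0] for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)


# Hypothetical pixels-window features and classes.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
y = ["urban", "urban", "water", "water"]
print(rank_features(X, y))
```

Feature 0 separates the classes perfectly at threshold 0.5, while feature 1 at best isolates one sample, so feature 0 ranks first.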


2020, Vol 9 (1), pp. 1560-1568

Feature selection is a dimension reduction method used to select a subset of appropriate features from the original features by removing unnecessary and redundant features that do not benefit classification or prediction. In this paper, three feature selection approaches, namely filter-based, wrapper-based, and embedded, were used to predict household food insecurity from household income, consumption, and expenditure survey data (HICE). In implementing these methods, we proposed a new hybrid method that integrates three filter-based feature selection methods: feature importance, univariate selection (chi-square), and the correlation coefficient. To validate the efficiency of the proposed feature selection methods, we used five classification algorithms: K-Nearest Neighbor (KNN), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB).
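The abstract does not say how the three filter scores are integrated. One plausible scheme, assumed here, is rank averaging: rank the features under each score, average the ranks, and keep the best-ranked ones. The sketch computes the absolute Pearson correlation from toy data; the chi-square and feature-importance numbers are hypothetical placeholders.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson_abs(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def ranks_desc(scores: Sequence[float]) -> List[int]:
    """Rank position of each feature under one score (0 = highest)."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    rank = [0] * len(scores)
    for position, j in enumerate(order):
        rank[j] = position
    return rank


def hybrid_select(score_lists: Sequence[Sequence[float]], top_k: int) -> List[int]:
    """Average each feature's rank across several filter scores; keep the top_k."""
    all_ranks = [ranks_desc(s) for s in score_lists]
    n = len(score_lists[0])
    avg = [mean(r[j] for r in all_ranks) for j in range(n)]
    return sorted(range(n), key=lambda j: avg[j])[:top_k]


target = [0, 0, 1, 1]  # hypothetical food-insecurity flag
columns = [[1, 2, 3, 4], [4, 1, 1, 4], [0, 0, 1, 1]]  # hypothetical survey features
correlation = [pearson_abs(col, target) for col in columns]
chi_scores = [3.0, 0.5, 2.0]   # hypothetical chi-square scores
importance = [0.4, 0.1, 0.5]   # hypothetical model-based importances
print(hybrid_select([correlation, chi_scores, importance], top_k=2))
```

Rank averaging avoids having to put chi-square statistics, correlations, and importances on a common scale, at the cost of ignoring how large the score gaps are.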

