Klasifikasi Multi Label pada Hadis Bukhari Terjemahan Bahasa Indonesia Menggunakan Mutual Information dan k-Nearest Neighbor

Hadith is the second source of law for Muslims after the Qur'an; it comprises the words, actions, and stipulations of the Prophet Muhammad, collectively referred to as his sunnah. To make it easier for Muslims to apply the teachings of the hadiths, a classification system is needed that can assign a hadith to one class or to a combination of two of the three classes, a task known as multi-label classification. Among the many techniques for building a text classification system is k-Nearest Neighbor (KNN). KNN is a simple and effective method for text classification, but it handles high-dimensional vectors poorly: computation time rises and classification efficiency drops sharply. Mutual Information (MI) is used as a feature selection method to reduce vector dimensions because it indicates how strongly a feature contributes to a correct prediction of a class. In this study, the Problem Transformation Method with the Binary Relevance (BR) approach is used to carry out the multi-label classification. The best result obtained in this study is a Hamming loss of 0.0886, meaning about 91.14% of the data were correctly classified, with a computation time of 595 seconds, using MI for feature selection but without stemming.
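
The relationship between the reported Hamming loss and the share of correctly classified data can be illustrated with a minimal sketch. The toy label matrices below are hypothetical and are not taken from the study; each row is one document and each column is one of three binary labels, as Binary Relevance would produce.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (document, label) slots that were predicted incorrectly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Hypothetical toy data: 4 documents, 3 binary labels each.
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]

loss = hamming_loss(y_true, y_pred)   # 2 wrong slots out of 12
correct_share = 1.0 - loss            # share of label slots predicted correctly

# Applying the same arithmetic to the figure reported in the abstract.
reported_correct_percent = (1.0 - 0.0886) * 100
```

Under this definition, a Hamming loss of 0.0886 corresponds to 91.14% of label assignments being correct, which matches the figure quoted above.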

Unsupervised feature selection with least-squares quadratic mutual information

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v22.i3.pp1619-1628 ◽

2021 ◽

Vol 22 (3) ◽

pp. 1619

Author(s):

Janya Sainui ◽

Chouvanee Srivisal

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Nearest Neighbor ◽

Feature Selection Method ◽

K Nearest Neighbor ◽

Underlying Assumption ◽

Neighbor Graph ◽

High Dependency ◽

Benchmark Datasets ◽

Nearest Neighbor Graph

We propose an unsupervised feature selection method based on the dependency between features. The underlying assumption is that the most important feature should exhibit high dependency between itself and the rest of the features. Therefore, the top m features with maximum dependency scores should be selected, while redundant features should be ignored. The objective function used to evaluate the dependency between features therefore plays a crucial role. Previous methods mainly used mutual information (MI) with an estimator based on the k-nearest neighbor graph, so the estimate depends on the choice of the parameter k, for which there is no systematic selection procedure. This implies that the MI estimator tends to be less reliable. Here, we introduce least-squares quadratic mutual information (LSQMI), which is more sensible because its tuning parameters can be selected by cross-validation. Our experiments show that LSQMI performed better than MI. In addition, we compared the proposed method with three counterpart methods on six UCI benchmark datasets. The results demonstrate that the proposed method is useful for selecting informative features as well as discarding redundant ones.
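
The idea of scoring each feature by its dependency on the remaining features can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain plug-in mutual information estimate on discrete features as a stand-in dependency measure, not the LSQMI estimator the abstract introduces, and it does not implement the redundancy handling the abstract mentions. The function names and toy columns are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_by_dependency(columns):
    """Score each feature by its summed MI with every other feature."""
    scores = []
    for i, col in enumerate(columns):
        scores.append(
            sum(mutual_information(col, other)
                for j, other in enumerate(columns) if j != i)
        )
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy features: f1 duplicates f0, f2 is independent of both.
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]

order, scores = rank_by_dependency([f0, f1, f2])
```

In this toy case the independent feature receives a dependency score of zero and is ranked last, while the two identical features tie; a full method would also need a rule for discarding one of such a redundant pair.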

A CATEGORY CLASSIFICATION ALGORITHM FOR INDONESIAN AND MALAY NEWS DOCUMENTS

Jurnal Teknologi ◽

10.11113/jt.v78.9549 ◽

2016 ◽

Vol 78 (8-2) ◽

Cited By ~ 1

Author(s):

Jafreezal Jaafar ◽

Zul Indra ◽

Nurshuhaini Zamin

Keyword(s):

Feature Selection ◽

Text Classification ◽

Feature Selection Method ◽

Selection Method ◽

Online News ◽

Language Identification ◽

Computational Time ◽

Accuracy Rate ◽

Similar Morphology ◽

Manual Classification

Text classification (TC) provides a better way to organize information because it allows better understanding and interpretation of content. It deals with assigning labels to groups of similar textual documents. However, TC research on Asian-language documents is limited compared with research on English documents, and even more so for news articles. Moreover, TC research on documents in languages with similar morphology, such as Indonesian and Malay, is still scarce. Hence, the aim of this study is to develop an integrated generic TC algorithm that identifies the language and then classifies the category of the identified news documents. Furthermore, a top-n feature selection method is used to improve TC performance and to overcome the challenges of classifying online news corpora: the rapid growth of online news documents and high computational time. Experiments were conducted using 280 Indonesian and 280 Malay online news documents from 2014 to 2015. The classification method produced good results, with accuracy of up to 95.63% for language identification and 97.5% for category classification. The category classifier works best at n = 60%, with an average computational time of 35 seconds. This shows that the integrated generic TC algorithm has an advantage over manual classification and is suitable for Indonesian and Malay news classification.
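
A top-n feature selection step with n expressed as a percentage might look like the sketch below. The abstract does not say how terms are scored, so ranking by document frequency is an assumption made here for illustration, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def top_n_percent_terms(documents, percent):
    """Keep the top `percent` of terms, ranked by document frequency (assumed criterion)."""
    df = Counter()
    for doc in documents:
        df.update(set(doc.lower().split()))  # count each term once per document
    # Sort by descending document frequency, breaking ties alphabetically.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]


# Hypothetical toy corpus of short news-like snippets.
docs = [
    "harga saham naik",
    "harga minyak naik",
    "tim bola menang",
    "tim bola kalah",
    "harga bola naik",
]

selected = top_n_percent_terms(docs, 60)  # 8 distinct terms, so 5 are kept
```

Only the selected terms would then be used to build document vectors, which is where the reduction in computational time would come from.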

Feature Selection Method for Hydraulic System Faults Diagnosis Based on GA-PLS

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.1130 ◽

2010 ◽

Vol 44-47 ◽

pp. 1130-1134

Author(s):

Sheng Li ◽

Pei Lin Zhang ◽

Bing Li

Keyword(s):

Feature Selection ◽

Hydraulic System ◽

Nearest Neighbor ◽

Feature Selection Method ◽

Original Data ◽

Selection Method ◽

Classification Model ◽

K Nearest Neighbor ◽

K Nearest Neighbor Algorithm ◽

Faults Diagnosis

Feature selection is a key step in hydraulic system fault diagnosis. Some of the collected features are unrelated to the classification model, and some are highly correlated with other features. Such features are harmful when building a classification model. To solve this problem, genetic algorithm-partial least squares (GA-PLS) is proposed for selecting representative and optimal features. The K nearest neighbor algorithm (KNN) is used to diagnose and classify hydraulic system faults. To demonstrate the performance of GA-PLS, the original data from a model engineering hydraulic system are used, and the results of GA-PLS are compared with those obtained using all features and using GA alone. The experimental results show that the proposed feature selection method can diagnose and classify hydraulic system faults more efficiently while using fewer features.

A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

GULDEN UCHYIGIT ◽

KEITH CLARK

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major difficulty in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that can be used to determine a document's class, while the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study, including a new feature selection method called the GU metric. The other feature selection methods evaluated are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
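
As an illustration of how one of the listed metrics scores a word, the sketch below computes the standard chi-squared statistic for a term and a class from a 2x2 contingency table of document counts. It does not reproduce the GU metric, whose definition is not given in the abstract; the function name and the example counts are hypothetical.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a term/class pair from document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts: a term spread evenly across classes scores zero,
# while a term that appears only in the class scores highly.
uninformative = chi_squared(10, 10, 10, 10)
informative = chi_squared(20, 0, 0, 20)
```

Terms would be ranked by this score for each class, and only the highest-ranked ones retained in the feature space.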

A mutual information and information entropy pair based feature selection method in text classification

2010 International Conference on Computer Application and System Modeling (ICCASM 2010) ◽

10.1109/iccasm.2010.5620805 ◽

2010 ◽

Author(s):

Zhili Pei ◽

Yuxin Zhou ◽

Lisha Liu ◽

Lihua Wang ◽

Yinan Lu ◽

...

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Text Classification ◽

Information Entropy ◽

Feature Selection Method ◽

Selection Method ◽

Entropy Pair

Text Classification Using K-Nearest Neighbor Algorithm and Firefly Algorithm for Text Feature Selection

Lecture Notes in Electrical Engineering - Advances in Electrical and Computer Technologies ◽

10.1007/978-981-15-5558-9_47 ◽

2020 ◽

pp. 527-539

Author(s):

R. Janani ◽

S. Vijayarani

Keyword(s):

Feature Selection ◽

Text Classification ◽

Firefly Algorithm ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Text Feature ◽

K Nearest Neighbor Algorithm

Metode Deteksi Intrusi Menggunakan Algoritme Extreme Learning Machine dengan Correlation-based Feature Selection

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.0813358 ◽

2021 ◽

Vol 8 (1) ◽

pp. 103

Author(s):

Sulandri Sulandri ◽

Achmad Basuki ◽

Fitra Abdurrachman Bachtiar

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Extreme Learning Machine ◽

Computing Time ◽

Feature Selection Method ◽

Computation Time ◽

Selection Method ◽

Computational Time ◽

Correlation Based Feature Selection ◽

Learning Machine

Intrusion detection in computer networks is a very important activity for maintaining data and information security. Intrusion detection is the process of monitoring traffic on a network to detect data patterns that are considered suspicious and that may indicate a network attack. This research analyzes network traffic to determine whether a packet contains an intrusion or is a normal packet. The traffic data used for intrusion detection in this study were taken from the KDD Cup dataset. Intrusion detection is performed by classification using the Extreme Learning Machine (ELM) method. However, ELM alone does not produce good accuracy, so the Correlation-Based Feature Selection (CFS) method is added to ELM to improve accuracy and computation time. Using ELM alone, the accuracy reached 81.97% with a computation time of 3.39 seconds. After CFS feature selection was added to ELM, the accuracy increased significantly to 98.00% with a computation time of 2.32 seconds.

The Effect of Feature Selection on Music Genre Classification

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v09.i04.p13 ◽

2021 ◽

Vol 9 (4) ◽

pp. 549

Author(s):

I Nyoman Yusha Tresnatama Giri ◽

Luh Arida Ayu Rahning Putri

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Pearson Correlation ◽

Feature Selection Method ◽

K Nearest Neighbor ◽

Genre Classification ◽

Average Accuracy ◽

Independent Variable ◽

Music Genre Classification ◽

Selection Of

One factor that affects classification results is the correlation between features and the class of the data. This research examines the effect of removing the features (independent variables) that have the weakest correlation with the class (dependent variable). Bivariate Pearson Correlation is used as the feature selection method and K-Nearest Neighbor as the classification method. The tests show an average accuracy of 75.1% for classification without feature selection, while with feature selection the average accuracy ranged from 75% to 79.3%. The average accuracy obtained with feature selection tends to be higher than the accuracy obtained without it.
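
The pipeline described above can be sketched in a few lines: score each feature by the absolute Pearson correlation with a numerically encoded class, drop the weakest, and classify with K-Nearest Neighbor on the remaining features. The numeric class encoding, the Euclidean distance, the function names, and the toy data are all assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no linear relationship to report
    return cov / sqrt(vx * vy)


def drop_weakest(rows, labels, n_drop):
    """Return indices of the features kept after dropping the n_drop weakest."""
    n_features = len(rows[0])
    strength = [abs(pearson([r[j] for r in rows], labels))
                for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: strength[j], reverse=True)
    return sorted(ranked[:n_features - n_drop])


def knn_predict(train_rows, train_labels, query, k, keep):
    """Majority vote among the k nearest rows, using only the kept features."""
    dists = sorted(
        (sqrt(sum((row[j] - query[j]) ** 2 for j in keep)), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: features 0 and 1 track the class, feature 2 is noise.
rows = [[1.0, 5.0, 3.0], [1.2, 4.8, 9.0], [0.9, 5.1, 1.0],
        [3.0, 1.0, 8.0], [3.2, 1.2, 2.0], [2.9, 0.9, 4.0]]
labels = [0, 0, 0, 1, 1, 1]

keep = drop_weakest(rows, labels, n_drop=1)
prediction = knn_predict(rows, labels, [3.1, 1.1, 1.0], k=3, keep=keep)
```

Here the noisy third feature is the one removed, so distances are computed only on the two features that correlate with the class.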
