Multi-Label Classification of Indonesian-Translated Bukhari Hadith Using Mutual Information and k-Nearest Neighbor

2020, Vol 9 (3), pp. 357-364
Author(s): Afrian Hanafi, Adiwijaya Adiwijaya, Widi Astuti

Hadith is the second source of law for Muslims after the Qur'an. It consists of the words, actions, and stipulations of the Prophet Muhammad, collectively referred to as his sunnah. To make it easier for Muslims to apply the teachings of the hadiths, a classification system is needed that can assign a hadith to one of three classes or to a combination of two of them, which is called multi-label classification. Various techniques can be used to build a text classification system, one of which is k-Nearest Neighbor (KNN). KNN is a simple and effective method for text classification, but it handles high-dimensional feature vectors poorly: computation time grows and classification efficiency becomes very low. Mutual Information (MI) is used as a feature selection method to reduce the vector dimension because it measures how strongly a feature contributes to a correct prediction of a class. In this study, the problem transformation method with the Binary Relevance (BR) approach is used to carry out multi-label classification. The best result was a Hamming loss of 0.0886, meaning about 91.14% of label assignments were correct, with a computation time of 595 seconds, obtained using MI for feature selection and without stemming.
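
The pipeline described above (MI feature selection, then Binary Relevance with one independent KNN decision per label, evaluated by Hamming loss) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be sets of already-tokenized terms, and the similarity measure (Jaccard over the selected terms), the parameters `top_n` and `k`, the toy labels, and all function names are hypothetical choices for the example.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """Plug-in MI (in bits) between term presence and a binary label."""
    n = len(docs)
    presence = [term in d for d in docs]
    joint = Counter(zip(presence, labels))
    p_term, p_label = Counter(presence), Counter(labels)
    mi = 0.0
    for (t, y), count in joint.items():
        p_ty = count / n
        mi += p_ty * math.log2(p_ty / ((p_term[t] / n) * (p_label[y] / n)))
    return mi

def select_features(docs, labels, top_n):
    """Keep the top_n terms with the highest MI for one label (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return set(ranked[:top_n])

def knn_predict(train_docs, train_labels, query, features, k):
    """Majority vote of the k most similar training docs (assumed: Jaccard on selected terms)."""
    q = query & features

    def similarity(doc):
        d = doc & features
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    ranked = sorted(range(len(train_docs)), key=lambda i: -similarity(train_docs[i]))
    votes = sum(train_labels[i] for i in ranked[:k])
    return 1 if 2 * votes > k else 0

def binary_relevance_knn(train_docs, Y, query, top_n=50, k=3):
    """Binary Relevance: a separate MI selection + KNN decision for each label column of Y."""
    prediction = []
    for j in range(len(Y[0])):
        y_j = [row[j] for row in Y]
        features = select_features(train_docs, y_j, top_n)
        prediction.append(knn_predict(train_docs, y_j, query, features, k))
    return prediction

def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label assignments that are wrong."""
    wrong = sum(a != b for yt, yp in zip(Y_true, Y_pred) for a, b in zip(yt, yp))
    return wrong / (len(Y_true) * len(Y_true[0]))

# Toy usage with three hypothetical labels.
train_docs = [{"pray", "mosque"}, {"pray", "fast"}, {"forbid", "wine"},
              {"forbid", "gamble"}, {"story", "prophet"}, {"story", "journey"}]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
print(binary_relevance_knn(train_docs, Y, {"pray", "night"}, top_n=2, k=1))  # [1, 0, 0]
```

Because Hamming loss averages errors over individual label decisions rather than whole documents, 1 − 0.0886 = 0.9114 is where the 91.14% figure comes from.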

Author(s): Janya Sainui, Chouvanee Srivisal

We propose an unsupervised feature selection method based on the dependency between features. The underlying assumption is that the most important features have high dependency with the rest of the features. Therefore, the top m features with the highest dependency scores should be selected, while redundant features should be ignored. The objective function used to evaluate the dependency between features plays a crucial role in this selection. However, previous methods mainly used mutual information (MI) with an estimator based on the k-nearest neighbor graph, so the estimate depends on the parameter k, for which there is no systematic selection method. This makes the MI estimator less reliable. Here, we introduce least-squares quadratic mutual information (LSQMI), which is more practical because its tuning parameters can be selected by cross-validation. Experiments show that LSQMI performs better than MI. In addition, we compared the proposed method with three competing methods on six UCI benchmark datasets. The results demonstrate that the proposed method is useful for selecting informative features and discarding redundant ones.
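
The selection principle (rank features by how strongly they depend on the other features, then skip redundant ones) can be sketched as follows. This sketch deliberately swaps in a simple plug-in MI estimate on equal-width bins instead of LSQMI, which fits a kernel model by least squares and is not reproduced here. The redundancy rule (penalize a candidate by its largest MI with an already-selected feature) and the names `bins`, `redundancy_weight`, and `select_unsupervised` are assumptions for illustration.

```python
import math
from collections import Counter

def discretize(column, bins=4):
    """Equal-width binning so a plug-in (histogram) MI estimate can be used."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # constant column: everything falls in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in column]

def plugin_mi(x, y):
    """Plug-in MI in nats between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def select_unsupervised(X, m, bins=4, redundancy_weight=1.0):
    """Greedily pick m features with high mean dependency on the others (needs >= 2 features)."""
    cols = [discretize(list(col), bins) for col in zip(*X)]
    d = len(cols)
    mi = [[plugin_mi(cols[i], cols[j]) if i != j else 0.0 for j in range(d)] for i in range(d)]
    dependency = [sum(row) / (d - 1) for row in mi]  # mean MI with every other feature
    selected = []
    while len(selected) < min(m, d):
        def score(i):
            # Assumed redundancy rule: subtract the largest MI with an already-selected feature.
            redundancy = max((mi[i][s] for s in selected), default=0.0)
            return dependency[i] - redundancy_weight * redundancy
        selected.append(max((i for i in range(d) if i not in selected), key=score))
    return selected

# Column 1 is an exact copy of column 0; column 2 is independent of both.
X = [[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 1],
     [1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]]
print(select_unsupervised(X, m=2))  # [0, 2]
```

Without the redundancy penalty, the copy in column 1 would be picked second because it has the same dependency score as column 0; the penalty is what makes the method skip it.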


2016, Vol 78 (8-2)
Author(s): Jafreezal Jaafar, Zul Indra, Nurshuhaini Zamin

Text classification (TC) provides a better way to organize information because it supports understanding and interpretation of the content. It deals with assigning labels to groups of similar textual documents. However, TC research on Asian-language documents is relatively limited compared with English documents, and even more limited for news articles. Moreover, TC research on documents in morphologically similar languages such as Indonesian and Malay is still scarce. Hence, the aim of this study is to develop an integrated generic TC algorithm that first identifies the language of a news document and then classifies its category. A top-n feature selection method is used to improve TC performance and to address two challenges of classifying online news corpora: the rapid growth of online news documents and high computational time. Experiments were conducted on 280 Indonesian and 280 Malay online news documents from 2014 to 2015. The method achieved an accuracy of up to 95.63% for language identification and 97.5% for category classification. The category classifier works best at n = 60%, with an average computation time of 35 seconds. These results indicate that the integrated generic TC has an advantage over manual classification and is suitable for Indonesian and Malay news classification.
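
The abstract does not state how terms are scored for the top-n cut, so the sketch below assumes document frequency as the score and treats n as a percentage of the vocabulary, matching the reported n = 60%. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def top_n_percent(docs, n_percent):
    """Return the top n_percent of terms ranked by document frequency (assumed score)."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    keep = max(1, math.ceil(len(ranked) * n_percent / 100))
    return set(ranked[:keep])

def filter_docs(docs, vocabulary):
    """Drop every token that is not in the selected vocabulary."""
    return [[t for t in doc if t in vocabulary] for doc in docs]

docs = [["harga", "minyak", "naik"], ["harga", "emas"], ["harga", "saham"], ["banjir"]]
vocab = top_n_percent(docs, 60)  # 6 distinct terms, 60% rounds up to 4
print(sorted(vocab))             # ['banjir', 'emas', 'harga', 'minyak']
print(filter_docs(docs, vocab))
```

Shrinking every document to the kept vocabulary is what reduces the computational cost of the downstream classifier.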


2010, Vol 44-47, pp. 1130-1134
Author(s): Sheng Li, Pei Lin Zhang, Bing Li

Feature selection is a key step in hydraulic system fault diagnosis. Some of the collected features are unrelated to the classification model, and some are highly correlated with other features; both kinds are harmful when building the classification model. To solve this problem, genetic algorithm-partial least squares (GA-PLS) is proposed for selecting representative and optimal features, and the k-nearest neighbor algorithm (KNN) is used to diagnose and classify hydraulic system faults. To evaluate GA-PLS, original data from a model engineering hydraulic system are used, and the results of GA-PLS are compared with those obtained using all features and using GA alone. The experimental results show that the proposed feature selection method diagnoses and classifies hydraulic system faults more efficiently while using fewer features.
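
GA-based feature selection is a wrapper: a genetic algorithm searches over feature subsets, and each subset is scored by a model fitted on it. The sketch below keeps the GA part (bitmask chromosomes, tournament selection, uniform crossover, bit-flip mutation, elitism) but swaps the PLS-based fitness for leave-one-out 1-NN accuracy minus a small per-feature penalty, so it is a GA + KNN illustration rather than GA-PLS. All parameter values, the synthetic data, and the function names are assumptions.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked features."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in idx)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, size_penalty=0.01):
    """Accuracy minus a small cost per selected feature, favouring smaller subsets."""
    return loo_1nn_accuracy(X, y, mask) - size_penalty * sum(mask)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, size_penalty=0.01, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask, size_penalty)
        return cache[key]

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fit(a) >= fit(b) else b

    # Seed with the all-features mask so the result is never worse than using every feature.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size - 1)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        candidate = max(pop, key=fit)
        if fit(candidate) > fit(best):
            best = candidate
    return best

# Synthetic data: feature 0 tracks the class, features 1 and 2 are noise.
rng = random.Random(1)
y = [0, 1] * 10
X = [[label + rng.uniform(-0.2, 0.2), rng.uniform(0, 1), rng.uniform(0, 1)] for label in y]
mask = ga_select(X, y)
print(mask, fitness(X, y, mask))
```

Seeding the population with the all-features mask and keeping the best mask each generation guarantees the returned subset scores at least as well as using all features, which mirrors the comparison the abstract reports.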


Author(s): Gulden Uchyigit, Keith Clark

Text classification is the problem of assigning a set of documents to a predefined set of classes. A major problem in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that can be used to determine a document's class; the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, which significantly reduces the number of words in the feature space. In this paper, we present a comparative study of feature selection methods for text classification. Ten feature selection methods were evaluated, including a new method called the GU metric. The other nine methods are the Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups dataset with a naive Bayes probabilistic classifier.
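
Several of the compared metrics are simple functions of a term/class contingency table. The sketch below uses the common notation A (documents in class c containing term t), B (outside c, containing t), C (in c, without t), and D (outside c, without t), with N = A + B + C + D, and implements the standard forms of χ2, NGL, GSS, and pointwise MI. The GU metric is not reproduced because its definition is not given here; the function names and toy documents are hypothetical.

```python
import math

def contingency(docs, labels, term, target):
    """Count A, B, C, D for one term and one target class."""
    A = B = C = D = 0
    for doc, label in zip(docs, labels):
        has_term, in_class = term in doc, label == target
        if has_term and in_class:
            A += 1
        elif has_term:
            B += 1
        elif in_class:
            C += 1
        else:
            D += 1
    return A, B, C, D

def chi_squared(A, B, C, D):
    # N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def ngl(A, B, C, D):
    # Signed square root of chi-squared: terms negatively associated with c score below zero.
    N = A + B + C + D
    denom = math.sqrt((A + C) * (B + D) * (A + B) * (C + D))
    return math.sqrt(N) * (A * D - C * B) / denom if denom else 0.0

def gss(A, B, C, D):
    # P(t,c) P(~t,~c) - P(t,~c) P(~t,c)
    N = A + B + C + D
    return (A * D - B * C) / N ** 2

def pointwise_mi(A, B, C, D):
    # log P(t,c) / (P(t) P(c)); -inf when the term never occurs in the class
    N = A + B + C + D
    return math.log(A * N / ((A + B) * (A + C))) if A else float("-inf")

docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "goal"}]
labels = ["sport", "sport", "politics", "politics"]
counts = contingency(docs, labels, "goal", "sport")  # (2, 1, 0, 1)
print(counts, chi_squared(*counts), ngl(*counts), gss(*counts), pointwise_mi(*counts))
```

Each function scores a single (term, class) pair; a feature selector ranks the vocabulary by one of these scores per class and keeps the top terms.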


2021, Vol 8 (1), pp. 103
Author(s): Sulandri Sulandri, Achmad Basuki, Fitra Abdurrachman Bachtiar

Intrusion detection in computer networks is essential for keeping data and information secure. It is the process of monitoring network traffic to detect suspicious data patterns that may indicate a network attack. This research analyzes network traffic to determine whether each packet contains an intrusion or is a normal packet. The traffic data used for intrusion detection in this study were taken from the KDD Cup dataset. Intrusion detection is performed as classification using the Extreme Learning Machine (ELM) method. However, ELM alone does not produce good accuracy, so Correlation-Based Feature Selection (CFS) is added to ELM to improve accuracy and computation time. Using ELM alone, the accuracy reached 81.97% with a computation time of 3.39 seconds. After adding CFS feature selection to ELM, the accuracy increased significantly to 98.00% with a computation time of 2.32 seconds.
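
CFS scores a feature subset of size k with Hall's merit, Merit = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff the mean feature-feature correlation, so subsets that are predictive but not mutually redundant score highest. The sketch below assumes absolute Pearson correlation as the correlation measure and a greedy forward search that stops when the merit stops improving (CFS is also commonly run with symmetric uncertainty and best-first search); the ELM classifier itself is not shown, and the names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant sequence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def cfs_merit(subset, r_cf, r_ff):
    """Hall's merit for a subset of feature indices."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in subset) / k
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def cfs_forward(X, y):
    """Greedy forward search: add the feature that most improves merit, stop when none does."""
    cols = [list(c) for c in zip(*X)]
    d = len(cols)
    r_cf = [abs(pearson(c, y)) for c in cols]
    r_ff = [[abs(pearson(cols[i], cols[j])) for j in range(d)] for i in range(d)]
    selected, best = [], 0.0
    while True:
        candidates = [(cfs_merit(selected + [i], r_cf, r_ff), i) for i in range(d) if i not in selected]
        if not candidates:
            break
        merit, i = max(candidates)
        if merit <= best:
            break
        selected.append(i)
        best = merit
    return selected, best

y = [0, 1, 0, 1, 1, 0]
noise = [1, 1, 0, 0, 1, 0]
X = [[v, v, n] for v, n in zip(y, noise)]  # column 1 duplicates column 0
print(cfs_forward(X, y))
```

Adding the duplicate column leaves the merit unchanged (the redundancy term in the denominator cancels the gain), so the search stops after one of the two identical columns.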


2021, Vol 9 (4), pp. 549
Author(s): I Nyoman Yusha Tresnatama Giri, Luh Arida Ayu Rahning Putri

One factor that affects classification results is the correlation between the features and the class of the data. This research was conducted to determine the effect of removing the features (independent variables) that have the weakest correlation with, or the most distant relationship to, the class (dependent variable). Bivariate Pearson correlation is used as the feature selection method and k-nearest neighbor as the classification method. The tests show an average accuracy of 75.1% without feature selection and between 75% and 79.3% with feature selection. Average accuracy with feature selection therefore tends to be higher than without it.
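
Removing the features least correlated with the class and then classifying with KNN can be sketched as follows. The sketch assumes numeric features, a numeric (for example 0/1) class encoding so that the Pearson correlation with the class is defined, Euclidean distance, and a hypothetical `n_drop` parameter for how many of the weakest features to remove; the function names and toy data are also illustrative.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant sequence."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def drop_weakest(X, y, n_drop):
    """Indices of the features kept after removing the n_drop with the weakest |r| to the class."""
    columns = list(zip(*X))
    strength = [abs(pearson(col, y)) for col in columns]
    weakest = set(sorted(range(len(columns)), key=lambda j: strength[j])[:n_drop])
    return [j for j in range(len(columns)) if j not in weakest]

def knn_classify(X_train, y_train, x, kept, k=3):
    """Majority label among the k nearest training rows, using only the kept features."""
    def project(row):
        return [row[j] for j in kept]
    neighbours = sorted(zip(X_train, y_train), key=lambda pair: math.dist(project(pair[0]), project(x)))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

X = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]]
y = [0, 0, 1, 1]
kept = drop_weakest(X, y, n_drop=1)  # column 1 has zero correlation with y
print(kept, knn_classify(X, y, [1.05, 3.0], kept))
```

Varying `n_drop` and measuring accuracy for each value is one way to reproduce the kind of accuracy range the abstract reports.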

