QUERY ANSWERING SYSTEM OF SHAHIH HADITH MUTTAFAQUN ‘ALAIH USING INDONESIAN THESAURUS BASED ON QUERY EXPANSION AND NAÏVE BAYES CLASSIFIER

Abstract— Hadith are the words, deeds, and provisions of the Prophet Muhammad SAW, which serve as the second source of Islamic law after the Al-Quran. The purpose of this study is to build an information retrieval system, called the Query Answering System, that is expected to help users search for and find hadith documents that match their needs. The study implements the Naïve Bayes Classifier method combined with an Indonesian thesaurus for query expansion to find the hadith documents relevant to the input query. In testing on 50 queries, query expansion gave better results than no query expansion. For the top-1 result without query expansion, the system obtained an average recall of 62%, an average precision of 62%, an average accuracy of 92.4%, and an average F-measure of 62%; with query expansion, it obtained an average recall of 66%, an average precision of 66%, an average accuracy of 93.2%, and an average F-measure of 66%. Query expansion thus improved average recall, precision, and F-measure by 4 percentage points each, and average accuracy by 0.8 percentage points. Index Terms—hadith, information retrieval, query expansion, naïve bayes.
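
The query expansion step described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the abstract does not give the contents of the Indonesian thesaurus, so `TOY_THESAURUS` and its entries, as well as the function name `expand_query`, are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy thesaurus for illustration only; the real Indonesian
# thesaurus used in the study is not described in the abstract.
TOY_THESAURUS: Dict[str, List[str]] = {
    "puasa": ["saum", "shaum"],
    "sedekah": ["derma", "infak"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the query tokens followed by their thesaurus synonyms.

    Original tokens keep their order; synonyms are appended once each.
    """
    tokens = query.lower().split()
    expanded = list(tokens)
    for token in tokens:
        for synonym in thesaurus.get(token, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_query("keutamaan puasa", TOY_THESAURUS))
# -> ['keutamaan', 'puasa', 'saum', 'shaum']
```

The expanded token list would then be passed to the classifier in place of the original query, so that documents using a synonym of a query term can still be matched.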

Applying Similarity Measures to Improve Query Expansion

Iraqi Journal of Science ◽

10.24996/ijs.2021.62.6.31 ◽

2021 ◽

pp. 2053-2063

Author(s):

Wajih A. Ghani A. Hussain

Keyword(s):

Information Technologies ◽

Query Expansion ◽

Similarity Measures ◽

Relevant Information ◽

Jaccard Similarity ◽

Average Precision ◽

Average Recall ◽

Retrieval Efficiency ◽

Data Source ◽

F Measure

The rapid evolution of information technologies, especially in the last few decades, has increased the volume of data on the World Wide Web, which is still growing significantly. Retrieving relevant information from the Internet or any other data source with a query of only a few words has become a major challenge. To overcome this, query expansion (QE) plays an important role in improving information retrieval (IR): the user's original query is rewritten as a new query by appending related terms of the same importance. One problem in query expansion is choosing suitable terms, which leads to the further challenge of retrieving the important documents with high precision, high recall, and a high F-measure. In this paper, we address this problem by applying different similarity measures together with English WordNet. The results show that, with a suitable selection method, English WordNet can be used to improve retrieval efficiency. The proposed approach extracts the terms from all the documents and the query and then applies the following steps: preprocessing, expanding the query based on English WordNet, selecting the best terms, weighting the terms, and finally using cosine similarity and Jaccard similarity to obtain the relevant documents. The experiments were run on the DUC2002 dataset, which contains 559 documents distributed over several categories. The average precision of cosine similarity (for random queries) was 100%, whereas that of Jaccard similarity was 84.4%; the average recall of cosine similarity was 86.8%, whereas that of Jaccard similarity was 73.4%; and the average F-measure of cosine similarity was 92%, whereas that of Jaccard similarity was 76%.
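
The two similarity measures named in the final step can be sketched as below. This is a simplified illustration that assumes raw term frequencies as weights; the paper's own term-weighting scheme is not specified in the abstract, and the example token lists are made up.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine of the angle between the term-frequency vectors of a and b."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: List[str], b: List[str]) -> float:
    """Size of the shared vocabulary divided by the size of the joint vocabulary."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


# Made-up example: an expanded query and one document, already tokenized.
query_terms = ["query", "expansion", "wordnet"]
doc_terms = ["query", "expansion", "improves", "retrieval"]
print(cosine_similarity(query_terms, doc_terms))
print(jaccard_similarity(query_terms, doc_terms))
```

Cosine similarity takes term weights into account, while Jaccard similarity only looks at which terms are present, which is one plausible reason the two measures rank documents differently.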

A Naïve Bayes Approach to Classifying Topics in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8945 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8945 ◽

Cited By ~ 9

Author(s):

Irena Spasić ◽

Pete Burnap ◽

Mark Greenwood ◽

Michael Arribas-Ayllon

Keyword(s):

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Suicide Notes ◽

Matching Rules ◽

F Measure

The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico–semantic properties of individual words in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data consisting of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern–matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved the F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
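
As a rough illustration of the micro-averaged F-measure used to rank the systems, the sketch below pools true positives, false positives, and false negatives over all labels before computing precision and recall. The label names and counts are hypothetical and are not taken from the challenge data.

```python
from typing import Dict, Tuple


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F-measure.

    counts maps each label to (true positives, false positives, false negatives).
    The counts are summed over labels first, so frequent labels weigh more.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical per-label counts, for illustration only.
example_counts = {"label_a": (30, 20, 10), "label_b": (10, 10, 30)}
print(micro_prf(example_counts))
```

Because the counts are pooled, a system that does well on the most frequent topics can score a high micro-average even if it misses rare ones.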

ANALISIS SENTIMEN PENGGUNA TWITTER TERHADAP COVID-19 DI INDONESIA MENGGUNAKAN METODE NAIVE BAYES CLASSIFIER (NBC)

JATI (Jurnal Mahasiswa Teknik Informatika) ◽

10.36040/jati.v4i2.2695 ◽

2020 ◽

Vol 4 (2) ◽

pp. 214-221

Author(s):

Maria Mega Mala Olhang ◽

Sentot Achmadi ◽

F.X Ari Wibisono

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Hand Sanitizer ◽

F Measure

Social media, Twitter in particular, currently carries a great deal of discussion about the spread of the coronavirus, better known as COVID-19. Starting with the discovery of the first case in Wuhan, China, coverage of the coronavirus has continued as its spread reached Indonesia. Coverage on Twitter of the impact of COVID-19 attracts a great deal of public interest and criticism; it includes the rising prices of staple goods, including masks and hand sanitizer, as well as expressions of public agreement and disagreement with government policy, which is regarded as insufficiently responsive in handling the outbreak. This study analyzes public sentiment in the opinions expressed on Twitter by developing a system, modeled on various existing systems, that uses the Naïve Bayes Classifier method to classify sentiment. The input to the system consists of tweets obtained from Twitter using keywords such as #coronavirusindonesia or #covid-19, with no more than 500 tweets in total. The output is a grouping of each tweet into positive or negative sentiment after it has passed the preprocessing stage. In testing on 75 tweets, the system obtained a recall of 32%, a precision of 80%, an F-measure of 45%, and an average accuracy of 36%.

A Comparative Approach of Dimensionality Reduction Techniques in Text Classification

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.3146 ◽

2019 ◽

Vol 9 (6) ◽

pp. 4974-4979 ◽

Cited By ~ 2

Author(s):

S. Rahamat Basha ◽

J. K. Rani

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Comparative Approach ◽

Bayes Classifier ◽

Average Precision ◽

Data Set ◽

Reduction Techniques ◽

Dimensionality Reduction Techniques ◽

Document Categorization ◽

Feature Selection Techniques

This work deals with document classification, a supervised learning task (it needs a labeled document set for training and a test set of documents to be classified). The procedure of document categorization consists of a sequence of steps: text preprocessing, feature extraction, and classification. In this work, a self-made data set was used to train the classifiers in every experiment. The work compares accuracy, average precision, precision, and recall for two classifiers (KNN and Naive Bayes), with and without combinations of several feature selection techniques. The results show that the Naive Bayes classifier performed better in many situations.

Analisis Sentimen New Normal Pada Masa Covid-19 Menggunakan Algoritma Naive Bayes Classifier

Jointer - Journal of Informatics Engineering ◽

10.53682/jointer.v2i01.33 ◽

2021 ◽

Vol 2 (01) ◽

pp. 16-23

Author(s):

Shania Kaparang ◽

Daniel Riano Kaparang ◽

Vivi Pegie Rantung

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

New Normal ◽

Testing Data ◽

F Measure

The impact of the COVID-19 pandemic is so large that the government needs policies to reduce it. One such policy is the new normal, which requires everyone to wear a mask, keep their distance, and wash their hands. Its implementation has naturally prompted both positive and negative sentiments, which are posted on Twitter. This study aims to build a model for analyzing public sentiment toward the government's new normal policy during the COVID-19 pandemic in Indonesia. The stages of the study are data crawling, labeling, removal of neutral data, preprocessing, splitting into training data and testing data, building the naïve Bayes classification system, testing the system, and visualizing the results with a word cloud. The classification system achieved an accuracy of 80.37%, a precision of 87.38%, a recall of 82.57%, and an F-measure of 84.91%. In the results, 5194 tweets were classified as positive sentiment and 2908 tweets as negative sentiment, which shows that positive sentiment outnumbers negative sentiment. The counts also show, however, that the gap between positive and negative sentiment is not very large, meaning that part of the public still responds poorly to the government's new normal policy during the pandemic.
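
The four performance figures reported above are all derived from a binary confusion matrix. The sketch below shows the standard formulas; the function names and the example counts are illustrative assumptions, since the abstract does not give the study's confusion matrix.

```python
from typing import Dict


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F-measure for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


# Hypothetical counts, not the study's data.
print(binary_metrics(tp=80, fp=10, fn=20, tn=90))

# Consistency check on the reported figures: precision 87.38%, recall 82.57%.
print(round(f_measure(0.8738, 0.8257), 4))  # -> 0.8491
```

The last line confirms that the reported F-measure of 84.91% is the harmonic mean of the reported precision and recall.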

Analisis Perbandingan Metode K-Nearest Neighbor dan Naïve Bayes Clasiffier Pada Dataset Penyakit Jantung

Indonesian Journal of Data and Science ◽

10.33096/ijodas.v1i3.20 ◽

2020 ◽

Vol 1 (3) ◽

Author(s):

Sahar Sahar

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Naïve Bayes Classifier ◽

F Measure

In Indonesia, heart and blood vessel disease shifted from 10th place in 1980 to 8th place in 1986 in terms of incidence, while as a cause of death it has remained in 3rd place. The classification in this study determines whether a case is heart disease or not, using the K-Nearest Neighbor and Naive Bayes Classifier methods as implemented in the scikit-learn library. The study computes performance values, namely accuracy, precision, recall, and F-measure, on a heart disease dataset, and selects the classification method with the highest test performance. In testing, the K-Nearest Neighbor method with K=250 and the Manhattan distance achieved an accuracy of 67%, precision of 65%, recall of 73%, and F-measure of 96%; with K=250 and the Euclidean distance it achieved an accuracy of 65%, precision of 65%, recall of 69%, and F-measure of 67%; and the Naïve Bayes Classifier achieved an accuracy of 58%, precision of 90%, recall of 55%, and F-measure of 68%. The best-performing classification method on the heart disease dataset is KNN (K-Nearest Neighbor).
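
To show how the choice of distance metric enters K-Nearest Neighbor, here is a minimal pure-Python sketch. The study itself used scikit-learn; this standalone version, its function names, and the tiny two-feature data set are assumptions made only for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(
    train: List[Tuple[Vector, str]],
    x: Vector,
    k: int,
    distance: Callable[[Vector, Vector], float],
) -> str:
    """Majority label among the k training points closest to x."""
    neighbours = sorted(train, key=lambda item: distance(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records, not the heart disease dataset.
train_data = [
    ((1.0, 1.0), "no"),
    ((1.2, 0.8), "no"),
    ((4.0, 4.2), "yes"),
    ((3.8, 4.0), "yes"),
    ((4.1, 3.9), "yes"),
]
print(knn_predict(train_data, (3.9, 4.1), k=3, distance=manhattan))
print(knn_predict(train_data, (3.9, 4.1), k=3, distance=euclidean))
```

Only the `distance` argument changes between the two calls, which mirrors the comparison of Manhattan and Euclidean distance at a fixed K in the abstract.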

Analisis Sentimen Calon Presiden Indonesia 2019 dari Media Sosial Twitter Menggunakan Metode Naive Bayes

Jurnal Edukasi dan Penelitian Informatika (JEPIN) ◽

10.26418/jp.v5i3.34368 ◽

2019 ◽

Vol 5 (3) ◽

pp. 279

Author(s):

Sitti Nurul Jannah Fitriyyah ◽

Novi Safriadi ◽

Enda Esyudha Pratama

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

F Measure

In 2019 Indonesia will hold its festival of democracy, the election of the Indonesian head of state. Every political figure nominated for head of state weighs their popularity on the basis of public opinion. Since the General Elections Commission (Komisi Pemilihan Umum, KPU) announced the names of the 2019 Indonesian presidential candidates, those names have been widely discussed, especially on social media, including Twitter. Twitter users express a variety of opinions with negative, positive, and neutral sentiment. Determining the sentiment of Twitter users, however, takes considerable effort and time because of the large number of tweets involved. Machine learning is needed to classify the tweets quickly into the negative, positive, and neutral classes. The Naive Bayes Classifier is a text classification method with fast processing and fairly high accuracy when applied to data that is plentiful, large, and varied. Before the tweet data is classified, it must go through several processes, such as preprocessing, term weighting, and data splitting. The aim of this study is to find out how the Naive Bayes method performs on Twitter user sentiment with 2 classes (negative, positive) and 3 classes (negative, positive, neutral). Tests with 3 classes and with 2 classes were run for each candidate pair (pasangan calon, paslon). In the 3-class test, paslon 01 and paslon 02 obtained accuracies of 64.6% and 58%, respectively. In the 2-class test, paslon 01 and paslon 02 obtained accuracies of 77.7% and 88%, respectively. The highest performance was for presidential candidate number two, with an F-measure of 0.88.

INVESTIGATION OF SPEECH DISFLUENCIES CLASSIFICATION ON DIFFERENT THRESHOLD SELECTION TECHNIQUES USING ENERGY FEATURE EXTRACTION

MALAYSIAN JOURNAL OF COMPUTING ◽

10.24191/mjoc.v4i1.4979 ◽

2019 ◽

Vol 4 (1) ◽

pp. 178

Author(s):

Raseeda Hamzah ◽

Nursuriati Jamil

Keyword(s):

Feature Extraction ◽

Naive Bayes ◽

Naïve Bayes ◽

Bayes Classifier ◽

Threshold Selection ◽

Local Maxima ◽

Parliamentary Debate ◽

Energy Feature ◽

Measure Evaluation ◽

F Measure

Filled pause and elongation are two types of speech disfluency that need more suitable acoustic features to be classified correctly, since they are frequently misclassified. This work concentrates on developing an accurate and robust energy feature extraction for modelling filled pause and elongation by investigating different energy features based on local maxima points of the speech energy. Method: We extracted peak values from each frame of a voiced signal by implementing different thresholding techniques to classify filled pause and elongation. These energy features were evaluated with a statistical naïve Bayes classifier to assess their contribution to the classification process. Various samples of sustained syllables and filled pauses in spontaneous speech were extracted from the Malaysian Parliamentary Debate Database of the year 2008. We performed an F-measure evaluation to investigate the significant differences in the means of the filled pause and elongation samples. Results: The proposed LM-E increased classification performance to up to 71% and 75% F-measure for elongation and filled pause, respectively. Conclusion: The best accuracies achieved for both filled pause and elongation classification varied with the type of thresholding technique applied during extraction of the local maxima of the speech energy. The thresholding technique that contributed most is our proposed technique, which uses the adaptive height as the threshold for extracting the local maxima of the speech energy (LM-E).

Implementasi Naïve bayes Clasifier dalam Klasifikasi Jenis Berita

Foristek ◽

10.54757/fs.v10i1.52 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Dessy Santi ◽

Jumadil Nangi ◽

Natalis Ransi

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Test Results ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Classifying news into categories is sometimes still an obstacle. Classification can be wrong because it is still subjective; as a result, the selected category does not match the uploaded news description. To address these problems, the authors built a news-type classification system using the Naïve Bayes Classifier algorithm. The system is intended to classify news and help news seekers find the news they want. Based on the test results, the Naïve Bayes Classifier algorithm performs well for classifying news types. This was shown in testing with news data taken from www.kompasiana.com, in which news was classified into four categories: politics, economics, sports, and entertainment. Classification of 16 test news items achieved an accuracy of 87.5%.

IMPLEMENTASI ALGORITMA MULTINOMIAL NAIVE BAYES CLASSIFIER

JURNAL TEKNIK INFORMATIKA ◽

10.15408/jti.v10i2.6822 ◽

2018 ◽

Vol 10 (2) ◽

pp. 109-118

Author(s):

Anif Hanifa Setianingrum ◽

Dea Herwinda Kalokasari ◽

Imam Marzuki Shofi

Keyword(s):

Text Mining ◽

Classification System ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Classification Code ◽

F Measure

It is estimated that more than 80% of information is stored in the form of unstructured text. A text management system is therefore needed, namely the text mining method, which is believed to have high potential commercial value. One implementation of text mining is text classification. Classification is applied not only to documents but also to official letters. The researchers examined the Multinomial Naive Bayes Classifier for classifying outgoing letters so that the letter's classification code (letter number) can be determined automatically. The classification system is supported by a confix-stripping stemmer to find root words and by TF-IDF for term weighting. Testing was measured using a confusion matrix. The test results showed that the Multinomial Naive Bayes Classifier in the letter classification system achieved accuracy, precision, recall, and F-measure of 89.58%, 79.17%, 78.72%, and 77.05%, respectively.

How to Cite: Setianingrum, A. H., Kalokasari, D. H., & Shofi, I. M. (2017). IMPLEMENTASI ALGORITMA MULTINOMIAL NAIVE BAYES CLASSIFIER. Jurnal Teknik Informatika, 10(2), 109-118. doi: 10.15408/jti.v10i2.6822

Permalink/DOI: http://dx.doi.org/10.15408/jti.v10i2.6822
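
For readers unfamiliar with the classifier named in this abstract, the following is a minimal sketch of multinomial naive Bayes with Laplace (add-one) smoothing. It is a simplification under stated assumptions: it uses raw term counts rather than the TF-IDF weights and confix-stripping stemmer mentioned above, and the training letters, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

Model = Tuple[Counter, DefaultDict[str, Counter], Set[str]]


def train(docs: List[Tuple[List[str], str]]) -> Model:
    """Count documents per class and term occurrences per class."""
    class_docs = Counter(label for _, label in docs)
    term_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)
    vocab = {t for tokens, _ in docs for t in tokens}
    return class_docs, term_counts, vocab


def predict(model: Model, tokens: List[str]) -> str:
    """Return the class with the highest log posterior for the token list."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            if t in vocab:  # terms never seen in training are skipped
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (term_counts[label][t] + 1) / (total_terms + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical tokenized letters and labels, for illustration only.
letters = [
    (["undangan", "rapat"], "invitation"),
    (["undangan", "seminar"], "invitation"),
    (["permohonan", "dana"], "request"),
    (["permohonan", "izin"], "request"),
]
model = train(letters)
print(predict(model, ["undangan", "dana"]))    # -> invitation
print(predict(model, ["permohonan", "izin"]))  # -> request
```

Working in log space keeps the product of many small probabilities numerically stable, which matters once documents contain more than a handful of terms.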
