QUERY ANSWERING SYSTEM OF SHAHIH HADITH MUTTAFAQUN ‘ALAIH USING INDONESIAN THESAURUS BASED ON QUERY EXPANSION AND NAÏVE BAYES CLASSIFIER

MATICS ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 10
Author(s):  
Muhammad Fairuz Zumar Rounaqi

Abstract: Hadith are all the words, deeds, and provisions of the Prophet Muhammad SAW, which serve as the second source of Islamic law after the Al-Quran. The purpose of this study is to build an information retrieval system, called the Query Answering System, that helps users search for and find the hadith documents they need. The study implements the Naïve Bayes Classifier method combined with an Indonesian thesaurus for query expansion to find the hadith documents relevant to the input query. In tests on 50 queries, query expansion gave better results than no query expansion. For the top-1 result without query expansion, the average recall was 62%, the average precision 62%, the average accuracy 92.4%, and the average F-measure 62%. With query expansion, the average recall was 66%, the average precision 66%, the average accuracy 93.2%, and the average F-measure 66%. Query expansion therefore improved average recall, precision, and F-measure by 4 percentage points each and average accuracy by 0.8 percentage points.

Index Terms: hadith, information retrieval, query expansion, naïve bayes.
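The query expansion step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the `THESAURUS` entries and the `expand_query` helper are hypothetical and are not taken from the paper, which uses a full Indonesian thesaurus and passes the expanded query on to a Naïve Bayes classifier.

```python
# Hypothetical toy thesaurus (illustrative entries only, not the paper's data).
THESAURUS = {
    "puasa": ["saum", "shaum"],
    "sedekah": ["derma"],
}


def expand_query(query, thesaurus):
    """Return the original query terms followed by their thesaurus synonyms."""
    terms = query.lower().split()
    expanded = list(terms)  # keep the original terms first, in order
    for term in terms:
        for synonym in thesaurus.get(term, []):
            if synonym not in expanded:  # avoid duplicate terms
                expanded.append(synonym)
    return expanded


print(expand_query("keutamaan puasa", THESAURUS))
# prints ['keutamaan', 'puasa', 'saum', 'shaum']
```

Terms without a thesaurus entry pass through unchanged, so the expanded query always contains the original one.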

2021 ◽  
pp. 2053-2063
Author(s):  
Wajih A. Ghani A. Hussain

The rapid growth of information technology, especially over the last few decades, has increased the volume of data on the World Wide Web, which is still growing significantly. Retrieving relevant information from the Internet or any other data source with a query of only a few words has become a major challenge. To address this, query expansion (QE) plays an important role in improving information retrieval (IR): the user's original query is rewritten as a new query by appending related terms of the same importance. One problem in query expansion is choosing suitable terms, which leads to the further challenge of retrieving the important documents with high precision, high recall, and high F-measure. In this paper, we address this problem by applying different similarity measures together with English WordNet. The results show that, with a suitable selection method, English WordNet can be used to improve retrieval efficiency. The proposed approach extracts the terms from all documents and the query, then applies the following steps: preprocessing, expanding the query based on English WordNet, selecting the best terms, weighting the terms, and finally using cosine similarity and Jaccard similarity to obtain the relevant documents. The experiments used the DUC2002 dataset, which contains 559 documents distributed over several categories. The average precision with cosine similarity (for random queries) was 100%, whereas with Jaccard similarity it was 84.4%; the average recall was 86.8% with cosine and 73.4% with Jaccard; and the average F-measure was 92% with cosine and 76% with Jaccard.
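The two similarity measures named in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes whitespace-tokenized text and raw term counts instead of the paper's term weighting, and the function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between two raw term-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a_tokens, b_tokens):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


query = "hurricane damage report".split()
document = "the hurricane caused heavy damage".split()
print(round(cosine_similarity(query, document), 3))   # 2 / sqrt(15)
print(round(jaccard_similarity(query, document), 3))  # 2 / 6
```

Cosine similarity takes term frequencies into account, while Jaccard similarity only compares the sets of distinct terms, which is one reason the two measures can rank the same documents differently.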


2012 ◽  
Vol 5s1 ◽  
pp. BII.S8945 ◽  
Author(s):  
Irena Spasić ◽  
Pete Burnap ◽  
Mark Greenwood ◽  
Michael Arribas-Ayllon

The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico–semantic properties of individual words in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data consisting of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern–matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved the F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
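The relationship between the reported precision, recall, and F-measure can be checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall; the `f_measure` helper is illustrative and is not part of the authors' system.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values reported in the abstract above: 55% precision, 52% recall.
score = f_measure(0.55, 0.52)
print(f"{score:.3f}")  # prints 0.535
```

Because the inputs are the rounded percentages, the result only approximates the reported F-measure of 53%.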


2020 ◽  
Vol 4 (2) ◽  
pp. 214-221
Author(s):  
Maria Mega Mala Olhang ◽  
Sentot Achmadi ◽  
F.X Ari Wibisono

Social media, Twitter in particular, currently carries extensive discussion of the spread of the coronavirus, better known as COVID-19. Beginning with the discovery of the first case in Wuhan, China, coverage of the coronavirus continued as it spread to Indonesia. Articles shared on Twitter about the impact of COVID-19 have attracted wide public interest and criticism. They cover, among other things, the rising prices of staple goods, masks, and hand sanitizer, as well as public agreement and disagreement with government policy regarded as insufficiently responsive in handling the outbreak. This study analyzes public sentiment in the opinions expressed on Twitter by developing a system, modeled on various existing systems, that uses the Naïve Bayes Classifier method to classify sentiment. The system's input consists of tweets retrieved from Twitter with keywords such as #coronavirusindonesia or #covid-19, with no more than 500 tweets in total. The output groups each tweet that has passed the preprocessing stage into positive or negative sentiment. In testing on 75 tweets, the system obtained a recall of 32%, a precision of 80%, an F-measure of 45%, and an average accuracy of 36%.


2019 ◽  
Vol 9 (6) ◽  
pp. 4974-4979 ◽  
Author(s):  
S. Rahamat Basha ◽  
J. K. Rani

This work deals with document classification. It is a supervised learning method (it needs a labeled document set for training and a test set of documents to be classified). The procedure of document categorization includes a sequence of steps consisting of text preprocessing, feature extraction, and classification. In this work, a self-made data set was used to train the classifiers in every experiment. This work compares the accuracy, average precision, precision, and recall with or without combinations of some feature selection techniques and two classifiers (KNN and Naive Bayes). The results concluded that the Naive Bayes classifier performed better in many situations.


2021 ◽  
Vol 2 (01) ◽  
pp. 16-23
Author(s):  
Shania Kaparang ◽  
Daniel Riano Kaparang ◽  
Vivi Pegie Rantung

The impact of the COVID-19 pandemic has been so great that the government must adopt policies to reduce it. One such policy is the "new normal", which requires everyone to wear masks, keep their distance, and wash their hands. Its implementation has naturally drawn both positive and negative sentiment posted on Twitter. This study aims to build a model for analyzing public sentiment about the government's new normal policy during the COVID-19 pandemic in Indonesia. The research stages are data crawling, labeling, removal of neutral data, preprocessing, splitting into training and testing data, building the naïve Bayes classification system, testing the system, and visualizing the results with a word cloud. The classification system achieved an accuracy of 80.37%, a precision of 87.38%, a recall of 82.57%, and an F-measure of 84.91%. In total, 5194 tweets were classified as positive and 2908 as negative, showing that positive sentiment outnumbers negative sentiment. However, the counts also show that the gap between positive and negative sentiment is not very large, meaning that public response to the government's new normal policy during the pandemic is still lacking in part.


2020 ◽  
Vol 1 (3) ◽  
Author(s):  
Sahar Sahar

In Indonesia, heart and blood vessel disease shifted in incidence from 10th place in 1980 to 8th place in 1986, while as a cause of death it remained in 3rd place. This study classifies cases as heart disease or non-heart disease using the K-Nearest Neighbor and Naive Bayes Classifier methods as provided by the scikit-learn library. Performance is measured in terms of accuracy, precision, recall, and F-measure on a heart disease dataset, and the classification method with the highest test performance is selected. Based on the tests, K-Nearest Neighbor with Manhattan distance at K=250 obtained an accuracy of 67%, a precision of 65%, a recall of 73%, and an F-measure of 96%. With Euclidean distance at K=250, it obtained an accuracy of 65%, a precision of 65%, a recall of 69%, and an F-measure of 67%. The Naïve Bayes Classifier obtained an accuracy of 58%, a precision of 90%, a recall of 55%, and an F-measure of 68%. The best-performing classification method on the heart disease dataset is KNN (K-Nearest Neighbor).


2019 ◽  
Vol 5 (3) ◽  
pp. 279
Author(s):  
Sitti Nurul Jannah Fitriyyah ◽  
Novi Safriadi ◽  
Enda Esyudha Pratama

In 2019 Indonesia will hold a presidential election. Every political figure nominated for head of state will weigh their popularity based on public opinion. Since the General Elections Commission (Komisi Pemilihan Umum, KPU) announced the names of the 2019 Indonesian presidential candidates, those names have been widely discussed, especially on social media such as Twitter. Twitter users express opinions with negative, positive, and neutral sentiment. However, determining the sentiment of Twitter users takes considerable effort and time because of the large number of tweets involved. Machine learning is needed to classify these tweets quickly into negative, positive, and neutral classes. The Naive Bayes Classifier is a text classification method with fairly high processing speed and accuracy when applied to large, voluminous, and varied data. Before the tweets are classified, the data must go through several steps, such as preprocessing, term weighting, and data splitting. The aim of this study is to find out how the Naive Bayes method performs on Twitter users' sentiment with 2 classes (negative, positive) and 3 classes (negative, positive, neutral). Tests with 3 classes and with 2 classes were carried out for each candidate pair (pasangan calon, paslon). In the 3-class test, candidate pairs 01 and 02 obtained accuracies of 64.6% and 58%, respectively. In the 2-class test, candidate pairs 01 and 02 obtained accuracies of 77.7% and 88%, respectively. The highest performance was obtained for presidential candidate number two, with an F-measure of 0.88.


2019 ◽  
Vol 4 (1) ◽  
pp. 178
Author(s):  
Raseeda Hamzah ◽  
Nursuriati Jamil

Filled pause and elongation are two types of speech disfluency that need more suitable acoustic features to be classified correctly, since they are frequently misclassified. This work concentrates on developing an accurate and robust energy feature extraction for modelling filled pause and elongation by investigating different energy features based on local maxima points of the speech energy. Method: We extracted peak values from each frame of a voiced signal by implementing different thresholding techniques to classify filled pause and elongation. These energy features were evaluated with a statistical naïve Bayes classifier to assess their contribution to the classification process. Various samples of sustained syllables and filled pauses of spontaneous speech were extracted from the Malaysian Parliamentary Debate Database of 2008. We performed an F-measure evaluation to investigate the significant differences in mean between filled pause and elongation samples. Results: Our proposed LM-E improved classification, reaching up to 71% and 75% F-measure for elongation and filled pause, respectively. Conclusion: The best accuracies achieved in both filled pause and elongation classification varied depending on the thresholding technique applied when extracting the local maxima of the speech energy. The thresholding technique that contributed most is our proposed technique, which uses an adaptive height as the threshold for extracting the local maxima of the speech energy (LM-E).


Foristek ◽  
2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Dessy Santi ◽  
Jumadil Nangi ◽  
Natalis Ransi

Classifying news into categories can still be an obstacle. Classification can be wrong because it is still subjective, so the selected category does not match the uploaded news description. To address these problems, the authors built a news type classification system using the Naïve Bayes Classifier algorithm. The system is intended to classify news and help news seekers find the news they want. Based on the test results, the Naïve Bayes Classifier algorithm performs well for news type classification. This was shown in testing on news data taken from www.kompasiana.com, with news classified into four categories: politics, economics, sports, and entertainment. Classification of 16 test news items yielded an accuracy of 87.5%.


2018 ◽  
Vol 10 (2) ◽  
pp. 109-118
Author(s):  
Anif Hanifa Setianingrum ◽  
Dea Herwinda Kalokasari ◽  
Imam Marzuki Shofi

ABSTRACT It is estimated that more than 80% of information is stored in the form of unstructured text. A text management system is therefore needed, namely the text mining method, which is believed to have high commercial potential. One implementation of text mining is text classification. Classification is applied not only to documents but also to official letters. The researchers examined the Multinomial Naive Bayes Classifier for classifying outgoing letters so that the letter number (classification code) can be determined automatically. The classification system is supported by a confix-stripping stemmer to find root words and by TF-IDF for term weighting. Testing was measured with a confusion matrix. The test results showed that the Multinomial Naive Bayes Classifier in the letter classification system achieved accuracy, precision, recall, and F-measure of 89.58%, 79.17%, 78.72%, and 77.05%, respectively.

How to Cite: Setianingrum, A. H., Kalokasari, D. H., Shofi, I. M. (2017). IMPLEMENTASI ALGORITMA MULTINOMIAL NAIVE BAYES CLASSIFIER. Jurnal Teknik Informatika, 10(2), 109-118. doi: 10.15408/jti.v10i2.6822

Permalink/DOI: http://dx.doi.org/10.15408/jti.v10i2.6822
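The multinomial naïve Bayes classification described in the abstract above can be sketched in a few lines. This is a simplified illustration and not the authors' system: it uses raw term counts with Laplace smoothing instead of TF-IDF weights, skips stemming, and the letter categories, training documents, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods per class."""
    vocab = {token for tokens in docs for token in tokens}
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
    model = {}
    for label, n_docs in class_counts.items():
        denom = sum(term_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_likelihood": {
                t: math.log((term_counts[label][t] + alpha) / denom) for t in vocab
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen terms are ignored."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_likelihood"][t] for t in tokens if t in params["log_likelihood"]
        )
    return max(model, key=score)


# Hypothetical training letters, already tokenized.
docs = [
    ["invitation", "meeting", "lecturer"],
    ["invitation", "seminar"],
    ["letter", "assignment", "lecturer"],
    ["assignment", "research"],
]
labels = ["invitation", "invitation", "assignment", "assignment"]

model = train_multinomial_nb(docs, labels)
print(predict(model, ["invitation", "meeting"]))  # prints invitation
```

Working in log space avoids numerical underflow when many term probabilities are multiplied together.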

