Applying Naive Bayes Classifier to Document Clustering

Document clustering partitions a set of unlabeled documents so that the documents in each cluster share common concepts. A Naive Bayes Classifier (BC) is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. BC requires only a small amount of training data to estimate the parameters needed for classification. Since training data must be labeled, we propose an Iterative Bayes Clustering (IBC) algorithm. To improve IBC performance, we propose combining IBC with a Comparative Advantage-based (CA) initialization method. Experimental results show that our proposal improves performance significantly over classical clustering methods.


COMPARISON OF NAIVE BAYES ALGORITHM AND C.45 ALGORITHM IN CLASSIFICATION OF POOR COMMUNITIES RECEIVING NON CASH FOOD ASSISTANCE IN WANASARI VILLAGE KARAWANG REGENCY

Jurnal Techno Nusa Mandiri ◽

10.33480/techno.v17i1.1191 ◽

2020 ◽

Vol 17 (1) ◽

pp. 37-42

Author(s):

Yuris Alkhalifi ◽

Ainun Zumarniansyah ◽

Rian Ardianto ◽

Nila Hardi ◽

Annisa Elfina Augustia

Keyword(s):

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Total Sample ◽

Naïve Bayes ◽

Food Assistance ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Non-Cash Food Assistance, or Bantuan Pangan Non-Tunai (BPNT), is government food assistance given to each Beneficiary Family (KPM) every month through an electronic account that can be used only to buy food at the Electronic Shop Mutual Assistance Joint Business Group Hope Family Program (e-Warong KUBE PKH) or from food traders working with Bank Himbara. Distribution of BPNT remains problematic for village officials, especially those of Desa Wanasari, who must decide which households are eligible (poor) and which are not (not poor). Data mining is one way to support these decisions. This study compares two algorithms, the Naive Bayes Classifier and the C4.5 Decision Tree. The sample consists of 200 head-of-household records, split into 90% training data and 10% test data. The proposed models are built in RapidMiner and evaluated with a confusion matrix to find which of the two methods is more accurate. The Naive Bayes Classifier reaches an accuracy of 98.89% and the C4.5 Decision Tree 95.00%. In this study, therefore, the Naive Bayes Classifier is the more accurate algorithm, by 3.89 percentage points.
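The evaluation step described above reads accuracy off a confusion matrix. A minimal sketch of that calculation follows; the matrix values are hypothetical counts chosen for illustration and are not the study's results.

```python
def accuracy(confusion):
    """Accuracy = correctly classified records / all classified records.

    `confusion` is a square matrix: rows are actual classes,
    columns are predicted classes, so the diagonal holds the hits.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts for a two-class problem (poor / not poor).
example = [
    [9, 1],   # actual "poor": 9 predicted poor, 1 predicted not poor
    [0, 10],  # actual "not poor": 0 predicted poor, 10 predicted not poor
]
print(accuracy(example))  # 0.95
```

The same function works for more than two classes, since it only relies on the diagonal and the grand total.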


Sentiment Analysis Of Full Day School Policy Comment Using Naïve Bayes Classifier Algorithm

SinkrOn ◽

10.33395/sinkron.v5i1.10564 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Miftahul Kahfi Al Fath ◽

Arini Arini ◽

Nasrul Hakiem

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

School Policy ◽

Naïve Bayes ◽

Day School ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Full Day

Sentiment analysis is an important and emerging research topic. It is used to determine whether a person's opinion about an issue or object tends to be negative or positive. The main purpose of this study is to measure public sentiment in comments on the Full Day School policy posted on the Facebook page of Kemendikbud RI, and to evaluate the performance of the Naïve Bayes Classifier algorithm. The authors used the Naïve Bayes Classifier with character trigram and quadgram feature selection, two different training data models, and training data labeled with a lexicon-based method. The results show that negative public sentiment toward the Full Day School policy outweighs positive or neutral sentiment. The highest accuracy, 80%, was obtained by the Naïve Bayes Classifier with trigram feature selection on the model with 300 training data. The amount of training data and the feature selection used affected the accuracy of the Naïve Bayes Classifier.
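Character trigram and quadgram features of the kind mentioned above can be sketched as overlapping character windows. The helper name `char_ngrams` and the choice to lower-case the text and keep spaces are assumptions for illustration; the abstract does not describe the authors' preprocessing.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in lower-cased text.

    Spaces are kept as ordinary characters (an assumption).
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


trigrams = char_ngrams("full day", 3)   # n = 3: character trigrams
quadgrams = char_ngrams("full day", 4)  # n = 4: character quadgrams
print(sorted(trigrams))
print(sorted(quadgrams))
```

The resulting counts can serve as the feature vector of a comment before it is passed to a classifier.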


Klasifikasi sinopsis novel menggunakan metode naïve bayes classifier [Novel synopsis classification using the naïve Bayes classifier method]

Repositor ◽

10.22219/repositor.v1i2.799 ◽

2019 ◽

Vol 1 (2) ◽

pp. 125

Author(s):

Vinna Rahmayanti ◽

Setio Basuki ◽

Hilman Hilman

Keyword(s):

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Training Data ◽

The Novel ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Romantic Comedy ◽

Document Frequency

Technological progress in computing is undeniably rapid, and computers can now take over work that was originally done by humans. The case study of this research is a system that classifies texts such as synopses into genre groups. Genre is the style of story in a novel; novels come in many genres, such as romance, comedy, mystery, horror and others, and knowing the genre tells the reader the story style of the novel. The methods used in this research are TF-IDF (Term Frequency-Inverse Document Frequency) and the Naïve Bayes Classifier. TF-IDF is used to obtain the weight of each word in a document, and the resulting weights are used by the Naïve Bayes Classifier to classify synopses into genres. Evaluation with a confusion matrix, using 600 training data and 200 test data, gave an accuracy of 80.5%.
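The TF-IDF weighting named above has several variants, and the abstract does not say which one was used. The sketch below assumes the plain form, raw term count multiplied by log(N / document frequency), on a few made-up token lists.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists.
    Weight = raw term count * log(N / df), an assumed variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Made-up synopsis fragments, already tokenized.
docs = [["love", "story", "love"], ["ghost", "story"], ["funny", "love"]]
w = tf_idf(docs)
print(w[1])  # "ghost" outweighs "story", which appears in two documents
```

Terms that occur in every document receive weight zero under this variant, which is why rare, genre-specific words dominate the representation.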


Knowing Personality Traits on Facebook Status Using the Naïve Bayes Classifier

International Journal of Artificial Intelligence & Robotics (IJAIR) ◽

10.25139/ijair.v2i1.2636 ◽

2020 ◽

Vol 2 (1) ◽

pp. 22

Author(s):

Mohammad Zoqi Sarwani ◽

Muhammad Shubkhan Salafudin ◽

Dian Ahkam Sani

Keyword(s):

Social Media ◽

Big Five ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Surrounding Environment ◽

Testing Data

With the growing use of social media among students, Facebook lets students communicate and express what they feel in the form of status updates. Personality is a person's character, which shapes how that person adjusts to the surrounding environment in order to communicate smoothly. Psychological theory offers many ways to categorize personality. This study uses the Big Five theory, which describes personality with five traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. The Naive Bayes Classifier is used to select the trait with the highest probability value. Two data sets are used, training data and testing data, both obtained from students' Facebook statuses. Testing the data in the system gave an accuracy of 88%.


A Naïve Bayes Approach to Classifying Topics in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8945 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8945 ◽

Cited By ~ 9

Author(s):

Irena Spasić ◽

Pete Burnap ◽

Mark Greenwood ◽

Michael Arribas-Ayllon

Keyword(s):

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Suicide Notes ◽

Matching Rules ◽

F Measure

The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico–semantic properties of individual words in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data consisting of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern–matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved the F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
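As a quick check of how the reported figures relate, the F-measure is the harmonic mean of precision and recall, and micro-averaging pools the counts of all topics before computing them. The per-topic counts in the sketch below are hypothetical; only the 55% precision and 52% recall come from the abstract.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool (tp, fp, fn) counts over all topics, then compute P, R and F."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f_measure(precision, recall)


# Reported precision and recall give an F-measure of about 0.53.
print(round(f_measure(0.55, 0.52), 4))

# Hypothetical (tp, fp, fn) counts for two topics.
print(micro_average([(8, 2, 4), (3, 3, 1)]))
```

Because the counts are pooled, frequent topics influence a micro-averaged score more than rare ones.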


Visualisasi dan Analisa Data Penyebaran Covid-19 dengan Metode Klasifikasi Naïve Bayes [Visualization and Analysis of Covid-19 Spread Data with the Naïve Bayes Classification Method]

Jurnal JTIK (Jurnal Teknologi Informasi dan Komunikasi) ◽

10.35870/jtik.v5i4.233 ◽

2021 ◽

Vol 5 (4) ◽

pp. 389

Author(s):

Muhammad Ikbal ◽

Septi Andryana ◽

Ratih Titi Komala Sari

Keyword(s):

General Public ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

The World ◽

The Status ◽

Case Data

The Covid-19 virus became a pandemic in 2020. Covid cases have spread across the whole world, reaching 63 million cases in 190 countries as of November 2020. The general public needs information about the spread of Covid. This research produces a system that provides information on the geographic distribution of Covid cases. The case distribution data were also analyzed with the Naive Bayes Classifier method, which uses probability calculations and can therefore classify the Covid status of an area. The study succeeded in providing information on the status of the Covid pandemic based on data on cases that have occurred around the world. Covid case data serve as training data for the Naive Bayes classifier, which then determines the pandemic status from test data provided by system users. This research has succeeded in helping users learn the status of the Covid pandemic in an area well because it has reliable training data. Keywords: System, Covid, Naïve Bayes Classifier.


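Analisis Sentimen Masyarakat Terhadap Pilpres 2019 Berdasarkan Opini Dari Twitter Menggunakan Metode Naive Bayes Classifier [Public Sentiment Analysis of the 2019 Presidential Election Based on Opinions from Twitter Using the Naive Bayes Classifier Method]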

Journal of Computer and Information Systems Ampera ◽

10.51519/journalcisa.v1i3.45 ◽

2020 ◽

Vol 1 (3) ◽

pp. 185-199

Author(s):

Khoirul Zuhri ◽

Nurul Adha Oktarini Saputri

Keyword(s):

Sentiment Analysis ◽

Hate Speech ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

The Public ◽

Positive Sentiment

Twitter is a currently popular social medium where the public is free to comment and write anything. It is not uncommon for the public to comment with harsh words and even hate speech. The 2019 presidential election drew many comments: some praised, some criticized, and some insulted. Sentiment analysis is needed to extract information from such text and classify it. In this study, sentiment analysis is the process of classifying textual documents into two classes, negative and positive sentiment. Opinion data were obtained from Twitter in the form of tweets. The data comprised 3337 tweets, split into 80% training data and 20% test data. Training data is data with known sentiment. The study aims to determine whether a tweet written in Indonesian conveys positive or negative sentiment. The tweets are classified with the naïve Bayes classifier algorithm. On the test data, the Naïve Bayes Classifier algorithm gives an accuracy of 71%. The accuracy for each sentiment is 71% for positive sentiment and 70% for negative sentiment.


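Perbandingan Seleksi Fitur Term Frequency & Tri-Gram Character Menggunakan Algoritma Naïve Bayes Classifier (Nbc) Pada Tweet Hashtag #2019gantipresiden [Comparison of Term Frequency and Character Tri-Gram Feature Selection Using the Naïve Bayes Classifier (NBC) Algorithm on Tweets with the Hashtag #2019gantipresiden]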

Kilat ◽

10.33322/kilat.v9i1.878 ◽

2020 ◽

Vol 9 (1) ◽

pp. 103-114

Author(s):

Arini - Arini ◽

Luh Kesuma Wardhani ◽

Dimas - Octaviano

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

High Accuracy ◽

Naïve Bayes ◽

Training Data ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Term Frequency ◽

Selection Of

In the run-up to the 2019 election year, many mass campaigns were conducted through social media networks, including Twitter. One online campaign that became very popular used the hashtag #2019GantiPresiden. Sentiment analysis of the #2019GantiPresiden hashtag requires a robust classifier and feature selection that achieve high accuracy. One such combination is the Naive Bayes Classifier (NBC) with Character Tri-Gram and Term-Frequency feature selection, which produced fairly high accuracy in previous research. The purpose of this study was to implement the Naive Bayes Classifier (NBC) with each feature selection method and to compare the accuracy obtained with the two. The authors collected data by observation and ran simulations. The data consist of 1,000 tweets with the hashtag #2019GantiPresiden, taken on 15 September 2018, divided into 950 tweets as training data and 50 tweets as test data, with labels assigned using a lexicon-based sentiment method. The Naïve Bayes Classifier (NBC) reached an accuracy of 76% with Character Tri-Gram feature selection and 74% with Term-Frequency, so Character Tri-Gram feature selection performed better than Term-Frequency.
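Lexicon-based labeling, as mentioned above, is commonly done by counting words from positive and negative word lists. The sketch below assumes that simple counting scheme; the word lists, the `lexicon_label` helper and the "neutral" fallback are hypothetical and are not taken from the study.

```python
def lexicon_label(text, positive, negative):
    """Label text by lexicon hits: positive hits minus negative hits.

    A tie (including no hits at all) is labeled "neutral",
    which is an assumption of this sketch.
    """
    tokens = text.lower().split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical English word lists for illustration only.
POSITIVE = {"good", "great", "support"}
NEGATIVE = {"bad", "tired", "reject"}

print(lexicon_label("We support this great idea", POSITIVE, NEGATIVE))
print(lexicon_label("Students are tired and this is bad", POSITIVE, NEGATIVE))
```

Labels produced this way can then be used as the training targets for a classifier, which is the role the lexicon-based method plays in the description above.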


Multi-Event Naive Bayes Classifier for Activity Recognition in the UCAmI Cup

Proceedings ◽

10.3390/proceedings2191264 ◽

2018 ◽

Vol 2 (19) ◽

pp. 1264

Author(s):

Antonio Jiménez ◽

Fernando Seco

Keyword(s):

Activity Recognition ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Short Paper ◽

Data Sets ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Everyday Objects

This short paper presents the activity recognition results obtained by the CAR-CSIC team for the UCAmI’18 Cup. We propose a multi-event naive Bayes classifier for estimating 24 different activities in real time. We use all the sensor information provided for the competition, i.e., binary sensors fixed to everyday objects, proximity BLE-based tags, location-aware smart floor sensing and the wrist’s acceleration. The results using training data sets of 7 days show accuracies (true positives) of about 68%; however, for the three extra data sets of the competition we reached an accuracy of 60.5%.


Learning the naive Bayes classifier with optimization models

International Journal of Applied Mathematics and Computer Science ◽

10.2478/amcs-2013-0059 ◽

2013 ◽

Vol 23 (4) ◽

pp. 787-795 ◽

Cited By ~ 30

Author(s):

Sona Taheri ◽

Musa Mammadov

Keyword(s):

Real World ◽

Naive Bayes ◽

Optimization Problems ◽

Naïve Bayes ◽

Training Data ◽

Optimization Models ◽

Naive Bayes Classifier ◽

Conditional Probabilities ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Naive Bayes is among the simplest probabilistic classifiers. It often performs surprisingly well in many real world applications, despite the strong assumption that all features are conditionally independent given the class. In the learning process of this classifier with the known structure, class probabilities and conditional probabilities are calculated using training data, and then values of these probabilities are used to classify new observations. In this paper, we introduce three novel optimization models for the naive Bayes classifier where both class probabilities and conditional probabilities are considered as variables. The values of these variables are found by solving the corresponding optimization problems. Numerical experiments are conducted on several real world binary classification data sets, where continuous features are discretized by applying three different methods. The performances of these models are compared with the naive Bayes classifier, tree augmented naive Bayes, the SVM, C4.5 and the nearest neighbor classifier. The obtained results demonstrate that the proposed models can significantly improve the performance of the naive Bayes classifier, yet at the same time maintain its simple structure.
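The standard learning process described above, in which class probabilities and conditional probabilities are estimated from training data and then used to classify new observations, can be sketched for categorical features as follows. This illustrates the baseline naive Bayes classifier only, not the paper's optimization models; the Laplace smoothing parameter `alpha` and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Collect the counts needed for class and conditional probabilities.

    rows: list of tuples of categorical feature values.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    values = [set() for _ in rows[0]]    # distinct values seen per feature
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(y, j)][v] += 1
            values[j].add(v)
    return class_counts, value_counts, values, alpha


def predict_nb(model, row):
    """Return the class with the highest log posterior for `row`."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log class probability
        for j, v in enumerate(row):
            # Smoothed conditional probability P(feature j = v | class y).
            num = value_counts[(y, j)][v] + alpha
            den = n_y + alpha * len(values[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = y, score
    return best


# Toy training data for illustration only.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "cool")))  # yes
print(predict_nb(model, ("sunny", "hot")))   # no
```

Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow when there are many features.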
