Understanding 21st Century Bordeaux Wines from Wine Reviews Using Naïve Bayes Classifier

Beverages ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 5 ◽  
Author(s):  
Zeqing Dong ◽  
Xiaowan Guo ◽  
Syamala Rajana ◽  
Bernard Chen

Wine has been popular with the public for centuries, and the market offers a wide variety of wines to choose from. Among all wine regions, Bordeaux, France, is considered the most famous in the world. In this paper, we try to understand Bordeaux wines made in the 21st century through a Wineinformatics study. We developed and studied two datasets: the first contains all Bordeaux wines from 2000 to 2016, and the second contains all wines listed in a famous collection of Bordeaux wines, the 1855 Bordeaux Wine Official Classification, from 2000 to 2016. A total of 14,349 wine reviews were collected in the first dataset and 1359 wine reviews in the second. To understand the relation between wine quality and characteristics, a Naïve Bayes classifier is applied to predict the quality class (90+/89−) of wines. A Support Vector Machine (SVM) classifier is also applied for comparison. In the first dataset, the SVM classifier achieves the best accuracy of 86.97%; in the second dataset, the Naïve Bayes classifier achieves the best accuracy of 84.62%. Precision, recall, and F-score are also used to describe the performance of our models. Meaningful features associated with high-quality 21st-century Bordeaux wines are presented in this paper.
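As a rough illustration of how a Naïve Bayes classifier can separate 90+ from 89− wines, the sketch below fits a Bernoulli Naïve Bayes model with Laplace smoothing on binary review attributes. The attribute names, the six toy reviews, and the function names are all hypothetical, and this is not the paper's actual pipeline or data.

```python
import math
from collections import defaultdict

def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Fit Bernoulli Naive Bayes; each sample is the set of attributes present."""
    vocab = sorted(set().union(*samples))
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for attrs, label in zip(samples, labels):
        class_counts[label] += 1
        for a in attrs:
            feature_counts[label][a] += 1
    model = {"vocab": vocab, "prior": {}, "cond": {}}
    for label, n in class_counts.items():
        model["prior"][label] = n / len(labels)
        # Laplace smoothing for a binary feature: (count + alpha) / (n + 2 * alpha)
        model["cond"][label] = {
            a: (feature_counts[label][a] + alpha) / (n + 2 * alpha) for a in vocab
        }
    return model

def predict(model, attrs):
    """Return the class with the highest log posterior; unseen attributes are ignored."""
    best_label, best_score = None, -math.inf
    for label, prior in model["prior"].items():
        score = math.log(prior)
        for a in model["vocab"]:
            p = model["cond"][label][a]
            score += math.log(p if a in attrs else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy reviews (invented for illustration only).
reviews = [
    {"blackberry", "elegant", "long_finish"},
    {"cassis", "elegant", "long_finish"},
    {"blackberry", "cassis", "long_finish"},
    {"light", "simple"},
    {"light", "herbal"},
    {"simple", "herbal", "blackberry"},
]
grades = ["90+", "90+", "90+", "89-", "89-", "89-"]

model = train_bernoulli_nb(reviews, grades)
print(predict(model, {"elegant", "long_finish"}))
print(predict(model, {"light", "simple"}))
```

The per-class conditional probabilities in `model["cond"]` are what make this kind of classifier interpretable: attributes with a much higher probability under 90+ than under 89− are candidates for the "meaningful features" the abstract refers to.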

With the growing use of mobile technology, the volume of information being exchanged has multiplied. Because of this volume, identifying spam messages has become a challenging task. The growth of mobile usage has made messaging the main channel for instant communication, which in turn has led hackers and unauthorized users to spread and misuse spam messages. Identifying spam messages is a research problem that matters to mobile service providers seeking to grow and retain their customer base. Against this background, this paper focuses on identifying and predicting spam and ham messages. The SMS Spam Message Detection dataset from the Kaggle machine learning repository is used for prediction analysis. The identification of spam and ham messages proceeds as follows. Firstly, the distribution of the target variable (spam or ham) is identified and depicted as a graph. Secondly, the essential tokens responsible for spam and ham messages are identified using the Hashing Vectorizer and portrayed as spam and ham word clouds. Thirdly, the hash-vectorized SMS Spam Message Detection dataset is fitted to various classifiers: AdaBoost, Extra Tree, KNN, Random Forest, Linear SVM, Kernel SVM, Logistic Regression, Gaussian Naive Bayes, Decision Tree, Gradient Boosting, and Multinomial Naive Bayes. The classifier models are evaluated using performance metrics such as Accuracy, FScore, Precision, and Recall. The implementation is done in Python using the Spyder IDE from Anaconda Navigator. Experimental results show that the Linear Support Vector Machine classifier achieved effective performance, with a precision of 0.98, recall of 0.98, FScore of 0.98, and accuracy of 98.71%.
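The idea behind hash vectorization can be sketched in a few lines: each token is hashed to a column index in a fixed-width vector, so no vocabulary has to be stored. The version below is a simplified stand-in that uses MD5 from the standard library for reproducibility; the function name and the default `n_features` are assumptions, and library implementations differ in their hash function and other details.

```python
import hashlib

def hash_vectorize(text, n_features=16):
    """Map a message to a fixed-length count vector using the hashing trick."""
    vec = [0] * n_features
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as an integer, then fold into range.
        index = int.from_bytes(digest[:4], "big") % n_features
        vec[index] += 1
    return vec

spam_vec = hash_vectorize("free prize claim your free prize now")
ham_vec = hash_vectorize("see you at lunch")
print(spam_vec)
print(ham_vec)
```

Because different tokens can land on the same index, a small `n_features` causes collisions; in practice the vector is made wide enough that collisions are rare.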


2021 ◽  
Vol 2 (2) ◽  
pp. 96-104
Author(s):  
REYNALDA NABILA CIKANIA

Halodoc is a telemedicine-based healthcare application that connects patients with health practitioners and services such as doctors, pharmacies, and laboratories. Halodoc users have left both positive and negative comments. This reflects public interest in the application, so it is necessary to analyze the sentiment of comments about the Halodoc service, especially during the COVID-19 pandemic, so that the service can be improved. The Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) algorithms are used to analyze the sentiment of users of Halodoc's telemedicine service application. Of 5,687 reviews, 12.33% were classified as negative and 87.67% as positive, meaning that positive reviews outnumber negative ones. The Naïve Bayes Classifier achieved an accuracy of 87.77% with an AUC of 57.11% and a G-Mean of 40.08%, while the SVM with an RBF kernel achieved an accuracy of 86.1% with an AUC of 60.149% and a G-Mean of 49.311%. Based on these evaluation values, the SVM model with the RBF kernel is judged better than NBC at classifying the sentiment of user reviews of the Halodoc telemedicine service.
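With classes this imbalanced, accuracy and G-Mean can rank models differently, which may explain why the lower-accuracy model is preferred here. The sketch below computes both from confusion-matrix counts. The counts are invented for illustration (1,000 reviews, 877 positive and 123 negative) and are not taken from the study.

```python
import math

def accuracy(tp, fn, fp, tn):
    """Fraction of all reviews classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity (positive recall) and specificity (negative recall)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical model A labels every review as positive.
model_a = dict(tp=877, fn=0, fp=123, tn=0)
# Hypothetical model B makes more errors overall but recognizes many negatives.
model_b = dict(tp=760, fn=117, fp=40, tn=83)

for name, cm in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(**cm), 3), round(g_mean(**cm), 3))
```

Model A has the higher accuracy but a G-Mean of zero, because it never identifies a negative review; G-Mean rewards a model only when it performs reasonably on both classes.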


2019 ◽  
Vol 12 (2) ◽  
pp. 32-38
Author(s):  
Iin Ernawati

This study concerns text-based data mining, often called text mining, using two commonly used classification methods: the Naïve Bayes classifier (NBC) and the support vector machine (SVM). The classification focuses on Indonesian-language documents, and the relationship between documents is measured by probability, which can be verified with other classification algorithms. This is evident from the Naïve Bayes classifier (NBC) probability results, in which the word “party” appears least in the economic and political documents. The support vector machine (SVM) results, in turn, show that the words “price” and “kpk” are contained in both the economic and political documents.


2021 ◽  
Vol 20 (2) ◽  
pp. 177
Author(s):  
Putri Agung Permatasari ◽  
Linawati Linawati ◽  
Lie Jasa

Social media has become an important part of daily life, used not only for personal needs but also for business and many other activities. Commonly used social media platforms include Facebook, Twitter, YouTube, Instagram, LinkedIn, and WhatsApp. These platforms generate large amounts of data in the form of images, comments consisting of text or emoticons, videos, and more, so people are free to express their opinions. Sentiment analysis of the many opinions that develop on social media can yield useful data and information. Sentiment analysis requires data classification algorithms, including Naive Bayes Classifier, Support Vector Machine, K-NN, RNN, C4.5, Lexicon Based, LDA Based Topic Modeling, and several others. This article reviews the literature on sentiment analysis in social media. Currently, the social media platform most often used in such analyses is Twitter, and the algorithms that can improve accuracy are the Naive Bayes Classifier and Support Vector Machine. The computed classification accuracies differ according to the test data used in each study.


Author(s):  
Debby Alita ◽  
Sigit Priyanta ◽  
Nur Rokhman

Background: Indonesia is one of the highest-ranked countries in the world by number of active Twitter users. Tweets written by Twitter users vary, from positive to negative responses. These responses can be used by the parties concerned for evaluation. Objective: Public comments contain emoticons and sarcasm, both of which influence the sentiment analysis process. Emoticons are considered to make it easier for people to express their feelings, but quite a few researchers ignore emoticons on the grounds that they can interfere with the sentiment analysis process, while sarcasm is considered to affect the results of sentiment analysis of the text that contains it. Methods: The emoticon and no-emoticon categories are tested on the same test data using the Naïve Bayes Classifier and Support Vector Machine classification methods. Sarcasm data is handled using the Random Forest Classifier, Naïve Bayes Classifier, and Support Vector Machine methods. Results: The use of emoticons together with sarcasm detection can increase accuracy in the sentiment analysis process using the Naïve Bayes Classifier method. Conclusion: Based on the results, the amount of data greatly affects accuracy. The use of emoticons is highly beneficial in the sentiment analysis process. Sarcasm detection was superior only with the Naïve Bayes Classifier method, owing to the difference in the amounts of sarcastic and non-sarcastic data in the study. Keywords: Emoticon, Naïve Bayes Classifier, Random Forest Classifier, Sarcasm, Support Vector Machine


With the growing volume of spam messages, effective methods for spam detection are in demand. The growth of mobile phones and smartphones has led to a drastic increase in SMS spam messages. The advancement and streamlined operation of the mobile messaging channel have attracted hackers to carry out attacks through SMS messages. This leads to the fraudulent use of other people's accounts and transactions, resulting in loss of service and profit for the owners. Against this background, this paper focuses on predicting spam SMS messages. The SMS Spam Message Detection dataset from the Kaggle machine learning repository is used for prediction analysis. The analysis of spam message detection is carried out in four ways. Firstly, the distribution of the target variable Spam Type in the dataset is identified and represented graphically. Secondly, the top word features for the spam and ham messages are extracted using Count Vectorizer and displayed as spam and ham word clouds. Thirdly, the count-vectorized SMS Spam Message Detection dataset is fitted to various classifiers: KNN, Random Forest, Linear SVM, AdaBoost, Kernel SVM, Logistic Regression, Gaussian Naive Bayes, Decision Tree, Extra Tree, Gradient Boosting, and Multinomial Naive Bayes. Fourthly, performance is analyzed using metrics such as Accuracy, FScore, Precision, and Recall. The implementation is done in Python using the Spyder IDE from Anaconda Navigator. Experimental results show that the Multinomial Naive Bayes classifier achieved effective prediction, with a precision of 0.98, recall of 0.98, FScore of 0.98, and accuracy of 98.20%.
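The step of extracting top word features per class can be approximated with plain token counts. The sketch below is a minimal standard-library version of count vectorization; the four messages, the tokenizer pattern, and the function names are hypothetical and are not drawn from the Kaggle dataset or the authors' code.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def top_words(messages, labels, target, k=3):
    """Return the k most frequent tokens among messages labeled `target`."""
    counts = Counter()
    for text, label in zip(messages, labels):
        if label == target:
            counts.update(tokenize(text))
    return counts.most_common(k)

# Invented example messages, for illustration only.
messages = [
    "Win a free prize now",
    "Free entry, claim your free prize",
    "Are we still meeting for lunch",
    "Lunch at noon, see you then",
]
labels = ["spam", "spam", "ham", "ham"]

print(top_words(messages, labels, "spam", k=2))
print(top_words(messages, labels, "ham", k=1))
```

The resulting per-class frequencies are the kind of input a word cloud is drawn from, and the same counts can serve as features for a count-based classifier such as Multinomial Naive Bayes.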

