Klasifikasi Rating Otomatis pada Dokumen Teks Ulasan Produk Elektronik Menggunakan Metode N-gram dan Naïve Bayes

Jurnal Informatika Universitas Pamulang ◽

10.32493/informatika.v5i3.6110 ◽

2020 ◽

Vol 5 (3) ◽

pp. 295

Author(s):

Rahmawan Bagus Trianto ◽

Andri Triyono ◽

Dhika Malita Puspita Arum

Keyword(s):

Feature Extraction ◽

Naive Bayes ◽

Automatic Classification ◽

Naïve Bayes ◽

Lack Of Information ◽

N Gram ◽

Bayes Algorithm ◽

Online Product Ratings ◽

Product Description

Online stores such as Lazada usually let buyers leave both a descriptive review and a rating. A descriptive review gives other potential buyers a clearer picture than a rating alone. In practice, however, the description and the rating given do not always match, which leaves sellers and potential buyers short of reliable information. This study proposes automatic classification of buyers' descriptive reviews so that descriptive reviews and ratings correspond. The classifier uses the Naive Bayes algorithm with n-gram feature extraction and TF-IDF term weighting. The best results were obtained with bigram feature extraction: an accuracy of 94.06%, a recall of 91.73%, and a precision of 90.71%. At this accuracy, the model can serve as a reference for classifying product description reviews, so that the feedback process between sellers and buyers runs well.
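
The abstract does not say which TF-IDF variant or tokenizer the authors used. As a minimal sketch, assuming whitespace tokenization and the plain weight tf × log(N / df), bigram extraction and weighting could look like this (the review texts and function names are hypothetical):

```python
import math
from collections import Counter

def ngrams(text, n):
    """Return the word-level n-grams of a text as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(docs, n=2):
    """Weight each n-gram by term frequency times log(N / document frequency)."""
    grams_per_doc = [Counter(ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for grams in grams_per_doc for g in grams)
    total = len(docs)
    weights = []
    for grams in grams_per_doc:
        length = sum(grams.values()) or 1  # avoid dividing by zero on short texts
        weights.append(
            {g: (c / length) * math.log(total / df[g]) for g, c in grams.items()}
        )
    return weights

# Hypothetical reviews, not taken from the study's Lazada data.
reviews = [
    "battery life is very good",
    "battery life is too short",
    "screen is very good",
]
weights = tfidf(reviews, n=2)
```

In this toy corpus the bigram "is too" occurs in one review only, so it receives a higher weight than "battery life", which two reviews share.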

Pengaruh N-Gram terhadap Klasifikasi Buku menggunakan Ekstraksi dan Seleksi Fitur pada Multinomial Naïve Bayes

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i1.2672 ◽

2021 ◽

Vol 5 (1) ◽

pp. 264

Author(s):

Esti Mulyani ◽

Fachrul Pralienka Bani Muhamad ◽

Kurnia Adi Cahyanto

Keyword(s):

Naive Bayes ◽

Automatic Classification ◽

Naïve Bayes ◽

Main Task ◽

Test Results ◽

Book Title ◽

Feature Extraction And Selection ◽

N Gram ◽

Bayes Algorithm

A main task of libraries is processing library materials, which includes classifying books according to a set scheme. The Dewey Decimal Classification (DDC) is the method most commonly used worldwide to determine book classification (labeling) in libraries. Its advantages are that it is universal and systematic. However, the method is inefficient given the large number of books that a library must classify, and labeling must follow label updates in the DDC. An automatic classification system addresses this problem, and it can be built by applying text mining methods. In this study, words in book titles are extracted as N-gram features (unigram, bigram, trigram), and feature selection is then applied to the generated features. Book titles are classified with the Multinomial Naïve Bayes algorithm. The study examines the effect of unigram, bigram, and trigram features on book title classification using feature extraction and feature selection with Multinomial Naïve Bayes. The test results show that unigram features give the highest accuracy, 74.4%.
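
To illustrate the classification step only, here is a minimal sketch of Multinomial Naïve Bayes over unigram counts, assuming add-one (Laplace) smoothing and whitespace tokenization. The titles, the DDC-style class labels, and the function names are hypothetical, and the study's feature selection step is not shown:

```python
import math
from collections import Counter, defaultdict

def train(titles, labels):
    """Collect per-class document counts and per-class word counts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for title, label in zip(titles, labels):
        word_counts[label].update(title.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def predict(title, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in title.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical titles and DDC-style main classes, for illustration only.
titles = ["introduction to algebra", "linear algebra basics",
          "history of rome", "modern world history"]
labels = ["500", "500", "900", "900"]
model = train(titles, labels)
```

Calling `predict("algebra for beginners", model)` on this toy data selects class "500", because "algebra" appears only in titles of that class.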

Sentiment Analysis Of Government Policy On Corona Case Using Naive Bayes Algorithm

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.60718 ◽

2021 ◽

Vol 15 (1) ◽

pp. 55

Author(s):

Auliya Rahman Isnain ◽

Nurman Satya Marga ◽

Debby Alita

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Naive Bayes ◽

Text Processing ◽

Naïve Bayes ◽

New Normal ◽

Public Sentiment ◽

N Gram ◽

Bayes Algorithm ◽

Economic Stabilization

The Indonesian government enforced the New Normal rule to maintain economic stability while restraining the spread of the virus during the Covid-19 pandemic. The rule became a widely discussed topic on Twitter, drawing both positive and negative opinions. This research applies text mining and text processing with machine learning, using the Naive Bayes classifier. The objective is to determine whether public sentiment towards the New Normal policy is positive or negative, and to measure the performance of TF-IDF feature extraction and N-grams with the Naive Bayes method. With TF-IDF features, Naive Bayes reached a total accuracy of 81%, with a precision of 78%, a recall of 91%, and an F1-score of 84%. The highest results were obtained with Naive Bayes and trigrams: 84% accuracy, 84% precision, 86% recall, and an 85% F1-score. Naive Bayes with trigram N-gram feature extraction therefore performs fairly well in classifying public tweet data.
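
The reported F1-scores can be checked from the reported precision and recall, since F1 is their harmonic mean. A small sketch of that arithmetic (the function name is ours, not the authors'):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
baseline = f1_score(0.78, 0.91)  # TF-IDF features
trigram = f1_score(0.84, 0.86)   # trigram features
```

Rounded to two decimals these give 0.84 and 0.85, consistent with the F1-scores stated above.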

Perbandingan Optimasi Feature Selection pada Naïve Bayes untuk Klasifikasi Kepuasan Airline Passenger

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i3.3086 ◽

2021 ◽

Vol 5 (3) ◽

pp. 527-533

Author(s):

Yoga Religia ◽

Amali Amali

Keyword(s):

Feature Selection ◽

Customer Satisfaction ◽

Naive Bayes ◽

Naïve Bayes ◽

Point Of View ◽

Classification Model ◽

Passenger Satisfaction ◽

Airline Passenger ◽

Bayes Algorithm

The quality of an airline's services cannot be measured from the company's point of view; it must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its independence assumption is rarely discussed. Some literature suggests attribute weighting to relax the independence assumption, which can be done through feature selection using particle swarm optimization (PSO) or a genetic algorithm (GA). This study compares PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction data taken from www.kaggle.com. In testing, the best performance came from Naïve Bayes with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.

Analysis and Classification of Danger Level in Android Applications Using Naive Bayes Algorithm

2018 6th International Conference on Information and Communication Technology (ICoICT) ◽

10.1109/icoict.2018.8528733 ◽

2018 ◽

Author(s):

Ridho Alif Utama ◽

Parman Sukarno ◽

Erwid Musthofa Jadied

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Android Applications ◽

Bayes Algorithm ◽

Danger Level

Classifying the Level of Energy-Environmental Efficiency Rating of Brazilian Ethanol

Energies ◽

10.3390/en13082067 ◽

2020 ◽

Vol 13 (8) ◽

pp. 2067

Author(s):

Nilsa Duarte da Silva Lima ◽

Irenilza de Alencar Nääs ◽

João Gilberto Mendes dos Reis ◽

Raquel Baracat Tosi Rodrigues da Silva

Keyword(s):

Decision Tree ◽

High Efficiency ◽

Rating Scale ◽

Naive Bayes ◽

Naïve Bayes ◽

Environmental Efficiency ◽

Classification Model ◽

Bayes Algorithm ◽

J48 Decision Tree

The present study aimed to assess and classify energy-environmental efficiency levels to reduce greenhouse gas emissions in the production, commercialization, and use of biofuels certified by the Brazilian National Biofuel Policy (RenovaBio). The parameters of the level of energy-environmental efficiency were standardized and categorized according to the Energy-Environmental Efficiency Rating (E-EER). The rating scale ranged from lower efficiency (D) to the highest efficiency (A+). The J48 decision tree and naive Bayes algorithms were used to build the classification models. Both the J48 decision tree and the naive Bayes classifier produced models that were efficient at estimating the efficiency level of Brazilian ethanol producers and importers certified by RenovaBio. The rules generated by the models can assess the level classes (efficiency scores) according to the scale discretized into high efficiency (Classification A), average efficiency (Classification B), and standard efficiency (Classification C). These results might be used to generate an ethanol energy-environmental efficiency label that helps end consumers and resellers of the product make a purchase decision based on its performance. Naive Bayes was the best classification model, outperforming the J48 decision tree. The classification of the Energy Efficiency Note levels using the naive Bayes algorithm produced a model capable of estimating the efficiency level of Brazilian ethanol to create labels.

Classification of EEG Signal for Detecting Cybersickness through Time Domain Feature Extraction using Naïve Bayes

2018 International Conference on Computer Engineering, Network and Intelligent Multimedia (CENIM) ◽

10.1109/cenim.2018.8711320 ◽

2018 ◽

Cited By ~ 3

Author(s):

Moch.Asyroful Mawalid ◽

Alfi Zuhriya Khoirunnisa ◽

Mauridhi Hery Purnomo ◽

Adhi Dharma Wibawa

Keyword(s):

Feature Extraction ◽

Time Domain ◽

Naive Bayes ◽

Naïve Bayes ◽

Eeg Signal

Feature Extraction and Classification of Proteomics Data Using Stationary Wavelet Transform and Naive Bayes Classifier

2010 4th International Conference on Bioinformatics and Biomedical Engineering ◽

10.1109/icbbe.2010.5516610 ◽

2010 ◽

Author(s):

Dan Liu ◽

Yuan-yuan Huang ◽

Chen-xiang Ma

Keyword(s):

Feature Extraction ◽

Wavelet Transform ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Stationary Wavelet Transform ◽

Proteomics Data ◽

Naïve Bayes Classifier

Automatic classification of leukocytes using morphological features and Naïve Bayes classifier

2016 IEEE Region 10 Conference (TENCON) ◽

10.1109/tencon.2016.7848161 ◽

2016 ◽

Cited By ~ 9

Author(s):

Anjali Gautam ◽

Priyanka Singh ◽

Balasubramanian Raman ◽

Harvendra Bhadauria

Keyword(s):

Naive Bayes ◽

Automatic Classification ◽

Naïve Bayes ◽

Morphological Features ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Analisis Klasifikasi Kanker Payudara Menggunakan Algoritma Naive Bayes

INFORMAL: Informatics Journal ◽

10.19184/isj.v4i3.14170 ◽

2020 ◽

Vol 4 (3) ◽

pp. 117

Author(s):

Hardian Oktavianto ◽

Rahman Puji Handri

Keyword(s):

Breast Cancer ◽

Naive Bayes ◽

Naïve Bayes ◽

World Health ◽

Average Percentage ◽

Average Value ◽

Treatment Measures ◽

Bayes Algorithm ◽

Health Organization

Breast cancer is one of the leading causes of death among women, ranking second after lung cancer. According to the World Health Organization, 1 million women are diagnosed with breast cancer every year and half of them die, generally because slow early handling and treatment mean that the cancer is detected only after it has entered the final stage. In health and medicine, machine learning-based classification has been used to help doctors and health professionals classify types of cancer and determine which treatment measures should be performed. In this study, breast cancer classification is carried out using the Naive Bayes algorithm to group the types of cancer. The dataset used is from the Wisconsin breast cancer database. The results show that Naive Bayes performs well on breast cancer classification: on average, 96.9% of the data is classified correctly and only 3.1% incorrectly. The classification is also highly effective, with average precision and recall of around 0.96. The highest precision and recall values, 0.974 and 0.973 respectively, are reached when the test data uses a percentage split of 40%.

ANALISIS SENTIMEN PADA PEMERINTAHAN TERPILIH PADA PILPRES 2019 DITWITTER MENGGUNAKAN ALGORITME NAÏVEBAYES

JURTEKSI ◽

10.33330/jurteksi.v7i1.851 ◽

2020 ◽

Vol 7 (1) ◽

pp. 101-106

Author(s):

Febby Apri Wenando ◽

Regiolina Hayami ◽

Agung Jefrianto Anggrawan

Keyword(s):

Presidential Election ◽

Naive Bayes ◽

Vice President ◽

Naïve Bayes ◽

Weighting Method ◽

The Social ◽

Twitter Account ◽

N Gram ◽

Bayes Algorithm ◽

Modeling Data

Abstract: The 2019 presidential election became one of the most discussed topics on Twitter. People use social media to voice opinions about the pair of candidates they support. This research predicts public sentiment towards the candidates for President and Vice President of the Republic of Indonesia. The data consists of tweets on the @jokowi Twitter account, retrieved with the Tweepy library in the Python 2.7 programming language. Public sentiment is classified into two classes, positive and negative. Models were built with the Unigram, Bigram, Trigram, N-Gram (1-2), and N-Gram (1-3) weighting methods using the Naïve Bayes algorithm in the Weka application, on a dataset of 646 sentences. The highest results in this research were obtained with Unigram weighting: 81.4% accuracy, 81.5% precision, and 81.3% recall, with a time of 0.3 s.

Keywords: classification, naïve bayes, 2019 presidential election, twitter, unigram
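
The abstract does not define N-Gram (1-2) and N-Gram (1-3). A common reading, assumed here, is that they combine all n-gram orders from 1 up to 2 or 3 words. A minimal sketch under that assumption, with a hypothetical function name and whitespace tokenization rather than Weka's tokenizer:

```python
def ngram_range(text, low, high):
    """Return all word n-grams with low <= n <= high, shortest orders first."""
    tokens = text.lower().split()
    grams = []
    for n in range(low, high + 1):
        grams.extend(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return grams

# Hypothetical sentence, not from the study's dataset.
features_1_2 = ngram_range("a b c", 1, 2)
features_1_3 = ngram_range("a b c", 1, 3)
```

Under this reading, N-Gram (1-2) for a three-word sentence yields three unigrams plus two bigrams, and N-Gram (1-3) adds the single trigram.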
