Classification of Text Documents based on Naive Bayes using N-Gram Features

A core task of libraries is processing library materials by classifying books according to a defined scheme. The Dewey Decimal Classification (DDC) is the method most commonly used worldwide to classify (label) books in libraries. Its advantages are that it is universal and systematic. However, manual DDC labeling is inefficient given the large number of books a library must classify, and the labels must also follow updates to the DDC. An automatic classification system can address this problem. Automatic classification can be done by applying text mining. In this study, N-grams (unigram, bigram, trigram) over the words in book titles are used for feature generation. The generated features then go through feature selection. Book titles are classified using the Multinomial Naïve Bayes algorithm. The study examines the effect of unigram, bigram, and trigram features on book title classification using feature extraction and feature selection with Multinomial Naïve Bayes. The test results show that unigrams give the highest accuracy, 74.4%.
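The pipeline described in this abstract (word N-grams from titles, then Multinomial Naïve Bayes) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the helper `ngrams`, the class `MultinomialNB`, the add-one smoothing, and the toy titles and labels are all assumptions made for the example, and the feature selection step is omitted.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the word-level n-grams of a title as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed here)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for features, label in zip(docs, labels):
            self.feature_counts[label].update(features)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, features):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for f in features:
                if f in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: titles paired with made-up class labels.
titles = ["introduction to linear algebra", "linear algebra and geometry",
          "history of modern europe", "a short history of asia"]
labels = ["math", "math", "history", "history"]

# n=1 gives unigram features; n=2 or n=3 would give bigrams or trigrams.
model = MultinomialNB().fit([ngrams(t, 1) for t in titles], labels)
prediction = model.predict(ngrams("applied linear algebra", 1))
```

Changing the `n` argument switches between unigram, bigram, and trigram features. Short titles yield very few higher-order N-grams, which is one plausible reason unigrams can perform well on title-only data.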


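Klasifikasi Rating Otomatis pada Dokumen Teks Ulasan Produk Elektronik Menggunakan Metode N-gram dan Naïve Bayes [Automatic Rating Classification of Electronic Product Review Text Documents Using the N-gram and Naïve Bayes Methods]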

Jurnal Informatika Universitas Pamulang ◽ 10.32493/informatika.v5i3.6110 ◽ 2020 ◽ Vol 5 (3) ◽ pp. 295

Author(s): Rahmawan Bagus Trianto ◽ Andri Triyono ◽ Dhika Malita Puspita Arum

Keyword(s): Feature Extraction ◽ Naïve Bayes ◽ Automatic Classification ◽ Lack Of Information ◽ N Gram ◽ Bayes Algorithm ◽ Online Product Ratings ◽ Product Description

Online product ratings usually include both descriptive reviews and numeric ratings, as on the Lazada online store. A descriptive review can give other potential buyers a clearer picture than a rating alone. In practice, however, there is a mismatch between the descriptive review and the rating given, which leaves both sellers and potential buyers with a lack of information. This study proposes automatic classification of buyers' descriptive reviews so that descriptive reviews and ratings match. The classification uses the Naive Bayes algorithm with n-gram feature extraction and TF-IDF word weighting. The best results were obtained with bigram feature extraction: an accuracy of 94.06%, a recall of 91.73%, and a precision of 90.71%. With this accuracy, the model can serve as a reference for classifying product description reviews, so that the feedback process between sellers and buyers can run well.
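As an illustration of TF-IDF weighting over bigram features, the sketch below computes weights for a few review texts. The abstract does not state which TF-IDF variant was used, so the formula here (relative term frequency times log(N / document frequency)) is an assumption, as are the function names `bigrams` and `tfidf` and the toy reviews.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the word-level bigrams of a review as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def tfidf(docs):
    """docs: list of feature lists. Returns one {feature: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: in how many documents each feature occurs.
    doc_freq = Counter(f for doc in docs for f in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({f: (count / len(doc)) * math.log(n_docs / doc_freq[f])
                        for f, count in tf.items()})
    return weights


# Hypothetical toy reviews, not taken from the study's data.
reviews = ["very good product", "very good seller", "not good product"]
weights = tfidf([bigrams(r) for r in reviews])
```

In the second review, the bigram "good seller" occurs in only one document and so receives a higher weight than "very good", which occurs in two. The resulting weights could then stand in for raw counts as the input features to a Naive Bayes classifier.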


Varying Naive Bayes Models with Applications to Classification of Chinese Text Documents

SSRN Electronic Journal ◽ 10.2139/ssrn.2562219 ◽ 2015

Author(s): Guoyu Guan ◽ Jianhua Guo ◽ Hansheng Wang

Keyword(s): Chinese Text ◽ Naïve Bayes ◽ Text Documents


Hybrid N-gram model using Naïve Bayes for classification of political sentiments on Twitter

Neural Computing and Applications ◽ 10.1007/s00521-019-04248-z ◽ 2019 ◽ Vol 31 (12) ◽ pp. 9207-9220 ◽ Cited By ~ 8

Author(s): Jamilu Awwalu ◽ Azuraliza Abu Bakar ◽ Mohd Ridzwan Yaakub

Keyword(s): Naïve Bayes ◽ N Gram


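Perbandingan Optimasi Feature Selection pada Naïve Bayes untuk Klasifikasi Kepuasan Airline Passenger [Comparison of Feature Selection Optimization on Naïve Bayes for Airline Passenger Satisfaction Classification]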

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽ 10.29207/resti.v5i3.3086 ◽ 2021 ◽ Vol 5 (3) ◽ pp. 527-533

Author(s): Yoga Religia ◽ Amali Amali

Keyword(s): Feature Selection ◽ Customer Satisfaction ◽ Naïve Bayes ◽ Point Of View ◽ Classification Model ◽ Passenger Satisfaction ◽ Airline Passenger ◽ Bayes Algorithm

The quality of an airline's services cannot be measured from the company's point of view; it must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its independence assumption is rarely discussed. Some literature suggests attribute weighting to mitigate the independence assumption, which can be done through feature selection using particle swarm optimization (PSO) or a genetic algorithm (GA). This study compares PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction data taken from www.kaggle.com. In testing, the best performance came from Naïve Bayes with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.
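For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, precision, and recall follow from the counts of true and false positives and negatives in a two-class problem. The function name `binary_metrics`, the label strings, and the toy predictions are assumptions for illustration only and are unrelated to the study's data; AUC is not covered because it requires predicted scores and not just labels.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: "s" = satisfied, "d" = dissatisfied.
y_true = ["s", "s", "s", "d", "d", "d", "d", "s"]
y_pred = ["s", "s", "d", "d", "d", "s", "s", "s"]
accuracy, precision, recall = binary_metrics(y_true, y_pred, positive="s")
```

With these toy labels there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives, so accuracy is 5/8, precision is 3/5, and recall is 3/4. Comparing two feature selection strategies, as the study does, amounts to computing such metrics for each resulting model on the same test data.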


Analysis and Classification of Danger Level in Android Applications Using Naive Bayes Algorithm

2018 6th International Conference on Information and Communication Technology (ICoICT) ◽ 10.1109/icoict.2018.8528733 ◽ 2018

Author(s): Ridho Alif Utama ◽ Parman Sukarno ◽ Erwid Musthofa Jadied

Keyword(s): Naïve Bayes ◽ Android Applications ◽ Bayes Algorithm ◽ Danger Level


Classification of community opinion on the use of the Transjakarta bus based on twitter social network using naïve bayes method

IOP Conference Series Materials Science and Engineering ◽ 10.1088/1757-899x/1010/1/012030 ◽ 2021 ◽ Vol 1010 ◽ pp. 012030

Author(s): B. D Meilani ◽ R K Hapsari ◽ I F Novian

Keyword(s): Social Network ◽ Naïve Bayes ◽ Bayes Method ◽ Naive Bayes Method


Peringkasan dan Support Vector Machine pada Klasifikasi Dokumen [Summarization and Support Vector Machine in Document Classification]

JURNAL INFOTEL ◽ 10.20895/infotel.v9i4.312 ◽ 2017 ◽ Vol 9 (4) ◽ pp. 416 ◽ Cited By ~ 1

Author(s): Nelly Indriani Widiastuti ◽ Ednawati Rainarli ◽ Kania Evita Dewi

Keyword(s): Support Vector Machine ◽ Naïve Bayes ◽ Training Data ◽ Support Vector ◽ Good Reputation ◽ Multiclass Support Vector Machine ◽ Simple Logistic ◽ Better Than

Classification is the process of grouping objects that share the same features or characteristics into classes. Automatic document classification uses the frequency of words that appear in the training data as features. As the number of documents grows, so does the number of words used as features. Summaries are therefore used to reduce the number of words used in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which is considered to have a good reputation in classification. This research tests the effect of summarization as a feature selection step in document classification. The summaries reduce the text to 50%. The results show that the summaries did not affect the accuracy of document classification with SVM, but they improved the accuracy of the Simple Logistic classifier. The classification tests also show that Multinomial Naïve Bayes (NBM) is more accurate than SVM.

