The impact of different training data set on the accuracy of sentiment classification of Naïve Bayes technique

Classification is the process of grouping objects that share the same features or characteristics into classes. Automatic document classification uses the frequency of words that appear in the training data as features. As the number of documents grows, the number of words that appear as features also increases. Summaries are therefore used to reduce the number of words used in classification. The classification uses a multiclass Support Vector Machine (SVM), a method with a good reputation in classification. This research tests the effect of using summaries as feature selection in document classification. The summaries reduce the text to 50%. The results show that the summaries did not affect the classification accuracy of documents classified with SVM, but they did improve the accuracy of the Simple Logistic classifier. The classification tests also show that Naïve Bayes Multinomial (NBM) is more accurate than SVM.
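
As a rough illustration of how word frequencies act as features for a multinomial Naive Bayes classifier, the following minimal sketch trains on tokenised documents with Laplace smoothing. The function names, the smoothing constant `alpha`, and the toy documents are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenised documents."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # class -> word frequency table
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"vocab": vocab, "alpha": alpha, "classes": {}}
    for label, n in class_docs.items():
        total_words = sum(word_counts[label].values())
        model["classes"][label] = {
            "log_prior": math.log(n / len(docs)),
            "counts": word_counts[label],
            # Laplace smoothing: add alpha for every vocabulary word
            "denom": total_words + alpha * len(vocab),
        }
    return model

def predict(model, tokens):
    """Return the class with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for label, c in model["classes"].items():
        score = c["log_prior"]
        for t in tokens:
            if t in model["vocab"]:       # ignore words never seen in training
                score += math.log((c["counts"][t] + model["alpha"]) / c["denom"])
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, for illustration only
docs = [["good", "great", "fun"], ["bad", "boring"],
        ["great", "good"], ["bad", "awful", "boring"]]
labels = ["pos", "neg", "pos", "neg"]
model = train_multinomial_nb(docs, labels)
print(predict(model, ["good", "fun"]))
```

Shortening each document to a summary before tokenising would shrink the vocabulary that this model has to estimate, which is the feature-reduction effect the abstract describes.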

Iteration-based naive Bayes sentiment classification of microblog multimedia posts considering emoticon attributes

Multimedia Tools and Applications ◽

10.1007/s11042-020-08797-7 ◽

2020 ◽

Vol 79 (27-28) ◽

pp. 19151-19166

Author(s):

Yanmei Wang

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Sentiment Classification

Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7879.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1623-1630

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Random Forest ◽

Naive Bayes ◽

Early Stage ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Data Set ◽

Wide Range

As huge amounts of data accumulate, extracting the required information from the available data becomes a challenge. Machine learning contributes to various fields. The fast-growing population has led to the emergence of a wide range of diseases, which in turn has created a need for machine learning models that use patients' datasets. Analysis of datasets from different sources indicates that cancer is the most hazardous disease and may cause the death of the patient. Surveys indicate that cancer can be nearly cured in its initial stages but may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer. Its prediction depends heavily on past data, and it requires detection in the early stages. The proposed work uses a machine learning algorithm to group individuals' details into categories in order to predict, at an early stage, whether they are at risk of developing cancer. A random forest algorithm is implemented and reaches an efficiency of 97%, higher than KNN and Naive Bayes. Further, the KNN algorithm does not learn anything from the training data but uses it directly for classification, and Naive Bayes produces inaccurate predictions. The proposed system predicts the chance of lung cancer at three levels: low, medium, and high. Thus, mortality rates can be reduced significantly.

Analysis of Sentiment Classification of Hotel Reviews Based on Multinomial Naive Bayes

2020 The 11th International Conference on E-business, Management and Economics ◽

10.1145/3414752.3414796 ◽

2020 ◽

Author(s):

Yang Zhirui ◽

Li Chunyan

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Sentiment Classification

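Optimasi Algoritma Naïve Bayes Classifier untuk Mendeteksi Anomaly dengan Univariate Fitur Selection [Optimization of the Naïve Bayes Classifier Algorithm for Anomaly Detection with Univariate Feature Selection]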

EDUMATIC Jurnal Pendidikan Informatika ◽

10.29408/edumatic.v4i2.2433 ◽

2020 ◽

Vol 4 (2) ◽

pp. 40-49

Author(s):

Harianto Harianto ◽

Andi Sunyoto ◽

Sudarmawan Sudarmawan ◽

...

Keyword(s):

Feature Selection ◽

Intrusion Detection System ◽

Naive Bayes ◽

Detection System ◽

Naïve Bayes ◽

Training Data ◽

Bayes Classifier ◽

Data Set ◽

System Data ◽

And Training

Protecting systems and networks from interference by parties without authorized access is of the utmost importance. To keep a system, its data, or its network safe from unauthorized users or other interference, a mechanism is needed to detect such activity. An Intrusion Detection System (IDS) is a method for detecting suspicious activity in a system or network. Classification algorithms from artificial intelligence can be applied to this problem. Many classification algorithms are available, one of which is Naïve Bayes. This study aims to optimize Naïve Bayes using univariate selection on the UNSW-NB 15 data set. Only the 40 most relevant features are used. The data set is then divided into test data and training data at ratios of 10%:90%, 20%:80%, 30%:70%, 40%:60% and 50%:50%. The experiments show that feature selection had a noticeable effect on accuracy. The highest accuracy is obtained with the 40%:60% split, both with and without feature selection. Naïve Bayes with unselected features reached a highest accuracy of 91.43%, against 91.62% with feature selection, so feature selection increased accuracy by 0.19 percentage points.
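
The two preprocessing steps in this abstract, ranking features one at a time and splitting the data at several test fractions, can be sketched as follows. This is only an illustrative sketch: the abstract does not say which univariate score the authors used, so the between-class to within-class variance ratio below is an assumption, as are the function names and the toy data.

```python
import random
from statistics import mean, pvariance

def univariate_scores(rows, labels):
    """Score each feature independently (assumed score: between/within class variance)."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        overall = mean(r[j] for r in rows)
        between = within = 0.0
        for c in classes:
            vals = [r[j] for r, y in zip(rows, labels) if y == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / (within + 1e-12))  # small constant avoids division by zero
    return scores

def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features, in column order."""
    scores = univariate_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def split_indices(n, test_fraction, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

# Hypothetical toy data: features 0 and 2 separate the classes, feature 1 does not
rows = [[1.0, 5.0, 0.1], [1.2, 3.0, 0.2], [0.9, 4.0, 0.1],
        [3.0, 5.0, 0.9], [3.1, 3.0, 0.8], [2.9, 4.0, 1.0]]
labels = [0, 0, 0, 1, 1, 1]
print(select_top_k(rows, labels, 2))

for fraction in (0.1, 0.2, 0.3, 0.4, 0.5):
    train, test = split_indices(10, fraction)
    print(fraction, len(train), len(test))
```

In the study's setting, `k` would be 40 and a Naïve Bayes classifier would be trained and evaluated once per split, with and without the selection step.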

Application of the Naïve Bayes Algorithm for Student Graduation Analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.15.23596 ◽

2018 ◽

Vol 7 (4.15) ◽

pp. 421

Author(s):

Erick Akhmad Fahmi Alfa’izy ◽

Khairil Anam ◽

Naidah Naing ◽

Rosanita Tritias Utami ◽

Nur Anim Jauhariyah ◽

...

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

College System ◽

Student Graduation ◽

Bayes Algorithm ◽

Using Data ◽

Analysis System ◽

Law Student

This work designs an analysis system that determines student graduation by comparing previous data with existing data, in order to overcome errors in a college system. Available data records are processed using the naïve Bayes algorithm. The research was conducted at Universitas Maarif Hasyim Latif. The object of the research is to analyze student data with the naïve Bayes algorithm to determine graduation. Previous Faculty of Law student data are sampled as training data, and the full data are retrieved from the records available in the Directorate of Information Systems. The naïve Bayes algorithm can be used to classify data in string or textual form. This is based on the researchers' trials with previously performed example calculations. To evaluate the graduation classification, the naïve Bayes algorithm is tested with training data compared against testing data. The calculations give an accuracy of 77.78%.

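ANALISIS MODEL NAIVE BAYES UNTUK IDENTIFIKASI PENGGOLONGAN DAYA LISTRIK DI KOTA LHOKSUMAWE [Analysis of the Naive Bayes Model for Identifying Electrical Power Classification in the City of Lhokseumawe]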

KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer) ◽

10.30865/komik.v2i1.971 ◽

2018 ◽

Vol 2 (1) ◽

Author(s):

Muhammad Saidi ◽

Fajriana Fajriana ◽

Wahyu Fuadi ◽

Ermatita Ermatita ◽

Iwan Pahendra

Keyword(s):

Naive Bayes ◽

Electrical Power ◽

Naïve Bayes ◽

Training Data ◽

Bayes Method ◽

Customer Data ◽

Poor Households ◽

House Area ◽

Naive Bayes Method

An electricity subsidy is provided for all 450 VA household customers and for 900 VA household customers who are poor and disadvantaged. In practice, however, many 450 VA household customers are well-off, and 900 VA household customers include well-off households, boarding houses, and luxury rentals. Well-off households use more electricity than poor households. This paper describes the identification of household customers' electrical power in the city of Lhokseumawe, using the Naive Bayes method to help PLN classify customer power. The Naive Bayes variables used in this study are monthly income, highest diploma, last job, house area, subscription fee, and government-registered household. Household customer power is grouped into three categories: low (450 VA and below), medium (900 VA), and high (above 1300 VA). Based on household customer data used as training data, the Naive Bayes method is able to classify the customer data tested. The Naive Bayes method thus predicts the probability of household electrical power with an accuracy of 80%.
Keywords: Electricity, Naive Bayes, CBS, low birth weight, subsidy
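
A classifier over categorical attributes such as those listed above can be sketched as a categorical Naive Bayes model that returns a probability for each of the three power categories. This is a minimal sketch under stated assumptions: the attribute values, the training records, the Laplace smoothing constant `alpha`, and the function names are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def fit_categorical_nb(records, labels, alpha=1.0):
    """Count attribute values per class for a categorical Naive Bayes model."""
    counts = defaultdict(Counter)   # (label, attribute) -> value counts
    values = defaultdict(set)       # attribute -> distinct values seen
    for rec, y in zip(records, labels):
        for attr, v in rec.items():
            counts[(y, attr)][v] += 1
            values[attr].add(v)
    return {"priors": Counter(labels), "counts": counts,
            "values": values, "alpha": alpha, "n": len(labels)}

def posterior(model, rec):
    """Return normalised class probabilities for one record."""
    log_scores = {}
    for y, n_y in model["priors"].items():
        s = math.log(n_y / model["n"])
        for attr, v in rec.items():
            num = model["counts"][(y, attr)][v] + model["alpha"]
            den = n_y + model["alpha"] * len(model["values"][attr])
            s += math.log(num / den)
        log_scores[y] = s
    top = max(log_scores.values())   # subtract the maximum for numerical stability
    exp = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(exp.values())
    return {y: e / z for y, e in exp.items()}

# Hypothetical training records using two of the attributes named in the abstract
records = [
    {"monthly_income": "low", "house_area": "small"},
    {"monthly_income": "low", "house_area": "small"},
    {"monthly_income": "mid", "house_area": "medium"},
    {"monthly_income": "mid", "house_area": "small"},
    {"monthly_income": "high", "house_area": "large"},
    {"monthly_income": "high", "house_area": "medium"},
]
labels = ["low", "low", "medium", "medium", "high", "high"]
model = fit_categorical_nb(records, labels)
probs = posterior(model, {"monthly_income": "high", "house_area": "large"})
print(max(probs, key=probs.get))
```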

Comparing tagging suggestion models on discrete corpora

International Journal of Web Information Systems ◽

10.1108/ijwis-08-2019-0035 ◽

2020 ◽

Vol 16 (2) ◽

pp. 201-221

Author(s):

Bojan Bozic ◽

Andre Rios ◽

Sarah Jane Delany

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Data Sets ◽

Data Set ◽

Content Type ◽

Advantages And Disadvantages ◽

Study Results ◽

Sample Data ◽

The Impact ◽

Tag Prediction

Purpose: This paper aims to investigate the methods for the prediction of tags on a textual corpus that describes diverse data sets based on short messages; as an example, the authors demonstrate the usage of methods based on hotel staff inputs in a ticketing system as well as the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.

Design/methodology/approach: The paper consists of two parts: exploration of existing sample data, which includes statistical analysis and visualisation of the data to provide an overview, and evaluation of tag prediction approaches. The authors have included different approaches from different research fields to cover a broad spectrum of possible solutions. As a result, the authors have tested a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics) and three similarity-based classification approaches (nearest centroid, k-nearest neighbours (k-NN) and naive Bayes). The experiment that compares the approaches uses recall to measure the quality of results. Finally, the authors provide a recommendation of the modelling approach that produces the best accuracy in terms of tag prediction on the sample data.

Findings: The authors have calculated the performance of each method against the test data set by measuring recall. The authors show recall for each method with different features (except for frequency heuristics, which does not provide the option to add additional features) for the dmbook pro and StackOverflow data sets. k-NN clearly provides the best recall. As k-NN turned out to provide the best results, the authors have performed further experiments with values of k from 1–10. This helped us to observe the impact of the number of neighbours used on the performance and to identify the best value for k.

Originality/value: The value and originality of the paper are given by extensive experiments with several methods from different domains. The authors have used probabilistic methods, such as naive Bayes, statistical methods, such as frequency heuristics, and similarity approaches, such as k-NN. Furthermore, the authors have produced results on an industrial-scale data set that has been provided by a company and used directly in their project, as well as a community-based data set with a large amount of data and dimensionality. The study results can be used to select a model based on diverse corpora for a specific use case, taking into account advantages and disadvantages when applying the model to your data.
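
The evaluation described in this abstract, measuring recall of k-NN tag suggestions for k from 1 to 10, might look roughly like the sketch below. The abstract does not specify the text representation or similarity measure, so the Jaccard token overlap, the vote-based tag ranking, the function names, and the toy tickets are all assumptions made for illustration.

```python
from collections import Counter

def jaccard(a, b):
    """Token-set overlap between two texts (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest_tags(tokens, corpus, k, n_tags=3):
    """Suggest tags by voting among the k most similar training entries."""
    neighbours = sorted(corpus, key=lambda e: jaccard(tokens, e["tokens"]),
                        reverse=True)[:k]
    votes = Counter(tag for e in neighbours for tag in e["tags"])
    return [tag for tag, _ in votes.most_common(n_tags)]

def recall(predicted, actual):
    """Fraction of the true tags that appear among the suggestions."""
    actual = set(actual)
    return len(actual & set(predicted)) / len(actual) if actual else 0.0

def mean_recall(test_set, corpus, k):
    """Average recall over all test entries for a given k."""
    return sum(recall(suggest_tags(e["tokens"], corpus, k), e["tags"])
               for e in test_set) / len(test_set)

# Hypothetical ticket-style entries, for illustration only
corpus = [
    {"tokens": ["wifi", "not", "working"], "tags": ["it", "network"]},
    {"tokens": ["towel", "missing", "room"], "tags": ["housekeeping"]},
    {"tokens": ["wifi", "slow", "room"], "tags": ["network"]},
]
test_set = [{"tokens": ["wifi", "down"], "tags": ["network"]}]

# Sweep k from 1 to 10, as in the further experiments the abstract mentions
recall_by_k = {k: mean_recall(test_set, corpus, k) for k in range(1, 11)}
print(recall_by_k)
```

Plotting or tabulating `recall_by_k` is one simple way to see how the number of neighbours affects performance and to pick a value of k.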

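IMPLEMENTASI TEORI NAIVE BAYES DALAM KLASIFIKASI CALON MAHASISWA BARU STMIK KHARISMA MAKASSAR [Implementation of Naive Bayes Theory in the Classification of Prospective New Students of STMIK KHARISMA Makassar]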

SINTECH (Science and Information Technology) Journal ◽

10.31598/sintechjournal.v3i2.651 ◽

2020 ◽

Vol 3 (2) ◽

pp. 110-117

Author(s):

Irayori Loelianto ◽

Moh. Sofyan S Thayf ◽

Husni Angriani

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Design Stage ◽

Prospective Students ◽

Classifier Design ◽

Python Programming Language ◽

Bayes Theory ◽

Python Programming

STMIK KHARISMA Makassar has graduated thousands of alumni since it was founded. However, the number of students registering each year is uncertain, although registrations increased from 2016 to 2019. The problem is that the percentage of prospective students who go on to register has decreased significantly. The purpose of this research is to apply Naive Bayes theory to the classification of prospective STMIK KHARISMA Makassar students. The research uses Naive Bayes as the classifier, implemented in the Python programming language. At the classifier design stage, a total of 499 records were collected from 2016 to 2019. The data were divided at a ratio of 80:20 into training data and test data. The results indicate that the classifier reaches an accuracy of 73%.

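IMPLEMENTASI DATA MINING UNTUK MEMPREDIKSI PEMESANAN DRIVER GO-JEK ONLINE DENGAN MENGGUNAKAN METODE NAIVE BAYES (STUDI KASUS: PT. GO-JEK INDONESIA) [Implementation of Data Mining to Predict Online Go-Jek Driver Orders Using the Naive Bayes Method (Case Study: PT. Go-Jek Indonesia)]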

KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer) ◽

10.30865/komik.v2i1.972 ◽

2018 ◽

Vol 2 (1) ◽

Author(s):

Delisman Laia ◽

Efori Buulolo ◽

Matias Julyus Fika Sirait

Keyword(s):

Data Mining ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Transportation Industry ◽

Data Set ◽

Data Mining Algorithms ◽

Taxi Service ◽

Bayes Algorithm ◽

Using Data

PT. Go-Jek Indonesia is a service company. Go-Jek online is a technology-based motorcycle taxi service that leads the transportation industry revolution. Data mining algorithms are used to predict orders for Go-Jek drivers, helping PT. Go-Jek Indonesia forecast the level of online driver orders and determine busy and quiet times. The proposed method is Naive Bayes, an algorithm that classifies data into given classes. The purpose of this study is to examine the prediction patterns of each attribute in the data set using the Naive Bayes algorithm, and to test the training data against testing data to see whether the data pattern is good or not. The prediction is based on previous driver order data, collected by day and time over one month. The Naive Bayes algorithm is used to predict the daily orders for online Go-Jek drivers by looking at orders in each period, such as morning, afternoon and evening. The results of this study make it easier for the company to analyze the booking data of each Go-Jek driver when setting policies concerning both drivers and customers.
Keywords: Go-jek Driver, Data Mining, Naive Bayes
