Multi-Aspect Sentiment Analysis Hotel Review Using RF, SVM, and Naïve Bayes based Hybrid Classifier

2021 ◽  
Vol 5 (2) ◽  
pp. 630
Author(s):  
I Putu Ananda Miarta Utama ◽  
Sri Suryani Prasetyowati ◽  
Yuliant Sibaroni

In the hotel tourism sector, social media plays a central role because tourists tend to share their experiences of a hotel's services and products by posting pictures, reviews, and ratings, for example on the online platform TripAdvisor, and these serve as references for other tourists. However, the sheer number of reviews for a hotel can make it hard to decide which hotel to visit. This study therefore carries out an aspect-based analysis of hotel reviews, which makes it easier for tourists to choose a hotel based on its best-rated aspects. The dataset is the TripAdvisor Hotel Reviews dataset available on Kaggle, covering five aspects: Room, Location, Cleanliness, Registration, and Service. Reviews are classified as positive or negative using a hybrid classifier based on Random Forest, SVM, and Naïve Bayes. On this multi-aspect data the hybrid classifier achieves better accuracy than any single algorithm: its average accuracy is 84%, compared with 82.4% for Naïve Bayes, 82.2% for Random Forest, and 81% for SVM.
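
The abstract does not say how the three base classifiers are combined. As a minimal sketch, assuming simple majority voting over their per-review predictions (the function name `majority_vote` and the sample labels are hypothetical, not taken from the paper), the combination step could look like this:

```python
from collections import Counter


def majority_vote(*label_lists):
    """Combine per-classifier predictions by simple majority voting.

    Assumption: every classifier predicts one label per review, in the
    same order. This is one possible hybrid scheme, not necessarily the
    one used in the paper.
    """
    if not label_lists:
        raise ValueError("need at least one classifier's predictions")
    n = len(label_lists[0])
    if any(len(labels) != n for labels in label_lists):
        raise ValueError("all prediction lists must have the same length")
    combined = []
    for votes in zip(*label_lists):
        # Pick the label that received the most votes for this review.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions for four reviews on a single aspect.
rf_pred = ["pos", "neg", "pos", "neg"]
svm_pred = ["pos", "pos", "neg", "neg"]
nb_pred = ["neg", "pos", "pos", "neg"]
hybrid_pred = majority_vote(rf_pred, svm_pred, nb_pred)
```

With three voters and two classes there are no ties, so each review receives the label that at least two classifiers agree on.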

2020 ◽  
Vol 5 (3) ◽  
pp. 75
Author(s):  
Yulia Resti ◽  
Firmansyah Burlian ◽  
Irsyadi Yani ◽  
Des Alwine Zayanti ◽  
Indah Meiliana Sari

Cans are a type of inorganic waste that can take hundreds of years to decompose in the ground, so recycling is the right way to manage can waste. The recycling industry needs can classification systems to automate sorting. This paper discusses a can classification system based on digital images using the Naive Bayes method, where the input variables are the red, green, and blue (RGB) pixel values and the image of the can is captured while it sits on a conveyor belt running at a certain speed. The average k-fold cross-validation accuracy obtained with the original Naive Bayes model was unsatisfactory and is corrected using a fuzzy approach. This approach improved the average accuracy of the can classification system from 52.99% to 88.02% (reported as an increase of 60.2%), while the standard deviation decreased from 15.72% to only 3%.
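
To illustrate the idea of classifying by RGB values with Naive Bayes, here is a minimal sketch that assumes Gaussian likelihoods per channel. The abstract does not state which Naive Bayes variant was used, and the fuzzy correction is not reproduced here; the class names and pixel values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels, eps=1e-6):
    """Estimate a log-prior plus per-channel mean and variance per class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    total = len(samples)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple per channel (R, G, B)
        means = [sum(col) / n for col in columns]
        # eps keeps the variance strictly positive (assumed smoothing term).
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / total), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for one RGB triple."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda y: log_posterior(model[y]))


# Hypothetical mean RGB values for two kinds of cans.
samples = [(200, 30, 40), (210, 25, 35), (190, 40, 50),
           (30, 40, 200), (25, 50, 210), (40, 35, 190)]
labels = ["red_can"] * 3 + ["blue_can"] * 3
model = fit_gaussian_nb(samples, labels)
```

Working in log space avoids numerical underflow when the per-channel likelihoods are multiplied together.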


2021 ◽  
Vol 14 (1) ◽  
pp. 60
Author(s):  
Ngurah Agus Sanjaya ER ◽  
I Gusti Agung Gede Arya Kadyanan

Udatari is the first traditional dance platform in Indonesia; it provides information about traditional events, dance tutorials, dance groups, and dance attributes. Tight competition in the startup world requires Udatari, as a new startup, to manage its application users optimally. Identifying loyal users helps a startup determine the right marketing strategy. In this study, clustering is performed with the K-Means method, which partitions the data into several groups such that the data within one group share similar characteristics. The clustering uses the RFM model (recency, frequency, and monetary), and its purpose is to segment users by Customer Lifetime Value (CLV). Classification is then performed with the Naïve Bayes method, which predicts future outcomes based on past experience; its purpose is to assign new users to the segments obtained from clustering. The results show that the optimum number of clusters for K-Means, evaluated using the Silhouette Index, is k = 3, with the largest CLV in the second cluster. The Naïve Bayes method achieves an average accuracy of 97.44%, with per-class accuracies of 92.31% for cluster 0 (the first cluster), 100% for the second cluster, and 100% for the third cluster. Keywords: K-Means, Naïve Bayes, Loyalty, Segmentation, RFM
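
As a rough illustration of the RFM model mentioned above, the sketch below derives recency, frequency, and monetary values per user from a transaction log. The record layout `(user_id, date, amount)`, the reference date, and the sample data are assumptions made for this example; the study's actual data schema is not described in the abstract.

```python
from datetime import date


def rfm_features(transactions, reference_date):
    """Return {user: (recency_days, frequency, monetary)}.

    recency   = days between the user's latest transaction and reference_date
    frequency = number of transactions
    monetary  = total amount spent
    """
    latest = {}
    for user, day, amount in transactions:
        last, freq, money = latest.get(user, (None, 0, 0.0))
        if last is None or day > last:
            last = day
        latest[user] = (last, freq + 1, money + amount)
    return {
        user: ((reference_date - last).days, freq, money)
        for user, (last, freq, money) in latest.items()
    }


# Hypothetical transaction log.
transactions = [
    ("u1", date(2021, 1, 1), 10.0),
    ("u1", date(2021, 1, 20), 5.0),
    ("u2", date(2021, 1, 10), 50.0),
]
rfm = rfm_features(transactions, reference_date=date(2021, 1, 31))
```

The resulting three-column vectors are the kind of input a clustering step such as K-Means would then group into user segments.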


2021 ◽  
Author(s):  
Fahime Khozeimeh ◽  
Danial Sharifrazi ◽  
Navid Hoseini Izadi ◽  
Javad Hassannataj Joloudari ◽  
Afshin Shoeibi ◽  
...  

Today, the use of artificial intelligence methods to diagnose and predict infectious and non-infectious diseases has attracted much attention. COVID-19 is a new virus that has caused many deaths worldwide. Because of the pandemic nature of COVID-19, automated tools for the clinical diagnosis of this disease are highly desired. Convolutional Neural Networks (CNNs) have shown outstanding classification performance on image datasets. To our knowledge, COVID computer-aided diagnosis systems based on CNNs and clinical information have not been analyzed or explored to date. Moreover, most of the existing literature on COVID-19 focuses on distinguishing infected individuals from non-infected ones. In this paper, we propose a novel method named CNN-AE to predict the survival chance of COVID-19 patients using a CNN trained on clinical information. To further increase the prediction accuracy, we use the CNN in combination with an autoencoder. Our method is one of the first that aims to predict the survival chance of already infected patients. We rely on clinical data to carry out the prediction, motivated by the fact that the resources required to prepare CT images are expensive and limited compared to those required to collect clinical data such as blood pressure, liver disease, etc. We evaluate our method on a publicly available clinical dataset of deceased and recovered patients which we have collected. A careful analysis of the dataset properties is also presented, consisting of important feature extraction and computation of correlations between features. Since most COVID-19 patients recover, the number of deceased samples in our dataset is low, leading to data imbalance. To remedy this issue, a data augmentation procedure based on autoencoders is proposed. To demonstrate the generality of our augmentation method, we train random forest and Naïve Bayes on our dataset with and without augmentation and compare their performance. We also evaluate our method on another dataset for further generality verification. Experimental results reveal the superiority of the CNN-AE method over the standard CNN as well as other methods such as random forest and Naïve Bayes. The average COVID-19 detection accuracy of CNN-AE is 96.05%, which is higher than the CNN average accuracy of 92.49%. To show that clinical data can be used as a reliable dataset for COVID-19 survival chance prediction, CNN-AE is compared with a standard CNN trained on CT images.


2018 ◽  
Vol 5 (2) ◽  
pp. 60-67 ◽  
Author(s):  
Dwi Yulianto ◽  
Retno Nugroho Whidhiasih ◽  
Maimunah Maimunah

Bananas are a commodity that contributes greatly to both national and international fruit production. The government, through the National Standardization Agency, establishes standards to maintain the quality of bananas. The purpose of this study is to classify the maturity stages of Ambon bananas based on the color index using the Naïve Bayes method in accordance with SNI 7422:2009. Naïve Bayes is used for classification by comparing the probability values generated from the predictor variable values of each model to determine the maturity stage of an Ambon banana. The data are primary image data of 105 Ambon bananas. Of 3 models built from different predictor variables, two obtained the same highest average accuracy of 90.48%: the 2nd model, which has 9 variables (r, g, b, v, *a, *b, entropy, energy, and homogeneity), and the 3rd model, which has 7 variables (r, g, b, v, *a, entropy, and homogeneity). Keywords: banana maturity, classification, image processing


2021 ◽  
Vol 2021 (1) ◽  
pp. 1012-1018
Author(s):  
Handy Geraldy ◽  
Lutfi Rahmatuti Maghfiroh

In its role as a data provider, Statistics Indonesia (Badan Pusat Statistik, BPS) gives the public access to BPS data. One of these services is the search feature on the BPS website. However, the search service does not yet meet consumer expectations. One way to meet them is to improve search effectiveness so that results are more relevant to the user's intent. This study therefore aims to build a query classification function for the search engine and to test whether that function improves search effectiveness. The query classification function is built using a machine learning model. We compare five algorithms: SVM, Random Forest, Gradient Boosting, KNN, and Naive Bayes. Of these five, the best model was obtained with SVM. The function was then implemented in the search engine, whose effectiveness was measured by precision and recall. The results show that the query classification function can narrow the search results for certain queries, thereby increasing precision. However, the query classification function does not affect recall.
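
The reported effect (higher precision, unchanged recall) can be illustrated with a small sketch. The document IDs and result lists below are hypothetical and only show how narrowing a result set changes the two measures; they are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query with two relevant documents.
relevant_docs = ["d1", "d2"]
unfiltered = ["d1", "d2", "d3", "d4"]  # results without query classification
filtered = ["d1", "d2", "d3"]          # results narrowed to the predicted class

p_before, r_before = precision_recall(unfiltered, relevant_docs)
p_after, r_after = precision_recall(filtered, relevant_docs)
```

In this example the filter removes one irrelevant document, so precision rises while recall stays the same because no relevant document is dropped.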


Author(s):  
T R Stella Mary ◽  
Shoney Sebastian

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Among the various ailments, heart disease is one of the primary causes of death around the globe; to help curb this, a detailed analysis is carried out using data mining. Heart disease prediction is often limited to a minimal set of attributes, which misses many important attributes that are main causes of heart disease. Hence, this research aims to consider almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and that the Naïve Bayes and Random Forest algorithms comparatively outperform on these datasets.


PLoS ONE ◽  
2014 ◽  
Vol 9 (1) ◽  
pp. e86703 ◽  
Author(s):  
Wangchao Lou ◽  
Xiaoqing Wang ◽  
Fan Chen ◽  
Yixiao Chen ◽  
Bo Jiang ◽  
...  
