Acep Saepulrohman ◽  
Sudin Saepudin ◽  
Dudih Gustian

Information and communication technology is developing rapidly, including chat or instant messaging applications such as WhatsApp, Line, and Telegram. As of October 2020, WhatsApp had the largest user base among instant messaging applications, with a total of 2 billion users. Although WhatsApp ranks at the top and receives the highest score, this cannot be taken as a measure of user satisfaction, because negative views of the application persist: some users report that WhatsApp frequently encounters errors during use, and other problems arise, such as unstable network connections on the user's side. Analyzing this requires a sentiment analysis approach that categorizes user comments as positive or negative. This study uses the Naïve Bayes and Support Vector Machine algorithms to analyze positive and negative comments regarding user satisfaction with the WhatsApp application on the Google Play Store. In tests on 1,500 user comments, model evaluation using 10-fold cross-validation showed an accuracy of 70.40% for Naïve Bayes and 77.00% for Support Vector Machine, while the AUC was 0.585 for Naïve Bayes and 0.876 for Support Vector Machine. Based on these results, the Support Vector Machine algorithm can be used for research on data with similar characteristics.
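
The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names (`k_fold_indices`, `cross_val_accuracy`, `fit_majority`) are hypothetical, the folds are contiguous and unshuffled, and the majority-class baseline merely stands in for Naïve Bayes or SVM.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of (train, test) indices."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(X: Sequence, y: Sequence, fit: Callable, k: int = 10) -> float:
    """Mean accuracy over k folds; fit(X_train, y_train) returns a predict(x) callable."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(X_train: Sequence, y_train: Sequence) -> Callable:
    """Hypothetical baseline: always predict the most common training label."""
    majority = max(set(y_train), key=list(y_train).count)
    return lambda x: majority
```

With 1,500 comments and k = 10, each fold in this sketch holds out 150 comments for testing and trains on the remaining 1,350; the reported accuracy would be the mean over the ten folds.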

Syed Md. Minhaz Hossain ◽  
Khaleque Md. Aashiq Kamal ◽  
Anik Sen ◽  
Iqbal H. Sarker

Short Message Service (SMS) is becoming a secure medium of communication due to its large-scale global coverage, reliability, and power efficiency. Because person-to-person (P2P) messaging is less secure than application-to-person (A2P) messaging, anyone can send a message, which leaves it open to attack. Attackers exploit this opportunity to spread malicious content, perform harmful activities, and abuse other people; such messages are commonly known as spam. Moreover, such messages can waste a lot of time, and important messages are sometimes overlooked. As a result, accurate SMS spam detection and its computational time are pressing issues. In this paper, we conduct six different experiments to detect SMS spam in a dataset of 5574 messages using machine learning classifiers such as Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM), considering variations of Term Frequency-Inverse Document Frequency (TF-IDF) features to explore the trade-off among accuracy, F1-score, and computational time. The experiments achieve their best result, an accuracy of 98.50%, an F1-score of 98%, and an area under the ROC curve (AUC) of 0.97, with the multinomial naïve Bayes classifier using TF-IDF after stemming.
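
To make the TF-IDF plus Multinomial Naïve Bayes combination concrete, here is a minimal plain-Python sketch. It rests on several assumptions that are not from the paper: a common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1, Laplace smoothing with alpha = 1, whitespace tokenization without stemming, and four invented toy messages in place of the 5574-message dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (no stemming in this sketch)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Map a tokenized document to {term: tf * idf}; unseen terms are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


class MultinomialNB:
    """Multinomial Naive Bayes over (possibly fractional) term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        vocab = sorted({t for v in vectors for t in v})
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
        weights = {c: defaultdict(float) for c in class_counts}
        for v, c in zip(vectors, labels):
            for t, w in v.items():
                weights[c][t] += w
        self.log_likelihood = {}
        for c in class_counts:
            # Laplace smoothing: add alpha to every vocabulary term.
            total = sum(weights[c].values()) + self.alpha * len(vocab)
            self.log_likelihood[c] = {
                t: math.log((weights[c][t] + self.alpha) / total) for t in vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_likelihood[c][t] for t, w in vector.items()
            )
        return max(self.log_prior, key=score)


# Invented toy messages, for illustration only.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free cash prize claim now", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at lunch tomorrow", "ham"),
]
docs = [tokenize(text) for text, _ in TRAIN]
idf = fit_idf(docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in docs], [label for _, label in TRAIN])


def classify(text):
    """Vectorize a new message with the fitted IDF table and predict its class."""
    return model.predict(tfidf(tokenize(text), idf))
```

Calling `classify("claim your free prize")` scores the message against both classes; words never seen in training, such as "your", are simply ignored by `tfidf`.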

Syed Md. Minhaz Hossain ◽  
Iqbal H. Sarker

Recently, spam emails have become a significant problem with the expanding usage of the Internet. To some extent, filtering emails is an obvious response. A spam filter is a system that detects undesired and malicious emails and blocks them from reaching users' inboxes. Spam filters check emails for anything "suspicious" in terms of text, email address, header, attachments, and language. In our proposed approach, however, we use different features, such as word2vec, word n-grams, character n-grams, and a combination of variable-length n-grams, for comparative analysis. Different machine learning models, such as support vector machine (SVM), decision tree (DT), logistic regression (LR), and multinomial naïve Bayes (MNB), are trained on the extracted features. We use different evaluation metrics, such as precision, recall, F1-score, and accuracy, to evaluate the experimental results. Among the models, SVM provides 97.6% accuracy, 98.8% precision, and a 94.9% F1-score using a combination of n-gram features.
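
The word n-gram, character n-gram, and variable-length n-gram features named above can be illustrated with a short sketch. The function names and the lowercase, whitespace-based tokenization are assumptions made for illustration; the paper's exact feature extraction is not specified in the abstract.

```python
def word_ngrams(text, n):
    """Contiguous runs of n lowercase, whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Contiguous runs of n characters, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def variable_length_ngrams(text, n_min, n_max, unit="word"):
    """Concatenate the n-grams for every n in [n_min, n_max]."""
    extract = word_ngrams if unit == "word" else char_ngrams
    features = []
    for n in range(n_min, n_max + 1):
        features.extend(extract(text, n))
    return features
```

For example, `variable_length_ngrams("free prize now", 1, 2)` combines the three unigrams with the two bigrams, which is one way to read "a combination of variable-length n-grams".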

2021 ◽  
Vol 10 (2) ◽  
pp. 111-117
Yulia Aryani ◽  
Arie Wahyu Wijayanto

ABSTRACT – Classification is one of the main topics in data mining and machine learning. Classification is the grouping of data in which the data used have a class label or target. It is used to take data and place them into particular groups. The study of the ionosphere is important for research in various domains, particularly in communication systems. In ionospheric research, radar returns from the ionosphere need to be classified as useful or not useful. In this paper, classification is performed on the Ionosphere data taken from the UCI Machine Learning Repository. Classification is carried out using three methods: SVM (Support Vector Machine), Naïve Bayes, and Random Forest. The results show the predictions of each experiment, with accuracy and predictions that differ for each method used. The best accuracy, precision, and recall were obtained with the Random Forest method with a training-to-test data ratio of 85%, giving a test-data accuracy of 90.57% and a precision of 94.12%. Keywords – Ionosphere; Classification; SVM; Naïve Bayes; Random Forest.
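
Accuracy, precision, and recall, the three metrics reported above, all derive from the four confusion-matrix counts. The sketch below shows the arithmetic in plain Python; the labels "good" and "bad" and the example predictions are hypothetical and are not taken from the paper's experiments.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels and predictions, for illustration only.
y_true = ["good", "good", "good", "bad", "bad", "good", "bad", "good"]
y_pred = ["good", "good", "bad", "bad", "good", "good", "bad", "good"]
metrics = classification_metrics(y_true, y_pred, positive="good")
```

Precision and recall can differ from accuracy considerably when the classes are imbalanced, which is one reason abstracts like this one report all three.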

Rahma Aulia Siahaan ◽  
Marnis Nasution ◽  
Mila Nirmala Sari Hasibuan

The liver is a vital organ for humans. Liver disease is a disorder of any liver function. Early diagnosis of liver disease is very important so that it can be treated quickly. In the medical field, diagnosing inflammatory liver disease is rather difficult. However, there are medical records that store patients' symptoms, which is certainly very beneficial for medical personnel or doctors: they can use previous medical records as material for making decisions about a patient's diagnosis. The conventional manual analysis techniques used so far are no longer effective for diagnosis. Along with the development of medical knowledge-based systems, the demand for computer-based knowledge systems as an analysis technique in diagnosing disease is becoming increasingly important. In this study, the researchers apply and compare several data mining classification methods, namely the C4.5 algorithm, Naïve Bayes, and k-Nearest Neighbor, to diagnose inflammatory liver disease, and then compare which of the three methods is the most accurate. Based on performance measurements of the three models using Cross Validation, Confusion Matrix, and ROC Curve, the C4.5 method is the best, with an accuracy of 70.99% and an area under the curve (AUC) of 0.950, followed by the k-Nearest Neighbor method with an accuracy of 67.19% and an AUC of 0.873, and then the Naïve Bayes method with an accuracy of 66.14% and an AUC of 0.742.
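
The area under the ROC curve (AUC) used to rank the three methods can be computed directly from classifier scores. A minimal sketch, assuming the standard pairwise interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name and example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of classifier scores, higher meaning "more positive".
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores, for illustration only.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = roc_auc(example_labels, example_scores)
```

The quadratic pairwise loop is fine for a sketch; on large datasets a sort-based rank computation gives the same value more efficiently. Because AUC depends only on the ranking of scores, a model can combine a modest accuracy with a high AUC.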

Vivi Nadenia Harahap ◽  
Deci Irmayani ◽  
Syaiful Zuhri Harahap

The current Governor of DKI Jakarta, although elected back in 2017, remains a subject of discussion and commentary. Comments come from the media directly or through social media. Twitter is one of the social media platforms frequently used to comment on the elected governor, and such comments can even become trending topics on Twitter. The netizens who comment are varied: some always tweet criticism, some comment positively, and some only retweet. This study predicts whether active netizens tend to consistently post positive or negative comments. The algorithm models used are Decision Tree, Naïve Bayes, Random Forest, and Ensemble. The Twitter data must first go through preprocessing before being processed in Rapidminer. The Rapidminer experiments were run four times, each dividing the data into two parts, test data and training data. The splits used were 10% test data to 90% training data, 20% to 80%, 30% to 70%, and finally 35% to 65%. The average accuracy was 93.15% for the Decision Tree algorithm, 91.55% for Naïve Bayes, 93.41% for Random Forest, and 93.42% for the Ensemble algorithm.
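
The four train/test splits described above (10%, 20%, 30%, and 35% test data) can be sketched with a simple hold-out split. This is an illustration in plain Python rather than the Rapidminer operators the authors used; the function name, the fixed seed, and the dataset size of 200 are assumptions.

```python
import random


def holdout_split(items, test_fraction, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# The four test fractions named in the abstract, on a hypothetical 200-item dataset.
split_sizes = {}
for fraction in (0.10, 0.20, 0.30, 0.35):
    train, test = holdout_split(range(200), fraction)
    split_sizes[fraction] = (len(train), len(test))
```

Averaging a model's accuracy over the four splits would then give one figure per algorithm, in the spirit of the averages reported in the abstract.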

Pastima Simanjuntak ◽  
Hotma Pangaribuan ◽  
Muhammad Taufik Syastra

Facial or skincare treatments in beauty care fall into two categories: home treatment (such as face soap, morning cream, night cream, etc.) and direct care (such as facials, chemical peels, and so on). Home facial treatment involves a variety of care products. Each home treatment product has a specific function, whether caring for the face or correcting skin problems such as acne, black spots, blackheads, oily skin, and others. Therefore, determining the right home treatment product for a consumer requires knowledge of what each product is for. One of the trade problems in Batam City is that many products still enter the market without it being known whether they are safe to use, especially cosmetic or skincare products, many of which are not licensed by BPOM but can still be sold to the people of Batam City. Finding good skincare cosmetics is very difficult for the community because too many skincare products on the market lack a BPOM permit, which is dangerous for the people who use them. The absence of a recommendation from a doctor or beautician also leads to wrong or poor skincare choices, with a bad impact on the face. The purpose of this study was to make recommendations for the use of skincare products in Batam City. To that end, the researcher applies one data mining technique, the naïve Bayes algorithm, implemented with the Tanagra 4.1 software; the results can be used to see previously neglected consumer buying patterns in order to increase product sales, and to see the decisions made in support of skincare recommendations in Batam City.

Yarma Agustya Dewi Utami ◽  
Volvo Sihombing ◽  
Muhammad Halmi Dar

Sentiment analysis is an important and actively developing research topic. It is carried out to determine a person's opinion, or the tendency of that opinion, on a problem or object: whether it leans negative or positive. The main purpose of this research is to find out public sentiment towards the Full Day School policy from comments on the Facebook Page of the Ministry of Education and Culture of the Republic of Indonesia, and to determine the performance of the Naïve Bayes Classifier (NBC) algorithm. The results indicate that negative public sentiment towards the Full Day School policy is higher than positive or neutral sentiment. The highest accuracy, 80%, was obtained by the Naïve Bayes Classifier algorithm with trigram feature selection on the model trained with 300 data items. This simulation shows that the size of the training data and the selection of features used in the NBC algorithm affect the accuracy of the results. Meanwhile, the simulation results from 10 test data with 5 different NBC and Lexicon algorithms also show that the Full Day School policy proposed by the Indonesian Minister of Education and Culture draws more negative than positive or neutral sentiment from most Facebook users who express opinions through comments.

2021 ◽  
Vol 6 (2) ◽  
pp. 78-89
Asep Hendra ◽  
Fitriyani Fitriyani

Healthcare services help people access medical care, for example by providing medicines, medical consultation, or health checks. Healthcare services have been moving to digital platforms. Halodoc is one such platform, which people can use for free or on a paid basis. Users can also review Halodoc's performance and services on the Google Play Store, giving feedback that Halodoc can use to evaluate and improve the app. The number of Google Play Store reviews increases every day, so a sentiment analysis of Halodoc's reviews is needed. The first phase of the sentiment analysis is preprocessing, which consists of tokenization, transformation to lowercase, stopword filtering, and token filtering (by length). The data is divided into two classes, positive and negative, and evaluated with 10-fold cross-validation; the Naïve Bayes Classifier algorithm achieves 81.68% accuracy and an AUC of 0.756, categorized as fair classification.
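
The four preprocessing steps listed above (tokenization, lowercasing, stopword filtering, and token filtering by length) can be sketched as a small pipeline. The stopword list, the length bounds, and the English example review are all assumptions made for illustration; the study's actual reviews, stopword list, and length thresholds are not given in the abstract.

```python
import re

# Hypothetical stopword list; a real study would use a full list for its language.
STOPWORDS = frozenset({"the", "a", "an", "is", "and", "to", "of", "it", "this"})


def preprocess(review, stopwords=STOPWORDS, min_len=3, max_len=25):
    """Tokenize, lowercase, drop stopwords, then keep tokens within the length bounds."""
    tokens = re.findall(r"[A-Za-z]+", review)            # 1. tokenization
    tokens = [t.lower() for t in tokens]                 # 2. transform to lowercase
    tokens = [t for t in tokens if t not in stopwords]   # 3. stopword filter
    return [t for t in tokens if min_len <= len(t) <= max_len]  # 4. length filter


cleaned = preprocess("The app is GREAT, but it crashes on login!")
```

Here the length filter removes "on" because it is shorter than the assumed minimum of three characters, while the stopword filter removes "the", "is", and "it".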

2021 ◽  
Sahar Andalib ◽  
Kunihiko Taira ◽  
H. Pirouz Kavehpour

Droplet evaporation plays crucial roles in biodiagnostics, microfabrication, and inkjet printing. Experimentally studying the evolution of a sessile droplet consisting of two or more components requires sophisticated equipment to control the vast parameter space affecting the physical process. On the other hand, the non-axisymmetric nature of the problem, attributed to compositional perturbations, introduces challenges for numerical methods. In this work, the droplet evaporation problem is studied from a new perspective. We analyze the evolution of a sessile methanol droplet through data-driven classification and regression techniques. The models are trained using experimental data of methanol droplet evolution under various environmental humidity levels and substrate temperatures. At higher humidity levels, the interfacial tension and subsequently the contact angle increase due to higher water uptake into the droplet. Therefore, different regimes of evolution are observed due to adsorption-absorption and possibly condensation of water, which turns the droplet into a binary system. We use classification algorithms to predict the regime of the droplet with point-by-point analysis of the droplet profile. The decision tree demonstrates better performance than the Naïve Bayes (NB) classifier. Furthermore, using regression techniques, we predict the humidity level surrounding the droplet as well as the time evolution of a macroscopic parameter (diameter or contact angle) of the droplet. The prediction results show promising performance for four cases of methanol droplet evolution under conditions unseen by the model, which demonstrates the capability of the model to capture the complex physics underlying binary droplet evolution.

