IMPLEMENTATION OF EMAIL SPAM DETECTION USING TEXT MINING WITH THE NAÏVE BAYES AND J48 DECISION TREE ALGORITHMS

Email is a popular medium of digital communication because sending messages by email is easy. Unfortunately, most email messages are spam. Spam is a message the recipient does not want, since it usually contains advertising or fraud; ham is a message the recipient does want. One way to sort these messages is to classify each email as spam or ham. Naïve Bayes and the J48 decision tree are algorithms that can be used to classify email messages, so this study compares the effectiveness of Naïve Bayes and J48 for sorting spam email. The method used is text mining. The data, consisting of English-language email texts, are preprocessed before being classified with Naïve Bayes and J48. Preprocessing consists of tokenization, stop-word removal, stemming, and attribute selection. The Naïve Bayes algorithm is a classifier based on Bayesian decision theory, whereas J48 is a development of the ID3 decision tree algorithm. The results show that J48 achieves higher accuracy than Naïve Bayes: 93.117% for J48 versus 88.5284% for Naïve Bayes. The study concludes that, judged by accuracy, the J48 decision tree outperforms Naïve Bayes for sorting spam email.
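The preprocessing steps and the Naïve Bayes classifier named above can be illustrated with a minimal plain-Python sketch. It is not the study's implementation: the stop-word list, the suffix-stripping `crude_stem` function (a stand-in for a real stemmer such as Porter's), and the toy training messages are all assumptions, and attribute selection is omitted. The classifier shown is the multinomial variant of Naïve Bayes with add-one smoothing, one common choice for text.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; the study's list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "or",
              "you", "your", "for", "in", "on", "now"}


def crude_stem(word: str) -> str:
    """Strip one common English suffix; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize into lowercase alphabetic words, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes on bag-of-words counts with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.n_docs = len(labels)
        return self

    def log_score(self, tokens: list[str], label: str) -> float:
        """Unnormalized log P(label) + sum of log P(token | label)."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.n_docs)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[t] + 1) / denom)
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


# Toy, made-up training messages (not the study's data).
train = [
    ("Win a free prize now", "spam"),
    ("Cheap pills, buy now", "spam"),
    ("Meeting agenda for Monday", "ham"),
    ("Lunch with the project team", "ham"),
]
model = MultinomialNaiveBayes().fit([preprocess(t) for t, _ in train],
                                    [y for _, y in train])
print(model.predict(preprocess("Claim your free prize")))  # spam
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied; ignoring tokens outside the training vocabulary is one simple policy among several.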

Centroid Based Classifier With TF-IDF-ICF for Classification of Student's Complaint at Application E-Complaint in Muhammadiyah University of Sidoarjo

JEEE-U (Journal of Electrical and Electronic Engineering-UMSIDA) ◽

10.21070/jeee-u.v1i1.23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 17 ◽

Cited By ~ 1

Author(s):

Mochamad Alfan Rosid ◽

Gunawan Gunawan ◽

Edwin Pramana

Keyword(s):

Text Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Base Classifier

Text mining refers to the process of extracting high-quality information from text. Such information is usually obtained by discovering patterns and trends through means such as statistical pattern learning. One important text mining task is text classification (categorization), for which many methods exist, including K-Nearest Neighbor, Naïve Bayes, the centroid-based classifier, and decision tree classification. In this study, student complaints are classified with a centroid-based classifier using TF-IDF-ICF features. Classification proceeds in five stages: (1) collecting the complaint data; (2) preprocessing, which prepares the unstructured data for the subsequent steps; (3) splitting the data into training data and test data; (4) training, which produces the classification model; and (5) testing, which evaluates the trained model on the test data. The complaints used for testing are taken from the database of the e-complaint application of Universitas Muhammadiyah Sidoarjo. The experiments show that complaint classification with the centroid-based classifier and TF-IDF-ICF features achieves a fairly high average accuracy of 79.5%. Accuracy increases as the amount of training data grows, whereas system efficiency decreases.
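The TF-IDF-ICF weighting and centroid-based classification described above can be sketched as follows. The abstract does not give the exact weighting formula, so this assumes one variant, w(t, d) = tf(t, d) · (1 + log(N / df(t))) · (1 + log(C / cf(t))), where N is the number of training documents, df(t) the number of documents containing t, C the number of classes, and cf(t) the number of classes whose documents contain t. The complaint categories and tokens are hypothetical, and the tokens are assumed to be already preprocessed.

```python
import math
from collections import Counter, defaultdict


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit L2 length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def weigh(tokens, idf, icf):
    """Unit-length TF-IDF-ICF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: n * idf[t] * icf[t] for t, n in tf.items()})


def train_centroids(docs, labels):
    """Learn IDF, ICF and one centroid per class from tokenized training documents."""
    classes = sorted(set(labels))
    df = Counter()                   # term -> number of documents containing it
    term_classes = defaultdict(set)  # term -> classes whose documents contain it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            term_classes[term].add(label)
    idf = {t: 1 + math.log(len(docs) / df[t]) for t in df}
    icf = {t: 1 + math.log(len(classes) / len(term_classes[t])) for t in df}
    sums = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        sums[label].update(weigh(tokens, idf, icf))  # add the document vector
    # Each centroid points in the direction of its class's mean document vector.
    centroids = {c: normalize(dict(sums[c])) for c in classes}
    return idf, icf, centroids


def classify(tokens, idf, icf, centroids):
    """Assign the class whose centroid has the highest cosine similarity."""
    vec = weigh(tokens, idf, icf)

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda c: cosine(centroids[c]))


# Hypothetical, already-preprocessed complaints.
docs = [["wifi", "slow", "library"], ["air", "conditioner", "broken", "classroom"],
        ["grade", "not", "posted"], ["lecturer", "absent", "class"]]
labels = ["facilities", "facilities", "academic", "academic"]
idf, icf, centroids = train_centroids(docs, labels)
print(classify(["wifi", "broken"], idf, icf, centroids))  # facilities
```

The ICF factor is largest for terms that occur in only one class, so class-specific vocabulary dominates the centroids; because both centroids and document vectors have unit length, the dot product in `classify` is the cosine similarity.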

The comparation of text mining with Naive Bayes classifier, nearest neighbor, and decision tree to detect Indonesian swear words on Twitter

2017 5th International Conference on Cyber and IT Service Management (CITSM) ◽

10.1109/citsm.2017.8089231 ◽

2017 ◽

Cited By ~ 7

Author(s):

Wildan Budiawan Zulfikar ◽

Mohamad Irfan ◽

Cecep Nurul Alam ◽

Muhammad Indra

Keyword(s):

Text Mining ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Sentiment Analysis of Tourist Destination Ratings in Tegal City Based on Text Mining

Jurnal Edukasi dan Penelitian Informatika (JEPIN) ◽

10.26418/jp.v5i2.32661 ◽

2019 ◽

Vol 5 (2) ◽

pp. 191 ◽

Cited By ~ 1

Author(s):

Oman Somantri ◽

Dairoh Dairoh

Keyword(s):

Text Mining ◽

Decision Tree ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes

Obtaining information to improve services and strategies for managing tourist sites is still difficult because the available information is limited. Social media can supply data on how visitors rate tourist sites, including those in and around Tegal. This study proposes a sentiment analysis model to address the problem. Its goal is a model that provides decision-support information for tourists and tourist-site managers, serving as a source of information about existing tourist sites. The research method is experimental: Naïve Bayes and Decision Tree are applied to text classification to find the best model, which is then implemented as a decision support system for developing intelligent systems that help the relevant parties raise the value of the region's potential, particularly in tourism. The experiments show that Naïve Bayes achieves an accuracy of 77.50%, better than the Decision Tree's 60.83%.
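The decision trees compared in these studies choose each split with an impurity criterion. To illustrate the difference between ID3 and J48 noted in the first abstract (J48 is Weka's Java implementation of C4.5), the sketch below computes ID3's information gain and C4.5's gain ratio for a categorical attribute. It covers only the split criterion, not C4.5's handling of numeric attributes or pruning, and the toy attributes are hypothetical.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def split_by(attribute, labels) -> dict:
    """Group class labels by the value the attribute takes for each example."""
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    return groups


def information_gain(attribute, labels) -> float:
    """ID3 criterion: parent entropy minus weighted entropy of the child partitions."""
    n = len(labels)
    children = split_by(attribute, labels).values()
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in children)


def gain_ratio(attribute, labels) -> float:
    """C4.5/J48 criterion: information gain divided by the split information."""
    split_info = entropy(attribute)  # entropy of the attribute's own value distribution
    return information_gain(attribute, labels) / split_info if split_info > 0 else 0.0


outcome = ["spam", "spam", "ham", "ham"]
has_free = ["yes", "yes", "no", "no"]      # hypothetical useful attribute
message_id = ["m1", "m2", "m3", "m4"]      # hypothetical unique identifier
print(information_gain(has_free, outcome), gain_ratio(has_free, outcome))      # 1.0 1.0
print(information_gain(message_id, outcome), gain_ratio(message_id, outcome))  # 1.0 0.5
```

The identifier attribute splits the toy data perfectly, so information gain rates it as highly as the genuinely useful `has_free`; dividing by split information halves its score, which is the bias toward many-valued attributes that gain ratio is meant to correct.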

Predicting Capital Market Investor Sentiment on Social Networks Using Text Mining

BALANCE: Economic, Business, Management and Accounting Journal ◽

10.30651/blc.v18i2.7226 ◽

2021 ◽

Vol 18 (2) ◽

pp. 32

Author(s):

Aestikani Mahani ◽

Hendro Margono

Keyword(s):

Text Mining ◽

Decision Tree ◽

Capital Market ◽

Naive Bayes ◽

Stock Exchange ◽

Investor Sentiment ◽

Naïve Bayes ◽

Classification Model ◽

Business World ◽

The Capital Market

The decline in optimism among capital market investors is one of the financial impacts of the COVID-19 pandemic on the business world. It was reflected in a decrease in trading volume, followed by a sharp drop in the Jakarta Composite Index (JCI) on the Indonesia Stock Exchange starting in March 2020. Concern over a slower economic recovery resulting from the pandemic is therefore reflected in investor sentiment in the capital market. At the same time, the rapid growth of the internet in Indonesia has led investors to search for information, mostly on online platforms, before buying and selling securities, and this information in turn influences investor preferences and sentiment. This study explores investor expectations as reflected in investment sentiment, treating the capital market as an important barometer of a country's economy. It conducted a qualitative examination of stock-investment features/terms that frequently appear in the capital market and collected them in a compact dictionary (lexicon). Lexicon-based investor opinions were then extracted from Twitter, followed by text sentiment analysis and the construction of classification models based on Naive Bayes and Decision Tree. The results show that the polarity of capital market investor sentiment is optimistic (positive), with frequently occurring sentiment features such as "cuan", "bearish", "serok", "copet", "untung", "cut loss", and "nyangkut". Meanwhile, the Decision Tree classification model achieves better accuracy.
Keywords: investor, lexicon, social network, stock exchange, text mining
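A lexicon-based labeling step of the kind described above can be sketched as a signed count of lexicon matches per tweet; such labels could then serve as training targets for Naive Bayes or Decision Tree classifiers. The abstract lists frequent terms but not their polarities, so the weights below are hypothetical glosses ("cuan" and "untung" as profit, "nyangkut" as being stuck in a losing position); "serok" and "copet" are left out because their polarity is not clear from the text, and the example tweets are made up.

```python
import re

# Hypothetical polarity lexicon built from terms named in the abstract;
# the weights are assumptions, not the study's lexicon.
LEXICON = {
    "cuan": 1,       # slang for profit
    "untung": 1,     # profit
    "bearish": -1,
    "cut loss": -1,  # multi-word term, matched as a phrase
    "nyangkut": -1,  # slang for being stuck in a losing position
}


def polarity(tweet: str) -> str:
    """Label a tweet by the sum of lexicon weights of the terms it contains."""
    text = tweet.lower()
    score = 0
    for term, weight in LEXICON.items():
        # \b boundaries match whole words/phrases only, not substrings of longer words
        pattern = r"\b" + re.escape(term) + r"\b"
        score += weight * len(re.findall(pattern, text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["Hari ini cuan lagi, untung besar",
              "Saham ini nyangkut, terpaksa cut loss",
              "Pasar tutup hari ini"]:
    print(polarity(tweet), "|", tweet)
```

A plain count ignores negation and context ("tidak untung", "not profitable", would still score positive), which is one reason lexicon labels are typically followed by a trained classifier rather than used alone.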

SMS SPAM CLASSIFICATION USING SUPPORT VECTOR MACHINE

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for cellphones to receive spam messages. The large number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
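The abstract does not say how its Support Vector Machine was trained. As one illustration of a linear SVM for text, the sketch below trains on bag-of-words features with Pegasos (stochastic sub-gradient descent on the L2-regularized hinge loss), a swapped-in training method rather than the paper's. The messages, the regularization strength `lam = 0.1`, and the epoch count are arbitrary assumptions, and the bias is folded in as a regularized constant feature for simplicity.

```python
import random
from collections import Counter


def featurize(text: str, vocab: list[str]) -> list[float]:
    """Bag-of-words counts over a fixed vocabulary, plus a constant 1.0 bias feature."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab] + [1.0]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_pegasos(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 (spam) or -1 (ham).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            violated = y[i] * dot(w, X[i]) < 1        # margin checked before updating
            w = [(1 - eta * lam) * wj for wj in w]    # shrink: regularization step
            if violated:                              # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x) -> int:
    return 1 if dot(w, x) >= 0 else -1


# Made-up SMS-like messages: +1 = spam, -1 = ham.
messages = ["win free prize now", "free cash offer", "claim your prize",
            "meeting at noon", "see you at lunch", "project report attached"]
labels = [1, 1, 1, -1, -1, -1]
vocab = sorted({word for m in messages for word in m.split()})
X = [featurize(m, vocab) for m in messages]
w = train_pegasos(X, labels)
print(predict(w, featurize("free prize", vocab)))
print(predict(w, featurize("lunch meeting", vocab)))
```

Pegasos is used here only because it fits in a few lines without external libraries; a production system would more likely rely on an established solver.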

Identifying Key Fraud Indicators in the Automobile Insurance Industry Using SQL Server Analysis Services

Studia Universitatis Babeș-Bolyai Oeconomica ◽

10.2478/subboec-2019-0009 ◽

2019 ◽

Vol 64 (2) ◽

pp. 53-71

Author(s):

Botond Benedek ◽

Ede László

Keyword(s):

Neural Network ◽

Decision Tree ◽

Naive Bayes ◽

Insurance Industry ◽

Naïve Bayes ◽

Sql Server ◽

Categorical Variables ◽

Automobile Insurance ◽

Price Determination ◽

Mining Tool

Customer segmentation is a real challenge in the automobile insurance industry, because the datasets are large, multidimensional, and unbalanced, and each customer requires a unique price based on their risk profile. Furthermore, pricing an insurance policy or validating a compensation claim must, in most cases, be decided instantly. Therefore, the purpose of this research is to identify an easy-to-use data mining tool that can identify key automobile insurance fraud indicators, facilitating segmentation. In addition, the methods used by the tool should rely primarily on numerical and categorical variables, as there is no well-functioning text mining tool for Central and Eastern European languages. Hence, we chose the SQL Server Analysis Services (SSAS) tool and compared the performance of the decision tree, neural network, and Naïve Bayes methods. The results suggest that decision tree and neural network are more suitable than Naïve Bayes; however, the best conclusions can be drawn when the decision tree and neural network are used together.

Impute, Select, Decision Tree and Naïve Bayes (ISE-DNC): An Ensemble Learning Approach to Classify the Lung Cancer

SSRN Electronic Journal ◽

10.2139/ssrn.3667438 ◽

2020 ◽

Author(s):

Bhanumathi S ◽

Dr. Chandrashekara S N

Keyword(s):

Lung Cancer ◽

Decision Tree ◽

Ensemble Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Learning Approach

Human activity classification using Decision Tree and Naïve Bayes classifiers

Multimedia Tools and Applications ◽

10.1007/s11042-020-10447-x ◽

2021 ◽

Author(s):

Kholoud Maswadi ◽

Norjihan Abdul Ghani ◽

Suraya Hamid ◽

Muhammads Babar Rasheed

Keyword(s):

Decision Tree ◽

Human Activity ◽

Naive Bayes ◽

Naïve Bayes ◽

Activity Classification

Multiple Naïve Bayes Classifiers Ensemble for Traffic Incident Detection

Mathematical Problems in Engineering ◽

10.1155/2014/383671 ◽

2014 ◽

Vol 2014 ◽

pp. 1-16 ◽

Cited By ~ 7

Author(s):

Qingchao Liu ◽

Jian Lu ◽

Shuyan Chen ◽

Kangjia Zhao

Keyword(s):

Decision Tree ◽

Naive Bayes ◽

Classification Performance ◽

Naïve Bayes ◽

Classifier Ensemble ◽

Optimal Threshold ◽

Incident Detection ◽

Bayes Classifier ◽

Traffic Incident ◽

Better Than

This study examines the applicability of a Naïve Bayes classifier ensemble to traffic incident detection. The standard Naïve Bayes (NB) classifier has been applied to traffic incident detection with good results. However, the detection result of a practically implemented NB depends on the choice of the optimal threshold, which is determined mathematically using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning parameters, and to improve the limited classification performance of NB and enhance detection performance, we propose an NB classifier ensemble for incident detection. We also propose combining Naïve Bayes and a decision tree (NBTree) to detect incidents. We report extensive experiments evaluating three algorithms: standard NB, the NB ensemble, and NBTree. The results indicate that, under each of five combination rules, the NB classifier ensemble performs significantly better than standard NB and slightly better than NBTree in terms of some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
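The abstract mentions five combination rules for the NB ensemble and a decision threshold, without naming the rules. As an assumption, the sketch below implements five classic fixed rules from the classifier-combination literature (mean/sum, product, max, min, and majority vote) over each member's posterior P(incident | x); the member posteriors are taken as given, and the example values are made up.

```python
import math


def combine(posteriors: list[float], rule: str) -> float:
    """Fuse each member's P(incident | x) into one incident score in [0, 1].

    Binary task: a member's P(no incident | x) is taken as 1 - p. The score is
    >= 0.5 exactly when the rule favors 'incident' (ties count as incident).
    """
    p = list(posteriors)
    q = [1.0 - x for x in p]
    if rule == "mean":      # sum rule: average of the posteriors
        return sum(p) / len(p)
    if rule == "product":   # product rule, renormalized over the two classes
        a, b = math.prod(p), math.prod(q)
        return a / (a + b) if a + b > 0 else 0.5
    if rule in ("max", "min"):
        pick = max if rule == "max" else min
        a, b = pick(p), pick(q)  # strongest (or weakest) support for each class
        return a / (a + b) if a + b > 0 else 0.5
    if rule == "vote":      # fraction of members voting 'incident' at p >= 0.5
        return sum(x >= 0.5 for x in p) / len(p)
    raise ValueError(f"unknown rule: {rule}")


def detect(posteriors: list[float], rule: str, threshold: float = 0.5) -> bool:
    """Flag an incident when the combined score reaches the decision threshold."""
    return combine(posteriors, rule) >= threshold


members = [0.9, 0.4, 0.4]  # made-up posteriors from three NB members
for rule in ("mean", "product", "max", "min", "vote"):
    print(rule, detect(members, rule))
```

With these made-up posteriors, the mean, product, max, and min rules all flag an incident while majority vote does not, and raising `threshold` to 0.6 makes the mean rule decline as well; this sensitivity is the threshold-selection burden the ensemble is meant to reduce.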
