Implementing Naive Bayes Algorithm for Detecting Spam Emails on Datasets

Author(s):  
ARGHA GHOSH ◽  
A. SENTHILRAJAN

Email is the most common as well as the fastest medium for communicating around the globe. However, users now receive large volumes of junk email, known as “spam”, every day. Spam emails mainly carry two types of content: commercial content such as advertisements and offers, and criminal content such as phishing website links, malware, and trojans. Spam carrying advertisements and offers is known as Unsolicited Commercial Email, while spam carrying phishing website links, malware, or trojans is known as Unsolicited Bulk Email. Those who send spam emails are known as spammers. Spammers mainly harvest the email addresses of target users from websites, junk sites, browser add-ons, and similar sources. The Naive Bayes algorithm is a probabilistic machine learning algorithm that is well known for classifying spam emails. It originates from Bayes' theorem, which expresses the conditional probability of an event given that another event has occurred. In this research work, we perform feature extraction based on email characteristics and behavior, and we propose a detection approach for classifying spam emails using a Naïve Bayes classifier. We implement the classifier on multiple email datasets, namely Spam Corpus and Spambase. Based on the results from the WEKA (Waikato Environment for Knowledge Analysis) tool, we carry out an experimental analysis that measures the performance of the Naïve Bayes classifier using Accuracy, Recall, Precision, and F-measure. Finally, we compare the performance of the classifier across the datasets based on the numbers of correctly and incorrectly classified email instances.
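The classification and evaluation steps described in this abstract can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing over whitespace-separated words, which may differ from the WEKA configuration the authors used. The toy messages, the `spam`/`ham` labels, and all function names are hypothetical and are not taken from Spam Corpus or Spambase.

```python
import math
from collections import Counter

def train(messages):
    """Count classes and per-class word frequencies from (text, label) pairs."""
    class_counts, word_counts, vocab = Counter(), {}, set()
    for text, label in messages:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior under Bayes' theorem."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # add-one smoothing avoids zero probabilities
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure

# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("win free prize now", "spam"),
    ("free offer click link", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train(TRAIN)
print(predict(model, "free prize offer"))
```

Working in log space keeps the product of many small word probabilities from underflowing, which is the usual design choice for text classifiers of this kind.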

2020 ◽  
Vol 1 (2) ◽  
pp. 61-66
Author(s):  
Febri Astiko ◽  
Achmad Khodar

This study aims to design a machine learning model for sentiment analysis of Indosat Ooredoo service reviews on the social media platform Twitter, using the Naive Bayes algorithm to classify reviews as positive or negative. The sentiment analysis uses machine learning to obtain patterns and a model that can be reused to predict new data.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Bustami Yusuf ◽  
Muhammad Zaeki ◽  
Hendri Ahmadian ◽  
Khairan Ar ◽  
Sri Wahyuni

Education is one of the sciences that makes humans much better by learning various scientific disciplines. The Al-Quran is one of the sources of knowledge believed in by Muslims around the world. Technology has penetrated almost every domain of our lives, including the world of education. Thus, the authors use technology as a tool for researching educational topics in the Al-Quran by implementing text exploration. The research was carried out by selecting some basic words related to the subject of education as the keywords in this study. The keywords are “Ajar”, “Bicara”, “Cipta”, “Dengar”, “Ingat” and “Lihat”. Then, the authors implemented the Naïve Bayes Classifier algorithm. To test and evaluate the results, the authors used two measures, recall and precision. The study results by keyword are “Cipta” 3.05%, “Ingat” 2.25%, “Ajar” 1.96%, “Lihat” 0.82%, “Dengar” 0.62%, and “Bicara” 0.34%, with a total weight of 3,516 words that have been filtered. The overall percentage of the results is 9.04% of the 38,761 words in the Al-Quran. For the evaluation of the Naïve Bayes algorithm, the recall and precision scores are 0.605 and 0.366, respectively.
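As a quick check of the arithmetic in this abstract, the six keyword percentages do sum to the reported overall figure. The second line is an illustration only: the authors report recall and precision but no F-measure, so the value below is derived here under the standard harmonic-mean definition and is not a result from the paper.

```latex
\begin{align*}
3.05 + 2.25 + 1.96 + 0.82 + 0.62 + 0.34 &= 9.04 \quad (\%) \\
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.366 \times 0.605}{0.366 + 0.605} &\approx 0.456
\end{align*}
```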


Author(s):  
Youllia Indrawaty Nurhasanah ◽  
Asep Nana Hermana ◽  
Mahesa Arga Hutama

The Sugeno fuzzy algorithm is one of the algorithms used in a Fuzzy Inference System. It expresses the relationship between conditions and decisions in the form of IF-THEN rules, where the output is a constant or a linear equation. The Naive Bayes algorithm, in turn, classifies data into a particular class based on the probability of each class. Both algorithms can be implemented in a Decision Support System (DSS) for diet selection, using Sugeno fuzzy as an additional determinant of energy and the Naive Bayes method as the decision maker. The motivation is that food intake and diet have become a problem for many people. Preventing excess food intake requires dietary adjustment, that is, a diet. In daily life, however, people sometimes find it hard to determine the type of diet that suits them, so a system that can determine a suitable diet type is needed. The data used as a reference for decision support are age, daily caloric requirement, Body Mass Index (BMI), blood pressure, cholesterol, uric acid, and blood sugar levels. System testing on a sample of 30 data showed that the system determined the diet type appropriately for 26 and inappropriately for 4, a success rate of 86.7%.
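The IF-THEN rules with constant outputs mentioned in this abstract correspond to zero-order Sugeno inference, which can be sketched as a firing-strength-weighted average of the rule constants. The example below is a minimal illustration under assumptions: the single BMI input, the triangular membership breakpoints, and the energy factors 1.2, 1.0, and 0.8 are all hypothetical and do not come from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at or outside the feet a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical rule base: IF BMI is <set> THEN energy factor = <constant>.
RULES = [
    (lambda bmi: triangular(bmi, 10, 17, 23), 1.2),  # BMI is low
    (lambda bmi: triangular(bmi, 17, 23, 29), 1.0),  # BMI is normal
    (lambda bmi: triangular(bmi, 23, 29, 40), 0.8),  # BMI is high
]

def sugeno(bmi, rules=RULES):
    """Zero-order Sugeno output: weighted average of the rule constants."""
    fired = [(membership(bmi), constant) for membership, constant in rules]
    total = sum(strength for strength, _ in fired)
    if total == 0:
        return None  # no rule fires for this input
    return sum(strength * constant for strength, constant in fired) / total

print(sugeno(20))  # halfway between the "low" and "normal" rules
```

In a system like the one described, a value of this kind could be one of the inputs passed to the Naive Bayes decision step, although how the authors actually combine the two stages is not stated in the abstract.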


2019 ◽  
Vol 9 (2) ◽  
pp. 97
Author(s):  
Firman Tempola

This research is a continuation of previous research that applied the Naive Bayes classifier algorithm to predict the status of volcanoes in Indonesia based on seismic factors. Five attributes are used to predict volcano status, whose classes are normal, standby, and alert. The results showed that the prediction accuracy was only 79.31%, which falls into the fair classification range. To overcome this weakness and increase accuracy, optimization is done by assigning criteria or attribute weights using particle swarm optimization (PSO). This research compared the optimization of the Naive Bayes algorithm with that of the support vector machine, both using particle swarm optimization. The accuracy improved to 91.3% after applying PSO-NBC and to 92.86% after applying PSO-SVM.


SISFORMA ◽  
2018 ◽  
Vol 5 (1) ◽  
pp. 22
Author(s):  
Eka Angga Laksana ◽  
Ase Suryana ◽  
Heri Heryono

Sentiment analysis, as part of the text mining research domain, has gained recognition due to its successful application in social media analysis. Sentiment analysis methods can classify texts as negative or positive. The classified texts summarize users' responses as a whole and describe opinion polarity about a particular topic. Based on this idea, this research took e-learning users' opinions as the object to be measured through sentiment analysis. The results can be used to evaluate the e-learning activity. This research was implemented at Widyatama University, which has been running e-learning activities for several years. Previously, e-learning systems were commonly evaluated with a qualitative method, giving questionnaires to users and gathering the feedback. Still, a questionnaire does not by itself yield a conclusion about overall opinion. Hence, a method is needed to identify opinion polarity among e-learning members. The e-learning opinion data sets were gathered from questionnaires filled in by e-learning members, including both students and lecturers as participants. The participants gave reviews about learning outcomes after their participation in the e-learning activity. Their opinions were needed to describe the current situation of the e-learning activity, so that the conclusions could be used to make improvements and describe some achievements of the e-learning system. The data sets were used to train a Naïve Bayes classifier to group each user response as negative or positive. The classification results were also evaluated using evaluation metrics from data mining, such as accuracy, precision, and recall, to show the classifier performance.


With recent advances in online services, the importance of product reviews has also grown. In this paper we focus on reducing the user's time and effort by recommending the best product to them. To achieve this, the paper proposes a Naive Bayes classifier that labels the reviews accurately and combines them to give a final rating for the product. Amazon product review data, consisting of both negative and positive reviews, was used for training and testing. The model’s performance is evaluated, and the results are analysed.


Author(s):  
Jie Ji ◽  
Qiangfu Zhao

Document clustering partitions sets of unlabeled documents so that documents in the same cluster share common concepts. A Naive Bayes Classifier (BC) is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. BC requires only a small amount of training data to estimate the parameters needed for classification. Since that training data must be labeled, whereas clustering works on unlabeled documents, we propose an Iterative Bayes Clustering (IBC) algorithm. To improve IBC performance, we propose combining IBC with a Comparative Advantage-based (CA) initialization method. Experimental results show that our proposal improves performance significantly over classical clustering methods.


2018 ◽  
Vol 2 (2) ◽  
pp. 131
Author(s):  
Anaïs Pizzo ◽  
Pascal Teyssere ◽  
Long Vu-Hoang

With the explosion of computer science in the last decade, data banks and network management represent a large part of tomorrow's problems. One of them is the development of the best possible classification method for exploiting databases. In classification problems, a representative and successful probabilistic method is the Naïve Bayes classifier. However, the effectiveness of Naïve Bayes can still be improved. Indeed, Naïve Bayes ignores misclassified instances instead of using them to become an adaptive algorithm. Several works have presented Boosting-based solutions that improve the Gaussian Naïve Bayes algorithm by combining the Naïve Bayes classifier with AdaBoost methods. Despite these works, the Boosted Gaussian Naïve Bayes algorithm is still neglected in solving classification problems. One reason could be the complexity of implementing the algorithm compared to a standard Gaussian Naïve Bayes. In this paper, we present one approach to a suitable solution: a pseudo-algorithm that uses Boosting and Gaussian Naïve Bayes principles with the lowest possible complexity. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
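The idea of reusing misclassified instances can be illustrated with a minimal sketch that plugs a sample-weighted Gaussian Naive Bayes into standard discrete AdaBoost for two classes. This is not the authors' pseudo-algorithm, which the abstract does not spell out. It assumes labels encoded as -1 and +1, and the function names, the small variance floor, the error clipping, and the toy data are all hypothetical.

```python
import math

def fit_gnb(X, y, w):
    """Weighted Gaussian Naive Bayes: per-class prior, feature means, variances."""
    model = {}
    for c in set(y):
        idx = [i for i, label in enumerate(y) if label == c]
        wc = sum(w[i] for i in idx)
        means, variances = [], []
        for j in range(len(X[0])):
            m = sum(w[i] * X[i][j] for i in idx) / wc
            v = sum(w[i] * (X[i][j] - m) ** 2 for i in idx) / wc + 1e-6  # variance floor
            means.append(m)
            variances.append(v)
        model[c] = (wc / sum(w), means, variances)
    return model

def predict_gnb(model, x):
    """Pick the class with the highest Gaussian log posterior."""
    def log_posterior(c):
        prior, means, variances = model[c]
        score = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)

def fit_adaboost(X, y, rounds=5):
    """Discrete AdaBoost: misclassified samples gain weight for the next round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = fit_gnb(X, y, w)
        preds = [predict_gnb(model, x) for x in X]
        err = sum(wi for wi, p, t in zip(w, preds, y) if p != t)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clip so alpha stays finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict_adaboost(ensemble, x):
    """Sign of the alpha-weighted vote over all base classifiers."""
    score = sum(alpha * predict_gnb(model, x) for alpha, model in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical, well-separated toy data with labels in {-1, +1}.
X = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [5.0, 6.0], [5.5, 5.8], [5.2, 6.3]]
y = [-1, -1, -1, 1, 1, 1]
ensemble = fit_adaboost(X, y)
print(predict_adaboost(ensemble, [1.1, 2.1]))
```

Because Gaussian Naive Bayes only needs weighted means and variances, the sample weights enter the base learner directly, with no need for resampling.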


Author(s):  
Tobias Sombra ◽  
Rose Santini ◽  
Emerson Morais ◽  
Walmir Couto ◽  
Alex Zissou ◽  
...  

Quantitative evaluation of a dataset can play an important role in pattern recognition for technical-scientific research involving behavior and dynamics in social networks. One example is adaptive feature weighting approaches for the naive Bayes text algorithm. This work presents an exploratory data analysis with a quantitative approach that involves pattern recognition using the Mendeley research network, in order to identify logics behind the popularity of document access. To better analyze the results, the work was divided into four categories, each with three subcategories, that is, five, three, and two output classes. The names of these categories arose from the data collection, which also included open-access documents and separated proceedings and journals into two further categories. As a result, the performance on the test examples showed a lower error rate for the two-output-class subcategory under the popularity criterion when using the naive Bayes algorithm on Mendeley.


2021 ◽  
Vol 10 (1) ◽  
pp. 47-52
Author(s):  
Pulung Hendro Prastyo ◽  
Septian Eko Prasetyo ◽  
Shindy Arti

Credit scoring is a model commonly used in the decision-making process to refuse or accept loan requests. The credit score model depends on the type of loan or credit and is complemented by various credit factors. At present, there is no accurate model for determining which creditors are eligible for loans. Therefore, an accurate and automatic model is needed to make it easier for banks to determine appropriate creditors. To address the problem, we propose a new approach that combines a machine learning algorithm (Naïve Bayes), Information Gain (IG), and discretization to classify creditors. This research work employed an experimental method using the Weka application. The Australian Credit Approval data, which contains 690 instances, was used as the dataset. In this study, Information Gain is employed for feature selection to pick relevant features so that the Naïve Bayes algorithm can work optimally. The confusion matrix is used as the evaluator and 10-fold cross-validation as the validator. Based on the experimental results, the proposed method improved classification performance, reaching its highest average accuracy, precision, recall, and F-measure at 86.29%, 86.33%, 86.29%, and 86.30%, respectively. In addition, the proposed method obtains a ROC area of 91.52%. This indicates that the proposed method can be rated as an excellent classifier.
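The feature selection step described here can be illustrated with a minimal sketch of Information Gain computed on a discretized feature. It assumes equal-width binning, which may differ from the discretization filter the authors used in Weka. The function names, the income values, and the approve/reject labels are hypothetical and are not taken from the Australian Credit Approval data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def discretize(values, bins=2):
    """Equal-width binning of a numeric feature into integer bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), bins - 1) for v in values]

# Hypothetical toy data: one informative and one uninformative feature.
labels = ["approve", "approve", "reject", "reject"]
income = [32, 30, 12, 10]
noise = ["x", "y", "x", "y"]

print(information_gain(discretize(income), labels))
print(information_gain(noise, labels))
```

Ranking features by this score and keeping only the highest ones is one common way to use Information Gain as a filter before training Naïve Bayes.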

