AN INFORMATION-THEORETIC FILTER METHOD FOR FEATURE WEIGHTING IN NAIVE BAYES

Email has become one of the fastest and most economical forms of communication. However, the growth in the number of email users has led to a dramatic increase in suspicious emails over the past few years. This paper proposes applying classification-based data mining to the detection of suspicious emails, based on deception theory. Email data were classified with four different classifiers (Neural Network, SVM, Naïve Bayesian, and Decision Tree). The experiments were performed in Weka on datasets of different sizes to detect suspicious emails in an email corpus. Experimental results show that a simple ID3 classifier, which builds a binary tree, gives promising detection rates.
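ID3, the decision-tree learner singled out above, chooses each split by information gain: the drop in class-label entropy after partitioning the data on an attribute. The following is a minimal sketch of that split rule, assuming categorical features. The feature names (`urgent`, `link`) and the four toy emails are hypothetical and are not taken from the paper's corpus or its Weka setup.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in label entropy obtained by splitting `rows` on attribute `attr`."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        # Labels of the rows that fall into this branch of the split.
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3's split rule: pick the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: two yes/no features per email.
rows = [
    {"urgent": "yes", "link": "yes"},
    {"urgent": "yes", "link": "no"},
    {"urgent": "no", "link": "yes"},
    {"urgent": "no", "link": "no"},
]
labels = ["suspicious", "suspicious", "normal", "normal"]

print(best_attribute(rows, labels, ["urgent", "link"]))  # → urgent
```

With yes/no features like these, every split has two branches, so the resulting tree is binary. A full ID3 learner would apply `best_attribute` recursively to each partition until the labels are pure or no attributes remain.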

An Augmented Naive Bayesian Power Network Fault Diagnosis Method Based on Data Mining

2011 Asia-Pacific Power and Energy Engineering Conference, 2011. DOI: 10.1109/appeec.2011.5748348. Cited by ~2.

Author(s): Qianwen Nie, Youyuan Wang

Keyword(s): Data Mining; Fault Diagnosis; Power Network; Naive Bayesian; Naïve Bayesian; Diagnosis Method; Network Fault

Incremental Naïve Bayesian Learning Algorithm based on Classification Contribution Degree

Journal of Computers, Vol 9 (8), 2014. DOI: 10.4304/jcp.9.8.1967-1974. Cited by ~8.

Author(s): Shuxia Ren, Yangyang Lian, Xiaojian Zou

Keyword(s): Bayesian Learning; Learning Algorithm; Naive Bayesian; Naïve Bayesian

Naïve Bayesian Classification of Uncertain Objects Based on the Theory of Interval Probability

International Journal of Artificial Intelligence Tools, Vol 25 (03), pp. 1650012, 2016. DOI: 10.1142/s0218213016500123. Cited by ~3.

Author(s): Hongmei Chen, Weiyi Liu, Lizhen Wang

Keyword(s): Data Mining; Uncertain Data; Bayesian Classification; Bayesian Classifier; Interval Probability; Naïve Bayesian Classifier; Naive Bayesian; Naïve Bayesian; Intuitive Concept

The potential applications and challenges of uncertain data mining have recently attracted interest from researchers. Most uncertain data mining algorithms consider aleatory (random) uncertainty of data, i.e., they require exact probability distributions or confidence values to be attached to uncertain data. However, in the case of epistemic (incomplete) uncertainty, knowledge about the uncertainty may itself be incomplete: in some applications, the probabilities of uncertain data may be imprecise, coarse, or missing. This paper focuses on uncertain data with missing probabilities, specifically value-uncertain discrete objects without probabilities (uncertain objects, for short). Classification is one of the most important tasks in data mining, yet, to the best of our knowledge, there is no method for learning a Naïve Bayesian classifier from uncertain objects. The paper therefore studies Naïve Bayesian classification of uncertain objects. First, it defines interval probabilities of uncertain objects from the point of view of probabilistic cardinality, and bridges the gap between uncertain objects and the theory of interval probability by proving that these interval probabilities are F-probabilities. Second, based on the theory of interval probability, it defines conditional interval probabilities, including the intuitive concept and the canonical concept, as well as conditional independence under the intuitive concept, and gives a formula for computing the intuitive concept effectively. Third, it presents a Naïve Bayesian classifier with interval-probability parameters that can handle both uncertain and certain objects. Finally, experiments on uncertain objects based on UCI data show satisfactory performance.
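To make the idea of interval probabilities for objects with missing probabilities more concrete, here is a minimal sketch of one simple counting construction. It is an assumption for illustration, not necessarily the paper's exact definition. Each uncertain object is represented as the set of values it might take: the lower probability of an event counts only objects that are certainly inside it, and the upper probability counts objects that are possibly inside it. The color values are hypothetical.

```python
from fractions import Fraction


def interval_probability(objects, event):
    """Return (lower, upper) probability of `event` over value-uncertain objects.

    Assumed representation: each object is a non-empty set of values it might
    take, with no probabilities attached.
    lower: share of objects whose value set lies entirely inside `event`.
    upper: share of objects whose value set overlaps `event`.
    """
    n = len(objects)
    lower = Fraction(sum(1 for o in objects if o <= event), n)
    upper = Fraction(sum(1 for o in objects if o & event), n)
    return lower, upper


# Hypothetical objects over the value domain {"red", "green", "blue"}.
domain = {"red", "green", "blue"}
objects = [{"red"}, {"red", "green"}, {"blue"}, {"green", "blue"}]

print(interval_probability(objects, {"red"}))  # → (Fraction(1, 4), Fraction(1, 2))
```

For certain objects (single-value sets) the two bounds coincide and reduce to the ordinary relative frequency. Under this construction the bounds are also conjugate, `lower(A) == 1 - upper(domain - A)`, a property required of F-probabilities.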

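Klasifikasi Posting Twitter Kemacetan Lalu Lintas Kota Bandung Menggunakan Naive Bayesian Classification [Classifying Twitter Posts About Traffic Congestion in Bandung Using Naive Bayesian Classification]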

IJCCS (Indonesian Journal of Computing and Cybernetics Systems), Vol 7 (1), pp. 13, 2013. DOI: 10.22146/ijccs.3048. Cited by ~2.

Author(s): Sandi Fajar Rodiyansyah, Edi Winarko

Keyword(s): Data Mining; Support Vector Machine; Naive Bayes; High Accuracy; Bayesian Classification; Support Vector; Bayes Classifier; Traffic Jam; Naive Bayesian; Naïve Bayesian

Abstract: Every day, the Twitter servers receive a very large number of tweets, which can be mined for specific purposes, one of which is visualizing traffic congestion in a city. The naive Bayes classifier is an approach based on Bayes' theorem that combines prior knowledge with new knowledge; it is a simple classification algorithm that nonetheless achieves high accuracy. This research therefore evaluates the ability of a naive Bayes classifier to classify tweets containing information about traffic congestion in Bandung. In testing, the application achieved its lowest accuracy, 78%, with a sample of 100 records and its highest accuracy, 91.60%, with a sample of 13,106 records. Testing with the Rapid Miner 5.1 software gave a lowest accuracy of 72% with a sample of 100 records and a highest accuracy of 93.58% with a sample of 13,106 records for naive Bayesian classification, and a lowest accuracy of 92% with 100 records and a highest accuracy of 99.11% with 13,106 records for the support vector machine.

Keywords: Twitter, tweet, classification, naive bayesian classification, support vector machine
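The abstract describes naive Bayes as combining prior knowledge with new knowledge through Bayes' theorem. A minimal sketch of that idea for tweets follows, assuming a multinomial model over lowercase whitespace-separated words with add-one (Laplace) smoothing: the class prior plays the role of prior knowledge and the word likelihoods supply the new evidence. The tweets and the `traffic`/`other` labels are invented for illustration; the study's actual preprocessing and features may differ.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayesText:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # document counts, for the prior P(c)
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log posterior: log P(c) + sum of log P(w | c)."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = log(self.class_counts[label] / self.n_docs)
        for word in doc.lower().split():
            if word not in self.vocab:
                continue  # words never seen in training are ignored
            # Laplace smoothing keeps words unseen in this class from zeroing the score.
            score += log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical training tweets.
docs = [
    "heavy traffic jam on the main road",
    "traffic jam near the toll gate",
    "enjoying coffee this morning",
    "great concert tonight",
]
labels = ["traffic", "traffic", "other", "other"]

model = NaiveBayesText().fit(docs, labels)
print(model.predict("traffic jam on the road"))  # → traffic
```

Working in log space avoids floating-point underflow when a tweet contains many words, and the argmax is unchanged because the logarithm is monotonic.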

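Klasifikasi Posting Twitter Kemacetan Lalu Lintas Kota Bandung Menggunakan Naive Bayesian Classification [Classifying Twitter Posts About Traffic Congestion in Bandung Using Naive Bayesian Classification]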

IJCCS (Indonesian Journal of Computing and Cybernetics Systems), Vol 6 (1), 2013. DOI: 10.22146/ijccs.2144.

Author(s): Sandi Fajar Rodiyansyah, Edi Winarko

Keyword(s): Data Mining; Support Vector Machine; Naive Bayes; High Accuracy; Bayesian Classification; Support Vector; Bayes Classifier; Traffic Jam; Naive Bayesian; Naïve Bayesian
