Improving Naïve Bayes Text Classifier with Modified EM Algorithm

Movies are familiar to everyone, from children and adolescents to adults, whether they are watched out of interest, as a hobby, or to fill spare time. Movies could once be watched only in the cinema or on television months after release; as technology has developed, they have become increasingly easy to enjoy and can now be watched through paid television services and on smartphones. One website that viewers often use to review movies they have watched is IMDb. Review data can be used for opinion mining, that is, to determine whether the audience considers the reviewed movie good or not. One algorithm often used for this is Naïve Bayes: apart from being easy to implement, it is known to be fast and simple to use for predicting classes on a test dataset. The purpose of this study is to see how much the Expectation-Maximization algorithm increases the accuracy of Naïve Bayes in an opinion-mining case study of movie reviews. Using the Expectation-Maximization method, accuracy increased by 4% compared with using Naïve Bayes alone.
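
The abstract does not say how Expectation-Maximization is combined with Naïve Bayes. One common arrangement, sketched here as an assumption rather than as the authors' method, trains a multinomial Naïve Bayes model on labeled reviews and then alternates between soft-labeling unlabeled reviews (E-step) and refitting the model on all reviews (M-step). The function names, the smoothing constant `alpha`, and the toy reviews are hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: fit multinomial Naive Bayes from soft class weights.

    docs    -- list of token lists
    weights -- list of dicts mapping class -> responsibility in [0, 1]
    """
    prior = {c: alpha for c in classes}          # smoothed class counts
    counts = {c: Counter() for c in classes}     # weighted word counts
    for tokens, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for t in tokens:
                counts[c][t] += w[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_like = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_like

def posterior(tokens, model, classes):
    """E-step for one document: P(class | tokens) under the current model."""
    log_prior, log_like = model
    scores = {c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
              for c in classes}
    top = max(scores.values())                   # subtract max for stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def nb_em(labeled, unlabeled, classes, iterations=10):
    """Semi-supervised Naive Bayes: labeled docs keep hard labels throughout."""
    vocab = {t for tokens, _ in labeled for t in tokens} | {t for d in unlabeled for t in d}
    lab_docs = [tokens for tokens, _ in labeled]
    lab_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iterations):
        unl_w = [posterior(d, model, classes) for d in unlabeled]               # E-step
        model = train_nb(lab_docs + unlabeled, lab_w + unl_w, classes, vocab)   # M-step
    return model

# Hypothetical toy reviews, not data from the study.
classes = ["pos", "neg"]
labeled = [("great fun great".split(), "pos"), ("boring slow".split(), "neg")]
unlabeled = [s.split() for s in
             ["great acting superb", "slow plot dull", "superb fun", "dull boring"]]
model = nb_em(labeled, unlabeled, classes)
print(posterior(["superb"], model, classes))
```

In this toy setup the word "superb" never appears in a labeled review, so only the EM passes over the unlabeled reviews can associate it with a class.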

Quality Assessment of Affymetrix GeneChip Data using the EM Algorithm and a Naive Bayes Classifier

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering ◽

10.1109/bibe.2007.4375557 ◽

2007 ◽

Cited By ~ 1

Author(s):

Brian E. Howard ◽

Beate Sick ◽

Imara Perera ◽

Yang Ju Im ◽

Heike Winter-Sederoff ◽

...

Keyword(s):

EM Algorithm ◽

Quality Assessment ◽

Naive Bayes ◽

Naïve Bayes ◽

Affymetrix GeneChip ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

GeneChip Data ◽

The EM Algorithm

Penerapan Information Retrieval Menggunakan Pemodelan Topik Pada Deskripsi Portal Multimedia

Jurnal Nasional Komputasi dan Teknologi Informasi (JNKTI) ◽

10.32672/jnkti.v2i1.1057 ◽

2019 ◽

Vol 2 (1) ◽

pp. 48

Author(s):

Indra Gita Anugrah ◽

Harunur Rosyid

Keyword(s):

Information Retrieval ◽

EM Algorithm ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Probabilistic Latent Semantic Analysis

The rapid development of information technology has been accompanied by rapid growth in data. Data is very valuable information, but its ever-faster growth makes it difficult to manage. One use of data is information retrieval on multimedia video portals. The more multimedia videos are stored in a repository, the harder the search process becomes. When searching, users sometimes want correlations among the search results. Forming such correlations requires a topic model that links queries, words, and documents drawn from the multimedia video descriptions. One topic modeling method is the Probabilistic Latent Semantic Analysis (PLSA) model with the Expectation-Maximization (EM) algorithm. EM is an algorithm for estimating parameters; its first stage computes expected values (Expectation). Computing the expectation requires topics as initial parameters, whose values are then updated by the Maximization step. The initial parameters are formed using the Naive Bayes algorithm, which is used to predict future events from previous experience.
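
The EM updates for PLSA can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it initializes the parameters randomly instead of with Naive Bayes, and the function name `plsa`, the document-word count matrix, and the number of topics are hypothetical.

```python
import random

def plsa(counts, n_topics, iterations=50, seed=0):
    """Fit PLSA by EM.

    counts[d][w] is how often word w occurs in document d.
    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z | d) and p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row) or 1.0          # guard against an all-zero row
        return [x / s for x in row]

    # Random starting point (assumption; the abstract uses Naive Bayes here).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(iterations):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_zd[d][z] += r    # expected topic counts per document
                    new_wz[z][w] += r    # expected word counts per topic
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
    return p_z_d, p_w_z

# Hypothetical counts: 4 video descriptions over a 4-word vocabulary.
counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
for d, row in enumerate(p_z_d):
    print(d, [round(p, 3) for p in row])
```

A query can then be matched to documents by comparing topic mixtures rather than raw word overlap, which is the linking role the abstract assigns to the topic model.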

Accelerating the EM Algorithm through Selective Sampling for Naive Bayes Text Classifier

The KIPS Transactions PartD ◽

10.3745/kipstd.2006.13d.3.369 ◽

2006 ◽

Vol 13D (3) ◽

pp. 369-376

Author(s):

Jae-Young Chang ◽

Han-Joon Kim

Keyword(s):

EM Algorithm ◽

Naive Bayes ◽

Naïve Bayes ◽

Selective Sampling ◽

The EM Algorithm

Porous Media in the Simulation of Greenhouse Crops Using the Naïves Bayes EM Algorithm

JOURNAL OF ADVANCES IN AGRICULTURE ◽

10.24297/jaa.v10i0.8115 ◽

2019 ◽

Vol 10 ◽

pp. 1873-1885

Author(s):

Guillermo Alfonso De la Torre Gea

Keyword(s):

Porous Media ◽

EM Algorithm ◽

Natural Ventilation ◽

Naive Bayes ◽

Equations Of Motion ◽

Naïve Bayes ◽

Training Data ◽

Data Set ◽

Climate Conditions ◽

Porous Media Model

The porous media approach has become more popular because it solves the equations of motion and energy numerically and thereby yields detailed distributions of temperature and air speed. However, such models cannot simultaneously forecast, in probabilistic terms, the relationships between the porosity of the crop volume and the climate variables in naturally ventilated greenhouses. A porous media model of the crop and its approximations were developed and analyzed through unsupervised Bayesian network clustering, with the aim of determining the influence of porous media, as a function of crop density, on the climate conditions in a naturally ventilated greenhouse. In addition, an unsupervised naïve Bayes model was trained with the EM algorithm, initialized with random parameters. The resulting model maximized the likelihood of the training data set. The relationships between the pressure drops at the flow limits of the crop were established. Porosity is directly influenced by humidity and temperature and, to a lesser extent, by CO2 concentration. Solar radiation, air speed and, to a lesser extent, height are inversely related to porosity. Applying naïve Bayes EM to a CFD model has provided a greater understanding of the interactions between the variables.
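
To illustrate what an unsupervised naïve Bayes model fitted by EM from random parameters looks like, the sketch below clusters discrete records and tracks the training log-likelihood, which EM never decreases. It is an assumption-laden toy: the function name `em_naive_bayes`, the binary coding of the variables, and the sample rows are hypothetical and are not taken from the paper.

```python
import math
import random

def em_naive_bayes(rows, n_clusters, n_values, iterations=30, seed=0):
    """Fit a latent-class naive Bayes model to discrete rows by EM.

    rows     -- list of tuples of integer codes, rows[i][j] in range(n_values[j])
    n_values -- number of categories of each feature
    Returns (prior, cond, history), where cond[c][j][v] = P(x_j = v | cluster c)
    and history holds the log-likelihood before each M-step.
    """
    rng = random.Random(seed)

    def rand_dist(k):
        raw = [rng.random() + 0.1 for _ in range(k)]
        s = sum(raw)
        return [x / s for x in raw]

    prior = rand_dist(n_clusters)                                   # random start
    cond = [[rand_dist(k) for k in n_values] for _ in range(n_clusters)]
    history = []
    for _ in range(iterations):
        # E-step: responsibility of each cluster for each row.
        resp, log_lik = [], 0.0
        for row in rows:
            joint = []
            for c in range(n_clusters):
                p = prior[c]
                for j, v in enumerate(row):
                    p *= cond[c][j][v]      # features independent given cluster
                joint.append(p)
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(log_lik)
        # M-step: maximum-likelihood re-estimation from the soft counts.
        for c in range(n_clusters):
            weight = sum(r[c] for r in resp)
            prior[c] = weight / len(rows)
            for j, k in enumerate(n_values):
                counts = [0.0] * k
                for row, r in zip(rows, resp):
                    counts[row[j]] += r[c]
                cond[c][j] = [x / (weight or 1.0) for x in counts]
    return prior, cond, history

# Hypothetical records: (humidity, temperature, porosity), each coded 0 = low, 1 = high.
rows = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 1),
        (1, 1, 1), (1, 0, 1), (0, 0, 0), (1, 1, 1)]
prior, cond, history = em_naive_bayes(rows, n_clusters=2, n_values=[2, 2, 2])
print(round(history[0], 3), round(history[-1], 3))
```

Because the result depends on the random start, practical use would typically repeat the fit with several seeds and keep the run with the highest final log-likelihood.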

Training a naive bayes classifier via the EM algorithm with a class distribution constraint

10.3115/1119176.1119193 ◽

2003 ◽

Cited By ~ 15

Author(s):

Yoshimasa Tsuruoka ◽

Jun'ichi Tsujii

Keyword(s):

EM Algorithm ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Class Distribution ◽

The EM Algorithm

Handling missing data in software effort prediction with naive Bayes and EM algorithm

Proceedings of the 7th International Conference on Predictive Models in Software Engineering - Promise '11 ◽

10.1145/2020390.2020394 ◽

2011 ◽

Cited By ~ 18

Author(s):

Wen Zhang ◽

Ye Yang ◽

Qing Wang

Keyword(s):

Missing Data ◽

EM Algorithm ◽

Naive Bayes ◽

Naïve Bayes ◽

Effort Prediction

Study of Sentiment of Governor's Election Opinion in 2018

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset21841124 ◽

2018 ◽

pp. 231-238

Author(s):

Agung Eddy Suryo Saputro ◽

Khairil Anwar Notodiputro ◽

Indahwati A

Keyword(s):

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Addition Method ◽

Sentiment Mining ◽

Positive Sentiment ◽

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The large number of messages received makes it difficult for people to classify them manually as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.

Klasifikasi Tahap Kematangan Pisang Ambon Berdasarkan Warna Menggunakan Naive Bayes

PIKSEL : Penelitian Ilmu Komputer Sistem Embedded and Logic ◽

10.33558/piksel.v5i2.268 ◽

2018 ◽

Vol 5 (2) ◽

pp. 60-67 ◽

Cited By ~ 1

Author(s):

Dwi Yulianto ◽

Retno Nugroho Whidhiasih ◽

Maimunah Maimunah

Keyword(s):

Naive Bayes ◽

Fruit Production ◽

Naïve Bayes ◽

Primary Data ◽

Banana Fruit ◽

Bayes Method ◽

Classification Image ◽

Average Accuracy ◽

The Government

Banana fruit is a commodity that contributes substantially to both national and international fruit production. The government, through the National Standardization Agency, establishes standards to maintain the quality of bananas. The purpose of this study is to classify the stages of maturity of Ambon bananas based on the color index using the Naïve Bayes method, in accordance with SNI 7422:2009. Naive Bayes is used in the classification process by comparing the probability values generated from the predictor variable values of each model to determine the maturity stage of an Ambon banana. The data used are primary image data of 105 Ambon bananas. Of the 3 models, which consist of different predictor variables, the greatest average accuracy, 90.48%, was obtained equally by the 2nd model, which has 9 variables (r, g, b, v, *a, *b, entropy, energy, and homogeneity), and the 3rd model, which has 7 variables (r, g, b, v, *a, entropy, and homogeneity). Keywords: banana maturity, classification, image processing
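
The idea of classifying by comparing per-class probability values can be sketched with a Gaussian Naive Bayes model over color features. The abstract does not state which likelihood model was used, so the Gaussian assumption, the function names, the two maturity labels, and the (r, g, b) values below are all hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels, var_floor=1e-6):
    """Estimate a class prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        columns = list(zip(*xs))
        means = [sum(col) / len(xs) for col in columns]
        variances = [max(sum((v - m) ** 2 for v in col) / len(xs), var_floor)
                     for col, m in zip(columns, means)]
        model[y] = (math.log(len(xs) / len(samples)), means, variances)
    return model

def log_score(params, x):
    """Log prior plus the sum of per-feature Gaussian log densities."""
    log_prior, means, variances = params
    return log_prior + sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(x, means, variances))

def predict(model, x):
    """Return the class whose (unnormalized) log posterior is largest."""
    return max(model, key=lambda y: log_score(model[y], x))

# Hypothetical mean (r, g, b) values per banana image; not data from the study.
samples = [(80, 160, 60), (90, 170, 70), (85, 150, 65),
           (220, 200, 60), (230, 210, 80), (210, 190, 70)]
labels = ["unripe"] * 3 + ["ripe"] * 3
model = fit_gaussian_nb(samples, labels)
print(predict(model, (88, 165, 66)), predict(model, (225, 205, 75)))
```

Adding texture variables such as entropy or homogeneity would only lengthen each feature tuple; the comparison of class scores stays the same.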
