Attribute Selection in Naive Bayes Algorithm Using Genetic Algorithms and Bagging for Prediction of Liver Disease

Liver disease is an inflammatory disease of the liver that can impair normal liver function and even cause death. According to WHO (World Health Organization) data, almost 1.2 million people per year, especially in Southeast Asia and Africa, die from liver disease. A common problem is that liver disease is difficult to recognize early, and it often goes unrecognized even after it has spread. This study compares and evaluates the Naive Bayes algorithm on its own against Naive Bayes with attribute selection based on a Genetic Algorithm (GA) and Bagging, to find out which has the higher accuracy in predicting liver disease on a dataset taken from the UCI (University of California, Irvine) Machine Learning Repository. Evaluation with both the confusion matrix and the ROC curve showed that Naive Bayes optimized with the Genetic Algorithm and Bagging achieves higher accuracy than Naive Bayes alone. The accuracy of the Naive Bayes model is 66.66%, and the accuracy of the Naive Bayes model with attribute selection using the Genetic Algorithm and Bagging is 72.02%, a difference of 5.36 percentage points. Keywords: Liver Disease, Naïve Bayes, Genetic Algorithms, Bagging.
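
The attribute-selection step can be pictured as a search over 0/1 masks, one bit per attribute. The sketch below is a minimal, generic genetic algorithm and not the authors' implementation: the function name `genetic_feature_selection`, its parameters, and the toy fitness function are all assumptions made for illustration. In a study like this one, the fitness would be the validation accuracy of Naive Bayes trained on the selected attributes. Bagging is not shown.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Search for a 0/1 attribute mask that maximizes fitness(mask).

    Assumes n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    # Start from random masks: 1 keeps an attribute, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        def pick():
            # Tournament selection: the fitter of two random individuals.
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation on each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

# Hypothetical fitness for demonstration only: attributes 0, 2 and 5 are
# treated as informative, every other selected attribute costs 0.5.
USEFUL = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in USEFUL)
    noise = sum(1 for i, bit in enumerate(mask) if bit and i not in USEFUL)
    return hits - 0.5 * noise

best_mask = genetic_feature_selection(8, toy_fitness)
```

Because the best mask is copied into every new generation, its fitness never decreases from one generation to the next.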

Analisis Klasifikasi Kanker Payudara Menggunakan Algoritma Naive Bayes [Breast Cancer Classification Analysis Using the Naive Bayes Algorithm]

INFORMAL: Informatics Journal ◽

10.19184/isj.v4i3.14170 ◽

2020 ◽

Vol 4 (3) ◽

pp. 117

Author(s):

Hardian Oktavianto ◽

Rahman Puji Handri

Keyword(s):

Breast Cancer ◽

Naive Bayes ◽

Naïve Bayes ◽

World Health ◽

Average Percentage ◽

Average Value ◽

Treatment Measures ◽

Bayes Algorithm ◽

Health Organization

Breast cancer is one of the leading causes of death among women, ranking second after lung cancer. According to the World Health Organization, 1 million women are diagnosed with breast cancer every year and half of them die. In general this is because detection and treatment come late, so that cancers are only discovered after reaching the final stage. In health and medicine, machine learning-based classification has been used to help doctors and health professionals classify types of cancer and determine which treatment measures should be taken. In this study, breast cancer classification is carried out using the Naive Bayes algorithm to group the types of cancer. The dataset used is the Wisconsin breast cancer database. The results show that the Naive Bayes algorithm classifies breast cancer well: the average percentage of correctly classified data reaches 96.9%, and the average percentage of incorrectly classified data is only 3.1%. The effectiveness of classification with Naive Bayes is also high, with average precision and recall of around 0.96. The highest precision and recall values occur with a percentage split of 40%, reaching 0.974 and 0.973 respectively.
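
The percentages of correctly and incorrectly classified data, together with precision and recall, all derive from the four cells of a binary confusion matrix. A minimal sketch, assuming hypothetical counts that are not taken from the paper:

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                       # share classified correctly
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # found among actual positives
    return accuracy, precision, recall

# Hypothetical counts for illustration only.
acc, prec, rec = classification_metrics(tp=90, fp=5, fn=10, tn=95)
error_rate = 1 - acc  # share classified incorrectly
```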

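Analisis Sentimen Opini Terhadap Vaksin Covid - 19 pada Media Sosial Twitter Menggunakan Support Vector Machine dan Naive Bayes [Sentiment Analysis of Opinions on the Covid-19 Vaccine on the Social Media Platform Twitter Using Support Vector Machine and Naive Bayes]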

Jurnal Komtika ◽

10.31603/komtika.v5i1.5185 ◽

2021 ◽

Vol 5 (1) ◽

pp. 19-25

Author(s):

Frizka Fitriana ◽

Ema Utami ◽

Hanif Al Fatta

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

World Health ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

Bayes Algorithm ◽

Health Organization ◽

The Right ◽

The Impact

The coronavirus outbreak, commonly referred to as COVID-19, has been officially designated a global pandemic by the World Health Organization (WHO). One appropriate step to minimize the impact of the virus is to develop a vaccine. However, vaccination of the Indonesian population is controversial and has prompted many people to voice their opinions. Because there is limited space for the public to express them, people turn to social media as a channel for public opinion. The Support Vector Machine algorithm performs better in terms of accuracy, precision, and recall, with values of 90.47%, 90.23%, and 90.78%, compared with 88.64%, 87.32%, and 88.13% for the Naive Bayes algorithm, a difference of 1.83% in accuracy, 2.91% in precision, and 2.65% in recall. In terms of running time, Naive Bayes performs better at 8.1 seconds versus 11 seconds for the Support Vector Machine, a difference of 2.9 seconds. The sentiment analysis results are 8.76% neutral, 42.92% negative, and 48.32% positive for Naive Bayes, and 10.56% neutral, 41.28% negative, and 48.16% positive for SVM.
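
For readers unfamiliar with how Naive Bayes scores a piece of text, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the authors' pipeline: the class name `TinyMultinomialNB`, the whitespace tokenizer, and the four training sentences are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of log likelihoods (words treated as independent)
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in doc.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data, not from the study.
train_docs = ["vaccine is safe and effective", "great news about the vaccine",
              "vaccine causes bad side effects", "I fear the vaccine is dangerous"]
train_labels = ["positive", "positive", "negative", "negative"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
label = model.predict("safe and effective vaccine")
```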

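SENTIMEN ANALISIS KEBIJAKAN GANJIL GENAP DI TOL BEKASI MENGGUNAKAN ALGORITMA NAIVE BAYES DENGAN OPTIMALISASI INFORMATION GAIN [Sentiment Analysis of the Odd-Even Policy on the Bekasi Toll Road Using the Naive Bayes Algorithm with Information Gain Optimization]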

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.705 ◽

2019 ◽

Vol 15 (2) ◽

pp. 247-254

Author(s):

Heru Sukma Utama ◽

Didi Rosiyadi ◽

Dedi Aridarma ◽

Bobby Suryo Prakoso

Keyword(s):

Social Media ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

Toll Road ◽

Textual Data ◽

Bayes Algorithm

Sentiment analysis of the odd-even policy on the Bekasi toll road using the Naïve Bayes algorithm is a process of automatically understanding, extracting, and processing textual data from social media. The purpose of this study was to determine the accuracy, recall, and precision of opinion mining with the Naïve Bayes algorithm, in order to provide information on community sentiment on social media towards the effectiveness of the odd-even system on the Bekasi toll road. The research method was text mining of comments on posts about the odd-even policy on the Bekasi toll road on Twitter, Instagram, YouTube, and Facebook. The steps are preprocessing, transformation, data mining, and evaluation, followed by Information Gain feature selection, select by weight, and application of the NB model. The confusion matrix results obtained with the NB model are an accuracy of 79.55%, a precision of 80.51%, and a sensitivity (recall) of 80.91%. The study concludes that the use of Support Vector Machine algorithms can analyze odd-even sentiment on the Bekasi toll road.
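
Information Gain ranks a term by how much knowing whether it occurs reduces the entropy of the sentiment label. The abstract does not give the exact formulation or tooling used, so the following is a minimal sketch of the textbook definition for a binary "term present / absent" feature. The function names and the four-comment toy example are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(has_term, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the term."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(has_term, labels) if x == value]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical toy example: four comments, two positive and two negative.
labels = ["pos", "pos", "neg", "neg"]
perfect_term = [True, True, False, False]  # occurs only in positive comments
useless_term = [True, False, True, False]  # occurs equally in both classes
gains = (information_gain(perfect_term, labels),
         information_gain(useless_term, labels))
```

Terms would then be sorted by gain and only the highest-weighted ones passed to the classifier.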

Prediction of Solid Garbage Waste Generation in Smart Cities using Naive Bayes Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1031.1292s19 ◽

2019 ◽

Vol 9 (2S) ◽

pp. 53-56

Keyword(s):

Naive Bayes ◽

Learning Algorithm ◽

Smart Cities ◽

Confusion Matrix ◽

Daily Basis ◽

Naïve Bayes ◽

Human Beings ◽

Waste Generation ◽

Future Prediction ◽

Bayes Algorithm

Smart cities are becoming overcrowded, making life harder for their residents and exposing them to more challenges on a daily basis. Overcrowding leads to vast generation of waste, which contributes to air pollution and in turn affects health, causing various diseases. Even though various measures are taken to recycle waste, the rate at which it is produced keeps rising. This paper deals with the prediction of waste generation using the Naïve Bayes machine learning algorithm (classifier), based on statistics from previous waste datasets. The datasets used for prediction are obtained from reliable sources. The algorithm is implemented in PySpark using Anaconda Jupyter. The performance of the classifier on the datasets is analyzed with a confusion matrix, and the accuracy metric is used to rate the efficiency of the classifier. The accuracy obtained indicates that the algorithm can be used effectively for real-time prediction and that, based on the independence assumption, it gives more accurate results for large input datasets.

Brain tumor prediction using naïve Bayes’ classifier and decision tree algorithms

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.7.10634 ◽

2018 ◽

Vol 7 (1.7) ◽

pp. 137 ◽

Cited By ~ 1

Author(s):

Danda Shashank Reddy ◽

Chinta Naga Harshitha ◽

Carmel Mary Belinda

Keyword(s):

Brain Tumor ◽

Decision Tree ◽

Naive Bayes ◽

Classification Problem ◽

Naïve Bayes ◽

World Health ◽

Decision Tree Algorithm ◽

Tree Algorithm ◽

Computer Tomography Scan ◽

Bayes Algorithm

Nowadays many advanced techniques are available for diagnosing brain tumors, such as magnetic resonance imaging, computed tomography scans, angiograms, spinal taps, and biopsies. Based on the diagnosis it is easy to predict treatment. All types of brain tumor have been officially reclassified by the World Health Organization. There are 120 types of brain tumor, almost all of which share the same symptoms, which makes it difficult to predict treatment. To address this, we propose Naïve Bayes classification and the decision tree algorithm as more accurate and efficient algorithms for predicting the type of brain tumor. The main focus is on solving the tumor classification problem using these algorithms. The main goal is to show that prediction with the decision tree algorithm is simpler and easier than with the Naïve Bayes algorithm.

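KLASIFIKASI TEKS MENGGUNAKAN CHI SQUARE FEATURE SELECTION UNTUK MENENTUKAN KOMIK BERDASARKAN PERIODE, MATERI DAN FISIK DENGAN ALGORITMA NAIVE BAYES [Text Classification Using Chi Square Feature Selection to Identify Comics Based on Period, Material, and Physical Form with the Naive Bayes Algorithm]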

Compiler ◽

10.28989/compiler.v5i2.171 ◽

2016 ◽

Vol 5 (2) ◽

Author(s):

Siti Anisah ◽

Anton Setiawan Honggowibowo ◽

Asih Pujiastuti

Keyword(s):

Feature Selection ◽

Error Rate ◽

Classification System ◽

Naive Bayes ◽

Naïve Bayes ◽

Chi Square ◽

Oracle Database ◽

Category O ◽

The Difference ◽

Bayes Algorithm

A comic has its own characteristics compared to other types of books. The difference between comics and other books can be seen in the categories of period, material, and physical form. A classification system application was needed to distinguish comics from other books. To address this, a classification system was built using Chi Square Feature Selection and the Naive Bayes algorithm to identify comics based on period, material, and physical form. The Delphi programming language and Oracle Database were used to build the classification system. Chi Square Feature Selection yielded a feature value of 0.10347 for comics and 1.9531 for non-comics. The data is then classified by the Naive Bayes algorithm. Of 120 titles of comics and non-comics, 60 were used to build the classes for training and 60 were used for testing. The Naive Bayes algorithm achieved 96.67% for comics, with a 3.33% error rate, and 90% for non-comics, with a 10% error rate. The classification performs well at identifying comics.
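
The abstract does not say how its chi-square values were computed. As a minimal sketch, assuming the standard 2x2 contingency form used in text feature selection, the statistic for one term and one class can be calculated as below. The function name and argument names are hypothetical, and the code does not attempt to reproduce the 0.10347 and 1.9531 values reported above.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so the term carries no signal
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts: the term appears in 40 of 60 comic titles and in
# 10 of 60 non-comic titles.
score = chi_square(a=40, b=10, c=20, d=50)
```

A term that is independent of the class scores 0, and larger scores indicate a stronger association in either direction.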

The Use of Naive Bayes for Broiler Digestive Tract Disease Detection

Journal on Information Technology and Computer Engineering ◽

10.25077/jitce.3.01.1-7.2019 ◽

2019 ◽

Vol 3 (01) ◽

pp. 1-7

Author(s):

Hindriyanto Dwi Purnomo

Keyword(s):

Evaluation Method ◽

Naive Bayes ◽

Confusion Matrix ◽

Gastrointestinal Diseases ◽

Naïve Bayes ◽

Common Disease ◽

High Productivity ◽

Bayes Algorithm ◽

Tract Disease

The broiler is a type of chicken with high productivity. To obtain good-quality chickens, the breeding factors must be managed well so that the chickens are not easily infected by disease. Gastrointestinal diseases are common among chickens, and the mortality they cause is considered high. This study addresses the problem by developing a system based on the Naive Bayes algorithm. Sixty chicken data samples were used, and the results show that Naive Bayes might be used to detect gastrointestinal diseases in chickens with an accuracy of 93.3%. This figure was confirmed using the confusion matrix evaluation method, and gave the same level of accuracy as the expert judgments.

Sentiment Analysis Using Naive Bayes Algorithm with Feature Selection Particle Swarm Optimization (PSO) and Genetic Algorithm

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i2.1224 ◽

2021 ◽

Vol 2 (2) ◽

Author(s):

Abi Rafdi ◽

Herman Mawengkang Herman ◽

Syahril Efendi

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Naive Bayes ◽

Confusion Matrix ◽

Particle Swarm ◽

Naïve Bayes ◽

Swarm Optimization ◽

Bayes Algorithm

This study analyzes sentiment to examine opinions, points of view, judgments, attitudes, and emotions towards entities and their aspects as expressed in text. Social media such as Twitter is among the most widely used means of communication and a frequent research topic. The main problem in sentiment analysis is selecting and using the best features to obtain maximum results. The most widely known classification method is Naive Bayes. However, Naive Bayes is highly sensitive to which features are used. In this test, feature selection using Particle Swarm Optimization and the Genetic Algorithm is therefore compared to improve the accuracy of the Naive Bayes algorithm. The analysis compares results before and after feature selection. Validation uses cross-validation, while the confusion matrix is applied to measure accuracy. The results show the largest increase in Naïve Bayes accuracy with Particle Swarm Optimization feature selection, from 60.26% to 77.50%, while the Genetic Algorithm raised it from 60.26% to 70.71%. The best feature selection method is therefore Particle Swarm Optimization, with an increase in accuracy of 17.24 percentage points.
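
The abstract says validation uses cross-validation but does not give the number of folds or how they are built. As a minimal sketch, assuming contiguous, unshuffled folds, k-fold index splitting can be written as follows. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

# Each sample is used for testing exactly once across the k folds.
folds = list(k_fold_indices(10, 3))
```

A classifier would be trained on `train_idx` and scored on `test_idx` for each fold, and the per-fold accuracies averaged to compare the model before and after feature selection.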

Sentimen Analisis Terkait Lockdown pada Sosial Media Twitter [Sentiment Analysis Related to Lockdown on the Social Media Platform Twitter]

Indonesian Journal on Software Engineering (IJSE) ◽

10.31294/ijse.v6i2.8991 ◽

2020 ◽

Vol 6 (2) ◽

pp. 223-229

Author(s):

Muhammad Dwison Alizah ◽

Arifin Nugroho ◽

Ummu Radiyah ◽

Windu Gata

Keyword(s):

Support Vector Machine ◽

World Health Organization ◽

Predictive Modeling ◽

Naive Bayes ◽

Naïve Bayes ◽

World Health ◽

Support Vector ◽

The World ◽

Negative Impacts ◽

Health Organization

Abstract: Covid-19 has been declared a pandemic by the World Health Organization (WHO). Its very large impact and fairly rapid spread are the reasons Covid-19 was declared a pandemic and why countermeasures are needed. One measure that can be taken is a lockdown. The decision to impose a lockdown is intended to reduce the spread of the disease. A lockdown is certainly not a solution that is 100% good for all parties. Some agree that a lockdown should be implemented, while others think it is better not to, considering the negative impacts that can occur. This study therefore presents predictive modeling for sentiment analysis related to "lockdown", specifically on the social media platform Twitter. Tweets were labeled using VADER, features were extracted using TF-IDF, and sentiment prediction models were built using Naïve Bayes and Support Vector Machine. The evaluation results obtained from both algorithms exceed 80%. Keywords: Covid-19, lockdown, TF-IDF, Naïve Bayes, Support Vector Machine
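
TF-IDF turns each tweet into a weighted bag of words in which terms common to every tweet count for little. The abstract does not specify which TF-IDF variant was used, so the following is a minimal sketch of the plain, unsmoothed textbook form (term frequency times the log of inverse document frequency). The function name and the two example tweets are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({term: (count / len(tokens)) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors

# Hypothetical tweets: "lockdown" appears in both, so its weight is zero.
vectors = tf_idf(["lockdown now", "lockdown hurts business"])
```

Library implementations often add smoothing and normalization, so their weights will differ from this plain version.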

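Optimasi Naïve Bayes Dan Algoritma Genetika Untuk Prediksi Penerimaan Beasiswa Pendidikan Pada SMP Utama [Optimization of Naïve Bayes with a Genetic Algorithm for Predicting Educational Scholarship Recipients at SMP Utama]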

Jurnal Teknik Komputer ◽

10.31294/jtk.v5i2.5343 ◽

2019 ◽

Vol 5 (2) ◽

pp. 189-196

Author(s):

Nining Suryani ◽

Evy Priyanti

Keyword(s):

Genetic Algorithms ◽

Naive Bayes ◽

Optimal Level ◽

Naïve Bayes ◽

Drop Out ◽

The Road ◽

Parent Status ◽

Pocket Money ◽

Scholarship Recipients ◽

Bayes Algorithm

Educational scholarships are one way to help students obtain a better education. Many students drop out partway through or cannot continue their education at the same or a higher level. Selecting recipients according to the scholarship criteria is important so that scholarships reach the right targets. At SMP Utama in Depok, educational scholarships are provided by the school based on 9 criteria for recipients, namely parent status, parents' occupation, rented house, home appliances, vehicles, parents' savings, parents' jewelry, cellphones, and pocket money. Given the number of prospective recipients, an algorithm is needed to accurately predict which students are entitled to scholarships. The Naïve Bayes algorithm achieves an accuracy of 77.50% in predicting scholarship recipients based on these criteria. A genetic algorithm is then used to obtain a more optimal level of accuracy, as evidenced by an accuracy of 83.33%.
