Top-k Feature Selection Untuk Deteksi Penyakit Hepatitis Menggunakan Algoritme Naïve Bayes

Abstract. Hepatitis, an inflammatory liver disease caused by viruses, bacterial infections, or chemical substances including drugs and alcohol, is a public health problem worldwide. In this study, the weight of each attribute in a high-dimensional hepatitis dataset was calculated using the weight information gain method. Attributes were then selected with the top-k method and classified with the Naïve Bayes algorithm. Of the 20 attributes, the 9 with the highest weights (top-9) were selected, giving an accuracy of 85.57%. The results can inform decisions in fields that use feature selection and the Naïve Bayes algorithm, and in predicting hepatitis.

Keywords: data mining, weight information gain, Naïve Bayes algorithm
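The weighting and top-k selection steps described in the abstract can be illustrated with a minimal sketch. It assumes categorical attributes and uses plain information gain as the weight; the attribute names, rows, and labels are hypothetical toy data, not the hepatitis dataset, and the paper's exact weighting procedure may differ.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def top_k_attributes(rows, labels, names, k):
    """Rank attributes by information gain and keep the k highest-weighted ones."""
    weights = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(weights, key=weights.get, reverse=True)[:k]

# Hypothetical toy data: three categorical attributes, two classes.
names = ["fatigue", "ascites", "sex"]
rows = [
    ("yes", "yes", "m"),
    ("yes", "no", "f"),
    ("no", "no", "m"),
    ("no", "no", "f"),
]
labels = ["die", "live", "live", "live"]
selected = top_k_attributes(rows, labels, names, k=2)
```

In this toy data the attribute that separates the classes perfectly receives the largest weight, so it is ranked first; the selected attributes would then be passed to the classifier.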

Opinion Mining on Culinary Food Customer Satisfaction Using Naïve Bayes Based-on Hybrid Feature Selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i1.pp468-475 ◽

2019 ◽

Vol 15 (1) ◽

pp. 468-475 ◽

Cited By ~ 3

Author(s):

Oman Somantri ◽

Dyah Apriliani

Keyword(s):

Feature Selection ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Classification Model ◽

Consumer Ratings ◽

Bayes Algorithm ◽

Restaurant Owners

Assessing consumer sentiment about culinary food from social media yields useful information for anyone who wants it, especially migrants and tourists; it is also valuable to food stall and restaurant owners as input for improving food quality. To obtain this information, a sentiment analysis classification model using the Naïve Bayes (NB) algorithm was applied. The problem is that the accuracy of classifying consumer ratings of culinary food is still not optimal, because the attribute weights produced during data preprocessing are not optimal. This paper proposes a hybrid feature selection model that combines information gain (IG) and a genetic algorithm (GA) to address the suboptimal selection of feature attributes. In experiments comparing it with other algorithms, the proposed model achieved the best accuracy, 93%.

IMPLEMENTASI KLASIFIKASI NAIVE BAYES DALAM MEMPREDIKSI LAMA STUDI MAHASISWA (STUDI KASUS : UNIVERSITAS DHYANA PURA)

SINTECH (Science and Information Technology) Journal ◽

10.31598/sintechjournal.v4i2.964 ◽

2021 ◽

Vol 4 (2) ◽

pp. 202-209

Author(s):

Kelvin Hennry Loudry Malelak ◽

I Made Dwi Ardiada ◽

Gerson Feoh

Keyword(s):

Data Mining ◽

Undergraduate Students ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Accuracy Rate ◽

Testing Data ◽

Study Programs ◽

Bayes Algorithm ◽

Academic Year

Under normal conditions, undergraduate students at a university complete their studies in 4 years, or 8 semesters. In practice, many students take longer than 4 years. In the 2015/2016 academic year, 744 people were accepted as students. Of these, 405 completed their studies in about 4 years, 39 completed them in 5 years, and 300 did not continue their studies. To address this problem, this study implements a classification that can help Dhyana Pura University predict the length of study of students currently enrolled in its various study programs. The classification method used to predict the length of study is the Naive Bayes algorithm, applied to graduation data with the Java-based RapidMiner tool. With the data divided into 968 training records and 193 testing records, Naive Bayes obtained an accuracy rate of 100%, and the other evaluation parameters were also very good.
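As a rough illustration of the train-then-test workflow described above, the following is a minimal sketch of a Naive Bayes classifier for categorical attributes with add-one (Laplace) smoothing. The class name, attribute values, and records are hypothetical; the study itself used RapidMiner, so this is not its implementation.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # value_counts[(attribute index, class)] counts attribute values per class.
        self.value_counts = defaultdict(Counter)
        # values[attribute index] is the set of distinct values seen in training.
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per attribute.
            score = math.log(count / self.total)
            for i, value in enumerate(row):
                seen = self.value_counts[(i, label)][value]
                score += math.log((seen + 1) / (count + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training records: (GPA band, credit load) -> length of study.
train_rows = [
    ("high", "full"), ("high", "full"), ("high", "partial"),
    ("low", "partial"), ("low", "partial"), ("low", "full"),
]
train_labels = ["on_time", "on_time", "on_time", "late", "late", "late"]

# Hypothetical held-out testing records.
test_rows = [("high", "full"), ("low", "partial")]
test_labels = ["on_time", "late"]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)
predictions = [model.predict(row) for row in test_rows]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
```

Accuracy here is simply the share of testing records whose predicted class matches the recorded class, which is how an accuracy rate over a held-out test set is usually computed.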

Penerapan Data Mining untuk Memprediksi Minat Nasabah Terhadap Produk Asuransi Meninggal Dunia dengan Metode Naïve Bayes (Studi Kasus : PT. BNI Life Insurance)

Respati ◽

10.35842/jtir.v16i2.406 ◽

2021 ◽

Vol 16 (2) ◽

pp. 103

Author(s):

Ari Hidayatullah ◽

Ena Mudiawati ◽

Muhammad Syafrullah

Keyword(s):

Data Mining ◽

Life Insurance ◽

Naive Bayes ◽

Naïve Bayes ◽

Insurance Companies ◽

Insurance Company ◽

Accuracy Rate ◽

Customer Data ◽

Bayes Algorithm ◽

Business Behavior

ABSTRACT. Income for insurance companies is determined by the amount of premium paid by customers, and premiums are set as a certain percentage or rate. An insurance company holds a large amount of data, and this data is very important for identifying the criteria of customers who are interested in the insurance being marketed. With the information in existing customer data, the company can make decisions in implementing its strategy, including selling promotional products to increase revenue. Data mining is a technology that can help companies discover important information in a collection of data. It can form patterns or characterize business behavior in ways that are useful for decision making. Using the Naive Bayes algorithm, this study is expected to help companies manage customer data by classifying it to predict customer interest in choosing a death insurance product, with an accuracy rate exceeding 80%.

Keywords: insurance, Naïve Bayes, prediction, data mining

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's strained finances are factors in disapproval of the relocation. Twitter, one of the popular social media platforms, is used by the public to express these opinions. The goal of this study is to determine the tendency of public responses to the relocation of the national capital and to perform sentiment analysis of public opinion using Naive Bayes and Support Vector Machine with feature selection, in order to obtain the highest accuracy. The data are Indonesian-language tweets expressing public opinion, collected from Twitter by crawling with the search terms #IbuKotaBaru and #PindahIbuKota. The research stages consisted of collecting data from Twitter, assigning polarity, and preprocessing, which comprised case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection was applied to increase accuracy, and the data were then divided into testing and training sets according to predetermined ratios. Next, the Support Vector Machine and Naive Bayes methods were compared to determine which is more accurate. In the data collected, 24.26% of the sentiment was positive and 75.74% negative toward the move to a new capital city. Using RapidMiner, the best accuracy for Naive Bayes with feature selection was 88.24% at a ratio of 9:1, while the best accuracy for Support Vector Machine with feature selection was 78.77% at a ratio of 5:5.

Perbandingan Optimasi Feature Selection pada Naïve Bayes untuk Klasifikasi Kepuasan Airline Passenger

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i3.3086 ◽

2021 ◽

Vol 5 (3) ◽

pp. 527-533

Author(s):

Yoga Religia ◽

Amali Amali

Keyword(s):

Feature Selection ◽

Customer Satisfaction ◽

Naive Bayes ◽

Naïve Bayes ◽

Point Of View ◽

Classification Model ◽

Passenger Satisfaction ◽

Airline Passenger ◽

Bayes Algorithm

The quality of an airline's services cannot be measured from the company's point of view, but must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its independence assumption is rarely discussed. Some literature suggests attribute weighting to relax the independence assumption, which can be done through feature selection using particle swarm optimization (PSO) or a genetic algorithm (GA). This study compared PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction data taken from www.kaggle.com. In testing, the best performance was obtained by the Naïve Bayes model with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.

SENTIMEN ANALISIS KEBIJAKAN GANJIL GENAP DI TOL BEKASI MENGGUNAKAN ALGORITMA NAIVE BAYES DENGAN OPTIMALISASI INFORMATION GAIN

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.705 ◽

2019 ◽

Vol 15 (2) ◽

pp. 247-254

Author(s):

Heru Sukma Utama ◽

Didi Rosiyadi ◽

Dedi Aridarma ◽

Bobby Suryo Prakoso

Keyword(s):

Social Media ◽

Opinion Mining ◽

Naive Bayes ◽

Information Gain ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

Toll Road ◽

Textual Data ◽

Bayes Algorithm

Sentiment analysis of the odd-even system on the Bekasi toll road using the Naïve Bayes algorithm is a process of automatically understanding, extracting, and processing textual data from social media. The purpose of this study was to determine the accuracy, recall, and precision of opinion mining with the Naïve Bayes algorithm in order to provide information on community sentiment toward the effectiveness of the odd-even system on the Bekasi toll road. The research method was text mining of comments on posts about the odd-even system on the Bekasi toll road on Twitter, Instagram, YouTube, and Facebook. The steps were preprocessing, transformation, data mining, and evaluation, followed by information gain feature selection, select by weight, and application of the NB model. The confusion matrix results for the NB model were an accuracy of 79.55%, a precision of 80.51%, and a sensitivity (recall) of 80.91%. The study concludes that the Support Vector Machine algorithm can analyze odd-even sentiment on the Bekasi toll road.
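For readers unfamiliar with how accuracy, precision, and recall follow from a confusion matrix, here is a minimal sketch for the two-class case. The function names and the toy labels are hypothetical and are not taken from the study's data.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one class treated as positive."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical sentiment labels for eight comments.
actual = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
predicted = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]

counts = confusion_counts(actual, predicted, positive="pos")
accuracy, precision, recall = evaluate(*counts)
```

Precision is the share of predicted positives that are correct, while recall (sensitivity) is the share of actual positives that were found, so the two can differ even when accuracy is fixed.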

OPTIMASI METODE NAIVE BAYES DENGAN FEATURE SELECTION INFORMATION GAIN UNTUK PREDIKSI KETERLAMBATAN PEMBAYARAN SPP SEKOLAH

Jurnal Ilmiah SINUS ◽

10.30646/sinus.v17i1.378 ◽

2019 ◽

Vol 17 (1) ◽

pp. 1

Author(s):

Muqorobin Muqorobin ◽

Kusrini Kusrini ◽

Emha Taufiq Luthfi

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Information Gain ◽

Confusion Matrix ◽

Education Institution ◽

Naïve Bayes ◽

Bayes Method ◽

Education Development ◽

Main Requirement ◽

Naive Bayes Method

The cost of education is a very important input component in implementing education, because costs are the main requirement in achieving educational goals. SMK Al-Islam Surakarta is a private education institution that requires students to pay school fees in the form of Education Development Donations, a routine fee paid every month. According to last year's administration (TU) report, around 60% of students were late in paying Education Development Donations, which is a big problem. This study builds a predictive system using the Naïve Bayes method, because the method can classify school fee payments as on time or late. The data were taken from the school's 2017/2018 Dapodik data, with a test dataset of 30 records. The Naive Bayes method was combined with the Information Gain method for feature selection, and accuracy was tested with the confusion matrix method. The highest accuracy, 90%, was obtained by combining the Naive Bayes method with the Information Gain method.

Prediksi Tingkat Kelulusan Tepat Waktu Mahasiswa Menggunakan Algoritma Naïve Bayes pada Universitas XYZ

Jurnal ULTIMATICS ◽

10.31937/ti.v12i2.1715 ◽

2020 ◽

Vol 12 (2) ◽

pp. 104-107

Author(s):

Nurhayati ◽

Nuraeny Septianti ◽

Nani Retnowati ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Information Technology ◽

Data Processing ◽

Naive Bayes ◽

Naïve Bayes ◽

Bayes Method ◽

Processing Data ◽

Student Graduation ◽

Phase Data ◽

Bayes Algorithm

Data processing is imperative for the development of information technology. Almost every field of work holds information in the form of data, and this data is used for analysis of the work. Nowadays, data must be processed to help workers make decisions. This study discusses predicting student graduation rates using the Naïve Bayes method. It aims to provide information to the college, which can make proper use of the data of graduated students by processing it with data mining. The data mining process produces information, namely a prediction of whether students graduate on time. The method of this study is Naïve Bayes classification. The researchers used the six-phase Cross-Industry Standard Process for Data Mining (CRISP-DM). The study concluded that applying the Naive Bayes algorithm with 4 (four) parameters, namely IPS (semester GPA), IPK (cumulative GPA), number of credits, and graduation, yields an accuracy of 80.95%.

Implementation of The Naïve Bayes Algorithm with Feature Selection using Genetic Algorithm for Sentiment Review Analysis of Fashion Online Companies

2018 6th International Conference on Cyber and IT Service Management (CITSM) ◽

10.1109/citsm.2018.8674286 ◽

2018 ◽

Cited By ~ 2

Author(s):

Siti Ernawati ◽

Eka Rini Yulia ◽

Frieyadie ◽

Samudi

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Naive Bayes ◽

Naïve Bayes ◽

Review Analysis ◽

Bayes Algorithm

Improving Text Categorization by Multicriteria Feature Selection

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2005.p0570 ◽

2005 ◽

Vol 9 (5) ◽

pp. 570-575

Author(s):

Son Doan ◽

Susumu Horiguchi

Keyword(s):

Feature Selection ◽

Natural Language ◽

Text Categorization ◽

Naive Bayes ◽

Naïve Bayes ◽

Experimental Results ◽

Benchmark Data ◽

Bayes Algorithm

Text categorization involves assigning a natural language document to one or more predefined classes. One of the most interesting issues is feature selection. We propose an approach using multicriteria ranking of features, a new procedure for feature selection, and apply it to text categorization. Experimental results on the Reuters-21578 and 20Newsgroups benchmark data with the naive Bayes algorithm show that our proposal outperforms conventional feature selection in text categorization performance.
