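Building a Classification Model for Student Study-Duration Data at STMIK Indonesia Using a Decision Tree with the NBTree Algorithm (Pembentukan Model Klasifikasi Data Lama Studi Mahasiswa STMIK Indonesia Menggunakan Decision Tree dengan Algoritma NBTree)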

One of the assessment criteria for study program accreditation is the study duration of students, in particular how many graduate on time. Many students take longer to complete their studies than the established graduation standard. It is therefore important for the study program to know which students are likely not to graduate on time, which requires predicting the length of student study. One way to predict the length of a student's study is to build a classification model. This study aims to build a prediction model of student study duration using a decision tree with the NBTree algorithm. The data used are academic grade data and student academic leave data. The result is a Naïve Bayes decision tree classification model with 73.45% accuracy.

Classifying the Level of Energy-Environmental Efficiency Rating of Brazilian Ethanol

Energies ◽

10.3390/en13082067 ◽

2020 ◽

Vol 13 (8) ◽

pp. 2067

Author(s):

Nilsa Duarte da Silva Lima ◽

Irenilza de Alencar Nääs ◽

João Gilberto Mendes dos Reis ◽

Raquel Baracat Tosi Rodrigues da Silva

Keyword(s):

Decision Tree ◽

High Efficiency ◽

Rating Scale ◽

Naive Bayes ◽

Naïve Bayes ◽

Environmental Efficiency ◽

Classification Model ◽

Bayes Algorithm ◽

J48 Decision Tree

The present study aimed to assess and classify energy-environmental efficiency levels to reduce greenhouse gas emissions in the production, commercialization, and use of biofuels certified by the Brazilian National Biofuel Policy (RenovaBio). The parameters of the level of energy-environmental efficiency were standardized and categorized according to the Energy-Environmental Efficiency Rating (E-EER). The rating scale ranged from lower efficiency (D) to the highest efficiency (A+). The J48 decision tree and naive Bayes algorithms were used to build the classification models. Classifying the E-EER scores with the J48 decision tree and the naive Bayes classifier produced models that were efficient at estimating the efficiency level of Brazilian ethanol producers and importers certified by RenovaBio. The rules generated by the models can assess the level classes (efficiency scores) according to the scale discretized into high efficiency (Classification A), average efficiency (Classification B), and standard efficiency (Classification C). These results might be used to generate an ethanol energy-environmental efficiency label for end consumers and resellers of the product, to assist purchase decisions based on its performance. Naive Bayes was the best classification model, outperforming the J48 decision tree. The classification of the Energy Efficiency Note levels using the naive Bayes algorithm produced a model capable of estimating the efficiency level of Brazilian ethanol to create labels.

Text Analysis of Applicants for Personality Classification Using Multinomial Naïve Bayes and Decision Tree

JURNAL INFOTEL ◽

10.20895/infotel.v12i3.505 ◽

2020 ◽

Vol 12 (3) ◽

Author(s):

Nanda Yonda Hutama ◽

Kemas Muslim Lhaksmana ◽

Isman Kurniawan

Keyword(s):

Decision Tree ◽

Personality Traits ◽

Big Five ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Big Five Personality ◽

Text Data ◽

Personality Classification ◽

Best Parameters

Employees' qualities affect a company's performance, and with a large number of applicants it is difficult to find suitable ones. Companies therefore carry out psychological tests to learn applicants' personalities, since personality is considered to be related to work performance. However, psychological testing requires a lot of effort, cost, and human resources. A system that can classify personality from text can help reduce the effort needed. Similar studies used the Big Five personality model as the theoretical basis but only one of the personality traits, with the k-NN method reaching 65% accuracy. Based on these studies, accuracy can be improved by finding the best parameters and using all of the Big Five traits. This research is based on the Big Five personality traits and related traits, namely conscientiousness and agreeableness. The data used are text data that have been labelled, pre-processed, and feature-selected. The clean text data are used to create classification models using multinomial Naive Bayes and decision trees. Six models were built based on three work cultures: decision trees with accuracies of 33%, 66%, and 80%, and multinomial naïve Bayes with accuracies of 83%, 50%, and 60%, the latter reported as the better-performing method.

A Short-Term Power Output Forecasting Based on Augmented Naïve Bayes Classifiers for High Wind Power Penetrations

Sustainability ◽

10.3390/su132212723 ◽

2021 ◽

Vol 13 (22) ◽

pp. 12723

Author(s):

Gyeongmin Kim ◽

Jin Hur

Keyword(s):

Prediction Model ◽

Wind Power ◽

Power Output ◽

Naive Bayes ◽

Meteorological Factors ◽

Clean Energy ◽

Naïve Bayes ◽

Classification Model ◽

Ensemble Prediction ◽

Renewable Power

Renewable-power-generating resources can provide unlimited clean energy and emit at most minute amounts of air pollutants and greenhouse gases, whereas fossil fuels are contributing to environmental pollution problems and climate change. The share of global power capacity comprising renewable-power-generating resources is increasing. However, due to the variability and uncertainty of wind resources, predicting the power output of these resources remains a key problem that must be resolved to establish stable power system operation and planning. In this study, we propose an ensemble prediction model for wind-power-generating resources based on augmented naïve Bayes classifiers. To select the principal component that affects the wind power outputs from among various meteorological factors, such as temperature, wind speed, and wind direction, prediction of wind-power-generating resources was performed using multiple linear regression (MLR) and a naïve Bayes classification model based on the selected meteorological factors. We proposed applying the analogue ensemble (AnEn) algorithm and the ensemble learning technique to predict the wind power. To validate this proposed hybrid prediction model, we analyzed empirical data from the wind farm of Jeju Island in South Korea and found that the proposed model has lower error than the single prediction models.

Automation of gender determination in human canines using artificial intelligence

Dental Journal (Majalah Kedokteran Gigi) ◽

10.20473/j.djmkg.v50.i3.p116-120 ◽

2018 ◽

Vol 50 (3) ◽

pp. 116

Author(s):

F. Fidya ◽

Bayu Priyambadha

Keyword(s):

Artificial Intelligence ◽

Sexual Dimorphism ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Accuracy Rate ◽

Identification Process ◽

Gender Determination ◽

Lower Canine

Background: Gender determination is an important aspect of the identification process. The tooth represents a part of the human body that indicates the nature of sexual dimorphism. Artificial intelligence enables computers to perform to the same standard the same tasks as those carried out by humans. Several methods of classification exist within an artificial intelligence approach to identifying sexual dimorphism in canines. Purpose: This study aimed to quantify the respective accuracy of the Naive Bayes, decision tree, and multi-layer perceptron (MLP) methods in identifying sexual dimorphism in canines. Methods: A sample of results derived from 100 measurements of the diameter of mesiodistal, buccolingual, and diagonal upper and lower canine jaw models of both genders were entered into an application computer program that implements the algorithm (MLP). The analytical process was conducted by the program to obtain a classification model with testing being subsequently carried out in order to obtain 50 new measurement results, 25 each for males and females. A comparative analysis was conducted on the program-generated information. Results: The accuracy rate of the Naive Bayes method was 82%, while that of the decision tree and MLP amounted to 84%. The MLP method had an absolute error value lower than that of its decision tree counterpart. Conclusion: The use of artificial intelligence methods produced a highly accurate identification process relating to the gender determination of canine teeth. The most appropriate method was the MLP with an accuracy rate of 84%.

Data Mining Application in Predicting Bank Loan Defaulters

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d2037.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 2733-2744

Keyword(s):

Data Mining ◽

Decision Tree ◽

Model Building ◽

Naive Bayes ◽

Large Data ◽

Naïve Bayes ◽

Bank Loan ◽

Classification Model ◽

Data Set ◽

Data Mining Application

Data mining is a key tool for discovering knowledge from large data sets. Nowadays, most organizations use this technology to manage their data. This paper focuses on risk management in the banking sector, specifically on detecting bank loan defaulters: a data mining application examines the patterns of different attributes that contribute to detecting and predicting defaulters, thus preventing bad loans. This process can be carried out without changing the current systems or data. It helps distinguish borrowers who repay loans promptly from those who do not, and so avoids wrong loan allotment. To show the results of the study, a classification model is implemented to find interesting patterns among customer attributes. A total of 20461 sample records, drawn at random by the database administrator from three consecutive years of the bank database, were used to build and test the model. The experiments used decision tree and Naïve Bayes classification models in the Weka 3.7 tool. The modeling methodology was CRISP-DM (Cross-Industry Standard Process for Data Mining), which involves business understanding, data understanding, data preparation, model building, evaluation, and deployment. Eight experiments were performed with the J48 decision tree implementation, and two experiments with different parameters were run for Naïve Bayes. Finally, the models were evaluated and analyzed, and the best solution for predicting defaulters was identified.

Comparison of Naïve Bayes Algorithm and Decision Tree C4.5 for Hospital Readmission Diabetes Patients using HbA1c Measurement

Knowledge Engineering and Data Science ◽

10.17977/um018v2i22019p58-71 ◽

2019 ◽

Vol 2 (2) ◽

pp. 58 ◽

Cited By ~ 1

Author(s):

Utomo Pujianto ◽

Asa Luki Setiawan ◽

Harits Ar Rosyid ◽

Ali M. Mohammad Salah

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Naive Bayes ◽

Feature Selection Method ◽

Selection Method ◽

Naïve Bayes ◽

The Body ◽

Classification Model ◽

Diabetic Patients ◽

Patient Readmissions

Diabetes is a metabolic disorder in which the pancreas does not produce enough insulin or the body cannot effectively use the insulin produced. The HbA1c examination, which measures the average glucose level of patients during the last 2-3 months, has become an important step in determining the condition of diabetic patients. Knowledge of the patient's condition can help medical staff predict the possibility of patient readmission, that is, a patient requiring hospitalization again. The ability to predict patient readmissions will ultimately help the hospital to calculate and manage the quality of patient care. This study compares the performance of the Naïve Bayes method and the C4.5 Decision Tree in predicting readmissions of diabetic patients, especially patients who have undergone HbA1c examination. As part of this study we also compare the performance of the classification models across a number of scenarios that combine preprocessing methods, namely the Synthetic Minority Over-Sampling Technique (SMOTE) and the Wrapper feature selection method, with both classification techniques. The scenario of the C4.5 method combined with SMOTE and the feature selection method produces the best performance in classifying readmissions of diabetic patients, with an accuracy of 82.74%, precision of 87.1%, and recall of 82.7%.
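
As a reminder of how the three reported metrics are defined, the following minimal Python sketch computes accuracy, precision, and recall from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not state how the study averaged its metrics across classes.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted readmissions, the share that were real
    recall = tp / (tp + fn)     # of the real readmissions, the share that were found
    return accuracy, precision, recall


# Hypothetical confusion-matrix counts (an assumption, not taken from the paper).
accuracy, precision, recall = classification_metrics(tp=45, fp=5, fn=15, tn=35)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

With imbalanced classes, which is the situation SMOTE is meant to address, accuracy alone can look good while recall on the minority class stays low, so reporting all three is informative.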

A Hybrid System to Improve the Performance of Diabetes Disease Prediction using Genetic Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7374.129219 ◽

2020 ◽

Vol 9 (2) ◽

pp. 1720-1726

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Mortality Rate ◽

Decision Tree ◽

Prediction Model ◽

Naive Bayes ◽

Medical Science ◽

Naïve Bayes ◽

Support Vector ◽

Disease Prediction

Data mining currently plays a significant role in the healthcare system. It helps extract hidden patterns from clinical datasets for further analysis, and it can be used to build tools for medical management systems. Among life-threatening diseases, diabetes mellitus is treated as a serious disease worldwide. Because of its mortality rate, early prediction and diagnosis are very important. Several research efforts are under way to reduce the complications caused by diabetes as well as the mortality rate. Medical science needs to analyze an enormous quantity of clinical data for diagnostic purposes using machine learning techniques. In recent approaches, the disease datasets may contain insignificant and irrelevant features, which leads to less accurate results. The aim of this paper is to analyze the existing prediction systems and develop a hybrid disease prediction model using the Genetic Algorithm with Naïve Bayes, Decision Tree, and Support Vector Machine classifiers for better accuracy. The proposed diabetes prediction model achieves accuracies of 0.8182, 0.8052, and 0.8312 with the Naïve Bayes, Decision Tree, and Support Vector Machine classifiers, respectively. The experimental results show that the Support Vector Machine provides higher accuracy than the other classifiers in all cases. The Pima Indian diabetes dataset is used to construct the proposed model.

Data Mining Implementation Using Naïve Bayes Algorithm and Decision Tree J48 In Determining Concentration Selection

International Journal of Quantitative Research and Modeling ◽

10.46336/ijqrm.v1i3.72 ◽

2020 ◽

Vol 1 (3) ◽

pp. 123-134

Author(s):

Budiman Budiman ◽

Reni Nursyanti ◽

R Yadi Rakhman Alamsyah ◽

Imannudin Akbar

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Study Program ◽

Data Set ◽

Lower Accuracy ◽

Accuracy Result ◽

Bayes Algorithm

The computerization of society has substantially improved the ability to generate and collect data from a variety of sources, and large amounts of data have flooded almost every aspect of people's lives. AMIK HASS Bandung has an Informatic Management Study Program with three areas of concentration that students can select in the fourth semester: Computerized Accounting, Computer Administration, and Multimedia. Concentration selection should be determined precisely based on past data, so the academic section needs a pattern or rule to predict it. In this work, the data mining techniques used were Naive Bayes and Decision Tree J48 in the WEKA tool. The data set contained 111 instances and was evaluated in percentage-split mode, with 75% used as training data to build the models and 25% used as test data for both models. The highest accuracy was obtained with Naive Bayes, at 71.4%, with 20 of the 28 test instances correctly classified. Decision Tree J48 had a lower accuracy of 64.3%, with 18 of the 28 test instances correctly classified. Decision Tree J48 produced 4 patterns or rules for determining concentration selection, which the academic section can use to assist students in choosing a concentration.
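
The reported accuracies can be checked from the counts given in the abstract: the 28 instances the accuracies are computed over correspond to the 25% held-out share of the 111 instances. The short Python sketch below reproduces that arithmetic. It assumes the held-out set size is the rounded 25% share; the exact rounding rule WEKA applies is not stated in the abstract.

```python
# Figures from the abstract: 111 instances, 75% training / 25% test.
total_instances = 111
test_fraction = 0.25

# Assumption: the test-set size is the rounded 25% share (27.75 -> 28).
test_size = round(total_instances * test_fraction)


def accuracy_percent(correct, total):
    """Share of correctly classified instances, as a percentage."""
    return 100.0 * correct / total


nb_accuracy = accuracy_percent(20, test_size)   # Naive Bayes: 20 correct
j48_accuracy = accuracy_percent(18, test_size)  # Decision Tree J48: 18 correct

print(test_size, round(nb_accuracy, 1), round(j48_accuracy, 1))
# prints: 28 71.4 64.3
```

With only 28 held-out instances, the gap between the two models amounts to two instances, so the ranking should be read with some caution.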

Diabetic Prediction using Classification Method

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9718.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 264-267

Keyword(s):

Diabetes Mellitus ◽

Feature Extraction ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Classification Models ◽

Performance Parameters ◽

Prediction Analysis ◽

Input Dataset

Prediction analysis of diabetes mellitus is the main focus of this work. Prediction analysis involves three main tasks: dataset input, feature extraction, and classification. The earlier framework used SVM and naïve Bayes approaches to predict this disease. This study implements a voting classifier, an ensemble approach that combines three classification models: SVM, naïve Bayes, and decision tree. Both the existing and the new technique are implemented in Python. The approaches are evaluated in terms of different performance parameters. Compared with the other classification models, the proposed classification model performs better.
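
The idea of a voting classifier can be sketched in a few lines of Python. The sketch assumes hard (majority) voting, which the abstract does not confirm, and the per-sample predictions of the three base models are hypothetical values made up for illustration rather than outputs of trained SVM, naïve Bayes, or decision tree models.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label predicted by most base classifiers (hard voting)."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical predictions for five samples (1 = diabetic, 0 = not diabetic).
svm_pred = [1, 0, 1, 1, 0]
nb_pred = [1, 1, 0, 1, 0]
tree_pred = [0, 0, 1, 1, 1]

# Combine the three models sample by sample.
ensemble_pred = [majority_vote(v) for v in zip(svm_pred, nb_pred, tree_pred)]
print(ensemble_pred)
# prints: [1, 0, 1, 1, 0]
```

With three base models and two classes a tie cannot occur, which is one reason an odd number of voters is a common choice for hard voting.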

Predicting Capital Market Investor Sentiment on Social Networks Using Text Mining (Prediksi Sentimen Investor Pasar Modal Di Jejaring Sosial Menggunakan Text Mining)

BALANCE: Economic, Business, Management and Accounting Journal ◽

10.30651/blc.v18i2.7226 ◽

2021 ◽

Vol 18 (2) ◽

pp. 32

Author(s):

Aestikani Mahani ◽

Hendro Margono

Keyword(s):

Text Mining ◽

Decision Tree ◽

Capital Market ◽

Naive Bayes ◽

Stock Exchange ◽

Investor Sentiment ◽

Naïve Bayes ◽

Classification Model ◽

Business World ◽

The Capital Market

Declining optimism among capital market investors is one of the financial impacts of the COVID-19 pandemic on the business world. It was reflected in a decrease in trading volume followed by a sharp drop in the Jakarta Composite Index (JCI) on the Indonesia Stock Exchange starting in March 2020. Concern over a slower economic recovery from the pandemic is thus reflected in investor sentiment in the capital market. At the same time, the rapid development of the internet in Indonesia has led investors to search for information online before buying and selling securities, which contributes to shaping investor preferences and sentiment. This study explores investor expectations as reflected in investment sentiment, the capital market being an important barometer of a country's economy. It conducted a qualitative examination of the features/terms of stock investment that frequently appear in the capital market and collected them in a compact dictionary (lexicon). Lexicon-based investor opinions were then extracted from Twitter, followed by text sentiment analysis and the construction of classification models based on Naive Bayes and Decision Tree. The results show that the polarity of capital market investor sentiment is optimistic, with frequently appearing sentiment features such as "cuan", "bearish", "serok", "copet", "untung", "cut loss", and "nyangkut". The Decision Tree classification model provides the better performance.

Keywords: investor, lexicon, sentiment analysis, social network, stock exchange, text mining, Twitter
