Comparison with Classification Algorithms in Data Mining of a Fuel Automation System's Sales Data

This article deals with estimating Otobil and pump sales at fuel stations. The fuel station data used in the study consist of 2,384 records in total. Based on these data, classification was performed on the fuel station sales data. The J48, Random Forest, KStar, Logistic Regression, IBk and Naive Bayes classification algorithms were applied in data mining software to compare their sales estimates. The results show that the J48 algorithm generally achieves higher accuracy rates than the others. These sales estimates are expected to encourage fuel station owners and industry associations to operate more profitably.
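
IBk is WEKA's instance-based k-nearest-neighbour learner. The sketch below illustrates the general idea only, assuming Euclidean distance over numeric features and a plain majority vote. It is not the study's configuration, and the feature tuples and "low"/"high" labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest records."""
    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_x)),
                     key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) -> [(label, count)]; equal counts keep first-seen order.
    return votes.most_common(1)[0][0]


# Hypothetical two-feature records with a "low"/"high" sales label.
X = [(10, 2), (12, 3), (11, 2), (40, 9), (42, 10), (39, 8)]
y = ["low", "low", "low", "high", "high", "high"]
print(knn_predict(X, y, (13, 3)))  # prints: low
print(knn_predict(X, y, (41, 9)))  # prints: high
```

An odd `k` avoids ties in a two-class problem; in practice, features are usually normalised first so that no single attribute dominates the distance.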


Predicting heart ailment in patients with varying number of features using data mining techniques

International Journal of Informatics and Communication Technology (IJ-ICT), 2019, Vol 8 (1), pp. 56

DOI: 10.11591/ijict.v8i1.pp56-62

Author(s): T R Stella Mary, Shoney Sebastian

Keyword(s): Data Mining, Heart Disease, Random Forest, Naive Bayes, Heart Diseases, Naïve Bayes, Bayes Classifier, Data Mining Techniques, Using Data, Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart disease is one of the primary causes of death worldwide, and to help curb it, this study performs a detailed analysis using data mining. Heart disease prediction is often limited to a minimal set of attributes, which misses many important attributes that are major causes of heart disease. This research therefore considers nearly all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart disease. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets that have different numbers of attributes but a common class label. The analysis shows that prediction accuracy increases gradually as the number of attributes increases, irrespective of the classifier used, and that Naïve Bayes and Random Forest comparatively outperform Random Tree on these datasets.
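
To make the attribute-count comparison concrete: a Naïve Bayes classifier adds one likelihood term per attribute to each class score, so every added attribute contributes further evidence. The sketch below is a minimal categorical Naïve Bayes working in log space, assuming add-one (Laplace) smoothing. The smoothing choice, the attribute values, and the `train_naive_bayes`/`predict_naive_bayes` names are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class frequencies of each attribute value."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, label) -> value counts
    values_per_attr = defaultdict(set)    # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            values_per_attr[j].add(value)
    return class_counts, value_counts, values_per_attr


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, value_counts, values_per_attr = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(class)
        for j, value in enumerate(row):
            k = len(values_per_attr[j])  # number of distinct values of attribute j
            # Laplace-smoothed log likelihood log P(value | class).
            score += math.log((value_counts[(j, label)][value] + 1) / (n_c + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical categorical attributes: (chest pain type, exercise-induced angina).
rows = [("typical", "yes"), ("typical", "yes"), ("atypical", "yes"),
        ("none", "no"), ("none", "no"), ("atypical", "no")]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("typical", "yes")))  # prints: disease
```

With add-one smoothing, an attribute value never seen with a class still gets a small non-zero likelihood instead of zeroing out the whole product.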


Comparison of Data Mining Classification Algorithms Determining the Default Risk

Scientific Programming, 2019, Vol 2019, pp. 1-8

DOI: 10.1155/2019/8706505

Cited by: ~3

Author(s): Begüm Çığşar, Deniz Ünal

Keyword(s): Data Mining, Logistic Regression, Default Risk, Operating Characteristic, Classification Algorithms, Financial Industry, Statistical Institute, Default Risks, Characteristic Area, Statistical Criteria

Big data and its analysis have recently become widespread practice across multiple industries. Data mining is a technique based on statistical applications that extracts previously unknown information from large quantities of data. The banking and insurance industries use data mining analysis to detect fraud, offer appropriate credit or insurance products to customers, and better understand customer demands. This study aims to identify data mining classification algorithms and use them to predict default risk, avoid possible payment difficulties, and reduce potential problems in extending credit. The data, which contain demographic and socioeconomic characteristics of individuals, were obtained from the Turkish Statistical Institute's 2015 survey. Six classification algorithms (Naive Bayes, Bayesian networks, J48, random forest, multilayer perceptron, and logistic regression) were applied to the dataset using the WEKA 3.9 data mining software. The algorithms were compared on root mean squared error, area under the receiver operating characteristic (ROC) curve, accuracy, precision, F-measure, and recall. The best algorithm, logistic regression, was then applied to the real dataset to determine, through odds ratios, which attributes drive default risk. Based on the odds ratios for the individuals' socioeconomic and demographic characteristics, the study identifies which individuals and characteristics are more likely to default. These results contribute to the literature and are also significant for the financial industry, because they help predict customers' default risk.
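
The comparison criteria listed above can be computed from a model's predictions. The sketch below assumes a binary problem with labels 1 (default) and 0 (no default) and uses the common textbook definitions; WEKA's own reporting (for example, how it computes RMSE over class probability distributions) may differ in detail. The last helper shows why odds ratios fall out of logistic regression: a coefficient β on a predictor multiplies the odds of the positive class by exp(β) per one-unit increase, holding the other predictors fixed. The toy labels are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f_measure": f_measure}


def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


def odds_ratio(coefficient):
    """Odds multiplier for a one-unit increase in a logistic-regression predictor."""
    return math.exp(coefficient)


# Toy labels: 1 = default, 0 = no default.
print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))  # each ≈ 0.667
print(odds_ratio(0.693))  # about 2: the odds roughly double
```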


Comparison of Seven Algorithms for Identifying ATM Fraud at PT. Bank Central Asia Tbk

JATISI (Jurnal Teknik Informatika dan Sistem Informasi), 2020, Vol 7 (3), pp. 441-450

DOI: 10.35957/jatisi.v7i3.471

Author(s): Haliem Sunata

Keyword(s): Logistic Regression, Random Forest, Decision Tree, Central Asia, Naive Bayes, Naïve Bayes, Random Tree

The heavy use of ATMs opens gaps for fraud by the third parties that help PT. Bank Central Asia Tbk keep its ATMs ready for customers to use. The slowness and difficulty of identifying ATM fraud is one of the obstacles faced by PT. Bank Central Asia Tbk. To address this problem, the researcher collected 5 datasets and pre-processed them so that they could be used for modeling and testing algorithms. Seven algorithms were compared: decision tree, gradient boosted trees, logistic regression, naive Bayes (kernel), naive Bayes, random forest, and random tree. After modeling and testing, gradient boosted trees proved to be the best algorithm, with an accuracy of 99.85% and an AUC of 1. Its high performance is attributed to how well each tested attribute fits the characteristics of gradient boosted trees, an algorithm that stores and evaluates existing results. Gradient boosted trees is therefore proposed as the solution to the problem faced by PT. Bank Central Asia Tbk.
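
An AUC of 1 means that every positive case received a higher score than every negative case. The sketch below computes AUC directly from that definition (the Mann-Whitney formulation, with ties counted as half a win). The labels (1 = fraud) and scores are hypothetical, and this quadratic version favours clarity; production tools typically derive the same quantity from the sorted ROC curve.

```python
def auc(y_true, scores):
    """ROC AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical fraud scores (1 = fraud). Every fraud outranks every normal case.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
# One normal case (0.8) outranks one fraud case (0.3): 3 of 4 pairs correct.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```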


Predicting Heart Ailment in Patients with Varying number of Features using Data Mining Techniques

International Journal of Electrical and Computer Engineering (IJECE), 2019, Vol 9 (4), pp. 2675

DOI: 10.11591/ijece.v9i4.pp2675-2681

Author(s): T R Stella Mary, Shoney Sebastian

Keyword(s): Data Mining, Heart Disease, Random Forest, Naive Bayes, Heart Diseases, Naïve Bayes, Bayes Classifier, Data Mining Techniques, Using Data, Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart disease is one of the primary causes of death worldwide, and to help curb it, this study performs a detailed analysis using data mining. Heart disease prediction is often limited to a minimal set of attributes, which misses many important attributes that are major causes of heart disease. This research therefore considers nearly all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart disease. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets that have different numbers of attributes but a common class label. The analysis shows that prediction accuracy increases gradually as the number of attributes increases, irrespective of the classifier used, and that Naïve Bayes and Random Forest comparatively outperform Random Tree on these datasets.


Comparison of Random Forest, Logistic Regression, and Multilayer Perceptron Methods on Classification of Bank Customer Account Closure

Indonesian Journal of Applied Statistics, 2021, Vol 4 (1), pp. 14

DOI: 10.13057/ijas.v4i1.41461

Author(s): Husna Afanyn Khoirunissa, Amanda Rizky Widyaningrum, Annisa Priliya Ayu Maharani

Keyword(s): Data Mining, Logistic Regression, Feature Selection, Random Forest, Multilayer Perceptron, Cross Validation, Early Stage, Bank Account, Credit Score

A bank is a business entity that deals with money: it accepts deposits from customers, provides funds for withdrawals, collects checks on customers' orders, extends credit, and invests excess deposits until they are needed for repayment. The purpose of this research is to determine how age, gender, country, credit score, the number of bank products used, and whether the customer is an active member influence the customer's decision to keep or close their bank account. The data cover 10,000 respondents from France, Spain, and Germany. The method is data mining, with early-stage preprocessing to clean the data of outliers and missing values, and feature selection to choose the important attributes. Classification is then performed with three methods: Random Forest, Logistic Regression, and Multilayer Perceptron. The results show that the Multilayer Perceptron model evaluated with 10-fold cross-validation is the best model, with 85.5373% accuracy.

Keywords: bank customer, random forest, logistic regression, multilayer perceptron
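
The 10-fold cross-validation used to select the best model can be sketched as follows. This is a minimal, unstratified version (many tools stratify folds by class); `fit_and_score` is a hypothetical callback that trains on one index set and returns a score on the other, and the majority-class baseline in the example stands in for a real model.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size.

    Assumes n >= k so that no fold is empty.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(n, k, fit_and_score, seed=0):
    """Mean of fit_and_score(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / len(scores)


# Hypothetical stand-in model: always predict the training folds' majority label.
labels = [0] * 8 + [1] * 2   # 0 = account kept, 1 = account closed


def majority_baseline(train_idx, test_idx):
    train_labels = [labels[j] for j in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[j] == majority for j in test_idx) / len(test_idx)


print(cross_validate(len(labels), 10, majority_baseline))  # 0.8
```

Every record is used for testing exactly once and for training k - 1 times, which is why the averaged score is a less optimistic estimate than accuracy on the training data.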


Comparison of Data Mining Classification Algorithms for Predicting the Early Death Rate from Cancer Using the Early Death Cancer Dataset

JOINTECS (Journal of Information Technology and Computer Science), 2019, Vol 4 (2), pp. 63

DOI: 10.31328/jointecs.v4i2.1008

Author(s): Panny Agustia Rahayuningsih

Keyword(s): Neural Network, Data Mining, Random Forest, Cross Validation, Naive Bayes, Early Death, Naïve Bayes, T Test, Fold Cross Validation

Cancer is one of the ten deadliest diseases in the world. It is malignant and difficult to cure once it has spread too widely; however, detecting cancer cells as early as possible can reduce the risk of death. This study aims to predict the early death rate from cancer in the European population using five classification algorithms (Decision Tree, Naïve Bayes, k-Nearest Neighbour, Random Forest, and Neural Network) and to determine which of them is best suited to the task. The study proceeds in several stages: dataset collection, initial data processing, the proposed method, evaluation of the method with 10-fold cross-validation, evaluation of the results, and a t-test for differences. The significance level (alpha) is 0.05: if the probability is greater than 0.05, H0 is accepted; if it is less than 0.05, H0 is rejected. The best-performing algorithm, with an accuracy of 98.35%, is the Neural Network. According to the t-test, the best models are Random Forest and Neural Network, Naïve Bayes is fairly good, Decision Tree is adequate, and the weakest algorithm is K-Nearest Neighbour (K-NN).
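
For a two-tailed test, the decision rule above (reject H0 when p < 0.05) is equivalent to rejecting when |t| exceeds the critical value of Student's t. The sketch below assumes a plain paired t-test on per-fold accuracies from 10-fold cross-validation, hence 9 degrees of freedom and a critical value of about 2.262. The abstract does not say which t-test variant was used (versions that correct for overlapping training folds also exist), and the fold accuracies are made up.

```python
import math
import statistics

# Two-tailed critical value of Student's t at alpha = 0.05 with 9 degrees of
# freedom (10 folds give 10 paired differences). Other df need a t table.
T_CRIT_ALPHA05_DF9 = 2.262


def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of paired per-fold differences.

    Undefined (ZeroDivisionError) when every difference is identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def reject_h0(t_stat, t_crit=T_CRIT_ALPHA05_DF9):
    """Reject H0 (no difference in mean accuracy) when |t| exceeds t_crit."""
    return abs(t_stat) > t_crit


# Hypothetical per-fold accuracies of two classifiers on the same 10 folds.
model_a = [0.98, 0.97, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98, 0.99, 0.97]
model_b = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91, 0.88, 0.90, 0.91, 0.89]
t = paired_t_statistic(model_a, model_b)
print(round(t, 2), reject_h0(t))
```

Pairing by fold matters: both classifiers are scored on the same test folds, so the test looks at per-fold differences rather than treating the two accuracy lists as independent samples.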


Usage of Data Mining Techniques in Predicting the Heart Diseases Decision Tree & Random Forest Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2019, Vol 9 (2), pp. 963-967

DOI: 10.35940/ijitee.h7168.129219

Keyword(s): Data Mining, Heart Disease, Random Forest, Early Diagnosis, Decision Tree, Heart Diseases, Classification Algorithms, Random Forest Algorithm, Medical Field, Data Mining Techniques

Heart disease is now the leading cause of death among all diseases. Owing to the lack of resources in the medical field, predicting heart disease has become a major problem. Classification algorithms such as Decision Tree and Random Forest are used to support early diagnosis and treatment, and data mining techniques are used both to predict heart disease and to compare the algorithms' accuracy. The main aim of this paper is to predict heart disease from the dataset values and to compare the accuracy of these two algorithms. The method proceeds in two steps: first, a dataset with 13 attributes is collected and classified using the Decision Tree and Random Forest algorithms; then the accuracy of both algorithms is recorded. The results show that Random Forest predicts heart disease better than Decision Tree.
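
The contrast between a single decision tree and a random forest comes down to bagging and voting. The sketch below is a deliberately simplified random forest whose trees are depth-1 stumps chosen by Gini impurity: each stump is fit to a bootstrap sample using a random subset of features, and predictions are combined by majority vote. Tree depth, tree count, the one-feature data, and all function names are illustrative assumptions; the study's 13-attribute dataset is not reproduced.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Best single split (feature, threshold, left label, right label) by weighted Gini."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):  # midpoints between distinct values
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every candidate feature is constant: predict the majority
        label = majority(y)
        return (features[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps, each restricted to a random subset of sqrt(d) features."""
    rng = random.Random(seed)
    d = len(X[0])
    m = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(d), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over the stumps' predictions."""
    return majority([predict_stump(s, row) for s in forest])


# Hypothetical one-feature data: class 0 below 5, class 1 from 5 upwards.
X = [(x,) for x in range(10)]
y = [0] * 5 + [1] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, (-1,)), predict_forest(forest, (20,)))  # 0 1
```

Because each stump has a single split, drawing features once per tree is the same as drawing them per split; deeper trees in a real random forest redraw the feature subset at every node.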


Feasibility Analysis of Promotion Locations for New Student Admissions (PMB) Using the Naïve Bayes & Decision Tree C4.5 Algorithms

Kilat, 2021, Vol 10 (1), pp. 169-178

DOI: 10.33322/kilat.v10i1.1196

Author(s): Wulan Wulandari

Keyword(s): Data Mining, Naive Bayes, Naïve Bayes, Classification Algorithms, Tertiary Institution, Public And Private, Measurement Results, Bayes Algorithm, The City, New Student

Competition for new student admissions at every public and private tertiary institution is growing rapidly each year, and some institutions spend a great deal of money on promotional activities. To help institutions obtain recommendations on the feasibility of promotion locations based on several measurement criteria, this study uses classification algorithms from data mining. The algorithms compared for measuring the feasibility of promotion locations in the city and district of Bekasi are Naïve Bayes and Decision Tree C4.5. The study covers 35 regions/sub-districts in Bekasi city and district and uses four parameters, including the number of students in a sub-district, the distance of the location, and last year's number of interested applicants. Measured in RapidMiner, the accuracy of the Naïve Bayes algorithm is 91.43% and that of Decision Tree C4.5 is 94.29%.
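
C4.5 (implemented in WEKA as J48) chooses the attribute to split on by gain ratio: information gain divided by the split information of the attribute, which penalises attributes with many distinct values. The sketch below computes gain ratio for one categorical attribute only; full C4.5 also handles numeric thresholds, missing values, and pruning. The attribute values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of one categorical attribute.

    gain ratio = information gain / split information, where split information
    is the entropy of the partition induced by the attribute's values.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute "distance" against a feasible/not-feasible label.
distance = ["near", "near", "near", "far", "far", "far"]
feasible = ["yes", "yes", "yes", "no", "no", "no"]
print(gain_ratio(distance, feasible))  # 1.0 (the attribute separates the classes)
```

An attribute with a unique value per record would have maximal information gain but also large split information, so its gain ratio is pulled down; that is the bias C4.5's criterion is designed to counter.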


Prediction of Breast Cancer Using Machine Learning

Recent Advances in Computer Science and Communications, 2020, Vol 13 (5), pp. 901-908

DOI: 10.2174/2213275912666190617160834

Author(s): Somil Jain, Puneet Kumar

Keyword(s): Breast Cancer, Machine Learning, Support Vector Machine, Random Forest, Prediction Accuracy, Naive Bayes, Naïve Bayes, Support Vector, Classification Algorithms, Breast Cancer Dataset

Background: Breast cancer causes a large number of deaths every year across the globe, and early detection and diagnosis of the disease is a challenging task that is essential for reducing this number. Various machine learning and data mining techniques are now used for medical diagnosis and have proven their mettle in predicting chronic diseases such as cancer, which can save the lives of patients suffering from them. The main concern of this study is to find the prediction accuracy of classification algorithms such as Support Vector Machine, J48, Naïve Bayes and Random Forest, and to suggest the best one.

Objective: To assess the prediction accuracy of these classification algorithms in terms of efficiency and effectiveness.

Methods: This paper provides a detailed analysis of the Support Vector Machine, J48, Naïve Bayes and Random Forest classification algorithms in terms of their prediction accuracy, applying 10-fold cross-validation to the Wisconsin Diagnostic Breast Cancer dataset with the open-source WEKA tool.

Results: Support Vector Machine achieved the highest prediction accuracy, 97.89%, with a low error rate of 0.14%.

Conclusion: This paper gives a clear view of the predictive performance of these classification algorithms, which can help medical practitioners diagnose chronic diseases such as breast cancer effectively.
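
For intuition about what a linear Support Vector Machine optimises, the sketch below trains one with Pegasos-style stochastic subgradient descent on the regularised hinge loss. This is a swapped-in solver chosen for brevity, not the study's training procedure (WEKA's built-in SVM learner, SMO, uses sequential minimal optimisation instead), and it covers only the linear case. Labels must be +1/-1, and the two-dimensional points are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on hinge loss.

    Labels must be +1/-1. A constant 1.0 feature is appended so the last weight
    acts as a bias term (regularised along with the others in this sketch).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in rng.sample(data, len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks as 1/(lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active: step toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable 2-D points.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Folding the bias into `w` keeps the code short but means the bias is penalised too; textbook SVM formulations leave it unregularised, and kernel SVMs need a different solver altogether.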
