Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling

It is now common for a cellphone to receive spam messages. Great number of received messages making it difficult for human to classify those messages to Spam or no Spam. One way to overcome this problem is to use Data Mining for automatic classifications. In this paper, we investigate various data mining techniques, named Support Vector Machine, Multinomial Naïve Bayes and Decision Tree for automatic spam detection. Our experimental results show that Support Vector Machine algorithm is the best algorithm over three evaluated algorithms. Support Vector Machine achieves 98.33%, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree is at 97.10 % accuracy.

Download Full-text

Predicting heart ailment in patients with varying number of features using data mining techniques

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp56-62 ◽

2019 ◽

Vol 8 (1) ◽

pp. 56

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

<span>Data mining can be defined as a process of extracting unknown, verifiable and possibly helpful data from information. Among the various ailments, heart ailment is one of the primary reason behind death of individuals around the globe, hence in order to curb this, a detailed analysis is done using Data Mining. Many a times we limit ourselves with minimal attributes that are required to predict a patient with heart disease. By doing so we are missing on a lot of important attributes that are main causes for heart diseases. Hence, this research aims at considering almost all the important features affecting heart disease and performs the analysis step by step with minimal to maximum set of attributes using Data Mining techniques to predict heart ailments. The various classification methods used are Naïve Bayes classifier, Random Forest and Random Tree which are applied on three datasets with different number of attributes but with a common class label. From the analysis performed, it shows that there is a gradual increase in prediction accuracies with the increase in the attributes irrespective of the classifiers used and Naïve Bayes and Random Forest algorithms comparatively outperforms with these sets of data.</span>

Download Full-text

Predicting Heart Ailment in Patients with Varying number of Features using Data Mining Techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp2675-2681 ◽

2019 ◽

Vol 9 (4) ◽

pp. 2675

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

<span lang="EN-US">Data mining can be defined as a process of extracting unknown, verifiable and possibly helpful data from information. Among the various ailments, heart ailment is one of the primary reason behind death of individuals around the globe, hence in order to curb this, a detailed analysis is done using Data Mining. Many a times we limit ourselves with minimal attributes that are required to predict a patient with heart disease. By doing so we are missing on a lot of important attributes that are main causes for heart diseases. Hence, this research aims at considering almost all the important features affecting heart disease and performs the analysis step by step with minimal to maximum set of attributes using Data Mining techniques to predict heart ailments. The various classification methods used are Naïve Bayes classifier, Random Forest and Random Tree which are applied on three datasets with different number of attributes but with a common class label. From the analysis performed, it shows that there is a gradual increase in prediction accuracies with the increase in the attributes irrespective of the classifiers used and Naïve Bayes and Random Forest algorithms comparatively outperforms with these sets of data.</span>

Download Full-text

Prediksi Ketepatan Kelulusan Mahasiswa Diploma dengan Komparasi Algoritma Klasifikasi

Jurnal Sistem dan Teknologi Informasi (JustIN) ◽

10.26418/justin.v7i3.33316 ◽

2019 ◽

Vol 7 (3) ◽

pp. 202

Author(s):

Muhammad Sony Maulana ◽

Raja Sabarudin ◽

Wahyu Nugraha

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Rule Induction ◽

T Test

AMIK BSI Pontianak merupakan salah satu perguruan tinggi swasta yang memiliki jumlah mahasiswa yang banyak, namun dalam perjalanannya masih terdapat permasalahan yang setiap tahun nya terjadi yaitu permasalahan jumlah kelulusan mahasiswa yang tepat waktu dan terlambat. Jumlah mahasiswa yang lulus tepat waktu menjadi indikator efektifitas dari sebuah perguruan tinggi baik negeri dan swasta. Perguruan tinggi perlu mendeteksi perilaku dari mahasiswa aktif sehingga dapat dilihat faktor yang menyebabkan mahasiswa tidak lulus tepat waktu. Pada penelitian ini, akan mengkomparasikan atau membandingkan 5 metode data mining untuk menentukan metode mana yang paling optimal dalam menentukan ketepatan kelulusan mahasiswa dengan teknik pengujian T-Test, metode yang dibandingkan adalah metode Decision Tree, Naive Bayes, K-NN, Rule Induction, dan Random Forest. Hasil dari penelitian ini menghasilkan bahwa algoritma Rule Induction dan C4.5 adalah metode yang paling optimal performanya dalam menentukan ketepatan kelulusan mahasiswa diploma AMIK BSI Pontianak

Download Full-text

Penanganan Ketidakseimbangan Data pada Prediksi Customer Churn Menggunakan Kombinasi SMOTE dan Boosting

IJCIT (Indonesian Journal on Computer and Information Technology) ◽

10.31294/ijcit.v6i1.9545 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Nana Suryana ◽

Pratiwi Pratiwi ◽

Rizki Tri Prasetio

Keyword(s):

Data Mining ◽

Deep Learning ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Customer Churn ◽

Number Of Customers

Industri telekomunikasi menghadapi persaingan yang ketat antara penyedia layanan (service provider). Persaingan ini mengakibatkan customer churn atau berpindahnya pelanggan dari satu layanan ke layanan lain. Customer churn menjadi masalah utama karena dapat mempengaruhi pendapatan perusahaan, profitabilitas, serta kelangsungan hidup perusahaan. Oleh karena itu, mengetahui pelanggan yang akan melakukan churn secara dini menjadi salah satu cara yang cukup efektif dilakukan, karena dapat membantu perusahaan dalam membuat rencana yang efektif untuk tetap mempertahankan pelanggannya. Jumlah pelanggan yang mengundurkan diri dari layanannya saat ini biasanya dimiliki perusahaan dalam jumlah yang sedikit. Kondisi kekurangan data ini menyebabkan kesulitan dalam memprediksi customer churn. Tujuan umum dari penelitian ini adalah memprediksi pelanggan yang akan berpindah ke layanan lain atau mengundurkan diri dari layanannya saat ini. Sementara tujuan khusus penelitian Penelitian ini berusaha menangani ketidakseimbangan data dalam prediksi customer churn menggunakan optimasi pada level data melalui metode sampling yaitu Synthetic Minority Over Sampling. Kemudian dikombinasikan dengan optimasi level algoritma melalui pendekatan teknik Boosting. Pada penelitian beberapa algoritma prediksi seperti random forest, naïve bayes, decision tree, k-nearest neighbor dan deep learning yang akan diimplementasikan untuk mengetahui algoritma yang paling baik setelah dilakukan optimasi menggunakan SMOTE dan Boosting. Metode penelitian yang digunakan pada penelitian ini adalah CRISP-DM, yang merupakan kerangka penelitian data mining untuk penelitian lintas industri. Hasil penelitian ini menunjukan bahwa algoritma random forest merupakan algoritma yang menghasilkan akurasi paling optimal setelah dioptimasi menggunakan SMOTE dan Boosting dengan hasil akurasi 89,19%. The telecommunications industry faces stiff competition between service providers. This competition results in customer churn. Customer churn is a major problem because it can affect company revenue, profitability, survival, and service quality of the company. Therefore, knowing which customers will churn in the future early is one of the most effective ways to do it, because it can help companies make an effective plan to keep their customers. The number of customers who withdrew from its current services is usually owned by a small number. This lack of data causes difficulties in predicting customer churn. This problem then becomes a challenging issue in machine learning. The general purpose of this research is to predict customers who will churn. While the specific purpose of this research is to try to deal with data imbalances in predicting customer churn using optimization at the data level through the sampling method, namely Synthetic Minority Over Sampling (SMOTE). Then combined with algorithm level optimization through the Boosting technique approach. In this study, several prediction algorithms like the random forest, naïve Bayes, decision tree, k-nearest neighbor, and deep learning will be implemented to find out the best algorithm after optimization using SMOTE and Boosting. The method used in this study is CRISP-DM, which is a data mining research framework for cross-industry research. The results of this study indicate that the random forest algorithm is an algorithm that produces the most optimal accuracy after being optimized using SMOTE and Boosting with an accuracy of 89.19%.

Download Full-text

Usage of data mining techniques in predicting the heart diseases — Naïve Bayes & decision tree

2017 International Conference on Circuit ,Power and Computing Technologies (ICCPCT) ◽

10.1109/iccpct.2017.8074215 ◽

2017 ◽

Cited By ~ 1

Author(s):

Priyanka N. ◽

Pushpa Ravi Kumar

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Data Mining Techniques

Download Full-text

Analisa Komparasi Algoritma Decision Tree C4.5 dan Naïve Bayes untuk Prediksi Churn Berdasarkan Kelas Pelanggan Retail

International Journal of Natural Science and Engineering ◽

10.23887/ijnse.v3i3.23113 ◽

2019 ◽

Vol 3 (3) ◽

pp. 103

Author(s):

Ni Wayan Wardani ◽

Ni Kadek Ariasih

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes

Pelanggan adalah salah satu aset utama bagi perusahaan ritel. Perusahaan harus dapat mengenali bagaimana karakter pelanggan mereka sehingga mereka dapat mempertahankan pelanggan yang sudah ada agar tidak berhenti membeli dan pindah ke perusahaan ritel yang bersaing (churn). Salah satu model yang tepat untuk mengenali karakter pelanggan adalah model RFM (Recency, Frekuensi, Moneter). Model RFM mampu menghasilkan kelas pelanggan dan di setiap kelas pelanggan dapat dianalisis atau diprediksi dengan konsep data mining apakah pelanggan tetap sebagai pelanggan atau churn. Data yang digunakan berasal dari data pelanggan dan data penjualan di UD. Mawar Sari. Kelas pelanggan UD Mawar Sari yang dihasilkan dari model RFM adalah Dormant, Everyday, Golden dan Superstar. Konsep data mining dengan membangun model prediksi dalam penelitian ini menggunakan algoritma Decision Tree C4.5 dan Naïve Bayes. Di semua kelas pelanggan kinerja Algoritma Naïve Bayes lebih baik daripada Algoritma Decision Tree C4.5 dengan Recall 95,92%, Precision 84,15%, dan Accuracy 83,49% dan kelas pelanggan yang memiliki potensi churn tinggi adalah Dormant B, Dormant E, dan Dormant F.Kata Kunci: Prediksi Churn, RFM, C4.5, Naïve Bayes

Download Full-text

Sentiment Analysis of Social Media Users Using Naïve Bayes, Decision Tree, Random Forest Algorithm: A Case Study of Draft Law on the Elimination of Sexual Violence (RUU PKS)

2019 International Conference on Sustainable Engineering and Creative Computing (ICSECC) ◽

10.1109/icsecc.2019.8907228 ◽

2019 ◽

Author(s):

Khalisa Virra ◽

Rachmadita Andreswari ◽

Muhammad Azani Hasibuan

Keyword(s):

Social Media ◽

Random Forest ◽

Decision Tree ◽

Sexual Violence ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm

Download Full-text

GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models

Geomatics Natural Hazards and Risk ◽

10.1080/19475705.2017.1289250 ◽

2017 ◽

Vol 8 (2) ◽

pp. 950-973 ◽

Cited By ~ 89

Author(s):

Wei Chen ◽

Xiaoshen Xie ◽

Jianbing Peng ◽

Jiale Wang ◽

Zhao Duan ◽

...

Keyword(s):

Logistic Regression ◽

Decision Tree ◽

Landslide Susceptibility ◽

Naive Bayes ◽

Naïve Bayes ◽

Comparative Assessment ◽

Kernel Logistic Regression ◽

Alternating Decision Tree ◽

Tree Models

Download Full-text

Komparasi Tujuh Algoritma Identifikasi Fraud ATM Pada PT. Bank Central Asia Tbk

JATISI (Jurnal Teknik Informatika dan Sistem Informasi) ◽

10.35957/jatisi.v7i3.471 ◽

2020 ◽

Vol 7 (3) ◽

pp. 441-450

Author(s):

Haliem Sunata

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Central Asia ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Tree

Tingginya penggunaan mesin ATM, sehingga menimbulkan celah fraud yang dapat dilakukan oleh pihak ketiga dalam membantu PT. Bank Central Asia Tbk untuk menjaga mesin ATM agar selalu siap digunakan oleh nasabah. Lambat dan sulitnya mengidentifikasi fraud mesin ATM menjadi salah satu kendala yang dihadapi PT. Bank Central Asia Tbk. Dengan adanya permasalahan tersebut maka peneliti mengumpulkan 5 dataset dan melakukan pre-processing dataset sehingga dapat digunakan untuk pemodelan dan pengujian algoritma, guna menjawab permasalahan yang terjadi. Dilakukan 7 perbandingan algoritma diantaranya decision tree, gradient boosted trees, logistic regression, naive bayes ( kernel ), naive bayes, random forest dan random tree. Setelah dilakukan pemodelan dan pengujian didapatkan hasil bahwa algoritma gradient boosted trees merupakan algoritma terbaik dengan hasil akurasi sebesar 99.85% dan nilai AUC sebesar 1, tingginya hasil algoritma ini disebabkan karena kecocokan setiap attribut yang diuji dengan karakter gradient boosted trees dimana algoritma ini menyimpan dan mengevaluasi hasil yang ada. Maka algoritma gradient boosted trees merupakan penyelesaian dari permasalahan yang dihadapi oleh PT. Bank Central Asia Tbk.

Download Full-text