Modeling flood susceptibility using data-driven approaches of naïve Bayes tree, alternating decision tree, and random forest methods

Predicting heart ailment in patients with varying number of features using data mining techniques

2019 ◽

Vol 8 (1) ◽

pp. 56

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart disease is one of the leading causes of death worldwide, so this study carries out a detailed analysis of it using data mining. Prediction studies often restrict themselves to a minimal set of attributes for predicting whether a patient has heart disease, and in doing so miss many important attributes that are major causes of heart disease. This research therefore considers almost all of the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and Naïve Bayes and Random Forest perform comparatively better on these datasets.

Sentiment Analysis of Social Media Users Using Naïve Bayes, Decision Tree, Random Forest Algorithm: A Case Study of Draft Law on the Elimination of Sexual Violence (RUU PKS)

2019 International Conference on Sustainable Engineering and Creative Computing (ICSECC) ◽

10.1109/icsecc.2019.8907228 ◽

2019 ◽

Author(s):

Khalisa Virra ◽

Rachmadita Andreswari ◽

Muhammad Azani Hasibuan

Keyword(s):

Social Media ◽

Random Forest ◽

Decision Tree ◽

Sexual Violence ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm

GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models

Geomatics Natural Hazards and Risk ◽

10.1080/19475705.2017.1289250 ◽

2017 ◽

Vol 8 (2) ◽

pp. 950-973 ◽

Cited By ~ 89

Author(s):

Wei Chen ◽

Xiaoshen Xie ◽

Jianbing Peng ◽

Jiale Wang ◽

Zhao Duan ◽

...

Keyword(s):

Logistic Regression ◽

Decision Tree ◽

Landslide Susceptibility ◽

Naive Bayes ◽

Naïve Bayes ◽

Comparative Assessment ◽

Kernel Logistic Regression ◽

Alternating Decision Tree ◽

Tree Models

Komparasi Tujuh Algoritma Identifikasi Fraud ATM Pada PT. Bank Central Asia Tbk

JATISI (Jurnal Teknik Informatika dan Sistem Informasi) ◽

10.35957/jatisi.v7i3.471 ◽

2020 ◽

Vol 7 (3) ◽

pp. 441-450

Author(s):

Haliem Sunata

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Central Asia ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Tree

The heavy use of ATMs creates openings for fraud by the third parties that help PT. Bank Central Asia Tbk keep its ATMs ready for customers at all times. Slow and difficult identification of ATM fraud is one of the obstacles PT. Bank Central Asia Tbk faces. To address this problem, the researcher collected 5 datasets and pre-processed them so that they could be used for modeling and testing algorithms. Seven algorithms were compared: decision tree, gradient boosted trees, logistic regression, naive Bayes (kernel), naive Bayes, random forest, and random tree. Modeling and testing showed that gradient boosted trees was the best algorithm, with an accuracy of 99.85% and an AUC of 1. The author attributes this result to how well the tested attributes match the character of gradient boosted trees, an algorithm that stores and evaluates earlier results. Gradient boosted trees is therefore proposed as the solution to the problem faced by PT. Bank Central Asia Tbk.

Predicting Heart Ailment in Patients with Varying number of Features using Data Mining Techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp2675-2681 ◽

2019 ◽

Vol 9 (4) ◽

pp. 2675

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart disease is one of the leading causes of death worldwide, so this study carries out a detailed analysis of it using data mining. Prediction studies often restrict themselves to a minimal set of attributes for predicting whether a patient has heart disease, and in doing so miss many important attributes that are major causes of heart disease. This research therefore considers almost all of the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and Naïve Bayes and Random Forest perform comparatively better on these datasets.

Sentiment Analysis of Social Media Twitter with Case of Anti-LGBT Campaign in Indonesia using Naïve Bayes, Decision Tree, and Random Forest Algorithm

Procedia Computer Science ◽

10.1016/j.procs.2019.11.181 ◽

2019 ◽

Vol 161 ◽

pp. 765-772 ◽

Cited By ~ 6

Author(s):

Veny Amilia Fitri ◽

Rachmadita Andreswari ◽

Muhammad Azani Hasibuan

Keyword(s):

Social Media ◽

Random Forest ◽

Decision Tree ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm

Data Driven Approach for Eye Disease Classification with Machine Learning

Applied Sciences ◽

10.3390/app9142789 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2789 ◽

Cited By ~ 3

Author(s):

Sadaf Malik ◽

Nadia Kanwal ◽

Mamoona Naveed Asghar ◽

Mohammad Ali A. Sadiq ◽

Irfan Karamat ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Multiple Features ◽

Standard Format ◽

Free Data

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, the recording of health data in a standard form still requires attention so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate prediction of disease diagnosis based on symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms, including Decision Tree, Random Forest, Naive Bayes, and Neural Network algorithms, were used to analyze patient data based on multiple features, including age, illness history, and clinical observations. The data were formatted according to structured hierarchies designed by medical experts, whereas diagnosis was made as per the ICD-10 coding developed by the American Academy of Ophthalmology. The system is also designed to evolve through self-learning by adding new classifications for both diagnosis and symptoms. The classification results from tree-based methods demonstrated that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to the structured data arrangement, the random forest and decision tree algorithms achieve a prediction rate of more than 90%, higher than that of more complex methods such as neural networks and the naïve Bayes algorithm.

Random Forest Classifier untuk Deteksi Penderita COVID-19 berbasis Citra CT Scan

Jurnal Teknik Komputer ◽

10.31294/jtk.v7i2.10468 ◽

2021 ◽

Vol 7 (2) ◽

pp. 187-193

Author(s):

Nanik Wuryani ◽

Sarifah Agustiani

Keyword(s):

Random Forest ◽

Decision Tree ◽

Ct Scan ◽

Naive Bayes ◽

Naïve Bayes ◽

Color Histogram ◽

Support Vector ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Hu Moments

Covid-19 is a virus that spread widely and became a pandemic. The Covid-19 virus attacks a vital human organ, the lungs, so the researchers focus on identifying Covid-19 in the lungs. This study uses lung CT scan images and aims to detect the presence or absence of the virus by classifying Covid-19 images into three classes using the Random Forest algorithm, combined with several feature extraction methods: Haralick, Color Histogram, and Hu Moments. The study began with a single feature in the experiment, then combined it with the other features, and then compared the results against classification by other algorithms: K-Nearest Neighbor (KNN), Decision Tree, Linear Discriminant Analysis (LDA), Logistic Regression, Support Vector Machine (SVM), and Naive Bayes. The results show that the highest accuracy, 96.9%, was achieved by Random Forest using the Haralick and Color Histogram features, followed by KNN at 96.5% and Decision Tree at 95.5%; the lowest was Naive Bayes at 42.4%.

IProCAD: Intelligent Prognosis of Coronary Artery Disease Excluding Angiogram in Patient with Stable Angina

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e3101.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 2032-2040

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Random Forest ◽

Decision Tree ◽

Stable Angina ◽

Naive Bayes ◽

Feature Vector ◽

Naïve Bayes ◽

The Other

Cardiovascular diseases are one of the main causes of mortality in the world. A prediction system with reasonable cost could significantly reduce this death toll in low-income countries such as Bangladesh. For those countries we propose a machine-learning-backed embedded system that can predict a possible cardiac attack effectively by excluding the high-cost angiogram and incorporating only twelve (12) low-cost features: age, sex, chest pain, blood pressure, cholesterol, blood sugar, ECG results, heart rate, exercise-induced angina, old peak, slope, and history of heart disease. Two heart disease datasets are used: one built by the authors from NICVD (National Institute of Cardiovascular Disease, Bangladesh) patients, and one from UCI (University of California, Irvine). The overall process comprises four phases: a comprehensive literature review; collection of stable angina patients’ data through survey questionnaires at NICVD; manual reduction of the feature vector’s dimensionality (from 14 to 12 dimensions); and feeding the reduced feature vector to machine-learning-based classifiers to obtain a prediction model for heart disease. In the experiments on NICVD patient data with 12 features and without angiographic disease status, the Artificial Neural Network (ANN) shows better classification accuracy (92.80%) than the other classifiers, Decision Tree (82.50%), Naïve Bayes (85%), Support Vector Machine (SVM) (75%), Logistic Regression (77.50%), and Random Forest (75%), using 10-fold cross-validation. To accommodate the small scale of the training and test data, we also measured the accuracy of ANN, Decision Tree, Naïve Bayes, SVM, Logistic Regression, and Random Forest using the jackknife method: 84.80%, 71%, 75.10%, 75%, 75.33%, and 71.42%, respectively. For the UCI dataset with 12 attributes, the classification accuracies of the corresponding classifiers are 91.7%, 76.90%, 86.50%, 76.3%, 67.0%, and 67.3%, respectively, whereas the same dataset with 14 attributes, including angiographic status, gives accuracies of 93.5%, 76.7%, 86.50%, 76.8%, 67.7%, and 69.6% for the respective classifiers.

Prediksi Ketepatan Kelulusan Mahasiswa Diploma dengan Komparasi Algoritma Klasifikasi

Jurnal Sistem dan Teknologi Informasi (JustIN) ◽

10.26418/justin.v7i3.33316 ◽

2019 ◽

Vol 7 (3) ◽

pp. 202

Author(s):

Muhammad Sony Maulana ◽

Raja Sabarudin ◽

Wahyu Nugraha

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Rule Induction ◽

T Test

AMIK BSI Pontianak is a private higher-education institution with a large student body, but it faces a problem that recurs every year: the number of students who graduate on time versus late. The number of students graduating on time is an indicator of the effectiveness of a higher-education institution, whether public or private. Institutions need to detect the behavior of active students so that the factors causing students not to graduate on time can be identified. This study compares 5 data mining methods to determine which is most effective at predicting on-time graduation, using a T-Test as the evaluation technique. The methods compared are Decision Tree, Naive Bayes, K-NN, Rule Induction, and Random Forest. The results indicate that Rule Induction and C4.5 are the best-performing methods for predicting the on-time graduation of diploma students at AMIK BSI Pontianak.
