Depression Detection Algorithm v1

Depression Detection with DM v1

10.17504/protocols.io.bzm8p49w ◽

2021 ◽

Author(s):

Umme Marzia Haque

Keyword(s):

Random Forest ◽

Decision Tree ◽

Supervised Learning ◽

Naive Bayes ◽

Naïve Bayes ◽

Learning Models ◽

Target Variable ◽

Correlated Variables ◽

Low Correlation ◽

Depression Detection

The study used data from YMM. Yes/No variables that had a low correlation with the target variable were removed. To extract the most relevant features, i.e., the variables most highly correlated with the target variable, the Boruta method was used in conjunction with a Random Forest (RF) classifier. To select suitable supervised learning models, the Tree-based Pipeline Optimization Tool (TPOTClassifier) was employed. RF, XGBoost (XGB), Decision Tree (DT), and Gaussian Naive Bayes (GaussianNB) were employed in the depression identification step.
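
The first filtering step, dropping Yes/No variables that correlate weakly with the target, can be sketched in plain Python. This is only an illustration under stated assumptions: the feature names, the toy data, and the 0.1 threshold are hypothetical and do not come from the protocol, and the sketch does not implement Boruta or TPOT.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; for 0/1 columns this equals the phi coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)

def drop_low_correlation(features, target, threshold=0.1):
    """Keep only columns whose |correlation| with the target reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return {
        name: column
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }

# Hypothetical Yes/No (1/0) survey variables and a binary target.
features = {
    "sleep_problems": [1, 1, 0, 0, 1, 0, 1, 0],
    "wears_glasses": [1, 0, 1, 0, 1, 0, 1, 0],
}
target = [1, 1, 0, 0, 1, 0, 0, 1]

kept = drop_low_correlation(features, target)
print(sorted(kept))
```

In this toy data "wears_glasses" is uncorrelated with the target and is dropped, while "sleep_problems" is kept.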

Sentiment Analysis of Social Media Users Using Naïve Bayes, Decision Tree, Random Forest Algorithm: A Case Study of Draft Law on the Elimination of Sexual Violence (RUU PKS)

2019 International Conference on Sustainable Engineering and Creative Computing (ICSECC) ◽

10.1109/icsecc.2019.8907228 ◽

2019 ◽

Author(s):

Khalisa Virra ◽

Rachmadita Andreswari ◽

Muhammad Azani Hasibuan

Keyword(s):

Social Media ◽

Random Forest ◽

Decision Tree ◽

Sexual Violence ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm

Komparasi Tujuh Algoritma Identifikasi Fraud ATM Pada PT. Bank Central Asia Tbk

JATISI (Jurnal Teknik Informatika dan Sistem Informasi) ◽

10.35957/jatisi.v7i3.471 ◽

2020 ◽

Vol 7 (3) ◽

pp. 441-450

Author(s):

Haliem Sunata

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Central Asia ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Tree

The heavy use of ATMs creates openings for fraud by the third parties that help PT. Bank Central Asia Tbk keep its ATMs ready for customers to use at all times. Slow and difficult identification of ATM fraud is one of the obstacles PT. Bank Central Asia Tbk faces. To address this problem, the researcher collected 5 datasets and pre-processed them so that they could be used for modeling and testing algorithms. Seven algorithms were compared: decision tree, gradient boosted trees, logistic regression, naive Bayes (kernel), naive Bayes, random forest, and random tree. After modeling and testing, gradient boosted trees proved to be the best algorithm, with an accuracy of 99.85% and an AUC of 1. This strong result is attributed to the fit between each tested attribute and the character of gradient boosted trees, an algorithm that stores and evaluates existing results. Gradient boosted trees is therefore the solution to the problem faced by PT. Bank Central Asia Tbk.

Machine Learning Algorithms for Biological Targets: Investigating the Error Tolerance in Various Computational Methods

10.31219/osf.io/zkumv ◽

2019 ◽

Author(s):

Thomas M. Kaiser ◽

Pieter B. Burger

Keyword(s):

Machine Learning ◽

Random Forest ◽

Naive Bayes ◽

Probabilistic Neural Network ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Learning Models ◽

Bayes Network ◽

Insight Into ◽

Machine Learning Models

Machine learning continues to make great strides in the prediction of desired properties concerning drug development. Problematically, the efficacy of machine learning in these arenas is reliant upon highly accurate and abundant data. These two limitations, high accuracy and abundance, are often taken together; however, insight into the dataset accuracy limitation of contemporary machine learning algorithms may yield insight into whether non-bench experimental sources of data may be used to generate useful machine learning models where there is a paucity of experimental data. We took highly accurate data across six kinase types, one GPCR, one polymerase, a human protease, and HIV protease, and intentionally introduced error at varying population proportions in the datasets for each target. With the generated error in the data, we explored how the retrospective accuracy of a Naïve Bayes Network, a Random Forest Model, and a Probabilistic Neural Network model decayed as a function of error. Additionally, we explored the ability of a training dataset with an error profile resembling that produced by the Free Energy Perturbation method (FEP+) to generate machine learning models with useful retrospective capabilities. The categorical error tolerance was quite high for the Naïve Bayes Network algorithm, with an average of 39% error in the training set required to lose predictivity on the test set. Additionally, the Random Forest tolerated a significant degree of categorical error introduced into the training set, with an average error of 29% required to lose predictivity. However, we found that the Probabilistic Neural Network algorithm did not tolerate as much categorical error, requiring an average of 20% error to lose predictivity. Finally, we found that a Naïve Bayes Network and a Random Forest could both use datasets with an error profile resembling that of FEP+. This work demonstrates that computational methods of known error distribution like FEP+ may be useful in generating machine learning models not based on extensive and expensive in vitro-generated datasets.
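
The idea of introducing error into a training set at a chosen proportion can be illustrated with a small label-flipping sketch. It assumes binary class labels and uniform random selection of the corrupted samples; the function name, the seed, and the toy labels are hypothetical, and the abstract does not specify how the authors generated their error.

```python
import random

def inject_label_error(labels, error_fraction, seed=0):
    """Return a copy of binary labels with a fixed fraction flipped (0 <-> 1).

    Which samples are corrupted is chosen uniformly at random; this selection
    scheme is an assumption of the sketch, not the paper's documented method.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * error_fraction)
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]

# 100 hypothetical active/inactive labels, corrupted at 29%.
clean = [0, 1] * 50
noisy = inject_label_error(clean, 0.29)
n_changed = sum(a != b for a, b in zip(clean, noisy))
print(n_changed, "of", len(clean), "labels flipped")
```

Training a model on `noisy` at increasing values of `error_fraction` and scoring it on an uncorrupted test set would trace how predictivity decays with error.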

Sentiment Analysis of Social Media Twitter with Case of Anti-LGBT Campaign in Indonesia using Naïve Bayes, Decision Tree, and Random Forest Algorithm

Procedia Computer Science ◽

10.1016/j.procs.2019.11.181 ◽

2019 ◽

Vol 161 ◽

pp. 765-772 ◽

Cited By ~ 6

Author(s):

Veny Amilia Fitri ◽

Rachmadita Andreswari ◽

Muhammad Azani Hasibuan

Keyword(s):

Social Media ◽

Random Forest ◽

Decision Tree ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Random Forest Algorithm

Data Driven Approach for Eye Disease Classification with Machine Learning

Applied Sciences ◽

10.3390/app9142789 ◽

2019 ◽

Vol 9 (14) ◽

pp. 2789 ◽

Cited By ~ 3

Author(s):

Sadaf Malik ◽

Nadia Kanwal ◽

Mamoona Naveed Asghar ◽

Mohammad Ali A. Sadiq ◽

Irfan Karamat ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Multiple Features ◽

Standard Format ◽

Free Data

Medical health systems have been concentrating on artificial intelligence techniques for speedy diagnosis. However, the recording of health data in a standard form still requires attention so that machine learning can be more accurate and reliable by considering multiple features. The aim of this study is to develop a general framework for recording diagnostic data in an international standard format to facilitate prediction of disease diagnosis based on symptoms using machine learning algorithms. Efforts were made to ensure error-free data entry by developing a user-friendly interface. Furthermore, multiple machine learning algorithms including Decision Tree, Random Forest, Naive Bayes and Neural Network algorithms were used to analyze patient data based on multiple features, including age, illness history and clinical observations. This data was formatted according to structured hierarchies designed by medical experts, whereas diagnosis was made as per the ICD-10 coding developed by the American Academy of Ophthalmology. Furthermore, the system is designed to evolve through self-learning by adding new classifications for both diagnosis and symptoms. The classification results from tree-based methods demonstrated that the proposed framework performs satisfactorily, given a sufficient amount of data. Owing to a structured data arrangement, the random forest and decision tree algorithms’ prediction rate is more than 90% as compared to more complex methods such as neural networks and the naïve Bayes algorithm.

Evaluating the Performance of Supervised Classification Models: Decision Tree and Naïve Bayes Using KNIME

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.20079 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 248 ◽

Cited By ~ 1

Author(s):

Syed Muzamil Basha ◽

Dharmendra Singh Rajput ◽

Ravi Kumar Poluru ◽

S. Bharath Bhushan ◽

Shaik Abdul Khalandar Basha

Keyword(s):

Decision Tree ◽

Classification Accuracy ◽

Supervised Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Task ◽

Classification Models ◽

Target Variable ◽

Input Variables ◽

F Measure

The classification task is to predict the value of the target variable from the values of the input variables. If a target is provided as part of the dataset, then classification is a supervised task. It is important to analyze the performance of supervised classification models before using them in a classification task. In our research, we propose a novel way to evaluate the performance of supervised classification models such as Decision Tree and Naïve Bayes using the KNIME Analytics Platform. Experiments were conducted on a multivariate dataset consisting of 58000 instances and 9 columns intended specifically for classification, collected from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/statlog+(shuttle)), and the performance of both models was compared in terms of Classification Accuracy (CA) and Error Rate. Finally, both models were validated using the metrics precision, recall, and F-measure. We found that the Decision Tree achieves a CA of 99.465%, whereas Naïve Bayes attains a CA of 90.358%. The F-measure of the Decision Tree is 0.984, whereas that of Naïve Bayes is 0.7045.
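
The metrics named above can be computed from a confusion matrix. The sketch below covers the binary case only, whereas the shuttle dataset is multi-class, so it illustrates the definitions rather than the paper's KNIME workflow; the function name and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, error rate, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Toy labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a multi-class problem, one common approach is to compute these per class (one-vs-rest) and average them.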

Random Forest Classifier untuk Deteksi Penderita COVID-19 berbasis Citra CT Scan

Jurnal Teknik Komputer ◽

10.31294/jtk.v7i2.10468 ◽

2021 ◽

Vol 7 (2) ◽

pp. 187-193

Author(s):

Nanik Wuryani ◽

Sarifah Agustiani

Keyword(s):

Random Forest ◽

Decision Tree ◽

Ct Scan ◽

Naive Bayes ◽

Naïve Bayes ◽

Color Histogram ◽

Support Vector ◽

K Nearest Neighbor ◽

Linear Discriminant ◽

Hu Moments

Covid-19 is a virus that spread so widely that it became a pandemic. The Covid-19 virus attacks a vital human organ, the lungs, so the researchers focused on identifying Covid-19 in the lungs. This study uses lung CT scan images and aims to detect the presence or absence of the virus by classifying Covid-19 images into three classes using the Random Forest algorithm, combined with several feature extraction methods: Haralick, Color Histogram, and Hu-Moments. The study began with a single feature in the experiment, then combined it with the other features, and then compared the results against classification by other algorithms: K-Nearest Neighbor (KNN), Decision Tree, Linear Discriminant Analysis (LDA), Logistic Regression, Support Vector Machine (SVM), and Naive Bayes. The results show that the highest accuracy, 96.9%, was achieved by the Random Forest algorithm using the Haralick and Color Histogram features, followed by KNN at 96.5% and Decision Tree at 95.5%; the lowest was Naive Bayes at 42.4%.
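
As a rough illustration of one of the feature extractors mentioned, a color histogram can be reduced to counting how many pixels fall into each intensity bin per channel. The sketch assumes pixels are given as (R, G, B) tuples with values 0 to 255 and uses 4 bins per channel; both choices are hypothetical, and the paper's actual extraction settings are not stated in the abstract.

```python
def color_histogram(pixels, bins=4):
    """Per-channel intensity histogram, normalized by pixel count.

    Returns 3 * bins values: all R bins, then all G bins, then all B bins.
    """
    bin_width = 256 // bins
    hist = [0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Clamp so that 255 lands in the last bin even if 256 % bins != 0.
            bin_index = min(value // bin_width, bins - 1)
            hist[channel * bins + bin_index] += 1
    n_pixels = len(pixels)
    return [count / n_pixels for count in hist]

# Four hypothetical pixels standing in for an image.
pixels = [(0, 0, 0), (255, 255, 255), (10, 200, 130), (70, 70, 70)]
hist_features = color_histogram(pixels)
print(hist_features)
```

The resulting fixed-length vector could be concatenated with other descriptors (such as Haralick texture features) before being passed to a classifier.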

IProCAD: Intelligent Prognosis of Coronary Artery Disease Excluding Angiogram in Patient with Stable Angina

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e3101.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 2032-2040

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Random Forest ◽

Decision Tree ◽

Stable Angina ◽

Naive Bayes ◽

Feature Vector ◽

Naïve Bayes ◽

The Other

Cardiovascular diseases are one of the main causes of mortality in the world. A proper prediction system at reasonable cost can significantly reduce this death toll in low-income countries like Bangladesh. For those countries we propose a machine-learning-backed embedded system that can predict a possible cardiac attack effectively by excluding the high-cost angiogram and incorporating only twelve (12) low-cost features: age, sex, chest pain, blood pressure, cholesterol, blood sugar, ECG results, heart rate, exercise-induced angina, old peak, slope, and history of heart disease. Two heart disease datasets are used: our own dataset built from NICVD (National Institute of Cardiovascular Disease, Bangladesh) patients, and the UCI (University of California, Irvine) dataset. The overall process comprises four phases: a comprehensive literature review; collection of stable angina patients' data through survey questionnaires from NICVD; manual reduction of the feature vector dimensionality (from 14 to 12 dimensions); and feeding the reduced feature vector to machine-learning-based classifiers to obtain a prediction model for heart disease. In the experiments on NICVD patient data with 12 features and without angiographic disease status, the Artificial Neural Network (ANN) shows a better classification accuracy of 92.80% compared to the other classifiers, Decision Tree (82.50%), Naïve Bayes (85%), Support Vector Machine (SVM) (75%), Logistic Regression (77.50%), and Random Forest (75%), using 10-fold cross validation. To accommodate the small scale of training and test data in our experimental environment, we also observed the accuracy of ANN, Decision Tree, Naïve Bayes, SVM, Logistic Regression, and Random Forest using the Jackknife method: 84.80%, 71%, 75.10%, 75%, 75.33%, and 71.42%, respectively. On the other hand, the classification accuracies of the corresponding classifiers are 91.7%, 76.90%, 86.50%, 76.3%, 67.0%, and 67.3%, respectively, for the UCI dataset with 12 attributes. The same dataset with 14 attributes, including angiographic status, shows accuracies of 93.5%, 76.7%, 86.50%, 76.8%, 67.7%, and 69.6% for the respective classifiers.
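
The two evaluation schemes mentioned, 10-fold cross validation and the Jackknife method, differ only in how samples are split into training and test sets. The sketch below generates the index splits, reading "Jackknife" as leave-one-out evaluation, which is an assumption about the paper's usage; the function names and the sample count of 25 are hypothetical, and no shuffling or stratification is applied.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once.

    Samples are dealt round-robin into k folds; real use would usually
    shuffle (and often stratify) first, which this sketch omits.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx

def jackknife_indices(n_samples):
    """Leave-one-out splits: k-fold with one sample per fold."""
    return k_fold_indices(n_samples, n_samples)

ten_fold = list(k_fold_indices(25, 10))   # 10 splits over 25 samples
leave_one_out = list(jackknife_indices(5))  # 5 splits, one test sample each
print(len(ten_fold), len(leave_one_out))
```

Averaging a classifier's accuracy over the test folds gives the cross-validated estimate; leave-one-out uses more training data per split, which can matter for small datasets.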

Prediksi Ketepatan Kelulusan Mahasiswa Diploma dengan Komparasi Algoritma Klasifikasi

Jurnal Sistem dan Teknologi Informasi (JustIN) ◽

10.26418/justin.v7i3.33316 ◽

2019 ◽

Vol 7 (3) ◽

pp. 202

Author(s):

Muhammad Sony Maulana ◽

Raja Sabarudin ◽

Wahyu Nugraha

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Rule Induction ◽

T Test

AMIK BSI Pontianak is a private higher education institution with a large number of students, but it faces a problem that recurs every year: the number of students who graduate on time versus late. The number of students graduating on time is an indicator of the effectiveness of a higher education institution, whether public or private. Institutions need to detect the behavior of active students so that the factors that cause students not to graduate on time can be identified. This study compares 5 data mining methods to determine which is most optimal for determining whether students graduate on time, using the T-Test as the testing technique. The methods compared are Decision Tree, Naive Bayes, K-NN, Rule Induction, and Random Forest. The results show that the Rule Induction and C4.5 algorithms perform best in determining on-time graduation for diploma students at AMIK BSI Pontianak.
