A hybrid cost-sensitive ensemble for heart disease prediction

10.21203/rs.2.22946/v6 ◽

2021 ◽

Author(s):

Zhenya Qi ◽

Zuoru Zhang

Keyword(s):

Heart Disease ◽

Nearest Neighbor ◽

Statistical Tests ◽

Disease Diagnosis ◽

Medical Decision ◽

Support Vector ◽

K Nearest Neighbor ◽

Misclassification Cost ◽

Promising Alternative ◽

Relief Algorithm

Abstract Background: Heart disease is the leading cause of morbidity and mortality worldwide and covers a wide range of conditions and symptoms. Diagnosis is difficult because many factors must be analyzed, and the cost of misclassification can be very high. Methods: A cost-sensitive ensemble method was proposed to improve diagnostic efficiency and reduce misclassification cost. The ensemble combines five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine, and k-nearest neighbor. T-tests were used to assess whether the ensemble outperformed the individual classifiers and to measure the contribution of the Relief algorithm. Results: Under ten-fold cross-validation, the proposed method achieved the best performance. The statistical tests showed that the ensemble performed significantly better than the individual classifiers and that the Relief algorithm markedly improved classification efficiency. Conclusions: The proposed ensemble achieved significantly better results than the individual classifiers and previous studies, which suggests that it is a promising alternative tool for medical decision making in heart disease diagnosis.
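
The abstract does not say how the ensemble weighs its members or which costs it assumes, so the following is only a minimal sketch of the general idea of cost-sensitive decision making: average the members' disease probabilities and pick the label with the lowest expected misclassification cost. The cost matrix `COST` (a missed diagnosis assumed five times as costly as a false alarm) and all function names are hypothetical, not taken from the paper.

```python
# Hypothetical cost matrix: COST[true_label][predicted_label].
# Assumption: missing a sick patient costs 5x as much as a false alarm.
COST = {
    0: {0: 0.0, 1: 1.0},  # healthy patient
    1: {0: 5.0, 1: 0.0},  # patient with heart disease
}


def average_probability(member_probs):
    """Soft vote: mean of each ensemble member's estimated P(disease)."""
    return sum(member_probs) / len(member_probs)


def min_cost_label(p_disease, cost=COST):
    """Return the label (0 or 1) with the lowest expected misclassification cost."""
    expected = {
        pred: (1.0 - p_disease) * cost[0][pred] + p_disease * cost[1][pred]
        for pred in (0, 1)
    }
    return min(expected, key=expected.get)


def total_cost(y_true, y_pred, cost=COST):
    """Sum the cost of every (true label, predicted label) pair."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


# Five hypothetical member outputs for one patient.
p = average_probability([0.2, 0.3, 0.25, 0.4, 0.1])
label = min_cost_label(p)
```

With these assumed costs the decision threshold moves from 0.5 down to 1/6, so a patient with an averaged probability of about 0.25 is still flagged as positive. A plain majority vote would have labeled the same patient negative.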


A hybrid cost-sensitive ensemble for heart disease prediction

10.21203/rs.2.22946/v4 ◽

2020 ◽

Author(s):

Zhenya Qi ◽

Zuoru Zhang

Keyword(s):

Heart Disease ◽

Nearest Neighbor ◽

Statistical Tests ◽

Disease Diagnosis ◽

Medical Decision ◽

Support Vector ◽

K Nearest Neighbor ◽

Misclassification Cost ◽

Promising Alternative ◽

Relief Algorithm

Abstract Background: Heart disease is the leading cause of morbidity and mortality worldwide and covers a wide range of conditions and symptoms. Diagnosis is difficult because many factors must be analyzed, and the cost of misclassification can be very high. Methods: A cost-sensitive ensemble method was proposed to improve diagnostic efficiency and reduce misclassification cost. The ensemble combines five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine, and k-nearest neighbor. T-tests were used to assess whether the ensemble outperformed the individual classifiers and to measure the contribution of the Relief algorithm. Results: Under ten-fold cross-validation, the proposed method achieved the best performance. The statistical tests showed that the ensemble performed significantly better than the individual classifiers and that the Relief algorithm markedly improved classification efficiency. Conclusions: The proposed ensemble achieved significantly better results than the individual classifiers and previous studies, which suggests that it is a promising alternative tool for medical decision making in heart disease diagnosis.


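Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter [Comparison of the Accuracy and Processing Time of the K-NN and SVM Algorithms in Twitter Sentiment Analysis]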

Jurnal Informatika ◽

10.31311/ji.v6i2.5129 ◽

2019 ◽

Vol 6 (2) ◽

pp. 226-235

Author(s):

Muhammad Rangga Aziz Nasution ◽

Mardhiya Hayaty

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Unsupervised Learning ◽

Supervised Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Fold Cross Validation

Machine learning, a branch of computer science, has become a trend in recent years. Machine learning uses data and algorithms to build models from the patterns in a dataset. It also studies how a trained model can predict outputs based on those patterns. Two kinds of machine learning methods can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two supervised classification algorithms, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment text. The comparison determines which algorithm is better in terms of accuracy and processing time. In terms of accuracy, Support Vector Machine performed better, with 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. In terms of processing time, K-Nearest Neighbor performed better, with 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.
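
As a rough illustration of how accuracy and processing time can be measured together under k-fold cross validation, the sketch below implements a contiguous k-fold splitter and a small k-nearest-neighbor classifier in plain Python. The numeric toy features in `X` and the labels in `y` are hypothetical stand-ins. The study itself works on tweet text, and its feature extraction is not described in the abstract.

```python
import time
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(X, y, n_folds=5, k=3):
    """Return (mean accuracy over the folds, elapsed seconds)."""
    started = time.perf_counter()
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(X), n_folds):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores), time.perf_counter() - started


# Hypothetical two-cluster toy data, classes interleaved so every fold sees both.
X = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.3),
     (4.8, 5.0), (0.3, 0.2), (5.1, 5.3), (0.0, 0.2), (4.9, 5.2)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

mean_accuracy, seconds = cross_validated_accuracy(X, y, n_folds=5, k=3)
```

Cross validation trains and evaluates once per fold, so it is expected to take longer than a single train/test run, which matches the direction of the timings reported above.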


A hybrid cost-sensitive ensemble for heart disease prediction

10.21203/rs.2.22946/v5 ◽

2020 ◽

Author(s):

Zhenya Qi ◽

Zuoru Zhang

Keyword(s):

Heart Disease ◽

Nearest Neighbor ◽

Statistical Tests ◽

Disease Diagnosis ◽

Medical Decision ◽

Support Vector ◽

K Nearest Neighbor ◽

Misclassification Cost ◽

Promising Alternative ◽

Relief Algorithm

Abstract Background: Heart disease is the leading cause of morbidity and mortality worldwide and covers a wide range of conditions and symptoms. Diagnosis is difficult because many factors must be analyzed, and the cost of misclassification can be very high. Methods: A cost-sensitive ensemble method was proposed to improve diagnostic efficiency and reduce misclassification cost. The ensemble combines five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine, and k-nearest neighbor. T-tests were used to assess whether the ensemble outperformed the individual classifiers and to measure the contribution of the Relief algorithm. Results: Under ten-fold cross-validation, the proposed method achieved the best performance. The statistical tests showed that the ensemble performed significantly better than the individual classifiers and that the Relief algorithm markedly improved classification efficiency. Conclusions: The proposed ensemble achieved significantly better results than the individual classifiers and previous studies, which suggests that it is a promising alternative tool for medical decision making in heart disease diagnosis.


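Bayes Classifier dan Support Vector Machine dalam Klasifikasi Judul Karya Akhir Mahasiswa Program Studi PTIK UNJ [Bayes Classifier and Support Vector Machine in Classifying Final Project Titles of Students in the PTIK Study Program at UNJ]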

PINTER Jurnal Pendidikan Teknik Informatika dan Komputer ◽

10.21009/pinter.3.1.9 ◽

2019 ◽

Vol 3 (1) ◽

pp. 54-62

Author(s):

Razi Aziz Syahputro ◽

Widodo ◽

Hamidillah Ajie

Keyword(s):

Support Vector Machine ◽

Cross Validation ◽

Nearest Neighbor ◽

Confusion Matrix ◽

Vector Space Model ◽

Support Vector ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Space Model ◽

Fold Cross Validation

This study was motivated by the need for a classification system to help the Department of Electrical Engineering, in particular the PTIK study program, classify thesis titles by area of specialization. Several existing classification algorithms had to be considered before building the system, so this study selected 3 of the top 10 algorithms identified by ICDM in 2006. Classifying short text documents such as student thesis titles is harder than classifying long documents, because fewer words make classification more difficult. The aim of this study is therefore to determine the most effective algorithm for classifying thesis titles. The study comprised several stages: data collection, grouping of the data through questionnaires completed by expert lecturers, text pre-processing, term weighting using the vector space model and tf-idf, evaluation with k-fold cross validation, classification using k-nearest neighbor, naïve Bayes classifier, and support vector machine, and analysis with a confusion matrix. The experiments used 266 thesis titles from PTIK UNJ students of the 2010-2013 cohorts, with the most recent data coming from thesis defenses in semester 105 (odd semester 2016/2017). Of the algorithms tested, support vector machine was the most efficient, with an accuracy of 82% over 10 trials.
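
The term-weighting step can be sketched as follows. This is one common tf-idf variant (relative term frequency times log(N / document frequency)). The abstract does not say which variant the authors used, and the three example titles are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df).
    Other variants (smoothing, sublinear tf) give different numbers.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical short titles, standing in for the thesis titles in the study.
titles = [
    "web based academic information system",
    "computer network learning media",
    "expert system for computer network diagnosis",
]
title_vectors = tf_idf(titles)
```

Because a title has only a handful of words, each vector has very few non-zero weights, which is one way to see why short texts are harder to classify than long ones.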


Recommender System for Term Deposit Likelihood Prediction using Cross-validated Neural Network

South Asian Journal of Social Studies and Economics ◽

10.9734/sajsse/2021/v11i330286 ◽

2021 ◽

pp. 21-28

Author(s):

Shawni Dutta ◽

Samir Kumar Bandyopadhyay

Keyword(s):

Neural Network ◽

Cross Validation ◽

Nearest Neighbor ◽

Automated System ◽

K Nearest Neighbor ◽

Decision Tree Classifier ◽

Proposed Model ◽

Tree Classifier ◽

Customer Perspective ◽

Fold Cross Validation

Term deposits can benefit both banks and their customers. This paper focuses on the likelihood that a customer subscribes to a term deposit. Bank campaign efforts and customer details both influence that likelihood. The paper presents an automated system that predicts term deposit subscription in advance. A neural network evaluated with stratified 10-fold cross-validation is proposed as the predictive model and is compared with benchmark classifiers: k-Nearest Neighbor (k-NN), decision tree classifier (DT), and multi-layer perceptron classifier (MLP). The experiments showed that the proposed model outperforms the baseline models, with an accuracy of 88.32% and an MSE of 0.1168.
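
The two reported figures sum to exactly 1 (0.8832 + 0.1168). That is what one would expect if the MSE was computed on hard 0/1 class predictions, where every squared error is either 0 or 1 and the MSE equals the misclassification rate. The abstract does not state how the MSE was computed, so this is an assumption. The labels below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical binary labels and hard (0/1) predictions.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)            # 6 of 8 correct
mse = mean_squared_error(y_true, y_pred)  # 2 of 8 wrong, each with squared error 1
```

If the MSE had instead been computed on predicted probabilities, the two numbers would generally not sum to 1.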


A hybrid cost-sensitive ensemble for heart disease prediction

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01436-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Qi Zhenya ◽

Zuoru Zhang

Keyword(s):

Heart Disease ◽

Nearest Neighbor ◽

Statistical Tests ◽

Disease Diagnosis ◽

Medical Decision ◽

Support Vector ◽

K Nearest Neighbor ◽

Misclassification Cost ◽

Promising Alternative ◽

Relief Algorithm

Abstract Background: Heart disease is the leading cause of morbidity and mortality worldwide and covers a wide range of conditions and symptoms. Diagnosis is difficult because many factors must be analyzed, and the cost of misclassification can be very high. Methods: A cost-sensitive ensemble method was proposed to improve diagnostic efficiency and reduce misclassification cost. The ensemble combines five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine, and k-nearest neighbor. T-tests were used to assess whether the ensemble outperformed the individual classifiers and to measure the contribution of the Relief algorithm. Results: Under ten-fold cross-validation, the proposed method achieved the best performance. The statistical tests showed that the ensemble performed significantly better than the individual classifiers and that the Relief algorithm markedly improved classification efficiency. Conclusions: The proposed ensemble achieved significantly better results than the individual classifiers and previous studies, which suggests that it is a promising alternative tool for medical decision making in heart disease diagnosis.


An Improved Coronary Heart Disease Predictive System Using Random Forest

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v11i130253 ◽

2021 ◽

pp. 17-27

Author(s):

Abdulraheem Abdul ◽

Rafiu M. Isiaka ◽

Ronke S. Babatunde ◽

Jumoke F. Ajao

Keyword(s):

Neural Network ◽

Coronary Heart Disease ◽

Feature Selection ◽

Heart Disease ◽

Random Forest ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

Disease Prediction ◽

K Nearest Neighbor

Aims: This work aims to develop an enhanced predictive system for Coronary Heart Disease (CHD). Study Design: Synthetic Minority Oversampling Technique and Random Forest. Methodology: The Framingham heart disease dataset, collected from a study in Framingham, Massachusetts, was used. The data were cleaned, normalized, and rebalanced. Random forest, artificial neural network, naïve Bayes, logistic regression, k-nearest neighbor, and support vector machine classifiers were used for classification. Results: Random Forest outperformed the other classifiers with an accuracy of 98%, a sensitivity of 99% and a precision of 95.8%. Feature selection was applied in an attempt to improve classification, but it produced no significant improvement in classifier performance. A train-test split also performed better than cross-validation. Conclusion: Random Forest is recommended for research in the Coronary Heart Disease prediction domain.
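
The rebalancing step named in the study design can be illustrated with a simplified sketch of the core idea behind the Synthetic Minority Oversampling Technique: each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbors. This is a teaching sketch rather than the implementation used in the study. The function name, its parameters, and the sample points are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbors.

    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the base itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class samples with two normalized features each.
minority_points = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5)]
new_points = smote(minority_points, n_new=6)
```

Oversampling is normally applied to the training portion only. If synthetic points derived from test samples leak into training, measured accuracy can be inflated.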


Studi Komparasi Algoritma Klasifikasi Mental Workload Berdasarkan Sinyal EEG [A Comparative Study of Mental Workload Classification Algorithms Based on EEG Signals]

Jurnal Sistem Cerdas ◽

10.37396/jsc.v3i2.69 ◽

2020 ◽

Vol 3 (2) ◽

pp. 133-143

Author(s):

Dessy Kusumaningrum ◽

Elly Matul Imah

Keyword(s):

Random Forest ◽

Cross Validation ◽

Nearest Neighbor ◽

Mental Workload ◽

Principal Component ◽

Support Vector ◽

Multi Layer Perceptron ◽

K Nearest Neighbor ◽

Electroencephalogram Eeg ◽

Fold Cross Validation

A person's psychological and physical condition can affect thinking. Fatigue can lower productivity and impair thinking, which gives rise to mental workload. Workload must be balanced against a person's abilities and limitations. Excessive mental workload harms the individual because it reduces work productivity. An electroencephalogram (EEG) is a dedicated device that can be used to determine an individual's level of mental workload; it measures electrical potential signals from the brain. The dataset used in this study is STEW: Simultaneous Task EEG Dataset, with 45 subjects. This study compares the Random Forest, K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM) algorithms for classifying mental workload from EEG signals. The comparison aims to determine the best classification algorithm in terms of accuracy and memory usage during classification. The dataset went through several stages, including data pre-processing, feature extraction, and classification. Pre-processing split the data into chunks. Principal Component Analysis (PCA) was applied for feature extraction. Classification used k-fold cross validation. The results show that KNN is the best algorithm in terms of accuracy, Random Forest is the best in terms of model-building time, and MLP is the best in terms of memory usage.


Komparasi Kinerja Algoritma Data Mining pada Dataset Konsumsi Alkohol Siswa [Performance Comparison of Data Mining Algorithms on a Student Alcohol Consumption Dataset]

Khazanah Informatika Jurnal Ilmu Komputer dan Informatika ◽

10.23917/khif.v4i2.7061 ◽

2018 ◽

Vol 4 (2) ◽

pp. 98

Author(s):

Noviyanti Sagala ◽

Hendrik Tampubolon

Keyword(s):

Data Mining ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Gain Ratio ◽

Feature Correlation ◽

Fold Cross Validation

Data mining extracts knowledge from large collections of data. This study aims to apply data mining algorithms and analyze their performance in predicting alcohol consumption among secondary school students and in analyzing the related factors. The stages are data pre-processing, feature selection, classification, and model evaluation. In the pre-processing stage, several features were transformed into a form suitable for classification. Next, the Gain Ratio and Feature Correlation-Based Filter (FCBF) algorithms were used to select relevant and important features for the classification stage. Decision Tree C5.0, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB) were run on the selected feature groups. Model accuracy was evaluated using 10-fold Cross-Validation (CV). The results show that the classification model built with Naïve Bayes achieved the highest accuracy using the 5 best features from Gain Ratio. In addition, feature selection generally improved the performance of all classifiers. Further testing on the same and on different data is needed to gain a deeper understanding of the performance of the algorithms used.
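
Gain Ratio scores a categorical feature by dividing its information gain by its split information (the entropy of the feature's own value distribution). A minimal sketch of that formula follows. The feature values and labels are hypothetical and are not drawn from the student dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of a categorical feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical class labels and three candidate features.
labels = [1, 1, 0, 0]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # independent of the class
constant = ["a", "a", "a", "a"]

scores = {
    "informative": gain_ratio(informative, labels),
    "uninformative": gain_ratio(uninformative, labels),
    "constant": gain_ratio(constant, labels),
}
```

Ranking features by this score and keeping the top few corresponds to the "5 best features from Gain Ratio" step described above. The division by split information penalizes features with many distinct values.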
