A Rough Set and Cellular Genetic Fusion Algorithm for Acute Critical Disease Prediction

This study addresses four problems in acute critical disease prediction: an overly broad set of medical indicators, a lack of retrospective research samples, insufficient depth of data mining, and low prediction accuracy. We propose an intelligent screening algorithm that combines a genetic algorithm, cellular automata, and rough set theory, and we compare it with the traditional genetic algorithm. The algorithm achieves high accuracy in predicting patient outcomes from a small number of indicators. Using all 64 indicators, we built prediction models based on logistic regression (AUC 0.8628), support vector machine (AUC 0.5319), Naïve Bayes (AUC 0.7102), and AdaBoost (AUC 0.9095). Attribute screening with the cellular genetic algorithm reduced the number of indicators to 8 while keeping prediction accuracy at a similar or higher level: logistic regression (AUC 0.8782), support vector machine (AUC 0.8525), Naïve Bayes (AUC 0.8408), and AdaBoost (AUC 0.8770). Compared with the traditional scoring system, the predictive model established in this paper predicts rebleeding events more accurately from physiological test indicators and continuous patient indicators.
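The abstract above does not spell out how the cellular genetic algorithm is constructed, so the following is only a minimal sketch of the general idea, under stated assumptions: candidate indicator subsets are bitmasks placed on a toroidal grid, each cell mates with its fittest von Neumann neighbour, and a child replaces its parent only when it is at least as fit. The function `cellular_ga`, the grid size, and the `toy_fitness` function (a stand-in for a cross-validated AUC) are hypothetical, and the rough set component is not modelled.

```python
import random


def cellular_ga(fitness, n_features, rows=4, cols=4, generations=30,
                mutation_rate=0.05, seed=0):
    """Illustrative cellular GA over feature bitmasks on a toroidal grid."""
    rng = random.Random(seed)
    grid = [[[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(cols)] for _ in range(rows)]
    history = []
    for _ in range(generations):
        new_grid = []
        for r in range(rows):
            new_row = []
            for c in range(cols):
                current = grid[r][c]
                # Von Neumann neighbourhood with wrap-around edges.
                neighbours = [grid[(r - 1) % rows][c], grid[(r + 1) % rows][c],
                              grid[r][(c - 1) % cols], grid[r][(c + 1) % cols]]
                mate = max(neighbours, key=fitness)
                # Uniform crossover followed by bit-flip mutation.
                child = [rng.choice(pair) for pair in zip(current, mate)]
                child = [1 - bit if rng.random() < mutation_rate else bit
                         for bit in child]
                # Keep the child only if it is at least as fit as the current cell.
                keep_child = fitness(child) >= fitness(current)
                new_row.append(child if keep_child else current)
            new_grid.append(new_row)
        grid = new_grid
        history.append(max(fitness(ind) for row in grid for ind in row))
    best = max((ind for row in grid for ind in row), key=fitness)
    return best, history


# Hypothetical toy fitness: features 0-2 are "informative" and every selected
# feature costs a small penalty. A real study would plug in a model score here.
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask):
    gain = sum(1.0 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return gain - 0.1 * sum(mask)


best_mask, best_history = cellular_ga(toy_fitness, n_features=12)
print(best_mask, best_history[-1])
```

Because a cell is never replaced by a less fit child, the best fitness on the grid cannot decrease from one generation to the next. The local neighbourhood is what distinguishes this from a traditional genetic algorithm, where any two individuals in the population may mate.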


A Hybrid System to Improve the Performance of Diabetes Disease Prediction using Genetic Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2020, Vol 9 (2), pp. 1720-1726

DOI: 10.35940/ijitee.b7374.129219

Keyword(s): Genetic Algorithm, Support Vector Machine, Mortality Rate, Decision Tree, Prediction Model, Naïve Bayes, Medical Science, Support Vector, Disease Prediction

Data mining now plays a significant role in healthcare. It helps extract hidden patterns from clinical datasets for further analysis, and it can be used to build tools that support medical management systems. Among life-threatening diseases, diabetes mellitus is regarded as a serious disease worldwide, and because of its mortality rate, early prediction and diagnosis are very important. Considerable research is under way to reduce both the complications caused by diabetes and its mortality rate. Medical science needs to analyze enormous quantities of clinical data for diagnosis using machine learning techniques. However, disease datasets may contain insignificant and irrelevant features, which lead to less accurate results. The aim of this paper is to analyze existing prediction systems and to develop a hybrid disease prediction model that uses a Genetic Algorithm with Naïve Bayes, Decision Tree, and Support Vector Machine classifiers for better accuracy. The proposed diabetes prediction model achieves accuracies of 0.8182, 0.8052, and 0.8312 with the Naïve Bayes, Decision Tree, and Support Vector Machine classifiers, respectively. The experimental results show that the Support Vector Machine provides higher accuracy than the other classifiers in all cases. The Pima Indian diabetes dataset is used to construct the proposed model.


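A Student Achievement Prediction Model Based on Learning Evaluation Using a Data Science Approach (Model Prediksi Prestasi Mahasiswa Berdasarkan Evaluasi Pembelajaran Menggunakan Pendekatan Data Science)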

Data Sciences Indonesia (DSI), 2021, Vol 1 (1), pp. 14-20

DOI: 10.47709/dsi.v1i1.1168

Author(s): Tommy Tommy, Amir Mahmud Husein

Keyword(s): Support Vector Machine, Logistic Regression, Data Science, Naïve Bayes, Nearest Neighbors, Support Vector, K Nearest Neighbors

A university is an institution that provides higher education as the continuation of secondary education in the formal education track. Learning achievement is one of the aspects by which a university's success in the learning process is assessed. This paper presents an analysis of the relationship between teaching and student achievement, carried out using a data science approach. The data analysis identifies three important indicators for assessing learning achievement: pedagogy, professionalism, and personality. These three features are used as variables to predict learning achievement. The Decision Tree algorithm achieved better accuracy (68%) than the k-nearest neighbors (KNN), Logistic Regression, Support Vector Machine, and Naive Bayes models; KNN followed with 66%, and each of the other proposed algorithms reached 55%.


Movie Success Prediction Using Naïve Bayes, Logistic Regression and Support Vector Machine

2021

DOI: 10.1109/icrito51393.2021.9596138

Author(s): Rachaell Nihalaani, Apoorva Shete, Darakshan Khan

Keyword(s): Support Vector Machine, Logistic Regression, Naïve Bayes, Success Prediction, Support Vector, Movie Success


Comparison of Tree Method, Support Vector Machine, Naïve Bayes, and Logistic Regression on Coffee Bean Image

EMITTER International Journal of Engineering Technology, 2021, Vol 9 (1), pp. 126-136

DOI: 10.24003/emitter.v9i1.536

Author(s): Rahmat Robi Waliyansyah, Umar Hafidz Asy'ari Hasbullah

Keyword(s): Support Vector Machine, Logistic Regression, Naïve Bayes, Confusion Matrix, Classification Model, Support Vector, Coffee Bean, Coffee Beans

Coffee is one of Indonesians' favorite drinks. Indonesia has two types of coffee, Arabica and Robusta. Coffee beans are usually classified in a traditional way that depends on the human senses. However, the human senses are often inconsistent, because they depend on the person's mental or physical condition at the time, and they yield only qualitative measures. In this study, coffee beans are classified by digital image processing. The parameters used come from texture analysis with the Gray Level Co-occurrence Matrix (GLCM) method, using four features: Energy, Correlation, Homogeneity, and Contrast. The extracted features are classified with four algorithms: Naïve Bayes, Tree, Support Vector Machine (SVM), and Logistic Regression. The classification models are evaluated using AUC, F1, CA, precision, and recall. The dataset consists of 29 images of Arabica coffee beans and 29 images of Robusta beans. Model accuracy is tested with cross-validation, and the results are evaluated using a confusion matrix. Based on the testing and evaluation, the SVM method performs best, with AUC = 1, CA = 0.983, F1 = 0.983, Precision = 0.983, and Recall = 0.983.
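As a rough illustration of the four GLCM texture features named in this abstract, the sketch below builds a symmetric, normalised co-occurrence matrix for one pixel offset and derives the features from it. The function name `glcm_features`, the single horizontal offset, and the tiny test images are assumptions made for illustration; the paper's actual offsets, grey-level quantisation, and tooling are not stated. Energy is computed here as the sum of squared matrix entries (the angular second moment); some libraries report its square root instead.

```python
import math


def glcm_features(image, levels, dr=0, dc=1):
    """Texture features from a symmetric, normalised GLCM for offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    p = [[v / total for v in row] for row in counts]
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        # A perfectly uniform texture has zero variance; report 1.0 by convention.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0,
    }


checker = glcm_features([[0, 1], [1, 0]], levels=2)
flat = glcm_features([[3, 3], [3, 3]], levels=4)
print(checker)
print(flat)
```

The resulting four-number feature vector per image is what a classifier such as an SVM or Naïve Bayes model would then consume.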


Cavity auto-detection using machine learning algorithms: Logistic regression, support vector machine, and naïve Bayes

2020

DOI: 10.1190/iceg2019-066.1

Author(s): Hakim Saibi*, Abdelkader Nasreddine Belkacem, Mohamed Amrouche

Keyword(s): Machine Learning, Support Vector Machine, Logistic Regression, Naïve Bayes, Learning Algorithms, Machine Learning Algorithms, Support Vector


Boosting Accuracy of Classical Machine Learning Antispam Classifiers in Real Scenarios by Applying Rough Set Theory

Scientific Programming, 2016, Vol 2016, pp. 1-10 (Cited By ~ 2)

DOI: 10.1155/2016/5945192

Author(s): N. Pérez-Díaz, D. Ruano-Ordás, F. Fdez-Riverola, J. R. Méndez

Keyword(s): Machine Learning, Set Theory, Rough Set, Rough Set Theory, Naïve Bayes, Support Vector, Wide Range, Vector Machines, Bayes Algorithm

Spam is now a major obstacle to benefiting from the wide range of Internet-based forms of communication. Despite the existence of several well-known intelligent techniques for fighting spam, only some specific implementations of the Naïve Bayes algorithm are actually used in real environments, for performance reasons. Since some of these algorithms suffer from a large number of false positive errors, in this work we propose a rough set postprocessing approach that can significantly improve their accuracy. To demonstrate the advantages of the proposed method, we carried out a straightforward study based on a publicly available standard corpus (SpamAssassin), which compares the performance of previously successful, well-known antispam classifiers (i.e., Support Vector Machines, AdaBoost, Flexible Bayes, and Naïve Bayes) with and without our technique. The results clearly demonstrate the suitability of our rough set postprocessing approach for increasing the accuracy of previously successful antispam classifiers when working in real scenarios.


Comparison of Support Vector Machine, Naïve Bayes and Logistic Regression for Assessing the Necessity for Coronary Angiography

International Journal of Environmental Research and Public Health, 2020, Vol 17 (18), pp. 6449

DOI: 10.3390/ijerph17186449

Author(s): Parastoo Golpour, Majid Ghayour-Mobarhan, Azadeh Saki, Habibollah Esmaily, Ali Taghipour, ...

Keyword(s): Machine Learning, Decision Making, Support Vector Machine, Logistic Regression, Coronary Angiography, Naïve Bayes, Area Under The Curve, Support Vector, Bayes Model

(1) Background: Coronary angiography is considered to be the most reliable method for the diagnosis of cardiovascular disease. However, angiography is an invasive procedure that carries a risk of complications; hence, it would be preferable to apply an appropriate method to determine the necessity for angiography. The objective of this study was to compare support vector machine, naïve Bayes and logistic regression to determine the diagnostic factors that can predict the need for coronary angiography. These models are machine learning algorithms. Machine learning is considered to be a branch of artificial intelligence. Its aim is to design and develop algorithms that allow computers to improve their performance on data analysis and decision making. The process involves the analysis of past experiences to find practical and helpful regularities and patterns, which may otherwise be overlooked by a human. (2) Materials and Methods: This cross-sectional study was performed on 1187 candidates for angiography referred to Ghaem Hospital, Mashhad, Iran from 2011 to 2012. Logistic regression, naïve Bayes and support vector machine models were applied to determine whether they could predict the results of angiography. Afterwards, the sensitivity, specificity, positive and negative predictive values, AUC (area under the curve) and accuracy of all three models were computed in order to compare them. All analyses were performed using R 3.4.3 software (R Core Team; Auckland, New Zealand) with the help of other software packages including receiver operating characteristic (ROC), caret, e1071 and rminer. (3) Results: The area under the curve for logistic regression, naïve Bayes and support vector machine were similar: 0.76, 0.74 and 0.75, respectively. Thus, in terms of model parsimony and simplicity of application, the naïve Bayes model with three variables had the best performance in comparison with the logistic regression model with seven variables and the support vector machine with six variables. (4) Conclusions: Gender, age and fasting blood glucose (FBG) were found to be the most important factors for predicting the result of coronary angiography. The naïve Bayes model performed well using these three variables alone, and they are considered important variables for the other two models as well. Given the acceptable predictions of the models, they can be used as pragmatic, cost-effective and valuable methods that support physicians in decision making.
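The comparison metrics listed in the Materials and Methods (sensitivity, specificity, positive and negative predictive values, and accuracy) all follow from the four cells of a binary confusion matrix. The sketch below shows that arithmetic in Python rather than the R packages the study used; the function name `diagnostic_metrics` and the ten toy labels are made up for illustration and are not the study's data.

```python
def diagnostic_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Toy example: 4 positive and 6 negative cases, one error of each kind.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the positive class is in the sample.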


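Comparison of the Support Vector Machine and Naïve Bayes Algorithms with a Genetic Algorithm for Sentiment Analysis of the 2018-2023 West Java Gubernatorial Candidates (Komparasi Algoritma Support Vector Machine Dan Naïve Bayes Dengan Algotima Genetika Pada Analisis Sentimen Calon Gubernur Jabar 2018-2023)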

Jurnal Teknik Komputer, 2020, Vol 6 (1), pp. 121-129

DOI: 10.31294/jtk.v6i1.6866

Author(s): Deni Gunawan, Dwiza Riana, Dian Ardiansyah, Fajar Akbar

Keyword(s): Genetic Algorithm, Support Vector Machine, Naïve Bayes, Support Vector

Abstract: Political contestation determines who will lead at the provincial level, in this case as governor of West Java for 2018-2023. People who share their opinions as tweets on the social media platform Twitter signal whether or not they support a candidate, so sentiment analysis of the gubernatorial candidates is needed to gauge the level of public trust and the image formed of each candidate for Governor of West Java 2018-2023. However, reading every tweet on Twitter related to each candidate is time-consuming and makes decision making confusing. Sentiment classification addresses problems concerning opinions, views, emotions, and behavior through computational study. The classification methods discussed in this study are the Naïve Bayes and Support Vector Machine algorithms. Feature selection determines the resulting accuracy, so a Genetic Algorithm is used for feature selection to improve the classification accuracy of Support Vector Machine and Naive Bayes. The outcome of this study is a classification of the candidates' tweets as negative or positive text. On the imbalanced dataset, Support Vector Machine achieved an average accuracy of 92.61% with an AUC of 0.950; Naive Bayes achieved an average accuracy of 93.29% with an AUC of 0.525; Support Vector Machine with the Genetic Algorithm achieved an average accuracy of 93.03% with an AUC of 0.869; and Naive Bayes with the Genetic Algorithm achieved an average accuracy of 92.85% with an AUC of 0.543. These results show that Support Vector Machine can be used to build a positive/negative tweet classifier with a high level of accuracy. The novelty of this study is that Support Vector Machine can be used to detect tweets in the authors' Indonesian-language Twitter dataset.
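The gap reported here between accuracy (above 92% for every model) and AUC (as low as 0.525) is typical of imbalanced data. The sketch below illustrates the effect with a toy case, not the paper's tweets: a scorer that gives every item the same score reaches high accuracy simply by favouring the majority class, yet ranks positives no better than chance. The helper names `roc_auc` and `accuracy` and the 9-to-1 class split are assumptions for illustration.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Imbalanced toy labels: 9 positive items and 1 negative item.
y_true = [1] * 9 + [0]

# A scorer that gives every item the same score, thresholded at 0.5.
constant_scores = [0.9] * 10
constant_pred = [1 if s >= 0.5 else 0 for s in constant_scores]

# A scorer that ranks every positive above the negative.
ranked_scores = [0.8] * 9 + [0.2]

print(accuracy(y_true, constant_pred), roc_auc(y_true, constant_scores))
print(roc_auc(y_true, ranked_scores))
```

This is why AUC is usually reported alongside accuracy when one class dominates the dataset.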


Shallow Landslide Susceptibility Mapping: A Comparison between Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine Algorithms

International Journal of Environmental Research and Public Health, 2020, Vol 17 (8), pp. 2749 (Cited By ~ 18)

DOI: 10.3390/ijerph17082749

Author(s): Viet-Ha Nhu, Ataollah Shirzadi, Himan Shahabi, Sushant K. Singh, Nadhir Al-Ansari, ...

Keyword(s): Support Vector Machine, Logistic Regression, Landslide Susceptibility, Logistic Model, Naïve Bayes, Shallow Landslide, Support Vector, Model Tree, Logistic Model Tree

Shallow landslides damage buildings and other infrastructure, disrupt agricultural practices, and can cause social upheaval and loss of life. As a result, many scientists study the phenomenon, and some of them have focused on producing landslide susceptibility maps that can be used by land-use managers to reduce injury and damage. This paper contributes to this effort by comparing the power and effectiveness of five benchmark machine learning algorithms (Logistic Model Tree, Logistic Regression, Naïve Bayes Tree, Artificial Neural Network, and Support Vector Machine) in creating a reliable shallow landslide susceptibility map for Bijar City in Kurdistan province, Iran. Twenty conditioning factors were applied to 111 shallow landslides and tested using the One-R attribute evaluation (ORAE) technique for modeling and validation processes. The performance of the models was assessed by statistical indexes including sensitivity, specificity, accuracy, mean absolute error (MAE), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). Results indicate that all five machine learning models performed well for shallow landslide susceptibility assessment, but the Logistic Model Tree model (AUC = 0.932) had the highest goodness-of-fit and prediction accuracy, followed by the Logistic Regression (AUC = 0.932), Naïve Bayes Tree (AUC = 0.864), ANN (AUC = 0.860), and Support Vector Machine (AUC = 0.834) models. Therefore, we recommend the use of the Logistic Model Tree model in shallow landslide mapping programs in semi-arid regions to help decision makers, planners, land-use managers, and government agencies mitigate the hazard and risk.


SMS Spam Classification Using Support Vector Machine (KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE)

Jurnal Pilar Nusa Mandiri, 2019, Vol 15 (2), pp. 275-280

DOI: 10.33480/pilar.v15i2.693

Author(s): Agus Setiyono, Hilman F Pardede

Keyword(s): Data Mining, Support Vector Machine, Decision Tree, Naïve Bayes, Support Vector, Spam Detection, Support Vector Machine Algorithm, Data Mining Techniques

It is now common for cellphones to receive spam messages. The large number of received messages makes it difficult for a person to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that the Support Vector Machine is the best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
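Of the three techniques compared in this abstract, Multinomial Naïve Bayes is compact enough to sketch in a few lines. The version below uses word counts with Laplace smoothing and log probabilities; the function names `train_multinomial_nb` and `predict`, the whitespace tokenisation, and the four toy messages are assumptions for illustration and do not reproduce the paper's dataset or preprocessing.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit per-class log priors and Laplace-smoothed word log-likelihoods."""
    vocab = {w for d in docs for w in d.split()}
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[c] / len(docs))
        log_lik = {w: math.log((word_counts[c][w] + alpha) / total)
                   for w in vocab}
        model[c] = (log_prior, log_lik)
    return model


def predict(model, doc):
    """Return the class with the highest posterior log score."""
    def score(c):
        log_prior, log_lik = model[c]
        # Words never seen in training are ignored.
        return log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
    return max(model, key=score)


# Hypothetical, lower-cased toy messages.
docs = ["win cash prize now", "free prize claim now",
        "are we meeting today", "see you at lunch today"]
labels = ["spam", "spam", "ham", "ham"]
nb_model = train_multinomial_nb(docs, labels)

print(predict(nb_model, "claim free cash"))
print(predict(nb_model, "lunch meeting today"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.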
