Comparison of Tree Method, Support Vector Machine, Naïve Bayes, and Logistic Regression on Coffee Bean Image

Coffee is one of the favorite drinks of Indonesians. Indonesia has two types of coffee, Arabica and Robusta. Coffee beans are usually classified in a traditional way that depends on the human senses. However, human judgment is often inconsistent because it depends on the assessor's mental and physical condition at the time, and it yields only qualitative measures. In this study, coffee beans are classified by digital image processing. Features are extracted by texture analysis using the Gray Level Co-occurrence Matrix (GLCM) method with four features: Energy, Correlation, Homogeneity, and Contrast. Classification is performed with Naïve Bayes, Tree, Support Vector Machine (SVM), and Logistic Regression. The models are evaluated using AUC, F1, classification accuracy (CA), precision, and recall. The dataset consists of 29 images of Arabica coffee beans and 29 images of Robusta beans. Model accuracy is tested with cross-validation, and the results are evaluated using the confusion matrix. Based on testing and evaluation, SVM is the best method, with AUC = 1, CA = 0.983, F1 = 0.983, Precision = 0.983, and Recall = 0.983.
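
The four GLCM texture features named in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the horizontal offset, the symmetric counting, and the exact feature formulas are assumptions (energy is taken as the sum of squared entries and homogeneity uses a squared-difference denominator; some libraries define these differently), and the function names are hypothetical.

```python
from math import sqrt

def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for pixel offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Energy, contrast, homogeneity and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    if sd_i > 0 and sd_j > 0:
        cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
        correlation = cov / (sd_i * sd_j)
    else:
        correlation = 1.0  # convention chosen here for a constant image
    return {"energy": energy, "contrast": contrast,
            "homogeneity": homogeneity, "correlation": correlation}

# Toy 2x2 "image" with two gray levels (made-up data, not from the study).
features = glcm_features(glcm([[0, 1], [1, 0]], levels=2))
```

In a setup like the one described, each image would yield one such four-value feature vector, which is then passed to the classifiers.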

Mental Stress Classification Based on a Support Vector Machine and Naive Bayes Using Electrocardiogram Signals

Sensors ◽

10.3390/s21237916 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7916

Author(s):

Mingu Kang ◽

Siho Shin ◽

Gengjia Zhang ◽

Jaehyo Jung ◽

Youn Tae Kim

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Mental Illnesses ◽

Classification Model ◽

Classification Error ◽

Support Vector ◽

Stress Classification ◽

Ecg Data

Examining mental health is crucial for preventing mental illnesses such as depression. This study presents a method for classifying electrocardiogram (ECG) data into four emotional states according to stress level using a one-against-all support vector machine and a naive Bayes algorithm. The stress classification criteria were determined by calculating the average values of the R-S peak, R-R interval, and Q-T interval of the ECG data to improve the stress classification accuracy. The performance of the stress classification model was evaluated using the confusion matrix, the receiver operating characteristic (ROC) curve, and the minimum classification error. The average accuracy of the stress classification was 97.6%. The proposed model improved the accuracy by 8.7% compared to the previous stress classification algorithm. Quantifying the stress signals experienced by people can facilitate more effective management of their mental state.
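
One of the averaged ECG quantities mentioned above, the R-R interval, can be sketched as follows. The abstract does not say how the R peaks are detected or how the averages are computed, so this assumes the R-peak timestamps (in seconds) are already available; the function name and sample values are hypothetical.

```python
def mean_rr_interval(r_peak_times):
    """Average time between consecutive R peaks, in the unit of the input."""
    intervals = [b - a for a, b in zip(r_peak_times, r_peak_times[1:])]
    if not intervals:
        raise ValueError("need at least two R peaks")
    return sum(intervals) / len(intervals)

# Made-up R-peak timestamps in seconds, for illustration only.
peaks = [0.0, 0.8, 1.6, 2.4]
mean_rr = mean_rr_interval(peaks)
heart_rate_bpm = 60.0 / mean_rr  # beats per minute implied by the mean interval
```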

A Rough Set and Cellular Genetic Fusion Algorithm for Acute Critical Disease Prediction

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.6.3894 ◽

2020 ◽

Vol 15 (6) ◽

Author(s):

Hongxin Wang ◽

Lijing Jia ◽

Heng Zhuang ◽

Xueyan Li ◽

Yuzhuo Zhao ◽

...

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Logistic Regression ◽

Rough Set ◽

Rough Set Theory ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Disease Prediction ◽

Fusion Algorithm

This study addresses the problems of an overly broad set of medical indicators, a lack of retrospective research samples, insufficient depth of data mining, and low disease prediction accuracy. We propose an intelligent screening algorithm that combines a genetic algorithm, cellular automata, and rough set theory. This algorithm can achieve high accuracy in predicting patient outcomes with a small number of indicators, and we compare it with the traditional genetic algorithm. We built the prediction model with 64 indicators based on logistic regression (AUC 0.8628), support vector machine (AUC 0.5319), Naïve Bayes (AUC 0.7102), and AdaBoost (AUC 0.9095). Using the cellular genetic algorithm for attribute screening not only effectively reduces the number of indicators but also achieves almost the same prediction accuracy with 8 indicators based on logistic regression (AUC 0.8782), support vector machine (AUC 0.8525), Naïve Bayes (AUC 0.8408), and AdaBoost (AUC 0.8770). Compared with the traditional scoring system, the predictive model established in this paper can more accurately predict rebleeding events based on physiological test indicators and continuous patient indicators.
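
The AUC values quoted above compare models by how well their scores rank positive cases above negative ones. A minimal sketch of that computation, using the pairwise (Mann-Whitney) formulation, is shown here; the labels and scores are made up, and the function name is hypothetical rather than anything used in the paper.

```python
def auc(labels, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative data only: 1 = event occurred, 0 = no event.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC near 0.5, such as the 0.5319 reported for the 64-indicator support vector machine, means the ranking is close to chance.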

Model Prediksi Prestasi Mahasiswa Berdasarkan Evaluasi Pembelajaran Menggunakan Pendekatan Data Science

Data Sciences Indonesia (DSI) ◽

10.47709/dsi.v1i1.1168 ◽

2021 ◽

Vol 1 (1) ◽

pp. 14-20

Author(s):

Tommy Tommy ◽

Amir Mahmud Husein

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Data Science ◽

Naive Bayes ◽

Nearest Neighbors ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbors

A university is a unit that provides higher education as the continuation of secondary education in the formal education track. Learning achievement is one of the aspects used to assess a university's success in the learning process. This paper presents an analysis of the relationship between teaching and student achievement, with the stages carried out using a data science approach. Based on the data analysis, there are three important indicators in assessing learning achievement: pedagogy, professionalism, and personality. These three features are used as dependent variables to predict learning achievement. The DecisionTree algorithm achieves better accuracy than the k-nearest neighbors (KNN), Logistic Regression, Support Vector Machine, and Naive Bayes models, with an accuracy of 68%, followed by KNN with 66% and the others with 55% each.

Movie Success Prediction Using Naïve Bayes, Logistic Regression and Support Vector Machine

10.1109/icrito51393.2021.9596138 ◽

2021 ◽

Author(s):

Rachaell Nihalaani ◽

Apoorva Shete ◽

Darakshan Khan

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Naive Bayes ◽

Naïve Bayes ◽

Success Prediction ◽

Support Vector ◽

Movie Success

Pemodelan Prediksi Status Keberlanjutan Polis Asuransi Kendaraan dengan Teknik Pemilihan Mayoritas Menggunakan Algoritma-Algoritma Klasifikasi Data Mining

Prosiding Seminar Nasional Teknoka ◽

10.22236/teknoka.v5i.391 ◽

2020 ◽

Vol 5 ◽

pp. 19-24

Author(s):

Dyah Retno Utari ◽

Arief Wibowo

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Majority Voting ◽

Support Vector ◽

F Measure

Motor vehicle insurance is a type of business that covers losses or risks of damage that may arise from various potential events affecting a vehicle. Competition in the insurance business, particularly for motor vehicles, demands innovation and strategy to ensure business continuity. One effort a company can make is to predict the continuation status of vehicle insurance policies by analyzing customer profile and transaction data. Predicting policyholders' decisions is very important for the company because it can determine marketing strategies that influence customers' decisions to renew their policies. This study proposes a model for predicting the continuation status of vehicle insurance policies with a majority voting technique applied to the classification results of data mining algorithms such as Naive Bayes, Support Vector Machine, and Decision Tree. Testing with a confusion matrix shows a best accuracy of 93.57%, with precision reaching 97.20%, recall of 95.20%, and an F-Measure of 95.30%. The best model evaluation scores were produced by the majority voting approach, which outperformed prediction models based on a single classifier.
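
The majority voting idea in this abstract can be sketched as follows: each base classifier predicts a label per policy, and the most frequent label wins. This is only an illustration under assumptions, not the authors' implementation; the label names "renew" and "lapse" and the prediction lists are invented, and with an odd number of voters on a two-class problem no ties can occur.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one list of voted labels.

    `predictions` holds one list of labels per classifier, all of equal length.
    """
    voted = []
    for labels in zip(*predictions):  # labels for one sample across classifiers
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Hypothetical outputs of three base classifiers for three policies.
naive_bayes = ["renew", "lapse", "renew"]
svm = ["renew", "renew", "lapse"]
tree = ["lapse", "renew", "lapse"]
combined = majority_vote([naive_bayes, svm, tree])
```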

KOMPARASI METODE KLASIFIKASI PADA ANALISIS SENTIMEN USAHA WARALABA BERDASARKAN DATA TWITTER

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.752 ◽

2019 ◽

Vol 15 (2) ◽

pp. 267-274

Author(s):

Tati Mardiana ◽

Hafiz Syahreva ◽

Tuslaela Tuslaela

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor

Franchise businesses in Indonesia are currently relatively attractive. However, many business owners also fail. Someone who wants to start a business needs to consider public sentiment toward franchise businesses. Nevertheless, sentiment analysis is not easy because the conversations about franchise businesses on Twitter are numerous and unstructured. The purpose of this study is to compare the accuracy of the Neural Network, K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, and Decision Tree methods in extracting attributes from documents or texts containing comments in order to identify the expressions in them and classify them as positive or negative comments. This study uses real-time data from tweets on Twitter. The data is then processed by first cleaning it of noise using Python. Testing with a confusion matrix gave accuracies of 83% for Neural Network, 52% for K-Nearest Neighbor, 83% for Support Vector Machine, and 81% for Decision Tree. This study shows that Support Vector Machine and Neural Network are the best methods for classifying positive and negative comments about franchise businesses.
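
The abstract does not list the noise-cleaning steps applied to the tweets. A minimal sketch of what such cleaning might look like is given below, assuming that URLs, mentions, hashtags, digits, and punctuation count as noise; the function name and the sample tweet are hypothetical.

```python
import re

def clean_tweet(text):
    """Strip common tweet noise and return lowercase, space-separated words."""
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"[@#]\w+", " ", text)        # remove mentions and hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # remove digits and punctuation
    return " ".join(text.lower().split())       # normalize case and whitespace

# Made-up tweet for illustration.
cleaned = clean_tweet("RT @user: Franchise ini bagus!! https://t.co/abc #waralaba")
```

Dropping hashtags entirely is a design choice; keeping the hashtag word while removing only the `#` sign is an equally plausible alternative.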

Cavity auto-detection using machine learning algorithms: Logistic regression, support vector machine, and naïve Bayes

10.1190/iceg2019-066.1 ◽

2020 ◽

Author(s):

Hakim Saibi* ◽

Abdelkader Nasreddine Belkacem ◽

Mohamed Amrouche

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector

A Novel Machine Learning Algorithm Predicts Dementia With Lewy Bodies Versus Parkinson’s Disease Dementia Based on Clinical and Neuropsychological Scores

Journal of Geriatric Psychiatry and Neurology ◽

10.1177/0891988721993556 ◽

2021 ◽

pp. 089198872199355

Author(s):

Anastasia Bougea ◽

Efthymia Efthymiopoulou ◽

Ioanna Spanou ◽

Panagiotis Zikos

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Neuropsychological Tests ◽

Naive Bayes ◽

Learning Algorithm ◽

Naïve Bayes ◽

Classification Model ◽

Support Vector ◽

Ensemble Model ◽

Machine Learning Algorithm

Objective: Our aim was to develop a machine learning algorithm, based only on predictors that can be collected non-invasively in the clinic, for the accurate diagnosis of these disorders. Methods: This is an ongoing prospective cohort study (ClinicalTrials.gov identifier NCT04448340) of 78 PDD and 62 DLB subjects whose diagnostic follow-up is available for at least 3 years after the baseline assessment. We used predictors such as clinico-demographic characteristics and 6 neuropsychological tests (mini mental, PD Cognitive Rating Scale, Brief Visuospatial Memory test, Symbol digit written, Wechsler adult intelligence scale, trail making A and B). We investigated logistic regression, K-Nearest Neighbors (K-NN), Support Vector Machine (SVM), Naïve Bayes classifier, and Ensemble Model for their ability to successfully predict PDD or DLB diagnosis. Results: The K-NN classification model had an accuracy of 91.2% over all cases based on the 15 best clinical and cognitive scores, achieving 96.42% sensitivity and 81% specificity in discriminating between DLB and PDD. The binomial logistic regression classification model achieved an accuracy of 87.5% based on the 15 best features, showing 93.93% sensitivity and 87% specificity. The SVM classification model had an accuracy of 84.6% over all cases based on the 15 best features, achieving 90.62% sensitivity and 78.58% specificity. A model based on Naïve Bayes classification had 82.05% accuracy, 93.10% sensitivity and 74.41% specificity. Finally, an Ensemble model, synthesized from the individual ones, achieved 89.74% accuracy, 93.75% sensitivity and 85.73% specificity. Conclusion: The machine learning methods predicted PDD or DLB diagnosis with high accuracy, sensitivity and specificity based on non-invasive, easily collected in-clinic and neuropsychological tests.
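
The accuracy, sensitivity, and specificity figures reported above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts are invented for illustration and are not taken from the study, and the function name is hypothetical.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # share of true positives that are found
        "specificity": tn / (tn + fp),   # share of true negatives that are found
    }

# Made-up counts: 50 positive and 50 negative cases.
metrics = binary_metrics(tp=45, fn=5, tn=40, fp=10)
```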

Comparison of Support Vector Machine, Naïve Bayes and Logistic Regression for Assessing the Necessity for Coronary Angiography

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17186449 ◽

2020 ◽

Vol 17 (18) ◽

pp. 6449

Author(s):

Parastoo Golpour ◽

Majid Ghayour-Mobarhan ◽

Azadeh Saki ◽

Habibollah Esmaily ◽

Ali Taghipour ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Support Vector Machine ◽

Logistic Regression ◽

Coronary Angiography ◽

Naive Bayes ◽

Area Under The Curve ◽

Naïve Bayes ◽

Support Vector ◽

Bayes Model

(1) Background: Coronary angiography is considered to be the most reliable method for the diagnosis of cardiovascular disease. However, angiography is an invasive procedure that carries a risk of complications; hence, it would be preferable for an appropriate method to be applied to determine the necessity for angiography. The objective of this study was to compare support vector machine, naïve Bayes and logistic regression to determine the diagnostic factors that can predict the need for coronary angiography. These models are machine learning algorithms. Machine learning is considered to be a branch of artificial intelligence. Its aims are to design and develop algorithms that allow computers to improve their performance on data analysis and decision making. The process involves the analysis of past experiences to find practical and helpful regularities and patterns, which may be overlooked by a human. (2) Materials and Methods: This cross-sectional study was performed on 1187 candidates for angiography referred to Ghaem Hospital, Mashhad, Iran from 2011 to 2012. A logistic regression, naive Bayes and support vector machine were applied to determine whether they could predict the results of angiography. Afterwards, the sensitivity, specificity, positive and negative predictive values, AUC (area under the curve) and accuracy of all three models were computed in order to compare them. All analyses were performed using R 3.4.3 software (R Core Team; Auckland, New Zealand) with the help of other software packages including receiver operating characteristic (ROC), caret, e1071 and rminer. (3) Results: The area under the curve for logistic regression, naïve Bayes and support vector machine were similar: 0.76, 0.74 and 0.75, respectively. Thus, in terms of model parsimony and simplicity of application, the naïve Bayes model with three variables had the best performance in comparison with the logistic regression model with seven variables and the support vector machine with six variables. (4) Conclusions: Gender, age and fasting blood glucose (FBG) were found to be the most important factors to predict the result of coronary angiography. The naïve Bayes model performed well using these three variables alone, and they are considered important variables for the other two models as well. Given their acceptable predictive performance, the models can be used as pragmatic, cost-effective and valuable methods that support physicians in decision making.

KOMPARASI ALGORITMA NAIVE BAYES DAN SUPPORT VECTOR MACHINE UNTUK ANALISA SENTIMEN REVIEW FILM

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v14i2.918 ◽

2018 ◽

Vol 14 (2) ◽

pp. 175

Author(s):

Elly Indrayuni

Keyword(s):

Support Vector Machine ◽

Support Vector Machines ◽

Cross Validation ◽

Opinion Mining ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Support Vector ◽

Vector Machines ◽

Fold Cross Validation

Film is a subject of interest to a large number of people across social network communities, whose opinions or sentiments differ significantly. Sentiment analysis, or opinion mining, is one solution to the problem of automatically grouping opinions or reviews into positive or negative opinions. The techniques used in this study are Naive Bayes and Support Vector Machines (SVM). Naive Bayes has the advantages of being simple, fast, and highly accurate, while SVM can identify the separating hyperplane that maximizes the margin between two different classes. The sentiment classification results in this study consist of two class labels, positive and negative. The resulting accuracy serves as the benchmark for finding the best model for sentiment classification. Evaluation was carried out using 10-fold cross-validation. Accuracy was measured with the confusion matrix and the ROC curve. The results show an accuracy of 84.50% for the Naive Bayes algorithm, while the Support Vector Machine (SVM) algorithm achieved a higher accuracy of 90.00%.
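
The 10-fold cross-validation used for evaluation here splits the data into ten parts, with each part serving once as the test set while the other nine are used for training. A minimal sketch of the index bookkeeping follows; it assumes contiguous, unshuffled folds for simplicity (in practice the data is usually shuffled or stratified first), and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

# 25 hypothetical reviews split into 10 folds.
splits = k_fold_indices(25, k=10)
```

The reported accuracy would then typically be the average of the accuracies measured on the ten test folds.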
