Studi Komparasi Algoritma Klasifikasi Mental Workload Berdasarkan Sinyal EEG

A person's psychological and physical condition can affect the thinking process. Fatigue can lower productivity and impair thinking, which gives rise to mental workload. Workload must be balanced against an individual's abilities and limitations. Excessive mental workload harms the individual because it reduces work productivity. The electroencephalogram (EEG) is a dedicated instrument that can be used to determine an individual's level of mental workload; it measures electrical potential signals from the brain. The dataset used in this study is STEW: Simultaneous Task EEG Dataset, with 45 subjects. This study compares the Random Forest, K-Nearest Neighbor (KNN), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM) algorithms for classifying mental workload from EEG signals. The comparison aims to determine the best classification algorithm in terms of accuracy and memory usage during classification. The dataset passed through several stages: data preprocessing, feature extraction, and classification. Preprocessing divides the data into several chunks. Principal Component Analysis (PCA) is applied for feature extraction. Classification uses k-fold cross validation. The results show that KNN is the best algorithm in terms of accuracy, Random Forest is the best in terms of model-building time, and MLP is the best in terms of memory usage.
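The evaluation step described in this abstract, k-fold cross validation around a KNN classifier, can be sketched in plain Python. This is a minimal illustration only, not the authors' pipeline: the function names, the contiguous fold layout, the Euclidean distance, and the default of 3 neighbors are all assumptions, and the chunking and PCA stages are omitted.

```python
import math
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Return the majority label among the n_neighbors closest training points."""
    by_distance = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )
    nearest_labels = [label for _, label in by_distance[:n_neighbors]]
    return Counter(nearest_labels).most_common(1)[0][0]


def cross_validate_knn(x, y, k=5, n_neighbors=3):
    """Return the mean KNN accuracy over k held-out folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        # Train on every sample that is not in the current test fold.
        train_x = [x[i] for i in range(len(x)) if i not in held_out]
        train_y = [y[i] for i in range(len(x)) if i not in held_out]
        correct = sum(
            knn_predict(train_x, train_y, x[i], n_neighbors) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Calling `cross_validate_knn(features, labels, k=5)` on a list of feature tuples and a parallel list of labels returns one averaged accuracy. Because the folds here are contiguous, ordered data would normally be shuffled first.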


Phishing Website Detection Using Machine Learning Classifiers Optimized by Feature Selection

Traitement du signal ◽

10.18280/ts.370403 ◽

2020 ◽

Vol 37 (4) ◽

pp. 563-569

Author(s):

Dželila Mehanović ◽

Jasmin Kevrić

Keyword(s):

Feature Selection ◽

Random Forest ◽

Cross Validation ◽

Nearest Neighbor ◽

Security Threats ◽

Selection Methods ◽

K Nearest Neighbor ◽

Machine Learning Classifiers ◽

Time To Build ◽

Fold Cross Validation

Security is one of the most topical subjects in the online world. Lists of security threats are constantly updated. One of those threats is phishing websites. In this work, we address the problem of classifying phishing websites. Three classifiers were used, K-Nearest Neighbor, Decision Tree, and Random Forest, together with the feature selection methods from Weka. The achieved accuracy was 100%, and the number of features was reduced to seven. Moreover, reducing the number of features also reduced the time to build the models. The build time for Random Forest decreased from the initial 2.88 s and 3.05 s, for percentage split and 10-fold cross validation respectively, to 0.02 s and 0.16 s.


Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter

Jurnal Informatika ◽

10.31311/ji.v6i2.5129 ◽

2019 ◽

Vol 6 (2) ◽

pp. 226-235

Author(s):

Muhammad Rangga Aziz Nasution ◽

Mardhiya Hayaty

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Unsupervised Learning ◽

Supervised Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Fold Cross Validation

Machine learning, a branch of computer science, has become a trend in recent times. Machine learning uses data and algorithms to build models from the patterns in a dataset. It also studies how the resulting model can predict outputs based on those patterns. Two types of machine learning methods can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two supervised classification algorithms, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment text. The comparison determines which algorithm is better in terms of accuracy and processing time. On accuracy, Support Vector Machine performs better, scoring 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. On processing time, K-Nearest Neighbor performs better, taking 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.


Bayes Classifier dan Support Vector Machine dalam Klasifikasi Judul Karya Akhir Mahasiswa Program Studi PTIK UNJ

PINTER Jurnal Pendidikan Teknik Informatika dan Komputer ◽

10.21009/pinter.3.1.9 ◽

2019 ◽

Vol 3 (1) ◽

pp. 54-62

Author(s):

Razi Aziz Syahputro ◽

Widodo ◽

Hamidillah Ajie

Keyword(s):

Support Vector Machine ◽

Cross Validation ◽

Nearest Neighbor ◽

Confusion Matrix ◽

Vector Space Model ◽

Support Vector ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Space Model ◽

Fold Cross Validation

This study is motivated by the need for a classification system that helps the Department of Electrical Engineering, and the PTIK study program in particular, classify thesis titles by specialization. Before building the system, several existing classification algorithms had to be considered, so this study selected 3 of the top 10 algorithms identified by ICDM in 2006. Classifying short text documents such as student thesis titles is harder than classifying long text documents, because the fewer the words, the harder the classification. The aim of this study is therefore to find the most effective algorithm for classifying thesis titles. The study consists of several stages: data collection, grouping of the data through a questionnaire completed by expert lecturers, text preprocessing, term weighting using the vector space model and tf-idf, evaluation with k-fold cross validation, classification using k-nearest neighbor, the naïve Bayes classifier, and support vector machine, and analysis with a confusion matrix. The experiments used 266 thesis titles from PTIK UNJ students of the 2010-2013 cohorts, with the most recent data coming from thesis defenses in semester 105 (odd semester 2016/2017). Among these algorithms, support vector machine proved the most efficient, with an accuracy of 82% over 10 trials.
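The term-weighting stage named in this abstract (vector space model with tf-idf) can be illustrated with a short sketch. It assumes whitespace tokenization, raw term frequency, and an unsmoothed idf of log(N/df); the abstract does not state which tf-idf variant the study used, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (raw tf times log(N/df))."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With titles this short, most pairs share few or no terms, so their vectors are nearly orthogonal. That sparsity is one way to see why the abstract calls short-text classification harder than long-text classification.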


An Improved Coronary Heart Disease Predictive System Using Random Forest

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v11i130253 ◽

2021 ◽

pp. 17-27

Author(s):

Abdulraheem Abdul ◽

Rafiu M. Isiaka ◽

Ronke S. Babatunde ◽

Jumoke F. Ajao

Keyword(s):

Neural Network ◽

Coronary Heart Disease ◽

Feature Selection ◽

Heart Disease ◽

Random Forest ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

Disease Prediction ◽

K Nearest Neighbor

Aims: This work aims to develop an enhanced predictive system for Coronary Heart Disease (CHD). Study Design: Synthetic Minority Oversampling Technique and Random Forest. Methodology: The Framingham heart disease dataset, collected from a study in Framingham, Massachusetts, was used. The data was cleaned, normalized, and rebalanced. Classifiers such as random forest, artificial neural network, naïve Bayes, logistic regression, k-nearest neighbor, and support vector machine were used for classification. Results: Random Forest outperformed the other classifiers with an accuracy of 98%, a sensitivity of 99%, and a precision of 95.8%. Feature selection was employed for better classification, but it produced no significant improvement in classifier performance. A train-test split also performed better than cross validation. Conclusion: Random Forest is recommended for research in the Coronary Heart Disease prediction domain.


A hybrid cost-sensitive ensemble for heart disease prediction

10.21203/rs.2.22946/v1 ◽

2020 ◽

Author(s):

Zhenya Qi ◽

Zuoru Zhang

Keyword(s):

Heart Disease ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Misclassification Cost ◽

Proposed Model ◽

Learning Machine ◽

Fold Cross Validation ◽

Very High

Heart disease is the primary cause of morbidity and mortality in the world. It includes numerous problems and symptoms. Diagnosing heart disease is difficult because there are many factors to analyze. Moreover, the misclassification cost can be very high. This paper first proposes a cost-sensitive ensemble model to improve diagnostic accuracy and reduce the misclassification cost. The proposed model contains five heterogeneous classifiers: random forest, logistic regression, support vector machine, extreme learning machine, and k-nearest neighbor. Experiments are then run on three datasets from the UCI machine learning repository. Under ten-fold cross validation, the proposed model achieves the highest classification accuracy of 91.74%, the highest G-mean of 90.55%, the highest precision of 96.11%, the highest recall of 89.61%, and the lowest misclassification cost of 30.32%. The results demonstrate that the proposed model outperforms previously reported classification techniques.
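The metrics reported in this abstract can all be derived from the four cells of a binary confusion matrix, as the sketch below shows. It uses the common definition of G-mean as the square root of recall times specificity. The averaged cost formula and the default weights `cost_fp` and `cost_fn` are hypothetical placeholders, since the abstract does not say how the paper defines misclassification cost.

```python
import math


def binary_metrics(tp, fp, fn, tn, cost_fp=1.0, cost_fn=5.0):
    """Compute common metrics from binary confusion-matrix counts.

    cost_fp and cost_fn are illustrative weights, not values from the paper.
    """
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": recall,
        "g_mean": math.sqrt(recall * specificity),
        # Average cost per sample under the assumed weights.
        "avg_cost": (fp * cost_fp + fn * cost_fn) / total,
    }
```

Weighting false negatives more heavily than false positives reflects the usual medical framing that a missed diagnosis costs more than a false alarm. The 5:1 ratio here is arbitrary.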


Komparasi Kinerja Algoritma Data Mining pada Dataset Konsumsi Alkohol Siswa

Khazanah Informatika Jurnal Ilmu Komputer dan Informatika ◽

10.23917/khif.v4i2.7061 ◽

2018 ◽

Vol 4 (2) ◽

pp. 98

Author(s):

Noviyanti Sagala ◽

Hendrik Tampubolon

Keyword(s):

Data Mining ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Gain Ratio ◽

Feature Correlation ◽

Fold Cross Validation

Data mining extracts knowledge from large collections of data. This study aims to apply data mining algorithms and analyze their performance in predicting alcohol consumption among secondary school students and in analyzing the related factors. The stages are data preprocessing, feature selection, classification, and model evaluation. In preprocessing, several features are converted into a suitable form to ease classification. Next, the Gain Ratio and Feature Correlation-Based Filter (FCBF) algorithms are used to select the relevant and important features for the classification stage. Decision Tree C5.0, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB) are run on the selected feature groups. Model accuracy is evaluated with 10-fold Cross-Validation (CV). The results show that the classification model built with Naïve Bayes achieves the highest accuracy when using the 5 best features from Gain Ratio. In addition, feature selection improves the performance of all classifiers in general. Further testing on the same and on different data is needed to gain deeper insight into the performance of the algorithms used.
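Gain Ratio, one of the two feature-selection criteria named in this abstract, is conventionally defined as information gain divided by split information. The sketch below follows that textbook definition for categorical features; the function names are hypothetical, and the tool actually used in the study may handle numeric features or ties differently.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of categorical values."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def gain_ratio(feature_values, labels):
    """Information gain of a categorical feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0
```

Ranking features by `gain_ratio` and keeping the top few mirrors the "5 best features" selection mentioned in the abstract. The split-information denominator penalizes features with many distinct values.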


PREDIKSI TINGKAT KESUKSESAN PROMOSI BANK DENGAN ALGORITMA DNN

Jurnal Informatika ◽

10.30873/ji.v21i1.2866 ◽

2021 ◽

Vol 21 (1) ◽

pp. 23-33

Author(s):

Oscar Oscar ◽

Nurlaelatul Maulidah ◽

Annida Purnamawati ◽

Destiana Putri ◽

Hilman F Pardede

Keyword(s):

Neural Network ◽

Random Forest ◽

Success Rate ◽

Deep Neural Network ◽

Cross Validation ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Bank Marketing ◽

Fold Cross Validation ◽

Better Than

Telemarketing is an effective way to promote products. However, it is often difficult to measure the success of telemarketing. Therefore, a way to predict the success rate of telemarketing is needed, so that strategies can be planned to increase it. In this study, we evaluate several machine learning methods for predicting the success of telemarketing. The evaluated methods are Deep Neural Network (DNN), Random Forest, and K-nearest neighbor (K-NN). We validate our experiments using 10-fold cross validation, and our experiments show that a DNN with 3 hidden layers outperforms the other methods. The DNN achieves an accuracy of 90%, which is better than Random Forest and K-NN, which achieve accuracies of 88% and 89%. Keywords: Bank Marketing, DNN, KNN, Random Forest.


Expert cancer model using supervised algorithms with a LASSO selection approach

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i3.pp2631-2639 ◽

2021 ◽

Vol 11 (3) ◽

pp. 2631

Author(s):

Pronab Ghosh ◽

Asif Karim ◽

Syeda Tanjila Atik ◽

Saima Afrin ◽

Mohd. Saifuzzaman

Keyword(s):

Breast Cancer ◽

Cross Validation ◽

Nearest Neighbor ◽

Disease Model ◽

Support Vector ◽

K Nearest Neighbor ◽

Cancer Disease ◽

Critical Issues ◽

Selection Operator ◽

Fold Cross Validation

Breast cancer is currently one of the most critical contributors to mortality in the medical field. Each year, a large number of men and women face cancer-related deaths due to the lack of early diagnosis systems and proper treatment. To tackle the issue, various data mining approaches have been analyzed to build an effective model that helps identify the different stages of deadly cancers. The study proposes an early cancer disease model based on five supervised algorithms: logistic regression (henceforth LR), decision tree (henceforth DT), random forest (henceforth RF), support vector machine (henceforth SVM), and K-nearest neighbor (henceforth KNN). After appropriate preprocessing of the dataset, the least absolute shrinkage and selection operator (LASSO) was used for feature selection (FS) with a 10-fold cross-validation (CV) approach. Employing LASSO with 10-fold cross-validation is a novel step introduced in this research. Afterwards, different performance evaluation metrics were measured to assess the accuracy of the predictions made by the proposed algorithms. The results indicate that the RF classifier achieved the highest accuracy, approximately 99.41%, with the integration of LASSO. Finally, a comprehensive comparison was carried out on the Wisconsin breast cancer (diagnostic) dataset (WBCD) against some current works that use all features.


Quantifying the Influence of Achievement Emotions for Student Learning in MOOCs

Journal of Educational Computing Research ◽

10.1177/0735633120967318 ◽

2020 ◽

pp. 073563312096731

Author(s):

Bowen Liu ◽

Wanli Xing ◽

Yifang Zeng ◽

Yonghe Wu

Keyword(s):

Random Forest ◽

Nearest Neighbor ◽

Online Courses ◽

Learning Performance ◽

Support Vector ◽

K Nearest Neighbor ◽

Achievement Emotions ◽

Integrative Framework ◽

Emotional Interaction ◽

Performance Results

Massive Open Online Courses (MOOCs) have become a popular tool for worldwide learners. However, a lack of emotional interaction and support is an important reason for learners to abandon their learning and eventually results in poor learning performance. This study applied an integrative framework of achievement emotions to uncover their holistic influence on students’ learning by analyzing more than 400,000 forum posts from 13 MOOCs. Six machine-learning models were first built to automatically identify achievement emotions, including K-Nearest Neighbor, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machines. Results showed that Random Forest performed the best with a kappa of 0.83 and an ROC_AUC of 0.97. Then, multilevel modeling with the “Stepwise Build-up” strategy was used to quantify the effect of achievement emotions on students’ academic performance. Results showed that different achievement emotions influenced students’ learning differently. These findings allow MOOC platforms and instructors to provide relevant emotional feedback to students automatically or manually, thereby improving their learning in MOOCs.
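The kappa figure quoted in this abstract is an agreement statistic that corrects raw accuracy for chance. A minimal sketch of Cohen's kappa follows, assuming that is the variant meant; the abstract says only "kappa", and the function name and the handling of the degenerate single-label case are choices made for this illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(
        true_counts[label] * pred_counts[label] for label in true_counts
    ) / (n * n)
    if expected == 1.0:
        # Degenerate case: both sequences use a single identical label.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa is often reported alongside accuracy for multi-class tasks with uneven classes, because a classifier that always predicts the majority class scores a kappa of zero regardless of its accuracy.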


Feature and Decision Level Fusion in Children Multimodal Biometrics

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6396.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 2522-2527

Keyword(s):

Nearest Neighbor ◽

Principal Component ◽

Identification Accuracy ◽

Support Vector ◽

Biometric System ◽

K Nearest Neighbor ◽

Decision Level ◽

Multimodal Biometric System ◽

Decision Level Fusion ◽

Level Fusion

In this paper, we design a method for recognizing fingerprint and iris using feature-level fusion and decision-level fusion in a children's multimodal biometric system. Initially, Histogram of Gradients (HOG), Gabor, and maximum filter response features are extracted from both the fingerprint and iris domains and considered for identification accuracy. Combining the feature vectors of all possible features is recommended for fusing the biometric traits. Principal Component Analysis (PCA) is used to select features from the fused vector. The reduced features are fed into fusion classifiers: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naive Bayes (NB). The suitable combination of features and fusion classifiers is identified for the children's multimodal biometric system. Experiments conducted on a children's fingerprint and iris database reveal that the fused combination outperforms the individual ones. In addition, the proposed model improves on the unimodal biometric system.
