Implementasi Data Mining untuk Rekomendasi Pengambilan Mata Kuliah Pilihan Mahasiswa Sistem Informasi

In this study we implement classification algorithms to recommend which specialization track suits a student best, based on the student's grades in prerequisite courses from previous semesters. The recommendation is expected to clarify the boundaries between the disciplines in the Information Systems Study Program at Universitas Brawijaya, which offers three elective specialization tracks: Database, Logic & Programming, and IS/IT Management. The data set, comprising training and testing data, consists of the academic records of the 2015 cohort, who had already taken elective courses; the target data are the academic records of the 2016 cohort. The classification algorithms used are Rule Induction, CHAID, Random Forest, ID3, and Naive Bayes. The proportions of training and testing data were varied to observe the effect of the split, and each of the five algorithms was tested five times. Across all tests, the average accuracies of the five methods were, respectively, 66.48%, 67.49%, 80.62%, 86.90%, and 77.68%. ID3 therefore achieved the highest average accuracy, which the authors attribute to the algorithm's flexibility and its better fit to the data used.
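
The abstract lists the five average accuracies "respectively", in the same order as the algorithms. A minimal Python sketch, assuming that ordering, pairs each algorithm with its reported average and picks out the highest. The variable names are illustrative and not taken from the paper.

```python
# Algorithms and their reported average accuracies (%), in the order
# given in the abstract. The pairing assumes "respectively" ordering.
algorithms = ["Rule Induction", "CHAID", "Random Forest", "ID3", "Naive Bayes"]
mean_accuracy = [66.48, 67.49, 80.62, 86.90, 77.68]

reported = dict(zip(algorithms, mean_accuracy))

# The algorithm with the highest reported average accuracy.
best = max(reported, key=reported.get)

# Print the algorithms from highest to lowest average accuracy.
for name, acc in sorted(reported.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {acc:.2f}%")
```

Sorting this way makes the gap visible: the two tree-based methods (ID3 and Random Forest) lead, and Rule Induction and CHAID trail by more than ten percentage points.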

Prediksi Ketepatan Kelulusan Mahasiswa Diploma dengan Komparasi Algoritma Klasifikasi

Jurnal Sistem dan Teknologi Informasi (JustIN) ◽

10.26418/justin.v7i3.33316 ◽

2019 ◽

Vol 7 (3) ◽

pp. 202

Author(s):

Muhammad Sony Maulana ◽

Raja Sabarudin ◽

Wahyu Nugraha

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Rule Induction ◽

T Test

AMIK BSI Pontianak is a private higher-education institution with a large student body, yet it faces a recurring problem every year: the number of students who graduate on time versus late. The number of on-time graduates is an indicator of the effectiveness of a higher-education institution, whether public or private. Institutions need to detect the behavior of active students so that the factors causing late graduation can be identified. This study compares five data mining methods to determine which performs best at predicting on-time graduation, using a t-test for evaluation. The methods compared are Decision Tree, Naive Bayes, K-NN, Rule Induction, and Random Forest. The results show that Rule Induction and C4.5 are the best-performing methods for predicting on-time graduation of diploma students at AMIK BSI Pontianak.

Predicting heart ailment in patients with varying number of features using data mining techniques

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp56-62 ◽

2019 ◽

Vol 8 (1) ◽

pp. 56

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart ailments are among the leading causes of death worldwide, so this study carries out a detailed analysis using data mining. Studies often restrict themselves to a minimal set of attributes for predicting heart disease in a patient, and in doing so miss many important attributes that are major causes of heart disease. This research therefore considers almost all of the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and Naïve Bayes and Random Forest perform comparatively better on these datasets.

Predicting Heart Ailment in Patients with Varying number of Features using Data Mining Techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp2675-2681 ◽

2019 ◽

Vol 9 (4) ◽

pp. 2675

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Heart ailments are among the leading causes of death worldwide, so this study carries out a detailed analysis using data mining. Studies often restrict themselves to a minimal set of attributes for predicting heart disease in a patient, and in doing so miss many important attributes that are major causes of heart disease. This research therefore considers almost all of the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and Naïve Bayes and Random Forest perform comparatively better on these datasets.

The Comparison of Data Mining Methods Using C4.5 Algorithm and Naive Bayes in Predicting Heart Disease

Tech-E ◽

10.31253/te.v4i2.543 ◽

2021 ◽

Vol 4 (2) ◽

pp. 44

Author(s):

Rino Rino

Keyword(s):

Data Mining ◽

Heart Disease ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Set ◽

A Value ◽

C4.5 Algorithm ◽

Calculation Results ◽

Mining Methods ◽

Bayes Algorithm

Heart disease is a condition in which fatty deposits in the coronary arteries change the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; two methods often used in research are the C4.5 algorithm and Naive Bayes. The data set in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieved the higher accuracy, 81.40%, compared with 79.07% for the C4.5 algorithm. Based on the calculation results, Naïve Bayes is rated a very good classifier because it has a value between 0.709 - 1.00. Since Naïve Bayes has a higher accuracy than C4.5, the researchers chose the Naïve Bayes algorithm for predicting heart disease.

Prediction of Lung Cancer Risk using Random Forest Algorithm Based on Kaggle Data Set

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f7879.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1623-1630

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Random Forest ◽

Naive Bayes ◽

Early Stage ◽

Naïve Bayes ◽

Training Data ◽

Random Forest Algorithm ◽

Data Set ◽

Wide Range

As huge amounts of data accumulate, extracting the required information from the available data becomes a challenge. Machine learning contributes to various fields. The fast-growing population has been accompanied by a wide range of diseases, which in turn has created a need for machine learning models that use patients' datasets. Analysis of datasets from different sources indicates that cancer is among the most hazardous diseases and may cause the death of the person affected. Surveys indicate that cancer can nearly be cured in its initial stages, whereas in later stages it may be fatal. One of the major types of cancer is lung cancer, and its prediction depends heavily on past data, which makes detection in the early stages necessary. The proposed work uses a machine learning algorithm to group individuals' details into categories in order to predict, at an early stage, whether they are at risk of cancer. The Random Forest algorithm is implemented and achieves a higher efficiency of 97% compared with KNN and Naive Bayes. KNN does not learn a model from the training data but uses the data directly for classification, and Naive Bayes yields inaccurate predictions. The proposed system predicts the chance of lung cancer at three levels: low, medium, and high. The authors expect this to help reduce mortality rates significantly.
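
The remark that KNN does not learn from the training data can be illustrated with a minimal Python sketch of a k-nearest-neighbour classifier. It has no training step at all: it stores the labelled examples and, at prediction time, takes a majority vote among the k closest ones. The two-feature toy records and the `knn_predict` helper are hypothetical and are not the Kaggle data or the paper's implementation.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; there is no fitting step.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy records: (feature vector, risk level).
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((3.0, 3.0), "medium"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
]

print(knn_predict(train, (1.1, 0.9)))
print(knn_predict(train, (5.1, 5.0)))
```

Because every prediction scans the stored examples, the cost of KNN falls at classification time rather than at training time, which is the trade-off the abstract alludes to.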

Komparasi Algoritma Klasifikasi Data Mining untuk Memprediksi Tingkat Kematian Dini Kanker dengan Dataset Early Death Cancer

JOINTECS (Journal of Information Technology and Computer Science) ◽

10.31328/jointecs.v4i2.1008 ◽

2019 ◽

Vol 4 (2) ◽

pp. 63

Author(s):

Panny Agustia Rahayuningsih

Keyword(s):

Neural Network ◽

Data Mining ◽

Random Forest ◽

Cross Validation ◽

Naive Bayes ◽

Early Death ◽

Naïve Bayes ◽

T Test ◽

Fold Cross Validation

Cancer is among the ten leading causes of death worldwide. It is malignant and difficult to cure once it has spread widely, but detecting cancer cells as early as possible can reduce the risk of death. This study aims to predict the rate of early cancer death among the European population using five classification algorithms: Decision Tree, Naïve Bayes, k-Nearest Neighbour, Random Forest, and Neural Network, and to determine which of them performs best for this task. Testing proceeds in several stages: data collection, initial data processing, the proposed methods, method testing with 10-fold cross validation, evaluation of results, and a t-test for differences. The alpha value used is 0.05: if the probability is > 0.05, H0 is accepted, and if the probability is < 0.05, H0 is rejected. The best performance, with an accuracy of 98.35%, was obtained by the Neural Network algorithm. According to the t-test, the best models are Random Forest and Neural Network; Naïve Bayes is fairly good, Decision Tree is adequate, and the weakest is k-Nearest Neighbour (K-NN).
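
The decision rule at alpha = 0.05 can be sketched in Python as follows. The sketch assumes a paired t-test on per-fold accuracies, which the abstract does not specify, and the fold accuracies are hypothetical placeholders. The abstract also does not say what happens when the probability equals 0.05 exactly, so that case is treated here as "accept H0". The p-value itself would come from a statistics package and is not computed here.

```python
import math
import statistics

def paired_t_statistic(acc_a, acc_b):
    """t statistic for paired samples (for example, per-fold accuracies)."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

def decide_h0(p_value, alpha=0.05):
    """Apply the rule from the abstract: reject H0 only if p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return "reject H0" if p_value < alpha else "accept H0"

# Hypothetical accuracies of two classifiers over 10 folds.
model_a = [0.98, 0.97, 0.99, 0.98, 0.98, 0.97, 0.99, 0.98, 0.98, 0.99]
model_b = [0.95, 0.96, 0.95, 0.94, 0.96, 0.95, 0.97, 0.95, 0.94, 0.96]

print("t statistic:", paired_t_statistic(model_a, model_b))
print(decide_h0(0.03))  # a p-value below alpha
print(decide_h0(0.20))  # a p-value above alpha
```

Here H0 is the hypothesis that the two classifiers do not differ in mean accuracy, so rejecting it indicates a statistically significant difference between them.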

Prediction and Classification into Benign and Malignant using the Clinical Testing Features

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j7411.0891020 ◽

2020 ◽

Vol 9 (10) ◽

pp. 55-61

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Image Classification ◽

Naive Bayes ◽

Malignant Tumors ◽

Naïve Bayes ◽

Support Vector ◽

Natural Image ◽

Data Set ◽

Classification Techniques

Breast cancer is the most frequently diagnosed cancer among women and a major cause of increased mortality among women. Because manual diagnosis of the disease takes many hours and few diagnostic systems are available, there is a need for an automatic diagnosis system for early detection of cancer. Advances in natural image classification and Artificial Intelligence methods have been widely applied to the breast-image classification task. Data mining techniques contribute substantially to the development of such a system, and classification methods are an effective way to categorize data. To classify benign and malignant tumors, we use machine learning classification techniques in which the machine learns from past data and predicts the category of new input. This is a comparative study of models implemented with Support Vector Machine (SVM) and Naïve Bayes on the Breast Cancer Wisconsin (Original) Data Set. The efficiency of each algorithm is measured and compared in terms of accuracy, precision, sensitivity, specificity, error rate, and F1 score. Our experiments show that SVM performs best for predictive analysis, with an accuracy of 99.28%, against 98.56% for Naïve Bayes. The study infers that SVM is the better-suited algorithm for this prediction task.
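
The six measures named above all derive from the four cells of a binary confusion matrix. The following Python sketch uses the standard definitions; the counts are hypothetical and are not results from the paper, and the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (malignant = positive)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "error_rate": 1 - accuracy,
        "f1": f1,
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters for a diagnostic task, because a missed malignant tumor (a false negative) and a false alarm (a false positive) carry very different costs.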

An Ingenious Methodology for the Collation of Existing Algorithms for the Prognosis of Student Performance

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2874.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 1749-1752

Keyword(s):

Data Mining ◽

Academic Performance ◽

Random Forest ◽

Student Performance ◽

Naive Bayes ◽

Research Work ◽

Large Data ◽

Naïve Bayes ◽

Impact Factors ◽

Data Mining Technique

In this research we use data mining, an automated procedure for discovering interesting patterns in large data sets by grouping them and building comprehensible predictive models. Predicting a student's academic performance is crucial, especially for universities. Educational Data Mining (EDM) is an approach for extracting useful data that could affect an institution. Student performance today is influenced by many factors, some of which bear directly on academic results. This study evaluates numerous factors suspected of affecting a student's academic performance and seeks a model that classifies and forecasts the student's learning outcomes. The aim is to conduct a case study of the factors that influence students' academic achievement and to determine which have the greatest impact. We evaluate academic achievement on the basis of correctly and incorrectly classified instances using the Naive Bayes and Random Forest algorithms. The paper makes a comparative assessment of the Naive Bayes and Random Forest classifiers on student data and determines the better algorithm.

ANALISA PERBANDINGAN ALGORITME NAIVE BAYES DAN DECISION TREE PADA KLASIFIKASI DATA TRANSFUSI DARAH

Jurnal Ilmiah Teknologi Infomasi Terapan ◽

10.33197/jitter.vol5.iss1.2018.251 ◽

2019 ◽

Vol 5 (1) ◽

pp. 38-44

Author(s):

Rini Indrayani

Keyword(s):

Data Mining ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Set

Blood donation is the process of drawing blood from donors who have been declared eligible on the basis of various factors. Existing illnesses, age, body weight, blood pressure, hemoglobin level, and the interval between donations are the aspects considered during eligibility screening. Because this screening is so important, various studies on donor eligibility have been conducted using data mining classification with different methods. The challenge in these studies is to find the most appropriate method, with high accuracy and precision. This study uses a data set of 748 blood donation records processed with the classification methods Na

Perbandingan Teknik Klasifikasi Dalam Data Mining Untuk Bank Direct Marketing

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.201855958 ◽

2018 ◽

Vol 5 (5) ◽

pp. 567 ◽

Cited By ~ 2

Author(s):

Irvi Oktanisa ◽

Ahmad Afif Supianto

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Gradient Descent ◽

Naive Bayes ◽

Direct Marketing ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector

Classification is a data mining technique for grouping data based on its relationship to sample data. In this paper, we compare nine classification techniques for classifying customer response on the Bank Direct Marketing dataset, in order to find the most effective model for classifying the target in that dataset. The techniques used are Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, and CN2 Rule. The classification process begins with data preprocessing to remove missing values and select features. The evaluation stage uses 10-fold cross validation. Testing showed that the best accuracy was obtained by the Tree, Constant, Naive Bayes, and Stochastic Gradient Descent models, followed by Random Forest, K-Nearest Neighbor, CN2 Rule, AdaBoost, and Support Vector Machine. Of the four models with the best accuracy, Stochastic Gradient Descent was selected as the best model in this case, with an accuracy of 0.972 and a visualization that more clearly classifies the target in the Bank Direct Marketing dataset.
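
The 10-fold cross validation used in the evaluation stage can be sketched as follows. This is a minimal, library-free Python illustration of how the sample indices are partitioned. The `k_fold_indices` helper, the shuffling seed, and the sample count are assumptions made for illustration, not details from the paper.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

# Hypothetical data set of 25 samples, split into 10 folds.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10), start=1):
    print(f"fold {fold_no}: {len(train_idx)} train, {len(test_idx)} test")
```

Each model is then trained on the training indices and scored on the test indices of every fold, and the k scores are averaged, so that all nine techniques are compared on the same partitions.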
