In this study, we implement classification algorithms that recommend to students which specialization track is more suitable for them, based on their grades in prerequisite courses from previous semesters. This recommendation is expected to sharpen the boundaries between the disciplines in the Information Systems Study Program at Universitas Brawijaya, which offers three elective-course specialization tracks: Database, Logic & Programming, and IS/IT Management. The dataset, consisting of training and testing data, comprises the academic records of the 2015 cohort, who had already taken elective courses; the target data of this study are the academic records of the 2016 cohort. The classification algorithms used are Rule Induction, CHAID, Random Forest, ID3, and Naive Bayes. The proportion of training to testing data was varied to determine its effect on the results, and each of the five algorithms was tested five times. Across all tests, the average accuracies of the five methods were, respectively, 66.48%, 67.49%, 80.62%, 86.90%, and 77.68%. These results show that ID3 has the highest average accuracy, which we attribute to the algorithm's flexibility and its ability to classify the data used more accurately.
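
ID3 chooses each split by information gain, which suits categorical inputs such as letter grades. The abstract gives no details of the ID3 configuration, so the following is only a minimal Python sketch of that split criterion; the course names, grades, and track labels are hypothetical, not the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction from splitting `rows` on a categorical `attribute` (ID3's criterion)."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical letter grades in two prerequisite courses and the track each student took.
rows = [
    {"database_course": "A", "programming_course": "B"},
    {"database_course": "A", "programming_course": "C"},
    {"database_course": "A", "programming_course": "A"},
    {"database_course": "B", "programming_course": "A"},
    {"database_course": "C", "programming_course": "A"},
    {"database_course": "B", "programming_course": "B"},
    {"database_course": "C", "programming_course": "C"},
]
labels = ["Database", "Database", "Database", "Programming", "Programming",
          "Management", "Management"]

print(best_attribute(rows, labels, ["database_course", "programming_course"]))  # database_course
```

A full ID3 tree would apply `best_attribute` recursively to each subset until the labels are pure or no attributes remain.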

Muhammad Sony Maulana, Raja Sabarudin, Wahyu Nugraha

AMIK BSI Pontianak is a private higher-education institution with a large number of students, but it faces a problem that recurs every year: the number of students who graduate on time versus late. The number of students who graduate on time is an indicator of the effectiveness of a higher-education institution, whether public or private. Institutions therefore need to detect the behavior of active students so that the factors preventing students from graduating on time can be identified. This study compares five data mining methods, using the t-test, to determine which is the most effective at predicting whether students will graduate on time; the methods compared are Decision Tree, Naive Bayes, K-NN, Rule Induction, and Random Forest. The results show that Rule Induction and C4.5 (a decision tree algorithm) are the best-performing methods for predicting on-time graduation of diploma students at AMIK BSI Pontianak.
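
The abstract does not specify which Rule Induction implementation was used. To illustrate the general idea of inducing if-then rules from student records, the sketch below implements OneR, one of the simplest rule-induction algorithms, which is not necessarily the one used in the study; the attributes `gpa_band` and `works_part_time` and their values are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels, attributes):
    """OneR rule induction: for each attribute, map every value to its majority class,
    then keep the attribute whose rule makes the fewest errors on the training data."""
    best = None
    for attr in attributes:
        by_value = defaultdict(list)
        for row, label in zip(rows, labels):
            by_value[row[attr]].append(label)
        rule = {value: Counter(group).most_common(1)[0][0] for value, group in by_value.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical student records (not from AMIK BSI Pontianak).
rows = [
    {"gpa_band": "high", "works_part_time": "no"},
    {"gpa_band": "high", "works_part_time": "yes"},
    {"gpa_band": "high", "works_part_time": "no"},
    {"gpa_band": "low", "works_part_time": "no"},
    {"gpa_band": "low", "works_part_time": "yes"},
    {"gpa_band": "high", "works_part_time": "yes"},
]
labels = ["on_time", "on_time", "on_time", "late", "late", "late"]

attr, rule, errors = one_r(rows, labels, ["gpa_band", "works_part_time"])
print(attr, rule, errors)  # gpa_band {'high': 'on_time', 'low': 'late'} 1
```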

T R Stella Mary, Shoney Sebastian

Data mining can be defined as the process of extracting previously unknown, verifiable, and potentially useful information from data. Among the various ailments, heart disease is one of the primary causes of death worldwide; to help address this, a detailed analysis is carried out using data mining. Studies often limit themselves to the minimal set of attributes required to predict whether a patient has heart disease, and in doing so they miss many important attributes that are major causes of heart disease. This research therefore considers almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart disease. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, regardless of the classifier used, and the Naïve Bayes and Random Forest algorithms perform comparatively better on these datasets.
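
Because Naïve Bayes treats attributes as conditionally independent given the class, each extra attribute simply contributes one more term to the log-probability sum, which makes it straightforward to rerun on attribute sets of increasing size. A minimal categorical Naïve Bayes sketch with Laplace smoothing follows; the attribute names and records are hypothetical and do not come from the three datasets in the study.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    domains = defaultdict(set)           # attribute -> set of observed values
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior under the independence assumption."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(class)
        for attr, value in row.items():
            count = value_counts[(attr, label)][value]
            # Laplace (add-one) smoothing keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n + len(domains[attr])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical patient records.
rows = [
    {"chest_pain": "typical", "exercise_angina": "yes"},
    {"chest_pain": "typical", "exercise_angina": "no"},
    {"chest_pain": "atypical", "exercise_angina": "yes"},
    {"chest_pain": "none", "exercise_angina": "no"},
    {"chest_pain": "atypical", "exercise_angina": "no"},
    {"chest_pain": "none", "exercise_angina": "no"},
]
labels = ["disease", "disease", "disease", "healthy", "healthy", "healthy"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, {"chest_pain": "typical", "exercise_angina": "yes"}))  # disease
```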

Rino Rino

Heart disease is a condition in which fatty deposits in the coronary arteries change the function and shape of the arteries, so that blood flow to the heart is obstructed. Data mining methods can predict this disease; among them, the C4.5 algorithm and Naive Bayes are often used in research. The dataset in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieves a higher accuracy of 81.40%, compared with 79.07% for the C4.5 algorithm. Based on the calculation results, Naïve Bayes can be considered a very good classifier because its value falls between 0.709 and 1.00. Since Naïve Bayes is more accurate than C4.5, the researchers chose the Naïve Bayes algorithm to predict heart disease.
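
C4.5 differs from ID3 mainly in normalizing information gain by the split information (the gain ratio), which penalizes attributes with many distinct values. The sketch below computes that criterion only, not the full C4.5 tree (no pruning or continuous-attribute handling); the columns `record_id` and `exercise_angina` are hypothetical and not taken from the UCI data used here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(column, labels):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(column)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0


labels = ["disease", "disease", "healthy", "healthy"]
record_id = ["r1", "r2", "r3", "r4"]          # hypothetical unique identifier
exercise_angina = ["yes", "yes", "no", "no"]  # hypothetical binary attribute

print(gain_ratio(record_id, labels))        # 0.5: full gain, but 2 bits of split information
print(gain_ratio(exercise_angina, labels))  # 1.0: full gain, 1 bit of split information
```

Both columns separate the classes perfectly (information gain of 1 bit), but the gain ratio prefers the binary attribute over the identifier.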

As huge amounts of data accumulate, extracting the required information from the available data has become a challenge. Machine learning contributes to many fields. Rapid population growth has been accompanied by a wide range of diseases, which in turn creates a need for machine learning models that use patient datasets. Analysis of datasets from different sources shows that cancer is the most hazardous disease and may cause the death of the patient. Surveys indicate that cancer is nearly curable in its initial stages but may cause the death of the affected person in later stages. One of the major types of cancer is lung cancer, whose early detection depends heavily on past data. The proposed work uses a machine learning algorithm to group individual records into categories and predict, at an early stage, whether a person is likely to develop cancer. The Random Forest algorithm is implemented and achieves an efficiency of 97%, higher than KNN and Naive Bayes. KNN does not learn a model from the training data but uses the stored data directly for classification, and Naive Bayes produces less accurate predictions. The proposed system predicts the chances of lung cancer at three levels: low, medium, and high. In this way, mortality rates can be reduced significantly.
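
The remark that KNN does not learn from the training data describes lazy learning: the training set is stored and all work is deferred to prediction time. A minimal sketch using Euclidean distance and a majority vote over the three risk levels named in the abstract; the two numeric features, the data, and `k = 3` are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Lazy learner: nothing is fitted; the stored examples are searched at prediction time
    and the majority label among the k nearest (Euclidean distance) is returned."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature risk scores with the three risk levels.
train_X = [(1, 1), (2, 1), (1, 2), (5, 5), (5, 6), (6, 5), (9, 9), (9, 8), (8, 9)]
train_y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

print(knn_predict(train_X, train_y, (1.5, 1.5)))  # low
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # medium
print(knn_predict(train_X, train_y, (8.5, 8.5)))  # high
```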

Panny Agustia Rahayuningsih

Cancer is one of the ten deadliest diseases in the world. It is a malignant disease that is difficult to cure once it has spread too widely; however, detecting cancer cells as early as possible can reduce the risk of death. This study aims to predict premature cancer mortality rates in the European population using five classification algorithms, namely Decision Tree, Naïve Bayes, k-Nearest Neighbour, Random Forest, and Neural Network, and to determine which of them performs best for this study. The study proceeds in several stages: dataset collection, initial data processing, the proposed method, method testing using 10-fold cross validation, evaluation of the results, and a t-test of differences. The significance level (alpha) is 0.05: if the probability is > 0.05, H0 is accepted, whereas if it is < 0.05, H0 is rejected. The best performance, with an accuracy of 98.35%, was obtained by the Neural Network algorithm. According to the t-test, the algorithms with the best models are Random Forest and Neural Network; Naïve Bayes is fairly good, Decision Tree is adequate, and the weakest is k-Nearest Neighbour (K-NN).
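
The decision rule above (alpha = 0.05) can be sketched as a paired t-test on per-fold accuracies from 10-fold cross validation. With 10 folds there are 9 degrees of freedom, so comparing |t| with the two-tailed critical value 2.262 is equivalent to checking p < 0.05. The abstract does not say which t-test variant was used; the paired form and the fold accuracies below are assumptions.

```python
import statistics

# Two-tailed critical value of Student's t for alpha = 0.05 with 9 degrees of freedom
# (10 folds -> 10 paired differences -> df = 9), from a standard t table.
T_CRITICAL_ALPHA_005_DF9 = 2.262


def paired_t(acc_a, acc_b):
    """Paired t statistic for per-fold accuracies of two algorithms on the same folds."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / len(diffs) ** 0.5)


def reject_h0(acc_a, acc_b, t_critical=T_CRITICAL_ALPHA_005_DF9):
    """H0: equal mean accuracy. |t| > t_critical is equivalent to p < 0.05 (two-tailed)."""
    return abs(paired_t(acc_a, acc_b)) > t_critical


# Hypothetical 10-fold accuracies for three algorithms.
model_a = [0.98, 0.97, 0.99, 0.98, 0.97, 0.99, 0.98, 0.98, 0.97, 0.99]
model_b = [0.90, 0.92, 0.91, 0.89, 0.93, 0.90, 0.91, 0.92, 0.90, 0.91]
model_c = [0.97, 0.98, 0.98, 0.99, 0.96, 0.99, 0.99, 0.97, 0.98, 0.98]

print(reject_h0(model_a, model_b))  # True: the difference is significant
print(reject_h0(model_a, model_c))  # False: H0 is not rejected
```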

Breast cancer is the most frequently diagnosed cancer among women and a major cause of increased mortality among women. Because manual diagnosis takes many hours and such systems are not widely available, there is a need for an automatic diagnosis system for the early detection of cancer. Advances in natural-image classification techniques and artificial intelligence methods have been widely applied to breast-image classification. Data mining techniques contribute substantially to the development of such systems, and classification methods are an effective way to categorize data. To classify tumors as benign or malignant, we use machine learning classification techniques, in which the machine learns from past data and can predict the category of new input. This study compares models built with Support Vector Machine (SVM) and Naïve Bayes on the Breast Cancer Wisconsin (Original) Data Set. The efficiency of each algorithm is measured and compared in terms of accuracy, precision, sensitivity, specificity, error rate, and F1 score. Our experiments show that SVM performs best for predictive analysis, with an accuracy of 99.28%, compared with 98.56% for Naïve Bayes. We conclude that SVM is the better-suited algorithm for this prediction task.
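
All six measures named above derive from the binary confusion matrix. A minimal sketch, assuming malignant is the positive class; the counts are hypothetical and are not the study's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Metrics from a 2x2 confusion matrix; 'positive' here means malignant (an assumption)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "error_rate": 1 - accuracy,
        "f1": f1,
    }


# Hypothetical counts, not the study's results.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```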

In this proposed research work, we use data mining, an automated procedure for discovering interesting patterns in large datasets by grouping them into comprehensible predictive models. Predicting students' academic performance is crucial, especially for universities. Educational Data Mining (EDM) is an approach for extracting useful information that could affect an institution. Nowadays, students' performance is influenced by many factors, and these factors may relate to a student's academic record. This study evaluates numerous factors suspected of affecting students' academic performance and derives a model that classifies and forecasts students' learning outcomes. The aim of this research is to conduct a case study of the factors that influence students' academic achievement and to determine which have the greatest impact. We evaluate academic achievement on the basis of correctly and incorrectly classified instances using the Naive Bayes and Random Forest algorithms, making a comparative assessment of the two classifiers on student data to determine the better algorithm.
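
The abstract gives no Random Forest settings. As a hedged sketch of the general technique (bootstrap samples, a random subset of features per tree, and a majority vote), the code below uses one-level trees (stumps) instead of full decision trees to stay short; the feature names, toy records, `n_trees=25`, and the square-root rule for features per tree are assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def fit_stump(rows, labels, features):
    """One-level tree: choose the feature whose value -> majority-label rule makes the fewest errors."""
    best = None
    for f in features:
        groups = defaultdict(list)
        for row, lab in zip(rows, labels):
            groups[row[f]].append(lab)
        rule = {v: Counter(g).most_common(1)[0][0] for v, g in groups.items()}
        errors = sum(rule[row[f]] != lab for row, lab in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    fallback = Counter(labels).most_common(1)[0][0]  # for values absent from this tree's sample
    return best[1], best[2], fallback


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging plus random feature subsets: each tree sees a bootstrap sample and ~sqrt(p) features."""
    rng = random.Random(seed)
    features = list(rows[0])
    m = max(1, round(math.sqrt(len(features))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # sample with replacement
        subset = rng.sample(features, m)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], subset))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(rule.get(row[f], fallback) for f, rule, fallback in forest)
    return votes.most_common(1)[0][0]


# Hypothetical student records (toy data, perfectly separable for clarity).
good = {"attendance": "high", "study_hours": "high", "assignments": "done"}
weak = {"attendance": "low", "study_hours": "low", "assignments": "missing"}
rows = [good] * 4 + [weak] * 4
labels = ["pass"] * 4 + ["fail"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, good), predict_forest(forest, weak))
```

Counting correct and incorrect predictions of `predict_forest` on held-out records gives the correctly and incorrectly classified instances that the abstract uses for comparison.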

Rini Indrayani

Blood donation is the process of drawing blood from a donor who has been declared eligible on the basis of several factors. The donor's diseases, age, body weight, blood pressure, hemoglobin level, and the interval between donations are the aspects considered in the eligibility test. Because this eligibility test is so important, various studies of donor eligibility have been carried out using data mining classification with different methods. The challenge in these studies is to find the most suitable method with high accuracy and precision. This study uses a dataset of 748 blood donation records, processed using the classification method Na

Irvi Oktanisa, Ahmad Afif Supianto

Classification is a data mining technique for grouping data according to how closely they match labeled sample data. In this paper, we compare nine classification techniques for classifying customer responses in the Bank Direct Marketing dataset, in order to determine which model classifies the target most effectively. The techniques used are Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, and CN2 Rule. The classification process begins with data preprocessing to remove missing values and select features. The evaluation stage uses 10-fold cross validation. The results show that the best accuracy is obtained by the Tree, Constant, Naive Bayes, and Stochastic Gradient Descent models, followed by Random Forest, K-Nearest Neighbor, CN2 Rule, AdaBoost, and Support Vector Machine. Of the four models with the best accuracy, Stochastic Gradient Descent was selected as the best model in this case, with an accuracy of 0.972 and a clearer visualization for classifying the target in the Bank Direct Marketing dataset.
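
The abstract does not state which loss the Stochastic Gradient Descent model minimizes. The sketch below shows the general technique, a linear classifier updated one example at a time, assuming logistic loss, a fixed learning rate, and hypothetical two-feature data rather than the Bank Direct Marketing attributes.

```python
import math
import random


def train_sgd_logistic(X, y, epochs=500, lr=0.1, seed=0):
    """Linear classifier with logistic (log) loss, updated one example at a time."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y[i]  # derivative of the log loss with respect to z
            w = [wj - lr * grad * xj for wj, xj in zip(w, X[i])]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Predict 1 when the linear score is positive, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Hypothetical scaled customer features; 1 = subscribed, 0 = did not subscribe.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.8], [0.7, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_sgd_logistic(X, y)
print([predict(w, b, x) for x in X])
```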

