PREDICTING THE GRADE POINT AVERAGE OF STUDENTS WHO WORK WHILE STUDYING AT UNIVERSITAS ADVENT INDONESIA USING THE C4.5 DECISION TREE METHOD AND SMOTE

Yusran Timur Samuel; Chrystle Beatrix Allbright Nahuway

doi:10.36342/teika.v10i01.2281


TeIKa

10.36342/teika.v10i01.2281

2020

Vol 10 (01)

pp. 69-77

Author(s):

Yusran Timur Samuel

Chrystle Beatrix Allbright Nahuway

Keyword(s):

Decision Tree

Cross Validation

Confusion Matrix

Split Test

Higher education is one way to find work more easily, because through education individuals can raise the quality of human resources in the present era. However, tuition is very expensive, so some individuals who want to attend university must also work at the same time. This study therefore aims to predict the grade point average (indeks prestasi, IP) of students who work while studying at Universitas Advent Indonesia. Eight attributes are used to predict student GPA at Universitas Advent Indonesia: work department, working hours, major, gender, place of residence, age, number of credit units (SKS), and the GPA itself, which serves as the class. The method is the C4.5 decision tree, implemented in WEKA as the J48 algorithm. The study also applies SMOTE (Synthetic Minority Oversampling Technique) to balance the amount of data in the minority classes. The root of the resulting tree is Gender, the attribute that most influences student GPA at Universitas Advent Indonesia. SMOTE raised the results by about 7-8 percentage points: with SMOTE, 10-fold cross-validation accuracy is 63.6672%, with average precision and recall of 0.621 and 0.637, and the 70:30 split test gives 62.7955% accuracy, with average precision and recall of 0.621 and 0.628. With the C4.5 decision tree alone, 10-fold cross-validation accuracy is 55.5044%, with average precision and recall of 0.545 and 0.555, and the 70:30 split test gives 55.2995% accuracy, with average precision and recall of 0.554 and 0.553.
Analysis with the confusion matrix and the ROC curve shows the ROC area rising from 0.688 to 0.756, which lies in the 0.70-0.80 range classed as fair classification. It can be concluded that working while studying has a fairly strong influence on student GPA. Ordered from the top of the tree, the attributes are Gender, Number of Credit Units, Major, Age, Work Department, Working Hours, and Place of Residence.
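SMOTE creates new minority-class records by interpolating between an existing minority record and one of its nearest minority-class neighbours. The sketch below illustrates that classic interpolation in plain Python. It assumes the attributes are already encoded as numbers, whereas several of the study's attributes (for example gender and major) are categorical and the study ran SMOTE inside WEKA; the function `smote`, its parameters, and the sample records are hypothetical and are not WEKA's API.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points made by SMOTE-style interpolation.

    minority -- list of numeric feature tuples from the minority class
    k        -- number of nearest minority neighbours to choose from
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


# Hypothetical minority-class records: (age, credit units) for an under-represented GPA band
minority = [(20.0, 18.0), (22.0, 20.0), (21.0, 24.0), (25.0, 16.0)]
for point in smote(minority, n_synthetic=3, k=2, seed=1):
    print(point)
```

Generating as many synthetic points as needed to reach the majority-class count balances the classes. When SMOTE is combined with cross-validation, synthetic records are usually generated from the training folds only, so that each test fold contains only real records.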

Fuzzy ID3 Method for Classifying the Preeclampsia Status of Pregnant Women

Teknika

10.34148/teknika.v9i1.270

2020

Vol 9 (1)

pp. 74-80

Author(s):

Yeni Kustiyahningsih

Mula’ab

Nur Hasanah

Keyword(s):

Decision Tree

Cross Validation

Information Gain

Confusion Matrix

Fuzzy Decision

Fuzzy Decision Tree

Fold Cross Validation

The maternal mortality rate (Angka Kematian Ibu, AKI) in Indonesia has risen steadily since 2007 (SDKI 2012). One of its main causes is hypertension; hypertension in pregnant women is referred to as preeclampsia. The Fuzzy Decision Tree Iterative Dichotomiser 3 (ID3) method is used to classify preeclampsia into three classes: normal, caution (mild preeclampsia), and danger (severe preeclampsia). The study uses six variables: systolic blood pressure, diastolic blood pressure, maternal age, gestational age, urine protein, and edema. The purpose of the classification is to help medical staff act on patients (pregnant women) so that diagnoses are accurate and decisions are reached faster. The steps of the ID3 method are initializing the fuzzy attribute values, computing entropy, and finding the information gain. The system is tested with k-fold cross-validation, and accuracy is computed from the confusion matrix. In testing, k = 5 gave the highest accuracy (98.44%), the highest precision (96.66%), and the highest recall (97.61%).
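In crisp ID3, entropy is computed from class counts; a common fuzzy ID3 formulation instead counts each record with its degree of membership in the current node. The sketch below assumes that formulation and a product t-norm for combining a node's membership with a linguistic term's membership. The abstract does not state which membership functions or t-norm the authors used, and the function names and example memberships are hypothetical.

```python
import math
from collections import defaultdict


def fuzzy_entropy(memberships, labels):
    """Entropy in bits, where each record counts with its membership degree."""
    weight = defaultdict(float)
    for mu, label in zip(memberships, labels):
        weight[label] += mu
    total = sum(weight.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weight.values() if w > 0)


def fuzzy_information_gain(node_mu, term_mus, labels):
    """Information gain of splitting a node on one fuzzy attribute.

    node_mu  -- membership of each record in the current node
    term_mus -- one list per linguistic term (e.g. "normal", "high"),
                giving each record's membership in that term
    """
    parent = fuzzy_entropy(node_mu, labels)
    # Child membership = node membership * term membership (product t-norm, an assumption)
    children = [[n * t for n, t in zip(node_mu, term)] for term in term_mus]
    sizes = [sum(child) for child in children]
    total = sum(sizes)
    if total == 0:
        return 0.0
    expected = sum(s / total * fuzzy_entropy(c, labels) for s, c in zip(sizes, children))
    return parent - expected


# Hypothetical example: systolic pressure split into "normal" and "high" terms
labels = ["normal", "normal", "mild", "severe"]
node = [1.0, 1.0, 1.0, 1.0]
normal_term = [0.9, 0.8, 0.3, 0.0]
high_term = [0.1, 0.2, 0.7, 1.0]
print(round(fuzzy_information_gain(node, [normal_term, high_term], labels), 3))
```

With memberships of only 0 and 1 this reduces to ordinary ID3 information gain; the attribute with the highest gain is chosen to split the node.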

Decision Tree Application to Classification Problems with Boosting Algorithm

Electronics

10.3390/electronics10161903

2021

Vol 10 (16)

pp. 1903

Author(s):

Long Zhao

Sanghyuk Lee

Seon-Phil Jeong

Keyword(s):

Decision Tree

Cross Validation

Confusion Matrix

Model Fitting

Decision Tree Model

Classification And Regression Tree

Tree Model

Data Set

Boosting Algorithm

Cart Algorithm

A personal credit evaluation algorithm is proposed by designing a decision tree with a boosting algorithm, and classification is carried out with it. Compared with the conventional decision tree algorithm, the boosting algorithm is shown to speed up processing. The Classification and Regression Tree (CART) algorithm with boosting achieved 90.95% accuracy, slightly higher than the 90.31% achieved without boosting. To avoid overfitting the model to the training set because of an unreasonable division of the data set, we consider cross-validation and illustrate the results with simulation; hyperparameters are applied to the model and the model-fitting effect is verified. The proposed decision tree model is fitted optimally with the help of a confusion matrix. Relevant evaluation indicators are also introduced to evaluate the performance of the proposed model. For comparison with conventional methods, accuracy rate, error rate, precision, recall, and related measures are also reported, and model performance is evaluated comprehensively based on accuracy after 10-fold cross-validation. The results show that boosting improves the accuracy and precision of the model when CART is applied, but model fitting takes much longer, around 2 min. These results verify that the performance of the decision tree model improves under boosting. We also test the performance of the proposed verification model with model fitting, and it could be applied to predicting customers’ decisions to subscribe to the fixed deposit business.
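The accuracy rate, error rate, precision, and recall mentioned above all derive from the four cells of a binary confusion matrix. A minimal sketch, with hypothetical counts and "subscribes to a fixed deposit" treated as the positive class (an assumption for illustration):

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix (positive vs. negative class)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a "subscribes to fixed deposit" classifier
print(binary_metrics(tp=40, fp=10, fn=5, tn=45))
```

The error rate is the complement of accuracy, so the two always sum to 1.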

COMPARISON OF THE ACCURACY OF THE KNN AND DECISION TREE METHODS IN PREDICTING STUDENT STUDY DURATION

Jurnal Ilmiah Binary STMIK Bina Nusantara Jaya Lubuklinggau

10.52303/jb.v3i1.40

2021

Vol 3 (1)

pp. 6-14

Author(s):

Endang Etriyanti

Keyword(s):

Data Mining

Decision Tree

Cross Validation

Nearest Neighbor

Confusion Matrix

K Nearest Neighbor

Fold Cross Validation

One indicator of the quality of a university's graduates is how long students take to complete their studies. Study duration also reflects students' level of achievement in their education, and it strongly affects the quality of a study program, because it is one of the criteria used in accreditation. A common problem for universities is that many students take longer than the prescribed period to finish, and STMIK Bina Nusantara Jaya Lubuklinggau faces this problem as well. To anticipate it, study duration needs to be predicted, since it is one of the important matters that study programs must monitor. This research contributes theoretically to applying data mining to predict student study duration. It applies data preprocessing to obtain good-quality data before mining with the K-Nearest Neighbor and Decision Tree methods in RapidMiner; both methods are validated with K-Fold Cross Validation (10 iterations), and a confusion matrix is used to validate the accuracy of the predictions. The method with the higher accuracy is recommended for the study-duration prediction problem. The results show that the Decision Tree is more accurate (60.38%) than K-Nearest Neighbor (53.08%).
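The study ran K-Nearest Neighbor under 10-fold cross-validation in RapidMiner. The plain-Python sketch below shows what that procedure does: each fold is held out once as the test set while the remaining folds train the model, and accuracy is pooled over all held-out records. The features, labels, and function names are hypothetical, and the folds are shuffled but not stratified, for brevity.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and cut it into k folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(order[start:start + size])
        start += size
    return folds


def cross_val_accuracy(X, y, n_folds=10, k=3, seed=0):
    """Pooled accuracy of k-NN over n_folds train/test rotations."""
    correct = 0
    for test_idx in k_fold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical records: (GPA, credits passed per semester) -> finished on time or late
X = [(3.5, 20), (3.2, 21), (3.8, 22), (3.4, 19), (2.1, 12), (2.4, 10), (1.9, 14), (2.6, 11)]
y = ["on_time"] * 4 + ["late"] * 4
print(cross_val_accuracy(X, y, n_folds=4, k=3))
```

The same loop works for a decision tree by swapping `knn_predict` for that model's train-and-predict step, which is how the two methods can be compared on identical folds.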

PREDICTING THE WATER QUALITY OF THE CILIWUNG RIVER USING THE DECISION TREE ALGORITHM

Jurnal Air Indonesia

10.29122/jai.v12i2.4364

2021

Vol 12 (2)

Author(s):

Mohammad Haekal

Henki Bayu Seta

Mayanda Mega Santoni

Keyword(s):

Data Mining

Decision Tree

Cross Validation

Online Monitoring

Training Set

Microsoft Excel

Test Set

To predict the water quality of the Ciliwung River, data from online monitoring were processed with data mining. First, the monitoring data were arranged as a Microsoft Excel table and then processed into a decision tree using the WEKA application. The decision tree method was chosen because it is simple, easy to understand, and highly accurate. A total of 5,476 Ciliwung River monitoring records were processed. Classification with the decision tree showed that 1,059 of the 5,476 records (19.3242%) indicate that the Ciliwung River is Not Polluted and 4,417 records (80.6758%) indicate that it is Polluted. The monitoring data were then evaluated with four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split 66%. All four test options gave very high accuracy, above 99%. From these results, the Ciliwung River is indicated to be polluted with reference to Government Regulation of the Republic of Indonesia No. 82 of 2001, and using WEKA's decision tree algorithm on the monitoring data with three parameters (pH, DO, and nitrate) proves highly accurate. Keywords: river water quality, data mining, decision tree algorithm, WEKA application.

COMPARISON OF NAIVE BAYES ALGORITHM AND C.45 ALGORITHM IN CLASSIFICATION OF POOR COMMUNITIES RECEIVING NON CASH FOOD ASSISTANCE IN WANASARI VILLAGE KARAWANG REGENCY

Jurnal Techno Nusa Mandiri

10.33480/techno.v17i1.1191

2020

Vol 17 (1)

pp. 37-42

Author(s):

Yuris Alkhalifi

Ainun Zumarniansyah

Rian Ardianto

Nila Hardi

Annisa Elfina Augustia

Keyword(s):

Decision Tree

Naive Bayes

Confusion Matrix

Total Sample

Naïve Bayes

Food Assistance

Training Data

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

Non-Cash Food Assistance (Bantuan Pangan Non-Tunai, BPNT) is government food assistance given to Beneficiary Families (KPM) every month through an electronic account that can be used only to buy food at the Electronic Mutual-Assistance Shop of the Joint Business Group of the Family Hope Program (e-Warong KUBE PKH) or at food traders working with Bank Himbara. BPNT distribution still has problems; in particular, the village apparatus of Desa Wanasari must decide who is eligible to receive assistance (poor) and who is not (not poor). Data mining is one way to support these decisions. This study compares two algorithms, the Naive Bayes classifier and the C4.5 decision tree. The sample consists of 200 household-head records, split for validation into 90% training data and 10% test data; the models are built in RapidMiner and evaluated with a confusion matrix to find which of the two methods is more accurate. The Naive Bayes classifier achieves 98.89% accuracy and the C4.5 decision tree 95.00%. The study concludes that the Naive Bayes classifier is the more accurate algorithm, by a margin of 3.89 percentage points.
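For categorical household attributes, a Naive Bayes classifier multiplies the class prior by the likelihood of each observed attribute value and picks the class with the highest product. The sketch below works in log space and uses Laplace (add-one) smoothing. The attribute names and records are hypothetical, since the abstract does not list the variables, and RapidMiner's own Naive Bayes operator may differ in details such as smoothing.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[c][j][v]: how often attribute j takes value v in class c
    value_counts = defaultdict(lambda: defaultdict(Counter))
    distinct = defaultdict(set)  # distinct values seen for each attribute j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            distinct[j].add(v)
    return class_counts, value_counts, distinct


def predict_naive_bayes(model, row):
    """Return the class with the highest log posterior for one record."""
    class_counts, value_counts, distinct = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # log prior P(c)
        for j, v in enumerate(row):
            # Laplace smoothing keeps an unseen value from zeroing the product
            score += math.log((value_counts[c][j][v] + 1) / (n_c + len(distinct[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical records: (head of household's occupation, house ownership)
rows = [("laborer", "rent"), ("laborer", "rent"), ("farmer", "rent"),
        ("civil_servant", "own"), ("trader", "own"), ("civil_servant", "own")]
labels = ["poor", "poor", "poor", "not_poor", "not_poor", "not_poor"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("laborer", "rent")))
```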

Evaluation of Bidikmisi Recipient Classification Using the Iterative Dichotomiser 3 (ID3) Algorithm (Case Study: Institut Agama Islam Negeri Samarinda)

Sains, Aplikasi, Komputasi dan Teknologi Informasi

10.30872/jsakti.v2i1.2615

2020

Vol 2 (1)

pp. 1

Author(s):

Jainuddin Jainuddin

Islamiyah Islamiyah

Gubtha Mahendra Putra

Haviluddin Haviluddin

Vina Zahrotun Kamila

Keyword(s):

Data Mining

Decision Tree

Error Rate

Confusion Matrix

A popular data-mining classification technique is the decision tree, including the Iterative Dichotomiser 3 (ID3) algorithm. Classes are assigned by the decision tree that ID3 builds, and the algorithm's accuracy and error rate in assigning them are measured. This is done by building the decision tree model in the RapidMiner machine-learning tool from training data, then comparing the actual labels with the classifications produced for the test data to measure accuracy. The aim of this study is to produce classification information on eligibility for the Bidikmisi scholarship using ID3 at Institut Agama Islam Negeri (IAIN) Samarinda, and to measure the accuracy of the algorithm. The variables are parents' occupation, parents' income, number of family members, home ownership status, family expenditure, and SKTM/KIP ownership status. Measuring the algorithm's performance with a confusion matrix gives an accuracy of 98.3% and an error rate of 1.7% in classifying Bidikmisi eligibility at IAIN Samarinda.

Multi-Class Taxonomy of Well Integrity Anomalies Applying Inductive Learning Algorithms: Analytical Approach for Artificial-Lift Wells

10.2118/206129-ms

2021

Author(s):

Mostafa Sa'eed Yakoot

Adel Mohamed Salem Ragab

Omar Mahmoud

Keyword(s):

Decision Tree

Confusion Matrix

Learning Algorithms

Oil And Gas Industry

Classification Model

Gradient Boosting

Support Vector

Risk Category

Well Integrity

Extreme Gradient Boosting

Well integrity has become a crucial field that receives increasing attention and is published on intensively in industry research. It is important to maintain the integrity of each well so that wells operate as expected for their designated life (or longer), with all risks kept as low as reasonably practicable, or as specified. Machine learning (ML) and artificial intelligence (AI) models are now used intensively in the oil and gas industry. ML relies on powerful algorithms and a robust database. Developing an efficient classification model for well integrity (WI) anomalies is now feasible because the database holds an enormous number of well failures, well barrier integrity tests, and analyses. About 9,000 data points were collected from WI tests performed on 800 wells in the Gulf of Suez, Egypt, over almost 10 years, and the data were quality-controlled and quality-assured by experienced engineers. The data contain different forms of WI failure, and the contributing parameter set includes a total of 23 barrier elements. The data were structured and fed into 11 different ML algorithms to build an automated, systematic tool for calculating the imposed risk category of any well. The deployed models were compared to identify the most reliable predictive model. The 11 models include both supervised and ensemble learning algorithms, such as random forest, support vector machine (SVM), decision tree, and scalable boosting techniques. Of the 11 models, extreme gradient boosting (XGB), categorical boosting (CatBoost), and decision tree proved the most reliable. In addition, novel evaluation metrics for each model's confusion matrix are introduced to overcome a limitation of existing metrics, which do not consider domain knowledge during model evaluation. The resulting model will help use company resources efficiently and focus personnel effort on high-risk wells.
The expected result is progressive improvement in business, safety, environmental, and operational performance. This paper aims to be a milestone in the design and creation of a Well Integrity Database Management Program that combines integrity management and ML.
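The abstract does not define its domain-aware confusion-matrix metrics. For reference, the standard per-class precision and recall that such metrics build on can be read off a multi-class confusion matrix as below; the risk-category names and counts are hypothetical.

```python
def per_class_metrics(cm, classes):
    """cm[i][j] = number of records of true class classes[i] predicted as classes[j]."""
    total = sum(sum(row) for row in cm)
    metrics = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum
        actual = sum(cm[i])                    # row sum (the class's support)
        metrics[name] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
            "support": actual,
        }
    weighted_precision = sum(m["precision"] * m["support"] for m in metrics.values()) / total
    weighted_recall = sum(m["recall"] * m["support"] for m in metrics.values()) / total
    return metrics, weighted_precision, weighted_recall


# Hypothetical confusion matrix for three well-risk categories
classes = ["low", "medium", "high"]
cm = [[50, 5, 0],
      [10, 30, 5],
      [0, 5, 15]]
per_class, w_prec, w_rec = per_class_metrics(cm, classes)
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
print(per_class["high"], round(w_prec, 3), round(w_rec, 3), round(accuracy, 3))
```

Weighting recall by each class's support always reproduces overall accuracy, because the weights cancel the per-class denominators. Metrics that encode domain knowledge typically go further, for example by penalising a high-risk well predicted as low risk more than the reverse.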

Comparison of Levenshtein Distance and Jaro-Winkler Distance for Word Correction in the Preprocessing Stage of Twitter User Sentiment Analysis

Jurnal Fokus Elektroda: Energi Listrik, Telekomunikasi, Komputer, Elektronika dan Kendali

10.33772/jfe.v6i2.17751

2021

Vol 6 (2)

pp. 88

Author(s):

M. Adnan Nur

Keyword(s):

Cross Validation

Confusion Matrix

Levenshtein Distance

Fold Cross Validation

Sentiment analysis of Twitter users requires a preprocessing stage before sentiment is classified. Preprocessing filters out the words needed for classification. Spelling errors in tweets are a problem at this stage and affect classification accuracy, so an additional preprocessing step is needed to correct misspelled words. In this study, the author compares the performance of Levenshtein distance and Jaro-Winkler distance for correcting spelling errors. The study begins with a literature survey to identify the problem, followed by a literature study to determine the objects and parameters needed to design and model the data and the software. The software is developed in Python with the sastrawi, levenshtein, pyjarowinkler, and sklearn libraries, and is built to make the performance of the methods easy to inspect. Testing uses a confusion matrix with 10-fold cross-validation. The tests measure the performance of Levenshtein distance placed before and after stemming, and likewise of Jaro-Winkler distance placed before and after stemming, within preprocessing. The results show that Levenshtein distance achieves better accuracy, recall, and F1 score than Jaro-Winkler distance. Word correction with Levenshtein distance also improves accuracy, recall, and F1 score compared with preprocessing without word correction. As for placement, the tests show that word correction works better after stemming than before it.
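The two distances compared in this study can be sketched from their standard definitions: Levenshtein distance counts single-character insertions, deletions, and substitutions, while Jaro-Winkler scores similarity from matching characters and transpositions, plus a bonus for a shared prefix (scaling factor 0.1 and a prefix capped at 4 characters, the usual defaults). This is a from-scratch illustration, not the API of the `levenshtein` or `pyjarowinkler` libraries the author used; the `correct` helper and the tiny vocabulary are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def jaro(a, b):
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags, b_flags = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved
    a_matched = [c for c, f in zip(a, a_flags) if f]
    b_matched = [c for c, f in zip(b, b_flags) if f]
    t = sum(x != y for x, y in zip(a_matched, b_matched)) / 2
    m = matches
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a, b, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def correct(word, vocabulary, use="levenshtein"):
    """Replace a token with the closest vocabulary word under the chosen measure."""
    if use == "levenshtein":
        return min(vocabulary, key=lambda v: levenshtein(word, v))
    return max(vocabulary, key=lambda v: jaro_winkler(word, v))


vocabulary = ["makan", "minum", "main"]
print(correct("mkan", vocabulary), correct("mkan", vocabulary, use="jaro_winkler"))
```

Note the opposite directions: Levenshtein is a distance (lower is closer), while Jaro-Winkler is a similarity (higher is closer), which is why `correct` uses `min` for one and `max` for the other.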

A novel astrophysics-based framework for prediction of binding affinity of glucose binder

Modern Physics Letters B

10.1142/s0217984920503467

2020

Vol 34 (31)

pp. 2050346

Author(s):

Rajesh Kondabala

Vijay Kumar

Amjad Ali

Manjit Kaur

Keyword(s):

Decision Tree

Binding Affinity

Cross Validation

Learning Strategy

Experimental Results

The Other

Computational Time

Glucose Binding

Regression Algorithms

In this paper, a novel astrophysics-based prediction framework is developed to estimate the binding affinity of a glucose binder. The framework uses molecular properties to predict binding affinity, together with an astrophysics-learning strategy that incorporates the concepts of Kepler’s law during prediction. It is compared with 10 regression algorithms on the ZINC dataset. Experimental results show that the proposed framework predicts binding affinity with 99.30% accuracy, whereas the decision tree achieves 97.14%. Cross-validation results show that the proposed framework is more accurate than the other existing models. The framework enables researchers to screen glucose binders rapidly and reduces the computational time needed to design small glucose-binding molecules.

Towards Optimization of Boosting Models for Formation Lithology Identification

Mathematical Problems in Engineering

10.1155/2019/5309852

2019

Vol 2019

pp. 1-13

Cited By ~ 1

Author(s):

Yunxin Xie

Chenyang Zhu

Yue Lu

Zhengwei Zhu

Keyword(s):

Cross Validation

Confusion Matrix

Petroleum Engineering

Gradient Boosting

Gas Field

Lithology Identification

Extreme Gradient Boosting

Evaluation Matrix

Fold Cross Validation

Geological Research

Lithology identification is an indispensable part of geological research and petroleum engineering. In recent years, several mathematical approaches have been used to improve the accuracy of lithology classification. Building on our earlier work assessing machine learning models for formation lithology classification, we optimize boosting approaches to improve the classification ability of our boosting models, using data collected from the Daniudi and Hangjinqi gas fields. Three boosting models, AdaBoost, Gradient Tree Boosting, and eXtreme Gradient Boosting, are evaluated with 5-fold cross-validation. Regularization is applied to Gradient Tree Boosting and eXtreme Gradient Boosting to avoid overfitting. After tuning the hyperparameters of each boosting model, we use stacking to combine the three optimized models and improve classification accuracy. The results suggest that the optimized stacked boosting model outperforms each single optimized boosting model on evaluation metrics such as precision, recall, and F1 score. The confusion matrix also shows that the stacked model is better at distinguishing sandstone classes.
