Penerapan Metode Random Over-Under Sampling dan Random Forest Untuk Klasifikasi  Penilaian Kredit

AbstrakPenilaian kredit telah menjadi salah satu cara utama bagi sebuah lembaga keuangan untuk menilai resiko kredit, meningkatkan arus kas, mengurangi kemungkinan resiko dan membuat keputusan manajerial. Salah satu permasalahan yang dihadapai pada penilaian kredit yaitu adanya ketidakseimbangan distribusi dataset. Metode untuk mengatasi ketidakseimbangan kelas yaitu dengan metode resampling, seperti menggunakan Oversampling, undersampling dan hibrida yaitu dengan menggabungkan kedua pendekatan sampling. Metode yang diusulkan pada penelitian ini adalah penerapan metode Random Over-Under Sampling Random Forest untuk meningkatkan kinerja akurasi klasifikasi penilaian kredit pada dataset German Credit. Hasil pengujian menunjukan bahwa klasifikasi tanpa melalui proses resampling menghasilkan kinerja akurasi rata-rata 70 % pada semua classifier. Metode Random Forest memiliki nilai akurasi yang lebih baik dibandingkan dengan beberapa metode lainnya dengan nilai akurasi sebesar 0,76 atau 76%. Sedangkan klasifikasi dengan penerapan metode Random Over-under sampling Random Forest dapat meningkatkan kinerja akurasi sebesar 14,1% dengan nilai akurasi sebesar 0,901 atau 90,1 %. Hasil penelitian menunjukan bahwa penerapan resampling dengan metode Random Over-Under Sampling pada algoritma Random Forest dapat meningkatkan kinerja akurasi secara efektif pada klasifikasi tidak seimbang untuk penilaian kredit pada dataset German Credit. Kata kunci: Penilaian Kredit, Random Forest, Klasifikasi, ketidakseimbangan kelas, Random Over-Under Sampling AbstractCredit scoring has become one of the main ways for a financial institution to assess credit risk, improve cash flow, reduce the possibility of risk and make managerial decisions. One of the problems faced by credit scoring is the imbalance in the distribution of datasets. The method to overcome class imbalances is the resampling method, such as using Oversampling, undersampling and hybrids by combining both sampling approaches. The method proposed in this study is the application of the Random Over-Under Sampling Random Forest method to improve the accuracy of the credit scoring classification performance on German Credit dataset. The test results show that the classification without going through the resampling process results in an average accuracy performance of 70% for all classifiers. The Random Forest method has a better accuracy value compared to some other methods with an accuracy value of 0.76 or 76%. While classification by applying the Random Over-under sampling + Random Forest method can improve accuracy performance 14.1% with an accuracy value of 0.901 or 90.1%. The results showed that the application of resampling using Random Over-Under Sampling method in the Random Forest algorithm can improve accuracy performance effectively on an unbalanced classification for credit scoring on German Credit dataset. Keywords: Imbalance Class, Credit Scoring, Random Forest, Classification, Resampling

Download Full-text

Penerapan Metode Random Over-Under Sampling dan Random Forest Untuk Klasifikasi Penilaian Kredit

Jurnal Informatika ◽

10.31294/ji.v5i2.4158 ◽

2018 ◽

Vol 5 (2) ◽

pp. 175-185

Author(s):

Akhmad Syukron ◽

Agus Subekti

Keyword(s):

Random Forest ◽

Financial Institution ◽

Credit Scoring ◽

Classification Performance ◽

Random Forest Classification ◽

Random Forest Method ◽

Forest Classification ◽

Improve Accuracy ◽

Under Sampling ◽

Accuracy Performance

Download Full-text

An Experimental Assessment of Random Forest Classification Performance Improvisation with Sampling and Stage Wise Success Rate Calculation

Procedia Computer Science ◽

10.1016/j.procs.2020.03.381 ◽

2020 ◽

Vol 167 ◽

pp. 1711-1721

Author(s):

Anjali S. More ◽

Dipti P. Rana

Keyword(s):

Random Forest ◽

Success Rate ◽

Classification Performance ◽

Experimental Assessment ◽

Random Forest Classification ◽

Forest Classification ◽

Rate Calculation

Download Full-text

The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance

Journal of Spectroscopy ◽

10.1155/2018/8316918 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Na’eem Hoosen Agjee ◽

Onisimo Mutanga ◽

Kabir Peerbhay ◽

Riyad Ismail

Keyword(s):

Random Forest ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Classification ◽

Node Splitting ◽

Forest Classification ◽

Vector Machines ◽

Classifier Performance ◽

The Impact

Hyperspectral datasets contain spectral noise, the presence of which adversely affects the classifier performance to generalize accurately. Despite machine learning algorithms being regarded as robust classifiers that generalize well under unfavourable noisy conditions, the extent of this is poorly understood. This study aimed to evaluate the influence of simulated spectral noise (10%, 20%, and 30%) on random forest (RF) and oblique random forest (oRF) classification performance using two node-splitting models (ridge regression (RR) and support vector machines (SVM)) to discriminate healthy and low infested water hyacinth plants. Results from this study showed that RF was slightly influenced by simulated noise with classification accuracies decreasing for week one and week two with the addition of 30% noise. In comparison to RF, oRF-RR and oRF-SVM yielded higher test accuracies (oRF-RR: 5.36%–7.15%; oRF-SVM: 3.58%–5.36%) and test kappa coefficients (oRF-RR: 10.72%–14.29%; oRF-SVM: 7.15%–10.72%). Notably, oRF-RR test accuracies and kappa coefficients remained consistent irrespective of simulated noise level for week one and week two while similar results were achieved for week three using oRF-SVM. Overall, this study has demonstrated that oRF-RR can be regarded a robust classification algorithm that is not influenced by noisy spectral conditions.

Download Full-text