Classification of NEET Status among Young People in Indonesia Using SVM and Random Forest

2021 ◽  
Vol 1 (2) ◽  
pp. 42-52
Author(s):  
Herdina Dwi Ramadhanti

Not in Education, Employment, or Training (NEET) is an indicator of how vulnerable young people are to unemployment, dropping out of school, and discouragement with the labor market. According to the ILO, Indonesia has one of the highest NEET rates in Asia, which makes it a problem that needs to be addressed promptly. One way to address it is early detection of individuals at risk of becoming NEET, based on characteristics inherent to the individual such as sex, marital status, and disability. This study classifies NEET status among young people, so that it can be predicted whether an individual is NEET, using two classification methods: Support Vector Machine (SVM) and random forest. The data are secondary data obtained from the raw data of the August 2018 SAKERNAS (National Labor Force Survey). The results show that random forest yields the higher accuracy, 82.94 percent, and is therefore better able to classify young people by NEET status. This method can therefore be used to predict NEET status in support of reducing the NEET percentage in Indonesia.

Heart disease now affects people of every age. Despite technological advances, predicting heart disease remains a challenging problem, largely because symptoms are often not apparent. According to the World Health Organization, 33% of the population died due to heart disease. Diagnosis therefore relies on a complex combination of clinical data. Against this background, we used the Heart Disease Prediction dataset from the UCI Machine Learning Repository to predict the level of heart disease. The prediction of heart disease classes is carried out in four steps. First, the data set is preprocessed with feature scaling and missing-value handling. Second, the raw data set is fitted to classifiers including logistic regression, KNN, Support Vector Machine, Kernel Support Vector Machine, Naive Bayes, Random Forest, and Decision Tree. Third, the raw data set is subjected to dimensionality reduction using Principal Component Analysis (PCA) to project it onto its most important components, and the PCA-reduced data set is fitted to the same classifiers. Fourth, the raw and PCA-reduced data sets are compared using the performance metrics precision, recall, accuracy, and F-score. The implementation uses Python in the Spyder environment with Anaconda Navigator. Experimental results show that Random Forest is the most effective classifier, with an accuracy of 89% without PCA, 85% with five-component PCA, and 86% with seven-component PCA.
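The first step in the abstract above (missing values and feature scaling) could look roughly like the following. This is a minimal sketch in plain Python, assuming mean imputation and z-score standardization; the abstract does not state which imputation or scaling method was used, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Scale a column to zero mean and unit variance (z-score, assumed scaling)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def preprocess(rows):
    """Impute and scale every feature column of a row-major table."""
    columns = [list(col) for col in zip(*rows)]
    cleaned = [standardize(impute_mean(col)) for col in columns]
    return [list(row) for row in zip(*cleaned)]


# Hypothetical toy rows (two numeric features); not taken from the UCI data set.
toy_rows = [
    [63, 233.0],
    [41, None],
    [56, 256.0],
    [57, 192.0],
]
scaled = preprocess(toy_rows)
```

After this step each feature column of `scaled` has mean 0 and unit variance, which matters for distance-based classifiers such as KNN and SVM and for PCA.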


2020 ◽  
Author(s):  
Zhanyou Xu ◽  
Andreomar Kurek ◽  
Steven B. Cannon ◽  
Williams D. Beavis

Abstract: Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to compare the most commonly used genomic prediction method, ridge regression, and its equivalent logistic ridge regression method with algorithmic modeling methods including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network, using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines for use in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that Support Vector Machine provided the best specificity for culling IDC-susceptible lines, while Random Forest GP models provided the best combined set of decision metrics for retaining IDC-tolerant and culling IDC-susceptible lines.
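Several of the decision metrics named in the abstract above follow directly from a confusion matrix. The sketch below is illustrative only: it assumes the ordinal IDC scores have been collapsed into a binary retain (1) or cull (0) decision, which the abstract does not specify, and the labels are made up. Decision accuracy and the area under the ROC curve are left out because the abstract does not define the former and the latter needs ranked scores rather than hard labels.

```python
def decision_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, precision and accuracy for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # tolerant lines correctly retained
        "specificity": ratio(tn, tn + fp),  # susceptible lines correctly culled
        "precision": ratio(tp, tp + fp),    # retained lines that are truly tolerant
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical labels: 1 = tolerant (retain), 0 = susceptible (cull).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = decision_metrics(y_true, y_pred)
```

With these toy labels the overall accuracy is 0.7 while sensitivity is 0.75 and precision is 0.6, which shows how a single accuracy figure can hide the trade-off between retaining and culling errors.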


2019 ◽  
Vol 71 (3) ◽  
pp. 702-725
Author(s):  
Nayara Vasconcelos Estrabis ◽  
José Marcato Junior ◽  
Hemerson Pistori

The Cerrado is one of Brazil's biomes and the second largest in South America. It is highly important because of its biodiversity and ecosystem, and above all because it serves as a reservoir, or "sponge", that distributes water to the other biomes, besides being the source of the headwaters of some of South America's largest river basins. However, owing to human activities (notably cattle ranching and silviculture) and the reduction of native vegetation, this biome is under threat. Considered a biodiversity hotspot, the Cerrado may no longer exist by 2050. Given the need to preserve it, this work investigated the use of machine learning algorithms to map the native vegetation in the region of the municipality of Três Lagoas, using the Google Earth Engine cloud platform. The process used a Landsat-8 OLI image dated October 10, 2018, and the Random Forest (RF) and Support Vector Machine (SVM) algorithms. In validating the classification, RF and SVM achieved kappa indices of 0.94 and 0.97, respectively. Compared with SVM, RF produced a noisier classification. Finally, approximately 2556 km² of native vegetation was found when using RF and 2873 km² when using SVM.
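The kappa index used for validation in the abstract above measures agreement between the classified map and reference samples after correcting for chance agreement: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A minimal sketch of Cohen's kappa in plain Python follows; the confusion matrix is a hypothetical two-class example (native vegetation versus other), not the study's validation data.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical validation counts, not taken from the study.
toy_confusion = [
    [45, 5],   # reference: native vegetation
    [3, 47],   # reference: other land cover
]
kappa = cohen_kappa(toy_confusion)
```

For this toy matrix the observed agreement is 0.92 and the chance agreement is 0.5, giving a kappa of 0.84.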

