Resampling Techniques for Handling Class Imbalance in Diabetes Classification Using C4.5, Random Forest, and SVM

Techno Com ◽  
2021 ◽  
Vol 20 (3) ◽  
pp. 352-361
Author(s):  
Wahyu Nugraha ◽  
Raja Sabaruddin

The number of people with diabetes worldwide continues to rise, with 4.6 million deaths in 2011, and is projected to keep increasing globally to 552 million by 2030. Diabetes may be prevented effectively by detecting it early. Data mining and machine learning continue to be developed into reliable tools for building computational models that identify diabetes at an early stage. However, a problem often encountered when analyzing diabetes is class imbalance. Imbalanced classes make prediction difficult because the learning model is dominated by majority-class instances and therefore neglects predictions for the minority class. In this study we analyze and attempt to address the class imbalance problem using a data-level approach, namely data resampling techniques. The experiments use the R language with the ROSE library (version 0.0-4). The Pima Indians dataset was chosen for this study because it is one of the datasets that suffer from class imbalance. The classification models in this study use the C4.5 decision tree, RF (Random Forest), and SVM (Support Vector Machines) algorithms. In the experiments, the SVM classification model with a resampling technique that combines over- and under-sampling performed best, with an AUC (Area Under Curve) of 0.80.
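As an illustration of the data-level idea in this abstract, the following is a minimal sketch of random resampling that combines over- and under-sampling. It is not the ROSE implementation used by the authors: the function name `resample_both`, the parameter `p` (target share of minority rows), and the fixed output size are assumptions made for this example, written in Python rather than R.

```python
import random


def resample_both(rows, labels, minority, p=0.5, seed=0):
    """Return a resampled copy of (rows, labels) with about p minority rows.

    Hypothetical helper for illustration only. The output has the same
    number of rows as the input: the minority class is over-sampled (drawn
    with replacement) and the majority class is under-sampled (drawn
    without replacement when enough rows are available).
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")

    n_total = len(rows)
    n_min = round(n_total * p)   # rows to draw from the minority class
    n_maj = n_total - n_min      # rows to draw from the majority class

    # Over-sampling: duplicates of minority rows are allowed.
    picked = rng.choices(minority_idx, k=n_min)
    # Under-sampling: keep only a random subset of the majority rows.
    if n_maj <= len(majority_idx):
        picked += rng.sample(majority_idx, n_maj)
    else:
        picked += rng.choices(majority_idx, k=n_maj)

    rng.shuffle(picked)
    return [rows[i] for i in picked], [labels[i] for i in picked]
```

Resampling of this kind would normally be applied to the training portion only, so that the evaluation data keep their original class distribution.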

Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5896
Author(s):  
Eddi Miller ◽  
Vladyslav Borysenko ◽  
Moritz Heusinger ◽  
Niklas Niedner ◽  
Bastian Engelmann ◽  
...  

Changeover times are an important element when evaluating the Overall Equipment Effectiveness (OEE) of a production machine. The article presents a machine learning (ML) approach that is based on an external sensor setup to automatically detect changeovers in a shopfloor environment. The door statuses, coolant flow, power consumption, and operator indoor GPS data of a milling machine were used in the ML approach. As ML methods, Decision Trees, Support Vector Machines, (Balanced) Random Forest algorithms, and Neural Networks were chosen, and their performance was compared. The best results were achieved with the Random Forest ML model (97% F1 score, 99.72% AUC score). It was also found that model performance is optimal when only a binary classification into a changeover phase and a production phase is considered and fewer subphases of the changeover process are used.
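The F1 score reported for the binary changeover/production setting is the harmonic mean of precision and recall. The sketch below shows that calculation under stated assumptions: the function name `f1_score_binary` and the string labels are hypothetical, and nothing here reproduces the authors' evaluation code.

```python
def f1_score_binary(y_true, y_pred, positive):
    """Compute the F1 score for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with "changeover" as the positive class, two correct detections, one false alarm, and one missed changeover give precision and recall of 2/3 each, and therefore an F1 score of 2/3.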


Stock market prediction is the act of attempting to determine the future value of a company's stock or other financial instrument traded on an exchange. Successfully predicting the future price of a stock can maximize the investor's gains. This article proposes a machine learning model to forecast stock market prices. While considering the various techniques and factors involved, we found that methods such as random forest and support vector machines were not fully exploited in past approaches. In this article, we present and review a more suitable method for predicting stock movements with greater accuracy. The first thing we considered was the stock market price dataset from Yahoo stocks. We review the use of random forest after pre-processing the data, apply the support vector machine to the dataset, and examine the results it produces. An effective stock forecast will be a valuable asset for stock market institutions and will offer real solutions to the difficulties facing stock investors.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tom Elliot ◽  
Robert Morse ◽  
Duane Smythe ◽  
Ashley Norris

Abstract: It is 50 years since Sieveking et al. published their pioneering research in Nature on the geochemical analysis of artefacts from Neolithic flint mines in southern Britain. In the decades since, geochemical techniques to source stone artefacts have flourished globally, with a renaissance in recent years from new instrumentation, data analysis, and machine learning techniques. Despite the interest in these latter approaches, there has been variation in the quality with which these methods have been applied. Using the case study of flint artefacts and geological samples from England, we present a robust and objective evaluation of three popular techniques, Random Forest, K-Nearest-Neighbour, and Support Vector Machines, and present a pipeline for their appropriate use. When evaluated correctly, the results establish high model classification performance, with Random Forest leading with an average accuracy of 85% (measured through F1 Scores), and with Support Vector Machines following closely. The methodology developed in this paper demonstrates the potential to significantly improve on previous approaches, particularly in removing bias, and providing greater means of evaluation than previously utilised.


Author(s):  
Farshid Bagheri Saravi ◽  
Shadi Moghanian ◽  
Giti Javidi ◽  
Ehsan O Sheybani

Disease-related data and information collected by physicians, patients, and researchers seem insignificant at first glance. Still, the same unorganized data contain valuable information that is often hidden. The task of data mining techniques is to extract patterns to classify the data accurately. Data mining and its methods have often been used to diagnose various diseases. In this study, a machine learning (ML) technique based on distributed computing in the Apache Spark computing space is used to diagnose diabetes, or to detect hidden patterns of the illness, using a large dataset in real time. Implementation results of three ML techniques, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), in the Apache Spark computing environment using the Scala programming language and WEKA show that RF is more efficient and faster at diagnosing diabetes in big data.


2021 ◽  
Vol 10 (2) ◽  
pp. 111-117
Author(s):  
Yulia Aryani ◽  
Arie Wahyu Wijayanto

ABSTRACT – Classification is one of the main topics in data mining and machine learning. Classification is the grouping of data in which the data used have a class label or target. Classification is used to take data and place them into particular groups. The study of the ionosphere is important for research in many domains, particularly communication systems. In ionosphere research, radar data from the ionosphere need to be classified as useful or not useful. In this paper, classification is performed on the ionosphere data taken from the UCI machine learning repository. Classification is carried out using three methods: SVM (Support Vector Machine), Naïve Bayes, and Random Forest. The results of these experiments show the predictions of each experiment, with accuracy and prediction levels that differ for each method used. The best accuracy, precision, and recall were obtained with the Random Forest method at a training-to-test data ratio of 85%, giving a test-data accuracy of 90.57% with a precision of 94.12%. Keywords – Ionosphere; Classification; SVM; Naïve Bayes; Random Forest.


Author(s):  
Adeel Ahmed ◽  
Kamlesh Kumar ◽  
Mansoor A. Khuhro ◽  
Asif A. Wagan ◽  
Imtiaz A. Halepoto ◽  
...  

Nowadays, educational data mining is being employed as an assessment tool for the study and analysis of hidden patterns in academic databases, which can be used to predict students' academic performance. This paper implements various machine learning classification techniques on students' academic records for result prediction. For this purpose, data of MS(CS) students were collected from a public university of Pakistan through their assignments, quizzes, and sessional marks. The WEKA data mining tool has been used for performing all experiments, namely data pre-processing, classification, and visualization. For performance measurement, classifier models were trained with 3- and 10-fold cross-validation methods to evaluate the classifiers' accuracy. The results show that a bagging classifier combined with support vector machines outperforms other classifiers in terms of accuracy, precision, recall, and F-measure score. The obtained outcomes confirm that our research provides a significant contribution to the prediction of students' academic performance, which can ultimately be used to assist faculty members in focusing on low-performing students to improve their academic records.
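The 3- and 10-fold cross-validation mentioned in this abstract partitions the records into k folds and uses each fold once for testing. The sketch below illustrates the index bookkeeping only: `k_fold_indices` is a hypothetical helper written in Python, not the WEKA procedure the authors ran, and it does not stratify by class.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

A classifier would be fitted on each `train` list and scored on the matching `test` list, with the k scores averaged to give the reported accuracy.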


Environments ◽  
2020 ◽  
Vol 7 (10) ◽  
pp. 84
Author(s):  
Dakota Aaron McCarty ◽  
Hyun Woo Kim ◽  
Hye Kyung Lee

The ability to rapidly produce accurate land use and land cover maps regularly and consistently has been a growing initiative as they have increasingly become an important tool in the efforts to evaluate, monitor, and conserve Earth's natural resources. Algorithms for supervised classification of satellite images constitute a necessary tool for the building of these maps and they have made it possible to establish remote sensing as the most reliable means of map generation. In this paper, we compare three machine learning techniques: Random Forest, Support Vector Machines, and Light Gradient Boosted Machine, using a 70/30 training/testing evaluation model. Our research evaluates the accuracy of Light Gradient Boosted Machine models against the more classic and trusted Random Forest and Support Vector Machines when it comes to classifying land use and land cover over large geographic areas. We found that the Light Gradient Boosted model is marginally more accurate, with a 0.01 and 0.059 increase in overall accuracy compared to Support Vector Machines and Random Forest, respectively, and that it also ran around 25% faster on average.
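A 70/30 training/testing evaluation holds out 30% of the samples for scoring and fits the model on the rest. The sketch below shows one simple way to produce such a split; the helper name `train_test_split_indices` and the purely random (non-spatial, non-stratified) shuffling are assumptions for illustration, since the abstract does not say how the samples were divided.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) for a random hold-out split."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # round() avoids losing a sample to floating-point truncation.
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]
```

Fixing the seed keeps the split identical across the compared models, so that differences in accuracy come from the models and not from the partition.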


2021 ◽  
Vol 22 (4) ◽  
pp. 1-15
Author(s):  
Jesús Alejandro Navarro Acosta ◽  
Valeria Soto Mendoza ◽  
Félix Raymundo Saucedo Zendejo ◽  
José Maria Guajardo Espinoza ◽  
María Teresa Rivera Morales

This work describes a validation exercise on the results of a psychological test administered to teachers and students in isolation during the COVID-19 pandemic in the state of Coahuila, Mexico. The aim of this work is to apply machine learning techniques to validate an instrument that measures negative emotions and feelings, as well as cognitive bias or thought deviation regarding education and the pandemic under isolation. To meet this aim, an instrument in electronic format was distributed in the state of Coahuila; users responded and a database was generated which, after preprocessing, was analyzed using a combination of Random Forest (RF) and Support Vector Machines (SVM), yielding the relevance or otherwise of some of the test items and thereby giving the instrument internal validity. The experimental results show that the proposed methodology is able to select the most relevant predictor variables. In this way, satisfactory results are obtained in the classification and prediction of psychological diagnoses, both global and segmented by respondent characteristics. On the other hand, although the implemented techniques are robust and reliable, they have limitations regarding the observation of other types of validity, such as construct and external validity, which could limit their use. Although various classical strategies exist in the field of psychometrics, the proposed methodology, based on combining machine learning techniques to analyze and validate this type of test, expands the options for improving diagnoses and, consequently, the treatment of psychological conditions.


Author(s):  
David R. Musicant

In recent years, massive quantities of business and research data have been collected and stored, partly due to the plummeting cost of data storage. Much interest has therefore arisen in how to mine this data to provide useful information. Data mining as a discipline shares much in common with machine learning and statistics, as all of these endeavors aim to make predictions about data as well as to better understand the patterns that can be found in a particular dataset. The support vector machine (SVM) is a current machine learning technique that performs quite well in solving common data mining problems.

