Performance Comparison of the C4.5, Random Forest, and Gradient Boosting Algorithms for Commodity Classification (Komparasi Kinerja Algoritma C4.5, Random Forest, dan Gradient Boosting untuk Klasifikasi Komoditas)

Techno Com ◽  
2021 ◽  
Vol 20 (3) ◽  
pp. 400-410
Author(s):  
Edi Ismanto ◽  
Melly Novalia

Identifying a region's leading commodities is an important task, and Riau Province is one such case. It informs the priorities of regional development planning, which is directed toward developing leading commodities. Riau Province has very promising commodity potential in the plantation sector, but the existing data have so far been used mostly for reporting, in the form of Excel files. Commodity data can be explored with data mining techniques to obtain classification patterns, making it easier for the Riau Provincial Government to obtain information about its leading commodities. This study tests the performance of classification algorithms widely used in data mining in order to find the one that performs best for classifying commodity data. Several studies report that the C4.5 classification algorithm performs worse than other algorithms such as random forest and gradient boosting. This study compares C4.5, random forest, and gradient boosting to determine which performs best in classifying commodity data. The data used are plantation commodity data for Riau Province from 2019. The results show that the best-performing classification algorithm is random forest, provided that shuffle sampling is used. Linear sampling mostly produced poorer performance, whereas shuffle sampling performed very well for tree-based algorithms.
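The contrast between linear and shuffle sampling reported above can be illustrated with a minimal sketch. It assumes that "linear sampling" means splitting rows in their stored order and "shuffle sampling" means randomizing the order first; the `split` helper, the class labels, and the 70/30 ratio are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def split(rows, test_fraction=0.3, shuffle=False, seed=0):
    """Split rows into (train, test), optionally shuffling them first."""
    rows = list(rows)
    if shuffle:
        # Seeded generator so the example is reproducible.
        random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * (1 - test_fraction)))
    return rows[:cut], rows[cut:]


# Hypothetical toy data stored in sorted order: 70 "leading" rows
# followed by 30 "non-leading" rows.
labels = ["leading"] * 70 + ["non-leading"] * 30

linear_train, linear_test = split(labels, shuffle=False)
shuffled_train, shuffled_test = split(labels, shuffle=True)

# With ordered data, the linear split trains on a single class only,
# while the shuffled split sees both classes during training.
print("linear train:  ", Counter(linear_train))
print("shuffled train:", Counter(shuffled_train))
```

When the stored order correlates with the class label, a classifier trained on the linear split never sees one of the classes, which is one plausible reason an ordered split can score poorly.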

JNANALOKA ◽  
2020 ◽  
pp. 1-10
Author(s):  
Muhammad Kurniawan

Data mining is concerned with searching data to discover patterns or knowledge in a data set as a whole. It can be used to predict a condition, such as whether or not a person has chronic kidney disease. In this study, the symmetrical uncertainty feature reduction method is used with the Gradient Boosting, Random Forest, Support Vector Machine, and Naïve Bayes classification algorithms to predict chronic kidney disease. Classification was performed with 24, 12, 6, 5, and 4 attributes. Accuracy improved when the attributes were reduced from 24 to 12 with the Naïve Bayes algorithm. In addition, Support Vector Machine had the best accuracy for every number of attributes, followed by Gradient Boosting, Random Forest, and Naïve Bayes. With 5 attributes, Support Vector Machine and Gradient Boosting still achieved an accuracy of 1. The five attributes are hemoglobin, packed cell volume, serum creatinine, albumin, and specific gravity. Reducing the attributes can improve accuracy and simplifies prediction because fewer attributes are needed. There is not yet ...
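Symmetrical uncertainty is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is entropy and I is mutual information. The sketch below computes it for discrete attributes and ranks attributes against a class label. The function names and the toy data are hypothetical; the abstract does not describe the authors' actual implementation or any discretization step.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


# Hypothetical discretized attributes and class labels.
target = ["ckd", "ckd", "notckd", "notckd"]
features = {
    "informative": ["low", "low", "normal", "normal"],
    "uninformative": ["a", "b", "a", "b"],
}

# Rank attributes from most to least related to the class label.
ranking = sorted(
    features,
    key=lambda name: symmetrical_uncertainty(features[name], target),
    reverse=True,
)
print(ranking)
```

Keeping only the top-ranked attributes is one way to obtain reduced attribute sets such as the 12, 6, 5, and 4 attribute subsets mentioned above.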


2018 ◽  
Vol 5 (1) ◽  
pp. 47-55
Author(s):  
Florensia Unggul Damayanti

Data mining helps industries make intelligent decisions on complex problems. Data mining algorithms can be applied to data in order to forecast, identify patterns, derive rules and recommendations, analyze sequences in complex data sets, and retrieve fresh insights. Advances in technology and the variety of available data mining techniques give industries the opportunity to explore their data, gain valuable information, and use that information to support business decision making. This paper implements classification data mining to retrieve knowledge from customer databases in order to support the marketing department in planning a strategy for predicting plan premiums. The dataset is decomposed through conceptual analysis to identify data characteristics that can be used as input parameters of the data mining model. Business decisions and applications are characterized by processing step, processing characteristic, and processing outcome (Seng, J.L., Chen T.C. 2010). This paper sets up a data mining experiment based on the J48 and Random Forest classifiers and sheds light on the performance of J48 versus Random Forest on a dataset from the insurance industry. The experimental results concern the classification accuracy and efficiency of J48 and Random Forest, and also identify the attributes most useful for predicting the plan premium in the context of strategic planning to support business strategy.


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. The comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.


Author(s):  
Marcelo N. de Sousa ◽  
Ricardo Sant’Ana ◽  
Rigel P. Fernandes ◽  
Julio Cesar Duarte ◽  
José A. Apolinário ◽  
...  

Abstract. In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the position estimate's performance. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, although they present good simulation results, systems trained with simulated features only suffer performance degradation when employed to process real-life data. This work intends to improve the localization accuracy when using ray-tracing fingerprints and a few field data obtained from an adverse environment where a large number of measurements is not an option. We employ a machine learning (ML) algorithm to explore the multipath information. We selected the random forest and gradient boosting algorithms, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validating, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validating and testing), both ML algorithms resulted in a mean positioning error around 100 m. We have also obtained experimental results for noisy (artificially added Gaussian noise) and mismatched (with a null subset of) features. From the simulations carried out in this work, our study revealed that enhancing the ML model with a few real-world data improves localization's overall performance. From the ML algorithms employed herein, we also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm. However, they achieved similar results in a mismatch experiment.
This work’s practical implication is that multipath information, once rejected in old localization techniques, now represents a significant source of information whenever we have prior knowledge to train the ML algorithm.


2021 ◽  
Vol 11 (4) ◽  
pp. 1378
Author(s):  
Seung Hyun Lee ◽  
Jaeho Son

It has been pointed out that carrying a heavy object that exceeds a certain weight is a major factor that puts a physical burden on a construction worker's musculoskeletal system. However, due to the nature of the construction site, where a large number of workers work simultaneously in an irregular space, it is difficult to determine the weight of the object carried by a worker in real time or to keep track of workers who carry excess weight. This paper proposes a prototype system to track the weight of heavy objects carried by construction workers by developing smart safety shoes with FSR (Force Sensitive Resistor) sensors. The system consists of smart safety shoes with sensors attached, a mobile device for collecting initial sensing data, and a web-based server computer for storing, preprocessing, and analyzing such data. The effectiveness and accuracy of the weight tracking system were verified through experiments in which each experimenter lifted a weight from +0 kg to +20 kg in 5 kg increments. The results of the experiment were analyzed by a newly developed machine learning based model, which adopts effective classification algorithms such as decision tree, random forest, gradient boosting algorithm (GBM), and light GBM. The classification algorithms achieved similar and high average accuracy in classifying the weight, in the following order: random forest (90.9%), light GBM (90.5%), decision tree (90.3%), and GBM (89%). Overall, the proposed weight tracking system has a significant 90.2% average accuracy in classifying how much weight each experimenter carries.
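The overall figure can be checked against the four per-algorithm accuracies quoted above. This is only a worked arithmetic check, assuming the overall value is the unweighted mean of the four reported numbers; the variable names are illustrative.

```python
# Per-algorithm accuracies (%) as quoted in the abstract.
accuracies = {
    "random forest": 90.9,
    "light GBM": 90.5,
    "decision tree": 90.3,
    "GBM": 89.0,
}

# Unweighted mean across the four classifiers.
mean_accuracy = sum(accuracies.values()) / len(accuracies)
print(f"mean accuracy: {mean_accuracy:.3f}%")
```

The mean is 360.7 / 4 = 90.175, consistent with the reported figure of about 90.2%.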


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Satoko Hiura ◽  
Shige Koseki ◽  
Kento Koyama

Abstract. In predictive microbiology, statistical models are employed to predict bacterial population behavior in food using environmental factors such as temperature, pH, and water activity. As the amount and complexity of data increase, handling all data with high-dimensional variables becomes a difficult task. We propose a data mining approach to predict bacterial behavior using a database of microbial responses to food environments. Population growth and inactivation data for the pathogen Listeria monocytogenes under 1,007 environmental conditions, including five food categories (beef, culture medium, pork, seafood, and vegetables) and temperatures ranging from 0 to 25 °C, were obtained from the ComBase database (www.combase.cc). We used eXtreme gradient boosting tree, a machine learning algorithm, to predict bacterial population behavior from eight explanatory variables: 'time', 'temperature', 'pH', 'water activity', 'initial cell counts', 'whether the viable count is initial cell number', and two types of categories regarding food. The root mean square error of the observed and predicted values was approximately 1.0 log CFU regardless of food category, and this suggests the possibility of predicting viable bacterial counts in various foods. The data mining approach examined here will enable the prediction of bacterial population behavior in food by identifying hidden patterns within a large amount of data.
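The root mean square error used as the evaluation metric above is the square root of the mean squared difference between observed and predicted values. A minimal sketch follows; the `rmse` helper and the log CFU values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical viable counts in log CFU (not values from the paper).
observed_log_cfu = [3.0, 4.5, 6.0, 7.5]
predicted_log_cfu = [4.0, 3.5, 7.0, 6.5]

error = rmse(observed_log_cfu, predicted_log_cfu)
print(f"RMSE: {error:.2f} log CFU")
```

Because the counts are on a base-10 logarithmic scale, an RMSE of about 1.0 log CFU corresponds to predictions that are typically within a factor of ten of the observed count.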


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jong Ho Kim ◽  
Haewon Kim ◽  
Ji Su Jang ◽  
Sung Mi Hwang ◽  
So Young Lim ◽  
...  

Abstract. Background: Predicting a difficult airway is challenging in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods: Variables for prediction of difficult laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficult laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results: The model's performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions: Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable, and a combination of models.
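The stratified 80/20 split described in the Methods can be sketched as follows. This is a minimal illustration, assuming stratification means sampling the test fraction separately within each outcome class; the `stratified_split` helper, the seed, and the class counts are hypothetical and do not reproduce the study's data or code.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded for reproducibility
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels: 1 = difficult laryngoscopy, 0 = not difficult.
labels = [1] * 50 + [0] * 450

train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"positive rate: train={train_rate:.2f}, test={test_rate:.2f}")
```

Sampling within each class keeps the proportion of the rare outcome the same in both sets, which matters when the positive class is a small minority.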

