imbalance data
Recently Published Documents


TOTAL DOCUMENTS: 98 (FIVE YEARS 46)

H-INDEX: 11 (FIVE YEARS 2)

2021 ◽  
Vol 19 (6) ◽  
pp. 633-643
Author(s):  
Wayan Firdaus Mahmudy ◽  
Candra Dewi ◽  
Rio Arifando ◽  
Beryl Labique Ahmadie ◽  
Muh Arif Rahman

Patchouli plants are the main raw material for essential oils in Indonesia. The physical form of patchouli leaves varies widely with the growing area, which makes it difficult for farmers to recognize the variety without expert advice. As there are few experts in this field, a technology for identifying patchouli varieties is required. In this study, the identification model is constructed using a combination of leaf morphological features, texture features extracted with Wavelet, and shape features extracted with convex hull. The extracted features are used as input data for training classification algorithms. The effectiveness of the input features is tested using three classification methods from the artificial neural network family: (1) feedforward neural networks trained with the backpropagation algorithm, (2) learning vector quantization (LVQ), and (3) extreme learning machine (ELM). Synthetic minority over-sampling technique (SMOTE) is applied to address class imbalance in the patchouli variety dataset. Combining the three feature types yields an average recognition accuracy of 72.61%, which is higher than using only morphological features (58.68%), only Wavelet features (59.03%), or both (67.25%). The study also shows that applying SMOTE to the imbalanced data increases accuracy, with the highest average accuracy reaching 88.56%.
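
The SMOTE step mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the core idea (new minority samples are interpolated between a minority sample and one of its nearest minority neighbours) and is not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (illustrative sketch only)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Toy, made-up feature vectors (not the patchouli dataset).
majority = [[float(i), float(i % 3)] for i in range(20)]
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 0.8], [1.2, 1.6]]

# Generate enough synthetic samples to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two existing minority samples, oversampling should be applied to the training split only, so that no interpolated copies of test samples leak into training.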


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yong Chen

An improved nonlinear weighted extreme gradient boosting (XGBoost) technique is developed to forecast patients' length of stay from imbalanced data. The algorithm first chooses an effective technique for fitting the length of stay and determining its distribution law, and then optimizes the negative log-likelihood loss function using a heuristic nonlinear weighting method based on sample percentage. Theoretical and practical results reveal that, compared to existing algorithms, the XGBoost method based on nonlinear weighting may achieve higher classification accuracy and better prediction performance, which is beneficial in treating more patients with fewer hospital beds.
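
The abstract above does not give the weighting function, so the following is only a hedged sketch of what a nonlinear, sample-percentage-based weighting of a negative log-likelihood loss could look like for a binary target. The power-law form `(n / count) ** gamma`, the exponent `gamma`, and all function names are assumptions, not the paper's method. The per-sample gradient and Hessian are included because gradient boosting libraries typically require them for a custom objective.

```python
import math
from collections import Counter

def nonlinear_class_weights(labels, gamma=0.5):
    """Weight each class by (1 / class share) ** gamma, rescaled so the
    average per-sample weight is 1. gamma = 0 gives uniform weights."""
    counts = Counter(labels)
    n = len(labels)
    raw = {c: (n / cnt) ** gamma for c, cnt in counts.items()}
    mean_w = sum(raw[y] for y in labels) / n
    return {c: w / mean_w for c, w in raw.items()}

def weighted_nll(labels, probs, weights):
    """Class-weighted binary negative log-likelihood, averaged over samples."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def grad_hess(labels, scores, weights):
    """Gradient and Hessian of the weighted loss w.r.t. the raw score
    (logit) of each sample."""
    grads, hess = [], []
    for y, s in zip(labels, scores):
        p = 1.0 / (1.0 + math.exp(-s))
        grads.append(weights[y] * (p - y))
        hess.append(weights[y] * p * (1.0 - p))
    return grads, hess

# Toy labels: nine majority samples (0) and one minority sample (1).
labels = [0] * 9 + [1]
weights = nonlinear_class_weights(labels, gamma=0.5)
uniform = nonlinear_class_weights(labels, gamma=0.0)

# A model that predicts "majority" for everything is penalized more
# heavily under the nonlinear weights than under uniform weights.
probs = [0.1] * 10
print(weighted_nll(labels, probs, weights), weighted_nll(labels, probs, uniform))
```

An exponent between 0 and 1 softens plain inverse-frequency weighting, which can otherwise let a handful of rare samples dominate the loss.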


2021 ◽  
Vol 183 (6) ◽  
pp. 29-35
Author(s):  
Ragini Gour ◽  
Ramratan Ahirwal

2021 ◽  
Vol 5 (3) ◽  
pp. 504-510
Author(s):  
Agung Nugroho ◽  
Yoga Religia

The increasing demand for credit applications to banks has motivated the banking world to switch to more sophisticated techniques for analyzing the level of credit risk. One such technique is the data mining approach, which finds meaningful information in large amounts of data by way of classification. However, bank marketing data is imbalanced, so classification results on it are less than optimal. Naïve Bayes is one classification algorithm that can be used on imbalanced data, and it performs well in terms of classification. However, optimization is needed to obtain better classification results. Several approaches have been developed for optimizing the handling of imbalanced data; bagging and genetic algorithms are among them. This study aims to compare the accuracy of the naïve Bayes algorithm after optimization using bagging and a genetic algorithm. The results showed that the combination of bagging and a genetic algorithm could improve the performance of naïve Bayes by 4.57%.
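
The bagging half of the approach described above can be sketched as follows: several naïve Bayes models are each trained on a bootstrap resample, and their predictions are combined by majority vote. This is a minimal illustration, not the authors' pipeline; it omits the genetic algorithm entirely, uses a Gaussian naïve Bayes on made-up numeric data rather than bank marketing data, and the names `GaussianNB`, `bagged_nb`, and `predict` are defined here for illustration only.

```python
import math
import random
from collections import Counter, defaultdict

class GaussianNB:
    """Tiny Gaussian naive Bayes: one mean/variance per feature and class."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            prior = len(rows) / len(X)
            params = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-3
                params.append((mean, var))
            self.stats[label] = (prior, params)
        return self

    def predict_one(self, row):
        def log_posterior(label):
            prior, params = self.stats[label]
            total = math.log(prior)
            for v, (mean, var) in zip(row, params):
                total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            return total
        return max(self.stats, key=log_posterior)

def bagged_nb(X, y, n_models=15, seed=0):
    """Train n_models classifiers, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        models.append(GaussianNB().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict(models, row):
    """Majority vote over the bagged models."""
    votes = Counter(m.predict_one(row) for m in models)
    return votes.most_common(1)[0][0]

# Toy imbalanced data: 30 samples of class 0 near the origin, 6 of class 1 near (5, 5).
X0 = [[(i % 5) * 0.5 - 1.0, (i // 5) * 0.5 - 1.0] for i in range(30)]
X1 = [[4.4, 4.6], [5.0, 5.1], [5.5, 5.4], [4.7, 5.6], [5.3, 4.5], [4.9, 4.9]]
X, y = X0 + X1, [0] * len(X0) + [1] * len(X1)

models = bagged_nb(X, y)
print(predict(models, [0.0, 0.0]), predict(models, [5.0, 5.0]))
```

Voting across resamples reduces the variance of any single model, which matters most when the minority class is represented by only a few samples in each resample.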


Author(s):  
Changsu Kim ◽  
Hyesoo Lee ◽  
Hoekyung Jung

A smart farm is a farm that can remotely and automatically maintain proper growth and management of crops and livestock by integrating technology with agriculture. Currently, smart farms are concentrated in the field of smart horticulture, and although research is spreading, it is being conducted in limited spaces. In addition, it is difficult to obtain a sufficient amount of data for learning, and data imbalance occurs because it is difficult to obtain a similar amount of data for each class. In this paper, we propose a method that amplifies a small amount of data and addresses the imbalance problem by using the ability of a generative adversarial network to learn to mimic data. The proposed method can create datasets for various crops and also shows a high hit rate. The datasets generated for crops can then be used in learning to address the problem of data imbalance.

