imbalanced dataset
Recently Published Documents



2021 ◽ Vol 2021 ◽ pp. 1-10
Author(s): Hao Hu ◽ Mengya Gao ◽ Mingsheng Wu

In real-world scenarios, data often follow a long-tailed distribution, and training deep neural networks on such an imbalanced dataset has become a great challenge. The main problem caused by a long-tailed data distribution is that common classes dominate training, which leads to very low accuracy on the rare classes. Recent work focuses on improving the network's representation ability to overcome the long-tailed problem but ignores adapting the network classifier to the long-tailed case, which causes an “incompatibility” between the network representation and the network classifier. In this paper, we use knowledge distillation to address the long-tailed data distribution problem and to optimize the network representation and classifier simultaneously. We propose multi-expert knowledge distillation with class-balanced sampling to jointly learn a high-quality network representation and classifier. A channel activation-based knowledge distillation method is also proposed to further improve performance. State-of-the-art performance on several large-scale long-tailed classification datasets demonstrates the superior generalization of our method.
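The class-balanced sampling mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy `head`/`tail` labels, and the inverse-frequency weighting are assumptions chosen for illustration, using only the Python standard library.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Return one sampling weight per example.

    Each weight is inversely proportional to the size of the example's class,
    so every class has the same total probability of being drawn.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[y]) for y in labels]


def sample_balanced(labels, k, seed=0):
    """Draw k example indices (with replacement) using class-balanced weights."""
    rng = random.Random(seed)
    weights = class_balanced_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Hypothetical long-tailed toy dataset: 90 "head" examples, 10 "tail" examples.
labels = ["head"] * 90 + ["tail"] * 10
weights = class_balanced_weights(labels)
indices = sample_balanced(labels, k=10_000)
drawn = Counter(labels[i] for i in indices)
print(drawn)  # both classes appear in roughly equal numbers
```

With uniform sampling the tail class would make up about 10% of each batch; under these weights it is drawn about as often as the head class, at the cost of repeating tail examples.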


2021 ◽ Vol 13 (24) ◽ pp. 4970
Author(s): Colbert M. Jackson ◽ Elhadi Adam

Accurate maps of the spatial distribution of tropical tree species provide valuable insights for ecologists and forest management. The discrimination of tree species for economic, ecological, and technical reasons is usually necessary for achieving promising results in tree species mapping. Most of the data used in tree species mapping have some degree of imbalance. This study aimed to assess the effects of imbalanced data on identifying and mapping threatened tree species in a selectively logged sub-montane heterogeneous tropical forest, using random forest (RF) and support vector machine with radial basis function kernel (RBF-SVM) classifiers and WorldView-2 multispectral imagery. For comparison purposes, the original imbalanced dataset was balanced using three data sampling techniques in R: oversampling, undersampling, and combined oversampling and undersampling. The combined oversampling and undersampling technique produced the best results: F1-scores of 68.56 ± 2.6% for RF and 64.64 ± 3.4% for SVM. The balanced dataset yielded higher classification accuracy than the original imbalanced dataset. The study also observed that more separable classes recorded higher F1-scores. Among the species, Syzygium guineense and Zanthoxylum gilletii were the most accurately mapped, whereas Newtonia buchananii was the least accurately mapped. The most important spectral bands for detecting and distinguishing between tree species, as measured by the random forest classifier, were the Red, Red Edge, Near Infrared 1, and Near Infrared 2.
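As a rough illustration of combined oversampling and undersampling, the sketch below resamples every class to a common target size. The study itself worked in R and does not state its exact procedure here, so the plain random resampling, the function name `resample_to_target`, the toy class sizes, and the choice of the mean class size as target are all assumptions.

```python
import random
from collections import defaultdict


def resample_to_target(samples, labels, target, seed=0):
    """Resample each class to exactly `target` examples.

    Classes larger than `target` are randomly undersampled without replacement;
    smaller classes keep all their examples and are randomly oversampled
    (duplicates drawn with replacement) until they reach `target`.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, items in by_class.items():
        if len(items) >= target:
            chosen = rng.sample(items, target)
        else:
            chosen = items + rng.choices(items, k=target - len(items))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical imbalanced toy data: three classes with 100, 15, and 5 samples.
samples = list(range(120))
labels = ["A"] * 100 + ["B"] * 15 + ["C"] * 5
target = len(labels) // len(set(labels))  # mean class size, here 40
bal_x, bal_y = resample_to_target(samples, labels, target)
print({c: bal_y.count(c) for c in sorted(set(bal_y))})
```

Resampling should be applied to the training split only; duplicating minority samples before splitting would leak copies of the same sample into the evaluation data.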


Computers ◽ 2021 ◽ Vol 10 (11) ◽ pp. 141
Author(s): Salah Al-Darraji ◽ Dhafer G. Honi ◽ Francesca Fallucchi ◽ Ayad I. Abdulsada ◽ Romeo Giuliano ◽ ...

Decision-making plays an essential role in management and may be the most important component of the planning process. Employee attrition is a well-known problem that requires the right decisions from the administration to retain highly qualified employees. Artificial intelligence is used extensively as an efficient tool for predicting this problem. The proposed work uses deep learning together with several preprocessing steps to improve the prediction of employee attrition. Several factors lead to employee attrition; these factors are analyzed to reveal their intercorrelation and to identify the dominant ones. Our work was tested using the imbalanced IBM analytics dataset, which contains 35 features for 1470 employees. To get realistic results, we derived a balanced version from the original one. Finally, cross-validation is used to evaluate our work precisely. Extensive experiments have been conducted to show the practical value of our work. The prediction accuracy using the original dataset is about 91%, whereas it is about 94% using a synthetic dataset.
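The abstract does not say which cross-validation scheme was used. One common choice for imbalanced labels is stratified k-fold, sketched below under that assumption; the function name, the toy label counts, and the round-robin fold assignment are illustrative and not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split example indices into k folds that preserve class proportions.

    Indices of each class are shuffled and dealt round-robin across the folds,
    so every fold receives a near-equal share of every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds


# Hypothetical toy labels: 20 positive (attrition) and 80 negative examples.
labels = [1] * 20 + [0] * 80
folds = stratified_kfold(labels, k=5)

for fold_id, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    positives = sum(labels[i] for i in test_idx)
    print(fold_id, len(train_idx), len(test_idx), positives)
```

Stratifying keeps the minority class represented in every held-out fold, which makes per-fold accuracy or F1 estimates less sensitive to how the rare class happens to be split.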


2021
Author(s): Deepan Datta ◽ Gagandeep Singh ◽ Aurobinda Routray ◽ William K. Mohanty ◽ Rahul Mahadik
