A two-stage ensemble method for the detection of class-label noise

2018 ◽  
Vol 275 ◽  
pp. 2374-2383 ◽  
Author(s):  
Maryam Sabzevari ◽  
Gonzalo Martínez-Muñoz ◽  
Alberto Suárez
Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6718
Author(s):  
Wei Feng ◽  
Yinghui Quan ◽  
Gabriel Dauphin

Real-world datasets are often contaminated with label noise: labeling is not a clear-cut process, and reliable labeling methods tend to be expensive or time-consuming. Depending on the learning technique used, label noise can be harmful. It can require a larger training set, make the trained model more complex and more prone to overfitting, and yield less accurate predictions. This work proposes a cleaning technique called the ensemble method based on the noise detection metric (ENDM). An ensemble classifier is first learned from the corrupted training set and used to derive four metrics, each assessing the likelihood that a sample is mislabeled. For each metric, three thresholds are set to maximize classification performance on a corrupted validation dataset, one threshold for each of three classifiers: Bagging, AdaBoost and k-nearest neighbor (k-NN). These thresholds are used to identify the corrupted samples, which are then either removed or corrected. The effectiveness of the ENDM is demonstrated on the classification of 15 public datasets. It is compared against two popular ensemble-based label noise filters, the homogeneous-ensemble-based majority vote and consensus vote methods.
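The majority vote and consensus vote filters mentioned in the abstract can be illustrated with a small sketch. This is not the ENDM implementation: the abstract does not define its four metrics, so the code uses a single illustrative score, the fraction of out-of-bag ensemble members whose prediction contradicts the given label. The nearest-centroid base learner, the 1-D toy data, the fixed cut-offs and all function names (`centroid_predict`, `oob_votes`, `noise_scores`, `clean`) are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def centroid_predict(train, x):
    """Nearest-centroid prediction for a 1-D feature x (assumed base learner)."""
    groups = defaultdict(list)
    for feature, label in train:
        groups[label].append(feature)
    centroids = {label: sum(v) / len(v) for label, v in groups.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def oob_votes(data, n_members=50, seed=0):
    """For every sample, count the labels predicted by the bootstrap members
    that did not see that sample during training (out-of-bag votes)."""
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_members):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        train = [data[i] for i in bag]
        for i in range(n):
            if i not in in_bag:
                votes[i][centroid_predict(train, data[i][0])] += 1
    return votes


def noise_scores(data, votes):
    """Illustrative score: share of out-of-bag votes against the given label."""
    scores = []
    for (_, label), v in zip(data, votes):
        total = sum(v.values())
        scores.append(1 - v[label] / total if total else 0.0)
    return scores


def clean(data, votes, flagged, mode="remove"):
    """Either drop flagged samples or relabel them with the most voted label."""
    flagged = set(flagged)
    if mode == "remove":
        return [s for i, s in enumerate(data) if i not in flagged]
    out = []
    for i, (x, y) in enumerate(data):
        if i in flagged and votes[i]:
            y = votes[i].most_common(1)[0][0]
        out.append((x, y))
    return out


# Toy data: two well-separated 1-D classes plus one deliberately flipped label.
data = [(i / 10, 0) for i in range(10)] + [(10 + i / 10, 1) for i in range(10)]
data.append((0.45, 1))  # index 20 lies in class 0 territory but is labeled 1

votes = oob_votes(data)
scores = noise_scores(data, votes)
majority = [i for i, s in enumerate(scores) if s > 0.5]    # majority vote filter
consensus = [i for i, s in enumerate(scores) if s == 1.0]  # consensus vote filter
removed = clean(data, votes, majority, mode="remove")
corrected = clean(data, votes, majority, mode="correct")
print(majority, consensus, corrected[20])
```

The consensus filter is the more conservative of the two, since it flags a sample only when every vote contradicts its label. A threshold tuned on a validation set, as the abstract describes, would replace the fixed `0.5` and `1.0` cut-offs used here.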


2017 ◽  
Vol 9 (2) ◽  
pp. 173 ◽  
Author(s):  
Charlotte Pelletier ◽  
Silvia Valero ◽  
Jordi Inglada ◽  
Nicolas Champion ◽  
Claire Marais Sicre ◽  
...  
