Annotation-efficient classification combining active learning, pre-training and semi-supervised learning for biomedical images

2020
Author(s):
Sayedali Shetab Boushehri,
Ahmad Bin Qasim,
Dominik Waibel,
Fabian Schmich,
Carsten Marr

Abstract
Deep learning image classification algorithms typically require large annotated datasets. In contrast to real-world images, where labels are typically cheap and easy to obtain, biomedical applications require experts' time for annotation, which is often expensive and scarce. Identifying methods that maximize performance with a minimal amount of annotation is therefore crucial. A number of active learning algorithms address this problem by iteratively identifying the most informative images in the data for annotation. However, they are mostly benchmarked on natural image datasets, and it is not clear how they perform on biomedical image data with strong class imbalance, little color variance, and high similarity between classes. Moreover, active learning neglects the typically abundant unlabeled data available.

In this paper, we thus explore strategies combining active learning with pre-training and semi-supervised learning to increase performance on biomedical image classification tasks. We first benchmarked three active learning algorithms, three pre-training methods, and two training strategies on a dataset containing almost 20,000 white blood cell images, split into ten different classes. Both pre-training using self-supervised learning and pre-trained ImageNet weights boost the performance of active learning algorithms. A further improvement was achieved using semi-supervised learning. An extensive grid search through the different active learning algorithms, pre-training methods, and training strategies on three biomedical image datasets showed that a specific combination of these methods should be used. This recommended strategy improved the results over conventional annotation-efficient classification strategies by 3% to 14% macro recall in every case. We propose this strategy for other biomedical image classification tasks and expect it to boost performance whenever scarce annotation is a problem.
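The abstract describes active learning as iteratively selecting the most informative unlabeled images for expert annotation. As an illustrative sketch only (the paper benchmarks several algorithms, and this is a generic uncertainty-sampling baseline rather than any specific method from the study), one common acquisition rule ranks unlabeled samples by the entropy of the model's predicted class probabilities and queries the top-k:

```python
import numpy as np

def entropy_sampling(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k most uncertain samples by predictive entropy.

    probs: (n_samples, n_classes) array of softmax outputs for the
    unlabeled pool; higher entropy = less confident = more informative.
    """
    eps = 1e-12  # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[::-1][:k]  # descending entropy, top k

# Toy unlabeled pool: 5 samples, 3 classes (hypothetical values)
probs = np.array([
    [0.98, 0.01, 0.01],  # confident  -> low entropy
    [0.34, 0.33, 0.33],  # near-uniform -> highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
    [0.90, 0.05, 0.05],
])
query = entropy_sampling(probs, k=2)  # indices to send to the annotator
```

In an active learning loop, the queried indices would be labeled by an expert, moved from the unlabeled pool to the training set, and the model retrained before the next query round.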

2021
Author(s):
Sayedali Shetab Boushehri,
Ahmad Qasim,
Dominik Waibel,
Fabian Schmich,
Carsten Marr

Abstract
Deep-learning-based classification of biomedical images requires manual annotation by experts, which is time-consuming and expensive. Incomplete-supervision approaches, including active learning, pre-training, and semi-supervised learning, address this issue and aim to increase classification performance with a limited number of annotated images. Up to now, these approaches have mostly been benchmarked on natural image datasets, where image complexity and class balance typically differ considerably from biomedical classification tasks. In addition, it is not clear how to combine them to improve classification performance on biomedical image data. We thus performed an extensive grid search combining seven active learning algorithms, three pre-training methods, and two training strategies, as well as respective baselines (random sampling, random initialization, and supervised learning). For four biomedical datasets, we started training with 1% of labeled data and increased it by 5% iteratively, using 4-fold cross-validation in each cycle. We found that the contribution of pre-training and semi-supervised learning can reach up to 20% macro F1-score in each cycle. In contrast, state-of-the-art active learning algorithms contribute less than 5% macro F1-score in each cycle. Based on performance, implementation ease, and computation requirements, we recommend the combination of BADGE active learning, ImageNet-weights pre-training, and pseudo-labeling as the training strategy, which reached over 90% of fully supervised results with only 25% of annotated data for three out of four datasets. We believe that our study is an important step towards annotation- and resource-efficient model training for biomedical classification challenges.

