Enhance Image Classification Performance Via Unsupervised Pre-trained Transformers Language Models
Abstract Image classification and categorization are essential to a machine's ability to tell images apart. As Bidirectional Encoder Representations from Transformers (BERT) has become popular in many natural language processing tasks in recent years, it is intuitive to use these pre-trained language models to enhance computer vision tasks, e.g., image classification. In this paper, by encoding image pixels with pre-trained transformers and then connecting to a fully connected layer, the classification model outperforms the Wide ResNet model and the linear-probe iGPT-L model, achieving accuracy of 99.60%~99.74% on the CIFAR-10 image set and 99.10%~99.76% on the CIFAR-100 image set.
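The pipeline the abstract describes (pixels encoded by a transformer, followed by a fully connected classification layer) can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: random weights stand in for the pre-trained language model, a single-head self-attention layer stands in for the full transformer stack, and the embedding width `d` and the 8x8 input size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32          # embedding width (assumed, not from the paper)
n_classes = 10  # CIFAR-10 has 10 classes

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the pixel sequence,
    # standing in for the pre-trained transformer encoder.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

# One 8x8 grayscale "image" flattened into a sequence of 64 pixel tokens,
# each projected into a d-dimensional embedding.
pixels = rng.random((64, 1))
W_embed = rng.standard_normal((1, d))
X = pixels @ W_embed

Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
H = self_attention(X, Wq, Wk, Wv)   # transformer-style encoding of the pixels

# Mean-pool the encoded sequence, then a fully connected layer
# produces one score per class.
W_fc = rng.standard_normal((d, n_classes))
logits = H.mean(axis=0) @ W_fc
probs = softmax(logits)
print(probs.shape)  # one probability per CIFAR-10 class
```

In the paper's setting the random attention weights would be replaced by a frozen pre-trained BERT-style encoder, and only the final fully connected layer would be trained on the image labels.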
2020