Zero-Shot Image Classification Based on a Learnable Deep Metric

Sensors, 2021, Vol 21 (9), pp. 3241
Author(s): Jingyi Liu, Caijuan Shi, Dongjing Tu, Ze Shi, Yazhi Liu

Supervised deep learning models have achieved great success in image classification when trained on large numbers of labeled samples. In practice, however, many categories have only a few labeled training samples, and some have none at all. Zero-shot learning greatly reduces an image classification model's dependence on labeled training samples. Nevertheless, measuring the similarity of visual and semantic features with a predefined fixed metric (e.g., Euclidean distance) has limitations, and the mapping process suffers from the semantic gap problem. To address these problems, this paper proposes a new zero-shot image classification method based on an end-to-end learnable deep metric. First, common space embedding maps the visual features and semantic features into a shared space. Second, an end-to-end learnable deep metric, namely a relation network, learns the similarity of visual and semantic features. Finally, unseen images are classified according to their similarity scores. Extensive experiments on four datasets demonstrate the effectiveness of the proposed method.
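
To make the pipeline concrete, below is a minimal PyTorch sketch of a relation-network-style learnable metric. The embedding dimensions, layer sizes, and module names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RelationNet(nn.Module):
    """Minimal sketch of a learnable deep metric for zero-shot classification.

    Visual and semantic features are first embedded into a common space;
    a small relation module then scores their similarity. All sizes here
    are illustrative assumptions.
    """
    def __init__(self, vis_dim=2048, sem_dim=300, common_dim=512):
        super().__init__()
        self.vis_embed = nn.Linear(vis_dim, common_dim)   # visual -> common space
        self.sem_embed = nn.Linear(sem_dim, common_dim)   # semantic -> common space
        self.relation = nn.Sequential(                    # learnable metric
            nn.Linear(common_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),              # similarity in [0, 1]
        )

    def forward(self, vis_feat, sem_feat):
        v = torch.relu(self.vis_embed(vis_feat))          # (B, common_dim)
        s = torch.relu(self.sem_embed(sem_feat))          # (C, common_dim)
        # Pair every image with every class prototype.
        pairs = torch.cat([
            v.unsqueeze(1).expand(-1, s.size(0), -1),
            s.unsqueeze(0).expand(v.size(0), -1, -1),
        ], dim=-1)                                        # (B, C, 2 * common_dim)
        return self.relation(pairs).squeeze(-1)           # (B, C) relation scores

# Unseen images are assigned to the class with the highest relation score:
# pred = net(vis_feat, class_semantics).argmax(dim=1)
```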

Author(s): Xinxun Xu, Muli Yang, Yanhua Yang, Hao Wang

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a cross-modal retrieval task: given a free-hand sketch, retrieve matching natural images under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such low-dimensional projection destroys the completeness of semantic knowledge in the original semantic space, so useful knowledge cannot be transferred well when learning semantic features from different modalities. Moreover, domain information and semantic information are entangled in the visual features, which hinders reducing the domain gap between sketches and images and is therefore detrimental to cross-modal matching. In this paper, we propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR. Specifically, under the supervision of the original semantic knowledge, PDFD decomposes visual features into domain features and semantic features, and the semantic features are then projected into the common space as retrieval features. This progressive projection strategy maintains strong semantic supervision. Besides, to guarantee that the retrieval features capture clean and complete semantic information, a cross-reconstruction loss is introduced to encourage any combination of retrieval features and domain features to reconstruct the visual features. Extensive experiments demonstrate the superiority of PDFD over state-of-the-art competitors.
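
The decomposition and cross-reconstruction ideas can be sketched as follows. This is an illustrative PyTorch outline with assumed dimensions and simplified linear encoders/decoders, not the PDFD architecture itself; the swap-based loss assumes the two inputs share a class label.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecomposer(nn.Module):
    """Illustrative sketch of the decomposition idea behind PDFD.

    A visual feature is split into a domain branch and a semantic branch;
    a decoder must reconstruct the visual feature from combinations of the
    two, pushing complete semantic information into the semantic branch.
    Module names and sizes are assumptions for illustration.
    """
    def __init__(self, vis_dim=2048, dom_dim=256, sem_dim=300):
        super().__init__()
        self.domain_enc = nn.Linear(vis_dim, dom_dim)
        self.semantic_enc = nn.Linear(vis_dim, sem_dim)   # retrieval features
        self.decoder = nn.Linear(dom_dim + sem_dim, vis_dim)

    def forward(self, vis):
        return self.domain_enc(vis), self.semantic_enc(vis)

def cross_reconstruction_loss(model, vis_a, vis_b):
    """Swap domain/semantic parts across two same-class samples and
    reconstruct both visual features."""
    d_a, s_a = model(vis_a)
    d_b, s_b = model(vis_b)
    rec_ab = model.decoder(torch.cat([d_a, s_b], dim=-1))  # a's domain + b's semantics
    rec_ba = model.decoder(torch.cat([d_b, s_a], dim=-1))  # b's domain + a's semantics
    return F.mse_loss(rec_ab, vis_a) + F.mse_loss(rec_ba, vis_b)
```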


2017, Vol 17 (02), pp. 1750007
Author(s): Chunwei Tian, Guanglu Sun, Qi Zhang, Weibing Wang, Teng Chen, ...

Collaborative representation classification (CRC) is an important sparse method that is simple to implement and represents a test sample as a linear combination of the training samples. CRC classifies by the offset (residual) between each class's representation result and the test sample. However, this offset often cannot adequately express the difference between each class and the test sample. In this paper, we propose a novel representation method for image recognition to address this problem. The method not only fuses sparse representation and CRC to improve recognition accuracy, but also introduces a novel fusion mechanism for classifying images. It proceeds in the following steps. First, it computes the collaborative representation of the test sample, i.e., a linear combination of all training samples that represents the test sample. Then, it computes the sparse representation classification (SRC) of the test sample. Finally, it uses the CRC and SRC representations to obtain two kinds of scores for the test sample and fuses them to recognize the image. Face recognition experiments show that the combination of CRC and SRC achieves satisfactory image classification performance.
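
A compact numerical sketch of this CRC/SRC score fusion follows. The regularization strengths, the fusion weight w, and the use of scikit-learn's Lasso as the l1 solver are assumptions for illustration; the paper's exact fusion rule may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso

def class_residuals(X, labels, y, coef):
    """Residual ||y - X_c @ coef_c|| for every class c (columns of X are
    training samples)."""
    return np.array([
        np.linalg.norm(y - X[:, labels == c] @ coef[labels == c])
        for c in np.unique(labels)
    ])

def fused_classify(X, labels, y, lam=0.01, alpha=0.01, w=0.5):
    """Classify test sample y by fusing CRC and SRC residual scores."""
    # CRC: ridge-regularized linear combination of all training samples.
    A = X.T @ X + lam * np.eye(X.shape[1])
    coef_crc = np.linalg.solve(A, X.T @ y)
    # SRC: l1-regularized (sparse) coding of the test sample.
    coef_src = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    # Fuse the two per-class residual scores; smaller is better.
    score = w * class_residuals(X, labels, y, coef_crc) \
          + (1 - w) * class_residuals(X, labels, y, coef_src)
    return np.unique(labels)[np.argmin(score)]
```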


2014, Vol 2014, pp. 1-11
Author(s): Ziqiang Wang, Xia Sun, Lijun Sun, Yuchun Huang

In many image classification applications, multiple visual features are extracted from different views to describe an image. Since different visual features have their own statistical properties and discriminative powers, the conventional solution for multiview data is to concatenate the feature vectors into a single new feature vector. However, this simple concatenation strategy not only ignores the complementary nature of the different views, but also suffers from the curse of dimensionality. To address this problem, we propose a novel multiview subspace learning algorithm, named multiview discriminative geometry preserving projection (MDGPP), for feature extraction and classification. MDGPP not only preserves intraclass geometry and interclass discrimination information within a single view, but also exploits the complementary properties of different views to obtain a low-dimensional optimal consensus embedding via an alternating-optimization-based iterative algorithm. Experimental results on face recognition and facial expression recognition demonstrate the effectiveness of the proposed algorithm.
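
A toy sketch of alternating optimization for a multiview consensus embedding is given below. The least-squares view projections and the exponent-weighted view combination are simplifying assumptions; MDGPP's actual objective additionally encodes intraclass geometry and interclass discrimination.

```python
import numpy as np

def consensus_embedding(views, dim=10, iters=20, r=2.0):
    """Toy alternating optimization for a multiview consensus embedding.

    Alternates between (i) per-view projections and a shared embedding Y
    that every projected view should approximate, and (ii) non-negative
    view weights that favor views Y fits well. The exponent-r weighting
    and least-squares updates are illustrative assumptions.
    """
    n = views[0].shape[0]                       # views: list of (n, d_v) arrays
    weights = np.full(len(views), 1.0 / len(views))
    Y = np.random.default_rng(0).standard_normal((n, dim))
    for _ in range(iters):
        # Fix Y: per-view projection W_v by least squares, X_v @ W_v ~= Y.
        proj = [np.linalg.lstsq(X, Y, rcond=None)[0] for X in views]
        errs = np.array([np.linalg.norm(X @ W - Y) ** 2
                         for X, W in zip(views, proj)])
        # Fix projections: weighted average of projected views as consensus.
        Y = sum(a ** r * X @ W for a, X, W in zip(weights, views, proj))
        Y /= (weights ** r).sum()
        # Update weights (closed form for min sum_v a_v^r * err_v, sum a_v = 1).
        inv = errs ** (1.0 / (1.0 - r))
        weights = inv / inv.sum()
    return Y, weights
```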

