Zero-Shot Image Classification Based on a Learnable Deep Metric

Sensors, 2021, Vol 21 (9), pp. 3241
Author(s): Jingyi Liu, Caijuan Shi, Dongjing Tu, Ze Shi, Yazhi Liu

Supervised deep learning models have achieved great success in image classification when trained on large numbers of labeled samples. In practice, however, many categories have only a few labeled training samples, and some have none at all. Zero-shot learning greatly reduces an image classification model's dependence on labeled training samples. Nevertheless, measuring the similarity of visual and semantic features with a predefined fixed metric (e.g., Euclidean distance) has limitations, and the mapping process suffers from the semantic gap problem. To address these problems, this paper proposes a new zero-shot image classification method based on an end-to-end learnable deep metric. First, common space embedding maps the visual features and semantic features into a shared space. Second, an end-to-end learnable deep metric, namely a relation network, learns the similarity of visual and semantic features. Finally, unseen images are classified according to their similarity scores. Extensive experiments on four datasets demonstrate the effectiveness of the proposed method.
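
To make the pipeline concrete, below is a minimal PyTorch sketch of a relation-network-style learnable metric. The embedding dimensions, layer sizes, and module names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RelationNet(nn.Module):
    """Minimal sketch of a learnable deep metric for zero-shot classification.

    Visual and semantic features are first embedded into a common space;
    a small relation module then scores their similarity. All sizes here
    are illustrative assumptions.
    """
    def __init__(self, vis_dim=2048, sem_dim=300, common_dim=512):
        super().__init__()
        self.vis_embed = nn.Linear(vis_dim, common_dim)   # visual -> common space
        self.sem_embed = nn.Linear(sem_dim, common_dim)   # semantic -> common space
        self.relation = nn.Sequential(                    # learnable metric
            nn.Linear(common_dim * 2, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),              # similarity in [0, 1]
        )

    def forward(self, vis_feat, sem_feat):
        v = torch.relu(self.vis_embed(vis_feat))          # (B, common_dim)
        s = torch.relu(self.sem_embed(sem_feat))          # (C, common_dim)
        # Pair every image with every class prototype.
        pairs = torch.cat([
            v.unsqueeze(1).expand(-1, s.size(0), -1),
            s.unsqueeze(0).expand(v.size(0), -1, -1),
        ], dim=-1)                                        # (B, C, 2 * common_dim)
        return self.relation(pairs).squeeze(-1)           # (B, C) relation scores

# Unseen images are assigned to the class with the highest relation score:
# pred = net(vis_feat, class_semantics).argmax(dim=1)
```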

Author(s): Xinxun Xu, Muli Yang, Yanhua Yang, Hao Wang

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a cross-modal retrieval task: given a free-hand sketch, retrieve matching natural images under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such low-dimensional projection destroys the completeness of semantic knowledge in the original semantic space, so useful knowledge cannot be transferred well when learning semantic features from different modalities. Moreover, domain information and semantic information are entangled in the visual features, which hinders reducing the domain gap between sketches and images and is therefore detrimental to cross-modal matching. In this paper, we propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR. Specifically, under the supervision of the original semantic knowledge, PDFD decomposes visual features into domain features and semantic features, and the semantic features are then projected into the common space as retrieval features. This progressive projection strategy maintains strong semantic supervision. Besides, to guarantee that the retrieval features capture clean and complete semantic information, a cross-reconstruction loss is introduced to encourage any combination of retrieval features and domain features to reconstruct the visual features. Extensive experiments demonstrate the superiority of PDFD over state-of-the-art competitors.
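
The decomposition and cross-reconstruction ideas can be sketched as follows. This is an illustrative PyTorch outline with assumed dimensions and simplified linear encoders/decoders, not the PDFD architecture itself; the swap-based loss assumes the two inputs share a class label.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecomposer(nn.Module):
    """Illustrative sketch of the decomposition idea behind PDFD.

    A visual feature is split into a domain branch and a semantic branch;
    a decoder must reconstruct the visual feature from combinations of the
    two, pushing complete semantic information into the semantic branch.
    Module names and sizes are assumptions for illustration.
    """
    def __init__(self, vis_dim=2048, dom_dim=256, sem_dim=300):
        super().__init__()
        self.domain_enc = nn.Linear(vis_dim, dom_dim)
        self.semantic_enc = nn.Linear(vis_dim, sem_dim)   # retrieval features
        self.decoder = nn.Linear(dom_dim + sem_dim, vis_dim)

    def forward(self, vis):
        return self.domain_enc(vis), self.semantic_enc(vis)

def cross_reconstruction_loss(model, vis_a, vis_b):
    """Swap domain/semantic parts across two same-class samples and
    reconstruct both visual features."""
    d_a, s_a = model(vis_a)
    d_b, s_b = model(vis_b)
    rec_ab = model.decoder(torch.cat([d_a, s_b], dim=-1))  # a's domain + b's semantics
    rec_ba = model.decoder(torch.cat([d_b, s_a], dim=-1))  # b's domain + a's semantics
    return F.mse_loss(rec_ab, vis_a) + F.mse_loss(rec_ba, vis_b)
```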


2017, Vol 17 (02), pp. 1750007
Author(s): Chunwei Tian, Guanglu Sun, Qi Zhang, Weibing Wang, Teng Chen, ...

Collaborative representation classification (CRC) is an important sparse method that is simple to implement and represents a test sample as a linear combination of the training samples. CRC classifies by the offset (residual) between each class's representation result and the test sample. However, this offset often cannot adequately express the difference between each class and the test sample. In this paper, we propose a novel representation method for image recognition to address this problem. The method not only fuses sparse representation and CRC to improve recognition accuracy, but also introduces a novel fusion mechanism for classifying images. It proceeds in the following steps. First, it computes the collaborative representation of the test sample, i.e., a linear combination of all training samples that represents the test sample. Then, it computes the sparse representation classification (SRC) of the test sample. Finally, it uses the CRC and SRC representations to obtain two kinds of scores for the test sample and fuses them to recognize the image. Face recognition experiments show that the combination of CRC and SRC achieves satisfactory image classification performance.
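
A compact numerical sketch of this CRC/SRC score fusion follows. The regularization strengths, the fusion weight w, and the use of scikit-learn's Lasso as the l1 solver are assumptions for illustration; the paper's exact fusion rule may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso

def class_residuals(X, labels, y, coef):
    """Residual ||y - X_c @ coef_c|| for every class c (columns of X are
    training samples)."""
    return np.array([
        np.linalg.norm(y - X[:, labels == c] @ coef[labels == c])
        for c in np.unique(labels)
    ])

def fused_classify(X, labels, y, lam=0.01, alpha=0.01, w=0.5):
    """Classify test sample y by fusing CRC and SRC residual scores."""
    # CRC: ridge-regularized linear combination of all training samples.
    A = X.T @ X + lam * np.eye(X.shape[1])
    coef_crc = np.linalg.solve(A, X.T @ y)
    # SRC: l1-regularized (sparse) coding of the test sample.
    coef_src = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    # Fuse the two per-class residual scores; smaller is better.
    score = w * class_residuals(X, labels, y, coef_crc) \
          + (1 - w) * class_residuals(X, labels, y, coef_src)
    return np.unique(labels)[np.argmin(score)]
```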


2014, Vol 2014, pp. 1-11
Author(s): Ziqiang Wang, Xia Sun, Lijun Sun, Yuchun Huang

In many image classification applications, multiple visual features are extracted from different views to describe an image. Since different visual features have their own statistical properties and discriminative powers, the conventional solution for multiview data is to concatenate the feature vectors into a single new feature vector. However, this simple concatenation strategy not only ignores the complementary nature of the different views, but also suffers from the curse of dimensionality. To address this problem, we propose a novel multiview subspace learning algorithm, named multiview discriminative geometry preserving projection (MDGPP), for feature extraction and classification. MDGPP not only preserves intraclass geometry and interclass discrimination information within a single view, but also exploits the complementary properties of different views to obtain a low-dimensional optimal consensus embedding via an alternating-optimization-based iterative algorithm. Experimental results on face recognition and facial expression recognition demonstrate the effectiveness of the proposed algorithm.
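
A toy sketch of alternating optimization for a multiview consensus embedding is given below. The least-squares view projections and the exponent-weighted view combination are simplifying assumptions; MDGPP's actual objective additionally encodes intraclass geometry and interclass discrimination.

```python
import numpy as np

def consensus_embedding(views, dim=10, iters=20, r=2.0):
    """Toy alternating optimization for a multiview consensus embedding.

    Alternates between (i) per-view projections and a shared embedding Y
    that every projected view should approximate, and (ii) non-negative
    view weights that favor views Y fits well. The exponent-r weighting
    and least-squares updates are illustrative assumptions.
    """
    n = views[0].shape[0]                       # views: list of (n, d_v) arrays
    weights = np.full(len(views), 1.0 / len(views))
    Y = np.random.default_rng(0).standard_normal((n, dim))
    for _ in range(iters):
        # Fix Y: per-view projection W_v by least squares, X_v @ W_v ~= Y.
        proj = [np.linalg.lstsq(X, Y, rcond=None)[0] for X in views]
        errs = np.array([np.linalg.norm(X @ W - Y) ** 2
                         for X, W in zip(views, proj)])
        # Fix projections: weighted average of projected views as consensus.
        Y = sum(a ** r * X @ W for a, X, W in zip(weights, views, proj))
        Y /= (weights ** r).sum()
        # Update weights (closed form for min sum_v a_v^r * err_v, sum a_v = 1).
        inv = errs ** (1.0 / (1.0 - r))
        weights = inv / inv.sum()
    return Y, weights
```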

