Visual feature synthesis with semantic reconstructor for traditional and generalized zero-shot object classification

Author(s):  
Ye Zhao ◽  
Tingting Xu ◽  
Xueliang Liu ◽  
Dan Guo ◽  
Zhenzhen Hu ◽  
...  
2020 ◽  
Vol 34 (07) ◽  
pp. 11733-11740

Author(s):  
Peirong Ma ◽  
Xiao Hu

Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes that are unavailable during training, but also the seen classes used at the training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., an attribute space). Most existing GZSL methods learn a cross-modal mapping between the visual feature space and the semantic space. However, a mapping model learned only from the seen classes produces an inherent bias when applied to the unseen classes. To tackle this problem, this paper integrates a deep embedding network (DE) and a modified variational autoencoder (VAE) into a novel model (DE-VAE) that learns a latent space shared by both image features and class embeddings. Specifically, the proposed model first employs the DE to learn the mapping from the semantic space to the visual feature space, and then uses the VAE to transform both the original visual features and the features obtained by the mapping into latent features. Finally, the latent features are used to train a softmax classifier. Extensive experiments on four GZSL benchmark datasets show that the proposed model significantly outperforms state-of-the-art methods.
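The DE-VAE pipeline described above (a semantic-to-visual embedding, a VAE mapping both feature streams into one latent space, and a final softmax classifier on the latents) can be illustrated with a minimal NumPy sketch. All dimensions, weights, and function names here are hypothetical, untrained placeholders chosen only to show how the two streams land in a shared latent space; the actual model trains these networks with reconstruction and KL objectives, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only): samples, attribute dim,
# visual feature dim, latent dim
n_seen, d_attr, d_vis, d_lat = 8, 5, 16, 4

def relu(x):
    return np.maximum(x, 0.0)

# --- Deep embedding (DE): semantic attributes -> visual feature space ---
W_de = rng.normal(size=(d_attr, d_vis))
def embed(attrs):
    return relu(attrs @ W_de)

# --- VAE encoder: visual features -> latent mean / log-variance ---
W_mu = rng.normal(size=(d_vis, d_lat))
W_lv = rng.normal(size=(d_vis, d_lat))
def encode(x):
    mu, logvar = x @ W_mu, x @ W_lv
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps   # reparameterization trick

attrs = rng.normal(size=(n_seen, d_attr))    # class embeddings
feats = rng.normal(size=(n_seen, d_vis))     # real visual features

z_real = encode(feats)          # latents from real visual features
z_syn  = encode(embed(attrs))   # latents from DE-mapped class embeddings

# Both streams now live in one latent space; in DE-VAE these latents
# would be used to train the final softmax classifier.
latents = np.vstack([z_real, z_syn])
print(latents.shape)  # (16, 4)
```

The point of the shared latent space is that classifier training can mix real features of seen classes with synthesized features of unseen classes, which is how the seen-class bias is reduced.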


2006 ◽  
Vol 33 (S 1) ◽  
Author(s):  
E. Huberle ◽  
K. Seymour ◽  
C.F. Altmann ◽  
H.O. Karnath

2019 ◽  
Author(s):  
Sushrut Thorat

A mediolateral gradation in neural responses to images spanning animals to artificial objects is observed in the ventral temporal cortex (VTC). Which information streams drive this organisation is an ongoing debate. Recently, in Proklova et al. (2016), the visual shape and category (“animacy”) dimensions in a set of stimuli were dissociated using a behavioural measure of visual feature information. fMRI responses revealed a neural cluster (the extra-visual animacy cluster, xVAC) that encoded category information unexplained by visual feature information, suggesting extra-visual contributions to the organisation in the ventral visual stream. We reassess these findings using Convolutional Neural Networks (CNNs) as models of the ventral visual stream. The visual features developed in the CNN layers can categorise the shape-matched stimuli from Proklova et al. (2016), in contrast to the behavioural measures used in that study. The category organisations in xVAC and VTC are explained to a large degree by the CNN visual feature differences, casting doubt on the suggestion that visual feature differences cannot account for the animacy organisation. To inform the debate further, we designed a set of animal-image stimuli that dissociate the animacy organisation driven by the CNN visual features from the degree of familiarity and agency (thoughtfulness and feelings). Preliminary results from a new fMRI experiment designed to understand the contribution of these non-visual features are presented.
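Comparisons like the one above (do CNN visual features explain the neural category organisation?) are commonly carried out with representational similarity analysis: build a pairwise-dissimilarity matrix over stimuli from CNN activations, build another from fMRI response patterns, and correlate the two. A minimal sketch, assuming random placeholder data in place of real CNN activations and voxel responses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 10 stimuli, 32 CNN-layer features, 20 voxels
n_stim, d_cnn, d_vox = 10, 32, 20
cnn_feats = rng.normal(size=(n_stim, d_cnn))
neural    = rng.normal(size=(n_stim, d_vox))

def rdm(x):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between every pair of stimulus patterns (upper triangle, flattened)."""
    c = np.corrcoef(x)                      # (n_stim, n_stim) correlations
    iu = np.triu_indices(len(x), k=1)       # unique stimulus pairs
    return 1.0 - c[iu]

# How much of the neural dissimilarity structure is shared with
# the CNN feature dissimilarity structure?
r = float(np.corrcoef(rdm(cnn_feats), rdm(neural))[0, 1])
print(round(r, 3))
```

With real data, a high RDM correlation would indicate that the CNN's visual features account for much of the category organisation, mirroring the paper's argument about xVAC and VTC; rank (Spearman) correlation is often preferred over Pearson in practice.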


1999 ◽  
Author(s):  
Kimberly Coombs ◽  
Debra Freel ◽  
Douglas Lampert ◽  
Steven Brahm

Author(s):  
Jiwei Wei ◽  
Yang Yang ◽  
Xing Xu ◽  
Yanli Ji ◽  
Xiaofeng Zhu ◽  
...  