Visual feature synthesis with semantic reconstructor for traditional and generalized zero-shot object classification

Author(s):  
Ye Zhao ◽  
Tingting Xu ◽  
Xueliang Liu ◽  
Dan Guo ◽  
Zhenzhen Hu ◽  
...  
2020 ◽  
Vol 34 (07) ◽  
pp. 11733-11740

Author(s):  
Peirong Ma ◽  
Xiao Hu

Generalized zero-shot learning (GZSL) is a challenging task that aims to recognize not only unseen classes that are unavailable during training, but also the seen classes used at the training stage. It is achieved by transferring knowledge from seen classes to unseen classes via a shared semantic space (e.g., an attribute space). Most existing GZSL methods learn a cross-modal mapping between the visual feature space and the semantic space. However, a mapping model learned only from the seen classes produces an inherent bias when applied to the unseen classes. To tackle this problem, this paper integrates a deep embedding network (DE) and a modified variational autoencoder (VAE) into a novel model (DE-VAE) that learns a latent space shared by both image features and class embeddings. Specifically, the proposed model first employs the DE to learn the mapping from the semantic space to the visual feature space, and then uses the VAE to transform both the original visual features and the features obtained by the mapping into latent features. Finally, the latent features are used to train a softmax classifier. Extensive experiments on four GZSL benchmark datasets show that the proposed model significantly outperforms state-of-the-art methods.
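The DE-VAE pipeline described above (a semantic-to-visual embedding, a VAE mapping both feature streams into one latent space, and a final softmax classifier on the latents) can be illustrated with a minimal NumPy sketch. All dimensions, weights, and function names here are hypothetical, untrained placeholders chosen only to show how the two streams land in a shared latent space; the actual model trains these networks with reconstruction and KL objectives, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only): samples, attribute dim,
# visual feature dim, latent dim
n_seen, d_attr, d_vis, d_lat = 8, 5, 16, 4

def relu(x):
    return np.maximum(x, 0.0)

# --- Deep embedding (DE): semantic attributes -> visual feature space ---
W_de = rng.normal(size=(d_attr, d_vis))
def embed(attrs):
    return relu(attrs @ W_de)

# --- VAE encoder: visual features -> latent mean / log-variance ---
W_mu = rng.normal(size=(d_vis, d_lat))
W_lv = rng.normal(size=(d_vis, d_lat))
def encode(x):
    mu, logvar = x @ W_mu, x @ W_lv
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps   # reparameterization trick

attrs = rng.normal(size=(n_seen, d_attr))    # class embeddings
feats = rng.normal(size=(n_seen, d_vis))     # real visual features

z_real = encode(feats)          # latents from real visual features
z_syn  = encode(embed(attrs))   # latents from DE-mapped class embeddings

# Both streams now live in one latent space; in DE-VAE these latents
# would be used to train the final softmax classifier.
latents = np.vstack([z_real, z_syn])
print(latents.shape)  # (16, 4)
```

The point of the shared latent space is that classifier training can mix real features of seen classes with synthesized features of unseen classes, which is how the seen-class bias is reduced.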


2006 ◽  
Vol 33 (S 1) ◽  
Author(s):  
E. Huberle ◽  
K. Seymour ◽  
C.F. Altmann ◽  
H.O. Karnath

2019 ◽  
Author(s):  
Sushrut Thorat

A mediolateral gradation in neural responses to images spanning animals to artificial objects is observed in the ventral temporal cortex (VTC). Which information streams drive this organisation is an ongoing debate. Recently, in Proklova et al. (2016), the visual shape and category (“animacy”) dimensions in a set of stimuli were dissociated using a behavioural measure of visual feature information. fMRI responses revealed a neural cluster (the extra-visual animacy cluster, xVAC) that encoded category information unexplained by visual feature information, suggesting extra-visual contributions to the organisation in the ventral visual stream. We reassess these findings using Convolutional Neural Networks (CNNs) as models of the ventral visual stream. The visual features developed in the CNN layers can categorise the shape-matched stimuli from Proklova et al. (2016), in contrast to the behavioural measures used in that study. The category organisations in xVAC and VTC are explained to a large degree by the CNN visual feature differences, casting doubt on the suggestion that visual feature differences cannot account for the animacy organisation. To inform the debate further, we designed a set of animal-image stimuli that dissociate the animacy organisation driven by the CNN visual features from the degree of familiarity and agency (thoughtfulness and feelings). Preliminary results from a new fMRI experiment designed to understand the contribution of these non-visual features are presented.
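Comparisons like the one above (do CNN visual features explain the neural category organisation?) are commonly carried out with representational similarity analysis: build a pairwise-dissimilarity matrix over stimuli from CNN activations, build another from fMRI response patterns, and correlate the two. A minimal sketch, assuming random placeholder data in place of real CNN activations and voxel responses:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 10 stimuli, 32 CNN-layer features, 20 voxels
n_stim, d_cnn, d_vox = 10, 32, 20
cnn_feats = rng.normal(size=(n_stim, d_cnn))
neural    = rng.normal(size=(n_stim, d_vox))

def rdm(x):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between every pair of stimulus patterns (upper triangle, flattened)."""
    c = np.corrcoef(x)                      # (n_stim, n_stim) correlations
    iu = np.triu_indices(len(x), k=1)       # unique stimulus pairs
    return 1.0 - c[iu]

# How much of the neural dissimilarity structure is shared with
# the CNN feature dissimilarity structure?
r = float(np.corrcoef(rdm(cnn_feats), rdm(neural))[0, 1])
print(round(r, 3))
```

With real data, a high RDM correlation would indicate that the CNN's visual features account for much of the category organisation, mirroring the paper's argument about xVAC and VTC; rank (Spearman) correlation is often preferred over Pearson in practice.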


1999 ◽  
Author(s):  
Kimberly Coombs ◽  
Debra Freel ◽  
Douglas Lampert ◽  
Steven Brahm

Author(s):  
Jiwei Wei ◽  
Yang Yang ◽  
Xing Xu ◽  
Yanli Ji ◽  
Xiaofeng Zhu ◽  
...  