Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval

2020 ◽  
Vol 34 (07) ◽  
pp. 11515-11522
Author(s):  
Kaiyi Lin ◽  
Xing Xu ◽  
Lianli Gao ◽  
Zheng Wang ◽  
Heng Tao Shen

Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of unseen classes across different modalities. It is challenging not only because of the heterogeneous distributions across modalities, but also because of the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (class-embeddings) as the common semantic space and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as to strengthen the relations between input data and the semantic space so as to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Instead of using the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of the input multimodal features and the class-embeddings, learned by modality-specific variational autoencoders. Notably, we align the distributions learned from the multimodal input features and from the class-embeddings to construct latent embeddings that contain the essential cross-modal correlations associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model on image-text retrieval using four benchmark datasets and on image-sketch retrieval using one large-scale dataset. The experimental results show that our method establishes new state-of-the-art performance for both tasks on all datasets.
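
As a concrete illustration of the cross-alignment idea, the following is a minimal PyTorch sketch (our own, not the authors' code; all module names and dimensions are invented): two modality-specific VAEs encode image features and text features into a shared latent space, coupled by cross-reconstruction and distribution-alignment losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    """One VAE per modality, all sharing the same latent dimensionality."""
    def __init__(self, in_dim, latent_dim=64, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

img_vae, txt_vae = ModalityVAE(2048), ModalityVAE(300)   # toy feature sizes
img_feat, txt_feat = torch.randn(8, 2048), torch.randn(8, 300)

mu_i, lv_i = img_vae.encode(img_feat)
mu_t, lv_t = txt_vae.encode(txt_feat)
z_i = img_vae.reparameterize(mu_i, lv_i)
z_t = txt_vae.reparameterize(mu_t, lv_t)

# Cross-reconstruction: decode each modality's latent with the *other* decoder.
cross_rec = (F.mse_loss(txt_vae.dec(z_i), txt_feat) +
             F.mse_loss(img_vae.dec(z_t), img_feat))
# Distribution alignment: pull the two latent Gaussians toward each other.
align = ((mu_i - mu_t) ** 2 +
         ((0.5 * lv_i).exp() - (0.5 * lv_t).exp()) ** 2).sum(1).mean()
loss = cross_rec + align
```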

2020 ◽  
Vol 34 (03) ◽  
pp. 2661-2668
Author(s):  
Chuang Lin ◽  
Sicheng Zhao ◽  
Lei Meng ◽  
Tat-Seng Chua

Existing domain adaptation methods for visual sentiment classification are typically investigated under the single-source scenario, where the knowledge learned from a source domain with sufficient labeled data is transferred to a target domain with loosely labeled or unlabeled data. In practice, however, data from a single source domain usually have a limited volume and can hardly cover the characteristics of the target domain. In this paper, we propose a novel multi-source domain adaptation (MDA) method, termed Multi-source Sentiment Generative Adversarial Network (MSGAN), for visual sentiment classification. To handle data from multiple source domains, it learns to find a unified sentiment latent space where data from both the source and target domains share a similar distribution. This is achieved via cycle-consistent adversarial learning in an end-to-end manner. Extensive experiments conducted on four benchmark datasets demonstrate that MSGAN significantly outperforms state-of-the-art MDA approaches for visual sentiment classification.
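
MSGAN's full pipeline relies on cycle-consistent adversarial learning; the sketch below (our own, in PyTorch with toy dimensions) illustrates only the basic adversarial-alignment ingredient: a shared encoder whose latent features a domain discriminator is trained to tell apart, while the encoder is updated to confuse it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(),
                        nn.Linear(256, 128))
domain_disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                            nn.Linear(64, 3))   # 2 source domains + 1 target

src1, src2, tgt = (torch.randn(8, 3, 64, 64) for _ in range(3))
feats = [encoder(x) for x in (src1, src2, tgt)]
labels = [torch.full((8,), d, dtype=torch.long) for d in range(3)]

# Discriminator step: learn to classify which domain each feature came from.
d_loss = sum(F.cross_entropy(domain_disc(f.detach()), y)
             for f, y in zip(feats, labels))

# Encoder step (domain confusion): push target-domain predictions toward a
# uniform distribution over domains, so the domains become indistinguishable.
logp = torch.log_softmax(domain_disc(feats[2]), dim=1)
g_loss = -logp.mean()
```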


2020 ◽  
Vol 10 (23) ◽  
pp. 8415
Author(s):  
Jeongmin Lee ◽  
Younkyoung Yoon ◽  
Junseok Kwon

We propose a novel generative adversarial network for class-conditional data augmentation (GANDA) to mitigate data imbalance in image classification tasks. The proposed GANDA generates minority-class data by exploiting majority-class information to enhance the classification accuracy of minority classes. For stable GAN training, we introduce a new denoising-autoencoder initialization with explicit class conditioning in the latent space, which enables the generation of well-defined samples. The generated samples are visually realistic and have a high resolution. Experimental results demonstrate that the proposed GANDA considerably improves classification accuracy on standard benchmark datasets (i.e., MNIST and CelebA), especially when the datasets are highly imbalanced. Our generated samples can easily be used to train conventional classifiers to enhance their classification accuracy.
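
A hedged sketch of the initialization trick described above, as we read it (all names and sizes below are ours, not from the paper): pretrain a denoising autoencoder with a one-hot class code concatenated in the latent space, then reuse its decoder to initialize the GAN generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, N_CLASSES = 64, 10

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                        nn.Linear(256, LATENT))
decoder = nn.Sequential(nn.Linear(LATENT + N_CLASSES, 256), nn.ReLU(),
                        nn.Linear(256, 28 * 28), nn.Sigmoid())

x = torch.rand(8, 1, 28, 28)                     # clean images in [0, 1]
y = F.one_hot(torch.randint(0, N_CLASSES, (8,)), N_CLASSES).float()

# Denoising pretraining: corrupt the input, reconstruct the clean image,
# with the class code injected into the latent space.
noisy = (x + 0.3 * torch.randn_like(x)).clamp(0, 1)
z = encoder(noisy)
recon = decoder(torch.cat([z, y], dim=1)).view_as(x)
dae_loss = F.mse_loss(recon, x)

# After pretraining, the decoder provides the generator's initialization and
# is then fine-tuned adversarially against a discriminator.
generator = decoder
```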


2021 ◽  
Vol 6 (1) ◽  
pp. 1-5
Author(s):  
Adam Balint ◽  
Graham Taylor

Recent advances in Generative Adversarial Networks (GANs) have shown great progress on a large variety of tasks. A common technique used to yield greater sample diversity is conditioning on class labels. Conditioning on high-dimensional structured or unstructured information has also been shown to improve generation results, e.g., in image-to-image translation. The conditioning information is typically provided in the form of human annotations, which can be expensive and difficult to obtain, especially when domain experts are needed. In this paper, we present an alternative: conditioning on low-dimensional structured information that can be automatically extracted from the input without the need for human annotators. Specifically, we propose the Palette-conditioned Generative Adversarial Network (Pal-GAN), an architecture-agnostic model that conditions on both a colour palette and a segmentation mask for high-quality image synthesis. We show improvements in conditional consistency, intersection-over-union, and Fréchet inception distance scores. Additionally, we show that sampling colour palettes significantly changes the style of the generated images.
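
Since the key point is that the conditioning signal is extracted automatically, here is a small sketch (ours; `extract_palette` is a hypothetical helper, not the paper's code) of one plausible way to obtain a colour palette from the input image via k-means over its pixels.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_palette(image, n_colours=8):
    """Return `n_colours` RGB centroids (the palette) for an HxWx3 image."""
    pixels = image.reshape(-1, 3).astype(np.float32)
    km = KMeans(n_clusters=n_colours, n_init=4, random_state=0).fit(pixels)
    return km.cluster_centers_   # shape (n_colours, 3), used as conditioning

image = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
palette = extract_palette(image)  # fed to the generator with the segmentation mask
```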


2021 ◽  
Author(s):  
Tham Vo

Abstract: In the abstractive summarization task, most proposed models adopt a deep recurrent neural network (RNN)-based encoder-decoder architecture to learn and generate a meaningful summary for a given input document. However, most recent RNN-based models tend to capture high-frequency/repetitive phrases in long documents during training, which leads to trivial and generic summaries. Moreover, the lack of a thorough analysis of the sequential and long-range dependency relationships between words in different contexts while learning the textual representation also makes the generated summaries unnatural and incoherent. To deal with these challenges, in this paper we propose a novel semantic-enhanced generative adversarial network (GAN)-based approach for abstractive text summarization, called SGAN4AbSum. We use an adversarial training strategy in which the generator and discriminator are trained simultaneously: the generator produces summaries, while the discriminator distinguishes the generated summaries from the ground-truth ones. The input of the generator is a joint rich-semantic and global-structural latent representation of the training documents, obtained by a combined BERT and graph convolutional network (GCN) textual embedding mechanism. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed SGAN4AbSum, which achieves competitive ROUGE scores compared with state-of-the-art abstractive text summarization baselines.
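
A simplified sketch of the adversarial objective as the abstract describes it (our approximation only; the random tensors stand in for the BERT+GCN document encoding and the ground-truth summary encoding, and real text generation would additionally need a decoding strategy).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

doc_repr = torch.randn(4, 768)        # stands in for BERT+GCN document features
gold_summary = torch.randn(4, 256)    # stands in for ground-truth summary encoding

generator = nn.Sequential(nn.Linear(768, 512), nn.ReLU(), nn.Linear(512, 256))
discriminator = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))

fake = generator(doc_repr)
# Discriminator: real summary encodings -> 1, generated ones -> 0.
d_loss = (F.binary_cross_entropy_with_logits(discriminator(gold_summary),
                                             torch.ones(4, 1)) +
          F.binary_cross_entropy_with_logits(discriminator(fake.detach()),
                                             torch.zeros(4, 1)))
# Generator: make generated summaries look real to the discriminator.
g_loss = F.binary_cross_entropy_with_logits(discriminator(fake),
                                            torch.ones(4, 1))
```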


Author(s):  
Liang Yang ◽  
Yuexue Wang ◽  
Junhua Gu ◽  
Chuan Wang ◽  
Xiaochun Cao ◽  
...  

Motivated by the capability of Generative Adversarial Networks to explore the latent semantic space and capture semantic variations in the data distribution, adversarial learning has been adopted in network embedding to improve robustness. However, this important ability is lost in existing adversarially regularized network embedding methods, because their embedding results are directly compared to samples drawn from a perturbation (Gaussian) distribution without any rectification from real data. To overcome this issue, a novel Joint Adversarial Network Embedding (JANE) framework is proposed to jointly distinguish real and fake combinations of the embeddings, topology information, and node features. JANE contains three pluggable components: an Embedding module, a Generator module, and a Discriminator module. The overall objective function of JANE is defined in a min-max form, which can be optimized via alternating stochastic gradient steps. Extensive experiments demonstrate the remarkable superiority of the proposed JANE on link prediction (3% gains in both AUC and AP) and node clustering (5% gain in F1 score).
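
Our rough reading of JANE's joint discrimination, as a toy PyTorch sketch (not the released code; module names and dimensions are invented): the discriminator judges combinations of an embedding and the corresponding node features, rather than comparing embeddings to raw Gaussian noise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB, FEAT = 32, 128
embedder = nn.Linear(FEAT, EMB)                  # stands in for the graph encoder
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                          nn.Linear(64, EMB + FEAT))
disc = nn.Sequential(nn.Linear(EMB + FEAT, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.randn(8, FEAT)                         # real node features
real_pair = torch.cat([embedder(x), x], dim=1)   # real (embedding, feature) combo
fake_pair = generator(torch.randn(8, 16))        # jointly generated combo

# Min-max objective: discriminator separates real/fake combinations,
# embedder+generator are trained to fool it (alternating updates).
d_loss = (F.binary_cross_entropy_with_logits(disc(real_pair),
                                             torch.ones(8, 1)) +
          F.binary_cross_entropy_with_logits(disc(fake_pair.detach()),
                                             torch.zeros(8, 1)))
g_loss = F.binary_cross_entropy_with_logits(disc(fake_pair), torch.ones(8, 1))
```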


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4291
Author(s):  
Xuejiao Gong ◽  
Bo Tang ◽  
Ruijin Zhu ◽  
Wenlong Liao ◽  
Like Song

Due to the strong concealment of electricity theft and the limitation of inspection resources, the number of power-theft samples available to the power department is insufficient, which limits the accuracy of power-theft detection. Therefore, a data augmentation method for electricity theft detection based on the conditional variational autoencoder (CVAE) is proposed. Firstly, the stealing power curves are mapped into low-dimensional latent variables by an encoder composed of convolutional layers, and new stealing power curves are reconstructed by a decoder composed of deconvolutional layers. Then, five typical attack models are proposed, and a convolutional neural network is constructed as a classifier according to the data characteristics of the stealing power curves. Finally, the effectiveness and adaptability of the proposed method are verified on a smart-meter data set from London. The simulation results show that the CVAE can account for both the shapes and the distribution characteristics of the samples, and that the generated stealing power curves improve the classifier's performance more than traditional augmentation methods such as random oversampling, the synthetic minority over-sampling technique, and the conditional generative adversarial network. Moreover, the method is suitable for different classifiers.
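
A minimal CVAE sketch over load curves, under our own assumptions (48 half-hourly readings per curve, the attack-model label as the condition, 1-D convolution/deconvolution layers; not the paper's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T, N_ATTACKS, LATENT = 48, 5, 16

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1 + N_ATTACKS, 16, 5, stride=2, padding=2),
            nn.ReLU(), nn.Flatten())
        self.mu = nn.Linear(16 * (T // 2), LATENT)
        self.logvar = nn.Linear(16 * (T // 2), LATENT)
        self.fc = nn.Linear(LATENT + N_ATTACKS, 16 * (T // 2))
        self.dec = nn.ConvTranspose1d(16, 1, 4, stride=2, padding=1)

    def forward(self, x, y):
        y_map = y.unsqueeze(-1).expand(-1, -1, T)   # broadcast label over time
        h = self.enc(torch.cat([x, y_map], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        h = self.fc(torch.cat([z, y], dim=1)).view(-1, 16, T // 2)
        return self.dec(h), mu, logvar

model = CVAE()
x = torch.randn(8, 1, T)                            # stealing power curves
y = F.one_hot(torch.randint(0, N_ATTACKS, (8,)), N_ATTACKS).float()
recon, mu, logvar = model(x, y)
# Reconstruction + KL divergence; sampling the decoder with a chosen label
# then yields new curves of that attack type for augmentation.
loss = F.mse_loss(recon, x) - 0.5 * (1 + logvar - mu**2 - logvar.exp()).mean()
```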


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Linyan Li ◽  
Yu Sun ◽  
Fuyuan Hu ◽  
Tao Zhou ◽  
Xuefeng Xi ◽  
...  

In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) aimed at generating 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure for text-to-image synthesis. During training, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer, generating high-resolution images with photo-realistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, the model can attend to fine-grained word-level information in the semantics. Finally, a measure of diversity is added to the discriminator, which enables the generator to obtain more diverse gradient directions and improves the diversity of the generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets reach 4.48 and 4.16, improvements of 2.75% and 6.42% over the Attentional Generative Adversarial Network (AttenGAN). The ACGAN model performs better on text-to-image generation, and the generated images are closer to real images.
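
The fine-grained matching loss can be sketched roughly as follows (our simplification of a DAMSM-style loss; all shapes are invented): words attend to image regions in a common semantic space, and the true image-text pairing must out-score the mismatched pairings in the batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, WORDS, REGIONS, DIM = 4, 12, 17 * 17, 256
word_feats = F.normalize(torch.randn(B, WORDS, DIM), dim=-1)
region_feats = F.normalize(torch.randn(B, REGIONS, DIM), dim=-1)

def match_score(words, regions):
    """Attend each word to image regions, then score the (text, image) pair."""
    attn = torch.softmax(words @ regions.transpose(-1, -2), dim=-1)  # B x W x R
    attended = attn @ regions                                        # B x W x D
    return F.cosine_similarity(words, attended, dim=-1).mean(-1)     # B

# Batch-wise contrastive loss: scores[i, j] = sentence i against image j,
# and the diagonal (true pairings) should dominate each row.
scores = torch.stack([match_score(word_feats[i:i+1].expand(B, -1, -1),
                                  region_feats) for i in range(B)])
match_loss = F.cross_entropy(scores * 10.0, torch.arange(B))
```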


2019 ◽  
Vol 9 (8) ◽  
pp. 1550 ◽  
Author(s):  
Aihong Shen ◽  
Huasheng Wang ◽  
Junjie Wang ◽  
Hongchen Tan ◽  
Xiuping Liu ◽  
...  

Person re-identification (re-ID) is a fundamental problem in the field of computer vision. The performance of deep learning-based person re-ID models suffers from a lack of training data. In this work, we introduce a novel image-specific data augmentation method at the feature-map level to enforce feature diversity in the network. Furthermore, an attention assignment mechanism is proposed to enforce that the person re-ID classifier focuses on nearly all important regions of the input person image. To achieve this, a three-stage framework is proposed. First, a baseline classification network is trained for person re-ID. Second, an attention assignment network is built on the baseline network, in which the attention module learns to suppress the response of the currently detected regions and re-assign attention to other important locations. By this means, multiple regions important for classification are highlighted by the attention map. Finally, the attention map is integrated into the attention-aware adversarial network (AAA-Net), which generates high-performance classification results with an adversarial training strategy. We evaluate the proposed method on two large-scale benchmark datasets, Market1501 and DukeMTMC-reID. Experimental results show that our algorithm performs favorably against state-of-the-art methods.
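
A sketch of the suppression step as we understand it from the abstract (`suppress_top_regions` is a hypothetical helper, not the authors' code): zero out the most-attended spatial locations of a feature map so the classifier must re-assign attention to other important regions.

```python
import torch

def suppress_top_regions(feature_map, attention, keep_ratio=0.8):
    """Zero out the most-attended spatial locations of a B x C x H x W map."""
    b, c, h, w = feature_map.shape
    flat = attention.view(b, -1)                      # B x (H*W)
    k = int(flat.size(1) * (1 - keep_ratio))          # locations to suppress
    thresh = flat.topk(k, dim=1).values[:, -1:]       # per-sample cutoff
    mask = (flat < thresh).float().view(b, 1, h, w)   # 1 = keep, 0 = suppress
    return feature_map * mask

features = torch.randn(2, 64, 16, 16)
attn = features.mean(dim=1, keepdim=True)             # crude attention proxy
masked = suppress_top_regions(features, attn)          # feeds the next stage
```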

