TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets

Author(s):  
Xin Zhao ◽  
Guiguang Ding ◽  
Yuchen Guo ◽  
Jungong Han ◽  
Yue Gao

Cross-view retrieval, which focuses on searching for images in response to text queries or vice versa, has received increasing attention recently. Cross-view hashing aims to solve the cross-view retrieval problem efficiently with binary hash codes. Most existing works on cross-view hashing exploit multi-view embedding methods to tackle this problem, which inevitably causes information loss in both the image and text domains. Inspired by Generative Adversarial Nets (GANs), this paper presents a new model that is able to Turn Cross-view Hashing into single-view hashing (TUCH), thus enabling the image information to be preserved as much as possible. TUCH is a novel deep architecture that integrates a language model network T for text feature extraction, a generator network G to generate fake images from text features, and a hashing network H that learns hashing functions to produce compact binary codes. Our architecture effectively unifies joint generative adversarial learning and cross-view hashing. Extensive empirical evidence shows that our TUCH approach achieves state-of-the-art results, especially on text-to-image retrieval, on image-sentence datasets, i.e., the standard IAPRTC-12 and the large-scale Microsoft COCO.
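A minimal PyTorch sketch of the three-network pipeline the abstract names (T, G, H): text is mapped to a feature, the generator synthesizes a "fake image" from it, and both real and generated images share the same single-view hashing network. All module names, layer sizes, and the tanh relaxation of binary codes are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TextNet(nn.Module):            # T: text feature extraction
    def __init__(self, vocab_dim=1000, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(vocab_dim, 512), nn.ReLU(),
                                 nn.Linear(512, feat_dim))
    def forward(self, t):
        return self.net(t)

class Generator(nn.Module):          # G: "fake image" from a text feature
    def __init__(self, feat_dim=256, img_dim=3 * 64 * 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, img_dim), nn.Tanh())
    def forward(self, f):
        return self.net(f)

class HashNet(nn.Module):            # H: image (real or generated) -> K-bit code
    def __init__(self, img_dim=3 * 64 * 64, bits=48):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                 nn.Linear(512, bits), nn.Tanh())  # relaxed codes
    def forward(self, x):
        return self.net(x)

# Both views pass through the same single-view hashing network H:
T, G, H = TextNet(), Generator(), HashNet()
text = torch.rand(8, 1000)                 # toy bag-of-words text features
img = torch.rand(8, 3 * 64 * 64)           # toy flattened images
code_img = torch.sign(H(img))              # codes for real images
code_txt = torch.sign(H(G(T(text))))       # codes for text via generated images
```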

Author(s):  
Chao Li ◽  
Cheng Deng ◽  
Lei Wang ◽  
De Xie ◽  
Xianglong Liu

In recent years, hashing has attracted more and more attention owing to its low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep learning, increasingly compelling results have been achieved in the cross-modal retrieval community. However, existing deep cross-modal hashing methods either rely on large amounts of labeled information or are unable to learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, where an outer-cycle network is used to learn a powerful common representation and an inner-cycle network is used to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with a generative adversarial mechanism, so that they can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms state-of-the-art unsupervised cross-modal hashing methods.
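A hedged sketch of the coupled-cycle idea as the abstract describes it: two encoders map image and text features into a common space, cross-modal translators form a cycle whose reconstructions must be consistent (the unsupervised signal), and a shared hashing head makes the codes from either modality agree. The module shapes, the L1 cycle loss, and the omission of the adversarial discriminators are all assumptions for illustration.

```python
import torch
import torch.nn as nn

enc_img = nn.Linear(4096, 256)   # image feature -> common space
enc_txt = nn.Linear(300, 256)    # text feature  -> common space
img2txt = nn.Linear(256, 256)    # cross-modal translators forming the cycle
txt2img = nn.Linear(256, 256)
hash_head = nn.Sequential(nn.Linear(256, 64), nn.Tanh())  # 64-bit relaxed codes

img_f, txt_f = torch.rand(8, 4096), torch.rand(8, 300)
zi, zt = enc_img(img_f), enc_txt(txt_f)

# Outer cycle: translate each representation to the other space and back.
cycle_loss = nn.functional.l1_loss(txt2img(img2txt(zi)), zi) \
           + nn.functional.l1_loss(img2txt(txt2img(zt)), zt)

# Inner cycle: codes from either modality should agree (unsupervised signal).
code_loss = nn.functional.mse_loss(hash_head(zi), hash_head(zt))
loss = cycle_loss + code_loss    # adversarial terms omitted for brevity
loss.backward()
```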


2020 ◽  
Vol 34 (04) ◽  
pp. 4412-4419 ◽  
Author(s):  
Zhao Kang ◽  
Wangtao Zhou ◽  
Zhitong Zhao ◽  
Junming Shao ◽  
Meng Han ◽  
...  

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers have managed to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, which typically have quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scale. In the era of big data, this computational issue becomes critical. To fill the gap, we propose a large-scale MVSC (LMVSC) algorithm with linear-order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that spectral clustering can be performed on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario. Extensive experiments on various large-scale benchmark datasets validate the effectiveness and efficiency of our approach with respect to state-of-the-art clustering methods.
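A rough sketch of the anchor-graph trick behind this kind of linear-complexity clustering: for each view, build a small sample-by-anchor similarity matrix instead of a full n-by-n graph, integrate the per-view matrices, and take the spectral embedding from the small matrix. Anchor selection via k-means, the RBF similarity, and the final KMeans step are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def anchor_graph(X, m=50):
    """n-by-m similarity between samples and m anchors (k-means centers)."""
    anchors = KMeans(n_clusters=m, n_init=10).fit(X).cluster_centers_
    Z = rbf_kernel(X, anchors)                  # n x m, with m << n
    return Z / Z.sum(axis=1, keepdims=True)     # row-normalize

views = [np.random.rand(1000, 20), np.random.rand(1000, 30)]  # toy multi-view data
Z = np.hstack([anchor_graph(V) for V in views])  # integrate per-view anchor graphs

# Left singular vectors of Z approximate the spectral embedding of the full
# graph Z @ Z.T without ever forming an n-by-n matrix, keeping cost linear in n.
U, _, _ = np.linalg.svd(Z, full_matrices=False)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(U[:, :5])
```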


Author(s):  
Jie Lin ◽  
Zechao Li ◽  
Jinhui Tang

With the explosive growth of images containing faces, scalable face image retrieval has attracted increasing attention. Owing to its remarkable effectiveness, deep hashing has recently become a popular hashing method. In this work, we propose a new Discriminative Deep Hashing (DDH) network to learn discriminative and compact hash codes for large-scale face image retrieval. The proposed network incorporates end-to-end learning, a divide-and-encode module, and the desired discrete code learning into a unified framework. Specifically, a network with a stack of convolution-pooling layers is proposed to extract multi-scale and robust features by merging the outputs of the third max-pooling layer and the fourth convolutional layer. To reduce the redundancy among hash codes and the network parameters simultaneously, a divide-and-encode module is employed to generate compact hash codes. Moreover, a loss function is introduced to minimize the prediction errors of the learned hash codes, which leads to discriminative hash codes. Extensive experiments on two datasets demonstrate that the proposed method achieves superior performance compared with state-of-the-art hashing methods.
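A minimal sketch of a divide-and-encode head of the kind described above: the feature vector is split into equal slices, and each slice is projected to a single hash bit by its own small layer, which is what reduces redundancy between bits and parameters. The slice sizes, bit count, and tanh relaxation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DivideAndEncode(nn.Module):
    def __init__(self, feat_dim=480, bits=48):
        super().__init__()
        assert feat_dim % bits == 0
        self.slice = feat_dim // bits
        # one tiny projection per bit, applied only to its own feature slice
        self.proj = nn.ModuleList([nn.Linear(self.slice, 1) for _ in range(bits)])

    def forward(self, x):                          # x: (batch, feat_dim)
        chunks = x.split(self.slice, dim=1)        # `bits` slices of equal size
        out = torch.cat([p(c) for p, c in zip(self.proj, chunks)], dim=1)
        return torch.tanh(out)                     # relaxed codes in (-1, 1)

head = DivideAndEncode()
codes = torch.sign(head(torch.rand(4, 480)))       # binarize at retrieval time
```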


2021 ◽  
Author(s):  
Qi Zhai ◽  
Zhigang Kan ◽  
Linhui Feng ◽  
Linbo Qiao ◽  
Feng Liu

Recently, Chinese event detection has attracted more and more attention. As a special kind of hieroglyphic writing, Chinese glyphs are semantically useful but still unexplored in this task. In this paper, we propose a novel Glyph-Aware Fusion Network, named GlyFN, which introduces glyph information into the pre-trained language model representation. To obtain a better representation, we design a Vector Linear Fusion mechanism to fuse them: it first applies max-pooling to capture salient information, and then uses a linear combination of the vectors to retain information unique to each. Moreover, for large-scale unstructured text, we distribute the data into different clusters and process them in parallel. Finally, we conduct extensive experiments on ACE2005 and large-scale data. Experimental results show that GlyFN improves the F1-score by 7.48 (10.18%) for trigger identification and 6.17 (8.7%) for trigger classification over state-of-the-art methods. Furthermore, the event detection task for large-scale unstructured text can be accomplished efficiently through this distribution.
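A hedged sketch of a fusion step in the spirit of the Vector Linear Fusion described above: element-wise max-pooling over the language-model and glyph representations captures salient features, while a learned linear map over their concatenation retains information unique to each vector. The exact fusion formula, dimensions, and the additive combination are assumptions.

```python
import torch
import torch.nn as nn

class VectorLinearFusion(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.linear = nn.Linear(2 * dim, dim)  # mixes the concatenated vectors

    def forward(self, lm_repr, glyph_repr):
        salient = torch.max(lm_repr, glyph_repr)              # element-wise max
        unique = self.linear(torch.cat([lm_repr, glyph_repr], dim=-1))
        return salient + unique                                # fused representation

# Toy usage: batch of 2 sentences, 10 tokens, 768-dim LM and glyph vectors.
fused = VectorLinearFusion()(torch.rand(2, 10, 768), torch.rand(2, 10, 768))
```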


Author(s):  
Junjie Chen ◽  
William K. Cheung

Quantization has been widely adopted for large-scale multimedia retrieval due to its effectiveness in coding high-dimensional data. Deep quantization models have been demonstrated to achieve state-of-the-art retrieval accuracy. However, training deep models on a large-scale database is highly time-consuming because a large number of parameters is involved. Existing deep quantization methods often sample only a subset of the database for training, which may end up with unsatisfactory retrieval performance as a large portion of the label information is discarded. To alleviate this problem, we propose a novel model called Similarity Preserving Deep Asymmetric Quantization (SPDAQ), which can directly and efficiently learn the compact binary codes and quantization codebooks for all items in the database. To do that, SPDAQ makes use of an image subset as well as the label information of all database items, so that the subset items and the database items are mapped to two different but correlated distributions in which label similarity is well preserved. An efficient optimization algorithm is proposed for the learning. Extensive experiments conducted on four widely used benchmark datasets demonstrate the superiority of our proposed SPDAQ model.
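A simplified sketch of the asymmetric setup described above: query-side codes come from a deep network applied only to a sampled image subset, while every database item keeps its own directly learned code, and the inner product between the two sides is trained to match label similarity. The shapes, the toy similarity matrix, and the plain squared-error objective are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

bits, n_db, n_sub = 32, 5000, 256
net = nn.Sequential(nn.Linear(4096, bits), nn.Tanh())   # deep side (subset only)
db_codes = nn.Parameter(torch.randn(n_db, bits))         # codes for ALL db items

feats = torch.rand(n_sub, 4096)                          # sampled subset features
S = (torch.rand(n_sub, n_db) > 0.9).float() * 2 - 1      # toy label similarity in {-1,+1}

u = net(feats)                                           # relaxed subset codes
loss = ((u @ db_codes.t() / bits - S) ** 2).mean()       # asymmetric similarity loss
loss.backward()                                          # updates both the net and all db codes
```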


2020 ◽  
Vol 34 (07) ◽  
pp. 11410-11417
Author(s):  
Wenjing Li ◽  
Zhongcheng Wu

This paper considers a novel problem, named One-View Learning (OVL), in human retrieval, a.k.a. person re-identification (re-ID). Unlike fully supervised learning, OVL requires only a fairly cheap annotation cost: labeled training images are provided from only one camera view (the source view/domain), while annotations of training images from the other camera views (target views/domains) are not available. OVL is a multi-target open-set domain adaptation problem that is difficult for existing domain adaptation methods to handle, because 1) unlabeled samples are drawn from multiple target views with different distributions, and 2) the target views may contain samples of “unknown identity” that are not shared by the source view. To address this problem, this work introduces a novel one-view learning framework for person re-ID, achieved by adversarial multi-view learning (AMVL) and adversarial unknown rejection learning (AURL). The former learns a multi-view discriminator through adversarial learning to align the feature distributions across all views. The latter is designed to reject unknown samples from target views through adversarial learning with two unknown-identity classifiers. Extensive experiments on three large-scale datasets demonstrate the advantage of the proposed method over state-of-the-art domain adaptation and semi-supervised methods.
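A small sketch of the adversarial view-alignment idea: a discriminator tries to predict which camera view a feature came from, and a gradient reversal layer trains the feature extractor to fool it, aligning feature distributions across views. The gradient reversal trick is a standard stand-in for the adversarial game; the layer sizes and view count are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, g):
        return -g                       # reversed gradients flow into the extractor

extractor = nn.Sequential(nn.Linear(2048, 256), nn.ReLU())
view_disc = nn.Linear(256, 6)           # predicts one of 6 camera views

feats = extractor(torch.rand(16, 2048))                 # toy backbone features
view_labels = torch.randint(0, 6, (16,))                # which camera each came from
logits = view_disc(GradReverse.apply(feats))
adv_loss = nn.functional.cross_entropy(logits, view_labels)
adv_loss.backward()     # discriminator learns views; extractor unlearns them
```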


Author(s):  
Zhongyang Li ◽  
Xiao Ding ◽  
Ting Liu

Recent advances, such as GPT, BERT, and RoBERTa, have shown the success of combining a pre-trained transformer language model with fine-tuning to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks to a target task. In particular, we propose utilizing three kinds of transfer tasks, namely natural language inference, sentiment classification, and next action prediction, to further train BERT on top of a pre-trained model. This enables the model to obtain a better initialization for the target task. We take story-ending prediction as the target task in our experiments. The final results of 96.0% and 95.0% accuracy on two versions of the Story Cloze Test dataset dramatically outperform previous state-of-the-art baseline methods. Several comparative experiments provide helpful suggestions on how to select transfer tasks to improve BERT. Furthermore, experiments on six English and three Chinese datasets show that TransBERT generalizes well to other tasks, languages, and pre-trained models.
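A schematic of the two-stage transfer described above, with a toy encoder standing in for BERT: stage 1 further trains the pre-trained encoder on an intermediate supervised task (e.g., NLI), and stage 2 fine-tunes the same encoder on the target task (story-ending prediction). All modules, data, and hyperparameters here are placeholders, not the authors' setup.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(768, 768), nn.ReLU())  # stand-in for pre-trained BERT
nli_head = nn.Linear(768, 3)         # entailment / neutral / contradiction
target_head = nn.Linear(768, 2)      # right vs. wrong story ending

def train_stage(head, x, y, steps=100):
    """Train the shared encoder plus one task head."""
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))
    for _ in range(steps):
        loss = nn.functional.cross_entropy(head(encoder(x)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 1: the transfer task warms up the encoder; stage 2: target fine-tuning
# starts from that better initialization.
train_stage(nli_head, torch.rand(64, 768), torch.randint(0, 3, (64,)))
train_stage(target_head, torch.rand(64, 768), torch.randint(0, 2, (64,)))
```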


Author(s):  
Liang Xie ◽  
Jialie Shen ◽  
Jungong Han ◽  
Lei Zhu ◽  
Ling Shao

Advanced hashing techniques are essential to facilitate effective large-scale online image organization and retrieval, where image contents may change frequently. Traditional multi-view hashing methods are developed based on batch learning, which incurs a very expensive updating cost. Meanwhile, existing online hashing methods mainly focus on single-view data and thus cannot achieve promising performance when searching real online images, which are multi-view data. Further, both types of hashing methods can only produce hash codes of fixed length, and consequently they have limited capability to comprehensively characterize streaming image data in the real world. In this paper, we propose dynamic multi-view hashing (DMVH), which can adaptively augment hash codes according to dynamic changes in the image data. Meanwhile, DMVH leverages online learning to generate hash codes, increasing the code length when the current code cannot represent new images effectively. Moreover, to gain a further improvement in overall performance, each view is assigned a weight that can be efficiently updated during the online learning process. To avoid frequent updates of the code length and view weights, an intelligent buffering scheme is also specifically designed to preserve significant data and maintain the effectiveness of DMVH. Experimental results on two real-world image datasets demonstrate the superior performance of DMVH over several state-of-the-art hashing methods.
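A toy sketch of the adaptive code-growth idea: keep one projection per hash bit, and when the current code cannot distinguish a new streaming batch well (here measured by a crude collision criterion), append extra bits. The growth test, threshold, and random projections are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))                 # 16 bits over 64-dim fused features

def codes(X, W):
    return np.sign(X @ W.T)                   # n x bits, entries in {-1, +1}

def needs_more_bits(X, W, thresh=0.95):
    """Crude criterion: too many code collisions among distinct items."""
    B = codes(X, W)
    unique_rows = np.unique(B, axis=0).shape[0]
    return unique_rows / len(B) < thresh

X_new = rng.normal(size=(200, 64))            # streaming batch of fused features
if needs_more_bits(X_new, W):
    W = np.vstack([W, rng.normal(size=(8, 64))])  # augment the code by 8 bits
```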


2020 ◽  
Vol 30 (01) ◽  
pp. 2050001
Author(s):  
Takumi Maruyama ◽  
Kazuhide Yamamoto

Inspired by the machine translation task, recent text simplification approaches regard the task as monolingual text-to-text generation, and neural machine translation models have significantly improved the performance of simplification tasks. Although such models require a large-scale parallel corpus, corpora for text simplification are very few in number and smaller in size than those for machine translation. Therefore, we have attempted to facilitate the training of simplification rewritings by pre-training on a large-scale monolingual corpus such as Wikipedia articles. In addition, we propose a translation language model to seamlessly conduct fine-tuning of text simplification from the pre-training of the language model. The experimental results show that the translation language model substantially outperforms a state-of-the-art model under a low-resource setting. In addition, a pre-trained translation language model with only 3,000 supervised examples can achieve performance comparable to that of the state-of-the-art model trained with 30,000 supervised examples.
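A rough sketch of a translation-language-model style objective of the kind the abstract alludes to: the complex sentence and its simplification are concatenated, a fraction of tokens is masked, and the model predicts them, so masked-LM pre-training can continue seamlessly into supervised fine-tuning on parallel pairs. The token ids, masking rate, and batching details are assumptions for illustration.

```python
import torch

MASK_ID, VOCAB = 0, 1000

def tlm_batch(src_ids, tgt_ids, mask_prob=0.15):
    """Concatenate source/target token ids and mask tokens for prediction."""
    pair = torch.cat([src_ids, tgt_ids], dim=1)       # (batch, src_len + tgt_len)
    mask = torch.rand_like(pair, dtype=torch.float) < mask_prob
    inputs = pair.masked_fill(mask, MASK_ID)          # masked-out positions
    labels = pair.masked_fill(~mask, -100)            # ignore unmasked positions
    return inputs, labels

src = torch.randint(1, VOCAB, (4, 12))                # complex sentence ids
tgt = torch.randint(1, VOCAB, (4, 10))                # simplified sentence ids
inputs, labels = tlm_batch(src, tgt)                  # feed to any masked LM
```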


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1038
Author(s):  
Shohel Sayeed ◽  
Pa Pa Min ◽  
Thian Song Ong

Background: Gait recognition is perceived as the most promising biometric approach for the coming decades, especially because of its efficient applicability in surveillance systems. Due to the recent growth in the use of gait biometrics across surveillance systems, the ability to rapidly search the stored data has become an emerging need. We therefore address the gait retrieval problem, which retrieves people with gaits similar to a query subject from a large-scale dataset. Methods: This paper presents the deep gait retrieval hashing (DGRH) model to address the gait retrieval problem for large-scale datasets. Our proposed method is based on a supervised hashing method with a deep convolutional network. We use the ability of the convolutional neural network (CNN) to capture semantic gait features for feature representation and learn compact hash codes with a compatible hash function. Our DGRH model thus combines gait feature learning with binary hash codes. In addition, the learning loss is designed with a classification loss function that learns to preserve similarity and a quantization loss function that controls the quality of the hash codes. Results: The proposed method was evaluated on the CASIA-B, OUISIR-LP, and OUISIR-MVLP benchmark datasets and achieved promising results on gait retrieval tasks. Conclusions: The end-to-end deep supervised hashing model is able to learn discriminative gait features and is efficient in terms of storage memory and speed for gait retrieval.
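A compact sketch of the two-part objective the Methods section describes: a network maps a gait representation to a relaxed code, a classification loss preserves identity similarity, and a quantization loss pushes code entries toward plus or minus 1. The backbone, subject count, and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU())
hash_layer = nn.Sequential(nn.Linear(256, 48), nn.Tanh())   # 48-bit relaxed code
classifier = nn.Linear(48, 124)                             # e.g. 124 subjects

x = torch.rand(8, 1, 64, 64)             # toy gait energy images
y = torch.randint(0, 124, (8,))          # subject identity labels

h = hash_layer(backbone(x))
cls_loss = nn.functional.cross_entropy(classifier(h), y)    # similarity-preserving
quant_loss = (h.abs() - 1).pow(2).mean()                    # drive entries to +-1
loss = cls_loss + 0.1 * quant_loss
loss.backward()
```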

