Graph Convolutional Network Hashing for Cross-Modal Retrieval

Deep network based cross-modal retrieval has recently made significant progress. However, bridging modality gap to further enhance the retrieval accuracy still remains a crucial bottleneck. In this paper, we propose a Graph Convolutional Hashing (GCH) approach, which learns modality-unified binary codes via an affinity graph. An end-to-end deep architecture is constructed with three main components: a semantic encoder module, two feature encoding networks, and a graph convolutional network (GCN). We design a semantic encoder as a teacher module to guide the feature encoding process, a.k.a. student module, for semantic information exploiting. Furthermore, GCN is utilized to explore the inherent similarity structure among data points, which will help to generate discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate that the proposed GCH outperforms the state-of-the-art methods.

Download Full-text

R²MRF: Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6884 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12063-12070

Author(s):

Chang Tang ◽

Xinwang Liu ◽

Xinzhong Zhu ◽

En Zhu ◽

Kun Sun ◽

...

Keyword(s):

Semantic Information ◽

Ground Truth ◽

Input Image ◽

Convolutional Network ◽

Multi Scale ◽

Blur Detection ◽

Defocus Blur ◽

Potential Applications ◽

Benchmark Datasets ◽

Background Clutter

Defocus blur detection aims to separate the in-focus and out-of-focus regions in an image. Although attracting more and more attention due to its remarkable potential applications, there are still several challenges for accurate defocus blur detection, such as the interference of background clutter, sensitivity to scales and missing boundary details of defocus blur regions. In order to address these issues, we propose a deep neural network which Recurrently Refines Multi-scale Residual Features (R2MRF) for defocus blur detection. We firstly extract multi-scale deep features by utilizing a fully convolutional network. For each layer, we design a novel recurrent residual refinement branch embedded with multiple residual refinement modules (RRMs) to more accurately detect blur regions from the input image. Considering that the features from bottom layers are able to capture rich low-level features for details preservation while the features from top layers are capable of characterizing the semantic information for locating blur regions, we aggregate the deep features from different layers to learn the residual between the intermediate prediction and the ground truth for each recurrent step in each residual refinement branch. Since the defocus degree is sensitive to image scales, we finally fuse the side output of each branch to obtain the final blur detection map. We evaluate the proposed network on two commonly used defocus blur detection benchmark datasets by comparing it with other 11 state-of-the-art methods. Extensive experimental results with ablation studies demonstrate that R2MRF consistently and significantly outperforms the competitors in terms of both efficiency and accuracy.

Download Full-text

CIMON: Towards High-quality Hash Codes

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/125 ◽

2021 ◽

Author(s):

Xiao Luo ◽

Daqing Wu ◽

Zeyu Ma ◽

Chong Chen ◽

Minghua Deng ◽

...

Keyword(s):

Semantic Similarity ◽

Nearest Neighbor ◽

Feature Space ◽

Nearest Neighbor Search ◽

Point Pair ◽

Neighbor Search ◽

Wide Range ◽

Benchmark Datasets ◽

Similarity Structure ◽

Hash Codes

Recently, hashing is widely used in approximate nearest neighbor search for its storage and computational efficiency. Most of the unsupervised hashing methods learn to map images into semantic similarity-preserving hash codes by constructing local semantic similarity structure from the pre-trained model as the guiding information, i.e., treating each point pair similar if their distance is small in feature space. However, due to the inefficient representation ability of the pre-trained model, many false positives and negatives in local semantic similarity will be introduced and lead to error propagation during the hash code learning. Moreover, few of the methods consider the robustness of models, which will cause instability of hash codes to disturbance. In this paper, we propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes. Extensive experiments on several benchmark datasets show that the proposed method outperforms a wide range of state-of-the-art methods in both retrieval performance and robustness.

Download Full-text

Slanted Stixels: A Way to Represent Steep Streets

International Journal of Computer Vision ◽

10.1007/s11263-019-01226-9 ◽

2019 ◽

Vol 127 (11-12) ◽

pp. 1643-1658

Author(s):

Daniel Hernandez-Juarez ◽

Lukas Schneider ◽

Pau Cebrian ◽

Antonio Espinosa ◽

David Vazquez ◽

...

Keyword(s):

Semantic Information ◽

Approximation Scheme ◽

Geometric Accuracy ◽

Disparity Map ◽

Convolutional Network ◽

Depth Cues ◽

Segmentation Strategy ◽

Local Extrema ◽

Scene Representation ◽

Benchmark Datasets

Abstract This work presents and evaluates a novel compact scene representation based on Stixels that infers geometric and semantic information. Our approach overcomes the previous rather restrictive geometric assumptions for Stixels by introducing a novel depth model to account for non-flat roads and slanted objects. Both semantic and depth cues are used jointly to infer the scene representation in a sound global energy minimization formulation. Furthermore, a novel approximation scheme is introduced in order to significantly reduce the computational complexity of the Stixel algorithm, and then achieve real-time computation capabilities. The idea is to first perform an over-segmentation of the image, discarding the unlikely Stixel cuts, and apply the algorithm only on the remaining Stixel cuts. This work presents a novel over-segmentation strategy based on a fully convolutional network, which outperforms an approach based on using local extrema of the disparity map. We evaluate the proposed methods in terms of semantic and geometric accuracy as well as run-time on four publicly available benchmark datasets. Our approach maintains accuracy on flat road scene datasets while improving substantially on a novel non-flat road dataset.

Download Full-text

Discrete Semantics-Guided Asymmetric Hashing for Large-Scale Multimedia Retrieval

Applied Sciences ◽

10.3390/app11188769 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8769

Author(s):

Jun Long ◽

Longzhi Sun ◽

Liujie Hua ◽

Zhan Yang

Keyword(s):

Code Generation ◽

Hash Function ◽

Large Scale ◽

Semantic Information ◽

Multimedia Data ◽

Multimedia Retrieval ◽

Coordinate Descent Algorithm ◽

Hash Code ◽

Benchmark Datasets ◽

Hash Codes

Cross-modal hashing technology is a key technology for real-time retrieval of large-scale multimedia data in real-world applications. Although the existing cross-modal hashing methods have achieved impressive accomplishment, there are still some limitations: (1) some cross-modal hashing methods do not make full consider the rich semantic information and noise information in labels, resulting in a large semantic gap, and (2) some cross-modal hashing methods adopt the relaxation-based or discrete cyclic coordinate descent algorithm to solve the discrete constraint problem, resulting in a large quantization error or time consumption. Therefore, in order to solve these limitations, in this paper, we propose a novel method, named Discrete Semantics-Guided Asymmetric Hashing (DSAH). Specifically, our proposed DSAH leverages both label information and similarity matrix to enhance the semantic information of the learned hash codes, and the ℓ2,1 norm is used to increase the sparsity of matrix to solve the problem of the inevitable noise and subjective factors in labels. Meanwhile, an asymmetric hash learning scheme is proposed to efficiently perform hash learning. In addition, a discrete optimization algorithm is proposed to fast solve the hash code directly and discretely. During the optimization process, the hash code learning and the hash function learning interact, i.e., the learned hash codes can guide the learning process of the hash function and the hash function can also guide the hash code generation simultaneously. Extensive experiments performed on two benchmark datasets highlight the superiority of DSAH over several state-of-the-art methods.

Download Full-text

Self-Attention and Adversary Guided Hashing Network for Cross-Modal Retrieval

10.20944/preprints202009.0416.v1 ◽

2020 ◽

Author(s):

Shubai Chen ◽

Li Wang ◽

Song Wu

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Local Minima ◽

Adversarial Learning ◽

High Ranking ◽

Benchmark Datasets ◽

Semantic Relevance ◽

Triplet Loss ◽

Query Efficiency ◽

Hash Codes

Recently deep cross-modal hashing networks have received increasing interests due to its superior query efficiency and low storage cost. However, most of existing methods concentrate less on hash representations learning part, which means the semantic information of data cannot be fully used. Furthermore, they may neglect the high-ranking relevance and consistency of hash codes. To solve these problems, we propose a Self-Attention and Adversary Guided Hashing Network (SAAGHN). Specifically, it employs self-attention mechanism in hash representations learning part to extract rich semantic relevance information. Meanwhile, in order to keep invariability of hash codes, adversarial learning is adopted in the hash codes learning part. In addition, to generate higher-ranking hash codes and avoid local minima early, a new batch semi-hard cosine triplet loss and a cosine quantization loss are proposed. Extensive experiments on two benchmark datasets have shown that SAAGHN outperforms other baselines and achieves the state-of-the-art performance.

Download Full-text

An Anatomy of a Hybrid Color Descriptor with a Neural Network Model to Enhance the Retrieval Accuracy of an Image Retrieval System

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191122113801 ◽

2019 ◽

Vol 13 ◽

Author(s):

Shikha Bhardwaj ◽

Gitanjali Pandove ◽

Pawan Kumar Dahiya

Keyword(s):

Neural Network ◽

Deep Learning ◽

Image Retrieval ◽

Hybrid System ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Retrieval Accuracy ◽

Color Descriptor ◽

Benchmark Datasets ◽

Color Moment

Background: In order to retrieve a particular image from vast repository of images, an efficient system is required and such an eminent system is well-known by the name Content-based image retrieval (CBIR) system. Color is indeed an important attribute of an image and the proposed system consist of a hybrid color descriptor which is used for color feature extraction. Deep learning, has gained a prominent importance in the current era. So, the performance of this fusion based color descriptor is also analyzed in the presence of Deep learning classifiers. Method: This paper describes a comparative experimental analysis on various color descriptors and the best two are chosen to form an efficient color based hybrid system denoted as combined color moment-color autocorrelogram (Co-CMCAC). Then, to increase the retrieval accuracy of the hybrid system, a Cascade forward back propagation neural network (CFBPNN) is used. The classification accuracy obtained by using CFBPNN is also compared to Patternnet neural network. Results: The results of the hybrid color descriptor depict that the proposed system has superior results of the order of 95.4%, 88.2%, 84.4% and 96.05% on Corel-1K, Corel-5K, Corel-10K and Oxford flower benchmark datasets respectively as compared to many state-of-the-art related techniques. Conclusion: This paper depict an experimental and analytical analysis on different color feature descriptors namely, Color moment (CM), Color auto-correlogram (CAC), Color histogram (CH), Color coherence vector (CCV) and Dominant color descriptor (DCD). The proposed hybrid color descriptor (Co-CMCAC) is utilized for the withdrawal of color features with Cascade forward back propagation neural network (CFBPNN) is used as a classifier on four benchmark datasets namely Corel-1K, Corel-5K and Corel-10K and Oxford flower.

Download Full-text

SGAN4AbSum: A Semantic-Enhanced Generative Adversarial Network for Abstractive Text Summarization

10.21203/rs.3.rs-648146/v1 ◽

2021 ◽

Author(s):

Tham Vo

Keyword(s):

Ground Truth ◽

Text Summarization ◽

Generative Adversarial Network ◽

Convolutional Network ◽

Training Strategy ◽

Adversarial Network ◽

Deep Recurrent Neural Network ◽

Benchmark Datasets ◽

Latent Representations ◽

Abstractive Summarization

Abstract In abstractive summarization task, most of proposed models adopt the deep recurrent neural network (RNN)-based encoder-decoder architecture to learn and generate meaningful summary for a given input document. However, most of recent RNN-based models always suffer the challenges related to the involvement of much capturing high-frequency/reparative phrases in long documents during the training process which leads to the outcome of trivial and generic summaries are generated. Moreover, the lack of thorough analysis on the sequential and long-range dependency relationships between words within different contexts while learning the textual representation also make the generated summaries unnatural and incoherent. To deal with these challenges, in this paper we proposed a novel semantic-enhanced generative adversarial network (GAN)-based approach for abstractive text summarization task, called as: SGAN4AbSum. We use an adversarial training strategy for our text summarization model in which train the generator and discriminator to simultaneously handle the summary generation and distinguishing the generated summary with the ground-truth one. The input of generator is the jointed rich-semantic and global structural latent representations of training documents which are achieved by applying a combined BERT and graph convolutional network (GCN) textual embedding mechanism. Extensive experiments in benchmark datasets demonstrate the effectiveness of our proposed SGAN4AbSum which achieve the competitive ROUGE-based scores in comparing with state-of-the-art abstractive text summarization baselines.

Download Full-text

Multiple Saliency and Channel Sensitivity Network for Aggregated Convolutional Feature

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019013 ◽

2019 ◽

Vol 33 ◽

pp. 9013-9020

Author(s):

Xuanlu Xiang ◽

Zhipeng Wang ◽

Zhicheng Zhao ◽

Fei Su

Keyword(s):

State Of The Art ◽

Image Representation ◽

Training Data ◽

Gram Matrix ◽

Redundant Information ◽

Deep Architecture ◽

Benchmark Datasets ◽

Supervised Methods ◽

Ranking Loss ◽

Effective Channel

In this paper, aiming at two key problems of instance-level image retrieval, i.e., the distinctiveness of image representation and the generalization ability of the model, we propose a novel deep architecture - Multiple Saliency and Channel Sensitivity Network(MSCNet). Specifically, to obtain distinctive global descriptors, an attention-based multiple saliency learning is first presented to highlight important details of the image, and then a simple but effective channel sensitivity module based on Gram matrix is designed to boost the channel discrimination and suppress redundant information. Additionally, in contrast to most existing feature aggregation methods, employing pre-trained deep networks, MSCNet can be trained in two modes: the first one is an unsupervised manner with an instance loss, and another is a supervised manner, which combines classification and ranking loss and only relies on very limited training data. Experimental results on several public benchmark datasets, i.e., Oxford buildings, Paris buildings and Holidays, indicate that the proposed MSCNet outperforms the state-of-the-art unsupervised and supervised methods.

Download Full-text

Lasagne: A Multi-Layer Graph Convolutional Network Framework via Node-aware Deep Architecture

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2021.3103984 ◽

2021 ◽

pp. 1-1

Author(s):

Xupeng Miao ◽

Wentao Zhang ◽

Yingxia Shao ◽

Bin Cui ◽

Lei Chen ◽

...

Keyword(s):

Convolutional Network ◽

Deep Architecture

Download Full-text

Residual Invertible Spatio-Temporal Network for Video Super-Resolution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015981 ◽

2019 ◽

Vol 33 ◽

pp. 5981-5988 ◽

Cited By ~ 12

Author(s):

Xiaobin Zhu ◽

Zhuangzi Li ◽

Xiao-Yu Zhang ◽

Changsheng Li ◽

Yaqi Liu ◽

...

Keyword(s):

Spatial Information ◽

Super Resolution ◽

Temporal Consistency ◽

Temporal Network ◽

Convolutional Network ◽

Feature Representations ◽

Video Frames ◽

Temporal Features ◽

Benchmark Datasets ◽

Spatio Temporal

Video super-resolution is a challenging task, which has attracted great attention in research and industry communities. In this paper, we propose a novel end-to-end architecture, called Residual Invertible Spatio-Temporal Network (RISTN) for video super-resolution. The RISTN can sufficiently exploit the spatial information from low-resolution to high-resolution, and effectively models the temporal consistency from consecutive video frames. Compared with existing recurrent convolutional network based approaches, RISTN is much deeper but more efficient. It consists of three major components: In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on the sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-ofthe-art methods.

Download Full-text