Similarity Preserving Deep Asymmetric Quantization for Image Retrieval

Author(s):  
Junjie Chen ◽  
William K. Cheung

Quantization has been widely adopted for large-scale multimedia retrieval because it codes high-dimensional data effectively. Deep quantization models have been shown to achieve state-of-the-art retrieval accuracy. However, training deep models on a large-scale database is highly time-consuming because a large number of parameters is involved. Existing deep quantization methods often sample only a subset of the database for training, which may yield unsatisfactory retrieval performance because a large portion of the label information is discarded. To alleviate this problem, we propose a novel model called Similarity Preserving Deep Asymmetric Quantization (SPDAQ), which can directly and efficiently learn compact binary codes and quantization codebooks for all items in the database. To do so, SPDAQ uses an image subset together with the label information of all database items, so that the image-subset items and the database items are mapped to two different but correlated distributions in which label similarity is well preserved. An efficient optimization algorithm is proposed for learning. Extensive experiments on four widely used benchmark datasets demonstrate the superiority of the proposed SPDAQ model.
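The abstract does not spell out SPDAQ's objective, so the sketch below only illustrates the quantization idea it builds on: generic product quantization (PQ) with asymmetric distance computation (ADC), in which database vectors are stored as short codeword indices while the query stays real-valued. This is not SPDAQ. The hand-written codebooks, toy vectors and function names are illustrative assumptions (real systems learn the codebooks), and "asymmetric" in ADC refers to query versus database representation, which is not necessarily the same sense as in SPDAQ.

```python
# Toy product quantization (PQ) with asymmetric distance computation (ADC).
# Codebooks are fixed by hand here (an assumption); in practice they are learned.

def split(vec, m):
    """Split a vector into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest codeword in that subspace."""
    codes = []
    for sub, cb in zip(split(vec, len(codebooks)), codebooks):
        codes.append(min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k])))
    return codes


def adc_distances(query, codes, codebooks):
    """Approximate squared distances from a real-valued query to encoded database items."""
    subs = split(query, len(codebooks))
    # One lookup table per subspace: query sub-vector vs. every codeword.
    tables = [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]
    # Each database distance is just a sum of table lookups.
    return [sum(tables[j][idx] for j, idx in enumerate(code)) for code in codes]


# Two 2-D subspaces, each with a 2-word codebook (1 bit per subspace).
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
database = [[0.1, 0.0, 0.9, 1.0], [1.0, 0.9, 0.0, 0.1]]
db_codes = [encode(v, codebooks) for v in database]
dists = adc_distances([0.0, 0.1, 1.0, 1.0], db_codes, codebooks)
print(db_codes, dists)
```

Because each database distance is a sum of table lookups, the per-item cost depends on the number of subspaces rather than on the original dimensionality.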

Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


2023 ◽  
Vol 55 (1) ◽  
pp. 1-39
Author(s):  
Thanh Tuan Nguyen ◽  
Thanh Phuong Nguyen

Representing dynamic textures (DTs) plays an important role in many real-world applications in computer vision. Due to the turbulent and non-directional motions of DTs, along with the negative impact of various factors (e.g., environmental changes, noise, and illumination), efficiently analyzing DTs remains a considerable challenge for state-of-the-art approaches. Over the past 20 years, many different techniques have been introduced to address these well-known issues and improve performance. These methods have made valuable contributions, but the problems have been only partially solved, particularly the recognition of DTs on large-scale datasets. In this article, we present a comprehensive taxonomy of DT representation to give a thorough overview of existing methods along with overall evaluations of their performance. Accordingly, we arrange the methods into six canonical categories. For each category, we briefly present its principal methodological stream and related variants. The effectiveness of state-of-the-art methods is then investigated and thoroughly discussed through quantitative and qualitative evaluations of DT classification on benchmark datasets. Finally, we point out several potential applications and the remaining challenges that should be addressed in future research. Compared with the two existing shallow DT surveys (the first, from 2005, is out of date, while the newer one, published in 2016, gives an inadequate overview), we believe that our comprehensive taxonomy not only provides a better view of DT representation for target readers but also stimulates future research activities.


Author(s):  
Chao Li ◽  
Cheng Deng ◽  
Lei Wang ◽  
De Xie ◽  
Xianglong Liu

In recent years, hashing has attracted increasing attention in large-scale cross-modal retrieval owing to its low storage cost and high query efficiency. Benefiting from deep learning, the cross-modal retrieval community has achieved increasingly compelling results. However, existing deep cross-modal hashing methods either rely on large amounts of labeled data or cannot learn an accurate correlation between different modalities. In this paper, we propose Unsupervised coupled Cycle generative adversarial Hashing networks (UCH) for cross-modal retrieval, in which an outer-cycle network learns a powerful common representation and an inner-cycle network generates reliable hash codes. Specifically, UCH seamlessly couples these two networks through a generative adversarial mechanism, so that they can be optimized simultaneously to learn representations and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms state-of-the-art unsupervised cross-modal hashing methods.


Author(s):  
Ning Li ◽  
Chao Li ◽  
Cheng Deng ◽  
Xianglong Liu ◽  
Xinbo Gao

Hashing has been widely deployed for large-scale image retrieval due to its low storage cost and fast query speed. Almost all deep hashing methods fail to sufficiently discover semantic correlations from label information, which makes the learned hash codes less discriminative. In this paper, we propose a novel Deep Joint Semantic-Embedding Hashing (DSEH) approach that consists of LabNet and ImgNet. Specifically, LabNet is used to capture rich semantic correlations between sample pairs and to supervise ImgNet at both the semantic level and the hash-code level, which helps make the generated hash codes more discriminative and similarity-preserving. Extensive experiments on three benchmark datasets show that the proposed model outperforms state-of-the-art methods.
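As a hedged illustration of two ingredients common to supervised deep hashing (not DSEH's LabNet and ImgNet themselves), the sketch below builds a pairwise semantic similarity matrix from multi-hot labels, using the frequently adopted "share at least one label" rule (an assumption here), and ranks a small database by Hamming distance to a query code. The toy labels, codes and function names are hypothetical.

```python
def label_similarity(labels):
    """Pairwise similarity from multi-hot labels: S[i][j] = 1 if i and j share a label, else 0."""
    return [[1 if any(x and y for x, y in zip(a, b)) else 0 for b in labels] for a in labels]


def hamming(a, b):
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query (stable on ties)."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))


labels = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
S = label_similarity(labels)

query = [1, 1, -1, -1]
db = [[1, 1, -1, 1], [-1, -1, 1, 1], [1, 1, -1, -1]]
K = len(query)
# For {-1, +1} codes of length K: hamming = (K - <a, b>) / 2, so raising the inner
# product of a similar pair is the same as lowering its Hamming distance.
for code in db:
    assert hamming(query, code) == (K - inner_product(query, code)) // 2
print(S, rank_by_hamming(query, db))
```

The identity checked in the loop is why many hashing losses are written in terms of code inner products rather than Hamming distances directly.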


2020 ◽  
Author(s):  
Xiongnan Jin ◽  
Yooyoung Lee ◽  
Jonathan Fiscus ◽  
Haiying Guan ◽  
Amy N. Yates ◽  
...  

Author(s):  
Jingkuan Song ◽  
Xiaosu Zhu ◽  
Lianli Gao ◽  
Xin-Shun Xu ◽  
Wu Liu ◽  
...  

Quantization has been an effective technique for approximate nearest neighbour (ANN) search due to its high accuracy and fast search speed. To meet the requirements of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode a dataset at different code lengths, existing methods need to train several models, each of which can only produce one specific code length. This incurs a considerable training time cost and greatly reduces the flexibility of deploying quantization methods in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture that generates sequential binary codes. Once the model is trained, a sequence of binary codes can be generated, and the code length can easily be controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor are designed as the learnable weights of the deep recurrent quantization block, and the whole framework can be trained end-to-end. To the best of our knowledge, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on benchmark datasets show that our model achieves performance comparable to or better than the state of the art for image retrieval, while requiring significantly fewer parameters and less training time. Our code is available online: https://github.com/cfm-uestc/DRQ.
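The following is a simplified sketch of the idea that a single shared codebook, applied recurrently, yields a code sequence whose length is set by the number of iterations. It is not DRQ: there is no network and nothing is trained, the codebook is hand-written, and rescaling the codebook by a hypothetical factor alpha**t at step t is only an assumed stand-in for the learnable scalar factor mentioned in the abstract. Each step greedily quantizes the residual left by the previous steps.

```python
import math


def nearest(residual, codebook, scale):
    """Index of the scaled codeword closest (squared L2) to the current residual."""
    def err(k):
        return sum((r - scale * c) ** 2 for r, c in zip(residual, codebook[k]))
    return min(range(len(codebook)), key=err)


def recurrent_encode(x, codebook, alpha, steps):
    """Greedy residual quantization reusing ONE codebook, rescaled by alpha**t at step t."""
    residual = list(x)
    codes = []
    for t in range(steps):
        scale = alpha ** t
        k = nearest(residual, codebook, scale)
        codes.append(k)
        residual = [r - scale * c for r, c in zip(residual, codebook[k])]
    return codes


def decode(codes, codebook, alpha):
    """Reconstruct from any prefix of the code sequence."""
    out = [0.0] * len(codebook[0])
    for t, k in enumerate(codes):
        out = [o + alpha ** t * c for o, c in zip(out, codebook[k])]
    return out


codebook = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # hypothetical, normally learned
bits_per_step = math.ceil(math.log2(len(codebook)))            # 2 bits per iteration
x = [0.75, -0.5]
codes = recurrent_encode(x, codebook, alpha=0.5, steps=3)
for n in range(1, len(codes) + 1):
    approx = decode(codes[:n], codebook, alpha=0.5)
    sq_err = sum((a - b) ** 2 for a, b in zip(approx, x))
    print(f"{n * bits_per_step} bits -> squared error {sq_err}")
```

Decoding any prefix of the sequence gives a coarser reconstruction, so one encoding pass serves several code lengths (2, 4 and 6 bits here); on this toy input the squared error shrinks from 0.3125 to 0.0625 to 0.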


Author(s):  
Shiyang Yan ◽  
Jun Xu ◽  
Yuai Liu ◽  
Lin Xu

Person re-identification (re-ID) aims to recognize a person of interest across different cameras despite notable appearance variance. Existing work has focused on the capability and robustness of visual representations. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) that improves person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to perform domain transfer and generate language descriptions. The proposed HorNet then jointly learns visual and language representations from both the images and the captions, thereby enhancing person re-ID performance. Extensive experiments on several benchmark datasets with or without image captions (CUHK03, Market-1501, and Duke-MTMC) demonstrate the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.


2021 ◽  
Author(s):  
Yiu-ming Cheung ◽  
Zhikai Hu

Unsupervised cross-modal retrieval has received increasing attention recently because labeling the explosively growing volume of multimedia data is extremely difficult. Its core challenge is measuring the similarity between multi-modal data without label information. In previous work, various distance metrics have been used to measure similarities and predict whether samples belong to the same class. However, these predictions are not always correct, and even a few wrong predictions can undermine the final retrieval performance. To address this problem, we categorize predictions as solid or soft according to their confidence, and we further categorize samples as solid or soft according to these predictions. We propose that these two kinds of predictions and samples should be treated differently. Moreover, we find that the absolute value of a similarity represents not only the similarity itself but also the confidence of the prediction. Thus, we first design an elegant dot product fusion strategy to obtain effective inter-modal similarities. Using these similarities, we then propose a generalized and flexible weighted loss function in which larger weights are assigned to solid samples to improve retrieval performance, and smaller weights are assigned to soft samples to reduce the disturbance caused by wrong predictions. Although less information is used, empirical studies show that the proposed approach achieves state-of-the-art retrieval performance.
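The abstract gives neither the fusion rule nor the exact weighting function, so the sketch below only illustrates the stated principle that the absolute value of a similarity can serve as a confidence score. It assumes inter-modal similarities in [-1, 1] are already available, splits predictions into solid and soft ones with a hypothetical threshold tau, and weights each pair's squared error by |s|**gamma (a hypothetical weighting; gamma = 0 recovers the unweighted loss). All names and values are illustrative.

```python
def split_solid_soft(similarities, tau):
    """Indices of 'solid' (|s| >= tau) and 'soft' (|s| < tau) predictions; tau is hypothetical."""
    solid = [i for i, s in enumerate(similarities) if abs(s) >= tau]
    soft = [i for i, s in enumerate(similarities) if abs(s) < tau]
    return solid, soft


def weighted_similarity_loss(target, predicted, gamma=1.0):
    """Mean squared error where each pair is weighted by |target| ** gamma.

    |target| acts as a confidence score: confident pairs dominate the loss,
    uncertain pairs (target near 0) contribute little.
    """
    total = 0.0
    for s, p in zip(target, predicted):
        total += abs(s) ** gamma * (p - s) ** 2
    return total / len(target)


target = [0.9, -0.8, 0.1]      # e.g. inter-modal similarities in [-1, 1]
predicted = [0.5, -0.8, -0.9]  # e.g. similarities computed from hash codes
print(split_solid_soft(target, tau=0.5))
print(weighted_similarity_loss(target, predicted),
      weighted_similarity_loss(target, predicted, gamma=0.0))
```

On this toy data the soft pair (target 0.1) has by far the largest error but contributes little to the weighted loss, so the weighted value is much smaller than the unweighted one.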


Author(s):  
Rong-Cheng Tu ◽  
Xian-Ling Mao ◽  
Wei Wei

Most unsupervised hashing methods map images into semantic similarity-preserving hash codes by constructing a local semantic similarity structure as guiding information, i.e., treating each point as similar to its k nearest neighbours. However, some of an image's k nearest neighbours may be dissimilar to it; these noisy datapoints degrade retrieval performance. To tackle this problem, we propose a novel deep unsupervised hashing method, called MLS3RDUH, which can reduce noisy datapoints and thereby further enhance retrieval performance. Specifically, the proposed method first defines a novel similarity matrix that uses the intrinsic manifold structure in feature space and the cosine similarity of datapoints to reconstruct the local semantic similarity structure. A novel log-cosh hashing loss function is then used to optimize the hashing network to generate compact hash codes, incorporating the defined similarity as guiding information. Extensive experiments on three public datasets show that the proposed method outperforms state-of-the-art baselines.
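A minimal sketch of two pieces named in the abstract, under stated assumptions: a k-nearest-neighbour similarity matrix built from cosine similarity (the manifold-structure component of MLS3RDUH is omitted), and a log-cosh loss between normalized code inner products and that matrix. The abstract does not give the exact form of the paper's loss, so this pairing of terms and all names are hypothetical. log(cosh(x)) behaves like x**2 / 2 for small x and like |x| - log 2 for large x, which limits the influence of large residuals.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_similarity(features, k):
    """S[i][j] = cosine(i, j) if j is one of i's k nearest neighbours (by cosine), else 0.

    Note: the result is not necessarily symmetric.
    """
    n = len(features)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(features[i], features[j]), reverse=True)
        for j in others[:k]:
            S[i][j] = cosine(features[i], features[j])
    return S


def log_cosh(x):
    """Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def log_cosh_hash_loss(codes, S):
    """Mean log-cosh gap between normalized code inner products and target similarities."""
    n, K = len(codes), len(codes[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                ip = sum(x * y for x, y in zip(codes[i], codes[j])) / K  # in [-1, 1]
                total += log_cosh(ip - S[i][j])
    return total / (n * (n - 1))


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
S = knn_similarity(features, k=1)
codes = [[1, 1, -1, 1], [1, 1, -1, 1], [-1, 1, 1, -1]]
print(log_cosh_hash_loss(codes, S))
```

The stable form of log_cosh avoids overflow for large arguments, where evaluating math.cosh directly would fail.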


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3780 ◽  
Author(s):  
Mustansar Fiaz ◽  
Arif Mahmood ◽  
Ki Yeol Baek ◽  
Sehar Shahzad Farooq ◽  
Soon Ki Jung

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise into the training data as regularization to improve the generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker, which exhibits improved generalization compared with current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate overfitting. We propose fusing features from noisy and clean input channels, which improves target localization. Channel attention integrated into our framework helps find more useful target features, resulting in further performance improvement. Our proposed IRCA-Siam enhances the tracker's target/background discrimination and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017, demonstrates the superior performance of the proposed IRCA-Siam tracker compared with 30 existing state-of-the-art trackers.
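The abstract does not specify IRCA-Siam's noise model, fusion scheme or attention module, so the sketch below uses two common stand-ins: additive zero-mean Gaussian noise for input regularization, and squeeze-and-excitation (SE) style channel attention (global average pooling, a bottleneck layer with ReLU, then a sigmoid gate per channel). The sigma value, weight matrices and function names are hypothetical; in a real tracker the weights would be learned and the inputs would be image patches and CNN feature maps.

```python
import math
import random


def add_gaussian_noise(values, sigma, rng):
    """Additive zero-mean Gaussian noise, used as input-level regularization."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel gating.

    feature_maps: C channels, each a flat list of activations.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix (normally learned).
    Returns (reweighted feature maps, per-channel gates in (0, 1)).
    """
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]                          # global average pool
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]  # FC + ReLU
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]      # FC + sigmoid
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)], gates


rng = random.Random(0)
clean = [0.0] * 1000
noisy = add_gaussian_noise(clean, sigma=0.1, rng=rng)
features = [[1.0, 1.0], [0.0, 0.0]]
out, gates = channel_attention(features, w1=[[1.0, 0.0]], w2=[[2.0], [-2.0]])
print(sum(noisy) / len(noisy), gates)
```

In this toy setup the strongly activated first channel receives a gate above 0.5 and the inactive second channel a gate below 0.5, which is the reweighting behaviour channel attention is meant to provide.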

