Deep Recurrent Quantization for Generating Sequential Binary Codes

Author(s):  
Jingkuan Song ◽  
Xiaosu Zhu ◽  
Lianli Gao ◽  
Xin-Shun Xu ◽  
Wu Liu ◽  
...  

Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirements of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode the dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost and largely reduces the flexibility of deploying quantization methods in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To this end, once the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor are designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance than the state-of-the-art for image retrieval, while requiring significantly fewer parameters and less training time. Our code is published online: https://github.com/cfm-uestc/DRQ.
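As a rough illustration of the recurrent quantization idea described above (not the authors' implementation), the following sketch quantizes the residual at each iteration against a single shared codebook scaled by a per-iteration factor; truncating the index sequence yields shorter codes. The codebook, scale values, and dimensions are placeholder assumptions.

```python
# Minimal sketch of recurrent residual quantization with a shared codebook.
# All values below (codebook, scales, sizes) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, K, T = 64, 256, 4                      # feature dim, codebook size, iterations
codebook = rng.standard_normal((K, D))    # one codebook shared by every iteration
scales = [1.0, 0.5, 0.25, 0.125]          # per-iteration scalar factors (assumed)

def encode(x, num_iters=T):
    """Return a sequence of codeword indices; truncate it for shorter codes."""
    residual = x.copy()
    indices = []
    for t in range(num_iters):
        scaled = scales[t] * codebook
        idx = int(np.argmin(np.linalg.norm(scaled - residual, axis=1)))
        indices.append(idx)               # each index contributes log2(K) = 8 bits
        residual = residual - scaled[idx]
    return indices

def decode(indices):
    return sum(scales[t] * codebook[i] for t, i in enumerate(indices))

x = rng.standard_normal(D)
codes = encode(x)
print(codes, np.linalg.norm(x - decode(codes)))   # indices and reconstruction error
```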

2021 ◽  
Vol 2 (2) ◽  
pp. 1-18
Author(s):  
Hongchao Gao ◽  
Yujia Li ◽  
Jiao Dai ◽  
Xi Wang ◽  
Jizhong Han ◽  
...  

Recognizing irregular text from natural scene images is challenging due to the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks regard this task as a text sequence labeling problem, and most capture the sequence from only a single-granularity visual representation, which to some extent limits recognition performance. In this article, we propose a hierarchical attention network to capture multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, each containing a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. Extensive experiments show that our proposed network achieves state-of-the-art performance on several benchmark datasets, including IIIT-5K, SVT, CUTE, SVT-Perspective, and ICDAR, with a shorter training time.
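For readers unfamiliar with attention-based decoding of visual features, the toy step below shows the general pattern such decoder modules follow: attend over flattened local features, update a recurrent state, and emit character scores. The layer sizes, names, and single granularity shown here are assumptions for illustration, not the paper's LVRM/DM design.

```python
# Illustrative single decoding step: attention over local visual features.
# Sizes and structure are assumed; this is not the authors' architecture.
import torch
import torch.nn as nn

class AttnDecoderStep(nn.Module):
    def __init__(self, feat_dim=256, hidden=256, num_classes=37):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden, 1)   # additive attention score
        self.rnn = nn.GRUCell(feat_dim, hidden)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, feats, h):
        # feats: (B, N, feat_dim) flattened local features; h: (B, hidden) state
        h_exp = h.unsqueeze(1).expand(-1, feats.size(1), -1)
        alpha = torch.softmax(self.score(torch.cat([feats, h_exp], -1)).squeeze(-1), dim=1)
        context = (alpha.unsqueeze(-1) * feats).sum(1)  # attended local representation
        h = self.rnn(context, h)
        return self.cls(h), h                           # character scores, new state

feats = torch.randn(2, 48, 256)   # e.g. a 6x8 feature map, flattened
logits, h = AttnDecoderStep()(feats, torch.zeros(2, 256))
print(logits.shape)               # (2, 37)
```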


Author(s):  
Chandra Kusuma Dewa ◽  
Amanda Lailatul Fadhilah ◽  
A Afiahayati

The convolutional neural network (CNN) is a state-of-the-art method for object recognition tasks. Specialized for spatial input data, a CNN has convolutional and pooling layers which enable hierarchical feature learning from the input space. For offline handwritten character recognition problems, such as classifying characters in the MNIST database, CNNs show better classification results than any other method. Leveraging the advantages of CNNs for character recognition, in this paper we developed software which utilizes digital image processing methods and a CNN module for offline handwritten Javanese character recognition. The software performs image segmentation using contour and Canny edge detection with the OpenCV library on a captured handwritten Javanese character image. A CNN then classifies the segmented images into 20 classes of Javanese letters. For evaluation purposes, we compared the CNN to a multilayer perceptron (MLP) on classification accuracy and training time. Experimental results show that the CNN model's testing accuracy outperforms the MLP's, although the CNN needs more training time than the MLP.
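A compact sketch of the kind of CNN/MLP pair being compared is given below; the layer sizes and the 32x32 grayscale input are assumptions, since the abstract does not list the exact configuration.

```python
# Illustrative 20-class character classifiers: a small CNN vs. an MLP baseline.
# Layer sizes and the 32x32 input resolution are assumed for the sketch.
import torch
import torch.nn as nn

cnn = nn.Sequential(                     # convolution + pooling learn spatial features
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 32x32 -> 16x16
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 16x16 -> 8x8
    nn.Flatten(), nn.Linear(32 * 8 * 8, 128), nn.ReLU(), nn.Linear(128, 20),
)

mlp = nn.Sequential(                     # baseline: flattens away spatial structure
    nn.Flatten(), nn.Linear(32 * 32, 256), nn.ReLU(), nn.Linear(256, 20),
)

x = torch.randn(4, 1, 32, 32)            # a batch of 4 segmented character images
print(cnn(x).shape, mlp(x).shape)        # both produce (4, 20) class scores
```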


Author(s):  
Junjie Chen ◽  
William K. Cheung

Quantization has been widely adopted for large-scale multimedia retrieval due to its effectiveness in coding high-dimensional data. Deep quantization models have been demonstrated to achieve state-of-the-art retrieval accuracy. However, training deep models on a large-scale database is highly time-consuming, as a large number of parameters are involved. Existing deep quantization methods often sample only a subset of the database for training, which may end up with unsatisfactory retrieval performance as a large portion of the label information is discarded. To alleviate this problem, we propose a novel model called Similarity Preserving Deep Asymmetric Quantization (SPDAQ) which can directly and efficiently learn compact binary codes and quantization codebooks for all items in the database. To do that, SPDAQ makes use of an image subset as well as the label information of all database items, so the image subset items and the database items are mapped to two different but correlated distributions, where label similarity can be well preserved. An efficient optimization algorithm is proposed for the learning. Extensive experiments conducted on four widely-used benchmark datasets demonstrate the superiority of our proposed SPDAQ model.
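The asymmetric search mechanism that models in this family rely on can be sketched generically: database items are stored as codebook indices, while the real-valued query is compared to codewords through a precomputed lookup table. The product-quantization layout and all sizes below are illustrative assumptions, not the SPDAQ model itself.

```python
# Generic asymmetric quantized search sketch (not the SPDAQ model).
import numpy as np

rng = np.random.default_rng(1)
D, K, M, N = 64, 16, 4, 1000           # dim, codewords per book, codebooks, database size
sub = D // M
codebooks = rng.standard_normal((M, K, sub))
db = rng.standard_normal((N, D))

# Encode: per sub-vector, store the index of the nearest codeword.
codes = np.stack([
    np.argmin(((db[:, m*sub:(m+1)*sub, None] - codebooks[m].T[None]) ** 2).sum(1), axis=1)
    for m in range(M)], axis=1)          # (N, M) compact integer codes

def asymmetric_distances(query):
    # Lookup table: squared distance from each query sub-vector to every codeword.
    tables = np.stack([((codebooks[m] - query[m*sub:(m+1)*sub]) ** 2).sum(1)
                       for m in range(M)])          # (M, K)
    return tables[np.arange(M), codes].sum(1)       # (N,) approximate distances

q = rng.standard_normal(D)
print(np.argsort(asymmetric_distances(q))[:5])      # indices of the closest items
```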


2020 ◽  
Vol 34 (01) ◽  
pp. 1046-1053
Author(s):  
Ti-Rong Wu ◽  
Ting-Han Wei ◽  
I-Chen Wu

AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, the majority of which is spent in self-play. Hyperparameter tuning exacerbates the training cost, since each hyperparameter configuration requires its own training run, during which it generates its own self-play records. As a result, multiple runs are usually needed for different hyperparameter configurations. This paper proposes using population based training (PBT) to tune hyperparameters dynamically and improve strength during training. Another significant advantage is that this method requires only a single run while incurring a small additional time cost, since the time for generating self-play records remains unchanged even though the time spent on optimization increases under the AlphaZero training algorithm. In our experiments for 9x9 Go, the PBT method achieves a higher win rate than the baselines, each with its own hyperparameter configuration and trained individually. For 19x19 Go, PBT also yields improvements in playing strength. Specifically, the PBT agent obtains up to a 74% win rate against ELF OpenGo, an open-source state-of-the-art AlphaZero program using a neural network of comparable capacity. This is compared to a saturated non-PBT agent, which achieves a win rate of 47% against ELF OpenGo under the same circumstances.
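The exploit-and-explore loop at the heart of PBT can be shown with a toy example. The evaluation function below is a stand-in for the expensive self-play-based strength measurement, and the hyperparameter names and ranges are illustrative assumptions.

```python
# Toy population based training loop: weak members copy (exploit) strong ones
# and perturb (explore) their hyperparameters. evaluate() is a placeholder for
# what would be a costly self-play evaluation.
import random

def evaluate(hp):
    # Fake "win rate" peaking at lr=0.02, c_puct=1.5 (purely illustrative).
    return 1.0 / (1.0 + abs(hp["lr"] - 0.02) * 50 + abs(hp["c_puct"] - 1.5))

population = [{"lr": random.uniform(0.001, 0.1),
               "c_puct": random.uniform(0.5, 3.0)} for _ in range(8)]

for generation in range(20):
    ranked = sorted(population, key=evaluate, reverse=True)
    survivors, losers = ranked[:4], ranked[4:]
    population = survivors + [
        {k: v * random.choice([0.8, 1.2])           # explore: perturb copied values
         for k, v in random.choice(survivors).items()}
        for _ in losers]                            # exploit: clone a strong member

best = max(population, key=evaluate)
print(best, evaluate(best))
```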


2021 ◽  
Vol 25 (3) ◽  
pp. 669-685
Author(s):  
Xiaojun Qi ◽  
Xianhua Zeng ◽  
Shumin Wang ◽  
Yicai Xie ◽  
Liming Xu

Due to the emergence of the era of big data, cross-modal learning has been applied to many research fields. As an efficient retrieval approach, hash learning is widely used in cross-modal retrieval scenarios. However, most existing hashing methods use fixed-length hash codes, which increases the computational cost on large datasets. Furthermore, learning hash functions is an NP-hard problem. To address these problems, we propose a novel method named Cross-modal Variable-length Hashing Based on Hierarchy (CVHH), which can learn the hash functions more accurately to improve retrieval performance, and also reduce the computational costs and training time. The main contributions of CVHH are: (1) we propose a variable-length hashing algorithm to improve performance; (2) we apply a hierarchical architecture to effectively reduce computational costs and training time. Extensive experimental results on three benchmark datasets, WIKI, NUS-WIDE, and MIRFlickr, validate the effectiveness of CVHH and show superior performance compared with recent state-of-the-art cross-modal methods.


Author(s):  
Ke Wang ◽  
Xin Geng

Label Distribution Learning (LDL) is a novel learning paradigm in machine learning which assumes that an instance is labeled by a distribution over all labels, rather than by a single logical label or several logical labels. Thus, LDL can model the description degree of every possible label for an instance. Although many LDL methods have been put forward to deal with different application tasks, most existing methods suffer from a scalability issue. In this paper, a scalable LDL framework named Binary Coding based Label Distribution Learning (BC-LDL) is proposed for large-scale LDL. The proposed framework consists of two parts: binary coding and label distribution generation. In the binary coding part, the learning objective is to generate optimal binary codes for the instances; the label distribution information of the instances is integrated into the binary coding procedure, leading to high-quality binary codes. In the label distribution generation part, given an instance, the k nearest training instances in the Hamming space are retrieved, and the mean of the label distributions of these neighboring instances is taken as the predicted label distribution. Experiments on five benchmark datasets validate the superiority of BC-LDL over several state-of-the-art LDL methods.
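The prediction stage described above is simple enough to sketch directly: given a query's binary code, retrieve the k nearest training codes by Hamming distance and average their label distributions. The codes and distributions below are random placeholders.

```python
# Sketch of the label distribution generation step, using placeholder data.
import numpy as np

rng = np.random.default_rng(2)
n_train, n_bits, n_labels, k = 500, 32, 5, 10
train_codes = rng.integers(0, 2, (n_train, n_bits), dtype=np.uint8)
train_dists = rng.dirichlet(np.ones(n_labels), size=n_train)   # label distributions

def predict(query_code):
    hamming = (train_codes != query_code).sum(axis=1)   # Hamming distance to each code
    nearest = np.argsort(hamming)[:k]                   # k nearest neighbors
    return train_dists[nearest].mean(axis=0)            # mean of their distributions

query = rng.integers(0, 2, n_bits, dtype=np.uint8)
pred = predict(query)
print(pred, pred.sum())    # a valid distribution summing to 1
```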


2018 ◽  
Author(s):  
Debanjan Mahata ◽  
John Kuriakose ◽  
Rajiv Ratn Shah ◽  
Roger Zimmermann ◽  
John R. Talburt

Keyword extraction is a fundamental task in natural language processing that facilitates mapping documents to a concise set of representative single- and multi-word phrases. Keywords are primarily extracted from text documents using supervised and unsupervised approaches. In this paper, we present an unsupervised technique that combines a theme-weighted personalized PageRank algorithm with neural phrase embeddings for extracting and ranking keywords. We also introduce an efficient way of processing text documents and training phrase embeddings using existing techniques. We share an evaluation dataset, derived from an existing dataset, that is used for choosing the underlying embedding model. Evaluations of ranked keyword extraction are performed on two benchmark datasets comprising short abstracts (Inspec) and long scientific papers (SemEval 2010), and our method is shown to produce better results than state-of-the-art systems.
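The general recipe of graph-based ranking with embedding-weighted edges can be condensed as below; the phrases, embeddings, and theme weights are illustrative stand-ins rather than the authors' processing pipeline.

```python
# Condensed sketch: rank candidate phrases by personalized PageRank over a graph
# whose edges are weighted by embedding similarity. All inputs are placeholders.
import numpy as np
import networkx as nx

phrases = ["keyword extraction", "phrase embeddings", "pagerank", "text documents"]
rng = np.random.default_rng(3)
emb = {p: rng.standard_normal(50) for p in phrases}      # assumed phrase vectors

G = nx.Graph()
for i, p in enumerate(phrases):
    for q in phrases[i + 1:]:
        sim = float(emb[p] @ emb[q] / (np.linalg.norm(emb[p]) * np.linalg.norm(emb[q])))
        G.add_edge(p, q, weight=max(sim, 1e-6))          # keep edge weights positive

# "Theme weighting": bias the random walk toward phrases close to the document theme.
theme = {p: 1.0 for p in phrases}                        # uniform placeholder bias
scores = nx.pagerank(G, personalization=theme, weight="weight")
print(sorted(scores, key=scores.get, reverse=True))      # phrases ranked by score
```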


Author(s):  
Xiao Liang ◽  
Xuewei Wang ◽  
Litong Lyu ◽  
Yanjun Han ◽  
Jinjin Zheng ◽  
...  

Blur detection aims to differentiate the blurry and sharp regions of a given image. This task has attracted much attention in recent years due to its importance in computer vision, where image processing and artificial intelligence are integrated. However, blur detection still suffers from problems such as oversensitivity to image noise and the difficulty of balancing cost and benefit. To deal with these issues, we propose an accurate and efficient blur detection method which is concise in architecture and robust against noise. First, we develop a sequency spectrum-based blur metric to estimate the blurriness of each pixel by integrating a re-blur scheme and the Walsh transform. Meanwhile, to eliminate noise interference, we propose an adaptive sequency spectrum truncation strategy by which we can obtain an accurate blur map even in noise-polluted cases. Finally, a multi-scale fusion segmentation framework is designed to extract the blur region based on clustering-guided region growth. Experimental results on benchmark datasets demonstrate that the proposed method achieves state-of-the-art performance and the best balance between cost and benefit, with an average F1 score of 0.887, MAE of 0.101, detection time of 0.7 s, and training time of 0.5 s. Especially for noise-polluted blurry images, the proposed method achieves an F1 score of 0.887 and MAE of 0.101, significantly surpassing other competitive approaches. Our method yields a cost–benefit advantage and noise immunity that have great application prospects in complex sensing environments.
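The re-blur intuition behind such a metric can be illustrated with a simplified sketch: a sharp region loses much of its high-frequency energy when blurred again, while an already blurry region loses little. The sketch below substitutes local gradient energy for the paper's Walsh/sequency spectrum, and the window size and sigma are assumptions.

```python
# Simplified re-blur based blurriness map (gradient energy stands in for the
# sequency spectrum used in the paper; parameters are assumed).
import numpy as np
from scipy import ndimage

def blur_map(img, sigma=2.0, win=9):
    reblurred = ndimage.gaussian_filter(img, sigma)
    def energy(x):                      # local high-frequency energy
        gy, gx = np.gradient(x)
        return ndimage.uniform_filter(gx**2 + gy**2, win)
    e0, e1 = energy(img), energy(reblurred)
    return e1 / (e0 + 1e-8)             # near 1: already blurry; near 0: sharp

img = np.zeros((64, 64))
img[16:48, 16:48] = 1.0                 # toy image with sharp edges
print(blur_map(img).mean())
```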


Author(s):  
Shikha Bhardwaj ◽  
Gitanjali Pandove ◽  
Pawan Kumar Dahiya

Background: In order to retrieve a particular image from a vast repository of images, an efficient system is required; such a system is known as a content-based image retrieval (CBIR) system. Color is an important attribute of an image, and the proposed system consists of a hybrid color descriptor used for color feature extraction. Deep learning has gained prominent importance in the current era, so the performance of this fusion-based color descriptor is also analyzed with deep learning classifiers. Method: This paper describes a comparative experimental analysis of various color descriptors; the best two are chosen to form an efficient hybrid color-based system denoted combined color moment-color autocorrelogram (Co-CMCAC). Then, to increase the retrieval accuracy of the hybrid system, a cascade forward back propagation neural network (CFBPNN) is used. The classification accuracy obtained using CFBPNN is also compared to that of a Patternnet neural network. Results: The results of the hybrid color descriptor show that the proposed system achieves superior results of 95.4%, 88.2%, 84.4%, and 96.05% on the Corel-1K, Corel-5K, Corel-10K, and Oxford Flower benchmark datasets, respectively, compared with many state-of-the-art related techniques. Conclusion: This paper presents an experimental and analytical study of different color feature descriptors, namely the color moment (CM), color auto-correlogram (CAC), color histogram (CH), color coherence vector (CCV), and dominant color descriptor (DCD). The proposed hybrid color descriptor (Co-CMCAC) is utilized for the extraction of color features, with a cascade forward back propagation neural network (CFBPNN) used as the classifier, on four benchmark datasets: Corel-1K, Corel-5K, Corel-10K, and Oxford Flower.
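As a small illustration of one half of such a descriptor, the sketch below computes the classical color moments (mean, standard deviation, and skewness per channel), giving a 9-dimensional feature; the autocorrelogram component and the CFBPNN classifier are omitted.

```python
# Color-moment feature sketch: mean, std, and skewness per channel (9-d vector).
import numpy as np

def color_moments(image):
    """image: (H, W, 3) array in any color space; returns a 9-d feature."""
    feats = []
    for c in range(3):
        ch = image[..., c].astype(np.float64).ravel()
        mean, std = ch.mean(), ch.std()
        skew = np.cbrt(((ch - mean) ** 3).mean())   # cube root of third central moment
        feats += [mean, std, skew]
    return np.array(feats)

print(color_moments(np.random.default_rng(4).random((100, 100, 3))))
```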

