Deep Hash with Improved Dual Attention for Image Retrieval

Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 285
Author(s):  
Wenjing Yang ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Yongming Li ◽  
Anyu Du

Recently, deep hashing has been extensively applied to image retrieval due to its low storage cost and fast query speed. However, existing hashing methods that use a convolutional neural network (CNN) to extract image semantic features suffer from insufficient and imbalanced feature extraction: the extracted features contain no contextual information and lack relevance to one another. Furthermore, relaxing the hash code during training leads to an inevitable quantization error. To solve these problems, this paper proposes deep hashing with improved dual attention for image retrieval (DHIDA), whose main contributions are as follows: (1) an improved dual attention mechanism (IDA), built on a pre-trained ResNet18 backbone and consisting of a position attention module and a channel attention module, extracts the feature information of the image; (2) when computing the spatial and channel attention matrices, the average and maximum values of the columns of the feature-map matrix are fused, which strengthens the feature representation and fully exploits the features at each position; and (3) to reduce quantization error, a new piecewise function is designed to directly guide the discrete binary codes. Experiments on CIFAR-10, NUS-WIDE and ImageNet-100 show that DHIDA achieves better performance.
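The fusion of average and maximum statistics in an attention block, as described in contribution (2), can be sketched as follows. This is a minimal illustrative channel-attention example in the spirit of the abstract, not the paper's exact IDA formulation; the MLP weights `w1`, `w2` and all shapes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel attention that fuses average and maximum descriptors.

    feat: (C, H, W) feature map; w1: (R, C) and w2: (C, R) form a
    shared two-layer MLP (illustrative shapes, not from the paper).
    """
    c = feat.shape[0]
    flat = feat.reshape(c, -1)
    avg = flat.mean(axis=1)          # per-channel average value
    mx = flat.max(axis=1)            # per-channel maximum value
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared ReLU MLP
    att = sigmoid(mlp(avg) + mlp(mx))             # fused attention weights
    return feat * att[:, None, None]              # reweight each channel
```

The same avg/max fusion idea carries over to the spatial attention matrix, with the statistics taken across channels instead of positions.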

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Nouman Qadeer ◽  
Dongting Hu ◽  
Xiabi Liu ◽  
Shahzad Anwar ◽  
Malik Saad Sultan

In computer vision, image retrieval has remained a significant problem, and its recent resurgence relies not only on good feature representation but also on post-processing methods to improve accuracy. Our method addresses shape retrieval of binary images. This paper proposes a new integration scheme that best exploits feature representation together with contextual information. For feature representation we use an articulation-invariant representation; dynamic programming is then applied for better shape matching, followed by manifold-learning-based post-processing with a modified mutual kNN graph to further improve the similarity scores. We conducted extensive experiments on the widely used MPEG-7 database of shape images using the so-called bulls-eye score, with and without normalization of the modified mutual kNN graph, which clearly shows the importance of normalization. Our method demonstrates better results than competing methods, and a comparison of running time with another graph-transduction method shows that it is computationally very fast. Furthermore, to demonstrate the consistency of the post-processing step, we also performed experiments on the challenging ORL and YALE face datasets and improved the baseline results.
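A plain (unmodified) mutual kNN graph, the structure underlying this post-processing step, can be built as below. This is a generic sketch: the paper's *modified* variant and its normalization are not reproduced here, and the distance matrix is assumed to be symmetric with a zero diagonal.

```python
import numpy as np

def mutual_knn_graph(dist, k):
    """Boolean adjacency of a mutual kNN graph.

    An edge (i, j) is kept only when each point is among the other's
    k nearest neighbours, which prunes asymmetric, unreliable links.
    dist: (n, n) pairwise distance matrix, zero on the diagonal.
    """
    n = dist.shape[0]
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip self at index 0
    in_knn = np.zeros((n, n), dtype=bool)
    for i in range(n):
        in_knn[i, nn[i]] = True
    return in_knn & in_knn.T                   # mutuality constraint
```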


Perception ◽  
10.1068/p5192 ◽  
2005 ◽  
Vol 34 (9) ◽  
pp. 1117-1134 ◽  
Author(s):  
Claus-Christian Carbon ◽  
Helmut Leder

We investigated the early stages of face recognition and the role of featural and holistic face information. We exploited the fact that, on inversion, the alienating disorientation of the eyes and mouth in thatcherised faces is hardly detectable. This effect allows featural and holistic information to be dissociated and was used to test specific face-processing hypotheses. In inverted thatcherised faces, the cardinal features are already correctly oriented, whereas in undistorted faces, the whole Gestalt is coherent but all information is disoriented. Experiment 1 and experiment 3 revealed that, for inverted faces, featural information processing precedes holistic information. Moreover, the processing of contextual information is necessary to process local featural information within a short presentation time (26 ms). Furthermore, for upright faces, holistic information seems to be available faster than for inverted faces (experiment 2). These differences in processing inverted and upright faces presumably cause the differential importance of featural and holistic information for inverted and upright faces.


A content-based image retrieval (CBIR) system retrieves images according to strong features of the desired image, such as its color, texture and shape. Although visual features cannot be completely determined by semantic features, semantic features can still be integrated easily into mathematical formulas. This paper focuses on the retrieval of images within a large image collection based on color projection, applying segmentation and quantization in different color models and comparing them for the best result. The method is applied to different categories of image sets, and its retrieval rate is evaluated in the different models.
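A standard color-quantization baseline for this kind of retrieval, sketched in RGB (the abstract compares several color models; the bin count and similarity measure here are illustrative choices, not the paper's):

```python
import numpy as np

def color_histogram(img, bins=4):
    """Quantise an (H, W, 3) uint8 image into a coarse joint RGB
    histogram of bins**3 cells, normalised to sum to 1."""
    q = (img.astype(int) * bins) // 256            # per-channel bin index
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def rank_by_similarity(query, database):
    """Rank database histograms by histogram intersection (higher = closer)."""
    sims = [np.minimum(query, h).sum() for h in database]
    return np.argsort(sims)[::-1]
```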


2021 ◽  
Vol 32 (4) ◽  
pp. 1-13
Author(s):  
Xia Feng ◽  
Zhiyi Hu ◽  
Caihua Liu ◽  
W. H. Ip ◽  
Huiying Chen

In recent years, deep learning has achieved remarkable results in the text-image retrieval task. However, many methods consider only global image features and ignore vital local information, which prevents text and image from being matched well. Considering that object-level image features can aid the matching between text and image, this article proposes a text-image retrieval method that fuses salient image feature representations. Fusing salient features at the object level improves the understanding of image semantics and thus the performance of text-image retrieval. The experimental results show that the proposed method is comparable to the latest methods, and on some retrieval tasks its recall rate exceeds that of current work.


2021 ◽  
Author(s):  
Zhuangping Qi ◽  
Lei Liu ◽  
Huijie Liu ◽  
Li Li ◽  
Hua Gao

Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 909
Author(s):  
Shuo Li ◽  
Chiru Ge ◽  
Xiaodan Sui ◽  
Yuanjie Zheng ◽  
Weikuan Jia

The cup-to-disc ratio (CDR) is of great importance when assessing structural changes at the optic nerve head (ONH) and diagnosing glaucoma. Most efforts acquire the CDR through CNN-based segmentation algorithms followed by its calculation, but these methods usually focus only on the features within the convolution kernel, which is, after all, a local operation, ignoring the contribution of rich global features (such as distant pixels) to the current features. In this paper, a new end-to-end channel and spatial attention regression deep learning network is proposed that deduces the CDR from the regression perspective and combines the self-attention mechanism with the regression network. The network consists of four modules: a feature extraction module that extracts deep features expressing the complicated patterns of the optic disc (OD) and optic cup (OC); an attention module, comprising a channel attention block (CAB) and a spatial attention block (SAB), that improves the feature representation by aggregating long-range contextual information; a regression module that deduces the CDR directly; and a segmentation-auxiliary module that focuses the model's attention on the relevant features instead of the background region. In particular, the CAB selects the relatively important feature maps along the channel dimension, shifting the emphasis onto the OD and OC regions, while the SAB learns a discriminative feature representation at the pixel level by capturing relationships within the feature map. Experimental results on the ORIGA dataset show that the method obtains an absolute CDR error of 0.067 and a Pearson's correlation coefficient of 0.694 in estimating the CDR, demonstrating great potential for CDR prediction.
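For reference, the conventional segmentation-then-calculation pipeline that the regression network replaces computes the vertical CDR from binary masks as below. This is the standard definition (vertical cup diameter over vertical disc diameter), not code from the paper.

```python
import numpy as np

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio from binary segmentation masks.

    Each mask is a 2-D boolean array; the vertical diameter is the
    span of rows containing any foreground pixel.
    """
    def vdiam(mask):
        rows = np.flatnonzero(mask.any(axis=1))
        return 0 if rows.size == 0 else rows[-1] - rows[0] + 1

    disc = vdiam(disc_mask)
    return vdiam(cup_mask) / disc if disc else 0.0
```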


2018 ◽  
Vol 7 (2.24) ◽  
pp. 159
Author(s):  
Durga Prasad K ◽  
Manjunathachari K ◽  
Giri Prasad M.N

This paper focuses on image retrieval using a sketch-based image retrieval (SBIR) system. Its low-complexity model for image representation makes SBIR an optimal choice for next-generation applications in low-resource environments. The SBIR approach uses a geometrical region representation to describe features and uses them for recognition; in the SBIR model, the represented features define the image. To improve SBIR recognition performance, this paper proposes a new invariant model using "orientation feature transformed modeling". The approach enhances the invariance property and improves retrieval performance in the transformed domain. The experimental results illustrate the significance of invariant orientation feature representation in SBIR over conventional models.


Information ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 280
Author(s):  
Shaoxiu Wang ◽  
Yonghua Zhu ◽  
Wenjing Gao ◽  
Meng Cao ◽  
Mengyao Li

Sentiment analysis of microblog text has always been a challenging research field due to its limited and complex contextual information. Most existing sentiment analysis methods for microblogs focus on classifying the polarity of emotional keywords while ignoring both the transitional or progressive impact that words at different positions in the Chinese syntactic structure have on global sentiment and the information carried by emojis. To this end, we propose an emotion-semantic-enhanced bidirectional long short-term memory network with a multi-head attention mechanism (EBILSTM-MH) for sentiment analysis. The model uses a BiLSTM to learn feature representations of the input texts from their word embeddings. An attention mechanism then assigns each word an attentive weight based on the impact of emojis; the attentive weights are combined with the hidden-layer outputs to obtain the feature representation of a post. Finally, the sentiment polarity of the microblog is obtained through a densely connected layer. The experimental results show the feasibility of the proposed model for microblog sentiment analysis when compared with other baseline models.
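The step of combining attentive weights with the hidden-layer outputs can be sketched as a single-head attention pooling over BiLSTM states. This is a simplified stand-in for the multi-head mechanism: `query` represents the learned attention parameters, and all shapes are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pooling(hidden, query):
    """Pool BiLSTM hidden states into one post representation.

    hidden: (T, D) hidden state per word; query: (D,) learned
    attention vector (illustrative; the paper uses multiple heads).
    """
    weights = softmax(hidden @ query)   # attentive weight per word
    pooled = weights @ hidden           # (D,) weighted sum of states
    return pooled, weights
```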

