Text Retrieval
Recently Published Documents

Total documents: 773 (past five years: 95)
H-index: 42 (past five years: 3)

2022 · Vol 2022 · pp. 1-12
Author(s): Xiuye Yin, Liyong Chen

Given the complexity of multimodal environments and the inability of existing shallow network structures to achieve high-precision image and text retrieval, a cross-modal image-text retrieval method is proposed that combines efficient feature extraction with an interactive-learning convolutional autoencoder (CAE). First, the convolution kernels of a residual network are improved by incorporating two-dimensional principal component analysis (2DPCA) to extract image features, while text features are extracted through long short-term memory (LSTM) networks and word vectors, yielding efficient feature extraction for both modalities. Then, cross-modal retrieval of images and text is realized with an interactive-learning CAE: the image and text features are fed to the two input terminals of the dual-modal CAE, and an image-text relationship model is obtained through interactive learning in the middle layer, enabling image-text retrieval. Finally, the proposed method is evaluated on the Flickr30K, MSCOCO, and Pascal VOC 2007 datasets. The results show that the method performs accurate image retrieval and text retrieval: the mean average precision (MAP) exceeds 0.3, and the areas under the precision-recall (PR) curves are larger than those of the comparison methods, demonstrating the method's applicability.
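The abstract gives no implementation details for the dual-modal CAE, so the following is only a minimal PyTorch sketch of one plausible reading: two modality-specific encoders feed a shared latent (middle) layer, and "interactive learning" is approximated by cross-reconstruction between modalities. All names, dimensions, and loss choices (DualModalCAE, img_dim, shared_dim, MSE reconstruction) are illustrative assumptions, not the authors' code; the 2DPCA-refined residual image features and the LSTM/word-vector text features are assumed to be precomputed vectors.

```python
# Hypothetical sketch of a dual-input autoencoder with a shared middle layer.
# All architecture and loss choices are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualModalCAE(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=300, shared_dim=256):
        super().__init__()
        # Modality-specific encoders map into a shared latent ("middle") layer.
        self.img_enc = nn.Sequential(nn.Linear(img_dim, 512), nn.ReLU(),
                                     nn.Linear(512, shared_dim))
        self.txt_enc = nn.Sequential(nn.Linear(txt_dim, 512), nn.ReLU(),
                                     nn.Linear(512, shared_dim))
        # Decoders reconstruct each modality from the shared code.
        self.img_dec = nn.Sequential(nn.Linear(shared_dim, 512), nn.ReLU(),
                                     nn.Linear(512, img_dim))
        self.txt_dec = nn.Sequential(nn.Linear(shared_dim, 512), nn.ReLU(),
                                     nn.Linear(512, txt_dim))

    def forward(self, img_feat, txt_feat):
        z_img = self.img_enc(img_feat)
        z_txt = self.txt_enc(txt_feat)
        # "Interactive learning" is approximated here by cross-reconstruction:
        # each modality's latent code must also reconstruct the other modality,
        # pushing the two codes toward a common representation.
        loss = (F.mse_loss(self.img_dec(z_img), img_feat)
                + F.mse_loss(self.txt_dec(z_txt), txt_feat)
                + F.mse_loss(self.img_dec(z_txt), img_feat)
                + F.mse_loss(self.txt_dec(z_img), txt_feat))
        return z_img, z_txt, loss

# Toy usage with random stand-ins for the precomputed features.
model = DualModalCAE()
img = torch.randn(8, 2048)  # e.g., 2DPCA-refined residual-network features
txt = torch.randn(8, 300)   # e.g., LSTM/word-vector text features
z_img, z_txt, loss = model(img, txt)
loss.backward()
```

Under this reading, retrieval would encode both modalities into the shared space and rank candidates by cosine similarity of their latent codes.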


2021
Author(s): Zhixian Zeng, Jianjun Cao, Guoquan Jiang, Nianfeng Weng, Yuxin Xu, ...

2021
Author(s): Jayaprakash Akula, Abhishek, Rishabh Dabral, Preethi Jyothi, Ganesh Ramakrishnan

2021
Author(s): Hongying Liu, Ruyi Luo, Fanhua Shang, Mantang Niu, Yuanyuan Liu

2021
Author(s): Sungkwon Choo, Seong Jong Ha, Joonsoo Lee

2021
Author(s): Manh-Duy Nguyen, Binh T. Nguyen, Cathal Gurrin

Conventional approaches to image-text retrieval mainly focus on indexing the visual objects appearing in pictures but ignore the interactions between these objects. Such object occurrences and interactions are equally useful and important in this field, as they are usually mentioned in the text. Scene graph representation is well suited to the image-text matching challenge and has obtained good results thanks to its ability to capture this inter-relationship information: both images and text are represented at the scene graph level, reformulating retrieval as a scene graph matching problem. In this paper, we introduce the Local and Global Scene Graph Matching (LGSGM) model, which enhances the state-of-the-art method by integrating an extra graph convolution network to capture the general information of a graph. Specifically, for a pair of scene graphs of an image and its caption, two separate models are used to learn the features of each graph's nodes and edges. A Siamese-structure graph convolution model is then employed to embed the graphs into vector form. Finally, we combine the graph level and the vector level to calculate the similarity of the image-text pair. Empirical experiments show that this combination of levels improves the performance of the baseline method, increasing recall by more than 10% on the Flickr30k dataset. Our implementation code can be found at https://github.com/m2man/LGSGM.
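Since the abstract computes similarity at two levels, a brief sketch may clarify how a local (node-level matching) score and a global (pooled graph vector) score can be blended. This is a minimal PyTorch illustration, not the LGSGM implementation (which is available at the GitHub link above): the dense single-matrix GCN, the mean-pooling readout, and the names SimpleGCNLayer, SiameseGraphEncoder, and combined_similarity are assumptions made for the sketch, and edge features are omitted.

```python
# Hypothetical sketch of blending local (node-level) and global (vector-level)
# graph similarity with a shared-weight (Siamese) GCN encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbor features through a
    row-normalized adjacency matrix, then apply a linear transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes).
        return F.relu(self.lin(adj @ x))

class SiameseGraphEncoder(nn.Module):
    """Shared-weight encoder applied to both the image scene graph and the
    caption scene graph (the Siamese structure mentioned in the abstract)."""
    def __init__(self, node_dim=300, hidden_dim=256):
        super().__init__()
        self.gcn1 = SimpleGCNLayer(node_dim, hidden_dim)
        self.gcn2 = SimpleGCNLayer(hidden_dim, hidden_dim)

    def forward(self, x, adj):
        h = self.gcn2(self.gcn1(x, adj), adj)
        return h, h.mean(dim=0)  # node embeddings + pooled graph vector

def combined_similarity(enc, img_graph, txt_graph, alpha=0.5):
    """Blend the local node-matching score with the global vector score."""
    h_img, g_img = enc(*img_graph)
    h_txt, g_txt = enc(*txt_graph)
    # Local score: match each caption node to its most similar image node.
    sim = F.normalize(h_txt, dim=1) @ F.normalize(h_img, dim=1).T
    local = sim.max(dim=1).values.mean()
    # Global score: cosine similarity of the pooled graph vectors.
    global_ = F.cosine_similarity(g_img, g_txt, dim=0)
    return alpha * local + (1 - alpha) * global_

# Toy usage with random node features and self-loop adjacency matrices.
enc = SiameseGraphEncoder()
x_i, adj_i = torch.randn(5, 300), torch.eye(5)  # image scene graph
x_t, adj_t = torch.randn(3, 300), torch.eye(3)  # caption scene graph
score = combined_similarity(enc, (x_i, adj_i), (x_t, adj_t))
```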


2021 · Vol 58 (5) · pp. 102672
Author(s): Zhi Zheng, Kai Hui, Ben He, Xianpei Han, Le Sun, ...

2021
Author(s): Jie Cao, Shengsheng Qian, Huaiwen Zhang, Quan Fang, Changsheng Xu
