Super Visual Semantic Embedding for Cross-Modal Image-Text Retrieval

Mapping Intimacies ◽

10.1145/3487075.3487167 ◽

2021 ◽

Author(s):

Zhixian Zeng ◽

Jianjun Cao ◽

Guoquan Jiang ◽

Nianfeng Weng ◽

Yuxin Xu ◽

...

Keyword(s):

Text Retrieval ◽

Semantic Embedding

On the Limitations of Visual-Semantic Embedding Networks for Image-to-Text Information Retrieval

Journal of Imaging ◽

10.3390/jimaging7080125 ◽

2021 ◽

Vol 7 (8) ◽

pp. 125

Author(s):

Yan Gong ◽

Georgina Cosma ◽

Hui Fang

Keyword(s):

Information Retrieval ◽

Question Answering ◽

Text Retrieval ◽

Future Research ◽

Image Captioning ◽

Image Objects ◽

Semantic Embedding ◽

Average Recall ◽

Text Information ◽

Basic Functions

Visual-semantic embedding (VSE) networks map images and texts into a shared embedding space, producing joint image–text representations that support information retrieval tasks such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art VSE-based networks are VSE++, SCAN, VSRN, and UNITER. This study evaluates these four networks on image-to-text retrieval and analyses their strengths and limitations to guide future research on using VSE networks for cross-modal information retrieval. On Flickr30K, the pre-trained network UNITER achieved an average Recall@5 of 61.5% for the task of retrieving all relevant descriptions, while the traditional networks VSRN, SCAN, and VSE++ achieved 50.3%, 47.1%, and 29.4%, respectively, on the same task. To identify the limitations of the best-performing models, VSRN and UNITER, an additional analysis was performed on image–text pairs from the 25 worst-performing classes, using a subset of the Flickr30K-based dataset. These limitations are discussed from the perspective of image scenes, image objects, image semantics, and basic functions of neural networks.
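
The average Recall@5 figures quoted in this abstract can be illustrated with a minimal sketch. It assumes that captions are ranked by cosine similarity to each image in the shared embedding space, and that recall for one image is the fraction of its relevant captions that appear in the top K. The function names, the toy 2-D embeddings, and the relevance lists are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_recall_at_k(image_embs, text_embs, relevant, k=5):
    """Mean over images of (relevant captions ranked in the top k) / (relevant captions).

    relevant[i] lists the indices in text_embs of the captions that describe image i.
    """
    recalls = []
    for img, rel in zip(image_embs, relevant):
        # Rank every caption by its similarity to this image, best first.
        ranked = sorted(
            range(len(text_embs)),
            key=lambda j: cosine(img, text_embs[j]),
            reverse=True,
        )
        hits = len(set(ranked[:k]) & set(rel))
        recalls.append(hits / len(rel))
    return sum(recalls) / len(recalls)


# Toy data (assumed): 2 images, 4 captions, 2-D embeddings.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[1.0, 0.1], [0.2, 1.0], [0.1, 1.0], [0.9, 0.3]]
relevant = [[0, 1], [2, 3]]  # captions 0-1 describe image 0, captions 2-3 image 1

print(average_recall_at_k(image_embs, text_embs, relevant, k=2))  # 0.5
```

In this toy case each image retrieves only one of its two relevant captions in the top 2, so the average is 0.5. Raising K to 4 covers every caption and gives 1.0.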

A Ranking model of proximal and structural text retrieval based on region algebra

Proceedings of the conference on SIGGRAPH 2004 course notes - GRAPH '04 ◽

10.3115/1075178.1075185 ◽

2003 ◽

Author(s):

Katsuya Masuda

Keyword(s):

Text Retrieval ◽

Improve Statistical Machine Translation with Context-Sensitive Bilingual Semantic Embedding Model

10.3115/v1/d14-1015 ◽

2014 ◽

Author(s):

Haiyang Wu ◽

Daxiang Dong ◽

Xiaoguang Hu ◽

Dianhai Yu ◽

Wei He ◽

...

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Context Sensitive ◽

Semantic Embedding

Learning Binary Semantic Embedding for Breast Histology Image Classification and Retrieval

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9415036 ◽

2021 ◽

Author(s):

Xiao Kang ◽

Xingbo Liu ◽

Xiushan Nie ◽

Yilong Yin

Keyword(s):

Image Classification ◽

Semantic Embedding

Learning Controlled Semantic Embedding for Cross-Modal Retrieval

2021 IEEE International Conference on Multimedia and Expo (ICME) ◽

10.1109/icme51207.2021.9428280 ◽

2021 ◽

Author(s):

Rong Yang ◽

Min Meng ◽

Jun Yu ◽

Jigang Wu

Keyword(s):

Semantic Embedding

Deep Multi-view Document Clustering with Enhanced Semantic Embedding

Information Sciences ◽

10.1016/j.ins.2021.02.027 ◽

2021 ◽

Author(s):

Ruina Bai ◽

Ruizhang Huang ◽

Yanping Chen ◽

Yongbin Qin

Keyword(s):

Document Clustering ◽

Semantic Embedding

Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3463031 ◽

2021 ◽

Author(s):

Yi He ◽

Xin Liu ◽

Yiu-Ming Cheung ◽

Shu-Juan Peng ◽

Jinhan Yi ◽

...

Keyword(s):

Text Retrieval ◽

Fine Grained ◽

Correlation Learning

Semantic-Preserving Metric Learning for Video-Text Retrieval

10.1109/icip42928.2021.9506697 ◽

2021 ◽

Author(s):

Sungkwon Choo ◽

Seong Jong Ha ◽

Joonsoo Lee

Keyword(s):

Metric Learning ◽

A combined syntactic-semantic embedding model based on lexicalized tree-adjoining grammar

Computer Speech & Language ◽

10.1016/j.csl.2021.101202 ◽

2021 ◽

Vol 68 ◽

pp. 101202

Author(s):

Hoang-Vu Dang ◽

Phuong Le-Hong

Keyword(s):

Semantic Embedding ◽

Model Based ◽

Tree Adjoining Grammar

A Deep Semantic Alignment Network for Cross-Modal Image-Text Retrieval in Remote Sensing

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ◽

10.1109/jstars.2021.3070872 ◽

2021 ◽

pp. 1-1

Author(s):

Qimin Cheng ◽

Yuzhuo Zhou ◽

Peng Fu ◽

Yuan Xu ◽

Liang Zhang

Keyword(s):

Remote Sensing ◽

Text Retrieval ◽

Semantic Alignment
