2. Multimodal Retrieval between Vision and Language

The Journal of The Institute of Image Information and Television Engineers ◽

10.3169/itej.72.655 ◽

2018 ◽

Vol 72 (9) ◽

pp. 655-658

Author(s):

Masataka Yamaguchi

Keyword(s):

Multimodal Retrieval ◽

Vision And Language

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm49941.2020.9313289 ◽

2020 ◽

Author(s):

Yikuan Li ◽

Hanyin Wang ◽

Yuan Luo

Keyword(s):

Medical Images ◽

Representation Learning ◽

Language Models ◽

Multimodal Representation ◽

Vision And Language

Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation

Sensors ◽

10.3390/s21031012 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1012

Author(s):

Jisu Hwang ◽

Incheol Kim

Keyword(s):

Natural Language ◽

Language Processing ◽

Language Instruction ◽

Scoring Method ◽

Processing Technologies ◽

Backtracking Search ◽

Panoramic Images ◽

Benchmark Datasets ◽

Vision And Language

Advances in computer vision and natural language processing in recent years have driven growing interest in multimodal intelligent tasks that require concurrently understanding several forms of input data, such as images and text. Vision-and-language navigation (VLN) requires the alignment and grounding of multimodal input data to enable real-time perception of the task status from panoramic images and natural language instructions. This study proposes a novel deep neural network model, JMEBS, that combines joint multimodal embedding with backtracking search for VLN tasks. JMEBS uses a transformer-based joint multimodal embedding module and draws on both multimodal and temporal context. It also employs backtracking-enabled greedy local search (BGLS), a novel algorithm whose backtracking feature is designed to improve the task success rate and optimize the navigation path based on the local and global scores of candidate actions. A novel global scoring method further improves performance by comparing the partial trajectories searched thus far with a plurality of natural language instructions. The performance of the proposed model across various operations was demonstrated experimentally and compared with that of other models using the Matterport3D Simulator and room-to-room (R2R) benchmark datasets.
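
The abstract gives no procedural detail for BGLS, so the following is only a minimal sketch of the general idea of a greedy local search that can backtrack, not the authors' algorithm. The function name, the toy graph, and both scoring functions are hypothetical; in particular, adding the local and global scores together and backtracking only at dead ends are assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Optional

Node = Hashable


def greedy_search_with_backtracking(
    graph: Dict[Node, List[Node]],
    start: Node,
    goal: Node,
    local_score: Callable[[Node, Node], float],
    global_score: Callable[[List[Node]], float],
    max_steps: int = 100,
) -> Optional[List[Node]]:
    """Greedy local search that backtracks from dead ends.

    Illustrative only: the scoring functions and the backtracking rule
    are assumptions, not the BGLS procedure from the paper.
    """
    path = [start]
    visited = {start}
    for _ in range(max_steps):
        current = path[-1]
        if current == goal:
            return path
        candidates = [n for n in graph.get(current, []) if n not in visited]
        if not candidates:
            # Dead end: undo the last move and retry from the previous node.
            path.pop()
            if not path:
                return None
            continue
        # Rank each candidate by its local score plus the global score of
        # the partial trajectory that taking it would produce.
        best = max(
            candidates,
            key=lambda n: local_score(current, n) + global_score(path + [n]),
        )
        visited.add(best)
        path.append(best)
    return None


# Hypothetical toy graph: the locally best first move (A -> B) is a dead end.
toy_graph = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}
toy_preference = {"B": 1.0, "C": 0.5, "D": 0.2}

route = greedy_search_with_backtracking(
    toy_graph,
    start="A",
    goal="D",
    local_score=lambda current, nxt: toy_preference[nxt],
    global_score=lambda partial_path: 0.0,  # placeholder global score
)
print(route)  # ['A', 'C', 'D']
```

In this toy run the search first moves to B, finds no unvisited neighbours, steps back to A, and then reaches D through C. The returned list is the final path only and omits the abandoned detour.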

A Novel Attention-based Aggregation Function to Combine Vision and Language

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9413269 ◽

2021 ◽

Author(s):

Matteo Stefanini ◽

Marcella Cornia ◽

Lorenzo Baraldi ◽

Rita Cucchiara

Keyword(s):

Aggregation Function ◽

Vision And Language

Neural processing of vision and language in kindergarten is associated with prereading skills and predicts future literacy

Human Brain Mapping ◽

10.1002/hbm.25449 ◽

2021 ◽

Author(s):

Johanna Liebig ◽

Eva Froehlich ◽

Teresa Sylvester ◽

Mario Braun ◽

Hauke R. Heekeren ◽

...

Keyword(s):

Neural Processing ◽

Prereading Skills ◽

Vision And Language

Vision and Language Navigation using Multi-head Attention Mechanism

2020 6th International Conference on Big Data and Information Analytics (BigDIA) ◽

10.1109/bigdia51454.2020.00020 ◽

2020 ◽

Author(s):

Sai Mao ◽

Junmin Wu ◽

Siqi Hong

Keyword(s):

Attention Mechanism ◽

Vision And Language

Multimodal Retrieval using Mutual Information based Textual Query Reformulation

Expert Systems with Applications ◽

10.1016/j.eswa.2016.09.039 ◽

2017 ◽

Vol 68 ◽

pp. 81-92

Author(s):

Deepanwita Datta ◽

Shubham Varma ◽

Ravindranath Chowdary C. ◽

Sanjay K. Singh

Keyword(s):

Mutual Information ◽

Query Reformulation ◽

Multimodal Retrieval

Multimodal retrieval with relevance feedback based on genetic programming

Multimedia Tools and Applications ◽

10.1007/s11042-012-1152-7 ◽

2012 ◽

Vol 69 (3) ◽

pp. 991-1019

Author(s):

Rodrigo Tripodi Calumby ◽

Ricardo da Silva Torres ◽

Marcos André Gonçalves

Keyword(s):

Genetic Programming ◽

Relevance Feedback ◽

Multimodal Retrieval

Change, vision and language: the early works and Inferno Canto Two

Dante: The Divine Comedy ◽

10.1017/cbo9780511804731.003 ◽

2012 ◽

pp. 21-54

Author(s):

Robin Kirkpatrick

Keyword(s):

Vision And Language

Interpretable Multimodal Retrieval for Fashion Products

2018 ACM Multimedia Conference (MM '18) ◽

10.1145/3240508.3240646 ◽

2018 ◽

Author(s):

Lizi Liao ◽

Xiangnan He ◽

Bo Zhao ◽

Chong-Wah Ngo ◽

Tat-Seng Chua

Keyword(s):

Fashion Products ◽

Multimodal Retrieval

Content Based Multimodal Retrieval for Databases of Indian Monuments

Communications in Computer and Information Science - Contemporary Computing ◽

10.1007/978-3-642-14834-7_42 ◽

2010 ◽

pp. 446-455

Author(s):

Aman Agarwal ◽

Vikas Saxena

Keyword(s):

Multimodal Retrieval
