Multimodal attention networks for low-level vision-and-language navigation

Author(s):  
Federico Landi ◽  
Lorenzo Baraldi ◽  
Marcella Cornia ◽  
Massimiliano Corsini ◽  
Rita Cucchiara
2021 ◽  
Vol 16 (1) ◽  
pp. 1-19
Author(s):  
Fenglin Liu ◽  
Xian Wu ◽  
Shen Ge ◽  
Xuancheng Ren ◽  
Wei Fan ◽  
...  

Vision-and-language (V-L) tasks require a system to understand both visual content and natural language, so learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models have been proposed to learn V-L representations and achieve improved results on many tasks. However, the mainstream models process both vision and language inputs with the same set of attention matrices. As a result, the generated V-L representations are entangled in one common latent space. To tackle this problem, we propose DiMBERT (short for Disentangled Multimodal-Attention BERT), a novel framework that applies separate attention spaces for vision and language, so that the representations of the two modalities can be disentangled explicitly. To enhance the correlation between vision and language in the disentangled spaces, we introduce visual concepts to DiMBERT, which represent visual information in textual format. In this manner, visual concepts help to bridge the gap between the two modalities. We pre-train DiMBERT on a large number of image–sentence pairs on two tasks: bidirectional language modeling and sequence-to-sequence language modeling. After pre-training, DiMBERT is further fine-tuned for the downstream tasks. Experiments show that DiMBERT sets new state-of-the-art performance on three tasks (over four datasets), including both generation tasks (image captioning and visual storytelling) and classification tasks (referring expressions). The proposed DiM (short for Disentangled Multimodal-Attention) module can be easily incorporated into existing pre-trained V-L models to boost their performance, with up to a 5% increase on the representative task. Finally, we conduct a systematic analysis and demonstrate the effectiveness of our DiM module and the introduced visual concepts.
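The abstract's central idea, separate attention spaces per modality, can be illustrated with a minimal PyTorch sketch. This is a hedged illustration only, not the authors' implementation; the class and parameter names (DiMBlock, d_model, n_heads) are assumptions introduced here for clarity.

```python
# A minimal sketch of "disentangled" multimodal attention, assuming the core
# idea from the abstract: vision and language tokens attend over the joint
# sequence, but with separate parameter sets instead of one shared set of
# attention matrices. Names are illustrative, not the authors' code.
import torch
import torch.nn as nn


class DiMBlock(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        # Separate attention modules -> separate latent spaces per modality.
        self.lang_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.vis_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.lang_norm = nn.LayerNorm(d_model)
        self.vis_norm = nn.LayerNorm(d_model)

    def forward(self, lang_tokens: torch.Tensor, vis_tokens: torch.Tensor):
        # Each modality attends over the concatenated sequence with its own
        # projection weights, so the resulting representations stay
        # disentangled rather than sharing one common latent space.
        joint = torch.cat([lang_tokens, vis_tokens], dim=1)
        lang_out, _ = self.lang_attn(lang_tokens, joint, joint)
        vis_out, _ = self.vis_attn(vis_tokens, joint, joint)
        return (self.lang_norm(lang_tokens + lang_out),
                self.vis_norm(vis_tokens + vis_out))


# Usage sketch: 2 samples, 16 language tokens and 36 visual region features.
if __name__ == "__main__":
    block = DiMBlock()
    lang = torch.randn(2, 16, 768)
    vis = torch.randn(2, 36, 768)
    lang_repr, vis_repr = block(lang, vis)
    print(lang_repr.shape, vis_repr.shape)  # [2, 16, 768] [2, 36, 768]
```

In this sketch the "visual concepts" described in the abstract would simply be extra textual tokens appended to the language sequence, which is one plausible (assumed) way they could bridge the two attention spaces.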


Author(s):  
Yubo Zhang ◽  
Hao Tan ◽  
Mohit Bansal

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations. These step-by-step navigational instructions are crucial when the agent is navigating new environments about which it has no prior knowledge. Most recent works studying VLN observe a significant performance drop when agents are tested on unseen environments (i.e., environments not used in training), indicating that the neural agent models are highly biased towards training environments. Although this issue is considered one of the major challenges in VLN research, it remains under-studied and needs a clearer explanation. In this work, we design novel diagnostic experiments via environment re-splitting and feature replacement, looking into possible reasons for this environment bias. We observe that it is neither the language nor the underlying navigational graph, but the low-level visual appearance conveyed by ResNet features, that directly affects the agent model and contributes to this environment bias. Based on this observation, we explore several kinds of semantic representations that contain less low-level visual information, so that agents learned with these features generalize better to unseen testing environments. Without modifying the baseline agent model or its training method, our explored semantic features significantly decrease the performance gap between seen and unseen environments on multiple datasets (i.e., R2R, R4R, and CVDN) and achieve unseen results competitive with previous state-of-the-art models.
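The feature-replacement diagnosis can be made concrete with a short, hedged sketch: the agent and its training loop are left untouched, and only the per-view visual feature is swapped from low-level ResNet vectors to a higher-level semantic representation. Every name below (semantic_histogram, encode_panorama, NUM_CLASSES, the dict keys) is a hypothetical placeholder, not the authors' code.

```python
# Hedged sketch of feature replacement for diagnosing environment bias:
# swap the visual feature, keep the agent unchanged. Names are placeholders.
import numpy as np

NUM_CLASSES = 40  # assumed number of semantic classes for illustration


def resnet_feature(view_rgb: np.ndarray) -> np.ndarray:
    """Stand-in for the usual low-level appearance feature (e.g., a 2048-d
    mean-pooled ResNet vector); random here for illustration only."""
    return np.random.randn(2048).astype(np.float32)


def semantic_histogram(view_seg: np.ndarray) -> np.ndarray:
    """Normalized class histogram of a view's semantic segmentation map:
    keeps 'which objects are visible' while discarding the low-level texture
    and appearance cues that tie an agent to its training environments."""
    counts = np.bincount(view_seg.ravel(), minlength=NUM_CLASSES).astype(np.float32)
    return counts / max(counts.sum(), 1.0)


def encode_panorama(views, use_semantic: bool) -> np.ndarray:
    """Encode every candidate view of a panorama with one feature type; the
    downstream agent consumes this matrix exactly as before."""
    if use_semantic:
        return np.stack([semantic_histogram(v["seg"]) for v in views])
    return np.stack([resnet_feature(v["rgb"]) for v in views])


# Usage sketch with dummy views: 36 discretized headings per panorama.
views = [{"rgb": np.zeros((224, 224, 3), dtype=np.uint8),
          "seg": np.random.randint(0, NUM_CLASSES, size=(224, 224))}
         for _ in range(36)]
print(encode_panorama(views, use_semantic=True).shape)   # (36, 40)
print(encode_panorama(views, use_semantic=False).shape)  # (36, 2048)
```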


2006 ◽  
Vol 76 (1) ◽  
pp. 28-33 ◽  
Author(s):  
Yukari Egashira ◽  
Shin Nagaki ◽  
Hiroo Sanada

We investigated the change of tryptophan-niacin metabolism in rats with puromycin aminonucleoside (PAN)-induced nephrosis, the mechanisms responsible for the change in their urinary excretion of nicotinamide and its metabolites, and the role of the kidney in tryptophan-niacin conversion. PAN-treated rats were intraperitoneally injected once with a 1.0% (w/v) solution of PAN at a dose of 100 mg/kg body weight. The collection of 24-hour urine was conducted 8 days after PAN injection. Daily urinary excretion of nicotinamide and its metabolites, liver and blood NAD, and the activities of key enzymes of tryptophan-niacin metabolism were determined. In PAN-treated rats, the sum of urinary excretion of nicotinamide and its metabolites was significantly lower compared with controls. The kidney α-amino-β-carboxymuconate-ε-semialdehyde decarboxylase (ACMSD) activity in the PAN-treated group was significantly decreased, by 50%, compared with the control group. Although kidney ACMSD activity was reduced, the conversion of tryptophan to niacin tended to be lower in the PAN-treated rats. A decrease in urinary excretion of niacin and in the conversion of tryptophan to niacin in nephrotic rats may contribute to a low level of blood tryptophan. The role of kidney ACMSD activity may be minimal concerning tryptophan-niacin conversion under this experimental condition.
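The "conversion of tryptophan to niacin" referred to above is a ratio derived from urinary metabolites. The formula below is the definition commonly used in this line of work; it is an assumption here, since the abstract itself does not spell it out. Nam, MNA, 2-Py, and 4-Py denote nicotinamide and its major urinary metabolites.

```latex
\text{conversion ratio (\%)} \;=\;
\frac{\mathrm{Nam} + \mathrm{MNA} + \text{2-Py} + \text{4-Py}\ (\mu\mathrm{mol/day,\ urine})}
     {\text{tryptophan intake}\ (\mu\mathrm{mol/day})} \times 100
```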


1983 ◽  
Vol 28 (1) ◽  
pp. 79-79
Author(s):  
Claire B. Ernhart

Author(s):  
Raymond F. Genovese ◽  
Sara J. Shippee ◽  
Jessica Bonnell ◽  
Bernard J. Benton ◽  
...  

1992 ◽  
Author(s):  
Kathy McCloskey ◽  
William B. Albery ◽  
Greg Zehner ◽  
Stephen D. Bolia