Multi-Level Policy and Reward Reinforcement Learning for Image Captioning

Author(s):  
Anan Liu ◽  
Ning Xu ◽  
Hanwang Zhang ◽  
Weizhi Nie ◽  
Yuting Su ◽  
...  

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function that do not fit well with the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance across different evaluation metrics.
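The abstract gives no implementation details, so the Python sketch below only illustrates the two ideas it names: an adaptive fusion of a word-level and a sentence-level policy, and a reward that mixes a vision-language score with a language-language score. The module shapes, the sigmoid gate, and the mixing weight `alpha` are illustrative assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code) of the two ideas in the abstract:
# (1) fusing a word-level and a sentence-level policy with a learned gate, and
# (2) mixing a vision-language reward with a language-language reward.
import torch
import torch.nn as nn

class FusedPolicy(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.word_policy = nn.Linear(hidden_size, vocab_size)   # local, word-level context
        self.sent_policy = nn.Linear(hidden_size, vocab_size)   # global, sentence-level context
        self.gate = nn.Linear(2 * hidden_size, 1)                # adaptive fusion weight

    def forward(self, word_state, sent_state):
        g = torch.sigmoid(self.gate(torch.cat([word_state, sent_state], dim=-1)))
        logits = g * self.word_policy(word_state) + (1 - g) * self.sent_policy(sent_state)
        return torch.log_softmax(logits, dim=-1)                 # log-probs over the vocabulary

def multi_level_reward(vision_language_score, language_language_score, alpha=0.5):
    # e.g. an image-caption similarity score and a CIDEr-style caption-reference score;
    # the fixed 0.5 weighting is an assumption for illustration only
    return alpha * vision_language_score + (1 - alpha) * language_language_score
```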

2020 ◽  
Vol 201 ◽  
pp. 103068
Author(s):  
Haiyang Wei ◽  
Zhixin Li ◽  
Canlong Zhang ◽  
Huifang Ma

2020 ◽  
Vol 22 (5) ◽  
pp. 1372-1383 ◽  
Author(s):  
Ning Xu ◽  
Hanwang Zhang ◽  
An-An Liu ◽  
Weizhi Nie ◽  
Yuting Su ◽  
...  

Author(s):  
Longteng Guo ◽  
Jing Liu ◽  
Xinxin Zhu ◽  
Xingjian He ◽  
Jie Jiang ◽  
...  

Most image captioning models are autoregressive, i.e., they generate each word conditioned on the previously generated words, which leads to heavy latency during inference. Recently, non-autoregressive decoding has been proposed in machine translation to speed up inference by generating all words in parallel. Typically, these models use a word-level cross-entropy loss to optimize each word independently. However, such a learning process fails to consider sentence-level consistency, resulting in inferior generation quality from these non-autoregressive models. In this paper, we propose a Non-Autoregressive Image Captioning (NAIC) model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates NAIC as a multi-agent reinforcement learning system in which the positions in the target sequence are viewed as agents that learn to cooperatively maximize a sentence-level reward. In addition, we propose to utilize massive unlabeled images to boost captioning performance. Extensive experiments on the MSCOCO image captioning benchmark show that our NAIC model achieves performance comparable to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.
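As a rough illustration of the counterfactual-baseline idea described above (each position acts as an agent, all words are sampled in parallel, and each position's credit is the sentence reward minus a counterfactual reward with only that position's word replaced), here is a hedged Python sketch. The function names, the per-position replacement strategy, and the reward interface are assumptions, not the paper's implementation.

```python
# Minimal sketch of a counterfactual-baseline policy-gradient loss for
# parallel (non-autoregressive) caption generation; all interfaces are assumed.
import torch

def cmal_loss(log_probs, sampled_words, replace_word, sentence_reward):
    """
    log_probs:       (T, V) log-probabilities emitted in parallel, one row per position
    sampled_words:   (T,)   word ids sampled from those distributions
    replace_word:    callable(position) -> alternative word id (e.g. the greedy word)
    sentence_reward: callable(word_ids) -> scalar sentence-level reward such as CIDEr
    """
    r = sentence_reward(sampled_words)
    losses = []
    for t in range(sampled_words.size(0)):
        counterfactual = sampled_words.clone()
        counterfactual[t] = replace_word(t)                # change only agent t's action
        advantage = r - sentence_reward(counterfactual)    # credit assigned to position t
        losses.append(-advantage * log_probs[t, sampled_words[t]])
    return torch.stack(losses).sum()
```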


Author(s):  
Haiyang Wei ◽  
Zhixin Li ◽  
Canlong Zhang ◽  
Tao Zhou ◽  
Yu Quan

2021 ◽  
Author(s):  
Buddhika Bellana ◽  
Abhijit Mahabal ◽  
Christopher John Honey

What we think about at any moment is shaped by what preceded it. Why do some experiences, such as reading an immersive story, feel as if they linger in mind beyond their conclusion? In this study, we hypothesize that the stream of our thinking is especially affected by "deeper" forms of processing, emphasizing the meaning and implications of a stimulus rather than its immediate physical properties or low-level semantics (e.g., reading a story vs. reading disconnected words). To test this idea, we presented participants with short stories that preserved different levels of coherence (word-level, sentence-level, or intact narrative), and we measured participants’ self-reports of lingering and spontaneous word generation. Participants reported that stories lingered in their minds after reading, but this effect was greatly reduced when the same words were read with the sentence order or word order randomly shuffled. Furthermore, the words that participants spontaneously generated after reading shared semantic meaning with the story’s central themes, particularly when the story was coherent (i.e., intact). Crucially, regardless of the objective coherence of what each participant read, lingering was strongest among participants who reported being ‘transported’ into the world of the story while reading. We further generalized this result to a non-narrative stimulus, finding that participants reported lingering after reading a list of words, especially when they had sought an underlying narrative or theme across the words. We conclude that recent experiences are most likely to provide a lasting mental context when we seek to extract and represent their deep, situation-level meaning.


2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare their sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
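The abstract describes a reward that is a mapping from contexts, learned via subgradients of a convex objective. The sketch below shows one common form such a subgradient step can take when the mapping is linear and progress is measured through feature expectations; the linear parameterization, the feature-expectation criterion, and the `solve_mdp` oracle are assumptions for illustration, not the authors' formulation.

```python
# Minimal, assumption-laden sketch of a contextual-IRL subgradient step:
# the reward weights are a linear function W of the context, and W is updated
# by comparing expert feature expectations against those of a best-response policy.
import numpy as np

def subgradient_step(W, context, expert_features, solve_mdp, lr=0.1):
    """
    W:               (d_reward, d_context) current reward mapping
    context:         (d_context,) sampled context defining this MDP
    expert_features: (d_reward,) discounted feature expectations of the expert
    solve_mdp:       callable(reward_weights) -> feature expectations of an optimal policy
    """
    reward_weights = W @ context
    agent_features = solve_mdp(reward_weights)         # best response to the current reward
    # a subgradient of the value gap w.r.t. W is an outer product of the
    # feature-expectation mismatch with the context
    subgrad = np.outer(agent_features - expert_features, context)
    return W - lr * subgrad
```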


2021 ◽  
Author(s):  
Amarildo Likmeta ◽  
Alberto Maria Metelli ◽  
Giorgia Ramponi ◽  
Andrea Tirinzoni ◽  
Matteo Giuliani ◽  
...  

In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and present three application scenarios: (1) the high-level decision-making problem in a highway driving scenario; (2) inferring user preferences in a social network (Twitter); and (3) the management of water release in Lake Como. For each of these scenarios, we provide a formalization, experiments, and a discussion to interpret the obtained results.


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed extraction sequence of mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeated runs of this simulator, combined with a reward function that assigns a score to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new assignment is requested, a well-informed decision can be made quickly. The approach is tested at a copper–gold mining complex characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in production targets, metal production, and fleet management.
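A minimal sketch of how such a learned dispatcher can be wired up: a Q-network scores candidate destinations from the current fleet state, and an epsilon-greedy rule picks one whenever a truck requests an assignment. The network size, state encoding, and epsilon value are illustrative assumptions rather than the configuration used in the paper.

```python
# Minimal sketch (illustrative assumptions, not the authors' model) of the
# dispatching loop trained with deep Q-learning from simulator experiences.
import torch
import torch.nn as nn

class DispatchQNet(nn.Module):
    def __init__(self, state_dim, n_destinations, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_destinations),   # one Q-value per candidate shovel/dump
        )

    def forward(self, state):
        return self.net(state)

def dispatch(qnet, state, epsilon=0.1):
    # Epsilon-greedy choice of destination for the requesting truck;
    # during deployment the greedy action alone would typically be used.
    if torch.rand(1).item() < epsilon:
        return torch.randint(qnet.net[-1].out_features, (1,)).item()
    with torch.no_grad():
        return qnet(state).argmax().item()
```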

