Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Author(s):  
Qiuyuan Huang ◽  
Zhe Gan ◽  
Asli Celikyilmaz ◽  
Dapeng Wu ◽  
Jianfeng Wang ◽  
...  

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
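
As a rough illustration of this two-level decoding scheme, the following PyTorch sketch pairs a high-level topic planner with a topic-conditioned sentence decoder. All names and dimensions are illustrative assumptions, and topic conditioning is simplified to concatenation rather than the paper's semantic compositional network:

import torch
import torch.nn as nn

class HighLevelDecoder(nn.Module):
    """Plans the story: one topic distribution per image in the sequence."""
    def __init__(self, img_dim=512, hidden=256, num_topics=50):
        super().__init__()
        self.rnn = nn.GRU(img_dim, hidden, batch_first=True)
        self.topic_head = nn.Linear(hidden, num_topics)

    def forward(self, image_feats):               # (B, T, img_dim)
        h, _ = self.rnn(image_feats)              # (B, T, hidden)
        return self.topic_head(h).softmax(-1)     # topic distribution per image

class LowLevelDecoder(nn.Module):
    """Generates word logits for one image, conditioned on its planned topic."""
    def __init__(self, img_dim=512, num_topics=50, hidden=256, vocab=10000):
        super().__init__()
        self.rnn = nn.GRU(img_dim + num_topics, hidden, batch_first=True)
        self.word_head = nn.Linear(hidden, vocab)

    def forward(self, img_feat, topic, max_len=20):
        # Broadcast the image+topic context over the sentence length.
        ctx = torch.cat([img_feat, topic], dim=-1)       # (B, img_dim + topics)
        seq = ctx.unsqueeze(1).repeat(1, max_len, 1)
        h, _ = self.rnn(seq)
        return self.word_head(h)                         # (B, max_len, vocab)

high, low = HighLevelDecoder(), LowLevelDecoder()
feats = torch.randn(2, 5, 512)            # 2 stories, 5 images each
topics = high(feats)                      # (2, 5, 50): the story plan
logits = low(feats[:, 0], topics[:, 0])   # sentence logits for the first image

In the paper both decoders are trained jointly end-to-end with reinforcement learning; the sketch shows only the forward structure of the hierarchy.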

Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which means that exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While such methods hold the promise of better local exploration, discovering global exploration strategies remains beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our curiosity signal is driven by a fast reward that handles local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
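
A minimal NumPy sketch of the two-timescale curiosity signal described above, with stand-in reconstruction models; the weighting coefficients and context windows are assumptions, not the paper's settings:

import numpy as np

def reconstruction_error(model, context, observation):
    """Curiosity as the error in reconstructing an observation from its context."""
    return float(np.mean((model(context) - observation) ** 2))

def intrinsic_reward(fast_model, slow_model, short_ctx, long_ctx, obs,
                     beta_fast=1.0, beta_slow=0.5):
    # Fast reward: local exploration, judged against a short recent context.
    r_fast = reconstruction_error(fast_model, short_ctx, obs)
    # Slow reward: long-horizon exploration, judged against an extended context.
    r_slow = reconstruction_error(slow_model, long_ctx, obs)
    return beta_fast * r_fast + beta_slow * r_slow

# Stand-in predictors; in practice these would be learned reconstruction models.
fast = slow = lambda ctx: ctx.mean(axis=0)
rng = np.random.default_rng(0)
r = intrinsic_reward(fast, slow, rng.random((5, 4)), rng.random((50, 4)),
                     obs=rng.random(4))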


2020 ◽  
Vol 34 (09) ◽  
pp. 13659-13662
Author(s):  
Giuseppe De Giacomo ◽  
Luca Iocchi ◽  
Marco Favorito ◽  
Fabio Patrizi

In this work we have investigated the concept of the “restraining bolt”, inspired by science fiction. We have two distinct sets of features extracted from the world, one by the agent and one by the authority imposing some restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated, since they are of interest to independent parties; however, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent operating on a set of low-level (subsymbolic) features, while the restraining bolt is specified logically using linear temporal logic over finite traces (LTLf/LDLf) over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.
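
The following toy sketch illustrates the restraining-bolt idea under the assumption that the LTLf/LDLf specification has already been compiled into a deterministic finite automaton (DFA) over high-level fluents, with the agent's reward shaped by the automaton's progress. The class and transition encoding are hypothetical, not the paper's construction:

class RestrainingBolt:
    """DFA tracker for a pre-compiled LTLf/LDLf restraining specification."""
    def __init__(self, transitions, accepting, start=0):
        self.transitions = transitions   # {(dfa_state, fluent): next_dfa_state}
        self.accepting = accepting       # set of accepting DFA states
        self.state = start

    def step(self, fluent, bonus=1.0):
        # Advance the automaton on an observed high-level fluent and return
        # a shaping reward that is added to the agent's extrinsic reward.
        self.state = self.transitions.get((self.state, fluent), self.state)
        return bonus if self.state in self.accepting else 0.0

# Example: "eventually do A and then B", as a hand-written 3-state DFA.
bolt = RestrainingBolt(transitions={(0, "A"): 1, (1, "B"): 2}, accepting={2})
shaped = bolt.step("A") + bolt.step("B")   # 0.0 + 1.0 once the spec is met

The key point is that the bolt's fluents are high-level symbolic features, while the learning agent itself operates only on its own low-level features.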


2020 ◽  
Vol 34 (05) ◽  
pp. 9185-9192
Author(s):  
Ruize Wang ◽  
Zhongyu Wei ◽  
Piji Li ◽  
Qi Zhang ◽  
Xuanjing Huang

Visual storytelling aims at generating a story from an image stream. Most existing methods represent images directly with extracted high-level features, which is neither intuitive nor easy to interpret. We argue that translating each image into a graph-based semantic representation, i.e., a scene graph, which explicitly encodes the objects and relationships detected within the image, would benefit representing and describing images. To this end, we propose a novel graph-based architecture for visual storytelling that models two levels of relationships on scene graphs. In particular, at the within-image level, we employ a Graph Convolution Network (GCN) to enrich the local fine-grained region representations of objects on scene graphs. To further model the interaction among images, at the cross-image level, a Temporal Convolution Network (TCN) is utilized to refine the region representations along the temporal dimension. The relation-aware representations are then fed into a Gated Recurrent Unit (GRU) with an attention mechanism for story generation. Experiments are conducted on the public visual storytelling dataset. Automatic and human evaluation results indicate that our method achieves state-of-the-art performance.
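
A schematic PyTorch sketch of the described pipeline (GCN over each scene graph, TCN across the image stream, GRU decoder); the attention mechanism is omitted and all layer sizes are illustrative assumptions:

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbour features via adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):            # x: (N, dim), adj: (N, N)
        return torch.relu(self.lin(adj @ x))

class StoryModel(nn.Module):
    def __init__(self, dim=256, vocab=10000):
        super().__init__()
        self.gcn = GCNLayer(dim)
        # The TCN refines pooled region features across the image stream.
        self.tcn = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, region_feats, adjs):  # one (features, adjacency) per image
        pooled = [self.gcn(x, a).mean(0) for x, a in zip(region_feats, adjs)]
        seq = torch.stack(pooled).unsqueeze(0)                    # (1, T, dim)
        seq = self.tcn(seq.transpose(1, 2)).transpose(1, 2).contiguous()
        h, _ = self.gru(seq)
        return self.out(h)                                # word logits per step

model = StoryModel()
feats = [torch.randn(n, 256) for n in (4, 6, 5)]   # detected regions per image
adjs = [torch.eye(n) for n in (4, 6, 5)]           # scene-graph adjacency
logits = model(feats, adjs)                        # (1, 3, vocab)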


2020 ◽  
pp. 174569162095069
Author(s):  
Yaacov Trope ◽  
Alison Ledgerwood ◽  
Nira Liberman ◽  
Kentaro Fujita

Adaptive functioning requires the ability both to immerse oneself in the here and now and to move beyond current experience. We leverage and expand construal-level theory to understand how individuals and groups regulate thoughts, feelings, and behavior to address both proximal and distal ends. To connect to distal versus proximal events in a way that meaningfully informs and guides responses in the immediate here and now, people must expand versus contract their regulatory scope. We propose that humans have evolved a number of mental and social tools that enable the modulation of regulatory scope and address the epistemic, emotive, and executive demands of regulation. Critically, across these tools, it is possible to distinguish a hierarchy that varies in abstractness: whereas low-level tools enable contractive scope, high-level tools enable expansive scope. We review empirical results that support these assertions and highlight the novel insights that a regulatory-scope framework provides for understanding diverse phenomena.


Symmetry ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1335
Author(s):  
Menghao Wu ◽  
Yanbin Gao ◽  
Pengfei Wang ◽  
Fan Zhang ◽  
Zhejun Liu

In robotics, obstacle avoidance is an essential ability for distance-sensor-based robots. This type of robot has axisymmetrically distributed distance sensors to acquire obstacle distances, so the state is symmetrical. Training the control policy with reinforcement learning has become a common approach. Considering the complexity of environments such as narrow paths and right-angle turns, robots perform better if the control policy can control steering direction and speed simultaneously. This paper proposes the multi-dimensional action control (MDAC) approach based on reinforcement learning, which can be used in tasks with multiple continuous action spaces. It adopts a hierarchical structure with high- and low-level modules: the low-level policies output concrete actions, and the high-level policy determines when to invoke the low-level modules according to the environment's features. We design robot navigation experiments with continuous action spaces to test the method's performance. The approach is end-to-end and can solve complex obstacle avoidance tasks in navigation.
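
A minimal sketch of this hierarchical control loop, assuming a hypothetical gating rule for the high-level policy and linear low-level modules; in the paper both levels are learned, not hand-written:

import numpy as np

class LowLevelPolicy:
    """Outputs a concrete continuous action: (steering, speed)."""
    def __init__(self, weights):
        self.w = weights

    def act(self, state):
        return np.tanh(self.w @ state)     # bounded two-dimensional action

def high_level_select(state, modules):
    # High-level policy: decide which low-level module to invoke this step.
    # A stand-in rule on the symmetric sensor state; learned in the paper.
    return modules[0] if state.min() > 0.5 else modules[1]

rng = np.random.default_rng(0)
modules = [LowLevelPolicy(rng.normal(size=(2, 8))) for _ in range(2)]
state = rng.random(8)                      # 8 axisymmetric distance readings
steering, speed = high_level_select(state, modules).act(state)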


2021 ◽  
Vol 11 (22) ◽  
pp. 10595
Author(s):  
Wenlong Zhao ◽  
Zhijun Meng ◽  
Kaipeng Wang ◽  
Jiahui Zhang ◽  
Shaoze Lu

Active tracking control is essential for UAVs performing autonomous operations in GPS-denied environments. In the active tracking task, UAVs take high-dimensional raw images as input and execute motor actions to actively follow a dynamic target. Most research focuses on three-stage methods, which entail perception first, followed by high-level decision-making based on the extracted spatial information of the dynamic target, and then UAV movement control using a low-level dynamic controller. Perception methods based on deep neural networks are powerful but require considerable effort for manual ground-truth labeling. Instead, we unify the perception and decision-making stages in a single high-level controller and leverage deep reinforcement learning to learn the mapping from raw images to high-level action commands in a V-REP-based environment, where simulation data are infinite and inexpensive. This end-to-end method also has the advantages of a small parameter size and reduced effort for parameter tuning in the decision-making stage. The high-level controller, which has a novel architecture, explicitly encodes the spatial and temporal features of the dynamic target. Auxiliary segmentation and motion-in-depth losses are introduced to generate denser training signals for fast and stable training of the high-level controller. The high-level controller and a conventional low-level PID controller together constitute our hierarchical active tracking control framework for the UAVs' active tracking task. Simulation experiments show that our controller, trained with several augmentation techniques, generalizes well to dynamic targets with random appearances and velocities and achieves significantly better performance compared with three-stage methods.
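
As a sketch of the low-level stage of such a framework, the following PID controller tracks a velocity command that a learned high-level controller would emit from raw images; the gains, time step, and command interface are illustrative assumptions:

class PID:
    """Conventional PID loop used as the low-level controller."""
    def __init__(self, kp, ki, kd, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# The learned high-level controller maps raw images to a velocity command;
# here that command is a placeholder value, tracked by the PID at 50 Hz.
yaw_pid = PID(kp=1.2, ki=0.05, kd=0.3)
motor_output = yaw_pid.step(setpoint=0.4, measurement=0.1)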


Author(s):  
Rundong Wang ◽  
Runsheng Yu ◽  
Bo An ◽  
Zinovi Rabinovich

Hierarchical reinforcement learning (HRL) is a promising approach for solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy assigning subgoals to a low-level policy. However, it suffers from the high-level non-stationarity problem, since the low-level policy is constantly changing. This non-stationarity also leads to a data efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I^2HRL). First, inspired by agent modeling, we enable interaction between the low-level and high-level policies to stabilize high-level policy training: the high-level policy makes decisions conditioned on the received low-level policy representation as well as the state of the environment. Second, we further stabilize the high-level policy via an information-theoretic regularization with minimal dependence on the changing low-level policy. Third, we propose influence-based exploration to visit more frequently the non-stationary states where more transition data are needed. We experimentally validate the effectiveness of the proposed solution in several tasks in MuJoCo domains, demonstrating that our approach can significantly boost learning performance and accelerate learning compared with state-of-the-art HRL methods.
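
A schematic PyTorch sketch of the first ingredient, a high-level policy conditioned on both the environment state and a representation of the current low-level policy; the dimensions and subgoal space are assumptions, not the paper's settings:

import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    def __init__(self, state_dim=32, policy_repr_dim=16, n_subgoals=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_repr_dim, 64), nn.ReLU(),
            nn.Linear(64, n_subgoals),
        )

    def forward(self, state, low_policy_repr):
        # Conditioning on an embedding of the current low-level policy lets
        # the high-level policy account for its constantly changing partner.
        x = torch.cat([state, low_policy_repr], dim=-1)
        return self.net(x).softmax(-1)     # distribution over subgoals

policy = HighLevelPolicy()
subgoal_probs = policy(torch.randn(1, 32), torch.randn(1, 16))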

