Multi-View Deep Attention Network for Reinforcement Learning (Student Abstract)

2020 ◽  
Vol 34 (10) ◽  
pp. 13811-13812
Author(s):  
Yueyue Hu ◽  
Shiliang Sun ◽  
Xin Xu ◽  
Jing Zhao

The representation approximated by a single deep network is usually limited for reinforcement learning agents. We propose a novel multi-view deep attention network (MvDAN), which introduces multi-view representation learning into the reinforcement learning task for the first time. The proposed model approximates a set of strategies from multiple representations and combines these strategies based on attention mechanisms to provide a comprehensive strategy for a single agent. Experimental results on eight Atari video games show that MvDAN achieves more effective and competitive performance than single-view reinforcement learning methods.
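A minimal sketch of the attention-based combination of per-view strategies described above, assuming each view yields its own action-value estimates and an unnormalized relevance score; the function name `combine_views` and the fixed scores are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def combine_views(view_q_values, view_scores):
    """Blend per-view action values with attention weights.

    view_q_values: (n_views, n_actions) action-value estimates, one row per view.
    view_scores:   (n_views,) unnormalized relevance scores for each view.
    Returns a single (n_actions,) action-value vector for the agent.
    """
    weights = softmax(view_scores)   # attention over views
    return weights @ view_q_values   # weighted sum of the views' strategies

# Toy usage: three views, four actions.
q = np.array([[1.0, 0.2, 0.0, 0.5],
              [0.8, 0.9, 0.1, 0.3],
              [0.2, 0.1, 1.2, 0.4]])
scores = np.array([2.0, 0.5, 1.0])
combined = combine_views(q, scores)
print(combined)                 # comprehensive strategy over actions
print(int(np.argmax(combined))) # greedy action
```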

Author(s):  
Penghui Wei ◽  
Wenji Mao ◽  
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, yet existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address these two issues, we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning to identify stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other's performance. Experimental results demonstrate that TARM outperforms the state-of-the-art approaches.
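One way to picture the policy network's role is as a bandit-style instance selector trained with a score-function (REINFORCE-like) update; this is only a hedged sketch of that idea, not TARM's actual architecture, and the reward stand-in and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_instances(keep_probs, rng):
    """Sample a binary keep/drop decision per auto-labeled instance."""
    return rng.random(keep_probs.shape) < keep_probs

def policy_update(keep_probs, kept, reward, baseline, lr=0.1):
    """Score-function update (scaled by p*(1-p) for stability): raise keep
    probabilities for instances kept when the detection network improved
    (reward above baseline), lower them otherwise."""
    advantage = reward - baseline
    grad = np.where(kept, 1.0 - keep_probs, -keep_probs)
    return np.clip(keep_probs + lr * advantage * grad, 0.01, 0.99)

# Toy loop: pretend the detection network's validation accuracy is the reward.
keep_probs = np.full(8, 0.5)
baseline = 0.0
for step in range(5):
    kept = select_instances(keep_probs, rng)
    reward = rng.random()  # stand-in for detection accuracy on the kept subset
    keep_probs = policy_update(keep_probs, kept, reward, baseline)
    baseline = 0.9 * baseline + 0.1 * reward
print(keep_probs)
```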


Author(s):  
Matteo Hessel ◽  
Hubert Soyer ◽  
Lasse Espeholt ◽  
Wojciech Czarnecki ◽  
Simon Schmitt ◽  
...  

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained on one task at a time, and each new task requires training a brand-new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent's updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
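A minimal sketch of the core idea of rescaling each task's learning signal by that task's own running return statistics (PopArt-style, but without the output-layer correction the full method applies); the class name and constants are illustrative.

```python
import numpy as np

class TaskReturnNormalizer:
    """Per-task running statistics used to rescale returns so that no single
    task dominates the shared agent's updates."""

    def __init__(self, n_tasks, beta=0.01, eps=1e-6):
        self.mean = np.zeros(n_tasks)
        self.sq_mean = np.ones(n_tasks)
        self.beta = beta
        self.eps = eps

    def update(self, task_id, g):
        """Update running first and second moments of returns for this task."""
        self.mean[task_id] += self.beta * (g - self.mean[task_id])
        self.sq_mean[task_id] += self.beta * (g * g - self.sq_mean[task_id])

    def normalize(self, task_id, g):
        """Express the task's return in units of that task's own scale."""
        var = max(self.sq_mean[task_id] - self.mean[task_id] ** 2, self.eps)
        return (g - self.mean[task_id]) / np.sqrt(var)

# Toy usage: task 0 has returns ~100x larger than task 1, yet both produce
# learning signals of comparable magnitude after normalization.
norm = TaskReturnNormalizer(n_tasks=2)
rng = np.random.default_rng(1)
for _ in range(1000):
    norm.update(0, 100.0 + 10.0 * rng.standard_normal())
    norm.update(1, 1.0 + 0.1 * rng.standard_normal())
print(norm.normalize(0, 120.0), norm.normalize(1, 1.2))
```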


Author(s):  
Michael Janner ◽  
Karthik Narasimhan ◽  
Regina Barzilay

The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
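For readers unfamiliar with the planning component, here is a plain value-iteration sketch on a grid world; the paper's model uses a learned, instruction-conditioned variant of generalized value iteration, so the reward map and four-neighbor dynamics below are purely illustrative.

```python
import numpy as np

def value_iteration(rewards, gamma=0.95, n_iters=50):
    """Plain value iteration on a grid world: each cell's value is its reward
    plus the discounted value of the best neighboring cell."""
    h, w = rewards.shape
    values = np.zeros((h, w))
    for _ in range(n_iters):
        padded = np.pad(values, 1, constant_values=-np.inf)
        neighbors = np.stack([padded[:-2, 1:-1],   # up
                              padded[2:, 1:-1],    # down
                              padded[1:-1, :-2],   # left
                              padded[1:-1, 2:]])   # right
        values = rewards + gamma * neighbors.max(axis=0)
    return values

# Toy usage: a 5x5 world where the goal cell carries the instruction-derived reward.
rewards = np.zeros((5, 5))
rewards[4, 4] = 1.0
print(np.round(value_iteration(rewards), 3))
```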


2020 ◽  
Vol 34 (05) ◽  
pp. 7236-7243
Author(s):  
Heechang Ryu ◽  
Hayong Shin ◽  
Jinkyoo Park

Most previous studies on multi-agent reinforcement learning focus on deriving decentralized and cooperative policies to maximize a common reward and rarely consider the transferability of trained policies to new tasks. This prevents such policies from being applied to more complex multi-agent tasks. To resolve these limitations, we propose a model that conducts both representation learning for multiple agents using a hierarchical graph attention network and policy learning using a multi-agent actor-critic. The hierarchical graph attention network is specially designed to model the hierarchical relationships among multiple agents that either cooperate or compete with each other, in order to derive more advanced strategic policies. Two attention networks, the inter-agent and inter-group attention layers, are used to effectively model individual- and group-level interactions, respectively. The two attention networks are shown to facilitate the transfer of learned policies to new tasks with different agent compositions and to allow one to interpret the learned strategies. Empirically, we demonstrate that the proposed model outperforms existing methods in several mixed cooperative and competitive tasks.
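A minimal sketch of two-level attention pooling in the spirit of the inter-agent and inter-group layers: attention over agents inside each group, then attention over the resulting group embeddings. The helper names and the mean-query choice are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values

def hierarchical_embedding(agent_feats, groups):
    """Two-level pooling: inter-agent attention inside each group, then
    inter-group attention over the resulting group embeddings."""
    group_embs = []
    for members in groups:
        feats = agent_feats[members]
        pooled = attention_pool(feats, feats, feats).mean(axis=0)  # inter-agent level
        group_embs.append(pooled)
    group_embs = np.stack(group_embs)
    query = group_embs.mean(axis=0, keepdims=True)
    return attention_pool(query, group_embs, group_embs)[0]        # inter-group level

# Toy usage: five agents split into two groups (e.g., teammates vs. opponents).
rng = np.random.default_rng(2)
feats = rng.standard_normal((5, 8))
print(hierarchical_embedding(feats, groups=[[0, 1, 2], [3, 4]]).shape)  # (8,)
```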


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1363
Author(s):  
Nelson Vithayathil Varghese ◽  
Qusay H. Mahmoud

Driven by recent technological advancements within the field of artificial intelligence research, deep learning has emerged as a promising representation learning technique across all of the machine learning classes, especially within the reinforcement learning arena. This new direction has given rise to a new technological domain named deep reinforcement learning, which combines the representational learning power of deep learning with existing reinforcement learning methods. Undoubtedly, the inception of deep reinforcement learning has played a vital role in optimizing the performance of reinforcement learning-based intelligent agents using model-free approaches. Although these methods could improve the performance of agents to a great extent, they were mainly limited to systems that adopted reinforcement learning algorithms focused on learning a single task. At the same time, this approach was found to be relatively data-inefficient, particularly when reinforcement learning agents needed to interact with more complex and rich data environments. This is primarily due to the limited applicability of deep reinforcement learning algorithms to many scenarios across related tasks from the same environment. The objective of this paper is to survey the research challenges associated with multi-tasking within the deep reinforcement learning arena and to present the state-of-the-art approaches by comparing and contrasting recent solutions, namely DISTRAL (DIStill & TRAnsfer Learning), IMPALA (Importance Weighted Actor-Learner Architecture), and PopArt, that aim to address core challenges such as scalability, the distraction dilemma, partial observability, catastrophic forgetting, and negative knowledge transfer.


2019 ◽  
Author(s):  
Motofumi Sumiya ◽  
Kentaro Katahira

Surprise occurs because of differences between a decision outcome and its predicted outcome (prediction error), regardless of whether the error is positive or negative. It has recently been postulated that surprise affects the reward value of the action outcome itself; studies have indicated that increasing surprise, as the absolute value of the prediction error, decreases the value of the outcome. However, how surprise affects the value of the outcome and subsequent decision making is unclear. We suggest that, on the assumption that surprise decreases the outcome value, agents will make more risk-averse choices when an outcome is often surprising. Here, we propose the surprise-sensitive utility model, a reinforcement learning model in which surprise decreases the outcome value, to explain how surprise affects subsequent decision-making. To investigate this assumption, we compared this model with previous reinforcement learning models on a risky probabilistic learning task through simulation analysis, and performed model selection on two experimental datasets with different tasks and populations. We further simulated a simple decision-making task to investigate how the parameters of the proposed model modulate choice preference. We found that the proposed model explains risk-averse choices in a manner similar to the previous models, and that risk-averse choices increased as the surprise-based modulation parameter of the outcome value increased. The model fits these datasets better than the other models with the same number of free parameters, thus providing a more parsimonious and robust account of risk-averse choices. These findings indicate that surprise acts as a reducer of outcome value and decreases the action value of risky choices, in which prediction errors often occur.
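A minimal sketch of the surprise-sensitive idea as a modified delta-rule update: the outcome's subjective value is reduced in proportion to the absolute prediction error before it drives learning. Parameter names (`alpha`, `kappa`) and the toy risky-vs-safe example are assumptions for illustration, not the authors' exact parameterization.

```python
import numpy as np

def surprise_sensitive_update(q, action, reward, alpha=0.3, kappa=0.5):
    """One sketched update of a surprise-sensitive utility model: the outcome's
    subjective value is devalued by kappa times the absolute prediction error,
    then used in a standard delta-rule update (kappa=0 recovers plain Q-learning)."""
    prediction_error = reward - q[action]
    subjective_reward = reward - kappa * abs(prediction_error)  # surprise devalues the outcome
    q[action] += alpha * (subjective_reward - q[action])
    return q

# Toy usage: a risky option (variable payoff) vs. a safe option (fixed payoff).
rng = np.random.default_rng(3)
q = np.zeros(2)
for _ in range(500):
    q = surprise_sensitive_update(q, 0, rng.choice([0.0, 2.0]))  # risky: mean payoff 1.0
    q = surprise_sensitive_update(q, 1, 1.0)                     # safe: always 1.0
print(q)  # the risky option ends up valued below the equally-rewarding safe one
```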


2021 ◽  
Vol 8 ◽  
Author(s):  
Pablo Barros ◽  
Anne C. Bloem ◽  
Inge M. Hootsmans ◽  
Lena M. Opheij ◽  
Romain H. A. Toebosch ◽  
...  

Reinforcement learning simulation environments provide an important experimental test bed and facilitate data collection for developing AI-based robot applications. Most of them, however, focus on single-agent tasks, which limits their application to the development of social agents. This study proposes the Chef's Hat simulation environment, which implements a multi-agent competitive card game that is a complete reproduction of the homonymous board game, designed to provoke competitive strategies and emotional responses in humans. The game was shown to be ideal for developing personalized reinforcement learning in an online-learning closed-loop scenario, as its state representation is extremely dynamic and directly related to each of the opponents' actions. To adapt current reinforcement learning agents to this scenario, we also developed the COmPetitive Prioritized Experience Replay (COPPER) algorithm. With the help of COPPER and the Chef's Hat simulation environment, we evaluated the following: (1) 12 experimental learning agents, trained via four different regimens (self-play, play against a naive baseline, PER, or COPPER) with three algorithms based on different state-of-the-art learning paradigms (PPO, DQN, and ACER), plus two "dummy" baseline agents that take random actions; (2) the performance difference between COPPER and PER agents trained using the PPO algorithm and playing against different agents (PPO, DQN, and ACER) or all DQN agents; and (3) human performance when playing against two different collections of agents. Our experiments demonstrate that COPPER helps agents learn to adapt to different types of opponents, improving performance compared to offline learning models. An additional contribution of the study is the formalization of the Chef's Hat competitive game and the implementation of the Chef's Hat Player Club, a collection of trained and assessed agents, as an enabler for embedding human competitive strategies in social, continual, and competitive reinforcement learning.
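For context, here is a minimal sketch of the prioritized experience replay (PER) baseline the study compares against; COPPER itself additionally shapes priorities around the opponents' behavior, which is not reproduced here, and the class and parameter names are illustrative.

```python
import numpy as np

class PrioritizedReplay:
    """Minimal prioritized experience replay buffer: transitions with larger
    TD error receive higher sampling probability."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        priority = (abs(td_error) + 1e-5) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size, rng):
        probs = np.array(self.priorities)
        probs = probs / probs.sum()
        idx = rng.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

# Toy usage: later transitions have larger TD errors, so they are sampled more often.
rng = np.random.default_rng(4)
replay = PrioritizedReplay(capacity=100)
for i in range(10):
    replay.add(transition=(f"s{i}", "a", 0.0, f"s{i+1}"), td_error=float(i))
batch, idx = replay.sample(batch_size=4, rng=rng)
print(idx)
```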


Author(s):  
Marco Boaretto ◽  
Gabriel Chaves Becchi ◽  
Luiza Scapinello Aquino ◽  
Aderson Cleber Pifer ◽  
Helon Vicente Hultmann Ayala ◽  
...  

2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
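One common way to operationalize the joint contribution of the two systems is a weighted mixture of model-based and model-free action values fed through a softmax choice rule; this is a hedged illustration of that general analysis style, not the authors' fitted model, and the weights and values below are invented for the example.

```python
import numpy as np

def hybrid_values(q_model_free, q_model_based, w):
    """Weighted mixture of model-based and model-free action values; w captures
    an individual's relative reliance on model-based (planning-like) learning."""
    return w * q_model_based + (1.0 - w) * q_model_free

def choice_probabilities(q_hybrid, inverse_temp=3.0):
    """Softmax choice rule over the mixed values."""
    z = inverse_temp * (q_hybrid - q_hybrid.max())
    e = np.exp(z)
    return e / e.sum()

# Toy usage: advisor 0 paid well in the past (high model-free value) while
# advisor 1 leads to better future payoffs (high model-based value).
q_mf = np.array([0.8, 0.3])
q_mb = np.array([0.2, 0.9])
for w in (0.2, 0.8):  # habit-dominated vs. planning-dominated participants
    print(w, choice_probabilities(hybrid_values(q_mf, q_mb, w)))
```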

