The Validity of the Handicap Principle in Discrete Action–Response Games

1999 ◽  
Vol 198 (4) ◽  
pp. 593-602 ◽  
Author(s):  
Szabolcs Számadó

2021 ◽  
Vol 126 (20) ◽  
Author(s):  
Zhengqian Cheng ◽  
Chris A. Marianetti

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tyler J. Adkins ◽  
Bradley S. Gary ◽  
Taraz G. Lee

Incentives can be used to increase motivation, leading to better learning and performance on skilled motor tasks. Prior work has shown that monetary punishments enhance on-line performance while equivalent monetary rewards enhance off-line skill retention. However, a large body of literature on loss aversion has shown that losses are perceived as larger than equivalent gains. The divergence between the effects of punishment and reward on motor learning could therefore be due to perceived differences in incentive value rather than valence per se. We test this hypothesis by manipulating incentive value and valence while participants trained to perform motor sequences. Consistent with our hypothesis, we found that large rewards enhanced on-line performance but impaired the ability to retain the level of performance achieved during training. However, we also found that on-line performance was better with reward than with punishment, and that the effect of increasing incentive value was more linear with reward (small, medium, large) while it was more binary with punishment (large vs. not large). These results suggest that punishment and reward have differential effects on motor learning, and that these effects of valence are unlikely to be driven by differences in the subjective magnitude of gains and losses.
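The contrast the abstract draws between a linear effect of incentive value and a binary (large vs. not large) effect can be made concrete with a toy comparison of two regressors. The sketch below is purely illustrative and is not the authors' analysis: the incentive amounts and performance scores are invented placeholders.

# Hypothetical illustration (not the study's data or analysis): comparing a linear
# coding of incentive value with a binary "large vs. not large" coding when relating
# incentive size to a performance score.
import numpy as np

# Assumed incentive levels and made-up performance scores, for illustration only.
value = np.array([1, 10, 30, 1, 10, 30, 1, 10, 30], dtype=float)
score = np.array([0.52, 0.58, 0.66, 0.50, 0.55, 0.64, 0.53, 0.57, 0.67])

def r_squared(x, y):
    # Fit y ~ a*x + b by least squares and return the coefficient of determination.
    X = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

linear_fit = r_squared(value, score)                                  # value enters linearly
binary_fit = r_squared((value == value.max()).astype(float), score)  # large vs. not large

print(f"linear coding R^2: {linear_fit:.3f}, binary coding R^2: {binary_fit:.3f}")

Whichever coding fits better indicates whether performance scales with the size of the incentive or merely with whether the incentive is large.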


2021 ◽  
pp. 1-10
Author(s):  
Wei Zhou ◽  
Xing Jiang ◽  
Bingli Guo ◽  
Lingyu Meng

Quality-of-Service (QoS)-aware routing is currently one of the crucial challenges in Software-Defined Networking (SDN). QoS metrics such as latency, packet loss ratio, and throughput must be optimized to improve network performance. Traditional static routing algorithms based on Open Shortest Path First (OSPF) cannot adapt to traffic fluctuations, which may cause severe network congestion and service degradation. The central intelligence of the SDN controller and recent breakthroughs in Deep Reinforcement Learning (DRL) offer a promising way to tackle this challenge. We therefore propose an on-policy DRL mechanism, the PPO-based (Proximal Policy Optimization) QoS-aware Routing Optimization Mechanism (PQROM), to achieve general and re-customizable routing optimization. PQROM can dynamically update the routing calculation by adjusting the reward function according to different optimization objectives, and it is independent of any specific network pattern. Additionally, as a black-box one-step optimization, PQROM handles both continuous and discrete action spaces with high-dimensional input and output. OMNeT++ simulation results show that PQROM not only converges well, but also offers better stability than OSPF, less training time and simpler hyper-parameter adjustment than Deep Deterministic Policy Gradient (DDPG), and lower hardware consumption than Asynchronous Advantage Actor-Critic (A3C).
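The abstract names two reusable ingredients: a reward function that can be re-weighted for different QoS objectives, and PPO's clipped surrogate loss. The sketch below illustrates both in PyTorch; the weights, network sizes, and the state/action encoding are assumptions for illustration, not details of PQROM.

# A minimal sketch, not the authors' implementation: a re-customizable QoS reward
# and the PPO clipped surrogate objective.
import torch
import torch.nn as nn

def qos_reward(latency, loss_ratio, throughput, w=(1.0, 1.0, 1.0)):
    # Reward rises with throughput and falls with latency and packet loss;
    # changing the weights w re-targets the optimization objective.
    return w[2] * throughput - w[0] * latency - w[1] * loss_ratio

class PathPolicy(nn.Module):
    # Maps a flattened traffic/state observation to a distribution over candidate paths.
    def __init__(self, obs_dim, n_paths):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_paths))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    # Clipped surrogate objective of Proximal Policy Optimization (to be minimized).
    log_probs = policy(obs).log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

The policy head shown is categorical (one discrete path choice per decision); swapping in a Gaussian output layer would cover the continuous action spaces the abstract also mentions.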


Nature ◽  
1976 ◽  
Vol 261 (5557) ◽  
pp. 192-192
Author(s):  
John Krebs

Author(s):  
Mohammadamin Barekatain ◽  
Ryo Yonetani ◽  
Masashi Hamaya

Transfer reinforcement learning (RL) aims to improve the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under diverse unknown dynamics is available for learning a target task efficiently. To address this problem, we propose MULTI-source POLicy AggRegation (MULTIPOLAR), which comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which preserves the target policy's expressiveness even when some of the source policies perform poorly. We demonstrate the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments, ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The demo videos and code are available on the project webpage: https://omron-sinicx.github.io/multipolar/.
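The two techniques described above can be summarized in a short module: a learnable aggregation over the source policies' actions, plus an auxiliary network that predicts a residual correction. The sketch below is a minimal illustration, not the paper's exact architecture: it assumes continuous actions, frozen source policies exposed as callables, and an element-wise weighting scheme, with layer sizes chosen arbitrarily.

# A minimal sketch of adaptive source-policy aggregation with a residual network;
# architectural details here are assumptions, not the paper's specification.
import torch
import torch.nn as nn

class Multipolar(nn.Module):
    def __init__(self, source_policies, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.sources = source_policies               # K frozen source policies: obs -> action
        k = len(source_policies)
        # Learnable element-wise aggregation weights over the K source actions.
        self.agg_weights = nn.Parameter(torch.ones(k, act_dim) / k)
        # Auxiliary network predicting residuals around the aggregated action.
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))

    def forward(self, obs):
        with torch.no_grad():                        # source policies are not updated
            src_actions = torch.stack([p(obs) for p in self.sources], dim=0)  # (K, B, act_dim)
        aggregated = (self.agg_weights.unsqueeze(1) * src_actions).sum(dim=0)
        return aggregated + self.residual(obs)       # residual keeps the policy expressive

The residual term is what lets the target policy deviate from the source policies when all of them transfer poorly, which matches the expressiveness argument in the abstract.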

