The Validity of the Handicap Principle in Discrete Action–Response Games

1999 ◽  
Vol 198 (4) ◽  
pp. 593-602 ◽  
Author(s):  
Szabolcs Számadó

2021 ◽  
Vol 126 (20) ◽  
Author(s):  
Zhengqian Cheng ◽  
Chris A. Marianetti

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tyler J. Adkins ◽  
Bradley S. Gary ◽  
Taraz G. Lee

Incentives can be used to increase motivation, leading to better learning and performance on skilled motor tasks. Prior work has shown that monetary punishments enhance on-line performance while equivalent monetary rewards enhance off-line skill retention. However, a large body of literature on loss aversion has shown that losses are perceived as larger than equivalent gains. The divergence between the effects of punishment and reward on motor learning could therefore be due to perceived differences in incentive value rather than valence per se. We test this hypothesis by manipulating incentive value and valence while participants trained to perform motor sequences. Consistent with our hypothesis, we found that large rewards enhanced on-line performance but impaired the ability to retain the level of performance achieved during training. However, we also found that on-line performance was better with reward than with punishment, and that the effect of increasing incentive value was more linear with reward (small, medium, large) while it was more binary with punishment (large vs. not large). These results suggest that punishment and reward have differential effects on motor learning, and that these effects of valence are unlikely to be driven by differences in the subjective magnitude of gains and losses.
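The contrast the abstract draws between a linear effect of incentive value and a binary (large vs. not large) effect can be made concrete with a toy comparison of two regressors. The sketch below is purely illustrative and is not the authors' analysis: the incentive amounts and performance scores are invented placeholders.

# Hypothetical illustration (not the study's data or analysis): comparing a linear
# coding of incentive value with a binary "large vs. not large" coding when relating
# incentive size to a performance score.
import numpy as np

# Assumed incentive levels and made-up performance scores, for illustration only.
value = np.array([1, 10, 30, 1, 10, 30, 1, 10, 30], dtype=float)
score = np.array([0.52, 0.58, 0.66, 0.50, 0.55, 0.64, 0.53, 0.57, 0.67])

def r_squared(x, y):
    # Fit y ~ a*x + b by least squares and return the coefficient of determination.
    X = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

linear_fit = r_squared(value, score)                                  # value enters linearly
binary_fit = r_squared((value == value.max()).astype(float), score)  # large vs. not large

print(f"linear coding R^2: {linear_fit:.3f}, binary coding R^2: {binary_fit:.3f}")

Whichever coding fits better indicates whether performance scales with the size of the incentive or merely with whether the incentive is large.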


2021 ◽  
pp. 1-10
Author(s):  
Wei Zhou ◽  
Xing Jiang ◽  
Bingli Guo ◽  
Lingyu Meng

Quality-of-Service (QoS)-aware routing is currently one of the crucial challenges in Software-Defined Networking (SDN). QoS metrics such as latency, packet loss ratio, and throughput must be optimized to improve network performance. Traditional static routing algorithms based on Open Shortest Path First (OSPF) cannot adapt to traffic fluctuations, which may cause severe network congestion and service degradation. The central intelligence of the SDN controller and recent breakthroughs in Deep Reinforcement Learning (DRL) offer a promising way to tackle this challenge. We therefore propose an on-policy DRL mechanism, the PPO-based (Proximal Policy Optimization) QoS-aware Routing Optimization Mechanism (PQROM), to achieve general and re-customizable routing optimization. PQROM can dynamically update the routing calculation by adjusting the reward function according to different optimization objectives, and it is independent of any specific network pattern. Additionally, as a black-box one-step optimization, PQROM handles both continuous and discrete action spaces with high-dimensional input and output. OMNeT++ simulation results show that PQROM not only converges well, but also offers better stability than OSPF, less training time and simpler hyper-parameter adjustment than Deep Deterministic Policy Gradient (DDPG), and lower hardware consumption than Asynchronous Advantage Actor-Critic (A3C).
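The abstract names two reusable ingredients: a reward function that can be re-weighted for different QoS objectives, and PPO's clipped surrogate loss. The sketch below illustrates both in PyTorch; the weights, network sizes, and the state/action encoding are assumptions for illustration, not details of PQROM.

# A minimal sketch, not the authors' implementation: a re-customizable QoS reward
# and the PPO clipped surrogate objective.
import torch
import torch.nn as nn

def qos_reward(latency, loss_ratio, throughput, w=(1.0, 1.0, 1.0)):
    # Reward rises with throughput and falls with latency and packet loss;
    # changing the weights w re-targets the optimization objective.
    return w[2] * throughput - w[0] * latency - w[1] * loss_ratio

class PathPolicy(nn.Module):
    # Maps a flattened traffic/state observation to a distribution over candidate paths.
    def __init__(self, obs_dim, n_paths):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_paths))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    # Clipped surrogate objective of Proximal Policy Optimization (to be minimized).
    log_probs = policy(obs).log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

The policy head shown is categorical (one discrete path choice per decision); swapping in a Gaussian output layer would cover the continuous action spaces the abstract also mentions.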


Nature ◽  
1976 ◽  
Vol 261 (5557) ◽  
pp. 192-192
Author(s):  
John Krebs

Author(s):  
Mohammadamin Barekatain ◽  
Ryo Yonetani ◽  
Masashi Hamaya

Transfer reinforcement learning (RL) aims to improve the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under diverse unknown dynamics is available for learning a target task efficiently. To address this problem, we propose MULTI-source POLicy AggRegation (MULTIPOLAR), which comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which preserves the target policy's expressiveness even when some of the source policies perform poorly. We demonstrate the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments, ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The demo videos and code are available on the project webpage: https://omron-sinicx.github.io/multipolar/.
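The two techniques described above can be summarized in a short module: a learnable aggregation over the source policies' actions, plus an auxiliary network that predicts a residual correction. The sketch below is a minimal illustration, not the paper's exact architecture: it assumes continuous actions, frozen source policies exposed as callables, and an element-wise weighting scheme, with layer sizes chosen arbitrarily.

# A minimal sketch of adaptive source-policy aggregation with a residual network;
# architectural details here are assumptions, not the paper's specification.
import torch
import torch.nn as nn

class Multipolar(nn.Module):
    def __init__(self, source_policies, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.sources = source_policies               # K frozen source policies: obs -> action
        k = len(source_policies)
        # Learnable element-wise aggregation weights over the K source actions.
        self.agg_weights = nn.Parameter(torch.ones(k, act_dim) / k)
        # Auxiliary network predicting residuals around the aggregated action.
        self.residual = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim))

    def forward(self, obs):
        with torch.no_grad():                        # source policies are not updated
            src_actions = torch.stack([p(obs) for p in self.sources], dim=0)  # (K, B, act_dim)
        aggregated = (self.agg_weights.unsqueeze(1) * src_actions).sum(dim=0)
        return aggregated + self.residual(obs)       # residual keeps the policy expressive

The residual term is what lets the target policy deviate from the source policies when all of them transfer poorly, which matches the expressiveness argument in the abstract.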

