A Deep Reinforcement Learning Approach to Concurrent Bilateral Negotiation
2021, Vol 35 (2)

Author(s): Pallavi Bagga, Nicola Paoletti, Bedour Alrayes, Kostas Stathis

We present a novel negotiation model that allows an agent to learn how to negotiate during concurrent bilateral negotiations in unknown and dynamic e-markets. The agent uses an actor-critic architecture with model-free reinforcement learning to learn a strategy expressed as a deep neural network. We pre-train the strategy by supervision from synthetic market data, thereby decreasing the exploration time required for learning during negotiation. As a result, we can build automated agents for concurrent negotiations that can adapt to different e-market settings without the need to be pre-programmed. Our experimental evaluation shows that our deep reinforcement learning-based agents outperform two existing well-known negotiation strategies in one-to-many concurrent bilateral negotiations for a range of e-market settings.
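As an illustration of the supervised pre-training step described above, the sketch below regresses an actor network onto offers from a teacher strategy before any reinforcement learning takes place. It is a minimal sketch only: the network shape, the 8-dimensional market state, and the concession-heuristic teacher are invented for illustration and are not the authors' actual architecture or data.

```python
import torch
import torch.nn as nn

class ActorNet(nn.Module):
    """Maps a market observation to an offer, here a target utility in [0, 1]."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.net(state)

def pretrain_actor(actor, states, teacher_offers, epochs=50, lr=1e-3):
    """Supervised pre-training on synthetic market data: regress the actor's
    output onto a teacher strategy's offers before RL fine-tuning, cutting
    down the exploration needed once live negotiation starts."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(actor(states), teacher_offers)
        loss.backward()
        opt.step()
    return actor

# Hypothetical synthetic data: 1024 market states of dimension 8; the teacher
# is a stand-in time-dependent concession heuristic, not the paper's.
states = torch.rand(1024, 8)
teacher_offers = 1.0 - 0.5 * states[:, :1]   # concede as "time" (feature 0) grows
actor = pretrain_actor(ActorNet(state_dim=8), states, teacher_offers)
```

Once pre-trained, such an actor would be plugged into the actor-critic loop and fine-tuned with model-free reinforcement learning against live opponents.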



Author(s): Francesco M. Solinas, Andrea Bellagarda, Enrico Macii, Edoardo Patti, Lorenzo Bottaccioli

2021
Author(s): Andre Menezes, Pedro Vicente, Alexandre Bernardino, Rodrigo Ventura

2021, Vol 4
Author(s): Marina Dorokhova, Christophe Ballif, Nicolas Wyrsch

In the past few years, the importance of electric mobility has increased in response to growing concerns about climate change. However, limited cruising range and sparse charging infrastructure could hinder the large-scale deployment of electric vehicles (EVs). To mitigate this problem, optimal route-planning algorithms are needed. In this paper, we propose a mathematical formulation of the EV-specific routing problem in a graph-theoretical context that incorporates the ability of EVs to recuperate energy. Furthermore, we consider the possibility of recharging en route at intermediary charging stations. As a solution method, we present an off-policy, model-free reinforcement learning approach that aims to generate energy-feasible paths for an EV from source to target. The algorithm was implemented and tested in a case study of a road network in Switzerland. The training procedure has low computing and memory requirements and is suitable for online applications. The results demonstrate the algorithm's capability to make recharging decisions and produce the desired energy-feasible paths.
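The following toy sketch illustrates the kind of off-policy, model-free approach described: tabular Q-learning over an energy-weighted graph, where negative edge costs model recuperation and energy-infeasible moves are penalised. The graph, costs, battery capacity, and hyperparameters are illustrative assumptions; intermediary charging stations and the Swiss road-network case study are omitted for brevity.

```python
import random

# Toy energy-weighted road graph: (u, v) -> energy cost in kWh.
# A negative cost models recuperation (e.g. a downhill edge).
edges = {
    ("A", "B"): 2.0, ("B", "C"): -0.5, ("C", "T"): 1.5,
    ("A", "D"): 1.0, ("D", "T"): 4.0,
}
neighbors = {}
for (u, v) in edges:
    neighbors.setdefault(u, []).append(v)

BATTERY = 5.0   # battery capacity of the toy EV, in kWh
Q = {}          # tabular action values: (node, next_node) -> value

def q_learn(source="A", target="T", episodes=2000,
            alpha=0.1, gamma=0.95, eps=0.2):
    """Off-policy, model-free Q-learning over the graph. Moves that drain
    the battery below zero receive a large penalty and end the episode."""
    for _ in range(episodes):
        node, charge = source, BATTERY
        while node != target and node in neighbors:
            acts = neighbors[node]
            if random.random() < eps:
                a = random.choice(acts)                              # explore
            else:
                a = max(acts, key=lambda v: Q.get((node, v), 0.0))   # exploit
            charge = min(BATTERY, charge - edges[(node, a)])  # recuperation capped
            feasible = charge >= 0.0
            reward = -edges[(node, a)] if feasible else -100.0
            best_next = max((Q.get((a, v), 0.0) for v in neighbors.get(a, [])),
                            default=0.0)
            old = Q.get((node, a), 0.0)
            Q[(node, a)] = old + alpha * (reward + gamma * best_next - old)
            if not feasible:
                break
            node = a

def greedy_path(source="A", target="T"):
    """Extract the learned route by following the highest-valued edges."""
    path, node = [source], source
    while node != target and node in neighbors:
        node = max(neighbors[node], key=lambda v: Q.get((node, v), 0.0))
        path.append(node)
    return path

q_learn()
print(greedy_path())   # e.g. ['A', 'B', 'C', 'T']
```

On this toy graph the agent learns to prefer the route through the recuperating edge (A-B-C-T, net 3.0 kWh) over the shorter but costlier alternative (A-D-T, 5.0 kWh), mirroring the energy-feasibility objective described in the abstract.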


2020, Vol 17 (1), pp. 172988141989834
Author(s): Guoyu Zuo, Qishen Zhao, Jiahao Lu, Jiangeng Li

The goal of reinforcement learning is to enable an agent to learn by using rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping reward functions is difficult. In this article, we propose a general, model-free approach for reinforcement learning to learn robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, based on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize oscillation of the output actions while maximizing the value of the action. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method effectively solves the sparse-reward problem and achieves a high learning speed.
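Two of the ingredients above lend themselves to a short sketch: hindsight goal relabelling (here the standard "future" strategy, as a stand-in for the proposed Curious and Aggressive variant) and a TD3-style actor loss with an added action-magnitude penalty. The replay format, penalty weight, and reward function are assumptions for illustration, not the authors' implementation.

```python
import random
import torch
import torch.nn as nn

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabelling ('future' strategy): each transition gains up
    to k extra copies whose goal is replaced by a state achieved later in
    the same episode, turning failed episodes into useful learning signal."""
    out = []
    for t, (s, a, goal, s_next) in enumerate(episode):
        out.append((s, a, goal, s_next, reward_fn(s_next, goal)))
        for _ in range(k):
            _, _, _, achieved = random.choice(episode[t:])
            out.append((s, a, achieved, s_next, reward_fn(s_next, achieved)))
    return out

def actor_loss_with_action_penalty(critic, actor, states, weight=1e-2):
    """TD3-style actor objective: maximise Q(s, pi(s)) while penalising
    large actions to damp oscillation of the output actions."""
    actions = actor(states)
    q = critic(states, actions)
    penalty = weight * actions.pow(2).sum(dim=-1, keepdim=True)
    return (penalty - q).mean()

# Tiny usage with stand-in components (shapes and reward are arbitrary).
episode = [((i,), 0, (9,), (i + 1,)) for i in range(5)]
batch = her_relabel(episode, lambda s_next, g: 0.0 if s_next == g else -1.0, k=2)

actor = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2), nn.Tanh())
critic = lambda s, a: torch.cat([s, a], dim=-1).sum(dim=-1, keepdim=True)
loss = actor_loss_with_action_penalty(critic, actor, torch.rand(16, 4))
loss.backward()
```

In a full training loop the relabelled transitions would be pushed into the TD3 replay buffer alongside demonstration data, and the penalised actor loss would replace the standard one during policy updates.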


Author(s): Wenxing Liu, Hanlin Niu, Muhammad Nasiruddin Mahyuddin, Guido Herrmann, Joaquin Carrasco
