Deep Reinforcement Learning Algorithms for Path Planning Domain in Grid-like Environment

Recently, more and more solutions have utilised artificial intelligence approaches in order to enhance or optimise processes to achieve greater sustainability. One of the most pressing issues is the emissions caused by cars; in this paper, the problem of optimising the route of delivery cars is tackled. In this paper, the applicability of the deep reinforcement learning algorithms with regards to the aforementioned problem is tested on a simulation game designed and implemented to pose various challenges such as constant change of delivery locations. The algorithms chosen for this task are Advantage Actor-Critic (A2C) with and without Proximal Policy Optimisation (PPO). These novel and advanced reinforcement learning algorithms have yet not been utilised in similar scenarios. The differences in performance and learning process of those are visualised and discussed. It is demonstrated that both of those algorithms present a slow but steady learning curve, which is an expected effect of reinforcement learning algorithms, leading to a conclusion that the algorithms would discover an optimal policy with an adequately long learning process. Additionally, the benefits of the Proximal Policy Optimisation algorithm are proven by the enhanced learning curve with comparison to the Advantage Actor-Critic approach, as the learning process is characterised by faster growth with a significantly smaller variation. Finally, the applicability of such algorithms in the described scenarios is discussed, alongside the possible improvements and future work.

Download Full-text

Deep Reinforcement Learning via Past-Success Directed Exploration

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019979 ◽

2019 ◽

Vol 33 ◽

pp. 9979-9980

Author(s):

Xiaoming Liu ◽

Zhixiong Xu ◽

Lei Cao ◽

Xiliang Chen ◽

Kai Kang

Keyword(s):

Online Learning ◽

Adaptive Control ◽

Reinforcement Learning ◽

Learning Process ◽

Control Method ◽

Learning Algorithms ◽

Action Selection ◽

Experimental Results ◽

Continuous Control ◽

Exploration And Exploitation

The balance between exploration and exploitation has always been a core challenge in reinforcement learning. This paper proposes “past-success exploration strategy combined with Softmax action selection”(PSE-Softmax) as an adaptive control method for taking advantage of the characteristics of the online learning process of the agent to adapt exploration parameters dynamically. The proposed strategy is tested on OpenAI Gym with discrete and continuous control tasks, and the experimental results show that PSE-Softmax strategy delivers better performance than deep reinforcement learning algorithms with basic exploration strategies.

Download Full-text

Relational Representations and Traces for Efficient Reinforcement Learning

Decision Theory Models for Applications in Artificial Intelligence ◽

10.4018/978-1-60960-165-2.ch009 ◽

2012 ◽

pp. 190-217

Author(s):

Eduardo F. Morales ◽

Julio H. Zaragoza

Keyword(s):

Reinforcement Learning ◽

Mobile Robot ◽

Learning Process ◽

Domain Knowledge ◽

Learning Algorithms ◽

Flight Simulator ◽

Small Subset ◽

First Order ◽

Search Spaces ◽

Order Relations

This chapter introduces an approach for reinforcement learning based on a relational representation that: (i) can be applied over large search spaces, (ii) can incorporate domain knowledge, and (iii) can use previously learned policies on different, but similar, problems. The underlying idea is to represent states as sets of first order relations, actions in terms of those relations, and to learn policies over such generalized representation. It is shown how this representation can produce powerful abstractions and that policies learned over this generalized representation can be directly applied, without any further learning, to other problems that can be characterized by the same set of relations. To accelerate the learning process, we present an extension where traces of the tasks to be learned are provided by the user. These traces are used to select only a small subset of possible actions increasing the convergence of the learning algorithms. The effectiveness of the approach is tested on a flight simulator and on a mobile robot.

Download Full-text

Relational Representations and Traces for Efficient Reinforcement Learning

Robotics ◽

10.4018/978-1-4666-4607-0.ch013 ◽

2013 ◽

pp. 248-273

Author(s):

Eduardo F. Morales ◽

Julio H. Zaragoza

Keyword(s):

Reinforcement Learning ◽

Mobile Robot ◽

Learning Process ◽

Domain Knowledge ◽

Learning Algorithms ◽

Flight Simulator ◽

Small Subset ◽

First Order ◽

Search Spaces ◽

Order Relations

Download Full-text

Cognitive Radio Networks with Reinforcement Learning Algorithms for Spectrum Allocation: A Survey

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/211952020 ◽

2020 ◽

Vol 9 (5) ◽

pp. 8371-8384

Keyword(s):

Reinforcement Learning ◽

Cognitive Radio ◽

Cognitive Radio Networks ◽

Learning Algorithms ◽

Radio Networks ◽

Spectrum Allocation

Download Full-text

Multi-Agent Reinforcement Learning: A Review of Challenges and Applications

Applied Sciences ◽

10.3390/app11114948 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4948

Author(s):

Lorenzo Canese ◽

Gian Carlo Cardarilli ◽

Luca Di Di Nunzio ◽

Rocco Fazzolari ◽

Daniele Giardino ◽

...

Keyword(s):

Reinforcement Learning ◽

Mathematical Models ◽

Learning Algorithms ◽

Single Agent ◽

Critical Issues ◽

Multi Agent ◽

Pros And Cons ◽

Application Fields

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.

Download Full-text

Benchmarking reinforcement learning algorithms for demand response applications

2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe) ◽

10.1109/isgt-europe47291.2020.9248800 ◽

2020 ◽

Author(s):

Brida V. Mbuwir ◽

Carlo Manna ◽

Fred Spiessens ◽

Geert Deconinck

Keyword(s):

Reinforcement Learning ◽

Demand Response ◽

Learning Algorithms

Download Full-text

Reinforcement Learning Algorithms: Analysis and Applications

10.1007/978-3-030-41188-6 ◽

2021 ◽

Keyword(s):

Reinforcement Learning ◽

Learning Algorithms

Download Full-text

Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control

Applied Energy ◽

10.1016/j.apenergy.2021.117164 ◽

2021 ◽

Vol 298 ◽

pp. 117164

Author(s):

Marco Biemann ◽

Fabian Scheller ◽

Xiufeng Liu ◽

Lizhen Huang

Keyword(s):

Reinforcement Learning ◽

Experimental Evaluation ◽

Learning Algorithms ◽

Model Free ◽

Hvac Control

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

Comparative Analysis of Reinforcement Learning Algorithms on TORCS Environment

2020 28th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu49456.2020.9302358 ◽

2020 ◽

Keyword(s):

Reinforcement Learning ◽

Comparative Analysis ◽

Learning Algorithms

Download Full-text