Comparing Action Aggregation Strategies in Deep Reinforcement Learning with Continuous Action

2020 ◽  
Author(s):  
Renata Garcia Oliveira ◽  
Wouter Caarls

Deep Reinforcement Learning has shown great promise for learning continuous control policies. For complex tasks, however, reinforcement learning with minimal human intervention remains a challenge. This article studies how to improve performance and stabilize the learning curve using ensemble learning methods: a combined parameterized action function is learned using multiple agents in a single environment, in search of a better way to learn regardless of the quality of the parametrization. The action ensemble methods were applied in three environments: pendulum swing-up, cart pole, and half cheetah. The results demonstrate that action ensembles can improve performance with respect to the grid search technique. A further contribution is a comparison of the effectiveness of the aggregation techniques; the analysis considers using either separate or combined policies during training. The latter yields better learning results when used with the data center aggregation strategy.
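The aggregation step can be illustrated with a small sketch. This is not the authors' implementation: it assumes each ensemble member proposes a continuous action vector, and takes "data center" aggregation to mean the centroid (per-dimension mean) of the proposals, with a per-dimension median shown as an alternative strategy.

```python
def aggregate_mean(actions):
    """Centroid ("data center") aggregation: average each action dimension."""
    n = len(actions)
    dims = len(actions[0])
    return [sum(a[d] for a in actions) / n for d in range(dims)]

def aggregate_median(actions):
    """Per-dimension median, a more outlier-robust alternative."""
    dims = len(actions[0])
    out = []
    for d in range(dims):
        vals = sorted(a[d] for a in actions)
        mid = len(vals) // 2
        out.append(vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) / 2)
    return out

# Three agents each propose a 2-dimensional action for the same state.
proposals = [[0.2, -1.0], [0.4, -0.6], [0.9, -0.8]]
print(aggregate_mean(proposals))    # centroid of the three proposals
print(aggregate_median(proposals))
```

Which strategy works best depends on how the members disagree: the centroid smooths over noise, while the median discards a single badly parameterized member's outlier action.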

Author(s):  
Igor Kuznetsov ◽  
Andrey Filchenkov

Episodic memory lets reinforcement learning algorithms remember and exploit promising experience from the past to improve agent performance. Previous work on memory mechanisms shows the sample-efficiency benefits of episodic data structures for discrete-action problems. Applying episodic memory to continuous control with a large action space is not trivial. Our study aims to answer the question: can episodic memory be used to improve an agent's performance in continuous control? Our proposed algorithm combines episodic memory with the Actor-Critic architecture by modifying the critic's objective. We further improve performance by introducing episodic-based replay buffer prioritization. We evaluate our algorithm on OpenAI Gym domains and show greater sample-efficiency compared with state-of-the-art model-free off-policy algorithms.
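One way such a modified critic objective can look is sketched below. This is an illustrative assumption, not the paper's code: the TD target is blended with the best Monte-Carlo return the episodic memory has recorded for the state, so the critic is pulled toward remembered high-return outcomes; `alpha` (the mixing weight) is a hypothetical parameter.

```python
def critic_target(reward, next_value, episodic_return, gamma=0.99, alpha=0.5):
    """Blend the one-step TD target with an episodic return estimate."""
    td_target = reward + gamma * next_value
    if episodic_return is None:
        # No memory entry for this state: fall back to plain TD learning.
        return td_target
    # Mix TD bootstrapping with the remembered best outcome.
    return (1 - alpha) * td_target + alpha * max(td_target, episodic_return)

# Memory recalls a return of 5.0, well above the bootstrapped estimate.
print(critic_target(1.0, 2.0, episodic_return=5.0))
```

Taking the `max` keeps the episodic term from dragging the target down when the stored return is stale or pessimistic.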


2020 ◽  
Vol 14 (3) ◽  
pp. 160
Author(s):  
Ildar Gabitov ◽  
Samat Insafuddinov ◽  
Ildar Badretdinov ◽  
Viktor Pavlenko ◽  
Filyus Safin
Keyword(s):  

Author(s):  
Yuntao Han ◽  
Qibin Zhou ◽  
Fuqing Duan

Abstract The digital curling game is a two-player zero-sum extensive game in a continuous action space. Several challenging problems remain unsolved, such as the uncertainty of strategy, searching the large game tree, and the need for large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games: NFSP uses two adversarial learning networks and can automatically produce supervised data, while KR-UCT can be used to search the large game tree in a continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach a Nash equilibrium.
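The core idea behind KR-UCT-style selection in a continuous action space can be sketched as follows. This is a minimal illustration under assumed parameters (Gaussian kernel, bandwidth `h`, exploration constant `c`), not the paper's implementation: candidate actions borrow value and visit statistics from nearby tried actions via kernel regression, then the UCB rule picks among them.

```python
import math

def kr_ucb_select(candidates, tried, values, counts, c=1.0, h=0.5):
    """tried[i] was played counts[i] times with mean value values[i]."""
    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2 * h * h))
    best, best_score = None, -float("inf")
    total = sum(counts)
    for a in candidates:
        w = [kernel(a, b) for b in tried]
        W = sum(w)
        # Kernel-smoothed value estimate and effective visit count.
        v = sum(wi * vi for wi, vi in zip(w, values)) / W
        n_eff = sum(wi * ni for wi, ni in zip(w, counts))
        score = v + c * math.sqrt(math.log(total) / (n_eff + 1e-8))
        if score > best_score:
            best, best_score = a, score
    return best

# Two tried shots: action 1.0 scored far better, so it is selected.
print(kr_ucb_select([0.0, 1.0], tried=[0.0, 1.0], values=[0.1, 0.9], counts=[10, 10]))
```

Because the kernel shares statistics between nearby actions, a new candidate close to a well-explored good shot starts with a sensible estimate instead of an uninformed prior.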


2021 ◽  
Vol 36 ◽  
Author(s):  
Sergio Valcarcel Macua ◽  
Ian Davies ◽  
Aleksi Tukiainen ◽  
Enrique Munoz de Cote

Abstract We propose a fully distributed actor-critic architecture, named the diffusion-distributed actor-critic (Diff-DAC), with application to multitask reinforcement learning (MRL). During the learning process, agents communicate their value and policy parameters to their neighbours, diffusing the information across a network of agents with no need for a central station. Each agent can only access data from its local task, but aims to learn a common policy that performs well for the whole set of tasks. The architecture is scalable, since the computational and communication cost per agent depends on the number of neighbours rather than the overall number of agents. We derive Diff-DAC from duality theory and provide novel insights into the actor-critic framework, showing that it is actually an instance of the dual-ascent method. We prove almost sure convergence of Diff-DAC to a common policy under general assumptions that hold even for deep neural network approximations. For more restrictive assumptions, we also prove that this common policy is a stationary point of an approximation of the original problem. Numerical results on multitask extensions of common continuous control benchmarks demonstrate that Diff-DAC stabilises learning and has a regularising effect that induces higher performance and better generalisation properties than previous architectures.
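The diffusion step such architectures rely on can be sketched in a few lines. This is an illustration of the general diffusion idea, not the authors' code: each agent replaces its parameters with an average over its network neighbours (itself included), so information spreads without a central server.

```python
def diffuse(params, neighbours):
    """params[i]: parameter vector of agent i; neighbours[i]: ids incl. i."""
    new = []
    for i in range(len(params)):
        nbrs = neighbours[i]
        dims = len(params[i])
        # Uniform combination weights over the neighbourhood for simplicity;
        # in general each agent uses a row of a doubly stochastic matrix.
        new.append([sum(params[j][d] for j in nbrs) / len(nbrs) for d in range(dims)])
    return new

# Three fully connected agents: one diffusion step reaches consensus.
theta = [[0.0], [3.0], [6.0]]
nbrs = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(diffuse(theta, nbrs))  # all agents move to the mean [3.0]
```

On a sparser graph, repeated diffusion steps interleaved with local gradient updates drive the agents toward the same common policy.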


2014 ◽  
Vol 571-572 ◽  
pp. 105-108
Author(s):  
Lin Xu

This paper proposes a new framework combining reinforcement learning with a cloud computing digital library. Unified self-learning algorithms, which include reinforcement learning and other artificial-intelligence techniques, have led to many essential advances. Given the current status of highly available models, analysts urgently desire the deployment of write-ahead logging. In this paper we examine how DNS can be applied to the investigation of superblocks, and introduce reinforcement learning to improve the quality of a current cloud computing digital library. The experimental results show that the method works more efficiently.


2017 ◽  
Vol 10 (6) ◽  
pp. 62
Author(s):  
Sahar Mohammad Abu Bakir

The public sector in Jordan is confronting many problems; reports show that citizens are not contented with the number and quality of current services. Consequently, persistent initiatives to uphold the sector's performance have taken place at all levels, relying on inventive employees and leadership to achieve the intended improvement. This study therefore seeks to test the impact of strategic leadership (charismatic, visionary, change agent, and servant) on building entrepreneurial orientation (proactiveness, innovativeness, and risk taking) among Jordanian public sector employees. A random sample of 500 employees was selected from health, education, agriculture, and other governmental service organizations. To obtain the required results, multiple regression was calculated using SPSS version 21. It was found that the charismatic, change agent, and servant styles positively influence employees' proactiveness, with no influence on the other two entrepreneurship dimensions, while the visionary style has no significant influence on any entrepreneurship dimension. Public sector reform is nevertheless achievable through comprehensive strategies, successful implementation, and effective continuous control. Innovative departments need to be established and financed away from bureaucratic environments.


2018 ◽  
Vol 232 ◽  
pp. 04002
Author(s):  
Fang Dong ◽  
Ou Li ◽  
Min Tong

With the rapid development and widespread use of MANETs, the quality-of-service requirements of various applications are much higher than before. Aiming at adaptive routing control with multiple parameters for general scenarios, we propose an intelligent routing control algorithm for MANETs based on reinforcement learning, which continually optimizes the node selection strategy through interaction with the environment and gradually converges to the optimal transmission paths. There is no need to update the network state frequently, which saves routing maintenance cost while improving transmission performance. Simulation results show that, compared with other algorithms, the proposed approach chooses appropriate paths under constraint conditions and achieves a better optimization objective.
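Reinforcement-learning routing of this kind can be sketched with tabular Q-learning over next-hop choices. The topology, hop-cost reward, and learning parameters below are illustrative assumptions, not the paper's algorithm; the point is that repeated interaction alone teaches node A the shorter path.

```python
import random

def train_routing(links, dest, episodes=2000, alpha=0.5, gamma=0.9, eps=0.1):
    """Learn Q-values over next-hop choices; each extra hop costs -1."""
    q = {n: {m: 0.0 for m in nbrs} for n, nbrs in links.items()}
    for _ in range(episodes):
        node = random.choice([n for n in links if n != dest])
        while node != dest:
            nbrs = list(links[node])
            # Epsilon-greedy next-hop selection.
            nxt = (random.choice(nbrs) if random.random() < eps
                   else max(nbrs, key=lambda m: q[node][m]))
            reward = 0.0 if nxt == dest else -1.0
            future = 0.0 if nxt == dest else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q

random.seed(0)  # reproducible for this illustration
# Diamond topology: A -> B -> D (2 hops) vs A -> C -> E -> D (3 hops).
links = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["D"], "D": []}
q = train_routing(links, "D")
print(max(q["A"], key=lambda m: q["A"][m]))  # learned next hop from A
```

In a real MANET the reward would fold in the constraint parameters (delay, residual energy, link stability) rather than a plain hop count.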

