Comparing Action Aggregation Strategies in Deep Reinforcement Learning with Continuous Action

2020 ◽  
Author(s):  
Renata Garcia Oliveira ◽  
Wouter Caarls

Deep Reinforcement Learning has shown great promise for learning continuous control policies. For complex tasks, however, reinforcement learning with minimal human intervention remains a challenge. This article studies how to improve performance and stabilize the learning curve using ensemble learning methods: a combined parameterized action function is learned using multiple agents in a single environment, in search of a better way to learn regardless of the quality of the parametrization. The action ensemble methods were applied in three environments: pendulum swing-up, cart pole, and half cheetah. The results demonstrate that action ensembles can improve performance with respect to the grid search technique. A further contribution is a comparison of the effectiveness of the aggregation techniques; the analysis considers using either separate or combined policies during training. The latter yields better learning results when used with the data center aggregation strategy.
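The aggregation step can be illustrated with a small sketch. This is not the authors' implementation: it assumes each ensemble member proposes a continuous action vector, and takes "data center" aggregation to mean the centroid (per-dimension mean) of the proposals, with a per-dimension median shown as an alternative strategy.

```python
def aggregate_mean(actions):
    """Centroid ("data center") aggregation: average each action dimension."""
    n = len(actions)
    dims = len(actions[0])
    return [sum(a[d] for a in actions) / n for d in range(dims)]

def aggregate_median(actions):
    """Per-dimension median, a more outlier-robust alternative."""
    dims = len(actions[0])
    out = []
    for d in range(dims):
        vals = sorted(a[d] for a in actions)
        mid = len(vals) // 2
        out.append(vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) / 2)
    return out

# Three agents each propose a 2-dimensional action for the same state.
proposals = [[0.2, -1.0], [0.4, -0.6], [0.9, -0.8]]
print(aggregate_mean(proposals))    # centroid of the three proposals
print(aggregate_median(proposals))
```

Which strategy works best depends on how the members disagree: the centroid smooths over noise, while the median discards a single badly parameterized member's outlier action.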

Author(s):  
Igor Kuznetsov ◽  
Andrey Filchenkov

Episodic memory lets reinforcement learning algorithms remember and exploit promising experience from the past to improve agent performance. Previous work on memory mechanisms shows the sample-efficiency benefits of episodic data structures for discrete-action problems. Applying episodic memory to continuous control with a large action space is not trivial. Our study aims to answer the question: can episodic memory be used to improve an agent's performance in continuous control? Our proposed algorithm combines episodic memory with the Actor-Critic architecture by modifying the critic's objective. We further improve performance by introducing episodic-based replay buffer prioritization. We evaluate our algorithm on OpenAI Gym domains and show greater sample-efficiency compared with state-of-the-art model-free off-policy algorithms.
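One way such a modified critic objective can look is sketched below. This is an illustrative assumption, not the paper's code: the TD target is blended with the best Monte-Carlo return the episodic memory has recorded for the state, so the critic is pulled toward remembered high-return outcomes; `alpha` (the mixing weight) is a hypothetical parameter.

```python
def critic_target(reward, next_value, episodic_return, gamma=0.99, alpha=0.5):
    """Blend the one-step TD target with an episodic return estimate."""
    td_target = reward + gamma * next_value
    if episodic_return is None:
        # No memory entry for this state: fall back to plain TD learning.
        return td_target
    # Mix TD bootstrapping with the remembered best outcome.
    return (1 - alpha) * td_target + alpha * max(td_target, episodic_return)

# Memory recalls a return of 5.0, well above the bootstrapped estimate.
print(critic_target(1.0, 2.0, episodic_return=5.0))
```

Taking the `max` keeps the episodic term from dragging the target down when the stored return is stale or pessimistic.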


2020 ◽  
Vol 14 (3) ◽  
pp. 160
Author(s):  
Ildar Gabitov ◽  
Samat Insafuddinov ◽  
Ildar Badretdinov ◽  
Viktor Pavlenko ◽  
Filyus Safin
Keyword(s):  

Author(s):  
Yuntao Han ◽  
Qibin Zhou ◽  
Fuqing Duan

Abstract The digital curling game is a two-player zero-sum extensive game in a continuous action space. Several challenging problems remain unsolved, such as the uncertainty of strategy, searching the large game tree, and the need for large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games: NFSP uses two adversarial learning networks and can automatically produce supervised data, while KR-UCT can be used to search the large game tree in a continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach a Nash equilibrium.
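The core idea behind KR-UCT-style selection in a continuous action space can be sketched as follows. This is a minimal illustration under assumed parameters (Gaussian kernel, bandwidth `h`, exploration constant `c`), not the paper's implementation: candidate actions borrow value and visit statistics from nearby tried actions via kernel regression, then the UCB rule picks among them.

```python
import math

def kr_ucb_select(candidates, tried, values, counts, c=1.0, h=0.5):
    """tried[i] was played counts[i] times with mean value values[i]."""
    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2 * h * h))
    best, best_score = None, -float("inf")
    total = sum(counts)
    for a in candidates:
        w = [kernel(a, b) for b in tried]
        W = sum(w)
        # Kernel-smoothed value estimate and effective visit count.
        v = sum(wi * vi for wi, vi in zip(w, values)) / W
        n_eff = sum(wi * ni for wi, ni in zip(w, counts))
        score = v + c * math.sqrt(math.log(total) / (n_eff + 1e-8))
        if score > best_score:
            best, best_score = a, score
    return best

# Two tried shots: action 1.0 scored far better, so it is selected.
print(kr_ucb_select([0.0, 1.0], tried=[0.0, 1.0], values=[0.1, 0.9], counts=[10, 10]))
```

Because the kernel shares statistics between nearby actions, a new candidate close to a well-explored good shot starts with a sensible estimate instead of an uninformed prior.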


2021 ◽  
Vol 36 ◽  
Author(s):  
Sergio Valcarcel Macua ◽  
Ian Davies ◽  
Aleksi Tukiainen ◽  
Enrique Munoz de Cote

Abstract We propose a fully distributed actor-critic architecture, named the diffusion-distributed actor-critic (Diff-DAC), with application to multitask reinforcement learning (MRL). During the learning process, agents communicate their value and policy parameters to their neighbours, diffusing the information across a network of agents with no need for a central station. Each agent can only access data from its local task, but aims to learn a common policy that performs well for the whole set of tasks. The architecture is scalable, since the computational and communication cost per agent depends on the number of neighbours rather than the overall number of agents. We derive Diff-DAC from duality theory and provide novel insights into the actor-critic framework, showing that it is actually an instance of the dual-ascent method. We prove almost sure convergence of Diff-DAC to a common policy under general assumptions that hold even for deep neural network approximations. For more restrictive assumptions, we also prove that this common policy is a stationary point of an approximation of the original problem. Numerical results on multitask extensions of common continuous control benchmarks demonstrate that Diff-DAC stabilises learning and has a regularising effect that induces higher performance and better generalisation properties than previous architectures.
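The diffusion step such architectures rely on can be sketched in a few lines. This is an illustration of the general diffusion idea, not the authors' code: each agent replaces its parameters with an average over its network neighbours (itself included), so information spreads without a central server.

```python
def diffuse(params, neighbours):
    """params[i]: parameter vector of agent i; neighbours[i]: ids incl. i."""
    new = []
    for i in range(len(params)):
        nbrs = neighbours[i]
        dims = len(params[i])
        # Uniform combination weights over the neighbourhood for simplicity;
        # in general each agent uses a row of a doubly stochastic matrix.
        new.append([sum(params[j][d] for j in nbrs) / len(nbrs) for d in range(dims)])
    return new

# Three fully connected agents: one diffusion step reaches consensus.
theta = [[0.0], [3.0], [6.0]]
nbrs = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(diffuse(theta, nbrs))  # all agents move to the mean [3.0]
```

On a sparser graph, repeated diffusion steps interleaved with local gradient updates drive the agents toward the same common policy.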


2014 ◽  
Vol 571-572 ◽  
pp. 105-108
Author(s):  
Lin Xu

This paper proposes a new framework combining reinforcement learning with a cloud computing digital library. Unified self-learning algorithms, which include reinforcement learning and other artificial-intelligence techniques, have led to many essential advances. Given the current status of highly available models, analysts urgently desire the deployment of write-ahead logging. In this paper we examine how DNS can be applied to the investigation of superblocks, and introduce reinforcement learning to improve the quality of a current cloud computing digital library. The experimental results show that the method works more efficiently.


2017 ◽  
Vol 10 (6) ◽  
pp. 62
Author(s):  
Sahar Mohammad Abu Bakir

The public sector in Jordan is confronting many problems; reports show that citizens are not contented with the number and quality of current services. Consequently, persistent initiatives to uphold the sector's performance have taken place at all levels, relying on inventive employees and leadership to achieve the intended improvement. This study therefore seeks to test the impact of strategic leadership (charismatic, visionary, change agent, and servant) on building entrepreneurial orientation (proactiveness, innovativeness, and risk taking) among Jordanian public sector employees. A random sample of 500 employees was selected from health, education, agriculture, and other governmental service organizations. To obtain the required results, multiple regression was calculated using SPSS version 21. It was found that the charismatic, change agent, and servant styles positively influence employees' proactiveness, with no influence on the other two entrepreneurship dimensions, while the visionary style has no significant influence on any entrepreneurship dimension. Public sector reform is nevertheless achievable through comprehensive strategies, successful implementation, and effective continuous control. Innovative departments need to be established and financed away from bureaucratic environments.


2018 ◽  
Vol 232 ◽  
pp. 04002
Author(s):  
Fang Dong ◽  
Ou Li ◽  
Min Tong

With the rapid development and widespread use of MANETs, the quality-of-service requirements of various applications are much higher than before. Aiming at adaptive routing control with multiple parameters for general scenarios, we propose an intelligent routing control algorithm for MANETs based on reinforcement learning, which continually optimizes the node selection strategy through interaction with the environment and gradually converges to the optimal transmission paths. There is no need to update the network state frequently, which saves routing maintenance cost while improving transmission performance. Simulation results show that, compared with other algorithms, the proposed approach chooses appropriate paths under constraint conditions and achieves a better optimization objective.
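Reinforcement-learning routing of this kind can be sketched with tabular Q-learning over next-hop choices. The topology, hop-cost reward, and learning parameters below are illustrative assumptions, not the paper's algorithm; the point is that repeated interaction alone teaches node A the shorter path.

```python
import random

def train_routing(links, dest, episodes=2000, alpha=0.5, gamma=0.9, eps=0.1):
    """Learn Q-values over next-hop choices; each extra hop costs -1."""
    q = {n: {m: 0.0 for m in nbrs} for n, nbrs in links.items()}
    for _ in range(episodes):
        node = random.choice([n for n in links if n != dest])
        while node != dest:
            nbrs = list(links[node])
            # Epsilon-greedy next-hop selection.
            nxt = (random.choice(nbrs) if random.random() < eps
                   else max(nbrs, key=lambda m: q[node][m]))
            reward = 0.0 if nxt == dest else -1.0
            future = 0.0 if nxt == dest else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q

random.seed(0)  # reproducible for this illustration
# Diamond topology: A -> B -> D (2 hops) vs A -> C -> E -> D (3 hops).
links = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["D"], "D": []}
q = train_routing(links, "D")
print(max(q["A"], key=lambda m: q["A"][m]))  # learned next hop from A
```

In a real MANET the reward would fold in the constraint parameters (delay, residual energy, link stability) rather than a plain hop count.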

