Continuous action deep reinforcement learning for propofol dosing during general anesthesia

Abstract: The digital curling game is a two-player zero-sum extensive-form game with a continuous action space. Several challenges remain unsolved, including the uncertainty of strategy, searching a large game tree, and the reliance on large amounts of supervised data. In this work, we combine NFSP and KR-UCT for digital curling games: NFSP uses two adversary learning networks and can produce supervised data automatically, while KR-UCT handles large game tree search in a continuous action space. We propose two reward mechanisms to make reinforcement learning converge quickly. Experimental results validate the proposed method and show that the strategy model can reach the Nash equilibrium.
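
The abstract does not describe how KR-UCT works internally. As a rough illustration only, the sketch below assumes the general kernel-regression idea the name suggests: the value of an untried continuous action is estimated from nearby sampled actions, weighted by similarity and visit count. The function names, the Gaussian kernel, and the bandwidth are hypothetical choices, not taken from the paper.

```python
import math
from typing import Sequence


def gaussian_kernel(a: float, b: float, bandwidth: float) -> float:
    """Similarity between two scalar actions (assumed Gaussian kernel)."""
    return math.exp(-0.5 * ((a - b) / bandwidth) ** 2)


def kernel_regression_value(
    query: float,
    actions: Sequence[float],
    mean_values: Sequence[float],
    visit_counts: Sequence[int],
    bandwidth: float = 0.5,
) -> float:
    """Estimate the value of `query` from already-sampled actions.

    Each sampled action contributes its mean value, weighted by its
    kernel similarity to the query and by how often it was visited.
    """
    weights = [
        gaussian_kernel(query, a, bandwidth) * n
        for a, n in zip(actions, visit_counts)
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no sampled action is close enough to the query")
    return sum(w * v for w, v in zip(weights, mean_values)) / total


# Two sampled actions with equal visit counts; the midpoint gets the average.
midpoint_estimate = kernel_regression_value(0.5, [0.0, 1.0], [0.0, 1.0], [1, 1])
near_zero_estimate = kernel_regression_value(0.0, [0.0, 1.0], [0.0, 1.0], [1, 1])
```

Sharing statistics this way lets a search reuse what it learned about one shot when evaluating a slightly different one, which matters when the action space is continuous and no action is ever sampled twice.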


Continuous action reinforcement learning automata and their application to adaptive digital filter design

Engineering Applications of Artificial Intelligence ◽

10.1016/s0952-1976(01)00034-3 ◽

2001 ◽

Vol 14 (5) ◽

pp. 549-561 ◽

Cited By ~ 32

Author(s):

M.N. Howell ◽

T.J. Gordon

Keyword(s):

Reinforcement Learning ◽

Digital Filter ◽

Filter Design ◽

Learning Automata ◽

Continuous Action ◽

Digital Filter Design ◽

Reinforcement Learning Automata


DDPG Agent to Swing Up and Balance Cart-Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns from labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environment. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters able to learn to play 49 different Atari games from raw pixel inputs alone. However, to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces scales poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, a parametrized policy can be advantageous because it can generalize in the action space. In this thesis we therefore study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients. We provide a theoretical comparison to other popular methods, evaluate its performance, identify its limitations, and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention on deep learning and reinforcement learning. We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.
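
The scaling argument above can be made concrete with a little arithmetic. The sketch below assumes a uniform grid with the same number of bins on every action dimension; the function name and the example sizes are illustrative, not taken from the thesis.

```python
def num_discrete_actions(bins_per_dim: int, action_dims: int) -> int:
    """Number of joint actions when each of `action_dims` continuous
    dimensions is split into `bins_per_dim` discrete values."""
    if bins_per_dim < 1 or action_dims < 0:
        raise ValueError("bins_per_dim must be >= 1 and action_dims >= 0")
    return bins_per_dim ** action_dims


if __name__ == "__main__":
    # Even a coarse 3-value grid becomes large as dimensions are added.
    for dims in (1, 2, 7, 20):
        print(dims, num_discrete_actions(3, dims))
```

A value-based method such as DQN needs one output per joint action, so its output layer grows at this rate, whereas a deterministic policy network only needs one output per action dimension.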


Constructing continuous action space from basis functions for fast and stable reinforcement learning

RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication ◽

10.1109/roman.2009.5326234 ◽

2009 ◽

Cited By ~ 2

Author(s):

Akihiko Yamaguchi ◽

Jun Takamatsu ◽

Tsukasa Ogasawara

Keyword(s):

Reinforcement Learning ◽

Basis Functions ◽

Action Space ◽

Continuous Action


Function Optimization via a Continuous Action-Set Reinforcement Learning Automata Model

Proceedings of the 2015 International Conference on Communications, Signal Processing, and Systems - Lecture Notes in Electrical Engineering ◽

10.1007/978-3-662-49831-6_102 ◽

2016 ◽

pp. 981-989

Author(s):

Ying Guo ◽

Hao Ge ◽

Fanming Wang ◽

Yuyang Huang ◽

Shenghong Li

Keyword(s):

Reinforcement Learning ◽

Learning Automata ◽

Function Optimization ◽

Continuous Action ◽

Reinforcement Learning Automata


Precise Evaluation for Continuous Action Control in Reinforcement Learning

Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference on - HPCCT 2019 ◽

10.1145/3341069.3341082 ◽

2019 ◽

Author(s):

Fengkai Ke ◽

Daxing Zhao ◽

Guodong Sun ◽

Wei Feng

Keyword(s):

Reinforcement Learning ◽

Action Control ◽

Continuous Action ◽

Precise Evaluation


Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014213 ◽

2019 ◽

Vol 33 ◽

pp. 4213-4220 ◽

Cited By ~ 12

Author(s):

Shihui Li ◽

Yi Wu ◽

Xinyue Cui ◽

Honghua Dong ◽

Fei Fang ◽

...

Keyword(s):

Reinforcement Learning ◽

Gradient Algorithm ◽

Training Environment ◽

Local Optima ◽

Continuous Action ◽

Agent Learning ◽

Policy Gradient ◽

Multi Agent ◽

Continuous Actions ◽

Computational Intractability

Despite recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optimum with respect to its training partners: the learned policy may be only locally optimal with respect to the other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents’ policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.
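
The abstract does not spell out how the inner minimization is made tractable. One common way to approximate a worst-case opponent action in a continuous space is a single small gradient step that lowers the critic's value. The sketch below illustrates that idea on a toy critic; it is an assumption for illustration and is not claimed to be the authors' MAAL procedure. `toy_critic`, `eps`, and `h` are hypothetical.

```python
from typing import Callable


def toy_critic(a_self: float, a_other: float) -> float:
    """Hypothetical centralized critic Q(a_self, a_other) for one fixed state."""
    return -(a_self - 1.0) ** 2 + a_self * a_other


def worst_case_other_action(
    q: Callable[[float, float], float],
    a_self: float,
    a_other: float,
    eps: float = 0.1,
    h: float = 1e-5,
) -> float:
    """Perturb the other agent's action by one gradient step that lowers Q.

    The gradient with respect to `a_other` is estimated by central
    differences, so no autodiff library is needed for this toy example.
    """
    grad = (q(a_self, a_other + h) - q(a_self, a_other - h)) / (2.0 * h)
    return a_other - eps * grad


a_self, a_other = 1.0, 0.5
a_adv = worst_case_other_action(toy_critic, a_self, a_other)
q_nominal = toy_critic(a_self, a_other)
q_adversarial = toy_critic(a_self, a_adv)
```

Training the agent against `q_adversarial` rather than `q_nominal` pushes it toward policies that still do well when the other agent deviates slightly from its current behavior.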


Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces

Journal of Marine Science and Technology ◽

10.1007/s00773-020-00755-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ryohei Sawada ◽

Keiji Sato ◽

Takahiro Majima

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Continuous Action ◽

Ship Collision ◽

Action Spaces


Applying continuous action reinforcement learning automata (CARLA) to global training of hidden Markov models

International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004. ◽

10.1109/itcc.2004.1286725 ◽

2004 ◽

Cited By ~ 24

Author(s):

J. Kabudian ◽

M.R. Meybodi ◽

M.M. Homayounpour

Keyword(s):

Reinforcement Learning ◽

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Learning Automata ◽

Continuous Action ◽

Global Training ◽

Reinforcement Learning Automata


Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Haotian Fu ◽

Hongyao Tang ◽

Jianye Hao ◽

Zihan Lei ◽

Yingfeng Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Q Learning ◽

Challenging Tasks ◽

Discrete Action ◽

Multi Agent ◽

Decentralized Execution ◽

Novel Algorithms ◽

Action Spaces ◽

Different Levels

Deep Reinforcement Learning (DRL) has been applied to a variety of cooperative multi-agent problems with either discrete or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the paradigm of centralized training with decentralized execution: different levels of communication between agents are used to facilitate training, while each agent executes its policy independently, based on local observations. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
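
To make the notion of a discrete-continuous hybrid action concrete, the sketch below pairs a discrete action type with one continuous parameter and picks the pair with the highest Q-value, in the spirit of parameterized Q-learning for a single agent. It is a minimal illustration under stated assumptions, not the Deep MAPQN or Deep MAHHQN architecture; `HybridAction`, `toy_q`, and the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class HybridAction:
    kind: str     # discrete choice, e.g. which skill to use
    param: float  # continuous parameter of that skill


def select_hybrid_action(
    proposed_params: Dict[str, float],
    q: Callable[[str, float], float],
) -> HybridAction:
    """Given one proposed continuous parameter per discrete action,
    return the (action, parameter) pair with the highest Q-value."""
    best = max(proposed_params, key=lambda k: q(k, proposed_params[k]))
    return HybridAction(best, proposed_params[best])


def toy_q(kind: str, param: float) -> float:
    """Hypothetical Q-function: each action has a preferred parameter."""
    preferred = {"kick": 0.8, "dash": 0.2}
    bonus = {"kick": 1.0, "dash": 0.5}
    return bonus[kind] - (param - preferred[kind]) ** 2


# In a learned system these parameters would come from an actor network.
action = select_hybrid_action({"kick": 0.7, "dash": 0.2}, toy_q)
fallback = select_hybrid_action({"kick": -1.0, "dash": 0.2}, toy_q)
```

The second call shows why the two parts cannot be chosen separately: a badly parameterized "kick" is worth less than a well-parameterized "dash".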
