discrete action
Recently Published Documents

TOTAL DOCUMENTS: 78 (FIVE YEARS: 41)
H-INDEX: 12 (FIVE YEARS: 3)

2021 · pp. 1-10
Author(s): Wei Zhou, Xing Jiang, Bingli Guo (Member, IEEE), Lingyu Meng

Currently, Quality-of-Service (QoS)-aware routing is one of the crucial challenges in Software Defined Networks (SDN). QoS metrics such as latency, packet loss ratio, and throughput must be optimized to improve overall network performance. Traditional static routing algorithms based on Open Shortest Path First (OSPF) cannot adapt to traffic fluctuations, which may cause severe network congestion and service degradation. The central intelligence of the SDN controller and recent breakthroughs in Deep Reinforcement Learning (DRL) offer a promising way to tackle this challenge. Thus, we propose an on-policy DRL mechanism, namely the PPO-based (Proximal Policy Optimization) QoS-aware Routing Optimization Mechanism (PQROM), to achieve general and re-customizable routing optimization. PQROM can dynamically update the routing calculation by adjusting the reward function according to different optimization objectives, and it is independent of any specific network pattern. Additionally, as a black-box one-step optimization, PQROM is suitable for both continuous and discrete action spaces with high-dimensional input and output. OMNeT++ simulation results show that PQROM not only converges well but also offers better stability than OSPF, requires less training time and simpler hyper-parameter tuning than Deep Deterministic Policy Gradient (DDPG), and consumes less hardware than Asynchronous Advantage Actor-Critic (A3C).
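For reference, a minimal PyTorch sketch of the clipped PPO surrogate that an on-policy mechanism like PQROM builds on, together with a hypothetical QoS reward; the weights and the exact latency/loss/throughput terms are illustrative assumptions, not the paper's reward function.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated policy and the behaviour policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximises the clipped surrogate; return its negation as a loss to minimise.
    return -torch.min(unclipped, clipped).mean()

def qos_reward(latency, loss_ratio, throughput, weights=(1.0, 1.0, 1.0)):
    # Hypothetical weighted QoS reward: penalise latency and packet loss, favour throughput.
    # Re-weighting these terms is one way a reward function could be re-customised per objective.
    return -weights[0] * latency - weights[1] * loss_ratio + weights[2] * throughput
```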


2021 · Vol. 40 (12-14) · pp. 1435-1466
Author(s): Danny Driess, Jung-Su Ha, Marc Toussaint

In this article, we propose deep visual reasoning, which is a convolutional recurrent neural network that predicts discrete action sequences from an initial scene image for sequential manipulation problems that arise, for example, in task and motion planning (TAMP). Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g., first-order logic) with continuous motion planning such as nonlinear trajectory optimization. The action sequences represent the discrete decisions on a symbolic level, which, in turn, parameterize a nonlinear trajectory optimization problem. Owing to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to find a solution, which limits the scalability of these approaches. To circumvent this combinatorial complexity, we introduce deep visual reasoning: based on a segmented initial image of the scene, a neural network directly predicts promising discrete action sequences such that ideally only one motion planning problem has to be solved to find a solution to the overall TAMP problem. Our method generalizes to scenes with many and varying numbers of objects, even though it is trained on only two objects at a time. This is possible by encoding the objects of the scene and the goal in (segmented) images as input to the neural network, instead of a fixed feature vector. We show that the framework can handle not only kinematic problems such as pick-and-place (as is typical in TAMP), but also tool-use scenarios for planar pushing under quasi-static dynamic models. Here, the image-based representation enables generalization to shapes other than those seen during training. Results show runtime improvements of several orders of magnitude by, in many cases, removing the need to search over discrete action sequences.
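As a rough illustration of the convolutional-recurrent pattern described above (a sketch only; the class name, layer sizes, and action vocabulary are assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class VisualActionSequenceNet(nn.Module):
    """Convolutional encoder over a segmented scene image, recurrent decoder over
    discrete action tokens; sizes and vocabulary are illustrative."""
    def __init__(self, in_channels=3, num_actions=20, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        self.embed = nn.Embedding(num_actions, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, image, action_prefix):
        # The encoded image initialises the recurrent state; previously predicted
        # action tokens are fed back to score the next discrete action.
        h0 = self.encoder(image).unsqueeze(0)             # (1, batch, hidden)
        out, _ = self.rnn(self.embed(action_prefix), h0)  # (batch, seq, hidden)
        return self.head(out)                             # logits over next actions
```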


2021 · pp. 1-15
Author(s): Mario Hervault, Pier-Giorgio Zanone, Jean-Christophe Buisson, Raoul Huys

Most studies contributing to identifying the brain network for inhibitory control have investigated the cancelation of prepared–discrete actions, thus focusing on an isolated and short-lived chunk of human behavior. Aborting ongoing–continuous actions is an equally crucial ability but remains little explored. Although discrete and ongoing–continuous rhythmic actions are associated with partially overlapping yet largely distinct brain activations, it is unknown whether the inhibitory network operates similarly in both situations. Thus, distinguishing between action types constitutes a powerful means to investigate whether inhibition is a generic function. We therefore used independent component analysis (ICA) of EEG data and show that canceling a discrete action and aborting a rhythmic action rely on independent brain components. The ICA showed that a delta/theta power increase generically indexed inhibitory activity, whereas the N2 and P3 ERP waves did so in an action-specific fashion. The action-specific components were generated by partially distinct brain sources, which indicates that the inhibitory network is engaged differently when canceling a prepared–discrete action versus aborting an ongoing–continuous action. In particular, increased activity was estimated in the precentral gyri and posterior parts of the cingulate cortex for action canceling, whereas enhanced activity was found in more frontal gyri and anterior parts of the cingulate cortex for action aborting. Overall, the present findings support the idea that inhibitory control is implemented differently according to the type of action to be revised.
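As a generic illustration of the decomposition step only (not the authors' EEG pipeline), independent components can be extracted from multichannel data with scikit-learn's FastICA; the channel count and data below are placeholders.

```python
import numpy as np
from sklearn.decomposition import FastICA

def decompose_eeg(eeg, n_components=20, seed=0):
    """Split multichannel EEG of shape (n_times, n_channels) into independent components.

    Returns the component time courses and the mixing matrix, whose columns can be
    read as the scalp pattern of each component.
    """
    ica = FastICA(n_components=n_components, random_state=seed)
    sources = ica.fit_transform(eeg)   # (n_times, n_components)
    mixing = ica.mixing_               # (n_channels, n_components)
    return sources, mixing

# Synthetic stand-in for a 64-channel recording sampled over 5000 time points.
rng = np.random.default_rng(0)
sources, mixing = decompose_eeg(rng.standard_normal((5000, 64)))
```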


2021
Author(s): Abdeladim Sadiki, Jamal Bentahar, Rachida Dssouli, Abdeslam En-Nouaary

Multi-access Edge Computing (MEC) has recently emerged as a potential technology to serve the needs of mobile devices (MDs) in 5G and 6G cellular networks. By offloading tasks to high-performance servers installed at the edge of the wireless network, resource-limited MDs can cope with the proliferation of recent computationally intensive applications. In this paper, we study the computation offloading problem in a massive multiple-input multiple-output (MIMO)-based MEC system in which the base stations are equipped with a large number of antennas. Our objective is to minimize the power consumption and offloading delay at the MDs in a stochastic system environment. To this end, we formulate the problem as a Markov Decision Process (MDP) and propose two Deep Reinforcement Learning (DRL) strategies to learn the optimal offloading policy without any prior knowledge of the environment dynamics. First, a Deep Q-Network (DQN) strategy that tackles the curse of state-space explosion is analyzed. Then, a more general Proximal Policy Optimization (PPO) strategy that overcomes the limitations of a discrete action space is introduced. Simulation results show that the proposed DRL-based strategies outperform the baseline and state-of-the-art algorithms. Moreover, our PPO algorithm exhibits stable performance and efficient offloading results compared with the benchmark DQN strategy.
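To make the discrete-action side concrete, here is a minimal PyTorch sketch of a Q-network with epsilon-greedy selection over offloading decisions (for instance "execute locally" vs. "offload to edge server k"); the names, sizes, and state features are illustrative assumptions, not the paper's exact model.

```python
import random
import torch
import torch.nn as nn

class OffloadQNet(nn.Module):
    """Small MLP mapping a system-state vector (e.g. channel, queue, energy features)
    to Q-values over discrete offloading actions; sizes are illustrative."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_offload_action(qnet, state, epsilon, num_actions):
    # Epsilon-greedy choice among discrete offloading decisions.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(qnet(state).argmax().item())
```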


2021 · Vol. 2021 · pp. 1-23
Author(s): Yiquan Du, Xiuguo Zhang, Zhiying Cao, Shaobo Wang, Jiacheng Liang, ...

Deep Reinforcement Learning (DRL) is widely used in path planning owing to its powerful neural-network fitting and learning abilities. However, existing DRL-based methods use a discrete action space and do not consider the impact of historical state information, so the algorithm cannot learn the optimal path-planning strategy, and the planned paths contain arcs or too many corners, which does not meet a ship's actual sailing requirements. In this paper, an optimized path planning method for coastal ships based on an improved Deep Deterministic Policy Gradient (DDPG) and the Douglas–Peucker (DP) algorithm is proposed. Firstly, Long Short-Term Memory (LSTM) is used to improve the network structure of DDPG: historical state information is used to approximate the current environmental state, so that the predicted action is more accurate. In addition, the traditional reward function of DDPG may lead to low learning efficiency and slow model convergence. Hence, this paper improves the reward principle of traditional DDPG with a mainline reward function and an auxiliary reward function, which not only helps plan a better path for the ship but also improves the convergence speed of the model. Secondly, because too many turning points in the planned path may increase navigation risk, an improved DP algorithm is proposed to further optimize the planned path and make the final path safer and more economical. Finally, simulation experiments are carried out to verify the proposed method in terms of path planning effect and convergence trend. Results show that the proposed method can plan safe and economical navigation paths and has good stability and convergence.
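For reference, a compact sketch of the classic Douglas–Peucker simplification that the improved DP step builds on (this is the textbook version, not the paper's improved variant; the epsilon threshold and the sample path are placeholders):

```python
import numpy as np

def point_segment_deviation(p, a, b):
    """Perpendicular distance from 2-D waypoint p to the line through a and b."""
    if np.allclose(a, b):
        return float(np.linalg.norm(p - a))
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return float(abs(cross) / np.linalg.norm(b - a))

def douglas_peucker(points, epsilon):
    """Keep a waypoint only if it deviates more than epsilon from the straight
    line between the retained end points; applied recursively."""
    points = [np.asarray(p, dtype=float) for p in points]
    deviations = [point_segment_deviation(p, points[0], points[-1]) for p in points[1:-1]]
    if not deviations or max(deviations) <= epsilon:
        return [points[0], points[-1]]
    split = int(np.argmax(deviations)) + 1
    left = douglas_peucker(points[: split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right   # drop the duplicated split point

# Example: remove near-collinear waypoints from a planned path.
path = [(0, 0), (1, 0.05), (2, -0.04), (3, 2.0), (4, 2.1)]
print(douglas_peucker(path, epsilon=0.5))
```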


AI · 2021 · Vol. 2 (3) · pp. 366-382
Author(s): Zhihan Xue, Tad Gonsalves

Research on autonomous obstacle avoidance for drones has recently received widespread attention. An increasing number of researchers use machine learning to train drones, typically adopting supervised learning or reinforcement learning to train the networks. Supervised learning has the disadvantage that building datasets takes a significant amount of time, because it is difficult to cover the complex and changeable drone flight environment in a single dataset. Reinforcement learning can overcome this problem by letting drones learn from data gathered in the environment. However, current results based on reinforcement learning mainly focus on discrete action spaces; as a result, drone movement lacks precision and the flying behavior is somewhat unnatural. This study uses the Soft Actor-Critic (SAC) algorithm to train a drone to perform autonomous obstacle avoidance in a continuous action space using only image data. The algorithm is trained and tested in a simulation environment built with AirSim. The results show that our algorithm enables the UAV to avoid obstacles in the training environment using only the depth map as input. Moreover, it also achieves a higher obstacle-avoidance rate in a reconfigured environment without retraining.
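As a minimal illustration of the continuous-action side (a PyTorch sketch; the layer sizes, action dimensionality, and mapping to velocity/yaw commands are assumptions, not the study's exact network), a SAC-style actor maps a depth image to a squashed Gaussian action:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class DepthImageActor(nn.Module):
    """SAC-style actor: single-channel depth map -> continuous action in [-1, 1]
    (e.g. forward-speed and yaw-rate commands); all sizes are illustrative."""
    def __init__(self, action_dim=2, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, depth_image):
        h = self.encoder(depth_image)
        dist = Normal(self.mean(h), self.log_std(h).clamp(-5, 2).exp())
        raw = dist.rsample()              # reparameterised sample for backprop
        action = torch.tanh(raw)          # squash into the continuous action range
        # Tanh-corrected log-probability, as needed for the SAC entropy term.
        log_prob = (dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)).sum(-1)
        return action, log_prob
```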


Author(s): Igor Kuznetsov, Andrey Filchenkov

Episodic memory lets reinforcement learning algorithms remember and exploit promising experience from the past to improve agent performance. Previous work on memory mechanisms shows the sample-efficiency benefits of episodic data structures for discrete-action problems. Applying episodic memory to continuous control with a large action space, however, is not trivial. Our study aims to answer the question: can episodic memory be used to improve an agent's performance in continuous control? Our proposed algorithm combines episodic memory with an Actor-Critic architecture by modifying the critic's objective. We further improve performance by introducing episodic-based replay buffer prioritization. We evaluate our algorithm on OpenAI Gym domains and show greater sample-efficiency compared with state-of-the-art model-free off-policy algorithms.
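A toy illustration of the general episodic-memory idea (not the authors' algorithm; the state discretisation and blending weight are assumptions): keep the best Monte-Carlo return seen for each state key and blend it into the critic's bootstrapped target.

```python
import numpy as np

class EpisodicReturnMemory:
    """Stores, per discretised state key, the best Monte-Carlo return observed so far,
    and blends it into a bootstrapped critic target."""
    def __init__(self, precision=1):
        self.best_return = {}
        self.precision = precision  # rounding used to discretise continuous states

    def _key(self, state):
        return tuple(np.round(np.asarray(state, dtype=float), self.precision))

    def update(self, state, mc_return):
        k = self._key(state)
        self.best_return[k] = max(self.best_return.get(k, -np.inf), mc_return)

    def augmented_target(self, state, td_target, alpha=0.1):
        # Blend the TD target with the remembered episodic return, if any.
        mem = self.best_return.get(self._key(state))
        return td_target if mem is None else (1 - alpha) * td_target + alpha * mem
```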

