discrete action
Recently Published Documents

TOTAL DOCUMENTS: 78 (FIVE YEARS: 41)
H-INDEX: 12 (FIVE YEARS: 3)

2021 · pp. 1-10
Author(s): Wei Zhou, Xing Jiang, Bingli Guo (Member, IEEE), Lingyu Meng

Currently, Quality-of-Service (QoS)-aware routing is one of the crucial challenges in Software Defined Networks (SDN). QoS metrics such as latency, packet loss ratio, and throughput must be optimized to improve overall network performance. Traditional static routing algorithms based on Open Shortest Path First (OSPF) cannot adapt to traffic fluctuations, which may cause severe network congestion and service degradation. The central intelligence of the SDN controller and recent breakthroughs in Deep Reinforcement Learning (DRL) offer a promising way to tackle this challenge. Thus, we propose an on-policy DRL mechanism, namely the PPO-based (Proximal Policy Optimization) QoS-aware Routing Optimization Mechanism (PQROM), to achieve general and re-customizable routing optimization. PQROM can dynamically update the routing calculation by adjusting the reward function according to different optimization objectives, and it is independent of any specific network pattern. Additionally, as a black-box one-step optimization, PQROM is suitable for both continuous and discrete action spaces with high-dimensional input and output. OMNeT++ simulation results show that PQROM not only converges well but also offers better stability than OSPF, requires less training time and simpler hyper-parameter tuning than Deep Deterministic Policy Gradient (DDPG), and consumes less hardware than Asynchronous Advantage Actor-Critic (A3C).
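For reference, a minimal PyTorch sketch of the clipped PPO surrogate that an on-policy mechanism like PQROM builds on, together with a hypothetical QoS reward; the weights and the exact latency/loss/throughput terms are illustrative assumptions, not the paper's reward function.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated policy and the behaviour policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximises the clipped surrogate; return its negation as a loss to minimise.
    return -torch.min(unclipped, clipped).mean()

def qos_reward(latency, loss_ratio, throughput, weights=(1.0, 1.0, 1.0)):
    # Hypothetical weighted QoS reward: penalise latency and packet loss, favour throughput.
    # Re-weighting these terms is one way a reward function could be re-customised per objective.
    return -weights[0] * latency - weights[1] * loss_ratio + weights[2] * throughput
```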


2021 · Vol. 40 (12-14) · pp. 1435-1466
Author(s): Danny Driess, Jung-Su Ha, Marc Toussaint

In this article, we propose deep visual reasoning, which is a convolutional recurrent neural network that predicts discrete action sequences from an initial scene image for sequential manipulation problems that arise, for example, in task and motion planning (TAMP). Typical TAMP problems are formalized by combining reasoning on a symbolic, discrete level (e.g., first-order logic) with continuous motion planning such as nonlinear trajectory optimization. The action sequences represent the discrete decisions on a symbolic level, which, in turn, parameterize a nonlinear trajectory optimization problem. Owing to the great combinatorial complexity of possible discrete action sequences, a large number of optimization/motion planning problems have to be solved to find a solution, which limits the scalability of these approaches. To circumvent this combinatorial complexity, we introduce deep visual reasoning: based on a segmented initial image of the scene, a neural network directly predicts promising discrete action sequences such that ideally only one motion planning problem has to be solved to find a solution to the overall TAMP problem. Our method generalizes to scenes with many and varying numbers of objects, even though it is trained on only two objects at a time. This is possible by encoding the objects of the scene and the goal in (segmented) images as input to the neural network, instead of a fixed feature vector. We show that the framework can handle not only kinematic problems such as pick-and-place (as is typical in TAMP), but also tool-use scenarios for planar pushing under quasi-static dynamic models. Here, the image-based representation enables generalization to shapes other than those seen during training. Results show runtime improvements of several orders of magnitude by, in many cases, removing the need to search over discrete action sequences.
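As a rough illustration of the convolutional-recurrent pattern described above (a sketch only; the class name, layer sizes, and action vocabulary are assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class VisualActionSequenceNet(nn.Module):
    """Convolutional encoder over a segmented scene image, recurrent decoder over
    discrete action tokens; sizes and vocabulary are illustrative."""
    def __init__(self, in_channels=3, num_actions=20, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        self.embed = nn.Embedding(num_actions, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, image, action_prefix):
        # The encoded image initialises the recurrent state; previously predicted
        # action tokens are fed back to score the next discrete action.
        h0 = self.encoder(image).unsqueeze(0)             # (1, batch, hidden)
        out, _ = self.rnn(self.embed(action_prefix), h0)  # (batch, seq, hidden)
        return self.head(out)                             # logits over next actions
```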


2021 · pp. 1-15
Author(s): Mario Hervault, Pier-Giorgio Zanone, Jean-Christophe Buisson, Raoul Huys

Most studies contributing to identifying the brain network for inhibitory control have investigated the cancelation of prepared–discrete actions, thus focusing on an isolated and short-lived chunk of human behavior. Aborting ongoing–continuous actions is an equally crucial ability but remains little explored. Although discrete and ongoing–continuous rhythmic actions are associated with partially overlapping yet largely distinct brain activations, it is unknown whether the inhibitory network operates similarly in both situations. Thus, distinguishing between action types constitutes a powerful means to investigate whether inhibition is a generic function. We therefore used independent component analysis (ICA) of EEG data and show that canceling a discrete action and aborting a rhythmic action rely on independent brain components. The ICA showed that a delta/theta power increase generically indexed inhibitory activity, whereas the N2 and P3 ERP waves did so in an action-specific fashion. The action-specific components were generated by partially distinct brain sources, which indicates that the inhibitory network is engaged differently when canceling a prepared–discrete action versus aborting an ongoing–continuous action. In particular, increased activity was estimated in the precentral gyri and posterior parts of the cingulate cortex for action canceling, whereas enhanced activity was found in more frontal gyri and anterior parts of the cingulate cortex for action aborting. Overall, the present findings support the idea that inhibitory control is implemented differently according to the type of action to be revised.
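As a generic illustration of the decomposition step only (not the authors' EEG pipeline), independent components can be extracted from multichannel data with scikit-learn's FastICA; the channel count and data below are placeholders.

```python
import numpy as np
from sklearn.decomposition import FastICA

def decompose_eeg(eeg, n_components=20, seed=0):
    """Split multichannel EEG of shape (n_times, n_channels) into independent components.

    Returns the component time courses and the mixing matrix, whose columns can be
    read as the scalp pattern of each component.
    """
    ica = FastICA(n_components=n_components, random_state=seed)
    sources = ica.fit_transform(eeg)   # (n_times, n_components)
    mixing = ica.mixing_               # (n_channels, n_components)
    return sources, mixing

# Synthetic stand-in for a 64-channel recording sampled over 5000 time points.
rng = np.random.default_rng(0)
sources, mixing = decompose_eeg(rng.standard_normal((5000, 64)))
```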


2021
Author(s): Abdeladim Sadiki, Jamal Bentahar, Rachida Dssouli, Abdeslam En-Nouaary

Multi-access Edge Computing (MEC) has recently emerged as a potential technology to serve the needs of mobile devices (MDs) in 5G and 6G cellular networks. By offloading tasks to high-performance servers installed at the edge of the wireless network, resource-limited MDs can cope with the proliferation of recent computationally intensive applications. In this paper, we study the computation offloading problem in a massive multiple-input multiple-output (MIMO)-based MEC system in which the base stations are equipped with a large number of antennas. Our objective is to minimize the power consumption and offloading delay at the MDs in a stochastic system environment. To this end, we formulate the problem as a Markov Decision Process (MDP) and propose two Deep Reinforcement Learning (DRL) strategies to learn the optimal offloading policy without any prior knowledge of the environment dynamics. First, a Deep Q-Network (DQN) strategy that tackles the curse of state-space explosion is analyzed. Then, a more general Proximal Policy Optimization (PPO) strategy that overcomes the limitations of a discrete action space is introduced. Simulation results show that the proposed DRL-based strategies outperform the baseline and state-of-the-art algorithms. Moreover, our PPO algorithm exhibits stable performance and efficient offloading results compared with the benchmark DQN strategy.
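To make the discrete-action side concrete, here is a minimal PyTorch sketch of a Q-network with epsilon-greedy selection over offloading decisions (for instance "execute locally" vs. "offload to edge server k"); the names, sizes, and state features are illustrative assumptions, not the paper's exact model.

```python
import random
import torch
import torch.nn as nn

class OffloadQNet(nn.Module):
    """Small MLP mapping a system-state vector (e.g. channel, queue, energy features)
    to Q-values over discrete offloading actions; sizes are illustrative."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def select_offload_action(qnet, state, epsilon, num_actions):
    # Epsilon-greedy choice among discrete offloading decisions.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(qnet(state).argmax().item())
```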


2021 · Vol. 2021 · pp. 1-23
Author(s): Yiquan Du, Xiuguo Zhang, Zhiying Cao, Shaobo Wang, Jiacheng Liang, ...

Deep Reinforcement Learning (DRL) is widely used in path planning owing to its powerful neural-network fitting and learning abilities. However, existing DRL-based methods use a discrete action space and do not consider the impact of historical state information, so the algorithm cannot learn the optimal path-planning strategy, and the planned paths contain arcs or too many corners, which does not meet a ship's actual sailing requirements. In this paper, an optimized path planning method for coastal ships based on an improved Deep Deterministic Policy Gradient (DDPG) and the Douglas–Peucker (DP) algorithm is proposed. Firstly, Long Short-Term Memory (LSTM) is used to improve the network structure of DDPG: historical state information is used to approximate the current environmental state, so that the predicted action is more accurate. In addition, the traditional reward function of DDPG may lead to low learning efficiency and slow model convergence. Hence, this paper improves the reward principle of traditional DDPG with a mainline reward function and an auxiliary reward function, which not only helps plan a better path for the ship but also improves the convergence speed of the model. Secondly, because too many turning points in the planned path may increase navigation risk, an improved DP algorithm is proposed to further optimize the planned path and make the final path safer and more economical. Finally, simulation experiments are carried out to verify the proposed method in terms of path planning effect and convergence trend. Results show that the proposed method can plan safe and economical navigation paths and has good stability and convergence.
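For reference, a compact sketch of the classic Douglas–Peucker simplification that the improved DP step builds on (this is the textbook version, not the paper's improved variant; the epsilon threshold and the sample path are placeholders):

```python
import numpy as np

def point_segment_deviation(p, a, b):
    """Perpendicular distance from 2-D waypoint p to the line through a and b."""
    if np.allclose(a, b):
        return float(np.linalg.norm(p - a))
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return float(abs(cross) / np.linalg.norm(b - a))

def douglas_peucker(points, epsilon):
    """Keep a waypoint only if it deviates more than epsilon from the straight
    line between the retained end points; applied recursively."""
    points = [np.asarray(p, dtype=float) for p in points]
    deviations = [point_segment_deviation(p, points[0], points[-1]) for p in points[1:-1]]
    if not deviations or max(deviations) <= epsilon:
        return [points[0], points[-1]]
    split = int(np.argmax(deviations)) + 1
    left = douglas_peucker(points[: split + 1], epsilon)
    right = douglas_peucker(points[split:], epsilon)
    return left[:-1] + right   # drop the duplicated split point

# Example: remove near-collinear waypoints from a planned path.
path = [(0, 0), (1, 0.05), (2, -0.04), (3, 2.0), (4, 2.1)]
print(douglas_peucker(path, epsilon=0.5))
```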


AI · 2021 · Vol. 2 (3) · pp. 366-382
Author(s): Zhihan Xue, Tad Gonsalves

Research on autonomous obstacle avoidance for drones has recently received widespread attention. An increasing number of researchers use machine learning to train drones, typically adopting supervised learning or reinforcement learning to train the networks. Supervised learning has the disadvantage that building datasets takes a significant amount of time, because it is difficult to cover the complex and changeable drone flight environment in a single dataset. Reinforcement learning can overcome this problem by letting drones learn from data gathered in the environment. However, current results based on reinforcement learning mainly focus on discrete action spaces; as a result, drone movement lacks precision and the flying behavior is somewhat unnatural. This study uses the Soft Actor-Critic (SAC) algorithm to train a drone to perform autonomous obstacle avoidance in a continuous action space using only image data. The algorithm is trained and tested in a simulation environment built with AirSim. The results show that our algorithm enables the UAV to avoid obstacles in the training environment using only the depth map as input. Moreover, it also achieves a higher obstacle-avoidance rate in a reconfigured environment without retraining.
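As a minimal illustration of the continuous-action side (a PyTorch sketch; the layer sizes, action dimensionality, and mapping to velocity/yaw commands are assumptions, not the study's exact network), a SAC-style actor maps a depth image to a squashed Gaussian action:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class DepthImageActor(nn.Module):
    """SAC-style actor: single-channel depth map -> continuous action in [-1, 1]
    (e.g. forward-speed and yaw-rate commands); all sizes are illustrative."""
    def __init__(self, action_dim=2, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, depth_image):
        h = self.encoder(depth_image)
        dist = Normal(self.mean(h), self.log_std(h).clamp(-5, 2).exp())
        raw = dist.rsample()              # reparameterised sample for backprop
        action = torch.tanh(raw)          # squash into the continuous action range
        # Tanh-corrected log-probability, as needed for the SAC entropy term.
        log_prob = (dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)).sum(-1)
        return action, log_prob
```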


Author(s): Igor Kuznetsov, Andrey Filchenkov

Episodic memory lets reinforcement learning algorithms remember and exploit promising experience from the past to improve agent performance. Previous work on memory mechanisms shows the sample-efficiency benefits of episodic data structures for discrete-action problems. Applying episodic memory to continuous control with a large action space, however, is not trivial. Our study aims to answer the question: can episodic memory be used to improve an agent's performance in continuous control? Our proposed algorithm combines episodic memory with an Actor-Critic architecture by modifying the critic's objective. We further improve performance by introducing episodic-based replay buffer prioritization. We evaluate our algorithm on OpenAI Gym domains and show greater sample-efficiency compared with state-of-the-art model-free off-policy algorithms.
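A toy illustration of the general episodic-memory idea (not the authors' algorithm; the state discretisation and blending weight are assumptions): keep the best Monte-Carlo return seen for each state key and blend it into the critic's bootstrapped target.

```python
import numpy as np

class EpisodicReturnMemory:
    """Stores, per discretised state key, the best Monte-Carlo return observed so far,
    and blends it into a bootstrapped critic target."""
    def __init__(self, precision=1):
        self.best_return = {}
        self.precision = precision  # rounding used to discretise continuous states

    def _key(self, state):
        return tuple(np.round(np.asarray(state, dtype=float), self.precision))

    def update(self, state, mc_return):
        k = self._key(state)
        self.best_return[k] = max(self.best_return.get(k, -np.inf), mc_return)

    def augmented_target(self, state, td_target, alpha=0.1):
        # Blend the TD target with the remembered episodic return, if any.
        mem = self.best_return.get(self._key(state))
        return td_target if mem is None else (1 - alpha) * td_target + alpha * mem
```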

