policy gradient
Recently Published Documents

Total documents: 600 (five years: 427)
H-index: 17 (five years: 7)

Author(s):  
Óscar Pérez-Gil ◽  
Rafael Barea ◽  
Elena López-Guillén ◽  
Luis M. Bergasa ◽  
Carlos Gómez-Huélamo ◽  
...  

Abstract Nowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is no exception. This paper proposes the use of Deep Learning (DL) algorithms in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented and their results compared. The aim of this work is to obtain, by applying a DRL algorithm, a trained model able to send control commands to the vehicle so that it navigates properly and efficiently along a given route. In addition, for each algorithm, several agents are presented as solutions, each using different data sources to compute the vehicle control commands. For this purpose, the open-source simulator CARLA is used, providing the system with the ability to run a multitude of tests without any risk in a hyper-realistic urban simulation environment, something that would be unthinkable in the real world. The results show that both DQN and DDPG reach the goal, but DDPG achieves better performance, producing trajectories very similar to those of a classic controller such as LQR. In both cases the RMSE is lower than 0.1 m when following trajectories ranging from 180 to 700 m. Finally, conclusions and future work are discussed.
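As a rough illustration of the deterministic policy gradient update at the core of DDPG (a minimal sketch, not the authors' implementation; network sizes, dimensions, and names below are placeholders):

```python
import torch
import torch.nn as nn

# Minimal DDPG-style actor/critic; dimensions are illustrative placeholders,
# e.g. a small vehicle state vector mapped to (steering, throttle).
state_dim, action_dim = 8, 2

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def actor_update(states):
    """Deterministic policy gradient: ascend Q(s, pi(s)) w.r.t. actor params."""
    actions = actor(states)
    q = critic(torch.cat([states, actions], dim=-1))
    loss = -q.mean()              # maximize Q  <=>  minimize -Q
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()

actor_update(torch.randn(32, state_dim))  # one update on a dummy batch
```

Unlike DQN, which maximizes over a discrete set of actions, this update differentiates through the critic, which is what lets DDPG emit continuous control commands directly.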


2022 ◽  
pp. 1-20
Author(s):  
D. Xu ◽  
G. Chen

Abstract In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Since current UAV clusters are still at the program-control stage, fully autonomous and intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in a changing environment and cooperate to accomplish a combat goal, we propose a new MARL framework. It adopts the paradigm of centralised training with decentralised execution, using an Actor-Critic network to select the executed action and then produce the corresponding evaluation. The new algorithm makes three key improvements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first is an improved learning framework, which makes the computed Q value more accurate. The second is a collision-avoidance setting, which increases the operational safety factor. The third is an adjusted reward mechanism, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is clearly improved and the operational safety factor is further increased compared with the previous algorithm.
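A hedged sketch of the centralised-training/decentralised-execution structure behind MADDPG: each agent's critic sees the joint observations and actions of all agents, while each actor acts on its own observation only (shapes and names below are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 10, 2  # illustrative UAV-cluster sizes

# Decentralised execution: each actor maps its own observation to an action.
actors = [nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                        nn.Linear(64, act_dim), nn.Tanh())
          for _ in range(n_agents)]

# Centralised training: each critic scores the joint observation-action vector.
joint_dim = n_agents * (obs_dim + act_dim)
critics = [nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                         nn.Linear(128, 1))
           for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                  # one observation per agent
acts = torch.stack([a(o) for a, o in zip(actors, obs)])
joint = torch.cat([obs.flatten(), acts.flatten()])
q_values = [c(joint) for c in critics]                # per-agent centralised Q(s, a_1..a_n)
```

Because each critic conditions on all agents' actions, the Q estimate stays stationary during training even though every agent's policy keeps changing.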


Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 196
Author(s):  
Zhenshan Zhu ◽  
Zhimin Weng ◽  
Hailin Zheng

A microgrid with hydrogen storage is an effective way to integrate renewable energy and reduce carbon emissions. This paper proposes an optimal operation method for a microgrid with hydrogen storage. An electrolyzer efficiency characteristic model is established based on linear interpolation and incorporated into the optimal operation model of the microgrid. The sequential decision-making problem of optimal microgrid operation is solved with a deep deterministic policy gradient algorithm. Simulation results show that the proposed method reduces the operation cost of the microgrid by about 5% compared with traditional algorithms and exhibits a certain generalization capability.
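A minimal sketch of the linear-interpolation idea for the electrolyzer efficiency characteristic; the sample points and names below are hypothetical, not the paper's measured curve:

```python
import numpy as np

# Hypothetical efficiency curve: efficiency varies with the loading ratio
# (input power / rated power). These sample points are illustrative only.
load_ratio = np.array([0.1, 0.3, 0.5, 0.7, 1.0])
efficiency = np.array([0.55, 0.68, 0.72, 0.70, 0.64])

def electrolyzer_efficiency(p_in, p_rated):
    """Piecewise-linear interpolation of efficiency at input power p_in."""
    return np.interp(p_in / p_rated, load_ratio, efficiency)

# Hydrogen output over one step is then the efficiency-weighted input energy.
p_in, p_rated, dt = 120.0, 200.0, 1.0       # kW, kW, h (assumed units)
h2_energy = electrolyzer_efficiency(p_in, p_rated) * p_in * dt  # kWh of H2
```

Embedding this nonlinear curve in the operation model is what makes the problem awkward for classic optimizers and motivates solving it as a sequential decision problem with DDPG.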


2022 ◽  
Author(s):  
Dariel Pereira-Ruisánchez ◽  
Óscar Fresnedo ◽  
Darian Pérez-Adán ◽  
Luis Castedo

The deep reinforcement learning (DRL)-based deep deterministic policy gradient (DDPG) framework is proposed to solve the joint optimization of the IRS phase-shift matrix and the precoding matrix in an IRS-assisted multi-stream multi-user MIMO communication system.

The combination of multiple-input multiple-output (MIMO) communications and intelligent reflecting surfaces (IRSs) is foreseen as a key enabler of beyond-5G (B5G) and 6G systems. In this work, we develop an innovative deep reinforcement learning (DRL)-based approach to the joint optimization of the MIMO precoders and the IRS phase-shift matrices that proves efficient in high-dimensional systems. The proposed approach, based on the deep deterministic policy gradient (DDPG), maximizes the sum rate of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system by learning the best matrix configuration through online trial-and-error interactions. It is formulated in terms of continuous state and action spaces and a sum-rate-based reward function. Computational complexity is reduced by using artificial neural networks (ANNs) for function approximation, and the proposed solution is shown to scale better than other state-of-the-art methods while reaching competitive performance.
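One way such a formulation can look in code, as a hedged sketch: a continuous agent action in [-1, 1]^N is mapped to a unit-modulus diagonal phase-shift matrix, and the sum rate serves as the reward. All names, shapes, and the simplified single-antenna-per-user channel below are assumptions for illustration, unlike the paper's full multi-stream setup:

```python
import numpy as np

def action_to_phase_matrix(action):
    """Map a DDPG action in [-1, 1]^N to a diagonal IRS phase-shift matrix."""
    phases = np.pi * np.asarray(action)        # scale each entry to [-pi, pi]
    return np.diag(np.exp(1j * phases))        # unit-modulus diagonal Theta

def sum_rate_reward(H, Theta, G, precoders, noise=1e-3):
    """Sum rate over users: the reward signal for the DRL agent."""
    rates = []
    for k, w in enumerate(precoders):
        h_eff = H[k] @ Theta @ G               # effective channel of user k
        signal = abs(h_eff @ w) ** 2
        interf = sum(abs(h_eff @ v) ** 2 for j, v in enumerate(precoders) if j != k)
        rates.append(np.log2(1 + signal / (interf + noise)))
    return sum(rates)

N, M, K = 16, 4, 2                             # IRS elements, BS antennas, users
rng = np.random.default_rng(0)
H = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))  # IRS -> users
G = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))  # BS -> IRS
precoders = [rng.standard_normal(M) for _ in range(K)]
Theta = action_to_phase_matrix(rng.uniform(-1, 1, N))
print(sum_rate_reward(H, Theta, G, precoders))
```

The continuous action space is the reason DDPG fits here: phase shifts and precoder entries are real-valued, so no discretization of the matrix configuration is needed.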


2022 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Ning Li ◽  
Zheng Wang

In this paper, considering dual-channel retailing (an online channel and an offline channel), we study the pricing and ordering problem under different shipping policies. We mainly consider three shipping policies: without shipping price (OSP), with shipping price (WSP) and conditional free shipping (CFP). Based on the principle of maximum utility, we first derive the probability of demand in the online and offline channels and then model the pricing and ordering problem under the three shipping policies. To avoid the curse of dimensionality, the deep deterministic policy gradient (DDPG) method is employed to obtain the optimal pricing and ordering policy. Finally, we conduct numerical experiments to compare the optimal prices and ordering quantities under the three shipping policies and draw some managerial insights. The results show that the conditional free shipping policy outperforms the other two and stimulates demand, yielding more profit.
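The abstract does not spell out the demand model, but a standard way to operationalize "probability of demand from the principle of maximum utility" is a multinomial logit over channel utilities. The sketch below is one such hedged reading; the utility forms, coefficients, and names are my assumptions, not the paper's calibration:

```python
import numpy as np

def channel_probabilities(p_online, p_offline, shipping_fee, alpha=1.0, beta=0.5):
    """Logit choice among online purchase, offline purchase, and no purchase."""
    u_online = -alpha * (p_online + shipping_fee)   # shipping fee lowers online utility
    u_offline = -alpha * p_offline - beta           # beta: assumed offline travel cost
    u_none = 0.0                                    # outside option: no purchase
    u = np.array([u_online, u_offline, u_none])
    expu = np.exp(u - u.max())                      # numerically stable softmax
    return expu / expu.sum()                        # P(online), P(offline), P(none)

# Under conditional free shipping, the fee would drop to zero above a threshold:
print(channel_probabilities(10.0, 11.0, shipping_fee=1.0))
print(channel_probabilities(10.0, 11.0, shipping_fee=0.0))
```

Under a model of this shape, conditional free shipping shifts probability mass toward the online channel once the order qualifies, which is consistent with the demand-stimulation effect the paper reports.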


2021 ◽  
Vol 1 (2) ◽  
pp. 33-39
Author(s):  
Mónika Farsang ◽  
Luca Szegletes

Learning optimal behavior is the ultimate goal in reinforcement learning. This can be achieved by many different approaches, the most successful of which are policy gradient methods. However, they can suffer from undesirably large policy updates, leading to poor performance. In recent years there has been a clear trend toward designing more reliable algorithms. This paper examines different restriction strategies applied to the widely used Proximal Policy Optimization (PPO-Clip) technique. We also ask whether the analyzed methods can adapt not only to low-dimensional tasks but also to complex, high-dimensional problems in control and robotic domains. The analysis of the learned behavior shows that these methods can outperform the original PPO-Clip algorithm and are also able to achieve complex behavior and policies in high-dimensional environments.
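For reference, the baseline restriction strategy being compared against is the PPO-Clip surrogate objective itself, sketched minimally below (a standard formulation, not the paper's specific variants):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective of PPO-Clip, returned as a loss to minimize.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps], which bounds how far a single update can move
    the policy and so prevents the undesirably large updates mentioned above.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Dummy batch; in practice these come from rollouts under the old policy.
loss = ppo_clip_loss(torch.randn(64), torch.randn(64), torch.randn(64))
```

Alternative restriction strategies of the kind the paper studies typically replace this clipping with other ways of penalizing or bounding the ratio, while keeping the same surrogate structure.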

