policy gradient
Recently Published Documents

Total documents: 600 (five years: 427)
H-index: 17 (five years: 7)

Author(s):  
Óscar Pérez-Gil ◽  
Rafael Barea ◽  
Elena López-Guillén ◽  
Luis M. Bergasa ◽  
Carlos Gómez-Huélamo ◽  
...  

Abstract Nowadays, Artificial Intelligence (AI) is growing by leaps and bounds in almost all fields of technology, and Autonomous Vehicles (AV) research is no exception. This paper proposes the use of Deep Learning (DL) algorithms in the control layer of an autonomous vehicle. More specifically, Deep Reinforcement Learning (DRL) algorithms such as Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) are implemented and their results compared. The aim of this work is to obtain, by applying a DRL algorithm, a trained model able to send control commands to the vehicle so that it navigates properly and efficiently along a given route. In addition, for each algorithm, several agents are presented as solutions, each using different data sources to compute the vehicle control commands. For this purpose, the open-source simulator CARLA is used, providing the system with the ability to run a multitude of tests without any risk in a hyper-realistic urban simulation environment, something that would be unthinkable in the real world. The results show that both DQN and DDPG reach the goal, but DDPG achieves better performance, producing trajectories very similar to those of a classic controller such as LQR. In both cases the RMSE is lower than 0.1 m when following trajectories ranging from 180 to 700 m. Finally, conclusions and future work are discussed.
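As a rough illustration of the deterministic policy gradient update at the core of DDPG (a minimal sketch, not the authors' implementation; network sizes, dimensions, and names below are placeholders):

```python
import torch
import torch.nn as nn

# Minimal DDPG-style actor/critic; dimensions are illustrative placeholders,
# e.g. a small vehicle state vector mapped to (steering, throttle).
state_dim, action_dim = 8, 2

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def actor_update(states):
    """Deterministic policy gradient: ascend Q(s, pi(s)) w.r.t. actor params."""
    actions = actor(states)
    q = critic(torch.cat([states, actions], dim=-1))
    loss = -q.mean()              # maximize Q  <=>  minimize -Q
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()

actor_update(torch.randn(32, state_dim))  # one update on a dummy batch
```

Unlike DQN, which maximizes over a discrete set of actions, this update differentiates through the critic, which is what lets DDPG emit continuous control commands directly.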


2022 ◽  
pp. 1-20
Author(s):  
D. Xu ◽  
G. Chen

Abstract In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Since current UAV clusters are still at the program-control stage, fully autonomous and intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in a changing environment and cooperate to accomplish a combat goal, we propose a new MARL framework. It adopts the paradigm of centralised training with decentralised execution, using an Actor-Critic network to select the executed action and then produce the corresponding evaluation. The new algorithm makes three key improvements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first is an improved learning framework, which makes the computed Q value more accurate. The second is a collision-avoidance setting, which increases the operational safety factor. The third is an adjusted reward mechanism, which effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is clearly improved and the operational safety factor is further increased compared with the previous algorithm.
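A hedged sketch of the centralised-training/decentralised-execution structure behind MADDPG: each agent's critic sees the joint observations and actions of all agents, while each actor acts on its own observation only (shapes and names below are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 10, 2  # illustrative UAV-cluster sizes

# Decentralised execution: each actor maps its own observation to an action.
actors = [nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                        nn.Linear(64, act_dim), nn.Tanh())
          for _ in range(n_agents)]

# Centralised training: each critic scores the joint observation-action vector.
joint_dim = n_agents * (obs_dim + act_dim)
critics = [nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                         nn.Linear(128, 1))
           for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                  # one observation per agent
acts = torch.stack([a(o) for a, o in zip(actors, obs)])
joint = torch.cat([obs.flatten(), acts.flatten()])
q_values = [c(joint) for c in critics]                # per-agent centralised Q(s, a_1..a_n)
```

Because each critic conditions on all agents' actions, the Q estimate stays stationary during training even though every agent's policy keeps changing.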


Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 196
Author(s):  
Zhenshan Zhu ◽  
Zhimin Weng ◽  
Hailin Zheng

A microgrid with hydrogen storage is an effective way to integrate renewable energy and reduce carbon emissions. This paper proposes an optimal operation method for a microgrid with hydrogen storage. An electrolyzer efficiency characteristic model is established based on linear interpolation and incorporated into the optimal operation model of the microgrid. The sequential decision-making problem of optimal microgrid operation is solved with a deep deterministic policy gradient algorithm. Simulation results show that the proposed method reduces the operation cost of the microgrid by about 5% compared with traditional algorithms and exhibits a certain generalization capability.
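A minimal sketch of the linear-interpolation idea for the electrolyzer efficiency characteristic; the sample points and names below are hypothetical, not the paper's measured curve:

```python
import numpy as np

# Hypothetical efficiency curve: efficiency varies with the loading ratio
# (input power / rated power). These sample points are illustrative only.
load_ratio = np.array([0.1, 0.3, 0.5, 0.7, 1.0])
efficiency = np.array([0.55, 0.68, 0.72, 0.70, 0.64])

def electrolyzer_efficiency(p_in, p_rated):
    """Piecewise-linear interpolation of efficiency at input power p_in."""
    return np.interp(p_in / p_rated, load_ratio, efficiency)

# Hydrogen output over one step is then the efficiency-weighted input energy.
p_in, p_rated, dt = 120.0, 200.0, 1.0       # kW, kW, h (assumed units)
h2_energy = electrolyzer_efficiency(p_in, p_rated) * p_in * dt  # kWh of H2
```

Embedding this nonlinear curve in the operation model is what makes the problem awkward for classic optimizers and motivates solving it as a sequential decision problem with DDPG.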


2022 ◽  
Author(s):  
Dariel Pereira-Ruisánchez ◽  
Óscar Fresnedo ◽  
Darian Pérez-Adán ◽  
Luis Castedo

The deep reinforcement learning (DRL)-based deep deterministic policy gradient (DDPG) framework is proposed to solve the joint optimization of the IRS phase-shift matrix and the precoding matrix in an IRS-assisted multi-stream multi-user MIMO communication system.

The combination of multiple-input multiple-output (MIMO) communications and intelligent reflecting surfaces (IRSs) is foreseen as a key enabler of beyond-5G (B5G) and 6G systems. In this work, we develop an innovative deep reinforcement learning (DRL)-based approach to the joint optimization of the MIMO precoders and the IRS phase-shift matrices that proves efficient in high-dimensional systems. The proposed approach, based on the deep deterministic policy gradient (DDPG), maximizes the sum rate of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system by learning the best matrix configuration through online trial-and-error interactions. It is formulated in terms of continuous state and action spaces and a sum-rate-based reward function. Computational complexity is reduced by using artificial neural networks (ANNs) for function approximation, and the proposed solution is shown to scale better than other state-of-the-art methods while reaching competitive performance.
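One way such a formulation can look in code, as a hedged sketch: a continuous agent action in [-1, 1]^N is mapped to a unit-modulus diagonal phase-shift matrix, and the sum rate serves as the reward. All names, shapes, and the simplified single-antenna-per-user channel below are assumptions for illustration, unlike the paper's full multi-stream setup:

```python
import numpy as np

def action_to_phase_matrix(action):
    """Map a DDPG action in [-1, 1]^N to a diagonal IRS phase-shift matrix."""
    phases = np.pi * np.asarray(action)        # scale each entry to [-pi, pi]
    return np.diag(np.exp(1j * phases))        # unit-modulus diagonal Theta

def sum_rate_reward(H, Theta, G, precoders, noise=1e-3):
    """Sum rate over users: the reward signal for the DRL agent."""
    rates = []
    for k, w in enumerate(precoders):
        h_eff = H[k] @ Theta @ G               # effective channel of user k
        signal = abs(h_eff @ w) ** 2
        interf = sum(abs(h_eff @ v) ** 2 for j, v in enumerate(precoders) if j != k)
        rates.append(np.log2(1 + signal / (interf + noise)))
    return sum(rates)

N, M, K = 16, 4, 2                             # IRS elements, BS antennas, users
rng = np.random.default_rng(0)
H = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))  # IRS -> users
G = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))  # BS -> IRS
precoders = [rng.standard_normal(M) for _ in range(K)]
Theta = action_to_phase_matrix(rng.uniform(-1, 1, N))
print(sum_rate_reward(H, Theta, G, precoders))
```

The continuous action space is the reason DDPG fits here: phase shifts and precoder entries are real-valued, so no discretization of the matrix configuration is needed.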


2022 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Ning Li ◽  
Zheng Wang

In this paper, considering dual-channel retailing (an online channel and an offline channel), we study the pricing and ordering problem under different shipping policies. We mainly consider three shipping policies: without shipping price (OSP), with shipping price (WSP) and conditional free shipping (CFP). Based on the principle of maximum utility, we first derive the probability of demand in the online and offline channels and then model the pricing and ordering problem under the three shipping policies. To avoid the curse of dimensionality, the deep deterministic policy gradient (DDPG) method is employed to obtain the optimal pricing and ordering policy. Finally, we conduct numerical experiments to compare the optimal prices and ordering quantities under the three shipping policies and draw some managerial insights. The results show that the conditional free shipping policy outperforms the other two and stimulates demand, yielding more profit.
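The abstract does not spell out the demand model, but a standard way to operationalize "probability of demand from the principle of maximum utility" is a multinomial logit over channel utilities. The sketch below is one such hedged reading; the utility forms, coefficients, and names are my assumptions, not the paper's calibration:

```python
import numpy as np

def channel_probabilities(p_online, p_offline, shipping_fee, alpha=1.0, beta=0.5):
    """Logit choice among online purchase, offline purchase, and no purchase."""
    u_online = -alpha * (p_online + shipping_fee)   # shipping fee lowers online utility
    u_offline = -alpha * p_offline - beta           # beta: assumed offline travel cost
    u_none = 0.0                                    # outside option: no purchase
    u = np.array([u_online, u_offline, u_none])
    expu = np.exp(u - u.max())                      # numerically stable softmax
    return expu / expu.sum()                        # P(online), P(offline), P(none)

# Under conditional free shipping, the fee would drop to zero above a threshold:
print(channel_probabilities(10.0, 11.0, shipping_fee=1.0))
print(channel_probabilities(10.0, 11.0, shipping_fee=0.0))
```

Under a model of this shape, conditional free shipping shifts probability mass toward the online channel once the order qualifies, which is consistent with the demand-stimulation effect the paper reports.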


2021 ◽  
Vol 1 (2) ◽  
pp. 33-39
Author(s):  
Mónika Farsang ◽  
Luca Szegletes

Learning optimal behavior is the ultimate goal in reinforcement learning. This can be achieved by many different approaches, the most successful of which are policy gradient methods. However, they can suffer from undesirably large policy updates, leading to poor performance. In recent years there has been a clear trend toward designing more reliable algorithms. This paper examines different restriction strategies applied to the widely used Proximal Policy Optimization (PPO-Clip) technique. We also ask whether the analyzed methods can adapt not only to low-dimensional tasks but also to complex, high-dimensional problems in control and robotic domains. The analysis of the learned behavior shows that these methods can outperform the original PPO-Clip algorithm and are also able to achieve complex behavior and policies in high-dimensional environments.
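For reference, the baseline restriction strategy being compared against is the PPO-Clip surrogate objective itself, sketched minimally below (a standard formulation, not the paper's specific variants):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective of PPO-Clip, returned as a loss to minimize.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - eps, 1 + eps], which bounds how far a single update can move
    the policy and so prevents the undesirably large updates mentioned above.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Dummy batch; in practice these come from rollouts under the old policy.
loss = ppo_clip_loss(torch.randn(64), torch.randn(64), torch.randn(64))
```

Alternative restriction strategies of the kind the paper studies typically replace this clipping with other ways of penalizing or bounding the ratio, while keeping the same surrogate structure.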

