policy optimization
Recently Published Documents

TOTAL DOCUMENTS: 387 (254 in the last five years)
H-INDEX: 17 (4 in the last five years)

Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 161
Author(s):  
Hyojoon Han ◽  
Hyukho Kim ◽  
Yangwoo Kim

The complexity of network intrusion detection systems (IDSs) is increasing due to continuous growth in network traffic, the variety of attacks, and the ever-changing network environment. In addition, network traffic is asymmetric, containing few attack samples, yet those samples are complex enough to be difficult to detect. Many studies have sought to improve intrusion detection performance through feature engineering. These approaches perform well on the datasets they were designed for, but they struggle to cope with a changing network environment. This paper proposes an intrusion detection hyperparameter control system (IDHCS) that controls and trains a deep neural network (DNN) feature extractor and a k-means clustering module as a reinforcement learning model based on proximal policy optimization (PPO). The IDHCS steers the DNN feature extractor toward the features most valuable in the current network environment and identifies intrusions through k-means clustering. Through iterative learning with the PPO-based reinforcement learning model, the system automatically optimizes its performance for the network environment in which it is deployed. Experiments were conducted on the CICIDS2017 and UNSW-NB15 datasets, yielding an F1-score of 0.96552 on CICIDS2017 and 0.94268 on UNSW-NB15. A further experiment merged the two datasets to build a larger and more complex test environment, making the attack types more diverse and their patterns more complex. On the merged dataset, an F1-score of 0.93567 was achieved, corresponding to 97% to 99% of the performance obtained on CICIDS2017 and UNSW-NB15 individually. These results show that the proposed IDHCS improves IDS performance by continuously learning new types of attacks and managing intrusion detection features regardless of changes in the network environment.
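
As a rough, hypothetical sketch of the control loop this abstract describes (not the authors' code), the snippet below lets a PPO-style controller choose two hyperparameters, the number of extracted features and the number of k-means clusters, and returns the resulting F1-score as the reward. The PCA stand-in for the DNN feature extractor, the majority-vote cluster labeling, and all function names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score

def extract_features(X, n_features):
    # Stand-in for the DNN feature extractor described in the abstract.
    return PCA(n_components=n_features).fit_transform(X)

def idhcs_step(X, y, action):
    # `action` is the hyperparameter pair chosen by the PPO controller.
    n_features, n_clusters = action
    Z = extract_features(X, n_features)
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
    # Flag a cluster as "attack" if attack samples dominate it (simplified rule).
    preds = np.zeros_like(y)
    for c in range(n_clusters):
        mask = clusters == c
        if mask.any() and y[mask].mean() > 0.5:
            preds[mask] = 1
    return f1_score(y, preds)  # reward fed back to the PPO controller

# Example call with synthetic data (200 flows, 20 raw features, ~10% attacks).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (rng.random(200) < 0.1).astype(int)
reward = idhcs_step(X, y, action=(5, 8))
```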


2022 ◽  
Author(s):  
Shuhuan Wen ◽  
Zhixin Ji ◽  
Ahmad B. Rad ◽  
Zhengzheng Guo

Abstract Exploration of unknown environments remains a great challenge for autonomous mobile robots due to the lack of a priori knowledge. Active Simultaneous Localization and Mapping (SLAM) is an effective way to realize obstacle avoidance and autonomous navigation, but traditional Active SLAM is usually complex to model and difficult to adapt automatically to new operating areas. This paper presents a novel Active SLAM algorithm based on Deep Reinforcement Learning (DRL). A Relational Proximal Policy Optimization (RPPO) model with depthwise separable convolutions and data batch processing predicts the action policy from acquired RGB images of the environment, enabling autonomous, collision-free exploration. Meanwhile, Gmapping is applied for localization and mapping. Then, using transfer learning, the Active SLAM algorithm is applied to complex unknown environments containing various dynamic and static obstacles. Finally, we present several experiments to demonstrate the advantages and feasibility of the proposed Active SLAM algorithm.
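
For readers unfamiliar with that building block, the minimal PyTorch sketch below (an implementation assumption, not code from the paper) shows a depthwise separable convolution: a per-channel depthwise convolution followed by a 1x1 pointwise convolution that mixes the channels.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   stride=stride, padding=padding, groups=in_ch)
        # Pointwise: 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Example: encode a batch of RGB observations for the policy network.
x = torch.randn(8, 3, 84, 84)
features = DepthwiseSeparableConv(3, 32)(x)  # -> shape (8, 32, 84, 84)
```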


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have been applied successfully to practical decision-making problems such as Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only past experience but also predictions of future states. It adds next-state information to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method, optimizing the policy with two components: the PPO loss and a model-based loss. The latter is used to train a latent transition model that predicts the representation of the next state. Evaluated across 49 Atari games in the Arcade Learning Environment (ALE), PPOMM outperforms the state-of-the-art PPO algorithm on most games; the experimental results show that it performs as well as or better than the original algorithm in 33 games.
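
A hedged sketch of the loss combination this abstract describes is shown below: the standard PPO clipped surrogate plus a prediction error from a latent transition model. The weighting coefficient beta, the tensor shapes, and all names are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def ppomm_loss(new_logp, old_logp, advantages,
               predicted_next_latent, target_next_latent,
               clip_eps=0.2, beta=0.5):
    # Standard PPO clipped surrogate (to be minimized, hence the negation).
    ratio = torch.exp(new_logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    # Model-based term: error of the latent transition model's prediction
    # of the next state's representation.
    model_loss = F.mse_loss(predicted_next_latent, target_next_latent)
    return policy_loss + beta * model_loss

# Example with dummy tensors (batch of 64 transitions, 32-dim latent state).
T, D = 64, 32
loss = ppomm_loss(torch.randn(T), torch.randn(T), torch.randn(T),
                  torch.randn(T, D), torch.randn(T, D))
```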


2021 ◽  
Vol 1 (2) ◽  
pp. 33-39
Author(s):  
Mónika Farsang ◽  
Luca Szegletes

Learning the optimal behavior is the ultimate goal in reinforcement learning. It can be pursued with many different approaches, the most successful of which are policy gradient methods. However, these can suffer from undesirably large policy updates, leading to poor performance. In recent years there has been a clear trend toward designing more reliable algorithms. This paper examines different restriction strategies applied to the widely used Proximal Policy Optimization (PPO-Clip) technique. We also ask whether the analyzed methods can handle not only low-dimensional tasks but also complex, high-dimensional problems in control and robotic domains. Analysis of the learned behavior shows that these methods can outperform the original PPO-Clip algorithm; moreover, they can also learn complex behaviors and policies in high-dimensional environments.
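
For reference, the clipped surrogate objective that PPO-Clip maximizes, and that restriction strategies of this kind modify, can be written as:

$$
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\right)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
$$

where $\hat{A}_t$ is the advantage estimate and $\epsilon$ is the clipping range; which particular restrictions the paper substitutes for the clip operation is not specified in the abstract.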


Author(s):  
Lingyun Mi ◽  
Tianwen Jia ◽  
Yang Yang ◽  
Lulu Jiang ◽  
Bangjun Wang ◽  
...  

Evaluating the effectiveness of ecological civilization policies is the basis on which policymakers can optimize those policies. From the perspective of the overall effectiveness of regional policies, and taking Jiangsu Province as an example, this study constructed a quantitative evaluation model of eco-civilization policy text and an eco-civilization evaluation index system. Using these tools, this paper evaluates the effectiveness of 53 ecological civilization policies issued by Jiangsu Province during 2004–2019 to promote the construction of ecological civilization in four fields: resource utilization, environmental protection, economic development, and social life. There are three key findings. (1) During 2004–2019, the effectiveness of the textual content of ecological civilization policies in Jiangsu Province generally showed a fluctuating upward trend. (2) The construction effectiveness indexes of all four fields of eco-civilization showed growth, but the effect varied greatly across fields: the economic development index grew rapidly, while the environmental protection index grew slowly. (3) Ecological civilization policies in Jiangsu Province were effective in promoting the construction of ecological civilization, but the effects of different policy dimensions on ecological civilization development in the four fields differed significantly. Finally, based on these results, practical recommendations are provided for optimizing eco-civilization policies in Jiangsu Province. Moreover, Jiangsu is the first province in China to launch a provincial-level ecological civilization construction plan, so its policy optimization to promote ecological civilization construction can also serve as an example and a practical reference for eco-civilization construction in other provinces in China.


2021 ◽  
Author(s):  
Zikai Feng ◽  
Yuanyuan Wu ◽  
Mengxing Huang ◽  
Di Wu

Abstract To protect ground users' downlink communications from malicious jamming by an intelligent unmanned aerial vehicle (UAV), this paper studies a new anti-UAV-jamming strategy based on multi-agent deep reinforcement learning. In this method, ground users aim to learn the best mobility strategies for avoiding the UAV's jamming. The problem is modeled as a Stackelberg game describing the competitive interaction between the UAV jammer (leader) and the ground users (followers). To reduce the computational cost of solving the equilibrium of this complex game with its large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed that decouples the hybrid game into several sub-Markov games and updates the actor and critic networks of the UAV jammer and the ground users at different time scales. Simulation results suggest that the HMAPPO-based anti-jamming strategy achieves performance comparable to the benchmark strategies with lower time complexity. The well-trained HMAPPO can obtain the optimal jamming strategy and the optimal anti-jamming strategies, which together approximate the Stackelberg equilibrium (SE).
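
A minimal, self-contained sketch of the two-time-scale update schedule this abstract alludes to is given below. The agents are placeholders and the leader update period is an assumption; only the scheduling structure, followers updating every episode and the leader updating less frequently, is illustrated.

```python
# Hypothetical sketch: the leader (UAV jammer) and followers (ground users)
# are separate PPO-style agents whose networks update at different time scales.
class StubAgent:
    def __init__(self, name):
        self.name = name
        self.updates = 0

    def update(self):
        # Placeholder for a PPO actor-critic update from collected rollouts.
        self.updates += 1

LEADER_PERIOD = 5  # assumed: leader updates once every 5 episodes
jammer = StubAgent("uav_jammer")
users = [StubAgent(f"user_{i}") for i in range(3)]

for episode in range(20):
    for u in users:                    # fast time scale: every episode
        u.update()
    if episode % LEADER_PERIOD == 0:   # slow time scale: leader only
        jammer.update()

print(jammer.updates, users[0].updates)  # 4 vs. 20
```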


2021 ◽  
pp. 1-10
Author(s):  
Wei Zhou ◽  
Xing Jiang ◽  
Bingli Guo (Member, IEEE) ◽  
Lingyu Meng

Quality-of-Service (QoS)-aware routing is currently one of the crucial challenges in Software-Defined Networking (SDN). QoS metrics such as latency, packet loss ratio, and throughput must be optimized to improve network performance. Traditional static routing algorithms based on Open Shortest Path First (OSPF) cannot adapt to traffic fluctuations, which may cause severe network congestion and service degradation. The central intelligence of the SDN controller and recent breakthroughs in Deep Reinforcement Learning (DRL) offer a promising way to tackle this challenge. Thus, we propose an on-policy DRL mechanism, the Proximal Policy Optimization (PPO)-based QoS-aware Routing Optimization Mechanism (PQROM), to achieve general and re-customizable routing optimization. PQROM can dynamically update the routing calculation by adjusting the reward function according to different optimization objectives, and it is independent of any specific network pattern. Additionally, as a black-box one-step optimization, PQROM handles both continuous and discrete action spaces with high-dimensional inputs and outputs. OMNeT++ simulation results show that PQROM not only converges well but also offers better stability than OSPF, less training time and simpler hyperparameter tuning than Deep Deterministic Policy Gradient (DDPG), and lower hardware consumption than Asynchronous Advantage Actor-Critic (A3C).
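
A hedged illustration of the re-customizable reward idea: the QoS metrics are combined with adjustable weights so the same PPO agent can be pointed at different optimization objectives. The specific weights and normalization constants below are assumptions for illustration only, not values from the paper.

```python
def qos_reward(latency_ms, loss_ratio, throughput_mbps,
               w_latency=0.4, w_loss=0.4, w_throughput=0.2,
               max_latency_ms=200.0, max_throughput_mbps=1000.0):
    # Each term is normalized to [0, 1]; higher is better.
    latency_term = 1.0 - min(latency_ms / max_latency_ms, 1.0)
    loss_term = 1.0 - min(max(loss_ratio, 0.0), 1.0)
    throughput_term = min(throughput_mbps / max_throughput_mbps, 1.0)
    return (w_latency * latency_term
            + w_loss * loss_term
            + w_throughput * throughput_term)

# Re-customization example: emphasize latency without touching the agent code.
r = qos_reward(35.0, 0.01, 420.0, w_latency=0.7, w_loss=0.2, w_throughput=0.1)
```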

