Gradient-Aware Model-Based Policy Search

2020 · Vol 34 (04) · pp. 3801-3808
Author(s): Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains, analyzing and discussing its properties.
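
To make the weighting idea concrete, here is a minimal NumPy sketch of a gradient-aware weighted model fit. It assumes a linear-Gaussian policy, a linear transition model fit by weighted least squares, and per-transition weights equal to the discount factor raised to the time step times the norm of the cumulative policy score; these modelling choices and the weight formula follow the spirit of the abstract, not the paper's exact derivation.

```python
import numpy as np

def policy_score(theta, s, a, sigma=1.0):
    # Score d/dtheta log pi(a|s) for a linear-Gaussian policy a ~ N(theta^T s, sigma^2).
    # The linear-Gaussian policy class is an illustrative assumption.
    return (a - theta @ s) * s / sigma**2

def gradient_aware_weights(trajectories, theta, gamma=0.99):
    # Weight each transition by gamma^t times the norm of the cumulative policy
    # score up to step t, so that transitions with more influence on the
    # policy-gradient estimate count more when fitting the model
    # (a sketch of the idea, not the paper's exact weighting).
    weights = []
    for traj in trajectories:
        cum_score = np.zeros_like(theta)
        for t, (s, a, _s_next) in enumerate(traj):
            cum_score = cum_score + policy_score(theta, s, a)
            weights.append(gamma ** t * np.linalg.norm(cum_score))
    return np.asarray(weights)

def fit_weighted_transition_model(trajectories, weights):
    # Weighted maximum likelihood for a linear-Gaussian model s' ~ N(M^T [s; a], I),
    # which reduces to weighted least squares over the collected transitions.
    X, Y = [], []
    for traj in trajectories:
        for s, a, s_next in traj:
            X.append(np.concatenate([s, np.atleast_1d(a)]))
            Y.append(s_next)
    X, Y = np.asarray(X), np.asarray(Y)
    W = np.diag(weights)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)

# Toy usage: 2-D states, scalar actions, 5 random trajectories of length 20.
rng = np.random.default_rng(0)
theta = rng.normal(size=2)
trajectories = [
    [(rng.normal(size=2), rng.normal(), rng.normal(size=2)) for _ in range(20)]
    for _ in range(5)
]
weights = gradient_aware_weights(trajectories, theta)
M_hat = fit_weighted_transition_model(trajectories, weights)  # shape (3, 2)
```

In a full batch policy improvement loop as described in the abstract, the fitted model would then be combined with the collected trajectories to estimate the policy gradient and compute the new policy parameters.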

2009
Author(s): Hsin-Te Hwang, Chen-Yu Chiang, Po-Yi Sung, Sin-Horng Chen

2022 · pp. 1-12
Author(s): Shuailong Li, Wei Zhang, Huiwen Zhang, Xin Zhang, Yuquan Leng

Model-free reinforcement learning methods have been successfully applied to practical decision-making problems such as Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion of model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state: it adds information about the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. The method uses two components to optimize the policy: the PPO error and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict information about the next state. When evaluated across 49 Atari games in the Arcade Learning Environment (ALE), this method outperforms the state-of-the-art PPO algorithm on most games; the experimental results show that PPOMM performs better than or on par with the original algorithm in 33 games.
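
As a rough illustration of combining the two error terms, the PyTorch sketch below adds a latent next-state prediction loss to the PPO clipped surrogate. It assumes a discrete-action policy with a shared encoder, a latent transition model whose latent dimension matches the encoder's hidden size, and a fixed mixing coefficient; the class names (PolicyNet, LatentTransitionModel) and the way the terms are mixed are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    # Discrete-action actor with a shared encoder whose features double as the
    # latent state (an assumed architecture for this sketch).
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.logits = nn.Linear(hidden, n_actions)

    def forward(self, obs):
        z = self.encoder(obs)
        return torch.distributions.Categorical(logits=self.logits(z)), z

class LatentTransitionModel(nn.Module):
    # Predicts the next latent state from the current latent state and action.
    def __init__(self, latent_dim, n_actions, hidden=64):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(latent_dim + n_actions, hidden), nn.Tanh(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z, actions):
        a_onehot = F.one_hot(actions, self.n_actions).float()
        return self.net(torch.cat([z, a_onehot], dim=-1))

def ppomm_style_loss(policy, model, obs, actions, old_log_probs, advantages,
                     next_obs, clip_eps=0.2, model_coef=0.5):
    # PPO clipped surrogate plus a latent next-state prediction error, mixed
    # with a fixed coefficient (model_coef is an assumed hyperparameter).
    dist, z = policy(obs)
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages,
    )
    ppo_term = -surrogate.mean()

    with torch.no_grad():
        _, z_next = policy(next_obs)            # target latent state, no gradient
    model_term = F.mse_loss(model(z, actions), z_next)

    return ppo_term + model_coef * model_term
```

In this reading, a single optimizer would minimize the combined loss over both the policy and the latent model parameters, so the model-based term shapes the shared encoder while the clipped surrogate drives the policy update.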

