Multiple-Model Based Defense for Deep Reinforcement Learning Against Adversarial Attack

2021 ◽  
pp. 42-53
Author(s):  
Patrick P. K. Chan ◽  
Yaxuan Wang ◽  
Natasha Kees ◽  
Daniel S. Yeung
2002 ◽  
Vol 14 (6) ◽  
pp. 1347-1369 ◽  
Author(s):  
Kenji Doya ◽  
Kazuyuki Samejima ◽  
Ken-ichi Katagiri ◽  
Mitsuo Kawato

We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The “responsibility signal,” which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
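The responsibility signal described above can be illustrated with a minimal sketch. The abstract only states that the signal is a softmax of the prediction errors, so the Gaussian-style scoring, the scale parameter `sigma`, and the function names `responsibility_signal` and `blend_outputs` are assumptions made for illustration, not the authors' exact formulation.

```python
import math


def responsibility_signal(prediction_errors, sigma=1.0):
    """Softmax over negative scaled squared prediction errors.

    Assumption: each module's score is -e^2 / (2 * sigma^2), so a module
    with a smaller prediction error receives a larger responsibility.
    """
    scores = [-(e ** 2) / (2.0 * sigma ** 2) for e in prediction_errors]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def blend_outputs(module_outputs, responsibilities):
    """Weight each module's (scalar) controller output by its responsibility."""
    return sum(r * u for r, u in zip(responsibilities, module_outputs))


# Hypothetical example: three modules with different prediction errors.
errors = [0.1, 1.0, 2.0]
lam = responsibility_signal(errors)
action = blend_outputs([0.5, -1.0, 2.0], lam)
```

In this sketch the same weights `lam` could also scale each module's learning rate, which is one way to read the "gating" of learning mentioned in the abstract.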


Author(s):  
Kazuyuki Samejima ◽  
Ken'Ichi Katagiri ◽  
Kenji Doya ◽  
Mitsuo Kawato

2007 ◽  
Vol 20 (6) ◽  
pp. 668-675 ◽  
Author(s):  
Mathieu Bertin ◽  
Nicolas Schweighofer ◽  
Kenji Doya

2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2085
Author(s):  
Xue-Bo Jin ◽  
Ruben Jonhson Robert Jeremiah ◽  
Ting-Li Su ◽  
Yu-Ting Bai ◽  
Jian-Lei Kong

State estimation is widely used in automated systems, including IoT systems, unmanned systems, and robots. In traditional state estimation, measurement data are instantaneous and processed in real time. As modern systems develop, sensors can acquire and store more and more signals. How to use this large volume of measurement data to improve state estimation performance has therefore become an active research topic in the field. This paper reviews the development of state estimation and its future trends. First, we review model-based state estimation methods, including the Kalman filter and variants such as the extended Kalman filter (EKF), unscented Kalman filter (UKF), and cubature Kalman filter (CKF). Particle filters and Gaussian mixture filters, which can handle mixed Gaussian noise, are also discussed. These methods place high demands on the model, yet accurate system models are not easy to obtain in practice. Robust filters, the interacting multiple model (IMM), and adaptive filters are also covered. Second, the current state of research on data-driven state estimation methods based on network learning is introduced. Finally, the main results obtained in recent years for hybrid filters, which combine model-based and data-driven methods, are summarized and discussed. The paper draws on state estimation research results to give a more detailed overview of model-driven, data-driven, and hybrid-driven approaches. The main algorithm of each method is provided so that beginners can gain a clearer understanding. It also discusses future development trends for researchers in state estimation.
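As a point of reference for the model-based family mentioned above, here is a minimal sketch of one predict/update cycle of a scalar Kalman filter. It is not taken from the reviewed paper: the random-walk state model (state assumed constant between steps), the function name `kalman_step`, and the noise values used in the usage lines are all illustrative assumptions.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement
    q, r : process and measurement noise variances (assumed known)
    Assumption: random-walk model, so the predicted state equals x.
    """
    # Predict: the state carries over, uncertainty grows by the process noise.
    x_pred = x
    p_pred = p + q
    # Update: the Kalman gain balances prediction against measurement.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


# Hypothetical usage: filter a short sequence of noisy readings around 1.0.
measurements = [1.1, 0.9, 1.0, 1.2, 0.8]
x_est, p_est = 0.0, 1.0
for z in measurements:
    x_est, p_est = kalman_step(x_est, p_est, z, q=0.01, r=0.25)
```

The EKF, UKF, and CKF named in the abstract keep this predict/update structure but replace the linear prediction and measurement steps with approximations suited to nonlinear models.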


2020 ◽  
Vol 68 (8) ◽  
pp. 612-624
Author(s):  
Max Pritzkoleit ◽  
Robert Heedt ◽  
Carsten Knoll ◽  
Klaus Röbenack

Abstract: In this contribution we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in offline trajectory planning to determine an optimal feedback law, which is then applied to the real system. This model-based reinforcement learning (RL) approach is first evaluated in simulation on the swing-up of a single cart-pole system and shows a significant improvement in data efficiency over model-free RL approaches. We also present experimental results on a test bench, where the presented algorithm approximates a feedback law that is optimal for the system sufficiently well within a few trials.

