Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices

Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1359 ◽  
Author(s):  
Hyun-Kyo Lim ◽  
Ju-Bong Kim ◽  
Joo-Seong Heo ◽  
Youn-Hee Han

Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, supporting the expansion of Internet connectivity beyond the usual standard devices. In this paper, we allow multiple reinforcement learning agents to learn optimal control policies on their own IoT devices of the same type but with slightly different dynamics. For such multiple IoT devices, there is no guarantee that an agent that interacts with only one IoT device and learns the optimal control policy will also control another IoT device well. Therefore, we may need to apply independent reinforcement learning to each IoT device individually, which is a costly and time-consuming effort. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent working on its own IoT device shares its learning experience (i.e., the gradient of the loss function) with the others and transfers mature policy model parameters to them, so that every agent can accelerate its learning by using the mature parameters. We incorporate the actor–critic proximal policy optimization (Actor–Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for the gradient sharing and the model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices and that the learning speed can be faster if more agents are involved.
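To make the gradient-sharing and model-transfer steps concrete, the following is a minimal PyTorch sketch of one federated round, assuming hypothetical agent objects that expose a `policy` module and an `optimizer`; the paper's exact synchronization protocol may differ.

```python
import copy
import torch

def federated_update(agents):
    """One gradient-sharing round among PPO agents training on separate
    devices: average each parameter's gradient across agents, then let
    every agent apply the shared gradient with its own optimizer.
    (A minimal sketch; the paper's exact procedure may differ.)"""
    reference = agents[0].policy
    for name, _ in reference.named_parameters():
        grads = [dict(a.policy.named_parameters())[name].grad for a in agents]
        grads = [g for g in grads if g is not None]
        if not grads:
            continue
        mean_grad = torch.stack(grads).mean(dim=0)
        for a in agents:
            dict(a.policy.named_parameters())[name].grad = mean_grad.clone()
    for a in agents:
        a.optimizer.step()
        a.optimizer.zero_grad()

def transfer_mature_policy(source_agent, target_agents):
    """Model-transfer step: copy a mature agent's policy parameters into
    the other agents to warm-start their learning."""
    state = copy.deepcopy(source_agent.policy.state_dict())
    for t in target_agents:
        t.policy.load_state_dict(state)
```

In this sketch, gradient averaging plays the role of experience sharing, and `transfer_mature_policy` warm-starts lagging agents from the most advanced one.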

Author(s):  
Wen Wu ◽  
Kate Saul ◽  
He (Helen) Huang

Reinforcement learning (RL) has the potential to provide innovative solutions to existing challenges in estimating joint moments in motion analysis, such as kinematic or electromyography (EMG) noise and unknown model parameters. Here we explore the feasibility of RL to assist joint moment estimation for biomechanical applications. Forearm and hand kinematics and forearm EMGs from four muscles during free finger and wrist movement were collected from six healthy subjects. Using the Proximal Policy Optimization approach, we trained and tested two types of RL agents that estimated joint moments based on measured kinematics or measured EMGs, respectively. To quantify the performance of the RL agents, the estimated joint moments were used to drive a forward dynamic model for estimating kinematics, which were then compared with the measured kinematics. The results demonstrated that both RL agents can accurately reproduce wrist and metacarpophalangeal joint motion. The correlation coefficients between estimated and measured kinematics, derived from the kinematics-driven agent and subject-specific EMG-driven agents, were 0.98±0.01 and 0.94±0.03 for the wrist, respectively, and 0.95±0.02 and 0.84±0.06 for the metacarpophalangeal joint, respectively. In addition, a biomechanically reasonable joint moment–angle–EMG relationship (i.e., the dependence of joint moment on joint angle and EMG) was predicted using only 15 seconds of collected data. In conclusion, this study serves as a proof of concept that an RL approach can assist in biomechanical analysis and human-machine interface applications by deriving joint moments from kinematic or EMG data.
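As an illustration of the setup described above, here is a toy Gymnasium-style environment in which the action is a joint moment that drives a simple forward dynamic model and the reward penalizes deviation from measured kinematics; the single-joint dynamics, parameter values, and class name are placeholders, not the study's musculoskeletal model.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class JointMomentEnv(gym.Env):
    """Toy joint-moment estimation environment (illustrative only)."""
    def __init__(self, measured_angle, dt=0.01, inertia=0.01, damping=0.05):
        super().__init__()
        self.measured = np.asarray(measured_angle, dtype=np.float32)  # reference trajectory (rad)
        self.dt, self.I, self.b = dt, inertia, damping
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.q, self.qd = 0, float(self.measured[0]), 0.0
        return self._obs(), {}

    def _obs(self):
        return np.array([self.q, self.qd, self.measured[self.t]], dtype=np.float32)

    def step(self, action):
        tau = float(action[0])                   # estimated joint moment
        qdd = (tau - self.b * self.qd) / self.I  # simple forward dynamics
        self.qd += qdd * self.dt
        self.q += self.qd * self.dt
        self.t += 1
        reward = -(self.q - float(self.measured[self.t])) ** 2  # kinematic tracking error
        terminated = self.t >= len(self.measured) - 1
        return self._obs(), reward, terminated, False, {}
```

A standard PPO implementation could then be trained on such an environment against a recorded angle trajectory.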


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Bo Dong ◽  
Yuanchun Li

A novel decentralized reinforcement learning robust optimal tracking control theory for time-varying constrained reconfigurable modular robots, based on an action-critic-identifier (ACI) structure and the state-action value function (Q-function), is presented to solve the problem of finding a continuous-time nonlinear optimal control policy for a strongly coupled, uncertain robotic system. The dynamics of the time-varying constrained reconfigurable modular robot are described as a synthesis of interconnected subsystems, and a continuous-time state equation and Q-function are designed in this paper. Combining the ACI with an RBF network, the global uncertainty of each subsystem and the Hamilton-Jacobi-Bellman (HJB) equation are estimated: a critic-NN and an action-NN are used to approximate the optimal Q-function and the optimal control policy, respectively, the identifier is adopted to identify the global uncertainty, and an RBF-NN is used to update the weights of the ACI-NNs. On this basis, a novel decentralized robust optimal tracking controller for each subsystem is proposed, so that the subsystem can track the desired trajectory and the tracking error converges to zero in finite time. The stability of the ACI and of the robust optimal tracking controller is confirmed by Lyapunov theory. Finally, comparative simulation examples are presented to illustrate the effectiveness of the proposed ACI and decentralized control theory.
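The critic, action, and identifier networks in such an ACI scheme are typically radial-basis-function approximators; the sketch below shows the basic structure and a gradient-style weight update, with illustrative centers, width, and learning rate rather than the paper's tuning.

```python
import numpy as np

class RBFApproximator:
    """Minimal radial-basis-function approximator of the kind used for
    critic, action, and identifier networks (an illustrative sketch)."""
    def __init__(self, centers, width=1.0, lr=0.05):
        self.centers = np.asarray(centers, dtype=float)  # (n_basis, dim)
        self.width = width
        self.lr = lr
        self.w = np.zeros(len(self.centers))             # output weights

    def phi(self, x):
        """Gaussian basis activations for input x."""
        d = np.linalg.norm(self.centers - np.asarray(x, dtype=float), axis=1)
        return np.exp(-(d ** 2) / (2.0 * self.width ** 2))

    def predict(self, x):
        return float(self.w @ self.phi(x))

    def update(self, x, target):
        """Gradient step on the squared approximation error."""
        p = self.phi(x)
        err = target - self.w @ p
        self.w += self.lr * err * p
```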


2021 ◽  
Vol 11 (1) ◽  
pp. 6637-6644
Author(s):  
H. El Fazazi ◽  
M. Elgarej ◽  
M. Qbadou ◽  
K. Mansouri

Adaptive e-learning systems are created to facilitate the learning process. These systems are able to suggest to the student the most suitable pedagogical strategy and to extract the information and characteristics of the learners. A multi-agent system is a collection of organized and independent agents that communicate with each other to solve a problem or complete a well-defined objective. These agents can be homogeneous or heterogeneous and may or may not have common objectives. Applying the multi-agent approach in adaptive e-learning systems can enhance the quality of the learning process by customizing the contents to students’ needs. The agents in these systems collaborate to provide a personalized learning experience. In this paper, a design of an adaptive e-learning system based on a multi-agent approach and reinforcement learning is presented. The main objective of this system is to recommend to the students a learning path that meets their characteristics and preferences using the Q-learning algorithm. The proposed system focuses on three principal characteristics: the learning style according to the Felder-Silverman learning style model, the knowledge level, and the student's possible disabilities. Three types of disabilities were taken into account, namely hearing impairments, visual impairments, and dyslexia. The system is able to provide the students with a sequence of learning objects that matches their profiles for a personalized learning experience.
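A tabular Q-learning loop of the kind such a system relies on can be sketched as follows; the state encoding, the `transition` and `reward` callables, and the hyperparameters are hypothetical stand-ins for the system's learner model, not its actual implementation.

```python
import numpy as np

def q_learning_path(n_states, n_actions, transition, reward,
                    episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning sketch for recommending a sequence of learning
    objects: states encode a learner profile/progress index, actions pick
    the next learning object (illustrative stand-in, not the paper's code)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False                       # start of the learning path
        while not done:
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s_next, done = transition(s, a)      # learner model step
            r = reward(s, a, s_next)             # pedagogical reward signal
            Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s_next].max()) - Q[s, a])
            s = s_next
    return Q
```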


2021 ◽  
Author(s):  
Xinglong Zhang ◽  
Yaoqian Peng ◽  
Biao Luo ◽  
Wei Pan ◽  
Xin Xu ◽  
...  

Recently, barrier function-based safe reinforcement learning (RL) with the actor-critic structure for continuous control tasks has received increasing attention. It is still challenging to learn a near-optimal control policy with safety and convergence guarantees, and few works have addressed safe RL algorithm design under time-varying safety constraints. This paper proposes a model-based safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. In the proposed approach, we construct a novel barrier-based control policy structure that can guarantee control safety. A multi-step policy evaluation mechanism is proposed to predict the policy's safety risk under time-varying safety constraints and to guide the policy to update safely. Theoretical results on stability and robustness are proven, and the convergence of the actor-critic learning algorithm is analyzed. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. Furthermore, the approach is applied to the integrated path following and collision avoidance problem for two real-world intelligent vehicles: a differential-drive vehicle and an Ackermann-drive one are used to verify the offline deployment performance and the online learning performance, respectively. Our approach shows an impressive sim-to-real transfer capability and satisfactory online control performance in the experiments.
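The multi-step policy evaluation idea can be illustrated with a short rollout check that rejects unsafe actor updates; the `policy`, `dynamics`, and `constraint` callables and the gating rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_step_safety_risk(policy, dynamics, constraint, x0, horizon=10):
    """Roll the model forward under the current policy and return the worst
    predicted constraint value over the horizon (<= 0 means safe).
    `constraint(x, t)` stands in for a time-varying constraint h_t(x)."""
    x = np.asarray(x0, dtype=float)
    worst = -np.inf
    for t in range(horizon):
        u = policy(x, t)
        x = dynamics(x, u)
        worst = max(worst, constraint(x, t))
    return worst

def safe_policy_update(theta, grad, policy_fn, dynamics, constraint, x0,
                       step=0.05, horizon=10):
    """Accept a candidate actor update only if the predicted rollout stays
    within the safety constraints; otherwise keep the previous parameters."""
    theta_new = theta - step * grad
    risk = multi_step_safety_risk(lambda x, t: policy_fn(theta_new, x, t),
                                  dynamics, constraint, x0, horizon)
    return theta_new if risk <= 0.0 else theta
```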


2021 ◽  
Vol 8 ◽  
Author(s):  
S. M. Nahid Mahmud ◽  
Scott A. Nivison ◽  
Zachary I. Bell ◽  
Rushikesh Kamalapurkar

Reinforcement learning has been established over the past decade as an effective tool to find optimal control policies for dynamical systems, with recent focus on approaches that guarantee safety during the learning and/or execution phases. In general, safety guarantees are critical in reinforcement learning when the system is safety-critical and/or task restarts are not practically feasible. In optimal control theory, safety requirements are often expressed in terms of state and/or control constraints. In recent years, reinforcement learning approaches that rely on persistent excitation have been combined with a barrier transformation to learn optimal control policies under state constraints. To soften the excitation requirements, model-based reinforcement learning methods that rely on exact model knowledge have also been integrated with the barrier transformation framework. The objective of this paper is to develop a safe reinforcement learning method for deterministic nonlinear systems, with parametric uncertainties in the model, that learns approximate constrained optimal policies without relying on stringent excitation conditions. To that end, a model-based reinforcement learning technique that utilizes a novel filtered concurrent learning method, along with a barrier transformation, is developed in this paper to realize simultaneous learning of unknown model parameters and approximate optimal state-constrained control policies for safety-critical systems.
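As a concrete picture of a barrier transformation, the sketch below maps a state constrained to an open interval onto the real line and back, so the optimal control problem can be solved in unconstrained coordinates; this logit-style form is an illustrative choice and not necessarily the transformation used in the paper.

```python
import numpy as np

def barrier_transform(x, a, A):
    """Map a state constrained to the open interval (a, A) onto the real
    line, so learning can proceed in an unconstrained coordinate.
    (Illustrative form; the paper's transformation may differ.)"""
    return np.log((x - a) / (A - x))

def barrier_inverse(s, a, A):
    """Inverse map: any real-valued s corresponds to a state strictly
    inside (a, A), so the learned policy never requests an infeasible state."""
    return (a + A * np.exp(s)) / (1.0 + np.exp(s))
```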


Cloud-assisted Internet of Things (IoT) devices are deployed in different distributed environments. They collect sensed data and outsource the data to remote servers and users for sharing. As IoT is used in important fields such as healthcare, business, and research, the sensed data are sensitive information that needs to be protected. Encryption is the usual technique to protect data from adversaries, and fine-grained access control is essential for a social network involving heterogeneous devices. Existing access control policies are defined for predefined identities and roles, which need to change in dynamic situations. Moreover, all the necessary policies cannot be defined in advance, and new policies are demanded for new situational contexts. To solve these issues, this work designs a model that dynamically calculates a final trust value based on semantic information by referring to an ontology. An access control policy based on the semantic role of the device is also designed. The semantic technology is used for high-level reasoning about the context situation.
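One way to picture the dynamic trust computation and role-based decision described here is the sketch below; the trust factors, weights, threshold, and policy structure are illustrative assumptions rather than the paper's ontology-derived values.

```python
def final_trust(direct_trust, recommended_trust, context_score,
                weights=(0.5, 0.3, 0.2)):
    """Combine direct observations, peer recommendations, and a semantic
    context score into a final trust value in [0, 1] (illustrative weights)."""
    w1, w2, w3 = weights
    return w1 * direct_trust + w2 * recommended_trust + w3 * context_score

def grant_access(device_role, requested_action, trust, policy, threshold=0.6):
    """Allow the action only if the device's semantic role permits it and
    the computed trust value exceeds the policy threshold."""
    allowed_actions = policy.get(device_role, set())
    return requested_action in allowed_actions and trust >= threshold
```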


AIChE Journal ◽  
2021 ◽  
Author(s):  
Max R. Mowbray ◽  
Robin Smith ◽  
Ehecatl A. Del Rio‐Chanona ◽  
Dongda Zhang
