Accelerating Reinforcement Learning through Implicit Imitation

2003 ◽  
Vol 19 ◽  
pp. 569-629 ◽  
Author(s):  
B. Price ◽  
C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
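A minimal sketch of the kind of augmented value backup this model suggests, assuming the homogeneous-action case; the `V`, `own_models`, and `mentor_model` containers are hypothetical data structures for illustration, not the authors' implementation. The learner backs up a state using the better of its own estimated action models and the transition model estimated from observed mentor behaviour.

```python
def augmented_backup(V, own_models, mentor_model, reward, gamma=0.95):
    """Augmented Bellman backup for implicit imitation (homogeneous-action case).

    V:            dict mapping states to current value estimates
    own_models:   dict action -> estimated successor distribution {s': P(s'|state, a)}
    mentor_model: estimated successor distribution {s': P(s'|state)} built from
                  observed mentor transitions (the mentor's action is unobserved)
    """
    def expected_value(dist):
        return sum(p * V.get(s_next, 0.0) for s_next, p in dist.items())

    # Best backed-up value achievable under the learner's own estimated models.
    own_value = max(expected_value(dist) for dist in own_models.values())
    # Value suggested by the mentor's observed behaviour in this state.
    mentor_value = expected_value(mentor_model)
    # The augmented backup keeps whichever estimate is larger.
    return reward + gamma * max(own_value, mentor_value)
```

In the paper this kind of backup is combined with prioritized sweeping, so states whose values change the most are updated first.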


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from historical classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training. The design is also evaluated on two public datasets, with accuracies of 80% and 83%. It requires very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
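A rough PyTorch sketch of how the first two networks could be wired together, purely for illustration: the layer sizes, input shapes, and class count are assumptions, not the architecture reported in the article (which is itself selected by the neural architecture search agent).

```python
import torch.nn as nn

class CsiRecognizer(nn.Module):
    """2D CNN over CSI 'images' (channels = antennas, dims = time x subcarrier)."""
    def __init__(self, n_antennas=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_antennas, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, antennas, time, subcarrier)
        return self.classifier(self.features(x).flatten(1))

class StateMachine(nn.Module):
    """1D CNN over a window of past class-probability vectors (temporal smoothing)."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_classes, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, n_classes),
        )

    def forward(self, history_probs):          # (batch, n_classes, history_length)
        return self.net(history_probs)
```

The recognizer classifies each CSI window, while the state machine refines the prediction from a short history of classification outputs.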



2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning algorithms for multirobot systems become very slow as the number of robots increases, because the state space grows exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized during reinforcement learning. Mobile robots obtain the present environmental state through their sensors. The state is then matched against the database to determine whether a relevant behavior rule has already been stored. If such a rule is present, an action is chosen in accordance with the knowledge and the rules, and the matching weight is refined; otherwise, the new rule is appended to the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm on a multirobot following-surrounding behavior task and find that the improved algorithm effectively accelerates convergence.
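A minimal sketch of a shared rule repository with state matching and weight refinement, assuming a dictionary-based rule store; the class and its update rule are illustrative, not the paper's exact formulation.

```python
import random
from collections import defaultdict

class SharedRuleRepository:
    """Behavior rules shared by all robots: state -> {action: matching weight}."""
    def __init__(self):
        self.rules = defaultdict(dict)

    def select_action(self, state, actions, epsilon=0.1):
        known = self.rules.get(state)
        if known and random.random() > epsilon:
            # A matching rule exists: follow the highest-weighted action.
            return max(known, key=known.get)
        # No matching rule (or exploring): try a new action and append a rule.
        action = random.choice(actions)
        self.rules[state].setdefault(action, 0.0)
        return action

    def refine(self, state, action, td_error, lr=0.1):
        # Refine the matching weight of the rule that was just applied.
        self.rules[state][action] += lr * td_error
```

Because every robot reads from and writes to the same repository, experience gathered by one robot in the learning sequence is immediately available to the others.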



2021 ◽  
pp. 1-15
Author(s):  
Theresa Ziemke ◽  
Lucas N. Alegre ◽  
Ana L.C. Bazzan

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces have a reasonable size. This is rarely the case in control-related problems such as traffic signal control, where the state space can be very large. In order to deal with the curse of dimensionality, a rough discretization of the state space can be employed, but this is effective only up to a certain point. A way to mitigate this is to use techniques that generalize over the state space, such as function approximation. In this paper, a linear function approximation is used. Specifically, SARSA(λ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only to trivial controllers such as fixed-time control, but also to state-of-the-art rule-based adaptive methods. It is concluded that SARSA(λ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demands or unexpected events.
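For concreteness, a compact sketch of linear SARSA(λ) with Fourier basis features; the hyperparameters and the assumption that states are scaled to [0, 1]^d are illustrative, and the MATSim integration is omitted.

```python
import numpy as np
from itertools import product

def fourier_features(state, order=3):
    """Fourier basis of a given order for a state scaled to [0, 1]^d."""
    coeffs = np.array(list(product(range(order + 1), repeat=len(state))))
    return np.cos(np.pi * coeffs @ np.asarray(state))

class SarsaLambda:
    """Linear SARSA(lambda) with one weight vector per action over the features."""
    def __init__(self, n_actions, n_features, alpha=0.001, gamma=0.99, lam=0.9):
        self.w = np.zeros((n_actions, n_features))
        self.z = np.zeros_like(self.w)            # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q(self, phi, a):
        return self.w[a] @ phi

    def update(self, phi, a, reward, phi_next, a_next, done):
        target = reward + (0.0 if done else self.gamma * self.q(phi_next, a_next))
        delta = target - self.q(phi, a)
        self.z *= self.gamma * self.lam            # decay all traces
        self.z[a] += phi                           # accumulate trace for the taken action
        self.w += self.alpha * delta * self.z
        if done:
            self.z[:] = 0.0                        # reset traces at episode end
```

The traffic state (for example, queue lengths per approach) would be scaled into the unit hypercube before being passed to `fourier_features`, and the actions would correspond to signal phase choices.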





2021 ◽  
Author(s):  
Nanda Kishore Sreenivas ◽  
Shrisha Rao

In toy environments like video games, a reinforcement learning agent is deployed and operates within the same state space in which it was trained. However, in robotics applications such as industrial systems or autonomous vehicles, this cannot be guaranteed. A robot can be pushed out of its training space by some unforeseen perturbation, which may cause it to enter an unknown state from which it has not been trained to move towards its goal. While most prior work in the area of RL safety focuses on ensuring safety in the training phase, this paper focuses on ensuring the safe deployment of a robot that has already been trained to operate within a safe space. This work defines a condition on the state and action spaces that, if satisfied, guarantees the robot's independent recovery to safety. We also propose a strategy and design that facilitate this recovery within a finite number of steps after perturbation. This is implemented and tested against a standard RL model, and the results indicate a much-improved performance.
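A hedged sketch of what such a recovery loop might look like at deployment time; the `robot`, `safe_set`, and `recovery_policy` interfaces are hypothetical placeholders, and the paper's actual condition on the state and action spaces is not reproduced here.

```python
def recover(robot, safe_set, recovery_policy, max_steps=50):
    """Drive a perturbed robot back into its trained safe region.

    safe_set:        states the policy was trained on (membership test only)
    recovery_policy: maps the current state to an action aimed at the safe set
    max_steps:       finite recovery budget; returns False if it is exceeded
    """
    for _ in range(max_steps):
        state = robot.observe()
        if state in safe_set:
            return True        # back inside the trained space; resume the RL policy
        robot.act(recovery_policy(state))
    return False
```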



2020 ◽  
Vol 10 (12) ◽  
pp. 4088
Author(s):  
Andreas Verleysen ◽  
Thomas Holvoet ◽  
Remko Proesmans ◽  
Cedric Den Haese ◽  
Francis wyffels

Deformable objects such as ropes, wires, and clothing are omnipresent in society and industry but have received little attention in robotics research. This is due to the infinite number of possible configurations that arise from the deformations of such objects. Engineered approaches try to cope with this by implementing highly complex operations to estimate the state of the deformable object. This complexity can be circumvented by learning-based approaches, such as reinforcement learning, which can deal with the intrinsically high-dimensional state space of deformable objects. However, the reward function in reinforcement learning needs to measure the configuration of the highly deformable object. Vision-based reward functions are difficult to implement, given the high dimensionality of the state and the complex dynamic behavior. In this work, we propose looking beyond vision and incorporating other modalities that can be extracted from deformable objects. By integrating tactile sensor cells into a textile piece, proprioceptive capabilities are gained that provide a reward function to a reinforcement learning agent. We demonstrate on a low-cost dual robotic arm setup that a physical agent can learn, on a single CPU core, to fold a rectangular patch of textile in the real world based on a reward function learned from tactile information.
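As a loose illustration of the idea only (the paper learns its reward function from tactile data rather than hand-crafting it), a simple proxy that turns readings from an assumed grid of tactile cells into a scalar reward for the folding task; the grid layout and threshold are assumptions.

```python
import numpy as np

def tactile_fold_reward(tactile_grid, threshold=0.5):
    """Proxy reward from tactile sensor cells integrated into a textile patch.

    tactile_grid: 2D array of pressure readings from the sensor cells.
    A well-folded patch stacks sensor cells on top of one another, so more
    cells report contact; the fraction of activated cells serves as reward.
    """
    activated = np.asarray(tactile_grid) > threshold
    return activated.mean()
```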




