Accelerating Reinforcement Learning through Implicit Imitation

2003 ◽  
Vol 19 ◽  
pp. 569-629 ◽  
Author(s):  
B. Price ◽  
C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
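A minimal sketch of the kind of augmented value backup this model suggests, assuming the homogeneous-action case; the `V`, `own_models`, and `mentor_model` containers are hypothetical data structures for illustration, not the authors' implementation. The learner backs up a state using the better of its own estimated action models and the transition model estimated from observed mentor behaviour.

```python
def augmented_backup(V, own_models, mentor_model, reward, gamma=0.95):
    """Augmented Bellman backup for implicit imitation (homogeneous-action case).

    V:            dict mapping states to current value estimates
    own_models:   dict action -> estimated successor distribution {s': P(s'|state, a)}
    mentor_model: estimated successor distribution {s': P(s'|state)} built from
                  observed mentor transitions (the mentor's action is unobserved)
    """
    def expected_value(dist):
        return sum(p * V.get(s_next, 0.0) for s_next, p in dist.items())

    # Best backed-up value achievable under the learner's own estimated models.
    own_value = max(expected_value(dist) for dist in own_models.values())
    # Value suggested by the mentor's observed behaviour in this state.
    mentor_value = expected_value(mentor_model)
    # The augmented backup keeps whichever estimate is larger.
    return reward + gamma * max(own_value, mentor_value)
```

In the paper this kind of backup is combined with prioritized sweeping, so states whose values change the most are updated first.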


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from historical classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training. The design is also evaluated on two public datasets, with accuracies of 80% and 83%. It requires very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
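A rough PyTorch sketch of how the first two networks could be wired together, purely for illustration: the layer sizes, input shapes, and class count are assumptions, not the architecture reported in the article (which is itself selected by the neural architecture search agent).

```python
import torch.nn as nn

class CsiRecognizer(nn.Module):
    """2D CNN over CSI 'images' (channels = antennas, dims = time x subcarrier)."""
    def __init__(self, n_antennas=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_antennas, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, antennas, time, subcarrier)
        return self.classifier(self.features(x).flatten(1))

class StateMachine(nn.Module):
    """1D CNN over a window of past class-probability vectors (temporal smoothing)."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_classes, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(16, n_classes),
        )

    def forward(self, history_probs):          # (batch, n_classes, history_length)
        return self.net(history_probs)
```

The recognizer classifies each CSI window, while the state machine refines the prediction from a short history of classification outputs.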



2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning algorithms for multirobot systems become very slow as the number of robots increases, because the state space grows exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized during reinforcement learning. Mobile robots obtain the present environmental state through their sensors. The state is then matched against the database to determine whether a relevant behavior rule has already been stored. If such a rule is present, an action is chosen in accordance with the knowledge and the rules, and the matching weight is refined; otherwise, the new rule is appended to the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm on a multirobot following-surrounding behavior task and find that the improved algorithm effectively accelerates convergence.
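A minimal sketch of a shared rule repository with state matching and weight refinement, assuming a dictionary-based rule store; the class and its update rule are illustrative, not the paper's exact formulation.

```python
import random
from collections import defaultdict

class SharedRuleRepository:
    """Behavior rules shared by all robots: state -> {action: matching weight}."""
    def __init__(self):
        self.rules = defaultdict(dict)

    def select_action(self, state, actions, epsilon=0.1):
        known = self.rules.get(state)
        if known and random.random() > epsilon:
            # A matching rule exists: follow the highest-weighted action.
            return max(known, key=known.get)
        # No matching rule (or exploring): try a new action and append a rule.
        action = random.choice(actions)
        self.rules[state].setdefault(action, 0.0)
        return action

    def refine(self, state, action, td_error, lr=0.1):
        # Refine the matching weight of the rule that was just applied.
        self.rules[state][action] += lr * td_error
```

Because every robot reads from and writes to the same repository, experience gathered by one robot in the learning sequence is immediately available to the others.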



2021 ◽  
pp. 1-15
Author(s):  
Theresa Ziemke ◽  
Lucas N. Alegre ◽  
Ana L.C. Bazzan

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces have a reasonable size. This is rarely the case in control-related problems such as traffic signal control, where the state space can be very large. In order to deal with the curse of dimensionality, a rough discretization of the state space can be employed, but this is effective only up to a certain point. A way to mitigate this is to use techniques that generalize over the state space, such as function approximation. In this paper, a linear function approximation is used. Specifically, SARSA(λ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only to trivial controllers such as fixed-time control, but also to state-of-the-art rule-based adaptive methods. It is concluded that SARSA(λ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demands or unexpected events.
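For concreteness, a compact sketch of linear SARSA(λ) with Fourier basis features; the hyperparameters and the assumption that states are scaled to [0, 1]^d are illustrative, and the MATSim integration is omitted.

```python
import numpy as np
from itertools import product

def fourier_features(state, order=3):
    """Fourier basis of a given order for a state scaled to [0, 1]^d."""
    coeffs = np.array(list(product(range(order + 1), repeat=len(state))))
    return np.cos(np.pi * coeffs @ np.asarray(state))

class SarsaLambda:
    """Linear SARSA(lambda) with one weight vector per action over the features."""
    def __init__(self, n_actions, n_features, alpha=0.001, gamma=0.99, lam=0.9):
        self.w = np.zeros((n_actions, n_features))
        self.z = np.zeros_like(self.w)            # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q(self, phi, a):
        return self.w[a] @ phi

    def update(self, phi, a, reward, phi_next, a_next, done):
        target = reward + (0.0 if done else self.gamma * self.q(phi_next, a_next))
        delta = target - self.q(phi, a)
        self.z *= self.gamma * self.lam            # decay all traces
        self.z[a] += phi                           # accumulate trace for the taken action
        self.w += self.alpha * delta * self.z
        if done:
            self.z[:] = 0.0                        # reset traces at episode end
```

The traffic state (for example, queue lengths per approach) would be scaled into the unit hypercube before being passed to `fourier_features`, and the actions would correspond to signal phase choices.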





2021 ◽  
Author(s):  
Nanda Kishore Sreenivas ◽  
Shrisha Rao

In toy environments like video games, a reinforcement learning agent is deployed and operates within the same state space in which it was trained. However, in robotics applications such as industrial systems or autonomous vehicles, this cannot be guaranteed. A robot can be pushed out of its training space by some unforeseen perturbation, which may cause it to enter an unknown state from which it has not been trained to move towards its goal. While most prior work in the area of RL safety focuses on ensuring safety in the training phase, this paper focuses on ensuring the safe deployment of a robot that has already been trained to operate within a safe space. This work defines a condition on the state and action spaces that, if satisfied, guarantees the robot's independent recovery to safety. We also propose a strategy and design that facilitate this recovery within a finite number of steps after perturbation. This is implemented and tested against a standard RL model, and the results indicate a much-improved performance.
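A hedged sketch of what such a recovery loop might look like at deployment time; the `robot`, `safe_set`, and `recovery_policy` interfaces are hypothetical placeholders, and the paper's actual condition on the state and action spaces is not reproduced here.

```python
def recover(robot, safe_set, recovery_policy, max_steps=50):
    """Drive a perturbed robot back into its trained safe region.

    safe_set:        states the policy was trained on (membership test only)
    recovery_policy: maps the current state to an action aimed at the safe set
    max_steps:       finite recovery budget; returns False if it is exceeded
    """
    for _ in range(max_steps):
        state = robot.observe()
        if state in safe_set:
            return True        # back inside the trained space; resume the RL policy
        robot.act(recovery_policy(state))
    return False
```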



2020 ◽  
Vol 10 (12) ◽  
pp. 4088
Author(s):  
Andreas Verleysen ◽  
Thomas Holvoet ◽  
Remko Proesmans ◽  
Cedric Den Haese ◽  
Francis wyffels

Deformable objects such as ropes, wires, and clothing are omnipresent in society and industry but have received little attention in robotics research. This is due to the infinite number of possible configurations that arise from the deformations of such objects. Engineered approaches try to cope with this by implementing highly complex operations to estimate the state of the deformable object. This complexity can be circumvented by learning-based approaches, such as reinforcement learning, which can deal with the intrinsically high-dimensional state space of deformable objects. However, the reward function in reinforcement learning needs to measure the configuration of the highly deformable object. Vision-based reward functions are difficult to implement, given the high dimensionality of the state and the complex dynamic behavior. In this work, we propose looking beyond vision and incorporating other modalities that can be extracted from deformable objects. By integrating tactile sensor cells into a textile piece, proprioceptive capabilities are gained that provide a reward function to a reinforcement learning agent. We demonstrate on a low-cost dual robotic arm setup that a physical agent can learn, on a single CPU core, to fold a rectangular patch of textile in the real world based on a reward function learned from tactile information.
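As a loose illustration of the idea only (the paper learns its reward function from tactile data rather than hand-crafting it), a simple proxy that turns readings from an assumed grid of tactile cells into a scalar reward for the folding task; the grid layout and threshold are assumptions.

```python
import numpy as np

def tactile_fold_reward(tactile_grid, threshold=0.5):
    """Proxy reward from tactile sensor cells integrated into a textile patch.

    tactile_grid: 2D array of pressure readings from the sensor cells.
    A well-folded patch stacks sensor cells on top of one another, so more
    cells report contact; the fraction of activated cells serves as reward.
    """
    activated = np.asarray(tactile_grid) > threshold
    return activated.mean()
```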




