Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network

Author(s):  
Takaaki Kobayashi ◽  
Takeshi Shibuya ◽  
Masahiko Morita

When applying reinforcement learning (RL) algorithms such as Q-learning to real-world applications, we must consider the influence of sensor noise. The simplest way to reduce the influence of such noise is to additionally use other types of sensors, but this may enlarge the state space and probably increase redundancy. Conventional value-function approximators used for RL in continuous state-action space do not deal appropriately with such situations. The selective desensitization neural network (SDNN) has high generalization ability and robustness against noise and redundant inputs. We therefore propose an SDNN-based value-function approximator for Q-learning in continuous state-action space and evaluate its performance in terms of robustness against redundant inputs and sensor noise. The results show that our proposal is highly robust against noise and redundant inputs and enables the agent to take better actions by using additional inputs without degrading learning efficiency. These properties are advantageous in real-world applications such as robotic systems.
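
As a rough illustration of the setting, the sketch below shows one-step Q-learning with a generic linear value-function approximator over a fixed random feature map. It is not the SDNN approximator from the paper; the feature dimension, step size, and candidate-action handling are all assumptions.

import numpy as np

# Minimal sketch: Q-learning with a generic linear value-function approximator.
# This stands in for (and is not) the SDNN approximator described above.
STATE_DIM, N_FEATURES = 4, 64
_rng = np.random.default_rng(0)
_PROJ = _rng.normal(size=(N_FEATURES, STATE_DIM + 1))  # +1 for a scalar action

def features(state, action):
    # Nonlinear random-projection features of the (state, action) pair.
    return np.tanh(_PROJ @ np.append(state, action))

class LinearQ:
    def __init__(self, alpha=0.01, gamma=0.99):
        self.w = np.zeros(N_FEATURES)
        self.alpha, self.gamma = alpha, gamma

    def value(self, state, action):
        return float(self.w @ features(state, action))

    def update(self, s, a, r, s_next, candidate_actions):
        # One-step Q-learning target: r + gamma * max over sampled candidate actions a'.
        target = r + self.gamma * max(self.value(s_next, a2) for a2 in candidate_actions)
        td_error = target - self.value(s, a)
        self.w += self.alpha * td_error * features(s, a)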

Electronics ◽  
2019 ◽  
Vol 8 (2) ◽  
pp. 231 ◽  
Author(s):  
Panagiotis Kofinas ◽  
Anastasios I. Dounis

This paper proposes a hybrid Ziegler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional Integral Derivative (PID) controller in order to control the flow rate of a desalination unit. The PID gains are set by the Z-N method and then adapted online through the fuzzy Q-learning MAS. Fuzzy Q-learning is introduced in each agent in order to cope with the continuous state-action space. The global state of the MAS is defined by the error and its derivative. The MAS consists of three agents, and the output signal of each agent defines the percentage change of the corresponding gain. The increment or reduction of each gain can range from 0% to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with a conventional PID controller tuned by the Z-N method.
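
The sketch below outlines how one such gain-tuning agent could work. It uses plain tabular Q-learning over a coarse discretization of (error, error derivative) in place of the paper's fuzzy Q-learning, and the action percentages, bin edges, and learning parameters are assumptions.

import numpy as np

PCT_ACTIONS = np.array([-0.25, -0.10, 0.0, 0.10, 0.25])  # fractional change of a gain (assumed)

class GainAgent:
    def __init__(self, initial_gain, n_bins=5, alpha=0.1, gamma=0.9, eps=0.1):
        self.k0 = initial_gain                      # Ziegler-Nichols starting value
        self.k = initial_gain
        self.q = np.zeros((n_bins, n_bins, len(PCT_ACTIONS)))
        self.bins = np.linspace(-1.0, 1.0, n_bins - 1)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def state(self, error, d_error):
        # Coarse discretization standing in for the fuzzy state description.
        return np.digitize(error, self.bins), np.digitize(d_error, self.bins)

    def act(self, s, rng):
        if rng.random() < self.eps:
            return int(rng.integers(len(PCT_ACTIONS)))
        return int(np.argmax(self.q[s]))

    def apply(self, a_idx):
        # Keep the gain within +/-100% of its initial (Z-N) value.
        self.k = float(np.clip(self.k * (1.0 + PCT_ACTIONS[a_idx]), 0.0, 2.0 * self.k0))
        return self.k

    def learn(self, s, a_idx, reward, s_next):
        target = reward + self.gamma * self.q[s_next].max()
        self.q[s][a_idx] += self.alpha * (target - self.q[s][a_idx])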


Mathematics ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1479
Author(s):  
Francisco Martinez-Gil ◽  
Miguel Lozano ◽  
Ignacio García-Fernández ◽  
Pau Romero ◽  
Dolors Serra ◽  
...  

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors conform significantly better to the real trajectories.
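
For reference, a minimal tabular sketch of the soft Q-learning backup (maximum-entropy soft value and Boltzmann policy) is given below. It is not MARL-Ped or the authors' IRL pipeline; the temperature, learning rate, and state/action sizes are illustrative assumptions.

import numpy as np

def soft_value(q_row, temperature):
    # V(s) = temperature * log sum_a exp(Q(s, a) / temperature), computed stably.
    z = q_row / temperature
    return temperature * (np.max(z) + np.log(np.sum(np.exp(z - np.max(z)))))

def soft_policy(q_row, temperature):
    # The soft-optimal policy is a Boltzmann distribution over the Q-values.
    z = q_row / temperature
    p = np.exp(z - np.max(z))
    return p / p.sum()

def soft_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, temperature=0.5):
    # Soft Bellman backup toward r + gamma * V_soft(s').
    target = r + gamma * soft_value(Q[s_next], temperature)
    Q[s, a] += alpha * (target - Q[s, a])

Q = np.zeros((100, 8))          # 100 discretized states, 8 movement actions (assumed)
soft_q_update(Q, s=3, a=2, r=-1.0, s_next=4)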


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3606 ◽  
Author(s):  
Wanli Xue ◽  
Zhiyong Feng ◽  
Chao Xu ◽  
Zhaopeng Meng ◽  
Chengwei Zhang

Although tracking research has achieved excellent performance from a mathematical perspective, it is still meaningful to analyze tracking problems from multiple angles. This motivation not only promotes the independence of tracking research but also increases the flexibility of practical applications. This paper presents a tracking framework based on reinforcement learning in a multi-dimensional state-action space, termed multi-angle analysis collaboration tracking (MACT). MACT comprises a basic tracking framework and a strategic framework that assists it. In particular, the strategic framework is extensible and currently includes a feature selection strategy (FSS) and a movement trend strategy (MTS). These strategies are abstracted from the multi-angle analysis of tracking problems (the observer's attention and the object's motion), and the content of the analysis corresponds to specific actions in the multi-dimensional action space. Concretely, the tracker, regarded as an agent, is trained with the Q-learning algorithm and an ε-greedy exploration strategy, using a customized reward function that encourages robust object tracking. Extensive comparative evaluations on the OTB50 benchmark demonstrate the effectiveness of the strategies and the improvement in speed and accuracy of the MACT tracker.
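
The sketch below illustrates the two generic ingredients named in the abstract: ε-greedy action selection over a discrete action set and a customized reward. The IoU-based reward and all constants are assumptions, not the MACT implementation.

import numpy as np

def epsilon_greedy(q_values, eps, rng):
    # q_values holds one entry per combined (feature-selection, movement) action.
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def iou_reward(pred_box, gt_box):
    # Hypothetical customized reward: intersection-over-union of predicted and reference boxes.
    ax1, ay1, ax2, ay2 = pred_box
    bx1, by1, bx2, by2 = gt_box
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

rng = np.random.default_rng(0)
action = epsilon_greedy(np.zeros(12), eps=0.1, rng=rng)   # 12 combined actions (assumed)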


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yue Zhou ◽  
Kaan Ozbay ◽  
Pushkin Kachroo ◽  
Fan Zuo

Ramp metering for a bottleneck located far downstream of the ramp is more challenging than for a bottleneck near the ramp. Under a conventional linear feedback-type ramp metering strategy, by the time metered traffic from the ramp arrives at the distant downstream bottleneck, the state of the bottleneck may have changed significantly from when it was sampled to compute the metering rate, owing to the considerable time this traffic takes to traverse the long distance between the ramp and the bottleneck. As a result of such time-delay effects, significant stability issues can arise. Previous studies have mainly compensated for the time-delay effects by incorporating predictors of traffic flow evolution into the control systems. This paper presents an alternative approach. The problem of ramp metering for a distant downstream bottleneck is formulated as a Q-learning problem, in which an intelligent ramp meter agent learns a nonlinear optimal ramp metering policy such that the capacity of the distant downstream bottleneck is fully utilized but not exceeded, which would cause congestion. The learned policy is in pure feedback form: only the current state of the environment is needed to determine the optimal metering rate for the current time. No prediction is needed, as anticipation of traffic flow evolution has been instilled into the nonlinear feedback policy through learning. To deal with the intimidating computational cost associated with the multidimensional continuous state space, the action-value function is approximated by an artificial neural network rather than a lookup table. The mechanism and development of the approximate value function, and how learning of its parameters is integrated into the Q-learning process, are explained. In experiments, the learned ramp metering policy demonstrates effectiveness, benign stability, and some robustness to demand uncertainties.
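
A schematic sketch of such a neural-network action-value approximator is shown below: a multidimensional traffic state goes in, one Q-value per discrete metering-rate action comes out, and the parameters are updated by a semi-gradient one-step Q-learning step. The layer sizes, action set, and optimizer settings are assumptions, not the paper's architecture.

import torch
import torch.nn as nn

STATE_DIM, N_METERING_RATES = 6, 10   # assumed state size and discrete metering levels

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_METERING_RATES),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def q_learning_step(s, a, r, s_next, gamma=0.99):
    # Semi-gradient one-step Q-learning update of the network parameters.
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max()
    loss = (q_net(s)[a] - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

q_learning_step(torch.rand(STATE_DIM), a=4, r=1.0, s_next=torch.rand(STATE_DIM))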


Author(s):  
Ruidong Zhang ◽  
Mingyang Chen ◽  
Benjamin Steeper ◽  
Yaxuan Li ◽  
Zihan Yan ◽  
...  

This paper presents SpeeChin, a smart necklace that can recognize 54 English and 44 Chinese silent speech commands. A customized infrared (IR) imaging system is mounted on a necklace to capture images of the neck and face from under the chin. These images are first pre-processed and then fed to an end-to-end deep convolutional recurrent neural network (CRNN) model to infer the silent speech commands. A user study with 20 participants (10 per language) showed that SpeeChin could recognize the 54 English and 44 Chinese silent speech commands with average cross-session accuracies of 90.5% and 91.6%, respectively. To further investigate the potential of SpeeChin in recognizing other silent speech commands, we conducted another study with 10 participants distinguishing between 72 one-syllable nonwords. Based on the results from the user studies, we further discuss the challenges and opportunities of deploying SpeeChin in real-world applications.
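
A rough sketch of an end-to-end CRNN classifier of the kind described is given below: a small CNN encodes each IR frame, a GRU aggregates the frame sequence, and a linear head predicts the command class. The channel counts, image size, and number of classes are assumptions, not SpeeChin's actual architecture.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_classes=54):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                    # -> (batch*frames, 32, 1, 1)
        )
        self.rnn = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, clips):
        # clips: (batch, frames, 1, H, W) grayscale IR image sequences
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)   # per-frame features (b*t, 32)
        _, h = self.rnn(feats.view(b, t, -1))               # h: (1, b, 64)
        return self.head(h[-1])                              # logits (b, n_classes)

logits = CRNN()(torch.zeros(2, 30, 1, 64, 64))              # 2 clips of 30 IR frames (assumed size)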

