GA-Based Q-CMAC Applied to Airship Evasion Problem

1998 ◽  
Vol 10 (5) ◽  
pp. 431-438 ◽  
Author(s):  
Yuka Akisato ◽  
Keiji Suzuki ◽  
Azuma Ohuchi

The purpose of this research is to acquire an adaptive control policy for an airship in a dynamic, continuous environment, based on reinforcement learning combined with evolutionary construction. The state space for reinforcement learning becomes huge because the airship has great inertia and must sense large amounts of information from its continuous environment to behave appropriately. To reduce and suitably segment the state space, we propose combining CMAC-based Q-learning with evolutionary construction of its state space layers. Simulations showed that the acquired state space segmentation enables the airship to learn effectively.
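A minimal sketch of the CMAC (tile-coding) Q-learning component, assuming a generic two-dimensional continuous state, a small discrete action set, and illustrative tiling parameters (none of these details, nor the evolutionary layer construction, are taken from the paper):

```python
import numpy as np

class CMACQ:
    """Q-learning with a CMAC (tile-coding) approximator over a continuous state."""

    def __init__(self, n_tilings=8, bins=10, n_actions=3,
                 lows=(-1.0, -1.0), highs=(1.0, 1.0),
                 alpha=0.1, gamma=0.99):
        self.n_tilings, self.bins, self.n_actions = n_tilings, bins, n_actions
        self.lows, self.highs = np.array(lows), np.array(highs)
        self.alpha, self.gamma = alpha / n_tilings, gamma
        # one weight table per tiling: bins x bins x actions
        self.w = np.zeros((n_tilings, bins, bins, n_actions))

    def _tiles(self, state):
        """Yield the active (tiling, row, col) cell for each offset tiling."""
        scaled = (np.asarray(state) - self.lows) / (self.highs - self.lows)
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.bins)   # shift each tiling slightly
            idx = np.clip(((scaled + offset) * self.bins).astype(int), 0, self.bins - 1)
            yield t, idx[0], idx[1]

    def q(self, state):
        """Action values = sum of the weights of the active cells."""
        return sum(self.w[t, i, j] for t, i, j in self._tiles(state))

    def update(self, s, a, r, s_next, done):
        """One Q-learning step: distribute the TD error over the active cells."""
        target = r if done else r + self.gamma * self.q(s_next).max()
        td_error = target - self.q(s)[a]
        for t, i, j in self._tiles(s):
            self.w[t, i, j, a] += self.alpha * td_error
```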

2003 ◽  
Vol 19 ◽  
pp. 569-629 ◽  
Author(s):  
B. Price ◽  
C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
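A schematic sketch of an implicit-imitation style backup, with hypothetical dictionary-based models of the learner's own transitions and the mentor's observed transitions (the representation and the integration with prioritized sweeping are assumptions for illustration): the backup at a state takes the better of the agent's own estimated action values and the value implied by the mentor's observed behavior.

```python
def augmented_backup(V, s, own_model, mentor_model, rewards, gamma=0.95):
    """Implicit-imitation style Bellman backup for state s.

    V            : dict state -> current value estimate
    own_model    : dict state -> {action: {next_state: probability}}
    mentor_model : dict state -> {next_state: probability} observed for the mentor
    rewards      : dict state -> reward
    """
    # value of the agent's own best estimated action
    own_values = [
        rewards[s] + gamma * sum(p * V[s2] for s2, p in own_model[s][a].items())
        for a in own_model[s]
    ]
    best = max(own_values)
    # if the mentor has been observed in s, also consider the value its behavior implies
    if s in mentor_model:
        mentor_value = rewards[s] + gamma * sum(
            p * V[s2] for s2, p in mentor_model[s].items())
        best = max(best, mentor_value)
    return best
```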


2011 ◽  
Vol 216 ◽  
pp. 75-80 ◽  
Author(s):  
Chang An Liu ◽  
Fei Liu ◽  
Chun Yang Liu ◽  
Hua Wu

To solve the curse-of-dimensionality problem in multi-agent reinforcement learning, a learning method based on k-means is presented in this paper. In this method, the environmental state is represented by key state factors, and the state space explosion is avoided by grouping states into clusters using k-means. Learning is accelerated by assigning newly encountered states to existing clusters and reusing the corresponding strategies. Experimental results on multi-robot cooperation show that, compared with traditional Q-learning, our scheme improves the team's learning ability and enhances cooperation efficiency.
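A rough illustration of learning over clustered states rather than raw states, using scikit-learn's KMeans and a tabular Q-function indexed by cluster id (the feature vectors, cluster count, and learning parameters are assumptions, not taken from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans

# collect raw environmental states (key state factors) from exploratory episodes
raw_states = np.random.rand(5000, 6)          # placeholder feature vectors
kmeans = KMeans(n_clusters=50, n_init=10).fit(raw_states)

n_actions = 4
Q = np.zeros((kmeans.n_clusters, n_actions))  # Q-table over clusters, not raw states

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard Q-learning update, with states mapped to their nearest cluster."""
    c = kmeans.predict(s.reshape(1, -1))[0]
    c_next = kmeans.predict(s_next.reshape(1, -1))[0]
    Q[c, a] += alpha * (r + gamma * Q[c_next].max() - Q[c, a])
```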


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning for multirobot systems becomes very slow as the number of robots grows, because the state space increases exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. A rule repository of robot behaviors is first initialized during reinforcement learning. Mobile robots obtain the current environmental state through their sensors, and the state is matched against the repository to determine whether a relevant behavior rule has already been stored. If such a rule exists, an action is chosen according to the stored knowledge and rules, and the matching weight is refined; otherwise, the new rule is appended to the repository. The robots learn in a given sequence and share the behavior database. We evaluate the algorithm on a multirobot following-surrounding task and find that the improved algorithm effectively accelerates convergence.
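A simplified sketch of the shared rule repository, in which each robot matches its sensed state against stored rules, reuses and reweights a matching rule, or appends a new one (the data structures, matching criterion, and update rule are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

class SharedRuleBase:
    """Behavior rules shared by all robots: state prototype -> action values."""

    def __init__(self, match_threshold=0.2, n_actions=5):
        self.rules = []              # list of dicts: {'state', 'q', 'weight'}
        self.threshold = match_threshold
        self.n_actions = n_actions

    def match(self, state):
        """Index of the closest stored rule, or None if nothing is close enough."""
        best, best_dist = None, self.threshold
        for i, rule in enumerate(self.rules):
            dist = np.linalg.norm(state - rule['state'])
            if dist < best_dist:
                best, best_dist = i, dist
        return best

    def act_and_update(self, state, reward_fn, alpha=0.1):
        i = self.match(state)
        if i is None:                                  # unseen situation: append a new rule
            self.rules.append({'state': state.copy(),
                               'q': np.zeros(self.n_actions),
                               'weight': 1.0})
            i = len(self.rules) - 1
        rule = self.rules[i]
        a = int(rule['q'].argmax())                    # act according to stored knowledge
        r = reward_fn(state, a)
        rule['q'][a] += alpha * (r - rule['q'][a])     # refine the rule's value estimate
        rule['weight'] += 1.0                          # refine the matching weight
        return a
```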


2021 ◽  
pp. 1-15
Author(s):  
Theresa Ziemke ◽  
Lucas N. Alegre ◽  
Ana L.C. Bazzan

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces have a reasonable size. This is rarely the case for control-related problems such as traffic signal control, where the state space can be very large. In order to deal with the curse of dimensionality, a rough discretization of such a space can be employed. However, this is effective only up to a certain point. A way to mitigate this is to use techniques that generalize the state space, such as function approximation. In this paper, a linear function approximation is used. Specifically, SARSA(λ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only to trivial controllers such as fixed-time ones, but also to state-of-the-art rule-based adaptive methods. It is concluded that SARSA(λ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demands or unexpected events.
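A compact sketch of the Fourier basis features and the SARSA(λ) update with accumulating eligibility traces, for a generic state normalized to [0, 1]^d (the basis order, state dimension, and step sizes are illustrative and not MATSim-specific):

```python
import itertools
import numpy as np

order, dim, n_actions = 3, 2, 4
# coefficient vectors c in {0..order}^dim define the basis phi_c(s) = cos(pi * c . s)
coeffs = np.array(list(itertools.product(range(order + 1), repeat=dim)))
n_features = len(coeffs)

w = np.zeros((n_actions, n_features))      # one weight vector per action
z = np.zeros_like(w)                       # eligibility traces

def phi(s):
    """Fourier basis features for a state s already scaled to [0, 1]^dim."""
    return np.cos(np.pi * coeffs @ np.asarray(s))

def sarsa_lambda_step(s, a, r, s_next, a_next, done,
                      alpha=0.001, gamma=0.99, lam=0.9):
    """One SARSA(lambda) update with accumulating traces."""
    global z
    q = w[a] @ phi(s)
    q_next = 0.0 if done else w[a_next] @ phi(s_next)
    delta = r + gamma * q_next - q
    z *= gamma * lam
    z[a] += phi(s)
    w[:] += alpha * delta * z
    if done:
        z[:] = 0.0
```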


2020 ◽  
Vol 12 (1) ◽  
pp. 8
Author(s):  
Ibrahim Ahmed ◽  
Marcos Quiñones-Grueiro ◽  
Gautam Biswas

Faults are endemic to all systems. Adaptive fault-tolerant control accepts degraded performance under faults in exchange for continued operation. In systems with abrupt faults and strict time constraints, it is imperative that control adapt quickly to system changes. We present a meta-reinforcement learning approach that quickly adapts the control policy. The approach builds upon model-agnostic meta-learning (MAML). The controller maintains a complement of prior policies learned under system faults. This "library" is evaluated on the system after a new fault to initialize the new policy. This contrasts with MAML, where the controller samples new policies from a distribution of similar systems at each update step to obtain the new policy. Our approach improves the sample efficiency of the reinforcement learning process. We evaluate it on a model of fuel tanks under abrupt faults.
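An outline of the library-based initialization: after a new fault, each stored prior policy is briefly evaluated on the degraded system, and the best-scoring one seeds further learning. The evaluation interface and policy representation below are placeholders, not the authors' implementation:

```python
def initialize_from_library(library, evaluate, n_eval_episodes=3):
    """Pick the prior policy (learned under an earlier fault) that performs best
    on the newly faulted system, and use it to initialize the new policy.

    library  : list of previously learned policies
    evaluate : callable(policy, n_episodes) -> mean return on the current system
    """
    scores = [evaluate(policy, n_eval_episodes) for policy in library]
    best = max(range(len(scores)), key=scores.__getitem__)
    return library[best], scores[best]

# usage sketch: fine-tune the selected policy with any RL algorithm
# init_policy, _ = initialize_from_library(prior_policies, evaluate_on_faulted_plant)
# new_policy = fine_tune(init_policy, env_after_fault)
```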


Author(s):  
Kei Takahata ◽  
Takao Miura

Reinforcement learning allows us to acquire knowledge without any training data; however, learning takes time. In this work, we propose a method that performs a reverse action by using a retrospective Kalman filter, which estimates the state one step before. We present an experiment on a hunter-prey problem and discuss the usefulness of the proposed method.
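One hedged reading of "estimates the state one step before" is a backward prediction step under a linear-Gaussian model with invertible dynamics; the actual retrospective Kalman filter in the paper may differ:

```python
import numpy as np

def retrospective_estimate(x_t, P_t, A, B, u_prev, Q):
    """Estimate the previous state x_{t-1} from the current filtered estimate x_t,
    assuming linear dynamics x_t = A x_{t-1} + B u_{t-1} + w, w ~ N(0, Q),
    with A invertible (a simplifying assumption for this sketch)."""
    A_inv = np.linalg.inv(A)
    x_prev = A_inv @ (x_t - B @ u_prev)        # backward prediction of the mean
    P_prev = A_inv @ (P_t + Q) @ A_inv.T       # propagated covariance
    return x_prev, P_prev
```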

