The State-space Design Research of MPPT based on Reinforcement Learning in PV System

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.
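The idea of extracting value information from an observed mentor can be illustrated with a small sketch. It assumes the identical-abilities case and a tabular setting, and it assumes the learner keeps an estimated mentor transition model `P_mentor` built from observed state changes alongside its own model `P_own`. The function name, array shapes, and the form of the backup are illustrative assumptions, not details given in the abstract.

```python
import numpy as np

def augmented_backup(V, R, P_own, P_mentor, s, gamma=0.95):
    """Value backup for state s that also considers the observed mentor.

    V:        (S,) current value estimates
    R:        (S,) state rewards
    P_own:    (S, A, S) learner's estimated transition model
    P_mentor: (S, S) transition model estimated from mentor observations
    All names and shapes are hypothetical choices for this sketch.
    """
    # Best expected value under the learner's own (possibly poor) model.
    own = max(P_own[s, a] @ V for a in range(P_own.shape[1]))
    # Expected value of whatever the mentor was seen doing in s.
    mentor = P_mentor[s] @ V
    # With identical abilities, the learner can do at least as well as the mentor.
    return R[s] + gamma * max(own, mentor)
```

In a state the learner has not explored, `P_own` is uninformative, so the mentor term dominates and pulls value toward the region the mentor was observed to reach.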

2P1-L5 Segmentation of the State Space based on Bayesian Discrimination for Reinforcement Learning

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) ◽

10.1299/jsmermd.2001.64_1 ◽

2001 ◽

Vol 2001 (0) ◽

pp. 64

Author(s):

K. Yamada ◽

K. Ookura ◽

K. Ueda

Keyword(s):

Reinforcement Learning ◽

State Space ◽

The State

A Temporal Difference GNG-Based Approach for the State Space Quantization in Reinforcement Learning Environments

2013 IEEE 25th International Conference on Tools with Artificial Intelligence ◽

10.1109/ictai.2013.89 ◽

2013 ◽

Cited By ~ 1

Author(s):

Davi Carnauba De Lima Vieira ◽

Paulo Jorge Leitao Adeodato ◽

Paulo Mauricio Goncalves

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Learning Environments ◽

The State ◽

Temporal Difference

An Improved Reinforcement Learning Algorithm for Cooperative Behaviors of Mobile Robots

Journal of Control Science and Engineering ◽

10.1155/2014/270548 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Yong Song ◽

Yibin Li ◽

Xiaoli Wang ◽

Xin Ma ◽

Jiuhong Ruan

Keyword(s):

Reinforcement Learning ◽

Mobile Robots ◽

Knowledge Sharing ◽

State Space ◽

Learning Algorithm ◽

The State ◽

Convergence Speed ◽

Exponential Increase ◽

Cooperative Behaviors ◽

Reinforcement Learning Algorithm

Reinforcement learning for multirobot systems becomes very slow as the number of robots increases, because the state space grows exponentially. A sequential Q-learning algorithm based on knowledge sharing is presented. The rule repository of robot behaviors is first initialized in the process of reinforcement learning. Mobile robots obtain the present environmental state from their sensors. The state is then matched against the database to determine whether a relevant behavior rule has been stored. If the rule is present, an action is chosen in accordance with the knowledge and the rules, and the matching weight is refined. Otherwise, the new rule is appended to the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm on a multirobot following-surrounding behavior and find that the improved algorithm can effectively accelerate convergence.
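The match-or-append loop described in this abstract can be sketched as follows. This is a minimal illustration that assumes discrete, hashable states and treats the "matching weight" as an ordinary Q-value. The class `SharedRuleBase` and the functions `q_step` and `choose_action` are hypothetical names, not the authors' implementation.

```python
import random

class SharedRuleBase:
    """Behavior rules shared by all robots: state -> {action: weight}."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.rules = {}

    def match(self, state):
        # If no rule is stored for this state, append a new one.
        if state not in self.rules:
            self.rules[state] = {a: 0.0 for a in self.actions}
        return self.rules[state]

def choose_action(rule_base, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice from the shared rule for this state."""
    weights = rule_base.match(state)
    if rng.random() < epsilon:
        return rng.choice(rule_base.actions)
    return max(weights, key=weights.get)

def q_step(rule_base, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Refine the weight of the matched rule with a Q-learning update."""
    weights = rule_base.match(state)
    next_weights = rule_base.match(next_state)
    target = reward + gamma * max(next_weights.values())
    weights[action] += alpha * (target - weights[action])
```

Robots that take turns calling `choose_action` and `q_step` on the same `SharedRuleBase` instance learn sequentially while sharing one database, so the table grows with the states actually visited rather than with the joint state space.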

Reinforcement learning vs. rule-based adaptive traffic signal control: A Fourier basis linear function approximation for traffic signal control

AI Communications ◽

10.3233/aic-201580 ◽

2021 ◽

pp. 1-15

Author(s):

Theresa Ziemke ◽

Lucas N. Alegre ◽

Ana L.C. Bazzan

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Function Approximation ◽

Traffic Signals ◽

The State ◽

Signal Control ◽

Traffic Signal Control ◽

Rule Based ◽

Fourier Basis ◽

Linear Function Approximation

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces have a reasonable size. This is rarely the case for control-related problems such as controlling traffic signals, where the state space can be very large. To deal with the curse of dimensionality, a coarse discretization of the state space can be employed. However, this is effective only up to a certain point. One way to mitigate this is to use techniques that generalize over the state space, such as function approximation. In this paper, a linear function approximation is used. Specifically, SARSA(λ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only to trivial controllers such as fixed-time control, but also to state-of-the-art rule-based adaptive methods. It is concluded that SARSA(λ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demands or unexpected events.
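For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of Fourier basis features combined with a linear SARSA(λ) update. It assumes states are already scaled to [0, 1]^d and uses accumulating eligibility traces. The function names, hyperparameter values, and per-action weight layout are assumptions for illustration and are unrelated to the MATSim implementation.

```python
import itertools
import numpy as np

def fourier_features(state, order):
    """Fourier basis features cos(pi * c . s) for a state scaled to [0, 1]^d."""
    state = np.asarray(state, dtype=float)
    # All coefficient vectors c with integer entries in {0, ..., order}.
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=state.size)))
    return np.cos(np.pi * coeffs @ state)

def sarsa_lambda_update(w, z, phi, action, reward, phi_next, next_action,
                        alpha=0.01, gamma=0.99, lam=0.9):
    """One SARSA(lambda) step; w and z have shape (n_actions, n_features).

    Both arrays are updated in place. Terminal-state handling is omitted.
    """
    q = w[action] @ phi                   # Q(s, a) under the linear model
    q_next = w[next_action] @ phi_next    # Q(s', a')
    delta = reward + gamma * q_next - q   # temporal-difference error
    z *= gamma * lam                      # decay all traces
    z[action] += phi                      # accumulate trace for the taken action
    w += alpha * delta * z                # move weights along the traces
    return delta
```

The number of features is (order + 1)^d, so the basis order trades approximation quality against cost as the state dimension grows.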

1A1-D10 Cooperative Behavior Acquisition with Reinforcement Learning Robots Based on the Mechanism of Selecting the State Space Representations(Evolution and Learning for Robotics(1))

The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) ◽

10.1299/jsmermd.2012._1a1-d10_1 ◽

2012 ◽

Vol 2012 (0) ◽

pp. _1A1-D10_1-_1A1-D10_4

Author(s):

Koki KAGE ◽

Junki SAKANOUE ◽

Toshiyuki YASUDA ◽

Kazuhiro OHKURA

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Cooperative Behavior ◽

The State

Topological Visualization Method for Understanding the Landscape of Value Functions and Structure of the State Space in Reinforcement Learning

Proceedings of the 12th International Conference on Agents and Artificial Intelligence ◽

10.5220/0008913303700377 ◽

2020 ◽

Author(s):

Yuki Nakamura ◽

Takeshi Shibuya

Keyword(s):

Reinforcement Learning ◽

State Space ◽

The State ◽

Value Functions ◽

Visualization Method

GA-Based Q-CMAC Applied to Airship Evasion Problem

Journal of Robotics and Mechatronics ◽

10.20965/jrm.1998.p0431 ◽

1998 ◽

Vol 10 (5) ◽

pp. 431-438 ◽

Cited By ~ 1

Author(s):

Yuka Akisato ◽

Keiji Suzuki ◽

Azuma Ohuchi

Keyword(s):

Adaptive Control ◽

Reinforcement Learning ◽

State Space ◽

Control Policy ◽

The State ◽

Space Layer ◽

Construction Simulation ◽

Q Learning ◽

Evolutionary State ◽

Evasion Problem

The purpose of this research is to acquire an adaptive control policy for an airship in a dynamic, continuous environment based on reinforcement learning combined with evolutionary construction. The state space for reinforcement learning becomes huge because the airship has great inertia and must sense large amounts of information from a continuous environment to behave appropriately. To reduce and suitably segment the state space, we propose combining CMAC-based Q-learning with evolutionary construction of its state space layers. Simulations showed that the acquired state space segmentation enables airships to learn effectively.
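A CMAC approximates Q-values over a continuous state by summing weights from several overlapping, offset tilings. The sketch below shows only that fixed-tiling part, under the assumption of a bounded state and uniformly shifted tilings. The evolutionary (GA-based) layer construction from the abstract is not reproduced, and the class name `CMACQ` and its parameters are hypothetical.

```python
import numpy as np

class CMACQ:
    """Q-value approximator with several offset tilings (CMAC / tile coding)."""

    def __init__(self, low, high, n_tilings=4, tiles_per_dim=8, n_actions=3, alpha=0.1):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.n_tilings = n_tilings
        self.tiles = tiles_per_dim
        self.alpha = alpha
        dims = self.low.size
        # One weight table per tiling; the extra tile per dimension absorbs the offset.
        self.w = np.zeros((n_tilings, n_actions) + (tiles_per_dim + 1,) * dims)

    def active_tiles(self, state):
        """Return one cell index per tiling for the given state."""
        scaled = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low) * self.tiles
        cells = []
        for t in range(self.n_tilings):
            shifted = scaled + t / self.n_tilings  # each tiling is shifted by a fraction of a tile
            cell = np.clip(np.floor(shifted).astype(int), 0, self.tiles)
            cells.append(tuple(cell))
        return cells

    def q(self, state, action):
        return sum(self.w[(t, action) + cell] for t, cell in enumerate(self.active_tiles(state)))

    def update(self, state, action, target):
        """Move Q(state, action) toward target, spreading the step over all tilings."""
        error = target - self.q(state, action)
        for t, cell in enumerate(self.active_tiles(state)):
            self.w[(t, action) + cell] += self.alpha / self.n_tilings * error
        return error
```

Because nearby states share most of their active tiles, an update at one state also shifts the estimates of its neighbors, which is how the segmentation controls generalization.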

On the state-space design of optimal controllers for distributed systems with finite communication speed

2008 47th IEEE Conference on Decision and Control ◽

10.1109/cdc.2008.4739424 ◽

2008 ◽

Cited By ~ 6

Author(s):

Makan Fardad ◽

Mihailo R. Jovanovic

Keyword(s):

Distributed Systems ◽

State Space ◽

The State ◽

Space Design ◽

Optimal Controllers

Don’t Do What Doesn’t Matter: Intrinsic Motivation with Action Usefulness

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/406 ◽

2021 ◽

Author(s):

Mathieu Seurin ◽

Florian Strub ◽

Philippe Preux ◽

Olivier Pietquin

Keyword(s):

Reinforcement Learning ◽

Intrinsic Motivation ◽

State Space ◽

State Of The Art ◽

The State ◽

Sample Complexity ◽

Art Methods ◽

New States

Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation signals have thus been developed to alleviate the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to states with relevant actions. While most actions consistently change the state when used, e.g., moving the agent, some actions are only effective in specific states, e.g., opening a door or grabbing an object. DoWhaM detects and rewards actions that seldom affect the environment. We evaluate DoWhaM on the procedurally generated environment MiniGrid against state-of-the-art methods. Experiments consistently show that DoWhaM greatly reduces sample complexity, establishing a new state of the art in MiniGrid.
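The idea of rewarding actions that seldom affect the environment can be illustrated with simple counters. The sketch below is a simplified stand-in and not the paper's exact bonus: it assumes hashable states, uses the fraction of ineffective uses of an action as a rarity score, and discounts by the visit count of the resulting state. The class name and the formula are assumptions made for illustration.

```python
import math
from collections import Counter

class ActionUsefulnessBonus:
    """Intrinsic reward for actions that rarely change the state (simplified sketch)."""

    def __init__(self):
        self.used = Counter()       # how often each action was tried
        self.effective = Counter()  # how often it actually changed the state
        self.visits = Counter()     # visits to states reached by effective actions

    def reward(self, state, action, next_state):
        changed = state != next_state
        self.used[action] += 1
        if not changed:
            return 0.0              # no bonus when the action had no effect
        self.effective[action] += 1
        self.visits[next_state] += 1
        # Actions that usually do nothing get a high score when they finally work.
        rarity = 1.0 - self.effective[action] / self.used[action]
        return rarity / math.sqrt(self.visits[next_state])
```

Under this scoring, a movement action that always changes the state earns no bonus, while an action such as opening a door, which fails in most states, earns one when it succeeds.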