The State-space Design Research of MPPT based on Reinforcement Learning in PV System

Author(s):  
Dingyi Lin ◽  
Xingshuo Li ◽  
Shuye Ding
2003 ◽  
Vol 19 ◽  
pp. 569-629 ◽  
Author(s):  
B. Price ◽  
C. Boutilier

Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restricitions.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yong Song ◽  
Yibin Li ◽  
Xiaoli Wang ◽  
Xin Ma ◽  
Jiuhong Ruan

Reinforcement learning algorithm for multirobot will become very slow when the number of robots is increasing resulting in an exponential increase of state space. A sequentialQ-learning based on knowledge sharing is presented. The rule repository of robots behaviors is firstly initialized in the process of reinforcement learning. Mobile robots obtain present environmental state by sensors. Then the state will be matched to determine if the relevant behavior rule has been stored in the database. If the rule is present, an action will be chosen in accordance with the knowledge and the rules, and the matching weight will be refined. Otherwise the new rule will be appended to the database. The robots learn according to a given sequence and share the behavior database. We examine the algorithm by multirobot following-surrounding behavior, and find that the improved algorithm can effectively accelerate the convergence speed.


2021 ◽  
pp. 1-15
Author(s):  
Theresa Ziemke ◽  
Lucas N. Alegre ◽  
Ana L.C. Bazzan

Reinforcement learning is an efficient, widely used machine learning technique that performs well when the state and action spaces have a reasonable size. This is rarely the case regarding control-related problems, as for instance controlling traffic signals. Here, the state space can be very large. In order to deal with the curse of dimensionality, a rough discretization of such space can be employed. However, this is effective just up to a certain point. A way to mitigate this is to use techniques that generalize the state space such as function approximation. In this paper, a linear function approximation is used. Specifically, SARSA ( λ ) with Fourier basis features is implemented to control traffic signals in the agent-based transport simulation MATSim. The results are compared not only to trivial controllers such as fixed-time, but also to state-of-the-art rule-based adaptive methods. It is concluded that SARSA ( λ ) with Fourier basis features is able to outperform such methods, especially in scenarios with varying traffic demands or unexpected events.


1998 ◽  
Vol 10 (5) ◽  
pp. 431-438 ◽  
Author(s):  
Yuka Akisato ◽  
◽  
Keiji Suzuki ◽  
Azuma Ohuchi

The purpose of this research is to acquire an adaptive control policy of an airship in a dynamic, continuous environment based on reinforcement learning combined with evolutionary construction. The state space for reinforcement learning becomes huge because the airship has great inertia and must sense huge amounts of information from a continuous environment to behave appropriately. To reduce and suitably segment state space, we propose combining CMAC-based Q-learning and its evolutionary state space layer construction. Simulation showed the acquisition of state space segmentation enabling airships to learn effectively.


Author(s):  
Mathieu Seurin ◽  
Florian Strub ◽  
Philippe Preux ◽  
Olivier Pietquin

Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to state with relevant actions. While most actions consistently change the state when used, e.g. moving the agent, some actions are only effective in specific states, e.g., opening a door, grabbing an object. DoWhaM detects and rewards actions that seldom affect the environment. We evaluate DoWhaM on the procedurally-generated environment MiniGrid against state-of-the-art methods. Experiments consistently show that DoWhaM greatly reduces sample complexity, installing the new state-of-the-art in MiniGrid.


Sign in / Sign up

Export Citation Format

Share Document