Visualization of Learning Process in “State and Action” Space Using Self-Organizing Maps

Author(s):  
Akira Notsu ◽  
Yuichi Hattori ◽  
Seiki Ubukata ◽  
Katsuhiro Honda ◽  
...  

In reinforcement learning, agents can learn appropriate actions for each situation based on the consequences of those actions after interacting with the environment. Reinforcement learning is compatible with self-organizing maps, which accomplish unsupervised learning by reacting to stimuli and strengthening neurons. Therefore, numerous studies have investigated reinforcement learning in which agents learn the state space using self-organizing maps. In this study, with the aim of applying these previous studies to transfer learning and to the visualization of the human learning process, we introduced self-organizing maps into reinforcement learning and attempted to make their “state and action” learning process visible. We performed numerical experiments on the 2D goal-search problem; our model visualized the learning process of the agent.
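As a rough illustration of the kind of setup this abstract describes, the following Python sketch feeds the (state, action) pairs visited by a tabular Q-learning agent into a small self-organizing map during a toy 2D goal-search task; the map's codebook can then be rendered to inspect the learning process. The grid size, learning rates, and the toy environment are all assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code): a SOM is trained online on the
# normalized (state, action) vectors visited by a Q-learning agent in a small
# 2D goal-search gridworld, so the codebook can later be plotted.
import numpy as np

class SOM:
    def __init__(self, rows, cols, dim, lr=0.3, sigma=1.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.uniform(0.0, 1.0, (rows, cols, dim))        # codebook vectors
        self.grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                         indexing="ij"), axis=-1).astype(float)
        self.lr, self.sigma = lr, sigma

    def update(self, x):
        d = np.linalg.norm(self.w - x, axis=-1)                  # distance to every unit
        bmu = np.array(np.unravel_index(np.argmin(d), d.shape), dtype=float)
        g = np.exp(-np.sum((self.grid - bmu) ** 2, axis=-1) / (2 * self.sigma ** 2))
        self.w += self.lr * g[..., None] * (x - self.w)          # neighborhood update

# Hypothetical 5x5 gridworld, goal at (4, 4); actions: up, down, left, right.
N, ACTIONS = 5, [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((N, N, len(ACTIONS)))
som = SOM(8, 8, dim=3)                                           # inputs are (x, y, action)
rng = np.random.default_rng(0)

for episode in range(200):
    s = (0, 0)
    for _ in range(50):
        a = int(rng.integers(4)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        nxt = (min(max(s[0] + ACTIONS[a][0], 0), N - 1),
               min(max(s[1] + ACTIONS[a][1], 0), N - 1))
        r = 1.0 if nxt == (N - 1, N - 1) else 0.0
        Q[s][a] += 0.1 * (r + 0.95 * np.max(Q[nxt]) - Q[s][a])   # Q-learning update
        som.update(np.array([s[0] / (N - 1), s[1] / (N - 1), a / 3]))  # learn the visited (state, action)
        s = nxt
        if r > 0:
            break

print(som.w.shape)  # the 8x8x3 codebook can be rendered as an image of the learning process
```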

Author(s):  
Peng Zhang ◽  
Jianye Hao ◽  
Weixun Wang ◽  
Hongyao Tang ◽  
Yi Ma ◽  
...  

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the subsequent learning process. Although the prior knowledge may not be fully applicable to the new task, the learning process is significantly sped up, since the initial policy ensures a quick start and the intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the knowledge guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller that represents human knowledge and a refine module that finetunes the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
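The following sketch suggests how a fuzzy-rule prior and a refine module could be blended with a policy network in an end-to-end differentiable way, in the spirit of what this abstract describes; the class name, rule parameterization, and network sizes are assumptions, not the official KoGuN implementation.

```python
# A rough sketch (assumptions throughout, not the authors' code) of a policy
# whose action logits come from a trainable fuzzy rule controller plus a small
# refine network that corrects the suboptimal prior, trained end-to-end.
import torch
import torch.nn as nn

class KnowledgeGuidedPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, n_rules=4, hidden=64):
        super().__init__()
        # Fuzzy rule controller: Gaussian memberships per rule and per-rule
        # action preferences; in practice these would be set from human priors.
        self.centers = nn.Parameter(torch.randn(n_rules, obs_dim))
        self.widths = nn.Parameter(torch.ones(n_rules, obs_dim))
        self.rule_logits = nn.Parameter(torch.zeros(n_rules, n_actions))
        # Refine module: corrects the prior's action logits.
        self.refine = nn.Sequential(nn.Linear(obs_dim + n_actions, hidden),
                                    nn.Tanh(), nn.Linear(hidden, n_actions))

    def forward(self, obs):
        # Firing strength of each fuzzy rule for this observation.
        dist = ((obs.unsqueeze(1) - self.centers) / self.widths).pow(2).sum(-1)
        strength = torch.softmax(-dist, dim=1)                    # (batch, n_rules)
        prior_logits = strength @ self.rule_logits                # knowledge-based logits
        refined = prior_logits + self.refine(torch.cat([obs, prior_logits], dim=-1))
        return torch.distributions.Categorical(logits=refined)

# Usage: the returned distribution plugs into any policy-gradient loss, e.g.
# loss = -(dist.log_prob(actions) * advantages).mean()
policy = KnowledgeGuidedPolicy(obs_dim=4, n_actions=2)
dist = policy(torch.randn(8, 4))
print(dist.sample())
```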


Author(s):  
Marley Vellasco ◽  
Marco Pacheco ◽  
Karla Figueiredo ◽  
Flavio Souza

This paper describes a new class of neuro-fuzzy models, called Reinforcement Learning Hierarchical Neuro-Fuzzy Systems (RL-HNF). These models employ BSP (Binary Space Partitioning) and Politree partitioning of the input space [Chrysanthou,1992] and have been developed to bypass traditional drawbacks of neuro-fuzzy systems: the reduced number of allowed inputs and the poor capacity to create their own structure and rules (ANFIS [Jang,1997], NEFCLASS [Kruse,1995] and FSOM [Vuorimaa,1994]). The new models, named Reinforcement Learning Hierarchical Neuro-Fuzzy BSP (RL-HNFB) and Reinforcement Learning Hierarchical Neuro-Fuzzy Politree (RL-HNFP), descend from the original HNFB, which uses Binary Space Partitioning (see Hierarchical Neuro-Fuzzy Systems Part I). By combining hierarchical partitioning with the Reinforcement Learning (RL) methodology, a new class of neuro-fuzzy systems (SNF) was obtained that, in addition to automatically learning its structure, autonomously learns the actions to be taken by an agent, dispensing with a priori information (number of rules, fuzzy rules and sets) about the learning process. These characteristics represent an important differential when compared with existing learning systems for intelligent agents, because in applications involving continuous and/or high-dimensional environments, traditional Reinforcement Learning methods based on lookup tables (tables that store value functions for a small or discrete state space) are no longer usable, since the state space becomes too large. This second part on hierarchical neuro-fuzzy systems focuses on the reinforcement learning process; the first part presented HNFB models based on supervised learning methods. The RL-HNFB and RL-HNFP models were evaluated in a benchmark control application and in a simulated Khepera robot environment with multiple obstacles.
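As a simplified illustration of hierarchical partitioning combined with RL, the sketch below grows a binary space partitioning tree whose leaf cells hold per-action values and split themselves when visited often, so the structure is learned rather than fixed in advance. It uses crisp cells instead of fuzzy memberships and is an assumption-laden toy, not the RL-HNFB model itself.

```python
# A simplified sketch (not the RL-HNFB implementation): a BSP tree whose leaves
# store per-action Q-values and split along their widest dimension once they
# have been visited often enough, inheriting the parent's values.
import numpy as np

class BSPCell:
    def __init__(self, low, high, n_actions, depth=0, max_depth=4, split_after=50):
        self.low, self.high = np.asarray(low, float), np.asarray(high, float)
        self.q = np.zeros(n_actions)
        self.visits, self.depth = 0, depth
        self.max_depth, self.split_after = max_depth, split_after
        self.children, self.axis, self.mid = None, None, None

    def leaf_for(self, state):
        if self.children is None:
            return self
        side = int(state[self.axis] > self.mid)
        return self.children[side].leaf_for(state)

    def update(self, state, action, target, lr=0.1):
        leaf = self.leaf_for(state)
        leaf.q[action] += lr * (target - leaf.q[action])         # TD-style update
        leaf.visits += 1
        if leaf.visits > leaf.split_after and leaf.depth < leaf.max_depth:
            leaf.split()

    def split(self):
        # Split along the widest dimension; children inherit the parent's Q-values.
        self.axis = int(np.argmax(self.high - self.low))
        self.mid = 0.5 * (self.low[self.axis] + self.high[self.axis])
        lo_high, hi_low = self.high.copy(), self.low.copy()
        lo_high[self.axis] = hi_low[self.axis] = self.mid
        self.children = (BSPCell(self.low, lo_high, len(self.q), self.depth + 1,
                                 self.max_depth, self.split_after),
                         BSPCell(hi_low, self.high, len(self.q), self.depth + 1,
                                 self.max_depth, self.split_after))
        for c in self.children:
            c.q[:] = self.q

root = BSPCell(low=[0, 0], high=[1, 1], n_actions=4)
root.update(state=np.array([0.2, 0.7]), action=1, target=0.5)
print(root.leaf_for(np.array([0.2, 0.7])).q)
```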


2018 ◽  
Vol 27 (2) ◽  
pp. 111-126 ◽  
Author(s):  
Thommen George Karimpanal ◽  
Roland Bouffanais

The idea of reusing or transferring information from previously learned tasks (source tasks) for the learning of new tasks (target tasks) has the potential to significantly improve the sample efficiency of a reinforcement learning agent. In this work, we describe a novel approach for reusing previously acquired knowledge by using it to guide the exploration of an agent while it learns new tasks. In order to do so, we employ a variant of the growing self-organizing map algorithm, which is trained using a measure of similarity that is defined directly in the space of the vectorized representations of the value functions. In addition to enabling transfer across tasks, the resulting map is simultaneously used to enable the efficient storage of previously acquired task knowledge in an adaptive and scalable manner. We empirically validate our approach in a simulated navigation environment and also demonstrate its utility through simple experiments using a mobile micro-robotics platform. In addition, we demonstrate the scalability of this approach and analytically examine its relation to the proposed network growth mechanism. Furthermore, we briefly discuss some of the possible improvements and extensions to this approach, as well as its relevance to real-world scenarios in the context of continual learning.
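A minimal sketch of the storage side of such an approach is given below: a growing map whose nodes are vectorized value functions, with similarity measured directly in that space. The threshold, the choice of cosine similarity, and the toy dimensions are assumptions rather than the paper's exact algorithm.

```python
# A minimal sketch (assumed details, not the paper's code) of a growing map over
# vectorized value functions: a new node is grown when no stored task is similar
# enough, and the closest node can be queried to guide exploration on a new task.
import numpy as np

class GrowingValueFunctionMap:
    def __init__(self, threshold=0.5, lr=0.2):
        self.nodes = []                        # each node is a vectorized value function
        self.threshold, self.lr = threshold, lr

    def similarity(self, a, b):
        # Cosine similarity in the space of vectorized value functions.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def insert(self, q_vec):
        if not self.nodes:
            self.nodes.append(q_vec.copy())
            return 0
        sims = [self.similarity(q_vec, n) for n in self.nodes]
        best = int(np.argmax(sims))
        if sims[best] < self.threshold:        # too dissimilar: grow a new node
            self.nodes.append(q_vec.copy())
            return len(self.nodes) - 1
        self.nodes[best] += self.lr * (q_vec - self.nodes[best])   # adapt the best match
        return best

    def closest(self, q_vec):
        return self.nodes[int(np.argmax([self.similarity(q_vec, n) for n in self.nodes]))]

# Usage sketch: store two source-task Q-tables, then query with a partially
# learned target-task Q-table to retrieve knowledge for guiding exploration.
som = GrowingValueFunctionMap()
som.insert(np.random.rand(25 * 4))             # hypothetical 25-state, 4-action task
som.insert(np.random.rand(25 * 4))
guide_q = som.closest(np.random.rand(25 * 4)).reshape(25, 4)
print(guide_q.shape)
```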


2019 ◽  
Vol 25 (1) ◽  
pp. 73-80
Author(s):  
Nobuhito Manome ◽  
Shuji Shinohara ◽  
Kouta Suzuki ◽  
Yu Chen ◽  
Shunji Mitsuyoshi

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Georgios Detorakis ◽  
Antoine Chaillet ◽  
Nicolas P. Rougier

We provide theoretical conditions guaranteeing that a self-organizing map efficiently develops representations of the input space. The study relies on a neural field model of spatiotemporal activity in area 3b of the primary somatosensory cortex. We rely on Lyapunov’s theory for neural fields to derive theoretical conditions for stability. We verify the theoretical conditions by numerical experiments. The analysis highlights the key role played by the balance between excitation and inhibition of lateral synaptic coupling and the strength of synaptic gains in the formation and maintenance of self-organizing maps.
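For readers unfamiliar with the term, a neural field model of the kind referred to here is typically written in an Amari-type form such as the one below; the paper's specific kernels, gains, and area 3b model may differ.

```latex
% Generic Amari-type neural field dynamics, given only as background for the
% class of model the abstract refers to; the paper's exact equations may differ.
\begin{equation}
  \tau \frac{\partial u(x,t)}{\partial t}
    = -u(x,t) + \int_{\Omega} w(x - y)\, f\big(u(y,t)\big)\, dy + I(x,t),
\end{equation}
% where u is the membrane potential field, w a lateral coupling kernel balancing
% excitation and inhibition, f a firing-rate nonlinearity, and I the afferent input.
```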


2012 ◽  
Vol 2012 ◽  
pp. 1-13 ◽  
Author(s):  
Wei Wu ◽  
Atlas Khan

Self-organizing map (SOM) neural networks have been widely applied in the information sciences. In particular, Su and Zhao (2009) proposed an SOM-based optimization (SOMO) algorithm that finds, through a competitive learning process, a winning neuron that stands for the minimum of an objective function. In this paper, we generalize the SOMO algorithm to the so-called SOMO-m algorithm with m winning neurons. Numerical experiments show that, for m > 1, the SOMO-m algorithm converges faster than the original SOMO algorithm when used for finding the minimum of a function. More importantly, the SOMO-m algorithm with m ≥ 2 can be used to find two or more minima simultaneously in a single learning iteration process, whereas the original SOMO algorithm has to fulfil the same task much less efficiently by restarting the learning iteration process two or more times.
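The sketch below illustrates the general idea of SOM-based optimization with m winners: each neuron holds a candidate solution, the m best neurons win, and the remaining neurons are pulled toward their nearest winner so that several minima can be tracked in one run. All parameters and the toy objective are assumptions, not the authors' SOMO-m algorithm.

```python
# A small sketch (parameters and details assumed) of SOM-based optimization with
# m winning neurons; it is a toy in the spirit of SOMO-m, not the published algorithm.
import numpy as np

def somo_m(objective, dim, n_neurons=40, m=2, iters=300, lr=0.3, noise=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(n_neurons, dim))          # candidate solutions
    for _ in range(iters):
        values = np.array([objective(x) for x in pop])
        order = np.argsort(values)
        winner_idx, winners = order[:m], pop[order[:m]].copy()    # the m best neurons win
        for i in range(n_neurons):
            if i in winner_idx:
                continue                                          # winners stay put
            nearest = winners[np.argmin(np.linalg.norm(winners - pop[i], axis=1))]
            pop[i] += lr * (nearest - pop[i]) + noise * rng.normal(size=dim)
        noise *= 0.99                                             # anneal exploration noise
    return winners

# Example: a 1-D objective with two minima, near x = -2 and x = +2.
f = lambda x: float((x[0] ** 2 - 4.0) ** 2)
print(somo_m(f, dim=1, m=2))
```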

