AN IMPROVED INTERNAL MODEL FOR SWARM FORMATION AND ADAPTIVE SWARM BEHAVIOR ACQUISITION

2009 ◽  
Vol 18 (08) ◽  
pp. 1517-1531 ◽  
Author(s):  
TAKASHI KUREMOTO ◽  
YUKI YAMANO ◽  
MASANAO OBAYASHI ◽  
KUNIKAZU KOBAYASHI

To form a swarm and acquire swarm behaviors adaptive to the environment, we recently proposed a neuro-fuzzy learning system as a common internal model for each individual. The proposed swarm behavior learning system performed efficiently in simulation experiments on goal-exploration problems. However, in our conventional methods the input information observed from the environment was given in coordinate spaces (discrete or continuous), which are difficult for individuals to obtain in the real world. This paper improves our previous neuro-fuzzy learning system to deal with locally limited observation, i.e., what is usually a Partially Observable Markov Decision Process (POMDP), by adding eligibility traces to the conventional learning algorithm and balancing the trade-off between exploration and exploitation. Simulations of goal-oriented problems for swarm learning were executed, and the results show the effectiveness of the improved learning system.
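The two extensions named above, eligibility traces and an explicit exploration-exploitation balance, can be illustrated in a plain tabular setting. The sketch below is not the authors' neuro-fuzzy internal model: the table sizes, the learning constants, and the Boltzmann (softmax) selection rule are assumptions made purely for illustration.

import numpy as np

# Hypothetical sizes and constants; the paper's system works on fuzzy-rule
# activations rather than a plain state table.
N_STATES, N_ACTIONS = 100, 4
ALPHA, GAMMA, LAMBDA, TAU = 0.1, 0.95, 0.8, 0.5

Q = np.zeros((N_STATES, N_ACTIONS))
E = np.zeros_like(Q)                   # eligibility traces

def select_action(s):
    """Boltzmann (softmax) selection: temperature TAU balances exploration and exploitation."""
    prefs = Q[s] / TAU
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    return np.random.choice(N_ACTIONS, p=probs)

def sarsa_lambda_step(s, a, r, s_next, a_next):
    """One SARSA(lambda) update: traces spread credit back to earlier state-action pairs."""
    global Q, E
    delta = r + GAMMA * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0                     # accumulating eligibility trace
    Q += ALPHA * delta * E             # update every traced state-action pair
    E *= GAMMA * LAMBDA                # decay all traces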

2017 ◽  
Vol 7 (1.5) ◽  
pp. 274
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research work presents an analysis of a modified SARSA learning algorithm. State-Action-Reward-State-Action (SARSA) is a technique for learning a Markov decision process (MDP) policy, used for reinforcement learning in the fields of artificial intelligence (AI) and machine learning (ML). The modified SARSA algorithm selects better actions to obtain better rewards. Experiments were conducted to evaluate the performance of each agent individually, and the same statistics were collected for comparison among the different agents. This work considered various kinds of agents at different levels of the architecture for the experimental analysis. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments; fixed obstacles make the environment more versatile by creating locations specific to the Fungus World testbed, and various parameters are introduced into the environment to test an agent's performance. The modified SARSA learning algorithm is well suited to the EMCAP architecture: in the experiments, the modified SARSA learning system obtains more rewards than the existing SARSA algorithm.
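The abstract does not specify how the SARSA update is modified, so the following is only a minimal sketch of the standard SARSA update that the modification builds on. It is written in Python rather than the SWI-Prolog used by the testbed, and the learning rate, discount factor, and epsilon-greedy selection are assumed values, not figures from the paper.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed constants, not from the paper
Q = defaultdict(float)                   # Q[(state, action)]

def epsilon_greedy(state, actions):
    """Pick a random action with probability EPSILON, otherwise the greedy one."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy update: the bootstrap target uses the action actually taken next."""
    td_target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (td_target - Q[(s, a)])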


2017 ◽  
Vol 7 (1.5) ◽  
pp. 269
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research introduces a self-learning modified Q-learning technique in EMCAP (Enhanced Mind Cognitive Architecture of Pupils). Q-learning is a model-free reinforcement learning (RL) technique; in particular, it can be applied to find an optimal action-selection policy for any given Markov decision process. The EMCAP architecture [1] enables and presents various agent control strategies for static and dynamic environments. Experiments were conducted to evaluate the performance of each agent individually, and the same statistics were collected for comparison among the different agents. This work considered various kinds of agents at different levels of the architecture for the experimental analysis. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments; fixed obstacles make the environment more versatile by creating locations specific to the Fungus World testbed, and various parameters are introduced into the environment to test an agent's performance. The modified Q-learning algorithm is well suited to the EMCAP architecture: in the experiments, the modified Q-learning system obtains more rewards than existing Q-learning.
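For contrast with the SARSA sketch above, here is a minimal sketch of the standard Q-learning update that the modified version presumably builds on. The details of the modification are not given in the abstract, and the constants below are assumptions, not values from the paper.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9            # assumed learning rate and discount, not from the paper
Q = defaultdict(float)             # Q[(state, action)]

def q_learning_update(s, a, r, s_next, actions):
    """Off-policy TD update: the target bootstraps on the greedy next action,
    unlike SARSA, which bootstraps on the action the agent actually takes next."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])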


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration grows small.
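The paper applies the residual-gradient idea to advantage updating; the sketch below only contrasts the direct and residual-gradient forms of a temporal-difference update in the simpler setting of Q-learning with linear function approximation, with assumed step-size and discount values, to show where the two gradient forms differ.

import numpy as np

ALPHA, GAMMA = 0.01, 0.99          # assumed step size and discount, for illustration only

def direct_update(w, phi_sa, phi_next_sa, r):
    """Direct (non-residual) form: the gradient is taken only through Q(s,a) = w . phi(s,a),
    treating the bootstrapped target as a constant."""
    delta = r + GAMMA * w @ phi_next_sa - w @ phi_sa
    return w + ALPHA * delta * phi_sa

def residual_gradient_update(w, phi_sa, phi_next_sa, r):
    """Residual-gradient form: the true gradient of the squared Bellman residual,
    so both the current and the successor value depend on w."""
    delta = r + GAMMA * w @ phi_next_sa - w @ phi_sa
    return w - ALPHA * delta * (GAMMA * phi_next_sa - phi_sa)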


2010 ◽  
Vol 180 (9) ◽  
pp. 1630-1642 ◽  
Author(s):  
Wei Wu ◽  
Long Li ◽  
Jie Yang ◽  
Yan Liu

2013 ◽  
Vol 133 (5) ◽  
pp. 1076-1085 ◽  
Author(s):  
Takashi Kuremoto ◽  
Yuki Yamano ◽  
Liang-Bing Feng ◽  
Kunikazu Kobayashi ◽  
Masanao Obayashi

2015 ◽  
Vol 25 (3) ◽  
pp. 597-615 ◽  
Author(s):  
Hideaki Itoh ◽  
Hisao Fukumoto ◽  
Hiroshi Wakuya ◽  
Tatsuya Furukawa

The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.
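A generic picture of bottom-up, layer-by-layer learning is sketched below. It is not the paper's algorithm for hierarchical POMDP models (which carries a correctness proof for deterministic environments); the function names and the abstraction step are hypothetical placeholders that only show the layer-by-layer control flow.

from typing import Callable, List, Sequence

def learn_bottom_up(observations: Sequence,
                    fit_layer: Callable[[Sequence], object],
                    abstract: Callable[[object, Sequence], Sequence],
                    n_layers: int) -> List[object]:
    """Fit each layer on the (abstracted) output sequence of the layer below it."""
    layers = []
    data = observations
    for _ in range(n_layers):
        model = fit_layer(data)          # learn this layer from the data below it
        layers.append(model)
        data = abstract(model, data)     # compress the sequence for the next layer up
    return layers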

