Emotion Behavior Learning System Based on Meta-Parameter Control of Q-Learning with Plural Q-values

Author(s):  
Shunsuke Akiguchi ◽  
Yoichiro Maeda
2009 ◽  
Vol 35 (2) ◽  
pp. 214-219 ◽  
Author(s):  
Xue-Song WANG ◽  
Xi-Lan TIAN ◽  
Yu-Hu CHENG ◽  
Jian-Qiang YI

2017 ◽  
Vol 7 (1.5) ◽  
pp. 269
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research introduces a self-learning modified Q-learning technique in EMCAP (Enhanced Mind Cognitive Architecture of Pupils). Q-learning is a model-free reinforcement learning (RL) method; in particular, it can be used to derive an optimal action-selection strategy for any given Markov decision process. The modified Q-learning is embedded in the EMCAP architecture [1], which enables and presents various agent control strategies for static and dynamic environments. Experiments are conducted to evaluate the performance of each agent individually, and the same statistics are collected to allow comparison among the different agents. Varied kinds of agents at different levels of the architecture are considered in the experimental analysis. The Fungus World testbed, implemented using SWI-Prolog 5.4.6, is used for the experiments; fixed obstacles give the Fungus World environment a specific layout, and various parameters are introduced into the environment to test an agent's performance. The modified Q-learning algorithm is well suited to the EMCAP architecture, and the experiments show that it obtains more reward than the existing Q-learning.
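The abstract does not give the modified update rule itself, but as a point of reference, a minimal sketch of the standard tabular Q-learning baseline it is compared against might look like the following. The action set, reward handling, and hyperparameters are illustrative assumptions, not details from the paper or the Fungus World testbed.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (not taken from the paper).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["up", "down", "left", "right"]

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Standard Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```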


Sensors ◽  
2019 ◽  
Vol 19 (17) ◽  
pp. 3672 ◽  
Author(s):  
Chao Lu ◽  
Jianwei Gong ◽  
Chen Lv ◽  
Xin Chen ◽  
Dongpu Cao ◽  
...  

As the main component of an autonomous driving system, the motion planner plays an essential role in safe and efficient driving. However, traditional motion planners cannot make full use of the on-board sensing information and lack the ability to adapt efficiently to different driving scenes and to the behaviors of different drivers. To overcome this limitation, a personalized behavior learning system (PBLS) is proposed in this paper to improve the performance of the traditional motion planner. This system is based on the neural reinforcement learning (NRL) technique, which can learn from human drivers online using the on-board sensing information and realize human-like longitudinal speed control (LSC) through the learning from demonstration (LFD) paradigm. Under the LFD framework, the desired speed of human drivers can be learned by PBLS and converted to low-level control commands by a proportional-integral-derivative (PID) controller. Experiments using a driving simulator and real driving data show that PBLS can adapt to different drivers by reproducing their driving behaviors for LSC in different scenes. Moreover, in a comparative experiment with the traditional adaptive cruise control (ACC) system, the proposed PBLS demonstrates superior performance in maintaining driving comfort and smoothness.
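The abstract states that the learned desired speed is converted to low-level commands by a PID controller. A minimal sketch of such a longitudinal speed PID is shown below; the gains, time step handling, and command clamping are illustrative assumptions rather than values from the paper.

```python
class SpeedPID:
    """Minimal PID speed controller sketch: tracks a desired speed and
    outputs a normalized throttle/brake command in [-1, 1]."""

    def __init__(self, kp=0.5, ki=0.05, kd=0.1):  # illustrative gains
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, desired_speed, current_speed, dt):
        error = desired_speed - current_speed          # speed tracking error
        self.integral += error * dt                    # accumulate integral term
        derivative = (error - self.prev_error) / dt    # finite-difference derivative
        self.prev_error = error
        command = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-1.0, min(1.0, command))            # clamp to actuator range
```

In a PBLS-style pipeline, the desired speed at each control step would come from the learned model, with the PID handling only the low-level tracking.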


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration; Q-learning is unable to converge as the time step duration becomes small.
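The minimax modification described above replaces the single maximization in the value backup with a maximization over the pursuer's actions and a minimization over the evader's actions. The sketch below illustrates a tabular minimax-style backup only; it is not the residual-gradient advantage-updating algorithm used in the paper, and the discretized action sets and learning rate are assumptions made for illustration.

```python
ALPHA, GAMMA = 0.1, 0.95
MISSILE_ACTIONS = range(5)  # assumed discretized pursuer actions
PLANE_ACTIONS = range(5)    # assumed discretized evader actions

Q = {}  # Q[(state, missile_action, plane_action)] -> value

def q(state, um, up):
    return Q.get((state, um, up), 0.0)

def minimax_value(state):
    """Minimax over the joint action space: the missile maximizes, the plane minimizes."""
    return max(min(q(state, um, up) for up in PLANE_ACTIONS) for um in MISSILE_ACTIONS)

def minimax_backup(state, um, up, reward, next_state):
    """Q-learning-style update whose target uses the minimax value of the next state."""
    target = reward + GAMMA * minimax_value(next_state)
    Q[(state, um, up)] = q(state, um, up) + ALPHA * (target - q(state, um, up))
```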


Author(s):  
Hikaru Sasaki ◽  
Tadashi Horiuchi ◽  
Satoru Kato ◽  
...  

Deep Q-network (DQN) is one of the most famous methods of deep reinforcement learning. DQN approximates the action-value function using a convolutional neural network (CNN) and updates it using Q-learning. In this study, we applied DQN to robot behavior learning in a simulation environment. We constructed the simulation environment for a two-wheeled mobile robot using the robot simulation software Webots. The mobile robot acquired good behavior, such as avoiding walls and moving along a center line, by learning from high-dimensional visual information supplied as input data. We propose a method that reuses the best target network obtained so far when the learning performance suddenly falls. Moreover, we incorporate the Profit Sharing method into DQN in order to accelerate learning. Through the simulation experiment, we confirmed that our method is effective.
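The abstract does not specify the target-network reuse heuristic in detail. The framework-free sketch below shows the standard DQN target computation with a separate target network, followed by a hypothetical "restore the best snapshot" step as one possible illustration of the reuse idea; the snapshot criterion and threshold are assumptions, not the authors' method.

```python
import copy
import numpy as np

GAMMA = 0.99

def dqn_target(reward, next_state, done, target_net):
    """Standard DQN target: r + gamma * max_a Q_target(s', a), zero bootstrap if terminal."""
    if done:
        return reward
    return reward + GAMMA * np.max(target_net(next_state))  # target_net returns Q-values per action

# Hypothetical sketch of "reuse the best target network so far":
# keep a snapshot taken when evaluation reward peaked, and fall back to it
# if the current performance drops sharply.
best_score = -np.inf
best_target_snapshot = None

def maybe_restore(target_net, current_score, drop_threshold=0.5):
    global best_score, best_target_snapshot
    if current_score > best_score:
        best_score = current_score
        best_target_snapshot = copy.deepcopy(target_net)
    elif best_target_snapshot is not None and current_score < drop_threshold * best_score:
        target_net = copy.deepcopy(best_target_snapshot)  # revert to the best snapshot
    return target_net
```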


2020 ◽  
Vol 170 ◽  
pp. 1198-1203
Author(s):  
Mohamed Boussakssou ◽  
Bader Hssina ◽  
Mohammed Erittali
