Self-Organized Fuzzy Reinforcement Learning System

2003 ◽  
Vol 39 (7) ◽  
pp. 699-701
Author(s):  
Kosuke UMESAKO ◽  
Masanao OBAYASHI ◽  
Kunikazu KOBAYASHI
Author(s):  
Chang-Shing Lee ◽  
Mei-Hui Wang ◽  
Yi-Lin Tsai ◽  
Wei-Shan Chang ◽  
Marek Reformat ◽  
...  

The developments currently observed in Artificial Intelligence (AI) and its influence on many types of industries mean that human-robot cooperation is of special importance. Various types of robots have been applied to the so-called field of Edutainment, i.e., the field that combines education with entertainment. This paper introduces a novel fuzzy-based system for human-robot cooperative Edutainment. This co-learning system includes a brain-computer interface (BCI) ontology model and a Fuzzy Markup Language (FML)-based Reinforcement Learning Agent (FRL-Agent). The proposed FRL-Agent is composed of (1) a human learning agent, (2) a robotic teaching agent, (3) a Bayesian estimation agent, (4) a robotic BCI agent, (5) a fuzzy machine learning agent, and (6) a fuzzy BCI ontology. To verify the effectiveness of the proposed system, the FRL-Agent is used as a robot teacher in a number of elementary schools and junior high schools, and at a university, to allow robot teachers and students to learn together in the classroom. The participating students use handheld devices to interact directly or indirectly with the robot teachers to learn English. Additionally, a number of university students wear a commercial EEG device with eight electrode channels while learning English and listening to music. In the experiments, the robotic BCI agent analyzes the signals collected from the EEG device and transforms them into five physiological indices while the students are learning or listening. The Bayesian estimation agent and the fuzzy machine learning agent optimize the parameters of the FRL-Agent and store them in the fuzzy BCI ontology. The experimental results show that the robot teachers motivate students to learn and stimulate their progress. The fuzzy machine learning agent is able to predict the five physiological indices from the eight-channel EEG data using the trained model. In addition, we train the model to predict other students' feelings from the analyzed physiological indices and labeled feelings. The FRL-Agent is able to provide personalized learning content based on the developed human-robot cooperative edutainment approaches. To our knowledge, the FRL-Agent has not previously been applied in teaching settings such as elementary schools, and it opens up a promising new line of research in human-robot co-learning. In the future, we hope the FRL-Agent will address a long-standing classroom problem: high-performing students find the learning content too simple to stay motivated, while low-performing students cannot keep up with the pace and give up on learning.
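To make the fuzzy machine learning step more concrete, the sketch below shows one hypothetical Mamdani-style fuzzy inference step in Python: an EEG-derived "attention" index and a quiz score are mapped to a difficulty adjustment for the next learning item. The index names, membership functions, and rule base are invented for illustration; they are not the FML rule base, the five physiological indices, or the agents described in the paper, where the parameters would instead be specified in FML and tuned by the Bayesian estimation and fuzzy machine learning agents.

```python
# Illustrative sketch only: a tiny Mamdani-style fuzzy inference step mapping a
# hypothetical EEG-derived "attention" index (0-100) and a quiz score (0-100) to a
# difficulty adjustment in [-1, 1] for the next learning item.

def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Input fuzzy sets (hypothetical).
attention_low  = lambda x: tri(x, -1, 0, 50)
attention_high = lambda x: tri(x, 50, 100, 101)
score_low      = lambda x: tri(x, -1, 0, 60)
score_high     = lambda x: tri(x, 40, 100, 101)

# Output fuzzy sets over "difficulty change" in [-1, 1] (hypothetical).
easier = lambda y: tri(y, -1.01, -1.0, 0.0)
keep   = lambda y: tri(y, -0.5, 0.0, 0.5)
harder = lambda y: tri(y, 0.0, 1.0, 1.01)

def infer(attention, score):
    """Mamdani inference: min for AND, max for aggregation, centroid defuzzification."""
    rules = [
        (min(attention_high(attention), score_high(score)), harder),  # engaged and doing well -> harder
        (min(attention_high(attention), score_low(score)),  keep),    # engaged but struggling -> keep level
        (min(attention_low(attention),  score_high(score)), keep),    # bored but doing well -> keep level
        (min(attention_low(attention),  score_low(score)),  easier),  # disengaged and struggling -> easier
    ]
    # Discretize the output range and aggregate the clipped rule consequents.
    ys = [i / 100.0 for i in range(-100, 101)]
    agg = [max(min(w, mf(y)) for w, mf in rules) for y in ys]
    total = sum(agg)
    return sum(y * m for y, m in zip(ys, agg)) / total if total else 0.0

if __name__ == "__main__":
    print(infer(attention=80, score=85))  # positive -> make the next item harder
    print(infer(attention=20, score=30))  # negative -> make the next item easier
```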


2000 ◽  
Vol 47 (5) ◽  
pp. 1185-1188 ◽  
Author(s):  
Kao-Shing Hwang ◽  
Horng-Jen Chao

1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile-and-plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of the residual-gradient and non-residual-gradient forms of advantage updating and of Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration, whereas Q-learning fails to converge as the time step duration grows small.
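As a rough illustration of the minimax modification mentioned in the abstract, the sketch below applies a tabular Q-learning update with a pure-strategy minimax backup to a toy, fully discretized pursuit-evasion state. This is ordinary discrete Q-learning, not the residual-gradient advantage updating evaluated in the paper, and the state encoding, dynamics, and parameters are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy problem: the state is the (clipped) signed gap between pursuer and
# evader on a line; each step both pick a move in {-1, 0, +1}. The pursuer receives
# reward -|gap|, so it tries to close the gap while the evader tries to open it.
GAP_LIMIT = 5
ACTIONS = (-1, 0, 1)
ALPHA, GAMMA = 0.1, 0.9

Q = defaultdict(float)  # Q[(gap, pursuer_action, evader_action)]

def step(gap, a_pursuer, a_evader):
    """Toy dynamics: the evader's move widens the gap, the pursuer's move narrows it."""
    new_gap = max(-GAP_LIMIT, min(GAP_LIMIT, gap + a_evader - a_pursuer))
    reward = -abs(new_gap)  # reward to the pursuer (the evader gets the negative)
    return new_gap, reward

def minimax_value(gap):
    """Pure-strategy minimax backup: pursuer maximizes against the worst-case evader.
    (Littman's minimax-Q would solve a mixed-strategy matrix game here instead.)"""
    return max(min(Q[(gap, a, b)] for b in ACTIONS) for a in ACTIONS)

for episode in range(2000):
    gap = random.randint(-GAP_LIMIT, GAP_LIMIT)
    for _ in range(30):
        a = random.choice(ACTIONS)  # exploratory joint actions
        b = random.choice(ACTIONS)
        new_gap, r = step(gap, a, b)
        # One-step Q-learning update with a minimax (rather than max) backup.
        target = r + GAMMA * minimax_value(new_gap)
        Q[(gap, a, b)] += ALPHA * (target - Q[(gap, a, b)])
        gap = new_gap

# After training, the pursuer's greedy minimax action at gap=3 should close the gap.
best = max(ACTIONS, key=lambda a: min(Q[(3, a, b)] for b in ACTIONS))
print("pursuer action at gap=3:", best)  # expected: +1 (move toward the evader)
```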

