Adaptive reinforcement learning system for linearization control

2000 ◽  
Vol 47 (5) ◽  
pp. 1185-1188 ◽  
Author(s):  
Kao-Shing Hwang ◽  
Horng-Jen Chao

2003 ◽  
Vol 39 (7) ◽  
pp. 699-701
Author(s):  
Kosuke Umesako ◽  
Masanao Obayashi ◽  
Kunikazu Kobayashi

Author(s):  
Chang-Shing Lee ◽  
Mei-Hui Wang ◽  
Yi-Lin Tsai ◽  
Wei-Shan Chang ◽  
Marek Reformat ◽  
...  

The developments currently observed in Artificial Intelligence (AI) and its influence on different types of industries mean that human-robot cooperation is of special importance. Various types of robots have been applied to the field of Edutainment, i.e., the field that combines education with entertainment. This paper introduces a novel fuzzy-based system for human-robot cooperative Edutainment. This co-learning system includes a brain-computer interface (BCI) ontology model and a Fuzzy Markup Language (FML)-based Reinforcement Learning Agent (FRL-Agent). The proposed FRL-Agent is composed of (1) a human learning agent, (2) a robotic teaching agent, (3) a Bayesian estimation agent, (4) a robotic BCI agent, (5) a fuzzy machine learning agent, and (6) a fuzzy BCI ontology. To verify the effectiveness of the proposed system, the FRL-Agent is deployed as a robot teacher in a number of elementary schools, junior high schools, and a university, allowing robot teachers and students to learn together in the classroom. The participating students use handheld devices to interact directly or indirectly with the robot teachers to learn English. Additionally, a number of university students wear a commercial EEG device with eight electrode channels while learning English and listening to music. In the experiments, the robotic BCI agent analyzes the signals collected from the EEG device and transforms them into five physiological indices while the students are learning or listening. The Bayesian estimation agent and the fuzzy machine learning agent optimize the parameters of the FRL-Agent and store them in the fuzzy BCI ontology. The experimental results show that the robot teachers motivate students to learn and stimulate their progress. The fuzzy machine learning agent is able to predict the five physiological indices from the eight-channel EEG data using the trained model. In addition, we also train the model to predict other students' feelings from the analyzed physiological indices and labeled feelings. The FRL-Agent is able to provide personalized learning content based on the developed human-robot cooperative Edutainment approaches. To our knowledge, the FRL-Agent has not previously been applied to teaching settings such as elementary schools, and it opens up a promising new line of research in human-robot co-learning. In the future, we hope the FRL-Agent will address a persistent classroom problem: high-performing students find the learning content too simple to stay motivated, while low-performing students cannot keep up with the pace and give up.
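As a rough illustration of how a fuzzy rule base could turn one such physiological index into a content-difficulty adjustment, the sketch below implements a tiny Mamdani-style inference in plain Python. The index scale, membership functions, and rules are hypothetical; the paper's FML-based agent works with five indices and an ontology-stored rule base, neither of which is reproduced here.

```python
# Hypothetical sketch: map a single physiological "attention" index (0-100)
# to a lesson-difficulty adjustment with a tiny fuzzy rule base.
# All names, ranges, and rules are illustrative, not taken from the paper.

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def infer_difficulty_delta(attention):
    """Rules: low attention -> easier content, high attention -> harder."""
    low = tri(attention, -1, 0, 50)
    med = tri(attention, 25, 50, 75)
    high = tri(attention, 50, 100, 101)
    # Weighted-average defuzzification over the rule consequents.
    rules = [(low, -1.0), (med, 0.0), (high, +1.0)]
    num = sum(w * d for w, d in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0

if __name__ == "__main__":
    for a in (20, 50, 85):
        print(f"attention={a}: difficulty delta={infer_difficulty_delta(a):+.2f}")
```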


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile-and-plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration, whereas Q-learning is unable to converge as the time step duration grows small.
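To make the residual-gradient versus non-residual-gradient distinction concrete, the sketch below contrasts the two update forms on plain tabular Q-learning, the baseline the abstract compares against. It assumes a deterministic transition (so no double sampling is needed) and toy state/action sizes; the paper's own algorithm applies the residual-gradient idea to advantage updating, which additionally maintains separate value and advantage functions and is not reproduced here.

```python
# Illustrative tabular contrast between the direct (non-residual) Q-learning
# update and the residual-gradient update, which descends the squared
# Bellman residual and therefore also adjusts the successor state's value.
# Sizes, learning rate, and discount are arbitrary toy choices.
import numpy as np

def direct_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-learning: only Q[s, a] moves toward the Bellman target."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * delta

def residual_gradient_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Residual-gradient form: gradient descent on the squared Bellman
    residual, so the bootstrapped successor value is adjusted as well."""
    a_next = Q[s_next].argmax()
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    Q[s, a] += alpha * delta
    Q[s_next, a_next] -= alpha * gamma * delta

if __name__ == "__main__":
    Q = np.zeros((4, 2))  # 4 states, 2 actions (toy problem)
    residual_gradient_q_update(Q, s=0, a=1, r=1.0, s_next=2)
    print(Q)
```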


2016 ◽  
Vol 115 (6) ◽  
pp. 3195-3203 ◽  
Author(s):  
Simon Dunne ◽  
Arun D'Souza ◽  
John P. O'Doherty

A major open question is whether computational strategies thought to be used during experiential learning, specifically model-based and model-free reinforcement learning, also support observational learning. Furthermore, the question of how observational learning occurs when observers must learn about the value of options from observing outcomes in the absence of choice has not been addressed. In the present study we used a multi-armed bandit task that encouraged human participants to employ both experiential and observational learning while they underwent functional magnetic resonance imaging (fMRI). We found evidence for the presence of model-based learning signals during both observational and experiential learning in the intraparietal sulcus. However, unlike during experiential learning, model-free learning signals in the ventral striatum were not detectable during this form of observational learning. These results provide insight into the flexibility of the model-based learning system, implicating this system in learning during observation as well as from direct experience, and further suggest that the model-free reinforcement learning system may be less flexible with regard to its involvement in observational learning.
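For readers unfamiliar with the distinction, the sketch below shows the kind of model-free (delta-rule) learner for a multi-armed bandit whose reward prediction error is the "model-free learning signal" typically sought in the ventral striatum. The learning rate, softmax temperature, and reward probabilities are generic modeling choices for illustration, not the study's fitted model.

```python
# Minimal model-free learner for a two-armed bandit. The per-trial
# prediction error `delta` is the signal usually regressed against
# striatal BOLD activity in studies like the one described above.
# Parameters are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta=3.0):
    """Softmax choice rule with inverse temperature beta."""
    z = np.exp(beta * (q - q.max()))
    return z / z.sum()

def run_bandit(p_reward=(0.2, 0.8), alpha=0.2, n_trials=200):
    q = np.zeros(len(p_reward))   # learned action values
    rpes = []                     # reward prediction errors per trial
    for _ in range(n_trials):
        a = rng.choice(len(q), p=softmax(q))
        r = float(rng.random() < p_reward[a])
        delta = r - q[a]          # model-free prediction error
        q[a] += alpha * delta
        rpes.append(delta)
    return q, rpes

if __name__ == "__main__":
    q, _ = run_bandit()
    print("learned values:", np.round(q, 2))
```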

