Cortical mechanisms for reinforcement learning in competitive games

2008 · Vol 363 (1511) · pp. 3845-3857
Author(s): Hyojung Seo, Daeyeol Lee

Game theory analyses optimal strategies for multiple decision makers interacting in a social group. However, the behaviours of individual humans and animals often deviate systematically from the optimal strategies described by game theory. The behaviours of rhesus monkeys (Macaca mulatta) in simple zero-sum games showed similar patterns, but their departures from the optimal strategies were well accounted for by a simple reinforcement-learning algorithm. During a computer-simulated zero-sum game, neurons in the dorsolateral prefrontal cortex often encoded the previous choices of the animal and its opponent as well as the animal's reward history. By contrast, the neurons in the anterior cingulate cortex predominantly encoded the animal's reward history. Using simple competitive games, therefore, we have demonstrated functional specialization between different areas of the primate frontal cortex involved in outcome monitoring and action selection. Temporally extended signals related to the animal's previous choices might facilitate the association between choices and their delayed outcomes, whereas information about the choices of the opponent might be used to estimate the reward expected from a particular action. Finally, signals related to the reward history might be used to monitor the overall success of the animal's current decision-making strategy.
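For illustration, the following is a minimal sketch of the kind of simple reinforcement-learning choice model referred to above, applied to a two-choice zero-sum game such as matching pennies; the learning rate, softmax choice rule, and payoff convention are illustrative assumptions rather than the exact model fitted in the study.

```python
# Minimal sketch of a simple reinforcement-learning choice model for a
# two-choice zero-sum game (e.g., matching pennies). The learning rate,
# inverse temperature, softmax rule, and payoff convention are illustrative
# assumptions, not the exact model used in the study.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.2, 3.0          # learning rate, inverse temperature
values = np.zeros(2)            # value estimate for each action (left/right)

def choose(values):
    """Softmax choice over the two action values."""
    p = np.exp(beta * values)
    p /= p.sum()
    return rng.choice(2, p=p)

def opponent(history):
    """Stand-in opponent; here it simply plays at random."""
    return rng.integers(2)

history = []
for trial in range(1000):
    a = choose(values)
    o = opponent(history)
    r = 1.0 if a == o else 0.0            # example payoff: reward when choices match
    values[a] += alpha * (r - values[a])  # update only the chosen action's value
    history.append((a, o, r))
```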

2020 · Vol 42 (15) · pp. 2919-2928
Author(s): He Ren, Jing Dai, Huaguang Zhang, Kun Zhang

Benefiting from integral reinforcement learning (IRL), this paper effectively solves the nonzero-sum (NZS) game for distributed parameter systems when the system dynamics are unavailable. The Karhunen-Loève decomposition (KLD) is employed to convert the partial differential equation (PDE) systems into high-order ordinary differential equation (ODE) systems. Moreover, the off-policy IRL technique is introduced to design the optimal strategies for the NZS game. To confirm that the presented algorithm converges to the optimal value functions, the traditional adaptive dynamic programming (ADP) method is first discussed. Then, the equivalence between the traditional ADP method and the presented off-policy method is proved. To implement the presented off-policy IRL method, actor and critic neural networks are utilized to approximate the control strategies and value functions, respectively, during the iteration process. Finally, a numerical simulation is shown to illustrate the effectiveness of the proposed off-policy algorithm.
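As a rough illustration of the model-reduction step described above, the sketch below computes Karhunen-Loève (proper orthogonal) modes from snapshots of a PDE state via an SVD and projects the state onto them, yielding the coordinates of a reduced ODE system; the snapshot data, mode count, and function names are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the Karhunen-Loève decomposition step: snapshots of the PDE state
# are stacked into a matrix, dominant spatial modes are extracted with an SVD,
# and projecting onto the modes gives the states of a reduced ODE system.
# The snapshot data below are synthetic and purely illustrative.
import numpy as np

def kl_modes(snapshots, n_modes):
    """snapshots: (n_space, n_time) array of sampled PDE states."""
    mean = snapshots.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)
    return U[:, :n_modes], mean          # dominant spatial KL modes and mean field

# Toy snapshot data standing in for a PDE solution.
x = np.linspace(0.0, 1.0, 200)
t = np.linspace(0.0, 1.0, 100)
snapshots = np.sin(np.pi * x)[:, None] * np.exp(-t)[None, :]

modes, mean = kl_modes(snapshots, n_modes=3)

# Modal coordinates over time: the states of the reduced (ODE) system on which
# an off-policy IRL design would then operate.
coords = modes.T @ (snapshots - mean)
print(coords.shape)   # (3, 100)
```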


2015 · Vol 113 (10) · pp. 3459-3461
Author(s): Chong Chen

Our understanding of the neural basis of reinforcement learning and intelligence, two key factors contributing to human striving, has progressed significantly in recent years. However, the overlap of these two lines of research, namely how intelligence affects neural responses during reinforcement learning, remains uninvestigated. A mini-review of three existing studies suggests that higher IQ (especially fluid IQ) may enhance the neural signal of positive prediction error in the dorsolateral prefrontal cortex, dorsal anterior cingulate cortex, and striatum, several brain substrates of reinforcement learning or intelligence.


2019 · Vol 9 (7) · pp. 174
Author(s): Burak Erdeniz, John Done

Reinforcement learning studies in rodents and primates demonstrate that goal-directed and habitual choice behaviors are mediated through different fronto-striatal systems, but the evidence is less clear in humans. In this study, functional magnetic resonance imaging (fMRI) data were collected whilst participants (n = 20) performed a conditional associative learning task in which blocks of novel conditional stimuli (CS) required a deliberate choice and blocks of familiar CS required an intuitive choice. Using standard subtraction analysis for fMRI event-related designs, activation shifted from the dorso-fronto-parietal network, which involves the dorsolateral prefrontal cortex (DLPFC), for deliberate choice of novel CS, to the ventromedial prefrontal cortex (VMPFC) and anterior cingulate cortex for intuitive choice of familiar CS. Supporting this finding, psycho-physiological interaction (PPI) analysis, using the peak active areas within the PFC for novel and familiar CS as seed regions, showed functional coupling between the caudate and the DLPFC when processing novel CS, and between the caudate and the VMPFC when processing familiar CS. These findings demonstrate separable systems for deliberate and intuitive processing, in keeping with rodent and primate reinforcement learning studies, although in humans they operate in a dynamic, possibly synergistic, manner, particularly at the level of the striatum.
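The following is a simplified sketch of how a PPI interaction regressor of the kind used above is typically formed: the product of the seed time course and the psychological (novel vs. familiar) regressor, entered into a GLM alongside its main effects. Real PPI pipelines also deconvolve the seed signal and convolve with the HRF; those steps are omitted here, and all data are synthetic, so this is only an illustrative analogue of the analysis described in the abstract.

```python
# Simplified PPI illustration with synthetic data (no deconvolution or HRF
# convolution). The interaction term models condition-dependent coupling
# between a seed region and a target region.
import numpy as np

rng = np.random.default_rng(1)
n_scans = 240
seed = rng.standard_normal(n_scans)                    # seed time course (e.g., a PFC peak)
task = np.where(np.arange(n_scans) % 40 < 20, 1, -1)   # novel (+1) vs. familiar (-1) blocks
ppi = seed * task                                      # psycho-physiological interaction term

# Design matrix: intercept, seed main effect, task main effect, interaction.
X = np.column_stack([np.ones(n_scans), seed, task, ppi])
y = rng.standard_normal(n_scans)                       # target region signal (synthetic)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("PPI coefficient:", beta[3])                     # estimate of condition-dependent coupling
```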


Author(s): João P. Hespanha

This book is aimed at students interested in using game theory as a design methodology for solving problems in engineering and computer science. The book shows that such design challenges can be analyzed through game theoretical perspectives that help to pinpoint each problem's essence: Who are the players? What are their goals? Will the solution to “the game” solve the original design problem? Using the fundamentals of game theory, the book explores these issues and more. The use of game theory in technology design is a recent development arising from the intrinsic limitations of classical optimization-based designs. In optimization, one attempts to find values for parameters that minimize suitably defined criteria—such as monetary cost, energy consumption, or heat generated. However, in most engineering applications, there is always some uncertainty as to how the selected parameters will affect the final objective. Through a sequential and easy-to-understand discussion, the book examines how to make sure that the selection leads to acceptable performance, even in the presence of uncertainty—the unforgiving variable that can wreck engineering designs. The book looks at such standard topics as zero-sum, non-zero-sum, and dynamic games and includes a MATLAB guide to coding. This book offers students a fresh way of approaching engineering and computer science applications.
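As a taste of one of the standard topics mentioned above, the sketch below computes a mixed security (minimax) strategy for a two-player zero-sum matrix game by linear programming; the payoff matrix is an arbitrary example, and since the book's coding guide uses MATLAB, this Python/SciPy version is only an illustrative analogue.

```python
# Mixed security (minimax) strategy for a zero-sum matrix game via linear
# programming. A is the row player's payoff matrix; the example below is
# matching pennies, chosen purely for illustration.
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(A):
    """Return the row player's optimal mixed strategy and the game value."""
    m, n = A.shape
    # Decision variables: x (mixed strategy, length m) and v (game value).
    c = np.zeros(m + 1)
    c[-1] = -1.0                          # maximize v  <=>  minimize -v
    # For every opponent column j: v - x @ A[:, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])               # matching pennies payoffs
x, v = minimax_strategy(A)
print(x, v)                               # roughly [0.5, 0.5] with value 0
```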


Symmetry · 2021 · Vol 13 (3) · pp. 471
Author(s): Jai Hoon Park, Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space, which involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared with evolving both the structure and the behavior simultaneously, evolving only the robotic structure and optimizing its behavior with a separate training algorithm reduces the size of the design space significantly. Mutual dependence between evolution and learning is achieved by regarding the mean cumulative reward of a candidate structure in reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in experiments with an actual modular robotics kit.
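The outer/inner structure described above can be sketched as follows: a genetic algorithm evolves candidate structures while each candidate's fitness is the mean cumulative reward returned by an inner reinforcement-learning evaluation. The genome encoding, mutation rule, and the train_and_evaluate stub are illustrative assumptions standing in for the modular-robot simulator and RL trainer used in the paper.

```python
# Sketch of evolution (outer loop) coupled with learning (inner loop):
# a candidate structure's fitness is the reward achieved after training.
import random

def random_structure(n_modules=6):
    """Toy genome: a list of module type IDs (illustrative encoding)."""
    return [random.randint(0, 3) for _ in range(n_modules)]

def mutate(structure, rate=0.2):
    """Randomly resample genes with a small probability."""
    return [random.randint(0, 3) if random.random() < rate else g for g in structure]

def train_and_evaluate(structure):
    """Stub for the inner RL optimization: would train a controller for this
    structure and return its mean cumulative reward. Here it is a stand-in."""
    return sum(structure) + random.gauss(0.0, 0.1)

def evolve(pop_size=20, generations=30):
    population = [random_structure() for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness = mean cumulative reward from the inner RL evaluation.
        scored = sorted(population, key=train_and_evaluate, reverse=True)
        parents = scored[: pop_size // 2]            # truncation selection
        offspring = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=train_and_evaluate)

best = evolve()
print("Best structure:", best)
```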

