Enhancing metacognitive reinforcement learning using reward structures and feedback

2021 ◽  
Author(s):  
Paul Krueger ◽  
Falk Lieder ◽  
Tom Griffiths

One of the most remarkable aspects of the human mind is its ability to improve itself based on experience. Such learning occurs in a range of domains, from simple stimulus-response mappings, motor skills, and perceptual abilities to problem-solving, cognitive control, and learning itself. Demonstrations of cognitive and brain plasticity have inspired cognitive training programs, but the success of cognitive training has been mixed and the underlying learning mechanisms are not well understood. Feedback is an important component of many effective cognitive training programs, yet it remains unclear what makes some feedback structures more effective than others. To address these problems, we model cognitive plasticity as metacognitive reinforcement learning: we develop a metacognitive reinforcement learning model of how people learn how many steps to plan ahead in sequential decision problems, and test its predictions experimentally. The results of our first experiment suggested that our model can discern which reward structures are more conducive to metacognitive learning, and hence could be used to design feedback structures that make existing environments more conducive to cognitive growth. A follow-up experiment confirmed that feedback structures designed according to our model can indeed accelerate learning to plan. These results suggest that modeling metacognitive learning is a promising step toward a theoretical foundation for promoting cognitive growth through cognitive training and other interventions.
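
As a minimal, hypothetical sketch of metacognitive reinforcement learning over planning horizons (a deliberate simplification for illustration, not the paper's actual model), the meta-level action is the number of steps to plan ahead, and its value is updated from the reward earned minus an assumed per-step cost of deliberation:

```python
import random

HORIZONS = [1, 2, 3, 4]      # candidate numbers of steps to plan ahead
PLANNING_COST = 0.1          # assumed per-step cost of deliberation
ALPHA, EPSILON = 0.1, 0.2    # learning rate, exploration rate

q = {h: 0.0 for h in HORIZONS}   # learned value of each planning horizon

def choose_horizon():
    """Epsilon-greedy choice of how far to plan ahead."""
    if random.random() < EPSILON:
        return random.choice(HORIZONS)
    return max(q, key=q.get)

def update(horizon, external_reward):
    """Meta-level TD(0) update from the net payoff of planning."""
    meta_reward = external_reward - PLANNING_COST * horizon
    q[horizon] += ALPHA * (meta_reward - q[horizon])
```

Under this view, a reward structure is conducive to metacognitive learning to the extent that the net payoff signal reliably favours the planning horizon that is actually optimal.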

2017 ◽  
Author(s):  
Falk Lieder ◽  
Paul M. Krueger ◽  
Frederick Callaway ◽  
Tom Griffiths

The human mind has an impressive ability to improve itself based on experience, but this potential for cognitive growth is rarely fully realized. Cognitive training programs seek to tap into this unrealized potential, but their theoretical foundation is incomplete and the scientific findings on their effectiveness are mixed. Recent work suggests that the mechanisms by which people learn to think and decide better can be understood in terms of metacognitive reinforcement learning. This perspective allows us to translate the theory of reward shaping developed in machine learning into a computational method for designing feedback structures for effective cognitive training. Concretely, our method applies the shaping theorem for accelerating model-free reinforcement learning to a meta-decision problem whose actions are computations that update the decision-maker’s probabilistic beliefs about the returns of alternative courses of action. As a proof of concept, we show that our method can accelerate learning to plan in an environment similar to a grid world where every location contained a reward. To measure and give feedback on people’s planning process, each reward was initially occluded and had to be revealed by clicking on the corresponding location. We found that participants in the feedback condition learned faster to deliberate more, and consequently reaped higher rewards and identified the optimal sequence of moves more frequently. These findings inspire optimism that meta-level reward shaping might provide a principled theoretical foundation for cognitive training and enable more effective interventions for improving the human mind through feedback optimized to promote metacognitive reinforcement learning.
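
The shaping theorem the abstract invokes (Ng, Harada & Russell, 1999) adds a potential-based bonus to each reward without changing which policy is optimal. A minimal sketch, with the potential function Phi and the discount factor assumed for illustration; at the meta-level, Phi(s) could be something like the estimated value of the decision-maker's current belief state:

```python
GAMMA = 0.99  # assumed discount factor

def shaped_reward(reward, phi_s, phi_s_next, gamma=GAMMA):
    """Reward plus the potential-based shaping term gamma*Phi(s') - Phi(s).

    Because the bonus telescopes along any trajectory, shaping changes
    the speed of learning but not which policy is optimal.
    """
    return reward + gamma * phi_s_next - phi_s
```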


2013 ◽  
Author(s):  
Harry Wilmer ◽  
Kara Blacker ◽  
Walter Schneider ◽  
Jason Chein

2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Peter Morales ◽  
Rajmonda Sulo Caceres ◽  
Tina Eliassi-Rad

Abstract Complex networks are often too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results, and reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks under resource-collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and a notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state-space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning, and a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical for addressing sparse and changing reward signals.
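
For concreteness, a minimal sketch of the selective-harvesting interaction loop described above (illustrative names and reward, not the NAC implementation): at each step the agent queries one vertex on the boundary of the observed subgraph, that vertex's edges become visible, and the reward is 1 if the queried vertex carries the target attribute.

```python
import networkx as nx

def harvest_step(full_graph, observed, frontier, queried, policy):
    """Query one frontier vertex chosen by `policy`; return its reward."""
    v = policy(observed, frontier)        # the actor's action: which vertex to query
    queried.add(v)
    frontier.discard(v)
    for u in full_graph.neighbors(v):     # querying v reveals its incident edges
        observed.add_edge(v, u)
        if u not in queried:
            frontier.add(u)               # newly visible vertices can be queried next
    return 1.0 if full_graph.nodes[v].get("target") else 0.0
```

A learned embedding of `observed` would stand in for the raw partial graph as the policy and critic's state, which is how the abstract describes NAC keeping the state space tractable.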


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 325-325
Author(s):  
Erin Harrell ◽  
Nelson Roque

Abstract One modifiable risk factor for dementia is cognitive inactivity. Given that cognitive ability is closely tied to the continued performance of instrumental activities of daily living, cognitive training programs continue to be explored as a way to boost cognition and allow older adults to remain independent longer. While the efficacy of cognitive training is controversial, identifying the activities older adults are willing to limit in exchange for cognitive training provides valuable information for designing cognitive training programs that appeal to them. Using a qualitative approach, this study highlights the activities that older adults (ages 64+) reported as contributing to decreased gameplay of a tablet-based cognitive training program. We found that 61% of respondents reported playing less because of entertainment activities (e.g., reading and playing other games), 31% because of social activities, and 27% because of travel. These findings have implications for device form factor in administering cognitive training and other programs.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. Gorin ◽  
V. Klucharev ◽  
A. Ossadtchi ◽  
I. Zubarev ◽  
V. Moiseeva ◽  
...  

Abstract People often change their beliefs by succumbing to the opinions of others; such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by the social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study comprised two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In this session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated the faces during the second session. We found MEG traces of past disagreement or agreement with the peers in the parietal cortices 230 ms after face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant’s rating had previously differed from the ratings of the peers. These early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism behind the long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process in the ventromedial prefrontal cortex.


Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

Abstract Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random; in particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies of qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for applying reinforcement learning techniques to the quantum world.
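
As a point of reference, the classical finite-horizon counterpart of such dynamic programming algorithms is backward induction; a minimal sketch under an assumed data layout (P[s][a] is a list of (probability, next_state) pairs and R[s][a] a scalar reward), with the quantum version, roughly speaking, replacing classical states and transitions by quantum ones:

```python
def backward_induction(states, actions, P, R, horizon):
    """Finite-horizon dynamic programming for a classical MDP.

    Returns the optimal value function at time 0 and a time-indexed policy.
    """
    V = {s: 0.0 for s in states}          # terminal values at the horizon
    policy = []
    for _ in range(horizon):              # sweep backwards from the horizon
        new_V, pi = {}, {}
        for s in states:
            value, act = max(
                ((R[s][a] + sum(p * V[s2] for p, s2 in P[s][a]), a)
                 for a in actions),
                key=lambda t: t[0],
            )
            new_V[s], pi[s] = value, act
        V, policy = new_V, [pi] + policy
    return V, policy
```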

