Learning at Variable Attentional Load Requires Cooperation of Working Memory, Meta-learning and Attention-augmented Reinforcement Learning

2021
pp. 1-29
Author(s):
Thilo Womelsdorf
Marcus R. Watson
Paul Tiesinga

Abstract
Flexible learning of changing reward contingencies can be realized with different strategies. A fast learning strategy involves using working memory of recently rewarded objects to guide choices. A slower learning strategy uses prediction errors to gradually update value expectations to improve choices. How the fast and slow strategies work together in scenarios with real-world stimulus complexity is not well known. Here, we aim to disentangle their relative contributions in rhesus monkeys while they learned the relevance of object features at variable attentional load. We found that learning behavior across six monkeys is consistently best predicted with a model combining (i) fast working memory and (ii) slower reinforcement learning from differently weighted positive and negative prediction errors as well as (iii) selective suppression of nonchosen feature values and (iv) a meta-learning mechanism that enhances exploration rates based on a memory trace of recent errors. The optimal model parameter settings suggest that these mechanisms cooperate differently at low and high attentional loads. Whereas working memory was essential for efficient learning at lower attentional loads, enhanced weighting of negative prediction errors and meta-learning were essential for efficient learning at higher attentional loads. Together, these findings pinpoint a canonical set of learning mechanisms and suggest how they may cooperate when subjects flexibly adjust to environments with variable real-world attentional demands.
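To make mechanisms (ii) and (iii) concrete, here is a minimal sketch in Python of a feature-value update with separate gains for positive and negative prediction errors and decay of nonchosen feature values. It is an illustration under assumed parameter values, not the authors' fitted model; the names update_feature_values, alpha_pos, alpha_neg, and decay are ours.

import numpy as np

def update_feature_values(V, chosen_features, reward,
                          alpha_pos=0.3, alpha_neg=0.5, decay=0.2):
    """V: dict mapping feature -> learned value; chosen_features: features of the chosen object."""
    expected = np.mean([V[f] for f in chosen_features])
    pe = reward - expected                          # reward prediction error
    gain = alpha_pos if pe > 0 else alpha_neg       # asymmetric weighting of +/- prediction errors
    for f in V:
        if f in chosen_features:
            V[f] += gain * pe                       # credit features of the chosen object
        else:
            V[f] *= (1.0 - decay)                   # selective suppression of nonchosen feature values
    return V, pe

With alpha_neg larger than alpha_pos, losses move feature values more than equally sized gains, which stands in for the enhanced weighting of negative prediction errors that the model comparison favored at high attentional load.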

2020
Author(s):
Thilo Womelsdorf
Marcus R. Watson
Paul Tiesinga

Abstract
Flexible learning of changing reward contingencies can be realized with different strategies. A fast learning strategy involves using working memory of recently rewarded objects to guide choices. A slower learning strategy uses prediction errors to gradually update value expectations to improve choices. How the fast and slow strategies work together in scenarios with real-world stimulus complexity is not well known. Here, we disentangle their relative contributions in rhesus monkeys while they learned the relevance of object features at variable attentional load. We found that learning behavior across six subjects is consistently best predicted with a model combining (i) fast working memory, (ii) slower reinforcement learning from differently weighted positive and negative prediction errors, as well as (iii) selective suppression of non-chosen feature values and (iv) a meta-learning mechanism adjusting exploration rates based on a memory trace of recent errors. These mechanisms cooperate differently at low and high attentional loads. While working memory was essential for efficient learning at lower attentional loads, enhanced weighting of negative prediction errors and meta-learning were essential for efficient learning at higher attentional loads. Together, these findings pinpoint a canonical set of learning mechanisms and demonstrate how they cooperate when subjects flexibly adjust to environments with variable real-world attentional demands.

Significance statement
Learning which visual features are relevant for achieving our goals is challenging in real-world scenarios with multiple distracting features and feature dimensions. It is known that in such scenarios learning benefits significantly from attentional prioritization. Here we show that beyond attention, flexible learning uses a working memory system, a separate learning gain for avoiding negative outcomes, and a meta-learning process that adaptively increases exploration rates whenever errors accumulate. These subcomponent processes of cognitive flexibility depend on distinct learning signals that operate at varying timescales, including the most recent reward outcome (for working memory), memories of recent outcomes (for adjusting exploration), and reward prediction errors (for attention-augmented reinforcement learning).These results illustrate the specific mechanisms that cooperate during cognitive flexibility.
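The meta-learning component named above can be illustrated with a short, hedged sketch: a leaky trace of recent errors scales down the softmax inverse temperature, so choices become more exploratory when errors accumulate. This is a toy version under assumed parameter values (base_beta, trace_decay, and sensitivity are our names), not the fitted model.

import numpy as np

class AdaptiveExploration:
    def __init__(self, base_beta=8.0, trace_decay=0.8, sensitivity=5.0):
        self.error_trace = 0.0           # leaky memory trace of recent errors
        self.base_beta = base_beta
        self.trace_decay = trace_decay
        self.sensitivity = sensitivity

    def register_outcome(self, rewarded):
        # errors add to the trace, which otherwise decays toward zero
        self.error_trace = self.trace_decay * self.error_trace + (0.0 if rewarded else 1.0)

    def choose(self, values, rng=None):
        rng = rng or np.random.default_rng()
        # more accumulated errors -> lower inverse temperature -> more exploration
        beta = self.base_beta / (1.0 + self.sensitivity * self.error_trace)
        v = np.asarray(values, dtype=float)
        p = np.exp(beta * (v - v.max()))
        p /= p.sum()
        return rng.choice(len(v), p=p)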


2018
Vol 30 (10)
pp. 1422-1432
Author(s):
Anne G. E. Collins

Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a behavioral experiment and show that working memory interferes with reinforcement learning. Previous research showed that neural representations of reward prediction errors, a key marker of reinforcement learning, were blunted when working memory was used for learning. We thus predicted that arbitrating in favor of working memory to learn faster in simple problems would weaken the reinforcement learning process. We tested this by measuring performance in a delayed testing phase where the use of working memory was impossible, and thus participant choices depended on reinforcement learning. Counterintuitively, but confirming our predictions, we observed that associations learned most easily were retained worse than associations learned slower: Using working memory to learn quickly came at the cost of long-term retention. Computational modeling confirmed that this could only be accounted for by working memory interference in reinforcement learning computations. These results further our understanding of how multiple systems contribute in parallel to human learning and may have important applications for education and computational psychiatry.
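As a rough illustration of the parallel-systems idea (a sketch under our own assumptions, not Collins' exact RLWM model), the policy below mixes a capacity-limited working-memory store of the last correct action with an incrementally learned Q-table; the weight w_wm stands in for the performance-based arbitration between the two systems.

import numpy as np

def softmax(q, beta=8.0):
    e = np.exp(beta * (np.asarray(q, dtype=float) - np.max(q)))
    return e / e.sum()

def mixed_choice_probs(Q, wm_store, stimulus, n_actions, w_wm=0.8):
    """Q: dict stimulus -> array of action values; wm_store: dict stimulus -> last rewarded action."""
    p_rl = softmax(Q[stimulus])
    if stimulus in wm_store:                 # WM: near-deterministic recall of the last rewarded action
        p_wm = np.full(n_actions, 0.01 / (n_actions - 1))
        p_wm[wm_store[stimulus]] = 0.99
    else:                                    # WM holds nothing for this stimulus (capacity limit / decay)
        p_wm = np.full(n_actions, 1.0 / n_actions)
    return w_wm * p_wm + (1.0 - w_wm) * p_rl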


2018
Vol 26 (1)
pp. 43-66
Author(s):
Uday Kamath
Carlotta Domeniconi
Kenneth De Jong

Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.
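The following toy sketch conveys the flavor of combining a spatially structured population of learners with boosting-style instance weighting; it is our own illustration under simplifying assumptions (a 1-D ring of decision trees, neighbor confidence used as sampling weights) and not the PSBML implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def spatial_boost(X, y, n_nodes=8, n_rounds=5, sample_frac=0.5, seed=0):
    """X, y: numpy arrays. Each node on a ring trains on a sample biased toward
    instances its neighbors classify with low confidence (i.e., near the margin)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    weights = np.ones((n_nodes, n))
    models = [DecisionTreeClassifier(max_depth=3, random_state=seed) for _ in range(n_nodes)]
    for _ in range(n_rounds):
        for k in range(n_nodes):
            p = weights[k] / weights[k].sum()
            idx = rng.choice(n, size=max(1, int(sample_frac * n)), p=p)
            models[k].fit(X[idx], y[idx])
        for k in range(n_nodes):
            left, right = models[(k - 1) % n_nodes], models[(k + 1) % n_nodes]
            conf = 0.5 * (left.predict_proba(X).max(axis=1) + right.predict_proba(X).max(axis=1))
            weights[k] = 1.0 - conf + 1e-3   # hard, low-confidence instances get resampled more
    return models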


2021
Author(s):
Simon Ciranka
Juan Linde-Domingo
Ivan Padezhki
Clara Wicharz
Charley M Wu
...

Humans and other animals are capable of inferring never-experienced relations (e.g., A>C) from other relational observations (e.g., A>B and B>C). The processes behind such transitive inference are subject to intense research. Here, we demonstrate a new aspect of relational learning, building on previous evidence that transitive inference can be accomplished through simple reinforcement learning mechanisms. We show in simulations that inference of novel relations benefits from an asymmetric learning policy, where observers update only their belief about the winner (or loser) in a pair. Across 4 experiments (n=145), we find substantial empirical support for such asymmetries in inferential learning. The learning policy favoured by our simulations and experiments gives rise to a compression of values which is routinely observed in psychophysics and behavioural economics. In other words, a seemingly biased learning strategy that yields well-known cognitive distortions can be beneficial for transitive inferential judgments.
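A minimal sketch of the asymmetric policy described above (our own toy formulation, not the authors' model): after each observed pairwise outcome only the winner's value is updated, and a never-seen relation such as A > C is read out by comparing the learned values.

import numpy as np

def observe_pair(V, winner, loser, alpha_winner=0.3, alpha_loser=0.0):
    # predicted probability that the observed winner beats the loser
    p_win = 1.0 / (1.0 + np.exp(-(V[winner] - V[loser])))
    pe = 1.0 - p_win                      # prediction error for the observed outcome
    V[winner] += alpha_winner * pe        # update the winner...
    V[loser]  -= alpha_loser * pe         # ...but (with alpha_loser = 0) leave the loser untouched
    return V

# Example: learn A>B, B>C, C>D, then infer the never-experienced relation A>C from values.
V = {item: 0.0 for item in "ABCD"}
for _ in range(20):
    for w, l in [("A", "B"), ("B", "C"), ("C", "D")]:
        observe_pair(V, w, l)
print(V["A"] > V["C"])   # True: transitive inference emerges from winner-only updates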


2020
Author(s):
Yuki Sakai
Yutaka Sakai
Yoshinari Abe
Jin Narumoto
Saori C. Tanaka

Abstract
Nobody wants to experience anxiety. However, anxiety may be induced by our own implicit choices that are mis-reinforced by some imbalance in reinforcement learning. Here we focused on obsessive-compulsive disorder (OCD) as a candidate for implicitly learned anxiety. Simulations in the reinforcement learning framework showed that agents implicitly learn to become anxious when the memory trace signal for past actions decays differently for positive and negative prediction errors. In empirical data, we confirmed that OCD patients showed extremely imbalanced traces, which were normalized by serotonin enhancers. We also used fMRI to identify the neural signature of OCD and healthy participants with imbalanced traces. Beyond the spectrum of clinical phenotypes, these behavioral and neural characteristics can be generalized to variations in the healthy population.
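The imbalanced-trace mechanism can be sketched as follows (an illustrative toy under our own parameter names, not the authors' model): an eligibility trace of past actions is retained to a different degree when it is reinforced by positive versus negative prediction errors, so delayed negative outcomes fail to suppress the earlier actions that produced them.

import numpy as np

def trace_update(weights, trace, action, pe, alpha=0.1, nu_plus=0.6, nu_minus=0.1):
    """weights, trace: numpy arrays over actions; pe: reward prediction error for this step."""
    nu = nu_plus if pe > 0 else nu_minus    # valence-dependent retention of the memory trace
    trace = nu * trace                      # imbalanced decay: nu_minus << nu_plus
    trace[action] += 1.0                    # the current action is fully eligible
    weights = weights + alpha * pe * trace  # reinforce all still-eligible past actions
    return weights, trace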


2017
Author(s):
Anne G.E. Collins

Abstract
Learning to make rewarding choices in response to stimuli depends on a slow but steady process, reinforcement learning, and a fast and flexible, but capacity-limited process, working memory. Using both systems in parallel, with their contributions weighted based on performance, should allow us to leverage the best of each system: rapid early learning, supplemented by long-term robust acquisition. However, this assumes that using one process does not interfere with the other. We use computational modeling to investigate the interactions between the two processes in a behavioral experiment, and show that working memory interferes with reinforcement learning. Previous research showed that neural representations of reward prediction errors, a key marker of reinforcement learning, were blunted when working memory was used for learning. We thus predicted that arbitrating in favor of working memory to learn faster in simple problems would weaken the reinforcement learning process. We tested this by measuring performance in a delayed testing phase where the use of working memory was impossible, and thus subject choices depended on reinforcement learning. Counter-intuitively, but confirming our predictions, we observed that associations learned most easily were retained worse than associations learned slower: using working memory to learn quickly came at the cost of long-term retention. Computational modeling confirmed that this could only be accounted for by working memory interference in reinforcement learning computations. These results further our understanding of how multiple systems contribute in parallel to human learning, and may have important applications for education and computational psychiatry.
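One way the reported interference could be written down (a hedged sketch under our own assumptions, not Collins' exact formulation): the RL prediction error is computed against an expectation partly supplied by working memory, so when WM already predicts the outcome the RL update is blunted, leaving weaker values for the delayed test.

def rl_update_with_wm_interference(Q, stimulus, action, reward,
                                   wm_prediction, w_interference=0.7, alpha=0.1):
    """Q: dict stimulus -> list of action values; wm_prediction: WM's expected reward
    for this stimulus-action pair (close to 1.0 when WM already knows the correct response)."""
    expectation = (1.0 - w_interference) * Q[stimulus][action] + w_interference * wm_prediction
    pe = reward - expectation              # blunted prediction error when WM already predicts the outcome
    Q[stimulus][action] += alpha * pe      # weaker incremental learning -> poorer delayed retention
    return Q, pe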


Author(s):  
Siyi Li
Tianbo Liu
Chi Zhang
Dit-Yan Yeung
Shaojie Shen

While deep reinforcement learning (RL) methods have achieved unprecedented successes in a range of challenging problems, their applicability has been mainly limited to simulation or game domains due to the high sample complexity of the trial-and-error learning process. However, real-world robotic applications often need a data-efficient learning process with safety-critical constraints. In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire a strategy that combines perception and control, we represent the policy by a convolutional neural network. We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic failure. The neural network is trained by a combination of supervised learning from raw images and reinforcement learning from games of self-play. We show that the proposed approach can learn a target following policy in a simulator efficiently and the learned behavior can be successfully transferred to the DJI quadrotor platform for real-world UAV control. 
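A minimal, hypothetical sketch of the hierarchical control idea (our illustration, not the authors' system): a conventional PID loop supplies a stabilizing command while a learned policy contributes only a bounded correction, so trial-and-error exploration cannot produce catastrophic commands.

class PID:
    """Conventional proportional-integral-derivative controller on a scalar tracking error."""
    def __init__(self, kp=1.0, ki=0.0, kd=0.2):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt=0.05):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def blended_command(pid, policy_action, tracking_error, mix=0.3, limit=1.0):
    # the learned action only perturbs the stabilizing PID command, clipped to a safe range
    cmd = (1.0 - mix) * pid.step(tracking_error) + mix * policy_action
    return max(-limit, min(limit, cmd))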


2016
Author(s):
Stefano Palminteri
Germain Lefebvre
Emma J. Kilford
Sarah-Jayne Blakemore

Abstract
Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two cohorts of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. Concerning factual learning, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
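The reported pattern of biases can be written as a short sketch, assuming a standard two-option task with feedback for both the chosen (factual) and forgone (counterfactual) option; parameter names and values are illustrative, not the authors' fits.

def valence_biased_update(Q, chosen, unchosen, r_chosen, r_unchosen,
                          alpha_conf=0.4, alpha_disc=0.1):
    """Q: dict option -> value. alpha_conf / alpha_disc weight choice-confirming
    versus choice-disconfirming information."""
    # factual learning: positive prediction errors weighted more than negative ones
    pe_c = r_chosen - Q[chosen]
    Q[chosen] += (alpha_conf if pe_c > 0 else alpha_disc) * pe_c
    # counterfactual learning: the bias reverses, so negative prediction errors dominate
    pe_u = r_unchosen - Q[unchosen]
    Q[unchosen] += (alpha_disc if pe_u > 0 else alpha_conf) * pe_u
    return Q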

