credit assignment
Recently Published Documents


Total documents: 215 (five years: 70)
H-index: 23 (five years: 4)

2022 ◽  
Vol 119 (3) ◽  
pp. e2106028118
Author(s):  
Raphael Köster ◽  
Dylan Hadfield-Menell ◽  
Richard Everett ◽  
Laura Weidinger ◽  
Gillian K. Hadfield ◽  
...  

How do societies learn and maintain social norms? Here we use multiagent reinforcement learning to investigate the learning dynamics of enforcement and compliance behaviors. Artificial agents populate a foraging environment and need to learn to avoid a poisonous berry. Agents learn to avoid eating poisonous berries better when doing so is taboo, meaning the behavior is punished by other agents. The taboo helps overcome a credit assignment problem in discovering delayed health effects. Critically, introducing an additional taboo, which results in punishment for eating a harmless berry, further improves overall returns. This “silly rule” counterintuitively has a positive effect because it gives agents more practice in learning rule enforcement. By probing what individual agents have learned, we demonstrate that normative behavior relies on a sequence of learned skills. Learning rule compliance builds upon prior learning of rule enforcement by other agents. Our results highlight the benefit of employing a multiagent reinforcement learning computational model focused on learning to implement complex actions.


2021 ◽  
Author(s):  
Phillip P Witkowski ◽  
Seongmin A Park ◽  
Erie D Boorman

Animals have been proposed to abstract compact representations of a task's structure that could, in principle, support accelerated learning and flexible behavior. Whether and how such abstracted representations may be used to assign credit for inferred, but unobserved, relationships in structured environments is unknown. Here, we develop a novel hierarchical reversal-learning task and Bayesian learning model to assess the computational and neural mechanisms underlying how humans infer specific choice-outcome associations via structured knowledge. We find that the medial prefrontal cortex (mPFC) efficiently represents hierarchically related choice-outcome associations governed by the same latent cause, using a generalized code to assign credit for both experienced and inferred outcomes. Furthermore, mPFC and lateral orbital frontal cortex track the inferred current "position" within a latent association space that generalizes over stimuli. Collectively, these findings demonstrate the importance of both tracking the current position in an abstracted task space and using efficient, generalizable representations in prefrontal cortex for supporting flexible learning and inference in structured environments.


2021 ◽  
Vol 7 (51) ◽  
Author(s):  
Davide Folloni ◽  
Elsa Fouragnan ◽  
Marco K. Wittmann ◽  
Lea Roumazeilles ◽  
Lev Tankelevitch ◽  
...  

2021 ◽  
Vol 118 (51) ◽  
pp. e2111821118
Author(s):  
Yuhan Helena Liu ◽  
Stephen Smith ◽  
Stefan Mihalas ◽  
Eric Shea-Brown ◽  
Uygar Sümbül

Brains learn tasks via experience-driven differential adjustment of their myriad individual synaptic connections, but the mechanisms that target appropriate adjustment to particular connections remain deeply enigmatic. While Hebbian synaptic plasticity, synaptic eligibility traces, and top-down feedback signals surely contribute to solving this synaptic credit-assignment problem, alone, they appear to be insufficient. Inspired by new genetic perspectives on neuronal signaling architectures, here, we present a normative theory for synaptic learning, where we predict that neurons communicate their contribution to the learning outcome to nearby neurons via cell-type–specific local neuromodulation. Computational tests suggest that neuron-type diversity and neuron-type–specific local neuromodulation may be critical pieces of the biological credit-assignment puzzle. They also suggest algorithms for improved artificial neural network learning efficiency.


Author(s):  
Markel Sanz Ausin ◽  
Hamoon Azizsoltani ◽  
Song Ju ◽  
Yeo Jin Kim ◽  
Min Chi

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Lorenz Deserno ◽  
Rani Moran ◽  
Jochen Michely ◽  
Ying Lee ◽  
Peter Dayan ◽  
...  

Dopamine is implicated in representing model-free (MF) reward prediction errors as well as influencing model-based (MB) credit assignment and choice. Putative cooperative interactions between MB and MF systems include a guidance of MF credit assignment by MB inference. Here, we used a double-blind, placebo-controlled, within-subjects design to test a hypothesis that enhancing dopamine levels boosts the guidance of MF credit assignment by MB inference. In line with this, we found that levodopa enhanced guidance of MF credit assignment by MB inference, without impacting MF and MB influences directly. This drug effect correlated negatively with a dopamine-dependent change in purely MB credit assignment, possibly reflecting a trade-off between these two MB components of behavioural control. Our findings of a dopamine boost in MB inference guidance of MF learning highlight a novel dopamine influence on MB-MF cooperative interactions.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260761
Author(s):  
Mohamed Kentour ◽  
Joan Lu

Sentiment analysis is a branch of natural language analytics that aims to correlate what is expressed, which normally comes in an unstructured format, with what is believed and learnt. Several approaches have tried to address this gap (e.g., Naive Bayes, RNN, LSTM, word embedding). Even though deep learning models have achieved high performance, their generative process remains a “black box” that is not fully disclosed, owing to the high-dimensional features and the non-deterministic weight assignment. Meanwhile, graphs are becoming more popular for modeling complex systems while remaining traceable and understandable. Here, we reveal that a good trade-off between transparency and efficiency can be achieved with a Deep Neural Network by exploring the Credit Assignment Paths theory. To this end, we propose a novel algorithm which alleviates the feature extraction mechanism and attributes an importance level to selected neurons by applying deterministic edge/node embeddings with attention scores on the input unit and backward path, respectively. We experiment on the Twitter Health News dataset, where the model has been extended to approach different approximations (tweet/aspect and tweets’ source levels, frequency, polarity/subjectivity); it was also transparent and traceable. Moreover, a comparison with four recent models on the same data corpus for tweet analysis showed rapid convergence, with an overall accuracy of ≈83% and 94% of correctly identified true positive sentiments. Therefore, weights can be ideally assigned to specific active features by following the proposed method. In contrast to the other compared works, the inferred features are conditioned through the users’ preferences (i.e., frequency degree) and via the activation’s derivatives (i.e., reject a feature if not scored). Future work will address the inductive aspect of graph embeddings to include dynamic graph structures and expand the model’s resiliency by considering other datasets such as SemEval task7, covid-19 tweets, etc.


2021 ◽  
Author(s):  
Daniel Nelson Scott ◽  
Michael J Frank

Two key problems that span biological and industrial neural network research are how networks can be trained to generalize well and to minimize destructive interference between tasks. Both hinge on credit assignment, the targeting of specific network weights for change. In artificial networks, credit assignment is typically governed by gradient descent. Biological learning is thus often analyzed as a means to approximate gradients. We take the complementary perspective that biological learning rules likely confer advantages when they aren't gradient approximations. Further, we hypothesized that noise correlations, often considered detrimental, could usefully shape this learning. Indeed, we show that noise and three-factor plasticity interact to compute directional derivatives of reward, which can improve generalization, robustness to interference, and multi-task learning. This interaction also provides a method for routing learning quasi-independently of activity and connectivity, and demonstrates how biologically inspired inductive biases can be fruitfully embedded in learning algorithms.
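
The claim that noise combined with a reward signal can compute directional derivatives of reward can be illustrated with a generic perturbation-based update. The sketch below is not the authors' model: it omits noise correlations and the pre- and postsynaptic factors of a three-factor rule, and the quadratic `reward` function, its optimum, and all function names and parameter values are hypothetical choices made only for illustration. It perturbs the weights along a random noise direction, uses the resulting change in reward as a finite-difference estimate of the directional derivative, and then moves the weights along that same direction.

```python
import random


def reward(w):
    # Hypothetical quadratic reward with its maximum at w = (1, -2).
    return -((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)


def grad_reward(w):
    # Analytic gradient of the hypothetical reward, used only for checking.
    return [-2.0 * (w[0] - 1.0), -2.0 * (w[1] + 2.0)]


def directional_derivative_estimate(w, direction, sigma=1e-4):
    # Perturb the weights by sigma * direction and compare rewards.
    # For small sigma this approximates grad_reward(w) . direction.
    perturbed = [wi + sigma * di for wi, di in zip(w, direction)]
    return (reward(perturbed) - reward(w)) / sigma


def perturbation_step(w, rng, lr=0.05, sigma=1e-4):
    # Sample a Gaussian noise direction, estimate the directional
    # derivative of reward along it, and step along that direction
    # in proportion to the estimate (noise times reward change).
    direction = [rng.gauss(0.0, 1.0) for _ in w]
    d = directional_derivative_estimate(w, direction, sigma)
    return [wi + lr * d * di for wi, di in zip(w, direction)]


rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(500):
    w = perturbation_step(w, rng)
final_reward = reward(w)
```

Because the expected value of the noise direction scaled by the directional derivative equals the gradient for isotropic Gaussian noise, repeated steps climb the reward on average without ever computing the gradient explicitly. In this toy setting `final_reward` should end up close to the maximum of 0.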


2021 ◽  
pp. 108466
Author(s):  
Dong Yan ◽  
Jiayi Weng ◽  
Shiyu Huang ◽  
Chongxuan Li ◽  
Yichi Zhou ◽  
...  

2021 ◽  
Author(s):  
Johannes Algermissen ◽  
Jennifer C. Swart ◽  
Rene Scheeringa ◽  
Roshan Cools ◽  
Hanneke E. M. den Ouden

Actions are biased by the outcomes they can produce: Humans are more likely to show action under reward prospect, but hold back under punishment prospect. Such motivational biases derive not only from biased response selection, but also from biased learning: humans tend to attribute rewards to their own actions, but are reluctant to attribute punishments to having held back. The neural origin of these biases is unclear; in particular, it remains open whether motivational biases arise solely from an evolutionarily old, subcortical architecture or also due to younger, cortical influences. Simultaneous EEG-fMRI allowed us to track which regions encoded biased prediction errors in which order. Biased prediction errors occurred in cortical regions (ACC, vmPFC, PCC) before subcortical regions (striatum). These results highlight that biased learning is not a mere feature of the basal ganglia, but arises through prefrontal cortical contributions, revealing motivational biases to be a potentially flexible, sophisticated mechanism.

