Training a spiking neuronal network model of visual-motor cortex to play a virtual racket-ball game using reinforcement learning

2021 ◽  
Author(s):  
Haroon Anwar ◽  
Simon Caby ◽  
Salvador Dura-Bernal ◽  
David D'Onofrio ◽  
Daniel Hasegan ◽  
...  

Recent models of spiking neuronal networks have been trained to perform behaviors in static environments using a variety of learning rules, with varying degrees of biological realism. Most of these models have not been tested in dynamic visual environments, where models must make predictions about future states and adjust their behavior accordingly. The models using these learning rules are often treated as black boxes, with little analysis of the circuit architectures and learning mechanisms supporting optimal performance. Here we developed visual/motor spiking neuronal network models and trained them to play a virtual racket-ball game using several reinforcement learning algorithms inspired by the dopaminergic reward system. We systematically investigated how different architectures and circuit motifs (feed-forward, recurrent, feedback) contributed to learning and performance. We also developed a new biologically inspired learning rule that significantly enhanced performance while reducing training time. Our models included visual areas encoding game inputs and relaying the information to motor areas, which used this information to learn to move the racket to hit the ball. Neurons in the early visual area relayed information encoding object location and motion direction across the network. Neuronal association areas encoded spatial relationships between objects in the visual scene. Motor populations received inputs from visual and association areas representing the dorsal pathway. Two populations of motor neurons generated commands to move the racket up or down. Model-generated actions updated the environment and triggered reward or punishment signals that adjusted synaptic weights so that the models could learn which actions led to reward. Here we demonstrate that our biologically plausible learning rules were effective in training spiking neuronal network models to solve problems in dynamic environments. We used our models to dissect the circuit architectures and learning rules most effective for learning. Our models offer novel predictions on the biological mechanisms supporting learning behaviors.
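The reward-modulated learning loop described in the abstract can be sketched in a few lines. This is a hypothetical toy, not the paper's model: the population sizes, the coincidence-based eligibility trace, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: a "visual" population projecting to two
# "motor" units (racket up / racket down). All names are illustrative.
n_visual, n_motor = 20, 2
w = rng.uniform(0.0, 0.5, size=(n_motor, n_visual))
w0 = w.copy()

def reward_modulated_step(w, elig, pre, post, reward, lr=0.01, decay=0.9):
    """Coincident pre/post spiking builds a decaying eligibility trace;
    a scalar reward (+1) or punishment (-1) converts the trace into an
    actual weight change, so the network learns which actions paid off."""
    elig = decay * elig + np.outer(post, pre)
    w = np.clip(w + lr * reward * elig, 0.0, 1.0)
    return w, elig

elig = np.zeros_like(w)
pre = (rng.random(n_visual) < 0.3).astype(float)  # toy visual spike vector
post = np.array([1.0, 0.0])                       # "move up" unit fired
w, elig = reward_modulated_step(w, elig, pre, post, reward=+1.0)
```

Only the synapses onto the motor unit that actually fired, from visual units that were active, are strengthened by the reward; everything else is untouched.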

1994 ◽  
Vol 1 (1) ◽  
pp. 1-33
Author(s):  
P R Montague ◽  
T J Sejnowski

Some forms of synaptic plasticity depend on the temporal coincidence of presynaptic activity and postsynaptic response. This requirement is consistent with the Hebbian, or correlational, type of learning rule used in many neural network models. Recent evidence suggests that synaptic plasticity may depend in part on the production of a membrane-permeant, diffusible signal, so that spatial volume may also be involved in correlational learning rules. This latter form of synaptic change has been called volume learning. In both Hebbian and volume learning rules, interaction among synaptic inputs depends on the degree of coincidence of the inputs and is otherwise insensitive to their exact temporal order. Conditioning experiments and psychophysical studies have shown, however, that most animals are highly sensitive to the temporal order of sensory inputs. Although these experiments assay the behavior of the entire animal or perceptual system, they raise the possibility that nervous systems may be sensitive to temporally ordered events at many spatial and temporal scales. We suggest here the existence of a new class of learning rule, called a predictive Hebbian learning rule, that is sensitive to the temporal ordering of synaptic inputs. We show how this predictive learning rule could act at single synaptic connections and through diffuse neuromodulatory systems.
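A minimal sketch of such a predictive Hebbian rule, under illustrative assumptions (discrete time, a linear prediction): the weight change is Hebbian in the presynaptic activity but is gated by a temporal-difference-style prediction error, which is what makes it sensitive to temporal order rather than mere coincidence.

```python
import numpy as np

def predictive_hebbian_step(w, x_t, x_next, r_t, lr=0.1, gamma=1.0):
    """Hebbian update gated by a prediction error:
    delta = r(t) + gamma * V(t+1) - V(t),  dw = lr * delta * x(t)."""
    delta = r_t + gamma * (w @ x_next) - (w @ x_t)
    return w + lr * delta * x_t, delta

# Toy serial stimulus: three time steps, reward delivered at the last one.
T = 3
X = np.eye(T)                  # one-hot input at each time step
rewards = [0.0, 0.0, 1.0]
w = np.zeros(T)
for _ in range(300):
    for t in range(T):
        x_next = X[t + 1] if t + 1 < T else np.zeros(T)
        w, _ = predictive_hebbian_step(w, X[t], x_next, rewards[t])
# The reward prediction propagates backward to the earliest predictive
# input; reversing the input/reward order would leave it near zero.
```

After training, the weight of the earliest input approaches 1: the rule has learned that the first stimulus predicts the later reward, something an order-insensitive coincidence rule cannot express.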


2018 ◽  
Author(s):  
Jake P. Stroud ◽  
Mason A. Porter ◽  
Guillaume Hennequin ◽  
Tim P. Vogels

Motor cortex (M1) exhibits a rich repertoire of activities to support the generation of complex movements. Although recent neuronal-network models capture many qualitative aspects of M1 dynamics, they can generate only a few distinct movements. Additionally, it is unclear how M1 efficiently controls movements over a wide range of shapes and speeds. We demonstrate that simple modulation of neuronal input–output gains in recurrent neuronal-network models with fixed architecture can dramatically reorganize neuronal activity and thus downstream muscle outputs. Consistent with the observation of diffuse neuromodulatory projections to M1, we show that a relatively small number of modulatory control units provide sufficient flexibility to adjust high-dimensional network activity using a simple reward-based learning rule. Furthermore, it is possible to assemble novel movements from previously learned primitives, and one can separately change movement speed while preserving movement shape. Our results provide a new perspective on the role of modulatory systems in controlling recurrent cortical activity.
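The core claim, that fixed recurrent weights plus adjustable per-unit gains suffice to reorganize activity, can be sketched with a generic rate network. The sizes, time constants, and tanh nonlinearity here are illustrative assumptions, not the paper's trained models.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
W = rng.normal(0.0, 1.2 / np.sqrt(n), size=(n, n))  # fixed architecture
x0 = rng.normal(size=n)                             # fixed initial state

def simulate(gains, steps=200, dt=0.1):
    """Rate dynamics dx/dt = -x + W @ tanh(g * x). Only the per-unit
    gains g differ between runs; the weights never change."""
    x = x0.copy()
    traj = np.empty((steps, n))
    for k in range(steps):
        x = x + dt * (-x + W @ np.tanh(gains * x))
        traj[k] = x
    return traj

low_gain = simulate(np.full(n, 0.5))
high_gain = simulate(np.full(n, 2.0))
# Same weights, same initial state: the gain setting alone reorganizes
# the trajectory (and hence any downstream muscle readout).
```

At low gain the effective recurrent drive is weak and activity decays; at high gain the same network sustains rich dynamics, so a small number of gain-control units can index qualitatively different movements.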


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. Gorin ◽  
V. Klucharev ◽  
A. Ossadtchi ◽  
I. Zubarev ◽  
V. Moiseeva ◽  
...  

People often change their beliefs by succumbing to the opinions of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by the social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers in the parietal cortices 230 ms after face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant's rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism underlying long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process in the ventromedial prefrontal cortex.


2009 ◽  
Vol 5 (8) ◽  
pp. e1000456 ◽  
Author(s):  
Eilen Nordlie ◽  
Marc-Oliver Gewaltig ◽  
Hans Ekkehard Plesser

2013 ◽  
Vol 109 (1) ◽  
pp. 202-215 ◽  
Author(s):  
Jordan A. Taylor ◽  
Laura L. Hieber ◽  
Richard B. Ivry

Generalization provides a window into the representational changes that occur during motor learning. Neural network models have been integral in revealing how the neural representation constrains the extent of generalization. Specifically, two key features are thought to define the pattern of generalization. First, generalization is constrained by the properties of the underlying neural units; with directionally tuned units, the extent of generalization is limited by the width of the tuning functions. Second, error signals are used to update a sensorimotor map to align the desired and actual output, with a gradient-descent learning rule ensuring that the error produces changes in those units responsible for the error. In prior studies, task-specific effects in generalization have been attributed to differences in neural tuning functions. Here we ask whether differences in generalization functions may arise from task-specific error signals. We systematically varied visual error information in a visuomotor adaptation task and found that this manipulation led to qualitative differences in generalization. A neural network model suggests that these differences are the result of error feedback processing operating on a homogeneous and invariant set of tuning functions. Consistent with novel predictions derived from the model, increasing the number of training directions led to specific distortions of the generalization function. Taken together, the behavioral and modeling results offer a parsimonious account of generalization that is based on the utilization of feedback information to update a sensorimotor map with stable tuning functions.
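The tuning-function account in this abstract can be sketched directly, under illustrative assumptions (Gaussian tuning over movement direction, a linear readout, gradient descent on the readout weights): training at one direction generalizes to neighboring directions exactly as far as the tuning functions overlap.

```python
import numpy as np

centers = np.arange(0.0, 360.0, 10.0)   # preferred directions of 36 units

def activity(direction, width=30.0):
    """Gaussian directional tuning, with angles wrapped to [-180, 180)."""
    d = (direction - centers + 180.0) % 360.0 - 180.0
    return np.exp(-0.5 * (d / width) ** 2)

# Gradient descent on a linear sensorimotor map: credit for the error
# goes to the units active during training, i.e. those tuned near the
# trained direction.
w = np.zeros_like(centers)
target, lr = 20.0, 0.05                  # adapt a 20-deg remap at 0 deg
for _ in range(500):
    a = activity(0.0)
    w += lr * (target - w @ a) * a

probes = np.array([w @ activity(d) for d in (0.0, 45.0, 90.0, 180.0)])
# Adaptation is nearly complete at the trained direction and falls off
# with angular distance at a rate set by the tuning width.
```

Because the tuning functions here are homogeneous and fixed, any change in the shape of the generalization curve must come from how the error signal is delivered, which is the abstract's central argument.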


2007 ◽  
Vol 19 (6) ◽  
pp. 1468-1502 ◽  
Author(s):  
Răzvan V. Florian

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other one involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pre- and postsynaptic spike pairs (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated STDP.
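The second rule (modulated STDP with an eligibility trace) can be sketched in simplified, event-driven form. The time constants and the single-synapse framing are illustrative assumptions, not the paper's network simulations: each pre/post spike pair contributes the standard STDP window to a trace that decays with tau_e, and a later scalar reward converts whatever survives of the trace into a weight change.

```python
import numpy as np

def delta_w(pre_times, post_times, reward_time, reward,
            lr=0.1, tau=20.0, tau_e=50.0):
    """Weight change at reward delivery (times in ms). dt > 0 (pre
    before post) potentiates, dt < 0 depresses; each pair's contribution
    decays with tau_e between the pairing and the delayed reward."""
    e = 0.0
    for tp in pre_times:
        for tq in post_times:
            if max(tp, tq) > reward_time:
                continue                        # pair occurs after reward
            dt_pair = tq - tp
            window = np.sign(dt_pair) * np.exp(-abs(dt_pair) / tau)
            e += window * np.exp(-(reward_time - max(tp, tq)) / tau_e)
    return lr * reward * e

causal = delta_w(pre_times=[10.0], post_times=[15.0],
                 reward_time=60.0, reward=+1.0)   # pre -> post, rewarded
acausal = delta_w(pre_times=[15.0], post_times=[10.0],
                  reward_time=60.0, reward=+1.0)  # post -> pre, rewarded
```

A rewarded causal pairing strengthens the synapse even though the reward arrives 45 ms after the spikes, which is the property that lets this rule bridge delayed feedback; the anti-causal pairing is weakened by the same reward.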

