Training a spiking neuronal network model of visual-motor cortex to play a virtual racket-ball game using reinforcement learning

2021 ◽  
Author(s):  
Haroon Anwar ◽  
Simon Caby ◽  
Salvador Dura-Bernal ◽  
David D'Onofrio ◽  
Daniel Hasegan ◽  
...  

Recent models of spiking neuronal networks have been trained to perform behaviors in static environments using a variety of learning rules, with varying degrees of biological realism. Most of these models have not been tested in dynamic visual environments, where models must make predictions about future states and adjust their behavior accordingly. The models using these learning rules are often treated as black boxes, with little analysis of the circuit architectures and learning mechanisms supporting optimal performance. Here we developed visual/motor spiking neuronal network models and trained them to play a virtual racket-ball game using several reinforcement learning algorithms inspired by the dopaminergic reward system. We systematically investigated how different architectures and circuit motifs (feed-forward, recurrent, feedback) contributed to learning and performance. We also developed a new biologically inspired learning rule that significantly enhanced performance while reducing training time. Our models included visual areas encoding game inputs and relaying the information to motor areas, which used this information to learn to move the racket to hit the ball. Neurons in the early visual area relayed information encoding object location and motion direction across the network. Neuronal association areas encoded spatial relationships between objects in the visual scene. Motor populations received inputs from visual and association areas representing the dorsal pathway. Two populations of motor neurons generated commands to move the racket up or down. Model-generated actions updated the environment and triggered reward or punishment signals that adjusted synaptic weights so that the models could learn which actions led to reward. Here we demonstrate that our biologically plausible learning rules were effective in training spiking neuronal network models to solve problems in dynamic environments. We used our models to dissect the circuit architectures and learning rules most effective for learning. Our models offer novel predictions on the biological mechanisms supporting learning behaviors.
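The reward-modulated learning loop described in the abstract can be sketched in a few lines. This is a hypothetical toy, not the paper's model: the population sizes, the coincidence-based eligibility trace, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: a "visual" population projecting to two
# "motor" units (racket up / racket down). All names are illustrative.
n_visual, n_motor = 20, 2
w = rng.uniform(0.0, 0.5, size=(n_motor, n_visual))
w0 = w.copy()

def reward_modulated_step(w, elig, pre, post, reward, lr=0.01, decay=0.9):
    """Coincident pre/post spiking builds a decaying eligibility trace;
    a scalar reward (+1) or punishment (-1) converts the trace into an
    actual weight change, so the network learns which actions paid off."""
    elig = decay * elig + np.outer(post, pre)
    w = np.clip(w + lr * reward * elig, 0.0, 1.0)
    return w, elig

elig = np.zeros_like(w)
pre = (rng.random(n_visual) < 0.3).astype(float)  # toy visual spike vector
post = np.array([1.0, 0.0])                       # "move up" unit fired
w, elig = reward_modulated_step(w, elig, pre, post, reward=+1.0)
```

Only the synapses onto the motor unit that actually fired, from visual units that were active, are strengthened by the reward; everything else is untouched.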

1994 ◽  
Vol 1 (1) ◽  
pp. 1-33
Author(s):  
P R Montague ◽  
T J Sejnowski

Some forms of synaptic plasticity depend on the temporal coincidence of presynaptic activity and postsynaptic response. This requirement is consistent with the Hebbian, or correlational, type of learning rule used in many neural network models. Recent evidence suggests that synaptic plasticity may depend in part on the production of a membrane-permeant, diffusible signal, so that spatial volume may also be involved in correlational learning rules. This latter form of synaptic change has been called volume learning. In both Hebbian and volume learning rules, interaction among synaptic inputs depends on the degree of coincidence of the inputs and is otherwise insensitive to their exact temporal order. Conditioning experiments and psychophysical studies have shown, however, that most animals are highly sensitive to the temporal order of sensory inputs. Although these experiments assay the behavior of the entire animal or perceptual system, they raise the possibility that nervous systems may be sensitive to temporally ordered events at many spatial and temporal scales. We suggest here the existence of a new class of learning rule, called a predictive Hebbian learning rule, that is sensitive to the temporal ordering of synaptic inputs. We show how this predictive learning rule could act at single synaptic connections and through diffuse neuromodulatory systems.
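A minimal sketch of such a predictive Hebbian rule, under illustrative assumptions (discrete time, a linear prediction): the weight change is Hebbian in the presynaptic activity but is gated by a temporal-difference-style prediction error, which is what makes it sensitive to temporal order rather than mere coincidence.

```python
import numpy as np

def predictive_hebbian_step(w, x_t, x_next, r_t, lr=0.1, gamma=1.0):
    """Hebbian update gated by a prediction error:
    delta = r(t) + gamma * V(t+1) - V(t),  dw = lr * delta * x(t)."""
    delta = r_t + gamma * (w @ x_next) - (w @ x_t)
    return w + lr * delta * x_t, delta

# Toy serial stimulus: three time steps, reward delivered at the last one.
T = 3
X = np.eye(T)                  # one-hot input at each time step
rewards = [0.0, 0.0, 1.0]
w = np.zeros(T)
for _ in range(300):
    for t in range(T):
        x_next = X[t + 1] if t + 1 < T else np.zeros(T)
        w, _ = predictive_hebbian_step(w, X[t], x_next, rewards[t])
# The reward prediction propagates backward to the earliest predictive
# input; reversing the input/reward order would leave it near zero.
```

After training, the weight of the earliest input approaches 1: the rule has learned that the first stimulus predicts the later reward, something an order-insensitive coincidence rule cannot express.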


2018 ◽  
Author(s):  
Jake P. Stroud ◽  
Mason A. Porter ◽  
Guillaume Hennequin ◽  
Tim P. Vogels

Motor cortex (M1) exhibits a rich repertoire of activities to support the generation of complex movements. Although recent neuronal-network models capture many qualitative aspects of M1 dynamics, they can generate only a few distinct movements. Additionally, it is unclear how M1 efficiently controls movements over a wide range of shapes and speeds. We demonstrate that simple modulation of neuronal input–output gains in recurrent neuronal-network models with fixed architecture can dramatically reorganize neuronal activity and thus downstream muscle outputs. Consistent with the observation of diffuse neuromodulatory projections to M1, we show that a relatively small number of modulatory control units provide sufficient flexibility to adjust high-dimensional network activity using a simple reward-based learning rule. Furthermore, it is possible to assemble novel movements from previously learned primitives, and one can separately change movement speed while preserving movement shape. Our results provide a new perspective on the role of modulatory systems in controlling recurrent cortical activity.
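The core claim, that fixed recurrent weights plus adjustable per-unit gains suffice to reorganize activity, can be sketched with a generic rate network. The sizes, time constants, and tanh nonlinearity here are illustrative assumptions, not the paper's trained models.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
W = rng.normal(0.0, 1.2 / np.sqrt(n), size=(n, n))  # fixed architecture
x0 = rng.normal(size=n)                             # fixed initial state

def simulate(gains, steps=200, dt=0.1):
    """Rate dynamics dx/dt = -x + W @ tanh(g * x). Only the per-unit
    gains g differ between runs; the weights never change."""
    x = x0.copy()
    traj = np.empty((steps, n))
    for k in range(steps):
        x = x + dt * (-x + W @ np.tanh(gains * x))
        traj[k] = x
    return traj

low_gain = simulate(np.full(n, 0.5))
high_gain = simulate(np.full(n, 2.0))
# Same weights, same initial state: the gain setting alone reorganizes
# the trajectory (and hence any downstream muscle readout).
```

At low gain the effective recurrent drive is weak and activity decays; at high gain the same network sustains rich dynamics, so a small number of gain-control units can index qualitatively different movements.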


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. Gorin ◽  
V. Klucharev ◽  
A. Ossadtchi ◽  
I. Zubarev ◽  
V. Moiseeva ◽  
...  

People often change their beliefs by succumbing to the opinions of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by the social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers in the parietal cortices 230 ms after face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant's rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism underlying long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process in the ventromedial prefrontal cortex.


2009 ◽  
Vol 5 (8) ◽  
pp. e1000456 ◽  
Author(s):  
Eilen Nordlie ◽  
Marc-Oliver Gewaltig ◽  
Hans Ekkehard Plesser

2013 ◽  
Vol 109 (1) ◽  
pp. 202-215 ◽  
Author(s):  
Jordan A. Taylor ◽  
Laura L. Hieber ◽  
Richard B. Ivry

Generalization provides a window into the representational changes that occur during motor learning. Neural network models have been integral in revealing how the neural representation constrains the extent of generalization. Specifically, two key features are thought to define the pattern of generalization. First, generalization is constrained by the properties of the underlying neural units; with directionally tuned units, the extent of generalization is limited by the width of the tuning functions. Second, error signals are used to update a sensorimotor map to align the desired and actual output, with a gradient-descent learning rule ensuring that the error produces changes in those units responsible for the error. In prior studies, task-specific effects in generalization have been attributed to differences in neural tuning functions. Here we ask whether differences in generalization functions may arise from task-specific error signals. We systematically varied visual error information in a visuomotor adaptation task and found that this manipulation led to qualitative differences in generalization. A neural network model suggests that these differences are the result of error feedback processing operating on a homogeneous and invariant set of tuning functions. Consistent with novel predictions derived from the model, increasing the number of training directions led to specific distortions of the generalization function. Taken together, the behavioral and modeling results offer a parsimonious account of generalization that is based on the utilization of feedback information to update a sensorimotor map with stable tuning functions.
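The tuning-function account in this abstract can be sketched directly, under illustrative assumptions (Gaussian tuning over movement direction, a linear readout, gradient descent on the readout weights): training at one direction generalizes to neighboring directions exactly as far as the tuning functions overlap.

```python
import numpy as np

centers = np.arange(0.0, 360.0, 10.0)   # preferred directions of 36 units

def activity(direction, width=30.0):
    """Gaussian directional tuning, with angles wrapped to [-180, 180)."""
    d = (direction - centers + 180.0) % 360.0 - 180.0
    return np.exp(-0.5 * (d / width) ** 2)

# Gradient descent on a linear sensorimotor map: credit for the error
# goes to the units active during training, i.e. those tuned near the
# trained direction.
w = np.zeros_like(centers)
target, lr = 20.0, 0.05                  # adapt a 20-deg remap at 0 deg
for _ in range(500):
    a = activity(0.0)
    w += lr * (target - w @ a) * a

probes = np.array([w @ activity(d) for d in (0.0, 45.0, 90.0, 180.0)])
# Adaptation is nearly complete at the trained direction and falls off
# with angular distance at a rate set by the tuning width.
```

Because the tuning functions here are homogeneous and fixed, any change in the shape of the generalization curve must come from how the error signal is delivered, which is the abstract's central argument.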


2007 ◽  
Vol 19 (6) ◽  
pp. 1468-1502 ◽  
Author(s):  
Răzvan V. Florian

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other one involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pre- and postsynaptic spike pairs (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated STDP.
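The second rule (modulated STDP with an eligibility trace) can be sketched in simplified, event-driven form. The time constants and the single-synapse framing are illustrative assumptions, not the paper's network simulations: each pre/post spike pair contributes the standard STDP window to a trace that decays with tau_e, and a later scalar reward converts whatever survives of the trace into a weight change.

```python
import numpy as np

def delta_w(pre_times, post_times, reward_time, reward,
            lr=0.1, tau=20.0, tau_e=50.0):
    """Weight change at reward delivery (times in ms). dt > 0 (pre
    before post) potentiates, dt < 0 depresses; each pair's contribution
    decays with tau_e between the pairing and the delayed reward."""
    e = 0.0
    for tp in pre_times:
        for tq in post_times:
            if max(tp, tq) > reward_time:
                continue                        # pair occurs after reward
            dt_pair = tq - tp
            window = np.sign(dt_pair) * np.exp(-abs(dt_pair) / tau)
            e += window * np.exp(-(reward_time - max(tp, tq)) / tau_e)
    return lr * reward * e

causal = delta_w(pre_times=[10.0], post_times=[15.0],
                 reward_time=60.0, reward=+1.0)   # pre -> post, rewarded
acausal = delta_w(pre_times=[15.0], post_times=[10.0],
                  reward_time=60.0, reward=+1.0)  # post -> pre, rewarded
```

A rewarded causal pairing strengthens the synapse even though the reward arrives 45 ms after the spikes, which is the property that lets this rule bridge delayed feedback; the anti-causal pairing is weakened by the same reward.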

