Spike Timing Neural Model of Eye Movement Motor Response with Reinforcement Learning

Author(s):  
Petia Koprinkova-Hristova ◽  
Nadejda Bocheva
2007 ◽  
Vol 19 (6) ◽  
pp. 1468-1502 ◽  
Author(s):  
Răzvan V. Florian

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated STDP.
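The second rule can be captured in a few lines. The sketch below is a minimal illustration of modulated STDP with an eligibility trace, assuming standard exponential STDP windows; all parameter values, names, and the spike-pairing bookkeeping are illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative parameter values (not taken from the paper).
A_PLUS, A_MINUS = 1.0, -1.0        # STDP amplitudes for potentiation / depression
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # ms, STDP window time constants
TAU_Z = 500.0                      # ms, eligibility-trace decay
ETA = 0.01                         # learning rate
DT = 1.0                           # ms, simulation step

def stdp_window(delta_t):
    """Pair-based STDP kernel for a post-minus-pre spike interval delta_t (ms)."""
    if delta_t > 0:   # pre before post -> potentiation
        return A_PLUS * np.exp(-delta_t / TAU_PLUS)
    return A_MINUS * np.exp(delta_t / TAU_MINUS)   # post before pre -> depression

def step(w, z, spike_pairs, reward):
    """Advance one step: decay the traces, fold in new spike pairings, apply reward.

    spike_pairs : list of (synapse_index, delta_t) pairings observed this step
    reward      : global scalar reward signal (may arrive delayed)
    """
    z *= np.exp(-DT / TAU_Z)              # decaying memory of recent pairings
    for i, delta_t in spike_pairs:
        z[i] += stdp_window(delta_t)      # what plain modulated STDP would apply directly
    w += ETA * reward * z                 # weight change gated by the reward signal
    return w, z
```

With a very short trace time constant, only the current step's pairings contribute, which approximately recovers the first rule (plain modulated STDP).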


2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of both the Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response, rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the network's output spike train and a reference target train. Results show that the network is able to capture the required dynamics and that the proposed framework indeed represents an integrated version of Hebbian and RL learning. The framework is tractable, computationally inexpensive, applicable to a wide class of synaptic models, and not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to exploit biologically plausible synaptic models in a wide range of signal-processing applications.
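A minimal sketch of this idea, assuming a Tsodyks-Markram-style dynamic synapse whose hidden parameters (utilization U and recovery time constant tau_rec) are the quantities being learned; the coincidence measure, parameter ranges, and exact update form are illustrative assumptions rather than the paper's equations.

```python
import numpy as np

ETA = 0.05  # illustrative learning rate

def update_hidden_params(U, tau_rec, coincidence, reward, prev_reward):
    """Adjust hidden dynamic-synapse parameters after one trial.

    U, tau_rec  : per-synapse arrays of utilization and recovery time constant (ms)
    coincidence : per-synapse Hebbian measure of correlated pre/post activity in the trial
    reward, prev_reward : scalar rewards of the current and previous trial
    """
    td = reward - prev_reward            # temporal difference in the reward signal:
    delta = ETA * td * coincidence       # its sign sets the direction, its value the step size
    U = np.clip(U + delta, 0.05, 0.95)   # keep utilization a valid release probability
    tau_rec = np.clip(tau_rec - 100.0 * delta, 50.0, 1500.0)  # ms; faster recovery when reinforced
    return U, tau_rec
```

The point of the sketch is that the reward-gated Hebbian term moves the synapse's response dynamics, not its weight, which is what distinguishes this framework from weight-based reward-modulated STDP.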


1997 ◽  
Vol 17 (24) ◽  
pp. 9706-9725 ◽  
Author(s):  
Stephen Grossberg ◽  
Karen Roberts ◽  
Mario Aguilar ◽  
Daniel Bullock

2007 ◽  
Vol 98 (6) ◽  
pp. 3648-3665 ◽  
Author(s):  
Michael A. Farries ◽  
Adrienne L. Fairhall

Spike-timing-dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general-purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus, our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.
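A minimal sketch of one trial of performance-modulated STDP in a two-layer feedforward network. The performance measure (a binned spike-train distance to the target), the running baseline, and the gating of raw STDP changes are illustrative assumptions about the general scheme, not the specific set of conditions the paper identifies.

```python
import numpy as np

ETA = 0.01            # illustrative learning rate
BASELINE_DECAY = 0.9  # illustrative smoothing of the performance baseline

def spike_distance(output_bins, target_bins):
    """Simple distance between binned output and target spike trains."""
    return float(np.abs(np.asarray(output_bins) - np.asarray(target_bins)).sum())

def train_trial(w, dw_stdp, output_bins, target_bins, baseline):
    """Apply one performance-modulated STDP update.

    w           : input->output weight matrix of the two-layer network
    dw_stdp     : raw pair-based STDP changes accumulated during the trial (same shape as w)
    output_bins : binned spike counts produced by the output layer this trial
    target_bins : binned spike counts of the desired response
    baseline    : running average of past performance
    """
    performance = -spike_distance(output_bins, target_bins)
    modulation = performance - baseline      # only better-than-average trials reinforce raw STDP
    w = w + ETA * modulation * dw_stdp
    baseline = BASELINE_DECAY * baseline + (1.0 - BASELINE_DECAY) * performance
    return w, baseline
```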


2013 ◽  
Vol 25 (12) ◽  
pp. 3263-3293 ◽  
Author(s):  
Samuel A. Neymotin ◽  
George L. Chadderdon ◽  
Cliff C. Kerr ◽  
Joseph T. Francis ◽  
William W. Lytton

Neocortical mechanisms of learning sensorimotor control involve a complex series of interactions at multiple levels, from synaptic mechanisms to cellular dynamics to network connectomics. We developed a model of sensory and motor neocortex consisting of 704 spiking model neurons. Sensory and motor populations included excitatory cells and two types of interneurons. Neurons were interconnected with AMPA/NMDA and GABAA synapses. We trained our model using spike-timing-dependent reinforcement learning to control a two-joint virtual arm to reach to a fixed target. For each of 125 trained networks, we used 200 training sessions, each involving 15 s reaches to the target from 16 starting positions. Learning altered network dynamics, with enhancements to neuronal synchrony and behaviorally relevant information flow between neurons. After learning, networks demonstrated retention of behaviorally relevant memories by using proprioceptive information to perform reach-to-target from multiple starting positions. Networks dynamically controlled which joint rotations to use to reach a target, depending on current arm position. Learning-dependent network reorganization was evident in both sensory and motor populations: learned synaptic weights showed target-specific patterning optimized for particular reach movements. Our model embodies an integrative hypothesis of sensorimotor cortical learning that could be used to interpret future electrophysiological data recorded in vivo from sensorimotor learning experiments. We used our model to make the following predictions: learning enhances synchrony in neuronal populations and behaviorally relevant information flow across neuronal populations; enhanced sensory processing aids task-relevant motor performance; and the relative ease of a particular movement in vivo depends on the amount of sensory information required to complete the movement.
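The reinforcement signal in this kind of reach training can be summarized by a distance-based critic that rewards movements toward the target and an eligibility-gated weight update. The sketch below is a generic illustration of that scheme; the reward values, decay constant, and function names are assumptions for illustration, not the model's actual implementation.

```python
import numpy as np

ETA = 0.005          # illustrative learning rate
TRACE_DECAY = 0.95   # illustrative per-step decay of the synaptic eligibility trace

def reach_critic(hand_xy, target_xy, prev_dist):
    """Return (reward, new_dist): +1 if the virtual hand moved closer to the target, -1 otherwise."""
    dist = float(np.linalg.norm(np.asarray(hand_xy) - np.asarray(target_xy)))
    return (1.0 if dist < prev_dist else -1.0), dist

def apply_reinforcement(weights, eligibility, reward):
    """Scale each synapse's recent STDP eligibility by the global reward, then decay the trace."""
    weights = weights + ETA * reward * eligibility
    eligibility = eligibility * TRACE_DECAY
    return weights, eligibility
```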


2021 ◽  
Author(s):  
Daniel Hasegan ◽  
Matt Deible ◽  
Christopher Earl ◽  
David D'Onofrio ◽  
Hananel Hazan ◽  
...  

Biological learning operates at multiple interlocking timescales, from long evolutionary stretches down to the relatively short time span of an individual's life. While each process has been simulated individually as a basic learning algorithm in the context of spiking neuronal networks (SNNs), the integration of the two has remained limited. In this study, we first train SNNs separately, using spike-timing-dependent reinforcement learning (STDP-RL) and evolutionary (EVOL) learning algorithms individually, to solve the CartPole reinforcement learning (RL) control problem. We then develop an interleaved algorithm inspired by biological evolution that combines EVOL and STDP-RL learning in sequence. We use the NEURON simulator with NetPyNE to create an SNN interfaced with the CartPole environment from OpenAI's Gym. In CartPole, the goal is to balance a vertical pole by moving the cart left/right along a 1-D track. Our SNN contains multiple populations of neurons organized in three layers: a sensory layer, an association/hidden layer, and a motor layer, where neurons are connected by excitatory (AMPA/NMDA) and inhibitory (GABA) synapses. The association and motor layers contain one excitatory (E) population and two inhibitory (I) populations with different synaptic time constants. Each neuron is an event-based integrate-and-fire model with plastic connections between excitatory neurons. In our SNN, the environment activates sensory neurons tuned to specific features of the game state. We split the motor population into subsets representing each movement choice; the subset with more spiking over an interval determines the action. During STDP-RL, we supply intermediary evaluations (reward/punishment) of each action by judging the effectiveness of a move (e.g., moving the cart toward a position that balances the pole). During EVOL, updates consist of adding together many random perturbations of the connection weights, with each set of random perturbations weighted by the total episodic reward it achieves when applied independently. We evaluate the performance of each algorithm after training and through the creation of sensory/motor action maps that delineate the network's transformation of sensory inputs into higher-order representations and eventual motor decisions. Both EVOL and STDP-RL training produce SNNs capable of moving the cart left and right and keeping the pole vertical. Compared to the STDP-RL and EVOL algorithms operating on their own, our interleaved training paradigm produced enhanced robustness in performance, with different strategies revealed through analysis of the sensory/motor mappings. Analysis of synaptic weight matrices also shows distributed vs. clustered representations after the EVOL and STDP-RL algorithms, respectively. These weight differences also manifest as diffuse vs. synchronized firing patterns. Our modeling opens up new capabilities for SNNs in RL and could serve as a testbed for neurobiologists aiming to understand multi-timescale learning mechanisms and dynamics in neuronal circuits.
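The EVOL update described above amounts to an evolution-strategies-style weight search. The sketch below shows that update in isolation, operating on a flattened weight vector; the population size, noise scale, and reward normalization are illustrative assumptions (the study's actual implementation runs inside NEURON/NetPyNE rather than on a plain weight vector).

```python
import numpy as np

SIGMA = 0.1     # illustrative perturbation scale
ALPHA = 0.05    # illustrative step size
POP_SIZE = 20   # illustrative number of perturbations per update

def evol_update(weights, episode_reward, rng=None):
    """One EVOL-style update of a flattened connection-weight vector.

    episode_reward(w) should run one CartPole episode with weights w and return the total reward.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal((POP_SIZE, weights.size))
    # Evaluate each perturbed network independently on a full episode.
    rewards = np.array([episode_reward(weights + SIGMA * n) for n in noise])
    # Normalize so better-than-average perturbations pull the weights toward themselves.
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return weights + (ALPHA / (POP_SIZE * SIGMA)) * noise.T @ rewards
```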


Author(s):  
Petia D. Koprinkova-Hristova ◽  
Nadejda Bocheva ◽  
Simona Nedelcheva ◽  
Miroslava Stefanova
