Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first derive analytically learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate in simulations of networks of integrate-and-fire neurons the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other one involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between the recent pairs of pre- and postsynaptic spike pairs (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks, regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated STDP.

Download Full-text

Spike Timing Neural Model of Eye Movement Motor Response with Reinforcement Learning

Advanced Computing in Industrial Mathematics - Studies in Computational Intelligence ◽

10.1007/978-3-030-71616-5_14 ◽

2021 ◽

pp. 139-153

Author(s):

Petia Koprinkova-Hristova ◽

Nadejda Bocheva

Keyword(s):

Reinforcement Learning ◽

Eye Movement ◽

Motor Response ◽

Neural Model ◽

Spike Timing

Download Full-text

Supervised Learning Algorithm for Multilayer Spiking Neural Networks with Long-Term Memory Spike Response Model

Computational Intelligence and Neuroscience ◽

10.1155/2021/8592824 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Xianghong Lin ◽

Mengwei Zhang ◽

Xiangwen Wang

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Learning Algorithm ◽

Spike Trains ◽

Spiking Neural Networks ◽

Long Term Memory ◽

Response Model ◽

Spike Response ◽

Spike Response Model

As a new brain-inspired computational model of artificial neural networks, spiking neural networks transmit and process information via precisely timed spike trains. Constructing efficient learning methods is a significant research field in spiking neural networks. In this paper, we present a supervised learning algorithm for multilayer feedforward spiking neural networks; all neurons can fire multiple spikes in all layers. The feedforward network consists of spiking neurons governed by biologically plausible long-term memory spike response model, in which the effect of earlier spikes on the refractoriness is not neglected to incorporate adaptation effects. The gradient descent method is employed to derive synaptic weight updating rule for learning spike trains. The proposed algorithm is tested and verified on spatiotemporal pattern learning problems, including a set of spike train learning tasks and nonlinear pattern classification problems on four UCI datasets. Simulation results indicate that the proposed algorithm can improve learning accuracy in comparison with other supervised learning algorithms.

Download Full-text

Multi-timescale biological learning algorithms train spiking neuronal network motor control

10.1101/2021.11.20.469405 ◽

2021 ◽

Author(s):

Daniel Hasegan ◽

Matt Deible ◽

Christopher Earl ◽

David D'Onofrio ◽

Hananel Hazan ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Biological Evolution ◽

Spike Timing ◽

Random Perturbations ◽

Learning Mechanisms ◽

Model Learning ◽

Sensory Motor ◽

Biological Learning

Biological learning operates at multiple interlocking timescales, from long evolutionary stretches down to the relatively short time span of an individual's life. While each process has been simulated individually as a basic learning algorithm in the context of spiking neuronal networks (SNNs), the integration of the two has remained limited. In this study, we first train SNNs separately using individual model learning using spike-timing dependent reinforcement learning (STDP-RL) and evolutionary (EVOL) learning algorithms to solve the CartPole reinforcement learning (RL) control problem. We then develop an interleaved algorithm inspired by biological evolution that combines the EVOL and STDP-RL learning in sequence. We use the NEURON simulator with NetPyNE to create an SNN interfaced with the CartPole environment from OpenAI's Gym. In CartPole, the goal is to balance a vertical pole by moving left/right on a 1-D plane. Our SNN contains multiple populations of neurons organized in three layers: sensory layer, association/hidden layer, and motor layer, where neurons are connected by excitatory (AMPA/NMDA) and inhibitory (GABA) synapses. Association and motor layers contain one excitatory (E) population and two inhibitory (I) populations with different synaptic time constants. Each neuron is an event-based integrate-and-fire model with plastic connections between excitatory neurons. In our SNN, the environment activates sensory neurons tuned to specific features of the game state. We split the motor population into subsets representing each movement choice. The subset with more spiking over an interval determines the action. During STDP-RL, we supply intermediary evaluations (reward/punishment) of each action by judging the effectiveness of a move (e.g., moving the CartPole to a balanced position). During EVOL, updates consist of adding together many random perturbations of the connection weights. Each set of random perturbations is weighted by the total episodic reward it achieves when applied independently. We evaluate the performance of each algorithm after training and through the creation of sensory/motor action maps that delineate the network's transformation of sensory inputs into higher-order representations and eventual motor decisions. Both EVOL and STDP-RL training produce SNNs capable of moving the cart left and right and keeping the pole vertical. Compared to the STDP-RL and EVOL algorithms operating on their own, our interleaved training paradigm produced enhanced robustness in performance, with different strategies revealed through analysis of the sensory/motor mappings. Analysis of synaptic weight matrices also shows distributed vs clustered representations after the EVOL and STDP-RL algorithms, respectively. These weight differences also manifest as diffuse vs synchronized firing patterns. Our modeling opens up new capabilities for SNNs in RL and could serve as a testbed for neurobiologists aiming to understand multi-timescale learning mechanisms and dynamics in neuronal circuits.

Download Full-text

Experimental Study of Reinforcement Learning in Mobile Robots Through Spiking Architecture of Thalamo-Cortico-Thalamic Circuitry of Mammalian Brain

Robotica ◽

10.1017/s0263574719001632 ◽

2019 ◽

Vol 38 (9) ◽

pp. 1558-1575

Author(s):

Vahid Azimirad ◽

Mohammad Fattahi Sani

Keyword(s):

Reinforcement Learning ◽

Motor Neurons ◽

Learning Algorithm ◽

Experimental Studies ◽

Spike Timing ◽

Mammalian Brain ◽

Behavioral Learning ◽

Experimental Implementation ◽

Robotic Tasks ◽

Izhikevich Model

SUMMARYIn this paper, the behavioral learning of robots through spiking neural networks is studied in which the architecture of the network is based on the thalamo-cortico-thalamic circuitry of the mammalian brain. According to a variety of neurons, the Izhikevich model of single neuron is used for the representation of neuronal behaviors. One thousand and ninety spiking neurons are considered in the network. The spiking model of the proposed architecture is derived and prepared for the learning problem of robots. The reinforcement learning algorithm is based on spike-timing-dependent plasticity and dopamine release as a reward. It results in strengthening the synaptic weights of the neurons that are involved in the robot’s proper performance. Sensory and motor neurons are placed in the thalamus and cortical module, respectively. The inputs of thalamo-cortico-thalamic circuitry are the signals related to distance of the target from robot, and the outputs are the velocities of actuators. The target attraction task is used as an example to validate the proposed method in which dopamine is released when the robot catches the target. Some simulation studies, as well as experimental implementation, are done on a mobile robot named Tabrizbot. Experimental studies illustrate that after successful learning, the meantime of catching target is decreased by about 36%. These prove that through the proposed method, thalamo-cortical structure could be trained successfully to learn to perform various robotic tasks.

Download Full-text

Reinforcement learning of a simple control task using the spike response model

Neurocomputing ◽

10.1016/j.neucom.2006.07.002 ◽

2006 ◽

Vol 70 (1-3) ◽

pp. 14-20 ◽

Cited By ~ 8

Author(s):

Murilo Saraiva de Queiroz ◽

Roberto Coelho de Berrêdo ◽

Antônio de Pádua Braga

Keyword(s):

Reinforcement Learning ◽

Response Model ◽

Control Task ◽

Spike Response ◽

Spike Response Model

Download Full-text

Reducing the Variability of Neural Responses: A Computational Theory of Spike-Timing-Dependent Plasticity

Neural Computation ◽

10.1162/neco.2007.19.2.371 ◽

2007 ◽

Vol 19 (2) ◽

pp. 371-403 ◽

Cited By ~ 22

Author(s):

Sander M. Bohte ◽

Michael C. Mozer

Keyword(s):

Experimental Studies ◽

Response Variability ◽

Spike Timing ◽

Spike Timing Dependent Plasticity ◽

Presynaptic Neuron ◽

Spike Response ◽

Dependent Plasticity ◽

Synaptic Modulation ◽

The Face ◽

Spike Response Model

Experimental studies have observed synaptic potentiation when a presynaptic neuron fires shortly before a postsynaptic neuron and synaptic depression when the presynaptic neuron fires shortly after. The dependence of synaptic modulation on the precise timing of the two action potentials is known as spike-timing dependent plasticity (STDP). We derive STDP from a simple computational principle: synapses adapt so as to minimize the postsynaptic neuron's response variability to a given presynaptic input, causing the neuron's output to become more reliable in the face of noise. Using an objective function that minimizes response variability and the biophysically realistic spike-response model of Gerstner (2001), we simulate neurophysiological experiments and obtain the characteristic STDP curve along with other phenomena, including the reduction in synaptic plasticity as synaptic efficacy increases. We compare our account to other efforts to derive STDP from computational principles and argue that our account provides the most comprehensive coverage of the phenomena. Thus, reliability of neural response in the face of noise may be a key goal of unsupervised cortical adaptation.

Download Full-text

Retroactive modulation of spike timing-dependent plasticity by dopamine

eLife ◽

10.7554/elife.09685 ◽

2015 ◽

Vol 4 ◽

Cited By ~ 43

Author(s):

Zuzanna Brzosko ◽

Wolfram Schultz ◽

Ole Paulsen

Keyword(s):

Reinforcement Learning ◽

Nmda Receptors ◽

Positive Reinforcement ◽

Spike Timing ◽

Learning Models ◽

Cellular Mechanisms ◽

Dependent Plasticity ◽

Eligibility Trace ◽

Plausible Mechanism ◽

Reinforcement Learning Models

Most reinforcement learning models assume that the reward signal arrives after the activity that led to the reward, placing constraints on the possible underlying cellular mechanisms. Here we show that dopamine, a positive reinforcement signal, can retroactively convert hippocampal timing-dependent synaptic depression into potentiation. This effect requires functional NMDA receptors and is mediated in part through the activation of the cAMP/PKA cascade. Collectively, our results support the idea that reward-related signaling can act on a pre-established synaptic eligibility trace, thereby associating specific experiences with behaviorally distant, rewarding outcomes. This finding identifies a biologically plausible mechanism for solving the ‘distal reward problem’.

Download Full-text

Model dependent reinforcement learning algorithm for reservoir operation stochastic optimization

International Journal of Hydrology ◽

10.15406/ijh.2018.02.00129 ◽

2018 ◽

Vol 2 (5) ◽

Author(s):

Li Wenwu

Keyword(s):

Reinforcement Learning ◽

Stochastic Optimization ◽

Reservoir Operation ◽

Learning Algorithm ◽

Reinforcement Learning Algorithm

Download Full-text

Reinforcement learning algorithm for one-warehouse multi-retailer inventory problem

Automation, Mechanical and Electrical Engineering ◽

10.2495/amee140161 ◽

2014 ◽

Author(s):

C.Y. Li ◽

X.T. Wang ◽

T.W. Zhang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Inventory Problem ◽

Reinforcement Learning Algorithm

Download Full-text

Computational Design of Modular Robots Based on Genetic Algorithm and Reinforcement Learning

Symmetry ◽

10.3390/sym13030471 ◽

2021 ◽

Vol 13 (3) ◽

pp. 471

Author(s):

Jai Hoon Park ◽

Kang Hoon Lee

Keyword(s):

Genetic Algorithm ◽

Reinforcement Learning ◽

Design Space ◽

Learning Algorithm ◽

Computational Design ◽

Computational Method ◽

Learning Ability ◽

Modular Robots ◽

Control Mechanisms ◽

Candidate Structure

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space that involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. The size of the design space is reduced significantly by evolving only the robotic structure and by performing behavioral optimization using a separate training algorithm compared to that when both the structure and behavior are evolved simultaneously. Mutual dependence between evolution and learning is achieved by regarding the mean cumulative rewards of a candidate structure in the reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in the process of experimenting with actual modular robotics kit.

Download Full-text