Reinforcement learning of a simple control task using the spike response model

2006 ◽  
Vol 70 (1-3) ◽  
pp. 14-20 ◽  
Author(s):  
Murilo Saraiva de Queiroz ◽  
Roberto Coelho de Berrêdo ◽  
Antônio de Pádua Braga

2007 ◽  
Vol 19 (6) ◽  
pp. 1468-1502 ◽  
Author(s):  
Răzvan V. Florian

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity, by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features common to plasticity mechanisms experimentally found in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other involves an eligibility trace stored at each synapse that keeps a decaying memory of the relationships between recent pairs of pre- and postsynaptic spikes (modulated STDP with eligibility trace). This latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated STDP.
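
The eligibility-trace idea in this abstract can be illustrated with a minimal sketch for a single synapse in discrete time. This is not the paper's implementation: the function name `run_modulated_stdp`, the trace names, and all parameter values (time constants, amplitudes, learning rate) are assumptions chosen for illustration. Spike pairings are accumulated in a decaying trace `z`, and the weight changes only when a reward arrives, which may be several steps later.

```python
import math

def run_modulated_stdp(pre_spikes, post_spikes, rewards, dt=1.0,
                       tau_plus=20.0, tau_minus=20.0, tau_z=25.0,
                       a_plus=1.0, a_minus=-1.0, gamma=0.01, w0=0.5):
    """Sketch of reward-modulated STDP with an eligibility trace.

    pre_spikes, post_spikes: sequences of 0/1, one entry per time step.
    rewards: sequence of floats, one entry per time step.
    All default parameter values are illustrative assumptions.
    Returns the final synaptic weight.
    """
    p_plus = 0.0   # decaying trace of presynaptic spikes
    p_minus = 0.0  # decaying trace of postsynaptic spikes (negative amplitude)
    z = 0.0        # eligibility trace: decaying memory of recent pairings
    w = w0
    for pre, post, r in zip(pre_spikes, post_spikes, rewards):
        p_plus = p_plus * math.exp(-dt / tau_plus) + a_plus * pre
        p_minus = p_minus * math.exp(-dt / tau_minus) + a_minus * post
        # Pre-before-post contributes positively, post-before-pre negatively.
        zeta = p_plus * post + p_minus * pre
        z = z * math.exp(-dt / tau_z) + zeta
        # The global reward gates the weight change.
        w += gamma * r * z
    return w

steps = 12
pre = [1 if t == 0 else 0 for t in range(steps)]
post = [1 if t == 2 else 0 for t in range(steps)]
delayed_reward = [1.0 if t == 10 else 0.0 for t in range(steps)]

w_causal = run_modulated_stdp(pre, post, delayed_reward)      # pre leads post
w_anticausal = run_modulated_stdp(post, pre, delayed_reward)  # post leads pre
```

In this toy run the reward arrives eight steps after the pairing, yet the weight still moves because the trace has not fully decayed. With a positive reward, the pre-before-post pairing raises the weight above its initial value and the reversed pairing lowers it.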


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Xianghong Lin ◽  
Mengwei Zhang ◽  
Xiangwen Wang

As a new brain-inspired computational model of artificial neural networks, spiking neural networks transmit and process information via precisely timed spike trains. Constructing efficient learning methods is a significant research field in spiking neural networks. In this paper, we present a supervised learning algorithm for multilayer feedforward spiking neural networks in which neurons in all layers can fire multiple spikes. The feedforward network consists of spiking neurons governed by a biologically plausible long-term memory spike response model, in which the effect of earlier spikes on refractoriness is not neglected, so that adaptation effects are incorporated. The gradient descent method is employed to derive a synaptic weight update rule for learning spike trains. The proposed algorithm is tested and verified on spatiotemporal pattern learning problems, including a set of spike train learning tasks and nonlinear pattern classification problems on four UCI datasets. Simulation results indicate that the proposed algorithm can improve learning accuracy in comparison with other supervised learning algorithms.
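
As a rough illustration of what "long-term memory" means for the spike response model, the sketch below computes a membrane potential in which every earlier output spike, not only the most recent one, contributes a refractory term. The kernel shapes (an alpha-shaped postsynaptic kernel and an exponential refractory kernel), the function names, and the parameter values are common textbook choices assumed here, not details taken from the paper.

```python
import math

def epsilon(s, tau=5.0):
    """Assumed alpha-shaped postsynaptic kernel; peaks at 1.0 when s == tau."""
    if s <= 0:
        return 0.0
    return (s / tau) * math.exp(1.0 - s / tau)

def eta(s, theta=1.0, tau_r=20.0):
    """Assumed exponential refractory kernel; negative after an output spike."""
    if s <= 0:
        return 0.0
    return -theta * math.exp(-s / tau_r)

def membrane_potential(t, input_spikes, weights, output_spikes):
    """Membrane potential at time t.

    input_spikes: one list of spike times per presynaptic neuron.
    weights: one synaptic weight per presynaptic neuron.
    output_spikes: ALL earlier spike times of this neuron; each one adds a
    refractory contribution, which models adaptation.
    """
    psp = sum(w * epsilon(t - tf)
              for w, train in zip(weights, input_spikes)
              for tf in train)
    refractoriness = sum(eta(t - tf) for tf in output_spikes)
    return psp + refractoriness

u_rest = membrane_potential(5.0, [[0.0]], [1.0], [])
u_one_spike = membrane_potential(5.0, [[0.0]], [1.0], [3.0])
u_two_spikes = membrane_potential(5.0, [[0.0]], [1.0], [1.0, 3.0])
```

A short-memory variant would pass only the most recent output spike. Here the second earlier spike lowers the potential further, so a neuron that has fired repeatedly needs stronger input to reach threshold again.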


2011 ◽  
Vol 5 (3) ◽  
pp. 231-243 ◽  
Author(s):  
Thomas Clayton ◽  
Katherine Cameron ◽  
Bruce R. Rae ◽  
Nancy Sabatier ◽  
Edoardo Charbon ◽  
...  

Scholarpedia ◽  
2008 ◽  
Vol 3 (12) ◽  
pp. 1343 ◽  
Author(s):  
Wulfram Gerstner
