Reinforcement Learning With Modulated Spike Timing–Dependent Synaptic Plasticity

2007 ◽  
Vol 98 (6) ◽  
pp. 3648-3665 ◽  
Author(s):  
Michael A. Farries ◽  
Adrienne L. Fairhall

Spike timing–dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general-purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus, our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.
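The training loop this abstract describes can be illustrated with a short sketch. The version below is a hypothetical eligibility-trace formulation in Python: pair-based STDP accumulates a per-synapse eligibility over a trial, and a scalar performance signal then gates the actual weight change. The Poisson inputs, the crude stochastic output unit, and every parameter are illustrative stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not the paper's values)
T = 1000                        # trial length in 1 ms steps
n_pre = 20                      # presynaptic neurons
tau = 20.0                      # STDP trace time constant (ms)
a_plus, a_minus = 1.0, 1.05     # potentiation / depression amplitudes
lr = 0.01                       # learning rate

w = rng.uniform(0.2, 0.8, n_pre)
target = rng.random(T) < 0.02   # an arbitrarily selected target spike train

for trial in range(200):
    x_pre = np.zeros(n_pre)     # decaying traces of recent presynaptic spikes
    x_post = 0.0                # decaying trace of recent postsynaptic spikes
    elig = np.zeros(n_pre)      # per-synapse eligibility accumulated this trial
    hits = 0

    for t in range(T):
        pre = rng.random(n_pre) < 0.05                        # Poisson input spikes
        post = rng.random() < min(1.0, float(w @ pre) * 0.3)  # crude stochastic unit

        x_pre = x_pre * (1.0 - 1.0 / tau) + pre
        x_post = x_post * (1.0 - 1.0 / tau) + post

        # STDP-shaped eligibility: pre-then-post potentiates, post-then-pre depresses
        elig += a_plus * x_pre * post - a_minus * x_post * pre
        hits += int(post == target[t])

    # a single scalar performance signal gates the weight change after the trial
    reward = hits / T - 0.95                # success rate minus a rough baseline
    w = np.clip(w + lr * reward * elig, 0.0, 1.0)
```

Over many trials, synapses whose STDP-shaped eligibility correlates with above-baseline performance are strengthened, which is how a single scalar reward can steer the output toward an arbitrarily selected spike train.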

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters that regulate the synaptic response, rather than the synaptic weights themselves. This is done using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the network's output spike train and a reference target train. Results show that the network captures the required dynamics and that the proposed framework can indeed realize an integrated version of Hebbian and RL rules. The framework is tractable and computationally inexpensive, is applicable to a wide class of synaptic models, and is not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of signal-processing applications.
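A minimal sketch of the update machinery described here, assuming a single scalar stand-in for the hidden synaptic parameter, a van Rossum-style filter distance for the reward, and a toy "network" whose output statistics simply depend on that parameter; the XOR task itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_distance(s1, s2, tau=10.0):
    """Van Rossum-style distance: exponentially filter both trains and compare."""
    kernel = np.exp(-np.arange(100) / tau)
    f1 = np.convolve(s1, kernel)[: len(s1)]
    f2 = np.convolve(s2, kernel)[: len(s2)]
    return float(np.sqrt(np.mean((f1 - f2) ** 2)))

T = 500
target = (rng.random(T) < 0.03).astype(float)   # reference target spike train
theta = 0.2          # hidden synaptic model parameter (a scalar stand-in)
eta = 0.05           # update step size
prev_reward = None

for trial in range(100):
    # stand-in for the spiking network: its output statistics depend on theta
    out = (rng.random(T) < 0.06 * theta).astype(float)
    reward = -train_distance(out, target)       # closer to target -> higher reward

    if prev_reward is not None:
        td = reward - prev_reward               # temporal difference in the reward
        # parameter update gated by both the sign and the value of the TD
        theta = float(np.clip(theta + eta * np.sign(td) * min(abs(td), 1.0),
                              0.01, 1.0))
    prev_reward = reward
```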


2021 ◽  
Vol 17 (9) ◽  
pp. e1009353 ◽
Author(s):  
Nimrod Sherf ◽  
Maoz Shamir

Rats and mice use their whiskers to probe the environment. By rhythmically swiping their whiskers back and forth, they can detect the existence of an object, locate it, and identify its texture. Localization can be accomplished by inferring the whisker's position. Rhythmic neurons that track the phase of the whisking cycle encode information about the azimuthal location of the whisker. These neurons are characterized by preferred phases of firing that are narrowly distributed. Consequently, pooling the rhythmic signal from several upstream neurons is expected to result in a much narrower distribution of preferred phases in the downstream population, which, however, has not been observed empirically. Here, we show how spike timing dependent plasticity (STDP) can provide a solution to this conundrum. In a modeling study, we investigated the effect of STDP on the utility of a neural population to transmit rhythmic information downstream. We found that, under a wide range of parameters, STDP facilitated the transfer of rhythmic information even though all the synaptic weights remained dynamic. As a result, the preferred phase of the downstream neuron was not fixed but drifted in time at a velocity that depended on the preferred phase itself, thus inducing a distribution of preferred phases. We further analyzed how the STDP rule governs the distribution of preferred phases in the downstream population. This link between the STDP rule and the distribution of preferred phases constitutes a natural test for our theory.
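The drift mechanism can be illustrated with a reduced sketch: take the downstream preferred phase to be the angle of the weighted phasor sum of the upstream preferred phases, and summarize STDP by an assumed phase-domain kernel. The kernel shape, amplitudes, and population parameters below are illustrative, not the paper's derived quantities.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 100
phi = rng.normal(0.0, 0.3, N) % (2 * np.pi)   # narrowly distributed preferred phases
w = np.full(N, 0.5)                           # dynamic synaptic weights

def downstream_phase(w, phi):
    """Preferred phase of the downstream neuron: angle of the weighted phasor sum."""
    return float(np.angle(np.sum(w * np.exp(1j * phi))))

def dw(dphi, A_plus=0.01, A_minus=0.012, kappa=2.0):
    """Assumed phase-domain STDP kernel: inputs slightly leading the downstream
    phase are potentiated, lagging inputs are depressed."""
    lead = np.sin(dphi) > 0
    return np.where(lead, A_plus, -A_minus) * np.exp(kappa * (np.cos(dphi) - 1.0))

history = []
for step in range(2000):
    psi = downstream_phase(w, phi)
    w = np.clip(w + dw(phi - psi), 0.0, 1.0)  # weights stay dynamic, never freeze
    history.append(psi)

drift = np.unwrap(history)
print("net drift of the downstream preferred phase:", drift[-1] - drift[0])
```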


2006 ◽  
Vol 18 (10) ◽  
pp. 2414-2464 ◽  
Author(s):  
Peter A. Appleby ◽  
Terry Elliott

In earlier work we presented a stochastic model of spike-timing-dependent plasticity (STDP) in which STDP emerges only at the level of temporal or spatial synaptic ensembles. We derived the two-spike interaction function from this model and showed that it exhibits an STDP-like form. Here, we extend this work by examining the general n-spike interaction functions that may be derived from the model. A comparison between the two-spike interaction function and the higher-order interaction functions reveals profound differences. In particular, we show that the two-spike interaction function cannot support stable, competitive synaptic plasticity, such as that seen during neuronal development, without including modifications designed specifically to stabilize its behavior. In contrast, we show that all the higher-order interaction functions exhibit a fixed-point structure consistent with the presence of competitive synaptic dynamics. This difference originates in the unification of our proposed “switch” mechanism for synaptic plasticity, coupling synaptic depression and synaptic potentiation processes together. While three or more spikes are required to probe this coupling, two spikes can never do so. We conclude that this coupling is critical to the presence of competitive dynamics and that multispike interactions are therefore vital to understanding synaptic competition.
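A toy caricature of such a switch (not the authors' exact model) makes the argument concrete: a single shared element can be armed for potentiation by a presynaptic spike or for depression by a postsynaptic spike, so the two processes are coupled through one resource. A spike pair only ever probes one arm-and-fire sequence, whereas a third spike's effect depends on whether the switch has already been consumed. All rates and amplitudes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

lam = 1.0 / 20.0   # assumed switch relaxation rate back to rest (per ms)

def run(spikes):
    """spikes: list of (time_ms, 'pre' | 'post'); returns the net weight change."""
    state, t_last, dw = 0, 0.0, 0.0   # state: 0 rest, +1 LTP-armed, -1 LTD-armed
    for t, kind in spikes:
        # the armed switch stochastically relaxes to rest between spikes
        if state != 0 and rng.random() > np.exp(-lam * (t - t_last)):
            state = 0
        if kind == 'pre':
            if state == -1:           # depression-armed switch fires: LTD
                dw, state = dw - 1.0, 0
            else:                     # arm the shared switch for potentiation
                state = +1
        else:
            if state == +1:           # potentiation-armed switch fires: LTP
                dw, state = dw + 1.0, 0
            else:                     # arm the shared switch for depression
                state = -1
        t_last = t
    return dw

def mean_dw(spikes, n=10000):
    return float(np.mean([run(spikes) for _ in range(n)]))

# The two-spike interaction function is STDP-like on its own...
print("pre->post:", mean_dw([(0, 'pre'), (10, 'post')]))    # net LTP
print("post->pre:", mean_dw([(0, 'post'), (10, 'pre')]))    # net LTD
# ...but only a third spike probes the coupling through the shared switch:
print("pre->post->pre:", mean_dw([(0, 'pre'), (10, 'post'), (20, 'pre')]))
```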


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1031 ◽
Author(s):  
Joseba Gorospe ◽  
Rubén Mulero ◽  
Olatz Arbelaitz ◽  
Javier Muguerza ◽  
Miguel Ángel Antón

Deep learning techniques are increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the growing amount of data available from the digitalisation of society in general and of industry in particular. In addition, the emergence of edge computing, which aims to bring artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without transferring all of the data to centralised servers. Combining these two concepts can yield systems capable of making correct decisions and acting on them immediately and in situ. However, the low computational capacity of embedded systems greatly hinders this integration, so the ability to deploy such models on a wide range of microcontrollers would be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein show that the proposed system is competitive when compared with other commercial systems.
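The host-side half of such a pipeline might look like the sketch below, which assumes a small Keras model: convert it to a TensorFlow Lite flatbuffer with post-training quantisation, then emit it as a C array for an Mbed OS firmware image to compile in. The model architecture, the calibration data, and the file names are illustrative, not taken from the paper.

```python
import numpy as np
import tensorflow as tf

# A small stand-in model; the real model would come from the target application
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
x = np.random.rand(256, 16).astype(np.float32)
y = (x.sum(axis=1) > 8).astype(np.int32)
model.fit(x, y, epochs=3, verbose=0)

def rep_data():
    for i in range(64):                  # calibration samples for quantisation
        yield [x[i:i + 1]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data
tflite_model = converter.convert()

# Emit the flatbuffer as a C array, the usual way to embed it in firmware
hex_bytes = ', '.join(f'0x{b:02x}' for b in tflite_model)
with open('model_data.h', 'w') as f:
    f.write(f'const unsigned char g_model[] = {{{hex_bytes}}};\n')
    f.write(f'const unsigned int g_model_len = {len(tflite_model)};\n')
```

On the device side, the generated header would be consumed by the TensorFlow Lite for Microcontrollers interpreter; that C++ side is omitted here.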


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch, when the learner's goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations, querying the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
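The active-query step can be sketched as follows, with hypothetical stand-ins for the demonstrator and the goal-conditioned policy; the disagreement measure (a squared action gap) and the softmax temperature are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-ins: in the paper these would be the demonstrator and the
# learned goal-conditioned policy evaluated on (or around) each goal.
def expert_action(goal):
    return np.tanh(goal)

def policy_action(goal):
    noise = rng.normal(0.0, 0.1 + float(np.abs(goal).mean()), size=goal.shape)
    return np.tanh(goal) + noise

goals = rng.uniform(-1.0, 1.0, size=(100, 3))   # candidate goals

# disagreement between expert and policy per goal: here a squared action gap
disagreement = np.array([
    float(np.mean((expert_action(g) - policy_action(g)) ** 2)) for g in goals
])

# prioritized sampling: goals with the largest disagreement are queried more,
# so the demonstrator is consulted only in hard-to-learn regions
temperature = 0.5
priority = np.exp(disagreement / temperature)
probs = priority / priority.sum()
queried = rng.choice(len(goals), size=10, replace=False, p=probs)
print("goals selected for active demonstration queries:", sorted(queried.tolist()))
```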

