Reinforcement Learning With Modulated Spike Timing–Dependent Synaptic Plasticity

2007 ◽  
Vol 98 (6) ◽  
pp. 3648-3665 ◽  
Author(s):  
Michael A. Farries ◽  
Adrienne L. Fairhall

Spike timing–dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general-purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus, our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.
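The training loop this abstract describes can be illustrated with a short sketch. The version below is a hypothetical eligibility-trace formulation in Python: pair-based STDP accumulates a per-synapse eligibility over a trial, and a scalar performance signal then gates the actual weight change. The Poisson inputs, the crude stochastic output unit, and every parameter are illustrative stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not the paper's values)
T = 1000                        # trial length in 1 ms steps
n_pre = 20                      # presynaptic neurons
tau = 20.0                      # STDP trace time constant (ms)
a_plus, a_minus = 1.0, 1.05     # potentiation / depression amplitudes
lr = 0.01                       # learning rate

w = rng.uniform(0.2, 0.8, n_pre)
target = rng.random(T) < 0.02   # an arbitrarily selected target spike train

for trial in range(200):
    x_pre = np.zeros(n_pre)     # decaying traces of recent presynaptic spikes
    x_post = 0.0                # decaying trace of recent postsynaptic spikes
    elig = np.zeros(n_pre)      # per-synapse eligibility accumulated this trial
    hits = 0

    for t in range(T):
        pre = rng.random(n_pre) < 0.05                        # Poisson input spikes
        post = rng.random() < min(1.0, float(w @ pre) * 0.3)  # crude stochastic unit

        x_pre = x_pre * (1.0 - 1.0 / tau) + pre
        x_post = x_post * (1.0 - 1.0 / tau) + post

        # STDP-shaped eligibility: pre-then-post potentiates, post-then-pre depresses
        elig += a_plus * x_pre * post - a_minus * x_post * pre
        hits += int(post == target[t])

    # a single scalar performance signal gates the weight change after the trial
    reward = hits / T - 0.95                # success rate minus a rough baseline
    w = np.clip(w + lr * reward * elig, 0.0, 1.0)
```

Over many trials, synapses whose STDP-shaped eligibility correlates with above-baseline performance are strengthened, which is how a single scalar reward can steer the output toward an arbitrarily selected spike train.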

2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters that regulate the synaptic response, rather than the synaptic weights themselves. This is done using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested on learning the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the network's output spike train and a reference target train. Results show that the network captures the required dynamics and that the proposed framework can indeed realize an integrated version of Hebbian and RL rules. The framework is tractable and computationally inexpensive, is applicable to a wide class of synaptic models, and is not restricted to the neural representation used here. This generality, along with the reported results, supports adopting the introduced approach to benefit from biologically plausible synaptic models in a wide range of signal-processing applications.
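A minimal sketch of the update machinery described here, assuming a single scalar stand-in for the hidden synaptic parameter, a van Rossum-style filter distance for the reward, and a toy "network" whose output statistics simply depend on that parameter; the XOR task itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_distance(s1, s2, tau=10.0):
    """Van Rossum-style distance: exponentially filter both trains and compare."""
    kernel = np.exp(-np.arange(100) / tau)
    f1 = np.convolve(s1, kernel)[: len(s1)]
    f2 = np.convolve(s2, kernel)[: len(s2)]
    return float(np.sqrt(np.mean((f1 - f2) ** 2)))

T = 500
target = (rng.random(T) < 0.03).astype(float)   # reference target spike train
theta = 0.2          # hidden synaptic model parameter (a scalar stand-in)
eta = 0.05           # update step size
prev_reward = None

for trial in range(100):
    # stand-in for the spiking network: its output statistics depend on theta
    out = (rng.random(T) < 0.06 * theta).astype(float)
    reward = -train_distance(out, target)       # closer to target -> higher reward

    if prev_reward is not None:
        td = reward - prev_reward               # temporal difference in the reward
        # parameter update gated by both the sign and the value of the TD
        theta = float(np.clip(theta + eta * np.sign(td) * min(abs(td), 1.0),
                              0.01, 1.0))
    prev_reward = reward
```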


2021 ◽  
Vol 17 (9) ◽  
pp. e1009353 ◽
Author(s):  
Nimrod Sherf ◽  
Maoz Shamir

Rats and mice use their whiskers to probe the environment. By rhythmically swiping their whiskers back and forth, they can detect the existence of an object, locate it, and identify its texture. Localization can be accomplished by inferring the whisker's position. Rhythmic neurons that track the phase of the whisking cycle encode information about the azimuthal location of the whisker. These neurons are characterized by preferred phases of firing that are narrowly distributed. Consequently, pooling the rhythmic signal from several upstream neurons is expected to result in a much narrower distribution of preferred phases in the downstream population, which, however, has not been observed empirically. Here, we show how spike timing dependent plasticity (STDP) can provide a solution to this conundrum. In a modeling study, we investigated the effect of STDP on the utility of a neural population to transmit rhythmic information downstream. We found that, under a wide range of parameters, STDP facilitated the transfer of rhythmic information even though all the synaptic weights remained dynamic. As a result, the preferred phase of the downstream neuron was not fixed but drifted in time at a velocity that depended on the preferred phase itself, thus inducing a distribution of preferred phases. We further analyzed how the STDP rule governs the distribution of preferred phases in the downstream population. This link between the STDP rule and the distribution of preferred phases constitutes a natural test for our theory.
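The drift mechanism can be illustrated with a reduced sketch: take the downstream preferred phase to be the angle of the weighted phasor sum of the upstream preferred phases, and summarize STDP by an assumed phase-domain kernel. The kernel shape, amplitudes, and population parameters below are illustrative, not the paper's derived quantities.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 100
phi = rng.normal(0.0, 0.3, N) % (2 * np.pi)   # narrowly distributed preferred phases
w = np.full(N, 0.5)                           # dynamic synaptic weights

def downstream_phase(w, phi):
    """Preferred phase of the downstream neuron: angle of the weighted phasor sum."""
    return float(np.angle(np.sum(w * np.exp(1j * phi))))

def dw(dphi, A_plus=0.01, A_minus=0.012, kappa=2.0):
    """Assumed phase-domain STDP kernel: inputs slightly leading the downstream
    phase are potentiated, lagging inputs are depressed."""
    lead = np.sin(dphi) > 0
    return np.where(lead, A_plus, -A_minus) * np.exp(kappa * (np.cos(dphi) - 1.0))

history = []
for step in range(2000):
    psi = downstream_phase(w, phi)
    w = np.clip(w + dw(phi - psi), 0.0, 1.0)  # weights stay dynamic, never freeze
    history.append(psi)

drift = np.unwrap(history)
print("net drift of the downstream preferred phase:", drift[-1] - drift[0])
```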


2006 ◽  
Vol 18 (10) ◽  
pp. 2414-2464 ◽  
Author(s):  
Peter A. Appleby ◽  
Terry Elliott

In earlier work we presented a stochastic model of spike-timing-dependent plasticity (STDP) in which STDP emerges only at the level of temporal or spatial synaptic ensembles. We derived the two-spike interaction function from this model and showed that it exhibits an STDP-like form. Here, we extend this work by examining the general n-spike interaction functions that may be derived from the model. A comparison between the two-spike interaction function and the higher-order interaction functions reveals profound differences. In particular, we show that the two-spike interaction function cannot support stable, competitive synaptic plasticity, such as that seen during neuronal development, without including modifications designed specifically to stabilize its behavior. In contrast, we show that all the higher-order interaction functions exhibit a fixed-point structure consistent with the presence of competitive synaptic dynamics. This difference originates in the unification of our proposed “switch” mechanism for synaptic plasticity, coupling synaptic depression and synaptic potentiation processes together. While three or more spikes are required to probe this coupling, two spikes can never do so. We conclude that this coupling is critical to the presence of competitive dynamics and that multispike interactions are therefore vital to understanding synaptic competition.
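A toy caricature of such a switch (not the authors' exact model) makes the argument concrete: a single shared element can be armed for potentiation by a presynaptic spike or for depression by a postsynaptic spike, so the two processes are coupled through one resource. A spike pair only ever probes one arm-and-fire sequence, whereas a third spike's effect depends on whether the switch has already been consumed. All rates and amplitudes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

lam = 1.0 / 20.0   # assumed switch relaxation rate back to rest (per ms)

def run(spikes):
    """spikes: list of (time_ms, 'pre' | 'post'); returns the net weight change."""
    state, t_last, dw = 0, 0.0, 0.0   # state: 0 rest, +1 LTP-armed, -1 LTD-armed
    for t, kind in spikes:
        # the armed switch stochastically relaxes to rest between spikes
        if state != 0 and rng.random() > np.exp(-lam * (t - t_last)):
            state = 0
        if kind == 'pre':
            if state == -1:           # depression-armed switch fires: LTD
                dw, state = dw - 1.0, 0
            else:                     # arm the shared switch for potentiation
                state = +1
        else:
            if state == +1:           # potentiation-armed switch fires: LTP
                dw, state = dw + 1.0, 0
            else:                     # arm the shared switch for depression
                state = -1
        t_last = t
    return dw

def mean_dw(spikes, n=10000):
    return float(np.mean([run(spikes) for _ in range(n)]))

# The two-spike interaction function is STDP-like on its own...
print("pre->post:", mean_dw([(0, 'pre'), (10, 'post')]))    # net LTP
print("post->pre:", mean_dw([(0, 'post'), (10, 'pre')]))    # net LTD
# ...but only a third spike probes the coupling through the shared switch:
print("pre->post->pre:", mean_dw([(0, 'pre'), (10, 'post'), (20, 'pre')]))
```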


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1031 ◽
Author(s):  
Joseba Gorospe ◽  
Rubén Mulero ◽  
Olatz Arbelaitz ◽  
Javier Muguerza ◽  
Miguel Ángel Antón

Deep learning techniques are increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the growing amount of data available from the digitalisation of society in general and of industry in particular. In addition, the emergence of edge computing, which aims to bring artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without transferring all of the data to centralised servers. Combining these two concepts can yield systems capable of making correct decisions and acting on them immediately and in situ. However, the low computational capacity of embedded systems greatly hinders this integration, so the ability to deploy such models on a wide range of microcontrollers would be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein show that the proposed system is competitive when compared with other commercial systems.
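The host-side half of such a pipeline might look like the sketch below, which assumes a small Keras model: convert it to a TensorFlow Lite flatbuffer with post-training quantisation, then emit it as a C array for an Mbed OS firmware image to compile in. The model architecture, the calibration data, and the file names are illustrative, not taken from the paper.

```python
import numpy as np
import tensorflow as tf

# A small stand-in model; the real model would come from the target application
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
x = np.random.rand(256, 16).astype(np.float32)
y = (x.sum(axis=1) > 8).astype(np.int32)
model.fit(x, y, epochs=3, verbose=0)

def rep_data():
    for i in range(64):                  # calibration samples for quantisation
        yield [x[i:i + 1]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data
tflite_model = converter.convert()

# Emit the flatbuffer as a C array, the usual way to embed it in firmware
hex_bytes = ', '.join(f'0x{b:02x}' for b in tflite_model)
with open('model_data.h', 'w') as f:
    f.write(f'const unsigned char g_model[] = {{{hex_bytes}}};\n')
    f.write(f'const unsigned int g_model_len = {len(tflite_model)};\n')
```

On the device side, the generated header would be consumed by the TensorFlow Lite for Microcontrollers interpreter; that C++ side is omitted here.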


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch, when the learner's goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations, querying the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
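The active-query step can be sketched as follows, with hypothetical stand-ins for the demonstrator and the goal-conditioned policy; the disagreement measure (a squared action gap) and the softmax temperature are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-ins: in the paper these would be the demonstrator and the
# learned goal-conditioned policy evaluated on (or around) each goal.
def expert_action(goal):
    return np.tanh(goal)

def policy_action(goal):
    noise = rng.normal(0.0, 0.1 + float(np.abs(goal).mean()), size=goal.shape)
    return np.tanh(goal) + noise

goals = rng.uniform(-1.0, 1.0, size=(100, 3))   # candidate goals

# disagreement between expert and policy per goal: here a squared action gap
disagreement = np.array([
    float(np.mean((expert_action(g) - policy_action(g)) ** 2)) for g in goals
])

# prioritized sampling: goals with the largest disagreement are queried more,
# so the demonstrator is consulted only in hard-to-learn regions
temperature = 0.5
priority = np.exp(disagreement / temperature)
probs = priority / priority.sum()
queried = rng.choice(len(goals), size=10, replace=False, p=probs)
print("goals selected for active demonstration queries:", sorted(queried.tolist()))
```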

