Kernel Temporal Difference based Reinforcement Learning for Brain Machine Interfaces

Author(s):  
Xiang Shen ◽  
Xiang Zhang ◽  
Yiwen Wang
Author(s):  
Dómhnall J. Jennings ◽  
Eduardo Alonso ◽  
Esther Mondragón ◽  
Charlotte Bonardi

Standard associative learning theories typically fail to conceptualise the temporal properties of a stimulus, and hence cannot easily make predictions about the effects such properties might have on the magnitude of conditioning phenomena. Despite this, intuitively we might expect the temporal properties of a stimulus that is paired with some outcome to be important. In particular, no previous research has addressed the way that fixed- or variable-duration stimuli can affect overshadowing. In this chapter we report results showing that the degree of overshadowing depends on the distribution form (fixed or variable) of the overshadowing stimulus, and argue that conditioning is weaker under conditions of temporal uncertainty. These results are discussed in terms of models of conditioning and timing. We conclude that the temporal difference model, which has been extensively applied to the reinforcement learning problem in machine learning, accounts for the key findings of our study.
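The temporal difference account invoked here can be made concrete with a minimal complete-serial-compound TD(0) simulation. The representation, parameters, and trial structure below are illustrative assumptions, not the authors' experimental model; the sketch only shows how temporal uncertainty dilutes the prediction attached to any single time point within the stimulus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal complete-serial-compound TD(0) sketch (an illustrative assumption,
# not the chapter's model). A stimulus is represented by one feature per time
# step within the trial; the outcome (US) arrives at stimulus offset. With a
# variable offset, the learned prediction is spread over several time steps,
# so the value attached to any one time point is smaller than in the fixed case.

def run_trials(offset_times, n_steps=20, alpha=0.1, gamma=0.9, n_trials=500):
    w = np.zeros(n_steps)                    # w[t] = predicted value at step t
    for _ in range(n_trials):
        t_us = rng.choice(offset_times)      # US time on this trial
        for t in range(n_steps - 1):
            r = 1.0 if t + 1 == t_us else 0.0
            delta = r + gamma * w[t + 1] - w[t]   # TD error
            w[t] += alpha * delta
    return w

w_fixed = run_trials(offset_times=[10])            # fixed-duration stimulus
w_variable = run_trials(offset_times=[5, 10, 15])  # variable duration, same mean

print("peak prediction, fixed stimulus:   ", w_fixed.max().round(3))
print("peak prediction, variable stimulus:", w_variable.max().round(3))
```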


2018 ◽  
Author(s):  
Minryung R. Song ◽  
Sang Wan Lee

Dopamine activity may transition between two patterns: phasic responses to reward-predicting cues and ramping activity arising when an agent approaches the reward. However, when and why dopamine activity transitions between these modes is not understood. We hypothesize that the transition between ramping and phasic patterns reflects resource allocation, which addresses the task dimensionality problem during reinforcement learning (RL). By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent narrows down candidate stimuli for the task; the opposite occurs when the agent needs to re-learn candidate stimuli due to a value change. These results lend insight into how dopamine deals with the tradeoff between cognitive resources and task dimensionality during RL.
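For orientation, the two signal shapes contrasted above already fall out of a standard TD(0) model; the sketch below (task layout and parameters are my own illustrative assumptions, not the authors' modified model) shows the ramping state value as reward approaches and the phasic prediction error at an unpredictable cue.

```python
import numpy as np

# Minimal TD(0) sketch (not the authors' model). States index time since an
# unpredictable cue; reward arrives T steps later. After learning, the state
# value ramps up toward the reward, while the TD error is phasic: a burst at
# cue onset, because the pre-cue value is ~0 when cue timing is unpredictable.

T = 10                        # steps from cue to reward
alpha, gamma = 0.1, 0.9
V = np.zeros(T + 1)           # V[t] = value t steps after the cue; V[T] terminal

for _ in range(3000):         # trials
    for t in range(T):
        r = 1.0 if t == T - 1 else 0.0           # reward on the final transition
        delta = r + gamma * V[t + 1] - V[t]      # TD error
        V[t] += alpha * delta

phasic_at_cue = gamma * V[0] - 0.0    # TD error on the (zero-value) ITI -> cue step
print("ramping value V[0..T-1]:", np.round(V[:T], 2))
print("phasic TD error at cue :", round(phasic_at_cue, 2))
```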


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity involved in training an agent in a real-time environment, e.g., using the Internet of Things (IoT), reinforcement learning (RL) using a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted on an online basis without prior knowledge and complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be trained competently in real-world applications. The proposed model combines basic RL algorithms in online and offline use, based on the empirical balance of bias and variance. Specifically, we exploited the balance between the offline Monte Carlo (MC) technique and online temporal difference (TD) learning, with an on-policy method (state-action-reward-state-action, Sarsa) and an off-policy method (Q-learning), within a DRL framework. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrated that, for a simple control task, the balance between online and offline use without an on- and off-policy distinction yields satisfactory results. However, in complex tasks, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
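The core idea of blending an offline MC return with an online TD target can be sketched in a few lines. The toy chain task, the mixing weight `beta`, and the tabular Q-learning setting below are my own simplifying assumptions, not the paper's algorithm or its deep Q-network implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hedged sketch of an MC/TD blended update target (a simplification, not the
# paper's exact method): after each episode, the target for Q(s, a) mixes the
# full Monte Carlo return (offline, unbiased but high variance) with the
# one-step Q-learning target (bootstrapped, lower variance). `beta` sets the
# bias-variance trade-off.

n_states, n_actions = 6, 2          # toy chain: action 1 moves right, 0 moves left
gamma, alpha, beta, eps = 0.95, 0.1, 0.5, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1            # next state, reward, done

for _ in range(500):                             # episodes
    s, traj, done = 0, [], False
    while not done:
        if rng.random() < eps:
            a = int(rng.integers(n_actions))     # explore
        else:
            q = Q[s]
            a = int(rng.choice(np.flatnonzero(q == q.max())))  # greedy, random tie-break
        s2, r, done = step(s, a)
        traj.append((s, a, r, s2, done))
        s = s2
    G = 0.0
    for s, a, r, s2, done in reversed(traj):     # walk backwards to build MC returns
        G = r + gamma * G                                   # Monte Carlo return
        td_target = r + (0.0 if done else gamma * Q[s2].max())  # one-step TD target
        target = beta * G + (1.0 - beta) * td_target         # blended target
        Q[s, a] += alpha * (target - Q[s, a])

print(np.round(Q, 2))
```

Setting `beta = 0` recovers plain Q-learning and `beta = 1` recovers every-visit Monte Carlo control, which is the sense in which the mixture interpolates between the online and offline extremes.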


2017 ◽  
Vol 14 (3) ◽  
pp. 036016 ◽  
Author(s):  
Noeline W Prins ◽  
Justin C Sanchez ◽  
Abhishek Prasad

2009 ◽  
Vol 21 (4) ◽  
pp. 1173-1202 ◽  
Author(s):  
Christoph Kolodziejski ◽  
Bernd Porr ◽  
Florentin Wörgötter

In this theoretical contribution, we provide mathematical proof that two of the most important classes of network learning—correlation-based differential Hebbian learning and reward-based temporal difference learning—are asymptotically equivalent when timing the learning with a modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation-based perspective more closely related to the biophysics of neurons.
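A toy numerical check can make the claimed correspondence concrete. The serial-compound representation, the gamma = 1 limit, and the stimulus terminating at reward (standing in for the modulatory timing of plasticity) are my own simplifying assumptions; this is an illustration, not the paper's general proof.

```python
import numpy as np

# Toy illustration (a simplified sketch, not the paper's proof): with a
# serial-compound stimulus x(t) that ends at reward and gamma = 1, a
# differential Hebbian rule driven by the temporal derivative of the combined
# signal s(t) = r(t) + v(t),
#     dw ~ x(t) * [s(t+1) - s(t)],
# produces the same weight changes as the TD(0) rule
#     dw ~ x(t) * [r(t+1) + v(t+1) - v(t)].

T, reward_t, alpha, n_trials = 12, 8, 0.05, 100
r = np.zeros(T)
r[reward_t] = 1.0

def x(t):                        # tapped delay line; stimulus off at/after reward
    f = np.zeros(reward_t)
    if t < reward_t:
        f[t] = 1.0
    return f

w_td = np.zeros(reward_t)        # weights learned by TD(0)
w_dh = np.zeros(reward_t)        # weights learned by the differential Hebbian rule

for _ in range(n_trials):
    for t in range(T - 1):
        xt, xt1 = x(t), x(t + 1)
        w_td += alpha * xt * (r[t + 1] + w_td @ xt1 - w_td @ xt)
        w_dh += alpha * xt * ((r[t + 1] + w_dh @ xt1) - (r[t] + w_dh @ xt))

print("max |w_td - w_dh| =", np.abs(w_td - w_dh).max())   # ~0: identical updates
```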


2017 ◽  
Vol 263 ◽  
pp. 15-25 ◽  
Author(s):  
Manuela Ruiz-Montiel ◽  
Lawrence Mandow ◽  
José-Luis Pérez-de-la-Cruz
