eligibility trace
Recently Published Documents

TOTAL DOCUMENTS: 34 (five years: 10)
H-INDEX: 7 (five years: 1)

Sensors ◽ 2021 ◽ Vol 21 (15) ◽ pp. 5062
Author(s): Xuejing Lan, Zhifeng Tan, Tao Zou, Wenbiao Xu

This paper focuses on the trajectory-tracking guidance problem in the Terminal Area Energy Management (TAEM) phase of a Reusable Launch Vehicle (RLV). Because the state and action spaces of this guidance problem are continuous, the Continuous Actor–Critic Learning Automata (CACLA) approach is applied to construct the guidance strategy of the RLV. Two three-layer neural networks model the critic and the actor, respectively. The weight vector of the critic is updated by the model-free Temporal Difference (TD) learning algorithm, augmented with an eligibility trace and a momentum factor; the weight vector of the actor is updated based on the sign of the TD error, with Gaussian exploration applied to the actor's output. Finally, a Monte Carlo simulation and a comparison simulation demonstrate the effectiveness of the CACLA-based guidance strategy.
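
As a rough illustration of the update scheme the abstract describes, the sketch below combines a TD critic with an accumulating eligibility trace and a momentum term, and a CACLA-style actor that moves toward the explored action only when the TD error is positive. It uses linear function approximation and illustrative hyperparameters; the feature vectors, learning rates, and all variable names are assumptions, not the authors' implementation.

```python
import numpy as np

def cacla_step(w_critic, w_actor, e_trace, m_critic, phi_s, phi_s_next,
               reward, gamma=0.99, lam=0.9, alpha=1e-3, beta=1e-3,
               mu=0.8, rng=np.random.default_rng(), sigma=0.1):
    # Gaussian exploration around the actor's deterministic output.
    a_mean = w_actor @ phi_s
    a_explored = a_mean + rng.normal(0.0, sigma, size=a_mean.shape)

    # TD error from the critic's value estimates.
    delta = reward + gamma * (w_critic @ phi_s_next) - (w_critic @ phi_s)

    # Critic: accumulate the eligibility trace, then apply a
    # momentum-smoothed TD update (illustrative momentum scheme).
    e_trace = gamma * lam * e_trace + phi_s
    m_critic = mu * m_critic + alpha * delta * e_trace
    w_critic = w_critic + m_critic

    # Actor: CACLA updates on the *sign* of the TD error, moving the
    # actor toward the explored action only if it beat expectations.
    if delta > 0:
        w_actor = w_actor + beta * np.outer(a_explored - a_mean, phi_s)

    return w_critic, w_actor, e_trace, m_critic, a_explored
```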


2020 ◽ Vol 10 (1)
Author(s): Dong-Hyun Lim, Young Ju Yoon, Eunsil Her, Suehee Huh, Min Whan Jung

Although persistent neural activity has been proposed as a mechanism for maintaining an eligibility trace, direct empirical evidence for its active maintenance has been lacking. We recorded neuronal activity in the medial prefrontal cortex (mPFC) of rats performing a dynamic foraging task in which a choice must be remembered until its outcome, on the timescale of seconds, for correct credit assignment. We found that mPFC neurons maintain significant choice signals during the period between action selection and choice outcome, and that neural signals for choice, outcome, and action value converge in the mPFC when the choice outcome is revealed. Our results indicate that, in this task, the mPFC maintains the choice signals necessary for temporal credit assignment in the form of persistent neural activity. They also suggest that the mPFC might update action value by combining an actively maintained eligibility trace with action-value and outcome signals.
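
A minimal computational reading of that interpretation, with all names and the learning rate as illustrative assumptions: the maintained choice signal acts as an eligibility trace that routes the outcome-driven prediction error to the value of the remembered action.

```python
import numpy as np

def update_action_value(q_values, chosen_action, reward, alpha=0.2,
                        trace_strength=1.0):
    # Reward prediction error computed at outcome time.
    delta = reward - q_values[chosen_action]
    # The actively maintained choice signal (eligibility trace) gates
    # which action's value receives the credit.
    q_values[chosen_action] += alpha * trace_strength * delta
    return q_values

q = np.zeros(2)  # two-alternative dynamic foraging task
q = update_action_value(q, chosen_action=0, reward=1.0)
```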


Electronics ◽ 2020 ◽ Vol 9 (9) ◽ pp. 1486
Author(s): Kanghyeon Seo, Jihoon Yang

We present a differentially private actor and its eligibility trace in an actor-critic approach. The actor takes actions by interacting directly with the environment, whereas the critic estimates only state values, which are obtained through bootstrapping; the actor's parameters therefore reflect more detailed information about the sequence of taken actions than the critic's, and the same holds for their corresponding eligibility traces. It is thus necessary to preserve the privacy of the actor and its eligibility trace when training on private or sensitive data. In this paper, we confirm the applicability of differential-privacy methods to actors updated with the policy-gradient algorithm and discuss the advantages of this approach relative to differentially private critic learning. In addition, we measure the cosine similarity between the eligibility trace with differential privacy applied and the non-private eligibility trace, to analyze whether anonymity is adequately protected in the differentially private actor or critic. We conducted experiments on two synthetic examples imitating real-world problems in the medical and autonomous-navigation domains, and the results confirm the feasibility of the proposed method.
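
A hedged sketch of one standard way such privatization could be realized, DP-SGD style (clip the trace's norm, then add calibrated Gaussian noise), together with the cosine-similarity comparison the abstract mentions. The clip norm and noise multiplier below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def privatize_trace(trace, clip_norm=1.0, noise_multiplier=1.1,
                    rng=np.random.default_rng()):
    # Clip the trace's L2 norm, then add Gaussian-mechanism noise.
    scale = min(1.0, clip_norm / (np.linalg.norm(trace) + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=trace.shape)
    return trace * scale + noise

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

e = np.random.default_rng(0).normal(size=128)  # non-private trace
e_dp = privatize_trace(e)                      # privatized trace
print(cosine_similarity(e, e_dp))  # lower similarity => stronger masking
```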


2020
Author(s): Timo Oess, Marc O. Ernst, Heiko Neumann

The development of spatially registered auditory maps in the external nucleus of the inferior colliculus of young owls, and their maintenance in adult animals, is visually guided and evolves dynamically. To investigate the neural mechanisms underlying this process, we developed a model of stabilized neo-Hebbian correlative learning that is augmented by an eligibility signal and a temporal trace of activations. This three-component learning algorithm facilitates stable yet flexible formation of spatially registered auditory space maps composed of conductance-based, topographically organized neural units. Spatially aligned maps are learned for visual and auditory input stimuli that arrive in temporal and spatial registration. The reliability of visual sensory inputs can be used to regulate the learning rate in the form of an eligibility trace. We show that shifting the visual sensory inputs at the onset of learning shifts the topography of the auditory space maps accordingly. Simulation results explain why a shift of auditory maps in mature animals is possible only if corrections are induced in small steps. We conclude that learning spatially aligned auditory maps is flexibly controlled by reliable visual sensory neurons and can be formalized by a biologically plausible unsupervised learning mechanism.
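
A minimal sketch of a three-component rule of this kind, assuming a low-pass temporal trace of presynaptic activity, an eligibility signal derived from visual reliability that scales the effective learning rate, and an Oja-style stabilized Hebbian term. All symbols and constants are illustrative, not the authors' model equations.

```python
import numpy as np

def three_component_update(W, pre_trace, pre_act, post_act, reliability,
                           eta=1e-2, tau=0.2, dt=0.01):
    # Temporal trace: low-pass filter of presynaptic activity.
    pre_trace += (dt / tau) * (pre_act - pre_trace)
    # Eligibility signal: visual reliability scales the learning rate,
    # so unreliable visual input drives little map plasticity.
    eligibility = reliability
    # Stabilized Hebbian term (Oja-style decay keeps weights bounded).
    dW = eta * eligibility * (np.outer(post_act, pre_trace)
                              - (post_act ** 2)[:, None] * W)
    return W + dW, pre_trace
```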


eLife ◽ 2019 ◽ Vol 8
Author(s): Marco P Lehmann, He A Xu, Vasiliki Liakoni, Michael H Herzog, Wulfram Gerstner, ...

In many daily tasks, we make multiple decisions before reaching a goal. To learn such sequences of decisions, a mechanism that links earlier actions to later reward is necessary. Reinforcement learning (RL) theory suggests two classes of algorithms for solving this credit-assignment problem: in classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here, we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without an eligibility trace makes qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with an eligibility trace across multiple sensory modalities.
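
The qualitative contrast between the two algorithm classes can be seen in a toy simulation: after a single rewarded episode, TD(0) updates only the state adjacent to the reward, whereas TD(λ) with an eligibility trace credits the whole sequence at once. Everything here (states, step size, λ) is illustrative.

```python
import numpy as np

def one_episode(lam, alpha=0.5, gamma=1.0):
    V = np.zeros(3)   # values of states s0, s1, s2
    e = np.zeros(3)   # eligibility trace
    episode = [(0, 0.0), (1, 0.0), (2, 1.0)]  # (state, reward on leaving)
    for t, (s, r) in enumerate(episode):
        v_next = V[episode[t + 1][0]] if t + 1 < len(episode) else 0.0
        delta = r + gamma * v_next - V[s]
        e[s] += 1.0
        V += alpha * delta * e  # every eligible state shares the credit
        e *= gamma * lam
    return V

print(one_episode(lam=0.0))  # TD(0): only the state next to reward moves
print(one_episode(lam=0.9))  # TD(lambda): earlier states move in one shot
```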


2019
Author(s): Kenji Yamaguchi, Yoshitomo Maeda, Takeshi Sawada, Yusuke Iino, Mio Tajiri, ...

The temporal precision of reward-reinforcement learning is determined by the minimal time window of the reward action, known theoretically as the eligibility trace. In animal studies, however, such a minimal time window and its origin have not been well understood. Here, we used head-restrained mice to precisely control the timing of sucrose water delivered as an unconditioned stimulus (US), and found that the reinforcing effect of the US occurred only within 1 s after a short tone serving as the conditioned stimulus (CS). The conditioning required the dopamine D1 receptor and CaMKII signaling in the nucleus accumbens (NAc). The time window was not reduced by replacing the CS with optogenetic stimulation of the synaptic inputs to the NAc, in agreement with previous reports on the effective dopamine timing at NAc synapses. Our data thus suggest that the minimal reward time window is 1 s and is formed in the NAc.
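
One hypothetical way to picture this finding is an exponentially decaying synaptic eligibility trace that gates the reinforcing effect of the US: with a time constant of a few hundred milliseconds, reinforcement becomes negligible beyond roughly 1 s. The time constant below is an assumption chosen for illustration, not a measured value from the paper.

```python
import numpy as np

def reinforcement_strength(cs_us_interval_s, tau=0.4):
    # Trace amplitude at US time; effectively zero past ~1 s for this tau.
    return np.exp(-cs_us_interval_s / tau)

for dt in (0.2, 0.5, 1.0, 2.0):
    print(f"CS-US interval {dt:.1f} s -> relative effect "
          f"{reinforcement_strength(dt):.3f}")
```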

