Joint On-line Learning of a Zero-shot Spoken Semantic Parser and a Reinforcement Learning Dialogue Manager

Learning Classifier Systems (LCSs) are rule-based adaptive systems that have both Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical on-line learning. With the aim of establishing a common theoretical basis between LCSs and RL algorithms to share each field's findings, a detailed analysis was performed to compare the learning processes of these two approaches. Based on our previous work on deriving an equivalence between the Zeroth-level Classifier System (ZCS) and Q-learning with Function Approximation (FA), this paper extends the analysis to the influence of actually applying the conditions for this equivalence. Comparative experiments have revealed interesting implications: (1) ZCS's original parameter, the deduction rate, plays a role in stabilizing the action selection, but (2) from the Reinforcement Learning perspective, such a process inhibits the ability to accurately estimate values for the entire state-action space, thus limiting the performance of ZCS in problems requiring accurate value estimation.

Combining Local and Global Direct Derivative-Free Optimization for Reinforcement Learning

Cybernetics and Information Technologies ◽ 10.2478/cait-2012-0021 ◽ 2012 ◽ Vol 12 (3) ◽ pp. 53-65 ◽ Cited By ~ 5

Author(s): Matteo Leonetti ◽ Petar Kormushev ◽ Simone Sagratella

Keyword(s): Reinforcement Learning ◽ Autonomous Underwater Vehicle ◽ Underwater Vehicle ◽ Target Area ◽ Policy Space ◽ Derivative Free Optimization ◽ Derivative Free ◽ Actual Environment ◽ On Line ◽ On Line Learning

Abstract: We consider the problem of optimization in policy space for reinforcement learning. While a plethora of methods have been applied to this problem, only a narrow category of them has proved feasible in robotics. We consider the peculiar characteristics of reinforcement learning in robotics, and devise a combination of two algorithms from the derivative-free optimization literature. The proposed combination is well suited for robotics, as it involves both off-line learning in simulation and on-line learning in the real environment. We demonstrate our approach on a real-world task, where an Autonomous Underwater Vehicle has to survey a target area under potentially unknown environmental conditions. We start from a given controller, which can perform the task under foreseeable conditions, and make it adaptive to the actual environment.

Patterns of Implicit Learning Below the Level of Conscious Knowledge

Journal of Psychophysiology ◽ 10.1027/0269-8803/a000018 ◽ 2010 ◽ Vol 24 (2) ◽ pp. 91-101 ◽ Cited By ~ 1

Author(s): Juliana Yordanova ◽ Rolf Verleger ◽ Ullrich Wagner ◽ Vasil Kolev

Keyword(s): Implicit Learning ◽ Explicit Knowledge ◽ Implicit Knowledge ◽ Knowledge Generation ◽ Sleep Stages ◽ Implicit Processing ◽ Implicit Processes ◽ Late Night ◽ On Line ◽ On Line Learning

The objective of the present study was to evaluate patterns of implicit processing in a task where the acquisition of explicit and implicit knowledge occurs simultaneously. The number reduction task (NRT) was used because it has two levels of organization, overt and covert, where the covert level of processing is associated with implicit associative and implicit procedural learning. One aim was to compare these two types of implicit processes in the NRT when sleep was or was not introduced between initial formation of task representations and subsequent NRT processing. To assess the effects of different sleep stages, two sleep groups (early- and late-night groups) were used, in which initial training on the task was separated from the subsequent retest by 3 h of predominantly slow wave sleep (SWS) or rapid eye movement (REM) sleep, respectively. In two no-sleep groups, no interval was introduced between initial and subsequent NRT performance. A second aim was to evaluate the interaction between procedural and associative implicit learning in the NRT. Implicit associative learning was measured by the difference between the speed of responses that could or could not be predicted by the covert abstract regularity of the task. Implicit procedural on-line learning was measured by the practice-based increase in speed of performance with time on task. Major results indicated that late-night sleep produced a substantial facilitation of implicit associations without modifying individual ability for explicit knowledge generation or for procedural on-line learning. This was evidenced by the higher rate of subjects who gained implicit knowledge of abstract task structure in the late-night group relative to the early-night and no-sleep groups. Independently of sleep, gain of implicit associative knowledge was accompanied by a relative slowing of responses to unpredictable items, suggesting reciprocal interactions between associative and motor procedural processes within the implicit system. These observations provide evidence for the separability and interactions of different patterns of processing within implicit memory.
