A Menu of Designs for Reinforcement Learning Over Time

2020 ◽  
Author(s):  
Than Le

In this chapter, we argue that competent autonomous vehicles should be able to analyze both structured and unstructured environments and to localize themselves relative to surrounding objects in settings where GPS, RFID, or similar means cannot provide sufficient location information. Reliable SLAM is the most basic prerequisite for any further artificial-intelligence tasks of an autonomous mobile robot. The goal of this chapter is to simulate a SLAM process on an advanced software platform: the model represents the system itself, whereas the simulation represents the operation of the system over time. The software architecture, an open-source meta-operating system that provides extensive tools for robotics problems, lets us concentrate on the substantive work with a minimum of incidental effort.

Specifically, we argue that advanced vehicles should be able to analyze structured and unstructured environments by solving search-based planning problems, and we then discuss reinforcement learning-based models for trajectory optimization in autonomous systems.
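The chapter describes search-based planning only at a high level. As an illustrative sketch of what is usually meant by the term, here is a minimal A* search on a 4-connected occupancy grid; the grid, start/goal coordinates, unit step costs, and Manhattan heuristic are assumptions for the example, not details taken from the chapter.

```python
# Minimal A* on a 4-connected occupancy grid -- an illustrative sketch of
# search-based planning; the map and costs are invented for this example.
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = occupied; start/goal: (row, col)."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, None)]   # (f = g + h, g, node, parent)
    came_from, g_cost = {}, {start: 0}
    while frontier:
        _, g, node, parent = heapq.heappop(frontier)
        if node in came_from:                 # already expanded with a better path
            continue
        came_from[node] = parent
        if node == goal:                      # reconstruct path by walking parents
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc), node))
    return None  # no path exists

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))  # detours around the wall via the right column
```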


2010 ◽  
Vol 16 (1) ◽  
pp. 21-37 ◽  
Author(s):  
Chris Marriott ◽  
James Parker ◽  
Jörg Denzinger

We study the effects of an imitation mechanism on a population of animats capable of individual ontogenetic learning. An urge to imitate others augments a network-based reinforcement learning strategy used in the control system of the animats. We test populations of animats with imitation against populations without for their ability to find, and maintain over generations, successful foraging behavior in an environment containing three necessary resources: food, water, and shelter. We conclude that even simple imitation mechanisms are effective at increasing the frequency of success when measured over time and over populations of animats.
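The abstract does not give the control equations, so the following is only a rough sketch of how an urge to imitate might augment reinforcement-based action selection: with some probability the animat copies an action it observed a neighbor take, and otherwise falls back on its learned policy. The tabular Q-learner, the `imitation_urge` parameter, and the blending rule are all illustrative assumptions.

```python
# Hedged sketch: epsilon-greedy Q-learning augmented by an urge to imitate
# an observed neighbour. All names and the blending rule are illustrative.
import random
from collections import defaultdict

class ImitatingAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, imitation_urge=0.3):
        self.q = defaultdict(float)           # (state, action) -> learned value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.imitation_urge = imitation_urge  # probability of copying a neighbour

    def act(self, state, observed_action=None):
        # With some probability, imitate what another animat was seen doing.
        if observed_action is not None and random.random() < self.imitation_urge:
            return observed_action
        if random.random() < self.epsilon:    # usual exploratory move
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```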


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
A. Hamann ◽  
V. Dunjko ◽  
S. Wölk

Abstract In recent years, quantum-enhanced machine learning has emerged as a particularly fruitful application of quantum algorithms, covering aspects of supervised, unsupervised, and reinforcement learning. Reinforcement learning offers numerous options for applying quantum theory and is arguably the least explored from a quantum perspective. Here, an agent explores an environment and tries to find a behavior optimizing some figure of merit. Some of the first approaches investigated settings where this exploration can be sped up by considering quantum analogs of classical environments, which can then be queried in superposition. If the environments have a strictly periodic structure in time (i.e., are strictly episodic), they can be effectively converted to the conventional oracles encountered in quantum information. In general environments, however, we obtain scenarios that generalize standard oracle tasks. In this work, we consider one such generalization, where the environment is not strictly episodic, and map it to an oracle-identification setting with a changing oracle. We analyze this case and show that standard amplitude-amplification techniques can, with minor modifications, still achieve quadratic speed-ups. In addition, we prove that an algorithm based on Grover iterations is optimal for oracle identification even if the oracle changes over time in such a way that the “rewarded space” is monotonically increasing. This result constitutes one of the first generalizations of quantum-accessible reinforcement learning.
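For readers unfamiliar with the speed-up being generalized, the textbook amplitude-amplification bound behind the quadratic advantage can be stated as follows (a standard result, not a formula taken from this paper):

```latex
% Textbook amplitude amplification: with M rewarded ("marked") items among
% N, the initial success amplitude satisfies
\sin^2\theta = \frac{M}{N},
% and after k Grover iterations the success probability is
P_{\mathrm{success}}(k) = \sin^2\!\bigl((2k+1)\,\theta\bigr),
% so k = O\!\left(\sqrt{N/M}\right) iterations suffice -- a quadratic
% improvement over the O(N/M) queries required classically.
```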


2021 ◽  
Author(s):  
Annik Yalnizyan-Carson ◽  
Blake A Richards

Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required by limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of its older memories without any performance impairment, provided it uses mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and of infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
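The mechanism is described only verbally in the abstract. Below is a minimal sketch of an episodic-control cache that forgets its oldest entries, in the spirit (but not the letter) of the agents studied here; the fixed capacity, the state keys, and the best-return value estimate are all illustrative assumptions.

```python
# Hedged sketch of episodic control with age-based forgetting: a bounded
# cache maps states to the best return observed from them; when full, the
# oldest memory is dropped. Details are illustrative, not the paper's model.
from collections import OrderedDict

class EpisodicCache:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.memory = OrderedDict()   # state -> best discounted return seen

    def write(self, state, episodic_return):
        if state in self.memory:
            # Keep the better of the stored and the newly observed return.
            self.memory[state] = max(self.memory[state], episodic_return)
            self.memory.move_to_end(state)       # refresh recency
        else:
            if len(self.memory) >= self.capacity:
                self.memory.popitem(last=False)  # forget the oldest memory
            self.memory[state] = episodic_return

    def value(self, state, default=0.0):
        return self.memory.get(state, default)
```

An agent would act greedily with respect to `value` over the successor states of its current position; shrinking `capacity` mimics increased forgetting.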




Author(s):  
Antonius Wiehler ◽  
Jan Peters

Abstract Gambling disorder is associated with deficits in classical feedback-based learning tasks, but the computational mechanisms underlying such learning impairments are still poorly understood. Here, we examined this question using a combination of computational modeling and functional magnetic resonance imaging (fMRI) in participants with gambling disorder (n=23) and matched controls (n=19). Participants performed a simple reinforcement learning task with two pairs of stimuli (80% vs. 20% reinforcement rates per pair). As predicted, gamblers made significantly fewer selections of the optimal stimulus, while overall response times (RTs) were not significantly different between groups. We then applied comprehensive modeling using reinforcement learning drift diffusion models (RLDDMs) in combination with hierarchical Bayesian parameter estimation to shed light on the computational underpinnings of this performance impairment. In both groups, an RLDDM in which both non-decision time and response threshold (boundary separation) changed over the course of the experiment accounted for the data best. The model showed good parameter recovery, and posterior predictive checks revealed that in both groups the model reproduced the evolution of both accuracy and RTs over time. Examination of the group-wise posterior distributions revealed that the learning impairment in gamblers was attributable to both reduced learning rates and a more rapid reduction in boundary separation over time, compared to controls. Furthermore, gamblers also showed substantially shorter non-decision times. Model-based imaging analyses then revealed that value representations in the ventromedial prefrontal cortex were attenuated in gamblers compared to controls, and these effects were partly associated with model-based learning rates. Exploratory analyses revealed that a more anterior ventromedial prefrontal cortex cluster showed attenuations in value representations in proportion to gambling disorder severity in gamblers. Taken together, our findings reveal computational mechanisms underlying reinforcement learning impairments in gambling disorder, and confirm the ventromedial prefrontal cortex as a critical neural hub in this disorder.
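To make the model class concrete, here is a schematic of the RLDDM components the authors describe: a learning rule that updates stimulus values, and a drift-diffusion choice process whose boundary separation and non-decision time change over trials. The parameter values and the linear-collapse form are assumptions for illustration, not the fitted model.

```python
# Hedged RLDDM sketch: Q-learning supplies the drift rate; a drift-diffusion
# process with trial-dependent boundary and non-decision time generates the
# choice and RT. Parameter values and functional forms are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def rlddm_trial(q_optimal, q_suboptimal, trial, n_trials,
                alpha=0.05, v_scale=2.0, a0=2.0, a_slope=0.5,
                t0_start=0.4, t0_slope=0.1, dt=0.001):
    # Boundary separation and non-decision time shrink over the experiment.
    a = a0 - a_slope * (trial / n_trials)
    t0 = t0_start - t0_slope * (trial / n_trials)
    drift = v_scale * (q_optimal - q_suboptimal)      # value difference drives drift
    x, t = 0.0, 0.0
    while abs(x) < a / 2:                             # diffuse until a boundary is hit
        x += drift * dt + rng.normal(0.0, np.sqrt(dt))
        t += dt
    choice = 1 if x > 0 else 0                        # 1 = optimal stimulus chosen
    reward = rng.random() < (0.8 if choice else 0.2)  # 80/20 reinforcement rates
    if choice:                                        # delta-rule value update
        q_optimal += alpha * (reward - q_optimal)
    else:
        q_suboptimal += alpha * (reward - q_suboptimal)
    return choice, t + t0, q_optimal, q_suboptimal

# Simulate a short session; accuracy should rise as the value gap grows.
q_o, q_s = 0.0, 0.0
for t in range(200):
    choice, rt, q_o, q_s = rlddm_trial(q_o, q_s, t, 200)
```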


2019 ◽  
Vol 29 (11) ◽  
pp. 4850-4862 ◽  
Author(s):  
Sebastian Weissengruber ◽  
Sang Wan Lee ◽  
John P O’Doherty ◽  
Christian C Ruff

Abstract While it is established that humans use model-based (MB) and model-free (MF) reinforcement learning in a complementary fashion, much less is known about how the brain determines which of these systems should control behavior at any given moment. Here we provide causal evidence for a neural mechanism that acts as a context-dependent arbitrator between both systems. We applied excitatory and inhibitory transcranial direct current stimulation over a region of the left ventrolateral prefrontal cortex previously found to encode the reliability of both learning systems. The opposing neural interventions resulted in a bidirectional shift of control between MB and MF learning. Stimulation also affected the sensitivity of the arbitration mechanism itself, as it changed how often the dominant system switched over time. Both of these effects depended on varying task contexts that favored either MB or MF control, indicating that this arbitration mechanism is not context-invariant but flexibly incorporates information about current environmental demands.
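The arbitration scheme is described verbally. One common way to formalize reliability-based arbitration, shown below purely as an illustrative sketch and not as the authors' exact model, is to track each system's recent prediction-error reliability and weight their action values by a softmax over those reliabilities.

```python
# Hedged sketch of reliability-based arbitration between model-based (MB)
# and model-free (MF) control: the system with lower recent prediction
# error gets more weight. The exponential smoothing and softmax temperature
# are illustrative assumptions.
import numpy as np

class Arbitrator:
    def __init__(self, decay=0.9, temperature=1.0):
        self.rel_mb, self.rel_mf = 0.5, 0.5   # reliability estimates in [0, 1]
        self.decay, self.temperature = decay, temperature

    def update(self, pe_mb, pe_mf):
        # Reliability = exponentially smoothed "absence of prediction error".
        self.rel_mb = self.decay * self.rel_mb + (1 - self.decay) * (1 - abs(pe_mb))
        self.rel_mf = self.decay * self.rel_mf + (1 - self.decay) * (1 - abs(pe_mf))

    def weight_mb(self):
        # Softmax over reliabilities -> probability that MB controls behaviour.
        z = np.exp(np.array([self.rel_mb, self.rel_mf]) / self.temperature)
        return z[0] / z.sum()

    def blended_values(self, q_mb, q_mf):
        w = self.weight_mb()
        return w * np.asarray(q_mb) + (1 - w) * np.asarray(q_mf)
```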


Author(s):  
Jacquelyne Forgette ◽  
Michael Katchabaw

A key challenge in programming virtual environments is to produce virtual characters that are autonomous and capable of believable action selection. In this chapter, motivations are used as the basis for reinforcement learning. With motives driving the decisions of characters, their actions appear less structured and repetitious, and more human in nature. This also allows developers to easily create virtual characters with specific motivations, based mostly on their narrative purposes or roles in the virtual world. Given minimum and maximum desirable motive values, the characters use reinforcement learning to drive action selection and maximize their rewards across all motives. Experimental results show that a character can learn to satisfy as many as four motives, even with significantly delayed rewards and motive changes caused by other characters in the world. While the actions tested are simple in nature, they show the potential of a more complicated motivation-driven reinforcement learning system. The developer need only define a character's motivations, and the character will learn to act realistically over time in the virtual environment.
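As a concrete reading of "minimum and maximum desirable motive values", a reward can be defined that penalizes each motive for leaving its desirable band, so that the character learns actions keeping all motives in band. The band boundaries, penalty form, and motive names below are illustrative assumptions, not the chapter's implementation.

```python
# Hedged sketch: reward shaped from multiple motives, each with a desirable
# [min, max] band; the character is rewarded for keeping every motive in
# band. Motive names and the penalty form are invented for illustration.
def motive_reward(motives, bands):
    """motives: {name: value}; bands: {name: (min_ok, max_ok)}."""
    reward = 0.0
    for name, value in motives.items():
        low, high = bands[name]
        if value < low:
            reward -= (low - value)    # penalize deficit below the band
        elif value > high:
            reward -= (value - high)   # penalize excess above the band
        else:
            reward += 1.0              # small bonus for staying in band
    return reward

bands = {"hunger": (0.2, 0.8), "thirst": (0.2, 0.8),
         "fatigue": (0.0, 0.6), "social": (0.3, 1.0)}
print(motive_reward({"hunger": 0.5, "thirst": 0.9,
                     "fatigue": 0.4, "social": 0.1}, bands))
# -> 1.0 - 0.1 + 1.0 - 0.2 = 1.7
```

Any standard reinforcement learner can then maximize this signal; the character's "personality" is changed just by editing the bands.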


Author(s):  
Ioan-Sorin Comşa ◽  
Sijing Zhang ◽  
Mehmet Emin Aydin ◽  
Pierre Kuonen ◽  
Ramona Trestian ◽  
...  

User experience constitutes an important quality metric when delivering high-definition video services in wireless networks: when these services cannot be provided at the requested data rates, perceived quality is strongly degraded. On the radio interface, the packet scheduler is the key entity designed to satisfy users' data-rate requirements. In this chapter, a novel scheduler is proposed to guarantee the bit-rate requirements of different types of services. Existing scheduling schemes satisfy user rate requirements only to some extent, because they are too inflexible to adapt to a variety of traffic and network conditions. The authors therefore propose an innovative framework able to select, at each moment, the most appropriate scheduling scheme. The framework uses reinforcement learning with neural-network approximation to learn over time which scheduler type to apply in each momentary state. Simulation results show the effectiveness of the proposed techniques across a variety of data-rate requirements and network conditions.
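The framework is described at a high level. The sketch below shows the general shape of such a selector: a small function approximator maps a momentary network state to values over candidate scheduling rules and is trained with a standard TD update. The state features, the three candidate schedulers, and the linear approximator are illustrative assumptions, not the authors' design.

```python
# Hedged sketch of an RL scheduler selector: a tiny linear Q-approximator
# over momentary network-state features chooses among candidate scheduling
# rules. Features, rules, and the reward signal are illustrative.
import numpy as np

SCHEDULERS = ["proportional_fair", "max_throughput", "delay_priority"]

class SchedulerSelector:
    def __init__(self, n_features=4, alpha=0.01, gamma=0.9, epsilon=0.1):
        self.w = np.zeros((len(SCHEDULERS), n_features))  # one weight row per rule
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(0)

    def choose(self, state):
        """state: feature vector, e.g. [load, mean CQI, queue delay, rate deficit]."""
        if self.rng.random() < self.epsilon:              # occasional exploration
            return int(self.rng.integers(len(SCHEDULERS)))
        return int(np.argmax(self.w @ state))

    def update(self, state, action, reward, next_state):
        # Semi-gradient TD(0) step toward the rate-satisfaction reward.
        target = reward + self.gamma * np.max(self.w @ next_state)
        td_error = target - self.w[action] @ state
        self.w[action] += self.alpha * td_error * state
```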

