Abstraction for Efficient Reinforcement Learning

2021
Author(s): Alexander Telfar

Successful reinforcement learning requires large amounts of data, compute, and some luck. We explore the ability of abstraction(s) to reduce these dependencies. Abstractions for reinforcement learning share the goals of this abstract: to capture essential details while leaving out the unimportant. By throwing away inessential details, there is less to compute, less to explore, and less variance in observations. But does this always aid reinforcement learning? More specifically, we start by looking for abstractions that are easily solvable. This leads us to a type of linear abstraction. We show that, while it does allow efficient solutions, it also gives erroneous solutions in the general case. We then attempt to improve the sample efficiency of a reinforcement learner. We do so by constructing a measure of symmetry and using it as an inductive bias. We design and run experiments to test the advantage provided by this inductive bias, but must leave conclusions to future work.
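As a concrete illustration of what a state abstraction buys (and not the specific linear abstraction studied in the thesis), the following minimal sketch aggregates ground states under a given mapping and solves the smaller abstract MDP by value iteration; the uniform weighting over grouped states and all parameter values are assumptions made for illustration.

```python
import numpy as np

# Illustrative state-aggregation sketch: group ground states into abstract states,
# average their rewards and transitions, and solve the smaller abstract MDP.
# The uniform weighting over members is an assumption, not the thesis's method.

def abstract_mdp(P, R, phi, n_abstract):
    """P: (A, S, S) transitions, R: (S,) rewards, phi: (S,) ground-to-abstract map."""
    n_actions, n_states, _ = P.shape
    P_abs = np.zeros((n_actions, n_abstract, n_abstract))
    R_abs = np.zeros(n_abstract)
    counts = np.zeros(n_abstract)
    for s in range(n_states):
        counts[phi[s]] += 1
        R_abs[phi[s]] += R[s]
        for a in range(n_actions):
            for s2 in range(n_states):
                P_abs[a, phi[s], phi[s2]] += P[a, s, s2]
    R_abs /= counts                      # uniform weighting over grouped states (assumed)
    P_abs /= counts[None, :, None]       # rows stay valid probability distributions
    return P_abs, R_abs

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    v = np.zeros(R.shape[0])
    while True:
        q = R[None, :] + gamma * P @ v   # Q-values for every (action, state)
        v_new = q.max(axis=0)
        if np.abs(v_new - v).max() < tol:
            return v_new
        v = v_new
```

The abstract MDP is cheaper to solve and explore, but, as noted above, such aggregation can return erroneous solutions when the grouped states are not genuinely interchangeable.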


2017
Vol. 31 (1), pp. 63-83
Author(s): Ramji Balakrishnan, Ella Mae Matsumura, Sridhar Ramamoorti

We examine the extent to which the 2013 COSO Internal Control—Integrated Framework (ICIF) succeeds in its goal of expanding its application beyond a compliance framework. We do so by mapping the points of focus in the 2013 ICIF to the principles articulated in the Levers of Control (LOC) framework advocated by Simons (1995). The analysis shows how the revision achieves partial success. This identification of areas in which the frameworks overlap promotes an integrated view of organizational control and aids assessment of the efficacy of a firm's control over its strategic and operational processes. We also examine the extent to which the 2016 COSO Enterprise Risk Management (ERM) Exposure Draft captures non-overlapping areas between the 2013 ICIF and the LOC, and highlight implications for future work in this evolving area.


Author(s): Matthew E. Taylor

Reinforcement learning (RL) has had many successes when learning autonomously. This paper and accompanying talk consider how to make use of a non-technical human participant, when available. In particular, we consider the case where a human could 1) provide demonstrations of good behavior, 2) provide online evaluative feedback, or 3) define a curriculum of tasks for the agent to learn on. In all cases, our work has shown that such information can be effectively leveraged. After giving a high-level overview of this work, we will highlight a set of open questions and suggest where future work could be usefully focused.


2021
Vol. 13 (2), pp. 57-80
Author(s): Arunita Kundaliya, D.K. Lobiyal

In resource-constrained Wireless Sensor Networks (WSNs), extending network lifetime has been one of the most significant challenges for researchers. Researchers have been exploiting machine learning techniques, in particular reinforcement learning, to achieve efficient solutions in the WSN domain. The objective of this paper is to apply Q-learning, a reinforcement learning technique, to enhance network lifetime by developing distributed routing protocols. Q-learning is an attractive choice for routing because of its low computational and memory requirements. To enable the agent running at each node to take an optimal action, the approach considers a node's residual energy, hop length to the sink, and transmission power. The parameters residual energy and hop length are used to calculate the Q-value, which in turn is used to decide the optimal next hop for routing. The proposed protocols' performance is evaluated through NS3 simulations and compared with the AODV protocol in terms of network lifetime, throughput, and end-to-end delay.
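A minimal sketch of how such a Q-value-driven next-hop choice might look is given below; the reward shape, the weights on residual energy and hop count, and the learning parameters are illustrative assumptions rather than the exact formulation of the proposed protocols.

```python
import random

# Sketch of per-node Q-learning for next-hop selection in a WSN. Each node keeps a
# Q-value per neighbour that trades off the neighbour's residual energy against its
# hop distance to the sink. Weights and update rule are assumed for illustration.

ALPHA, GAMMA = 0.5, 0.8      # learning rate and discount factor (assumed values)
W_ENERGY, W_HOPS = 0.6, 0.4  # relative weights on energy and hop count (assumed)

class Node:
    def __init__(self, node_id, residual_energy, hops_to_sink):
        self.node_id = node_id
        self.residual_energy = residual_energy   # normalised to [0, 1]
        self.hops_to_sink = hops_to_sink
        self.neighbours = []                     # list of Node objects
        self.q = {}                              # neighbour id -> Q-value

    def add_neighbour(self, other):
        self.neighbours.append(other)
        self.q[other.node_id] = 0.0

    def reward(self, neighbour):
        # Favour neighbours with more residual energy and fewer hops to the sink.
        return W_ENERGY * neighbour.residual_energy - W_HOPS * neighbour.hops_to_sink

    def choose_next_hop(self, epsilon=0.1):
        # Epsilon-greedy selection over current Q-values.
        if random.random() < epsilon:
            return random.choice(self.neighbours)
        return max(self.neighbours, key=lambda n: self.q[n.node_id])

    def update_q(self, neighbour):
        # Standard Q-learning update, bootstrapping on the neighbour's best Q-value.
        best_next = max(neighbour.q.values(), default=0.0)
        target = self.reward(neighbour) + GAMMA * best_next
        self.q[neighbour.node_id] += ALPHA * (target - self.q[neighbour.node_id])
```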


2020
Vol. 8 (1), pp. 121-126
Author(s): Nicola Parkin

This paper turns toward learning design, not as a role, method, skill or even style of thinking, but as something that we are already existentially ‘in’, a lived-and-living part of teaching which is natural and arises from the places of our here-and-now situations. This way of understanding the work of learning design contradicts the prevailing position of learning design as instrumental future-work in which our faces are ever turned towards a time that is always yet-to-come. Our work is not, in the temporal sense, of itself, but always on the way to being something other than itself. As we strive to transcend our current situation towards a greater measure of fulfilment, we are reaching always away from ourselves. Instead, we might take a stance of ‘slow’: Slow makes a space for us to encounter ourselves in practice and invites us to stay-with rather than race ahead. It begins with the quietly radical act of seeing goodness in slowness, in trusting time. Slow means finding the natural pace of our work, and takes the long-scale view that accepts into itself the many tempos and time scales in the work of learning design, including, at times, the need for fast work. This paper invites you to pause and sit, to expand the moment you are already in, and to ponder philosophically, rambling across the page with notions of untangling, opening, loosening, listening, seeing, belonging, pondering, sitting with and trusting. Taking time to do so is self-affirming. But perhaps the deepest gift that slow offers is choice: it opens a space for considered thought and action, and calls into question the habits and expectations of speed that we have grown so accustomed to.


2020
Vol. 29 (01n02), pp. 2040009
Author(s): Sultan Alyahya, Manar Alsayyari

Crowdsourced software testing (CST) is an emerging trend in software testing. Companies and developers assign testing tasks through CST platforms to thousands of online testers. Currently, CST platform managers are trying to find and resolve challenges in order to reach the best CST practice. Many features have been applied by CST platforms to improve CST activities, including notification emails, online chat rooms, forums and, most importantly, a CST platform dashboard to view all testing projects and tasks; these features have enabled CST to operate efficiently. Still, CST users find it difficult to stay abreast of test project updates, maintain their motivation, and avoid frustration. This aligns with a growing number of studies in the literature that call for more efficient solutions to support the CST process. This research aims to support CST by searching for potential process limitations and overcoming them. In order to do so, a five-stage approach is used. First, the current CST process is investigated by reviewing 15 CST platforms. Second, the review identifies eight possible activities to improve CST; in Stage 3, six of them are selected based on a survey distributed to 30 domain experts. In Stage 4, we design and implement five process models in a web-based system that fulfills the requirements of the six activities identified earlier. In Stage 5, we evaluate these process models through interviews with representatives of two CST platforms and an expert tester, and through a scenario-based evaluation with 20 domain experts who used the system and rated the value of the processes. The results show that the new improvements are sound and can strengthen CST practice.


2018
Vol. 120 (6), pp. 2877-2896
Author(s): Romain Cazé, Mehdi Khamassi, Lise Aubin, Benoît Girard

Multiple in vivo studies have shown that place cells from the hippocampus replay previously experienced trajectories. These replays are commonly considered to mainly reflect memory consolidation processes. Some data, however, have highlighted a functional link between replays and reinforcement learning (RL). This theory, extensively used in machine learning, has introduced efficient algorithms and can explain various behavioral and physiological measures from different brain regions. RL algorithms could constitute a mechanistic description of replays and explain how replays can reduce the number of iterations required to explore the environment during learning. We review the main findings concerning the different hippocampal replay types and the possible associated RL models (model-based, model-free, or hybrid). We conclude by tying these frameworks together. We illustrate the link between data and RL through a series of model simulations. This review, at the frontier between informatics and biology, paves the way for future work on replays.
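To make the link between replay and RL concrete, here is a minimal Dyna-Q-style sketch (one common hybrid of model-free learning and model-based planning); the environment interface and the parameter values are assumptions for illustration, not the simulations reported in the review.

```python
import random
from collections import defaultdict

# Minimal Dyna-Q sketch: real transitions update Q directly (model-free step), and
# stored transitions are replayed offline (planning step), which cuts down the number
# of real environment interactions needed to learn. Assumes env.reset() returns a
# state and env.step(action) returns (next_state, reward, done).

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
N_REPLAYS = 20  # replayed transitions per real step (assumed value)

def dyna_q(env, n_episodes=100, n_actions=4):
    q = defaultdict(float)   # (state, action) -> estimated value
    model = {}               # (state, action) -> (reward, next_state), the replay buffer
    for _ in range(n_episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Model-free update from the real transition.
            best_next = max(q[(next_state, a)] for a in range(n_actions))
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            # Store the transition, then replay previously experienced ones.
            model[(state, action)] = (reward, next_state)
            sample = random.sample(list(model.items()), min(N_REPLAYS, len(model)))
            for (s, a), (r, s2) in sample:
                best = max(q[(s2, b)] for b in range(n_actions))
                q[(s, a)] += ALPHA * (r + GAMMA * best - q[(s, a)])
            state = next_state
    return q
```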


Author(s): Céline Hocquette

World-class human players have been outperformed in a number of complex two-person games, such as Go, by Deep Reinforcement Learning systems. However, several drawbacks can be identified for these systems: 1) Their data efficiency is unclear, given that they appear to require far more training games to achieve such performance than any human player might experience in a lifetime. 2) These systems are not easily interpretable, as they provide limited explanation about how decisions are made. 3) These systems do not provide transferability of the learned strategies to other games. We study in this work how an explicit logical representation can overcome these limitations and introduce a new logical system called MIGO, designed for learning optimal strategies for two-player games. It benefits from a strong inductive bias, which provides the capability to learn efficiently from a few examples of games played. Additionally, MIGO's learned rules are relatively easy to comprehend and are demonstrated to achieve significant transfer learning.


2021
Author(s): Benson Chen, Xiang Fu, Regina Barzilay, Tommi Jaakkola

Searching for novel molecular compounds with desired properties is an important problem in drug discovery. Many existing frameworks generate molecules one atom at a time. We instead propose a flexible editing paradigm that generates molecules using learned molecular fragments, i.e., meaningful substructures of molecules. To do so, we train a variational autoencoder (VAE) to encode molecular fragments in a coherent latent space, which we then utilize as a vocabulary for editing molecules to explore the complex chemical property space. Equipped with the learned fragment vocabulary, we propose Fragment-based Sequential Translation (FaST), which learns a reinforcement learning (RL) policy to iteratively translate model-discovered molecules into increasingly novel molecules while satisfying desired properties. Empirical evaluation shows that FaST significantly improves over state-of-the-art methods on benchmark single/multi-objective molecular optimization tasks.
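A rough sketch of the kind of iterative, fragment-based translation loop described above; `propose_edit`, `apply_edit`, and `property_score` are hypothetical stand-ins passed in as callables, not the actual FaST interfaces, and the greedy acceptance rule is an assumption for illustration.

```python
# Hypothetical sketch of fragment-based sequential editing. The RL policy (propose_edit),
# the chemistry backend (apply_edit), and the scorer (property_score) are stand-ins.

def fragment_based_translation(seed, propose_edit, apply_edit, property_score,
                               n_steps=50, threshold=0.8):
    """Iteratively edit a molecule with fragments from a learned vocabulary,
    keeping edits that improve the property score (greedy variant for illustration)."""
    molecule, best = seed, property_score(seed)
    discovered = []
    for _ in range(n_steps):
        fragment, site = propose_edit(molecule)           # policy picks fragment + attachment site
        candidate = apply_edit(molecule, fragment, site)  # graft the fragment onto the molecule
        score = property_score(candidate)
        if score > best:                                  # keep only improving edits
            molecule, best = candidate, score
            if score >= threshold:                        # record molecules meeting the target
                discovered.append(candidate)
    return discovered
```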


2020
Author(s): Samuel D. McDougle, Ian C. Ballard, Beth Baribault, Sonia J. Bishop, Anne G.E. Collins

Recent evidence suggests that executive processes shape reinforcement learning (RL) computations. Here, we extend this idea to the processing of choice outcomes, asking if executive function and RL interact during learning from novel goals. We designed a task where people learned from familiar rewards or abstract instructed goals. We hypothesized that learning from these goals would produce reliable responses in canonical reward circuits, and would do so by leveraging executive function. Behavioral results pointed to qualitatively similar learning processes when subjects learned from achieving goals versus familiar rewards. Goal learning was robustly and selectively correlated with performance on an independent executive function task. Neuroimaging revealed comparable appetitive responses and computational signatures in reinforcement learning circuits for both goal-based and familiar learning contexts. During goal learning, we observed enhanced correlations between prefrontal cortex and canonical reward-sensitive regions, including the hippocampus, striatum, and midbrain. These findings demonstrate that attaining novel goals produces reliable reward signals in dopaminergic circuits. We propose that learning from goal-directed behavior is mediated by top-down input that primes the reward system to endow value to cues signaling goal attainment.

