Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations

2020 ◽  
Vol 34 (04) ◽  
pp. 5207-5215
Author(s):  
Aditya Modi ◽  
Debadeepta Dey ◽  
Alekh Agarwal ◽  
Adith Swaminathan ◽  
Besmira Nushi ◽  
...  

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in areas such as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. System-wide optimization is a combinatorial challenge: local attempts to boost the performance of a specific module by modifying its configuration often lead to losses in the overall utility of the system's performance, as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on-the-fly so as to maximize the utility of a system's operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.
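A minimal sketch of the core idea, assuming a tabular setting; the configuration names, state encoding, and utility signal below are hypothetical placeholders, not the authors' implementation. A metareasoner observes a summary of the pipeline state and uses Q-learning to pick a module configuration that maximizes end-to-end utility rather than local module performance:

```python
# Hedged sketch of RL-driven pipeline configuration, not the paper's actual
# implementation: a tabular Q-learner selects a configuration for a module
# from a discretized pipeline state. CONFIGS and the utility signal are
# illustrative placeholders.
import random
from collections import defaultdict

CONFIGS = ["fast_low_res", "balanced", "slow_high_res"]  # hypothetical options
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)  # Q[(pipeline_state, config)] -> estimated utility

def choose_config(state):
    """Epsilon-greedy choice of the next module configuration."""
    if random.random() < EPSILON:
        return random.choice(CONFIGS)
    return max(CONFIGS, key=lambda c: Q[(state, c)])

def observe_utility(state, config, utility, next_state):
    """Q-learning update driven by end-to-end system utility, so a locally
    attractive configuration that hurts downstream modules is penalized
    through the global reward rather than a per-module metric."""
    best_next = max(Q[(next_state, c)] for c in CONFIGS)
    Q[(state, config)] += ALPHA * (utility + GAMMA * best_next - Q[(state, config)])
```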

Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
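The paper's algorithms operate on quantum state spaces, which the abstract does not detail; as a point of reference, here is a sketch of the classical finite-horizon policy evaluation by backward induction that such algorithms generalize. The array shapes are assumptions for this sketch:

```python
# Classical finite-horizon policy evaluation by backward induction; the
# qMDP algorithms described above generalize this scheme to quantum states.
import numpy as np

def evaluate_policy(P, r, pi, horizon):
    """P[a] is an (S, S) transition matrix, r is (S, A), and pi[t][s] is
    the action taken in state s at step t. Returns V[s], the expected
    return from state s at time 0 under pi."""
    n_states = r.shape[0]
    V = np.zeros(n_states)                    # terminal value at the horizon
    for t in reversed(range(horizon)):
        V_new = np.empty(n_states)
        for s in range(n_states):
            a = pi[t][s]
            V_new[s] = r[s, a] + P[a][s] @ V  # one-step Bellman backup
        V = V_new
    return V
```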


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impractical due to lack of state coverage or distribution mismatch, i.e. when the learner's goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations, which query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
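A hedged sketch of the two mechanisms the abstract names; the expert, policy, uncertainty, and obs_for_goal interfaces are hypothetical placeholders, not the authors' code. Goals are sampled in proportion to expert-policy disagreement, and the demonstrator is queried only where the agent is uncertain:

```python
# Illustrative sketch, not the paper's implementation. expert and policy
# map an observation to a continuous action; uncertainty maps an
# observation to a scalar; obs_for_goal builds a goal-conditioned
# observation. All four are assumed placeholders.
import numpy as np

def goal_priorities(goals, expert, policy, obs_for_goal):
    """Sampling weights favouring goals where expert and policy disagree."""
    gaps = np.array([
        np.linalg.norm(expert(obs_for_goal(g)) - policy(obs_for_goal(g)))
        for g in goals
    ])
    return gaps / gaps.sum()

def should_query_demonstrator(obs, uncertainty, threshold=0.5):
    """Active queries only in hard-to-learn, uncertain regions."""
    return uncertainty(obs) > threshold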


Author(s):  
Juan Marcelo Parra-Ullauri ◽  
Antonio García-Domínguez ◽  
Nelly Bencomo ◽  
Changgang Zheng ◽  
Chen Zhen ◽  
...  

Modern software systems are increasingly expected to show higher degrees of autonomy and self-management to cope with uncertain and diverse situations. As a consequence, autonomous systems can exhibit unexpected and surprising behaviours. This is exacerbated by the ubiquity and complexity of Artificial Intelligence (AI)-based systems. This is the case for Reinforcement Learning (RL), where autonomous agents learn through trial and error how to find good solutions to a problem. Thus, the underlying decision-making criteria may become opaque to users who interact with the system and who may require explanations about the system's reasoning. Available work on eXplainable Reinforcement Learning (XRL) offers different trade-offs: e.g. for runtime explanations, the approaches are model-specific or can only analyse results after the fact. Different from these approaches, this paper aims to provide an online, model-agnostic approach for XRL towards trustworthy and understandable AI. We present ETeMoX, an architecture based on temporal models to keep track of the decision-making processes of RL systems. In cases where resources are limited (e.g. storage capacity or response time), the architecture also integrates complex event processing, an event-driven approach, for detecting matches to event patterns that need to be stored, instead of keeping the entire history. The approach is applied to a mobile communications case study that uses RL for its decision-making. In order to test the generalisability of our approach, three variants of the underlying RL algorithms are used: Q-Learning, SARSA and DQN. The encouraging results show that, using the proposed configurable architecture, RL developers are able to obtain explanations about the evolution of a metric and relationships between metrics, and to track situations of interest happening over time windows.
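A minimal sketch of the complex-event-processing idea described above; the pattern, thresholds, and event fields are illustrative assumptions, not ETeMoX's actual API. Only events matching a pattern of interest are persisted, rather than the full decision history:

```python
# Hedged sketch: a toy "reward dropped sharply within a window" pattern
# filters an RL decision-event stream so that only matches are stored.
from collections import deque

class RewardDropPattern:
    """Matches when the latest reward falls well below the window maximum."""
    def __init__(self, window=50, drop=0.3):
        self.rewards = deque(maxlen=window)
        self.drop = drop

    def matches(self, event):
        self.rewards.append(event["reward"])
        if len(self.rewards) < self.rewards.maxlen:
            return False                  # not enough history yet
        return self.rewards[-1] < (1 - self.drop) * max(self.rewards)

stored = []                               # kept instead of the full history
pattern = RewardDropPattern()
for event in ({"reward": r} for r in [1.0] * 60 + [0.2]):  # toy event stream
    if pattern.matches(event):
        stored.append(event)              # persist matching events only
```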


Author(s):  
Yang Yu

Reinforcement learning is a major tool for realizing intelligent agents that can autonomously adapt to their environment. With deep models, reinforcement learning has shown great potential in complex tasks such as playing games from pixels. However, current reinforcement learning techniques still suffer from requiring a huge amount of interaction data, which can result in unbearable costs in real-world applications. In this article, we share our understanding of the problem and discuss possible ways to alleviate the sample cost of reinforcement learning, from the aspects of exploration, optimization, environment modeling, experience transfer, and abstraction. We also discuss some challenges in real-world applications, with the hope of inspiring future research.
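As an illustration of one lever listed above, environment modeling, here is a Dyna-style sketch under stated assumptions (the model structure and q_update callback are hypothetical): real transitions fit a model, and cheap model-generated updates then substitute for costly real interactions.

```python
# Hedged Dyna-style sketch: a trivial deterministic model memorizes real
# transitions, then planning updates replay them so the agent needs fewer
# interactions with the real environment. Interfaces are placeholders.
import random

model = {}  # (state, action) -> (reward, next_state), fit from real data

def record_real_transition(s, a, r, s2):
    model[(s, a)] = (r, s2)

def planning_updates(q_update, n=50):
    """Run n value updates on imagined transitions drawn from the model
    instead of querying the real world."""
    for _ in range(n):
        (s, a), (r, s2) = random.choice(list(model.items()))
        q_update(s, a, r, s2)
```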


2000 ◽  
Vol 23 (5) ◽  
pp. 764-765 ◽  
Author(s):  
Robert J. Sternberg

The simple heuristics described in this book are ingenious but are unlikely to be optimally helpful in real-world, consequential, high-stakes decision making, such as mate and job selection. I discuss why the heuristics may not always provide people who have such decisions to make with as much enlightenment as they would wish.


Author(s):  
Natália Souza Soares ◽  
João Marcelo Xavier Natário Teixeira ◽  
Veronica Teichrieb

In this work, we propose a framework to train a robot in a virtual environment using Reinforcement Learning (RL) techniques, thus facilitating the use of this type of approach in robotics. With our integrated solution for virtual training, it is possible to programmatically change the environment parameters, making it easy to implement domain randomization techniques on-the-fly. We conducted experiments with a TurtleBot 2i on an indoor navigation task with static obstacle avoidance, using an RL algorithm called Proximal Policy Optimization (PPO). Our results show that even though the training did not use any real data, the trained model was able to generalize to different virtual environments and real-world scenes.
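A hedged sketch of the on-the-fly domain randomization described above; the parameter names and the env/policy interfaces are hypothetical, not the authors' simulator API. Environment parameters are re-sampled programmatically before each training episode:

```python
# Illustrative sketch, assuming a gym-style environment with a hypothetical
# set_params setter: each episode trains in a freshly randomized domain.
import random

def randomize(env):
    env.set_params(                       # hypothetical setter
        floor_friction=random.uniform(0.4, 1.0),
        obstacle_count=random.randint(2, 8),
        light_intensity=random.uniform(0.5, 1.5),
    )

def train(env, policy, episodes=1000):
    for _ in range(episodes):
        randomize(env)                    # a new domain every episode
        obs, done = env.reset(), False
        while not done:
            action = policy(obs)          # e.g. actions from a PPO policy
            obs, reward, done, info = env.step(action)
```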


2018 ◽  
Vol 8 (2) ◽  
pp. 76-80 ◽  
Author(s):  
Xi Chen

Cognitive functioning is critical, as a host of complex real-world decisions in high-stakes markets have to be made in our daily lives. The decision-making process can be vulnerable to environmental stressors. Summarizing the growing economic and epidemiologic evidence linking air pollution, cognitive performance, and real-world decision-making, we first illustrate key physiological and psychological pathways between air pollution and cognition. We then document the main patterns by which air pollution affects cognitive test performance, by type of cognitive test, gender, window of exposure, age profile, and educational attainment. We further review real-world decision-making that has been found to be affected by air pollution and the resulting cognitive impairments. Finally, rich implications for environmental health policies are drawn from existing evaluations of the social costs of air pollution.


Author(s):  
Syed Ihtesham Hussain Shah ◽  
Giuseppe De Pietro

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter this, an extension of the RL problem named Inverse Reinforcement Learning (IRL) was introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential use in building autonomous agents capable of modeling others without compromising performance on the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and theoretical background of IRL in the fields of Machine Learning (ML) and Artificial Intelligence (AI). We also present a brief comparison between different variants of IRL.
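A hedged sketch of one core quantity that early IRL algorithms of the kind surveyed here build on, in the spirit of feature-expectation matching; the feature map phi and the demonstration format are illustrative assumptions:

```python
# Illustrative sketch: the expert's discounted feature expectations,
# against which a linear reward's weights are fit in feature-matching IRL.
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.99):
    """Discounted feature counts averaged over expert trajectories;
    each trajectory is a list of states, and phi maps a state to a
    NumPy feature vector."""
    mu = None
    for traj in trajectories:
        disc = sum(gamma ** t * phi(s) for t, s in enumerate(traj))
        mu = disc if mu is None else mu + disc
    return mu / len(trajectories)

# A linear reward r(s) = w @ phi(s) is then sought whose optimal policy
# reproduces these expectations, as in early apprenticeship-learning IRL.
```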


Author(s):  
Kushagra Singh Sisodiya

Abstract: The Industrial Revolution 4.0 has flooded the virtual world with data, including Internet of Things (IoT) data, mobile data, cybersecurity data, business data, social network data, and health data. To analyse this data efficiently and create related efficient and streamlined applications, expertise in artificial intelligence, specifically machine learning (ML), is required. This field makes use of a variety of machine learning methods, including supervised, unsupervised, semi-supervised, and reinforcement learning. Additionally, deep learning, which is a subset of this larger range of machine learning techniques, is capable of effectively analysing vast amounts of data. Machine learning is a broad term that encompasses a number of methods used to extract information from data. These methods may allow the rapid translation of massive real-world information into applications that assist patients and providers in making decisions. The objective of this literature review was to find observational studies that utilised machine learning to enhance patient-provider decision-making using secondary data. Keywords: Machine Learning, Real World, Patient, Population, Artificial Intelligence


Author(s):  
Jeff Elpern ◽  
Sergiu Dascalu

Traditional software engineering methodologies have mostly evolved from the environment of proprietary, large-scale software systems. Here, software design principles operate within a hierarchical decision-making context. The development of banking, enterprise resource, and complex weapons systems all fits this paradigm. However, another paradigm for developing software-intensive systems has emerged: the paradigm of open source software. Although from a traditional perspective open source projects might look like chaos, their real-world results have been spectacular. This chapter presents open source software development as a fundamentally new paradigm driven by economics and facilitated by new processes. The new paradigm's revolutionary aspects are explored, a framework for describing the massive impact brought about by the new paradigm is proposed, and directions of future research are outlined. The proposed framework's goals are to help the understanding of the open source paradigm as a new economic revolution and to stimulate research in designing open source software.

