Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations

2020 ◽  
Vol 34 (04) ◽  
pp. 5207-5215
Author(s):  
Aditya Modi ◽  
Debadeepta Dey ◽  
Alekh Agarwal ◽  
Adith Swaminathan ◽  
Besmira Nushi ◽  
...  

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in areas such as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. System-wide optimization is a combinatorial challenge: local attempts to boost the performance of a specific module by modifying its configuration often lead to losses in the overall utility of the system's performance, as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on-the-fly so as to maximize the utility of a system's operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.
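A minimal sketch of the core idea, assuming a tabular setting; the configuration names, state encoding, and utility signal below are hypothetical placeholders, not the authors' implementation. A metareasoner observes a summary of the pipeline state and uses Q-learning to pick a module configuration that maximizes end-to-end utility rather than local module performance:

```python
# Hedged sketch of RL-driven pipeline configuration, not the paper's actual
# implementation: a tabular Q-learner selects a configuration for a module
# from a discretized pipeline state. CONFIGS and the utility signal are
# illustrative placeholders.
import random
from collections import defaultdict

CONFIGS = ["fast_low_res", "balanced", "slow_high_res"]  # hypothetical options
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)  # Q[(pipeline_state, config)] -> estimated utility

def choose_config(state):
    """Epsilon-greedy choice of the next module configuration."""
    if random.random() < EPSILON:
        return random.choice(CONFIGS)
    return max(CONFIGS, key=lambda c: Q[(state, c)])

def observe_utility(state, config, utility, next_state):
    """Q-learning update driven by end-to-end system utility, so a locally
    attractive configuration that hurts downstream modules is penalized
    through the global reward rather than a per-module metric."""
    best_next = max(Q[(next_state, c)] for c in CONFIGS)
    Q[(state, config)] += ALPHA * (utility + GAMMA * best_next - Q[(state, config)])
```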

Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

The Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
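The paper's algorithms operate on quantum state spaces, which the abstract does not detail; as a point of reference, here is a sketch of the classical finite-horizon policy evaluation by backward induction that such algorithms generalize. The array shapes are assumptions for this sketch:

```python
# Classical finite-horizon policy evaluation by backward induction; the
# qMDP algorithms described above generalize this scheme to quantum states.
import numpy as np

def evaluate_policy(P, r, pi, horizon):
    """P[a] is an (S, S) transition matrix, r is (S, A), and pi[t][s] is
    the action taken in state s at step t. Returns V[s], the expected
    return from state s at time 0 under pi."""
    n_states = r.shape[0]
    V = np.zeros(n_states)                    # terminal value at the horizon
    for t in reversed(range(horizon)):
        V_new = np.empty(n_states)
        for s in range(n_states):
            a = pi[t][s]
            V_new[s] = r[s, a] + P[a][s] @ V  # one-step Bellman backup
        V = V_new
    return V
```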


2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to the many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impractical due to lack of state coverage or distribution mismatch, i.e. when the learner's goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations, which query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
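A hedged sketch of the two mechanisms the abstract names; the expert, policy, uncertainty, and obs_for_goal interfaces are hypothetical placeholders, not the authors' code. Goals are sampled in proportion to expert-policy disagreement, and the demonstrator is queried only where the agent is uncertain:

```python
# Illustrative sketch, not the paper's implementation. expert and policy
# map an observation to a continuous action; uncertainty maps an
# observation to a scalar; obs_for_goal builds a goal-conditioned
# observation. All four are assumed placeholders.
import numpy as np

def goal_priorities(goals, expert, policy, obs_for_goal):
    """Sampling weights favouring goals where expert and policy disagree."""
    gaps = np.array([
        np.linalg.norm(expert(obs_for_goal(g)) - policy(obs_for_goal(g)))
        for g in goals
    ])
    return gaps / gaps.sum()

def should_query_demonstrator(obs, uncertainty, threshold=0.5):
    """Active queries only in hard-to-learn, uncertain regions."""
    return uncertainty(obs) > threshold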


Author(s):  
Juan Marcelo Parra-Ullauri ◽  
Antonio García-Domínguez ◽  
Nelly Bencomo ◽  
Changgang Zheng ◽  
Chen Zhen ◽  
...  

Modern software systems are increasingly expected to show higher degrees of autonomy and self-management to cope with uncertain and diverse situations. As a consequence, autonomous systems can exhibit unexpected and surprising behaviours. This is exacerbated by the ubiquity and complexity of Artificial Intelligence (AI)-based systems. This is the case for Reinforcement Learning (RL), where autonomous agents learn through trial and error how to find good solutions to a problem. Thus, the underlying decision-making criteria may become opaque to users who interact with the system and who may require explanations about the system's reasoning. Available work on eXplainable Reinforcement Learning (XRL) offers different trade-offs: e.g. for runtime explanations, the approaches are model-specific or can only analyse results after the fact. Different from these approaches, this paper aims to provide an online, model-agnostic approach for XRL towards trustworthy and understandable AI. We present ETeMoX, an architecture based on temporal models to keep track of the decision-making processes of RL systems. In cases where resources are limited (e.g. storage capacity or response time), the architecture also integrates complex event processing, an event-driven approach, for detecting matches to event patterns that need to be stored, instead of keeping the entire history. The approach is applied to a mobile communications case study that uses RL for its decision-making. In order to test the generalisability of our approach, three variants of the underlying RL algorithms are used: Q-Learning, SARSA and DQN. The encouraging results show that, using the proposed configurable architecture, RL developers are able to obtain explanations about the evolution of a metric and relationships between metrics, and to track situations of interest happening over time windows.
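A minimal sketch of the complex-event-processing idea described above; the pattern, thresholds, and event fields are illustrative assumptions, not ETeMoX's actual API. Only events matching a pattern of interest are persisted, rather than the full decision history:

```python
# Hedged sketch: a toy "reward dropped sharply within a window" pattern
# filters an RL decision-event stream so that only matches are stored.
from collections import deque

class RewardDropPattern:
    """Matches when the latest reward falls well below the window maximum."""
    def __init__(self, window=50, drop=0.3):
        self.rewards = deque(maxlen=window)
        self.drop = drop

    def matches(self, event):
        self.rewards.append(event["reward"])
        if len(self.rewards) < self.rewards.maxlen:
            return False                  # not enough history yet
        return self.rewards[-1] < (1 - self.drop) * max(self.rewards)

stored = []                               # kept instead of the full history
pattern = RewardDropPattern()
for event in ({"reward": r} for r in [1.0] * 60 + [0.2]):  # toy event stream
    if pattern.matches(event):
        stored.append(event)              # persist matching events only
```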


Author(s):  
Yang Yu

Reinforcement learning is a major tool for realizing intelligent agents that can autonomously adapt to their environment. With deep models, reinforcement learning has shown great potential in complex tasks such as playing games from pixels. However, current reinforcement learning techniques still suffer from requiring a huge amount of interaction data, which can result in unbearable costs in real-world applications. In this article, we share our understanding of the problem and discuss possible ways to alleviate the sample cost of reinforcement learning, from the aspects of exploration, optimization, environment modeling, experience transfer, and abstraction. We also discuss some challenges in real-world applications, with the hope of inspiring future research.
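As an illustration of one lever listed above, environment modeling, here is a Dyna-style sketch under stated assumptions (the model structure and q_update callback are hypothetical): real transitions fit a model, and cheap model-generated updates then substitute for costly real interactions.

```python
# Hedged Dyna-style sketch: a trivial deterministic model memorizes real
# transitions, then planning updates replay them so the agent needs fewer
# interactions with the real environment. Interfaces are placeholders.
import random

model = {}  # (state, action) -> (reward, next_state), fit from real data

def record_real_transition(s, a, r, s2):
    model[(s, a)] = (r, s2)

def planning_updates(q_update, n=50):
    """Run n value updates on imagined transitions drawn from the model
    instead of querying the real world."""
    for _ in range(n):
        (s, a), (r, s2) = random.choice(list(model.items()))
        q_update(s, a, r, s2)
```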


2000 ◽  
Vol 23 (5) ◽  
pp. 764-765 ◽  
Author(s):  
Robert J. Sternberg

The simple heuristics described in this book are ingenious but are unlikely to be optimally helpful in real-world, consequential, high-stakes decision making, such as mate and job selection. I discuss why the heuristics may not always provide people who have such decisions to make with as much enlightenment as they would wish.


Author(s):  
Natália Souza Soares ◽  
João Marcelo Xavier Natário Teixeira ◽  
Veronica Teichrieb

In this work, we propose a framework to train a robot in a virtual environment using Reinforcement Learning (RL) techniques, thus facilitating the use of this type of approach in robotics. With our integrated solution for virtual training, it is possible to programmatically change the environment parameters, making it easy to implement domain randomization techniques on-the-fly. We conducted experiments with a TurtleBot 2i on an indoor navigation task with static obstacle avoidance, using an RL algorithm called Proximal Policy Optimization (PPO). Our results show that even though the training did not use any real data, the trained model was able to generalize to different virtual environments and real-world scenes.
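A hedged sketch of the on-the-fly domain randomization described above; the parameter names and the env/policy interfaces are hypothetical, not the authors' simulator API. Environment parameters are re-sampled programmatically before each training episode:

```python
# Illustrative sketch, assuming a gym-style environment with a hypothetical
# set_params setter: each episode trains in a freshly randomized domain.
import random

def randomize(env):
    env.set_params(                       # hypothetical setter
        floor_friction=random.uniform(0.4, 1.0),
        obstacle_count=random.randint(2, 8),
        light_intensity=random.uniform(0.5, 1.5),
    )

def train(env, policy, episodes=1000):
    for _ in range(episodes):
        randomize(env)                    # a new domain every episode
        obs, done = env.reset(), False
        while not done:
            action = policy(obs)          # e.g. actions from a PPO policy
            obs, reward, done, info = env.step(action)
```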


2018 ◽  
Vol 8 (2) ◽  
pp. 76-80 ◽  
Author(s):  
Xi Chen

Cognitive functioning is critical, as a host of complex real-world decisions in high-stakes markets have to be made in our daily lives. The decision-making process can be vulnerable to environmental stressors. Summarizing the growing economic and epidemiologic evidence linking air pollution, cognitive performance, and real-world decision-making, we first illustrate key physiological and psychological pathways between air pollution and cognition. We then document the main patterns by which air pollution affects cognitive test performance, by type of cognitive test, gender, window of exposure, age profile, and educational attainment. We further review real-world decision-making that has been found to be affected by air pollution and the resulting cognitive impairments. Finally, rich implications for environmental health policies are drawn from existing evaluations of the social costs of air pollution.


Author(s):  
Syed Ihtesham Hussain Shah ◽  
Giuseppe De Pietro

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter this, an extension of the RL problem named Inverse Reinforcement Learning (IRL) was introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential use in building autonomous agents capable of modeling others without compromising performance on the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and theoretical background of IRL in the fields of Machine Learning (ML) and Artificial Intelligence (AI). We also present a brief comparison between different variants of IRL.
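A hedged sketch of one core quantity that early IRL algorithms of the kind surveyed here build on, in the spirit of feature-expectation matching; the feature map phi and the demonstration format are illustrative assumptions:

```python
# Illustrative sketch: the expert's discounted feature expectations,
# against which a linear reward's weights are fit in feature-matching IRL.
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.99):
    """Discounted feature counts averaged over expert trajectories;
    each trajectory is a list of states, and phi maps a state to a
    NumPy feature vector."""
    mu = None
    for traj in trajectories:
        disc = sum(gamma ** t * phi(s) for t, s in enumerate(traj))
        mu = disc if mu is None else mu + disc
    return mu / len(trajectories)

# A linear reward r(s) = w @ phi(s) is then sought whose optimal policy
# reproduces these expectations, as in early apprenticeship-learning IRL.
```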


Author(s):  
Kushagra Singh Sisodiya

Abstract: The Industrial Revolution 4.0 has flooded the virtual world with data, including Internet of Things (IoT) data, mobile data, cybersecurity data, business data, social network data, and health data. To analyse this data efficiently and create related efficient and streamlined applications, expertise in artificial intelligence, specifically machine learning (ML), is required. This field makes use of a variety of machine learning methods, including supervised, unsupervised, semi-supervised, and reinforcement learning. Additionally, deep learning, which is a subset of this larger range of machine learning techniques, is capable of effectively analysing vast amounts of data. Machine learning is a broad term that encompasses a number of methods used to extract information from data. These methods may allow the rapid translation of massive real-world information into applications that assist patients and providers in making decisions. The objective of this literature review was to find observational studies that utilised machine learning to enhance patient-provider decision-making using secondary data. Keywords: Machine Learning, Real World, Patient, Population, Artificial Intelligence


Author(s):  
Jeff Elpern ◽  
Sergiu Dascalu

Traditional software engineering methodologies have mostly evolved from the environment of proprietary, large-scale software systems. Here, software design principles operate within a hierarchical decision-making context. The development of banking, enterprise resource, and complex weapons systems all fits this paradigm. However, another paradigm for developing software-intensive systems has emerged: the paradigm of open source software. Although from a traditional perspective open source projects might look like chaos, their real-world results have been spectacular. This chapter presents open source software development as a fundamentally new paradigm driven by economics and facilitated by new processes. The new paradigm's revolutionary aspects are explored, a framework for describing the massive impact brought about by the new paradigm is proposed, and directions of future research are outlined. The proposed framework's goals are to help the understanding of the open source paradigm as a new economic revolution and to stimulate research in designing open source software.

