Event-driven temporal models for explanations - ETeMoX: explaining reinforcement learning

Author(s):  
Juan Marcelo Parra-Ullauri ◽  
Antonio García-Domínguez ◽  
Nelly Bencomo ◽  
Changgang Zheng ◽  
Chen Zhen ◽  
...  

Abstract
Modern software systems are increasingly expected to show higher degrees of autonomy and self-management to cope with uncertain and diverse situations. As a consequence, autonomous systems can exhibit unexpected and surprising behaviours. This is exacerbated by the ubiquity and complexity of Artificial Intelligence (AI)-based systems. This is the case for Reinforcement Learning (RL), where autonomous agents learn through trial and error how to find good solutions to a problem. As a result, the underlying decision-making criteria may become opaque to the users who interact with the system and who may require explanations about the system's reasoning. Available work on eXplainable Reinforcement Learning (XRL) offers different trade-offs: e.g. for runtime explanations, the approaches are model-specific or can only analyse results after the fact. In contrast to these approaches, this paper aims to provide an online, model-agnostic approach for XRL towards trustworthy and understandable AI. We present ETeMoX, an architecture based on temporal models that keeps track of the decision-making processes of RL systems. In cases where resources are limited (e.g. storage capacity or response time), the architecture also integrates complex event processing, an event-driven approach, to detect matches to event patterns that need to be stored, instead of keeping the entire history. The approach is applied to a mobile communications case study that uses RL for its decision-making. To test the generalisability of our approach, three variants of the underlying RL algorithms are used: Q-Learning, SARSA and DQN. The encouraging results show that, using the proposed configurable architecture, RL developers are able to obtain explanations about the evolution of a metric and the relationships between metrics, and to track situations of interest happening over time windows.
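As context for the three algorithm variants named above, the sketch below shows a tabular Q-Learning loop that records a timestamped event whenever the temporal-difference error exceeds a threshold, loosely mimicking the event-driven, pattern-filtered storage the abstract describes. The environment interface, event schema and threshold are our own illustrative assumptions, not ETeMoX's actual API.

```python
import random
import time

# Minimal tabular Q-Learning loop that emits one timestamped "event" per
# large temporal-difference error, mimicking the kind of filtered history an
# event-driven explanation layer could store instead of the full trace.
def q_learning_with_events(env, episodes=100, alpha=0.1, gamma=0.99,
                           epsilon=0.1, td_threshold=1.0):
    Q = {}          # (state, action) -> estimated value
    events = []     # filtered decision history
    for ep in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)                    # explore
            else:
                action = max(actions, key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)
            best_next = max((Q.get((next_state, a), 0.0)
                             for a in env.actions(next_state)), default=0.0)
            old_q = Q.get((state, action), 0.0)
            td_error = reward + gamma * best_next - old_q
            Q[(state, action)] = old_q + alpha * td_error
            # Event-driven storage: keep only pattern matches (here, large
            # TD errors) instead of the entire interaction history.
            if abs(td_error) > td_threshold:
                events.append({"time": time.time(), "episode": ep,
                               "state": state, "action": action,
                               "reward": reward, "td_error": td_error})
            state = next_state
    return Q, events
```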

Author(s):  
Ruikun Luo ◽  
Na Du ◽  
Kevin Y. Huang ◽  
X. Jessie Yang

Human-autonomy teaming is a major emphasis in the ongoing transformation of the future workspace, wherein human agents and autonomous agents are expected to work as a team. While the increasing complexity of algorithms empowers autonomous systems, one major concern arises from the human factors perspective: human agents have difficulty deciphering autonomy-generated solutions and increasingly perceive autonomy as a mysterious black box. The lack of transparency can lead to a lack of trust in autonomy and sub-optimal team performance (Chen and Barnes, 2014; Endsley, 2017; Lyons and Havig, 2014; de Visser et al., 2018; Yang et al., 2017).

In response to this concern, researchers have investigated ways to enhance autonomy transparency. Existing human factors research on autonomy transparency has largely concentrated on conveying automation reliability or likelihood/(un)certainty information (Beller et al., 2013; McGuirl and Sarter, 2006; Wang et al., 2009; Neyedli et al., 2011). Providing explanations of automation's behaviors is another way to increase transparency, one that leads to higher performance and trust (Dzindolet et al., 2003; Mercado et al., 2016). Specifically, in the context of automated vehicles, studies have shown that informing drivers of the reasons for the actions of automated vehicles decreased drivers' anxiety and increased their sense of control, preference and acceptance (Koo et al., 2014, 2016; Forster et al., 2017). However, the studies mentioned above largely focused on conveying simple likelihood information or used hand-crafted explanations, with only a few exceptions (e.g., Mercado et al., 2016). Further research is needed to examine potential design structures for autonomy transparency.

In the present study, we propose an option-centric explanation approach, inspired by research on design rationale. Design rationale is an area of design science focusing on the "representation for explicitly documenting the reasoning and argumentation that make sense of a specific artifact" (MacLean et al., 1991). The theoretical underpinning of design rationale is that what matters to designers is not just the specific artifact itself but also its other possibilities – why an artifact is designed in a particular way compared to how it might otherwise be. We aim to evaluate the effectiveness of the option-centric explanation approach on trust, dependence and team performance.

We conducted a human-in-the-loop experiment with 34 participants (age: mean = 23.7 years, SD = 2.88 years). We developed a simulated game, Treasure Hunter, in which participants and an intelligent assistant worked together to uncover a map for treasures. The intelligent assistant's ability, intent and decision-making rationale were conveyed in the option-centric rationale display. The experiment used a between-subjects design with one independent variable: whether the option-centric rationale explanation was provided. Participants were randomly assigned to one of the two explanation conditions. Participants' trust in the intelligent assistant, their confidence in accomplishing the experiment without the intelligent assistant, and their workload for the whole session were collected, as well as their scores for each map.

The results showed that by conveying the intelligent assistant's ability, intent and decision-making rationale in the option-centric rationale display, participants achieved higher task performance. With the display of all the options, participants had a better understanding and overview of the system; therefore, they could utilize the intelligent assistant more appropriately and earned a higher score. Notably, each participant played only 10 maps during the whole session; the advantages of the option-centric rationale display might be more apparent if more rounds were played in the experimental session. Although not significant at the .05 level, there was a trend suggesting lower levels of workload when the rationale explanation was displayed. Our study contributes to research on human-autonomy teaming by considering the important role of the explanation display, which can help human operators build appropriate trust and improve human-autonomy team performance.


Author(s):  
Syed Ihtesham Hussain Shah ◽  
Giuseppe De Pietro

In decision-making problems, the reward function plays an important role in finding the best policy. Reinforcement Learning (RL) provides a solution for decision-making problems under uncertainty in an Intelligent Environment (IE). However, it is difficult to specify the reward function for RL agents in large and complex problems. To counter these problems, an extension of the RL problem named Inverse Reinforcement Learning (IRL) has been introduced, in which the reward function is learned from expert demonstrations. IRL is appealing for its potential use in building autonomous agents capable of modeling others without compromising performance on the task. This approach of learning from demonstrations relies on the framework of the Markov Decision Process (MDP). This article elaborates on the original IRL algorithms along with their close variants that mitigate these challenges. The purpose of this paper is to provide an overview and the theoretical background of IRL in the fields of Machine Learning (ML) and Artificial Intelligence (AI). We also present a brief comparison between the different variants of IRL.
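For context, the feasibility condition at the heart of the original IRL formulation by Ng and Russell (2000) can be stated as follows; the notation below is ours, summarising the standard result rather than quoting the article.

```latex
% Ng & Russell (2000): in a finite MDP with transition matrices P_a and
% discount factor \gamma, where the demonstrated policy always takes
% action a_1, a reward vector R makes that policy optimal if and only if
\[
  (P_{a_1} - P_a)\,(I - \gamma P_{a_1})^{-1} R \;\succeq\; 0
  \qquad \text{for all } a \neq a_1 .
\]
% IRL algorithms then search this feasible set for an informative R,
% e.g. by maximising the margin by which the demonstrated policy beats
% the alternatives while penalising trivially small rewards.
```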


2020 ◽  
Vol 39 (6) ◽  
pp. 8427-8439
Author(s):  
Leonardo Espinosa-Leal ◽  
Anthony Chapman ◽  
Magnus Westerlund

Industry has always been in pursuit of becoming more economically efficient, and the current focus is on reducing human labour using modern technologies. Even with cutting-edge technologies, which range from packaging robots to AI for fault detection, there is still some ambiguity about the aims of some new systems, namely whether they are automated or autonomous. In this paper, we indicate the distinctions between automated and autonomous systems, review the current literature, and identify the core challenges in creating learning mechanisms for autonomous agents. We discuss how different types of extended realities, such as digital twins, can be used to train reinforcement learning agents to learn specific tasks through generalisation. Once generalisation is achieved, we discuss how these agents can be used to develop self-learning agents. We then introduce self-play scenarios and show how they can be used to teach self-learning agents in a supportive environment, focusing on how the agents can adapt to different environments. We introduce an initial prototype of our ideas by solving a multi-armed bandit problem using two ε-greedy algorithms, as sketched below. Further, we discuss future applications in the industrial management realm and propose a modular architecture for improving the decision-making process via autonomous agents.
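As a rough illustration of the prototype's setting, here is a minimal ε-greedy solver for a Bernoulli multi-armed bandit; the arm probabilities, the fixed ε and the incremental-mean update are our assumptions, and the paper's second variant could, for instance, decay ε over time instead.

```python
import random

# Minimal epsilon-greedy solver for a Bernoulli multi-armed bandit.
# Arm probabilities and the fixed epsilon are illustrative assumptions.
def epsilon_greedy_bandit(arm_probs, steps=10_000, epsilon=0.1):
    n_arms = len(arm_probs)
    counts = [0] * n_arms          # pulls per arm
    values = [0.0] * n_arms        # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                    # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
        total_reward += reward
    return values, total_reward

# Example: three arms; a decaying-epsilon variant would shrink epsilon
# as counts grow, trading exploration for exploitation over time.
if __name__ == "__main__":
    print(epsilon_greedy_bandit([0.2, 0.5, 0.7]))
```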


2021 ◽  
Vol 7 ◽  
Author(s):  
Simen Theie Havenstrøm ◽  
Adil Rasheed ◽  
Omer San

Control theory provides engineers with a multitude of tools for designing controllers that shape the closed-loop behavior and stability of dynamical systems. These methods rely heavily on insight into the mathematical model governing the physical system. However, in complex systems, such as autonomous underwater vehicles performing the dual objective of path following and collision avoidance, decision making becomes nontrivial. We propose a solution using state-of-the-art Deep Reinforcement Learning (DRL) techniques to develop autonomous agents capable of achieving this hybrid objective without a priori knowledge of the goal or the environment. Our results demonstrate the viability of DRL for path following and collision avoidance, a step towards human-level decision making in autonomous vehicle systems operating within extreme obstacle configurations.


2020 ◽  
Vol 34 (04) ◽  
pp. 5207-5215
Author(s):  
Aditya Modi ◽  
Debadeepta Dey ◽  
Alekh Agarwal ◽  
Adith Swaminathan ◽  
Besmira Nushi ◽  
...  

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in areas such as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. The challenge of system-wide optimization is combinatorial: local attempts to boost the performance of a specific module by modifying its configuration often lead to losses in the overall utility of the system's performance, as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on the fly so as to maximize the utility of the system's operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.
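To make the setting concrete, the toy sketch below treats the joint configuration of two pipeline modules as the action and the observed pipeline state as the context, with a bandit-style update tracking end-to-end utility; the module names, configuration values and environment interface are our own illustrative assumptions, not the authors' system.

```python
import itertools
import random

# Toy metareasoning loop: the joint configuration of two pipeline modules
# is the action, the observed pipeline state is the context, and a
# bandit-style update tracks end-to-end utility per (state, config) pair.
CONFIGS = list(itertools.product(["fast", "accurate"],   # e.g. detector module
                                 ["small", "large"]))    # e.g. planner module

def metareason(env, steps=1000, alpha=0.1, epsilon=0.1):
    Q = {}                                    # (state, config) -> mean utility
    state = env.observe()                     # e.g. input size, latency budget
    for _ in range(steps):
        if random.random() < epsilon:
            config = random.choice(CONFIGS)   # explore a configuration
        else:
            config = max(CONFIGS, key=lambda c: Q.get((state, c), 0.0))
        utility, next_state = env.run_pipeline(config)    # end-to-end utility
        old = Q.get((state, config), 0.0)
        Q[(state, config)] = old + alpha * (utility - old)
        state = next_state
    return Q
```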


2020 ◽  
pp. 189-217
Author(s):  
Louise Dennis ◽  
Michael Fisher

Abstract
The move towards greater autonomy presents challenges for software engineering. As we delegate greater responsibility to software systems, and as these autonomous systems can make their own decisions and take their own actions, a step change is needed in the way the systems are developed and verified. This step involves moving from considering only what the system does to also considering why it chooses to do it (since decision-making may be delegated). In this chapter, we provide an overview of our programme of work in this area: utilising hybrid agent architectures, exposing and verifying the reasons for decisions, and applying this to assessing a range of properties of autonomous systems.


Author(s):  
Fangkai Yang ◽  
Daoming Lyu ◽  
Bo Liu ◽  
Steven Gustafson

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an unfeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present PEORL, a unified framework that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in dynamic environments with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back into the symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.
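A schematic reading of the plan-learn loop described above might look like the following; this is our paraphrase of the abstract, not the authors' code, and the planner, learner and knowledge-base interfaces are assumed.

```python
# Schematic plan-learn loop in the spirit of PEORL (our paraphrase of the
# abstract): a symbolic planner proposes a plan, an RL learner executes and
# evaluates it, and learned values are fed back into the symbolic knowledge
# so the next planning round prefers empirically better plans.
def plan_learn_loop(planner, learner, knowledge, iterations=10):
    best_plan, best_quality = None, float("-inf")
    for _ in range(iterations):
        plan = planner.solve(knowledge)               # symbolic action sequence
        quality = 0.0
        for task in plan:
            value = learner.execute_and_learn(task)   # HRL option learns subtask
            knowledge.update_cost(task, value)        # feed experience back
            quality += value
        if quality > best_quality:
            best_plan, best_quality = plan, quality
    return best_plan
```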


2017 ◽  
Author(s):  
Eugenia Isabel Gorlin ◽  
Michael W. Otto

To live well in the present, we take direction from the past. Yet, individuals may engage in a variety of behaviors that distort their past and current circumstances, reducing the likelihood of adaptive problem solving and decision making. In this article, we attend to self-deception as one such class of behaviors. Drawing upon research showing both the maladaptive consequences and the self-perpetuating nature of self-deception, we propose that self-deception is an understudied risk and maintaining factor for psychopathology, and we introduce a "cognitive-integrity"-based approach that may hold promise for increasing the reach and effectiveness of our existing therapeutic interventions. Pending empirical validation of this theoretically informed approach, we posit that patients may become more informed and autonomous agents in their own therapeutic growth by becoming more honest with themselves.

