Balancing control: a Bayesian interpretation of habitual and goal-directed behavior

2019
Author(s):  
Sarah Schwöbel ◽  
Dimitrije Markovic ◽  
Michael N. Smolka ◽  
Stefan J. Kiebel

Abstract
In everyday life, our behavior varies on a continuum from automatic and habitual to deliberate and goal-directed. Recent evidence suggests that habit formation and relearning of habits operate in a context-dependent manner: habit formation is promoted when actions are performed in a specific context, while breaking a habit is facilitated after a context change. It is an open question how one can computationally model the brain’s balancing between context-specific habits and goal-directed actions. Here, we propose a hierarchical Bayesian approach to the control of a partially observable Markov decision process that enables conjoint learning of habits and reward structure in a context-specific manner. In this model, habit learning corresponds to an updating of priors over policies and interacts with the learning of outcome contingencies. Importantly, the model is built solely on probabilistic inference, which provides a simple explanation of how the brain may balance contributions of habitual and goal-directed control. We illustrate the resulting behavior using agent-based simulated experiments, in which we replicate several findings from devaluation, extinction, and renewal experiments, as well as from the so-called two-step task typically used with human participants. In addition, we show how a single parameter, the habitual tendency, can explain individual differences in habit learning and in the balancing between habitual and goal-directed control. Finally, we discuss how the proposed model relates to other habit learning models, and its implications for understanding specific phenomena in substance use disorder.
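
Since the abstract describes habit learning as an update of the prior over policies, governed by a single habitual-tendency parameter, a minimal sketch of that idea is given below. The Dirichlet pseudo-counts, the exponentiated goal-values likelihood, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch: habits as a Dirichlet prior over policies that is
# combined with a goal-directed likelihood at choice time and
# reinforced after each choice. Illustrative only.
rng = np.random.default_rng(0)
n_policies = 4
alpha = np.ones(n_policies)      # Dirichlet pseudo-counts over policies
habitual_tendency = 0.1          # assumed learning rate for the habit prior

def choose_policy(goal_values):
    """Posterior over policies ~ habit prior x goal-directed likelihood."""
    prior = alpha / alpha.sum()              # habitual contribution
    likelihood = np.exp(goal_values)         # goal-directed contribution
    posterior = prior * likelihood
    posterior /= posterior.sum()
    return rng.choice(n_policies, p=posterior)

def update_habit(chosen):
    """Habit formation: reinforce the pseudo-count of the chosen policy."""
    alpha[chosen] += habitual_tendency

for trial in range(100):
    # stand-in goal-directed values: policy 2 is consistently rewarded
    values = np.array([0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=0.1, size=n_policies)
    update_habit(choose_policy(values))

print("habit prior after training:", alpha / alpha.sum())
```

A larger habitual tendency makes the prior dominate the posterior sooner, which is one way to read the abstract's claim that this single parameter captures individual differences in the habitual versus goal-directed balance.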


2019
Author(s):  
Anthony Guanxun Chen ◽  
David Benrimoh ◽  
Thomas Parr ◽  
Karl J. Friston

Abstract
This paper offers a formal account of policy learning, or habitual behavioural optimisation, under the framework of Active Inference. In this setting, habit formation becomes an autodidactic, experience-dependent process, based upon what the agent sees itself doing. We focus on the effect of environmental volatility on habit formation by simulating artificial agents operating in a partially observable Markov decision process. Specifically, we use a ‘two-step’ maze paradigm, in which the agent has to decide whether to go left or right to secure a reward. We observe that in volatile environments with numerous reward locations, the agents learn to adopt a generalist strategy, never forming a strong habitual preference for any maze direction. Conversely, in conservative or static environments, agents adopt a specialist strategy, forming strong preferences for policies that result in approach to a small number of previously observed reward locations. The pros and cons of the two strategies are tested and discussed. In general, specialisation offers greater benefits, but only when contingencies are conserved over time. We consider the implications of this formal (Active Inference) account of policy learning for understanding the relationship between specialisation and habit formation.

Author Summary
Active inference is a theoretical framework that formalizes the behaviour of any organism in terms of a single imperative: to minimize surprise. Starting from this principle, we can construct simulations of simple “agents” (artificial organisms) that show the ability to infer causal relationships and learn. Here, we expand upon existing implementations of Active Inference by enabling synthetic agents to optimise the space of behavioural policies that they can pursue. Our results show that by adapting the probabilities of certain action sequences (which may correspond biologically to the phenomenon of synaptic plasticity), and by rejecting improbable sequences (synaptic pruning), the agents can begin to form habits. Furthermore, we show that our agents' habit formation is environment-dependent: some agents become specialised to a constant environment, while others adopt a more general strategy, each with sensible pros and cons. This work has potential applications in computational psychiatry, including behavioural phenotyping to better understand disorders.
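
The plasticity-and-pruning mechanism described in the summary can be mocked up in a few lines: Dirichlet counts over policies grow for enacted policies, and policies whose prior mass falls below a cutoff are discarded. The expected free energy is replaced here by a fixed preference landscape plus noise, and the learning rate and pruning threshold are assumptions rather than values from the paper.

```python
import numpy as np

# Hedged sketch of experience-dependent policy learning: counts grow
# for enacted policies (plasticity), improbable policies are removed
# (pruning). A stable environment should yield a "specialist" agent.
rng = np.random.default_rng(1)
e = {p: 1.0 for p in range(8)}   # Dirichlet counts over 8 candidate policies
PRUNE_BELOW = 0.02               # assumed cutoff on prior policy mass

def select_policy(G):
    """Posterior over policies ~ prior x exp(-expected free energy)."""
    total = sum(e.values())
    keys = list(e)
    ps = np.array([(e[p] / total) * np.exp(-G[p]) for p in keys])
    ps /= ps.sum()
    return keys[rng.choice(len(keys), p=ps)]

for episode in range(300):
    # stand-in expected free energy: policy 3 is reliably rewarding (static world)
    G = {p: abs(p - 3) + rng.normal(scale=0.5) for p in e}
    chosen = select_policy(G)
    e[chosen] += 0.2                                   # synaptic plasticity
    total = sum(e.values())
    e = {p: c for p, c in e.items() if c / total >= PRUNE_BELOW}  # pruning

print("surviving policies and counts:", e)
```

Raising the noise or moving the rewarding policy across episodes (a volatile world) keeps more policies above the pruning cutoff, which corresponds to the generalist strategy described above.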



Author(s):  
James M. DuBois ◽  
Beth Prusaczyk

This chapter focuses primarily on the protection of human participants in D&I studies. It begins by reviewing the Belmont principles that undergird US research regulations and considering the ethical case for D&I research. It then examines ethical issues that might arise during the course of a public health D&I research agenda in middle schools, including common ethical challenges. The chapter also discusses strategies for ethical decision-making. While these strategies may be beneficial to all researchers, the authors believe they are of particular value to dissemination and implementation researchers because the nature of their work (context-specific, complex, and unfamiliar to many peers, collaborators, and reviewers) means they will deal with uncertainty and conflict on a regular basis, and solutions to the problems they face will rarely be found through simple reference to principles, rules, or regulations.



Author(s):  
Chaochao Lin ◽  
Matteo Pozzi

Optimal exploration of engineering systems can be guided by the principle of Value of Information (VoI), which accounts for the topological importance of components, their reliability, and the management costs. For series systems, in most cases higher inspection priority should be given to unreliable components. For redundant systems such as parallel systems, analysis of one-shot decision problems shows that higher inspection priority should be given to more reliable components. This paper investigates the optimal exploration of redundant systems in long-term decision making with sequential inspection and repair. When the expected cumulative discounted cost is considered, it may become more efficient to give higher inspection priority to less reliable components, in order to preserve system redundancy. To investigate this problem, we develop a Partially Observable Markov Decision Process (POMDP) framework for sequential inspection and maintenance of redundant systems, in which VoI analysis is embedded in the optimal selection of exploratory actions. We investigate the use of alternative approximate POMDP solvers for parallel and more general systems, compare their computational complexity and performance, and show how the inspection priorities depend on the economic discount factor, the degradation rate, the inspection precision, and the repair cost.
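
The one-shot result quoted above, that in a parallel system the more reliable component can be the more valuable one to inspect, is easy to reproduce with a toy VoI computation: the value of a perfect inspection is the optimal expected cost under the prior minus the expected optimal cost after observing the component. The failure probabilities and costs below are invented for illustration and are unrelated to the paper's numerical studies.

```python
# Toy VoI for a two-component parallel system (fails only if both fail).
C_FAIL = 100.0    # assumed cost of system failure
C_REPAIR = 10.0   # assumed cost of repairing one component

def expected_cost(p1, p2, repair1, repair2):
    """Expected cost of a repair decision given failure probabilities."""
    q1 = 0.0 if repair1 else p1
    q2 = 0.0 if repair2 else p2
    return C_REPAIR * (repair1 + repair2) + C_FAIL * q1 * q2

def best_cost(p1, p2):
    """Optimal expected cost over the four repair decisions."""
    return min(expected_cost(p1, p2, r1, r2)
               for r1 in (False, True) for r2 in (False, True))

def voi_inspect_first(p1, p2):
    """Value of a perfect inspection of component 1."""
    prior = best_cost(p1, p2)
    posterior = p1 * best_cost(1.0, p2) + (1 - p1) * best_cost(0.0, p2)
    return prior - posterior

print("inspect the reliable component:  ", voi_inspect_first(0.1, 0.6))  # 5.0
print("inspect the unreliable component:", voi_inspect_first(0.6, 0.1))  # 0.0
```

In this toy case, inspecting the unreliable component is worthless because the optimal decision does not change whatever its state, whereas inspecting the reliable one does change the decision. The paper's point is that this ranking can flip once sequential inspection and discounted long-term costs are considered.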



2018
Vol 15 (02)
pp. 1850011
Author(s):  
Frano Petric ◽  
Damjan Miklić ◽  
Zdenko Kovačić

The existing procedures for autism spectrum disorder (ASD) diagnosis are often time-consuming and tiresome both for highly trained human evaluators and for children, which may be alleviated by using humanoid robots in the diagnostic process. Hence, this paper proposes a framework for robot-assisted ASD evaluation based on partially observable Markov decision process (POMDP) modeling, specifically POMDPs with mixed observability (MOMDPs). POMDPs are widely used to model optimal sequential decision-making tasks under uncertainty. Inspired by the widely accepted autism diagnostic observation schedule (ADOS), we emulate ADOS through four tasks whose models incorporate observations of multiple social cues, such as eye contact, gestures, and utterances. Relying only on those observations, the robot assesses the child’s ASD-relevant functioning level (which is partially observable) within a particular task and provides human evaluators with readable information by partitioning its belief space. Finally, we evaluate the proposed MOMDP task models and demonstrate that chaining the tasks provides fine-grained outcome quantification, which could also increase the appeal of robot-assisted diagnostic protocols in the future.
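
To make the belief-space partitioning concrete, the sketch below performs a Bayesian belief update over a hidden functioning level from binary social-cue observations, then maps the belief to an evaluator-readable outcome. The level labels, observation probabilities, and the 0.6 threshold are invented for illustration; the paper's models are derived from ADOS.

```python
import numpy as np

# Sketch: belief update over a hidden functioning level from binary
# social-cue observations, then a readable partition of the belief.
levels = ["typical", "mild", "strong"]     # assumed hidden levels
belief = np.array([1/3, 1/3, 1/3])         # uniform initial belief

# assumed P(cue observed | level)
p_cue = {"eye_contact": np.array([0.9, 0.5, 0.2]),
         "gesture":     np.array([0.8, 0.4, 0.1])}

def update(belief, cue, observed):
    """One Bayes step for a binary cue observation."""
    like = p_cue[cue] if observed else 1.0 - p_cue[cue]
    post = like * belief
    return post / post.sum()

def readable(belief):
    """Partition the belief simplex into evaluator-readable outcomes."""
    return levels[belief.argmax()] if belief.max() > 0.6 else "inconclusive"

for cue, seen in [("eye_contact", False), ("gesture", False)]:
    belief = update(belief, cue, seen)

print(np.round(belief, 3), "->", readable(belief))
```

Chaining several such tasks, each contributing its own observations to the belief, is what yields the fine-grained outcome quantification mentioned in the abstract.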





Author(s):  
Chuande Liu ◽  
Chuang Yu ◽  
Bingtuan Gao ◽  
Syed Awais Ali Shah ◽  
Adriana Tapus

Abstract
Telemanipulation in power stations commonly requires robots to first open doors and then gain access to a new workspace. However, opened doors can easily be closed by disturbances, interrupting operations and potentially leading to collision damage. Although existing telemanipulation follows a highly efficient master–slave work pattern owing to human-in-the-loop control, it is not trivial for a user to specify the optimal measures that guarantee safety. This paper investigates the safety-critical motion planning and control problem of balancing robotic safety against manipulation performance during work emergencies. Based on the dynamic workspace released by door closing, the interactions between the workspace and the robot are analyzed using a partially observable Markov decision process, so that the balancing mechanism is executed as belief tree planning. To enact this planning, in addition to telemanipulation actions, we define three further safety-guaranteeing actions for self-protection: on guard, defense, and escape, triggered by estimated collision risk levels. Our experiments show that the proposed method is capable of determining multiple solutions that balance robotic safety and work efficiency during telemanipulation tasks.
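
As an illustration of the risk-triggered action selection described above, the sketch below maintains a belief that the door is closing, updated from a noisy binary sensor, and maps the estimated collision risk onto the four action classes. The sensor model, the use of the belief itself as the risk level, and the thresholds are all assumptions for illustration.

```python
def update_door_belief(belief, sensor_says_moving, hit=0.85, false_alarm=0.15):
    """One Bayes step for P(door is closing) from a binary door sensor."""
    like = hit if sensor_says_moving else 1.0 - hit
    like_not = false_alarm if sensor_says_moving else 1.0 - false_alarm
    post = like * belief
    return post / (post + like_not * (1.0 - belief))

def select_action(risk):
    """Map estimated collision risk to the four action classes."""
    if risk < 0.25:
        return "telemanipulate"   # continue the master-slave task
    if risk < 0.50:
        return "on_guard"         # slow down and monitor the door
    if risk < 0.80:
        return "defense"          # brace or reposition the arm
    return "escape"               # retreat from the closing workspace

belief = 0.1
for reading in [False, True, True, True]:
    belief = update_door_belief(belief, reading)
    print(f"P(closing)={belief:.2f} -> {select_action(belief)}")
```

In the paper this trade-off is resolved by belief tree planning over the POMDP rather than by fixed thresholds; the sketch only shows how estimated risk levels can trigger the self-protective actions.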


