Model-free LQR design by Q-function learning

Automatica ◽  
2022 ◽  
Vol 137 ◽  
pp. 110060
Author(s):  
Milad Farjadnasab ◽  
Maryam Babazadeh
Author(s):  
Vinamra Jain ◽  
Prashant Doshi ◽  
Bikramjit Banerjee

Inverse reinforcement learning (IRL) investigates the problem of learning an expert’s unknown reward function from a limited number of demonstrations of the expert’s behavior. To gain traction on this challenging and underconstrained problem, IRL methods predominantly represent the expert’s reward function as a linear combination of known features. Most existing IRL algorithms either assume that a transition function is available or provide a complex and inefficient approach to learning it. In this paper, we present a model-free approach to IRL, which casts IRL in the maximum likelihood framework. We present modifications of model-free Q-learning that replace its maximization so that the gradient of the Q-function can be computed. We use gradient ascent to update the feature weights to maximize the likelihood of the expert’s trajectories. We demonstrate on two problem domains that our approach improves the likelihood compared to previous methods.
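The idea of replacing the hard maximization in Q-learning with a smooth operator can be illustrated with a minimal sketch. The abstract does not say which replacement the authors use, so the log-sum-exp soft maximum below is an assumption, and the names `soft_max_value`, `soft_q_update`, `linear_reward`, and the sharpness parameter `beta` are hypothetical.

```python
import math


def soft_max_value(q_values, beta=5.0):
    """Smooth stand-in for max(q_values): (1/beta) * log(sum(exp(beta * q))).

    Assumption: a log-sum-exp operator; it approaches the hard max as
    beta grows and, unlike max, is differentiable everywhere.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(beta * (q - m)) for q in q_values)) / beta


def linear_reward(weights, features):
    """Reward as a linear combination of known features."""
    return sum(w * f for w, f in zip(weights, features))


def soft_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, beta=5.0):
    """One tabular Q-learning step with the max replaced by soft_max_value."""
    target = r + gamma * soft_max_value(Q[s_next], beta)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Usage: two states, two actions; the reward comes from hypothetical weights.
Q = [[0.0, 0.0], [1.0, 2.0]]
r = linear_reward([0.5, 0.5], [1.0, 1.0])  # equals 1.0
soft_q_update(Q, s=0, a=1, r=r, s_next=1, alpha=0.5, gamma=0.9, beta=50.0)
```

Because the update contains no hard `max`, the Q-values are smooth in the feature weights, which is what makes a gradient-ascent step on the likelihood of demonstrated trajectories possible in principle. The gradient computation itself is omitted here.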


2020 ◽  
Vol 43 ◽  
Author(s):  
Peter Dayan

Bayesian decision theory provides a simple formal elucidation of some of the ways that representation and representational abstraction are involved with, and exploit, both prediction and its rather distant cousin, predictive coding. Both model-free and model-based methods are involved.


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.


2020 ◽  
Vol 1 (1) ◽  
Author(s):  
Nunu Nugraha Purnawan

Student assessment of lecturer performance through an online Lecturer Performance Assessment Instrument Questionnaire (KIPKD) is in line with the concept of Green Computing, which calls for using computer hardware and software better, more efficiently, and more usefully. The online KIPKD at POLSUB uses Google Forms because it has an attractive and responsive interface, offers a fairly complete choice of question types, is free, and presents results in a neatly organized form that can be analyzed easily. The research method is a literature review of books and journals on the use of Google Forms as a medium for building survey and data-collection questionnaires, and on the concept of Green Computing. Field data were collected by observing the system running in POLSUB's academic unit. The use of the online KIPKD illustrates that POLSUB participates in preserving the environment by saving 12 reams of paper per year, equivalent to 12 tree trunks.

