Better Than Maximum Likelihood Estimation of Model-Based and Model-Free Learning Style

Author(s):  
Sadjad Yazdani ◽  
Abdol-Hossein Vahabie ◽  
Babak Nadjar Araabi ◽  
Majid Nili Ahmadabadi

Abstract Various decision-making systems work together to shape human behavior. Habitual and goal-directed systems are the two most important ones studied by reinforcement learning (RL), using model-free and model-based learning methods, respectively. Human behavior resembles a weighted combination of these two systems, modeled as the weighted sum of the action values of the model-based and model-free systems. The weighting parameter has mostly been extracted by "maximum likelihood" or "maximum a posteriori" estimation methods. In this study, we show that these two well-known methods bring many challenges and that the values they extract are less reliable, especially with limited sample sizes or near extreme values. We propose that using the k-nearest neighbor (k-NN) method, as a free-format estimator, can reduce the estimation error. k-NN uses global information extracted from behavior, such as stay probability, along with the fitted values. The proposed method is examined in simulated experiments, where the results indicate the advantage of our method in reducing both the bias and the variance of the error. Investigation of human behavior data from previous studies shows that the proposed method yields more statistically robust estimates when predicting other behavioral indices, such as the number of gaze directions toward each target or symptoms of some psychiatric disorders. In brief, the proposed method increases the reliability of the estimated parameters and enhances the applicability of reinforcement learning paradigms in clinical trials.
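
A minimal sketch of the hybrid valuation described in the abstract, assuming a softmax choice rule; the function names, the inverse temperature, and the example values are illustrative rather than taken from the paper:

```python
import numpy as np

def hybrid_action_values(q_mb, q_mf, w):
    """Weighted sum of model-based and model-free action values.

    q_mb, q_mf : action values from each system
    w          : weighting parameter in [0, 1]; w = 1 is purely model-based.
    """
    return w * np.asarray(q_mb) + (1.0 - w) * np.asarray(q_mf)

def choice_probabilities(q_hybrid, beta):
    """Softmax choice rule with inverse temperature beta."""
    z = beta * q_hybrid
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Example: two actions, a moderately model-based agent (w = 0.7)
q = hybrid_action_values(q_mb=[0.8, 0.2], q_mf=[0.3, 0.6], w=0.7)
print(choice_probabilities(q, beta=3.0))
```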

2018 ◽  
Author(s):  
Sadjad Yazdani ◽  
Abdol-Hossein Vahabie ◽  
Babak Nadjar Araabi ◽  
Majid Nili Ahmadabadi

Abstract Multiple decision-making systems work together to shape the final choices in human behavior. Habitual and goal-directed systems are the two most important systems studied in the reinforcement learning (RL) literature by model-free and model-based learning methods. Human behavior resembles a weighted combination of these systems, and such a combination is modeled by a weighted summation of the action values from the model-based and model-free systems. The extraction of this weighting parameter, which is important for many applications and for computational modeling, has mostly been based on maximum likelihood or maximum a posteriori methods. We show that these methods bring many challenges and that their extracted values are less reliable, especially in the proximity of extreme values. We propose that a free-format learning method (k-nearest neighbor), which uses information beyond the fitted values, e.g., global information such as stay probability instead of trial-by-trial information, can ameliorate the estimation error. The proposed method is examined by simulation, and the results show its advantage. In addition, investigation of human behavior data from previous studies shows that the proposed method yields more statistically robust results in predicting other behavioral indices, such as the number of gaze directions toward each target. In brief, the proposed method increases the reliability of the estimated parameters and enhances the applicability of reinforcement learning paradigms in clinical trials.
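
The k-NN idea above can be illustrated with a small, hypothetical sketch: summary statistics such as stay probabilities from simulated agents with known weights serve as the training set, and the weight of a new subject is read off its nearest neighbors. The feature set, the training values, and k are assumptions for illustration only; for brevity only stay-probability features are shown:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical training set: each row holds summary statistics
# (e.g., stay probability after rewarded-common, rewarded-rare,
#  unrewarded-common, unrewarded-rare trials) from simulated agents
# whose true weighting parameter w is known.
X_train = np.array([
    [0.85, 0.55, 0.45, 0.70],   # strongly model-based agent, w = 0.9
    [0.80, 0.75, 0.40, 0.42],   # strongly model-free agent,  w = 0.1
    [0.82, 0.65, 0.43, 0.55],   # mixed agent,                w = 0.5
])
w_train = np.array([0.9, 0.1, 0.5])

knn = KNeighborsRegressor(n_neighbors=1)   # k is illustrative
knn.fit(X_train, w_train)

# Estimate w for a new subject from the same summary statistics.
x_subject = np.array([[0.83, 0.62, 0.44, 0.58]])
print(knn.predict(x_subject))
```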


Author(s):  
Vinamra Jain ◽  
Prashant Doshi ◽  
Bikramjit Banerjee

The problem of learning an expert’s unknown reward function from a limited number of demonstrations recorded from the expert’s behavior is investigated in the area of inverse reinforcement learning (IRL). To gain traction in this challenging and underconstrained problem, IRL methods predominantly represent the expert’s reward function as a linear combination of known features. Most existing IRL algorithms either assume the availability of a transition function or provide a complex and inefficient approach to learn it. In this paper, we present a model-free approach to IRL, which casts IRL in the maximum likelihood framework. We present modifications of model-free Q-learning that replace its maximization, allowing the gradient of the Q-function to be computed. We use gradient ascent to update the feature weights to maximize the likelihood of the expert’s trajectories. We demonstrate on two problem domains that our approach improves the likelihood compared to previous methods.
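
A hedged sketch of the approach described above, under the assumption that the reward is linear in known features and the expert follows a Boltzmann policy; the soft value update and the finite-difference gradient are illustrative stand-ins, not the authors' implementation:

```python
import numpy as np

def soft_q_from_demos(theta, phi, demos, n_states, n_actions,
                      gamma=0.95, alpha=0.1, sweeps=50):
    """Model-free soft Q-learning on the demonstrated transitions only.

    phi[s, a] is the feature vector of (s, a); reward is linear: r = phi[s, a] @ theta.
    The max over next actions is replaced by a log-sum-exp ("soft" max),
    which keeps the Q-function smooth in theta.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for (s, a, s_next) in demos:
            r = phi[s, a] @ theta
            soft_v = np.log(np.sum(np.exp(Q[s_next])))   # soft value of next state
            Q[s, a] += alpha * (r + gamma * soft_v - Q[s, a])
    return Q

def log_likelihood(theta, phi, demos, n_states, n_actions):
    """Log-likelihood of the demonstrations under a Boltzmann expert policy."""
    Q = soft_q_from_demos(theta, phi, demos, n_states, n_actions)
    ll = 0.0
    for (s, a, _) in demos:
        p = np.exp(Q[s]) / np.exp(Q[s]).sum()
        ll += np.log(p[a])
    return ll

def fit_reward_weights(phi, demos, n_states, n_actions, steps=100, lr=0.5, eps=1e-4):
    """Gradient ascent on the demonstration log-likelihood (finite differences)."""
    theta = np.zeros(phi.shape[-1])
    for _ in range(steps):
        base = log_likelihood(theta, phi, demos, n_states, n_actions)
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            t = theta.copy(); t[i] += eps
            grad[i] = (log_likelihood(t, phi, demos, n_states, n_actions) - base) / eps
        theta += lr * grad
    return theta
```

Here `demos` is assumed to be a list of observed `(state, action, next_state)` tuples and `phi` a `[n_states, n_actions, n_features]` feature array; an analytic gradient would replace the finite-difference loop in practice.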


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have successfully been applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion method of both model-based and model-free reinforcement learning. PPOMM considers not only information from past experience but also predictive information about the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. We evaluate PPOMM across 49 Atari games in the Arcade Learning Environment (ALE), where it outperforms the state-of-the-art PPO algorithm for most games: the experimental results show that PPOMM performs better than or on par with the original algorithm in 33 of the 49 games.
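
A minimal sketch of the kind of combined objective described above; the clipping threshold, the model coefficient, and the mean-squared latent prediction error are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))

def model_prediction_error(predicted_next_latent, true_next_latent):
    """Error of the latent transition model, here mean squared error."""
    return np.mean((predicted_next_latent - true_next_latent) ** 2)

def combined_objective(ratio, advantage, predicted_next_latent, true_next_latent,
                       model_coef=0.1):
    """PPO surrogate minus a weighted model-based prediction error."""
    return (ppo_clip_loss(ratio, advantage)
            - model_coef * model_prediction_error(predicted_next_latent,
                                                  true_next_latent))
```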


2021 ◽  
Vol 8 ◽  
Author(s):  
Huan Zhao ◽  
Junhua Zhao ◽  
Ting Shu ◽  
Zibin Pan

Buildings account for a large proportion of the total energy consumption in many countries, and almost half of that consumption is caused by Heating, Ventilation, and Air-Conditioning (HVAC) systems. Model predictive control of HVAC is a complex task due to the dynamic properties of the system and its environment, such as temperature and electricity prices. Deep reinforcement learning (DRL) is a model-free method that utilizes a “trial and error” mechanism to learn the optimal policy. However, learning efficiency and learning cost are the main obstacles to putting DRL into practice. To overcome this problem, a hybrid-model-based DRL method is proposed for the HVAC control problem. First, a specific MDP is defined that accounts for energy cost, temperature violation, and action violation. Then the hybrid-model-based DRL method is proposed, which utilizes both a knowledge-driven model and a data-driven model throughout the learning process. Finally, a protection mechanism and reward-adjustment methods are used to further reduce the learning cost. The proposed method is tested in a simulation environment using Australian Energy Market Operator (AEMO) electricity price data and New South Wales temperature data. Simulation results show that 1) the DRL method reduces the energy cost while keeping the temperature satisfactory compared with the short-term MPC method, and 2) the proposed method improves learning efficiency and reduces the learning cost during the learning process compared with the model-free method.
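
A small sketch of a reward of the kind described (energy cost plus temperature-violation and action-violation penalties); the comfort band, penalty weights, and time step are illustrative assumptions:

```python
def hvac_reward(power_kw, price_per_kwh, indoor_temp, action_change,
                comfort_low=21.0, comfort_high=24.0,
                temp_weight=5.0, action_weight=0.5, dt_hours=0.5):
    """Negative of energy cost plus weighted comfort and action penalties."""
    energy_cost = power_kw * dt_hours * price_per_kwh
    temp_violation = max(comfort_low - indoor_temp, 0.0) + \
                     max(indoor_temp - comfort_high, 0.0)
    action_violation = abs(action_change)    # penalize abrupt setpoint changes
    return -(energy_cost + temp_weight * temp_violation
             + action_weight * action_violation)

# Example: half-hour step, 3 kW draw at $0.25/kWh, slightly too warm room
print(hvac_reward(power_kw=3.0, price_per_kwh=0.25,
                  indoor_temp=24.8, action_change=1.0))
```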


2019 ◽  
Author(s):  
Allison Letkiewicz ◽  
Amy L. Cochran ◽  
Josh M. Cisler

Trauma and trauma-related disorders are characterized by altered learning styles. Two learning processes that have been delineated using computational modeling are model-free and model-based reinforcement learning (RL), characterized by trial-and-error learning and by goal-driven, rule-based learning, respectively. Prior research suggests that model-free RL is disrupted among individuals with a history of assaultive trauma and may contribute to altered fear responding. Currently, it is unclear whether model-based RL, which involves building abstract and nuanced representations of stimulus-outcome relationships to prospectively predict action-related outcomes, is also impaired among individuals who have experienced trauma. The present study sought to test the hypothesis of impaired model-based RL among adolescent females exposed to assaultive trauma. Participants (n=60) completed a three-arm bandit RL task during fMRI acquisition. Two computational models compared the degree to which each participant’s task behavior fit the use of a model-free versus model-based RL strategy. Overall, a greater portion of participants’ behavior was better captured by the model-based than by the model-free RL model. Although assaultive trauma did not predict learning strategy use, greater sexual abuse severity predicted less use of model-based compared to model-free RL. Additionally, severe sexual abuse predicted less left frontoparietal network encoding of model-based RL updates, which was not accounted for by PTSD. Given the significant impact that sexual trauma has on mental health and other aspects of functioning, it is plausible that altered model-based RL is an important route through which clinical impairment emerges.
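
A hedged sketch of per-participant model comparison of the sort described above, using BIC as an example penalized-fit criterion (the study's actual fitting procedure may differ); the numbers are illustrative:

```python
import numpy as np

def bic(log_likelihood, n_params, n_trials):
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * np.log(n_trials) - 2.0 * log_likelihood

def preferred_model(ll_model_free, k_free, ll_model_based, k_based, n_trials):
    """Return which model better captures one participant's choices."""
    bic_free = bic(ll_model_free, k_free, n_trials)
    bic_based = bic(ll_model_based, k_based, n_trials)
    return "model-based" if bic_based < bic_free else "model-free"

# Example: 180 bandit trials; model-based fits better despite an extra parameter
print(preferred_model(ll_model_free=-110.0, k_free=2,
                      ll_model_based=-104.0, k_based=3, n_trials=180))
```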

