Effect of lysergic acid diethylamide (LSD) on reinforcement learning in humans

2020 ◽  
Author(s):  
Jonathan W. Kanen ◽  
Qiang Luo ◽  
Mojtaba R. Kandroodi ◽  
Rudolf N. Cardinal ◽  
Trevor W. Robbins ◽  
...  

Abstract: The non-selective serotonin 2A (5-HT2A) receptor agonist lysergic acid diethylamide (LSD) holds promise as a treatment for some psychiatric disorders. Psychedelic drugs such as LSD have been suggested to have therapeutic actions through their effects on learning. The behavioural effects of LSD in humans, however, remain largely unexplored. Here we examined how LSD affects probabilistic reversal learning in healthy humans. Conventional measures assessing sensitivity to immediate feedback (“win-stay” and “lose-shift” probabilities) were unaffected, whereas LSD increased the impact of the strength of initial learning on perseveration. Computational modelling revealed that the most pronounced effect of LSD was enhancement of the reward learning rate. The punishment learning rate was also elevated. Increased reinforcement learning rates suggest LSD induced a state of heightened plasticity. These results indicate a potential mechanism through which revision of maladaptive associations could occur.
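
As a rough illustration of the quantities described above, the sketch below simulates a two-armed probabilistic reversal task with a Rescorla–Wagner learner that uses separate reward and punishment learning rates, then computes the conventional “win-stay” and “lose-shift” probabilities. It is a minimal toy model under assumed parameter values, not the authors' task or analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_reversal_task(alpha_rew, alpha_pun, beta, n_trials=200, reversal_at=100):
    """Two-armed probabilistic reversal task with a Rescorla-Wagner learner
    using separate learning rates for rewarded and unrewarded (punished) trials."""
    q = np.zeros(2)
    p_reward = np.array([0.8, 0.2])           # option 0 is initially better
    choices, outcomes = [], []
    for t in range(n_trials):
        if t == reversal_at:                   # contingencies reverse mid-task
            p_reward = p_reward[::-1]
        p_choose_1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        c = int(rng.random() < p_choose_1)
        r = int(rng.random() < p_reward[c])    # 1 = reward, 0 = no reward / punishment
        alpha = alpha_rew if r == 1 else alpha_pun
        q[c] += alpha * (r - q[c])             # prediction-error update
        choices.append(c)
        outcomes.append(r)
    return np.array(choices), np.array(outcomes)

def win_stay_lose_shift(choices, outcomes):
    """Model-agnostic measures of sensitivity to immediate feedback."""
    stay = choices[1:] == choices[:-1]
    win = outcomes[:-1] == 1
    return stay[win].mean(), (~stay[~win]).mean()   # P(win-stay), P(lose-shift)

choices, outcomes = simulate_reversal_task(alpha_rew=0.4, alpha_pun=0.2, beta=5.0)
print(win_stay_lose_shift(choices, outcomes))
```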

Algorithms ◽  
2020 ◽  
Vol 13 (9) ◽  
pp. 239 ◽  
Author(s):  
Menglin Li ◽  
Xueqiang Gu ◽  
Chengyi Zeng ◽  
Yuan Feng

Reinforcement learning, as a branch of machine learning, has gradually been applied in the control field. In practice, however, hyperparameters for deep reinforcement learning networks are still set through the empirical, trial-and-error approach of traditional machine learning (supervised and unsupervised learning). This approach ignores part of the information that agents generate while exploring the environment and that is contained in the updates of the reinforcement learning value function, which affects both convergence and the cumulative return. The reinforcement learning algorithm based on dynamic parameter adjustment is a new method for setting the learning rate of deep reinforcement learning. Building on the traditional way of setting reinforcement learning parameters, this method analyses the advantages of different learning rates at different stages of learning and dynamically adjusts the learning rate using the temporal-difference (TD) error, so that the benefits of different learning rates can be exploited at the appropriate stage and the algorithm becomes more practical in real applications. At the same time, by combining the Robbins–Monro stochastic approximation conditions with the deep reinforcement learning algorithm, it is shown that dynamically regulating the learning rate can, in theory, satisfy the convergence requirements of the intelligent control algorithm. In the experiments, the method is analysed on the continuous control scenario of the standard "Car-on-the-Hill" reinforcement learning benchmark, and it is verified that the new method achieves better results than traditional reinforcement learning in practical application. Based on the model characteristics of deep reinforcement learning, a more suitable method for setting the learning rate of the deep reinforcement learning network is proposed. The feasibility of the method has been demonstrated both in theory and in application. The method of setting the learning rate parameter is therefore worthy of further development and research.
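
The abstract does not spell out the adjustment rule, so the following is a hedged sketch of the general idea: map the magnitude of the TD error onto the learning rate used by a tabular TD(0) update, so that large errors (early, unstable learning) produce larger steps and small errors (near convergence) produce smaller ones. The mapping function, its bounds, and the parameter names are assumptions for illustration; any such schedule must still satisfy the Robbins–Monro conditions (step sizes that sum to infinity while their squares sum to a finite value) for the convergence argument to hold.

```python
import numpy as np

def td_adaptive_lr(td_error, lr_min=1e-4, lr_max=1e-2, scale=1.0):
    """Illustrative mapping from TD-error magnitude to learning rate:
    larger errors -> larger steps, smaller errors -> smaller steps."""
    return lr_min + (lr_max - lr_min) * np.tanh(scale * abs(td_error))

def td0_update(V, s, r, s_next, gamma=0.99):
    """One tabular TD(0) value update using the dynamically chosen learning rate."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += td_adaptive_lr(td_error) * td_error
    return td_error
```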


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Michiyo Sugawara ◽  
Kentaro Katahira

Abstract: The learning rate is a key parameter in reinforcement learning that determines the extent to which novel information (outcome) is incorporated in guiding subsequent actions. Numerous studies have reported that the magnitude of the learning rate in human reinforcement learning is biased depending on the sign of the reward prediction error. However, this asymmetry can be observed as a statistical bias if the fitted model ignores the choice autocorrelation (perseverance), which is independent of the outcomes. Therefore, to investigate the genuine process underlying human choice behavior using empirical data, one should dissociate asymmetry in learning and perseverance from choice behavior. The present study addresses this issue by using a Hybrid model incorporating asymmetric learning rates and perseverance. First, by conducting simulations, we demonstrate that the Hybrid model can identify the true underlying process. Second, using the Hybrid model, we show that empirical data collected from a web-based experiment are governed by perseverance rather than asymmetric learning. Finally, we apply the Hybrid model to two open datasets in which asymmetric learning was reported. As a result, the asymmetric learning rate was validated in one dataset but not in the other.
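
One plausible reading of the "Hybrid" model described here, written as a negative log-likelihood that could be fitted to two-alternative choice data: asymmetric learning rates for positive and negative prediction errors, plus a decaying choice trace that captures perseverance. The exact parameterisation is an assumption for illustration, not the authors' code.

```python
import numpy as np

def hybrid_model_nll(params, choices, outcomes):
    """Negative log-likelihood of a hybrid model with asymmetric learning
    rates (alpha_pos, alpha_neg) and a gradual choice trace (perseverance)."""
    alpha_pos, alpha_neg, beta, phi, tau = params
    q = np.zeros(2)          # action values
    c_trace = np.zeros(2)    # decaying memory of past choices
    nll = 0.0
    for c, r in zip(choices, outcomes):
        logits = beta * q + phi * c_trace
        p = np.exp(logits - logits.max())
        p /= p.sum()
        nll -= np.log(p[c] + 1e-12)
        delta = r - q[c]
        q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta
        c_trace += tau * (np.eye(2)[c] - c_trace)
    return nll
```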


2021 ◽  
Author(s):  
Stefano Palminteri

Do we preferentially learn from outcomes that confirm our choices? This is one of the most basic, and yet consequence-bearing, questions concerning reinforcement learning. In recent years, we investigated this question in a series of studies implementing increasingly complex behavioral protocols. The learning rates fitted in experiments featuring partial or complete feedback, as well as free and forced choices, were systematically found to be consistent with a choice-confirmation bias. This result is robust across a broad range of outcome contingencies and response modalities. One of the prominent behavioral consequences of the confirmatory learning rate pattern is choice hysteresis: that is, the tendency to repeat previous choices despite contradictory evidence. As robust and replicable as they have proven to be, these findings were (legitimately) challenged by a couple of studies pointing out that a choice-confirmatory pattern of learning rates may spuriously arise from not including an explicit choice autocorrelation term in the model. In the present study, we re-analyze data from four previously published papers (in total nine experiments; N=363), originally included in the studies demonstrating (or criticizing) the choice-confirmation bias in human participants. We fitted two models: one featured valence-specific updates (i.e., different learning rates for confirmatory and disconfirmatory outcomes) and one additionally included an explicit choice autocorrelation process (gradual perseveration). Our analysis confirms that the inclusion of the gradual perseveration process in the model significantly reduces the estimated choice-confirmation bias. However, in all considered experiments, the choice-confirmation bias remains present at the meta-analytical level and significantly different from zero in most experiments. Our results demonstrate that the choice-confirmation bias resists the inclusion of an explicit choice autocorrelation term, thus proving to be a robust feature of human reinforcement learning. We conclude by discussing the psychological plausibility of the gradual perseveration process in the context of these behavioral paradigms and by pointing to additional computational processes that may play an important role in estimating and interpreting the computational biases under scrutiny.
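
The model comparison reported above can be sketched generically: fit each candidate model (e.g., valence-specific updates with and without a gradual-perseveration term) by maximum likelihood from several random starting points and compare penalised fits. This helper is schematic, not the study's pipeline; nll_fn stands for any negative log-likelihood function such as the hybrid-model example shown earlier.

```python
import numpy as np
from scipy.optimize import minimize

def fit_and_bic(nll_fn, bounds, data, n_restarts=10, seed=0):
    """Maximum-likelihood fit from several random starts; returns the BIC
    (lower is better). nll_fn(params, *data) must return the negative
    log-likelihood, and data is a tuple such as (choices, outcomes)."""
    rng = np.random.default_rng(seed)
    best_nll = np.inf
    for _ in range(n_restarts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        res = minimize(nll_fn, x0, args=data, bounds=bounds, method="L-BFGS-B")
        best_nll = min(best_nll, res.fun)
    n_trials = len(data[0])
    return len(bounds) * np.log(n_trials) + 2.0 * best_nll
```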


2018 ◽  
Author(s):  
Vincent Moens ◽  
Alexandre Zénon

Abstract: Agents living in volatile environments must be able to detect changes in contingencies while refraining from adapting to unexpected events that are caused by noise. In Reinforcement Learning (RL) frameworks, this requires learning rates that adapt to the past reliability of the model. The observation that behavioural flexibility in animals tends to decrease following prolonged training in stable environments provides experimental evidence for such adaptive learning rates. However, in classical RL models, the learning rate is either fixed or scheduled and thus cannot adapt dynamically to environmental changes. Here, we propose a new Bayesian learning model, using variational inference, that achieves adaptive change detection through Stabilized Forgetting, updating its current belief based on a mixture of fixed, initial priors and previous posterior beliefs. The weight given to these two sources is optimized alongside the other parameters, allowing the model to adapt dynamically to changes in environmental volatility and to unexpected observations. This approach is used to implement the “critic” of an actor-critic RL model, while the actor samples the resulting value distributions to choose which action to undertake. We show that our model can emulate different adaptation strategies to contingency changes, depending on its prior assumptions about environmental stability, and that model parameters can be fit to real data with high accuracy. The model also exhibits trade-offs between flexibility and computational costs that mirror those observed in real data. Overall, the proposed method provides a general framework to study learning flexibility and decision making in RL contexts.
Author summary: In stable contexts, animals and humans exhibit automatic behaviour that allows them to make fast decisions. However, these automatic processes exhibit a lack of flexibility when environmental contingencies change. In the present paper, we propose a model of behavioural automatization that is based on adaptive forgetting and that emulates these properties. The model builds an estimate of the stability of the environment and uses this estimate to adjust its learning rate and the balance between exploration and exploitation policies. The model performs Bayesian inference on latent variables that represent relevant environmental properties, such as reward functions, optimal policies or environment stability. From there, the model makes decisions in order to maximize long-term rewards, with noise proportional to environmental uncertainty. This rich model encompasses many aspects of Reinforcement Learning (RL), such as Temporal Difference RL and counterfactual learning, and accounts for the reduced computational cost of automatic behaviour. Using simulations, we show that this model leads to interesting predictions about the efficiency with which subjects adapt to sudden changes of contingencies after prolonged training.
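
Stabilized Forgetting amounts to mixing the current belief with a fixed initial prior before each update, so that the model never becomes irreversibly confident. Below is a toy Beta–Bernoulli analogue of that idea; the mixing weight w plays the role of the optimized weight described in the abstract, whereas the paper itself uses a variational mixture model over richer latent variables (reward functions, policies, stability).

```python
import numpy as np

def stabilized_forgetting_update(a, b, outcome, w, a0=1.0, b0=1.0):
    """Toy Beta-Bernoulli analogue of Stabilized Forgetting: the belief is
    first mixed with a fixed initial prior (weight 1 - w), then updated with
    the new Bernoulli outcome; this keeps the effective learning rate from collapsing."""
    a = w * a + (1.0 - w) * a0
    b = w * b + (1.0 - w) * b0
    return a + outcome, b + (1 - outcome)

# Example: the estimated reward probability tracks a mid-run contingency reversal.
a, b, w = 1.0, 1.0, 0.95
rng = np.random.default_rng(1)
for t in range(400):
    p_true = 0.8 if t < 200 else 0.2
    a, b = stabilized_forgetting_update(a, b, int(rng.random() < p_true), w)
print(a / (a + b))   # close to 0.2 after the reversal
```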


2020 ◽  
Vol 6 ◽  
pp. 205032452094234
Author(s):  
JEC Anthony ◽  
A Winstock ◽  
JA Ferris ◽  
DJ Nutt

It is well documented that psychedelic drugs can have a profound effect on colour perception. After previous research involving psychedelic drug ingestion, several participants had written to the authors describing how symptoms of their colour blindness had improved. The Global Drugs Survey runs the world’s largest annual online drug survey. In the Global Drugs Survey 2017, participants reporting the use of lysergic acid diethylamide or psilocybin in the last 12 months were asked: “We have received reports from some people with colour-blindness that this improves after they use psychedelics. If you have experienced such an effect can you please describe it in the box below, say what drug you took and how long the effect lasted.” We received 47 responses that could be usefully categorised, of which 23 described improved colour blindness. Commonly cited drugs were LSD and psilocybin; however, several other psychedelic compounds were also listed. Some respondents reported that the changes in colour blindness persisted for periods ranging from several days to years. Improved colour blindness may be a result of new photisms experienced in the psychedelic state aligning with pre-existing concepts of colour and thereby being ascribed a label. Connections between visual and linguistic cortical areas may be enhanced by the disorder in the brain’s neural connections induced by psychedelics, allowing these new photisms and concepts to become linked. This paper provides preliminary data regarding improved colour blindness accompanying recreational psychedelic use, which may be investigated further in future iterations of the Global Drugs Survey or in a stand-alone Global Drugs Survey-managed psychedelics survey.


2021 ◽  
Author(s):  
Mojtaba Rostami Kandroodi ◽  
Abdol-Hossein Vahabie ◽  
Sara Ahmadi ◽  
Babak Nadjar Araabi ◽  
Majid Nili Ahmadabadi

Abstract: The ability to predict the future is essential for decision-making and interaction with the environment, allowing agents to avoid punishment and gain reward. Reinforcement learning algorithms provide a normative framework for interactive learning, especially in volatile environments. The optimal strategy for the classic reinforcement learning model is to increase the learning rate as volatility increases. Inspired by optimistic bias in humans, an alternative reinforcement learning model has been developed by adding a punishment learning rate to the classic reinforcement learning model. In this study, we aim to (1) compare the performance of these two models in interaction with different environments, and (2) find optimal parameters for the models. Our simulations indicate that having two different learning rates for rewards and punishments increases performance in a volatile environment. Investigation of the optimal parameters shows that in almost all environments, a higher reward learning rate than punishment learning rate is beneficial for achieving higher performance, which in this case means accumulating more reward. Our results suggest that to achieve high performance, we need a shorter memory window for recent rewards and a longer memory window for punishments. This is consistent with optimistic bias in human behavior.
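
The link between learning rate and "memory window" follows from the fact that a constant learning rate makes the value estimate an exponentially weighted average of past outcomes: an outcome k trials back receives weight alpha(1 - alpha)^k, so the effective window is roughly 1/alpha trials. A short sketch with illustrative values only:

```python
import numpy as np

def outcome_weights(alpha, n_back=10):
    """Weight given to an outcome observed k trials ago under a constant
    learning rate alpha (delta-rule / exponential recency weighting)."""
    k = np.arange(n_back)
    return alpha * (1.0 - alpha) ** k

print(outcome_weights(0.5, 5))   # high (reward-like) rate: short memory window
print(outcome_weights(0.1, 5))   # low (punishment-like) rate: long memory window
```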


2019 ◽  
Author(s):  
Anne Kühnel ◽  
Vanessa Teckentrup ◽  
Monja P. Neuser ◽  
Quentin J. M. Huys ◽  
Caroline Burrasch ◽  
...  

Abstract: When facing decisions to approach rewards or to avoid punishments, we often figuratively go with our gut, and the impact of metabolic states such as hunger on motivation is well documented. However, whether and how vagal feedback signals from the gut influence instrumental actions is unknown. Here, we investigated the effect of non-invasive transcutaneous vagus nerve stimulation (tVNS) vs. sham (randomized cross-over design) on approach and avoidance behavior using an established go/no-go reinforcement learning paradigm (Guitart-Masip et al., 2012) in 39 healthy participants after an overnight fast. First, mixed-effects logistic regression analysis of choice accuracy showed that tVNS acutely impaired decision-making, p = .045. Computational reinforcement learning models identified the cause of this as a reduction in the learning rate under tVNS (Δα = −0.092, p_boot = .002), particularly after punishment (Δα_Pun = −0.081, p_boot = .012 vs. Δα_Rew = −0.031, p = .22). However, tVNS had no effect on go biases, Pavlovian response biases or response times. Hence, tVNS appeared to influence learning rather than action execution. These results highlight a novel role of vagal afferent input in modulating reinforcement learning by tuning the learning rate according to homeostatic needs.
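
The paradigm cited (Guitart-Masip et al., 2012) is typically modelled with action weights that combine instrumental values with a constant go bias and a Pavlovian bias scaled by state value; a simplified sketch of that choice rule is below, while the reported tVNS effect concerns the learning rate used to update the instrumental values, particularly after punishment. The exact model variant fitted in this study may differ.

```python
import numpy as np

def go_probability(q_go, q_nogo, v_state, go_bias, pav_bias):
    """Simplified choice rule of the go/no-go RL model family: instrumental
    values plus a constant go bias and a Pavlovian bias scaling state value.
    Parameterisation is illustrative; the fitted variant may differ."""
    w_go = q_go + go_bias + pav_bias * v_state
    w_nogo = q_nogo
    logits = np.array([w_go, w_nogo])
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p[0]   # probability of making a "go" response
```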


2020 ◽  
Author(s):  
Jo Cutler ◽  
Marco Wittmann ◽  
Ayat Abdurahman ◽  
Luca Hargitai ◽  
Daniel Drew ◽  
...  

Abstract: Reinforcement learning is a fundamental mechanism displayed by many species, from mice to humans. However, adaptive behaviour depends not only on learning associations between actions and outcomes that affect ourselves but, critically, also on outcomes that affect other people. Existing studies suggest that reinforcement learning ability declines across the lifespan and that self-relevant learning can be computationally separated from learning about rewards for others, yet how older adults learn which actions reward others is unknown. Here, using computational modelling of a probabilistic reinforcement learning task, we tested whether young (age 18-36) and older (age 60-80; total n=152) adults can learn to gain rewards for themselves, another person (prosocial), or neither individual (control). Detailed model comparison showed that a computational model with separate learning rates best explained how people learn associations for different recipients. Young adults were faster to learn when their actions benefitted themselves, compared to when they helped others. Strikingly, however, older adults showed reduced self-bias, with a relative increase in the rate at which they learnt about actions that helped others compared to themselves. Moreover, we find evidence that these group differences are associated with changes in psychopathic traits over the lifespan. In older adults, psychopathic traits were significantly reduced and negatively correlated with prosocial learning rates. Importantly, older people with the lowest levels of psychopathy had the highest prosocial learning rates. These findings suggest that learning how our actions help others is preserved across the lifespan, with implications for our understanding of reinforcement learning mechanisms and theoretical accounts of healthy ageing.
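
The winning model class, with a separate learning rate for each recipient, can be sketched as follows; the recipient labels and parameter values are illustrative assumptions, not the fitted estimates.

```python
import numpy as np

def recipient_rl_update(q, choice, reward, recipient, alphas):
    """Prediction-error update with a separate learning rate per recipient
    (self, prosocial other, or no one), as in the best-fitting model class."""
    delta = reward - q[recipient][choice]
    q[recipient][choice] += alphas[recipient] * delta
    return delta

q = {r: np.zeros(2) for r in ("self", "other", "no_one")}
alphas = {"self": 0.30, "other": 0.20, "no_one": 0.10}   # illustrative values only
recipient_rl_update(q, choice=0, reward=1, recipient="other", alphas=alphas)
```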


2020 ◽  
Author(s):  
Liyu Xia ◽  
Sarah L Master ◽  
Maria K Eckstein ◽  
Beth Baribault ◽  
Ronald E Dahl ◽  
...  

Abstract: In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest that probabilistic learning may be inefficient in youth compared to adults [1], while others suggest it may be more efficient in youth who are in mid-adolescence [2, 3]. Here we used a probabilistic reinforcement learning task to test how youth aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants’ performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by (1) an increase in learning rate (i.e. a decrease in the integration time horizon) and (2) a decrease in noisy/exploratory choices. In mid-adolescence (age 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
Author summary: Adolescence is a time of great uncertainty. It is also a critical time for brain development, learning, and decision making in social and educational domains. There are currently contradictory findings about learning in adolescence. We sought to better isolate how learning from stable probabilistic contingencies changes during adolescence with a task that previously showed interesting results in adolescents. We collected a relatively large sample (297 participants) across a wide age range (8-30) to trace the adolescent developmental trajectory of learning under stable but uncertain conditions. We found that age in our sample was positively associated with higher learning rates and lower choice exploration. Within narrow age bins, we found that higher saliva testosterone levels were associated with higher learning rates in participants aged 13-15 years. These findings can help us better isolate the trajectory of maturation of core learning and decision making processes during adolescence.
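
A minimal sketch of the kind of model favoured here: a softmax choice rule whose inverse temperature governs noisy/exploratory choices, combined with an update in which negative outcomes have minimal to no impact (a near-zero learning rate for negative prediction errors). Parameter names and values are illustrative, not the fitted hierarchical estimates.

```python
import numpy as np

def trial(q, reward_probs, beta, alpha_pos, alpha_neg, rng):
    """One trial: softmax choice (lower beta = noisier, more exploratory),
    then an asymmetric update in which negative outcomes barely matter."""
    logits = beta * q
    p = np.exp(logits - logits.max())
    p /= p.sum()
    c = rng.choice(len(q), p=p)
    r = int(rng.random() < reward_probs[c])
    delta = r - q[c]
    q[c] += (alpha_pos if delta > 0 else alpha_neg) * delta
    return c, r

rng = np.random.default_rng(2)
q = np.zeros(2)
for _ in range(100):
    trial(q, reward_probs=[0.75, 0.25], beta=4.0, alpha_pos=0.3, alpha_neg=0.0, rng=rng)
print(q)
```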

