Flexibility in valenced reinforcement learning computations across development

10.31234/osf.io/5f9uc ◽

2021 ◽

Author(s):

Kate Nussenbaum ◽

Juan A. Velez ◽

Bradli T. Washington ◽

Hannah E. Hamling ◽

Catherine A. Hartley

Keyword(s):

Reinforcement Learning ◽

Mixed Race ◽

Positive Outcomes ◽

Negative Outcomes ◽

Optimal Integration ◽

Adolescents And Adults ◽

Context Dependent ◽

Expected Outcomes ◽

Better Than

Optimal integration of positive and negative outcomes during learning varies depending on an environment’s reward statistics. The present study investigated the extent to which children, adolescents, and adults (N = 142; 8- to 25-year-olds; 55% female, 42% White, 31% Asian, 17% mixed race, and 8% Black) adapt their weighting of better-than-expected and worse-than-expected outcomes when learning from reinforcement. Participants made a series of choices across two contexts: one in which weighting positive outcomes more heavily than negative outcomes led to better performance, and one in which the reverse was true. Reinforcement learning modeling revealed that across age, participants shifted their valence biases in accordance with the structure of the environment. Exploratory analyses revealed increases in context-dependent flexibility with age.
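
The idea of weighting better-than-expected and worse-than-expected outcomes differently can be illustrated with a value update that uses one learning rate for positive prediction errors and another for negative ones. The following is a minimal sketch under that assumption, not the model fitted in the study; the function names, parameter values, and reward sequence are hypothetical.

```python
def update_value(value, reward, alpha_pos, alpha_neg):
    """One value update with a learning rate chosen by prediction-error sign."""
    prediction_error = reward - value
    # Assumption: a zero prediction error is grouped with the positive case.
    alpha = alpha_pos if prediction_error >= 0 else alpha_neg
    return value + alpha * prediction_error


def learn(rewards, alpha_pos, alpha_neg, initial_value=0.5):
    """Apply the update to a sequence of rewards and return the final value."""
    value = initial_value
    for reward in rewards:
        value = update_value(value, reward, alpha_pos, alpha_neg)
    return value


# Hypothetical reward sequence: the same outcomes, learned with opposite biases.
rewards = [1, 0, 1, 0, 1, 0]
optimistic = learn(rewards, alpha_pos=0.4, alpha_neg=0.1)
pessimistic = learn(rewards, alpha_pos=0.1, alpha_neg=0.4)
print(optimistic, pessimistic)
```

With identical outcomes, the learner that weights positive prediction errors more heavily ends with a higher value estimate than the learner with the opposite bias, which is why the better-performing bias can depend on the reward statistics of the environment.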

Valence biases in reinforcement learning shift across adolescence and modulate subsequent memory

10.31234/osf.io/n3vsr ◽

2020 ◽

Author(s):

Gail Rosenbaum ◽

Hannah Grassie ◽

Catherine A. Hartley

Keyword(s):

Reinforcement Learning ◽

Trial And Error ◽

Independent Dataset ◽

Age Related ◽

Subjective Value ◽

Valuation Process ◽

Incidental Memory ◽

Past Experiences ◽

Expected Outcomes ◽

Better Than

Individuals learn differently through trial and error, with some more influenced by good outcomes, and others weighting bad outcomes more heavily. Such valence biases may also influence memory for past experiences. Here, we examined whether valence asymmetries in reinforcement learning change across adolescence, and whether individual learning asymmetries bias the content of subsequent memory. Participants ages 8-27 learned the values of “point machines”, after which their memory for trial-unique images presented with choice outcomes was assessed. Relative to children and adults, adolescents overweighted worse-than-expected outcomes during learning. Individuals’ valence biases modulated incidental memory, such that those who prioritized worse- (or better-) than-expected outcomes during learning were also more likely to remember images paired with these outcomes, an effect reproduced in an independent dataset. Collectively, these results highlight age-related changes in the computation of subjective value, and demonstrate that a valence-asymmetric valuation process influences how information is prioritized in episodic memory.

Children are full of optimism, but those rose-tinted glasses are fading: reduced learning from negative outcomes drives hyperoptimism in children.

10.1101/2021.06.29.450349 ◽

2021 ◽

Author(s):

Johanna Habicht ◽

Aislinn Bowler ◽

Madeleine E Moses-Payne ◽

Tobias U. Hauser

Keyword(s):

Age Groups ◽

Computational Modelling ◽

Learning Task ◽

Positive Outcomes ◽

Late Adolescents ◽

Negative Outcomes ◽

Optimism Bias ◽

The Face ◽

Expected Outcomes

Believing that good things will happen in life is essential to maintain motivation and achieve highly ambitious goals. This optimism bias, the overestimation of positive outcomes, may be particularly important during childhood when motivation must be maintained in the face of negative outcomes. In a learning task, we therefore studied the mechanisms underlying the development of optimism bias. Investigating children (8-9 year-olds), early adolescents (12-13 year-olds), and late adolescents (16-17 year-olds), we find a consistent optimism bias across age groups. However, children were particularly hyperoptimistic, with the optimism bias decreasing with age. Using computational modelling, we show that this was driven by reduced learning from worse-than-expected outcomes, and this reduced learning explains why children are hyperoptimistic. Our findings thus show that insensitivity to bad outcomes in childhood helps to prevent an overly realistic perspective and to maintain motivation.

Testing a micro-genesis account of longer-form reinforcement learning (win-calmness and loss-restlessness)

10.31234/osf.io/r485q ◽

2021 ◽

Author(s):

Ahad Asad ◽

Ben Dyson

Keyword(s):

Reinforcement Learning ◽

Simple Game ◽

Positive Outcomes ◽

Group Level ◽

Negative Outcomes ◽

Decision Space ◽

Learning Principles

Fundamental reinforcement learning principles such as win-stay and lose-shift represent outcome-action associations between consecutive trials (trial n-1 and n). Longer-form expressions of the tendency to continually repeat previous actions following positive outcomes, and the tendency to continually change previous actions following negative outcomes, have been identified as win-calmness and lose-restlessness, respectively. Across 10 experiments, we tested a micro-genesis account of these phenomena by examining sequential contingencies across trial n-2, n-1 and n using simple game spaces. At a group level, we found no evidence of win-calmness and lose-restlessness when wins could not be maximized (unexploitable opponent). Similarly, we found no evidence of win-calmness and lose-restlessness when the threat of win minimization was presented (exploiting opponent). In contrast, we found evidence of win-calmness (but not lose-restlessness) when win maximization was made possible (exploitable opponent). At a participant level, we confirm that individual win rates determined the degree of win-calmness and lose-restlessness only in contexts where win rates could be maximized. The data identify the mechanisms that allow for the development of longer-form reinforcement learning principles and demonstrate the relative flexibility in decision-space afforded by positive outcomes, and the relative inflexibility in decision-space following negative outcomes.
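
Win-stay and lose-shift, as described above, are contingencies between the outcome on trial n-1 and the action on trial n. A minimal sketch of how such rates could be computed from a sequence of choices is shown here; it is an illustration only, not the authors' analysis, and the action labels and data are hypothetical. The longer-form measures across trials n-2, n-1, and n are not implemented.

```python
def win_stay_lose_shift(actions, wins):
    """Return (win-stay rate, lose-shift rate) over consecutive trial pairs."""
    stays_after_win = shifts_after_loss = n_wins = n_losses = 0
    # Pair each trial's action and outcome with the action on the next trial.
    for prev_action, prev_win, action in zip(actions, wins, actions[1:]):
        if prev_win:
            n_wins += 1
            stays_after_win += action == prev_action
        else:
            n_losses += 1
            shifts_after_loss += action != prev_action
    win_stay = stays_after_win / n_wins if n_wins else float("nan")
    lose_shift = shifts_after_loss / n_losses if n_losses else float("nan")
    return win_stay, lose_shift


# Hypothetical data: five trials with three possible actions.
actions = ["A", "A", "B", "C", "C"]
wins = [True, False, True, False, True]
print(win_stay_lose_shift(actions, wins))
```

The outcome of the final trial is ignored because no later action follows it.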

Try and try again: Post-error boost of an implicit measure of agency

Quarterly Journal of Experimental Psychology ◽

10.1080/17470218.2017.1350871 ◽

2018 ◽

Vol 71 (7) ◽

pp. 1584-1595 ◽

Cited By ~ 10

Author(s):

Steven Di Costa ◽

Héloïse Théro ◽

Valérian Chambon ◽

Patrick Haggard

Keyword(s):

Reinforcement Learning ◽

Subjective Experience ◽

Action Selection ◽

Quantitative Model ◽

Sense Of Agency ◽

Implicit Measure ◽

Positive Outcomes ◽

Negative Outcomes ◽

Intentional Binding ◽

Reward Probability

The sense of agency refers to the feeling that we control our actions and, through them, effects in the outside world. Reinforcement learning provides an important theoretical framework for understanding why people choose to make particular actions. Few previous studies have considered how reinforcement and learning might influence the subjective experience of agency over actions and outcomes. In two experiments, participants chose between two action alternatives, which differed in reward probability. Occasional reversals of action–reward mapping required participants to monitor outcomes and adjust action selection processing accordingly. We measured shifts in the perceived times of actions and subsequent outcomes (‘intentional binding’) as an implicit proxy for sense of agency. In the first experiment, negative outcomes showed stronger binding towards the preceding action, compared to positive outcomes. Furthermore, negative outcomes were followed by increased binding of actions towards their outcome on the following trial. Experiment 2 replicated this post-error boost in action binding and showed that it only occurred when people could learn from their errors to improve action choices. We modelled the post-error boost using an established quantitative model of reinforcement learning. The post-error boost in action binding correlated positively with participants’ tendency to learn more from negative outcomes than from positive outcomes. Our results suggest a novel relation between sense of agency and reinforcement learning, in which sense of agency is increased when negative outcomes trigger adaptive changes in subsequent action selection processing.

Reality Negotiation

The Oxford Handbook of Positive Psychology ◽

10.1093/oxfordhb/9780195187243.013.0045 ◽

2009 ◽

pp. 474-482

Author(s):

Raymond L. Higgins ◽

Matthew W. Gallagher

Keyword(s):

Social Support ◽

Individual Differences ◽

Positive Psychology ◽

Social Constructionist ◽

Positive Outcomes ◽

Negative Outcomes ◽

Negotiation Research ◽

The Social ◽

Coping Processes

This chapter presents an overview of the development and status of the reality negotiation construct and relates it to a variety of coping processes. The reality negotiation construct follows from the social constructionist tradition and first appeared in discussions of how excuses protect self-images by decreasing the causal linkage to negative outcomes. The reality negotiation construct was later expanded to include a discussion of how the process of hoping may be used to increase perceived linkage to positive outcomes. In the two decades since these constructs were first introduced, four individual differences measures have been developed, and the effects of these reality negotiation techniques have been studied extensively. Reality negotiation techniques can be both maladaptive and adaptive and have been shown to be associated with coping and social support in a variety of populations. The chapter concludes by highlighting a few areas in which reality negotiation research could expand to further its relevance and applicability to the field of positive psychology.

Reinforcement learning versus swarm intelligence for autonomous multi-HAPS coordination

SN Applied Sciences ◽

10.1007/s42452-021-04658-6 ◽

2021 ◽

Vol 3 (6) ◽

Author(s):

Ogbonnaya Anicho ◽

Philip B. Charlesworth ◽

Gurvinder S. Baicher ◽

Atulya K. Nagar

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Swarm Intelligence ◽

Performance Indicators ◽

Convergence Rates ◽

Tuning Parameters ◽

Continuous State Space ◽

Continuous State ◽

User Coverage ◽

Better Than

This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work which looked at various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge within this work by using partitioning to manage the high dimensionality problem. This enabled comparing the performance of the classical cases of both RL and SI, establishing a baseline for future comparisons of improved versions. From previous work, SI was observed to perform better across various key performance indicators. However, after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, it was observed that the SI algorithm still maintained superior coordination capability by achieving higher mean overall user coverage (about 20% better than the RL algorithm), in addition to faster convergence rates. Though the RL technique showed better average peak user coverage, the unpredictable coverage dip was a key weakness, making SI a more suitable algorithm within the context of this work.
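
Partitioning a continuous state space so that a tabular RL method can index it can be sketched as follows. The abstract does not specify the partitioning scheme, so this example assumes a uniform grid over a hypothetical two-dimensional state; the bounds and bin counts are illustrative only.

```python
def discretize(state, lows, highs, bins_per_dim):
    """Map a continuous state vector to a tuple of bin indices (uniform grid)."""
    indices = []
    for x, lo, hi, n_bins in zip(state, lows, highs, bins_per_dim):
        fraction = (x - lo) / (hi - lo)
        index = int(fraction * n_bins)
        # Clamp so that out-of-range values fall into the edge bins.
        indices.append(min(max(index, 0), n_bins - 1))
    return tuple(indices)


# Hypothetical 2-D state with bounds [0, 10] on each axis.
lows, highs = (0.0, 0.0), (10.0, 10.0)
bins_per_dim = (4, 5)
print(discretize((2.5, 9.9), lows, highs, bins_per_dim))
print(discretize((12.0, -3.0), lows, highs, bins_per_dim))
```

Coarser grids shrink the table but blur distinct states together, which is one reason the number of bins per dimension has to be chosen empirically.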

The rational use of causal inference to guide reinforcement learning strengthens with age

10.31234/osf.io/j9zuk ◽

2019 ◽

Author(s):

Alexandra O. Cohen ◽

Kate Nussenbaum ◽

Hayley Dorfman ◽

Samuel J. Gershman ◽

Catherine A. Hartley

Keyword(s):

Reinforcement Learning ◽

Causal Structure ◽

Learning Task ◽

Negative Events ◽

Shape Learning ◽

Adolescents And Adults ◽

Bayesian Reinforcement Learning ◽

External Causes ◽

Reinforcement Learning Models ◽

Best Fit

Beliefs about the controllability of positive or negative events in the environment can shape learning throughout the lifespan. Previous research has shown that adults’ learning is modulated by beliefs about the causal structure of the environment such that they will update their value estimates to a lesser extent when the outcomes can be attributed to hidden causes. The present study examined whether external causes similarly influenced outcome attributions and learning across development. Ninety participants, ages 7 to 25 years, completed a reinforcement learning task in which they chose between two options with fixed reward probabilities. Choices were made in three distinct environments in which different hidden agents occasionally intervened to generate positive, negative, or random outcomes. Participants’ beliefs about hidden-agent intervention aligned well with the true probabilities of positive, negative, or random outcome manipulation in each of the three environments. Computational modeling of the learning data revealed that while the choices made by both adults (ages 18 - 25) and adolescents (ages 13 - 17) were best fit by Bayesian reinforcement learning models that incorporate beliefs about hidden-agent intervention, those of children (ages 7 - 12) were best fit by a one-learning-rate model that updates value estimates based on choice outcomes alone. Together, these results suggest that while children demonstrate explicit awareness of the causal structure of the task environment, they do not implicitly use beliefs about the causal structure of the environment to guide reinforcement learning in the same manner as adolescents and adults.

Reinforcement learning based anti-collision algorithm for RFID systems

International Journal of Computing ◽

10.47839/ijc.18.2.1414 ◽

2019 ◽

pp. 155-168

Author(s):

Murukesan Loganathan ◽

Thennarasan Sabapathy ◽

Mohamed Elobaid Elshaikh ◽

Mohamed Nasrun Osman ◽

Rosemizi Abd Rahim ◽

...

Keyword(s):

Reinforcement Learning ◽

Radio Frequency Identification ◽

Energy Efficient ◽

Current Standard ◽

Tag Identification ◽

Rfid Systems ◽

Order Of Magnitude ◽

Control Message ◽

Frequency Identification ◽

Better Than

An efficient collision arbitration protocol facilitates fast tag identification in radio frequency identification (RFID) systems. The EPCGlobal-Class1-Generation2 (EPC-C1G2) protocol is the current standard for collision arbitration in commercial RFID systems. However, the main drawback of this protocol is that it requires excessive message exchanges between tags and the reader for its operation. This wastes the energy of already resource-constrained RFID readers. Hence, in this work, a reinforcement learning based anti-collision protocol (RL-DFSA) is proposed to address the energy-efficient collision arbitration problem in RFID systems. The proposed algorithm continuously learns and adapts to changes in the environment by devising an optimal policy. The proposed RL-DFSA was evaluated through extensive simulations and compared with the variants of the EPC-C1G2 algorithm that are currently used in commercial readers. Based on the results, it is concluded that RL-DFSA performs as well as or better than the EPC-C1G2 protocol in delay, throughput, and time system efficiency when simulated for sparse and dense environments, while requiring an order of magnitude fewer control message exchanges between the reader and the tags.
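
To make the collision arbitration problem concrete, the sketch below simulates one frame in which each tag replies in a randomly chosen slot, so a slot ends up idle, successful (one tag), or collided (several tags). It assumes a framed slotted ALOHA style of arbitration, which the abstract does not spell out, and it does not implement RL-DFSA or EPC-C1G2; the function name and parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_frame(n_tags, frame_size, rng):
    """Return (idle, success, collision) slot counts for one frame."""
    # Each tag independently picks one slot in the frame.
    slot_counts = Counter(rng.randrange(frame_size) for _ in range(n_tags))
    success = sum(1 for count in slot_counts.values() if count == 1)
    collision = sum(1 for count in slot_counts.values() if count > 1)
    idle = frame_size - len(slot_counts)
    return idle, success, collision


rng = random.Random(0)  # Seeded so that repeated runs give the same result.
idle, success, collision = simulate_frame(n_tags=20, frame_size=16, rng=rng)
print(idle, success, collision)
```

A frame that is too small for the tag population wastes slots on collisions, and one that is too large wastes them on idle slots; choosing the frame size is the kind of decision an adaptive policy would have to make.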

Intention-Outcome Asymmetry Effect

10.31234/osf.io/4rfmc ◽

2019 ◽

Author(s):

Arunima Sarin ◽

David Lagnado ◽

Paul Burgess

Keyword(s):

Moral Judgment ◽

Moral Agent ◽

Positive Outcomes ◽

Negative Outcomes ◽

Asymmetry Effect

Knowledge of intention and outcome is integral to making judgments of responsibility, blame, and causality. Yet, little is known about the effect of conflicting intentions and outcomes on these judgments. In a series of four experiments, we combine good and bad intentions with positive and negative outcomes, presenting these through everyday moral scenarios. Our results demonstrate an asymmetry in responsibility, causality, and blame judgments for the two incongruent conditions: well-intentioned agents are regarded as more morally and causally responsible for negative outcomes than ill-intentioned agents are for positive outcomes. This novel effect of an intention-outcome asymmetry identifies an unexplored aspect of moral judgment and is partially explained by extra inferences that participants make about the actions of the moral agent.

Learning to cooperate in solving the traveling salesman problem

International Journal of Neural Systems ◽

10.1142/s0129065705000153 ◽

2005 ◽

Vol 15 (01n02) ◽

pp. 151-162 ◽

Cited By ~ 4

Author(s):

Dehu Qi ◽

Ron Sun

Keyword(s):

Reinforcement Learning ◽

Traveling Salesman Problem ◽

Single Agent ◽

Traveling Salesman ◽

Cooperative Team ◽

The Traveling Salesman Problem ◽

Better Than

A cooperative team of agents may perform many tasks better than single agents. The question is how cooperation among self-interested agents should be achieved. It is important that, while we encourage cooperation among agents in a team, we maintain autonomy of individual agents as much as possible, so as to maintain flexibility and generality. This paper presents an approach based on bidding utilizing reinforcement values acquired through reinforcement learning. We tested and analyzed this approach and demonstrated that a team indeed performed better than the best single agent as well as the average of single agents.
