Testing a micro-genesis account of longer-form reinforcement learning (win-calmness and lose-restlessness)
Fundamental reinforcement learning principles such as win-stay and lose-shift represent outcome-action associations between consecutive trials (trials n-1 and n). Longer-form expressions of the tendency to continually repeat previous actions following positive outcomes, and to continually change previous actions following negative outcomes, have been identified as win-calmness and lose-restlessness, respectively. Across 10 experiments, we tested a micro-genesis account of these phenomena by examining sequential contingencies across trials n-2, n-1, and n using simple game spaces. At the group level, we found no evidence of win-calmness or lose-restlessness when wins could not be maximized (unexploitable opponent). Similarly, we found no evidence of win-calmness or lose-restlessness when the threat of win minimization was present (exploiting opponent). In contrast, we found evidence of win-calmness (but not lose-restlessness) when win maximization was made possible (exploitable opponent). At the participant level, we confirmed that individual win rates determined the degree of win-calmness and lose-restlessness only in contexts where win rates could be maximized. The data identify the mechanisms that allow longer-form reinforcement learning principles to develop, and demonstrate the relative flexibility in decision-space afforded by positive outcomes and the relative inflexibility in decision-space following negative outcomes.
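To make the three-trial contingency concrete, the sketch below shows one way such sequences could be coded. It is a minimal illustration only, not the paper's analysis pipeline: the function names, outcome labels, and the example choice sequence are all assumptions introduced here for clarity. Win-calmness is operationalized as repeating the same action across trials n-2, n-1, and n after two consecutive wins; lose-restlessness as changing action at both transitions after two consecutive losses.

```python
def longer_form_rates(choices, outcomes):
    """Estimate win-calmness and lose-restlessness from three-trial
    contingencies (trials n-2, n-1, n). Hypothetical coding scheme:
    outcomes are "win"/"loss" strings aligned with choices."""
    calm = restless = win_pairs = loss_pairs = 0
    for n in range(2, len(choices)):
        two_wins = outcomes[n - 2] == "win" and outcomes[n - 1] == "win"
        two_losses = outcomes[n - 2] == "loss" and outcomes[n - 1] == "loss"
        stayed_both = choices[n] == choices[n - 1] == choices[n - 2]
        shifted_both = (choices[n] != choices[n - 1]
                        and choices[n - 1] != choices[n - 2])
        if two_wins:
            win_pairs += 1
            calm += stayed_both      # repeated stay after two wins
        elif two_losses:
            loss_pairs += 1
            restless += shifted_both  # repeated shift after two losses
    return {"win-calmness": calm / win_pairs if win_pairs else None,
            "lose-restlessness": restless / loss_pairs if loss_pairs else None}

# Illustrative data from a rock-paper-scissors-like game space
choices = ["R", "R", "R", "P", "S", "S"]
outcomes = ["win", "win", "loss", "loss", "win", "win"]
print(longer_form_rates(choices, outcomes))
# -> {'win-calmness': 1.0, 'lose-restlessness': 1.0}
```

By contrast, classic win-stay/lose-shift coding would condition each trial-n action on the single preceding trial (n-1) only; the longer-form measures above require the same behavioural tendency to persist across two successive transitions.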