Pupil Diameter Predicts Changes in the Exploration–Exploitation Trade-off: Evidence for the Adaptive Gain Theory

2011 ◽  
Vol 23 (7) ◽  
pp. 1587-1596 ◽  
Author(s):  
Marieke Jepma ◽  
Sander Nieuwenhuis

The adaptive regulation of the balance between exploitation and exploration is critical for the optimization of behavioral performance. Animal research and computational modeling have suggested that changes in exploitative versus exploratory control state in response to changes in task utility are mediated by the neuromodulatory locus coeruleus–norepinephrine (LC–NE) system. Recent studies have suggested that utility-driven changes in control state correlate with pupil diameter, and that pupil diameter can be used as an indirect marker of LC activity. We measured participants' pupil diameter while they performed a gambling task with a gradually changing payoff structure. Each choice in this task can be classified as exploitative or exploratory using a computational model of reinforcement learning. We examined the relationship between pupil diameter, task utility, and choice strategy (exploitation vs. exploration), and found that (i) exploratory choices were preceded by a larger baseline pupil diameter than exploitative choices; (ii) individual differences in baseline pupil diameter were predictive of an individual's tendency to explore; and (iii) changes in pupil diameter surrounding the transition between exploitative and exploratory choices correlated with changes in task utility. These findings provide novel evidence that pupil diameter correlates closely with control state, and are consistent with a role for the LC–NE system in the regulation of the exploration–exploitation trade-off in humans.
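To illustrate how a choice might be labelled exploitative or exploratory from a reinforcement-learning model, here is a minimal sketch. It assumes a simple delta-rule value learner and calls a choice exploitative when it selects the option with the highest current value estimate; the abstract does not specify the model the authors used, and the function name, learning rate, and tie-breaking rule are hypothetical.

```python
from typing import List, Sequence


def classify_choices(
    choices: Sequence[int],
    payoffs: Sequence[float],
    n_options: int,
    learning_rate: float = 0.3,   # assumed value, not taken from the study
    initial_value: float = 0.0,   # assumed starting estimate for every option
) -> List[str]:
    """Label each choice 'exploit' or 'explore' under a delta-rule learner."""
    values = [initial_value] * n_options
    labels: List[str] = []
    for choice, payoff in zip(choices, payoffs):
        # Exploit = the chosen option has the highest estimated value
        # (ties count as exploitation in this sketch).
        best_value = max(values)
        labels.append("exploit" if values[choice] == best_value else "explore")
        # Delta-rule update of the chosen option only.
        values[choice] += learning_rate * (payoff - values[choice])
    return labels


if __name__ == "__main__":
    # Hypothetical two-option session, for illustration only.
    print(classify_choices([0, 1, 0, 0], [10, 2, 10, 10], n_options=2))
```

With labels of this kind, baseline pupil diameter on each trial could then be compared between the two classes.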

2020 ◽  
Vol 10 (21) ◽  
pp. 7462
Author(s):  
Jesús Enrique Sierra-García ◽  
Matilde Santos

In this work, a pitch controller of a wind turbine (WT) inspired by reinforcement learning (RL) is designed and implemented. The control system consists of a state estimator, a reward strategy, a policy table, and a policy update algorithm. Novel reward strategies related to the energy deviation from the rated power are defined. They are designed to improve the efficiency of the WT. Two new categories of reward strategies are proposed: “only positive” (O-P) and “positive-negative” (P-N) rewards. The relationship of these categories with the exploration-exploitation dilemma, the use of ϵ-greedy methods and the learning convergence are also introduced and linked to the WT control problem. In addition, an extensive analysis of the influence of the different rewards in the controller performance and in the learning speed is carried out. The controller is compared with a proportional-integral-derivative (PID) regulator for the same small wind turbine, obtaining better results. The simulations show how the P-N rewards improve the performance of the controller, stabilize the output power around the rated power, and reduce the error over time.
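The components named above (reward strategy, policy table, ϵ-greedy selection, policy update) can be sketched as follows. The abstract does not give the exact reward definitions, so the forms below are assumptions: the O-P reward pays 1 when the absolute power error shrinks and 0 otherwise, while the P-N reward also returns -1 when the error grows. All function names and the learning rate are hypothetical.

```python
import random
from typing import List


def reward_only_positive(prev_error: float, error: float) -> float:
    """Assumed O-P form: reward an improvement, never punish."""
    return 1.0 if abs(error) < abs(prev_error) else 0.0


def reward_positive_negative(prev_error: float, error: float) -> float:
    """Assumed P-N form: reward an improvement, punish a deterioration."""
    if abs(error) < abs(prev_error):
        return 1.0
    if abs(error) > abs(prev_error):
        return -1.0
    return 0.0


def epsilon_greedy(action_values: List[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=action_values.__getitem__)


def update_table(table: List[List[float]], state: int, action: int,
                 reward: float, learning_rate: float = 0.1) -> None:
    """Move the stored value for (state, action) toward the observed reward."""
    table[state][action] += learning_rate * (reward - table[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    table = [[0.0, 0.0, 0.0]]           # one state, three hypothetical pitch actions
    action = epsilon_greedy(table[0], epsilon=0.1, rng=rng)
    update_table(table, 0, action, reward_positive_negative(5.0, 3.0))
    print(action, table)
```

Under these assumed definitions, only the P-N variant can lower a table entry when an action makes the error worse; with O-P rewards every entry can only stay the same or grow.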


2011 ◽  
Vol 23 (4) ◽  
pp. 923-935 ◽  
Author(s):  
Sarah E. Forster ◽  
Cameron S. Carter ◽  
Jonathan D. Cohen ◽  
Raymond Y. Cho

Mechanisms by which the brain monitors and modulates performance are an important focus of recent research. The conflict-monitoring hypothesis posits that the ACC detects conflict between competing response pathways which, in turn, signals for enhanced control. The N2, an ERP component that has been localized to ACC, has been observed after high conflict stimuli. As a candidate index of the conflict signal, the N2 would be expected to be sensitive to the degree of response conflict present, a factor that depends on both the features of external stimuli and the internal control state. In the present study, we sought to explore the relationship between N2 amplitude and these variables through use of a modified Eriksen flankers task in which target–distracter compatibility was parametrically varied. We hypothesized that greater target–distracter incompatibility would result in higher levels of response conflict, as indexed by both behavior and the N2 component. Consistent with this prediction, there were parametric degradations in behavioral performance and increases in N2 amplitudes with increasing incompatibility. Further, increasingly incompatible stimuli led to the predicted parametric increases in control on subsequent incompatible trials as evidenced by enhanced performance and reduced N2 amplitudes. These findings suggest that the N2 component and associated behavioral performance are finely sensitive to the degree of response conflict present and to the control adjustments that result from modulations in conflict.


Decision ◽  
2016 ◽  
Vol 3 (2) ◽  
pp. 115-131 ◽  
Author(s):  
Helen Steingroever ◽  
Ruud Wetzels ◽  
Eric-Jan Wagenmakers

Author(s):  
Tian Wu ◽  
Danyan Hu ◽  
Qingfen Wang

Background: Noni (Morinda citrifolia Linn.) is a tropical tree that bears climacteric fruit. Previous observations and research have shown that the second day after harvest (2 d) is the most important demarcation point: the fruit still has the same appearance as freshly picked fruit (0 d) but is beginning to develop a water-spot appearance. We performed a conjoint analysis of metabolome and transcriptome data for noni fruit at 0 d and 2 d to reveal what happens to the fruit at the molecular level. Genes and metabolites were annotated to KEGG pathways, and the co-annotated KEGG pathways were used for statistical analysis. Results: Through an integrative analysis of transcriptomics and metabolomics, we found 25 pathways that were significantly altered at both the metabolic and transcriptional levels, comprising a total of 285 differentially expressed genes (DEGs) and 11 differential metabolites. Energy metabolism and the pathways originating from phenylalanine were disturbed the most. The upregulated resistance metabolites and genes implied an increase in resistance and energy consumption in postharvest noni fruit. Most genes involved in glycolysis were downregulated, further limiting the available energy. This lack of energy led to the water-spot appearance, a prelude to softening. The metabolites and genes related to resistance and energy interacted with and restricted each other, keeping noni fruit seemingly hard for two days after harvest even though softening was already unstoppable. Conclusions: This study provides new insight into the relationship between the metabolites and genes of noni fruit, as well as a foundation for further clarification of the post-ripening mechanism in noni fruit.


2021 ◽  
Vol 12 ◽  
Author(s):  
Marcel Schulze ◽  
David Coghill ◽  
Silke Lux ◽  
Alexandra Philipsen

Background: Deficient decision-making (DM) in attention deficit/hyperactivity disorder (ADHD) is marked by altered reward sensitivity, higher risk taking, and aberrant reinforcement learning. Previous meta-analyses mostly aggregated findings for the ADHD combined presentation (ADHD-C), while the ADHD predominantly inattentive presentation (ADHD-I) and the predominantly hyperactive/impulsive presentation (ADHD-H) were not disentangled. The objective of the current meta-analysis was to aggregate DM findings for each presentation separately. Methods: A comprehensive literature search of the PubMed (Medline) and Web of Science databases was conducted using the keywords “ADHD,” “attention-deficit/hyperactivity disorder,” “decision-making,” “risk-taking,” “reinforcement learning,” and “risky.” Random-effects models based on correlational effect sizes were fitted. Heterogeneity analysis and sensitivity/outlier analysis were performed, and publication bias was assessed with funnel plots and the Egger intercept. Results: Of 1,240 candidate articles, seven fulfilled the criteria for analysis of ADHD-C (N = 193), seven for ADHD-I (N = 256), and eight for ADHD-H (N = 231). A moderate effect size was found for ADHD-C (r = 0.34; p = 0.0001; 95% CI = [0.19, 0.49]). Small effect sizes were found for ADHD-I (r = 0.09; p = 0.0001; 95% CI = [0.008, 0.25]) and for ADHD-H (r = 0.1; p = 0.0001; 95% CI = [−0.012, 0.32]). Heterogeneity was moderate for ADHD-H. Sensitivity analyses showed that the results were robust, and no outliers were detected. No publication bias was evident. Conclusion: This is the first study to use a meta-analytic approach to investigate the relationship between DM and each ADHD presentation separately. These findings provide first evidence of less pronounced impairment in DM for ADHD-I and ADHD-H compared to ADHD-C. While the exact factors remain elusive, the current study can be considered a starting point for revealing the relationship between ADHD presentations and DM in more detail.
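As an illustration of a random-effects model on correlational effect sizes, the sketch below pools correlations with the Fisher z transform and the DerSimonian-Laird between-study variance estimator. The abstract does not say which estimator the authors used, so this choice is an assumption, and the function name and example inputs are hypothetical rather than data from the study.

```python
import math
from typing import Sequence, Tuple


def random_effects_correlation(
    rs: Sequence[float], ns: Sequence[int]
) -> Tuple[float, float, float]:
    """Pool correlations; return (pooled r, 95% CI lower, 95% CI upper)."""
    zs = [math.atanh(r) for r in rs]          # Fisher z transform
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of each z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    sum_w = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum_w
    # Cochran's Q and the DerSimonian-Laird estimate of tau^2.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(rs) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (math.tanh(z_re),
            math.tanh(z_re - 1.96 * se),
            math.tanh(z_re + 1.96 * se))


if __name__ == "__main__":
    # Hypothetical study-level correlations and sample sizes.
    print(random_effects_correlation([0.25, 0.40, 0.31], [40, 55, 62]))
```

Each study needs a sample size above 3 for the variance term 1/(n - 3) to be defined.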

