Behavioural and neural limits in competitive decision making: The roles of outcome, opponency and observation

2019
Author(s): Benjamin James Dyson, Ben Albert Steward, Tea Meneghetti, Lewis Forder

Abstract: To understand the boundaries we set for ourselves in terms of environmental responsibility during competition, we examined a neural index of outcome valence (feedback-related negativity; FRN) in relation to earlier indices of visual attention (N1), later indices of motivational significance (P3), and eventual behaviour. In Experiment 1 (n=36), participants either were (play) or were not (observe) responsible for action selection. In Experiment 2 (n=36), opponents additionally either could (exploitable) or could not (unexploitable) be beaten. Various failures in the expression of reinforcement learning were revealed, including large-scale approximations of random behaviour. Against unexploitable opponents, N1 determined the extent to which negative and positive outcomes were perceived as distinct categories by the FRN. Against exploitable opponents, the FRN determined the extent to which the P3 generated neural gain for future events. Differential activation of the N1 – FRN – P3 processing chain provides a framework for understanding the behavioural dynamism observed during competitive decision making.


Symmetry, 2020, Vol 12 (4), pp. 631
Author(s): Chunyang Hu

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. First, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve multi-agent confrontation decision-making. During training, information about the other agents is introduced into the critic network to improve the confrontation strategy, and the parameter-sharing mechanism reduces the burden of experience storage. The DDPG algorithm uses four neural networks to generate real-time actions and Q-value estimates, respectively, and a momentum mechanism to optimize the training process and accelerate convergence. Second, this paper introduces an auxiliary controller based on a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent, together with an effective reward function that helps agents balance the losses of the enemy and of their own side. Furthermore, this paper also uses knowledge transfer to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight its competitors and achieves a good win rate. In large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.
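To make the training structure above concrete, here is a minimal sketch of a parameter-shared, centralized-critic DDPG update in PyTorch. It is not the authors' implementation: the agent count, network sizes, the shared team reward, and the use of SGD with momentum as a stand-in for the "momentum mechanism" are all illustrative assumptions.

```python
# Minimal sketch: parameter-shared DDPG with a centralized critic.
# One actor and one critic are reused by every agent; the critic sees the
# joint observations and actions of all agents. Hyperparameters are illustrative.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):                    # per-agent observation -> action
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, joint_obs, joint_act):   # centralized Q(o_1..o_N, a_1..a_N)
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Four networks in total: online/target actor and online/target critic.
actor, critic = Actor(), Critic()
actor_t, critic_t = Actor(), Critic()
actor_t.load_state_dict(actor.state_dict())
critic_t.load_state_dict(critic.state_dict())

# SGD with momentum, assumed here as the momentum-based optimization.
opt_a = torch.optim.SGD(actor.parameters(), lr=1e-3, momentum=0.9)
opt_c = torch.optim.SGD(critic.parameters(), lr=1e-3, momentum=0.9)

def update(batch, gamma=0.99, tau=0.01):
    # obs, act, nxt: [B, N_AGENTS, dim]; rew: [B, 1] shared team reward.
    obs, act, rew, nxt = batch
    B = obs.shape[0]
    joint = lambda o, a: (o.reshape(B, -1), a.reshape(B, -1))

    # Critic target: r + gamma * Q_target(o', pi_target(o')).
    with torch.no_grad():
        nxt_act = actor_t(nxt)
        y = rew + gamma * critic_t(*joint(nxt, nxt_act))
    q = critic(*joint(obs, act))
    critic_loss = nn.functional.mse_loss(q, y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: ascend the shared critic through the shared policy.
    actor_loss = -critic(*joint(obs, actor(obs))).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Soft (Polyak) update of the target networks.
    for t, s in zip(actor_t.parameters(), actor.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)
    for t, s in zip(critic_t.parameters(), critic.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)
```

Because the critic is shared and conditioned on the joint observations and actions, each agent's policy gradient already accounts for the behaviour of the others, which is the sense in which "information of other agents is introduced to the critic network".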



2021, Vol 12 (1)
Author(s): Payam Piray, Nathaniel D. Daw

Abstract: It is thought that the brain’s judicious reuse of previous computation underlies our ability to plan flexibly, but also that inappropriate reuse gives rise to inflexibilities like habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, here we introduce a model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically-realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a default policy. This solution demonstrates connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also provides insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.
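As a point of reference for the "linear approximation that softly maximizes around a default policy", the linearly solvable MDP framework on which such models build can be sketched as follows. These are general relations of that framework, not necessarily the exact equations of this paper: writing z(s) for the exponentiated optimal values and p^d for transitions under the default policy,

```latex
% Sketch of the linearly solvable MDP relations underlying such linear RL models
% (general framework; notation and details are assumptions, not this paper's equations).
\begin{align}
  z(s) &= \exp\!\big(v^{*}(s)/\lambda\big), \\
  z(s) &= e^{r(s)/\lambda} \sum_{s'} p^{d}(s' \mid s)\, z(s')
          && \text{(linear in } z\text{)}, \\
  \pi^{*}(s' \mid s) &\propto p^{d}(s' \mid s)\, z(s')
          && \text{(soft maximization around the default policy)}.
\end{align}
```

The second relation is linear in z, which is the kind of property such models exploit to represent long-horizon contingencies stably and reuse them under changing goals.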



Author(s): Akshat Kumar

Our increasingly interconnected urban environments provide several opportunities to deploy intelligent agents, from self-driving cars and ships to aerial drones, that promise to radically improve productivity and safety. Achieving coordination among agents in such urban settings presents several algorithmic challenges: the ability to scale to thousands of agents, and the need to address uncertainty and partial observability in the environment. In addition, accurate domain models need to be learned from data that are often noisy and available only at an aggregate level. In this paper, I will give an overview of some of our recent contributions towards developing planning and reinforcement learning strategies that address several such challenges in large-scale urban multiagent systems.



2020, Vol 34 (04), pp. 6811-6820
Author(s): Ruohan Zhang, Calen Walshe, Zhuode Liu, Lin Guan, Karl Muller, ...

Large-scale public datasets have been shown to benefit research in multiple areas of modern artificial intelligence. For decision-making research that requires human data, high-quality datasets serve as important benchmarks to facilitate the development of new methods by providing a common reproducible standard. Many human decision-making tasks require visual attention to obtain high levels of performance. Therefore, measuring eye movements can provide a rich source of information about the strategies that humans use to solve decision-making tasks. Here, we provide a large-scale, high-quality dataset of human actions with simultaneously recorded eye movements while humans play Atari video games. The dataset consists of 117 hours of gameplay data from a diverse set of 20 games, with 8 million action demonstrations and 328 million gaze samples. We introduce a novel form of gameplay, in which the human plays in a semi-frame-by-frame manner. This leads to near-optimal game decisions and game scores that are comparable to or better than known human records. We demonstrate the usefulness of the dataset through two simple applications: predicting human gaze and imitating human demonstrated actions. The quality of the data leads to promising results in both tasks. Moreover, using a learned human gaze model to inform imitation learning leads to a 115% increase in game performance. We interpret these results as highlighting the importance of incorporating human visual attention in models of decision making and demonstrating the value of the current dataset to the research community. We hope that the scale and quality of this dataset can provide more opportunities to researchers in the areas of visual attention, imitation learning, and reinforcement learning.
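As a rough picture of how a learned gaze model can inform imitation learning, the sketch below (PyTorch; the network names, shapes, and saliency-reweighting scheme are hypothetical assumptions, not the paper's architecture) predicts a saliency map per frame and uses it to reweight the frame before behavioral cloning on the recorded human actions.

```python
# Hypothetical sketch: gaze-modulated behavioral cloning on Atari frames.
# A gaze network predicts per-pixel saliency; the frame is reweighted by it
# before the policy network is trained to imitate human actions.
import torch
import torch.nn as nn

N_ACTIONS = 18                      # full Atari action set

class GazeNet(nn.Module):           # frame -> per-pixel gaze saliency
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 8, stride=4),
        )
    def forward(self, frames):      # [B, 1, 84, 84] -> [B, 1, 84, 84]
        logits = self.conv(frames)
        B = frames.shape[0]
        return torch.softmax(logits.view(B, -1), dim=1).view_as(frames)

class PolicyNet(nn.Module):         # gaze-weighted frame -> action logits
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(), nn.Flatten(),
            nn.Linear(32 * 20 * 20, N_ACTIONS),
        )
    def forward(self, frames, saliency):
        # Downweight unattended pixels rather than erasing them
        # (one common fusion choice among several possible ones).
        peak = saliency.amax(dim=(2, 3), keepdim=True)
        weighted = frames * (0.5 + 0.5 * saliency / peak)
        return self.net(weighted)

gaze, policy = GazeNet(), PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

def imitation_step(frames, human_actions):
    """frames: [B, 1, 84, 84]; human_actions: [B] integer action labels."""
    with torch.no_grad():           # gaze model assumed pre-trained on gaze samples
        saliency = gaze(frames)
    logits = policy(frames, saliency)
    loss = nn.functional.cross_entropy(logits, human_actions)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```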



2019
Author(s): Payam Piray, Nathaniel D. Daw

Abstract: It is thought that the brain’s judicious allocation and reuse of computation underlies our ability to plan flexibly, but also failures to do so, as in habits and compulsion. Yet we lack a complete, realistic account of either. Building on control engineering, we introduce a new model for decision making in the brain that reuses a temporally abstracted map of future events to enable biologically-realistic, flexible choice at the expense of specific, quantifiable biases. It replaces the classic nonlinear, model-based optimization with a linear approximation that softly maximizes around (and is weakly biased toward) a learned default policy. This solution exposes connections between seemingly disparate phenomena across behavioral neuroscience, notably flexible replanning with biases and cognitive control. It also gives new insight into how the brain can represent maps of long-distance contingencies stably and componentially, as in entorhinal response fields, and exploit them to guide choice even under changing goals.



2018
Author(s): Maël Lebreton, Karin Bacily, Stefano Palminteri, Jan B. Engelmann

Abstract: The ability to correctly estimate the probability of one’s choices being correct is fundamental to optimally re-evaluating previous choices or arbitrating between different decision strategies. Experimental evidence nonetheless suggests that this metacognitive process, referred to as a confidence judgment, is susceptible to numerous biases. We investigated the effect of outcome valence (gains or losses) on confidence while participants learned stimulus-outcome associations by trial-and-error. In two experiments, we demonstrate that participants are more confident in their choices when learning to seek gains than when learning to avoid losses. Importantly, these differences in confidence were observed despite objectively equal choice difficulty and similar observed performance between the two contexts. Using computational modelling, we show that this bias is driven by the context value, a dynamically updated estimate of the average expected value of the choice options that has previously been shown to be necessary to explain equal performance in the gain and loss domains. The biasing effect of context value on confidence, also recently observed in incentivized perceptual decision-making, therefore appears to be domain-general, with likely important functional consequences.
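A minimal sketch of how a context-value bias on confidence can be expressed computationally is given below (plain Python; the delta-rule updates, the logistic mapping of confidence, and all parameter values are illustrative assumptions rather than the authors' fitted model).

```python
# Illustrative sketch: Q-learning with a context value that biases confidence.
# In a gain context the context value drifts positive, in a loss context negative,
# so confidence (logistic in the value difference plus a bias term) ends up
# higher when seeking gains than when avoiding losses, despite matched difficulty.
import math
import random

def simulate_context(p_good=0.75, p_bad=0.25, valence=+1,
                     alpha=0.3, beta=5.0, b=1.0, n_trials=200):
    """valence=+1: gain context (+1 or 0 outcomes); valence=-1: loss context (0 or -1)."""
    probs = {"A": p_good, "B": p_bad}
    q = {"A": 0.0, "B": 0.0}
    v_context = 0.0                   # running estimate of the average outcome
    confidences = []
    for _ in range(n_trials):
        # Softmax choice between the two options.
        p_a = 1.0 / (1.0 + math.exp(-beta * (q["A"] - q["B"])))
        choice = "A" if random.random() < p_a else "B"
        win = random.random() < probs[choice]
        r = (1.0 if win else 0.0) if valence > 0 else (0.0 if win else -1.0)
        # Delta-rule updates for the option value and the context value.
        q[choice] += alpha * (r - q[choice])
        v_context += alpha * (r - v_context)
        # Confidence: logistic in the value difference, biased by the context value.
        conf = 1.0 / (1.0 + math.exp(-(abs(q["A"] - q["B"]) + b * v_context)))
        confidences.append(conf)
    return sum(confidences) / len(confidences)

print("gain context mean confidence:", simulate_context(valence=+1))
print("loss context mean confidence:", simulate_context(valence=-1))
```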



2021
Author(s): Jinzhi Liao, Xiang Zhao, Jiuyang Tang, Weixin Zeng, Zhen Tan

Abstract: With the proliferation of large-scale knowledge graphs (KGs), multi-hop knowledge graph reasoning has become a cornerstone capability that enables machines to handle intelligent tasks, especially where an explicit reasoning path is valuable for decision making. To train a KG reasoner, supervised learning-based methods suffer from false-negative issues, i.e., paths unseen during training cannot be found at prediction time; in contrast, reinforcement learning (RL)-based methods do not require labeled paths and can explore so as to cover many appropriate reasoning paths. In this connection, efforts have been dedicated to investigating several RL formulations for multi-hop KG reasoning. In particular, current RL-based methods generate rewards only at the very end of the reasoning process, so short paths with fewer hops than a given threshold are likely to be overlooked and overall performance is impaired. To address this problem, we propose a revised RL formulation of multi-hop KG reasoning characterized by two novel designs: the stop signal and the worth-trying signal. The stop signal instructs the RL agent to stay at the entity after finding the answer, preventing it from hopping further even if the hop threshold has not been reached; meanwhile, the worth-trying signal encourages the agent to learn partial patterns from paths that fail to lead to the answer. To validate the design of the proposed model, comprehensive experiments are carried out on three benchmark knowledge graphs, and the results and analysis suggest its superiority over state-of-the-art methods.
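To picture the two signals, the sketch below shows one way an episode-level reward could combine them (hypothetical Python, not the paper's code: the stay relation, the partial-credit measure, and all names are assumptions).

```python
# Hypothetical sketch of an episode reward for multi-hop KG reasoning with a
# "stop" signal (stay at the answer once found, even before the hop budget is
# used up) and a "worth-trying" signal (partial credit for failed paths that
# share a prefix with known good paths). All details are illustrative.

STAY = "SELF_LOOP"          # special relation: stay at the current entity

def episode_reward(path, answer, good_paths, partial_weight=0.1):
    """path: list of (relation, entity) hops; answer: target entity.
    good_paths: set of relation sequences known to reach an answer."""
    entities = [e for _, e in path]
    relations = tuple(r for r, _ in path if r != STAY)

    if answer in entities:
        # Stop signal: full reward only if the agent stays put after the first
        # time it reaches the answer, instead of hopping onward.
        first_hit = entities.index(answer)
        stayed = all(r == STAY for r, _ in path[first_hit + 1:])
        return 1.0 if stayed else 0.5
    # Worth-trying signal: a failed path still earns partial credit in
    # proportion to its longest prefix shared with some known good path.
    best = 0
    for gp in good_paths:
        k = 0
        while k < min(len(relations), len(gp)) and relations[k] == gp[k]:
            k += 1
        best = max(best, k)
    return partial_weight * best / max(len(relations), 1)

# Example: a 3-hop budget where the answer is found after 2 hops and the agent stays.
path = [("born_in", "city_x"), ("capital_of", "country_y"), (STAY, "country_y")]
print(episode_reward(path, "country_y", {("born_in", "capital_of")}))   # -> 1.0
```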




