Daytime and season do not affect reinforcement learning capacity in a response time adjustment task

Author(s):  
Sina Kohne ◽  
Luise Reimers ◽  
Malika Müller ◽  
Esther K. Diekhof


Author(s):  
Lin Lan ◽  
Zhenguo Li ◽  
Xiaohong Guan ◽  
Pinghui Wang

Despite significant progress, deep reinforcement learning (RL) suffers from data inefficiency and limited generalization. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks such that a novel but related task can be solved quickly. Although each task has its specifics, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the specific and shared information among different tasks, which limits their ability to learn the training tasks and to generalize to novel tasks. In this paper, we propose to capture the shared information on the one hand and to meta-learn how to quickly abstract the task-specific information on the other. Methodologically, we train an SGD meta-learner to quickly optimize a task encoder for each task, which generates a task embedding based on past experience. Meanwhile, we learn a policy that is shared across all tasks and conditioned on the task embeddings. Empirical results on four simulated tasks demonstrate that our method has better learning capacity on both training and novel tasks and attains returns up to 3 to 4 times higher than those of the baselines.
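As a rough illustration (not the authors' implementation), the PyTorch sketch below shows the two components named above: a per-task encoder that is adapted by a few SGD steps on that task's past experience, and a single policy shared across tasks and conditioned on the resulting task embedding. The class names, layer sizes, and the `task_loss` placeholder are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Maps past transitions (s, a, r) of one task to a task embedding."""
    def __init__(self, obs_dim, act_dim, emb_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, emb_dim))

    def forward(self, transitions):            # transitions: [N, obs+act+1]
        return self.net(transitions).mean(0)   # aggregate experience -> [emb_dim]

class SharedPolicy(nn.Module):
    """One policy for all tasks, conditioned on the task embedding."""
    def __init__(self, obs_dim, emb_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + emb_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim))

    def forward(self, obs, emb):               # obs: [B, obs_dim], emb: [emb_dim]
        emb = emb.expand(obs.shape[0], -1)
        return self.net(torch.cat([obs, emb], dim=-1))

def adapt_encoder(encoder, transitions, task_loss, steps=5, lr=1e-2):
    """Inner-loop SGD: a few steps that specialise the encoder to one task."""
    opt = torch.optim.SGD(encoder.parameters(), lr=lr)
    for _ in range(steps):
        emb = encoder(transitions)
        loss = task_loss(emb)        # e.g. a negative-return surrogate (assumed)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder(transitions).detach()
```

In this reading, the shared policy and the meta-learned initialization capture what tasks have in common, while the few inner-loop steps of `adapt_encoder` quickly abstract what is specific to the task at hand.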


2020 ◽  
Author(s):  
Ceyda Sayalı ◽  
David Badre

Abstract: People balance the benefits of cognitive work against the costs of cognitive effort. Models that incorporate prospective estimates of the costs of cognitive effort into decision making require a mechanism by which these costs are learned. However, it remains open which brain systems are important for this learning, particularly when learning is not tied explicitly to a decision about which task to perform. In this fMRI experiment, we parametrically manipulated the level of effort a task requires by increasing task-switching frequency across six task contexts. In a scanned learning phase, participants implicitly learned about the task-switching frequency in each context. In a subsequent test phase outside the scanner, participants made selections between pairs of these task contexts. Notably, during learning, participants were not aware of this later choice phase. Nonetheless, participants avoided task contexts requiring more task switching. We modeled learning within a reinforcement learning framework and found that effort expectations derived from task-switching probability and response time (RT) during learning were the best predictors of later choice behavior. Interestingly, prediction errors (PEs) from these two models were differentially associated with separate brain networks during distinct learning epochs. Specifically, the PE derived from expected RT was most correlated with the cingulo-opercular network early in learning, whereas the PE derived from expected task-switching frequency was correlated with the fronto-parietal network late in learning. These observations are discussed in relation to the contribution of cognitive control systems to new task learning and how this may bear on effort-based decisions.

Significance Statement: On a daily basis, we make decisions about cognitive effort expenditure. It has been argued that we avoid cognitively effortful tasks to the degree that their subjective costs outweigh the benefits of the task. Here, we investigate the brain systems that learn about task demands for use in later effort-based decisions. Using reinforcement learning models, we find that learning about both expected response time and expected task-switching frequency affects later effort-based decisions, and that these are differentially tracked by distinct brain networks during different epochs of learning. The results indicate that more than one signal is used by the brain to associate effort costs with a given task.
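A minimal sketch of the kind of delta-rule updates such reinforcement-learning models typically use: one expectation tracks response time per task context and another tracks task-switching frequency, and each update yields the prediction error (PE) that an analysis like the one above would relate to brain activity. The learning rate and the example trial values are illustrative assumptions, not the authors' fitted parameters.

```python
def update_expectation(expected, observed, alpha=0.1):
    """One delta-rule update; returns the new expectation and the prediction error."""
    pe = observed - expected
    return expected + alpha * pe, pe

# Example: learning about a single task context across trials.
expected_rt, expected_switch = 0.0, 0.0
trials = [(0.9, 1), (0.7, 0), (1.1, 1)]   # (observed RT in seconds, switch? 0/1)
for rt, switched in trials:
    expected_rt, pe_rt = update_expectation(expected_rt, rt)
    expected_switch, pe_switch = update_expectation(expected_switch, switched)
    print(f"PE_rt={pe_rt:+.2f}  PE_switch={pe_switch:+.2f}")
```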


Algorithms ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 23
Author(s):  
Markus Rabe ◽  
Majsa Ammouriova ◽  
Dominik Schmitt ◽  
Felix Dross

The distribution process in business-to-business materials trading is among the most complex and least transparent in logistics. The highly volatile environment requires continuous adaptations by the responsible decision-makers, who face a substantial number of potential improvement actions with conflicting goals, such as simultaneously maintaining a high service level and low costs. Simulation-optimisation approaches have been proposed in this context, for example based on evolutionary algorithms, but at real-world system dimensions they face impractically long computation times. This paper addresses this challenge in two principal streams. On the one hand, reinforcement learning is investigated to reduce the response time of the system in a concrete decision situation. On the other hand, domain-specific information and the definition of equivalent solutions are exploited to support a metaheuristic algorithm. For these approaches, we have developed suitable implementations and evaluated them with subsets of real-world data. The results demonstrate that reinforcement learning exploits the idle time between decision situations to learn which decisions might be most promising, thus adding computation time but significantly reducing the response time. Using domain-specific information reduces the number of required simulation runs and guides the search for promising actions. In our experiments, defining equivalent solutions decreased the number of required simulation runs by up to 15%.
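As a hedged illustration of the first stream (not the paper's implementation), a simple tabular Q-learner can spend the idle time between decision situations running costly simulations so that, when a decision is actually due, promising improvement actions can be ranked without further simulation. The `simulate_action` callback and the state and action sets are placeholders assumed for the sketch.

```python
import random
from collections import defaultdict

def idle_time_learning(states, actions, simulate_action,
                       episodes=1000, alpha=0.1, gamma=0.9, eps=0.2):
    """Run during idle time: evaluate candidate actions via costly simulation."""
    q = defaultdict(float)                      # Q[(state, action)] -> value
    for _ in range(episodes):
        s = random.choice(states)
        a = (random.choice(actions) if random.random() < eps
             else max(actions, key=lambda x: q[(s, x)]))
        reward, s_next = simulate_action(s, a)  # expensive simulation call
        best_next = max(q[(s_next, x)] for x in actions)
        q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q

def recommend(q, state, actions, k=3):
    """At decision time: return the k most promising actions, no simulation needed."""
    return sorted(actions, key=lambda a: q[(state, a)], reverse=True)[:k]
```

This reflects the trade-off described above: total computation time grows because learning runs in the background, but the response time in a concrete decision situation shrinks to a table lookup.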


2021 ◽  
Vol 15 (3) ◽  
pp. 1-23
Author(s):  
Lei Yang ◽  
Xi Yu ◽  
Jiannong Cao ◽  
Xuxun Liu ◽  
Pan Zhou

Autonomous on-demand services, such as GOGOX (formerly GoGoVan) in Hong Kong, provide a platform for users to request services and for suppliers to meet those demands. On such a platform, suppliers have the autonomy to accept or reject the demands dispatched to them, so making an online matching between demands and suppliers is challenging. Existing methods use round-based approaches to dispatch demands. In these works, the dispatching decision is based on the predicted response patterns of suppliers to demands in the current round, but they fail to consider the impact of future demands and suppliers on the current dispatching decision. This can lead to dispatching decisions that are suboptimal from a longer-term perspective. To solve this problem, we propose a novel demand dispatching model using deep reinforcement learning. In this model, we treat each demand as an agent. The action of each agent, i.e., the dispatching decision for each demand, is determined by a centralized algorithm in a coordinated way. The model works in two steps. (1) It learns the demand's expected value in each spatiotemporal state using historical transition data. (2) Based on the learned values, it conducts a many-to-many dispatching using a combinatorial optimization algorithm, considering both the immediate rewards and the expected values of demands in the next round. To obtain a higher total reward, demands with a high expected future value (short response time) may be delayed to the next round, whereas demands with a low expected future value (long response time) are dispatched immediately. Through extensive experiments using real-world datasets, we show that the proposed model outperforms existing models in terms of Cancellation Rate and Average Response Time.
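A simplified sketch of the two-step procedure, not the paper's algorithm: it assumes the learned spatiotemporal values are already available as a dict `V`, reduces the many-to-many dispatch to one-to-one bipartite matching via the Hungarian algorithm for brevity, and uses placeholder `accept_prob` and `reward` callbacks. The score trades the immediate reward of dispatching a demand now against the discounted expected value it would retain if delayed to the next round.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(demands, suppliers, V, accept_prob, reward, gamma=0.9):
    """Return (demand_idx, supplier_idx) pairs to dispatch in this round."""
    # Score of assigning demand d to supplier s: expected immediate reward
    # minus the opportunity cost of not keeping d for the next round, where
    # a high V[d] (short expected response time later) favours waiting.
    score = np.array([[accept_prob(d, s) * reward(d, s) - gamma * V[d]
                       for s in suppliers] for d in demands])
    rows, cols = linear_sum_assignment(-score)       # maximise total score
    # Only dispatch pairs with a positive score; the rest wait for the next round.
    return [(r, c) for r, c in zip(rows, cols) if score[r, c] > 0]
```

Demands that are left unmatched here correspond to the delayed, high-future-value demands described in the abstract, while low-future-value demands tend to clear the positive-score threshold and are dispatched immediately.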


Author(s):  
Roberto Limongi ◽  
Angélica M. Silva

Abstract. The Sternberg short-term memory scanning task has been used to unveil cognitive operations involved in time perception. Participants produce time intervals during the task, and the researcher explores how task performance affects interval production, with time estimation error as the dependent variable of interest. The perspective of predictive behavior regards time estimation error as a temporal prediction error (PE), an independent variable that controls cognition, behavior, and learning. Based on this perspective, we investigated whether temporal PEs affect short-term memory scanning. Participants performed temporal predictions while maintaining information in memory. Model inference revealed that PEs affected memory scanning response time independently of the memory-set-size effect. We discuss the results within the context of formal and mechanistic models of short-term memory scanning and of predictive coding, a Bayes-based theory of brain function. We hypothesize that our finding could be associated with weak frontostriatal connections and weak striatal activity.


Decision ◽  
2016 ◽  
Vol 3 (2) ◽  
pp. 115-131 ◽  
Author(s):  
Helen Steingroever ◽  
Ruud Wetzels ◽  
Eric-Jan Wagenmakers

2000 ◽  
Author(s):  
Michael Anthony ◽  
Robert W. Fuhrman
Author(s):  
Hans Bergman ◽  
Albert Brinkman ◽  
Harry S. Koelega
