sequential decision making
Recently Published Documents


TOTAL DOCUMENTS: 298 (five years: 83)
H-INDEX: 26 (five years: 4)

Author(s): Vedang Naik ◽ Rohit Sahoo ◽ Sameer Mahajan ◽ Saurabh Singh ◽ ...

Reinforcement learning is an artificial intelligence paradigm in which intelligent agents learn from environmental rewards to achieve better outcomes. It is concerned with sequential decision-making problems that offer only limited feedback. Reinforcement learning has roots in cybernetics and in research in statistics, psychology, neuroscience, and computer science, and it has attracted growing interest from the machine learning and artificial intelligence communities over the last five to ten years. Its promise is that agents can be trained with rewards and penalties alone, without specifying how the task is to be accomplished. The RL problem can be described as an agent that must make decisions in a given environment so as to maximize a specified notion of cumulative reward. The learner is not told which actions to take but must experiment to discover which actions yield the greatest reward; it must therefore actively choose between exploring its environment and exploiting its current knowledge. This exploration-exploitation dilemma is one of the most common issues encountered when working with reinforcement learning algorithms. Deep reinforcement learning combines reinforcement learning (RL) with deep learning. In this study, we describe how to apply several deep reinforcement learning algorithms to a CartPole system, used to represent episodic environments, and to stock market trading, used to represent continuing environments. We explain and demonstrate the effects of different RL methods such as Deep Q-Networks (DQN), Double DQN, and Dueling DQN on learning performance. We also examine the fundamental distinctions between episodic and continuing tasks and how the exploration-exploitation issue is addressed in each context.
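To make the DQN mechanics concrete, here is a minimal sketch of a DQN agent on CartPole using gymnasium and PyTorch; the network size, hyperparameters, and update schedule are illustrative assumptions, not the configuration used in the study. The commented lines show how the Double DQN target differs from the vanilla one.

```python
# Minimal DQN sketch for CartPole (gymnasium + PyTorch).
# Hyperparameters, network size, and schedules are illustrative
# assumptions, not the configuration used in the study.
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
n_obs, n_act = env.observation_space.shape[0], env.action_space.n

def make_net() -> nn.Module:
    return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)
gamma, eps = 0.99, 0.1  # discount factor and exploration rate

def td_step(batch_size: int = 64) -> None:
    if len(buffer) < batch_size:
        return
    s, a, r, s2, d = (np.array(x) for x in zip(*random.sample(buffer, batch_size)))
    s, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in (s, s2))
    r, d = (torch.as_tensor(x, dtype=torch.float32) for x in (r, d))
    a = torch.as_tensor(a)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s2).max(1).values  # vanilla DQN target
        # Double DQN instead selects the action with q_net:
        # a2 = q_net(s2).argmax(1)
        # q_next = target_net(s2).gather(1, a2.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, r + gamma * (1 - d) * q_next)
    opt.zero_grad(); loss.backward(); opt.step()

for episode in range(200):
    s, _ = env.reset()
    done = False
    while not done:
        if random.random() < eps:                  # epsilon-greedy exploration
            a = env.action_space.sample()
        else:
            with torch.no_grad():
                a = int(q_net(torch.as_tensor(s, dtype=torch.float32)).argmax())
        s2, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        buffer.append((s, a, r, s2, float(terminated)))
        s = s2
        td_step()
    if episode % 10 == 0:                          # periodic target-network sync
        target_net.load_state_dict(q_net.state_dict())
```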


2021 ◽ Vol 17 (12) ◽ pp. e1009633
Author(s): Yeonju Sin ◽ HeeYoung Seon ◽ Yun Kyoung Shin ◽ Oh-Sang Kwon ◽ Dongil Chung

Many decisions in life are sequential and constrained by a time window. Although mathematically derived optimal solutions exist, humans have often been reported to deviate from optimal choices. Here, we used a secretary problem, a classic example of finite sequential decision-making, and investigated the mechanisms underlying individuals' suboptimal choices. Across three independent experiments, we found that a dynamic programming model incorporating a subjective value function explains individuals' deviations from optimality and predicts choice behavior when fewer or more opportunities are available. We further identified that pupil dilation reflected the level of decision difficulty and the subsequent choice to accept or reject the stimulus at each opportunity. Value sensitivity, a model-based estimate that characterizes each individual's subjective valuation, correlated with the extent to which individuals' physiological responses tracked stimulus information. Our results provide model-based and physiological evidence for subjective valuation in finite sequential decision-making, rediscovering human suboptimality in subjectively optimal decision-making processes.
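For reference, the classical (rank-based) secretary problem against which such behavior is benchmarked has a well-known optimal threshold policy: skip roughly n/e applicants, then accept the first one better than all seen so far. The sketch below finds that cutoff by brute-force evaluation; it illustrates the textbook benchmark, not the authors' subjective-value dynamic programming model.

```python
# Classical (rank-based) secretary problem: evaluate the success
# probability of each "skip the first r-1, then take the first
# best-so-far" policy and pick the best cutoff. This is the textbook
# benchmark, not the authors' subjective-value model.

def success_prob(n: int, r: int) -> float:
    """P(selecting the overall best of n applicants) under cutoff r."""
    if r == 1:
        return 1.0 / n  # accepting the first applicant blindly
    return (r - 1) / n * sum(1.0 / (k - 1) for k in range(r, n + 1))

n = 20
best_r = max(range(1, n + 1), key=lambda r: success_prob(n, r))
print(best_r, round(success_prob(n, best_r), 4))
# -> 8 0.3842: skip the first 7 (roughly n/e) applicants, then accept
#    the first applicant better than everyone seen so far.
```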


2021
Author(s): Vahid Azimirad ◽ Mohammad Tayefe Ramezanlou ◽ Saleh Valizadeh Sotubadi ◽ Farrokh Janabi-Sharifi

Author(s): Maaike M.H. van Swieten ◽ Rafal Bogacz ◽ Sanjay G. Manohar

Human decisions can be reflexive or planned, being governed respectively by model-free and model-based learning systems. These two systems might differ in their responsiveness to our needs. Hunger drives us to specifically seek food rewards, but here we ask whether it might have more general effects on these two decision systems. On one hand, the model-based system is often considered flexible and context-sensitive, and might therefore be modulated by metabolic needs. On the other hand, the model-free system's primitive reinforcement mechanisms may have closer ties to biological drives. Here, we tested participants on a well-established two-stage sequential decision-making task that dissociates the contribution of model-based and model-free control. Hunger enhanced overall performance by increasing model-free control, without affecting model-based control. These results demonstrate a generalized effect of hunger on decision-making that enhances reliance on primitive reinforcement learning, which in some situations translates into adaptive benefits.
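A rough sketch of how model-free and model-based valuation are commonly combined in analyses of this two-stage task is given below (a hybrid model with a weighting parameter w; all parameter values and the fixed reward probabilities are illustrative assumptions, not quantities from the paper). In this framing, the paper's finding corresponds to hunger shifting weight toward the model-free component, i.e. lowering w, without degrading the model-based term.

```python
# Hybrid model-free / model-based valuation on a two-stage task,
# in the spirit of commonly used hybrid models for this paradigm.
# Parameter values and the fixed reward probabilities are
# illustrative assumptions, not quantities from the paper.
import numpy as np

rng = np.random.default_rng(0)
alpha, w, beta = 0.3, 0.5, 3.0       # learning rate, model-based weight, inverse temperature
T = np.array([[0.7, 0.3],            # P(second-stage state | first-stage action):
              [0.3, 0.7]])           # common (0.7) vs. rare (0.3) transitions
q_mf = np.zeros(2)                   # model-free values of the first-stage actions
q2 = np.zeros(2)                     # learned values of the second-stage states
p_reward = np.array([0.8, 0.2])      # latent reward probabilities (fixed here)

for trial in range(1000):
    q_mb = T @ q2                                      # model-based values via the transition model
    q = w * q_mb + (1 - w) * q_mf                      # hybrid valuation
    p_choice = np.exp(beta * q) / np.exp(beta * q).sum()
    a = rng.choice(2, p=p_choice)                      # softmax first-stage choice
    s2 = rng.choice(2, p=T[a])                         # stochastic transition
    reward = float(rng.random() < p_reward[s2])
    q2[s2] += alpha * (reward - q2[s2])                # second-stage update
    q_mf[a] += alpha * (reward - q_mf[a])              # model-free first-stage update
```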


2021
Author(s): Laura Fontanesi ◽ Amitai Shenhav ◽ Sebastian Gluth

Recent years have witnessed a surge of interest in understanding the neural and cognitive dynamics that drive sequential decision making in general and foraging behavior in particular. Due to the intrinsic properties of most sequential decision-making paradigms, however, previous research in this area has suffered from the difficulty of disentangling properties of the decision related to (a) the value of switching to a new patch versus (b) the conflict experienced between choosing to stay or leave. Here, we show how the same problems arise in studies of sequential decision-making under risk, and how they can be overcome, taking recent research on the 'pig' dice game as a specific example. In each round of the 'pig' dice game, people roll a die and accumulate rewards until they either decide to stop and carry their rewards forward to the next round or roll unluckily and lose all rewards accumulated in that round. By combining simulation-based dissections of the task structure with two experiments, we show how an extension of the standard paradigm, together with cognitive modeling of decision-making processes, disentangles value-related from conflict-related choice properties. Our study elucidates the cognitive mechanisms of sequential decision making and underscores the importance of avoiding potential pitfalls of paradigms commonly used in this research area.
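To see the stay-or-leave structure of such a round, here is a small Monte-Carlo sketch of a pig-style round under a simple "bank at threshold" policy; the die, bust rule, and thresholds are illustrative assumptions rather than the exact paradigm used in the experiments.

```python
# Monte-Carlo sketch of a pig-style round under a simple
# "bank at threshold" policy. The die, bust rule, and thresholds are
# illustrative assumptions, not the exact experimental paradigm.
import random

def play_round(threshold: int, sides: int = 6, bust: int = 1) -> int:
    """Roll until the pot reaches `threshold` (bank it) or the bust
    face comes up (lose everything accumulated this round)."""
    pot = 0
    while pot < threshold:
        roll = random.randint(1, sides)
        if roll == bust:
            return 0        # all rewards of the round are lost
        pot += roll
    return pot              # rewards are secured for the next round

trials = 100_000
for threshold in (10, 20, 30):
    mean = sum(play_round(threshold) for _ in range(trials)) / trials
    print(threshold, round(mean, 2))  # expected banked reward per round
```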


Algorithms ◽ 2021 ◽ Vol 14 (10) ◽ pp. 291
Author(s): Juri Hinz

In industrial applications, optimal sequential decision-making processes are naturally formulated and optimized within the standard setting of Markov decision theory. In practice, however, decisions must be made under incomplete and uncertain information about parameters and transition probabilities. This situation occurs when a system may suffer a regime switch that changes not only the transition probabilities but also the control costs. After such an event, the effects of actions may reverse, meaning that all strategies must be revised. Owing to the practical importance of this problem, a variety of methods have been suggested, ranging from incorporating regime switches into Markov dynamics to numerous concepts addressing model uncertainty. In this work, we suggest a pragmatic and practical approach based on a natural re-formulation of this problem as a so-called convex switching system, which makes efficient numerical algorithms applicable.
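One way to picture the regime-switch formulation is to augment the Markov state with the active regime and solve the product-space problem by value iteration, as in the generic sketch below; this illustrates the modeling idea only and is not the paper's convex switching algorithm.

```python
# Value iteration on a toy MDP whose transition probabilities and
# control costs depend on a latent regime; the state is augmented
# with the regime. A generic illustration of the modeling idea only,
# not the paper's convex switching algorithm.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, nR = 3, 2, 2                                 # states, actions, regimes
P = rng.dirichlet(np.ones(nS), size=(nR, nS, nA))    # P[g, s, a] -> dist over s'
C = rng.uniform(0.0, 1.0, size=(nR, nS, nA))         # regime-dependent control costs
Q_regime = np.array([[0.95, 0.05],                   # regime persistence
                     [0.10, 0.90]])
gamma = 0.9

V = np.zeros((nR, nS))
for _ in range(1000):
    EV = np.einsum("gh,ht->gt", Q_regime, V)         # expectation over the next regime
    Q = C + gamma * np.einsum("gsat,gt->gsa", P, EV) # expectation over the next state
    V_new = Q.min(axis=2)                            # minimize expected discounted cost
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmin(axis=2)  # optimal action for every (regime, state) pair
print(policy)
```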


2021
Author(s): Amjad Yousef Majid ◽ Serge Saaybi ◽ Tomas van Rietbergen ◽ Vincent Francois-Lavet ◽ R Venkatesha Prasad ◽ ...

Deep Reinforcement Learning (DRL) and Evolution Strategies (ESs) have surpassed human-level control in many sequential decision-making problems, yet many open challenges still exist. To get insights into the strengths and weaknesses of DRL versus ESs, an analysis of their respective capabilities and limitations is provided. After presenting their fundamental concepts and algorithms, a comparison is provided on key aspects such as scalability, exploration, adaptation to dynamic environments, and multi-agent learning. Then, the benefits of hybrid algorithms that combine concepts from DRL and ESs are highlighted. Finally, to give an indication of how they compare in real-world applications, a survey of the literature covering the set of applications they support is provided.
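For a concrete sense of how ESs differ from gradient-based DRL, below is a minimal sketch of the basic ES update (in the style of natural evolution strategies) on a toy objective; the objective and all settings are illustrative assumptions.

```python
# Basic evolution-strategy update (NES-style) on a toy objective,
# sketching the black-box alternative to gradient-based DRL.
# The objective and all settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def fitness(theta: np.ndarray) -> float:
    """Toy stand-in for an episode return; maximized at theta = 3."""
    return -float(np.sum((theta - 3.0) ** 2))

theta = np.zeros(10)
sigma, lr, pop = 0.1, 0.02, 50      # noise scale, step size, population size
for step in range(300):
    noise = rng.standard_normal((pop, theta.size))       # population of perturbations
    returns = np.array([fitness(theta + sigma * n) for n in noise])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    theta += lr / (pop * sigma) * noise.T @ adv          # ES gradient estimate
print(theta.round(2))  # drifts toward the optimum at 3.0
```

Note that the update needs only episode returns, never a backpropagated gradient, which is why ESs parallelize so easily compared with DRL.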

