Sequential Decision: Recently Published Documents

TOTAL DOCUMENTS: 782 (five years: 234)
H-INDEX: 40 (five years: 5)

2023 · Vol 55 (1) · pp. 1-36
Author(s): Chao Yu, Jiming Liu, Shamim Nemati, Guosheng Yin

As a subfield of machine learning, reinforcement learning (RL) aims to optimize decision making by using interaction samples of an agent with its environment and the potentially delayed feedback. In contrast to traditional supervised learning, which typically relies on one-shot, exhaustive, and supervised reward signals, RL tackles sequential decision-making problems with sampled, evaluative, and delayed feedback simultaneously. This distinctive feature makes RL techniques a suitable candidate for developing powerful solutions in various healthcare domains, where diagnostic decisions or treatment regimes are usually characterized by a prolonged period with delayed feedback. After briefly examining the theoretical foundations and key methods of RL research, this survey provides an extensive overview of RL applications in a variety of healthcare domains, ranging from dynamic treatment regimes in chronic diseases and critical care to automated medical diagnosis and many other control or scheduling problems that have infiltrated every aspect of the healthcare system. In addition, we discuss the challenges and open issues in current research and highlight some potential solutions and directions for future work.
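
The sampled, evaluative, delayed feedback that distinguishes RL from supervised learning can be illustrated with a minimal tabular Q-learning sketch on a toy chain environment (all names and numbers here are illustrative, not from the survey):

```python
import random

# Toy 5-state chain: the agent starts at state 0 and reward arrives only at the
# terminal state 4, illustrating delayed, evaluative feedback.
N_STATES = 5
ACTIONS = (-1, +1)                      # step left, step right
GAMMA, ALPHA = 0.9, 0.5

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0   # delayed reward, only at the goal
    return s2, reward, s2 == N_STATES - 1

random.seed(0)
for _ in range(500):                    # behave randomly; Q-learning is off-policy
    s, done, t = 0, False, 0
    while not done and t < 200:
        a = random.choice(ACTIONS)
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s, t = s2, t + 1

# The learned greedy policy moves right in every non-terminal state.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
```

Even though intermediate steps carry zero reward, the bootstrap target propagates the terminal reward backwards, which is the core mechanism the survey's healthcare applications build on.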


Electronics · 2022 · Vol 11 (2) · pp. 196
Author(s): Zhenshan Zhu, Zhimin Weng, Hailin Zheng

Microgrids with hydrogen storage are an effective way to integrate renewable energy and reduce carbon emissions. This paper proposes an optimal operation method for a microgrid with hydrogen storage. An electrolyzer efficiency characteristic model is established based on linear interpolation and incorporated into the optimal operation model of the microgrid. The sequential decision-making problem of optimal microgrid operation is then solved by a deep deterministic policy gradient (DDPG) algorithm. Simulation results show that the proposed method can reduce the operation cost of the microgrid by about 5% compared with traditional algorithms and has a certain generalization capability.
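
A linear-interpolation efficiency characteristic of the kind the paper describes can be sketched as follows; the breakpoints, the `hydrogen_rate` helper, and the lower-heating-value constant are hypothetical illustrations, not the paper's data:

```python
import numpy as np

# Hypothetical breakpoints: electrolyzer input power (kW) vs. conversion
# efficiency (fraction). Real characteristics typically peak at part load.
power_pts = np.array([0.0, 20.0, 50.0, 100.0])   # kW
eff_pts   = np.array([0.0, 0.55, 0.70, 0.62])    # efficiency at each breakpoint

LHV_H2 = 33.3  # kWh per kg of hydrogen (lower heating value)

def hydrogen_rate(p_kw: float) -> float:
    """kg of H2 produced per hour at input power p_kw (piecewise-linear model)."""
    eff = np.interp(p_kw, power_pts, eff_pts)    # linear interpolation
    return p_kw * eff / LHV_H2
```

Such a piecewise-linear map keeps the operation model tractable while capturing the nonlinearity of the electrolyzer, which is the role the efficiency characteristic plays inside the DDPG-based optimization.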


2022
Author(s): Shaozhe Cheng, Ning Tang, Yang Zhao, Jifan Zhou, Mowei Shen, ...

It is an ancient insight that human actions are driven by desires. This insight inspired the formulation that a rational agent acts to maximize expected utility (MEU), which has been widely used in psychology for modeling theory of mind and in artificial intelligence (AI) for controlling machines' actions. Yet it is rather unclear how humans act coherently when their desires are complex and often in conflict with each other. Here we show that desires do not directly control human actions. Instead, actions are regulated by an intention, a deliberate mental state that commits to a fixed future rather than weighing the expected utilities of many futures evaluated by many desires. Our study reveals four behavioral signatures of human intention by demonstrating how human sequential decision-making deviates from the optimal MEU policy in a navigation task: "disruption resistance," the persistent pursuit of an original intention even after an unexpected change has made that intention suboptimal; "Ulysses-constraint of freedom," the proactive constraint of one's own freedom by avoiding a path that could lead to many futures, similar to Ulysses binding himself to resist the temptation of the Sirens' song; "enhanced legibility," an active demonstration of intention by choosing a path whose destination can be promptly inferred by a third-party observer; and "temporal leap," committing to a distant future even before reaching the proximal one. Our results show how the philosophy of intention can lead to discoveries about human decision-making that can also be empirically compared with AI algorithms. The findings suggest that, to define a theory of mind, intention should be highlighted as a distinctive mental state between desires and actions, quarantining conflicting desires from the execution of actions.
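
The MEU baseline the authors compare human behavior against can be sketched generically; the actions, outcomes, and utilities below are hypothetical, not the paper's navigation task:

```python
# Generic maximum-expected-utility (MEU) action selection.
def meu_action(actions, prob, utility):
    """prob[a][o] = P(outcome o | action a); utility[o] = desirability of o."""
    def expected_utility(a):
        return sum(p * utility[o] for o, p in prob[a].items())
    return max(actions, key=expected_utility)

# Two conflicting desires folded into one utility scale (illustrative numbers):
actions = ["path_A", "path_B"]
prob = {
    "path_A": {"fast": 0.6, "slow": 0.4},
    "path_B": {"fast": 0.3, "slow": 0.7},
}
utility = {"fast": 10.0, "slow": 2.0}
```

An MEU agent re-evaluates this maximization at every step; the paper's point is that humans instead commit to an intention and keep pursuing it even when the running maximization would pick a different action.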


Mathematics · 2022 · Vol 10 (1) · pp. 158
Author(s): Alexander Gnedin, Zakaria Derbazi

We introduce a betting game where the gambler aims to guess the last success epoch in a series of inhomogeneous Bernoulli trials paced randomly in time. At a given stage, the gambler may bet on either the event that no further successes occur, or the event that exactly one success is yet to occur, or may choose any proper range of future times (a trap). When a trap is chosen, the gambler wins if the last success epoch is the only one that falls in the trap. The game is closely related to the sequential decision problem of maximising the probability of stopping on the last success. We use this connection to analyse the best-choice problem with random arrivals generated by a Pólya-Lundberg process.
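
The sequential decision problem of stopping on the last success (for independent trials with known success probabilities) is classically solved by Bruss's odds algorithm; a sketch of that standard result, not of the paper's Pólya-Lundberg arrival model:

```python
import math

def odds_algorithm(ps):
    """Bruss's odds algorithm: given success probabilities ps (each < 1),
    return (s, win_prob), where stopping at the first success at index >= s
    maximizes the probability of stopping on the last success."""
    odds = [p / (1.0 - p) for p in ps]
    total, s = 0.0, 0
    for j in range(len(ps) - 1, -1, -1):
        total += odds[j]
        if total >= 1.0:        # sum the odds from the end until they reach 1
            s = j
            break
    # Win probability: (product of failure probs from s on) * (sum of odds from s on).
    q_prod = math.prod(1.0 - p for p in ps[s:])
    return s, q_prod * sum(odds[s:])
```

For example, with ten trials of success probability 0.2 each, the odds sum reaches 1 four trials from the end, so the optimal rule waits out the first six trials.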


2022 · Vol ahead-of-print (ahead-of-print)
Author(s): Tian Wang, Yangyang Liang, Zhong Zheng

Purpose
The purpose of this paper is to investigate manufacturer encroachment and distributor encroachment in a three-echelon supply chain consisting of an upstream manufacturer, an intermediate distributor and a downstream retailer.

Design/methodology/approach
The authors use optimization theory to formalize the proposed question and build a model. First, they consider sequential quantity decisions, where the encroacher decides on the direct selling quantity after determining the retailer's order quantity. Second, they relax this assumption by considering a setting in which quantity decisions are made simultaneously.

Findings
In contrast to previous studies, this study shows that in three-echelon supply chains the upstream firm is more likely to encroach than the downstream firm. The "bright side" of encroachment exists for all players only when the encroachment cost is at a moderate level. However, under manufacturer encroachment with simultaneous quantity decisions, the "bright side" skips the distributor and benefits the retailer directly as the encroachment cost increases from zero to a certain level. The main reason is that the distributor loses its pricing power because the end market is disturbed by the simultaneous quantity decisions. A comparison of the results of sequential and simultaneous quantity decisions reveals the merit of the latter: the intermediate role (the distributor in this model) in a three-echelon supply chain may benefit more from simultaneous quantity decisions, achieving a better profit even in a market with intensified competition.

Originality/value
The findings contribute to the marketing science literature on encroachment. The majority of existing literature has focused on manufacturer encroachment in two-echelon supply chains. This paper innovatively investigates and compares manufacturer encroachment and distributor encroachment in a three-echelon supply chain.
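
The contrast between sequential and simultaneous quantity decisions echoes the textbook Stackelberg-versus-Cournot distinction; a minimal linear-demand two-firm sketch (illustrative parameters, not the paper's three-echelon model):

```python
# Textbook linear-demand duopoly: inverse demand P = a - b*(q1 + q2),
# constant marginal cost c for both firms. Parameters are illustrative.
a, b, c = 10.0, 1.0, 2.0

# Simultaneous (Cournot) equilibrium quantity, identical for each firm:
q_cournot = (a - c) / (3 * b)

# Sequential (Stackelberg) equilibrium: the leader commits to its quantity first,
# the follower best-responds.
q_leader = (a - c) / (2 * b)
q_follower = (a - c) / (4 * b)

def profit(q_own, q_other):
    """Profit of a firm producing q_own while the rival produces q_other."""
    return (a - b * (q_own + q_other) - c) * q_own
```

In this standard setting moving first is an advantage (the leader out-earns a Cournot firm); the paper's finding that the intermediate distributor can prefer simultaneous decisions shows how three-echelon interactions can reverse such timing intuitions.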


2021 · Vol 12 (6) · pp. 1-21
Author(s): Pengzhan Guo, Keli Xiao, Zeyang Ye, Wei Zhu

Vehicle mobility optimization in urban areas is a long-standing problem in smart cities and spatial data analysis. Given complex urban scenarios and unpredictable social events, our work focuses on developing a mobile sequential recommendation system to maximize the profitability of vehicle service providers (e.g., taxi drivers). In particular, we treat the dynamic route optimization problem as a long-term sequential decision-making task. We propose a reinforcement-learning framework to tackle this problem, integrating a self-check mechanism and a deep neural network for customer pick-up point monitoring. To account for unexpected situations (e.g., the COVID-19 outbreak), our method is designed to handle related environment changes with a self-adaptive parameter determination mechanism. Based on yellow taxi data in New York City and its vicinity before and after the COVID-19 outbreak, we conducted comprehensive experiments to evaluate the effectiveness of our method. The results show consistently excellent performance, from hourly to weekly measures, supporting the superiority of our method over state-of-the-art methods (with more than 98% improvement in profitability for taxi drivers).


2021 · Vol 11 (3-4) · pp. 1-35
Author(s): Jonathan Dodge, Roli Khanna, Jed Irvine, Kin-ho Lam, Theresa Mai, ...

Explainable AI is growing in importance as AI pervades modern society, but few have studied how explainable AI can directly support people trying to assess an AI agent. Without a rigorous process, people may approach assessment in ad hoc ways, leading to wide variations in assessments of the same agent due only to variations in their processes. After-Action Review (AAR) is a method some military organizations use to assess human agents, and it has been validated in many domains. Drawing upon this strategy, we derived an After-Action Review for AI (AAR/AI) to organize the ways people assess reinforcement learning agents in a sequential decision-making environment. We then investigated what AAR/AI brought to human assessors in two qualitative studies. The first used AAR/AI to gather formative information; the second built upon those results and also varied the type of explanation (model-free vs. model-based) used in the AAR/AI process. Among the results: (1) participants reported that AAR/AI helped them organize their thoughts and think logically about the agent, (2) AAR/AI encouraged participants to reason about the agent from a wide range of perspectives, and (3) participants were able to leverage AAR/AI with the model-based explanations to falsify the agent's predictions.


Author(s): Vedang Naik, Rohit Sahoo, Sameer Mahajan, Saurabh Singh, ...

Reinforcement learning is an artificial intelligence paradigm in which intelligent agents learn from environmental rewards in order to achieve better results. It is concerned with sequential decision-making problems that offer limited feedback. Reinforcement learning has roots in cybernetics and in research in statistics, psychology, neurology, and computer science, and it has attracted growing interest from the machine learning and artificial intelligence communities over the last five to ten years. Its promise is that agents can be trained with rewards and penalties alone, without specifying how the task is to be completed. The RL problem may be described as an agent that must make decisions in a given environment to maximize a specified notion of cumulative reward. The learner is not told which actions to take but must experiment to discover which actions yield the greatest reward; thus the learner must actively choose between exploring its environment and exploiting its current knowledge. This exploration-exploitation dilemma is one of the most common issues encountered when dealing with reinforcement learning algorithms. Deep reinforcement learning combines reinforcement learning (RL) with deep learning. In this study, we describe how to apply several deep RL algorithms to a Cartpole system, representing episodic environments, and to stock market trading, representing continuous environments. We explain and demonstrate the effects of different RL ideas such as Deep Q Networks (DQN), Double DQN, and Dueling DQN on learning performance. We also examine the fundamental distinctions between episodic and continuous tasks and how the exploration-exploitation issue is addressed in each context.
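
The difference between the DQN and Double DQN variants discussed above lies in how the bootstrap target is computed for a transition (s, a, r, s'); a hedged sketch, where the next-state action-value arrays stand in for (hypothetical) online- and target-network outputs:

```python
import numpy as np

GAMMA = 0.99  # illustrative discount factor

def dqn_target(r, q_target_next, done):
    # Standard DQN: the target network both selects and evaluates the next action.
    return r if done else r + GAMMA * float(np.max(q_target_next))

def double_dqn_target(r, q_online_next, q_target_next, done):
    # Double DQN: the online network selects the action, the target network
    # evaluates it, reducing the overestimation bias of the max operator.
    a_star = int(np.argmax(q_online_next))
    return r if done else r + GAMMA * float(q_target_next[a_star])
```

When the online and target networks disagree about the best next action, the two targets differ, which is exactly the mechanism behind the learning-performance differences the study demonstrates.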

