Has Dynamic Programming Improved Decision Making?

2019 ◽  
Vol 11 (1) ◽  
pp. 833-858 ◽  
Author(s):  
John Rust

Dynamic programming (DP) is a powerful tool for solving a wide class of sequential decision-making problems under uncertainty. In principle, it enables us to compute optimal decision rules that specify the best possible decision in any situation. This article reviews developments in DP and contrasts its revolutionary impact on economics, operations research, engineering, and artificial intelligence with the comparative paucity of its real-world applications to improve the decision making of individuals and firms. The fuzziness of many real-world decision problems and the difficulty in mathematically modeling them are key obstacles to a wider application of DP in real-world settings. Nevertheless, I discuss several success stories, and I conclude that DP offers substantial promise for improving decision making if we let go of the empirically untenable assumption of unbounded rationality and confront the challenging decision problems faced every day by individuals and firms.

2020 ◽  
Vol 68 ◽  
pp. 311-364
Author(s):  
Francesco Trovo ◽  
Stefano Paladino ◽  
Marcello Restelli ◽  
Nicola Gatti

Multi-Armed Bandit (MAB) techniques have been successfully applied to many classes of sequential decision problems in the past decades. However, non-stationary settings -- very common in real-world applications -- have received little attention so far, and theoretical guarantees on the regret are known only for some frequentist algorithms. In this paper, we propose an algorithm, namely Sliding-Window Thompson Sampling (SW-TS), for non-stationary stochastic MAB settings. Our algorithm is based on Thompson Sampling and exploits a sliding-window approach to tackle, in a unified fashion, two different forms of non-stationarity studied separately so far: abruptly changing and smoothly changing. In the former, the reward distributions are constant during sequences of rounds, and their change may be arbitrary and happen at unknown rounds, while, in the latter, the reward distributions smoothly evolve over rounds according to unknown dynamics. Under mild assumptions, we provide upper bounds on the dynamic pseudo-regret of SW-TS for the abruptly changing environment, for the smoothly changing one, and for the setting in which both forms of non-stationarity are present. Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are taken separately, as previously studied in the literature.
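
The sliding-window idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Bernoulli rewards with a uniform Beta(1, 1) prior, and the function name `sliding_window_thompson`, the environment `abrupt_env`, and all parameter values are hypothetical choices made for illustration. Only observations from the most recent `window` rounds contribute to each arm's posterior.

```python
import random
from collections import deque


def sliding_window_thompson(reward_fn, n_arms, horizon, window, seed=0):
    """Bernoulli Thompson Sampling that remembers only the last `window` rounds.

    Assumption: rewards are 0 or 1 and each arm starts from a Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    history = deque()            # (arm, reward) for the most recent rounds
    successes = [0] * n_arms     # in-window count of reward 1 per arm
    failures = [0] * n_arms      # in-window count of reward 0 per arm
    pulls = []
    for t in range(horizon):
        # Draw one plausible mean per arm from its in-window posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = reward_fn(arm, t, rng)
        history.append((arm, reward))
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        # Forget the observation that has just slid out of the window.
        if len(history) > window:
            old_arm, old_reward = history.popleft()
            if old_reward:
                successes[old_arm] -= 1
            else:
                failures[old_arm] -= 1
        pulls.append(arm)
    return pulls


def abrupt_env(arm, t, rng):
    """Hypothetical abruptly changing environment: the best arm swaps at t = 500."""
    means = (0.9, 0.1) if t < 500 else (0.1, 0.9)
    return 1 if rng.random() < means[arm] else 0


pulls = sliding_window_thompson(abrupt_env, n_arms=2, horizon=1000, window=100)
late_share = sum(pulls[800:]) / 200   # share of late rounds spent on arm 1
print(f"share of pulls on arm 1 in the last 200 rounds: {late_share:.2f}")
```

Because stale observations are discarded, an arm that has not been pulled recently drifts back toward its prior and gets re-explored, which is what lets the sampler follow a change in the best arm. The window length trades responsiveness against estimation noise; the value 100 here is arbitrary.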


2021 ◽  
Vol 17 (12) ◽  
pp. e1009633
Author(s):  
Yeonju Sin ◽  
HeeYoung Seon ◽  
Yun Kyoung Shin ◽  
Oh-Sang Kwon ◽  
Dongil Chung

Many decisions in life are sequential and constrained by a time window. Although mathematically derived optimal solutions exist, it has been reported that humans often deviate from making optimal choices. Here, we used a secretary problem, a classic example of finite sequential decision-making, and investigated the mechanisms underlying individuals’ suboptimal choices. Across three independent experiments, we found that a dynamic programming model incorporating a subjective value function explains individuals’ deviations from optimality and predicts the choice behaviors under fewer and more opportunities. We further identified that pupil dilation reflected the levels of decision difficulty and subsequent choices to accept or reject the stimulus at each opportunity. The value sensitivity, a model-based estimate that characterizes each individual’s subjective valuation, correlated with the extent to which individuals’ physiological responses tracked stimulus information. Our results provide model-based and physiological evidence for subjective valuation in finite sequential decision-making, rediscovering human suboptimality in subjectively optimal decision-making processes.
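
For readers unfamiliar with the secretary problem, the following sketch shows how backward induction yields the optimal stopping rule. It assumes the classic rank-based version (candidates arrive in random order and the goal is to pick the single best one), which may differ from the task variant and the subjective value model used in the study; the function name `secretary_dp` is hypothetical.

```python
def secretary_dp(n):
    """Backward induction for the classic secretary problem with n candidates.

    value[k] is the best achievable probability of ending up with the overall
    best candidate, given that the first k candidates have been rejected.
    """
    value = [0.0] * (n + 1)          # value[n] = 0: nobody left to choose
    accept = [False] * (n + 1)
    for i in range(n, 0, -1):
        # If candidate i is the best seen so far, stopping wins with
        # probability i / n; continuing is worth value[i].
        stop = i / n
        go = value[i]
        accept[i] = stop >= go
        # Candidate i is the best so far with probability 1 / i.
        value[i - 1] = (max(stop, go) + (i - 1) * go) / i
    first_accept = next(i for i in range(1, n + 1) if accept[i])
    return value[0], first_accept


win_probability, first_accept = secretary_dp(10)
print(f"accept the first best-so-far candidate from position {first_accept}")
print(f"probability of selecting the best candidate: {win_probability:.4f}")
```

With 10 candidates the rule is to reject the first three and then accept the first candidate who beats everyone seen so far, which succeeds with probability of about 0.399. Deviations from optimality of the kind the abstract describes would show up as a different acceptance position than this benchmark.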


2010 ◽  
Vol 09 (06) ◽  
pp. 873-888 ◽  
Author(s):  
TZUNG-PEI HONG ◽  
CHING-YAO WANG ◽  
CHUN-WEI LIN

Mining knowledge from large databases has become a critical task for organizations. Managers commonly use the obtained sequential patterns to make decisions. In the past, databases were usually assumed to be static. In real-world applications, however, transactions may be updated. In this paper, a maintenance algorithm for rapidly updating sequential patterns for real-time decision making is proposed. The proposed algorithm utilizes previously discovered large sequences in the maintenance process, thus greatly reducing the number of database rescans and improving performance. Experimental results verify the performance of the proposed approach. The proposed algorithm provides real-time knowledge that can be used for decision making.


2016 ◽  
Vol 15 (06) ◽  
pp. 1503-1519 ◽  
Author(s):  
R. A. Aliev ◽  
O. H. Huseynov ◽  
R. Serdaroglu

Real-world decision problems in decision analysis, system analysis, economics, ecology, and other fields are characterized by fuzziness and partial reliability of relevant information. In order to deal with such information, Prof. Zadeh suggested the concept of a Z-number as an ordered pair Z = (A, B) of fuzzy numbers A and B, the first of which is a linguistic value of a variable of interest, and the second of which is a linguistic value of the probability measure of the first, playing the role of reliability of information. Decision making under Z-number-based information requires ranking of Z-numbers. In this paper we suggest a human-like fundamental approach for ranking Z-numbers which is based on two main ideas. One idea is to compute optimality degrees of Z-numbers, and the other is to adjust the obtained degrees by using a human being’s opinion formalized by a degree of pessimism. Two examples and a real-world application are provided to show the validity of the suggested approach. A comparison of the proposed approach with the existing methods is conducted.


1991 ◽  
Vol 20 (1) ◽  
pp. 15-23 ◽  
Author(s):  
Carolyn R. Harper

The use of chemical pesticides frequently causes minor pests to become serious problems by disturbing the natural controls that keep them in check. As a result, it is possible to suffer heavier crop losses after pesticides are introduced than before their introduction. Efficient use of pesticides requires complete biological modeling that takes the appropriate predator-prey relationships into account. A bioeconomic model is introduced involving three key species: a primary target pest, a secondary pest, and a natural enemy of the secondary pest. Optimal decision rules are derived and contrasted with myopic decision making, which treats the predator-prey system as an externality. The issue of resistance in the secondary pest is examined briefly.


2001 ◽  
Vol 5 (1) ◽  
pp. 47-59
Author(s):  
Omar Ben-Ayed

Operations Research techniques are usually presented as distinct models. Difficult as it may often be, achieving linkage between these models could reveal their interdependency and make them easier for the user to understand. In this article three different models, namely Markov Chain, Dynamic Programming, and Markov Sequential Decision Processes, are used to solve an inventory problem based on the periodic review system. We show how the three models converge to the same (s,S) policy and we provide a numerical example to illustrate such a convergence.


2017 ◽  
Author(s):  
Michael Veale

Presented as a talk at the 4th Workshop on Fairness, Accountability and Transparency in Machine Learning (FAT/ML 2017), Halifax, Nova Scotia, Canada. Machine learning systems are increasingly used to support public sector decision-making across a variety of sectors. Given concerns around accountability in these domains, and amidst accusations of intentional or unintentional bias, there have been increased calls for transparency of these technologies. Few, however, have considered how logics and practices concerning transparency have been understood by those involved in the machine learning systems already being piloted and deployed in public bodies today. This short paper distils insights about transparency on the ground from interviews with 27 such actors, largely public servants and relevant contractors, across 5 OECD countries. Considering transparency and opacity in relation to trust and buy-in, better decision-making, and the avoidance of gaming, it seeks to provide useful insights for those hoping to develop socio-technical approaches to transparency that might be useful to practitioners on-the-ground.

