A Real-Time Computational Learning Model for Sequential Decision-Making Problems Under Uncertainty

Author(s):  
Andreas A. Malikopoulos ◽  
Panos Y. Papalambros ◽  
Dennis N. Assanis

Modeling dynamic systems incurring stochastic disturbances for deriving a control policy is a ubiquitous task in engineering. However, in some instances obtaining a model of a system may be impractical or impossible. Alternative approaches have been developed using a simulation-based stochastic framework, in which the system interacts with its environment in real time and obtains information that can be processed to produce an optimal control policy. In this context, the problem of developing a policy for controlling the system’s behavior is formulated as a sequential decision-making problem under uncertainty. This paper considers the problem of deriving a control policy for a dynamic system with unknown dynamics in real time, formulated as a sequential decision-making under uncertainty. The evolution of the system is modeled as a controlled Markov chain. A new state-space representation model and a learning mechanism are proposed that can be used to improve system performance over time. The major difference between the existing methods and the proposed learning model is that the latter utilizes an evaluation function, which considers the expected cost that can be achieved by state transitions forward in time. The model allows decision-making based on gradually enhanced knowledge of system response as it transitions from one state to another, in conjunction with actions taken at each state. The proposed model is demonstrated on the single cart-pole balancing problem and a vehicle cruise-control problem.

Author(s):  
Andreas A. Malikopoulos ◽  
Panos Y. Papalambros ◽  
Dennis N. Assanis

Modeling dynamic systems incurring stochastic disturbances for deriving a control policy is a ubiquitous task in engineering. However, in some instances obtaining a model of a system may be impractical or impossible. Alternative approaches have been developed using a simulation-based stochastic framework, in which the system interacts with its environment in real time and obtains information that can be processed to produce an optimal control policy. In this context, the problem of developing a policy for controlling the system’s behavior is formulated as a sequential decision-making problem under uncertainty. This paper considers real-time sequential decision-making under uncertainty modeled as a Markov Decision Process (MDP). A state-space representation model is constructed through a learning mechanism and is used to improve system performance over time. The model allows decision making based on gradually enhanced knowledge of system response as it transitions from one state to another, in conjunction with actions taken at each state. A learning algorithm is implemented realizing in real time the optimal control policy associated with the state transitions. The proposed method is demonstrated on the single cart-pole balancing problem and a vehicle cruise control problem.


2017 ◽  
Vol 29 (12) ◽  
pp. 2103-2113 ◽  
Author(s):  
Samuel J. Gershman ◽  
Jimmy Zhou ◽  
Cody Kommers

Imagination enables us not only to transcend reality but also to learn about it. In the context of reinforcement learning, an agent can rationally update its value estimates by simulating an internal model of the environment, provided that the model is accurate. In a series of sequential decision-making experiments, we investigated the impact of imaginative simulation on subsequent decisions. We found that imagination can cause people to pursue imagined paths, even when these paths are suboptimal. This bias is systematically related to participants' optimism about how much reward they expect to receive along imagined paths; providing feedback strongly attenuates the effect. The imagination effect can be captured by a reinforcement learning model that includes a bonus added onto imagined rewards. Using fMRI, we show that a network of regions associated with valuation is predictive of the imagination effect. These results suggest that imagination, although a powerful tool for learning, is also susceptible to motivational biases.


Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

AbstractMarkov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.


Sign in / Sign up

Export Citation Format

Share Document