Peril, Prudence and Planning as Risk, Avoidance and Worry

2021
Author(s):  
Christopher Gagne ◽  
Peter Dayan

Risk occupies a central role in both the theory and practice of decision-making. Although it is deeply implicated in many conditions involving dysfunctional behavior and thought, modern theoretical approaches to understanding and mitigating risk, in either one-shot or sequential settings, have yet to permeate fully the fields of neural reinforcement learning and computational psychiatry. Here we use one prominent approach, called conditional value-at-risk (CVaR), to examine optimal risk-sensitive choice and one form of optimal, risk-sensitive offline planning. We relate the former to both a justified form of the gambler’s fallacy and extremely risk-avoidant behavior resembling that observed in anxiety disorders. We relate the latter to worry and rumination.
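As a concrete illustration of the risk measure the authors build on, the sketch below estimates value-at-risk (VaR) and conditional value-at-risk (CVaR) for a simple gamble by Monte Carlo. The gamble, the risk level, and the sample size are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch: Monte Carlo estimates of VaR and CVaR for an outcome
# distribution. The gamble below mixes frequent small gains with rare large
# losses, so a risk-neutral agent and a CVaR-sensitive agent value it differently.
import numpy as np

def var_cvar(outcomes, alpha=0.05):
    """VaR_alpha is the alpha-quantile of the outcomes (rewards, so the lower
    tail is the risky one); CVaR_alpha is the mean of outcomes at or below it."""
    outcomes = np.asarray(outcomes)
    var = np.quantile(outcomes, alpha)
    cvar = outcomes[outcomes <= var].mean()
    return var, cvar

rng = np.random.default_rng(0)
samples = np.where(rng.random(100_000) < 0.95,
                   rng.normal(1.0, 0.5, 100_000),     # usual small gains
                   rng.normal(-10.0, 2.0, 100_000))   # rare severe losses
print("mean                  :", samples.mean())
print("VaR/CVaR at alpha=0.05:", var_cvar(samples, alpha=0.05))
# The mean is positive, but CVaR_0.05 is strongly negative: an extremely
# risk-avoidant (low-alpha CVaR) agent rejects gambles like this one.
```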

2019
Vol 2019
pp. 1-14
Author(s):  
Yuwei Wang ◽  
Jingmin Wang ◽  
Wei Sun ◽  
Mingrui Zhao

Bidding in the spot electricity market (EM) is a key channel through which an electricity retailer (ER) purchases power. In China, for the near future, besides uncertainty in real-time load and spot clearing prices, a newly established ER will find it hard to adjust its retail prices at will because of strict governmental supervision. Spot EM bidding is therefore a complicated and important decision-making problem for ERs in many countries, including China. In this paper, an inner-outer two-layer model system based on stochastic mixed-integer optimization is proposed for an ER's day-ahead EM bidding decision-making. The model system not only helps ERs profit under China's near-future EM circumstances, but can also be applied, with slight adaptation, to improve profits in other deregulated EMs (e.g., PJM and Nord Pool). Unlike many existing studies, we optimize both the number of blocks in the ER's day-ahead piecewise staircase (energy-price) bidding curves and the bidding price of every block. Specifically, the inner layer of the system is a stochastic mixed-integer optimization model in which the bidding prices are optimized with the number of blocks in the bidding curves treated as a parameter. The outer layer implicitly performs heuristic optimization in a discrete space, in which the number of blocks is optimized with the bidding prices treated as parameters. Moreover, to keep the financial risk caused by clearing-price and real-time-load uncertainties relatively low, we include the conditional value at risk (CVaR) of profit in the objective function of the inner-layer model, in addition to the expected profit. Simulations based on historical data not only test the soundness and feasibility of the model system, but also verify that it can further improve the actual profit of the ER compared with other methods.
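To make the inner-layer objective concrete, here is a minimal sketch of a scenario-based "expected profit plus CVaR" trade-off in the Rockafellar-Uryasev linear form, written with cvxpy. It uses a single bid quantity instead of a piecewise staircase bid curve, a fixed retail price, and synthetic price/load scenarios, so it is only an indicative simplification of the paper's mixed-integer model, not its implementation.

```python
# A minimal sketch (illustrative assumptions throughout): the ER buys quantity q
# day-ahead at the spot price, serves the realized load at a fixed retail price,
# and settles the imbalance (load - q) at the real-time price, so per-scenario
# profit = retail*load - spot*q - rt*(load - q) = (retail - rt)*load + (rt - spot)*q.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
S = 200                                   # number of scenarios
spot = rng.normal(50.0, 8.0, S)           # day-ahead clearing price ($/MWh)
rt = rng.normal(55.0, 15.0, S)            # real-time balancing price ($/MWh)
load = rng.normal(100.0, 10.0, S)         # realized retail load (MWh)
retail = 70.0                             # regulated retail price ($/MWh)
alpha, beta = 0.05, 0.5                   # CVaR level and risk weight

q = cp.Variable(nonneg=True)              # day-ahead purchase quantity (MWh)
zeta = cp.Variable()                      # Rockafellar-Uryasev auxiliary (VaR level)
u = cp.Variable(S, nonneg=True)           # per-scenario tail shortfalls

profit = (retail - rt) * load + (rt - spot) * q
cvar = zeta - (1.0 / alpha) * cp.sum(u) / S          # CVaR of profit (lower tail)
objective = cp.Maximize((1 - beta) * cp.sum(profit) / S + beta * cvar)
constraints = [u >= zeta - profit]

cp.Problem(objective, constraints).solve()
print("optimal day-ahead purchase (MWh):", float(q.value))
```

Raising beta puts more weight on the worst-case scenarios, which typically pushes the purchased quantity toward a better-hedged position at the cost of expected profit.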


Author(s):  
Margaret P Chapman ◽  
Riccardo Bonalli ◽  
Kevin M. Smith ◽  
Insoon Yang ◽  
Marco Pavone ◽  
...  

Author(s):  
Shuai Ma ◽  
Jia Yuan Yu

In the MDP framework, the general reward function takes three arguments: current state, action, and successor state. It is often simplified to a function of two arguments, current state and action. The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves only the expected total reward, this simplification works perfectly. However, when the objective is risk-sensitive, the simplification leads to an incorrect value. We propose three successively more general state-augmentation transformations (SATs), which preserve the reward sequences as well as the reward distributions and the optimal policy in risk-sensitive reinforcement learning. In risk-sensitive scenarios, we first prove that, for every MDP with a stochastic transition-based reward function, there exists an MDP with a deterministic state-based reward function such that, for any given (randomized) policy for the first MDP, there exists a corresponding policy for the second MDP under which both Markov reward processes share the same reward sequence. Second, we illustrate two situations in an inventory control problem that require the proposed SATs: using Q-learning (or other learning methods) on MDPs with transition-based reward functions, and applying methods designed for Markov processes with deterministic state-based reward functions to Markov processes with general reward functions. We demonstrate the advantage of the SATs using Value-at-Risk as an example, a risk measure defined on the reward distribution rather than on moments (such as the mean and variance) of the distribution. We illustrate the error in the estimated reward distribution caused by the reward simplification, and show how the SATs enable a variance formula to work on Markov processes with general reward functions.
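The following sketch illustrates, with a one-step toy example rather than the paper's SAT construction, why collapsing a stochastic transition-based reward to its state-based expectation preserves the mean but corrupts quantile-based risk measures such as Value-at-Risk.

```python
# A minimal sketch (toy example, not the paper's construction): replacing a
# stochastic transition-based reward r(s, a, s') by its state-based expectation
# r(s, a) keeps the expected return unchanged but distorts the reward
# distribution, so Value-at-Risk is computed on the wrong distribution.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Transition-based reward: from state s under action a, the successor is "good"
# with probability 0.5 (reward +10) and "bad" with probability 0.5 (reward -10).
successor_good = rng.random(n) < 0.5
transition_rewards = np.where(successor_good, 10.0, -10.0)

# State-based simplification: the reward is collapsed to its expectation, 0.
state_based_rewards = np.zeros(n)

alpha = 0.05
print("means   :", transition_rewards.mean(), state_based_rewards.mean())
print("VaR_0.05:", np.quantile(transition_rewards, alpha),
      np.quantile(state_based_rewards, alpha))
# Both means are ~0, but VaR_0.05 is -10 under the true reward distribution and
# 0 under the simplified one: the risk-sensitive value is wrong after simplification.
```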


Computers
2018
Vol 7 (4)
pp. 57
Author(s):  
Chanchal Kumar ◽  
Mohammad Najmud Doja

This paper proposes a novel framework for solving the portfolio selection problem. The framework is built on two new parameters derived from an existing basic mean-variance model, and the computed values of these parameters can be used directly to support decision-making. It combines the effectiveness of the mean-variance model with another significant risk measure, Conditional Value-at-Risk (CVaR). Specifically, it extracts two new parameters, αnew and βnew, which are derived from the results of the mean-variance model and the value of CVaR. The method aims to minimize an overall cost, computed in the framework from quadratic equations involving these new parameters. A new ANFIS structure is designed by modifying the existing one; the new structure contains six layers instead of the existing five. Fuzzy sets are used in the design of the second layer of this new ANFIS structure. The output parameter obtained from the sixth layer serves as an important decision-making index for an investor. Numerical results obtained from the framework and the new six-layered structure are presented and compared with the results of the existing ANFIS structure.
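As a rough illustration of the quantities the framework combines, the sketch below computes the mean, variance, and CVaR of a candidate portfolio from synthetic return scenarios. The data and weights are invented for the example; the paper's derived parameters αnew and βnew and its six-layer ANFIS structure are not reproduced here.

```python
# A minimal sketch (synthetic scenarios, arbitrary weights): the mean-variance
# statistics and the CVaR of a candidate portfolio, i.e. the ingredients the
# framework combines before its ANFIS-based decision index is computed.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.multivariate_normal(
    mean=[0.08, 0.05, 0.12],
    cov=[[0.04, 0.01, 0.00],
         [0.01, 0.02, 0.00],
         [0.00, 0.00, 0.09]],
    size=5_000)                         # scenario returns for three assets
weights = np.array([0.4, 0.4, 0.2])     # an illustrative candidate portfolio

portfolio = returns @ weights
alpha = 0.05
var = np.quantile(portfolio, alpha)               # value-at-risk (return space)
cvar = portfolio[portfolio <= var].mean()         # conditional value-at-risk

print("mean return :", portfolio.mean())
print("variance    :", portfolio.var())
print("CVaR_0.05   :", cvar)
```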


2021
Vol 8
Author(s):  
Chen Yu ◽  
Andre Rosendo

Model-Based Reinforcement Learning (MBRL) algorithms have been shown to have an advantage in data efficiency, but they are often overshadowed by state-of-the-art model-free methods in performance, especially on high-dimensional and complex problems. In this work, a novel MBRL method called Risk-Aware Model-Based Control (RAMCO) is proposed. It combines uncertainty-aware deep dynamics models with the risk assessment technique Conditional Value at Risk (CVaR). This mechanism is suitable for real-world applications because it takes epistemic risk into consideration. In addition, we use a model-free solver to produce warm-up training data; this setting improves performance in low-dimensional environments and compensates for MBRL's inherent weakness in high-dimensional scenarios. In comparison with other state-of-the-art reinforcement learning algorithms, we show that RAMCO produces superior results on a walking robot model. We also evaluate the method in an Eidos environment, a novel experimental setup that uses multi-dimensional, randomly initialized deep neural networks to measure the performance of any reinforcement learning algorithm, and the advantages of RAMCO are highlighted.
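The sketch below conveys the general idea rather than RAMCO itself: candidate actions are scored by the CVaR of returns predicted by an ensemble of dynamics models, so that disagreement between ensemble members (epistemic uncertainty) penalizes an action. The toy one-dimensional dynamics, the quadratic cost, and the ensemble of linear models are illustrative assumptions.

```python
# A minimal sketch (not RAMCO itself): risk-aware action selection where each
# candidate action is scored by the CVaR of its return across an ensemble of
# (hypothetical) learned dynamics models, so model disagreement lowers the score.
import numpy as np

rng = np.random.default_rng(0)
K, H = 20, 10                                # ensemble size, rollout horizon
slopes = rng.normal(0.9, 0.05, K)            # each ensemble member's dynamics

def rollout_return(action, slope, x0=1.0):
    """Return of repeating `action` for H steps under one ensemble member."""
    x, total = x0, 0.0
    for _ in range(H):
        x = slope * x + action                       # model's next-state prediction
        total += -(x ** 2) - 0.1 * action ** 2       # quadratic cost as negative reward
    return total

def cvar_score(action, alpha=0.2):
    """Mean of the worst alpha fraction of ensemble-predicted returns."""
    returns = np.array([rollout_return(action, s) for s in slopes])
    var = np.quantile(returns, alpha)
    return returns[returns <= var].mean()

candidates = np.linspace(-1.0, 1.0, 41)
best = max(candidates, key=cvar_score)
print("risk-averse action choice:", best)
```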


2014
Vol 16 (6)
pp. 3-29
Author(s):  
Samuel Drapeau ◽  
Michael Kupper ◽  
Antonis Papapantoleon
