Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

2020, Vol 34 (04), pp. 4436-4443
Author(s): Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.
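The two ingredients named in this abstract can be illustrated on a single discrete return distribution. The first is the lower-tail CVaR objective, the average return over the worst `alpha` share of probability mass. The second is an optimistic step that moves probability mass from the lower to the upper tail. The sketch below is a simplified illustration in that spirit, not the authors' operator. It assumes the amount of shifted mass `c` is simply given, whereas the paper's operator would determine it from the data. It also assumes returns are bounded above by a known `v_max`. Both names are hypothetical.

```python
def cvar_lower(values, probs, alpha):
    """Lower-tail CVaR: the average return over the worst `alpha` probability mass."""
    remaining, total = alpha, 0.0
    for v, p in sorted(zip(values, probs)):
        take = min(p, remaining)  # portion of this atom that lies inside the tail
        total += take * v
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def optimistic_shift(values, probs, c, v_max):
    """Move up to `c` probability mass from the lowest returns to the upper bound `v_max`."""
    to_move, shifted = c, []
    for v, p in sorted(zip(values, probs)):
        removed = min(p, to_move)  # strip mass from the lower tail first
        to_move -= removed
        if p - removed > 0:
            shifted.append((v, p - removed))
    shifted.append((v_max, c - to_move))  # the stripped mass reappears at the optimistic maximum
    new_values, new_probs = zip(*shifted)
    return list(new_values), list(new_probs)


values, probs = [0.0, 5.0, 10.0], [0.2, 0.5, 0.3]
print(cvar_lower(values, probs, alpha=0.2))  # the worst 20% of mass sits at 0.0
opt_values, opt_probs = optimistic_shift(values, probs, c=0.1, v_max=20.0)
print(cvar_lower(opt_values, opt_probs, alpha=0.2))  # the optimistic CVaR estimate is larger
```

Because mass only ever moves upward, the CVaR of the shifted distribution can never be lower than that of the original one. This monotonicity is the sense in which such an operator is optimistic.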

Author(s): Ceren Tuncer Şakar, Murat Köksalan

We study the effects of considering different criteria simultaneously in portfolio optimization. In a single-period optimization setting, we use various combinations of the expected return, variance, liquidity, and Conditional Value at Risk criteria. Using stocks from Borsa Istanbul, we conduct computational studies to show the effects of these criteria on the objective and decision spaces. We also consider cardinality and weight constraints and study their effects on the results. In general, we observe that considering alternative criteria yields larger regions of the efficient frontier that may be of interest to the decision maker. We discuss the results of our experiments and provide insights.
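Three of the criteria named above, expected return, variance, and CVaR, can be evaluated for a candidate portfolio from equally likely return scenarios. The cardinality and weight constraints can be checked the same way. This is a minimal sketch under that scenario assumption. Liquidity is omitted because the abstract does not say how it is measured. The function names and the scenario data are hypothetical.

```python
def cvar_lower(samples, alpha):
    """Mean of the worst `alpha` fraction of equally likely returns (fractional tail atom allowed)."""
    xs = sorted(samples)
    mass = alpha * len(xs)  # tail size measured in samples
    k = int(mass)
    total = sum(xs[:k])
    frac = mass - k
    if frac > 1e-12 and k < len(xs):
        total += frac * xs[k]  # partial weight on the boundary sample
    return total / mass


def evaluate_portfolio(weights, scenarios, alpha):
    """Return (expected return, variance, CVaR) of a weighted portfolio over return scenarios."""
    returns = [sum(w * r for w, r in zip(weights, s)) for s in scenarios]
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((x - mean) ** 2 for x in returns) / n
    return mean, variance, cvar_lower(returns, alpha)


def satisfies_constraints(weights, max_assets, lower, upper, tol=1e-9):
    """Budget, cardinality (at most `max_assets` holdings) and per-holding weight bounds."""
    held = [w for w in weights if w > tol]
    return (abs(sum(weights) - 1.0) < tol
            and len(held) <= max_assets
            and all(lower <= w <= upper for w in held))


# Rows are scenarios, columns are two hypothetical stocks.
scenarios = [[0.02, 0.01], [-0.03, 0.00], [0.05, 0.02], [0.00, 0.01]]
print(evaluate_portfolio([0.5, 0.5], scenarios, alpha=0.25))
print(satisfies_constraints([0.5, 0.5], max_assets=1, lower=0.1, upper=1.0))
```

Evaluating many candidate weight vectors this way gives points in the objective space. Filtering out the dominated ones yields an approximation of the efficient frontier for the chosen criteria combination.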


2013, Vol 732-733, pp. 1438-1443
Author(s): Pin Jie Xie, Jian Chao Hou, Quan Sheng Shi

Finding the optimal bidding strategy, one that yields the highest profit while keeping risk as low as possible, is an urgent problem for generation companies. This paper presents a new bidding decision model with risk management for generation companies based on conditional value at risk (CVaR). In building the optimal bidding decision model, three cases are considered separately: maximizing only the expected return, maximizing only the CVaR of the benefit, and considering both benefit and risk (CVaR). With this method, a generation company can determine its two decision variables, bidding price and bidding output, so as to maximize its revenue while accounting for risk.
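The three cases described above differ only in the objective: expected profit, the CVaR of profit, or a weighted combination of the two. The sketch below uses a deliberately simple, hypothetical market model with uniform-price clearing, a fixed marginal cost, and equally likely price scenarios. A weight `lam` moves between the cases: `lam=0` is the expected-return case and `lam=1` is the CVaR-only case. It is not the paper's model, and all names and numbers are assumptions.

```python
def cvar_lower(samples, alpha):
    """Mean of the worst `alpha` fraction of equally likely outcomes (exact when alpha * n is an integer)."""
    xs = sorted(samples)
    k = max(1, round(alpha * len(xs)))
    return sum(xs[:k]) / k


def profits(bid_price, bid_output, price_scenarios, marginal_cost):
    """Dispatched only if the bid is at or below the clearing price; paid the clearing price."""
    return [(p - marginal_cost) * bid_output if bid_price <= p else 0.0
            for p in price_scenarios]


def best_bid(bid_prices, bid_outputs, price_scenarios, marginal_cost, alpha, lam):
    """Grid search maximizing (1 - lam) * E[profit] + lam * CVaR_alpha(profit); ties go to grid order."""
    def score(bid):
        pi = profits(bid[0], bid[1], price_scenarios, marginal_cost)
        return (1 - lam) * sum(pi) / len(pi) + lam * cvar_lower(pi, alpha)
    return max(((b, q) for b in bid_prices for q in bid_outputs), key=score)


prices = [20, 30, 40, 50]  # hypothetical, equally likely clearing prices
for lam in (0.0, 0.5, 1.0):
    print(lam, best_bid([20, 30, 40], [50, 100], prices, marginal_cost=25, alpha=0.25, lam=lam))
```

In this toy market, bidding below marginal cost exposes the generator to a loss in the lowest-price scenario. The CVaR term penalizes exactly that case.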


2019
Author(s): Denis Pais, Valdinei Freire, Karina Valdivia-Delgado

Markov decision processes (MDPs) are widely used to solve sequential decision-making problems. The most commonly used performance criterion in MDPs is the minimization of the expected total cost. However, this approach does not take into account fluctuations around the mean, which can significantly affect the overall performance of the process. MDPs that deal with this kind of problem are called risk-sensitive MDPs. A special type of risk-sensitive MDP is the CVaR MDP, which incorporates the CVaR (Conditional Value-at-Risk) metric commonly used in finance. One algorithm that finds the optimal policy for CVaR MDPs is Value Iteration with Linear Interpolation, called CVaRVILI. CVaRVILI must solve linear programming problems many times, which gives it a high computational cost. In this work, we propose an algorithm, called PECVaR, that evaluates a stationary policy for constant-cost CVaR MDPs without solving linear programming problems. In addition, experiments were carried out in which CVaRVILI was initialized with the expected total cost and with the cost of a neutral policy computed by PECVaR. The results show that these initializations reduce the convergence time of CVaRVILI in most cases.
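Evaluating a fixed stationary policy under a CVaR criterion can be illustrated, far more crudely than PECVaR, by Monte Carlo simulation. The policy is rolled out many times, the total cost of each episode is recorded, and the worst `alpha` fraction of those costs is averaged. The sketch assumes a goal-directed MDP with a constant cost of 1 per step. The transition format and the function names are hypothetical, and this is not the PECVaR algorithm.

```python
import random


def rollout_cost(transitions, policy, start, goal, rng, max_steps=10_000):
    """Total cost of one episode under a stationary policy, with a constant cost of 1 per step."""
    state, cost = start, 0
    while state != goal and cost < max_steps:
        outcomes = transitions[state][policy[state]]  # list of (probability, next_state)
        u, acc = rng.random(), 0.0
        next_state = outcomes[-1][1]  # fallback in case the probabilities sum slightly below 1
        for prob, nxt in outcomes:
            acc += prob
            if u < acc:
                next_state = nxt
                break
        state = next_state
        cost += 1
    return cost


def cvar_upper(costs, alpha):
    """Mean of the worst (largest) `alpha` fraction of costs; exact when alpha * n is an integer."""
    xs = sorted(costs, reverse=True)
    k = max(1, round(alpha * len(xs)))
    return sum(xs[:k]) / k


def evaluate_policy(transitions, policy, start, goal, alpha, episodes=2000, seed=0):
    """Monte Carlo estimates of (expected total cost, CVaR_alpha of total cost)."""
    rng = random.Random(seed)
    costs = [rollout_cost(transitions, policy, start, goal, rng) for _ in range(episodes)]
    return sum(costs) / len(costs), cvar_upper(costs, alpha)


# Hypothetical example: from state 0 the single action reaches the goal (state 1) w.p. 0.5.
transitions = {0: {"go": [(0.5, 1), (0.5, 0)]}}
print(evaluate_policy(transitions, {0: "go"}, start=0, goal=1, alpha=0.1))
```

The gap between the two numbers shows the fluctuation around the mean that the expected-cost criterion ignores.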


Author(s): Mihály Ormos, Dusán Timotity

This paper discusses an alternative explanation for the empirical findings that contradict the positive relationship between risk (variance) and reward (expected return). We show that these contradictory results may stem from an incorrect definition of risk perception, which we correct by introducing Expected Downside Risk (EDR). Like Expected Shortfall or Conditional Value-at-Risk, the EDR parameter measures tail risk; however, it better fits and explains investors' perception of utility. Our results indicate that when EDR is used as the risk measure, both positive and negative relationships between expected return and risk can be derived under standard conditions (e.g., expected utility theory and positive risk aversion). Therefore, no alternative psychological explanation or additional boundary condition on utility theory is required to explain the phenomenon. Furthermore, we show empirically that EDR is a more precise linear predictor of expected return than volatility, for both individual assets and portfolios.


1979, Vol 36 (8), pp. 939-947
Author(s): Roy Mendelssohn

Conditions are given that imply there exist policies that "minimize risk" of undesirable events for stochastic harvesting models. It is shown that for many problems, either such a policy will not exist, or else it is an "extreme" policy that is equally undesirable. Techniques are given to systematically trade off decreases in the long-run expected return against decreases in the long-run risk. Several numerical examples are given for models of salmon runs, when both population-based risks and harvest-based risks are considered. Key words: Markov decision processes, risk, salmon management, Pareto optimal policies, trade-off curves, linear programming
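The trade-off between long-run expected return and long-run risk can be summarized by the set of non-dominated (Pareto-optimal) policies. The sketch below assumes each candidate policy has already been evaluated to a pair (expected return, risk). It keeps the non-dominated pairs and orders them by risk, which traces a discrete trade-off curve. The policy names and numbers are hypothetical. This is a post-processing step, not the paper's linear-programming technique.

```python
def pareto_front(candidates):
    """Names of non-dominated policies (higher return, lower risk), ordered by increasing risk.

    `candidates` maps a policy name to (long_run_expected_return, long_run_risk).
    """
    front = []
    for name, (ret, risk) in candidates.items():
        dominated = any(
            r2 >= ret and k2 <= risk and (r2 > ret or k2 < risk)
            for other, (r2, k2) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append((risk, ret, name))
    return [name for _, _, name in sorted(front)]


# Hypothetical harvest policies, evaluated beforehand.
policies = {
    "no_harvest": (0.0, 0.0),   # minimal risk, but an "extreme" policy with no yield
    "low": (5.0, 0.1),
    "medium": (8.0, 0.3),
    "aggressive": (9.0, 0.6),
    "uneven": (4.0, 0.4),       # dominated by "medium"
}
print(pareto_front(policies))
```

The risk-minimizing end of the curve here is the no-harvest policy. This mirrors the abstract's point that a risk-minimizing policy may be an extreme one.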


2010, Vol 4 (2), pp. 47-69
Author(s): Bartosz Sawik

This paper presents a bi-objective portfolio model with the expected return as a performance measure and the expected worst-case return as a risk measure. The problems are formulated as a bi-objective linear program. Numerical examples based on historical daily data from the Warsaw Stock Exchange, with 1000, 3500, and 4020 daily observations, are presented, and selected computational results are provided. The computational experiments show that the proposed linear programming approach provides the decision maker with a simple tool for evaluating the relationship between the expected and the worst-case portfolio return.
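A bi-objective linear program of this kind can be sketched by scalarizing the two objectives with a weight `lam` and solving one LP per weight with SciPy's `linprog`. This sketch assumes the worst-case return is the minimum portfolio return over the historical scenarios. That minimum is linearized by an auxiliary variable `z` bounded above by every scenario return. The paper's exact definition of the expected worst-case return may differ, for example as an average over several worst scenarios. The long-only weights and the function name are also assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def solve_scalarized(R, lam):
    """Maximize lam * E[return] + (1 - lam) * worst-case return over long-only weights.

    R has one row per historical scenario (day) and one column per asset.
    """
    T, n = R.shape
    mu = R.mean(axis=0)
    # Decision vector x = [w_1, ..., w_n, z]; linprog minimizes, so the objective is negated.
    c = np.concatenate([-lam * mu, [-(1.0 - lam)]])
    # z - R[t] @ w <= 0 for every scenario t, so z is at most the worst scenario return.
    A_ub = np.hstack([-R, np.ones((T, 1))])
    b_ub = np.zeros(T)
    # Budget constraint: the weights sum to one (z does not enter).
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:n]
    return w, float(mu @ w), float((R @ w).min())


# Hypothetical daily returns: asset 0 is volatile, asset 1 is flat.
R = np.array([[0.05, 0.01], [-0.04, 0.01], [0.06, 0.01]])
for lam in (0.0, 0.5, 1.0):
    w, expected, worst = solve_scalarized(R, lam)
    print(lam, np.round(w, 3), round(expected, 4), round(worst, 4))
```

Sweeping `lam` from 0 to 1 produces pairs of expected and worst-case return that a decision maker can compare.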


2021
Author(s): Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau

Title: Sequential Decision Making Using Quantiles
The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
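Two hypothetical drug regimens with known distributions of cumulative reward show the difference between the two objectives. The regimen with the higher mean can have the lower 0.10 quantile. The sketch only evaluates the objectives on given distributions; it is not the dynamic programming procedure of the paper. The regimen names and numbers are invented for illustration.

```python
def expected_value(values, probs):
    """Mean of a discrete distribution."""
    return sum(v * p for v, p in zip(values, probs))


def lower_quantile(values, probs, tau):
    """Smallest value v with P(X <= v) >= tau, for a discrete distribution."""
    cumulative = 0.0
    for v, p in sorted(zip(values, probs)):
        cumulative += p
        if cumulative >= tau - 1e-12:
            return v
    return max(values)


regimens = {
    "A": ([-2.0, 10.0], [0.2, 0.8]),  # high average improvement, occasional harm
    "B": ([3.0, 6.0], [0.5, 0.5]),    # modest but reliable improvement
}
for name, (values, probs) in regimens.items():
    print(name, expected_value(values, probs), lower_quantile(values, probs, 0.10))
```

Regimen A has the higher expected cumulative reward (7.6 versus 4.5). A patient maximizing the 0.10 quantile would instead choose B, which guarantees an improvement of at least 3 with probability at least 0.9.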


2022
Author(s): Agostino Capponi, Alexey Rubtsov

How can we construct portfolios that perform well in the face of systemic events? The global financial crisis of 2007–2008 and the coronavirus disease 2019 pandemic have highlighted the importance of accounting for extreme forms of risk. In “Systemic Risk-Driven Portfolio Selection,” Capponi and Rubtsov investigate the design of portfolios that trade off tail risk and expected growth of the investment. The authors show how two well-known risk measures, the value-at-risk and the conditional value-at-risk, can be used to construct portfolios that perform well in the face of systemic events. The paper uses U.S. stock data from the S&P 500 Financials Index and Canadian stock data from the S&P/TSX Capped Financial Index, and it demonstrates that portfolios accounting for systemic risk attain higher risk-adjusted expected returns, compared with well-known benchmark portfolio criteria, during times of market downturn.
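Two hypothetical loss samples show why CVaR, and not VaR alone, matters for extreme events. The samples share the same VaR but differ in how bad the tail beyond it is. The sketch uses equally likely scenarios and takes VaR as the empirical `beta`-quantile of the losses. It computes CVaR with the Rockafellar-Uryasev formula evaluated at that VaR. It does not reproduce the systemic-risk model of the paper.

```python
import math


def var_cvar(losses, beta):
    """Empirical VaR and CVaR at level beta for equally likely losses (positive = loss)."""
    xs = sorted(losses)
    n = len(xs)
    k = max(1, math.ceil(beta * n - 1e-9))
    var = xs[k - 1]  # smallest loss l with empirical P(L <= l) >= beta
    # Rockafellar-Uryasev: CVaR = VaR + E[(L - VaR)^+] / (1 - beta)
    excess = sum(max(x - var, 0.0) for x in xs) / n
    return var, var + excess / (1 - beta)


portfolio_a = [-3, -2, -1, 0, 0, 1, 1, 2, 3, 4]
portfolio_b = [-3, -2, -1, 0, 0, 1, 1, 2, 3, 20]  # same body, one extreme loss
print(var_cvar(portfolio_a, 0.8))
print(var_cvar(portfolio_b, 0.8))
```

Both portfolios have a VaR of 2 at the 80% level. The CVaR, however, rises from 3.5 to 11.5 once the single extreme loss is present, which is the kind of tail behaviour a VaR constraint alone does not see.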


2018, Vol 2018, pp. 1-14
Author(s): Leiyan Xu, Zhiqing Meng, Gengui Zhou, Yunzhi Mu, Minchao Zheng

Direct chain enterprises (DCEs) must decide how to allocate and supply their products to their stores so as to minimize losses and maximize the manufacturer's profit. This paper presents a single-cycle optimal allocation model for DCEs for a given total production amount and a given conditional value at risk (CVaR) of loss. The optimal strategy for production allocation and supply is derived. Subsequently, an approximate algorithm for computing the optimal total production amount is presented. The optimal allocation and supply strategy, the minimum total production amount, the minimum allocation strategy, and the discount pricing strategy are obtained for the single cycle. Finally, numerical results based on sales data from a food DCE corroborate that adopting different production and supply strategies reduces the risk of expected losses and increases the expected return. The model is of theoretical significance for guiding the production and operation of direct chain enterprises.
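Here is a toy version of a single-cycle allocation under a CVaR condition on loss. A fixed total production amount is split between two stores. Each store's profit in a demand scenario follows a newsvendor-style rule in which unsold units are salvaged. The allocation with the highest expected profit whose CVaR of loss stays within a limit is chosen by enumeration. The two-store setting, the profit rule, the parameter names, and the demand data are all assumptions for illustration. The paper's model and its approximate algorithm for the total production amount are not reproduced.

```python
import math


def store_profit(q, demand, price, cost, salvage):
    """Single-period (newsvendor-style) profit of one store: sell up to demand, salvage the rest."""
    sold = min(q, demand)
    return price * sold + salvage * (q - sold) - cost * q


def cvar_loss(losses, beta):
    """Empirical CVaR of equally likely losses via the Rockafellar-Uryasev formula."""
    xs = sorted(losses)
    var = xs[max(1, math.ceil(beta * len(xs) - 1e-9)) - 1]
    return var + sum(max(x - var, 0.0) for x in xs) / (len(xs) * (1 - beta))


def allocate(total, demand_scenarios, price, cost, salvage, beta, cvar_limit=math.inf):
    """Split `total` units between two stores: maximize expected profit subject to
    CVaR(loss) <= cvar_limit, breaking ties in favour of lower CVaR. Returns None if infeasible."""
    best = None
    for q1 in range(total + 1):
        alloc = (q1, total - q1)
        scenario_profits = [
            sum(store_profit(q, d, price, cost, salvage) for q, d in zip(alloc, demands))
            for demands in demand_scenarios
        ]
        expected = sum(scenario_profits) / len(scenario_profits)
        risk = cvar_loss([-p for p in scenario_profits], beta)
        if risk > cvar_limit:
            continue
        if best is None or (expected, -risk) > (best[1], -best[2]):
            best = (alloc, expected, risk)
    return best


# Two hypothetical, equally likely demand scenarios: (store 1, store 2).
print(allocate(10, [(2, 8), (8, 2)], price=5, cost=3, salvage=1, beta=0.5))
```

In this symmetric example every split from (2, 8) to (8, 2) has the same expected profit. The CVaR term is what selects the even split (5, 5).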


2014, Vol 2014, pp. 1-7
Author(s): Meihua Wang, Cheng Li, Honggang Xue, Fengmin Xu

This paper presents a portfolio rebalancing model with a self-financing strategy and V-shaped transaction costs. Our main contribution is a new constraint that determines whether the existing portfolio actually needs to be rebalanced. The constraint accounts for both the transaction amount and the transaction cost, without any funds being added to the investment amount. The V-shaped transaction cost function is used to calculate the transaction cost of the portfolio, and conditional value at risk (CVaR) is used to measure portfolio risk. Computational tests on real financial data show that the proposed model is effective and that the rebalanced portfolio has a higher expected return and a lower CVaR than the original portfolio.
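A V-shaped transaction cost is piecewise linear in the traded amount. It is zero when a holding is unchanged and grows linearly, possibly at different rates, for purchases and for sales. The self-financing idea in the abstract is that rebalancing must be paid for out of the existing portfolio. That condition can then be checked as a single inequality. This is a minimal sketch with hypothetical function names and proportional rates `buy_rate` and `sell_rate`; the paper's exact constraint may be formulated differently.

```python
def v_shaped_cost(new, old, buy_rate, sell_rate):
    """Transaction cost that is linear in the amount bought or sold, with a kink at zero trade."""
    return sum(buy_rate * max(n - o, 0.0) + sell_rate * max(o - n, 0.0)
               for n, o in zip(new, old))


def is_self_financing(new, old, buy_rate, sell_rate, tol=1e-9):
    """The new holdings plus trading costs must be funded by the old holdings alone."""
    return sum(new) + v_shaped_cost(new, old, buy_rate, sell_rate) <= sum(old) + tol


old = [50.0, 50.0]  # current holdings in currency units
print(v_shaped_cost([60.0, 39.0], old, 0.01, 0.01))
print(is_self_financing([60.0, 39.0], old, 0.01, 0.01))  # selling a little extra covers the costs
print(is_self_financing([60.0, 40.0], old, 0.01, 0.01))  # no cash left to pay for the trade
```

Because the cost is a sum of maxima of linear terms, it can be linearized with separate buy and sell variables. This is what lets such a constraint sit inside an optimization model alongside a CVaR objective.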

