Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version of the distributional Bellman operator that moves probability mass from the lower to the upper tail of the return distribution. We prove asymptotic convergence and optimism of this operator for the tabular policy evaluation case. We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces.
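
The Python sketch below makes the objective concrete. It computes the lower-tail CVaR of a discrete return distribution, meaning the mean of the worst α fraction of outcomes. It then applies a toy version of the idea of moving probability mass from the lower to the upper tail. The function names and the rule used in `optimistic_shift` are illustrative assumptions, not the paper's exact operator. That rule takes mass `c` from the lowest atoms and places it at an assumed upper bound `v_max` on the return.

```python
from typing import List, Sequence, Tuple


def cvar(atoms: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """Lower-tail CVaR: mean return over the worst alpha fraction of probability mass."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    remaining, total = alpha, 0.0
    for z, p in sorted(zip(atoms, probs)):  # ascending returns: worst outcomes first
        take = min(p, remaining)  # whole atom, or only the fraction still needed
        total += take * z
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def optimistic_shift(
    atoms: Sequence[float], probs: Sequence[float], c: float, v_max: float
) -> Tuple[List[float], List[float]]:
    """Hypothetical optimism rule: remove up to c mass from the lowest atoms, put it on v_max."""
    to_move = c
    kept: List[Tuple[float, float]] = []
    for z, p in sorted(zip(atoms, probs)):
        removed = min(p, to_move)
        to_move -= removed
        if p - removed > 0.0:
            kept.append((z, p - removed))
    kept.append((v_max, c - to_move))  # the mass actually moved lands at the upper bound
    zs, ps = zip(*kept)
    return list(zs), list(ps)


# Two equally likely returns, 0 and 10; an upper bound of 10 is assumed for the shift.
atoms, probs = [0.0, 10.0], [0.5, 0.5]
print(cvar(atoms, probs, 0.5))  # 0.0: the worst half of the mass is the return 0
zs, ps = optimistic_shift(atoms, probs, c=0.2, v_max=10.0)
print(round(cvar(zs, ps, 0.5), 6))  # 4.0: the shifted distribution has a higher CVaR
```

Moving mass upward can never lower the lower-tail CVaR, which is the sense in which such a shift is optimistic.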

EFFECTS OF MULTIPLE CRITERIA ON PORTFOLIO OPTIMIZATION

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622014500047 ◽

2014 ◽

Vol 13 (01) ◽

pp. 77-99 ◽

Cited By ~ 6

Author(s):

TUNCER ŞAKAR CEREN ◽

MURAT KÖKSALAN

Keyword(s):

At Risk ◽

Portfolio Optimization ◽

Decision Maker ◽

Value At Risk ◽

Efficient Frontier ◽

Conditional Value At Risk ◽

Computational Studies ◽

Expected Return ◽

Risk Criteria ◽

Return Variance

We study the effects of considering different criteria simultaneously in portfolio optimization. In a single-period optimization setting, we use various combinations of expected return, variance, liquidity and Conditional Value at Risk criteria. Using stocks from Borsa Istanbul, we conduct computational studies to show the effects of these criteria on the objective and decision spaces. We also consider cardinality and weight constraints and study their effects on the results. In general, we observe that considering alternative criteria enlarges the regions of the efficient frontier that may be of interest to the decision maker. We discuss the results of our experiments and provide insights.
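
A nondominated (Pareto) filter illustrates the observation that extra criteria enlarge the efficient region. The sketch below assumes every criterion is expressed so that larger is better, so variance and CVaR loss are negated. The three portfolio scores are made-up numbers, not data from the study.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse on every criterion and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (the discrete efficient set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical portfolios scored as (expected return, -variance, -CVaR loss).
portfolios = {
    "A": (0.08, -0.04, -0.10),
    "B": (0.06, -0.02, -0.08),
    "C": (0.05, -0.03, -0.05),
}
two_criteria = [v[:2] for v in portfolios.values()]
three_criteria = list(portfolios.values())
print(len(nondominated(two_criteria)))  # 2: C is dominated by B on return and variance
print(len(nondominated(three_criteria)))  # 3: C's lower CVaR loss makes it efficient again
```

Negating the minimized criteria lets `dominates` use a single comparison direction for every objective.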

Novel Bidding Decision Model for Generation Company Based on CVaR Model

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.732-733.1438 ◽

2013 ◽

Vol 732-733 ◽

pp. 1438-1443

Author(s):

Pin Jie Xie ◽

Jian Chao Hou ◽

Quan Sheng Shi

Keyword(s):

Value At Risk ◽

Decision Model ◽

Decision Models ◽

Conditional Value At Risk ◽

Expected Return ◽

Decision Variables ◽

Bidding Price ◽

Generation Company ◽

Optimal Bidding ◽

Benefit And Risk

Finding a bidding strategy that maximizes profit while keeping risk as low as possible is an urgent problem for generation companies. This paper presents a new bidding decision model with risk management for generation companies based on conditional value at risk (CVaR). In building the optimal bidding decision models, three cases are considered separately: maximizing only the expected return, maximizing only the CVaR of the benefit, and considering both benefit and risk (CVaR). With this method, a generation company can determine its two decision variables, bidding price and bidding output, so as to maximize its revenue while also controlling its risk.
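
The three cases can be read as one weighted objective, λ·E[profit] + (1 − λ)·CVaR(profit). Setting λ = 1 gives expected return only, and λ = 0 gives CVaR only. The Python sketch below evaluates that objective on equally likely profit scenarios and computes CVaR with the Rockafellar-Uryasev minimization formula. The weighting scheme, the scenario profits, and the two candidate bids are assumptions for illustration. The paper's own market and profit model is not reproduced here.

```python
from typing import Dict, List


def cvar_of_loss(losses: List[float], beta: float) -> float:
    """Rockafellar-Uryasev CVaR of equally likely losses:
    min over zeta of zeta + E[max(L - zeta, 0)] / (1 - beta).
    The objective is convex and piecewise linear with kinks at the sample values,
    so scanning zeta over the samples finds the minimum."""
    n = len(losses)
    best = float("inf")
    for zeta in losses:
        excess = sum(max(loss - zeta, 0.0) for loss in losses) / n
        best = min(best, zeta + excess / (1.0 - beta))
    return best


def bid_score(profits: List[float], lam: float, beta: float) -> float:
    """lam = 1: expected profit only; lam = 0: CVaR of profit only; otherwise a blend."""
    mean_profit = sum(profits) / len(profits)
    # Profit CVaR = -(loss CVaR); with beta = 0.75 it is the mean of the worst 25% of profits.
    cvar_profit = -cvar_of_loss([-p for p in profits], beta)
    return lam * mean_profit + (1.0 - lam) * cvar_profit


# Hypothetical profit scenarios for two candidate (bidding price, bidding output) pairs.
bids: Dict[str, List[float]] = {
    "aggressive": [30.0, 20.0, 10.0, -20.0],
    "cautious": [6.0, 5.0, 4.0, 3.0],
}
for lam in (1.0, 0.5, 0.0):
    best = max(bids, key=lambda name: bid_score(bids[name], lam, beta=0.75))
    print(lam, best)  # aggressive for lam = 1.0, cautious for 0.5 and 0.0
```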

Exact Algorithm for Evaluating a Stationary Policy for CVaR MDPs

10.5753/eniac.2019.9341 ◽

2019 ◽

Author(s):

Denis Pais ◽

Valdinei Freire ◽

Karina Valdivia-Delgado

Keyword(s):

At Risk ◽

Markov Decision Processes ◽

Value At Risk ◽

Decision Processes ◽

Conditional Value At Risk ◽

Markov Decision

Markov decision processes (MDPs) are widely used to solve sequential decision-making problems. The most common performance criterion in MDPs is minimizing the expected total cost. However, this approach does not account for fluctuations around the mean, which can significantly affect the overall performance of the process. MDPs that address this kind of problem are called risk-sensitive MDPs. A special type of risk-sensitive MDP is the CVaR MDP, which uses the CVaR (Conditional Value-at-Risk) metric commonly employed in finance. One algorithm that finds the optimal policy for CVaR MDPs is Value Iteration with Linear Interpolation, called CVaRVILI. CVaRVILI must solve linear programming problems many times, which makes it computationally expensive. In this work, we propose an algorithm, called PECVaR, that evaluates a stationary policy for constant-cost CVaR MDPs without solving any linear programming problems. In addition, experiments were carried out in which CVaRVILI was initialized with the expected total cost and with the cost computed by PECVaR for a risk-neutral policy. The results show that these initializations reduce the convergence time of CVaRVILI in most cases.
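
PECVaR is an exact algorithm whose details are not given here. For comparison only, the Python sketch below estimates the CVaR of the total cost of a fixed stationary policy by Monte Carlo simulation, which is a different and approximate technique. The toy MDP, the function names, and the sample size are all assumptions. The toy MDP has a single state that reaches the goal with probability 0.5 at each step and a unit cost per step.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy MDP under a fixed stationary policy:
# state -> list of (probability, next_state) for the action the policy picks there.
Transitions = Dict[str, List[Tuple[float, str]]]


def sample_total_cost(
    trans: Transitions, start: str, goal: str, step_cost: float,
    rng: random.Random, max_steps: int = 10_000,
) -> float:
    """Simulate one episode and return its total cost (the same cost on every step)."""
    state, cost = start, 0.0
    for _ in range(max_steps):
        if state == goal:
            break
        cost += step_cost
        probs = [p for p, _ in trans[state]]
        nexts = [s for _, s in trans[state]]
        state = rng.choices(nexts, weights=probs)[0]
    return cost


def empirical_cvar(costs: List[float], alpha: float) -> float:
    """Mean of the worst (largest) alpha fraction of sampled costs."""
    k = max(1, math.ceil(alpha * len(costs) - 1e-9))
    return sum(sorted(costs, reverse=True)[:k]) / k


trans: Transitions = {"s0": [(0.5, "goal"), (0.5, "s0")]}
rng = random.Random(0)
costs = [sample_total_cost(trans, "s0", "goal", 1.0, rng) for _ in range(20_000)]
print(sum(costs) / len(costs))     # close to the exact expected cost of 2
print(empirical_cvar(costs, 0.1))  # close to the exact CVaR at level 0.1, which is 5.25
```

The episode length here is geometric, so the exact values are easy to check by hand. The expected cost is 2. The worst 10% consists of all episodes of length 5 or more (mass 0.0625, conditional mean 6) plus 0.0375 of mass at length 4, which gives a CVaR of 5.25.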

The Case of “Less is More”: Modelling Risk-Preference with Expected Downside Risk

The B E Journal of Theoretical Economics ◽

10.1515/bejte-2016-0100 ◽

2017 ◽

Vol 17 (2) ◽

Cited By ~ 2

Author(s):

Mihály Ormos ◽

Dusán Timotity

Keyword(s):

Value At Risk ◽

Utility Theory ◽

Risk Measure ◽

Negative Relationship ◽

Downside Risk ◽

Psychological Explanation ◽

Conditional Value At Risk ◽

Expected Return ◽

Linear Predictor ◽

Less Is More

This paper offers an alternative explanation for empirical findings that contradict the positive relationship between risk (variance) and reward (expected return). We show that these contradictory results may stem from an incorrect definition of risk perception, which we correct by introducing Expected Downside Risk (EDR). Like Expected Shortfall or Conditional Value-at-Risk, the EDR parameter measures tail risk; however, it better fits and explains investors' perception of utility. Our results indicate that when EDR is used as the risk measure, both positive and negative relationships between expected return and risk can be derived under standard conditions (e.g., expected utility theory and positive risk aversion). Therefore, no alternative psychological explanation or additional boundary condition on utility theory is required to explain the phenomenon. Furthermore, we show empirically that EDR is a more precise linear predictor of expected return than volatility, for both individual assets and portfolios.

Determining the Best Trade-Off Between Expected Economic Return and the Risk of Undesirable Events When Managing a Randomly Varying Population

Journal of the Fisheries Research Board of Canada ◽

10.1139/f79-131 ◽

1979 ◽

Vol 36 (8) ◽

pp. 939-947 ◽

Cited By ~ 10

Author(s):

Roy Mendelssohn

Keyword(s):

Markov Decision Processes ◽

Population Based ◽

Decision Processes ◽

Expected Return ◽

Trade Off ◽

Long Run ◽

Optimal Policies ◽

Markov Decision ◽

Long Run Risk ◽

Varying Population

Conditions are given under which there exist policies that "minimize the risk" of undesirable events in stochastic harvesting models. It is shown that for many problems either no such policy exists or it is an "extreme" policy that is equally undesirable. Techniques are given to systematically trade off decreases in long-run expected return against decreases in long-run risk. Several numerical examples are given for models of salmon runs, considering both population-based and harvest-based risks. Key words: Markov decision processes, risk, salmon management, Pareto optimal policies, trade-off curves, linear programming

A Bi-Objective Portfolio Optimization with Conditional Value-at-Risk

Decision Making in Manufacturing and Services ◽

10.7494/dmms.2010.4.2.47 ◽

2010 ◽

Vol 4 (2) ◽

pp. 47-69 ◽

Cited By ~ 4

Author(s):

Bartosz Sawik

Keyword(s):

Value At Risk ◽

Stock Exchange ◽

Risk Measure ◽

Performance Measure ◽

Conditional Value At Risk ◽

Programming Approach ◽

Expected Return ◽

Worst Case ◽

Warsaw Stock Exchange ◽

The Relationship

This paper presents a bi-objective portfolio model with the expected return as a performance measure and the expected worst-case return as a risk measure. The problem is formulated as a bi-objective linear program. Numerical examples based on 1000, 3500 and 4020 historical daily observations from the Warsaw Stock Exchange are presented, along with selected computational results. The computational experiments show that the proposed linear programming approach gives the decision maker a simple tool for evaluating the relationship between expected and worst-case portfolio return.
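
A mean versus expected-worst-case model of this kind is commonly written as a single linear program using a weighted-sum scalarization and the Rockafellar-Uryasev representation of CVaR. The formulation below follows that approach. Its notation is an assumption for illustration and may differ from the paper's formulation. The symbols are:

- T historical days with asset return vectors r_t over n assets
- portfolio weights x
- tail probability α in (0, 1]
- trade-off weight λ in [0, 1]

```latex
\begin{aligned}
\max_{x,\,\eta,\,d}\quad & \lambda\,\frac{1}{T}\sum_{t=1}^{T} r_t^{\top}x
  \;+\;(1-\lambda)\Big(\eta-\frac{1}{\alpha T}\sum_{t=1}^{T} d_t\Big)\\
\text{s.t.}\quad & d_t \ge \eta - r_t^{\top}x,\quad d_t \ge 0, \qquad t=1,\dots,T,\\
& \textstyle\sum_{i=1}^{n} x_i = 1,\qquad x_i \ge 0, \qquad i=1,\dots,n.
\end{aligned}
```

For fixed x, maximizing over η and d gives the mean of the worst α fraction of daily portfolio returns, that is, the expected worst-case return. Sweeping λ from 1 to 0 traces the trade-off between the two objectives.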

Quantile Markov Decision Processes

Operations Research ◽

10.1287/opre.2021.2123 ◽

2021 ◽

Author(s):

Xiaocheng Li ◽

Huaiyang Zhong ◽

Margaret L. Brandeau

Keyword(s):

Markov Decision Process ◽

Markov Decision Processes ◽

Decision Process ◽

Value At Risk ◽

Infinite Horizon ◽

Decision Processes ◽

Conditional Value At Risk ◽

Sequential Decision ◽

Optimal Drug ◽

Markov Decision

Sequential Decision Making Using Quantiles. The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regime for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In "Quantile Markov Decision Processes," X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
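
The quantile objective can be contrasted with the expected-value objective on sampled cumulative rewards. In the Python sketch below, `lower_quantile` returns the smallest sample q with empirical P(X ≤ q) ≥ τ. As a result, more than a 1 − τ fraction of the samples are at least q. The two drug regimes and their outcomes are invented for illustration.

```python
import math
from typing import List


def lower_quantile(samples: List[float], tau: float) -> float:
    """Smallest sample value q with empirical P(X <= q) >= tau, for 0 < tau <= 1."""
    xs = sorted(samples)
    # Number of samples that must lie at or below q; the small epsilon guards
    # against products such as 0.3 * 10 evaluating to 3.0000000000000004.
    k = max(1, math.ceil(tau * len(xs) - 1e-9))
    return xs[k - 1]


# Hypothetical cumulative health improvements, ten sampled outcomes per regime.
regime_a = [0.0] + [10.0] * 9  # usually excellent, occasionally no improvement
regime_b = [6.0] * 10          # consistently moderate

for name, outcomes in (("A", regime_a), ("B", regime_b)):
    mean = sum(outcomes) / len(outcomes)
    print(name, mean, lower_quantile(outcomes, 0.10))
# The mean favors A (9.0 vs 6.0); the 0.10 quantile favors B (6.0 vs 0.0).
```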

Systemic Risk-Driven Portfolio Selection

Operations Research ◽

10.1287/opre.2021.2234 ◽

2022 ◽

Author(s):

Agostino Capponi ◽

Alexey Rubtsov

Keyword(s):

At Risk ◽

Portfolio Selection ◽

Systemic Risk ◽

Value At Risk ◽

Global Financial Crisis ◽

Risk Measures ◽

Conditional Value At Risk ◽

The Face ◽

Financial Index ◽

The Global Financial Crisis

How can we construct portfolios that perform well in the face of systemic events? The global financial crisis of 2007–2008 and the coronavirus disease 2019 pandemic have highlighted the importance of accounting for extreme forms of risk. In "Systemic Risk-Driven Portfolio Selection," Capponi and Rubtsov investigate the design of portfolios that trade off tail risk and expected growth of the investment. The authors show how two well-known risk measures, the value-at-risk and the conditional value-at-risk, can be used to construct portfolios that perform well in the face of systemic events. The paper uses U.S. stock data from the S&P500 Financials Index and Canadian stock data from the S&P/TSX Capped Financial Index, and it demonstrates that portfolios accounting for systemic risk attain higher risk-adjusted expected returns than well-known benchmark portfolio criteria during market downturns.

Study on Single Cycle Production Allocation and Supply Strategy for DCEs Based on the CVaR Criterion

Discrete Dynamics in Nature and Society ◽

10.1155/2018/7840264 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14

Author(s):

Leiyan Xu ◽

Zhiqing Meng ◽

Gengui Zhou ◽

Yunzhi Mu ◽

Minchao Zheng

Keyword(s):

Value At Risk ◽

Optimal Allocation ◽

Approximate Algorithm ◽

Conditional Value At Risk ◽

Total Production ◽

Single Cycle ◽

Expected Return ◽

Allocation Model ◽

Production Allocation ◽

The Given

Direct chain enterprises (DCEs) must decide how to allocate and supply their products to their stores so as to minimize losses and maximize profits for the manufacturer. This paper presents a single-cycle optimal allocation model for DCEs for a given total production amount and conditional value-at-risk loss. The optimal strategy for production allocation and supply is derived. An approximate algorithm for computing the optimal total production amount is then presented. The optimal allocation and supply strategy, the minimum total production amount, the minimum allocation strategy, and the discount pricing strategy are obtained for the single cycle. Finally, numerical results based on sales data from a food DCE confirm that adopting different production and supply strategies reduces the risk of expected losses and increases the expected return. The work is of theoretical significance for guiding the production and operation of direct chain enterprises.

A New Portfolio Rebalancing Model with Transaction Costs

Journal of Applied Mathematics ◽

10.1155/2014/942374 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Meihua Wang ◽

Cheng Li ◽

Honggang Xue ◽

Fengmin Xu

Keyword(s):

At Risk ◽

Transaction Costs ◽

Cost Function ◽

Transaction Cost ◽

Value At Risk ◽

Financial Data ◽

Conditional Value At Risk ◽

Expected Return ◽

Portfolio Rebalancing ◽

Proposed Model

This paper presents a portfolio rebalancing model with a self-financing strategy and V-shaped transaction costs. Our main contribution is a new constraint that determines whether the existing portfolio actually needs to be rebalanced. The constraint accounts for both the transaction amount and the transaction cost, without any additional funds being added to the investment. A V-shaped transaction cost function is used to calculate the transaction cost of the portfolio, and conditional value at risk (CVaR) is used to measure portfolio risk. Computational tests on real financial data show that the proposed model is effective: the rebalanced portfolio has a higher expected return and a lower CVaR than the original portfolio.
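
A V-shaped cost is linear in the amount bought or sold and reaches its minimum of zero at the current holding. A self-financing rebalance must pay that cost out of existing wealth. The Python sketch below checks both ideas for holdings expressed as fractions of current wealth. The proportional buy and sell rates and the portfolios are assumptions, and this check is not the paper's new rebalancing constraint.

```python
from typing import Sequence


def v_shaped_cost(
    x_new: Sequence[float], x_old: Sequence[float],
    buy_rate: Sequence[float], sell_rate: Sequence[float],
) -> float:
    """Sum over assets of buy_rate * amount bought or sell_rate * amount sold."""
    total = 0.0
    for xn, xo, b, s in zip(x_new, x_old, buy_rate, sell_rate):
        delta = xn - xo
        total += b * delta if delta > 0 else s * (-delta)
    return total


def is_self_financing(
    x_new: Sequence[float], x_old: Sequence[float],
    buy_rate: Sequence[float], sell_rate: Sequence[float], tol: float = 1e-9,
) -> bool:
    """No new money: the new holdings plus transaction costs must fit in current wealth."""
    cost = v_shaped_cost(x_new, x_old, buy_rate, sell_rate)
    return sum(x_new) + cost <= sum(x_old) + tol


old = [0.5, 0.5]
rates = [0.01, 0.01]  # hypothetical 1% proportional cost for buying and selling
print(is_self_financing([0.6, 0.39], old, rates, rates))  # True: 0.99 + 0.0021 <= 1
print(is_self_financing([0.6, 0.40], old, rates, rates))  # False: 1.0 + 0.002 > 1
```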