Risk-aware multi-armed bandit problem with application to portfolio selection

2017 ◽ Vol 4 (11) ◽ pp. 171377
Author(s): Xiaoguang Huo, Feng Fu

Sequential portfolio selection has attracted increasing interest in the machine learning and quantitative finance communities in recent years. As a mathematical framework for reinforcement learning policies, the stochastic multi-armed bandit problem addresses the primary difficulty in sequential decision-making under uncertainty, namely the exploration-versus-exploitation dilemma, and therefore provides a natural connection to portfolio selection. In this paper, we incorporate risk awareness into the classic multi-armed bandit setting and introduce an algorithm for constructing portfolios. By filtering assets based on the topological structure of the financial market and combining the optimal multi-armed bandit policy with the minimization of a coherent risk measure, we achieve a balance between risk and return.
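
The abstract does not spell out the algorithm, but the general idea of trading a bandit index off against a coherent risk measure can be sketched as follows. This is an illustrative sketch only, not the authors' method: it pairs a standard UCB1 index with an empirical CVaR penalty, and the names `risk_aware_ucb`, `risk_aversion`, and `alpha` are assumptions introduced here.

```python
# Illustrative sketch only (not the authors' exact algorithm): a UCB1-style
# index combined with an empirical CVaR penalty as the coherent risk measure.
# `risk_aversion` and `alpha` are assumed parameters introduced for this example.
import numpy as np

def empirical_cvar(returns, alpha=0.05):
    """Average of the worst alpha-fraction of observed returns (conditional value at risk)."""
    losses = -np.asarray(returns, dtype=float)
    cutoff = np.quantile(losses, 1.0 - alpha)
    tail = losses[losses >= cutoff]
    return tail.mean() if tail.size else 0.0

def risk_aware_ucb(reward_history, t, risk_aversion=1.0, alpha=0.05):
    """Select the asset maximizing mean return + exploration bonus - CVaR penalty.

    reward_history: one list of past returns per asset; t: current round (t >= 1).
    """
    scores = []
    for i, rewards in enumerate(reward_history):
        if not rewards:
            return i                                        # try every asset once first
        mean = float(np.mean(rewards))
        bonus = np.sqrt(2.0 * np.log(t) / len(rewards))     # UCB1 exploration term
        penalty = risk_aversion * empirical_cvar(rewards, alpha)
        scores.append(mean + bonus - penalty)
    return int(np.argmax(scores))
```

At each round the chosen asset's realized return is appended to its history; the CVaR penalty steers selection away from assets whose past returns show heavy loss tails.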

Author(s): S. Geissel, H. Graf, J. Herbinger, F. T. Seifried

Abstract: The purpose of this article is to evaluate optimal expected utility risk measures (OEU) in a risk-constrained portfolio optimization context where the expected portfolio return is maximized. We compare portfolio optimization with an OEU constraint to a portfolio selection model that uses value at risk as the constraint. OEU is a coherent risk measure for utility functions with constant relative risk aversion and allows individual specification of the investor's risk attitude and time preference. In a case study with three indices, we investigate how these theoretical differences influence the performance of the portfolio selection strategies. A copula approach with univariate ARMA-GARCH models is used in a rolling forecast to simulate monthly future returns and to calculate the derived measures for the optimization. The results of this study illustrate that both optimization strategies perform considerably better than an equally weighted portfolio and a buy-and-hold portfolio. Moreover, our results illustrate that portfolio optimization with an OEU constraint exhibits individualized effects: for example, less risk-averse investors lose more portfolio value during financial crises but outperform their more risk-averse counterparts in bull markets.
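
As a rough illustration of a single step of the rolling risk-constrained optimization described above, the sketch below maximizes expected return over simulated scenarios subject to a risk constraint. It is a simplified stand-in, not the authors' procedure: an empirical value-at-risk constraint replaces the OEU constraint, and the scenario matrix `sims`, the limit `var_limit`, and the level `alpha` are assumed placeholders for the output of the copula/ARMA-GARCH simulation step, which is not shown.

```python
# Illustrative sketch only: one step of a rolling optimization, with an empirical
# value-at-risk constraint standing in for the OEU constraint. `sims`
# (n_scenarios x n_assets), `var_limit`, and `alpha` are assumed placeholders;
# the copula/ARMA-GARCH simulation that would produce `sims` is not shown.
import numpy as np
from scipy.optimize import minimize

def empirical_var(portfolio_returns, alpha=0.05):
    """alpha-level value at risk (loss quantile) of simulated portfolio returns."""
    return -np.quantile(portfolio_returns, alpha)

def optimize_portfolio(sims, var_limit=0.08, alpha=0.05):
    """Maximize expected return over scenarios subject to VaR <= var_limit (long-only)."""
    n_assets = sims.shape[1]
    w0 = np.full(n_assets, 1.0 / n_assets)                 # start from equal weights
    objective = lambda w: -np.mean(sims @ w)               # minimize negative expected return
    constraints = [
        {"type": "eq", "fun": lambda w: np.sum(w) - 1.0},  # fully invested
        {"type": "ineq", "fun": lambda w: var_limit - empirical_var(sims @ w, alpha)},
    ]
    bounds = [(0.0, 1.0)] * n_assets
    result = minimize(objective, w0, method="SLSQP", bounds=bounds, constraints=constraints)
    return result.x
```

In a rolling setup this optimization would be re-run each month on freshly simulated scenarios, and the resulting weights held until the next rebalancing date.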


Author(s): Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying

Abstract: The Markov decision process (MDP) offers a general framework for modelling sequential decision-making in which outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision-making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
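
For orientation, the finite-horizon dynamic programming scheme that the paper generalizes can be written down for a classical MDP as below. This is a classical backward-induction sketch only; the quantum-specific machinery of qMDPs (decisions about quantum states and operations) is not represented, and the array names `P` and `R` are assumptions for this illustration.

```python
# Classical backward-induction sketch only; the quantum-specific parts of a
# qMDP (decisions about quantum states and operations) are not represented.
# `P` (one S x S transition matrix per action) and `R` (S x A rewards) are
# assumed names for this illustration.
import numpy as np

def finite_horizon_dp(P, R, horizon):
    """Return optimal values V[t, s] and a time-dependent policy pi[t, s] by backward induction."""
    n_states, n_actions = R.shape
    V = np.zeros((horizon + 1, n_states))                  # V[horizon] = 0: no terminal reward
    pi = np.zeros((horizon, n_states), dtype=int)
    for t in range(horizon - 1, -1, -1):
        # Q[s, a] = immediate reward + expected value of the successor state at stage t + 1
        Q = R + np.stack([P[a] @ V[t + 1] for a in range(n_actions)], axis=1)
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi
```

Because values are computed backwards from the terminal stage, the resulting optimal policy is stage-dependent, which is the standard form of the finite-horizon dynamic programming solution that the paper lifts to the quantum setting.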


2020 ◽ Vol 144 ◽ pp. 113032
Author(s): Hamid Hosseini Nesaz, Milad Jasemi, Leslie Monplaisir

2006 ◽ pp. 220-225
Author(s): Imre Kondor, Szilárd Pafka, Richárd Karádi, Gábor Nagy
