Direct Expected Quadratic Utility Maximization for Mean-Variance Controlled Reinforcement Learning

Author(s):  
Masahiro Kato ◽  
Kei Nakagawa


Author(s):  
Xin Huang ◽  
Duan Li

Traditional models of mean-variance portfolio selection often assume full knowledge of the statistics of asset returns, which is not always the case in real financial markets. This paper deals with an ambiguous mean-variance portfolio selection problem in which the returns of the risky assets follow a mixture model whose component proportions are unknown to the investor but constant over time. Taking into account how future observations update these proportions is essential for finding an optimal policy with an active learning feature, but it makes the problem intractable for classical methods. Using reinforcement learning, we derive an investment policy with a learning feature in a two-level framework. At the lower level, a time-decomposed approach (dynamic programming) is adopted to solve a family of scenario subproblems, each of which specifies the sequence of component distributions over the multiple time periods. At the upper level, a scenario-decomposed approach (the progressive hedging algorithm) iteratively aggregates the scenario solutions from the lower level based on the current knowledge of the proportions, and this two-level solution framework is repeated in a rolling-horizon manner. We carry out experimental studies to illustrate the execution of our policy scheme.
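
To make the two-level structure concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation: the lower-level dynamic program is replaced by a single-period mean-variance subproblem with a closed-form solution, and all names (solve_scenario_dp, progressive_hedging, the penalty parameter rho, the toy scenario data) are illustrative.

```python
# Minimal sketch of the two-level scheme (illustrative; not the authors' code).
# Lower level: one quadratic subproblem per scenario, augmented with the
# progressive-hedging multiplier and proximal penalty.
# Upper level: probability-weighted aggregation and dual (multiplier) updates.

import numpy as np

def solve_scenario_dp(mean, cov, w_bar, multiplier, rho):
    """Stand-in for the lower-level dynamic program: minimize
    w' cov w - mean' w + multiplier' w + (rho/2) ||w - w_bar||^2."""
    n = len(mean)
    A = 2.0 * cov + rho * np.eye(n)
    b = mean - multiplier + rho * w_bar
    return np.linalg.solve(A, b)

def progressive_hedging(scenarios, proportions, rho=1.0, n_iter=50):
    """Upper level: aggregate scenario portfolios into one implementable
    portfolio, weighting scenarios by the current proportion estimates."""
    n = len(scenarios[0][0])
    w_bar = np.zeros(n)                              # aggregated portfolio
    multipliers = [np.zeros(n) for _ in scenarios]   # one dual vector per scenario
    for _ in range(n_iter):
        w_s = [solve_scenario_dp(m, c, w_bar, lam, rho)
               for (m, c), lam in zip(scenarios, multipliers)]
        w_bar = sum(p * w for p, w in zip(proportions, w_s))
        multipliers = [lam + rho * (w - w_bar)
                       for lam, w in zip(multipliers, w_s)]
    return w_bar

# Toy example: two component distributions, current proportion estimates 0.6/0.4.
rng = np.random.default_rng(0)
scenarios = []
for _ in range(2):
    mu = rng.normal(0.05, 0.02, size=3)
    A = rng.normal(size=(3, 3))
    scenarios.append((mu, 0.01 * A @ A.T + 0.01 * np.eye(3)))
print(progressive_hedging(scenarios, proportions=[0.6, 0.4]))
```

In the actual framework described above, each lower-level subproblem is a multi-period dynamic program and the proportion estimates are refreshed from new observations before every rolling-horizon repetition; the closed-form quadratic step here merely stands in for that subproblem solver.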


2011 ◽  
Author(s):  
Aleš Černý ◽  
Fabio Maccheroni ◽  
Massimo Marinacci ◽  
Aldo Rustichini

1984 ◽  
Vol 39 (1) ◽  
pp. 47-61 ◽  
Author(s):  
YORAM KROLL ◽  
HAIM LEVY ◽  
HARRY M. MARKOWITZ

1991 ◽  
Vol 13 (2) ◽  
pp. 289 ◽  
Author(s):  
Robert A. Collins ◽  
Edward E. Gbur

2018 ◽  
Vol 27 (08) ◽  
pp. 1850034 ◽  
Author(s):  
Erick Asiain ◽  
Julio B. Clempner ◽  
Alexander S. Poznyak

In problems involving the control of financial processes, it is usually difficult to quantify the state variables exactly. Acquiring the exact value of a given state can be expensive, even when it is physically possible to do so. In such cases it may be useful to base the decision-making process on inaccurate information about the system state. In addition, modeling a real-world application requires the values of the environment parameters (transition probabilities and observation probabilities) and the reward functions, which are typically hand-tuned by experts in the field until they produce satisfactory behavior, an undesirable process. To address these shortcomings, this paper provides a new Reinforcement Learning (RL) framework for computing the mean-variance customer portfolio with transaction costs in controllable Partially Observable Markov Decision Processes (POMDPs). The solution is restricted to finite state, action, and observation sets and to average-reward problems. For solving this problem, a controller/actor-critic architecture is proposed, which balances the difficult tasks of exploiting and exploring the environment. The architecture consists of three modules: a controller, fast-tracked portfolio learning, and an actor-critic module. Each module involves the design of a convergent Temporal Difference (TD) learning algorithm. We employ three different learning rules to estimate: (a) the transition matrices, (b) the rewards, and (c) the resources destined for carrying out a promotion. We present a proof that the estimated transition matrix rule converges as t → ∞. To solve the resulting optimization problem, we extend the c-variable method to partially observable Markov chains. The c-variable is conceptualized as a joint strategy given by the product of the control policy, the observation kernel Q(y|s), and the stationary distribution vector. A major advantage of this procedure is that it can be implemented efficiently in real settings with controllable POMDPs. A numerical example illustrates the results of the proposed method.
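
To give a flavor of these learning rules, the sketch below is a toy illustration under stated assumptions, not the paper's algorithm: it simulates a small tabular POMDP, maintains running estimates of the transition matrix and the rewards from observed transitions, and runs a simple average-reward actor-critic (TD) update on the observation process. The toy environment, the softmax actor, and all step sizes are assumptions for demonstration only.

```python
# Toy illustration (not the paper's algorithm) of TD-style estimation in a
# small simulated POMDP: running estimates of the transition matrix and the
# rewards, plus an average-reward actor-critic update on the observations.

import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_obs = 4, 2, 3

# Hidden ground truth, used only to generate experience.
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
Q_obs = rng.dirichlet(np.ones(n_obs), size=n_states)        # observation kernel Q(y|s)
R_true = rng.normal(0.0, 1.0, size=(n_states, n_actions))

# Learner's estimates. For simplicity, this toy lets the estimators of P and R
# see the hidden state (e.g., as delayed labels); only the actor-critic below
# works purely from observations.
counts = np.ones((n_states, n_actions, n_states))            # transition counts (Laplace prior)
R_hat = np.zeros((n_states, n_actions))
n_sa = np.zeros((n_states, n_actions))

# Actor-critic quantities defined on observations (the state is hidden).
theta = np.zeros((n_obs, n_actions))                         # softmax policy parameters
V = np.zeros(n_obs)                                          # critic (differential values)
rho_hat = 0.0                                                # average-reward estimate
alpha, beta, eta = 0.05, 0.05, 0.01                          # step sizes (assumed)

def softmax_policy(y):
    z = np.exp(theta[y] - theta[y].max())
    return z / z.sum()

s = rng.integers(n_states)
y = rng.choice(n_obs, p=Q_obs[s])
for t in range(20000):
    pi = softmax_policy(y)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P_true[s, a])
    y_next = rng.choice(n_obs, p=Q_obs[s_next])
    r = R_true[s, a] + rng.normal(0.0, 0.1)

    # (a) transition estimate and (b) reward estimate as running averages.
    counts[s, a, s_next] += 1.0
    n_sa[s, a] += 1.0
    R_hat[s, a] += (r - R_hat[s, a]) / n_sa[s, a]

    # Average-reward TD error drives critic, actor, and average-reward updates.
    delta = r - rho_hat + V[y_next] - V[y]
    rho_hat += eta * delta
    V[y] += alpha * delta
    grad = -pi
    grad[a] += 1.0                                            # grad of log pi(a|y) wrt theta[y]
    theta[y] += beta * delta * grad

    s, y = s_next, y_next

P_hat = counts / counts.sum(axis=2, keepdims=True)
print("max |P_hat - P_true| =", np.abs(P_hat - P_true).max())
print("estimated average reward:", rho_hat)
```

In the paper the policy itself is obtained by solving the c-variable optimization over the joint strategy; the softmax actor here is only a stand-in showing how a single TD error can drive both the critic and the parameter updates.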


2012 ◽  
Vol 48 (6) ◽  
pp. 386-395 ◽  
Author(s):  
Aleš Černý ◽  
Fabio Maccheroni ◽  
Massimo Marinacci ◽  
Aldo Rustichini
