Value Function Calculus and Applications

The main objective of this paper is to present an algorithm for pricing perpetual American put options with asset-dependent discounting. The value function of such an instrument can be described as $V^{\omega}_{\mathrm{APut}}(s)=\sup_{\tau\in\mathcal{T}}\mathbb{E}_s\big[e^{-\int_0^{\tau}\omega(S_w)\,dw}(K-S_\tau)^+\big]$, where $\mathcal{T}$ is a family of stopping times, $\omega$ is a discount function and $\mathbb{E}_s$ is the expectation, taken with respect to a martingale measure, for the price process started at $S_0=s$. Moreover, we assume that the asset price process $S_t$ is a geometric Lévy process with negative exponential jumps, i.e., $S_t=s\,e^{\zeta t+\sigma B_t-\sum_{i=1}^{N_t}Y_i}$, where $B_t$ is a Brownian motion, $N_t$ is a Poisson process and the jump sizes $Y_i$ are exponentially distributed. The asset-dependent discounting is reflected in the function $\omega$, so this approach generalises the classic case in which $\omega$ is constant. It turns out that, under certain conditions on $\omega$, the value function $V^{\omega}_{\mathrm{APut}}(s)$ is convex and can be represented in closed form. We provide an option pricing algorithm for this setting and present exact calculations for particular choices of $\omega$ for which $V^{\omega}_{\mathrm{APut}}(s)$ takes a simplified form.
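To make the definition of $V^{\omega}_{\mathrm{APut}}(s)$ concrete, the following is a minimal Monte Carlo sketch, not the closed-form algorithm of the paper. It estimates the expected discounted payoff for one fixed stopping rule only, assumed here to be the hypothetical threshold rule $\tau=\inf\{t: S_t\le b\}$, and does not compute the supremum over all of $\mathcal{T}$. The function name `estimate_put_value`, the Poisson rate `lam`, the exponential rate `rho` of the $Y_i$, the time step `dt` and the truncation `horizon` are illustrative assumptions; the drift $\zeta$ is a free input here, although under a martingale measure it would be pinned down by the model.

```python
import math
import random
from typing import Callable


def estimate_put_value(
    s: float,
    K: float,
    barrier: float,
    omega: Callable[[float], float],
    zeta: float,
    sigma: float,
    lam: float = 0.0,
    rho: float = 1.0,
    dt: float = 0.05,
    horizon: float = 40.0,
    n_paths: int = 2000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of E_s[exp(-int_0^tau omega(S_w) dw) * (K - S_tau)^+]
    for the hypothetical threshold rule tau = inf{t : S_t <= barrier}.

    Simulates log S_t = log s + zeta*t + sigma*B_t - sum_{i<=N_t} Y_i on a grid of
    step dt, with N_t Poisson of rate lam and Y_i ~ Exp(rho) (mean 1/rho).
    Paths that have not reached the barrier by `horizon` contribute 0.
    """
    if s <= barrier:
        return max(K - s, 0.0)  # tau = 0: exercise at once, nothing is discounted
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    n_steps = int(round(horizon / dt))
    total = 0.0
    for _ in range(n_paths):
        x = math.log(s)  # current log-price
        price = s
        log_disc = 0.0   # running -int_0^t omega(S_w) dw (left Riemann sum)
        next_jump = rng.expovariate(lam) if lam > 0 else math.inf
        for step in range(1, n_steps + 1):
            log_disc -= omega(price) * dt
            x += zeta * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            # Apply every jump that arrived during this step (at the end of the step).
            while next_jump <= step * dt:
                x -= rng.expovariate(rho)
                next_jump += rng.expovariate(lam)
            price = math.exp(x)
            if price <= barrier:  # barrier is checked only on the time grid
                total += math.exp(log_disc) * max(K - price, 0.0)
                break
    return total / n_paths


def example_omega(x: float) -> float:
    """Hypothetical asset-dependent discount rate: higher when the price is low."""
    return 0.03 + 0.02 / (1.0 + x)


if __name__ == "__main__":
    # Arbitrary illustrative parameters; scanning barriers gives a crude
    # lower-bound approximation restricted to threshold rules.
    for b in (0.4, 0.5, 0.6, 0.7):
        v = estimate_put_value(1.0, 1.0, b, example_omega, zeta=0.0, sigma=0.3,
                               lam=0.5, rho=5.0, n_paths=500)
        print(f"barrier={b:.1f}  estimate={v:.4f}")
```

As a sanity check on the discretisation: with constant $\omega\equiv r$, no jumps and $\zeta=r-\sigma^2/2$, the estimate at barrier $b$ should approach the classical first-passage value $(K-b)(s/b)^{-2r/\sigma^2}$. Because all randomness is drawn independently of $\omega$, reusing the seed compares two discount functions on identical paths.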

Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy

Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence

DOI: 10.1145/3409501.3409517

Year: 2020

Author(s): Huifan Gao, Yinghui Pan, Jing Tang, Yifeng Zeng, Peihua Chai, ...

Keyword(s): Reinforcement Learning, Value Function, Dynamic Estimation, Data Adequacy

Robo-Advising: Learning Investors’ Risk Preferences via Portfolio Choices

Journal of Financial Econometrics

DOI: 10.1093/jjfinec/nbz040

Year: 2020

Author(s): Humoud Alsabah, Agostino Capponi, Octavio Ruiz Lacedelli, Matt Stern

Keyword(s): Opportunity Cost, Value Function, Risk Preference, Portfolio Decisions, Learning Framework, Portfolio Choices, Trading Decisions, Exploration Exploitation, The Value Function, Over Time

Abstract: We introduce a reinforcement learning framework for retail robo-advising. The robo-advisor does not know the investor’s risk preference but learns it over time by observing her portfolio choices in different market environments. We develop an exploration–exploitation algorithm that trades off costly solicitations of portfolio choices from the investor against autonomous trading decisions based on stale estimates of the investor’s risk aversion. We show that the approximate value function constructed by the algorithm converges to the value function of an omniscient robo-advisor over a number of periods that is polynomial in the size of the state and action spaces. By correcting for the investor’s mistakes, the robo-advisor may outperform a stand-alone investor, regardless of the investor’s opportunity cost for making portfolio decisions.
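The trade-off between costly solicitations and stale estimates can be illustrated with a deliberately simple toy, which is not the algorithm of Alsabah et al. and does not learn a value function. A hypothetical advisor solicits the investor's portfolio choice on a fixed schedule (every `solicit_every` periods), infers her risk aversion by inverting the mean-variance weight $(\mu-r)/(\gamma\sigma^2)$, and otherwise trades on its last estimate. The `(mu, r, sigma)` market tuples, the multiplicative noise model for the investor's mistakes, the squared-weight loss and all function names are assumptions made for illustration.

```python
import random
from typing import Sequence, Tuple

# Hypothetical one-period market environment: (expected return mu, risk-free rate r, volatility sigma).
Market = Tuple[float, float, float]


def merton_weight(market: Market, gamma: float) -> float:
    """Mean-variance optimal risky-asset weight (mu - r) / (gamma * sigma^2)."""
    mu, r, sigma = market
    return (mu - r) / (gamma * sigma ** 2)


def implied_gamma(market: Market, weight: float) -> float:
    """Risk aversion that would make `weight` optimal in `market` (inverse of merton_weight)."""
    mu, r, sigma = market
    return (mu - r) / (weight * sigma ** 2)


def run_robo_advisor(
    true_gammas: Sequence[float],
    markets: Sequence[Market],
    solicit_every: int,
    solicit_cost: float,
    noise: float = 0.0,
    seed: int = 0,
) -> Tuple[float, float, int]:
    """Toy fixed-schedule policy (not the paper's algorithm); assumes mu > r in every market.

    Returns (total solicitation cost, summed squared gap to the omniscient weight,
    number of solicitations).
    """
    rng = random.Random(seed)
    gamma_hat = 1.0  # placeholder; overwritten at t = 0 because 0 % solicit_every == 0
    cost, loss, n_solicit = 0.0, 0.0, 0
    for t, (gamma, market) in enumerate(zip(true_gammas, markets)):
        target = merton_weight(market, gamma)  # weight an omniscient advisor would hold
        if t % solicit_every == 0:
            # Exploration: pay to observe the investor's choice, which carries
            # multiplicative noise (her "mistakes"); clipped so the weight stays positive.
            factor = max(1.0 + noise * rng.gauss(0.0, 1.0), 0.1)
            gamma_hat = implied_gamma(market, target * factor)
            cost += solicit_cost
            n_solicit += 1
        # Exploitation: trade on the current, possibly stale, estimate.
        loss += (merton_weight(market, gamma_hat) - target) ** 2
    return cost, loss, n_solicit


if __name__ == "__main__":
    markets = [(0.08, 0.02, 0.20), (0.06, 0.01, 0.25), (0.10, 0.03, 0.30)] * 8
    gammas = [2.0] * 10 + [5.0] * 14  # hypothetical shift in risk aversion at t = 10
    for k in (1, 4, 12):
        cost, loss, n = run_robo_advisor(gammas, markets, k, solicit_cost=0.01, noise=0.05)
        print(f"solicit_every={k:2d}  solicitations={n:2d}  cost={cost:.2f}  loss={loss:.4f}")
```

When the investor's risk aversion shifts, soliciting more often lowers the tracking loss but raises the solicitation cost; a fixed schedule is simply the most basic policy that exposes this tension.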