Assumed Density Filtering Q-learning

Author(s):  
Heejin Jeong ◽  
Clark Zhang ◽  
George J. Pappas ◽  
Daniel D. Lee

While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been utilized as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods, called ADFQ, which updates beliefs on state-action values, Q, through an online Bayesian inference method known as Assumed Density Filtering. We formulate an efficient closed-form solution for the value update by approximately estimating analytic parameters of the posterior of the Q-beliefs. Uncertainty measures in the beliefs are not only used in exploration but also provide a natural regularization for the value update by considering all available next actions. ADFQ converges to Q-learning as the uncertainty measures of the Q-beliefs decrease, and it mitigates common drawbacks of other Bayesian RL algorithms, such as computational complexity. We extend ADFQ with a neural network. Our empirical results demonstrate that ADFQ outperforms comparable algorithms on various Atari 2600 games, with drastic improvements in highly stochastic domains or domains with a large action space.
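For orientation, the limiting case mentioned in the abstract (ADFQ converging to Q-learning as belief uncertainty shrinks) can be illustrated with the standard tabular Q-learning update. This is a minimal sketch of plain Q-learning only, not of the ADFQ belief update; the function name, the table layout, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a (num_states, num_actions) array of point estimates. ADFQ instead
    keeps a belief (mean and variance) per entry; that part is not shown here.
    """
    # Bellman optimality target: the max over next actions is the non-linear
    # operation that complicates a conjugate Bayesian treatment.
    target = r if terminal else r + gamma * np.max(Q[s_next])
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

# Hypothetical two-state, two-action problem.
Q = np.zeros((2, 2))
td = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With an all-zero table, the first update moves `Q[0, 1]` a fraction `alpha` of the way toward the reward.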

2020 ◽  
Vol 15 (01) ◽  
pp. 2080001
Author(s):  
SUBHOJIT BISWAS ◽  
SAIF JAWAID ◽  
DIGANTA MUKHERJEE

We consider an investor who seeks to maximize the expected utility of a portfolio consisting of multiple risky assets and one risk-free asset, where utility is derived from the terminal wealth relative to the maximum wealth achieved over a fixed time horizon. This is achieved under a portfolio drawdown constraint, in a market with local stochastic volatility. In the empirical application, which considers two risky assets, the assets were identified with the help of pairs trading. In the absence of a closed-form solution for the value function and the optimal strategy, we obtain approximations of these quantities using coefficient series expansion techniques and finite difference schemes. We utilize the risk tolerance function to ease our approximations of the value function and the strategies. All the parameters were estimated from the triplets and used to illustrate and compare the stochastic volatility case with the constant volatility case, and to show how an investor can deploy different portfolio plans.


2013 ◽  
Vol 40 (2) ◽  
pp. 106-114
Author(s):  
J. Venetis ◽  
Aimilios (Preferred name Emilios) Sideridis

1995 ◽  
Vol 23 (1) ◽  
pp. 2-10 ◽  
Author(s):  
J. K. Thompson

Vehicle interior noise is the result of numerous sources of excitation. One source involving tire-pavement interaction is the tire air cavity resonance and the forcing it provides to the vehicle spindle. This paper applies fundamental principles combined with experimental verification to describe the tire cavity resonance. A closed-form solution is developed to predict the resonance frequencies from geometric data. Tire test results are used to examine the accuracy of predictions of undeflected and deflected tire resonances. Errors between predicted and actual frequencies are shown to be less than 2%. The nature of the forcing that this resonance applies to the vehicle spindle is also examined.
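As a rough illustration of predicting a cavity resonance from geometry, the sketch below uses the textbook approximation for an undeflected annular air cavity: the n-th mode occurs when n acoustic wavelengths fit around the mean circumference. This is an assumed simplification, not the paper's closed-form solution, and it does not cover the deflected tire; the function name, the example radii, and the speed of sound are illustrative assumptions.

```python
import math

def cavity_resonance_hz(inner_radius_m, outer_radius_m, mode=1, sound_speed_m_s=343.0):
    """Estimate the n-th acoustic resonance of an undeflected annular cavity.

    Assumes a thin ring of air whose effective path length is the
    circumference at the mean radius, 2*pi*(r_in + r_out)/2.
    """
    mean_circumference_m = math.pi * (inner_radius_m + outer_radius_m)
    return mode * sound_speed_m_s / mean_circumference_m

# Hypothetical cavity: 0.20 m inner radius (rim), 0.30 m outer radius (tread).
first_mode_hz = cavity_resonance_hz(0.20, 0.30)
```

For these assumed dimensions the estimate lands a little above 218 Hz, in the low-frequency range usually associated with tire cavity noise.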


Author(s):  
Nguyen N. Tran ◽  
Ha X. Nguyen

A capacity analysis for generally correlated wireless multi-hop multi-input multi-output (MIMO) channels is presented in this paper. The channel at each hop is spatially correlated, the source symbols are mutually correlated, and the additive Gaussian noises are colored. First, by invoking the Karush-Kuhn-Tucker conditions for the optimality of convex programming, we derive the optimal source symbol covariance that maximizes the mutual information between the channel input and the channel output when the transmitter has full knowledge of the channel. Second, we formulate the average mutual information maximization problem for the case in which the transmitter has only the channel statistics. Since this problem is almost impossible to solve analytically, a numerical interior-point method is employed to obtain the optimal solution. Furthermore, to reduce the computational complexity, an asymptotic closed-form solution is derived by maximizing an upper bound of the objective function. Simulation results show that the average mutual information obtained by the asymptotic design is very close to that obtained by the optimal design, while greatly reducing the computational complexity.
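The role of the Karush-Kuhn-Tucker conditions can be illustrated on the classical simplified case of parallel Gaussian sub-channels with full channel knowledge, where they yield the water-filling power allocation. This sketch is that textbook special case only, not the multi-hop correlated design derived in the paper; the function name, the gain values, and the bisection scheme are assumptions for illustration.

```python
import numpy as np

def water_filling(gains, total_power, iters=100):
    """Allocate power over parallel channels by bisection on the water level.

    gains[i] is the channel-gain-to-noise ratio of sub-channel i. The KKT
    conditions give p_i = max(mu - 1/gains[i], 0) with sum(p) == total_power.
    """
    inv_gains = 1.0 / np.asarray(gains, dtype=float)
    lo, hi = 0.0, total_power + inv_gains.max()  # the water level lies in [lo, hi]
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - inv_gains, 0.0).sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(lo - inv_gains, 0.0)

# Hypothetical gains: two usable sub-channels and one very weak one.
power = water_filling([2.0, 1.0, 0.1], total_power=1.0)
```

In this example the weakest sub-channel receives no power, and the budget is split between the two stronger ones in favor of the higher gain.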


Entropy ◽  
2018 ◽  
Vol 20 (11) ◽  
pp. 828 ◽  
Author(s):  
Jixia Wang ◽  
Yameng Zhang

This paper is dedicated to the study of geometric average Asian call option pricing under non-extensive statistical mechanics for a time-varying coefficient diffusion model. We employed the non-extensive Tsallis entropy distribution, which can describe the leptokurtosis and fat-tail characteristics of returns, to model the motion of the underlying asset price. Considering that economic variables change over time, we allowed the drift and diffusion terms in our model to be time-varying functions. We used the Itô formula, the Feynman–Kac formula, and the Padé ansatz to obtain a closed-form solution for geometric average Asian option pricing with a dividend yield for a time-varying model. Moreover, the simulation study shows that the results obtained by our method fit the simulation data better than those of Zhao et al. From the analysis of real data, we identify the best value of q for fitting the real stock data, and the result shows that investors underestimate the risk when using the Black–Scholes model compared to our model.
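For comparison, the Black–Scholes baseline mentioned at the end of the abstract has a well-known closed form for a continuously sampled geometric average Asian call with constant coefficients. The sketch below implements that constant-coefficient baseline only, not the Tsallis-based time-varying model of the paper; the function names and the sample inputs are assumptions for illustration, and `dividend` denotes the dividend yield (it is unrelated to the Tsallis parameter q).

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def geometric_asian_call_bs(spot, strike, rate, dividend, sigma, maturity):
    """Price a continuous geometric average Asian call under Black-Scholes.

    The log of the geometric average is normal with volatility sigma/sqrt(3)
    and an adjusted cost of carry b = (rate - dividend - sigma^2/6) / 2.
    """
    sigma_g = sigma / math.sqrt(3.0)
    b = 0.5 * (rate - dividend - sigma ** 2 / 6.0)
    vol_sqrt_t = sigma_g * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (b + 0.5 * sigma_g ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    discounted_mean = spot * math.exp((b - rate) * maturity)
    return discounted_mean * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)

# Hypothetical at-the-money contract: 5% rate, no dividend, 20% volatility, one year.
price = geometric_asian_call_bs(100.0, 100.0, 0.05, 0.0, 0.2, 1.0)
```

Because averaging reduces the effective volatility, this price sits below that of the corresponding plain European call, which is one reason a fat-tailed model can imply noticeably different risk.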


2021 ◽  
Vol 10 (7) ◽  
pp. 435
Author(s):  
Yongbo Wang ◽  
Nanshan Zheng ◽  
Zhengfu Bian

Since pairwise registration is a necessary step for the seamless fusion of point clouds from neighboring stations, a closed-form solution to planar feature-based registration of LiDAR (Light Detection and Ranging) point clouds is proposed in this paper. Based on the Plücker coordinate-based representation of linear features in three-dimensional space, a quad tuple-based representation of planar features is introduced, which makes it possible to directly determine the difference between any two planar features. Dual quaternions are employed to represent spatial transformation and operations between dual quaternions and the quad tuple-based representation of planar features are given, with which an error norm is constructed. Based on L2-norm-minimization, detailed derivations of the proposed solution are explained step by step. Two experiments were designed in which simulated data and real data were both used to verify the correctness and the feasibility of the proposed solution. With the simulated data, the calculated registration results were consistent with the pre-established parameters, which verifies the correctness of the presented solution. With the real data, the calculated registration results were consistent with the results calculated by iterative methods. Conclusions can be drawn from the two experiments: (1) The proposed solution does not require any initial estimates of the unknown parameters in advance, which assures the stability and robustness of the solution; (2) Using dual quaternions to represent spatial transformation greatly reduces the additional constraints in the estimation process.

