Forward and Backward Bellman Equations Improve the Efficiency of the EM Algorithm for DEC-POMDP

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 551
Author(s):  
Takehiro Tottori ◽  
Tetsuya J. Kobayashi

The decentralized partially observable Markov decision process (DEC-POMDP) models sequential decision-making problems solved by a team of agents. Since planning for a DEC-POMDP can be interpreted as maximum likelihood estimation in a latent variable model, DEC-POMDPs can be solved by the EM algorithm. However, in EM for DEC-POMDP, the forward–backward algorithm must be computed up to the infinite horizon, which impairs computational efficiency. In this paper, we propose the Bellman EM algorithm (BEM) and the modified Bellman EM algorithm (MBEM) by introducing the forward and backward Bellman equations into EM. BEM can be more efficient than EM because it solves the forward and backward Bellman equations instead of running the forward–backward algorithm up to the infinite horizon. However, BEM is not always more efficient than EM on large problems because it computes an inverse matrix. MBEM circumvents this shortcoming by solving the forward and backward Bellman equations without the inverse matrix. Our numerical experiments demonstrate that MBEM converges faster than EM.
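The BEM/MBEM distinction (direct inversion versus inverse-free iteration) can be illustrated on the simplest Bellman-type fixed point, a discounted linear system v = r + γPv. This is only an illustrative sketch: the matrix P, reward vector r, and discount γ below are made-up toy values, not quantities from the paper.

```python
import numpy as np

# Toy 3-state example (illustrative values, not from the paper).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])   # transition matrix
r = np.array([1.0, 0.0, 2.0])    # reward vector
gamma = 0.95                     # discount factor

# BEM-style: solve (I - gamma * P) v = r directly (requires a matrix
# solve/inverse, which becomes expensive for large state spaces).
v_direct = np.linalg.solve(np.eye(3) - gamma * P, r)

# MBEM-style: avoid the inverse by iterating the fixed point
# v <- r + gamma * P @ v until convergence.
v = np.zeros(3)
for _ in range(2000):
    v = r + gamma * P @ v
```

Both routes converge to the same fixed point; the iterative version trades the cubic cost of a solve for repeated cheap matrix–vector products.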

1995 ◽  
Vol 12 (5) ◽  
pp. 515-527 ◽  
Author(s):  
Jeanine J. Houwing-Duistermaat ◽  
Lodewijk A. Sandkuijl ◽  
Arthur A. B. Bergen ◽  
Hans C. van Houwelingen

2002 ◽  
Vol 27 (3) ◽  
pp. 291-317 ◽  
Author(s):  
Natasha Rossi ◽  
Xiaohui Wang ◽  
James O. Ramsay

The methods of functional data analysis are used to estimate item response functions (IRFs) nonparametrically. The EM algorithm is used to maximize the penalized marginal likelihood of the data. The penalty controls the smoothness of the estimated IRFs, and is chosen so that, as the penalty is increased, the estimates converge to shapes closely represented by the three-parameter logistic family. The one-dimensional latent trait model is recast as a problem of estimating a space curve or manifold, and, expressed in this way, the model no longer involves any latent constructs, and is invariant with respect to choice of latent variable. Some results from differential geometry are used to develop a data-anchored measure of ability and a new technique for assessing item discriminability. Functional data-analytic techniques are used to explore the functional variation in the estimated IRFs. Applications involving simulated and actual data are included.
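The three-parameter logistic family that the penalty shrinks the nonparametric IRF estimates toward has a standard closed form. A minimal sketch (the parameter values below are illustrative, not estimates from the paper):

```python
import math

def irf_3pl(theta, a, b, c):
    """Three-parameter logistic item response function:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))),
    where a is discrimination, b is difficulty, and c is the
    lower (guessing) asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic term equals 0.5, so the probability is
# the midpoint between c and 1: c + (1 - c) / 2.
p = irf_3pl(theta=0.0, a=1.5, b=0.0, c=0.2)  # -> 0.6
```

As the smoothing penalty grows, the estimated IRFs are constrained toward curves of this shape.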


Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2413
Author(s):  
Ruijie Guan ◽  
Xu Zhao ◽  
Weihu Cheng ◽  
Yaohua Rong

In this paper, a new generalized t (new Gt) distribution based on a distribution construction approach is proposed and shown to be suitable for fitting data with both high kurtosis and heavy tails. The main innovation of this article consists of four parts. First, the main characteristics and properties of the new distribution are outlined. Second, we derive the explicit expression for the moments of order statistics as well as the corresponding variance–covariance matrix. Third, we focus on parameter estimation for the new Gt distribution and introduce several estimation methods: a modified method of moments (MMOM), maximum likelihood estimation (MLE) using the EM algorithm, a novel iterative algorithm for acquiring the MLE, and improved probability weighted moments (IPWM). Simulation studies indicate that IPWM estimation generally performs better than MLE using the EM algorithm and the MMOM. The newly proposed iterative algorithm performs better than the EM algorithm when the sample kurtosis is greater than 2.7. Finally, for the four parameters of the new Gt distribution, a profile maximum likelihood approach using the EM algorithm is developed to deal with the estimation problem and obtain acceptable results.
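The EM approach to t-type distributions exploits their latent-variable (normal scale-mixture) structure. The sketch below is the classical EM for the *ordinary* Student's t with fixed degrees of freedom, not the new Gt family or the paper's algorithm, and is included only to illustrate that structure; the weight and update formulas are the standard ones for this simpler model.

```python
import numpy as np

def t_em(x, nu, iters=200):
    """EM for location mu and squared scale sigma2 of a Student's t
    with fixed degrees of freedom nu, via the representation of the t
    as a normal with gamma-distributed latent precision."""
    mu, sigma2 = np.mean(x), np.var(x)
    for _ in range(iters):
        # E-step: posterior mean of each observation's latent weight;
        # outlying points receive small weights.
        w = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted location and scale updates.
        mu = np.sum(w * x) / np.sum(w)
        sigma2 = np.sum(w * (x - mu) ** 2) / len(x)
    return mu, sigma2

rng = np.random.default_rng(0)
x = 2.0 + rng.standard_t(df=5, size=5000)
mu_hat, s2_hat = t_em(x, nu=5.0)
```

The down-weighting of extreme observations in the E-step is what makes EM-based fitting of heavy-tailed families robust.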


2019 ◽  
Vol 49 (1) ◽  
pp. 117-146
Author(s):  
Rexford M. Akakpo ◽  
Michelle Xia ◽  
Alan M. Polansky

In insurance underwriting, misrepresentation is a type of insurance fraud in which an applicant purposely makes a false statement on a risk factor that may lower his or her cost of insurance. In the insurance ratemaking context, we propose to use the expectation-maximization (EM) algorithm to perform maximum likelihood estimation of the regression effects and the prevalence of misrepresentation for the misrepresentation model proposed by Xia and Gustafson [(2016) The Canadian Journal of Statistics, 44, 198–218]. To apply the EM algorithm, the unobserved status of misrepresentation is treated as a latent variable in the complete-data likelihood function. We derive the iterative formulas for the EM algorithm and obtain the analytical form of the Fisher information matrix for frequentist inference on the parameters of interest for lognormal losses. We implement the algorithm and demonstrate that valid inference can be obtained on the risk effect despite the unobserved status of misrepresentation. Applying the proposed algorithm, we perform a loss severity analysis with the Medical Expenditure Panel Survey data. The analysis reveals not only the potential impact misrepresentation may have on the risk effect but also statistical evidence of the presence of misrepresentation in the self-reported insurance status.
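The heart of an E-step with a latent binary misrepresentation status is a Bayes posterior for that status given the observed loss. The function below is a generic illustrative sketch, not the paper's actual model: the parameterization (a misrepresentation probability p among those reporting "no risk factor", and lognormal losses whose log-scale mean differs by true status) and all numeric values are assumptions for illustration.

```python
import math

def posterior_misrep(y, p, mu0, mu1, sigma):
    """Illustrative E-step: posterior probability that a policyholder
    reporting 'no risk factor' is misrepresenting (true status 1),
    given a lognormal loss y. p is the prior misrepresentation
    probability; mu0/mu1 are log-scale means under true status 0/1;
    sigma is a common log-scale standard deviation."""
    def lognorm_pdf(y, mu):
        z = (math.log(y) - mu) / sigma
        return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))

    num = p * lognorm_pdf(y, mu1)
    return num / (num + (1.0 - p) * lognorm_pdf(y, mu0))
```

When mu1 > mu0, larger observed losses yield a higher posterior probability of misrepresentation, which is exactly the information the M-step then uses to re-estimate the prevalence and the risk effect.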


2021 ◽  
Author(s):  
Masahiro Kuroda

Mixture models have become increasingly popular due to their modeling flexibility and are applied to the clustering and classification of heterogeneous data. The EM algorithm is widely used for maximum likelihood estimation of mixture models because it is stable in convergence and simple to implement. Despite these advantages, the EM algorithm has two main drawbacks: it converges only to a local optimum, and its convergence can be slow. To avoid local convergence, multiple runs from several different initial values are usually used, so the algorithm may take a large number of iterations and a long computation time to find the maximum likelihood estimates. Speeding up the computation of the EM algorithm addresses these problems. We present algorithms that accelerate the convergence of the EM algorithm and apply them to mixture model estimation. Numerical experiments examine the performance of the acceleration algorithms in terms of the number of iterations and computation time.
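One classical way to accelerate a linearly convergent iteration such as EM is Aitken's delta-squared extrapolation; the chapter's specific acceleration algorithms may differ, so the sketch below (with a made-up linearly convergent toy iteration in place of real EM updates) is illustrative only.

```python
def aitken(seq):
    """Aitken's delta-squared extrapolation of a linearly convergent
    scalar sequence: from three consecutive iterates a, b, c it forms
    c - (c - b)^2 / (c - 2b + a), which converges faster to the limit."""
    out = []
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        denom = c - 2.0 * b + a
        out.append(c - (c - b) ** 2 / denom if denom != 0.0 else c)
    return out

# Toy linearly convergent iteration standing in for EM iterates:
# x_{k+1} = 0.5 * x_k + 1 converges to the fixed point 2.
xs = [0.0]
for _ in range(10):
    xs.append(0.5 * xs[-1] + 1.0)
acc = aitken(xs)  # extrapolated values land on the limit immediately
```

For an exactly linear iteration like this one, Aitken extrapolation recovers the limit from just three iterates; for EM, which is only asymptotically linear, it shortens the tail of the convergence.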


2021 ◽  
Author(s):  
Xiaocheng Li ◽  
Huaiyang Zhong ◽  
Margaret L. Brandeau

Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regimen for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
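The quantile objective itself is easy to make concrete, even without the paper's one-pass dynamic program: under a fixed policy, simulate cumulative rewards and take an empirical quantile. The toy two-state chain, rewards, and horizon below are illustrative assumptions, and Monte Carlo evaluation is a stand-in for (not a version of) the authors' algorithm.

```python
import numpy as np

def cumulative_reward_quantile(tau, n_episodes=5000, horizon=30, seed=0):
    """Monte Carlo estimate of the tau-quantile of cumulative reward
    for a fixed policy on a toy 2-state Markov chain."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.8, 0.2],
                  [0.4, 0.6]])   # transitions under the fixed policy
    r = np.array([1.0, -0.5])    # per-step reward in each state
    totals = np.empty(n_episodes)
    for i in range(n_episodes):
        s, total = 0, 0.0
        for _ in range(horizon):
            total += r[s]
            s = rng.choice(2, p=P[s])
        totals[i] = total
    return np.quantile(totals, tau)

# The 0.10 quantile from the physician example: the cumulative reward
# achieved with at least 90% probability.
q10 = cumulative_reward_quantile(0.10)
```

A QMDP then *optimizes* this quantity over policies, which is what makes the one-pass value function over all states and quantiles nontrivial.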

