SPAN: A Stochastic Projected Approximate Newton Method

2020 ◽  
Vol 34 (02) ◽  
pp. 1520-1527
Author(s):  
Xunpeng Huang ◽  
Xianfeng Liang ◽  
Zhengyang Liu ◽  
Lei Li ◽  
Yue Yu ◽  
...  

Second-order optimization methods have desirable convergence properties. However, the exact Newton method requires expensive computation of the Hessian and its inverse. In this paper, we propose SPAN, a novel fast approximate Newton method. SPAN computes the inverse of the Hessian matrix via low-rank approximation and stochastic Hessian-vector products. Our experiments on multiple benchmark datasets demonstrate that SPAN outperforms existing first-order and second-order optimization methods in terms of wall-clock convergence time. Furthermore, we provide a theoretical analysis of the per-iteration complexity, the approximation error, and the convergence rate. Both the theoretical analysis and experimental results show that our proposed method achieves a better trade-off between the convergence rate and per-iteration efficiency.
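As a rough illustration of the core idea (not SPAN's actual algorithm), the sketch below forms a low-rank eigen-approximation of the Hessian using only Hessian-vector products, via a randomized range finder, and takes an approximate Newton step in the captured subspace; all names and the damping scheme are illustrative assumptions:

```python
import numpy as np

def lowrank_newton_step(hvp, grad, dim, rank, damping=1e-3, seed=0):
    """Approximate H^{-1} grad using only Hessian-vector products.

    A randomized range finder probes the top eigen-subspace of H with
    rank + 5 HVPs; a damped Newton step is taken in that subspace and a
    plain gradient step is kept on the orthogonal complement.
    """
    rng = np.random.default_rng(seed)
    # Probe the Hessian's dominant range with random directions.
    omega = rng.standard_normal((dim, rank + 5))
    Y = np.column_stack([hvp(omega[:, j]) for j in range(omega.shape[1])])
    Q, _ = np.linalg.qr(Y)
    # Project H onto the subspace and eigendecompose the small matrix.
    B = np.column_stack([hvp(Q[:, j]) for j in range(Q.shape[1])])
    T = Q.T @ B
    evals, evecs = np.linalg.eigh((T + T.T) / 2)
    idx = np.argsort(np.abs(evals))[::-1][:rank]
    lam, V = evals[idx], Q @ evecs[:, idx]
    # Damped inverse on the captured subspace, identity elsewhere.
    coeff = V.T @ grad
    return V @ (coeff / (lam + damping)) + (grad - V @ coeff)
```

With `rank` close to the effective rank of the Hessian, each step costs only `O(rank)` HVPs instead of a full Hessian assembly and factorization, which is the trade-off the abstract describes.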

2021 ◽  
Vol 15 (4) ◽  
pp. 1731-1750
Author(s):  
Olalekan Babaniyi ◽  
Ruanui Nicholson ◽  
Umberto Villa ◽  
Noémi Petra

Abstract. We consider the problem of inferring the basal sliding coefficient field for an uncertain Stokes ice sheet forward model from synthetic surface velocity measurements. The uncertainty in the forward model stems from unknown (or uncertain) auxiliary parameters (e.g., rheology parameters). This inverse problem is posed within the Bayesian framework, which provides a systematic means of quantifying uncertainty in the solution. To account for the associated model uncertainty (error), we employ the Bayesian approximation error (BAE) approach to approximately premarginalize simultaneously over both the noise in measurements and uncertainty in the forward model. We also carry out approximate posterior uncertainty quantification based on a linearization of the parameter-to-observable map centered at the maximum a posteriori (MAP) basal sliding coefficient estimate, i.e., by taking the Laplace approximation. The MAP estimate is found by minimizing the negative log posterior using an inexact Newton conjugate gradient method. The actions of the gradient and Hessian on vectors are efficiently computed using adjoints. Sampling from the approximate covariance is made tractable by invoking a low-rank approximation of the data misfit component of the Hessian. We study the performance of the BAE approach in the context of three numerical examples in two and three dimensions. For each example, the basal sliding coefficient field is the parameter of primary interest which we seek to infer, and the rheology parameters (e.g., the flow rate factor or the Glen's flow law exponent coefficient field) represent so-called nuisance (secondary uncertain) parameters. Our results indicate that accounting for model uncertainty stemming from the presence of nuisance parameters is crucial.
Namely, our findings suggest that using nominal values for these parameters, as is often done in practice, without accounting for the resulting modeling error, can lead to overconfident and heavily biased results. We also show that the BAE approach can be used to account for the additional model uncertainty at no additional cost at the online stage.
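The low-rank Laplace-approximation sampling the abstract describes can be sketched as follows, assuming a prior covariance factor `L` (with covariance `L L^T`) and the dominant eigenpairs `(evals, V)` of the prior-preconditioned data-misfit Hessian are already available; the function names are hypothetical and this is not the authors' code:

```python
import numpy as np

def lowrank_posterior_factor(evals, V):
    """Factor S with S S^T = (I + V diag(evals) V^T)^{-1}, the
    prior-preconditioned posterior covariance, assembled from the
    dominant eigenpairs of the data-misfit Hessian alone."""
    n = V.shape[0]
    d = 1.0 / np.sqrt(1.0 + evals) - 1.0
    return np.eye(n) + V @ (d[:, None] * V.T)

def laplace_sample(map_est, L, evals, V, rng):
    """One draw from the Laplace approximation N(map_est, L S S^T L^T)."""
    S = lowrank_posterior_factor(evals, V)
    z = rng.standard_normal(len(map_est))
    return map_est + L @ (S @ z)
```

The identity behind the factor is that for `d_i = (1 + lambda_i)^{-1/2} - 1` one gets `2 d_i + d_i^2 = (1 + lambda_i)^{-1} - 1`, so only the few eigenpairs with non-negligible `lambda_i` are needed; directions the data do not inform are left at the prior.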


Author(s):  
К.В. Воронин ◽  
С.А. Соловьев

An algorithm for solving the Helmholtz problem in 3D heterogeneous media using the low-rank approximation technique is proposed. This technique is applied as a preconditioner for two different iterative processes: iterative refinement and the Krylov-type method BiCGStab. Iterative refinement is simple and cheap to implement but can fail to converge; BiCGStab is more robust but also more sophisticated. The dependence of the convergence rate on the low-rank approximation quality is studied for both iterative processes. For typical problems of seismic exploration, it is shown that, starting from a certain low-rank accuracy, the convergence rate of iterative refinement is very similar to that of BiCGStab; it is therefore preferable to use the more efficient iterative refinement method. 
Numerical experiments also show that, for a low-rank accuracy that is reasonable from the practical standpoint, the proposed method provides a threefold performance gain (for sequential code) and reduces memory usage by up to a factor of two in comparison with the Intel MKL PARDISO high-performance direct solver.
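The iterative refinement scheme discussed above is just the fixed-point iteration x_{k+1} = x_k + M^{-1}(b - A x_k), with the approximate (low-rank) factorization playing the role of M. A minimal real-valued sketch (the actual Helmholtz systems are complex-valued, and names here are illustrative):

```python
import numpy as np

def iterative_refinement(A, b, apply_prec, tol=1e-10, maxit=200):
    """Iterative refinement: x_{k+1} = x_k + M^{-1} (b - A x_k).

    apply_prec stands in for the approximate-factorization solve
    M^{-1} r.  The iteration converges when the spectral radius of
    I - M^{-1} A is below one, i.e. when the factorization accuracy
    is sufficient; otherwise a Krylov method such as BiCGStab with
    the same preconditioner is the fallback."""
    x = np.zeros_like(b)
    for k in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x + apply_prec(r)
    return x, maxit
```

The better the low-rank approximation, the smaller the contraction factor and the fewer refinement sweeps are needed, which is the accuracy/convergence trade-off the abstract studies.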


2011 ◽  
Vol 2 (4) ◽  
pp. 12-34 ◽  
Author(s):  
Andreas Janecek ◽  
Ying Tan

The Non-negative Matrix Factorization (NMF) is a special low-rank approximation which allows for an additive, parts-based, and interpretable representation of the data. This article presents efforts to improve the convergence, approximation quality, and classification accuracy of NMF using five different meta-heuristics based on swarm intelligence. Several properties of the NMF objective function motivate the use of meta-heuristics: this function is non-convex, discontinuous, and may possess many local minima. The proposed optimization strategies are two-fold: on the one hand, a new initialization strategy for NMF is presented in order to initialize the NMF factors prior to the factorization; on the other hand, an iterative update strategy is proposed, which improves the accuracy per runtime of the multiplicative update NMF algorithm. The success of the proposed optimization strategies is shown by applying them to synthetic data and to data sets from the areas of spam filtering and email classification, and by evaluating them in their application context. Experimental results show that both optimization strategies are able to improve NMF in terms of faster convergence, lower approximation error, and better classification accuracy. In particular, the initialization strategy leads to significant reductions of the runtime-per-accuracy ratio for both the NMF approximation and the classification results achieved with NMF.
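For reference, the multiplicative update baseline that the initialization and update strategies build on is the standard Lee-Seung scheme. A minimal sketch with an explicit initialization hook (the swarm-intelligence search itself is not reproduced here; names are illustrative):

```python
import numpy as np

def nmf_mu(A, k, W0=None, H0=None, iters=300, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||A - W H||_F with
    W, H >= 0.  W0/H0 are hooks for an initialization strategy (e.g.
    factors proposed by a meta-heuristic search); a random
    non-negative initialization is the fallback."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps if W0 is None else W0.copy()
    H = rng.random((k, n)) + eps if H0 is None else H0.copy()
    for _ in range(iters):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

A better `W0`/`H0` lowers the initial objective value, so fewer multiplicative sweeps are needed to reach a given approximation error, which is exactly the runtime-per-accuracy effect reported in the article.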


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Xiaoyi Guo ◽  
Wei Zhou ◽  
Yan Yu ◽  
Yijie Ding ◽  
Jijun Tang ◽  
...  

All drugs usually have side effects, which endanger the health of patients. To identify potential side effects of drugs, biological and pharmacological experiments are carried out, but they are expensive and time-consuming. Computation-based methods have therefore been developed to predict side effects accurately and quickly. To predict potential associations between drugs and side effects, we propose a novel method called the Triple Matrix Factorization- (TMF-) based model. TMF is built from the biprojection matrix and the latent features of kernels, and is based on Low Rank Approximation (LRA). LRA constructs a lower-rank matrix to approximate the original matrix, which not only retains the characteristics of the original matrix but also reduces the storage space and computational complexity of the data. To fuse multivariate information, multiple kernel matrices are constructed and integrated via Kernel Target Alignment-based Multiple Kernel Learning (KTA-MKL) in the drug and side effect spaces, respectively. Compared with other methods, our model achieves better performance on three benchmark datasets. The values of the Area Under the Precision-Recall curve (AUPR) are 0.677, 0.685, and 0.680 on the three datasets, respectively.
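The kernel-fusion step can be illustrated with a textbook kernel-target-alignment weighting (a generic sketch, not the paper's KTA-MKL formulation; function names are hypothetical): each candidate kernel is scored by its centered alignment with the ideal target kernel y y^T and the kernels are combined with normalized non-negative weights.

```python
import numpy as np

def alignment(K1, K2):
    """Centered kernel alignment <K1c, K2c>_F / (||K1c|| ||K2c||)."""
    n = K1.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K1c, K2c = C @ K1 @ C, C @ K2 @ C
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def kta_combine(kernels, y):
    """Weight each candidate kernel by its alignment with the target
    kernel y y^T and return the normalized convex combination."""
    Ky = np.outer(y, y)
    w = np.array([max(alignment(K, Ky), 0.0) for K in kernels])
    w /= w.sum()
    return sum(wi * K for wi, K in zip(w, kernels)), w
```

Kernels that carry more label-relevant structure receive larger weights, so the fused kernel emphasizes the most informative similarity views in each of the drug and side-effect spaces.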




2015 ◽  
Vol 53 ◽  
pp. 375-438 ◽  
Author(s):  
Timothy A. Mann ◽  
Shie Mannor ◽  
Doina Precup

Temporally extended actions have proven useful for reinforcement learning, but their duration also makes them valuable for efficient planning. The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm called Fitted Value Iteration (FVI) with options. Our analysis reveals that longer duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state-space. Next we consider the problem of generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at the landmark states. We analyze both FVI and LAVI using the proposed landmark-based options and compare the two algorithms. Our experimental results in three different domains demonstrate the key properties from the analysis. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.
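The effect of option duration on value propagation can be seen in a toy value-iteration sketch (an illustrative construction, not the paper's FVI/LAVI algorithms): each option is modeled by its accumulated discounted reward, its duration, and its termination state, and the backup discounts by gamma raised to that duration.

```python
import numpy as np

def option_value_iteration(n_states, options, gamma=0.95, iters=100):
    """Value iteration over a set of options.  Each option maps a state
    to (cumulative discounted reward, duration, termination state); the
    backup r + gamma**duration * V(s') lets a long option propagate
    value across many states in a single sweep."""
    V = np.zeros(n_states)
    for _ in range(iters):
        V_new = np.full(n_states, -np.inf)
        for opt in options:
            for s, (r, d, s2) in opt.items():
                V_new[s] = max(V_new[s], r + gamma ** d * V[s2])
        # States no option covers keep their previous value.
        V = np.where(np.isfinite(V_new), V_new, V)
    return V
```

Because a duration-d option contracts the Bellman backup by gamma**d rather than gamma, sweeps with long options shrink the error faster, mirroring the paper's finding that longer options speed convergence.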


2016 ◽  
pp. 1564-1589 ◽  
Author(s):  
Andreas Janecek ◽  
Ying Tan

Low-rank approximations allow for compact representations of data with reduced storage and runtime requirements and reduced redundancy and noise. The Non-Negative Matrix Factorization (NMF) is a special low-rank approximation that allows for additive parts-based, interpretable representation of the data. Various properties of NMF are similar to Swarm Intelligence (SI) methods: indeed, most NMF objective functions and most SI fitness functions are non-convex, discontinuous, and may possess many local minima. This chapter summarizes efforts on improving convergence, approximation quality, and classification accuracy of NMF using five different meta-heuristics based on SI and evolutionary computation. The authors present (1) new initialization strategies for NMF, and (2) an iterative update strategy for NMF. The applicability of the approach is illustrated on data sets coming from the areas of spam filtering and email classification. Experimental results show that both optimization strategies are able to improve NMF in terms of faster convergence, lower approximation error, and/or better classification accuracy.




Author(s):  
Timothy A. Mann ◽  
Shie Mannor ◽  
Doina Precup

The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when planning with options is more efficient than planning with primitive actions. We provide a general analysis of the convergence rate of a popular Approximate Value Iteration (AVI) algorithm called Fitted Value Iteration (FVI) with options. Our analysis reveals that longer duration options and a pessimistic estimate of the value function both lead to faster convergence. Furthermore, options can improve convergence even when they are suboptimal and sparsely distributed throughout the state space. Next we consider generating useful options for planning based on a subset of landmark states. This suggests a new algorithm, Landmark-based AVI (LAVI), that represents the value function only at landmark states. We analyze FVI with options (OFVI) and LAVI using the proposed landmark-based options and compare the two algorithms. Our theoretical and experimental results demonstrate that options can play an important role in AVI by decreasing approximation error and inducing fast convergence.

