Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization

2020 · Vol 30 (6) · pp. 1645-1663
Author(s): Ömer Deniz Akyildiz, Dan Crisan, Joaquín Míguez

Abstract We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques.
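A minimal sketch of the idea (not the authors' exact scheme, and with an assumed temperature schedule, jitter size and toy cost): a bank of independent samplers, each targeting tempered densities whose maxima coincide with the minima of the cost function, with the best particle across samplers reported at the end.

```python
# Sketch: bank of K samplers targeting pi_t(x) ~ exp(-beta_t * f(x)).
# Schedule, jitter and the toy cost f are illustrative assumptions.
import numpy as np

def f(x):  # toy multimodal cost; global minimum at the origin
    return np.sum(x**2, axis=-1) + 2.0 * np.sum(1.0 - np.cos(3.0 * x), axis=-1)

rng = np.random.default_rng(0)
K, N, d, T = 4, 500, 2, 50                 # samplers, particles, dimension, steps
banks = rng.uniform(-4.0, 4.0, size=(K, N, d))

for t in range(1, T + 1):
    beta = 0.2 * t                          # inverse-temperature schedule (assumed)
    for k in range(K):
        x = banks[k] + 0.1 * rng.standard_normal((N, d))   # jittered proposal
        logw = -beta * f(x)                                 # unnormalised log-weights
        w = np.exp(logw - logw.max()); w /= w.sum()
        banks[k] = x[rng.choice(N, size=N, p=w)]            # multinomial resampling

# select the best-performing sampler via its best particle
best = min((banks[k][np.argmin(f(banks[k]))] for k in range(K)), key=f)
print("estimated minimiser:", best, "cost:", f(best))
```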

2021 · Vol 11 (2) · pp. 850
Author(s): Dokkyun Yi, Sangmin Ji, Jieun Park

Artificial intelligence (AI) is achieved by optimizing a cost function constructed from learning data. Adjusting the parameters of the cost function is the AI learning process (AI learning, for convenience). If AI learning is performed well, the value of the cost function reaches the global minimum. For the learning to be well behaved, the parameters should stop changing once the cost function attains its global minimum. One useful optimization method is the momentum method; however, the momentum method has difficulty stopping the parameter updates when the value of the cost function reaches the global minimum (the non-stop problem). The proposed method is based on the momentum method. To solve its non-stop problem, we incorporate the value of the cost function into the update rule: as learning proceeds, this mechanism reduces the size of the parameter updates in proportion to the value of the cost function. We verify the method through a proof of convergence and through numerical comparisons with existing methods, confirming that learning proceeds as intended.
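A minimal sketch of the mechanism, not the paper's exact update rule: a momentum step whose size is damped by the current cost value (this assumes a loss normalised so that its global minimum value is 0), so the parameter updates die out as the minimum is approached and the non-stop behaviour of plain momentum is suppressed.

```python
# Sketch: momentum update scaled by a cost-dependent factor f/(1+f),
# which tends to 0 at the global minimum (assumed damping form).
import numpy as np

target = np.array([1.0, -2.0])

def f(w):        # toy cost with global minimum value 0 at w = target
    return 0.5 * np.sum((w - target)**2)

def grad_f(w):
    return w - target

w, v = np.zeros(2), np.zeros(2)
lr, mu = 0.1, 0.9
for step in range(500):
    v = mu * v - lr * grad_f(w)
    w = w + f(w) / (1.0 + f(w)) * v   # update shrinks as f(w) -> 0
print("w =", w, " f(w) =", f(w))
```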


2021 · Vol 11 (8) · pp. 3430
Author(s): Erik Cuevas, Héctor Becerra, Héctor Escobar, Alberto Luque-Chang, Marco Pérez, ...

Recently, several new metaheuristic schemes have been introduced in the literature. Although these approaches draw on very different phenomena as metaphors, the search patterns they use to explore the search space are very similar. Second-order systems, on the other hand, are models that exhibit different temporal behaviors depending on the values of their parameters. Such temporal behaviors can be conceived as search patterns with multiple behaviors and simple configurations. In this paper, a set of new search patterns is introduced to explore the search space efficiently; they emulate the response of a second-order system. The proposed set of search patterns has been integrated into a complete search strategy, called the Second-Order Algorithm (SOA), to obtain the global solution of complex optimization problems. To analyze the performance of the proposed scheme, it has been compared on a set of representative optimization problems, including multimodal, unimodal, and hybrid benchmark formulations. Numerical results demonstrate that the proposed SOA method exhibits remarkable performance in terms of accuracy and convergence rate.
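The following sketch shows how a second-order step response can serve as a search pattern: for small damping the trajectory overshoots and oscillates around an attractor (exploration), while for heavy damping it settles monotonically (exploitation). The population update, damping range and toy cost are illustrative assumptions, not the SOA specification.

```python
# Sketch: candidates move along the step response of a second-order system
# toward the current best solution; the damping ratio zeta sets the behaviour.
import numpy as np

def step_response(t, zeta, omega=1.0):
    # underdamped second-order step response, valid for 0 < zeta < 1
    wd = omega * np.sqrt(1.0 - zeta**2)
    phi = np.arccos(zeta)
    return 1.0 - np.exp(-zeta * omega * t) * np.sin(wd * t + phi) / np.sqrt(1.0 - zeta**2)

def cost(x):  # toy unimodal benchmark (sphere)
    return np.sum(x**2)

rng = np.random.default_rng(1)
pop = rng.uniform(-5.0, 5.0, size=(20, 2))
best = min(pop, key=cost).copy()
for t in np.linspace(0.5, 10.0, 40):            # pseudo-time along the response
    for i in range(len(pop)):
        zeta = rng.uniform(0.2, 0.9)            # varied damping -> varied patterns
        alpha = step_response(t, zeta)          # position along the path to best
        cand = pop[i] + alpha * (best - pop[i])
        if cost(cand) < cost(pop[i]):           # greedy acceptance
            pop[i] = cand
    best = min(pop, key=cost).copy()
print("best:", best, "cost:", cost(best))
```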


2018 · Vol 25 (2) · pp. 315-334
Author(s): Anthony Fillion, Marc Bocquet, Serge Gratton

Abstract. The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a Gauss–Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It accomplishes this by gradually injecting the observations in the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.
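The quasi-static idea fits in a few lines: observations over the window are injected one at a time and each intermediate minimisation is warm-started from the previous solution, so the iterate stays in the attraction basin of the global minimum. The scalar toy model, error statistics and observation times below are stand-ins, not the IEnKS configuration.

```python
# Sketch of quasi-static (QS) 4D-Var-like minimisation on a toy scalar model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
obs_times = np.arange(1, 9)
sig_o = 0.05                              # observation error std (assumed)

def M(x, k):                              # toy nonlinear forecast to time k
    for _ in range(k):
        x = 0.9 * x + np.sin(3.0 * x)
    return x

x_truth = np.array([1.5])
y = np.array([M(x_truth, k)[0] + sig_o * rng.standard_normal() for k in obs_times])

def cost(x0, n_obs):                      # cost using only the first n_obs observations
    x0 = np.atleast_1d(x0)
    Jb = 0.5 * np.sum(x0**2)              # background term (zero prior mean)
    Jo = sum(0.5 * ((M(x0, k)[0] - y[i]) / sig_o)**2
             for i, k in enumerate(obs_times[:n_obs]))
    return Jb + Jo

x = np.array([0.0])                       # first guess = background
for n in range(1, len(obs_times) + 1):    # QS loop: inject observations gradually
    x = minimize(cost, x, args=(n,), method="BFGS").x
print("QS analysis:", x, " truth:", x_truth)
```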


2018 · Vol 11 (12) · pp. 4739-4754
Author(s): Vladislav Bastrikov, Natasha MacBean, Cédric Bacour, Diego Santaren, Sylvain Kuppel, ...

Abstract. Land surface models (LSMs), which form the land component of earth system models, rely on numerous processes for describing carbon, water and energy budgets, often associated with highly uncertain parameters. Data assimilation (DA) is a useful approach for optimising the most critical parameters in order to improve model accuracy and refine future climate predictions. In this study, we compare two different DA methods for optimising the parameters of seven plant functional types (PFTs) of the ORCHIDEE LSM using daily averaged eddy-covariance observations of net ecosystem exchange and latent heat flux at 78 sites across the globe. We perform a technical investigation of two classes of minimisation methods – local gradient-based (the L-BFGS-B algorithm, limited memory Broyden–Fletcher–Goldfarb–Shanno algorithm with bound constraints) and global random search (the genetic algorithm) – by evaluating their relative performance in terms of the model–data fit and the difference in retrieved parameter values. We examine the performance of each method in two cases: when optimising parameters at each site independently (the “single-site” approach) and when simultaneously optimising the model at all sites for a given PFT using a common set of parameters (the “multi-site” approach). We find that for the single-site case the random search algorithm results in lower values of the cost function (i.e. lower model–data root mean square differences) than the gradient-based method; the difference between the two methods is smaller for the multi-site optimisation because the larger number of observations smooths the shape of the cost function. The spread of the cost function, when repeating the same tests from 16 random first-guess parameter sets, is much larger with the gradient-based method, owing to its higher likelihood of becoming trapped in local minima. In tests with pseudo-observations, the genetic algorithm yields a closer approximation of the true posterior parameter values than the L-BFGS-B algorithm. We demonstrate the advantages and challenges of the different DA techniques and provide some advice on using them for LSM parameter optimisation.
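The flavour of this comparison can be reproduced on a toy multimodal misfit, with SciPy's differential evolution standing in for the paper's genetic algorithm; the cost function, bounds and 16 random first guesses below are illustrative, not the ORCHIDEE setup.

```python
# Sketch: local gradient-based (L-BFGS-B) vs global random search on a
# Rastrigin-like cost with many local minima.
import numpy as np
from scipy.optimize import minimize, differential_evolution

def cost(p):  # stand-in for a model-data misfit with many local minima
    p = np.asarray(p)
    return np.sum(p**2) + 10.0 * np.sum(1.0 - np.cos(2.0 * np.pi * p))

bounds = [(-5.0, 5.0)] * 4
rng = np.random.default_rng(3)

local = [minimize(cost, rng.uniform(-5.0, 5.0, 4), method="L-BFGS-B",
                  bounds=bounds).fun for _ in range(16)]
glob = differential_evolution(cost, bounds, seed=3).fun

print(f"L-BFGS-B, 16 first guesses: best={min(local):.3g}, "
      f"spread={max(local) - min(local):.3g}")   # large spread: local-minima traps
print(f"global random search: {glob:.3g}")
```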


2014 · Vol 142 (1) · pp. 94-106
Author(s): Qin Xu, Yuan Jiang, Liping Liu

Abstract An alias-robust least squares method that produces fewer errors than established methods is developed to produce reference radial velocities for automatically correcting raw aliased Doppler velocities scanned from hurricanes. The method estimates the maximum tangential velocity VM and its radial distance RM from the hurricane vortex center by fitting a parametric vortex model directly to raw aliased velocities at and around each selected vertical level. Aliasing-caused zigzag discontinuities in the relationship between the observed and true radial velocities are formulated into the cost function by applying an alias operator to the entire analysis-minus-observation term, which ensures that the cost function is smooth and concave around the global minimum. Simulated radar velocity observations are used to examine the cost function geometry around the global minimum in the space of the control parameters (VM, RM). The results show that the global minimum yields an approximate estimate of the true (VM, RM) provided the hurricane vortex center location is approximately known and the hurricane core and its vicinity are adequately covered by the radar scans, and that the global minimum can be found accurately by an efficient descent algorithm as long as the initial guess lies in the concave vicinity of the global minimum. The method, with further refinements, is applied to automated dealiasing, and its utility is highlighted by an example involving severely aliased radial velocities scanned from a hurricane.
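The key construction can be sketched directly: the alias operator wraps any velocity into the Nyquist interval, and applying it to the whole analysis-minus-observation residual removes the zigzag discontinuities from the cost. The Nyquist velocity, vortex model and geometry below are toy assumptions.

```python
# Sketch: alias-robust cost J(VM, RM) with the alias operator applied to
# the entire analysis-minus-observation term. All numbers are toy values.
import numpy as np

vN = 25.0                                     # Nyquist velocity (m/s), assumed

def alias(v):                                 # wrap any velocity into [-vN, vN)
    return (v + vN) % (2.0 * vN) - vN

def vr_model(r, VM, RM):                      # toy Rankine-like tangential wind
    return np.where(r <= RM, VM * r / RM, VM * RM / r)

r = np.linspace(5.0, 80.0, 200)               # radar gate ranges (km)
true_VM, true_RM = 55.0, 20.0
vr_obs = alias(vr_model(r, true_VM, true_RM)) # raw velocities arrive aliased

def J(VM, RM):                                # alias operator on the residual
    return np.sum(alias(vr_model(r, VM, RM) - vr_obs)**2)

# J vanishes at the truth and varies smoothly nearby, so a descent method
# started in the vicinity of the global minimum succeeds:
print(J(true_VM, true_RM), J(50.0, 25.0), J(30.0, 40.0))
```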


2013 · Vol 2013 · pp. 1-7
Author(s): Young-Seok Choi

This paper presents a new approach to the normalized subband adaptive filter (NSAF) which directly exploits the sparsity of an underlying system for sparse system identification. The proposed NSAF integrates a weighted l1-norm constraint into the cost function of the NSAF algorithm. To obtain the optimum solution of the weighted l1-norm regularized cost function, subgradient calculus is employed, resulting in a stochastic gradient based update recursion for the weighted l1-norm regularized NSAF. The choice of a distinct weighted l1-norm regularization leads to two versions of the l1-norm regularized NSAF. Numerical results clearly indicate the superior convergence of the l1-norm regularized NSAFs over the classical NSAF, especially when identifying a sparse system.
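A minimal sketch of the mechanism (shown for a fullband LMS-style filter for brevity; the paper applies the idea per subband in the NSAF): the subgradient of the weighted l1 penalty adds a sign(w) zero-attracting term that pulls inactive taps toward zero. Step sizes and the reweighting form are assumed values.

```python
# Sketch: normalised stochastic-gradient identification of a sparse system
# with a reweighted-l1 zero attractor (subgradient of the weighted l1 penalty).
import numpy as np

rng = np.random.default_rng(4)
L = 64
w_true = np.zeros(L); w_true[[3, 17, 42]] = [1.0, -0.5, 0.25]   # sparse system

mu, rho, eps = 0.5, 1e-4, 1e-3          # step size, attractor weight, regulariser
w = np.zeros(L)
x_buf = np.zeros(L)                     # tapped delay line of the input
for n in range(20000):
    x_buf = np.roll(x_buf, 1); x_buf[0] = rng.standard_normal()
    d = w_true @ x_buf + 0.01 * rng.standard_normal()            # desired signal
    e = d - w @ x_buf                                            # a priori error
    gamma = 1.0 / (np.abs(w) + eps)                              # l1 reweighting
    w += mu * e * x_buf / (x_buf @ x_buf + eps) - rho * gamma * np.sign(w)

mis = 10.0 * np.log10(np.sum((w - w_true)**2) / np.sum(w_true**2))
print(f"misalignment: {mis:.1f} dB")
```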


2022 · Vol 41 (1) · pp. 1-10
Author(s): Jonas Zehnder, Stelian Coros, Bernhard Thomaszewski

We present a sparse Gauss-Newton solver for accelerated sensitivity analysis with applications to a wide range of equilibrium-constrained optimization problems. Dense Gauss-Newton solvers have shown promising convergence rates for inverse problems, but the cost of assembling and factorizing the associated matrices has so far been a major stumbling block. In this work, we show how the dense Gauss-Newton Hessian can be transformed into an equivalent sparse matrix that can be assembled and factorized much more efficiently. This leads to drastically reduced computation times for many inverse problems, which we demonstrate on a diverse set of examples. We furthermore show links between sensitivity analysis and nonlinear programming approaches based on Lagrange multipliers and prove equivalence under specific assumptions that apply for our problem setting.
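The core trick can be sketched on random stand-in matrices: for an equilibrium constraint g(x, p) = 0 the sensitivity matrix S = dx/dp = -g_x^{-1} g_p is dense, so the Gauss-Newton system assembled from it is dense too, but an equivalent augmented sparse system delivers the same step without ever forming S. The matrix sizes, sparsity and regularisation below are arbitrary assumptions, not the paper's formulation.

```python
# Sketch: dense Gauss-Newton step via explicit sensitivities vs an
# equivalent augmented sparse solve that never forms the dense S.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(5)
n, m = 400, 30                                   # state dim, parameter dim
gx = (sp.random(n, n, density=0.02, random_state=5) + 10.0 * sp.eye(n)).tocsc()
gp = sp.random(n, m, density=0.05, random_state=6).tocsc()
r = rng.standard_normal(n)                       # residual term (toy)
reg = 1e-6                                       # small Tikhonov regularisation

# Dense route: form S = -gx^{-1} gp explicitly (n x m dense), then solve.
S = -spla.splu(gx).solve(gp.toarray())
step_dense = np.linalg.solve(S.T @ S + reg * np.eye(m), -S.T @ r)

# Sparse route: one augmented sparse system in (u, dp, w) encoding
# u = S dp,  w = gx^{-T}(u + r),  reg*dp = gp^T w  <=>  (S^T S + reg I) dp = -S^T r.
A = sp.bmat([[sp.eye(n), None,            -gx.T],
             [None,      reg * sp.eye(m), -gp.T],
             [gx,        gp,              None]], format="csc")
rhs = np.concatenate([-r, np.zeros(m), np.zeros(n)])
step_sparse = spla.spsolve(A, rhs)[n:n + m]

print(np.allclose(step_dense, step_sparse, atol=1e-8))  # same Gauss-Newton step
```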


Acta Acustica · 2021 · Vol 5 · pp. 47
Author(s): Augustin Ernoult, Juliette Chabassier, Samuel Rodriguez, Augustin Humeau

The internal geometry of a wind instrument can be estimated from acoustic measurements. For woodwind instruments, this involves characterizing the inner shape (bore) but also the side holes (dimensions and locations). In this study, the geometric parameters are recovered by a gradient-based optimization process which minimizes the deviation between the simulated and measured linear acoustic responses of the resonator, through an observable function, for several fingerings. The acoustic fields are computed by solving a linear system resulting from the 1D spectral finite element discretization of the wave propagation equations (including thermo-viscous effects, radiation and side holes). The “full waveform inversion” (FWI) technique exploits the fact that the gradient of the cost function can be computed by solving the same linear system as the direct problem, but with a different source term. The gradient is computed with better accuracy and at lower additional cost than with finite differences. The dependence of the cost function on the choice of observed quantity, frequency range and fingerings is analyzed first. The FWI is then used to reconstruct, from measured impedances, an elementary instrument with 14 design variables. The results, obtained in about one minute on a laptop, are in excellent agreement with direct geometric measurements.
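The gradient computation at the heart of the FWI technique can be sketched on a small dense stand-in for the discretized acoustic system: with A(p) u = s and misfit J = ½|Cu − d|², one adjoint solve with the transposed system matrix and a different source term yields the full gradient, instead of one extra solve per design variable as with finite differences. All matrices and sizes below are illustrative.

```python
# Sketch of the adjoint-state gradient: same linear system, different source.
import numpy as np

rng = np.random.default_rng(7)
n, m = 50, 3                                    # state size, design variables
A0 = rng.standard_normal((n, n)) + 5.0 * np.eye(n)
dA = [rng.standard_normal((n, n)) for _ in range(m)]   # dA/dp_k (assumed known)
s = rng.standard_normal(n)                      # source term
C = rng.standard_normal((4, n))                 # observation operator
d = rng.standard_normal(4)                      # measured data

def A(p):
    return A0 + sum(pk * dAk for pk, dAk in zip(p, dA))

def J_and_grad(p):
    u = np.linalg.solve(A(p), s)                # direct problem
    res = C @ u - d
    lam = np.linalg.solve(A(p).T, C.T @ res)    # adjoint: same matrix, new source
    grad = np.array([-lam @ (dAk @ u) for dAk in dA])
    return 0.5 * res @ res, grad

p = np.zeros(m)
J, g = J_and_grad(p)
eps = 1e-6; e0 = np.zeros(m); e0[0] = eps       # finite-difference sanity check
print(g[0], (J_and_grad(p + e0)[0] - J) / eps)
```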


VLSI Design · 1996 · Vol 4 (3) · pp. 207-215
Author(s): M. Srinivas, L. M. Patnaik

Genetic Algorithms are robust search and optimization techniques. A Genetic Algorithm based approach for determining the optimal input distributions for generating random test vectors is proposed in this paper. A cost function based on the COP testability measure for determining the efficacy of the input distributions is discussed. A brief overview of Genetic Algorithms (GAs) and the specific details of our implementation are described. Experimental results based on the ISCAS-85 benchmark circuits are presented. The performance of our GA-based approach is compared with previous results. While the GA generates more efficient input distributions than the previous methods, which are based on gradient descent search, the overheads of the GA in computing the input distributions are larger. To account for the relatively quick convergence of the gradient descent methods, we analyze the landscape of the COP-based cost function. We prove that the cost function is unimodal in the search space. This feature makes the cost function amenable to optimization by gradient-descent techniques as compared with random search methods such as Genetic Algorithms.
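A small sketch of the approach: a genetic algorithm searching over the input signal probabilities (one per primary input), with a toy COP-style cost standing in for the paper's testability measure on the ISCAS-85 circuits. The selection, crossover and mutation settings are assumed.

```python
# Sketch: GA over input probabilities minimising a COP-style testability cost.
import numpy as np

rng = np.random.default_rng(8)
n_inputs = 8

def cop_cost(p):
    # Toy surrogate: under COP, an AND cone's 1-probability is prod(p_i) and
    # an OR cone's is 1 - prod(1 - p_i); penalise rarely excited signals.
    and1 = np.prod(p[:4]); or1 = 1.0 - np.prod(1.0 - p[4:])
    probs = np.array([and1, 1.0 - and1, or1, 1.0 - or1])
    return np.sum(1.0 / np.maximum(probs, 1e-6))   # expected-test-length proxy

pop_size, n_gen = 40, 60
pop = rng.uniform(0.05, 0.95, size=(pop_size, n_inputs))
for _ in range(n_gen):
    fit = np.array([cop_cost(ind) for ind in pop])
    parents = pop[np.argsort(fit)[: pop_size // 2]]   # truncation selection
    kids = []
    while len(kids) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        mask = rng.random(n_inputs) < 0.5             # uniform crossover
        child = np.where(mask, a, b)
        child += 0.05 * rng.standard_normal(n_inputs) # Gaussian mutation
        kids.append(np.clip(child, 0.01, 0.99))
    pop = np.vstack([parents, kids])

best = pop[np.argmin([cop_cost(ind) for ind in pop])]
print("optimised input probabilities:", np.round(best, 2))
```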

