Evolutionary Gradient Descent for Non-convex Optimization

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/443 ◽

2021 ◽

Author(s):

Ke Xue ◽

Chao Qian ◽

Ling Xu ◽

Xudong Fei

Keyword(s):

Convex Optimization ◽

Stationary Point ◽

Gradient Descent ◽

Saddle Points ◽

Global Optimum ◽

General Purpose ◽

Good Ability ◽

Derivative Free Optimization ◽

Derivative Free ◽

Low Efficiency

Non-convex optimization is often involved in artificial intelligence tasks; such problems may have many saddle points and are NP-hard to solve. Evolutionary algorithms (EAs) are general-purpose derivative-free optimization algorithms with a good ability to find the global optimum, and can thus be naturally applied to non-convex optimization. Their performance, however, is limited by low efficiency. Gradient descent (GD) runs efficiently but only converges to a first-order stationary point, which may be a saddle point and thus arbitrarily bad. Some recent efforts have combined EAs and GD. However, previous works either utilized only a specific component of EAs or combined them heuristically without theoretical guarantees. In this paper, we propose an evolutionary GD (EGD) algorithm that combines typical components of EAs, i.e., population and mutation, with GD. We prove that EGD can converge to a second-order stationary point by escaping saddle points, and is more efficient than previous algorithms. Empirical results on non-convex synthetic functions as well as reinforcement learning (RL) tasks also show its superiority.
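The abstract layers two EA components, a population and mutation, on top of GD. A minimal sketch of that idea follows, under stated assumptions: it is not the authors' EGD algorithm, and the test function `f`, the gradient-norm threshold `g_tol` that triggers mutation, the Gaussian mutation scale `sigma`, and the copy-the-best selection rule are all illustrative choices. Plain GD started exactly at the saddle point of `f` never moves, whereas the mutated population escapes and reaches a global minimum.

```python
import numpy as np

def f(p):
    # Hypothetical test function: strict saddle at the origin,
    # global minima f = -1 at (0, +/- sqrt(2)).
    x, y = p
    return x ** 2 - y ** 2 + y ** 4 / 4

def grad_f(p):
    x, y = p
    return np.array([2 * x, -2 * y + y ** 3])

def egd_sketch(x0, pop_size=4, eta=0.1, sigma=0.1, g_tol=1e-3, iters=300, seed=0):
    """Illustrative population-of-GD loop (assumed design, not the paper's EGD)."""
    rng = np.random.default_rng(seed)
    pop = np.tile(np.asarray(x0, dtype=float), (pop_size, 1))
    best = pop[0].copy()
    for _ in range(iters):
        for i in range(pop_size):
            g = grad_f(pop[i])
            if np.linalg.norm(g) < g_tol:
                # Mutation: Gaussian perturbation of a member stuck near a
                # first-order stationary point (possibly a saddle).
                pop[i] = pop[i] + sigma * rng.standard_normal(pop.shape[1])
            else:
                # Ordinary gradient-descent step.
                pop[i] = pop[i] - eta * g
        # Selection: copy the best member to the whole population and
        # remember the best point seen so far.
        winner = min(pop, key=f).copy()
        if f(winner) < f(best):
            best = winner.copy()
        pop[:] = winner
    return best

best = egd_sketch([0.0, 0.0])
print(best, f(best))
```

Starting at the exact saddle is deliberate: the gradient there is zero, so only the mutation step can move the population off it.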

Escaping Saddle Points for Zeroth-order Non-convex Optimization using Estimated Gradient Descent

2020 54th Annual Conference on Information Sciences and Systems (CISS) ◽

10.1109/ciss48834.2020.1570627382 ◽

2020 ◽

Author(s):

Qinbo Bai ◽

Mridul Agarwal ◽

Vaneet Aggarwal

Keyword(s):

Convex Optimization ◽

Gradient Descent ◽

Saddle Points ◽

Zeroth Order

Derivative Free Optimization of Complex Systems with the Use of Statistical Machine Learning Models

10.21236/ada622645 ◽

2015 ◽

Author(s):

Katya Scheinberg

Keyword(s):

Machine Learning ◽

Complex Systems ◽

Learning Models ◽

Statistical Machine Learning ◽

Derivative Free Optimization ◽

Derivative Free ◽

Machine Learning Models

Iteration Complexity of a Block Coordinate Gradient Descent Method for Convex Optimization

SIAM Journal on Optimization ◽

10.1137/140964795 ◽

2015 ◽

Vol 25 (3) ◽

pp. 1298-1313 ◽

Cited By ~ 1

Author(s):

Xiaoqin Hua ◽

Nobuo Yamashita

Keyword(s):

Convex Optimization ◽

Gradient Descent ◽

Descent Method ◽

Gradient Descent Method ◽

Iteration Complexity ◽

Coordinate Gradient Descent

Optimal Wells Placement to Maximize the Field Coverage Using Derivative-Free Optimization

Procedia Computer Science ◽

10.1016/j.procs.2020.11.008 ◽

2020 ◽

Vol 178 ◽

pp. 65-74

Author(s):

Ksenia Balabaeva ◽

Liya Akmadieva ◽

Sergey Kovalchuk

Keyword(s):

Derivative Free Optimization ◽

Derivative Free

Black box operation optimization of basic oxygen furnace steelmaking process with derivative free optimization algorithm

Computers & Chemical Engineering ◽

10.1016/j.compchemeng.2021.107311 ◽

2021 ◽

Vol 150 ◽

pp. 107311

Author(s):

Yongxia Liu ◽

Lixin Tang ◽

Chang Liu ◽

Lijie Su ◽

Jian Wu

Keyword(s):

Optimization Algorithm ◽

Basic Oxygen Furnace ◽

Black Box ◽

Steelmaking Process ◽

Basic Oxygen ◽

Operation Optimization ◽

Derivative Free Optimization ◽

Derivative Free

Research on the Forward Kinematics of Stewart Platform Using Memetic Algorithms

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.268-270.1416 ◽

2012 ◽

Vol 268-270 ◽

pp. 1416-1421

Author(s):

Yu Hui Zhang ◽

Li Wen Guan ◽

Li Ping Wang ◽

Yong Zhi Hua

Keyword(s):

Nonlinear Equations ◽

Rapid Development ◽

Memetic Algorithms ◽

Stewart Platform ◽

Global Optimum ◽

General Purpose ◽

Forward Kinematics ◽

Systems Of Nonlinear Equations ◽

Artificial Intelligence Technology ◽

Difficult Issue

The forward kinematics analysis of parallel manipulators is a difficult problem that has been studied by many researchers recently. To address it, this paper presents a computing method with higher calculation accuracy, good operational stability and faster speed. First, the mathematical model of the direct kinematics of the Stewart platform is established as a system of nonlinear equations. Second, with the rapid development of artificial intelligence technology, memetic algorithms (MA) are increasingly applied to solve systems of nonlinear equations in place of traditional algorithms. MA is a meta-heuristic algorithm that combines genetic algorithms (GA) with a local search at the end of each iteration. Finally, the validity of the algorithm is verified by iterative numerical simulation. The simulation results show that MA reliably and rapidly obtains the globally optimal solution and greatly improves the convergence rate. MA can therefore be widely used as a general-purpose algorithm for solving the forward kinematics of parallel mechanisms.
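The memetic scheme described above, a GA whose iterations end with a local search, can be sketched as follows. This is an illustrative sketch, not the paper's method: the 2x2 system in `residuals` is hypothetical and stands in for the Stewart-platform equations, and tournament selection, arithmetic crossover, Gaussian mutation and the compass-search refinement are assumed operators. The nonlinear system is solved by minimizing its squared residual norm.

```python
import numpy as np

def residuals(p):
    # Hypothetical 2x2 system (NOT the Stewart-platform model):
    #   x^2 + y^2 - 4 = 0,  x - y = 0  ->  roots (+/-sqrt(2), +/-sqrt(2))
    x, y = p
    return np.array([x ** 2 + y ** 2 - 4.0, x - y])

def cost(p):
    # Solving F(p) = 0 is recast as minimizing ||F(p)||^2.
    r = residuals(p)
    return float(r @ r)

def local_search(p, step=0.5, tol=1e-12, max_iter=100_000):
    # Derivative-free compass search: try +/- step along each axis,
    # accept any improvement, halve the step when none is found.
    p = p.copy()
    c = cost(p)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(p.size):
            for s in (step, -step):
                q = p.copy()
                q[i] += s
                cq = cost(q)
                if cq < c:
                    p, c, improved = q, cq, True
        if not improved:
            step /= 2
    return p

def memetic_solve(pop_size=30, generations=40, bounds=(-3.0, 3.0), seed=1):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, 2))
    for _gen in range(generations):
        fit = np.array([cost(p) for p in pop])
        children = []
        for _child in range(pop_size):
            # Binary tournament selection of two parents.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if fit[a] < fit[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if fit[a] < fit[b] else pop[b]
            # Arithmetic crossover followed by Gaussian mutation.
            w = rng.uniform()
            child = w * p1 + (1 - w) * p2 + rng.normal(0.0, 0.1, size=2)
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
        # Memetic step: refine this generation's best individual by local search.
        best = min(range(pop_size), key=lambda i: cost(pop[i]))
        pop[best] = local_search(pop[best])
    return min(pop, key=cost)

sol = memetic_solve()
print(sol, residuals(sol))
```

The GA supplies global exploration and the local search supplies precision; without the refinement step, a GA alone would typically stall at a residual set by the mutation scale.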

Benchmarking and Field-Testing of the Distributed Quasi-Newton Derivative-Free Optimization Method for Field Development Optimization

10.2118/206267-ms ◽

2021 ◽

Author(s):

Faruk Alpak ◽

Yixuan Wang ◽

Guohua Gao ◽

Vivek Jain

Keyword(s):

Optimization Problems ◽

Field Testing ◽

Optimization Method ◽

Training Data ◽

Local Optima ◽

Field Development ◽

Derivative Free Optimization ◽

Derivative Free ◽

Data Points ◽

Quasi Newton

Recently, a novel distributed quasi-Newton (DQN) derivative-free optimization (DFO) method was developed for generic reservoir performance optimization problems including well-location optimization (WLO) and well-control optimization (WCO). DQN is designed to effectively locate multiple local optima of highly nonlinear optimization problems. However, its performance has neither been validated by realistic applications nor compared to other DFO methods. We have integrated DQN into a versatile field-development optimization platform designed specifically for iterative workflows enabled through distributed-parallel flow simulations. DQN is benchmarked against alternative DFO techniques, namely, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method hybridized with Direct Pattern Search (BFGS-DPS), Mesh Adaptive Direct Search (MADS), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). DQN is a multi-thread optimization method that distributes an ensemble of optimization tasks among multiple high-performance-computing nodes. Thus, it can locate multiple optima of the objective function in parallel within a single run. Simulation results computed from one DQN optimization thread are shared with others by updating a unified set of training data points composed of responses (implicit variables) of all successful simulation jobs. The sensitivity matrix at the current best solution of each optimization thread is approximated by a linear-interpolation technique using all or a subset of training-data points. The gradient of the objective function is analytically computed using the estimated sensitivities of implicit variables with respect to explicit variables. The Hessian matrix is then updated using the quasi-Newton method. A new search point for each thread is solved from a trust-region subproblem for the next iteration. In contrast, other DFO methods rely on a single-thread optimization paradigm that can only locate a single optimum. 
With such methods, locating multiple optima requires repeating the same optimization process from different initial guesses. Moreover, simulation results generated from a single-thread optimization task cannot be shared with other tasks. Benchmarking results are presented for synthetic yet challenging WLO and WCO problems. Finally, the DQN method is field-tested on two realistic applications. DQN identifies the global optimum with the fewest simulations and the shortest run time on a synthetic problem with a known solution. On other benchmarking problems without a known solution, DQN identified comparable local optima with fewer simulations than the alternative techniques. Field-testing results reinforce the favorable computational attributes of DQN. Overall, the results indicate that DQN is a novel and effective parallel algorithm for field-scale development optimization problems.
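The per-thread iteration outlined in this abstract (a gradient from a linear fit to sampled points, a quasi-Newton Hessian update, then a trust-region step) can be illustrated with a heavily simplified single-thread sketch. It is not DQN: the objective is sampled directly rather than through implicit-variable sensitivities, the sample points come from a hypothetical fixed stencil instead of a shared training set, BFGS is assumed as the quasi-Newton update, and the trust-region subproblem is only solved approximately at the Cauchy point.

```python
import numpy as np

def fit_gradient(X, fX, x0):
    # Least-squares linear model f(x) ~ c + g . (x - x0) over the sample points;
    # the slope g is used as the gradient estimate at x0.
    A = np.hstack([np.ones((len(X), 1)), X - x0])
    coef, *_ = np.linalg.lstsq(A, fX, rcond=None)
    return coef[1:]

def stencil_gradient(f, x, h=1e-3):
    # Hypothetical sampling scheme: the center point plus +/- h along each axis.
    n = x.size
    X = np.vstack([x, x + h * np.eye(n), x - h * np.eye(n)])
    return fit_gradient(X, np.array([f(p) for p in X]), x)

def bfgs_update(B, s, y):
    # BFGS update of the Hessian approximation B from step s and gradient change y.
    # Skipped when the curvature condition s . y > 0 fails, keeping B positive definite.
    sy = s @ y
    if sy <= 1e-12:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

def cauchy_step(g, B, radius):
    # Approximate trust-region subproblem: minimize g . p + 0.5 p . B p along -g
    # subject to ||p|| <= radius (the Cauchy point).
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    tau = 1.0 if gBg <= 0 else min(1.0, gnorm ** 3 / (radius * gBg))
    return -tau * radius / gnorm * g

def tr_quasi_newton(f, x0, radius=1.0, iters=200, g_tol=1e-8):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)
    g = stencil_gradient(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < g_tol:
            break
        p = cauchy_step(g, B, radius)
        pred = -(g @ p + 0.5 * p @ B @ p)  # decrease predicted by the quadratic model
        rho = (f(x) - f(x + p)) / pred     # actual vs. predicted decrease
        if rho > 0.1:
            # Accept the step, re-estimate the gradient, update B with the secant pair.
            g_new = stencil_gradient(f, x + p)
            B = bfgs_update(B, p, g_new - g)
            x, g = x + p, g_new
            if rho > 0.75:
                radius *= 2.0
        else:
            radius *= 0.5
    return x

x_best = tr_quasi_newton(lambda p: (p[0] - 1.0) ** 2 + 2.0 * (p[1] + 2.0) ** 2, [0.0, 0.0])
print(x_best)
```

On a quadratic, the symmetric stencil makes the least-squares slope equal to a central difference, so the gradient estimates are exact; with a real reservoir simulator they would be noisy, and the trust-region ratio test `rho` would carry more of the burden.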

Geometry of sample sets in derivative-free optimization: polynomial regression and underdetermined interpolation

IMA Journal of Numerical Analysis ◽

10.1093/imanum/drn046 ◽

2008 ◽

Vol 28 (4) ◽

pp. 721-748 ◽

Cited By ~ 30

Author(s):

A. R. Conn ◽

K. Scheinberg ◽

L. N. Vicente

Keyword(s):

Polynomial Regression ◽

Derivative Free Optimization ◽

Derivative Free

Architectural Symmetry Detection from 3D Urban Point Clouds: A Derivative-Free Optimization (DFO) Approach

Advances in Informatics and Computing in Civil and Construction Engineering ◽

10.1007/978-3-030-00220-6_61 ◽

2018 ◽

pp. 513-519

Author(s):

Fan Xue ◽

Ke Chen ◽

Weisheng Lu

Keyword(s):

Point Clouds ◽

Symmetry Detection ◽

Derivative Free Optimization ◽

Derivative Free

A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization

Stochastic Systems ◽

10.1287/stsy.2021.0083 ◽

2021 ◽

Author(s):

Tianyi Liu ◽

Zhehui Chen ◽

Enlu Zhou ◽

Tuo Zhao

Keyword(s):

Neural Networks ◽

Nonconvex Optimization ◽

Gradient Descent ◽

Deep Neural Networks ◽

Optimization Problems ◽

Saddle Points ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Nonconvex Optimization Problems ◽

Empirical Success

The momentum stochastic gradient descent (MSGD) algorithm has been widely applied to many nonconvex optimization problems in machine learning (e.g., training deep neural networks and variational Bayesian inference). Despite its empirical success, there is still a lack of theoretical understanding of the convergence properties of MSGD. To fill this gap, we propose to analyze the algorithmic behavior of MSGD by diffusion approximations for nonconvex optimization problems with strict saddle points and isolated local optima. Our study shows that momentum helps escape from saddle points but hurts convergence within the neighborhood of optima (in the absence of step-size or momentum annealing). Our theoretical discovery partially corroborates the empirical success of MSGD in training deep neural networks.
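The two qualitative claims above can be reproduced with a small simulation. This is a plain simulation of heavy-ball momentum SGD, not the diffusion-approximation analysis of the paper; the test functions, step size `eta`, momentum `mu`, and the Gaussian gradient-noise model are assumptions. Near a strict saddle, momentum shortens the time needed to escape; near a minimum, with a fixed (non-annealed) step size and momentum, it inflates the stationary spread around the optimum.

```python
import numpy as np

def msgd(grad, x0, eta=0.01, mu=0.9, steps=1000, noise=0.0, seed=0):
    """Heavy-ball momentum SGD: v <- mu*v - eta*(grad + noise); x <- x + v.
    Returns the trajectory as an array of shape (steps + 1, dim)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    traj = [x.copy()]
    for _ in range(steps):
        g = grad(x) + noise * rng.standard_normal(x.shape)
        v = mu * v - eta * g
        x = x + v
        traj.append(x.copy())
    return np.array(traj)

# Strict saddle: f(x, y) = 0.5 * (x^2 - y^2), gradient (x, -y).
def saddle_grad(p):
    return np.array([p[0], -p[1]])

def escape_time(mu):
    # Noise-free run from just off the saddle; first step at which |y| > 1.
    traj = msgd(saddle_grad, [1.0, 1e-6], mu=mu, steps=5000)
    return int(np.argmax(np.abs(traj[:, 1]) > 1.0))

# Quadratic bowl: f(x) = 0.5 * x^2, minimum at the origin.
def bowl_grad(p):
    return p

def stationary_spread(mu):
    # Mean squared distance to the optimum after burn-in, with gradient noise
    # and fixed step size and momentum (no annealing).
    traj = msgd(bowl_grad, [0.0], mu=mu, steps=20000, noise=1.0, seed=1)
    return float(np.mean(traj[2000:] ** 2))

print("escape steps, mu=0.9 vs mu=0:", escape_time(0.9), escape_time(0.0))
print("spread near optimum, mu=0.9 vs mu=0:", stationary_spread(0.9), stationary_spread(0.0))
```

Setting `mu=0.0` recovers plain SGD, so each comparison isolates the effect of momentum under otherwise identical settings.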
