Delay-Adaptive Distributed Stochastic Optimization

2020
Vol 34 (04)
pp. 5503-5510
Author(s):
Zhaolin Ren
Zhengyuan Zhou
Linhai Qiu
Ajay Deshpande
Jayant Kalagnanam

In large-scale optimization problems, distributed asynchronous stochastic gradient descent (DASGD) is a commonly used algorithm. In most applications, a large number of computing nodes compute gradient information asynchronously, so the gradient received at a given iteration is often stale. In the presence of such delays, which can be unbounded, the convergence of DASGD is uncertain. The contribution of this paper is twofold. First, we propose a delay-adaptive variant of DASGD in which each iteration's step size is adjusted according to the size of the delay, and prove asymptotic convergence of the algorithm on variationally coherent stochastic problems, a class of functions that properly includes convex, quasi-convex and star-convex functions. Second, we extend the convergence results of standard DASGD, usually established for problems with bounded domains, to problems with unbounded domains. In this way, we extend the frontier of theoretical guarantees for distributed asynchronous optimization and provide new insights for practitioners working on large-scale optimization problems.
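A minimal sketch of the delay-adaptive idea described above: the server shrinks its step size when the incoming gradient is stale. The specific scaling rule and decay schedule below are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def delay_adaptive_step(base_lr, iteration, delay, decay=0.6):
    # Illustrative rule: a decaying base schedule, shrunk further the
    # staler the received gradient is (larger delay -> smaller step).
    return base_lr / (((iteration + 1) ** decay) * (1.0 + delay))

def dasgd_server_update(x, stale_grad, iteration, delay, base_lr=0.1):
    # One asynchronous server update: `stale_grad` was computed at an
    # older copy of the parameters, `delay` iterations ago.
    step = delay_adaptive_step(base_lr, iteration, delay)
    return np.asarray(x, dtype=float) - step * np.asarray(stale_grad)
```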

2018
Vol 61 (1)
pp. 76-98
Author(s):
Ting Li
Zhong Wan

We propose a new adaptive and composite Barzilai–Borwein (BB) step size that integrates the advantages of existing BB step sizes. In particular, the proposed step size is an optimally weighted mean of the two classical BB step sizes, and the weights are updated at each iteration according to the quality of the classical BB step sizes. Combined with the steepest descent direction, the adaptive and composite BB step size is incorporated into an algorithm that efficiently solves large-scale optimization problems. We prove that the developed algorithm is globally convergent and converges R-linearly when applied to strictly convex quadratic minimization problems. Compared with state-of-the-art algorithms in the literature, the proposed step size is more efficient in solving ill-posed or large-scale benchmark test problems.
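For reference, the two classical BB step sizes and a weighted combination of them can be sketched as follows; the fixed `weight` argument is a hypothetical stand-in for the paper's adaptive, per-iteration weighting rule.

```python
import numpy as np

def bb_step_sizes(s, y):
    # Classical Barzilai-Borwein step sizes computed from the differences
    # s = x_k - x_{k-1} and y = grad_k - grad_{k-1}.
    sy = float(s @ y)
    alpha_bb1 = float(s @ s) / sy        # "long" BB step
    alpha_bb2 = sy / float(y @ y)        # "short" BB step
    return alpha_bb1, alpha_bb2

def composite_bb_step(s, y, weight=0.5):
    # Weighted mean of the two BB steps; `weight` in [0, 1] stands in
    # for the adaptive weighting proposed in the paper.
    a1, a2 = bb_step_sizes(s, y)
    return weight * a1 + (1.0 - weight) * a2
```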


Author(s):  
Zhengyuan Zhou
Panayotis Mertikopoulos
Nicholas Bambos
Peter Glynn
Yinyu Ye

The recent surge of breakthroughs in machine learning and artificial intelligence has sparked renewed interest in large-scale stochastic optimization problems that are universally considered hard. One of the most widely used methods for solving such problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that results from parallelizing stochastic gradient descent on distributed computing architectures, possibly asynchronously. However, a key obstacle to the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may already have been updated by other nodes several times over, rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is to show that, by carefully tuning the algorithm's step size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results for a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability one under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale nonconvex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design.
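The delay mechanism and where the step-size schedule enters can be illustrated with a toy single-process simulation; the particular delay model and decay schedule used here are assumptions for illustration only, not the conditions proved in the paper.

```python
import numpy as np
from collections import deque

def simulate_dasgd(grad, x0, n_iters=1000,
                   max_delay=lambda n: int(n ** 0.5),     # delays allowed to grow with n
                   step=lambda n: 0.5 / (n + 1) ** 0.8):  # decaying step-size schedule
    # Toy simulation: each gradient is computed at the current iterate but
    # only applied `delay` iterations later, i.e. it is stale when used.
    x = np.asarray(x0, dtype=float)
    pending = deque()                                     # (apply_at, stale_gradient)
    rng = np.random.default_rng(0)
    for n in range(n_iters):
        delay = int(rng.integers(0, max_delay(n) + 1))
        pending.append((n + delay, grad(x)))
        while pending and pending[0][0] <= n:
            _, g = pending.popleft()
            x = x - step(n) * g                           # apply the stale gradient
    return x
```

For example, `simulate_dasgd(lambda x: 2 * x, np.ones(5))` runs the scheme on the simple quadratic objective with gradient 2x.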


Author(s):  
Andrew Jacobsen
Matthew Schlegel
Cameron Linke
Thomas Degris
Adam White
...  

This paper investigates different vector step-size adaptation approaches for non-stationary, online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update: a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been explored as extensively as quasi-second-order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We then provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but that in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems and is competitive with the state-of-the-art method on a large-scale time-series prediction problem on real data from a mobile robot.
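As a concrete example of the meta-descent family discussed above, Sutton's classic IDBD algorithm adapts a per-feature step size for linear prediction by gradient descent on the step sizes themselves; it is shown here as background, not as the paper's AdaGain algorithm.

```python
import numpy as np

class IDBD:
    # Sutton's IDBD: per-feature step sizes for a linear predictor,
    # adapted by meta-gradient descent on the prediction error.
    def __init__(self, n_features, meta_step=0.01, init_log_step=-3.0):
        self.theta = meta_step                            # meta step size
        self.beta = np.full(n_features, init_log_step)    # log of per-feature step sizes
        self.h = np.zeros(n_features)                     # decaying trace of past updates
        self.w = np.zeros(n_features)                     # prediction weights

    def update(self, x, target):
        delta = target - self.w @ x                       # prediction error
        self.beta += self.theta * delta * x * self.h      # meta-gradient step on log step sizes
        alpha = np.exp(self.beta)                         # per-feature step sizes
        self.w += alpha * delta * x                       # prediction update
        self.h = self.h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return delta
```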


Author(s):  
Jie Guo
Zhong Wan

A new spectral three-term conjugate gradient algorithm based on the quasi-Newton equation is developed for solving large-scale unconstrained optimization problems. It is proved that the search directions in this algorithm always satisfy a sufficient descent condition independent of any line search, and global convergence is established for general objective functions when the strong Wolfe line search is used. Numerical experiments show its high performance in solving large-scale optimization problems. In particular, the developed algorithm is used to solve 100 benchmark test problems from CUTE with sizes ranging from 1000 to 10,000, in comparison with similar algorithms from the literature. The numerical results demonstrate that our algorithm outperforms the state-of-the-art ones in terms of CPU time, number of iterations, and number of function evaluations.
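The generic shape of a spectral three-term direction can be sketched as below; the coefficient formulas are common textbook choices used only as placeholders, since the paper derives its own coefficients from the quasi-Newton equation.

```python
import numpy as np

def spectral_three_term_direction(g, g_prev, d_prev, s_prev):
    # Illustrative direction of the generic form
    #     d_k = -theta_k * g_k + beta_k * d_{k-1} + gamma_k * y_{k-1},
    # where s_prev = x_k - x_{k-1} and y = g_k - g_{k-1}.
    y = g - g_prev
    dy = float(d_prev @ y)
    theta = float(s_prev @ s_prev) / float(s_prev @ y)  # spectral (BB-like) scaling
    beta = float(g @ y) / dy                            # Hestenes-Stiefel-type coefficient
    gamma = -float(g @ d_prev) / dy                     # third-term correction
    # With these choices, g_k^T d_k = -theta * ||g_k||^2,
    # so d_k is a descent direction whenever theta > 0.
    return -theta * g + beta * d_prev + gamma * y
```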


2017
Vol 59
pp. 340-362
Author(s):
Prabhujit Mohapatra
Kedar Nath Das
Santanu Roy

Mathematics
2019
Vol 7 (6)
pp. 521
Author(s):
Fanrong Kong
Jianhui Jiang
Yan Huang

As a powerful optimization tool, particle swarm optimizers have been widely applied in many optimization areas and have drawn much attention. However, for large-scale optimization problems, these algorithms often fail to reach satisfactory results because of their limited ability to maintain diversity. In this paper, an adaptive multi-swarm particle swarm optimizer is proposed, which adaptively divides the swarm into several sub-swarms and employs a competition mechanism to select exemplars. In this way, on the one hand, the diversity of exemplars increases, which helps the swarm preserve its exploitation ability. On the other hand, the number of sub-swarms adaptively decreases from a large value to a small one, which helps the algorithm strike a suitable balance between exploitation and exploration. We compared the proposed algorithm with several peer algorithms on the CEC 2013 large-scale optimization benchmark suite. The experimental results demonstrate that the proposed algorithm is effective and competitive for large-scale optimization problems.
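A simplified sketch of one multi-swarm, competition-based update in the style described above; the pairing rule, velocity weights, and mean-attraction term are illustrative assumptions, and the sub-swarm count would be reduced adaptively over iterations.

```python
import numpy as np

def multiswarm_competition_step(X, V, fitness, n_subswarms, phi=0.1, rng=None):
    # X: particle positions (n, d); V: velocities (n, d); fitness: (n,), lower is better.
    # The swarm is split into sub-swarms; within each, particles are randomly
    # paired and the loser of each pair learns from the winner (its exemplar).
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    groups = np.array_split(rng.permutation(n), n_subswarms)
    swarm_mean = X.mean(axis=0)
    for g in groups:
        pairs = g[: len(g) // 2 * 2].reshape(-1, 2)
        for i, j in pairs:
            win, lose = (i, j) if fitness[i] < fitness[j] else (j, i)
            r1, r2, r3 = rng.random((3, d))
            V[lose] = r1 * V[lose] + r2 * (X[win] - X[lose]) + phi * r3 * (swarm_mean - X[lose])
            X[lose] = X[lose] + V[lose]
    return X, V
```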

