Delay-Adaptive Distributed Stochastic Optimization

2020
Vol 34 (04)
pp. 5503-5510
Author(s):
Zhaolin Ren
Zhengyuan Zhou
Linhai Qiu
Ajay Deshpande
Jayant Kalagnanam

In large-scale optimization problems, distributed asynchronous stochastic gradient descent (DASGD) is a commonly used algorithm. In most applications, a large number of computing nodes compute gradient information asynchronously, so the gradient received at a given iteration is often stale. In the presence of such delays, which can be unbounded, the convergence of DASGD is uncertain. The contribution of this paper is twofold. First, we propose a delay-adaptive variant of DASGD in which each iteration's step size is adjusted according to the size of the delay, and prove asymptotic convergence of the algorithm on variationally coherent stochastic problems, a class of functions that properly includes convex, quasi-convex and star-convex functions. Second, we extend the convergence results of standard DASGD, usually established for problems with bounded domains, to problems with unbounded domains. In this way, we extend the frontier of theoretical guarantees for distributed asynchronous optimization and provide new insights for practitioners working on large-scale optimization problems.
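A minimal sketch of the delay-adaptive idea described above: the server shrinks its step size when the incoming gradient is stale. The specific scaling rule and decay schedule below are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def delay_adaptive_step(base_lr, iteration, delay, decay=0.6):
    # Illustrative rule: a decaying base schedule, shrunk further the
    # staler the received gradient is (larger delay -> smaller step).
    return base_lr / (((iteration + 1) ** decay) * (1.0 + delay))

def dasgd_server_update(x, stale_grad, iteration, delay, base_lr=0.1):
    # One asynchronous server update: `stale_grad` was computed at an
    # older copy of the parameters, `delay` iterations ago.
    step = delay_adaptive_step(base_lr, iteration, delay)
    return np.asarray(x, dtype=float) - step * np.asarray(stale_grad)
```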

2018
Vol 61 (1)
pp. 76-98
Author(s):
Ting Li
Zhong Wan

We propose a new adaptive and composite Barzilai–Borwein (BB) step size that integrates the advantages of existing BB step sizes. In particular, the proposed step size is an optimally weighted mean of the two classical BB step sizes, and the weights are updated at each iteration according to the quality of the classical BB step sizes. Combined with the steepest descent direction, the adaptive and composite BB step size is incorporated into an algorithm that efficiently solves large-scale optimization problems. We prove that the developed algorithm is globally convergent and converges R-linearly when applied to strictly convex quadratic minimization problems. Compared with state-of-the-art algorithms in the literature, the proposed step size is more efficient in solving ill-posed or large-scale benchmark test problems.
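For reference, the two classical BB step sizes and a weighted combination of them can be sketched as follows; the fixed `weight` argument is a hypothetical stand-in for the paper's adaptive, per-iteration weighting rule.

```python
import numpy as np

def bb_step_sizes(s, y):
    # Classical Barzilai-Borwein step sizes computed from the differences
    # s = x_k - x_{k-1} and y = grad_k - grad_{k-1}.
    sy = float(s @ y)
    alpha_bb1 = float(s @ s) / sy        # "long" BB step
    alpha_bb2 = sy / float(y @ y)        # "short" BB step
    return alpha_bb1, alpha_bb2

def composite_bb_step(s, y, weight=0.5):
    # Weighted mean of the two BB steps; `weight` in [0, 1] stands in
    # for the adaptive weighting proposed in the paper.
    a1, a2 = bb_step_sizes(s, y)
    return weight * a1 + (1.0 - weight) * a2
```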


Author(s):  
Zhengyuan Zhou
Panayotis Mertikopoulos
Nicholas Bambos
Peter Glynn
Yinyu Ye

The recent surge of breakthroughs in machine learning and artificial intelligence has sparked renewed interest in large-scale stochastic optimization problems that are universally considered hard. One of the most widely used methods for solving such problems is distributed asynchronous stochastic gradient descent (DASGD), a family of algorithms that results from parallelizing stochastic gradient descent on distributed computing architectures, possibly asynchronously. However, a key obstacle to the efficient implementation of DASGD is the issue of delays: when a computing node contributes a gradient update, the global model parameter may already have been updated by other nodes several times over, rendering this gradient information stale. These delays can quickly add up if the computational throughput of a node is saturated, so the convergence of DASGD may be compromised in the presence of large delays. Our first contribution is to show that, by carefully tuning the algorithm's step size, convergence to the critical set is still achieved in mean square, even if the delays grow unbounded at a polynomial rate. We also establish finer results for a broad class of structured optimization problems (called variationally coherent), where we show that DASGD converges to a global optimum with probability one under the same delay assumptions. Together, these results contribute to the broad landscape of large-scale nonconvex stochastic optimization by offering state-of-the-art theoretical guarantees and providing insights for algorithm design.
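The delay mechanism and where the step-size schedule enters can be illustrated with a toy single-process simulation; the particular delay model and decay schedule used here are assumptions for illustration only, not the conditions proved in the paper.

```python
import numpy as np
from collections import deque

def simulate_dasgd(grad, x0, n_iters=1000,
                   max_delay=lambda n: int(n ** 0.5),     # delays allowed to grow with n
                   step=lambda n: 0.5 / (n + 1) ** 0.8):  # decaying step-size schedule
    # Toy simulation: each gradient is computed at the current iterate but
    # only applied `delay` iterations later, i.e. it is stale when used.
    x = np.asarray(x0, dtype=float)
    pending = deque()                                     # (apply_at, stale_gradient)
    rng = np.random.default_rng(0)
    for n in range(n_iters):
        delay = int(rng.integers(0, max_delay(n) + 1))
        pending.append((n + delay, grad(x)))
        while pending and pending[0][0] <= n:
            _, g = pending.popleft()
            x = x - step(n) * g                           # apply the stale gradient
    return x
```

For example, `simulate_dasgd(lambda x: 2 * x, np.ones(5))` runs the scheme on the simple quadratic objective with gradient 2x.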


Author(s):  
Andrew Jacobsen
Matthew Schlegel
Cameron Linke
Thomas Degris
Adam White
...  

This paper investigates different vector step-size adaptation approaches for non-stationary, online, continual prediction problems. Vanilla stochastic gradient descent can be considerably improved by scaling the update with a vector of appropriately chosen step sizes. Many methods, including AdaGrad, RMSProp, and AMSGrad, keep statistics about the learning process to approximate a second-order update: a vector approximation of the inverse Hessian. Another family of approaches uses meta-gradient descent to adapt the step-size parameters to minimize prediction error. These meta-descent strategies are promising for non-stationary problems, but have not been explored as extensively as quasi-second-order methods. We first derive a general, incremental meta-descent algorithm, called AdaGain, designed to be applicable to a much broader range of algorithms, including those with semi-gradient updates or even those with accelerations, such as RMSProp. We then provide an empirical comparison of methods from both families. We conclude that methods from both families can perform well, but that in non-stationary prediction problems the meta-descent methods exhibit advantages. Our method is particularly robust across several prediction problems and is competitive with the state-of-the-art method on a large-scale time-series prediction problem on real data from a mobile robot.
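As a concrete example of the meta-descent family discussed above, Sutton's classic IDBD algorithm adapts a per-feature step size for linear prediction by gradient descent on the step sizes themselves; it is shown here as background, not as the paper's AdaGain algorithm.

```python
import numpy as np

class IDBD:
    # Sutton's IDBD: per-feature step sizes for a linear predictor,
    # adapted by meta-gradient descent on the prediction error.
    def __init__(self, n_features, meta_step=0.01, init_log_step=-3.0):
        self.theta = meta_step                            # meta step size
        self.beta = np.full(n_features, init_log_step)    # log of per-feature step sizes
        self.h = np.zeros(n_features)                     # decaying trace of past updates
        self.w = np.zeros(n_features)                     # prediction weights

    def update(self, x, target):
        delta = target - self.w @ x                       # prediction error
        self.beta += self.theta * delta * x * self.h      # meta-gradient step on log step sizes
        alpha = np.exp(self.beta)                         # per-feature step sizes
        self.w += alpha * delta * x                       # prediction update
        self.h = self.h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return delta
```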


Author(s):  
Jie Guo
Zhong Wan

A new spectral three-term conjugate gradient algorithm based on the quasi-Newton equation is developed for solving large-scale unconstrained optimization problems. It is proved that the search directions in this algorithm always satisfy a sufficient descent condition independent of any line search, and global convergence is established for general objective functions when the strong Wolfe line search is used. Numerical experiments show its high performance in solving large-scale optimization problems. In particular, the developed algorithm is used to solve 100 benchmark test problems from CUTE with sizes ranging from 1000 to 10,000, in comparison with similar algorithms from the literature. The numerical results demonstrate that our algorithm outperforms the state-of-the-art ones in terms of CPU time, number of iterations, and number of function evaluations.
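The generic shape of a spectral three-term direction can be sketched as below; the coefficient formulas are common textbook choices used only as placeholders, since the paper derives its own coefficients from the quasi-Newton equation.

```python
import numpy as np

def spectral_three_term_direction(g, g_prev, d_prev, s_prev):
    # Illustrative direction of the generic form
    #     d_k = -theta_k * g_k + beta_k * d_{k-1} + gamma_k * y_{k-1},
    # where s_prev = x_k - x_{k-1} and y = g_k - g_{k-1}.
    y = g - g_prev
    dy = float(d_prev @ y)
    theta = float(s_prev @ s_prev) / float(s_prev @ y)  # spectral (BB-like) scaling
    beta = float(g @ y) / dy                            # Hestenes-Stiefel-type coefficient
    gamma = -float(g @ d_prev) / dy                     # third-term correction
    # With these choices, g_k^T d_k = -theta * ||g_k||^2,
    # so d_k is a descent direction whenever theta > 0.
    return -theta * g + beta * d_prev + gamma * y
```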


2017
Vol 59
pp. 340-362
Author(s):
Prabhujit Mohapatra
Kedar Nath Das
Santanu Roy

Mathematics
2019
Vol 7 (6)
pp. 521
Author(s):
Fanrong Kong
Jianhui Jiang
Yan Huang

As a powerful optimization tool, particle swarm optimizers have been widely applied in many optimization areas and have drawn much attention. However, for large-scale optimization problems, these algorithms often fail to reach satisfactory results because of their limited ability to maintain diversity. In this paper, an adaptive multi-swarm particle swarm optimizer is proposed, which adaptively divides the swarm into several sub-swarms and employs a competition mechanism to select exemplars. In this way, on the one hand, the diversity of exemplars increases, which helps the swarm preserve its exploitation ability. On the other hand, the number of sub-swarms adaptively decreases from a large value to a small one, which helps the algorithm strike a suitable balance between exploitation and exploration. We compared the proposed algorithm with several peer algorithms on the CEC 2013 large-scale optimization benchmark suite. The experimental results demonstrate that the proposed algorithm is effective and competitive for large-scale optimization problems.
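A simplified sketch of one multi-swarm, competition-based update in the style described above; the pairing rule, velocity weights, and mean-attraction term are illustrative assumptions, and the sub-swarm count would be reduced adaptively over iterations.

```python
import numpy as np

def multiswarm_competition_step(X, V, fitness, n_subswarms, phi=0.1, rng=None):
    # X: particle positions (n, d); V: velocities (n, d); fitness: (n,), lower is better.
    # The swarm is split into sub-swarms; within each, particles are randomly
    # paired and the loser of each pair learns from the winner (its exemplar).
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    groups = np.array_split(rng.permutation(n), n_subswarms)
    swarm_mean = X.mean(axis=0)
    for g in groups:
        pairs = g[: len(g) // 2 * 2].reshape(-1, 2)
        for i, j in pairs:
            win, lose = (i, j) if fitness[i] < fitness[j] else (j, i)
            r1, r2, r3 = rng.random((3, d))
            V[lose] = r1 * V[lose] + r2 * (X[win] - X[lose]) + phi * r3 * (swarm_mean - X[lose])
            X[lose] = X[lose] + V[lose]
    return X, V
```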

