Proximal Gradient Methods with Adaptive Subspace Sampling

Author(s):  
Dmitry Grishchenko ◽  
Franck Iutzeler ◽  
Jérôme Malick

Many applications in machine learning and signal processing involve nonsmooth optimization problems, and this nonsmoothness induces a low-dimensional structure in the optimal solutions. In this paper, we propose a randomized proximal gradient method harnessing this underlying structure. We introduce two key components: (i) a random subspace proximal gradient algorithm; and (ii) an identification-based sampling of the subspaces. Their interplay brings a significant performance improvement on typical learning problems in terms of dimensions explored.
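To make the idea concrete, here is a minimal Python sketch of a proximal gradient iteration in which each step updates only a randomly sampled coordinate subspace. This is not the authors' exact method: the ℓ1 regularizer, the step size, and the uniform sampling are illustrative assumptions; the paper's identification-based sampling would instead bias the sampled coordinates toward the learned support.

```python
# Minimal sketch of a random-subspace proximal gradient method for
# min f(x) + lam * ||x||_1.  Uniform coordinate sampling is an assumption,
# not the paper's adaptive identification-based sampling.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def subspace_prox_grad(grad_f, x0, lam, step, n_iters=1000, frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = x.size
    k = max(1, int(frac * d))                        # subspace dimension
    for _ in range(n_iters):
        idx = rng.choice(d, size=k, replace=False)   # sampled coordinates
        # gradient step and prox restricted to the sampled subspace
        x[idx] = soft_threshold(x[idx] - step * grad_f(x)[idx], step * lam)
    return x
```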

2021 ◽  
Vol 78 (3) ◽  
pp. 705-740
Author(s):  
Caroline Geiersbach ◽  
Teresa Scarinci

For finite-dimensional problems, stochastic approximation methods have long been used to solve stochastic optimization problems. Their application to infinite-dimensional problems is less understood, particularly for nonconvex objectives. This paper presents convergence results for the stochastic proximal gradient method in Hilbert spaces, motivated by optimization problems with partial differential equation (PDE) constraints with random inputs and coefficients. We study stochastic algorithms for nonconvex and nonsmooth problems, where the nonsmooth part is convex and the nonconvex part is an expectation, which is assumed to have a Lipschitz continuous gradient. The optimization variable is an element of a Hilbert space. We show almost sure convergence of strong limit points of the random sequence generated by the algorithm to stationary points. We demonstrate the stochastic proximal gradient algorithm on a tracking-type functional with an $L^1$-penalty term constrained by a semilinear PDE and box constraints, where input terms and coefficients are subject to uncertainty. We verify conditions for ensuring convergence of the algorithm and show a simulation.
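As a finite-dimensional illustration (the paper works in a Hilbert space with PDE constraints), one iteration of a stochastic proximal gradient step with an $L^1$ penalty and box constraints might look as follows; the function names are assumptions, and the composition of soft-thresholding with clipping is exact because the prox separates per coordinate.

```python
import numpy as np

def stoch_prox_grad_step(x, stoch_grad, lam, step, lo, hi):
    """One sketched iteration for min E[f(x, xi)] + lam*||x||_1 over a box.

    `stoch_grad` is a single-sample estimate of the gradient of the smooth
    expectation part.  Because the prox of lam*|.| plus a box indicator
    separates per coordinate, soft-thresholding followed by clipping is exact.
    """
    y = x - step * stoch_grad                                 # stochastic gradient step
    y = np.sign(y) * np.maximum(np.abs(y) - step * lam, 0.0)  # soft-threshold
    return np.clip(y, lo, hi)                                 # project onto the box
```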


2020 ◽  
Vol 2020 (1) ◽  
Author(s):  
Peichao Duan ◽  
Yiqun Zhang ◽  
Qinxiong Bu

The proximal gradient method is a powerful tool for solving composite convex optimization problems. In this paper, we first propose inexact inertial acceleration methods based on the viscosity approximation and the proximal scaled gradient algorithm to accelerate convergence. Under reasonable parameter choices, we prove that our algorithms converge strongly to a solution of the problem, which is also the unique solution of a variational inequality problem. Second, we propose an inexact alternated inertial proximal point algorithm and prove a weak convergence theorem under suitable conditions. Finally, numerical results illustrate the performance of our algorithms and present a comparison with related algorithms. Our results improve and extend corresponding results reported recently by many authors.
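A generic inertial proximal gradient iteration, sketched below, conveys the basic acceleration mechanism; the paper's viscosity approximation, scaling, and inexactness model are not reproduced, and the fixed inertia weight `theta` is an illustrative assumption.

```python
import numpy as np

def inertial_prox_grad(grad_f, prox_g, x0, step, theta=0.9, n_iters=500):
    # Sketch: extrapolate with the previous iterate, then take a
    # forward-backward (gradient + prox) step from the extrapolated point.
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_iters):
        y = x + theta * (x - x_prev)              # inertial extrapolation
        x_prev, x = x, prox_g(y - step * grad_f(y), step)
    return x
```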


2020 ◽  
Author(s):  
Qing Tao

The extrapolation strategy introduced by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude on smooth convex objectives, has led to tremendous success in training machine learning models. In this paper, we theoretically study its strength for the convergence of individual iterates in general nonsmooth convex optimization, which we name individual convergence. We prove that Nesterov's extrapolation makes the individual convergence of projected gradient methods optimal for general convex problems, a challenging open problem in the machine learning community. In light of this, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as an interesting step towards the open question about SGD posed by Shamir (2012). Furthermore, the derived algorithms are extended to solve regularized nonsmooth learning problems in stochastic settings. They can serve as an alternative to basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method is an efficient tool for solving large-scale ℓ1-regularized hinge-loss learning problems. Several experiments on real data demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.
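The scheme at issue can be sketched as a projected subgradient method with Nesterov-style extrapolation that returns the last ("individual") iterate rather than an average; the step-size and extrapolation schedules below are illustrative choices, not the schedules analyzed in the paper.

```python
import numpy as np

def nesterov_projected_subgrad(subgrad, project, x0, n_iters=1000, c=1.0):
    x_prev, x = x0.copy(), x0.copy()
    for k in range(1, n_iters + 1):
        beta = (k - 1) / (k + 2)                 # extrapolation weight (assumed)
        y = x + beta * (x - x_prev)
        step = c / np.sqrt(k)                    # O(1/sqrt(k)) step for nonsmooth f
        x_prev, x = x, project(y - step * subgrad(y))
    return x                                     # the individual (last) iterate
```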


2019 ◽  
Vol 35 (3) ◽  
pp. 371-378
Author(s):  
Porntip Promsinchai ◽  
Narin Petrot

In this paper, we consider convex constrained optimization problems with composite objective functions over the set of minimizers of another function. Our main aim is to numerically test a new algorithm, a stochastic block coordinate proximal-gradient algorithm with penalization, by comparing both the number of iterations and the CPU time against well-known block coordinate descent algorithms on randomly generated optimization problems with a regularization term.
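A minimal sketch of how such a step might look, under the assumption that the inner function h is smooth and enters through a fixed penalty weight (the paper may vary this weight along the iterations):

```python
import numpy as np

def block_prox_grad_penalized(grad_f, grad_h, x0, lam, step, beta,
                              n_blocks=10, n_iters=1000, seed=0):
    # Sketch for min f(x) + lam*||x||_1 over argmin h: the inner function h
    # enters through the penalty weight beta (an assumption; the paper may
    # vary it), and each iteration updates one uniformly sampled block.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    for _ in range(n_iters):
        b = blocks[rng.integers(n_blocks)]       # sampled coordinate block
        z = x[b] - step * (grad_f(x)[b] + beta * grad_h(x)[b])
        x[b] = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x
```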


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Zhan Wang ◽  
Pengyuan Li ◽  
Xiangrong Li ◽  
Hongtruong Pham

Conjugate gradient methods are well-known methods widely applied in many practical fields, and the CD conjugate gradient method is one of the classical variants. In this paper, a modified three-term CD conjugate gradient algorithm is proposed with the following features: (i) a modified three-term CD conjugate gradient formula is presented; (ii) the algorithm possesses the sufficient descent property and the trust region property; (iii) the algorithm is globally convergent for general functions under the modified weak Wolfe–Powell (MWWP) line search technique and a projection technique. The new algorithm performs well in numerical experiments, showing that the modified three-term CD conjugate gradient method is more competitive than the classical CD conjugate gradient method.
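For reference, the classical CD (Conjugate Descent) direction update is sketched below; the paper's modified three-term formula adds a further correction term that is not reproduced here.

```python
import numpy as np

def cd_direction(g_new, g_old, d_old):
    # Classical CD update: d = -g + beta_CD * d_old with
    # beta_CD = ||g_new||^2 / (-d_old . g_old).  The paper's modified
    # three-term formula adds a third correction term not shown here.
    beta_cd = (g_new @ g_new) / (-(d_old @ g_old))
    return -g_new + beta_cd * d_old
```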


2019 ◽  
Vol 13 (04) ◽  
pp. 2050081
Author(s):  
Badreddine Sellami ◽  
Mohamed Chiheb Eddine Sellami

In this paper, we are concerned with conjugate gradient methods for solving unconstrained optimization problems. We propose a modified Fletcher–Reeves (FR) [Function minimization by conjugate gradients, Comput. J. 7 (1964) 149–154] conjugate gradient algorithm satisfying a parametrized sufficient descent condition with a parameter [Formula: see text]. The parameter [Formula: see text] is computed by means of the conjugacy condition, yielding an algorithm that is a positive multiplicative modification of the Hestenes–Stiefel (HS) [Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards Sec. B 48 (1952) 409–436] algorithm and that produces a descent search direction at every iteration at which the line search satisfies the Wolfe conditions. Under appropriate conditions, we show that the modified FR method with the strong Wolfe line search is globally convergent for uniformly convex functions. We also present extensive preliminary numerical experiments showing the efficiency of the proposed method.
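The classical FR iteration that the paper modifies is sketched below with SciPy's Wolfe line search; the parametrized sufficient-descent modification and the HS-type coefficient of the paper are not reproduced.

```python
import numpy as np
from scipy.optimize import line_search

def fr_conjugate_gradient(f, grad, x0, n_iters=200, tol=1e-8):
    # Sketch of the classical Fletcher-Reeves method with a (strong) Wolfe
    # line search; the paper's parametrized modification is not reproduced.
    x, g = x0.copy(), grad(x0)
    d = -g
    for _ in range(n_iters):
        if np.linalg.norm(g) < tol:
            break
        alpha = line_search(f, grad, x, d)[0]    # SciPy Wolfe line search
        if alpha is None:                        # line search failed: restart
            d, alpha = -g, 1e-4
        x = x + alpha * d
        g_new = grad(x)
        beta_fr = (g_new @ g_new) / (g @ g)      # Fletcher-Reeves coefficient
        d = -g_new + beta_fr * d
        g = g_new
    return x
```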


Author(s):  
A. V. Luita ◽  
S. O. Zhilina ◽  
V. V. Semenov

In this paper, problems of bi-level convex minimization in a Hilbert space are considered. The bi-level convex minimization problem is to minimize a first convex function over the set of minima of a second convex function. This setting has many applications, but the implicit constraints generated by the inner problem make it difficult to obtain optimality conditions and construct algorithms. Multilevel optimization problems are formulated in a similar way; they originate in operations research (optimization according to sequentially specified criteria, i.e., lexicographic optimization). Attention is focused on solving the problem using two proximal methods, and the main theoretical results are convergence theorems for these methods in various situations. The first method combines the penalty function method with the proximal method. Strong convergence is proved when the function of the outer problem is strongly convex; in the general case, only weak convergence is proved. The second, so-called proximal-gradient method combines a variant of the fast proximal-gradient algorithm with the method of penalty functions. Convergence rates and weak convergence of the proximal-gradient method are proved.
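A rough sketch of the penalty-plus-proximal idea, assuming the inner function h is smooth and the outer function f is handled through its prox; the growing penalty schedule is an illustrative assumption, not the schedule analyzed in the paper.

```python
import numpy as np

def bilevel_penalty_prox_grad(prox_f, grad_h, x0, step, n_iters=2000):
    # Sketch for min f(x) subject to x in argmin h: gradient steps on the
    # (smooth) inner function h, prox steps on the outer f, with a growing
    # penalty weight beta_k (the sqrt(k) schedule is assumed, not the paper's).
    x = x0.copy()
    for k in range(1, n_iters + 1):
        beta = np.sqrt(k)                        # increasing penalty weight
        t = step / beta                          # shrink prox step as beta grows
        x = prox_f(x - t * beta * grad_h(x), t)
    return x
```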



2020 ◽  
Vol 25 (4) ◽  
pp. 66
Author(s):  
Seifu Endris Yimer ◽  
Poom Kumam ◽  
Anteneh Getachew Gebrie

In this paper, we consider a bilevel optimization problem: finding an optimum of the upper-level problem subject to the solution set of a split feasibility problem defined by fixed-point problems and optimization problems. Based on proximal and gradient methods, we propose a strongly convergent iterative algorithm with an inertial effect for solving this bilevel optimization problem. Furthermore, we present a numerical example to illustrate the applicability of our algorithm.
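One iteration of such a scheme might be sketched as follows, where the operator T, the step size, and the inertia weight are assumptions standing in for the paper's exact construction:

```python
import numpy as np

def inertial_bilevel_step(x, x_prev, grad_F, T, step, theta):
    # One sketched iteration: inertial extrapolation, a gradient step on the
    # upper-level objective F, then an application of an operator T whose
    # fixed points encode the lower-level split feasibility constraints
    # (T, step, and theta are assumptions, not the paper's exact scheme).
    y = x + theta * (x - x_prev)
    return T(y - step * grad_F(y))
```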

