Convergence Analysis of Gradient Descent for Eigenvector Computation

Author(s):  
Zhiqiang Xu ◽  
Xin Cao ◽  
Xin Gao

We present a novel, simple, and systematic convergence analysis of gradient descent for eigenvector computation. As a popular, practical, and provable approach to numerous machine learning problems, gradient descent has also found successful application to eigenvector computation. Surprisingly, however, it has lacked a thorough theoretical analysis for the underlying geodesically non-convex problem. In this work, the gradient descent solver for the leading eigenvector computation is shown to converge at the global rate O(min{ (lambda_1/Delta_p)^2 log(1/epsilon), 1/epsilon }), where Delta_p = lambda_p - lambda_{p+1} > 0 is the generalized positive eigengap, which always exists without loss of generality; here lambda_i denotes the i-th largest eigenvalue of the given real symmetric matrix and p the multiplicity of lambda_1. The rate is linear, (lambda_1/Delta_p)^2 log(1/epsilon), when (lambda_1/Delta_p)^2 = O(1), and sub-linear, O(1/epsilon), otherwise. We also show that the convergence depends only logarithmically, rather than quadratically, on the initial iterate. In particular, this is the first time that linear convergence has been established for the gradient descent solver in the case where the conventionally considered eigengap Delta_1 = lambda_1 - lambda_2 = 0 but the generalized eigengap satisfies (lambda_1/Delta_p)^2 = O(1), and the first time the logarithmic dependence on the initial iterate has been established. We are also the first to leverage, in the analysis, the logarithm of the principal angle between the iterate and the space of globally optimal solutions. The theoretical properties are verified in experiments.
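For concreteness, the solver analysed here is essentially projected (Riemannian) gradient ascent on the Rayleigh quotient. A minimal numpy sketch under that reading; the step size and iteration count below are heuristic choices for illustration, not the schedule analysed in the paper.

```python
import numpy as np

def leading_eigenvector_gd(A, steps=500, eta=None, seed=0):
    """Projected gradient ascent on the Rayleigh quotient x^T A x on the unit sphere.

    A is a real symmetric matrix; returns an approximate leading eigenvector.
    The step size is a heuristic (1 / spectral norm), not the paper's schedule.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    if eta is None:
        eta = 1.0 / np.linalg.norm(A, 2)
    for _ in range(steps):
        g = A @ x - (x @ A @ x) * x   # Riemannian gradient of the Rayleigh quotient
        x = x + eta * g
        x /= np.linalg.norm(x)        # retract back onto the unit sphere
    return x

# Usage: compare against numpy's eigendecomposition on a random symmetric matrix.
A = np.random.default_rng(1).standard_normal((50, 50))
A = (A + A.T) / 2
x = leading_eigenvector_gd(A)
w, V = np.linalg.eigh(A)
print(abs(x @ V[:, -1]))  # close to 1 when the principal angle to the top eigenvector is small
```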

Author(s):  
Aijun Xue ◽  
Xiaodan Wang

Many real-world applications involve multiclass cost-sensitive learning problems. However, some well-established binary cost-sensitive learning algorithms cannot be extended directly to the multiclass setting. It is therefore useful to decompose the complex multiclass cost-sensitive classification problem into a series of binary cost-sensitive classification problems. In this paper we propose an alternative and efficient decomposition framework based on the original error correcting output codes. The main problem in our framework is how to evaluate the binary costs for each binary cost-sensitive base classifier. To solve this problem, we propose computing the expected misclassification costs from the given multiclass cost matrix, and we give general formulations for computing the binary costs. Experimental results on several synthetic and UCI datasets show that our method obtains performance comparable to state-of-the-art methods.
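To make the cost derivation concrete: each column of an ECOC code matrix groups the original classes into two meta-classes, and a binary cost can be taken as the expected multiclass cost of confusing one meta-class with the other. The sketch below assumes a uniform prior over the classes in each meta-class and a simple one-vs-all style code; the function name and this weighting are illustrative and may differ from the paper's exact formulation.

```python
import numpy as np

def binary_costs_from_ecoc(cost_matrix, code_matrix):
    """For each ECOC column, derive a 2x2 binary cost matrix.

    cost_matrix[i, j]: cost of predicting class j when the true class is i.
    code_matrix[i, b] in {+1, -1}: meta-class of class i in binary problem b.
    Uses a uniform prior within each meta-class (illustrative assumption).
    """
    n_classes, n_bits = code_matrix.shape
    binary_costs = []
    for b in range(n_bits):
        pos = np.where(code_matrix[:, b] == +1)[0]
        neg = np.where(code_matrix[:, b] == -1)[0]
        # expected cost of labelling a positive-group example as negative, and vice versa
        c_pos_as_neg = cost_matrix[np.ix_(pos, neg)].mean()
        c_neg_as_pos = cost_matrix[np.ix_(neg, pos)].mean()
        binary_costs.append(np.array([[0.0, c_pos_as_neg],
                                      [c_neg_as_pos, 0.0]]))
    return binary_costs

# Usage: 3 classes, one-vs-all code, asymmetric multiclass costs.
C = np.array([[0, 1, 5],
              [2, 0, 1],
              [4, 3, 0]], dtype=float)
M = np.array([[+1, -1, -1],
              [-1, +1, -1],
              [-1, -1, +1]])
for b, Bc in enumerate(binary_costs_from_ecoc(C, M)):
    print(f"binary problem {b}:\n{Bc}")
```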


2021 ◽  
Vol 11 (16) ◽  
pp. 7433
Author(s):  
Andrzej Dziech

In this paper, orthogonal transforms based on proposed symmetric, orthogonal matrices are constructed. These transforms can be considered as generalized Walsh–Hadamard transforms. The simplicity of calculating the forward and inverse transforms is one of the important features of the presented approach. The conditions for creating symmetric, orthogonal matrices are defined. It is shown that, to build an orthogonal matrix meeting the given conditions, only a limited number of its elements need to be selected. The general form of an orthogonal, symmetric matrix having an exponential form is also presented. Orthogonal basis functions based on the created matrices can be used for orthogonal expansions leading to signal approximation. An exponential form of orthogonal, sparse matrices with variable parameters is also created. Various versions of orthogonal transforms related to the created full and sparse matrices are proposed. Fast computation of the presented transforms is discussed and compared with fast algorithms for selected orthogonal transforms. Possible applications to signal approximation and examples of image spectra in the considered transform domains are presented.
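The classical Walsh–Hadamard matrix is the simplest instance of such a symmetric, orthogonal transform matrix. A minimal sketch of the forward and inverse transform with a Sylvester-construction Hadamard matrix follows; the paper's generalized matrices allow more freedom in the choice of elements, which this sketch does not attempt to reproduce.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; after normalisation the matrix is symmetric and orthogonal.
    n must be a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)   # symmetric and orthogonal, hence its own inverse

n = 8
H = hadamard(n)
x = np.random.default_rng(0).standard_normal(n)
spectrum = H @ x          # forward transform
x_rec = H @ spectrum      # inverse transform: apply the same matrix again
print(np.allclose(x, x_rec))   # True
```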


2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Zohre Aminifard ◽  
Saman Babaie-Kafaki

Memoryless quasi–Newton updating formulas of BFGS (Broyden–Fletcher–Goldfarb–Shanno) and DFP (Davidon–Fletcher–Powell) type are scaled using well-structured diagonal matrices. In the scaling approach, the diagonal elements as well as the eigenvalues of the scaled memoryless quasi–Newton updating formulas play significant roles. Convergence analysis of the given diagonally scaled quasi–Newton methods is discussed. Finally, the performance of the methods is numerically tested on a set of CUTEr problems as well as on the compressed sensing problem.
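For orientation, a memoryless BFGS step builds the new search direction from a fixed scaling matrix (classically the identity, here a diagonal matrix) and only the most recent step/gradient-difference pair. The sketch below uses a simple Barzilai–Borwein-type scalar as the default diagonal; the well-structured diagonal matrices proposed in the paper are a different, more refined choice, and the function name is hypothetical.

```python
import numpy as np

def memoryless_bfgs_direction(g_new, s, y, diag_scale=None):
    """Search direction d = -H g_new, where H is the memoryless BFGS update of a
    diagonal matrix D using only the latest pair
    (s, y) = (x_k - x_{k-1}, g_k - g_{k-1}).

    diag_scale defaults to the scalar (s.y / y.y) on the diagonal, an illustrative
    choice, not the diagonal structure proposed in the paper.
    """
    sy = s @ y
    if diag_scale is None:
        diag_scale = np.full_like(g_new, sy / (y @ y))
    D = diag_scale                      # diagonal entries of the scaling matrix
    Dy = D * y
    # H g = D g - (s y^T D + D y s^T) g / (s^T y) + (1 + y^T D y / s^T y) (s s^T g) / (s^T y)
    Hg = (D * g_new
          - (s * (Dy @ g_new) + Dy * (s @ g_new)) / sy
          + (1.0 + (y @ Dy) / sy) * s * (s @ g_new) / sy)
    return -Hg

# Usage on a toy quadratic f(x) = 0.5 x^T A x with gradient g(x) = A x.
A = np.diag([1.0, 10.0, 100.0])
x_old, x_new = np.ones(3), np.array([0.9, 0.5, 0.1])
g = lambda x: A @ x
d = memoryless_bfgs_direction(g(x_new), x_new - x_old, g(x_new) - g(x_old))
print(d)
```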


2019 ◽  
Vol 53 (6) ◽  
pp. 1841-1870
Author(s):  
Giovanni Di Fratta ◽  
Thomas Führer ◽  
Gregor Gantner ◽  
Dirk Praetorius

Based on the Uzawa algorithm, we consider an adaptive finite element method for the Stokes system. We prove linear convergence with optimal algebraic rates for the residual estimator (which is equivalent to the total error), if the arising linear systems are solved iteratively, e.g., by PCG. Our analysis avoids the use of discrete efficiency of the estimator. Unlike prior work, our adaptive Uzawa algorithm can thus avoid discretizing the given data and does not rely on an interior node property for the refinement.
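At the algebraic level, the Uzawa algorithm solves the saddle-point system arising from Stokes, [A B^T; B 0][u; p] = [f; g], by iterating on the pressure and solving a velocity problem in each step. A minimal dense-matrix sketch of that iteration follows; the adaptive FEM method of the paper replaces the exact inner solve with an inexact PCG solve and couples it with mesh refinement, neither of which appears here.

```python
import numpy as np

def uzawa(A, B, f, g, alpha, iters=5000):
    """Basic Uzawa iteration for the saddle-point system
        [A  B^T] [u]   [f]
        [B   0 ] [p] = [g]
    with A symmetric positive definite; alpha is a damping parameter.
    """
    p = np.zeros(B.shape[0])
    for _ in range(iters):
        u = np.linalg.solve(A, f - B.T @ p)   # inner (velocity) solve; PCG in practice
        p = p + alpha * (B @ u - g)           # pressure update from the divergence residual
    return u, p

# Usage on a small random stable system.
rng = np.random.default_rng(0)
n, m = 6, 3
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
f, g = rng.standard_normal(n), rng.standard_normal(m)
S = B @ np.linalg.solve(A, B.T)                    # Schur complement, used to pick a safe alpha
u, p = uzawa(A, B, f, g, alpha=1.0 / np.linalg.norm(S, 2))
# Both residuals shrink towards zero as the iteration proceeds.
print(np.linalg.norm(A @ u + B.T @ p - f), np.linalg.norm(B @ u - g))
```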


2019 ◽  
Vol 25 ◽  
pp. 65 ◽  
Author(s):  
Dorothee Knees

It is well known that rate-independent systems involving nonconvex energy functionals in general do not allow for time-continuous solutions, even if the given data are smooth. In recent years, several solution concepts were proposed that include discontinuities in the notion of solution, among them the class of global energetic solutions and the class of BV-solutions. In general, these solution concepts are not equivalent, and numerical schemes are needed that reliably approximate the type of solution one is interested in. In this paper, we analyse the convergence of solutions of three time-discretisation schemes, namely an approach based on local minimisation, a relaxed version of it, and an alternate minimisation scheme. For all three cases, we show that under suitable conditions on the discretisation parameters the discrete solutions converge to limit functions that belong to the class of BV-solutions. The proofs rely on a reparametrisation argument. We illustrate the different schemes with a toy example.
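A toy 1D example shows the kind of discretisation step involved: at each time, the new state minimises the (nonconvex) energy plus the rate-independent dissipation from the previous state. The sketch below uses a plain incremental minimisation on a grid to illustrate the jump behaviour; the three schemes analysed in the paper (local, relaxed local, alternate minimisation) localise or split this minimisation differently, and this particular energy is only an illustrative choice.

```python
import numpy as np

# Toy rate-independent system: nonconvex (double-well) energy E(t, z) tilted by the
# load t, with dissipation |dz|. Incremental step: minimise energy + dissipation
# from the previous state over a brute-force grid.
E = lambda t, z: (z**2 - 1.0)**2 - t * z
zs = np.linspace(-2.0, 2.0, 4001)             # state grid

z_prev, trajectory = -1.0, []
for t in np.linspace(0.0, 2.0, 201):
    k = np.argmin(E(t, zs) + np.abs(zs - z_prev))   # time-incremental minimisation step
    z_prev = zs[k]
    trajectory.append((t, z_prev))

# The state jumps from the left to the right well at some load t; such
# discontinuities are exactly what BV-solutions are designed to capture.
print(trajectory[::40])
```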


1994 ◽  
Vol 05 (01) ◽  
pp. 67-75 ◽  
Author(s):  
BYOUNG-TAK ZHANG

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
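A minimal sketch of the idea of growing the training set with "critical" examples follows, using a plain logistic regression as a stand-in for a small network and scoring criticality by the current model's prediction error. The criticality measure derived in the paper is more principled, so the scoring here (and the function names) should be read as placeholders.

```python
import numpy as np

def train_logreg(X, y, epochs=200, lr=0.1):
    """Plain gradient-descent logistic regression (stand-in for a small network)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def incremental_training(X, y, init=20, add=20, rounds=5):
    """Start from a small training set and repeatedly add the pool examples the
    current model finds most 'critical' (largest error, an illustrative proxy)."""
    rng = np.random.default_rng(0)
    selected = list(rng.choice(len(X), size=init, replace=False))
    for _ in range(rounds):
        w = train_logreg(X[selected], y[selected])
        pool = np.setdiff1d(np.arange(len(X)), selected)
        if len(pool) == 0:
            break
        p = 1.0 / (1.0 + np.exp(-X[pool] @ w))
        criticality = np.abs(p - y[pool])          # placeholder criticality measure
        selected += list(pool[np.argsort(-criticality)[:add]])
    return w, selected

# Usage on a toy linearly separable problem (last column is a bias feature).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2)); X = np.hstack([X, np.ones((500, 1))])
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w, selected = incremental_training(X, y)
print(len(selected), w)
```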


2016 ◽  
Vol 13 (10) ◽  
pp. 1650116 ◽  
Author(s):  
Derya Kahveci ◽  
Yusuf Yaylı ◽  
İsmail Gök

The aim of this paper is to give the geometrical and algebraic interpretations of the Euler–Rodrigues formula in Minkowski 3-space. First, for a given non-lightlike unit axis in Minkowski 3-space and a rotation angle, the spatial displacement is represented by a 3 x 3 semi-orthogonal rotation matrix using orthogonal projection. Second, we obtain the classifications of the Euler–Rodrigues formula in terms of the semi-skew-symmetric matrix corresponding to a spacelike, timelike or lightlike axis and the rotation angle, with the help of the exponential map. Finally, an alternative method is given to find the rotation axis, and the Euler–Rodrigues formula is expressed via split quaternions in Minkowski 3-space.
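For reference, the classical Euclidean Euler–Rodrigues (Rodrigues rotation) formula builds the rotation matrix from a unit axis and an angle via a skew-symmetric matrix and the exponential map; the Minkowski version studied in the paper instead uses semi-skew-symmetric matrices and distinguishes spacelike, timelike and lightlike axes. A sketch of the Euclidean case only:

```python
import numpy as np

def rodrigues(axis, theta):
    """Euclidean Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2,
    where K is the skew-symmetric matrix of the unit rotation axis.
    (The Minkowski-space analogue uses a semi-skew-symmetric K and case
    distinctions for spacelike/timelike/lightlike axes.)"""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Usage: rotate a vector by 90 degrees around the z-axis.
R = rodrigues([0, 0, 1], np.pi / 2)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))    # approximately [0, 1, 0]
print(np.allclose(R.T @ R, np.eye(3)))               # R is orthogonal
```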


2009 ◽  
Vol 2009 ◽  
pp. 1-11
Author(s):  
Ram U. Verma

Based on a notion of relatively maximal (m)-relaxed monotonicity, the approximation solvability of a general class of inclusion problems is discussed, generalizing Rockafellar's theorem (1976) on linear convergence of the proximal point algorithm in a real Hilbert space setting. The convergence analysis based on this new model is simpler and more compact than the celebrated technique of Rockafellar, in which the Lipschitz continuity at 0 of the inverse of the set-valued mapping is applied. Furthermore, it can be used to generalize the Yosida approximation, which, in turn, can be applied to first-order evolution equations as well as evolution inclusions.
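The classical proximal point algorithm referred to above iterates x_{k+1} = (I + c T)^{-1}(x_k) for a maximal monotone operator T. A minimal sketch for the special case T = subdifferential of the l1-norm, whose resolvent is soft-thresholding; the relatively maximal relaxed-monotone setting of the paper is considerably more general than this example.

```python
import numpy as np

def proximal_point(x0, resolvent, c=1.0, iters=50):
    """Proximal point algorithm: x_{k+1} = (I + c*T)^{-1} x_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = resolvent(x, c)
    return x

def soft_threshold(x, c):
    """Resolvent of T = subdifferential of the l1-norm (proximal map of c*||.||_1)."""
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)

# Usage: the iterates converge to a zero of T (here the minimiser of the l1-norm).
print(proximal_point(np.array([3.0, -0.4, 1.2]), soft_threshold, c=1.0, iters=5))
```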


2018 ◽  
Author(s):  
Benjamin James Lansdell ◽  
Konrad Paul Kording

When a neuron is driven beyond its threshold it spikes, and the fact that it does not communicate its continuous membrane potential is usually seen as a computational liability. Here we show that this spiking mechanism allows neurons to produce an unbiased estimate of their causal influence, and a way of approximating gradient descent learning. Importantly, neither activity of upstream neurons, which act as confounders, nor downstream non-linearities bias the results. By introducing a local discontinuity with respect to their input drive, we show how spiking enables neurons to solve causal estimation and learning problems.
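The causal-estimation idea can be illustrated with a regression-discontinuity style comparison: among trials where the input drive lands just below versus just above threshold, the confounders are nearly balanced, so the difference in downstream outcome estimates the spike's causal effect. A simplified simulation sketch (a toy model with made-up parameters, not the paper's network or learning rule):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, threshold, true_effect = 200_000, 1.0, 0.7

confounder = rng.standard_normal(n_trials)          # shared upstream activity
drive = confounder + rng.standard_normal(n_trials)  # neuron's input drive (confounded)
spike = (drive > threshold).astype(float)           # spiking nonlinearity
reward = true_effect * spike + 2.0 * confounder + 0.1 * rng.standard_normal(n_trials)

# Naive correlational estimate is biased by the confounder.
naive = reward[spike == 1].mean() - reward[spike == 0].mean()

# Discontinuity-based estimate: compare trials just above vs just below threshold.
eps = 0.05
just_above = (drive > threshold) & (drive < threshold + eps)
just_below = (drive < threshold) & (drive > threshold - eps)
local = reward[just_above].mean() - reward[just_below].mean()

print(f"naive estimate: {naive:.2f}")   # far from the true effect 0.7
print(f"local estimate: {local:.2f}")   # close to 0.7
```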

