Convergence Analysis of Gradient Descent for Eigenvector Computation

Author(s):  
Zhiqiang Xu ◽  
Xin Cao ◽  
Xin Gao

We present a novel, simple, and systematic convergence analysis of gradient descent for eigenvector computation. As a popular, practical, and provable approach to numerous machine learning problems, gradient descent has also found successful application to eigenvector computation. Surprisingly, however, it has lacked a thorough theoretical analysis for the underlying geodesically non-convex problem. In this work, the gradient descent solver for the leading eigenvector computation is shown to converge at the global rate O(min{ (lambda_1/Delta_p)^2 log(1/epsilon), 1/epsilon }), where Delta_p = lambda_p - lambda_{p+1} > 0 is the generalized positive eigengap, which always exists without loss of generality; here lambda_i denotes the i-th largest eigenvalue of the given real symmetric matrix and p the multiplicity of lambda_1. The rate is linear, (lambda_1/Delta_p)^2 log(1/epsilon), when (lambda_1/Delta_p)^2 = O(1), and sub-linear, O(1/epsilon), otherwise. We also show that the convergence depends only logarithmically, rather than quadratically, on the initial iterate. In particular, this is the first time that linear convergence has been established for the gradient descent solver in the case where the conventionally considered eigengap Delta_1 = lambda_1 - lambda_2 = 0 but the generalized eigengap satisfies (lambda_1/Delta_p)^2 = O(1), and the first time the logarithmic dependence on the initial iterate has been established. We are also the first to leverage, in the analysis, the logarithm of the principal angle between the iterate and the space of globally optimal solutions. The theoretical properties are verified in experiments.
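For concreteness, the solver analysed here is essentially projected (Riemannian) gradient ascent on the Rayleigh quotient. A minimal numpy sketch under that reading; the step size and iteration count below are heuristic choices for illustration, not the schedule analysed in the paper.

```python
import numpy as np

def leading_eigenvector_gd(A, steps=500, eta=None, seed=0):
    """Projected gradient ascent on the Rayleigh quotient x^T A x on the unit sphere.

    A is a real symmetric matrix; returns an approximate leading eigenvector.
    The step size is a heuristic (1 / spectral norm), not the paper's schedule.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    if eta is None:
        eta = 1.0 / np.linalg.norm(A, 2)
    for _ in range(steps):
        g = A @ x - (x @ A @ x) * x   # Riemannian gradient of the Rayleigh quotient
        x = x + eta * g
        x /= np.linalg.norm(x)        # retract back onto the unit sphere
    return x

# Usage: compare against numpy's eigendecomposition on a random symmetric matrix.
A = np.random.default_rng(1).standard_normal((50, 50))
A = (A + A.T) / 2
x = leading_eigenvector_gd(A)
w, V = np.linalg.eigh(A)
print(abs(x @ V[:, -1]))  # close to 1 when the principal angle to the top eigenvector is small
```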

Author(s):  
Aijun Xue ◽  
Xiaodan Wang

Many real-world applications involve multiclass cost-sensitive learning problems. However, some well-established binary cost-sensitive learning algorithms cannot be extended directly to the multiclass setting. It is therefore useful to decompose the complex multiclass cost-sensitive classification problem into a series of binary cost-sensitive classification problems. In this paper we propose an alternative and efficient decomposition framework based on the original error correcting output codes. The main problem in our framework is how to evaluate the binary costs for each binary cost-sensitive base classifier. To solve this problem, we propose computing the expected misclassification costs from the given multiclass cost matrix, and we give general formulations for computing the binary costs. Experimental results on several synthetic and UCI datasets show that our method obtains performance comparable to state-of-the-art methods.
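To make the cost derivation concrete: each column of an ECOC code matrix groups the original classes into two meta-classes, and a binary cost can be taken as the expected multiclass cost of confusing one meta-class with the other. The sketch below assumes a uniform prior over the classes in each meta-class and a simple one-vs-all style code; the function name and this weighting are illustrative and may differ from the paper's exact formulation.

```python
import numpy as np

def binary_costs_from_ecoc(cost_matrix, code_matrix):
    """For each ECOC column, derive a 2x2 binary cost matrix.

    cost_matrix[i, j]: cost of predicting class j when the true class is i.
    code_matrix[i, b] in {+1, -1}: meta-class of class i in binary problem b.
    Uses a uniform prior within each meta-class (illustrative assumption).
    """
    n_classes, n_bits = code_matrix.shape
    binary_costs = []
    for b in range(n_bits):
        pos = np.where(code_matrix[:, b] == +1)[0]
        neg = np.where(code_matrix[:, b] == -1)[0]
        # expected cost of labelling a positive-group example as negative, and vice versa
        c_pos_as_neg = cost_matrix[np.ix_(pos, neg)].mean()
        c_neg_as_pos = cost_matrix[np.ix_(neg, pos)].mean()
        binary_costs.append(np.array([[0.0, c_pos_as_neg],
                                      [c_neg_as_pos, 0.0]]))
    return binary_costs

# Usage: 3 classes, one-vs-all code, asymmetric multiclass costs.
C = np.array([[0, 1, 5],
              [2, 0, 1],
              [4, 3, 0]], dtype=float)
M = np.array([[+1, -1, -1],
              [-1, +1, -1],
              [-1, -1, +1]])
for b, Bc in enumerate(binary_costs_from_ecoc(C, M)):
    print(f"binary problem {b}:\n{Bc}")
```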


2021 ◽  
Vol 11 (16) ◽  
pp. 7433
Author(s):  
Andrzej Dziech

In this paper, orthogonal transforms based on proposed symmetric, orthogonal matrices are constructed. These transforms can be considered as generalized Walsh–Hadamard transforms. The simplicity of calculating the forward and inverse transforms is one of the important features of the presented approach. The conditions for creating symmetric, orthogonal matrices are defined. It is shown that, to build an orthogonal matrix meeting the given conditions, only a limited number of its elements need to be selected. The general form of an orthogonal, symmetric matrix having an exponential form is also presented. Orthogonal basis functions based on the created matrices can be used for orthogonal expansions leading to signal approximation. An exponential form of orthogonal, sparse matrices with variable parameters is also created. Various versions of orthogonal transforms related to the created full and sparse matrices are proposed. Fast computation of the presented transforms is discussed and compared with fast algorithms for selected orthogonal transforms. Possible applications to signal approximation and examples of image spectra in the considered transform domains are presented.
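The classical Walsh–Hadamard matrix is the simplest instance of such a symmetric, orthogonal transform matrix. A minimal sketch of the forward and inverse transform with a Sylvester-construction Hadamard matrix follows; the paper's generalized matrices allow more freedom in the choice of elements, which this sketch does not attempt to reproduce.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; after normalisation the matrix is symmetric and orthogonal.
    n must be a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)   # symmetric and orthogonal, hence its own inverse

n = 8
H = hadamard(n)
x = np.random.default_rng(0).standard_normal(n)
spectrum = H @ x          # forward transform
x_rec = H @ spectrum      # inverse transform: apply the same matrix again
print(np.allclose(x, x_rec))   # True
```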


2021 ◽  
Vol 0 (0) ◽  
pp. 0
Author(s):  
Zohre Aminifard ◽  
Saman Babaie-Kafaki

Memoryless quasi–Newton updating formulas of BFGS (Broyden–Fletcher–Goldfarb–Shanno) and DFP (Davidon–Fletcher–Powell) type are scaled using well-structured diagonal matrices. In the scaling approach, the diagonal elements as well as the eigenvalues of the scaled memoryless quasi–Newton updating formulas play significant roles. Convergence analysis of the given diagonally scaled quasi–Newton methods is discussed. Finally, the performance of the methods is numerically tested on a set of CUTEr problems as well as on the compressed sensing problem.
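For orientation, a memoryless BFGS step builds the new search direction from a fixed scaling matrix (classically the identity, here a diagonal matrix) and only the most recent step/gradient-difference pair. The sketch below uses a simple Barzilai–Borwein-type scalar as the default diagonal; the well-structured diagonal matrices proposed in the paper are a different, more refined choice, and the function name is hypothetical.

```python
import numpy as np

def memoryless_bfgs_direction(g_new, s, y, diag_scale=None):
    """Search direction d = -H g_new, where H is the memoryless BFGS update of a
    diagonal matrix D using only the latest pair
    (s, y) = (x_k - x_{k-1}, g_k - g_{k-1}).

    diag_scale defaults to the scalar (s.y / y.y) on the diagonal, an illustrative
    choice, not the diagonal structure proposed in the paper.
    """
    sy = s @ y
    if diag_scale is None:
        diag_scale = np.full_like(g_new, sy / (y @ y))
    D = diag_scale                      # diagonal entries of the scaling matrix
    Dy = D * y
    # H g = D g - (s y^T D + D y s^T) g / (s^T y) + (1 + y^T D y / s^T y) (s s^T g) / (s^T y)
    Hg = (D * g_new
          - (s * (Dy @ g_new) + Dy * (s @ g_new)) / sy
          + (1.0 + (y @ Dy) / sy) * s * (s @ g_new) / sy)
    return -Hg

# Usage on a toy quadratic f(x) = 0.5 x^T A x with gradient g(x) = A x.
A = np.diag([1.0, 10.0, 100.0])
x_old, x_new = np.ones(3), np.array([0.9, 0.5, 0.1])
g = lambda x: A @ x
d = memoryless_bfgs_direction(g(x_new), x_new - x_old, g(x_new) - g(x_old))
print(d)
```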


2019 ◽  
Vol 53 (6) ◽  
pp. 1841-1870
Author(s):  
Giovanni Di Fratta ◽  
Thomas Führer ◽  
Gregor Gantner ◽  
Dirk Praetorius

Based on the Uzawa algorithm, we consider an adaptive finite element method for the Stokes system. We prove linear convergence with optimal algebraic rates for the residual estimator (which is equivalent to the total error), if the arising linear systems are solved iteratively, e.g., by PCG. Our analysis avoids the use of discrete efficiency of the estimator. Unlike prior work, our adaptive Uzawa algorithm can thus avoid discretizing the given data and does not rely on an interior node property for the refinement.
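At the algebraic level, the Uzawa algorithm solves the saddle-point system arising from Stokes, [A B^T; B 0][u; p] = [f; g], by iterating on the pressure and solving a velocity problem in each step. A minimal dense-matrix sketch of that iteration follows; the adaptive FEM method of the paper replaces the exact inner solve with an inexact PCG solve and couples it with mesh refinement, neither of which appears here.

```python
import numpy as np

def uzawa(A, B, f, g, alpha, iters=5000):
    """Basic Uzawa iteration for the saddle-point system
        [A  B^T] [u]   [f]
        [B   0 ] [p] = [g]
    with A symmetric positive definite; alpha is a damping parameter.
    """
    p = np.zeros(B.shape[0])
    for _ in range(iters):
        u = np.linalg.solve(A, f - B.T @ p)   # inner (velocity) solve; PCG in practice
        p = p + alpha * (B @ u - g)           # pressure update from the divergence residual
    return u, p

# Usage on a small random stable system.
rng = np.random.default_rng(0)
n, m = 6, 3
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
f, g = rng.standard_normal(n), rng.standard_normal(m)
S = B @ np.linalg.solve(A, B.T)                    # Schur complement, used to pick a safe alpha
u, p = uzawa(A, B, f, g, alpha=1.0 / np.linalg.norm(S, 2))
# Both residuals shrink towards zero as the iteration proceeds.
print(np.linalg.norm(A @ u + B.T @ p - f), np.linalg.norm(B @ u - g))
```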


2019 ◽  
Vol 25 ◽  
pp. 65 ◽  
Author(s):  
Dorothee Knees

It is well known that rate-independent systems involving nonconvex energy functionals in general do not allow for time-continuous solutions, even if the given data are smooth. In recent years, several solution concepts were proposed that include discontinuities in the notion of solution, among them the class of global energetic solutions and the class of BV-solutions. In general, these solution concepts are not equivalent, and numerical schemes are needed that reliably approximate the type of solution one is interested in. In this paper, we analyse the convergence of solutions of three time-discretisation schemes, namely an approach based on local minimisation, a relaxed version of it, and an alternate minimisation scheme. For all three cases, we show that under suitable conditions on the discretisation parameters the discrete solutions converge to limit functions that belong to the class of BV-solutions. The proofs rely on a reparametrisation argument. We illustrate the different schemes with a toy example.
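A toy 1D example shows the kind of discretisation step involved: at each time, the new state minimises the (nonconvex) energy plus the rate-independent dissipation from the previous state. The sketch below uses a plain incremental minimisation on a grid to illustrate the jump behaviour; the three schemes analysed in the paper (local, relaxed local, alternate minimisation) localise or split this minimisation differently, and this particular energy is only an illustrative choice.

```python
import numpy as np

# Toy rate-independent system: nonconvex (double-well) energy E(t, z) tilted by the
# load t, with dissipation |dz|. Incremental step: minimise energy + dissipation
# from the previous state over a brute-force grid.
E = lambda t, z: (z**2 - 1.0)**2 - t * z
zs = np.linspace(-2.0, 2.0, 4001)             # state grid

z_prev, trajectory = -1.0, []
for t in np.linspace(0.0, 2.0, 201):
    k = np.argmin(E(t, zs) + np.abs(zs - z_prev))   # time-incremental minimisation step
    z_prev = zs[k]
    trajectory.append((t, z_prev))

# The state jumps from the left to the right well at some load t; such
# discontinuities are exactly what BV-solutions are designed to capture.
print(trajectory[::40])
```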


1994 ◽  
Vol 05 (01) ◽  
pp. 67-75 ◽  
Author(s):  
BYOUNG-TAK ZHANG

Much previous work on training multilayer neural networks has attempted to speed up the backpropagation algorithm using more sophisticated weight modification rules, whereby all the given training examples are used in a random or predetermined sequence. In this paper we investigate an alternative approach in which the learning proceeds on an increasing number of selected training examples, starting with a small training set. We derive a measure of criticality of examples and present an incremental learning algorithm that uses this measure to select a critical subset of given examples for solving the particular task. Our experimental results suggest that the method can significantly improve training speed and generalization performance in many real applications of neural networks. This method can be used in conjunction with other variations of gradient descent algorithms.
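A minimal sketch of the idea of growing the training set with "critical" examples follows, using a plain logistic regression as a stand-in for a small network and scoring criticality by the current model's prediction error. The criticality measure derived in the paper is more principled, so the scoring here (and the function names) should be read as placeholders.

```python
import numpy as np

def train_logreg(X, y, epochs=200, lr=0.1):
    """Plain gradient-descent logistic regression (stand-in for a small network)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def incremental_training(X, y, init=20, add=20, rounds=5):
    """Start from a small training set and repeatedly add the pool examples the
    current model finds most 'critical' (largest error, an illustrative proxy)."""
    rng = np.random.default_rng(0)
    selected = list(rng.choice(len(X), size=init, replace=False))
    for _ in range(rounds):
        w = train_logreg(X[selected], y[selected])
        pool = np.setdiff1d(np.arange(len(X)), selected)
        if len(pool) == 0:
            break
        p = 1.0 / (1.0 + np.exp(-X[pool] @ w))
        criticality = np.abs(p - y[pool])          # placeholder criticality measure
        selected += list(pool[np.argsort(-criticality)[:add]])
    return w, selected

# Usage on a toy linearly separable problem (last column is a bias feature).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2)); X = np.hstack([X, np.ones((500, 1))])
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w, selected = incremental_training(X, y)
print(len(selected), w)
```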


2016 ◽  
Vol 13 (10) ◽  
pp. 1650116 ◽  
Author(s):  
Derya Kahveci ◽  
Yusuf Yaylı ◽  
İsmail Gök

The aim of this paper is to give the geometrical and algebraic interpretations of the Euler–Rodrigues formula in Minkowski 3-space. First, for a given non-lightlike unit axis in Minkowski 3-space and a rotation angle, the spatial displacement is represented by a 3 x 3 semi-orthogonal rotation matrix using orthogonal projection. Second, we obtain the classifications of the Euler–Rodrigues formula in terms of the semi-skew-symmetric matrix corresponding to a spacelike, timelike or lightlike axis and the rotation angle, with the help of the exponential map. Finally, an alternative method is given to find the rotation axis, and the Euler–Rodrigues formula is expressed via split quaternions in Minkowski 3-space.
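For reference, the classical Euclidean Euler–Rodrigues (Rodrigues rotation) formula builds the rotation matrix from a unit axis and an angle via a skew-symmetric matrix and the exponential map; the Minkowski version studied in the paper instead uses semi-skew-symmetric matrices and distinguishes spacelike, timelike and lightlike axes. A sketch of the Euclidean case only:

```python
import numpy as np

def rodrigues(axis, theta):
    """Euclidean Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2,
    where K is the skew-symmetric matrix of the unit rotation axis.
    (The Minkowski-space analogue uses a semi-skew-symmetric K and case
    distinctions for spacelike/timelike/lightlike axes.)"""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Usage: rotate a vector by 90 degrees around the z-axis.
R = rodrigues([0, 0, 1], np.pi / 2)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))    # approximately [0, 1, 0]
print(np.allclose(R.T @ R, np.eye(3)))               # R is orthogonal
```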


2009 ◽  
Vol 2009 ◽  
pp. 1-11
Author(s):  
Ram U. Verma

Based on a notion of relatively maximal (m)-relaxed monotonicity, the approximation solvability of a general class of inclusion problems is discussed, generalizing Rockafellar's theorem (1976) on linear convergence of the proximal point algorithm in a real Hilbert space setting. The convergence analysis based on this new model is simpler and more compact than the celebrated technique of Rockafellar, in which the Lipschitz continuity at 0 of the inverse of the set-valued mapping is applied. Furthermore, it can be used to generalize the Yosida approximation, which, in turn, can be applied to first-order evolution equations as well as evolution inclusions.
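The classical proximal point algorithm referred to above iterates x_{k+1} = (I + c T)^{-1}(x_k) for a maximal monotone operator T. A minimal sketch for the special case T = subdifferential of the l1-norm, whose resolvent is soft-thresholding; the relatively maximal relaxed-monotone setting of the paper is considerably more general than this example.

```python
import numpy as np

def proximal_point(x0, resolvent, c=1.0, iters=50):
    """Proximal point algorithm: x_{k+1} = (I + c*T)^{-1} x_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = resolvent(x, c)
    return x

def soft_threshold(x, c):
    """Resolvent of T = subdifferential of the l1-norm (proximal map of c*||.||_1)."""
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)

# Usage: the iterates converge to a zero of T (here the minimiser of the l1-norm).
print(proximal_point(np.array([3.0, -0.4, 1.2]), soft_threshold, c=1.0, iters=5))
```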


2018 ◽  
Author(s):  
Benjamin James Lansdell ◽  
Konrad Paul Kording

When a neuron is driven beyond its threshold it spikes, and the fact that it does not communicate its continuous membrane potential is usually seen as a computational liability. Here we show that this spiking mechanism allows neurons to produce an unbiased estimate of their causal influence, and a way of approximating gradient descent learning. Importantly, neither activity of upstream neurons, which act as confounders, nor downstream non-linearities bias the results. By introducing a local discontinuity with respect to their input drive, we show how spiking enables neurons to solve causal estimation and learning problems.
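The causal-estimation idea can be illustrated with a regression-discontinuity style comparison: among trials where the input drive lands just below versus just above threshold, the confounders are nearly balanced, so the difference in downstream outcome estimates the spike's causal effect. A simplified simulation sketch (a toy model with made-up parameters, not the paper's network or learning rule):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, threshold, true_effect = 200_000, 1.0, 0.7

confounder = rng.standard_normal(n_trials)          # shared upstream activity
drive = confounder + rng.standard_normal(n_trials)  # neuron's input drive (confounded)
spike = (drive > threshold).astype(float)           # spiking nonlinearity
reward = true_effect * spike + 2.0 * confounder + 0.1 * rng.standard_normal(n_trials)

# Naive correlational estimate is biased by the confounder.
naive = reward[spike == 1].mean() - reward[spike == 0].mean()

# Discontinuity-based estimate: compare trials just above vs just below threshold.
eps = 0.05
just_above = (drive > threshold) & (drive < threshold + eps)
just_below = (drive < threshold) & (drive > threshold - eps)
local = reward[just_above].mean() - reward[just_below].mean()

print(f"naive estimate: {naive:.2f}")   # far from the true effect 0.7
print(f"local estimate: {local:.2f}")   # close to 0.7
```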

