Dynamics of the adaptive natural gradient descent method for soft committee machines

2004 ◽  
Vol 69 (5) ◽  
Author(s):  
Masato Inoue ◽  
Hyeyoung Park ◽  
Masato Okada

2019 ◽  
Vol 9 (21) ◽  
pp. 4568
Author(s):  
Hyeyoung Park ◽  
Kwanyong Lee

The gradient descent method is an essential algorithm for training neural networks. Among the many variants of gradient descent developed to accelerate learning, natural gradient learning is grounded in the information geometry of the stochastic neuromanifold and is known to have ideal convergence properties. Despite these theoretical advantages, the pure natural gradient has limitations that prevent its practical use: obtaining its explicit value requires knowing the true probability distribution of the input variables and inverting a matrix whose dimension equals the number of parameters. Although an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for online learning, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for mini-batch learning, the mode commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and the learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has superior convergence properties compared to conventional methods.
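The adaptive estimation sidesteps both obstacles by maintaining a running estimate of the inverse Fisher matrix built from the gradients themselves, so no input distribution or explicit matrix inversion is needed. The following is a minimal NumPy sketch of one such mini-batch update, following the general Amari-style adaptive inverse-Fisher rule rather than the paper's exact update equations; the function and variable names (`adaptive_natural_gradient_step`, `fisher_inv`, `eps_t`) are illustrative assumptions.

```python
import numpy as np

def adaptive_natural_gradient_step(theta, grad_fn, batch, fisher_inv,
                                   lr=0.01, eps_t=0.01):
    """One mini-batch step of adaptive natural gradient descent (sketch).

    theta      : parameter vector, shape (p,)
    grad_fn    : callable(theta, x, t) -> per-example loss gradient, shape (p,)
    batch      : list of (x, t) pairs forming the current mini-batch
    fisher_inv : running estimate of the inverse Fisher matrix, shape (p, p)
    """
    grads = np.stack([grad_fn(theta, x, t) for x, t in batch])   # (B, p)
    g_mean = grads.mean(axis=0)

    # Adaptive update of the inverse Fisher estimate,
    #   G^{-1} <- (1 + eps) G^{-1} - eps * G^{-1} E[g g^T] G^{-1},
    # with the expectation replaced by the mini-batch average.
    vs = grads @ fisher_inv            # row i is (G^{-1} g_i)^T (G^{-1} symmetric)
    fisher_inv = (1.0 + eps_t) * fisher_inv - eps_t * (vs.T @ vs) / len(batch)

    # Natural gradient parameter update with the mini-batch gradient.
    theta = theta - lr * fisher_inv @ g_mean
    return theta, fisher_inv


# Toy usage: linear regression with squared-error loss (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=256)

def grad_fn(theta, x, t):
    return (x @ theta - t) * x         # gradient of 0.5 * (x.theta - t)^2

theta, fisher_inv = np.zeros(3), np.eye(3)
for s in range(0, len(X), 32):         # mini-batches of size 32
    batch = list(zip(X[s:s+32], y[s:s+32]))
    theta, fisher_inv = adaptive_natural_gradient_step(theta, grad_fn, batch, fisher_inv)
```

Because the inverse Fisher estimate is updated incrementally from mini-batch gradients, the cost per step stays at matrix-vector and rank-style products rather than a full matrix inversion.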


1998 ◽  
Vol 10 (8) ◽  
pp. 2137-2157 ◽  
Author(s):  
Howard Hua Yang ◽  
Shun-ichi Amari

The natural gradient descent method is applied to train an n-m-1 multilayer perceptron. Based on an efficient scheme to represent the Fisher information matrix for an n-m-1 stochastic multilayer perceptron, a new algorithm is proposed to calculate the natural gradient without inverting the Fisher information matrix explicitly. When the input dimension n is much larger than the number of hidden neurons m, the time complexity of computing the natural gradient is O(n).
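For contrast, the naive natural gradient step forms the empirical Fisher matrix explicitly and solves a p x p linear system, which costs O(p^3) in the number of parameters p; this is precisely the inversion the paper's representation scheme avoids. Below is a hedged NumPy sketch of that baseline for a small n-m-1 perceptron with tanh hidden units and Gaussian output noise; the model, the damping term, and all names are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np

def mlp_forward(params, x):
    """n-m-1 perceptron: y = sum_j v_j * tanh(w_j . x)."""
    W, v = params                       # W: (m, n), v: (m,)
    h = np.tanh(W @ x)                  # hidden activations, shape (m,)
    return v @ h, h

def loglik_grad(params, x, t, sigma=1.0):
    """Gradient (score) of the Gaussian log-likelihood w.r.t. all parameters,
    flattened into a single vector."""
    W, v = params
    y, h = mlp_forward(params, x)
    err = (t - y) / sigma**2
    grad_v = err * h                                  # (m,)
    grad_W = err * np.outer(v * (1.0 - h**2), x)      # (m, n)
    return np.concatenate([grad_W.ravel(), grad_v])

def naive_natural_gradient_step(params, data, lr=0.1, damping=1e-3):
    """O(p^3) baseline: build the empirical Fisher matrix and solve it explicitly."""
    W, v = params
    scores = np.stack([loglik_grad(params, x, t) for x, t in data])   # (N, p)
    fisher = scores.T @ scores / len(data) + damping * np.eye(scores.shape[1])
    mean_grad = scores.mean(axis=0)
    nat_grad = np.linalg.solve(fisher, mean_grad)      # explicit p x p solve
    flat = np.concatenate([W.ravel(), v]) + lr * nat_grad  # ascent on log-likelihood
    return flat[:W.size].reshape(W.shape), flat[W.size:]
```

The point of the paper's scheme is that, by exploiting the structure of the Fisher matrix for the n-m-1 model, the natural gradient can be obtained without this solve, reducing the cost to O(n) when n is much larger than m.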


2020 ◽  
Vol 34 (04) ◽  
pp. 6909-6916 ◽  
Author(s):  
Pu Zhao ◽  
Pin-yu Chen ◽  
Siyue Wang ◽  
Xue Lin

Despite the great achievements of modern deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains that require high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among these, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt the first-order gradient descent method, which suffers from deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method for designing adversarial attacks, which incorporates a zeroth-order gradient estimation technique catering to the black-box attack scenario and second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD obtains significantly lower model query complexities than state-of-the-art attack methods.
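The zeroth-order ingredient can be illustrated in isolation: the attacker only observes loss values, so the gradient is estimated from random finite differences and then preconditioned before the update. The sketch below is a generic random-direction estimator paired with a diagonal second-moment preconditioner as a crude stand-in for the Fisher matrix; it is not the authors' ZO-NGD algorithm, and `loss_fn`, `q`, and `mu` are illustrative names.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, x, q=20, mu=1e-3, rng=None):
    """Random-direction zeroth-order gradient estimate.

    Uses only loss-value queries: for q random unit directions u_i,
      g_hat ~ (d / (q * mu)) * sum_i [loss(x + mu*u_i) - loss(x)] * u_i
    """
    rng = rng or np.random.default_rng()
    d = x.size
    base = loss_fn(x)
    g_hat = np.zeros(d)
    for _ in range(q):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g_hat += (loss_fn(x + mu * u) - base) / mu * u
    return g_hat * d / q

def preconditioned_zo_step(loss_fn, x, fisher_diag, lr=0.01, beta=0.9, **kw):
    """One update that rescales the zeroth-order gradient by a running
    diagonal second-moment estimate (a rough Fisher-style preconditioner)."""
    g = zo_gradient_estimate(loss_fn, x, **kw)
    fisher_diag = beta * fisher_diag + (1.0 - beta) * g**2
    x = x - lr * g / (np.sqrt(fisher_diag) + 1e-8)
    return x, fisher_diag
```

Each estimate costs q + 1 loss queries, which is why query-efficient preconditioning matters: the better the update direction extracted from each estimate, the fewer queries the attack needs overall.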

