Dynamics of the adaptive natural gradient descent method for soft committee machines

2004 ◽  
Vol 69 (5) ◽  
Author(s):  
Masato Inoue ◽  
Hyeyoung Park ◽  
Masato Okada

2019 ◽  
Vol 9 (21) ◽  
pp. 4568
Author(s):  
Hyeyoung Park ◽  
Kwanyong Lee

The gradient descent method is an essential algorithm for training neural networks. Among the many variants of gradient descent developed to accelerate learning, natural gradient learning is grounded in the information geometry of the stochastic neuromanifold and is known to have ideal convergence properties. Despite these theoretical advantages, the pure natural gradient has limitations that prevent its practical use: obtaining its explicit value requires knowing the true probability distribution of the input variables and inverting a matrix whose dimension equals the number of parameters. Although an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for online learning, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for mini-batch learning, the mode commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and the learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has superior convergence properties compared to conventional methods.
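The adaptive estimation sidesteps both obstacles by maintaining a running estimate of the inverse Fisher matrix built from the gradients themselves, so no input distribution or explicit matrix inversion is needed. The following is a minimal NumPy sketch of one such mini-batch update, following the general Amari-style adaptive inverse-Fisher rule rather than the paper's exact update equations; the function and variable names (`adaptive_natural_gradient_step`, `fisher_inv`, `eps_t`) are illustrative assumptions.

```python
import numpy as np

def adaptive_natural_gradient_step(theta, grad_fn, batch, fisher_inv,
                                   lr=0.01, eps_t=0.01):
    """One mini-batch step of adaptive natural gradient descent (sketch).

    theta      : parameter vector, shape (p,)
    grad_fn    : callable(theta, x, t) -> per-example loss gradient, shape (p,)
    batch      : list of (x, t) pairs forming the current mini-batch
    fisher_inv : running estimate of the inverse Fisher matrix, shape (p, p)
    """
    grads = np.stack([grad_fn(theta, x, t) for x, t in batch])   # (B, p)
    g_mean = grads.mean(axis=0)

    # Adaptive update of the inverse Fisher estimate,
    #   G^{-1} <- (1 + eps) G^{-1} - eps * G^{-1} E[g g^T] G^{-1},
    # with the expectation replaced by the mini-batch average.
    vs = grads @ fisher_inv            # row i is (G^{-1} g_i)^T (G^{-1} symmetric)
    fisher_inv = (1.0 + eps_t) * fisher_inv - eps_t * (vs.T @ vs) / len(batch)

    # Natural gradient parameter update with the mini-batch gradient.
    theta = theta - lr * fisher_inv @ g_mean
    return theta, fisher_inv


# Toy usage: linear regression with squared-error loss (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=256)

def grad_fn(theta, x, t):
    return (x @ theta - t) * x         # gradient of 0.5 * (x.theta - t)^2

theta, fisher_inv = np.zeros(3), np.eye(3)
for s in range(0, len(X), 32):         # mini-batches of size 32
    batch = list(zip(X[s:s+32], y[s:s+32]))
    theta, fisher_inv = adaptive_natural_gradient_step(theta, grad_fn, batch, fisher_inv)
```

Because the inverse Fisher estimate is updated incrementally from mini-batch gradients, the cost per step stays at matrix-vector and rank-style products rather than a full matrix inversion.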


1998 ◽  
Vol 10 (8) ◽  
pp. 2137-2157 ◽  
Author(s):  
Howard Hua Yang ◽  
Shun-ichi Amari

The natural gradient descent method is applied to train an n-m-1 multilayer perceptron. Based on an efficient scheme to represent the Fisher information matrix for an n-m-1 stochastic multilayer perceptron, a new algorithm is proposed to calculate the natural gradient without inverting the Fisher information matrix explicitly. When the input dimension n is much larger than the number of hidden neurons m, the time complexity of computing the natural gradient is O(n).
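For contrast, the naive natural gradient step forms the empirical Fisher matrix explicitly and solves a p x p linear system, which costs O(p^3) in the number of parameters p; this is precisely the inversion the paper's representation scheme avoids. Below is a hedged NumPy sketch of that baseline for a small n-m-1 perceptron with tanh hidden units and Gaussian output noise; the model, the damping term, and all names are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np

def mlp_forward(params, x):
    """n-m-1 perceptron: y = sum_j v_j * tanh(w_j . x)."""
    W, v = params                       # W: (m, n), v: (m,)
    h = np.tanh(W @ x)                  # hidden activations, shape (m,)
    return v @ h, h

def loglik_grad(params, x, t, sigma=1.0):
    """Gradient (score) of the Gaussian log-likelihood w.r.t. all parameters,
    flattened into a single vector."""
    W, v = params
    y, h = mlp_forward(params, x)
    err = (t - y) / sigma**2
    grad_v = err * h                                  # (m,)
    grad_W = err * np.outer(v * (1.0 - h**2), x)      # (m, n)
    return np.concatenate([grad_W.ravel(), grad_v])

def naive_natural_gradient_step(params, data, lr=0.1, damping=1e-3):
    """O(p^3) baseline: build the empirical Fisher matrix and solve it explicitly."""
    W, v = params
    scores = np.stack([loglik_grad(params, x, t) for x, t in data])   # (N, p)
    fisher = scores.T @ scores / len(data) + damping * np.eye(scores.shape[1])
    mean_grad = scores.mean(axis=0)
    nat_grad = np.linalg.solve(fisher, mean_grad)      # explicit p x p solve
    flat = np.concatenate([W.ravel(), v]) + lr * nat_grad  # ascent on log-likelihood
    return flat[:W.size].reshape(W.shape), flat[W.size:]
```

The point of the paper's scheme is that, by exploiting the structure of the Fisher matrix for the n-m-1 model, the natural gradient can be obtained without this solve, reducing the cost to O(n) when n is much larger than m.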


2020 ◽  
Vol 34 (04) ◽  
pp. 6909-6916 ◽  
Author(s):  
Pu Zhao ◽  
Pin-yu Chen ◽  
Siyue Wang ◽  
Xue Lin

Despite the great achievements of modern deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains that require high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among these, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt the first-order gradient descent method, which suffers from deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method for designing adversarial attacks, which incorporates a zeroth-order gradient estimation technique catering to the black-box attack scenario and second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD obtains significantly lower model query complexities than state-of-the-art attack methods.
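The zeroth-order ingredient can be illustrated in isolation: the attacker only observes loss values, so the gradient is estimated from random finite differences and then preconditioned before the update. The sketch below is a generic random-direction estimator paired with a diagonal second-moment preconditioner as a crude stand-in for the Fisher matrix; it is not the authors' ZO-NGD algorithm, and `loss_fn`, `q`, and `mu` are illustrative names.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, x, q=20, mu=1e-3, rng=None):
    """Random-direction zeroth-order gradient estimate.

    Uses only loss-value queries: for q random unit directions u_i,
      g_hat ~ (d / (q * mu)) * sum_i [loss(x + mu*u_i) - loss(x)] * u_i
    """
    rng = rng or np.random.default_rng()
    d = x.size
    base = loss_fn(x)
    g_hat = np.zeros(d)
    for _ in range(q):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g_hat += (loss_fn(x + mu * u) - base) / mu * u
    return g_hat * d / q

def preconditioned_zo_step(loss_fn, x, fisher_diag, lr=0.01, beta=0.9, **kw):
    """One update that rescales the zeroth-order gradient by a running
    diagonal second-moment estimate (a rough Fisher-style preconditioner)."""
    g = zo_gradient_estimate(loss_fn, x, **kw)
    fisher_diag = beta * fisher_diag + (1.0 - beta) * g**2
    x = x - lr * g / (np.sqrt(fisher_diag) + 1e-8)
    return x, fisher_diag
```

Each estimate costs q + 1 loss queries, which is why query-efficient preconditioning matters: the better the update direction extracted from each estimate, the fewer queries the attack needs overall.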

