Adaptive Natural Gradient Method for Learning of Stochastic Neural Networks in Mini-Batch Mode

2019 ◽  
Vol 9 (21) ◽  
pp. 4568
Author(s):  
Hyeyoung Park ◽  
Kwanyong Lee

The gradient descent method is an essential algorithm for training neural networks. Among the many variants of gradient descent developed to accelerate learning, natural gradient learning is based on the theory of information geometry on the stochastic neuromanifold and is known to have ideal convergence properties. Despite its theoretical advantages, the pure natural gradient has limitations that prevent its practical use: obtaining its explicit value requires knowing the true probability distribution of the input variables and inverting a matrix whose dimension equals the number of parameters. Although an adaptive estimation of the natural gradient has been proposed as a solution, it was originally developed for the online learning mode, which is computationally inefficient for learning from large data sets. In this paper, we propose a novel adaptive natural gradient estimation for the mini-batch learning mode, which is commonly adopted for big data analysis. For two representative stochastic neural network models, we present explicit parameter update rules and the corresponding learning algorithm. Through experiments on three benchmark problems, we confirm that the proposed method has convergence properties superior to those of conventional methods.
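As a rough illustration of the idea, the sketch below implements a mini-batch natural gradient step with an Amari-style recursive estimate of the inverse Fisher matrix. The function name, step sizes, and batch-averaging scheme are illustrative assumptions; the paper derives explicit, model-specific update rules that are not reproduced here.

```python
import numpy as np

def adaptive_natural_gradient_step(theta, grads, G_inv, eta=0.01, eps=0.001):
    """One mini-batch natural gradient step with a recursive estimate of
    the inverse Fisher matrix (illustrative; not the paper's exact rules).

    theta : (d,) current parameter vector
    grads : (B, d) per-example loss gradients for the mini-batch
    G_inv : (d, d) running estimate of the inverse Fisher matrix
    """
    # Recursive inverse-Fisher update applied per example:
    #   G_inv <- (1 + eps) * G_inv - eps * (G_inv g)(G_inv g)^T
    for g in grads:
        v = G_inv @ g
        G_inv = (1.0 + eps) * G_inv - eps * np.outer(v, v)
    g_bar = grads.mean(axis=0)              # mini-batch gradient
    theta = theta - eta * (G_inv @ g_bar)   # preconditioned (natural) step
    return theta, G_inv
```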

2014 ◽  
pp. 99-106
Author(s):  
Leonid Makhnist ◽  
Nikolaj Maniakov

Two new techniques for training multilayer neural networks are proposed. Their basic concept is based on the gradient descent method. For each technique, formulas for calculating the adaptive training steps are given. The matrix algorithmizations presented for these techniques are helpful in their program implementation.
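The abstract does not reproduce the step-length formulas themselves, so the sketch below uses the Barzilai-Borwein rule purely as a stand-in example of an adaptive training step computed from gradient and parameter shifts.

```python
import numpy as np

def gd_adaptive_step(grad_f, x, n_iter=100, alpha0=0.1):
    """Gradient descent with an adaptive step length (Barzilai-Borwein
    rule as an illustrative stand-in, not the paper's formulas)."""
    g_prev = grad_f(x)
    x_prev = x.copy()
    x = x - alpha0 * g_prev                  # first step uses a fixed length
    for _ in range(n_iter):
        g = grad_f(x)
        s, y = x - x_prev, g - g_prev        # parameter and gradient shifts
        alpha = (s @ s) / (s @ y + 1e-12)    # adaptive step length
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Usage on a simple quadratic f(x) = x.x with minimum at the origin:
x_min = gd_adaptive_step(lambda x: 2.0 * x, np.array([3.0, -4.0]))
```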


Author(s):  
Stefan Balluff ◽  
Jörg Bendfeld ◽  
Stefan Krauter

Gathering knowledge not only of the current but also of the upcoming wind speed is becoming more and more important as experience in operating and maintaining wind turbines grows. This matters not only for operation and maintenance tasks such as gearbox and generator checks, but also because energy providers have to sell the right amount of their converted energy on the European energy markets: knowledge of the wind, and hence of the electrical power of the next day, is of key importance. Selling more energy than has been offered is penalized, as is offering less energy than contractually promised. In addition, the price per offered kWh decreases in case of a surplus of energy. Various methods from computer science are available for producing such a forecast: fuzzy logic, linear prediction, and neural networks. This paper presents current results of wind speed forecasts using recurrent neural networks (RNN) trained with the gradient descent method and a backpropagation learning algorithm. The data used have been extracted from NASA's Modern Era-Retrospective analysis for Research and Applications (MERRA), which is produced by a GEOS-5 Earth System Modeling and Data Assimilation system. The presented results show that wind speed can be forecasted by training the RNN on historical data. Nevertheless, the current setup lacks robustness and can be further improved with regard to accuracy.
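As a hedged sketch of this kind of setup, the NumPy script below trains a small Elman-style RNN with truncated backpropagation through time to make one-step-ahead predictions. A synthetic periodic series stands in for the MERRA wind-speed data, and all sizes and rates are illustrative guesses rather than the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a MERRA wind-speed series (illustrative only).
t = np.arange(2000)
wind = 0.5 + 0.3 * np.sin(2 * np.pi * t / 48) + 0.05 * rng.standard_normal(t.size)

H, eta, T = 16, 0.05, 24                    # hidden size, learning rate, window
Wx = rng.normal(0, 0.3, (H, 1))             # input-to-hidden weights
Wh = rng.normal(0, 0.3, (H, H))             # recurrent weights
Wo = rng.normal(0, 0.3, (1, H))             # hidden-to-output weights

for epoch in range(20):
    for start in range(0, wind.size - T - 1, T):
        xs = wind[start:start + T]
        ys = wind[start + 1:start + T + 1]  # one-step-ahead targets
        h = np.zeros((H, 1)); hs = []; zs = []
        for x in xs:                        # forward pass through the window
            h = np.tanh(Wx * x + Wh @ h)
            hs.append(h); zs.append((Wo @ h)[0, 0])
        dWx = np.zeros_like(Wx); dWh = np.zeros_like(Wh); dWo = np.zeros_like(Wo)
        dh_next = np.zeros((H, 1))
        for k in reversed(range(T)):        # backpropagation through time
            dz = zs[k] - ys[k]              # d(squared error)/d(output)
            dWo += dz * hs[k].T
            dh = Wo.T * dz + dh_next
            dpre = (1.0 - hs[k] ** 2) * dh  # tanh derivative
            dWx += dpre * xs[k]
            h_prev = hs[k - 1] if k > 0 else np.zeros((H, 1))
            dWh += dpre @ h_prev.T
            dh_next = Wh.T @ dpre
        for W, dW in ((Wx, dWx), (Wh, dWh), (Wo, dWo)):
            W -= eta * dW / T               # plain gradient descent update
```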


2018 ◽  
Vol 98 (2) ◽  
pp. 331-338 ◽  
Author(s):  
STEFAN PANIĆ ◽  
MILENA J. PETROVIĆ ◽  
MIROSLAVA MIHAJLOV CAREVIĆ

We improve the convergence properties of the iterative scheme for solving unconstrained optimisation problems introduced in Petrovic et al. [‘Hybridization of accelerated gradient descent method’, Numer. Algorithms (2017), doi:10.1007/s11075-017-0460-4] by optimising the value of the initial step length parameter in the backtracking line search procedure. We prove the validity of the algorithm and illustrate its advantages by numerical experiments and comparisons.
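To make the role of the initial step length concrete, the snippet below shows a standard Armijo backtracking line search; the fixed default alpha0 is a placeholder for the optimised initial value derived in the paper, whose formula is not reproduced in the abstract.

```python
import numpy as np

def backtracking(f, grad_f, x, d, alpha0=1.0, beta=0.5, sigma=1e-4):
    """Armijo backtracking line search. The initial step length alpha0 is
    exactly the parameter the paper optimises; a fixed default stands in
    for the optimised value here."""
    alpha, g = alpha0, grad_f(x)
    while f(x + alpha * d) > f(x) + sigma * alpha * (g @ d):
        alpha *= beta                       # shrink until Armijo condition holds
    return alpha

# Usage: one descent step on a simple quadratic.
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x = np.array([3.0, -4.0])
d = -grad_f(x)                              # descent direction
x_new = x + backtracking(f, grad_f, x, d) * d
```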


1997 ◽  
Vol 9 (7) ◽  
pp. 1457-1482 ◽  
Author(s):  
Howard Hua Yang ◽  
Shun-ichi Amari

There are two major approaches to blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The MI is the contrast function for blind separation; the entropy is not. To justify the ME, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy and proving that mean subtraction is necessary in applying the ME, and that at the solution points determined by the MI, the ME will not update the demixing matrix in directions that increase the cross-talking. Second, the natural gradient, instead of the ordinary gradient, is introduced to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate the probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method of estimating the unknown cumulants. It is shown by computer simulation that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
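A minimal sketch of the natural gradient update for blind separation is shown below, using a fixed cubic nonlinearity in place of the adaptively estimated cumulant-based one; the sources, mixing matrix, and learning rate are synthetic illustrations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent sub-Gaussian sources mixed by an unknown matrix.
S = rng.uniform(-1.0, 1.0, (2, 5000))
A = rng.normal(size=(2, 2))
X = A @ S                                   # observed mixtures

W = np.eye(2)                               # demixing matrix estimate
eta = 0.01
phi = lambda y: y ** 3                      # fixed odd nonlinearity; the paper
                                            # adapts it via estimated cumulants

for x in X.T:                               # stochastic natural gradient descent
    y = W @ x
    W += eta * (np.eye(2) - np.outer(phi(y), y)) @ W

print(W @ A)   # ~ scaled permutation matrix if separation has succeeded
```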


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2253
Author(s):  
Xiao Wang ◽  
Peng Shi ◽  
Yushan Zhao ◽  
Yue Sun

In order to help the pursuer find an advantageous control policy in a one-to-one game in space, this paper proposes an innovative pre-trained fuzzy reinforcement learning algorithm, which is conducted in the x, y, and z channels separately. In contrast to previous algorithms applied to ground games, this is the first time reinforcement learning has been introduced to help a pursuer in space optimize its control policy. The known part of the environment is used to pre-train the pursuer's consequent set before learning. An actor-critic framework is built in each moving channel of the pursuer, and the consequent set is updated through the gradient descent method in the fuzzy inference systems. Numerical experiments validate the effectiveness of the proposed algorithm in improving the game ability of the pursuer.
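As a loose sketch of the consequent-set update, the function below applies a gradient-descent step to the consequents of a zero-order Takagi-Sugeno fuzzy system driven by a temporal-difference error; the interface and the zero-order simplification are assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_consequents(theta, firing, td_error, eta=0.05):
    """One gradient-descent step on the consequent set of a zero-order
    TSK fuzzy system (illustrative simplification). The system output is
    the firing-strength-weighted average of the consequents, so the
    gradient of the output w.r.t. each consequent is its normalised
    firing strength.

    theta    : (R,) consequent values, one per fuzzy rule
    firing   : (R,) rule firing strengths for the current state
    td_error : scalar temporal-difference error from the critic
    """
    w = firing / firing.sum()               # normalised firing strengths
    return theta + eta * td_error * w       # move output toward the TD target

# Hypothetical usage in one of the x/y/z channels:
theta = np.zeros(9)
theta = update_consequents(theta, np.full(9, 0.1), td_error=0.4)
```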


1998 ◽  
Vol 35 (02) ◽  
pp. 395-406 ◽  
Author(s):  
Jürgen Dippon

A stochastic gradient descent method is combined with a consistent auxiliary estimate to achieve global convergence of the recursion. Using step lengths converging to zero more slowly than 1/n and averaging the trajectories yields the optimal convergence rate of 1/√n and the optimal variance of the asymptotic distribution. Possible applications can be found in maximum likelihood estimation, regression analysis, training of artificial neural networks, and stochastic optimization.
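A minimal simulation of this recipe, for a one-dimensional stochastic approximation problem, is sketched below: step lengths proportional to n^(-gamma) with 1/2 < gamma < 1 decay more slowly than 1/n, and the averaged trajectory attains the 1/√n rate. The problem and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

theta_star = 2.0                  # unknown minimiser to be estimated
theta, theta_bar = 0.0, 0.0
gamma = 0.75                      # step exponent, 1/2 < gamma < 1

for n in range(1, 100001):
    grad = (theta - theta_star) + rng.standard_normal()  # noisy gradient
    theta -= grad / n ** gamma    # step length n^(-gamma), slower than 1/n
    theta_bar += (theta - theta_bar) / n                 # average the trajectory

print(theta_bar)                  # close to theta_star, error ~ 1/sqrt(n)
```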


Author(s):  
Kseniia Bazilevych ◽  
Ievgen Meniailov ◽  
Dmytro Chumachenko

Subject: the use of the mathematical apparatus of neural networks for the scientific substantiation of anti-epidemic measures, in order to reduce disease incidence when making effective management decisions.
Purpose: to apply cluster analysis, based on a neural network, to the problem of identifying incidence zones.
Tasks: to analyze data analysis methods for solving the clustering problem; to develop a neural network method for clustering the territory of Ukraine according to the nature of the COVID-19 epidemic process; and, on the basis of the developed method, to implement a data analysis software product that identifies disease incidence zones, using the COVID-19 coronavirus as an example.
Methods: models and methods of data analysis, models and methods of systems theory (based on the information approach), machine learning methods, in particular the Adaptive Boosting method (based on the gradient descent method), and methods for training neural networks.
Results: we used data from the Center for Public Health of the Ministry of Health of Ukraine, distributed over the regions of Ukraine, on COVID-19 incidence, the number of persons examined in laboratories, the number of laboratory tests performed by PCR and ELISA methods, and the number of laboratory tests for IgA, IgM and IgG. The model used data from March 2020 to December 2020, and the modeling did not take into account data from the temporarily occupied territories of Ukraine. For the cluster analysis, a neural network with 60 input neurons, 100 hidden neurons with a Fermi activation function, and 4 output neurons was built; the model was implemented in the Python programming language.
Conclusions: methods for constructing neural networks were analyzed, as were methods for training them, including the use of the gradient descent method within the Adaptive Boosting method; the theoretical material described in this work was used to implement a software product for processing COVID-19 test data for Ukraine; the regions of Ukraine were divided into COVID-19 infection zones, and a map of this division was presented.
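As a hedged illustration of the architecture described above, the sketch below builds the 60-100-4 network with a Fermi (logistic) activation and assigns a region's feature vector to one of four zones. The weights are random placeholders rather than parameters trained on the Ministry of Health data.

```python
import numpy as np

rng = np.random.default_rng(3)

fermi = lambda z: 1.0 / (1.0 + np.exp(-z))  # Fermi (logistic) activation

# Architecture from the abstract: 60 inputs, 100 hidden units, 4 output zones.
W1 = rng.normal(0.0, 0.1, (100, 60)); b1 = np.zeros(100)
W2 = rng.normal(0.0, 0.1, (4, 100));  b2 = np.zeros(4)

def assign_zone(features):
    """Map a region's 60 incidence/testing features to one of 4 zones.
    Weights are untrained placeholders; in the paper they are learned
    from the Ukrainian COVID-19 data."""
    h = fermi(W1 @ features + b1)
    out = fermi(W2 @ h + b2)
    return int(np.argmax(out))

zone = assign_zone(rng.random(60))          # hypothetical feature vector
```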

