Training Deep Neural Networks Using Conjugate Gradient-like Methods
The goal of this article is to accelerate the training of deep neural networks by improving on useful adaptive learning rate optimization algorithms such as AdaGrad, RMSProp, Adam, and AMSGrad. To reach this goal, we devise an iterative algorithm that combines these existing adaptive learning rate optimization algorithms with conjugate gradient-like methods, which are useful for constrained optimization. Convergence analyses show that the proposed algorithm with a small constant learning rate approximates a stationary point of a nonconvex optimization problem in deep learning. We also show that the proposed algorithm with diminishing learning rates converges to a stationary point of this nonconvex optimization problem. The convergence and performance of the algorithm are demonstrated through numerical comparisons with the existing adaptive learning rate optimization algorithms on image and text classification tasks. The numerical results show that the proposed algorithm with a constant learning rate outperforms the existing algorithms for training deep neural networks.
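To make the idea of combining an adaptive learning rate method with a conjugate gradient-like search direction concrete, the following is a minimal Python/NumPy sketch. It assumes an Adam-style update whose direction mixes the negative gradient with the previous search direction through a fixed coefficient gamma; the function name, the state layout, and the specific mixing rule are illustrative assumptions, not the update rule defined in the article.

import numpy as np

def cg_adam_like_step(theta, grad, state, alpha=1e-3, beta1=0.9,
                      beta2=0.999, gamma=0.1, eps=1e-8):
    """One step of a hypothetical Adam-style optimizer whose search
    direction is conjugate gradient-like: the negative gradient plus a
    multiple of the previous direction (names and gamma are assumptions)."""
    # Conjugate gradient-like direction.
    d = -grad + gamma * state["d"]
    # Adam-style exponentially weighted first- and second-moment estimates.
    state["m"] = beta1 * state["m"] + (1 - beta1) * d
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    # Adaptive step; alpha may be a small constant or a diminishing
    # learning rate, matching the two settings analyzed in the article.
    theta = theta + alpha * state["m"] / (np.sqrt(state["v"]) + eps)
    state["d"] = d
    return theta, state

# Minimal usage on a toy quadratic objective f(x) = ||x||^2 / 2.
theta = np.array([1.0, -2.0])
state = {"d": np.zeros_like(theta), "m": np.zeros_like(theta),
         "v": np.zeros_like(theta)}
for _ in range(200):
    grad = theta  # gradient of the toy objective
    theta, state = cg_adam_like_step(theta, grad, state)

Setting gamma to zero recovers a plain Adam-like update in this sketch, which is one way to see the method as an extension of the existing adaptive learning rate algorithms rather than a replacement for them.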