Associated Learning: Decomposing End-to-End Backpropagation Based on Autoencoders and Target Propagation

2021 · Vol. 33 (1) · pp. 174-193
Author(s): Yu-Wei Kao, Hung-Hsuan Chen

Backpropagation (BP) is the cornerstone of today's deep learning algorithms, but it is inefficient partially because of backward locking, which means updating the weights of one layer locks the weight updates in the other layers. Consequently, it is challenging to apply parallel computing or a pipeline structure to update the weights in different layers simultaneously. In this letter, we introduce a novel learning structure, associated learning (AL), that modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, AL can learn the parameters in different layers independently and simultaneously, so it is feasible to apply a pipeline structure to improve the training throughput. Specifically, this pipeline structure improves the complexity of the training time from O(nl), which is the time complexity when using BP and stochastic gradient descent (SGD) for training, to O(n + l), where n is the number of training instances and l is the number of hidden layers. Surprisingly, even though most of the parameters in AL do not directly interact with the target variable, training deep models by this method yields accuracies comparable to those from models trained using typical BP methods, in which all parameters are used to predict the target variable. Consequently, because of the scalability and the predictive power demonstrated in the experiments, AL deserves further study to determine better hyperparameter settings, such as activation function selection, learning rate scheduling, and weight initialization, and to accumulate experience with it, as has been done over the years with the typical BP method. In addition, perhaps our design can also inspire new network designs for deep learning. Our implementation is available at https://github.com/SamYWK/Associated_Learning .
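Since each component optimizes a purely local objective, the key mechanism can be illustrated with blocks whose gradients never cross block boundaries. Below is a minimal, illustrative PyTorch sketch of layer-local training; it is not the authors' exact AL architecture, and the layer sizes and the per-block classification heads are assumptions.

```python
# Minimal PyTorch sketch of layer-local training with mutually independent
# objectives (illustrative only; not the authors' exact AL architecture).
# Layer sizes and the per-block classification heads are assumptions.
import torch
import torch.nn as nn

blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(256, 10), nn.Linear(128, 10)])  # local objectives
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    h = x
    for block, head, opt in zip(blocks, heads, opts):
        h = block(h.detach())       # detach() keeps gradients from crossing module boundaries
        loss = loss_fn(head(h), y)  # purely local objective for this module
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because each block updates against its own detached input, the per-block steps are independent and could in principle be staged as a pipeline across devices.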

2020 · Vol. 63 (6) · pp. 900-912
Author(s): Oswalt Manoj S, Ananth J P

Rainfall prediction is an active area of research, as it enables farmers to make effective decisions regarding agriculture, in both cultivation and irrigation. Existing prediction models are cumbersome, as the prediction of rainfall depends on three major factors, including the humidity, the rainfall, and the rainfall recorded in previous years, which results in huge time consumption and heavy computational effort for the analysis. Thus, this paper introduces a rainfall prediction model based on a deep learning network, the convolutional long short-term memory (ConvLSTM) system, which produces predictions based on spatio-temporal patterns. The weights of the ConvLSTM are tuned optimally using the proposed Salp-stochastic gradient descent algorithm (S-SGD), which integrates the Salp swarm algorithm (SSA) into the stochastic gradient descent (SGD) algorithm in order to facilitate globally optimal tuning of the weights and to assure better prediction accuracy. In addition, the proposed deep learning framework is built on the MapReduce framework, which enables effective handling of big data. The analysis using the rainfall prediction database reveals that the proposed model achieved minimal mean square error (MSE) and percentage root mean square difference (PRD) values of 0.001 and 0.0021, respectively.
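For reference, a ConvLSTM forecaster of this general shape can be assembled in a few lines of Keras. This is only a hedged sketch: the 12-step window, the 32x32 rain grid, and the filter count are assumptions, and the paper's Salp-SGD weight tuning and MapReduce deployment are not reproduced here (plain SGD stands in for the optimizer).

```python
# Hedged Keras sketch of a ConvLSTM rainfall forecaster. The window length,
# grid size, filter count, and plain SGD optimizer are assumptions; the
# paper's Salp-SGD tuning and MapReduce deployment are not reproduced here.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.ConvLSTM2D(16, kernel_size=(3, 3), padding="same",
                               input_shape=(12, 32, 32, 1)),  # (time, rows, cols, channels)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),  # next-step rainfall estimate
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
```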


Author(s): Ameya D. Jagtap, Kenji Kawaguchi, George Em Karniadakis

We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) or for every neuron (neuron-wise) separately, and then optimizing it using a variant of the stochastic gradient descent algorithm. To further increase the training speed, a slope-recovery term based on the activation slope is added to the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method the gradient descent algorithms are not attracted to suboptimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method are not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate convergence by implicitly multiplying conditioning matrices to the gradient of the base method without any explicit computation of the conditioning matrix or the matrix-vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with slope recovery are shown to accelerate the training process.
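As a concrete illustration, a layer-wise adaptive activation can be implemented by making the slope a trainable parameter and adding a slope-recovery penalty to the loss. The PyTorch sketch below is an approximation under assumed settings; the scaling factor n, the network width, and the exact form of the penalty are illustrative choices, not the authors' verbatim ones.

```python
# PyTorch sketch of a layer-wise adaptive activation with a slope-recovery
# penalty. The scaling factor n, network width, and the exact penalty form are
# illustrative assumptions rather than the authors' verbatim settings.
import torch
import torch.nn as nn

class AdaptiveTanh(nn.Module):
    def __init__(self, n=10.0):
        super().__init__()
        self.n = n                                     # fixed scale factor
        self.a = nn.Parameter(torch.tensor(1.0 / n))   # trainable layer-wise slope

    def forward(self, x):
        return torch.tanh(self.n * self.a * x)

net = nn.Sequential(nn.Linear(1, 50), AdaptiveTanh(),
                    nn.Linear(50, 50), AdaptiveTanh(),
                    nn.Linear(50, 1))

def total_loss(pred, target):
    slopes = torch.stack([m.a for m in net.modules() if isinstance(m, AdaptiveTanh)])
    data_loss = nn.functional.mse_loss(pred, target)
    slope_recovery = 1.0 / torch.exp(slopes).mean()    # shrinks as the slopes grow
    return data_loss + slope_recovery
```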


2020 · Vol. 34 (04) · pp. 6861-6868
Author(s): Yikai Zhang, Hui Qu, Dimitris Metaxas, Chao Chen

Regularization plays an important role in the generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizer for training algorithms in deep learning. We focus on a training method called Locally Regularized Stochastic Gradient Descent (LRSGD), which leverages a proximal-type penalty in the gradient descent steps to regularize SGD during training. We show that, by carefully choosing the relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.
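As a rough illustration of a proximal-type penalty inside an SGD step, the update below follows the gradient while also pulling the weights toward a reference point. The function name, the anchor point w_ref, and the coefficient gamma are hypothetical; this is not the paper's exact LRSGD procedure.

```python
# Illustrative NumPy sketch of an SGD step with a proximal-type penalty that
# pulls the weights toward a reference point w_ref. The function name, w_ref,
# and gamma are hypothetical; this is not the paper's exact LRSGD procedure.
import numpy as np

def locally_regularized_sgd_step(w, grad, w_ref, lr=0.1, gamma=0.01):
    """Gradient step plus the gradient of a proximal penalty (gamma/2)*||w - w_ref||^2."""
    return w - lr * (grad + gamma * (w - w_ref))
```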


2018
Author(s): Kazunori D Yamada

In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, gradient descent is the most naive. With plain gradient descent, the learning rate must normally be adjusted manually to achieve quick convergence. Many optimizers have been developed to control the learning rate and increase convergence speed. Generally, these optimizers adjust the learning rate automatically in response to the learning status, and they have gradually been improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on some common datasets, our optimizer showed convergence performance similar to or faster than existing methods. YamAdam is a viable alternative optimizer for deep learning.
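The abstract does not give the YamAdam update rule, so the sketch below only illustrates how Adam-style moment estimates might be combined with an AdaDelta-style unit correction, in which the running RMS of past updates replaces a hand-tuned learning rate. Treat the composition, the state names, and the omitted bias correction as assumptions.

```python
# Illustrative NumPy sketch combining Adam-style moment estimates with an
# AdaDelta-style unit correction. Not the published YamAdam rule; bias
# correction is omitted and the state names are assumptions.
import numpy as np

def moment_plus_unit_correction_step(w, grad, state, beta1=0.9, beta2=0.999, eps=1e-8):
    m = state.get("m", np.zeros_like(w))   # first moment of gradients (Adam-style)
    v = state.get("v", np.zeros_like(w))   # second moment of gradients (Adam-style)
    u = state.get("u", np.zeros_like(w))   # running RMS of past updates (AdaDelta-style)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    step = np.sqrt(u + eps) / np.sqrt(v + eps) * m     # update carries the units of w
    u = beta2 * u + (1 - beta2) * step ** 2
    state.update(m=m, v=v, u=u)
    return w - step
```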


Author(s): Hui Zhao, Hai-Xia Zhang, Qing-Jiao Cao, Sheng-Juan Sun, Xuanzhe Han, ...

Deep learning algorithms have shown performance superior to traditional algorithms when dealing with computationally intensive tasks in many fields. Algorithm models based on deep learning perform well and can improve recognition accuracy in relevant applications in the field of computer vision. TensorFlow is a flexible open-source machine learning platform proposed by Google, which can run on a variety of platforms, such as CPUs, GPUs, and mobile devices, and supports the current popular deep learning models. In this paper, an image recognition toolkit based on TensorFlow is designed and developed to simplify the development process of the growing number of image recognition applications. The toolkit uses convolutional neural networks to build a training model, which consists of two convolutional layers, with a batch normalization layer before each convolutional layer and a pooling layer after each one. The last two layers of the model are fully connected layers that output the recognition results. Mini-batch gradient descent is adopted as the optimization algorithm; it combines the advantages of both gradient descent and stochastic gradient descent, greatly reducing the number of iterations needed for convergence with little impact on the converged result. The toolkit model has a total of 1.7 million training parameters. To prevent overfitting, a dropout layer with a rate of 0.5 is added before each fully connected layer. The convolutional neural network model is trained and tested on the MNIST dataset using TensorFlow. The experimental results show that the toolkit achieves a recognition accuracy of 99% on the MNIST test set. The toolkit provides powerful technical support for the development of various image recognition applications, reduces development difficulty, and improves the efficiency of resource utilization.
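A Keras sketch of the described layout (batch normalization before each of the two convolutional layers, pooling after each, and a dropout rate of 0.5 before each of the two fully connected layers) might look as follows. The filter counts and the dense-layer width are assumptions, so the parameter count will not match the reported 1.7 million exactly.

```python
# Keras sketch of the described layout: BatchNorm before each of the two
# convolutional layers, pooling after each, and dropout (rate 0.5) before each
# of the two fully connected layers. Filter counts and the dense width are
# assumptions, so the parameter count will not match the reported 1.7 million.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```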


2019 · Vol. 2019 · pp. 1-13
Author(s): Rodrigo P. Monteiro, Mariela Cerrada, Diego R. Cabrera, René V. Sánchez, Carmelo J. A. Bastos-Filho

Gearboxes are mechanical devices that play an essential role in several applications, e.g., the transmission of automotive vehicles. Their malfunctioning may result in economic losses and accidents, among other consequences. The rise of powerful graphics processing units has spread the use of deep learning-based solutions to many problems, including fault diagnosis in gearboxes. Those solutions usually require a significant amount of data, high computational power, and a long training process. Training deep learning-based systems may not be feasible when GPUs are not available. This paper proposes a solution that reduces the training time of deep learning-based fault diagnosis systems without compromising their accuracy. The solution is based on a decision stage that interprets all the probability outputs of a classifier whose output layer uses the softmax activation function. Two classification algorithms were applied at this decision stage. We reduced the training time by almost 80% without compromising the average accuracy of the fault diagnosis system.
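One way to realize such a decision stage is to feed the full softmax probability vector of the briefly trained classifier into lightweight classifiers. The sketch below uses a random forest and a k-nearest-neighbors model as placeholders, since the abstract does not name the two decision algorithms actually used.

```python
# Sketch of a post-hoc decision stage: the softmax probability vectors of a
# briefly trained base classifier become features for lightweight classifiers.
# RandomForest and k-NN are placeholders; the abstract does not name the two
# decision algorithms actually used.
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def build_decision_stage(softmax_train, y_train):
    """softmax_train: (n_samples, n_classes) probability outputs of the base model."""
    forest = RandomForestClassifier(n_estimators=100).fit(softmax_train, y_train)
    knn = KNeighborsClassifier(n_neighbors=5).fit(softmax_train, y_train)
    return forest, knn

def decide(decision_models, softmax_test):
    # Each decision model maps a probability pattern to a fault class.
    return [model.predict(softmax_test) for model in decision_models]
```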

