Associated Learning: Decomposing End-to-End Backpropagation Based on Autoencoders and Target Propagation

2021 · Vol. 33 (1) · pp. 174-193
Author(s): Yu-Wei Kao, Hung-Hsuan Chen

Backpropagation (BP) is the cornerstone of today's deep learning algorithms, but it is inefficient partially because of backward locking, which means updating the weights of one layer locks the weight updates in the other layers. Consequently, it is challenging to apply parallel computing or a pipeline structure to update the weights in different layers simultaneously. In this letter, we introduce a novel learning structure, associated learning (AL), that modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, AL can learn the parameters in different layers independently and simultaneously, so it is feasible to apply a pipeline structure to improve the training throughput. Specifically, this pipeline structure improves the complexity of the training time from O(nl), which is the time complexity when using BP and stochastic gradient descent (SGD) for training, to O(n + l), where n is the number of training instances and l is the number of hidden layers. Surprisingly, even though most of the parameters in AL do not directly interact with the target variable, training deep models by this method yields accuracies comparable to those from models trained using typical BP methods, in which all parameters are used to predict the target variable. Consequently, because of the scalability and the predictive power demonstrated in the experiments, AL deserves further study to determine better hyperparameter settings, such as activation function selection, learning rate scheduling, and weight initialization, and to accumulate experience with it, as has been done over the years with the typical BP method. In addition, perhaps our design can also inspire new network designs for deep learning. Our implementation is available at https://github.com/SamYWK/Associated_Learning .
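Since each component optimizes a purely local objective, the key mechanism can be illustrated with blocks whose gradients never cross block boundaries. Below is a minimal, illustrative PyTorch sketch of layer-local training; it is not the authors' exact AL architecture, and the layer sizes and the per-block classification heads are assumptions.

```python
# Minimal PyTorch sketch of layer-local training with mutually independent
# objectives (illustrative only; not the authors' exact AL architecture).
# Layer sizes and the per-block classification heads are assumptions.
import torch
import torch.nn as nn

blocks = nn.ModuleList([
    nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
    nn.Sequential(nn.Linear(256, 128), nn.ReLU()),
])
heads = nn.ModuleList([nn.Linear(256, 10), nn.Linear(128, 10)])  # local objectives
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    h = x
    for block, head, opt in zip(blocks, heads, opts):
        h = block(h.detach())       # detach() keeps gradients from crossing module boundaries
        loss = loss_fn(head(h), y)  # purely local objective for this module
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because each block updates against its own detached input, the per-block steps are independent and could in principle be staged as a pipeline across devices.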

2020 · Vol. 63 (6) · pp. 900-912
Author(s): Oswalt Manoj S, Ananth J P

Rainfall prediction is an active area of research, as it enables farmers to make effective decisions regarding agriculture, in both cultivation and irrigation. Existing prediction models are cumbersome, as the prediction of rainfall depends on three major factors, including the humidity, the rainfall, and the rainfall recorded in previous years, which results in huge time consumption and heavy computational effort for the analysis. Thus, this paper introduces a rainfall prediction model based on a deep learning network, the convolutional long short-term memory (ConvLSTM) system, which produces predictions based on spatio-temporal patterns. The weights of the ConvLSTM are tuned optimally using the proposed Salp-stochastic gradient descent algorithm (S-SGD), which integrates the Salp swarm algorithm (SSA) into the stochastic gradient descent (SGD) algorithm in order to facilitate globally optimal tuning of the weights and to assure better prediction accuracy. In addition, the proposed deep learning framework is built on the MapReduce framework, which enables effective handling of big data. The analysis using the rainfall prediction database reveals that the proposed model achieved minimal mean square error (MSE) and percentage root mean square difference (PRD) values of 0.001 and 0.0021, respectively.
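For reference, a ConvLSTM forecaster of this general shape can be assembled in a few lines of Keras. This is only a hedged sketch: the 12-step window, the 32x32 rain grid, and the filter count are assumptions, and the paper's Salp-SGD weight tuning and MapReduce deployment are not reproduced here (plain SGD stands in for the optimizer).

```python
# Hedged Keras sketch of a ConvLSTM rainfall forecaster. The window length,
# grid size, filter count, and plain SGD optimizer are assumptions; the
# paper's Salp-SGD tuning and MapReduce deployment are not reproduced here.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.ConvLSTM2D(16, kernel_size=(3, 3), padding="same",
                               input_shape=(12, 32, 32, 1)),  # (time, rows, cols, channels)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),  # next-step rainfall estimate
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
```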


Author(s): Ameya D. Jagtap, Kenji Kawaguchi, George Em Karniadakis

We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) or for every neuron (neuron-wise) separately, and then optimizing it using a variant of the stochastic gradient descent algorithm. To further increase the training speed, a slope-recovery term based on the activation slope is added to the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method the gradient descent algorithms are not attracted to suboptimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method are not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate convergence by implicitly multiplying conditioning matrices to the gradient of the base method without any explicit computation of the conditioning matrix or the matrix-vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with slope recovery are shown to accelerate the training process.
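As a concrete illustration, a layer-wise adaptive activation can be implemented by making the slope a trainable parameter and adding a slope-recovery penalty to the loss. The PyTorch sketch below is an approximation under assumed settings; the scaling factor n, the network width, and the exact form of the penalty are illustrative choices, not the authors' verbatim ones.

```python
# PyTorch sketch of a layer-wise adaptive activation with a slope-recovery
# penalty. The scaling factor n, network width, and the exact penalty form are
# illustrative assumptions rather than the authors' verbatim settings.
import torch
import torch.nn as nn

class AdaptiveTanh(nn.Module):
    def __init__(self, n=10.0):
        super().__init__()
        self.n = n                                     # fixed scale factor
        self.a = nn.Parameter(torch.tensor(1.0 / n))   # trainable layer-wise slope

    def forward(self, x):
        return torch.tanh(self.n * self.a * x)

net = nn.Sequential(nn.Linear(1, 50), AdaptiveTanh(),
                    nn.Linear(50, 50), AdaptiveTanh(),
                    nn.Linear(50, 1))

def total_loss(pred, target):
    slopes = torch.stack([m.a for m in net.modules() if isinstance(m, AdaptiveTanh)])
    data_loss = nn.functional.mse_loss(pred, target)
    slope_recovery = 1.0 / torch.exp(slopes).mean()    # shrinks as the slopes grow
    return data_loss + slope_recovery
```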


2020 · Vol. 34 (04) · pp. 6861-6868
Author(s): Yikai Zhang, Hui Qu, Dimitris Metaxas, Chao Chen

Regularization plays an important role in the generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizer for training algorithms in deep learning. We focus on a training method called Locally Regularized Stochastic Gradient Descent (LRSGD), which leverages a proximal-type penalty in the gradient descent steps to regularize SGD during training. We show that, by carefully choosing the relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD.
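As a rough illustration of a proximal-type penalty inside an SGD step, the update below follows the gradient while also pulling the weights toward a reference point. The function name, the anchor point w_ref, and the coefficient gamma are hypothetical; this is not the paper's exact LRSGD procedure.

```python
# Illustrative NumPy sketch of an SGD step with a proximal-type penalty that
# pulls the weights toward a reference point w_ref. The function name, w_ref,
# and gamma are hypothetical; this is not the paper's exact LRSGD procedure.
import numpy as np

def locally_regularized_sgd_step(w, grad, w_ref, lr=0.1, gamma=0.01):
    """Gradient step plus the gradient of a proximal penalty (gamma/2)*||w - w_ref||^2."""
    return w - lr * (grad + gamma * (w - w_ref))
```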


2018
Author(s): Kazunori D Yamada

In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, gradient descent is the most naive. With plain gradient descent, the learning rate must normally be adjusted manually to achieve quick convergence. Many optimizers have been developed to control the learning rate and increase convergence speed. Generally, these optimizers adjust the learning rate automatically in response to the learning status, and they have gradually been improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on some common datasets, our optimizer showed convergence performance similar to or faster than existing methods. YamAdam is a viable alternative optimizer for deep learning.
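The abstract does not give the YamAdam update rule, so the sketch below only illustrates how Adam-style moment estimates might be combined with an AdaDelta-style unit correction, in which the running RMS of past updates replaces a hand-tuned learning rate. Treat the composition, the state names, and the omitted bias correction as assumptions.

```python
# Illustrative NumPy sketch combining Adam-style moment estimates with an
# AdaDelta-style unit correction. Not the published YamAdam rule; bias
# correction is omitted and the state names are assumptions.
import numpy as np

def moment_plus_unit_correction_step(w, grad, state, beta1=0.9, beta2=0.999, eps=1e-8):
    m = state.get("m", np.zeros_like(w))   # first moment of gradients (Adam-style)
    v = state.get("v", np.zeros_like(w))   # second moment of gradients (Adam-style)
    u = state.get("u", np.zeros_like(w))   # running RMS of past updates (AdaDelta-style)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    step = np.sqrt(u + eps) / np.sqrt(v + eps) * m     # update carries the units of w
    u = beta2 * u + (1 - beta2) * step ** 2
    state.update(m=m, v=v, u=u)
    return w - step
```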


Author(s): Hui Zhao, Hai-Xia Zhang, Qing-Jiao Cao, Sheng-Juan Sun, Xuanzhe Han, ...

Deep learning algorithms have shown performance superior to traditional algorithms when dealing with computationally intensive tasks in many fields. Algorithm models based on deep learning perform well and can improve recognition accuracy in relevant applications in the field of computer vision. TensorFlow is a flexible open-source machine learning platform proposed by Google, which can run on a variety of platforms, such as CPUs, GPUs, and mobile devices, and supports the current popular deep learning models. In this paper, an image recognition toolkit based on TensorFlow is designed and developed to simplify the development process of the growing number of image recognition applications. The toolkit uses convolutional neural networks to build a training model, which consists of two convolutional layers, with a batch normalization layer before each convolutional layer and a pooling layer after each one. The last two layers of the model are fully connected layers that output the recognition results. Mini-batch gradient descent is adopted as the optimization algorithm; it combines the advantages of both gradient descent and stochastic gradient descent, greatly reducing the number of iterations needed for convergence with little impact on the converged result. The toolkit model has a total of 1.7 million training parameters. To prevent overfitting, a dropout layer with a rate of 0.5 is added before each fully connected layer. The convolutional neural network model is trained and tested on the MNIST dataset using TensorFlow. The experimental results show that the toolkit achieves a recognition accuracy of 99% on the MNIST test set. The toolkit provides powerful technical support for the development of various image recognition applications, reduces development difficulty, and improves the efficiency of resource utilization.
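A Keras sketch of the described layout (batch normalization before each of the two convolutional layers, pooling after each, and a dropout rate of 0.5 before each of the two fully connected layers) might look as follows. The filter counts and the dense-layer width are assumptions, so the parameter count will not match the reported 1.7 million exactly.

```python
# Keras sketch of the described layout: BatchNorm before each of the two
# convolutional layers, pooling after each, and dropout (rate 0.5) before each
# of the two fully connected layers. Filter counts and the dense width are
# assumptions, so the parameter count will not match the reported 1.7 million.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```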


2019 · Vol. 2019 · pp. 1-13
Author(s): Rodrigo P. Monteiro, Mariela Cerrada, Diego R. Cabrera, René V. Sánchez, Carmelo J. A. Bastos-Filho

Gearboxes are mechanical devices that play an essential role in several applications, e.g., the transmission of automotive vehicles. Their malfunctioning may result in economic losses and accidents, among other consequences. The rise of powerful graphics processing units has spread the use of deep learning-based solutions to many problems, including fault diagnosis in gearboxes. Those solutions usually require a significant amount of data, high computational power, and a long training process. Training deep learning-based systems may not be feasible when GPUs are not available. This paper proposes a solution that reduces the training time of deep learning-based fault diagnosis systems without compromising their accuracy. The solution is based on a decision stage that interprets all the probability outputs of a classifier whose output layer uses the softmax activation function. Two classification algorithms were applied at this decision stage. We reduced the training time by almost 80% without compromising the average accuracy of the fault diagnosis system.
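One way to realize such a decision stage is to feed the full softmax probability vector of the briefly trained classifier into lightweight classifiers. The sketch below uses a random forest and a k-nearest-neighbors model as placeholders, since the abstract does not name the two decision algorithms actually used.

```python
# Sketch of a post-hoc decision stage: the softmax probability vectors of a
# briefly trained base classifier become features for lightweight classifiers.
# RandomForest and k-NN are placeholders; the abstract does not name the two
# decision algorithms actually used.
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def build_decision_stage(softmax_train, y_train):
    """softmax_train: (n_samples, n_classes) probability outputs of the base model."""
    forest = RandomForestClassifier(n_estimators=100).fit(softmax_train, y_train)
    knn = KNeighborsClassifier(n_neighbors=5).fit(softmax_train, y_train)
    return forest, knn

def decide(decision_models, softmax_test):
    # Each decision model maps a probability pattern to a fault class.
    return [model.predict(softmax_test) for model in decision_models]
```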

