Accelerating DNN Training Through Selective Localized Learning

2022 ◽  
Vol 15 ◽  
Author(s):  
Sarada Krithivasan ◽  
Sanchari Sen ◽  
Swagath Venkataramani ◽  
Anand Raghunathan

Training Deep Neural Networks (DNNs) places immense compute requirements on the underlying hardware platforms, expending large amounts of time and energy. We propose LoCal+SGD, a new algorithmic approach to accelerate DNN training by selectively combining localized or Hebbian learning within a Stochastic Gradient Descent (SGD) based training framework. Back-propagation is a computationally expensive process that requires 2 Generalized Matrix Multiply (GEMM) operations to compute the error and weight gradients for each layer. We alleviate this by selectively updating some layers' weights using localized learning rules that require only 1 GEMM operation per layer. Further, since localized weight updates are performed during the forward pass itself, the layer activations for such layers do not need to be stored until the backward pass, resulting in a reduced memory footprint. Localized updates can substantially boost training speed, but need to be used judiciously in order to preserve accuracy and convergence. We address this challenge through a Learning Mode Selection Algorithm, which gradually selects and moves layers to localized learning as training progresses. Specifically, for each epoch, the algorithm identifies a Localized→SGD transition layer that delineates the network into two regions. Layers before the transition layer use localized updates, while the transition layer and later layers use gradient-based updates. We propose both static and dynamic approaches to the design of the learning mode selection algorithm. The static algorithm utilizes a pre-defined scheduler function to identify the position of the transition layer, while the dynamic algorithm analyzes the dynamics of the weight updates made to the transition layer to determine how the boundary between SGD and localized updates is shifted in future epochs. We also propose a low-cost weak supervision mechanism that controls the learning rate of localized updates based on the overall training loss. 
We applied LoCal+SGD to 8 image recognition CNNs (including ResNet50 and MobileNetV2) across 3 datasets (Cifar10, Cifar100, and ImageNet). Our measurements on an Nvidia GTX 1080Ti GPU demonstrate up to 1.5× improvement in end-to-end training time with ~0.5% loss in Top-1 classification accuracy.
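The per-layer cost difference described above can be illustrated with a rough numpy sketch (illustrative only, not the paper's implementation): back-propagation requires two GEMMs per layer, while a Hebbian-style localized rule needs a single GEMM that can be computed during the forward pass, so the layer's activations need not be kept for the backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))        # batch of pre-synaptic activations
W = rng.standard_normal((64, 16)) * 0.1  # layer weights

# Forward pass for one layer.
y = x @ W

# SGD/back-propagation needs two GEMMs per layer:
grad_y = rng.standard_normal(y.shape)    # error arriving from the layers above
grad_W = x.T @ grad_y                    # GEMM 1: weight gradient
grad_x = grad_y @ W.T                    # GEMM 2: error for the layer below

# A localized (Hebbian-style) update needs only one GEMM, applied
# immediately during the forward pass:
eta = 1e-3
W_local = W + eta * (x.T @ y)            # single GEMM, no stored activations
```

Because `W_local` is computed before the backward pass begins, `x` can be discarded right away for localized layers, which is the source of the reduced memory footprint.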

Author(s):  
Zejian Zhou ◽  
Yingmeng Xiang ◽  
Hao Xu ◽  
Yishen Wang ◽  
Di Shi ◽  
...  

Non-intrusive load monitoring (NILM) is a critical technique for advanced smart grid management, owing to the convenience of monitoring and analysing individual appliances’ power consumption in a non-intrusive fashion. Inspired by emerging machine learning technologies, many recent NILM studies have adopted artificial neural networks (ANNs) to disaggregate appliances’ power from non-intrusive sensors’ measurements. However, back-propagation ANNs have a limited ability to disaggregate appliances because of their long training times and uncertain convergence, which are critical flaws for low-cost devices. In this paper, a novel self-organizing probabilistic neural network (SPNN)-based NILM algorithm is developed specifically for low-cost residential measuring devices. The proposed SPNN estimates the probability density functions that classify the different types of appliances. Compared to back-propagation ANNs, the SPNN requires fewer iterative synaptic weight updates and provides guaranteed convergence. Meanwhile, the novel SPNN has lower space complexity than conventional PNNs, thanks to a self-organizing mechanism that automatically adjusts the number of neurons. These advantages make the algorithm especially favourable for low-cost residential NILM devices. The effectiveness of the proposed algorithm is demonstrated through numerical simulation using the public REDD dataset. Performance comparisons with well-known benchmark algorithms are also provided in the experiment section.
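The core idea of a PNN-style classifier can be sketched as follows (a minimal illustration with made-up appliance signatures; the paper's SPNN additionally self-organizes its neuron count): each class's density at a query measurement is estimated with a Gaussian kernel over stored training patterns, and the class with the highest density wins.

```python
import numpy as np

def pnn_classify(x, class_patterns, sigma=50.0):
    """Pick the class whose Gaussian kernel density at x is highest."""
    best_label, best_density = None, -np.inf
    for label, patterns in class_patterns.items():
        d2 = ((patterns - x) ** 2).sum(axis=1)          # squared distances
        density = np.exp(-d2 / (2 * sigma ** 2)).mean()  # Parzen estimate
        if density > best_density:
            best_label, best_density = label, density
    return best_label

# Hypothetical power signatures: (active power in W, power factor).
patterns = {
    "fridge": np.array([[120.0, 0.90], [130.0, 0.85]]),
    "kettle": np.array([[2000.0, 1.00], [1900.0, 0.95]]),
}
print(pnn_classify(np.array([125.0, 0.88]), patterns))  # → fridge
```

No iterative weight training is needed: adding a class simply means storing its patterns, which is why convergence is guaranteed; the self-organizing step in the paper then prunes or merges pattern neurons to keep the memory footprint small.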


2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks, statistical and computational intelligence modelling, are considered. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with the Maximum Likelihood (ML) estimation method. As a competitor to the statistical forecasting models, we use the popular classic perceptron-type neural network (NN). To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are implemented on a large data set. A comparative analysis of the selected learning methods is performed and evaluated. Our experiments indicate that a population size of 20 yields the lowest training time among all NNs trained by the evolutionary algorithms, with a somewhat lower, but still managerially acceptable, level of prediction accuracy.
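Evolutionary training of a small forecasting NN can be sketched as below (a toy series and simple truncation-selection GA of my own choosing, not the paper's setup; only the population size of 20 is taken from the reported finding). Instead of back-propagating gradients, the GA evaluates a fitness (here, one-step-ahead MSE) for each candidate weight vector and mutates the best ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-step-ahead forecasting data: predict y[t] from y[t-1], y[t-2].
series = np.sin(np.arange(60) * 0.3)
X = np.stack([series[:-2], series[1:-1]], axis=1)
y = series[2:]

def mse(w):
    # Tiny perceptron: one tanh hidden unit, weights packed in w (length 5).
    hidden = np.tanh(X @ w[:2] + w[2])
    return np.mean((hidden * w[3] + w[4] - y) ** 2)

# Minimal GA: population of 20, truncation selection + Gaussian mutation.
pop = rng.standard_normal((20, 5))
for generation in range(200):
    fitness = np.array([mse(w) for w in pop])
    parents = pop[np.argsort(fitness)[:10]]           # keep the best half
    children = parents + 0.1 * rng.standard_normal(parents.shape)
    pop = np.vstack([parents, children])

best = pop[np.argmin([mse(w) for w in pop])]
```

A GA trades per-iteration cost (many fitness evaluations) for robustness: it needs no gradient and cannot diverge, which matches the paper's observation that evolutionary training is fast for small populations at some cost in accuracy.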


2004 ◽  
Vol 7 (1) ◽  
pp. 35-36 ◽  
Author(s):  
BRIAN MACWHINNEY

Truscott and Sharwood Smith (henceforth T&SS) attempt to show how second language acquisition can occur without any learning. In their APT model, change depends only on the tuning of innate principles through the normal course of processing of L2. There are some features of their model that I find attractive. Specifically, their acceptance of the concepts of competition and activation strength brings them in line with standard processing accounts like the Competition Model (Bates and MacWhinney, 1982; MacWhinney, 1987, in press). At the same time, their reliance on parameters as the core constructs guiding learning leaves this model squarely within the framework of Chomsky's theory of Principles and Parameters (P&P). As such, it stipulates that the specific functional categories of Universal Grammar serve as the fundamental guide to both first and second language acquisition. Like other accounts in the P&P framework, this model attempts to view second language acquisition as involving no real learning beyond the deductive process of parameter-setting based on the detection of certain triggers. The specific innovation of the APT model is that changes in activation strength during processing function as the trigger to the setting of parameters. Unlike other P&P models, APT does not set parameters in an absolute fashion, allowing their activation weight to change by the processing of new input over time. The use of the concept of activation in APT is far more restricted than its use in connectionist models that allow for Hebbian learning, self-organizing feature maps, or back-propagation.


Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 500 ◽  
Author(s):  
Sergey A. Lobov ◽  
Andrey V. Chernyshov ◽  
Nadia P. Krilova ◽  
Maxim O. Shamshin ◽  
Victor B. Kazantsev

One of the modern trends in the design of human–machine interfaces (HMI) is to involve so-called spiking neural networks (SNNs) in signal processing. SNNs can be trained by simple and efficient biologically inspired algorithms. In particular, we have shown that sensory neurons in the input layer of an SNN can simultaneously encode the input signal both in the spiking frequency rate and in the latency of spike generation. With such mixed temporal-rate coding, the SNN must implement learning that works properly for both types of coding. Based on this, we investigate how a single neuron can be trained with pure rate and temporal patterns, and then build a universal SNN that is trained using mixed coding. In particular, we study Hebbian and competitive learning in SNNs in the context of temporal and rate coding problems. We show that Hebbian learning through pair-based and triplet-based spike-timing-dependent plasticity (STDP) rules is feasible for temporal coding, but not for rate coding. Synaptic competition that depresses poorly used synapses is required to ensure neural selectivity in rate coding. This kind of competition can be implemented by a so-called forgetting function that depends on neuron activity. We show that the combined use of triplet-based STDP and synaptic competition with the forgetting function is sufficient for rate coding. Next, we propose an SNN capable of classifying electromyographic (EMG) patterns using an unsupervised learning procedure. Neuron competition achieved via lateral inhibition ensures the “winner takes all” principle among classifier neurons. The SNN also provides a gradual output response dependent on muscular contraction strength. Furthermore, we modify the SNN to implement a supervised learning method based on stimulating the target classifier neuron synchronously with the network input.
In a problem of discriminating three EMG patterns, the SNN with supervised learning achieves a median accuracy of 99.5%, close to the result demonstrated by a multi-layer perceptron trained by error back-propagation.
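The pair-based STDP rule and the forgetting-style synaptic competition discussed above can be sketched as follows (parameter values and the exact form of the forgetting function are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def stdp_pair(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP kernel. delta_t = t_post - t_pre in milliseconds:
    pre-before-post (delta_t > 0) potentiates the synapse,
    post-before-pre (delta_t < 0) depresses it, both decaying with |delta_t|."""
    if delta_t > 0:
        return a_plus * np.exp(-delta_t / tau)
    return -a_minus * np.exp(delta_t / tau)

def forget(w, activity, rate=0.001):
    """Activity-dependent forgetting (assumed exponential form): the more
    active the post-synaptic neuron, the faster unused weights decay,
    which induces the synaptic competition needed for rate coding."""
    return w - rate * activity * w
```

With pure STDP, a synapse driven at a high rate but with uncorrelated timing receives little net change; adding `forget` makes weakly contributing synapses lose weight whenever the neuron fires, so only the most-used inputs survive.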


2020 ◽  
Vol 10 (2) ◽  
pp. 19
Author(s):  
Alfio Di Mauro ◽  
Hamed Fatemi ◽  
Jose Pineda de Gyvez ◽  
Luca Benini

Power management is a crucial concern in micro-controller platforms for the Internet of Things (IoT) edge. Many applications present a variable and hard-to-predict workload profile, usually driven by external inputs. Dynamically tuning power consumption to the application requirements is therefore a viable approach to saving energy. In this paper, we propose the implementation of a power management strategy for a novel low-cost, low-power heterogeneous dual-core SoC for the IoT edge, fabricated in 28 nm FD-SOI technology. As with more complex power management policies implemented on high-end application processors, we propose a strategy in which the power mode is dynamically selected to meet a user-specified idleness target. We demonstrate that the dynamic power mode selection introduced by our power manager achieves more than 43% power consumption reduction with respect to a static worst-case power mode selection, without any significant performance penalty for the running application.
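Idleness-driven power mode selection of the kind described above can be sketched as follows (mode names, power figures, and wake-up latencies are hypothetical, not those of the fabricated SoC): given a predicted idle window, the manager picks the lowest-power mode whose wake-up overhead still leaves the requested fraction of the window actually idle.

```python
# Each mode: (name, idle power in mW, wake-up latency in µs). Hypothetical values.
MODES = [
    ("active",       10.0,     0),
    ("clock-gated",   2.0,    50),
    ("retentive",     0.3,   500),
    ("deep-sleep",    0.05, 5000),
]

def select_mode(predicted_idle_us, target_idleness=0.9):
    """Lowest-power mode whose wake-up overhead keeps the effective
    idle fraction of the predicted window at or above the target."""
    best = MODES[0]  # worst case: stay active
    for name, power, latency in MODES:
        if predicted_idle_us == 0:
            break
        if 1 - latency / predicted_idle_us >= target_idleness and power < best[1]:
            best = (name, power, latency)
    return best[0]
```

A static worst-case policy would always pick the shallowest mode that works for the shortest idle window; the dynamic policy instead drops into deeper modes whenever long idle periods are predicted, which is where the reported savings come from.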


2011 ◽  
Vol 21 (01) ◽  
pp. 31-47 ◽  
Author(s):  
NOEL LOPES ◽  
BERNARDETE RIBEIRO

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. Thus, it is the ideal candidate to accelerate a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become more and more demanding, parallel implementations of learning algorithms are crucial for practical applications. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performance compared to implementations on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims at automatically finding high-quality NN-based solutions for a given problem.
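The dense matrix products that such GPU kernels parallelize can be shown in plain numpy on a toy XOR task (an illustrative sketch of standard batch BP, not the paper's CUDA implementation): every line marked as a product below maps naturally onto a GPU GEMM or element-wise kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.standard_normal((2, 8))   # input -> hidden weights
W2 = rng.standard_normal((8, 1))   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1)                  # forward: one dense product per layer
    y = sigmoid(h @ W2)
    d2 = (y - t) * y * (1 - y)           # backward: output-layer delta
    d1 = (d2 @ W2.T) * h * (1 - h)       # backward: hidden-layer delta
    W2 -= 0.5 * h.T @ d2                 # weight-gradient products
    W1 -= 0.5 * X.T @ d1
```

On a GPU the batch dimension and the weight matrices are processed by thousands of threads at once, so the same algorithm scales to far larger batches and layers than this toy example.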

