DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

Author(s):  
Rafael Stahl ◽  
Alexander Hoffman ◽  
Daniel Mueller-Gritschneder ◽  
Andreas Gerstlauer ◽  
Ulf Schlichtmann

Abstract: Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures both privacy of input data and possible run time reductions when compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs. Partitioning and distributing layer information across multiple edge devices to reduce the amount of computation and data on each device presents a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved using techniques to combine both feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize the data exchanged between devices, to optimize run times and to find the entire model’s minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme is able to evenly balance the memory footprint between devices. For six devices on 100 Mbit/s connections, the integration of layer fusion additionally reduces communication demands by up to 28.8%. This results in a run time speed-up of the inference task by up to 1.52x compared to layer partitioning without fusing.
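The feature-partitioning idea is easiest to see on a single convolutional layer. Below is a minimal, illustrative sketch (not the authors' implementation): the input feature map of a 3x3 convolution is split row-wise across a number of "devices", and each partition carries a one-row halo of overlap from its neighbours. In a real distributed setup that halo is exactly the data that must be communicated between devices, which communication-aware schemes such as the layer fusion in DeeperThings try to minimise.

```python
# Illustrative sketch of spatial feature partitioning for a 3x3 convolution
# across N "devices"; all names here are assumptions for illustration.
import numpy as np

def conv2d(x, w):
    # Naive valid 3x3 convolution; x: (H, W), w: (3, 3).
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i+3, j:j+3] * w)
    return out

def partitioned_conv2d(x, w, n_devices):
    # Split output rows across devices; each device also needs rows of
    # halo/overlap from its neighbours -- the data that must be exchanged.
    H = x.shape[0]
    bounds = np.linspace(0, H - 2, n_devices + 1).astype(int)
    parts = []
    for d in range(n_devices):
        lo, hi = bounds[d], bounds[d + 1]
        parts.append(conv2d(x[lo:hi + 2], w))  # +2 rows of overlap
    return np.vstack(parts)

x = np.random.rand(16, 16)
w = np.random.rand(3, 3)
# The partitioned result matches the monolithic convolution exactly:
assert np.allclose(conv2d(x, w), partitioned_conv2d(x, w, 4))
```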

2020 ◽  
Author(s):  
Gang Liu

Artificial neural networks (ANNs) have won numerous contests in pattern recognition, machine learning, and artificial intelligence in recent years. The neuron of ANNs was designed based on the stereotypical knowledge of biological neurons available 70 years ago: the artificial neuron is expressed as f(wx+b) or f(WX). This design does not consider the dendrites' information processing capacity. However, recent studies show that biological dendrites participate in the pre-calculation of input data; concretely, they play a role in extracting the interaction information among inputs (features). Therefore, it may be time to improve the neuron of ANNs. Building on our previous studies (DD), this paper adds the dendrites' function to the artificial neuron. The dendrite function can be expressed as W^{i,i-1}A^{i-1} ∘ A^{0|1|2|...|i-1}. The generalized new neuron can be expressed as f(W(W^{i,i-1}A^{i-1} ∘ A^{0|1|2|...|i-1})), and the simplified new neuron can be expressed as f(∑(WA ∘ X)). After improving the neuron, many networks become possible to try; this paper shows some basic architectures for future reference.

Interesting points: (1) The computational complexity of dendrite modules (W^{i,i-1}A^{i-1} ∘ A^{i-1}) connected in series is far lower than that of Horner's method. Will this speed up the calculation of basic functions in computers? (2) The range of sight of animals has a gradient, but the convolution layer does not have this characteristic; this paper proposes receptive fields with a gradient. (3) Networks using Gang neurons can delete traditional networks' fully-connected layer. In other words, the fully-connected layers' parameters are assigned to a single neuron, which reduces the parameters of a network for the same mapping capacity.
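As a rough illustration of the simplified form f(∑(WA ∘ X)), the sketch below chains dendrite modules that gate a linear map of the previous output with the raw input via a Hadamard product. Gating with the raw input A^0 is only one of the A^{0|1|2|...|i-1} choices mentioned above, and all names and shapes here are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of a neuron with chained dendrite modules, assuming the
# Hadamard gate uses the raw input A^0 at every stage.
import numpy as np

def gang_neuron(x, W_list, w_out, f=np.tanh):
    # x: input vector; W_list: one matrix per dendrite module;
    # w_out: output weights; f: the soma's activation function.
    a = x
    for W in W_list:
        a = (W @ a) * x   # dendrite pre-computation: interactions of a with x
    return f(w_out @ a)   # soma: weighted sum plus nonlinearity

x = np.random.rand(4)
W_list = [np.random.rand(4, 4) for _ in range(2)]  # two dendrite modules
w_out = np.random.rand(4)
print(gang_neuron(x, W_list, w_out))
```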


Author(s):  
Wei Huang ◽  
Weitao Du ◽  
Richard Yi Da Xu

The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks has been well-proven. However, while the same is believed to hold for nonlinear networks when the dynamical isometry condition is satisfied, the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs) with orthogonal initialization, via the Neural Tangent Kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Further, during training, the NTK of an orthogonally-initialized infinite-width network should theoretically remain constant. This suggests that orthogonal initialization cannot speed up training in the NTK (lazy training) regime, contrary to the prevailing view. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set so that the nonlinear activations operate in a linear regime, orthogonal initialization can improve the learning speed with a large learning rate or large depth.
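For reference, a common way to realise the orthogonal initialization discussed above is QR decomposition of a Gaussian matrix. The sketch below (an assumption about the construction, not the paper's code) also checks the norm-preserving property that underlies dynamical isometry.

```python
# Orthogonal weight initialization via QR of a Gaussian matrix.
import numpy as np

def orthogonal_init(n, gain=1.0, seed=0):
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q = q * np.sign(np.diag(r))  # fix column signs so Q is Haar-distributed
    return gain * q

W = orthogonal_init(256)
x = np.random.default_rng(1).standard_normal(256)
# Orthogonal W preserves vector norms (an isometry): ratio is ~1.0,
# whereas a plain Gaussian matrix would stretch or shrink x.
print(np.linalg.norm(W @ x) / np.linalg.norm(x))
```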


Author(s):  
Akhilan Boopathy ◽  
Tsui-Wei Weng ◽  
Pin-Yu Chen ◽  
Sijia Liu ◽  
Luca Daniel

Verifying the robustness of neural network classifiers has attracted great interest due to the success of deep neural networks and their unexpected vulnerability to adversarial perturbations. Although finding the minimum adversarial distortion of neural networks (with ReLU activations) has been shown to be an NP-complete problem, obtaining a non-trivial lower bound on the minimum distortion as a provable robustness guarantee is possible. However, most previous works focused only on simple fully-connected layers (multilayer perceptrons) and were limited to ReLU activations. This motivates us to propose a general and efficient framework, CNN-Cert, that is capable of certifying robustness on general convolutional neural networks. Our framework is general: we can handle various architectures including convolutional layers, max-pooling layers, batch normalization layers and residual blocks, as well as general activation functions. Our approach is efficient: by exploiting the special structure of convolutional layers, we achieve up to 17 and 11 times speed-up compared to state-of-the-art certification algorithms (e.g. Fast-Lin, CROWN) and 366 times speed-up compared to the dual-LP approach, while obtaining similar or even better verification bounds. In addition, CNN-Cert generalizes state-of-the-art algorithms such as Fast-Lin and CROWN. We demonstrate by extensive experiments that our method outperforms state-of-the-art lower-bound-based certification algorithms in terms of both bound quality and speed.
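CNN-Cert itself uses much tighter linear relaxations, but the core idea of a provable robustness guarantee can be made concrete with simple interval bound propagation: for every input inside an L∞ ball of radius eps, the sketch below computes guaranteed lower and upper bounds on the network outputs. This is a far looser method than CNN-Cert and is shown only to illustrate what "certified bounds" means; all layer shapes are assumptions.

```python
# Interval bound propagation through affine + ReLU layers (illustrative,
# not the CNN-Cert algorithm).
import numpy as np

def interval_affine(lo, hi, W, b):
    # Exact output interval of an affine layer over a box: split W by sign.
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def certified_bounds(x, eps, layers):
    # Bounds valid for every input x' with max|x' - x| <= eps.
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:  # ReLU on hidden layers; it is monotone
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), np.zeros(8)),
          (rng.standard_normal((3, 8)), np.zeros(3))]
lo, hi = certified_bounds(rng.standard_normal(4), 0.01, layers)
# If lo[true class] exceeds hi of every other class, robustness at radius
# eps is certified; the largest such eps lower-bounds the minimum distortion.
print(lo, hi)
```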


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Sumedh Yadav ◽  
Mathis Bode

Abstract: A scalable graphical method is presented for selecting and partitioning datasets for the training phase of a classification task. The heuristic relies on a clustering algorithm to keep its computation cost in reasonable proportion to the training task itself. This step is followed by the construction of an information graph of the underlying classification patterns using approximate nearest neighbor methods. The presented method consists of two approaches, one for reducing a given training set and another for partitioning the selected/reduced set. The heuristic targets large datasets, since the primary goal is a significant reduction in training run time without compromising prediction accuracy. Test results show that both approaches significantly speed up the training task when compared against the state-of-the-art shrinking heuristics available in LIBSVM, while closely matching or even outperforming them in prediction accuracy. A network design is also presented for a partitioning-based distributed training formulation, and additional speed-up in training run time is observed compared to a serial implementation of the approaches.
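The paper's heuristic builds an information graph with approximate nearest neighbors; as a rough, assumed analogue of its reduction step only, the sketch below clusters each class and keeps just the samples nearest the cluster centres before training an SVM. The cluster count and all variable names are illustrative.

```python
# Illustrative training-set reduction via per-class clustering (not the
# authors' exact method).
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

def reduce_training_set(X, y, clusters_per_class=50):
    # Keep, per class, only the sample nearest each cluster centre.
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = min(clusters_per_class, len(idx))
        km = MiniBatchKMeans(n_clusters=k, random_state=0).fit(X[idx])
        for centre in km.cluster_centers_:
            keep.append(idx[np.argmin(np.linalg.norm(X[idx] - centre, axis=1))])
    return np.unique(keep)

# Usage on some training set (X_train, y_train are placeholders):
# idx = reduce_training_set(X_train, y_train)
# clf = SVC().fit(X_train[idx], y_train[idx])  # far smaller SVM problem
```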


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 47
Author(s):  
Vasyl Teslyuk ◽  
Artem Kazarian ◽  
Natalia Kryvinska ◽  
Ivan Tsmots

During the operation of “smart” house systems, there is a need to process fuzzy input data. Models based on artificial neural networks are used to process such fuzzy input data from the sensors. However, each type of artificial neural network has its own advantages and processes different types of data, and generates control signals, with different accuracy. To address this problem, a method for choosing the optimal type of artificial neural network has been proposed. It is based on solving an optimization problem in which the optimization criterion is the error of a given type of artificial neural network when controlling the corresponding subsystem of a “smart” house. The same historical input data are used to train the different types of artificial neural networks. The research presents the dependencies between the type of neural network, the number of hidden layers, the number of neurons in each hidden layer, and the error of the calculated settings parameters relative to the expected results.
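A compact sketch of this selection scheme, with a hypothetical candidate pool and placeholder data names (X_history, y_history): each candidate network type is trained on the same historical data, and the one with the smallest validation error is kept.

```python
# Model selection by minimum validation error over candidate networks.
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def select_network(X, y, candidates):
    # Train every candidate on the same data; pick the minimum-error one.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
    errors = {name: mean_squared_error(y_val, m.fit(X_tr, y_tr).predict(X_val))
              for name, m in candidates.items()}
    return min(errors, key=errors.get), errors

# Hypothetical candidates varying depth and width:
candidates = {
    "mlp_1x10": MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    "mlp_2x20": MLPRegressor(hidden_layer_sizes=(20, 20), max_iter=2000, random_state=0),
}
# best, errors = select_network(X_history, y_history, candidates)
```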


Author(s):  
Naoki Matsumura ◽  
Yasuaki Ito ◽  
Koji Nakano ◽  
Akihiko Kasagi ◽  
Tsuguchika Tabaru

Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2005
Author(s):  
Veronika Scholz ◽  
Peter Winkler ◽  
Andreas Hornig ◽  
Maik Gude ◽  
Angelos Filippatos

Damage identification of composite structures is a major ongoing challenge for a secure operational life-cycle due to the complex, gradual damage behaviour of composite materials. Especially for composite rotors in aero-engines and wind turbines, cost-intensive maintenance has to be performed in order to avoid critical failure. A major advantage of composite structures is that they are able to operate safely after damage initiation and under ongoing damage propagation. A robust, efficient diagnostic damage identification method would therefore allow the damage process to be monitored, with intervention occurring only when necessary. This study investigates the structural vibration response of composite rotors using machine learning methods and assesses their ability to identify, localise and quantify the present damage. To this end, multiple fully connected neural networks and convolutional neural networks were trained on vibration response spectra from damaged composite rotors with barely visible damage, mostly matrix cracks and local delaminations, using dimensionality reduction and data augmentation. A databank containing 720 simulated test cases with different damage states is used as the basis for generating multiple data sets. The trained models are tested using k-fold cross-validation and evaluated based on sensitivity, specificity and accuracy. Convolutional neural networks perform slightly better, providing an accuracy of up to 99.3% for damage localisation and quantification.
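The evaluation protocol described here can be sketched as follows: k-fold cross-validation with sensitivity, specificity and accuracy computed from the pooled confusion matrix. This is a simplified binary-classification stand-in, not the study's models; the classifier, its width and the placeholder names spectra and damage_labels are assumptions.

```python
# K-fold evaluation with sensitivity/specificity/accuracy (binary case).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

def kfold_metrics(X, y, k=5):
    # Pool predictions over k folds, then compute the three metrics.
    y_true, y_pred = [], []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    for tr, te in skf.split(X, y):
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                            random_state=0).fit(X[tr], y[tr])
        y_true.extend(y[te])
        y_pred.extend(clf.predict(X[te]))
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / (tp + tn + fp + fn)}

# metrics = kfold_metrics(spectra, damage_labels)  # placeholder inputs
```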


2016 ◽  
Vol 182 ◽  
pp. 154-164 ◽  
Author(s):  
Junfei Qiao ◽  
Fanjun Li ◽  
Honggui Han ◽  
Wenjing Li

Water ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 705
Author(s):  
Josué Trejo-Alonso ◽  
Carlos Fuentes ◽  
Carlos Chávez ◽  
Antonio Quevedo ◽  
Alfonso Gutierrez-Lopez ◽  
...  

In the present work, we construct several artificial neural networks (varying the input data) to calculate the saturated hydraulic conductivity (KS) using a database of 900 measured samples obtained from Irrigation District 023 in San Juan del Rio, Queretaro, Mexico. All of them were constructed using two hidden layers, a back-propagation algorithm for the learning process, and a logistic function as the nonlinear transfer function. To explore different arrangements of neurons in the hidden layers, we applied the bootstrap technique to each neural network and selected the one with the lowest Root Mean Square Error (RMSE). We also compared these results with pedotransfer functions and other neural networks from the literature. Our artificial neural networks obtained RMSE values between 0.0413 and 0.0459 and R2 values between 0.9725 and 0.9780, which are in good agreement with other works. We also found that reducing the amount of input data gave better results.
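A minimal sketch of this setup, assuming scikit-learn and illustrative layer widths: a two-hidden-layer network with logistic activations, scored by RMSE on out-of-bag samples across bootstrap resamples so that different hidden-layer arrangements can be compared.

```python
# Bootstrap RMSE for a two-hidden-layer logistic-activation network.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn.utils import resample

def bootstrap_rmse(X, y, hidden_layers, n_boot=30):
    # Fit on each bootstrap resample; score RMSE on the out-of-bag points.
    n, rmses = len(X), []
    for b in range(n_boot):
        idx = resample(np.arange(n), random_state=b)    # sample with replacement
        oob = np.setdiff1d(np.arange(n), idx)           # held-out points
        model = MLPRegressor(hidden_layers, activation="logistic",
                             max_iter=5000, random_state=0).fit(X[idx], y[idx])
        rmses.append(mean_squared_error(y[oob], model.predict(X[oob])) ** 0.5)
    return float(np.mean(rmses))

# Compare candidate hidden-layer arrangements and keep the lowest RMSE:
# best = min((bootstrap_rmse(X, y, h), h) for h in [(5, 5), (10, 5), (10, 10)])
```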

