Efficient Binarized Convolutional Layers for Visual Inspection Applications on Resource-Limited FPGAs and ASICs

Taylor Simons; Dah-Jye Lee

doi:10.3390/electronics10131511

Electronics ◽

10.3390/electronics10131511 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1511

Author(s):

Taylor Simons ◽

Dah-Jye Lee

Keyword(s):

Neural Networks ◽

Visual Inspection ◽

Deep Neural Networks ◽

Computational Cost ◽

Quality Inspection ◽

Agricultural Produce ◽

Resource Limited ◽

Inspection Tasks ◽

Computational Resources ◽

Small Models

There has been a recent surge in publications related to binarized neural networks (BNNs), which use binary values to represent both the weights and activations in deep neural networks (DNNs). Due to the bitwise nature of BNNs, there have been many efforts to implement BNNs on ASICs and FPGAs. While BNNs are excellent candidates for these kinds of resource-limited systems, most implementations still require very large FPGAs or CPU-FPGA co-processing systems. Our work focuses on reducing the computational cost of BNNs even further, making them more efficient to implement on FPGAs. We target embedded visual inspection tasks, such as quality inspection sorting of manufactured parts and agricultural produce sorting. We propose a new binarized convolutional layer, called the neural jet features layer, that learns well-known classic computer vision kernels that are efficient to calculate as a group. We show that on visual inspection tasks, neural jet features perform comparably to standard BNN convolutional layers while using fewer computational resources. We also show that neural jet features tend to be more stable than BNN convolution layers when training small models.
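The "bitwise nature" of BNNs mentioned above can be illustrated with the standard XNOR-popcount dot product. The following is a minimal sketch, not the neural jet features layer itself: it assumes values in {-1, +1} are encoded as bits (+1 as 1, -1 as 0), and the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int (assumed encoding: +1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors using XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask      # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # matches minus mismatches


activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

# Reference result using ordinary multiply-accumulate.
reference = sum(a * w for a, w in zip(activations, weights))
result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
```

Each matching bit contributes +1 and each mismatch contributes -1, so the multiply-accumulate collapses into one XNOR and one bit count, which is why such layers map well onto FPGA logic.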


An efficient pruning scheme of deep neural networks for Internet of Things applications

EURASIP Journal on Advances in Signal Processing ◽

10.1186/s13634-021-00744-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Chen Qi ◽

Shibo Shen ◽

Rongpeng Li ◽

Zhifeng Zhao ◽

Qing Liu ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Internet Of Things ◽

Deep Neural Networks ◽

Computational Cost ◽

Superior Performance ◽

Compact Structure ◽

Resource Limited ◽

Benchmark Datasets ◽

Iot Devices

Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities like sensing, imaging, classification, recognition, etc. However, the computationally intensive requirements of DNNs make them difficult to apply on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm can achieve efficient end-to-end training that directly transforms a redundant neural network into a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme is able to significantly reduce its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.


Pruning for Hardware-Based Deep Spiking Neural Networks Using Gated Schottky Diode as Synaptic Devices

Journal of Nanoscience and Nanotechnology ◽

10.1166/jnn.2020.18772 ◽

2020 ◽

Vol 20 (11) ◽

pp. 6603-6608 ◽

Cited By ~ 1

Author(s):

Sung-Tae Lee ◽

Suhwan Lim ◽

Jong-Ho Bae ◽

Dongseok Kwon ◽

Hyeong-Su Kim ◽

...

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Schottky Diodes ◽

Computational Cost ◽

Spiking Neural Networks ◽

Training Procedure ◽

Learning Tasks ◽

L1 Regularization ◽

The Cost ◽

High Computational Cost

Deep learning achieves state-of-the-art results in various machine learning tasks, but for applications that require real-time inference, the high computational cost of deep neural networks becomes a bottleneck for efficiency. To overcome the high computational cost of deep neural networks, spiking neural networks (SNNs) have been proposed. Herein, we propose a hardware implementation of the SNN with gated Schottky diodes as synaptic devices. In addition, we apply L1 regularization for connection pruning of the deep spiking neural networks using gated Schottky diodes as synaptic devices. Applying L1 regularization eliminates the need for a re-training procedure because it prunes the weights based on the cost function. The compressed hardware-based SNN is energy efficient while achieving a classification accuracy of 97.85%, which is comparable to the 98.13% of the software deep neural network (DNN).


CircConv: A Structured Convolution with Low Complexity

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014287 ◽

2019 ◽

Vol 33 ◽

pp. 4287-4294

Author(s):

Siyu Liao ◽

Bo Yuan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Neural Networks ◽

Computational Cost ◽

Low Complexity ◽

Deep Convolutional Neural Networks ◽

Significant Saving ◽

Machine Learning Applications ◽

Fast Multiplication ◽

Large Model

Deep neural networks (DNNs), especially deep convolutional neural networks (CNNs), have emerged as a powerful technique in various machine learning applications. However, the large model sizes of DNNs place high demands on computation resources and weight storage, thereby limiting the practical deployment of DNNs. To overcome these limitations, this paper proposes to impose the circulant structure on the construction of convolutional layers, which leads to circulant convolutional layers (CircConvs) and circulant CNNs. The circulant structure and models can be either trained from scratch or re-trained from a pre-trained non-circulant model, thereby making it very flexible for different training environments. Through extensive experiments, such a strong structure-imposing approach is shown to substantially reduce the number of parameters of convolutional layers and enable significant saving of computational cost by using fast multiplication of the circulant tensor.
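The "fast multiplication" that a circulant structure allows can be illustrated in the simpler matrix-vector case. The sketch below is not the CircConv layer from the paper: it assumes a circulant matrix defined by its first column `c`, so that `C[i][j] = c[(i - j) mod n]`, and uses a small radix-2 FFT written here for illustration (input length must be a power of two). All function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_matvec_fft(c, x):
    """Compute C @ x in O(n log n) via the circular convolution theorem."""
    n = len(c)
    product = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(product, inverse=True)]


def circulant_matvec_dense(c, x):
    """Reference O(n^2) product with the explicit circulant matrix."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


c = [1.0, 2.0, 0.0, -1.0]   # n stored parameters instead of n * n
x = [3.0, 0.5, -2.0, 1.0]
fast = circulant_matvec_fft(c, x)
dense = circulant_matvec_dense(c, x)
```

The circulant matrix stores only `n` values rather than `n * n`, and the product costs three FFTs plus an element-wise multiply, which is the source of both the parameter and the computation savings.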


Fully Convolutional Deep Neural Networks with Optimized Hyperparameters for Detection of Shockable and Non-Shockable Rhythms

Sensors ◽

10.3390/s20102875 ◽

2020 ◽

Vol 20 (10) ◽

pp. 2875 ◽

Cited By ~ 1

Author(s):

Vessela Krasteva ◽

Sarah Ménétré ◽

Jean-Philippe Didon ◽

Irena Jekova

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Random Search ◽

Computational Cost ◽

Dropout Rate ◽

Machine Learning Algorithms ◽

Dense Layer ◽

Max Pooling ◽

Resuscitation Guidelines ◽

Advisory Systems

Deep neural networks (DNN) are state-of-the-art machine learning algorithms that can learn to self-extract significant features of the electrocardiogram (ECG) and can generally provide high-output diagnostic accuracy if subjected to robust training and optimization on large datasets at high computational cost. So far, limited research and optimization of DNNs in shock advisory systems has been reported on large ECG arrhythmia databases from out-of-hospital cardiac arrests (OHCA). The objective of this study is to optimize the hyperparameters (HPs) of deep convolutional neural networks (CNN) for detection of shockable (Sh) and nonshockable (NSh) rhythms, and to validate the best HP settings for short and long analysis durations (2–10 s). Large numbers of (Sh + NSh) ECG samples were used for training (720 + 3170) and validation (739 + 5921) from Holters and defibrillators in OHCA. An end-to-end deep CNN architecture was implemented with one-lead raw ECG input layer (5 s, 125 Hz, 2.5 uV/LSB), configurable number of 5 to 23 hidden layers and output layer with diagnostic probability p ∈ [0: Sh,1: NSh]. The hidden layers contain N convolutional blocks × 3 layers (Conv1D (filters = Fi, kernel size = Ki), max-pooling (pool size = 2), dropout (rate = 0.3)), one global max-pooling and one dense layer. Random search optimization of HPs = {N, Fi, Ki}, i = 1, … N in a large grid of N = [1, 2, … 7], Fi = [5;50], Ki = [5;100] was performed. During training, the model with maximal balanced accuracy BAC = (Sensitivity + Specificity)/2 over 400 epochs was stored. The optimization principle is based on finding the common HPs space of a few top-ranked models and prediction of a robust HP setting by their median value. The optimal models for 1–7 CNN layers were trained with different learning rates LR = [10^-5; 10^-2] and the best model was finally validated on 2–10 s analysis durations. A total of 4216 random search models were trained. The optimal models with more than three convolutional layers did not exhibit substantial differences in performance BAC = (99.31–99.5%). Among them, the best model was found with {N = 5, Fi = {20, 15, 15, 10, 5}, Ki = {10, 10, 10, 10, 10}, 7521 trainable parameters} with maximal validation performance for 5-s analysis (BAC = 99.5%, Se = 99.6%, Sp = 99.4%) and tolerable drop in performance (<2% points) for very short 2-s analysis (BAC = 98.2%, Se = 97.6%, Sp = 98.7%). DNN application in future-generation shock advisory systems can improve the detection performance of Sh and NSh rhythms and can considerably shorten the analysis duration complying with resuscitation guidelines for minimal hands-off pauses.
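The reported figures for the best model can be checked with a short calculation. The sketch below assumes details the abstract does not state explicitly: a single-channel input (one-lead ECG), a bias term per filter and per dense unit, and a dense layer with one output unit fed by the global max-pooling of the last block. Pooling and dropout layers carry no trainable parameters. The function names are hypothetical.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter for a 1D convolution."""
    return kernel_size * in_channels * filters + filters


def count_params(filters, kernels, in_channels=1, dense_units=1):
    """Trainable parameters of stacked Conv1D blocks followed by one dense layer."""
    total = 0
    channels = in_channels
    for f, k in zip(filters, kernels):
        total += conv1d_params(channels, f, k)
        channels = f
    # Global max-pooling leaves one value per channel of the last block.
    total += channels * dense_units + dense_units
    return total


def balanced_accuracy(sensitivity, specificity):
    """BAC = (Sensitivity + Specificity) / 2, as defined in the abstract."""
    return (sensitivity + specificity) / 2


total_params = count_params([20, 15, 15, 10, 5], [10, 10, 10, 10, 10])
bac_5s = balanced_accuracy(99.6, 99.4)
bac_2s = balanced_accuracy(97.6, 98.7)
```

Under these assumptions the count comes to 220 + 3015 + 2265 + 1510 + 505 + 6 = 7521, matching the reported number of trainable parameters, and the 5-s sensitivity and specificity average to the reported BAC of 99.5%. The 2-s values average to 98.15%, consistent with the reported 98.2% after rounding.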


Automating Visual Inspection of Lyophilized Drug Products With Multi-Input Deep Neural Networks

2019 IEEE 15th International Conference on Automation Science and Engineering (CASE) ◽

10.1109/coase.2019.8843069 ◽

2019 ◽

Author(s):

Calvin Tsay ◽

Zheng Li

Keyword(s):

Neural Networks ◽

Visual Inspection ◽

Deep Neural Networks ◽

Drug Products


Improved Training of Deep Convolutional Networks via Minimum-Variance Regularized Adaptive Sampling

10.21203/rs.3.rs-983472/v1 ◽

2021 ◽

Author(s):

Alfonso Rojas-Domínguez ◽

Ivvan Valdez ◽

Manuel Ornelas-Rodríguez ◽

Martín Carpio

Keyword(s):

Neural Networks ◽

Adaptive Sampling ◽

Sampling Method ◽

Deep Neural Networks ◽

Computational Cost ◽

Stochastic Gradient Descent ◽

Experimental Comparison ◽

Great Success ◽

Convolutional Networks ◽

Training Examples

Fostered by technological and theoretical developments, deep neural networks have achieved great success in many applications, but their training by means of mini-batch stochastic gradient descent (SGD) can be very costly due to the possibly tens of millions of parameters to be optimized and the large amounts of training examples that must be processed. Said computational cost is exacerbated by the inefficiency of the uniform sampling method typically used by SGD to form the training mini-batches: since not all training examples are equally relevant for training, sampling these under a uniform distribution is far from optimal. A better strategy is to form the mini-batches by sampling the training examples under a distribution where the probability of being selected is proportional to the relevance of each individual example. This can be achieved through Importance Sampling (IS), which also achieves the minimization of the gradients’ variance w.r.t. the network parameters, further improving convergence. In this paper, an IS-based adaptive sampling method is studied that exploits side information to construct the required probability distribution. Said method is modified to enable its application to deep neural networks, and the improved method is dubbed Regularized Adaptive Sampling (RAS). Experimental comparison (using deep convolutional networks for classification of the MNIST and CIFAR-10 datasets) of RAS against SGD and against another sampling method in the state of the art, shows that RAS achieves relative improvements of the training process, without incurring significant overhead or affecting the accuracy of the networks.
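The general idea of importance-sampled mini-batches can be sketched as follows. This is not the RAS method, which builds its distribution from side information. It is a generic illustration in which the relevance score of each example is assumed to be the magnitude of a scalar per-example gradient, and each sampled term is reweighted by 1/(N p_i) so the estimate of the mean gradient stays unbiased. All names and values are hypothetical.

```python
import random


def sampling_distribution(scores):
    """Normalize non-negative relevance scores into probabilities."""
    total = sum(scores)
    return [s / total for s in scores]


def sample_minibatch(probs, batch_size, rng):
    """Draw indices with probability probs[i]; return them with unbiasing weights."""
    n = len(probs)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights


def weighted_estimate(values, indices, weights):
    """Importance-weighted estimate of the mean of values."""
    return sum(w * values[i] for i, w in zip(indices, weights)) / len(indices)


# Toy scalar "gradients"; the relevance score is assumed to be their magnitude.
grads = [0.1, 0.2, 4.0, 0.1, 3.5, 0.1]
probs = sampling_distribution([abs(g) for g in grads])

rng = random.Random(0)
indices, weights = sample_minibatch(probs, batch_size=4, rng=rng)
estimate = weighted_estimate(grads, indices, weights)
true_mean = sum(grads) / len(grads)
```

In this idealized case the probabilities are exactly proportional to the (positive) values being averaged, so every reweighted term equals the true mean and the estimator has zero variance. Practical methods only approximate this distribution, since the per-example gradients are not known in advance.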


Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks*

Journal of Statistical Mechanics Theory and Experiment ◽

10.1088/1742-5468/ac3ae3 ◽

2021 ◽

Vol 2021 (12) ◽

pp. 124010

Author(s):

Ryo Karakida ◽

Kazuki Osawa

Keyword(s):

Neural Networks ◽

Function Space ◽

Fisher Information ◽

Gradient Descent ◽

Large Scale ◽

Deep Neural Networks ◽

Theoretical Perspective ◽

Computational Cost ◽

Fast Convergence ◽

Natural Gradient

Natural gradient descent (NGD) helps to accelerate the convergence of gradient descent dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed that some NGD methods with approximate Fisher information converge sufficiently fast in practice. Nevertheless, it remains unclear from the theoretical perspective why and under what conditions such heuristic approximations work well. In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD. We consider deep neural networks in the infinite-width limit, and analyze the asymptotic training dynamics of NGD in function space via the neural tangent kernel. In the function space, the training dynamics with the approximate Fisher information are identical to those with the exact Fisher information, and they converge quickly. The fast convergence holds in layer-wise approximations; for instance, in block diagonal approximation where each block corresponds to a layer as well as in block tri-diagonal and K-FAC approximations. We also find that a unit-wise approximation achieves the same fast convergence under some assumptions. All of these different approximations have an isotropic gradient in the function space, and this plays a fundamental role in achieving the same convergence properties in training. Thus, the current study gives a novel and unified theoretical foundation with which to understand NGD methods in deep learning.


Explaining probabilistic Artificial Intelligence (AI) models by discretizing Deep Neural Networks

10.36227/techrxiv.14792160.v1 ◽

2021 ◽

Author(s):

Rabia Saleem ◽

Bo Yuan ◽

Fatih Kurugollu ◽

Ashiq Anjum

Keyword(s):

Artificial Intelligence ◽

Neural Networks ◽

Decision Making ◽

Process Model ◽

Deep Neural Networks ◽

Computational Cost ◽

Black Box ◽

Decision Making Process ◽

Deterministic Models ◽

Know How

Artificial Intelligence (AI) models can learn from data and make decisions without any human intervention. However, the deployment of such models is challenging and risky because we do not know how the internal decision-making happens in these models. In particular, high-risk decisions such as medical diagnosis or automated navigation demand explainability and verification of the decision-making process in AI algorithms. This research paper aims to explain AI models by discretizing the black-box process model of deep neural networks using partial differential equations (PDEs). The PDE-based deterministic models would minimize the time and computational cost of the decision-making process and reduce the chances of uncertainty, making the prediction more trustworthy.


Energy-efficient Amortized Inference with Cascaded Deep Classifiers

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/302 ◽

2018 ◽

Author(s):

Jiaqi Guan ◽

Yang Liu ◽

Qiang Liu ◽

Jian Peng

Keyword(s):

Neural Networks ◽

Energy Cost ◽

Deep Neural Networks ◽

Predictive Accuracy ◽

Computational Cost ◽

Mobile Sensing ◽

Test Time ◽

Trade Off ◽

Effective Cost ◽

Energy Constrained

Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy cost for energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes the prediction accuracy and energy cost simultaneously, thus enabling an effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks with increasing sizes, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion by the REINFORCE algorithm to optimize a trade-off between the computational cost and the predictive accuracy. Our method is able to simultaneously improve the accuracy and efficiency by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy cost, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. Moreover, we demonstrate our method's effectiveness with extensive experiments on CIFAR-10/100, ImageNet32x32 and the original ImageNet dataset.
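The control flow of such a cascade can be sketched in a few lines. This is an illustration only: the paper trains a selection module with REINFORCE, whereas the sketch below substitutes a fixed confidence threshold as the stopping rule, and the two toy classifiers and their costs are hypothetical stand-ins for networks of increasing size.

```python
def cascade_predict(x, stages, threshold=0.9):
    """Run classifiers from cheapest to most expensive; stop once one is confident.

    stages is a list of (classifier, cost) pairs, where classifier(x) returns
    (label, confidence). Returns the chosen label and the accumulated cost.
    """
    total_cost = 0
    for i, (classifier, cost) in enumerate(stages):
        label, confidence = classifier(x)
        total_cost += cost
        is_last = i == len(stages) - 1
        if confidence >= threshold or is_last:
            return label, total_cost


def small_model(x):
    """Cheap toy classifier: confident only far from the decision boundary."""
    confidence = 0.95 if abs(x) > 1.0 else 0.6
    return (1 if x > 0 else 0), confidence


def large_model(x):
    """Expensive toy classifier: always confident."""
    return (1 if x > 0 else 0), 0.99


stages = [(small_model, 1), (large_model, 10)]
easy = cascade_predict(2.5, stages)    # handled by the small model alone
hard = cascade_predict(-0.2, stages)   # falls through to the large model
```

Easy inputs exit after the cheap stage, while hard inputs pay the cost of every stage they pass through, which is the cost-accuracy trade-off the abstract describes.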


A Novel Low-Bit Quantization Strategy for Compressing Deep Neural Networks

Computational Intelligence and Neuroscience ◽

10.1155/2020/7839064 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7

Author(s):

Xin Long ◽

XiangRong Zeng ◽

Zongcheng Ben ◽

Dianle Zhou ◽

Maojun Zhang

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Computational Cost ◽

Network Models ◽

Memory Consumption ◽

Neural Network Models ◽

The Neural Network ◽

The Neural Networks ◽

Novel Strategy

The increase in sophistication of neural network models in recent years has exponentially expanded memory consumption and computational cost, thereby hindering their applications on ASIC, FPGA, and other mobile devices. Therefore, compressing and accelerating neural networks is necessary. In this study, we introduce a novel strategy to train low-bit networks with weights and activations quantized to several bits and address two corresponding fundamental issues. One is to approximate activations through low-bit discretization to decrease network computational cost and dot-product memory. The other is to specify the weight quantization and update mechanism for discrete weights to avoid gradient mismatch. With quantized low-bit weights and activations, costly full-precision operations are replaced by shift operations. We evaluate the proposed method on common datasets, and results show that this method can dramatically compress the neural network with slight accuracy loss.
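One common way a multiplication becomes a shift is to restrict weights to signed powers of two. The sketch below illustrates that general idea and is not the quantization scheme of the paper: the exponent range, the log-domain rounding rule, the fixed-point format with 4 fractional bits, and the function names are all assumptions.

```python
import math


def quantize_pow2(w, min_exp=-4, max_exp=0):
    """Round a weight to sign * 2**exp (nearest exponent in the log domain).

    Returns (sign, exp); sign is 0 for a zero weight.
    """
    if w == 0:
        return 0, min_exp
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return (1 if w > 0 else -1), exp


def shift_multiply(activation, sign, exp, frac_bits=4):
    """Multiply an integer activation by sign * 2**exp using only a shift.

    The result is a fixed-point integer with frac_bits fractional bits;
    exp must be >= -frac_bits so the shift amount is non-negative.
    """
    return sign * (activation << (exp + frac_bits))


sign_a, exp_a = quantize_pow2(0.26)    # nearest power of two is 0.25
sign_b, exp_b = quantize_pow2(-0.5)

fixed_a = shift_multiply(12, sign_a, exp_a)   # 12 * 0.25 in fixed point
fixed_b = shift_multiply(12, sign_b, exp_b)   # 12 * -0.5 in fixed point

# Convert back to real values by dividing out the 4 fractional bits.
real_a = fixed_a / 16
real_b = fixed_b / 16
```

Storing only a sign and a small exponent per weight reduces memory, and the shift replaces a full-precision multiplier, which is the kind of saving that matters on ASIC and FPGA targets.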
