An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks

Electronics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 371 ◽  
Author(s):  
Qinyu Chen ◽  
Yuxiang Fu ◽  
Wenqing Song ◽  
Kaifeng Cheng ◽  
Zhonghai Lu ◽  
...  

Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition, speech processing, and many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs have been proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse-grain task partitioning (CGTP) strategy, the proposed accelerator with heterogeneous computing units, supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. Besides, a hardware-friendly algorithm is proposed to simplify the activation and quantification process, which reduces power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantification-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantification, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture. The accelerator is implemented in TSMC 40 nm technology with a core size of 0.17 mm². It achieves 7.03 TOPS/W energy efficiency and 4.14 TOPS/mm² area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
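The fused activation-quantification-pooling idea can be sketched in software. The NumPy function below is an illustrative model only: the function name, the 2-bit width, and the 2×2 pooling window are assumptions for the example, not the paper's AQP hardware datapath.

```python
import numpy as np

def activate_quantize_pool(x, bits=2, pool=2):
    """Illustrative fused ReLU -> uniform quantization -> max-pooling step.

    x: 2D feature map (float). Returns low bit-width integer codes.
    """
    x = np.maximum(x, 0.0)                             # activation (ReLU)
    levels = 2 ** bits - 1
    scale = x.max() if x.max() > 0 else 1.0
    q = np.round(x / scale * levels).astype(np.int32)  # quantize to `bits` bits
    h, w = q.shape
    q = q[:h - h % pool, :w - w % pool]
    # max-pooling over non-overlapping pool x pool windows
    return q.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
```

Because max-pooling commutes with the monotone activation and quantization steps, the three stages can be evaluated in one pass, which is what makes a fused three-stage unit attractive in hardware.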


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Hyungmin Cho

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computational load and the number of parameters compared to conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture that exploits the high data-reuse factor of DNN computations, such as a systolic array. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts PE utilization for depthwise convolutions on a systolic array with minimal overheads. In addition, the PEs in systolic arrays can be efficiently used only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange the data items and manage data movements during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself that eliminates the need for an additional module for tensor reshaping. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively.
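The low data reuse of depthwise convolutions follows directly from their parameter and operation counts. A quick back-of-the-envelope comparison (generic formulas for a depthwise-separable layer, not anything RiSA-specific):

```python
def conv_params(cin, cout, k):
    """Parameter count of a standard k x k convolution layer (no bias)."""
    return cin * cout * k * k

def depthwise_separable_params(cin, cout, k):
    """Depthwise k x k conv (one filter per channel) + 1x1 pointwise conv."""
    return cin * k * k + cin * cout

# Example: a 256 -> 256 channel layer with 3x3 kernels
std = conv_params(256, 256, 3)                 # 589,824 parameters
dws = depthwise_separable_params(256, 256, 3)  # 67,840 parameters
print(std / dws)                               # roughly 8.7x fewer parameters
```

Each depthwise weight is used by only one input channel, so far fewer multiply-accumulates share each loaded operand, which is why a systolic array built for dense convolutions sits partly idle.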



Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3768
Author(s):  
Chanjun Chun ◽  
Kwang Myung Jeon ◽  
Wooyeol Choi

Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required. This is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation instead of time-frequency features as its input. The proposed model was evaluated with microphone configurations different from the one on which it was originally trained. For evaluation, a single sound source was simulated using the image method. Through these evaluations, it was confirmed that the localization performance was superior to the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods.
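A minimal sketch of an azimuth-frequency map, assuming simple delay-and-sum steering per frequency bin (the paper's exact representation may differ). The key property illustrated is that the output shape depends only on the azimuth grid and the frequency bins, not on the number of microphones:

```python
import numpy as np

def azimuth_frequency_map(stft, mic_xy, freqs, azimuths, c=343.0):
    """Illustrative azimuth-frequency representation.

    stft:     (num_mics, num_freqs) complex spectra for one frame
    mic_xy:   (num_mics, 2) microphone positions in metres
    freqs:    (num_freqs,) frequency bin centres in Hz
    azimuths: (num_az,) candidate directions in radians
    Returns an (num_az, num_freqs) map whose shape is independent of
    the number of microphones.
    """
    out = np.zeros((len(azimuths), len(freqs)))
    for i, az in enumerate(azimuths):
        direction = np.array([np.cos(az), np.sin(az)])
        delays = mic_xy @ direction / c                  # per-mic delay (s)
        # steer each mic's spectrum toward the candidate direction and sum
        steer = np.exp(2j * np.pi * np.outer(delays, freqs))
        out[i] = np.abs((stft * steer).sum(axis=0))
    return out
```

Because the CNN consumes this fixed-size (azimuth × frequency) map, microphones can be added, removed, or rearranged without changing the network's input dimensions.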



2021 ◽  
Author(s):  
Mathieu de Bony de Lavergne ◽  
Hyuga Abe ◽  
Arnau Aguasca ◽  
Ivan Agudo ◽  
Lucio Angelo Antonelli ◽  
...  


2021 ◽  
Author(s):  
Weichao Lan ◽  
Yiu-ming Cheung ◽  
Liang Lan

Existing convolutional neural networks (CNNs) have achieved significant performance on various real-life tasks, but the large number of parameters in convolutional layers requires huge storage and computation resources, which makes it difficult to deploy CNNs on memory-constrained embedded devices. In this paper, we propose a novel compression method that generates the convolution filters in each layer by combining a set of learnable low-dimensional binary filter bases. The proposed method designs more compact convolution filters by stacking linear combinations of these filter bases. Because the filter bases are binary, the compact filters can be represented using fewer bits, so the network can be highly compressed. Furthermore, we promote the sparsity of the combination coefficients through L1-ball projection to avoid overfitting. In addition, we analyze the compression performance of the proposed method in detail. Evaluations on four benchmark datasets with VGG-16 and ResNet-18 structures show that the proposed method achieves a higher compression ratio with comparable accuracy compared with existing state-of-the-art filter decomposition and network quantization methods.
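The two ingredients, combining binary bases and projecting the coefficients onto an L1 ball, can be sketched as follows. Function names are illustrative, and the projection uses the standard sort-based algorithm, which may differ in detail from the authors' implementation:

```python
import numpy as np

def combine_binary_bases(coeffs, bases):
    """Reconstruct a convolution filter as a linear combination of
    binary (+1/-1) filter bases: W = sum_k coeffs[k] * bases[k]."""
    return np.tensordot(coeffs, bases, axes=1)

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball of the given radius
    (sort-based method), used to keep the coefficients sparse."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Storing each basis at 1 bit per weight plus a short coefficient vector is what yields the compression: a full-precision filter costs 32 bits per weight, while k binary bases cost k bits per weight plus k scalars per filter.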





Author(s):  
Ivan Rodriguez-Conde ◽  
Celso Campos ◽  
Florentino Fdez-Riverola

Convolutional neural networks have pushed forward image analysis research and computer vision over the last decade, constituting a state-of-the-art approach in object detection today. The design of increasingly deeper and wider architectures has made it possible to achieve unprecedented levels of detection accuracy, albeit at the cost of both a dramatic computational burden and a large memory footprint. In such a context, cloud systems have become a mainstream technological solution due to their tremendous scalability, providing researchers and practitioners with virtually unlimited resources. However, these resources are typically made available as remote services, requiring communication over the network to be accessed, thus compromising the speed of response, availability, and security of the implemented solution. In view of these limitations, the on-device paradigm has emerged as a recent yet widely explored alternative, pursuing more compact and efficient networks to ultimately enable the execution of the derived models directly on resource-constrained client devices. This study provides an up-to-date review of the more relevant scientific research carried out in this vein, circumscribed to the object detection problem. In particular, the paper contributes to the field with a comprehensive architectural overview of both the existing lightweight object detection frameworks targeted to mobile and embedded devices, and the underlying convolutional neural networks that make up their internal structure. More specifically, it addresses the main structural-level strategies used for conceiving the various components of a detection pipeline (i.e., backbone, neck, and head), as well as the most salient techniques proposed for adapting such structures and the resulting architectures to more austere deployment environments.
Finally, the study concludes with a discussion of the specific challenges and next steps to be taken to move toward a more convenient accuracy–speed trade-off.



2020 ◽  
Author(s):  
Jun Rong Ong ◽  
Thomas Yong Long Ang ◽  
Chin Chun Ooi ◽  
Soon Thor Lim ◽  
Ching Eng Png

With recent rapid advances in photonic integrated circuits, it has been demonstrated that programmable photonic chips can be used to implement artificial neural networks. Convolutional neural networks (CNNs) are a class of deep learning methods that have been highly successful in applications such as image classification and speech processing. We present an architecture to implement a photonic CNN using the Fourier transform property of integrated star couplers. We show, in computer simulation, high-accuracy image classification using the MNIST dataset. We also model component imperfections in the photonic CNN and show that the performance degradation can be recovered in a programmable chip. Our proposed architecture provides a large reduction in physical footprint compared to current implementations as it utilizes the natural advantages of optics, and hence offers a scalable pathway towards integrated photonic deep learning processors.
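The Fourier-transform property that the star couplers realize optically is the classical convolution theorem; in software the same identity looks like this (a generic NumPy illustration, not the photonic model itself):

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Linear convolution via the convolution theorem: an element-wise
    product in the Fourier domain, inverse-transformed back."""
    n = len(signal) + len(kernel) - 1          # length of the full convolution
    return np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, -1.0])
print(fft_convolve(x, k))                      # same result as np.convolve(x, k)
```

Replacing the electronic FFT with a passive optical Fourier transform turns the most expensive part of this computation into a fixed, energy-free structure, which is the footprint advantage the abstract refers to.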



2020 ◽  
Vol 34 (04) ◽  
pp. 4174-4181 ◽  
Author(s):  
Di Huang ◽  
Xishan Zhang ◽  
Rui Zhang ◽  
Tian Zhi ◽  
Deyuan He ◽  
...  

Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective for convolutions with a kernel size of 3×3 and a stride of 1, because it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3×3, and it fails for convolutions with a stride larger than 1. In this paper, we propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd minimal filtering algorithm to a wide range of general convolutions. DWM decomposes kernels with a large size or a large stride into several small stride-1 kernels to which the Winograd method can be applied, so that DWM reduces the number of multiplications while preserving numerical accuracy. It enables the fast exploration of larger kernel sizes and larger stride values in CNNs for high performance and accuracy, and even the potential for new CNNs. Compared with the original Winograd algorithm, the proposed DWM supports all kinds of convolutions with a speedup of ∼2× without affecting numerical accuracy.
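For reference, the baseline Winograd F(2,3) minimal filtering that DWM builds on computes two outputs of a 3-tap, stride-1 convolution with 4 multiplications instead of 6, using the standard transform matrices (this sketch shows the baseline algorithm, not DWM's decomposition itself):

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap stride-1 convolution
# with 4 element-wise multiplications instead of 6.
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # filter transform
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs (stride-1, valid)."""
    m = (G @ g) * (Bt @ d)      # the 4 multiplications
    return At @ m

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(winograd_f23(d, g))       # [6. 9.], same as direct correlation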



2020 ◽  
Vol 16 (3) ◽  
pp. 177-187
Author(s):  
Arighna Roy ◽  
Simone A. Ludwig

With the surge of computational power and efficient energy consumption management on embedded devices, embedded processing has grown exponentially during the last decade. In particular, computer vision has become prevalent in real-time embedded systems, which are vulnerable to transient faults due to their pervasive presence in harsh environments. Convolutional Neural Networks (CNNs) are popular in the domain of embedded vision (computer vision in embedded systems) given the success they have shown. One problem encountered is that a pre-trained CNN on embedded devices can be severely affected by Silent Data Corruption (SDC). SDC refers to data corruption that causes errors without any indication that the data is incorrect, and thus goes undetected. In this paper, we propose a software-based approach to recover the bits of a pre-trained CNN corrupted by SDC. Our approach uses a rule-mining algorithm, and we conduct experiments on the propagation of errors through the topology of the CNN in order to detect the association of the bits of the weights of the pre-trained CNN. This approach increases the robustness of safety-critical embedded vision applications in volatile conditions. A proof of concept has been conducted for a combination of a CNN and a vision dataset, successfully establishing the effectiveness of this approach even for very high levels of SDC. The proposed approach can further be extended to other networks and datasets.
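A silent bit flip in a stored float32 weight can be mimicked in a few lines (an illustrative fault-injection sketch, not the paper's rule-mining recovery method):

```python
import numpy as np

def flip_bit(weight, bit):
    """Flip a single bit of a float32 weight, mimicking silent data corruption."""
    bits = np.array([weight], dtype=np.float32).view(np.uint32)
    bits ^= np.uint32(1 << bit)                 # XOR toggles the chosen bit
    return float(bits.view(np.float32)[0])

w = 0.5
corrupted = flip_bit(w, 30)   # flipping the most significant exponent bit
print(w, corrupted)           # a small weight becomes an enormous value
```

The example shows why single flips matter: a flip in the exponent field changes a weight by many orders of magnitude, so errors propagate strongly through subsequent layers, which is the propagation behavior the paper's experiments trace.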




