An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks

Electronics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 371 ◽  
Author(s):  
Qinyu Chen ◽  
Yuxiang Fu ◽  
Wenqing Song ◽  
Kaifeng Cheng ◽  
Zhonghai Lu ◽  
...  

Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition, speech processing, and many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs have been proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse-grain task partitioning (CGTP) strategy, the proposed accelerator with heterogeneous computing units, supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. Besides, a hardware-friendly algorithm is proposed to simplify the activation and quantification process, which reduces power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantification-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantification, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture. The accelerator is implemented in TSMC 40 nm technology with a core size of 0.17 mm². It achieves 7.03 TOPS/W energy efficiency and 4.14 TOPS/mm² area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
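The fused activation-quantification-pooling idea can be sketched in software. The NumPy function below is an illustrative model only: the function name, the 2-bit width, and the 2×2 pooling window are assumptions for the example, not the paper's AQP hardware datapath.

```python
import numpy as np

def activate_quantize_pool(x, bits=2, pool=2):
    """Illustrative fused ReLU -> uniform quantization -> max-pooling step.

    x: 2D feature map (float). Returns low bit-width integer codes.
    """
    x = np.maximum(x, 0.0)                             # activation (ReLU)
    levels = 2 ** bits - 1
    scale = x.max() if x.max() > 0 else 1.0
    q = np.round(x / scale * levels).astype(np.int32)  # quantize to `bits` bits
    h, w = q.shape
    q = q[:h - h % pool, :w - w % pool]
    # max-pooling over non-overlapping pool x pool windows
    return q.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))
```

Because max-pooling commutes with the monotone activation and quantization steps, the three stages can be evaluated in one pass, which is what makes a fused three-stage unit attractive in hardware.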


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Hyungmin Cho

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computational load and the number of parameters compared to conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture that exploits the high data-reuse factor of DNN computations, such as a systolic array. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts PE utilization for depthwise convolutions on a systolic array with minimal overheads. In addition, the PEs in systolic arrays can be efficiently used only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange the data items and manage data movements during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself that eliminates the need for an additional module for tensor reshaping. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively.
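The low data reuse of depthwise convolutions follows directly from their parameter and operation counts. A quick back-of-the-envelope comparison (generic formulas for a depthwise-separable layer, not anything RiSA-specific):

```python
def conv_params(cin, cout, k):
    """Parameter count of a standard k x k convolution layer (no bias)."""
    return cin * cout * k * k

def depthwise_separable_params(cin, cout, k):
    """Depthwise k x k conv (one filter per channel) + 1x1 pointwise conv."""
    return cin * k * k + cin * cout

# Example: a 256 -> 256 channel layer with 3x3 kernels
std = conv_params(256, 256, 3)                 # 589,824 parameters
dws = depthwise_separable_params(256, 256, 3)  # 67,840 parameters
print(std / dws)                               # roughly 8.7x fewer parameters
```

Each depthwise weight is used by only one input channel, so far fewer multiply-accumulates share each loaded operand, which is why a systolic array built for dense convolutions sits partly idle.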



Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3768
Author(s):  
Chanjun Chun ◽  
Kwang Myung Jeon ◽  
Wooyeol Choi

Deep neural networks (DNNs) have achieved significant advancements in speech processing, and numerous types of DNN architectures have been proposed in the field of sound localization. When a DNN model is deployed for sound localization, a fixed input size is required. This is generally determined by the number of microphones, the fast Fourier transform size, and the frame size. If the number or configuration of the microphones changes, the DNN model must be retrained because the size of the input features changes. In this paper, we propose a configuration-invariant sound localization technique using the azimuth-frequency representation and convolutional neural networks (CNNs). The proposed CNN model receives the azimuth-frequency representation instead of time-frequency features as its input. The proposed model was evaluated with microphone configurations different from the one on which it was originally trained. For evaluation, a single sound source was simulated using the image method. Through these evaluations, it was confirmed that the localization performance was superior to the conventional steered response power phase transform (SRP-PHAT) and multiple signal classification (MUSIC) methods.
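A minimal sketch of an azimuth-frequency map, assuming simple delay-and-sum steering per frequency bin (the paper's exact representation may differ). The key property illustrated is that the output shape depends only on the azimuth grid and the frequency bins, not on the number of microphones:

```python
import numpy as np

def azimuth_frequency_map(stft, mic_xy, freqs, azimuths, c=343.0):
    """Illustrative azimuth-frequency representation.

    stft:     (num_mics, num_freqs) complex spectra for one frame
    mic_xy:   (num_mics, 2) microphone positions in metres
    freqs:    (num_freqs,) frequency bin centres in Hz
    azimuths: (num_az,) candidate directions in radians
    Returns an (num_az, num_freqs) map whose shape is independent of
    the number of microphones.
    """
    out = np.zeros((len(azimuths), len(freqs)))
    for i, az in enumerate(azimuths):
        direction = np.array([np.cos(az), np.sin(az)])
        delays = mic_xy @ direction / c                  # per-mic delay (s)
        # steer each mic's spectrum toward the candidate direction and sum
        steer = np.exp(2j * np.pi * np.outer(delays, freqs))
        out[i] = np.abs((stft * steer).sum(axis=0))
    return out
```

Because the CNN consumes this fixed-size (azimuth × frequency) map, microphones can be added, removed, or rearranged without changing the network's input dimensions.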



2021 ◽  
Author(s):  
Mathieu de Bony de Lavergne ◽  
Hyuga Abe ◽  
Arnau Aguasca ◽  
Ivan Agudo ◽  
Lucio Angelo Antonelli ◽  
...  


2021 ◽  
Author(s):  
Weichao Lan ◽  
Yiu-ming Cheung ◽  
Liang Lan

Existing convolutional neural networks (CNNs) have achieved significant performance on various real-life tasks, but the large number of parameters in convolutional layers requires huge storage and computation resources, which makes it difficult to deploy CNNs on memory-constrained embedded devices. In this paper, we propose a novel compression method that generates the convolution filters in each layer by combining a set of learnable low-dimensional binary filter bases. The proposed method designs more compact convolution filters by stacking linear combinations of these filter bases. Because the filter bases are binary, the compact filters can be represented using fewer bits, so the network can be highly compressed. Furthermore, we promote the sparsity of the combination coefficients through L1-ball projection to avoid overfitting. In addition, we analyze the compression performance of the proposed method in detail. Evaluations on four benchmark datasets with VGG-16 and ResNet-18 structures show that the proposed method achieves a higher compression ratio with comparable accuracy compared with existing state-of-the-art filter decomposition and network quantization methods.
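The two ingredients, combining binary bases and projecting the coefficients onto an L1 ball, can be sketched as follows. Function names are illustrative, and the projection uses the standard sort-based algorithm, which may differ in detail from the authors' implementation:

```python
import numpy as np

def combine_binary_bases(coeffs, bases):
    """Reconstruct a convolution filter as a linear combination of
    binary (+1/-1) filter bases: W = sum_k coeffs[k] * bases[k]."""
    return np.tensordot(coeffs, bases, axes=1)

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball of the given radius
    (sort-based method), used to keep the coefficients sparse."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Storing each basis at 1 bit per weight plus a short coefficient vector is what yields the compression: a full-precision filter costs 32 bits per weight, while k binary bases cost k bits per weight plus k scalars per filter.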





Author(s):  
Ivan Rodriguez-Conde ◽  
Celso Campos ◽  
Florentino Fdez-Riverola

Convolutional neural networks have pushed forward image analysis research and computer vision over the last decade, constituting a state-of-the-art approach in object detection today. The design of increasingly deeper and wider architectures has made it possible to achieve unprecedented levels of detection accuracy, albeit at the cost of both a dramatic computational burden and a large memory footprint. In such a context, cloud systems have become a mainstream technological solution due to their tremendous scalability, providing researchers and practitioners with virtually unlimited resources. However, these resources are typically made available as remote services, requiring communication over the network to be accessed, thus compromising the speed of response, availability, and security of the implemented solution. In view of these limitations, the on-device paradigm has emerged as a recent yet widely explored alternative, pursuing more compact and efficient networks to ultimately enable the execution of the derived models directly on resource-constrained client devices. This study provides an up-to-date review of the more relevant scientific research carried out in this vein, circumscribed to the object detection problem. In particular, the paper contributes to the field with a comprehensive architectural overview of both the existing lightweight object detection frameworks targeted to mobile and embedded devices, and the underlying convolutional neural networks that make up their internal structure. More specifically, it addresses the main structural-level strategies used for conceiving the various components of a detection pipeline (i.e., backbone, neck, and head), as well as the most salient techniques proposed for adapting such structures and the resulting architectures to more austere deployment environments.
Finally, the study concludes with a discussion of the specific challenges and next steps to be taken to move toward a more convenient accuracy–speed trade-off.



2020 ◽  
Author(s):  
Jun Rong Ong ◽  
Thomas Yong Long Ang ◽  
Chin Chun Ooi ◽  
Soon Thor Lim ◽  
Ching Eng Png

With recent rapid advances in photonic integrated circuits, it has been demonstrated that programmable photonic chips can be used to implement artificial neural networks. Convolutional neural networks (CNNs) are a class of deep learning methods that have been highly successful in applications such as image classification and speech processing. We present an architecture to implement a photonic CNN using the Fourier transform property of integrated star couplers. We show, in computer simulation, high-accuracy image classification using the MNIST dataset. We also model component imperfections in the photonic CNN and show that the performance degradation can be recovered in a programmable chip. Our proposed architecture provides a large reduction in physical footprint compared to current implementations as it utilizes the natural advantages of optics, and hence offers a scalable pathway towards integrated photonic deep learning processors.
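The Fourier-transform property that the star couplers realize optically is the classical convolution theorem; in software the same identity looks like this (a generic NumPy illustration, not the photonic model itself):

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Linear convolution via the convolution theorem: an element-wise
    product in the Fourier domain, inverse-transformed back."""
    n = len(signal) + len(kernel) - 1          # length of the full convolution
    return np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, -1.0])
print(fft_convolve(x, k))                      # same result as np.convolve(x, k)
```

Replacing the electronic FFT with a passive optical Fourier transform turns the most expensive part of this computation into a fixed, energy-free structure, which is the footprint advantage the abstract refers to.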



2020 ◽  
Vol 34 (04) ◽  
pp. 4174-4181 ◽  
Author(s):  
Di Huang ◽  
Xishan Zhang ◽  
Rui Zhang ◽  
Tian Zhi ◽  
Deyuan He ◽  
...  

Winograd's minimal filtering algorithm has been widely used in Convolutional Neural Networks (CNNs) to reduce the number of multiplications for faster processing. However, it is only effective for convolutions with a kernel size of 3×3 and a stride of 1, because it suffers from significantly increased FLOPs and numerical accuracy problems for kernel sizes larger than 3×3, and it fails for convolutions with a stride larger than 1. In this paper, we propose a novel Decomposable Winograd Method (DWM), which extends the original Winograd minimal filtering algorithm to a wide range of general convolutions. DWM decomposes kernels with a large size or a large stride into several small stride-1 kernels to which the Winograd method can be applied, so that DWM reduces the number of multiplications while preserving numerical accuracy. It enables the fast exploration of larger kernel sizes and larger stride values in CNNs for high performance and accuracy, and even the potential for new CNNs. Compared with the original Winograd algorithm, the proposed DWM supports all kinds of convolutions with a speedup of ∼2× without affecting numerical accuracy.
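For reference, the baseline Winograd F(2,3) minimal filtering that DWM builds on computes two outputs of a 3-tap, stride-1 convolution with 4 multiplications instead of 6, using the standard transform matrices (this sketch shows the baseline algorithm, not DWM's decomposition itself):

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap stride-1 convolution
# with 4 element-wise multiplications instead of 6.
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # filter transform
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs (stride-1, valid)."""
    m = (G @ g) * (Bt @ d)      # the 4 multiplications
    return At @ m

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
print(winograd_f23(d, g))       # [6. 9.], same as direct correlation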



2020 ◽  
Vol 16 (3) ◽  
pp. 177-187
Author(s):  
Arighna Roy ◽  
Simone A. Ludwig

With the surge of computational power and efficient energy consumption management on embedded devices, embedded processing has grown exponentially during the last decade. In particular, computer vision has become prevalent in real-time embedded systems, which are vulnerable to transient faults due to their pervasive presence in harsh environments. Convolutional Neural Networks (CNNs) are popular in the domain of embedded vision (computer vision in embedded systems) given the success they have shown. One problem encountered is that a pre-trained CNN on embedded devices can be severely affected by Silent Data Corruption (SDC). SDC refers to data corruption that causes errors without any indication that the data is incorrect, and thus goes undetected. In this paper, we propose a software-based approach to recover the bits of a pre-trained CNN corrupted by SDC. Our approach uses a rule-mining algorithm, and we conduct experiments on the propagation of errors through the topology of the CNN in order to detect the association of the bits of the weights of the pre-trained CNN. This approach increases the robustness of safety-critical embedded vision applications in volatile conditions. A proof of concept has been conducted for a combination of a CNN and a vision dataset, successfully establishing the effectiveness of this approach even for very high levels of SDC. The proposed approach can further be extended to other networks and datasets.
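A silent bit flip in a stored float32 weight can be mimicked in a few lines (an illustrative fault-injection sketch, not the paper's rule-mining recovery method):

```python
import numpy as np

def flip_bit(weight, bit):
    """Flip a single bit of a float32 weight, mimicking silent data corruption."""
    bits = np.array([weight], dtype=np.float32).view(np.uint32)
    bits ^= np.uint32(1 << bit)                 # XOR toggles the chosen bit
    return float(bits.view(np.float32)[0])

w = 0.5
corrupted = flip_bit(w, 30)   # flipping the most significant exponent bit
print(w, corrupted)           # a small weight becomes an enormous value
```

The example shows why single flips matter: a flip in the exponent field changes a weight by many orders of magnitude, so errors propagate strongly through subsequent layers, which is the propagation behavior the paper's experiments trace.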




