An Accurate and Efficient Nonlinear Depth Quantization Scheme

Author(s):  
Jian Jin ◽  
Yao Zhao ◽  
Chunyu Lin ◽  
Anhong Wang
2021 ◽  
Vol 14 (4) ◽  
pp. 1-28
Author(s):  
Tao Yang ◽  
Zhezhi He ◽  
Tengchuan Kou ◽  
Qingzheng Li ◽  
Qi Han ◽  
...  

The Field-programmable Gate Array (FPGA) is a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies prune the weights in the Winograd domain, but this produces irregular sparse patterns that lead to low parallelism and reduced resource utilization. Moreover, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of irregular sparse patterns. Then, we develop a two-step hardware co-optimization approach to improve the model accuracy using the SRBS pattern. Based on the pruned model, we implement mixed-precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and the mixed-precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, and 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves a 4.11× speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.
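
The abstract does not define the SRBS pattern, so the following is only a minimal sketch of one plausible reading: each weight row is split into contiguous sub-rows of equal length, and every sub-row keeps the same number of largest-magnitude weights. The function name `prune_sub_row_balanced` and the parameters `sub_len` and `keep` are hypothetical and are not taken from the paper.

```python
def prune_sub_row_balanced(row, sub_len, keep):
    """Zero all but the `keep` largest-magnitude weights in each sub-row.

    Assumption (not from the paper): a "sub-row" is a contiguous chunk
    of `sub_len` weights, and balance means equal non-zero counts.
    """
    if len(row) % sub_len != 0:
        raise ValueError("row length must be a multiple of sub_len")
    pruned = []
    for start in range(0, len(row), sub_len):
        chunk = row[start:start + sub_len]
        # Positions in this chunk, ordered by decreasing magnitude.
        order = sorted(range(sub_len), key=lambda i: abs(chunk[i]), reverse=True)
        kept = set(order[:keep])
        pruned.extend(w if i in kept else 0.0 for i, w in enumerate(chunk))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
sparse = prune_sub_row_balanced(row, sub_len=4, keep=2)
print(sparse)
```

Because every sub-row ends up with the same number of non-zeros, a hardware lane assigned to one sub-row would do the same amount of work as its neighbours, which is the kind of regularity the abstract contrasts with irregular sparsity.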


Information ◽  
2018 ◽  
Vol 9 (12) ◽  
pp. 313 ◽  
Author(s):  
Liu Jun ◽  
Luo Zhongqiang ◽  
Xiong Xingzhong

An important goal of next-generation (5G) and beyond mobile communication systems is to provide thousand-fold capacity growth and to support high-speed data transmission up to several megabits per second. However, the research community and industry face a dilemma between power consumption and hardware design in satisfying the increasing communication requirements. To reduce system cost, power consumption, and implementation complexity, this paper proposes a novel scheme of symbol timing and frequency offset estimation with low-resolution analog-to-digital converters (ADCs) based on an orthogonal frequency division multiplexing ultra-wideband (OFDM-UWB) system. In our work, we first verified the principle that the autocorrelation of pseudo-noise (PN) sequences is not affected by low-resolution quantization. With the help of this property, timing synchronization can be made robust against the influence of low-resolution quantization. Then, the transmitted signal structure and the low-resolution quantization scheme under the synchronization scheme were designed. Finally, a frequency offset estimation model with one-bit timing synchronization was established. Theoretical analysis and simulation results corroborate that the proposed scheme not only approaches the performance of the full-resolution synchronization scheme, but also has lower power consumption and computational complexity.
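
The property that PN autocorrelation survives coarse quantization can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's synchronization scheme: the length-127 m-sequence, the Gaussian filler around the preamble, the noise level, and the helper names `m_sequence`, `one_bit`, and `estimate_timing` are all illustrative choices.

```python
import random


def m_sequence(degree=7, taps=(7, 6)):
    """One period of a maximal-length PN sequence, as +1/-1 chips."""
    state = [1] * degree
    chips = []
    for _ in range(2 ** degree - 1):
        chips.append(1 if state[-1] else -1)
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        # Shift the register and feed the XOR of the tapped stages back in.
        state = [feedback] + state[:-1]
    return chips


def one_bit(samples):
    """One-bit quantizer: keep only the sign of each sample."""
    return [1 if s >= 0 else -1 for s in samples]


def estimate_timing(received, reference):
    """Return the lag that maximizes the sliding correlation."""
    n = len(reference)
    best_lag, best_val = 0, float("-inf")
    for lag in range(len(received) - n + 1):
        val = sum(received[lag + i] * reference[i] for i in range(n))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


rng = random.Random(0)
pn = m_sequence()
true_offset = 40   # assumed position of the preamble in the frame
noise_std = 0.3    # assumed channel noise level

# Hypothetical frame: Gaussian filler, the PN preamble, more filler.
frame = ([rng.gauss(0, 1) for _ in range(true_offset)]
         + [float(c) for c in pn]
         + [rng.gauss(0, 1) for _ in range(60)])
received = [x + rng.gauss(0, noise_std) for x in frame]

full_res = estimate_timing(received, pn)
quantized = estimate_timing(one_bit(received), pn)
print(full_res, quantized)
```

Both estimates should land on the same lag, because sign quantization only flips the few chips where the noise exceeds the chip amplitude, leaving the correlation peak far above the sidelobes.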


Author(s):  
Yuanrui Dong ◽  
Peng Zhao ◽  
Hanqiao Yu ◽  
Cong Zhao ◽  
Shusen Yang

The emerging edge-cloud collaborative Deep Learning (DL) paradigm aims at improving the performance of practical DL implementations in terms of cloud bandwidth consumption, response latency, and data privacy preservation. Focusing on bandwidth-efficient edge-cloud collaborative training of DNN-based classifiers, we present CDC, a Classification Driven Compression framework that reduces bandwidth consumption while preserving the classification accuracy of edge-cloud collaborative DL. Specifically, to reduce bandwidth consumption for resource-limited edge servers, we develop a lightweight autoencoder whose compression is guided by classification so that classification-relevant features are preserved, which allows edges to upload only the latent code of raw data for accurate global training in the cloud. Additionally, we design an adjustable quantization scheme that adaptively pursues the tradeoff between bandwidth consumption and classification accuracy under different network conditions, where only fine-tuning is required for rapid compression ratio adjustment. Results of extensive experiments demonstrate that, compared with DNN training with raw data, CDC consumes 14.9× less bandwidth with an accuracy loss of no more than 1.06%, and compared with DNN training with data compressed by AE without guidance, CDC introduces at least 100% lower accuracy loss.
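
The abstract does not specify how the adjustable quantizer works. As a generic illustration of the bandwidth/accuracy tradeoff it describes, the sketch below applies a plain uniform quantizer to a latent vector at several bit widths. The functions `quantize` and `dequantize` and the sample latent values are assumptions for illustration and are not CDC's actual scheme.

```python
def quantize(values, bits):
    """Uniformly quantize `values` to integer codes of `bits` bits each."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Map integer codes back to approximate real values."""
    return [lo + c * step for c in codes]


# Hypothetical latent code produced by an edge-side encoder.
latent = [-1.0, -0.35, 0.0, 0.42, 0.9, 1.0]

for bits in (2, 4, 8):
    codes, lo, step = quantize(latent, bits)
    restored = dequantize(codes, lo, step)
    max_err = max(abs(a - b) for a, b in zip(latent, restored))
    # Payload size grows linearly with bits; the error bound is step / 2.
    print(bits, len(latent) * bits, max_err)
```

Lowering `bits` shrinks the uploaded payload at the cost of a larger reconstruction error, which is the knob an adaptive scheme would turn as network conditions change.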


2020 ◽  
Vol 54 (2) ◽  
pp. 203-210
Author(s):  
A.E. Eremenko

In this paper, A. Avila's theorem on convergence of the exact quantization scheme of A. Voros is related to the reality proofs of eigenvalues of certain $PT$-symmetric boundary value problems. As a result, a special case of a conjecture of C. Bender, S. Boettcher and P. Meisinger on reality of eigenvalues is proved. In particular the following Theorem 2 is proved: Consider the eigenvalue problem $$-w''+(-1)^\ell(iz)^m w=\lambda w,$$ where $m\geq 2$ is real, and $(iz)^m$ is the principal branch, $(iz)^m>0$ when $z$ is on the negative imaginary ray, with boundary conditions $w(te^{i\beta})\to 0,\ t\to\infty,$ where $\beta=\pi/2\pm\frac{\ell+1}{m+2}\pi.$ If $\ell=2$, and $m\geq 4$, then all eigenvalues are positive.
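
As a sanity check on the statement, and not as part of the paper's argument, the boundary case $\ell=2$, $m=4$ can be worked out directly. Here $(-1)^\ell=1$ and $(iz)^4=z^4$, and the boundary rays are $$\beta=\frac{\pi}{2}\pm\frac{2+1}{4+2}\pi=\frac{\pi}{2}\pm\frac{\pi}{2}\in\{0,\pi\},$$ so the problem reduces to $$-w''+z^4 w=\lambda w,\qquad w(t)\to 0\ \text{and}\ w(-t)\to 0\ \text{as}\ t\to\infty.$$ This is the quartic oscillator on the real line, a self-adjoint problem with non-negative potential, whose eigenvalues are indeed positive, in agreement with the theorem.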


2018 ◽  
Vol 4 (1) ◽  
pp. 3 ◽  
Author(s):  
Run Cheng ◽  
Yong-Long Wang ◽  
Hua Jiang ◽  
Xiao-Jun Liu ◽  
Hong-Shi Zong

In the spirit of the thin-layer quantization scheme, we give the effective Schrödinger equation for a particle confined to a corrugated torus, in which the geometric potential is substantially changed by corrugation. We find that the attractive wells reconstructed by the corrugation are not at identical depths, which is strikingly different from the case of a corrugated nanotube, especially on the inner side of the torus. By numerically calculating the transmission probability, we find that the resonant tunneling peaks and the transmission gaps are merged and broadened by the corrugation of the inner side of the torus. These results show that the quarter corrugated torus can be used not only to connect two tubes with different radii in different directions, but also to filter particles with particular incident energies.

