uniform quantization
Recently Published Documents


Total documents: 138 (five years: 50)

H-index: 11 (five years: 5)

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1699
Author(s):  
Jelena Nikolić ◽  
Zoran Perić ◽  
Danijela Aleksić ◽  
Stefan Tomić ◽  
Aleksandra Jovanović

Driven by the need for the compression of weights in neural networks (NNs), which is especially beneficial for edge devices with constrained resources, and by the need to utilize the simplest possible quantization model, in this paper, we study the performance of three-bit post-training uniform quantization. The goal is to put various choices of the key parameter of the quantizer in question (support region threshold) in one place and provide a detailed overview of this choice’s impact on the performance of post-training quantization for the MNIST dataset. Specifically, we analyze whether it is possible to preserve the accuracy of the two NN models (MLP and CNN) to a great extent with the very simple three-bit uniform quantizer, regardless of the choice of the key parameter. Moreover, our goal is to answer the question of whether it is of the utmost importance in post-training three-bit uniform quantization, as it is in quantization, to determine the optimal support region threshold value of the quantizer to achieve some predefined accuracy of the quantized neural network (QNN). The results show that the choice of the support region threshold value of the three-bit uniform quantizer does not have such a strong impact on the accuracy of the QNNs, which is not the case with two-bit uniform post-training quantization, when applied in MLP for the same classification task. Accordingly, one can anticipate that due to this special property, the post-training quantization model in question can be greatly exploited.
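
As a rough illustration of the quantizer this abstract discusses, the sketch below implements a symmetric mid-rise uniform quantizer whose only design parameter is the support region threshold. It is not the authors' code: the function names, the parameter `x_max` (standing in for the support region threshold), and the mean-squared-error helper are assumptions made here for illustration.

```python
import math

def uniform_quantize(x, x_max, bits=3):
    """Quantize scalar x with a symmetric mid-rise uniform quantizer.

    x_max is the (assumed) support region threshold: inputs outside
    [-x_max, x_max] are clipped before quantization.
    """
    levels = 2 ** bits                 # 8 levels for a three-bit quantizer
    step = 2.0 * x_max / levels        # equal-width cells over the support region
    clipped = max(-x_max, min(x_max, x))
    # Cell index; the upper edge x_max is folded into the last cell.
    index = min(int(math.floor((clipped + x_max) / step)), levels - 1)
    # Each cell is represented by its midpoint.
    return -x_max + (index + 0.5) * step

def quantization_mse(weights, x_max, bits=3):
    """Mean squared error between weights and their quantized values."""
    errors = [(w - uniform_quantize(w, x_max, bits)) ** 2 for w in weights]
    return sum(errors) / len(errors)
```

Evaluating `quantization_mse` over a grid of `x_max` values for a fixed list of weights is one simple way to see how sensitive the distortion is to the threshold choice; accuracy of the quantized network would still have to be measured separately.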


2021 ◽  
Vol 2134 (1) ◽  
pp. 012004
Author(s):  
D Chudakov ◽  
A Goncharenko ◽  
S Alyamkin ◽  
A Densidov

Quantization is one of the most popular and widely used methods of speeding up a neural network. At the moment, the standard is 8-bit uniform quantization. Nevertheless, the use of uniform low-bit quantization (4- and 6-bit quantization) has significant advantages in speed and resource requirements for inference. We present our quantization algorithm that offers advantages when using uniform low-bit quantization. It is faster than quantization-aware training from scratch and more accurate than methods aimed only at selecting thresholds and reducing noise from quantization. We also investigated quantization noise in neural networks for low-bit quantization and concluded that quantization noise is not always a good metric for quantization quality.


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3091
Author(s):  
Jelena Nikolić ◽  
Danijela Aleksić ◽  
Zoran Perić ◽  
Milan Dinčić

Motivated by the fact that uniform quantization is not suitable for signals having non-uniform probability density functions (pdfs), as the Laplacian pdf is, in this paper we have divided the support region of the quantizer into two disjoint regions and utilized the simplest uniform quantization with equal bit-rates within both regions. In particular, we assumed a narrow central granular region (CGR) covering the peak of the Laplacian pdf and a wider peripheral granular region (PGR) where the pdf is predominantly tailed. We performed optimization of the widths of CGR and PGR via distortion optimization per border–clipping threshold scaling ratio, which resulted in an iterative formula enabling the parametrization of our piecewise uniform quantizer (PWUQ). For medium and high bit-rates, we demonstrated the convenience of our PWUQ over the uniform quantizer, paying special attention to the case where 99.99% of the signal amplitudes belong to the support region or clipping region. We believe that the resulting formulas for PWUQ design and performance assessment are greatly beneficial in neural networks where weights and activations are typically modelled by the Laplacian distribution, and where uniform quantization is commonly used to decrease memory footprint.
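
The two-region idea in this abstract can be sketched as follows, under stated assumptions. The border `x_c` between the CGR and PGR, the clipping threshold `x_max`, and the number of cells per region are hypothetical inputs here; the paper's iterative formula for choosing the region widths is not reproduced, and "equal bit-rates" is interpreted as an equal number of cells in each region.

```python
import math

def piecewise_uniform_quantize(x, x_c, x_max, cells_per_side=4):
    """Quantize scalar x with a symmetric two-region piecewise uniform quantizer.

    Assumed layout on the positive half-axis (mirrored for negative x):
      central granular region    [0, x_c)       -> cells_per_side equal cells
      peripheral granular region [x_c, x_max]   -> cells_per_side equal cells
    Inputs with |x| > x_max are clipped to x_max.
    """
    magnitude = min(abs(x), x_max)
    if magnitude < x_c:
        # Narrow cells around the peak of the pdf.
        step = x_c / cells_per_side
        index = min(int(math.floor(magnitude / step)), cells_per_side - 1)
        level = (index + 0.5) * step
    else:
        # Wider cells covering the tail.
        step = (x_max - x_c) / cells_per_side
        index = min(int(math.floor((magnitude - x_c) / step)), cells_per_side - 1)
        level = x_c + (index + 0.5) * step
    # Restore the sign of the input; each cell is represented by its midpoint.
    return math.copysign(level, x)
```

With `x_c = 1.0`, `x_max = 5.0`, and four cells per region, the central cells are 0.25 wide while the peripheral cells are 1.0 wide, so small amplitudes near the peak are represented more finely than amplitudes in the tail.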


2021 ◽  
Author(s):  
Sunwoo Lee ◽  
Jaeyoung Jeon ◽  
Kitae Eom ◽  
Chaehwa Jeong ◽  
Yongsoo Yang ◽  
...  

Memristors are essential elements for hardware implementation of artificial neural networks. The key functionality of the memristors is to realize multiple non-volatile conductance states with high precision. However, the variation of device conductance limits the number of allowed states. Since actual data for neural network training inherently have a non-uniform distribution, the insufficient number of conductance states and the resultant inaccurate weight quantization may generate significant errors in the memristor-based computation. Herein, we demonstrate a multi-level memristor based on two-dimensional electron gas in a Pt/LaAlO3/SrTiO3 heterostructure. By redistributing oxygen vacancies, we precisely controlled the tunneling conductance of the device, achieving multiple conductance states (more than 27). The multi-level switching capability and the high retention performance allow us to implement a variance-aware weight quantization (VAQ), designed for improved computing accuracy. We verify that the VAQ provides greater accuracy in the image classification process than conventional uniform quantization. These results provide valuable insight into developing high-precision multi-bit memristors for practical neuromorphic processors.


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1354
Author(s):  
Qunlin Chen ◽  
Derong Chen ◽  
Jiulu Gong

Block compressed sensing (BCS) is a promising technology for image sampling and compression for resource-constrained applications, but it needs to balance the sampling rate and quantization bit-depth for a bit-rate constraint. In this paper, we summarize the commonly used CS quantization frameworks into a unified framework, and a new bit-rate model and a model of the optimal bit-depth are proposed for the unified CS framework. The proposed bit-rate model reveals the relationship between the bit-rate, sampling rate, and bit-depth based on the information entropy of generalized Gaussian distribution. The optimal bit-depth model can predict the optimal bit-depth of CS measurements at a given bit-rate. Then, we propose a general algorithm for choosing sampling rate and bit-depth based on the proposed models. Experimental results show that the proposed algorithm achieves near-optimal rate-distortion performance for the uniform quantization framework and predictive quantization framework in BCS.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jiayan Wen ◽  
Haijiang Zhang ◽  
Guangxing Tan ◽  
Ning Cai ◽  
Guangming Xie

This article focuses on the circle formation control problem of multiagent systems based on an event-triggered strategy under limited communication bandwidth. In such a system, each agent can only perceive the angular distance of its nearest neighbor in the counterclockwise direction, and the angular distance of the nearest neighbor in the clockwise direction needs to be obtained by communicating with each other. In order to address the aforementioned problem, a novel distributed algorithm based on the combination of nonuniform quantitative communication technology and event-triggered control is proposed. Sufficient conditions on circle formation control are derived under which the states of all agents can be confirmed to converge to some desired equilibrium point. Different from the traditional uniform quantization communication framework, nonuniform quantization can be beneficial for handling small signals and improving the performance of the multiagent systems concerned. Furthermore, under the proposed policy, none of the designed quantizers becomes saturated. Numerical simulation results are provided to verify the effectiveness of the proposed algorithm.


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 933
Author(s):  
Zoran Perić ◽  
Milan Savić ◽  
Nikola Simić ◽  
Bojan Denić ◽  
Vladimir Despotović

Achieving real-time inference is one of the major issues in contemporary neural network applications, as complex algorithms are frequently being deployed to mobile devices that have constrained storage and computing power. Moving from a full-precision neural network model to a lower representation by applying quantization techniques is a popular approach to address this issue. Here, we analyze in detail and design a 2-bit uniform quantization model for a Laplacian source due to its significance in terms of implementation simplicity, which further leads to a shorter processing time and faster inference. The results show that it is possible to achieve high classification accuracy (more than 96% in the case of MLP and more than 98% in the case of CNN) by implementing the proposed model, which is competitive with the performance of the other quantization solutions with almost optimal precision.


2021 ◽  
Author(s):  
Soumyadeep Datta

Cell-free (CF) massive multiple-input-multiple-output (mMIMO) deployments are usually investigated with half-duplex nodes and high-capacity fronthaul links. To leverage the possible gains in throughput and energy efficiency (EE) of full-duplex (FD) communications, we consider an FD CF mMIMO system with practical limited-capacity fronthaul links. We derive closed-form spectral efficiency (SE) lower bounds for this system with maximum-ratio combining/maximum-ratio transmission processing and optimal uniform quantization. We then optimize the weighted sum EE (WSEE) via downlink and uplink power control by using a two-layered approach: the first layer formulates the optimization as a generalized convex program, while the second layer solves the optimization decentrally using the alternating direction method of multipliers. We analytically show that the proposed two-layered formulation yields a Karush-Kuhn-Tucker point of the original WSEE optimization. We numerically show the influence of weights on the individual EE of the users, which demonstrates the utility of the WSEE metric to incorporate heterogeneous EE requirements of users. We show that the low fronthaul capacity reduces the number of users each AP can support, and the cell-free system, consequently, becomes user-centric.



