Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss

Author(s):  
Feng Zheng ◽  
Cheng Deng ◽  
Heng Huang

To address the problems of memory consumption and computational requirements, this paper proposes a novel binary neural network learning framework for resource-efficient deep hashing. In contrast to floating-point (32-bit) full-precision networks, the proposed method achieves a 32x model compression rate. At the same time, the computational burden of convolution is greatly reduced thanks to efficient Boolean operations. To this end, our framework minimizes a new quantization loss defined between the binary weights and the learned real values in order to reduce model distortion, while minimizing a binary entropy function avoids the discrete optimization and allows the stochastic gradient descent method to be used smoothly. More importantly, we provide two theoretical analyses that demonstrate the necessity and effectiveness of minimizing the quantization losses for both weights and activations. Numerous experiments show that the proposed method achieves fast code generation without sacrificing accuracy.
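As a rough illustration of the two ingredients named in the abstract, the sketch below (plain NumPy, all function names hypothetical) pairs a quantization loss between learned real-valued weights and their sign-binarized counterparts with a binary-entropy penalty that pushes relaxed variables toward 0/1 so plain SGD can be used; the paper's exact formulation may differ.

```python
import numpy as np

def quantization_loss(real_w):
    """Hypothetical quantization loss: squared distance between the learned
    real-valued weights and their sign-binarized counterparts."""
    binary_w = np.sign(real_w)          # {-1, +1} weights used at inference
    return np.mean((real_w - binary_w) ** 2)

def binary_entropy_penalty(p, eps=1e-8):
    """Binary entropy H(p) = -p log p - (1-p) log(1-p); minimizing it pushes
    the relaxed variables p toward 0 or 1 (a discrete solution) while keeping
    the objective differentiable for SGD."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.mean(-p * np.log(p) - (1.0 - p) * np.log(1.0 - p))

# Example surrogate objective: task loss (omitted) plus both relaxation terms.
w = np.random.randn(256) * 0.1
p = 1.0 / (1.0 + np.exp(-w))            # relaxed "probability of +1"
surrogate = quantization_loss(w) + 0.1 * binary_entropy_penalty(p)
```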

Algorithms ◽  
2018 ◽  
Vol 11 (10) ◽  
pp. 159 ◽  
Author(s):  
Yulin Zhao ◽  
Donghui Wang ◽  
Leiou Wang ◽  
Peng Liu

Convolutional neural networks have achieved remarkable improvements in image and video recognition but incur a heavy computational burden. To reduce the computational complexity of a convolutional neural network, this paper proposes an algorithm that combines the Winograd minimal filtering algorithm with the Strassen algorithm. Theoretical assessment shows that the proposed algorithm can dramatically reduce computational complexity. Furthermore, the Visual Geometry Group (VGG) network is employed to evaluate the algorithm in practice. The results show that the proposed algorithm achieves the best performance by combining the savings of the two algorithms, reducing runtime by 75% compared with the conventional algorithm.
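The Winograd part of the saving can be made concrete with the classic 1-D F(2,3) instance: two convolution outputs of a 3-tap filter are computed with 4 element-wise multiplications instead of 6. A minimal NumPy sketch (the Strassen part, which trades matrix multiplications for additions, is not shown):

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap filter with only 4 multiplications.
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: input tile of length 4, g: filter of length 3 -> 2 outputs."""
    U = G @ g            # transformed filter (4 values)
    V = BT @ d           # transformed input  (4 values)
    M = U * V            # 4 element-wise multiplications
    return AT @ M        # inverse transform -> 2 convolution outputs

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
assert np.allclose(winograd_f23(d, g), np.correlate(d, g, 'valid'))
```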


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Hongfei Ling ◽  
Weiwei Zhang ◽  
Yingjie Tao ◽  
Mi Zhou

ResNet has been widely used in the field of machine learning since it was proposed. The model extracts features from input data by stacking many layers of neural networks and thus achieves high accuracy in many applications. However, stacking many layers also increases the computational cost. For this reason, we propose a network model compression technique that removes multiple neural network layers from ResNet without decreasing accuracy. The key idea is to attach a priority term that identifies the importance of each neural network layer, and then to select and remove the unimportant layers during training based on these priorities. In addition, the network model is retrained to avoid the accuracy degradation caused by the deletion of layers. Experiments on CIFAR-10/100 and ImageNet demonstrate that the number of layers can be reduced by 24.00%–42.86% without reducing classification accuracy.
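A minimal PyTorch sketch of the layer-priority idea as we read it: each residual block carries a learnable priority scalar, and blocks whose trained priority is small are removed before retraining. The gating form and the threshold are assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block with a hypothetical learnable 'priority' gate. Blocks
    whose trained priority is small contribute little and can be removed;
    the skip connection keeps the remaining network well-formed."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.priority = nn.Parameter(torch.ones(1))   # importance of this layer

    def forward(self, x):
        return x + self.priority * self.body(x)

def prune_low_priority(blocks, threshold=0.1):
    """Keep only blocks whose learned priority exceeds the threshold;
    the pruned model is then retrained to recover accuracy."""
    return nn.Sequential(*[b for b in blocks if b.priority.abs().item() > threshold])
```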


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Xin Long ◽  
XiangRong Zeng ◽  
Zongcheng Ben ◽  
Dianle Zhou ◽  
Maojun Zhang

The increasing sophistication of neural network models in recent years has greatly expanded their memory consumption and computational cost, hindering their deployment on ASICs, FPGAs, and other mobile devices. Compressing and accelerating neural networks is therefore necessary. In this study, we introduce a novel strategy for training low-bit networks whose weights and activations are quantized to a few bits, and we address two corresponding fundamental issues. One is to approximate activations through low-bit discretization to decrease the network's computational cost and dot-product memory. The other is to specify the weight quantization and the update mechanism for discrete weights in order to avoid gradient mismatch. With quantized low-bit weights and activations, costly full-precision operations are replaced by shift operations. We evaluate the proposed method on common datasets, and the results show that it can dramatically compress the neural network with only a slight loss of accuracy.
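One common way to realize "multiplication becomes a shift" is to quantize weights to signed powers of two and update the underlying real-valued weights with a straight-through estimator; the NumPy sketch below illustrates that pattern under an assumed exponent range, which may differ from the paper's scheme.

```python
import numpy as np

def quantize_pow2(w, min_exp=-6, max_exp=0):
    """Quantize weights to signed powers of two so that multiplying by a
    weight becomes a bit shift. The exponent range is a hypothetical choice."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign * np.exp2(exp)

def ste_update(real_w, grad_wrt_quantized, lr=0.01):
    """Straight-through estimator: the gradient computed with the quantized
    weights is applied directly to the underlying real-valued weights,
    avoiding the zero-gradient problem of the rounding step."""
    return real_w - lr * grad_wrt_quantized

w = np.array([0.37, -0.08, 0.9, -0.015])
print(quantize_pow2(w))   # -> [ 0.5  -0.0625  1.  -0.015625], all powers of two
```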


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-24
Author(s):  
Febin P. Sunny ◽  
Asif Mirza ◽  
Mahdi Nikdast ◽  
Sudeep Pasricha

Domain-specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models onto these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, the proposed ROBIN architecture is robust, energy efficient, low latency, and high throughput when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4× lower than electronic BNN accelerators and ∼933× lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3× and ∼25× better performance than electronic and photonic BNN accelerators, respectively.
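For orientation, the core BNN functionality that such accelerators implement is the XNOR-popcount dot product between ±1 vectors. The NumPy sketch below shows only this software-level primitive, not the photonic microring implementation.

```python
import numpy as np

def bnn_dot(a_bits, w_bits):
    """Binary dot product via XNOR + popcount. a_bits and w_bits are {0,1}
    encodings of {-1,+1} vectors; the result equals the ordinary dot product
    of the corresponding +/-1 vectors."""
    n = a_bits.size
    xnor = ~(a_bits ^ w_bits) & 1           # 1 wherever the signs agree
    popcount = int(xnor.sum())
    return 2 * popcount - n                  # map the count back to +/-1 domain

a = np.array([1, 0, 1, 1], dtype=np.int64)  # encodes [+1, -1, +1, +1]
w = np.array([1, 1, 0, 1], dtype=np.int64)  # encodes [+1, +1, -1, +1]
print(bnn_dot(a, w))                         # -> 0, same as the +/-1 dot product
```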


Author(s):  
T. Beran ◽  
T. Macek

This chapter describes a less traditional text-processing technique based on the Correlation Matrix Memory, a binary neural network. We propose using this neural network for text searching tasks. Two methods of coding input words are described and tested, and we discuss the problems of applying this approach to text processing.
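A minimal NumPy sketch of a binary Correlation Matrix Memory: associations are stored as the logical OR of outer products of binary key/value vectors, and recall is a matrix-vector product followed by thresholding. The two word-coding methods evaluated in the chapter are not reproduced here.

```python
import numpy as np

class CorrelationMatrixMemory:
    """Binary associative memory: M accumulates the logical OR of outer
    products value x key; recall thresholds the matrix-vector product."""
    def __init__(self, key_dim, value_dim):
        self.M = np.zeros((value_dim, key_dim), dtype=np.uint8)

    def store(self, key, value):
        self.M |= np.outer(value, key).astype(np.uint8)

    def recall(self, key):
        scores = self.M @ key                # integer correlation scores
        threshold = int(key.sum())           # number of set bits in the key
        return (scores >= threshold).astype(np.uint8)

cmm = CorrelationMatrixMemory(key_dim=8, value_dim=4)
key = np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=np.uint8)
value = np.array([0, 1, 0, 1], dtype=np.uint8)
cmm.store(key, value)
print(cmm.recall(key))   # -> [0 1 0 1]
```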


Author(s):  
Shiwei Liu ◽  
Iftitahu Ni’mah ◽  
Vlado Menkovski ◽  
Decebal Constantin Mocanu ◽  
Mykola Pechenizkiy

Recurrent neural networks (RNNs) have achieved state-of-the-art performance on various applications. However, RNNs are prone to being memory-bandwidth limited in practice and require long training and inference times. These problems are at odds with training and deploying RNNs on resource-limited devices, where the memory and floating-point operation (FLOP) budgets are strictly constrained. Conventional model compression techniques address this problem by reducing inference costs, but they operate on a costly pre-trained model. Recently, dynamic sparse training has been proposed to accelerate the training process by training sparse neural networks directly from scratch. However, previous sparse training techniques are mainly designed for convolutional neural networks and multi-layer perceptrons. In this paper, we introduce a method to train intrinsically sparse RNN models with a fixed number of parameters and FLOPs during training. We demonstrate state-of-the-art sparse performance with long short-term memory and recurrent highway networks on the widely used tasks of language modeling and text classification. We use these results to argue that, contrary to the general belief that training a sparse neural network from scratch leads to worse performance than a dense network, sparse training with adaptive connectivity can usually achieve better performance than dense models for RNNs.
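Keeping the parameter count fixed during sparse training is typically done with a prune-and-regrow step, as in SET-style dynamic sparse training; the NumPy sketch below illustrates that pattern (the drop fraction and regrowth criterion are assumptions, and the paper's exact procedure may differ).

```python
import numpy as np

def prune_and_regrow(weights, mask, drop_fraction=0.3, rng=np.random):
    """One dynamic-sparse-training step: drop the smallest-magnitude active
    weights, then regrow the same number of connections at random positions,
    so the total number of parameters stays fixed throughout training."""
    active = np.flatnonzero(mask)
    n_drop = int(drop_fraction * active.size)
    # Prune: zero out the weakest active connections.
    weakest = active[np.argsort(np.abs(weights.flat[active]))[:n_drop]]
    mask.flat[weakest] = 0
    weights.flat[weakest] = 0.0
    # Regrow: activate an equal number of currently inactive positions.
    inactive = np.flatnonzero(mask == 0)
    regrow = rng.choice(inactive, size=n_drop, replace=False)
    mask.flat[regrow] = 1
    weights.flat[regrow] = 0.01 * rng.standard_normal(n_drop)
    return weights, mask

# Toy usage: a ~25% dense weight matrix kept at constant sparsity.
m = (np.random.rand(4, 8) < 0.25).astype(np.int8)
w = np.random.randn(4, 8) * m
w, m = prune_and_regrow(w, m)
```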


2021 ◽  
pp. 1-13
Author(s):  
Jingfei Chang ◽  
Yang Lu ◽  
Ping Xue ◽  
Xing Wei ◽  
Zhen Wei

Deep convolutional neural networks (CNNs) are difficult to deploy on mobile and portable devices due to their large numbers of parameters and floating-point operations (FLOPs). To tackle this problem, we propose a novel channel pruning method. We use modified squeeze-and-excitation blocks (MSEB) to measure the importance of the channels in the convolutional layers. The unimportant channels, including the convolutional kernels related to them, are pruned directly, which greatly reduces the storage cost and the number of calculations. For ResNet with basic blocks, we propose an approach that consistently prunes all residual blocks in the same stage to ensure that the compact network structure is dimensionally correct. After pruning, we retrain the compact network from scratch to restore its accuracy. Finally, we verify our method on CIFAR-10, CIFAR-100 and ILSVRC-2012. The results indicate that the performance of the compact network is better than that of the original network when the pruning rate is small; even when the pruning rate is large, the accuracy is maintained or decreases only slightly. On CIFAR-100, when the parameters and FLOPs are reduced by up to 82% and 62% respectively, the accuracy of VGG-19 even improves by 0.54% after retraining. The source code is available at https://github.com/JingfeiChang/UCP.
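A rough PyTorch sketch of the pruning criterion as described: average squeeze-and-excitation style channel scores over data, treat them as channel importance, and drop the lowest-scoring channels together with their kernels. The MSEB details and the pruning ratio are assumptions.

```python
import torch

def select_channels_to_prune(se_scores, prune_ratio=0.3):
    """se_scores: (num_samples, num_channels) excitation values collected from
    a (modified) squeeze-and-excitation block. Channels with the lowest average
    score are treated as unimportant and pruned."""
    importance = se_scores.mean(dim=0)               # per-channel importance
    n_prune = int(prune_ratio * importance.numel())
    order = torch.argsort(importance)                # ascending importance
    pruned, kept = order[:n_prune], order[n_prune:]
    return kept.sort().values, pruned.sort().values

def prune_conv_weight(weight, kept_out_channels):
    """Drop the convolution kernels of the pruned output channels.
    weight: (out_channels, in_channels, kH, kW)."""
    return weight[kept_out_channels].clone()

# Toy usage with random scores and a random convolution weight tensor.
scores = torch.rand(128, 64)                         # 128 samples, 64 channels
kept, pruned = select_channels_to_prune(scores, prune_ratio=0.25)
compact_weight = prune_conv_weight(torch.randn(64, 32, 3, 3), kept)
```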


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Krzysztof Wróbel ◽  
Michał Karwatowski ◽  
Maciej Wielgosz ◽  
Marcin Pietroń ◽  
Kazimierz Wiatr

Convolutional Neural Networks (CNNs) were created for image classification tasks but were quickly applied to other domains, including Natural Language Processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and in embedded systems, which places constraints on, among other things, memory and power consumption. Due to their memory and computing requirements, CNNs need to be compressed before they can be mapped to hardware. This paper presents the results of compressing efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that a 5-bit width is enough to ensure no drop in accuracy compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by between 85% and 93% compared to the original model).
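As a sketch of the kind of fixed-point compression the paper reports, the NumPy snippet below applies symmetric uniform 5-bit quantization to a weight tensor; the actual scaling and rounding scheme used for the FPGA mapping is an assumption here.

```python
import numpy as np

def quantize_uniform(w, num_bits=5):
    """Symmetric uniform quantization to num_bits (one bit for the sign).
    Returns integer codes and the scale needed to dequantize in hardware."""
    qmax = 2 ** (num_bits - 1) - 1          # 15 for a 5-bit representation
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_uniform(w, num_bits=5)
reconstruction_error = np.abs(w - q * scale).max()   # bounded by roughly scale / 2
```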


Author(s):  
Xiaotian Zhu ◽  
Wengang Zhou ◽  
Houqiang Li

Modern deep learning models usually suffer from high complexity in model size and computation when transplanted to resource-constrained platforms. To this end, many works are dedicated to compressing deep neural networks. Adding group LASSO regularization is one of the most effective model compression methods, since it generates structured sparse networks. We investigate deep neural networks trained with a group LASSO constraint and observe that, even with strong sparsity regularization imposed, there still exists substantial correlation among the convolution filters, which is undesirable for a compact neural network. We propose to suppress such correlation with a new constraint called decorrelation regularization, which explicitly forces the network to learn a set of less correlated filters. Experiments on the CIFAR10/100 and ILSVRC2012 datasets show that when our decorrelation regularization is combined with group LASSO, the correlation between filters is effectively weakened, which increases the sparsity of the resulting model and leads to better compression performance.
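A PyTorch sketch of the combined penalty as described: group LASSO over output filters plus a decorrelation term that penalizes off-diagonal entries of the Gram matrix of normalized, flattened filters. The exact formulation and weighting in the paper may differ.

```python
import torch

def group_lasso(weight):
    """Group LASSO over filters: sum of L2 norms of each output filter,
    which drives whole filters toward zero (structured sparsity).
    weight: (out_channels, in_channels, kH, kW)."""
    return weight.flatten(1).norm(dim=1).sum()

def decorrelation_penalty(weight, eps=1e-8):
    """Penalize correlation between filters: off-diagonal energy of the
    Gram matrix of L2-normalized, flattened filters."""
    f = weight.flatten(1)
    f = f / (f.norm(dim=1, keepdim=True) + eps)
    gram = f @ f.t()
    off_diag = gram - torch.diag(torch.diagonal(gram))
    return (off_diag ** 2).sum()

# Both terms would be added to the task loss with hypothetical coefficients.
W = torch.randn(64, 32, 3, 3, requires_grad=True)
reg = 1e-4 * group_lasso(W) + 1e-3 * decorrelation_penalty(W)
```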


2020 ◽  
Vol 34 (04) ◽  
pp. 5495-5502
Author(s):  
Ren Ao ◽  
Zhang Tao ◽  
Wang Yuhao ◽  
Lin Sheng ◽  
Dong Peiyan ◽  
...  

The rapidly growing parameter volume of deep neural networks (DNNs) hinders artificial intelligence applications on resource-constrained devices, such as mobile and wearable devices. Neural network pruning, one of the mainstream model compression techniques, is under extensive study as a way to reduce model size and thus the amount of computation, so that state-of-the-art DNNs can be deployed on such devices with high runtime energy efficiency. In contrast to irregular pruning, which incurs high index storage and decoding overhead, structured pruning techniques have been proposed as a promising solution. However, prior studies on structured pruning tackle the problem mainly from the perspective of facilitating hardware implementation, without analyzing the characteristics of sparse neural networks in depth. This neglect leads to an inefficient trade-off between regularity and pruning ratio, so the potential of structurally pruning neural networks is not fully exploited. In this work, we examine the structural characteristics of irregularly pruned weight matrices, such as the diverse redundancy of different rows, the sensitivity of different rows to pruning, and the position characteristics of retained weights. Using the gained insights as guidance, we first propose the novel block-max weight masking (BMWM) method, which can effectively retain the salient weights while imposing high regularity on the weight matrix. As a further optimization, we propose density-adaptive regular-block (DARB) pruning, which effectively exploits the intrinsic characteristics of neural networks and thereby outperforms prior structured pruning work in both pruning ratio and decoding efficiency. Our experimental results show that DARB can achieve 13× to 25× pruning ratios, which are 2.8× to 4.3× improvements over the state-of-the-art counterparts on multiple neural network models and tasks. Moreover, DARB achieves 14.3× higher decoding efficiency than block pruning while attaining a higher pruning ratio.
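A NumPy sketch of the block-max masking idea as stated in the abstract: split each row of the weight matrix into fixed-size blocks and keep only the largest-magnitude weight per block, which preserves salient weights while keeping the mask regular and cheap to decode. The block size and the density-adaptive per-row refinement of DARB are not modeled here.

```python
import numpy as np

def block_max_mask(weights, block_size=4):
    """Keep only the largest-magnitude weight in each block of each row.
    The result is a regular sparse pattern: exactly one survivor per block,
    so decoding only needs a small per-block index."""
    rows, cols = weights.shape
    assert cols % block_size == 0
    blocks = np.abs(weights).reshape(rows, cols // block_size, block_size)
    keep = blocks.argmax(axis=2)                       # winner inside each block
    mask = np.zeros_like(blocks, dtype=bool)
    r, b = np.meshgrid(np.arange(rows), np.arange(cols // block_size), indexing='ij')
    mask[r, b, keep] = True
    return mask.reshape(rows, cols)

w = np.random.randn(2, 8)
print(w * block_max_mask(w, block_size=4))   # one weight kept per block of 4
```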

