Stochastic Dividers for Low Latency Neural Networks

Author(s):  
Shanshan Liu ◽  
Xiaochen Tang ◽  
Farzad Niknia ◽  
Pedro Reviriego ◽  
Weiqiang Liu ◽  
...  

Stochastic computing (SC) is attractive for hardware implementation due to the low complexity of its arithmetic unit designs; SC has therefore attracted considerable interest for implementing Artificial Neural Networks (ANNs) in resource-limited applications, because ANNs must usually perform a large number of arithmetic operations. To attain high computation accuracy in an SC-based ANN, extended stochastic logic is utilized together with standard SC units, so a stochastic divider is required to convert between these logic representations. However, as the most complex SC arithmetic unit, the conventional divider incurs a large computation latency; this limits SC implementations of ANNs in applications needing high performance. Therefore, there is a need to design fast stochastic dividers for SC-based ANNs. Recent works (e.g., a binary searching and triple modular redundancy (BS-TMR) based stochastic divider) target a reduction in computation latency while keeping nearly the same accuracy as the traditional design. However, this divider still requires <i>N</i> iterations to process 2<i><sup>N</sup></i>-bit stochastic sequences, so the latency increases in proportion to the sequence length. In this paper, a decimal searching and TMR (DS-TMR) based stochastic divider is first proposed to further reduce the computation latency; it requires only two iterations to calculate the quotient, regardless of the sequence length. A second design trading off accuracy against hardware is also presented. An SC-based Multi-Layer Perceptron (MLP) is then considered to show the effectiveness of the proposed dividers; results show that when utilizing the proposed dividers, the MLP achieves the lowest computation latency while maintaining the same classification accuracy.
When using the product of latency and power dissipation as a combined metric, the proposed designs are also shown to be superior to SC-based MLPs employing other dividers found in the technical literature, as well as to the commonly used 32-bit floating-point implementation. This makes the proposed dividers very attractive for SC-based ANNs compared with existing schemes.
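The binary-search idea behind the BS-based divider can be illustrated with a small software model. The sketch below is an idealized functional model, not the hardware design: streams use unipolar SC encoding, and the per-iteration comparison is done on recovered probabilities rather than with the paper's TMR comparison circuitry.

```python
import random

def to_stream(p, length, rng):
    # Unipolar SC encoding: each bit is 1 with probability p.
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stream_value(bits):
    # Recover the encoded probability as the fraction of 1s.
    return sum(bits) / len(bits)

def bs_divide(x_bits, y_bits, iterations):
    # Binary-search the quotient q in [0, 1] such that q * P(y) ~= P(x).
    # One comparison per iteration, so N iterations suffice for 2^N-bit
    # streams -- exactly the latency the DS-TMR divider cuts to two iterations.
    x, y = stream_value(x_bits), stream_value(y_bits)
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        q = (lo + hi) / 2
        if q * y < x:       # in hardware this comparison is itself stochastic
            lo = q
        else:
            hi = q
    return (lo + hi) / 2

rng = random.Random(0)
x_stream = to_stream(0.3, 4096, rng)   # dividend ~= 0.3
y_stream = to_stream(0.6, 4096, rng)   # divisor  ~= 0.6
quotient = bs_divide(x_stream, y_stream, 12)   # 2^12-bit streams, 12 iterations
```

The quotient converges to roughly 0.3/0.6 = 0.5, up to the sampling noise of the finite-length streams.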

2021


Author(s):  
А.Д. Обухов ◽  
М.Н. Краснянский ◽  
М.С. Николюкин

Here we consider the problem of choosing optimal interface parameters in information systems, with the aim of personalizing the interface to the user's preferences and the capabilities of their equipment. Currently, algorithmic support and statistical processing of user preferences are used to solve this problem, which does not provide sufficient flexibility and accuracy. Therefore, in this work we propose a method for adapting interface parameters based on the analysis and processing of user information with neural networks.
The scientific novelty of the method lies in automating data collection, data analysis, and interface configuration through the use and integration of neural networks in the information system. We consider a practical implementation of the proposed method in Python. An expert assessment of the adaptability of the test information system's interface after the introduction of the developed method showed its promise and effectiveness. The developed method shows better accuracy and lower software-implementation complexity than the classical algorithmic approach. The results obtained can be used to automate the selection of interface components for various information systems. Further research will develop and integrate the proposed method within a framework for adapting information systems.
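The idea of a network mapping user and device data to interface parameters can be sketched in Python. The feature and target names below (screen size, age, font size, layout density) and the synthetic preference rule are hypothetical stand-ins for the paper's collected user data, and scikit-learn's `MLPRegressor` stands in for the authors' network.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical user/device features, already normalized to [0, 1]:
# (screen_width, screen_height, user_age, preferred_contrast).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 4))

# Synthetic preference rule standing in for collected user data:
# font size grows with age and screen width, layout density with width.
y = np.column_stack([
    10 + 4 * X[:, 2] + 2 * X[:, 0],   # font_size_pt
    0.3 + 0.5 * X[:, 0],              # layout_density
])

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
model.fit(X, y)

# Predict interface parameters for a mid-sized screen and a middle-aged user.
font_size, density = model.predict([[0.5, 0.5, 0.56, 0.8]])[0]
```

Once trained on real preference logs, such a model replaces hand-written selection rules: new interface parameters are produced by a single `predict` call.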


Author(s):  
Sai Venkatramana Prasada G.S ◽  
G. Seshikala ◽  
S. Niranjana

Background: This paper presents a comparative study of the power dissipation, delay, and power-delay product (PDP) of different full adder and multiplier designs. Methods: The full adder is a fundamental building block of processors, DSP architectures, and VLSI systems. Ten different full adder structures were analyzed for performance using a Mentor Graphics tool in 180 nm technology. Results: From the analysis, the highest-performance full adder was selected for the higher-level designs. The 8T full adder exhibits high speed, low power dissipation, and a low power-delay product, and it was therefore used to construct four multiplier designs: the Array multiplier, Baugh-Wooley multiplier, Braun multiplier, and Wallace Tree multiplier. These multipliers were designed with the 8T full adder and simulated with the Mentor Graphics tool at a constant W/L aspect ratio. Conclusion: The analysis shows that the Wallace Tree multiplier is the fastest but dissipates comparatively high power, while the Baugh-Wooley multiplier dissipates the least power but exhibits more delay and a low PDP.
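The PDP metric is simply the product of power and delay (i.e., energy per operation), and it explains why ranking designs by speed and by power can disagree. The figures below are illustrative inventions, not the paper's measurements; they merely mirror the abstract's qualitative conclusion.

```python
# Illustrative (made-up) figures only -- NOT the paper's measurements.
# (power in watts, delay in seconds) for the four 8T-based multipliers.
designs = {
    "Array":        (45.2e-6, 3.1e-9),
    "Baugh-Wooley": (38.5e-6, 4.0e-9),   # lowest power, largest delay
    "Braun":        (42.0e-6, 3.4e-9),
    "Wallace Tree": (52.7e-6, 2.2e-9),   # fastest, highest power
}

# Power-delay product: energy per operation, a single figure of merit.
pdp = {name: power * delay for name, (power, delay) in designs.items()}

fastest      = min(designs, key=lambda n: designs[n][1])
lowest_power = min(designs, key=lambda n: designs[n][0])
best_pdp     = min(pdp, key=pdp.get)
```

With such numbers the speed and power rankings point at different designs, which is exactly why a combined metric like PDP is reported alongside the raw measurements.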


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 223
Author(s):  
Yen-Ling Tai ◽  
Shin-Jhe Huang ◽  
Chien-Chang Chen ◽  
Henry Horng-Shing Lu

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory can support neural networks with large numbers of layers and kernels. However, naively pursuing high-cost hardware would likely hold back the technical development of deep learning methods. In this article, we therefore establish a new preprocessing method to reduce the computational complexity of neural networks. Inspired by the band theory of solids in physics, we map the image space isomorphically onto a non-interacting physical system and treat image voxels as particle-like clusters. We then use the Fermi–Dirac distribution as a correction function for normalizing voxel intensity and as a filter of insignificant cluster components. The filtered clusters can then delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for algorithmic validation, and the proposed Fermi–Dirac correction function exhibited performance comparable to the other preprocessing methods employed. Compared with the conventional z-score normalization function and the Gamma correction function, the proposed algorithm saves at least 38% of the computational time cost on a low-cost hardware architecture. Even though global histogram equalization has the lowest computational time among the employed correction functions, the proposed Fermi–Dirac correction function exhibits better image augmentation and segmentation capabilities.
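A minimal sketch of a Fermi–Dirac-shaped intensity correction is shown below. This is only an illustration of the functional form, not the paper's exact algorithm: the sign convention is flipped relative to the physical distribution so that bright voxels map toward 1, and the choices of the "chemical potential" `mu` and "temperature" are assumptions here.

```python
import numpy as np

def fermi_dirac_correction(volume, mu=None, temperature=0.1):
    # Rescale intensities to [0, 1], then apply a Fermi-Dirac-shaped
    # transfer function f(I) = 1 / (exp((mu - I) / T) + 1): intensities
    # well above the "chemical potential" mu saturate toward 1, while
    # insignificant low-intensity components are suppressed toward 0.
    v = (volume - volume.min()) / (volume.max() - volume.min() + 1e-12)
    if mu is None:
        mu = v.mean()   # assumed choice; the paper derives its parameters from the data
    return 1.0 / (np.exp((mu - v) / temperature) + 1.0)

vol = np.array([[0.0, 1.0, 2.0],
                [3.0, 4.0, 5.0]])
corrected = fermi_dirac_correction(vol)
```

The `temperature` parameter controls how sharp the cut is: a small value approaches a hard threshold at `mu`, while a large value gives a gentle sigmoid-like normalization.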


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 700
Author(s):  
Yufei Zhu ◽  
Zuocheng Xing ◽  
Zerun Li ◽  
Yang Zhang ◽  
Yifan Hu

This paper presents a novel parallel quasi-cyclic low-density parity-check (QC-LDPC) encoding algorithm with low complexity, which is compatible with 5th generation (5G) new radio (NR). Based on this algorithm, we propose a highly area-efficient parallel encoder with a compatible architecture. The proposed encoder has the advantages of parallel encoding and pipelined operations. Furthermore, it is designed as a configurable encoding structure that is fully compatible with the different base graphs of 5G LDPC, so the encoder architecture adapts flexibly to various 5G LDPC codes. The proposed encoder was synthesized in a 65 nm CMOS technology. Following the encoder architecture, we implemented nine encoders for the distributed lifting sizes of the two base graphs. The experimental results show that the encoder achieves high performance and significant area efficiency, better than related prior art. This work comprises a complete encoding algorithm together with the compatible encoders, and its full compatibility with the different base graphs of 5G LDPC codes gives it flexible adaptability for various 5G application scenarios.
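The structural property that makes QC-LDPC encoding cheap can be shown in a few lines. The sketch below is a generic model of circulant arithmetic over GF(2), not the paper's encoder: multiplying a length-Z block by a Z×Z circulant permutation matrix with shift value s reduces to a cyclic rotation, so each base-graph entry costs a rotate-and-XOR rather than a matrix multiplication.

```python
import numpy as np

def circulant_multiply(block, shift):
    # A Z x Z circulant permutation matrix with shift s, applied to a
    # length-Z binary block, is just a cyclic rotation of the block.
    return np.roll(block, -shift)

def qc_parity(info_blocks, shifts):
    # GF(2) accumulation of shifted blocks: one row of the base graph.
    # shifts[j] = -1 marks a zero (absent) circulant in the base matrix.
    Z = len(info_blocks[0])
    acc = np.zeros(Z, dtype=np.uint8)
    for block, s in zip(info_blocks, shifts):
        if s >= 0:
            acc ^= circulant_multiply(block, s)
    return acc

blocks = [np.array([1, 0, 0, 0], dtype=np.uint8),
          np.array([0, 1, 0, 0], dtype=np.uint8)]
parity = qc_parity(blocks, [1, -1])   # Z = 4, one active circulant
```

Because each base-graph row only needs rotations and XORs, all rows can be computed in parallel and pipelined, which is the property the proposed area-efficient architecture exploits.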


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1614
Author(s):  
Jonghun Jeong ◽  
Jong Sung Park ◽  
Hoeseok Yang

Recently, the need to run high-performance neural networks (NNs) has been increasing even in resource-constrained embedded systems such as wearable devices. However, due to the high computational and memory requirements of NN applications, it is typically infeasible to execute them on a single device. Instead, it has been proposed to run a single NN application cooperatively on multiple devices, a so-called distributed neural network, in which the workload of one large NN application is distributed over multiple tiny devices. While this approach effectively alleviates the computation overhead, existing distributed NN techniques, such as MoDNN, still suffer from heavy inter-device traffic and vulnerability to communication failures. To eliminate such large communication overheads, a knowledge-distillation-based distributed NN, called Network of Neural Networks (NoNN), was proposed; it partitions the filters in the final convolutional layer of the original NN into multiple independent subsets and derives a smaller NN from each subset. However, NoNN also has limitations: the partitioning result may be unbalanced, and it considerably compromises the correlation between filters in the original NN, which may cause unacceptable accuracy degradation in case of communication failure. In this paper, to overcome these issues, we propose to enhance the partitioning strategy of NoNN in two respects. First, we increase the redundancy of the filters used to derive the multiple smaller NNs by means of averaging, to improve the distributed NN's immunity to communication failure. Second, we propose a novel partitioning technique, modified from eigenvector-based partitioning, that preserves the correlation between filters as much as possible while distributing a consistent number of filters to each device.
Through extensive experiments with the CIFAR-100 (Canadian Institute For Advanced Research-100) dataset, we observed that the proposed approach maintains high inference accuracy on average (over 70%, a 1.53× improvement over the state-of-the-art approach) even when half of the eight devices in a distributed NN fail to deliver their partial inference results.
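A textbook version of eigenvector-based (spectral) partitioning, which the proposed technique modifies, can be sketched as follows. This is a generic two-way split by the sign of the Fiedler vector with a forced balance step; the paper's actual modification and its filter-similarity measure are not reproduced here.

```python
import numpy as np

def spectral_bipartition(similarity):
    # Spectral bipartition of a filter-similarity graph: split the filters
    # by the Fiedler vector (eigenvector of the graph Laplacian with the
    # second-smallest eigenvalue), then force a balanced split so that each
    # device receives the same number of filters.
    W = (similarity + similarity.T) / 2          # symmetrize
    np.fill_diagonal(W, 0)
    L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    order = np.argsort(fiedler)                  # balanced split: lowest half vs highest half
    part = np.zeros(len(order), dtype=int)
    part[order[len(order) // 2:]] = 1
    return part

# Toy similarity matrix: filters {0, 1} are strongly correlated, as are {2, 3}.
S = np.array([[0.0, 0.9, 0.1, 0.1],
              [0.9, 0.0, 0.1, 0.1],
              [0.1, 0.1, 0.0, 0.9],
              [0.1, 0.1, 0.9, 0.0]])
assignment = spectral_bipartition(S)
```

Taking the lowest and highest halves of the sorted Fiedler vector, rather than its raw sign, is what keeps the filter counts per device consistent even when the unconstrained spectral cut would be unbalanced.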


Integration ◽  
2019 ◽  
Vol 65 ◽  
pp. 395-403 ◽  
Author(s):  
Ji Li ◽  
Zihao Yuan ◽  
Zhe Li ◽  
Ao Ren ◽  
Caiwen Ding ◽  
...  

Author(s):  
Withit Chatlatanagulchai ◽  
Peter H. Meckl

Flexibility at the joint of a manipulator is an intrinsic property. Even “rigid-joint” robots, in fact, possess a certain amount of flexibility. Previous experiments confirmed that joint flexibility should be explicitly included in the model when designing a high-performance controller for a manipulator because the flexibility, if not dealt with, can excite system natural frequencies and cause severe damage. However, control design for a flexible-joint robot manipulator is still an open problem. Besides being described by a complicated system model for which the passivity property does not hold, the manipulator is also underactuated, that is, the control input does not drive the link directly, but through the flexible dynamics. Our work offers another possible solution to this open problem. We use three-layer neural networks to represent the system model. Their weights are adapted in real time and from scratch, which means we do not need the mathematical model of the robot in our control algorithm. All uncertainties are handled by variable-structure control. Backstepping structure allows input efforts to be applied to each subsystem where they are needed. Control laws to adjust all adjustable parameters are devised using Lyapunov’s second method to ensure that error trajectories are globally uniformly ultimately bounded. We present two state-feedback schemes: first, when neural networks are used to represent the unknown plant, and second, when neural networks are used to represent the unknown parts of the control laws. In the former case, we also design an observer to enable us to design a control law using only output signals—the link positions. We use simulations to compare our algorithms with some other well-known techniques. We use experiments to demonstrate the practicality of our algorithms.
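The flavor of adapting network weights "in real time and from scratch" can be illustrated with a minimal sketch. This is not the paper's backstepping controller: it uses a Gaussian-RBF hidden layer with only the output weights adapted, a hypothetical unknown nonlinearity f(x) = sin(x), and the standard sigma-modification form of the Lyapunov-based update law, W_dot = gamma * e * phi - sigma * W.

```python
import numpy as np

def rbf_features(x, centers, width=1.0):
    # Gaussian basis functions standing in for the network's hidden layer;
    # the sketch adapts only the output-layer weights W.
    return np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * width ** 2))

rng = np.random.default_rng(1)
centers = np.linspace(-3, 3, 15).reshape(-1, 1)
W = np.zeros(15)                       # weights start "from scratch"
gamma, sigma, dt = 5.0, 0.01, 0.01     # adaptation gain, damping, step size

for step in range(20000):
    x = np.array([rng.uniform(-3, 3)])
    phi = rbf_features(x, centers)
    e = np.sin(x[0]) - W @ phi         # approximation error (tracking error in closed loop)
    # Euler-discretized sigma-modification update law:
    W += dt * (gamma * e * phi - sigma * W)
```

The sigma * W damping term keeps the weights bounded even under persistent disturbance, which is the mechanism behind the "globally uniformly ultimately bounded" error guarantees cited above; without it, plain gradient adaptation can drift.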

