Range-Lookup Approximate Computing Acceleration for Any Activation Functions in Low-Power Neural Network

Author(s):  
Wen-Chang Yang ◽  
Shu-Yun Lin ◽  
Tsung-Chu Huang


2016 ◽  
Vol 7 ◽  
pp. 1397-1403 ◽  
Author(s):  
Andrey E Schegolev ◽  
Nikolay V Klenov ◽  
Igor I Soloviev ◽  
Maxim V Tereshonok

We propose the concept of using superconducting quantum interferometers for the implementation of neural network algorithms with extremely low power dissipation. These adiabatic elements are Josephson cells with sigmoid- and Gaussian-like activation functions. We optimize their parameters for application in three-layer perceptron and radial basis function networks.
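For readers unfamiliar with the two activation shapes mentioned, a minimal NumPy sketch may help; it is purely illustrative software, not the Josephson-cell transfer functions themselves, and every function name below is hypothetical. It shows a sigmoid hidden layer for a three-layer perceptron and a Gaussian layer for a radial basis function network:

```python
import numpy as np

def sigmoid(x):
    # Logistic activation: the kind of shape the sigmoid-like cell provides.
    return 1.0 / (1.0 + np.exp(-x))

def gaussian(r, sigma=1.0):
    # Bell-shaped response: the kind of shape the Gaussian-like cell provides.
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))

def three_layer_perceptron(x, W1, b1, W2, b2):
    # Input -> sigmoid hidden layer -> linear output layer.
    return W2 @ sigmoid(W1 @ x + b1) + b2

def rbf_network(x, centers, widths, w_out):
    # Hidden units respond to distance from their centers with a Gaussian profile.
    r = np.linalg.norm(centers - x, axis=1)
    return w_out @ gaussian(r, widths)
```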


2021 ◽  
Vol 17 (2) ◽  
pp. 1-29
Author(s):  
Anand Kumar Mukhopadhyay ◽  
Atul Sharma ◽  
Indrajit Chakrabarti ◽  
Arindam Basu ◽  
Mrigank Sharad

Spike sorting is the process of mapping recorded neural signals to the neurons from which they originate. A low-power spike sorting system is presented for a neural implant device. The spike sorter comprises a two-step trainer module that is shared by the signal acquisition channels associated with multiple electrodes, and a low-power Spiking Neural Network (SNN) module responsible for assigning the spike class. The two-step, shared, supervised on-chip training module improves training accuracy for the SNN. Post implant, the relatively power-hungry training module can be activated conditionally by a statistics-driven retraining algorithm that allows on-the-fly training and adaptation. A low-power analog implementation of the SNN classifier is proposed, based on resistive crossbar memory and exploiting its approximate-computing nature. Owing to the direct mapping of SNN functionality onto the physical characteristics of the devices, the analog implementation achieves ∼21× lower power than its fully digital counterpart. The effect of device variation is also incorporated into the training process to suppress the impact of the inevitable inaccuracies of such resistive crossbar devices on classification accuracy. A variation-aware, digitally calibrated analog front-end is also presented, which consumes less than ∼50 nW and interfaces with both the digital training module and the analog SNN spike-sorting module. The proposed scheme is therefore a low-power, variation-tolerant, adaptive, digitally trained, all-analog spike sorter, applicable to implantable and wearable multichannel brain-machine interfaces.
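The variation-aware training idea, absorbing device-to-device conductance spread into the learning loop, can be pictured with a short sketch. The multiplicative Gaussian noise model and the perceptron-style toy classifier below are assumptions for illustration only, not the authors' SNN training algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_weights(W, sigma=0.1):
    # Model resistive-crossbar device variation as a multiplicative Gaussian spread
    # (an assumed noise model; the paper's actual variation statistics may differ).
    return W * (1.0 + sigma * rng.standard_normal(W.shape))

def train_variation_aware(X, y, n_classes, epochs=50, lr=0.01, sigma=0.1):
    # Toy linear spike classifier trained against noisy weight copies, so the
    # learned weights remain accurate when mapped onto imperfect conductances.
    W = 0.01 * rng.standard_normal((n_classes, X.shape[1]))
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            scores = perturb_weights(W, sigma) @ xi   # what the analog array would compute
            pred = int(np.argmax(scores))
            if pred != yi:                            # perceptron-style update on error
                W[yi] += lr * xi
                W[pred] -= lr * xi
    return W
```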


2021 ◽  
Vol 17 (2) ◽  
pp. 1-27
Author(s):  
Morteza Hosseini ◽  
Tinoosh Mohsenin

This article presents a low-power, programmable, domain-specific manycore accelerator, the Binarized neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes binary-precision weight/activation neural network models. Such networks have compact models in which weights are constrained to 1 bit, so several can be packed into one memory entry, minimizing the memory footprint. Packing weights also facilitates single-instruction, multiple-data execution with simple circuitry, maximizing performance and efficiency. The proposed BiNMAC has lightweight cores that support domain-specific instructions and a router-based memory access architecture that enables efficient implementation of layers in binary-precision weight/activation neural networks of appropriate size. With only 3.73% area and 1.98% average power overhead, novel instructions such as Combined Population-Count-XNOR, Patch-Select, and Bit-based Accumulation are added to the instruction set architecture of the BiNMAC; each replaces the execution cycles of a frequently used function with 1 clock cycle that would otherwise have taken 54, 4, and 3 clock cycles, respectively. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory at the bit level, which expedites reshaping intermediate data so that it is well aligned for bitwise operations. A 64-cluster architecture of the BiNMAC is fully placed and routed in 65-nm TSMC CMOS technology, where a single cluster occupies an area of 0.53 mm² with an average power of 232 mW at a 1-GHz clock frequency and 1.1 V. The 64-cluster architecture takes 36.5 mm² of area and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 Giga Operations Per Second (GOPS) while providing full programmability. To demonstrate its scalability, four binarized case studies, including ResNet-20 and LeNet-5 for high-performance image classification as well as a ConvNet and a multilayer perceptron for low-power physiological applications, were implemented on BiNMAC. The implementation results indicate that the population-count instruction alone can expedite performance by approximately 5×. When the other new instructions are added to a RISC machine that already has a population-count instruction, performance increases by 58% on average. To compare the performance of the BiNMAC with commercial off-the-shelf platforms, the case studies with their double-precision floating-point models were also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, within a margin of ∼2.1%–9.5% accuracy loss, BiNMAC on average outperforms the TX2 GPU by approximately 1.9× (or 7.5× with fabrication technology scaled) in energy consumption for image classification applications. In low-power settings, and within a margin of ∼3.7%–5.5% accuracy loss compared with an ARM Cortex-A57 CPU implementation, BiNMAC is ∼9.7×–17.2× (or 38.8×–68.8× with fabrication technology scaled) more energy efficient for physiological applications while meeting the application deadline.
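The payoff of the Combined Population-Count-XNOR instruction comes from the fact that a dot product between two ±1 vectors packed into machine words reduces to an XNOR followed by a population count. A minimal bit-packed sketch of that identity follows; it is illustrative Python, not BiNMAC's ISA, and the helper names are made up:

```python
import numpy as np

def pack_bits(v):
    # Map a ±1 vector to a packed bit array (+1 -> bit 1, -1 -> bit 0).
    return np.packbits((v > 0).astype(np.uint8))

def binary_dot(a_bits, b_bits, n):
    # XOR exposes mismatching positions; XNOR would count matches directly
    # (matches = n - mismatches). Each match contributes +1 and each mismatch -1,
    # so dot = 2 * matches - n.
    mismatches = int(np.unpackbits(a_bits ^ b_bits)[:n].sum())
    return 2 * (n - mismatches) - n

# The packed XNOR/popcount result agrees with the ordinary ±1 dot product.
a = np.array([1, -1, 1, 1, -1, 1, -1, -1])
b = np.array([1, 1, -1, 1, -1, -1, -1, 1])
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == int(a @ b)
```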

