Binary Precision Neural Network Manycore Accelerator

This article presents a low-power, programmable, domain-specific manycore accelerator, the Binarized neural Network Manycore Accelerator (BiNMAC), which efficiently executes neural network models with binary-precision weights and activations. Such networks have compact models in which weights are constrained to 1 bit, so several weights can be packed into one memory entry, minimizing the memory footprint. Packing weights also enables single-instruction, multiple-data execution with simple circuitry, which maximizes performance and efficiency. The proposed BiNMAC has lightweight cores that support domain-specific instructions and a router-based memory access architecture that supports efficient implementation of layers in binary-precision weight/activation neural networks of suitable size. With only 3.73% area overhead and 1.98% average power overhead, novel instructions such as Combined Population-Count-XNOR, Patch-Select, and Bit-based Accumulation are added to the instruction set architecture of the BiNMAC; each executes in 1 clock cycle a frequently used function that would otherwise take 54, 4, and 3 clock cycles, respectively. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory at the bit level, which expedites reshaping intermediate data so that it is well aligned for bitwise operations. A 64-cluster architecture of the BiNMAC is fully placed and routed in 65-nm TSMC CMOS technology, where a single cluster occupies an area of 0.53 mm² with an average power of 232 mW at a 1-GHz clock frequency and 1.1 V. The 64-cluster architecture occupies 36.5 mm² and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 Giga Operations Per Second (GOPS) while providing full programmability.
To demonstrate its scalability, four binarized case studies were implemented on BiNMAC: ResNet-20 and LeNet-5 for high-performance image classification, and a ConvNet and a multilayer perceptron for low-power physiological applications. The implementation results indicate that the population-count instruction alone improves performance by approximately 5×. When the other new instructions are added to a RISC machine that already has a population-count instruction, performance increases by 58% on average. To compare BiNMAC with commercial off-the-shelf platforms, the double-precision floating-point models of the case studies are also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, within a margin of ∼2.1%–9.5% accuracy loss, BiNMAC on average outperforms the TX2 GPU by approximately 1.9× (or 7.5× with fabrication technology scaled) in energy consumption for image classification applications. In low-power settings, and within a margin of ∼3.7%–5.5% accuracy loss compared with the ARM Cortex-A57 CPU implementation, BiNMAC is ∼9.7×–17.2× (or 38.8×–68.8× with fabrication technology scaled) more energy efficient for physiological applications while meeting the application deadline.
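
As a rough illustration of why packed 1-bit weights suit a combined XNOR and population-count operation, the sketch below computes a binarized dot product in software. It is not BiNMAC's instruction or its encoding: the helper names `pack_bits` and `binary_dot`, the mapping of +1 to bit 1 and -1 to bit 0, and the sample vectors are all assumptions made for the example.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into one integer (assumed encoding: +1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, w_word, n_bits):
    """Dot product of two packed +1/-1 vectors using XNOR followed by a population count."""
    mask = (1 << n_bits) - 1
    # XNOR sets a bit wherever the two operands agree, i.e. where the product is +1.
    xnor = ~(a_word ^ w_word) & mask
    matches = bin(xnor).count("1")  # population count
    # Each match contributes +1 and each mismatch -1.
    return 2 * matches - n_bits


# Hypothetical sample data, chosen only for illustration.
activations = [1, -1, -1, 1, 1, 1, -1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(a * w for a, w in zip(activations, weights))
result = binary_dot(pack_bits(activations), pack_bits(weights), len(activations))
print(result, reference)
```

The packed version replaces one multiply-accumulate per element with a single bitwise operation and a bit count over the whole word, which is the kind of work the abstract says the combined instruction completes in one clock cycle.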

Epileptic Seizure Detection on an Ultra-Low-Power Embedded RISC-V Processor Using a Convolutional Neural Network

Biosensors ◽

10.3390/bios11070203 ◽

2021 ◽

Vol 11 (7) ◽

pp. 203

Author(s):

Andreas Bahr ◽

Matthias Schneider ◽

Maria Francis ◽

Hendrik Lehmann ◽

Igor Barg ◽

...

Keyword(s):

Neural Network ◽

Low Power ◽

Convolutional Neural Network ◽

Medical Devices ◽

Epileptic Seizure ◽

Average Power ◽

Seizure Detection ◽

Ultra Low Power ◽

Epileptic Seizure Detection

The treatment of refractory epilepsy via closed-loop implantable devices that act on seizures either by drug release or electrostimulation is a highly attractive option. For such implantable medical devices, low energy consumption, small size, and efficient processing architectures are essential. To meet these requirements, epileptic seizure detection by analysis and classification of brain signals with a convolutional neural network (CNN) is an attractive approach. This work presents a CNN for epileptic seizure detection capable of running on an ultra-low-power microprocessor. The CNN is implemented and optimized in MATLAB and is also implemented on a GAP8 microprocessor with RISC-V architecture. The training, optimization, and evaluation of the proposed CNN are based on the CHB-MIT dataset. The CNN reaches a median sensitivity of 90% and a very high specificity of over 99%, corresponding to a median false positive rate of 6.8 s per hour. After implementation of the CNN on the microcontroller, a sensitivity of 85% is reached. The classification of 1 s of EEG data takes t=35 ms and consumes an average power of P≈140 μW. The proposed detector outperforms related approaches in terms of power consumption by a factor of 6. The universal applicability of the proposed CNN-based detector is verified with recordings of epileptic rats. These results enable the design of future medical devices for epilepsy treatment.
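
The relation between the reported false positive rate and specificity, and the share of real time the processor spends classifying, can be checked with a short calculation. This is a back-of-the-envelope sketch, not the authors' evaluation method: it assumes every second of the hour is seizure-free, so that false-alarm seconds divided by 3600 approximates the false positive fraction, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def specificity_from_false_alarm_seconds(fp_seconds_per_hour):
    """Approximate specificity, assuming the whole hour is non-seizure data."""
    return 1.0 - fp_seconds_per_hour / SECONDS_PER_HOUR


def duty_cycle(compute_time_s, window_s):
    """Fraction of real time spent classifying one window of data."""
    return compute_time_s / window_s


# Figures taken from the abstract: 6.8 s of false alarms per hour,
# 35 ms of processing per 1 s of EEG data.
spec = specificity_from_false_alarm_seconds(6.8)
duty = duty_cycle(0.035, 1.0)
print(f"specificity ~ {spec:.2%}, duty cycle = {duty:.1%}")
```

Under that assumption, 6.8 s per hour is consistent with the stated specificity of over 99%, and the 35 ms processing time leaves the processor idle for most of each 1 s window.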

Design of an Always-On Image Sensor Using an Analog Lightweight Convolutional Neural Network

Sensors ◽

10.3390/s20113101 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3101

Author(s):

Jaihyuk Choi ◽

Sungjae Lee ◽

Youngdoo Son ◽

Soo Youn Kim

Keyword(s):

Neural Network ◽

Power Consumption ◽

Convolutional Neural Network ◽

Image Classification ◽

Image Sensor ◽

Image Resolution ◽

Oxide Semiconductor ◽

Total Power ◽

Max Pooling ◽

Total Power Consumption

This paper presents an always-on Complementary Metal Oxide Semiconductor (CMOS) image sensor (CIS) using an analog convolutional neural network for image classification in mobile applications. To reduce the power consumption as well as the overall processing time, we propose analog convolution circuits for computing convolution, max-pooling, and correlated double sampling operations without operational transconductance amplifiers. In addition, we used the voltage-mode MAX circuit for max pooling in the analog domain. After the analog convolution processing, the image data were reduced by 99.58% and were converted to digital with a 4-bit single-slope analog-to-digital converter. After the conversion, images were classified by the fully connected processor, which is traditionally performed in the digital domain. The measurement results show that we achieved an 89.33% image classification accuracy. The prototype CIS was fabricated in a 0.11 μm 1-poly 4-metal CIS process with a standard 4T-active pixel sensor. The image resolution was 160 × 120, and the total power consumption of the proposed CIS was 1.12 mW with a 3.3 V supply voltage and a maximum frame rate of 120.
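
To put the reported figures in perspective, the following sketch works through the arithmetic. It rests on assumptions that the abstract does not state: that the 99.58% reduction is measured against the 160 × 120 pixel count, that the frame rate of 120 is in frames per second, and that the 1.12 mW total power applies at that maximum frame rate.

```python
WIDTH, HEIGHT = 160, 120   # image resolution from the abstract
REDUCTION = 0.9958         # reported data reduction after analog convolution
POWER_W = 1.12e-3          # reported total power consumption
FRAME_RATE = 120           # assumed to be frames per second

pixels = WIDTH * HEIGHT
# Values left to digitize if the reduction is relative to the pixel count.
remaining_values = pixels * (1 - REDUCTION)
# Energy per frame if the power figure holds at the maximum frame rate.
energy_per_frame_j = POWER_W / FRAME_RATE

print(f"{pixels} pixels -> about {remaining_values:.1f} values per frame")
print(f"about {energy_per_frame_j * 1e6:.2f} microjoules per frame")
```

Under these assumptions, only on the order of 80 values per frame reach the 4-bit analog-to-digital converter, which is why moving convolution and pooling into the analog domain cuts the conversion workload so sharply.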

Low Power Built-In Self-Test Schemes for Array and Booth Multipliers

VLSI Design ◽

10.1155/2001/67893 ◽

2001 ◽

Vol 12 (3) ◽

pp. 431-448 ◽

Cited By ~ 1

Author(s):

D. Bakalis ◽

X. Kavousianos ◽

H. T. Vergos ◽

D. Nikolos ◽

G. Ph. Alexiou

Keyword(s):

Low Power ◽

Power Dissipation ◽

Average Power ◽

Total Power ◽

Power Efficient ◽

Test Application ◽

Low Power Dissipation ◽

Recent Trends ◽

Self Test ◽

Built In Self Test

Recent trends in IC technology have given rise to a new requirement, low power dissipation during testing, that Built-In Self-Test (BIST) structures must target along with the traditional requirements. To this end, in this paper we propose new power-efficient BIST structures for Carry Save, Carry Propagate, and modified Booth multipliers by exploiting their inherent properties. The proposed BIST schemes are derived by: (a) properly assigning the Test Pattern Generator (TPG) outputs to the multiplier inputs, (b) modifying the TPG circuits, and (c) reducing the test set length. Our results indicate that the total power dissipated during testing can be reduced by 29.3% to 54.9%, the average power per applied test vector by 5.8% to 36.5%, and the peak power dissipation by 15.5% to 50.2%, depending on the implementation of the basic cells and the size of the multiplier. The test application time is also significantly reduced, and the implementation area of the introduced BIST schemes is small.
