Binary Precision Neural Network Manycore Accelerator

2021 ◽  
Vol 17 (2) ◽  
pp. 1-27
Author(s):  
Morteza Hosseini ◽  
Tinoosh Mohsenin

This article presents a low-power, programmable, domain-specific manycore accelerator called the Binarized Neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes neural network models with binary-precision weights and activations. Such networks have compact models in which each weight is constrained to a single bit, so several weights can be packed into one memory entry, minimizing the memory footprint. Packing weights also enables single instruction, multiple data (SIMD) execution with simple circuitry, which maximizes performance and efficiency. The proposed BiNMAC has lightweight cores that support domain-specific instructions, and a router-based memory access architecture that helps implement the layers of appropriately sized binary-precision weight/activation neural networks efficiently. With only 3.73% area overhead and 1.98% average power overhead, novel instructions such as Combined Population-Count-XNOR, Patch-Select, and Bit-based Accumulation are added to the instruction set architecture of the BiNMAC; each replaces a frequently used function that would otherwise take 54, 4, and 3 clock cycles, respectively, with a single clock cycle. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory at the bit level, which expedites reshaping intermediate data so that it is well aligned for bitwise operations. A 64-cluster architecture of the BiNMAC is fully placed and routed in 65-nm TSMC CMOS technology, where a single cluster occupies an area of 0.53 mm² with an average power of 232 mW at a 1-GHz clock frequency and 1.1 V. The 64-cluster architecture occupies 36.5 mm² and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 Giga Operations Per Second (GOPS) while providing full programmability.
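The bit-level transpose of 16×16-bit blocks described for BiNMAC can be illustrated with a short software sketch of what such logic computes. It assumes each block is stored as 16 row words and that bit j of row word i is the element at row i, column j (bit 0 is the least significant bit); this layout and the function name `transpose16` are hypothetical, not taken from the article.

```python
BLOCK = 16  # 16x16-bit blocks, matching the block size described for BiNMAC


def transpose16(rows):
    """Bit-level transpose of a 16x16 block stored as 16 row words.

    Assumed layout: bit j of rows[i] is the element at (row i, column j),
    with bit 0 the least significant bit. In the result, bit i of out[j]
    equals bit j of rows[i].
    """
    if len(rows) != BLOCK:
        raise ValueError("expected 16 row words")
    out = [0] * BLOCK
    for i, row in enumerate(rows):
        for j in range(BLOCK):
            if (row >> j) & 1:
                out[j] |= 1 << i
    return out


# Example: after the transpose, word j gathers bit j from all 16 input words,
# i.e. it becomes a "bit-plane" that bitwise instructions can process in one step.
block = [0xFFFF] + [0x0000] * 15
print([hex(w) for w in transpose16(block)])  # sixteen copies of '0x1'
```

A hardware unit would perform this in one pass rather than with nested loops; the loops here only make the bit mapping explicit.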
To demonstrate its scalability, four binarized case studies were implemented on BiNMAC: ResNet-20 and LeNet-5 for high-performance image classification, and a ConvNet and a multilayer perceptron for low-power physiological applications. The implementation results indicate that the population-count instruction alone improves performance by approximately 5×. When the other new instructions are added to a RISC machine that already has a population-count instruction, performance increases by 58% on average. To compare BiNMAC with other commercial off-the-shelf platforms, the double-precision floating-point versions of the case-study models were also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, at the cost of ∼2.1% to 9.5% accuracy loss, BiNMAC is on average approximately 1.9× more energy efficient than the TX2 GPU (7.5× with fabrication technology scaling) for image classification applications. In low-power settings and at the cost of ∼3.7% to 5.5% accuracy loss compared with an ARM Cortex-A57 CPU implementation, BiNMAC is roughly 9.7× to 17.2× (38.8× to 68.8× with fabrication technology scaling) more energy efficient for physiological applications while meeting the application deadline.
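The speedup attributed to population count comes from the standard identity for binarized dot products: with values in {+1, −1} encoded as bits, the dot product equals 2·popcount(XNOR(a, w)) − n, because every agreeing position contributes +1 and every disagreeing position −1. The sketch below assumes 16-bit words and the encoding "bit 1 means +1"; both are assumptions, and the helpers `pack_signs` and `xnor_popcount_dot` are hypothetical illustrations of what a combined popcount-XNOR instruction evaluates, not BiNMAC's ISA.

```python
WORD_BITS = 16  # assumed memory-word width for packed binary values


def pack_signs(values):
    """Pack +1/-1 values into WORD_BITS-bit words; bit 1 encodes +1 (assumed convention)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for i, v in enumerate(values[start:start + WORD_BITS]):
            if v not in (1, -1):
                raise ValueError("binary values must be +1 or -1")
            if v == 1:
                word |= 1 << i
        words.append(word)
    return words


def xnor_popcount_dot(a_words, w_words, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    matches = 0
    for k, (a, w) in enumerate(zip(a_words, w_words)):
        valid = min(WORD_BITS, n - k * WORD_BITS)  # bits actually used in this word
        mask = (1 << valid) - 1
        agree = ~(a ^ w) & mask                    # XNOR: 1 where the signs agree
        matches += bin(agree).count("1")           # population count
    # agreements contribute +1, disagreements -1: matches - (n - matches)
    return 2 * matches - n


acts = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
weights = [1, 1, -1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1]
print(xnor_popcount_dot(pack_signs(acts), pack_signs(weights), len(acts)))  # prints 4
```

Twenty multiply-accumulate steps collapse into two XNOR/popcount pairs here, which is the kind of saving a fused instruction exploits.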

Biosensors ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 203
Author(s):  
Andreas Bahr ◽  
Matthias Schneider ◽  
Maria Francis ◽  
Hendrik Lehmann ◽  
Igor Barg ◽  
...  

The treatment of refractory epilepsy with closed-loop implantable devices that act on seizures, either by drug release or by electrostimulation, is a highly attractive option. For such implantable medical devices, low energy consumption, small size, and efficient processing architectures are essential. To meet these requirements, detecting epileptic seizures by analyzing and classifying brain signals with a convolutional neural network (CNN) is an attractive approach. This work presents a CNN for epileptic seizure detection that is capable of running on an ultra-low-power microprocessor. The CNN is implemented and optimized in MATLAB. In addition, the CNN is implemented on a GAP8 microprocessor with a RISC-V architecture. The training, optimization, and evaluation of the proposed CNN are based on the CHB-MIT dataset. The CNN reaches a median sensitivity of 90% and a very high specificity of over 99%, corresponding to a median false positive rate of 6.8 s per hour. After implementation of the CNN on the microcontroller, a sensitivity of 85% is reached. The classification of 1 s of EEG data takes t = 35 ms and consumes an average power of P ≈ 140 μW. The proposed detector outperforms related approaches in power consumption by a factor of 6. The universal applicability of the proposed CNN-based detector is verified with recordings from epileptic rats. These results enable the design of future medical devices for epilepsy treatment.
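The reported sensitivity, specificity, and "false positive seconds per hour" can be related with a small sketch. It assumes the detector labels non-overlapping 1-s EEG windows and that all three metrics are computed per window; the study may instead score sensitivity per seizure event, so treat the helper `detection_metrics` as a hypothetical illustration rather than the authors' evaluation code.

```python
import math


def detection_metrics(labels, preds, window_s=1.0):
    """Window-level sensitivity, specificity, and false-positive seconds per hour.

    labels, preds: 0/1 per non-overlapping EEG window (assumed evaluation scheme).
    """
    if len(labels) != len(preds) or not labels:
        raise ValueError("labels and preds must be non-empty and equally long")
    tp = fn = fp = tn = 0
    for y, p in zip(labels, preds):
        if y and p:
            tp += 1
        elif y:
            fn += 1
        elif p:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    specificity = tn / (tn + fp) if tn + fp else math.nan
    hours = len(labels) * window_s / 3600.0
    fp_seconds_per_hour = fp * window_s / hours  # time flagged as seizure in error
    return sensitivity, specificity, fp_seconds_per_hour


# One hypothetical hour: 10 seizure windows (9 detected) and 7 false alarms.
labels = [0] * 3590 + [1] * 10
preds = [1] * 7 + [0] * 3583 + [1] * 9 + [0]
print(detection_metrics(labels, preds))
```

In this toy hour, specificity is still above 99% even with 7 s of false alarms, which illustrates why the abstract reports the false-positive rate in seconds per hour alongside specificity. Separately, 35 ms of processing per 1-s window corresponds to a 3.5% processing duty cycle.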


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3101
Author(s):  
Jaihyuk Choi ◽  
Sungjae Lee ◽  
Youngdoo Son ◽  
Soo Youn Kim

This paper presents an always-on complementary metal-oxide-semiconductor (CMOS) image sensor (CIS) that uses an analog convolutional neural network for image classification in mobile applications. To reduce power consumption as well as overall processing time, we propose analog convolution circuits that compute convolution, max-pooling, and correlated double sampling operations without operational transconductance amplifiers. In addition, we use a voltage-mode MAX circuit for max pooling in the analog domain. After analog convolution processing, the image data were reduced by 99.58% and converted to digital with a 4-bit single-slope analog-to-digital converter. After conversion, images were classified by a fully connected processor in the digital domain, where this stage is traditionally performed. The measurement results show an image classification accuracy of 89.33%. The prototype CIS was fabricated in a 0.11-μm 1-poly 4-metal CIS process with a standard 4T active pixel sensor. The image resolution was 160 × 120, and the total power consumption of the proposed CIS was 1.12 mW with a 3.3-V supply voltage and a maximum frame rate of 120 frames per second.
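The processing chain in this sensor (correlated double sampling, convolution, max pooling, then 4-bit single-slope conversion) can be mirrored by a purely digital behavioral model. This is only a sketch of the arithmetic each stage performs; the actual design computes these steps with analog circuits. The kernel, the 2×2 pooling window, the ADC full-scale voltage, and all function names are assumptions chosen for illustration.

```python
def cds(reset_level, signal_level):
    """Correlated double sampling: pixel value = reset level minus signal level."""
    return [[r - s for r, s in zip(r_row, s_row)]
            for r_row, s_row in zip(reset_level, signal_level)]


def conv2d_valid(image, kernel):
    """2-D convolution as used in CNNs (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def max_pool(fmap, k=2):
    """Non-overlapping k x k max pooling; leftover rows/columns are dropped."""
    return [[max(fmap[i + u][j + v] for u in range(k) for v in range(k))
             for j in range(0, len(fmap[0]) - k + 1, k)]
            for i in range(0, len(fmap) - k + 1, k)]


def single_slope_adc(v, full_scale, bits=4):
    """Single-slope ADC model: count ramp steps until the ramp would exceed v."""
    levels = 1 << bits
    step = full_scale / levels
    code = 0
    while code < levels - 1 and (code + 1) * step <= v:
        code += 1
    return code


# Hypothetical 5x5 patch: uniform reset level, signal decreasing across the array.
reset = [[1.0] * 5 for _ in range(5)]
signal = [[1.0 - 0.03 * (i * 5 + j) for j in range(5)] for i in range(5)]
pixels = cds(reset, signal)
features = max_pool(conv2d_valid(pixels, [[0.25, 0.25], [0.25, 0.25]]))
codes = [[single_slope_adc(v, full_scale=1.0) for v in row] for row in features]
print(codes)
```

Even in this toy patch, 25 analog pixel values shrink to 4 four-bit codes before digitization, which is the kind of data reduction the abstract credits to processing in the analog domain.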


VLSI Design ◽  
2001 ◽  
Vol 12 (3) ◽  
pp. 431-448 ◽  
Author(s):  
D. Bakalis ◽
X. Kavousianos ◽  
H. T. Vergos ◽  
D. Nikolos ◽  
G. Ph. Alexiou

Recent trends in IC technology have given rise to a new requirement that Built-In Self-Test (BIST) structures must target along with the traditional ones: low power dissipation during testing. To this end, in this paper we exploit the inherent properties of Carry Save, Carry Propagate, and modified Booth multipliers to propose new power-efficient BIST structures for them. The proposed BIST schemes are derived by (a) properly assigning the Test Pattern Generator (TPG) outputs to the multiplier inputs, (b) modifying the TPG circuits, and (c) reducing the test set length. Our results indicate that, depending on the implementation of the basic cells and the size of the multiplier, the total power dissipated during testing can be reduced by 29.3% to 54.9%, the average power per applied test vector by 5.8% to 36.5%, and the peak power dissipation by 15.5% to 50.2%. The test application time is also significantly reduced, while the implementation area of the introduced BIST schemes is small.
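To make point (a) concrete, the sketch below models a TPG as a maximal-length Fibonacci LFSR (a common BIST pattern source; the paper's TPG designs may differ) and compares two hypothetical ways of assigning its 8 output bits to the two 4-bit operands of a multiplier. As a rough proxy for test power, it counts bit flips on the product bus between consecutive vectors. Real test-power figures depend on internal switching and require gate-level simulation of the specific multiplier. The tap sets come from the standard maximal-length table, and the helper names (`lfsr_step`, `contiguous`, `interleaved`, and so on) are illustrative assumptions.

```python
def lfsr_step(state, taps, width):
    """One shift of a Fibonacci LFSR; feedback is the XOR of bit (t - 1) for each tap t."""
    fb = 0
    for t in taps:
        fb ^= (state >> (t - 1)) & 1
    return ((state << 1) | fb) & ((1 << width) - 1)


def lfsr_patterns(seed, taps, width, count):
    """First `count` LFSR states, used as test vectors."""
    if seed & ((1 << width) - 1) == 0:
        raise ValueError("seed must be nonzero")
    state, out = seed, []
    for _ in range(count):
        out.append(state)
        state = lfsr_step(state, taps, width)
    return out


def lfsr_period(seed, taps, width):
    """Number of steps until the LFSR returns to `seed`."""
    state, steps = lfsr_step(seed, taps, width), 1
    while state != seed:
        if steps > (1 << width):
            raise ValueError("seed is not on a cycle")
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps


def product_toggles(patterns, split):
    """Bit flips on the product bus between consecutive test vectors (crude power proxy)."""
    total, prev = 0, None
    for p in patterns:
        a, b = split(p)  # hypothetical assignment of TPG outputs to operands
        prod = a * b
        if prev is not None:
            total += bin(prev ^ prod).count("1")
        prev = prod
    return total


def contiguous(p):   # low nibble -> A, high nibble -> B
    return p & 0xF, p >> 4


def interleaved(p):  # even bits -> A, odd bits -> B
    a = sum(((p >> (2 * i)) & 1) << i for i in range(4))
    b = sum(((p >> (2 * i + 1)) & 1) << i for i in range(4))
    return a, b


vectors = lfsr_patterns(seed=1, taps=(8, 6, 5, 4), width=8, count=255)
print(product_toggles(vectors, contiguous), product_toggles(vectors, interleaved))
```

Running such a comparison for a given multiplier and TPG is one way to screen candidate input assignments before gate-level power analysis. It does not reproduce the paper's reported reductions.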

