multiPULPly

2021 ◽  
Vol 17 (2) ◽  
pp. 1-27
Author(s):  
Adi Eliahu ◽  
Ronny Ronen ◽  
Pierre-Emmanuel Gaillardon ◽  
Shahar Kvatinsky

Computationally intensive neural network applications often need to run on resource-limited low-power devices. Numerous hardware accelerators have been developed to speed up the performance of neural network applications and reduce power consumption; however, most focus on data centers and full-fledged systems. Acceleration in ultra-low-power systems has been only partially addressed. In this article, we present multiPULPly, an accelerator that integrates memristive technologies within standard low-power CMOS technology, to accelerate multiplication in neural network inference on ultra-low-power systems. This accelerator was designated for PULP, an open-source microcontroller system that uses low-power RISC-V processors. Memristors were integrated into the accelerator to enable power consumption only when the memory is active, to continue the task with no context-restoring overhead, and to enable highly parallel analog multiplication. To reduce the energy consumption, we propose novel dataflows that handle common multiplication scenarios and are tailored for our architecture. The accelerator was tested on FPGA and achieved a peak energy efficiency of 19.5 TOPS/W, outperforming state-of-the-art accelerators by 1.5× to 4.5×.


Nanoscale ◽  
2018 ◽  
Vol 10 (33) ◽  
pp. 15826-15833 ◽  
Author(s):  
Lin Chen ◽  
Tian-Yu Wang ◽  
Ya-Wei Dai ◽  
Ming-Yang Cha ◽  
Hao Zhu ◽  
...  

Brain-inspired neuromorphic computing has shown great promise beyond the conventional Boolean logic.



2008 ◽  
Vol 43 (1) ◽  
pp. 172-179 ◽  
Author(s):  
Yih Wang ◽  
Hong Jo Ahn ◽  
Uddalak Bhattacharya ◽  
Zhanping Chen ◽  
Tom Coan ◽  
...  


Author(s):  
Angelo Garofalo ◽  
Manuele Rusci ◽  
Francesco Conti ◽  
Davide Rossi ◽  
Luca Benini

We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors. The key innovation in PULP-NN is a set of kernels for quantized neural network inference, targeting byte and sub-byte data types, down to INT-1, tuned for the recent trend toward aggressive quantization in deep neural network inference. The proposed library exploits both the digital signal processing extensions available in the PULP RISC-V processors and the cluster’s parallelism, achieving up to 15.5 MACs/cycle on INT-8 and improving performance by up to 63 × with respect to a sequential implementation on a single RISC-V core implementing the baseline RV32IMC ISA. Using PULP-NN, a CIFAR-10 network on an octa-core cluster runs in 30 × and 19.6 × less clock cycles than the current state-of-the-art ARM CMSIS-NN library, running on STM32L4 and STM32H7 MCUs, respectively. The proposed library, when running on a GAP-8 processor, outperforms by 36.8 × and by 7.45 × the execution on energy efficient MCUs such as STM32L4 and high-end MCUs such as STM32H7 respectively, when operating at the maximum frequency. The energy efficiency on GAP-8 is 14.1 × higher than STM32L4 and 39.5 × higher than STM32H7, at the maximum efficiency operating point. This article is part of the theme issue ‘Harmonizing energy-autonomous computing and intelligence’.





2021 ◽  
Vol 15 ◽  
Author(s):  
Pavan Kumar Chundi ◽  
Dewei Wang ◽  
Sung Justin Kim ◽  
Minhao Yang ◽  
Joao Pedro Cerqueira ◽  
...  

This paper presents a novel spiking neural network (SNN) classifier architecture for enabling always-on artificial intelligent (AI) functions, such as keyword spotting (KWS) and visual wake-up, in ultra-low-power internet-of-things (IoT) devices. Such always-on hardware tends to dominate the power efficiency of an IoT device and therefore it is paramount to minimize its power dissipation. A key observation is that the input signal to always-on hardware is typically sparse in time. This is a great opportunity that a SNN classifier can leverage because the switching activity and the power consumption of SNN hardware can scale with spike rate. To leverage this scalability, the proposed SNN classifier architecture employs event-driven architecture, especially fine-grained clock generation and gating and fine-grained power gating, to obtain very low static power dissipation. The prototype is fabricated in 65 nm CMOS and occupies an area of 1.99 mm2. At 0.52 V supply voltage, it consumes 75 nW at no input activity and less than 300 nW at 100% input activity. It still maintains competitive inference accuracy for KWS and other always-on classification workloads. The prototype achieved a power consumption reduction of over three orders of magnitude compared to the state-of-the-art for SNN hardware and of about 2.3X compared to the state-of-the-art KWS hardware.



2011 ◽  
Vol 2011 (HITEN) ◽  
pp. 000243-000250 ◽  
Author(s):  
E. Boufouss ◽  
L. A. Francis ◽  
P. Gérard ◽  
M. Assaad ◽  
D. Flandre

We present three ultra-low-power CMOS circuits: a temperature sensor, a voltage reference and a comparator developed for an ultra-low-power microsystem (ULP-MST) aiming at temperature sensing in harsh environments. The microsystem has 3 main functions: detecting a user-defined temperature threshold T0, generating a wake-up signal that turns on a data-acquisition microprocessor (located in a safe area) above T0, and measuring temperatures above T0. To achieve ultra-low-power operation, the three CMOS circuits are implemented in Silicon-on-Insulator (SOI) CMOS technology and are optimized to work in the subthreshold regime of the transistors. Since our application is mainly for harsh environment (i.e. high temperature and radiation), the chip has been designed using a suitable 1-μm SOI-CMOS technology. Simulations have been performed over the different process corners to verify functionality after fabrication. The typical power dissipation at high temperature (up to 240°C) is less than 100 μW at 5 V supply voltage. Measurements have validated correct operation in the temperature range from −40°C to 300°C before radiation and to 125°C after radiation up to now which will be extended further with a new set-up. Irradiation has been performed from 10 to 30 kGy. Such very high doses cause a shift down of output voltage values, which leads to a change of the temperature detection level and also increases the power dissipation by up to six times. Annealing effects help the partial recovery of the device operation at high temperature and the remote microprocessor enables calibration after radiation to readjust the temperature detection level.



Sign in / Sign up

Export Citation Format

Share Document