A Scatter-and-Gather Spiking Convolutional Neural Network on a Reconfigurable Neuromorphic Hardware

2021 ◽  
Vol 15 ◽  
Author(s):  
Chenglong Zou ◽  
Xiaoxin Cui ◽  
Yisong Kuang ◽  
Kefei Liu ◽  
Yuan Wang ◽  
...  

Artificial neural networks (ANNs), such as convolutional neural networks (CNNs), have achieved state-of-the-art results on many machine learning tasks. However, inference with large-scale full-precision CNNs incurs substantial energy consumption and memory occupation, which seriously hinders their deployment on mobile and embedded systems. Inspired by the biological brain, spiking neural networks (SNNs) are emerging as an alternative because of their natural aptitude for brain-like learning and their high energy efficiency, owing to event-driven communication and computation. Nevertheless, training a deep SNN remains a major challenge, and there is usually a large accuracy gap between ANNs and SNNs. In this paper, we introduce a hardware-friendly conversion algorithm called “scatter-and-gather” to convert quantized ANNs to lossless SNNs, in which neurons are connected with ternary {−1,0,1} synaptic weights. Each spiking neuron is stateless, resembling the original McCulloch-Pitts model: it fires at most one spike and is reset at every time step. Furthermore, we develop an incremental mapping framework to demonstrate efficient network deployment on a reconfigurable neuromorphic chip. Experimental results show that our spiking LeNet on MNIST and VGG-Net on CIFAR-10 obtain 99.37% and 91.91% classification accuracy, respectively. Moreover, the presented mapping algorithm manages network deployment on our neuromorphic chip with maximum resource efficiency and excellent flexibility. Our four-spike LeNet and VGG-Net on chip achieve real-time inference speeds of 0.38 ms/image and 3.24 ms/image, with average energy consumption of 0.28 mJ/image and 2.3 mJ/image, respectively, at 0.9 V and 252 MHz, which is nearly two orders of magnitude more efficient than traditional GPUs.
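To make the neuron model concrete, here is a minimal sketch assuming a simple threshold model: a stateless layer integrates ternary-weighted input spikes for a single time step, fires at most one spike per neuron, and carries no membrane state to the next step. Function names and the threshold value are our illustrative choices, not the paper's implementation.

```python
import numpy as np

def stateless_spiking_layer(spikes_in, W, threshold=1.0):
    """One time step of a stateless spiking layer with ternary weights.

    spikes_in : binary vector of input spikes for this time step
    W         : ternary weight matrix with entries in {-1, 0, 1}
    threshold : firing threshold (hypothetical value)

    Each neuron fires at most one spike per step and keeps no membrane
    state between steps (reset every step), as in the McCulloch-Pitts-like
    model described in the abstract.
    """
    currents = W @ spikes_in                          # integrate ternary-weighted inputs
    return (currents >= threshold).astype(np.uint8)   # fire or stay silent

# Toy usage: 4 inputs -> 3 neurons, one time step
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(3, 4))            # ternary {-1, 0, 1} weights
spikes = np.array([1, 0, 1, 1], dtype=np.uint8)
print(stateless_spiking_layer(spikes, W))
```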

2021 ◽  
Vol 15 ◽  
Author(s):  
Youngeun Kim ◽  
Priyadarshini Panda

Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to sparse, asynchronous, binary event (or spike) driven processing, which can yield large energy-efficiency benefits on neuromorphic hardware. However, SNNs convey temporally varying spike activations, which are likely to induce large variation in forward activations and backward gradients, resulting in unstable training. To address this training issue, we revisit Batch Normalization (BN) and propose a temporal Batch Normalization Through Time (BNTT) technique. Unlike previous BN techniques for SNNs, we find that varying the BN parameters at every time step allows the model to better learn the time-varying input distribution. Specifically, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. We demonstrate BNTT on CIFAR-10, CIFAR-100, Tiny-ImageNet, the event-driven DVS-CIFAR10 dataset, and Sequential MNIST, and show near state-of-the-art performance. We conduct a comprehensive analysis of the temporal characteristics of BNTT and showcase interesting benefits for robustness against random and adversarial noise. Further, by monitoring the learnt BNTT parameters, we find that we can perform temporal early exit: inference latency can be reduced by roughly 5-20 time steps from the original training latency. The code has been released at https://github.com/Intelligent-Computing-Lab-Yale/BNTT-Batch-Normalization-Through-Time.
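The decoupling of BN parameters along the time axis can be sketched in a few lines of PyTorch; this is a minimal rendering of the idea, with our own module and argument names (see the linked repository for the released code):

```python
import torch.nn as nn

class BNTT(nn.Module):
    """Batch Normalization Through Time: one BN layer per time step, so
    the learnable scale/shift and running statistics can track the
    time-varying spike distribution. A sketch of the idea, not the
    authors' released implementation."""

    def __init__(self, num_features, num_timesteps):
        super().__init__()
        self.bns = nn.ModuleList(
            [nn.BatchNorm2d(num_features) for _ in range(num_timesteps)]
        )

    def forward(self, x, t):
        # x: activations at time step t, shape (batch, channels, H, W)
        return self.bns[t](x)

# Usage inside an SNN forward pass over T time steps:
# bntt = BNTT(num_features=64, num_timesteps=T)
# for t in range(T):
#     out = bntt(conv(spike_input[t]), t)
```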


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1448 ◽  
Author(s):  
Massimo Merenda ◽  
Demetrio Iero ◽  
Francesco G. Della Corte

The performance of two RF transmitters, monolithically integrated with their antennas on a single CMOS microchip fabricated in a standard 0.35 µm process, is presented. These architectures are envisioned for use in the Internet of Things (IoT) paradigm, as part of a custom-conceived data transmission system. The implemented circuits use two different directly on-off keying (OOK) modulated oscillator topologies whose outputs feed two loop antennas. The powering of both transmitters is duty-cycled to reduce the average power consumption to a few tenths of a microwatt, allowing their use as low-power transmitters for IoT nodes. The integrated loop antennas radiate sufficient power for a communication range of a few meters. The OOK-transmitted signal can be easily detected using a commercial receiver.
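The duty-cycling arithmetic behind the sub-microwatt average consumption is simple; a small sketch with illustrative numbers (not the paper's measured values):

```python
# Average power of a duty-cycled transmitter: the radio burns its active
# power only for a fraction of each period and (ideally) ~0 W while off.
# All numbers below are illustrative, not the paper's data.

def avg_power(active_power_w, t_on_s, period_s, sleep_power_w=0.0):
    duty = t_on_s / period_s
    return duty * active_power_w + (1 - duty) * sleep_power_w

# e.g. 5 mW while transmitting, 10 us bursts every 100 ms
print(avg_power(5e-3, 10e-6, 100e-3))   # -> 5e-07 W = 0.5 uW
```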


2021 ◽  
Vol 15 ◽  
Author(s):  
Abderazek Ben Abdallah ◽  
Khanh N. Dang

Spiking neuromorphic systems have been introduced as promising platforms for energy-efficient spiking neural network (SNN) execution. SNNs incorporate neuronal and synaptic states, as well as varying time scales, into their computational model. Since each neuron in these networks is connected to many others, high bandwidth is required. Moreover, since spike times are used to encode information in SNNs, precise communication latency is also needed, although an SNN viewed as a whole can tolerate spike-delay variation within certain limits. The two-dimensional packet-switched network-on-chip was proposed as a solution to provide a scalable interconnect fabric in large-scale spike-based neural networks. 3D-ICs have also attracted much attention as a potential solution to the interconnect bottleneck. Combining these two emerging technologies opens a new horizon for IC design to satisfy the demands of low power and small footprint in emerging AI applications. Moreover, although fault tolerance is a natural feature of biological systems, integrating many computation and memory units into neuromorphic chips raises reliability issues, where a defective part can affect the overall system's performance. This paper presents the design and simulation of R-NASH, a reliable three-dimensional digital neuromorphic system that explicitly mirrors, in 3D-ICs, the three-dimensional structure of the biological brain, where information in the network is represented by sparse patterns of spike timing and learning is based on the local spike-timing-dependent plasticity (STDP) rule. Our platform enables high integration density and small spike delay for spiking networks and features a scalable design. R-NASH is based on Through-Silicon-Via technology, facilitating implementation of spiking neural networks on clustered neurons connected via a Network-on-Chip. We provide a memory interface to the host CPU, allowing online training and inference of spiking neural networks. Moreover, R-NASH supports fault recovery with graceful performance degradation.
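The local learning rule mentioned above is standard pair-based STDP; a textbook-style sketch with hypothetical constants (not the exact rule or parameters used on the R-NASH platform):

```python
import numpy as np

def stdp_update(w, t_pre, t_post,
                a_plus=0.01, a_minus=0.012, tau=20.0,
                w_min=0.0, w_max=1.0):
    """Pair-based STDP: potentiate if the presynaptic spike precedes the
    postsynaptic one, depress otherwise. All constants are illustrative."""
    dt = t_post - t_pre
    if dt > 0:                      # pre before post -> strengthen
        w += a_plus * np.exp(-dt / tau)
    else:                           # post before pre -> weaken
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, w_min, w_max))

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))  # potentiation
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))  # depression
```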


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Julian Büchel ◽  
Dmitrii Zendrikov ◽  
Sergio Solinas ◽  
Giacomo Indiveri ◽  
Dylan R. Muir

Abstract. Mixed-signal analog/digital circuits emulate spiking neurons and synapses with extremely high energy efficiency, an approach known as “neuromorphic engineering”. However, analog circuits are sensitive to process-induced variation among transistors in a chip (“device mismatch”). For neuromorphic implementation of Spiking Neural Networks (SNNs), mismatch causes parameter variation between identically-configured neurons and synapses. Each chip exhibits a different distribution of neural parameters, causing deployed networks to respond differently between chips. Current solutions to mitigate mismatch, based on per-chip calibration or on-chip learning, entail increased design complexity, area and cost, making deployment of neuromorphic devices expensive and difficult. Here we present a supervised learning approach that produces SNNs with high robustness to mismatch and other common sources of noise. Our method trains SNNs to perform temporal classification tasks by mimicking a pre-trained dynamical system, using a local learning rule from non-linear control theory. We demonstrate our method on two tasks requiring temporal memory, and measure the robustness of our approach to several forms of noise and mismatch. We show that our approach is more robust than common alternatives for training SNNs. Our method provides robust deployment of pre-trained networks on mixed-signal neuromorphic hardware, without requiring per-device training or calibration.
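Device mismatch can be probed in software by re-sampling each deployed network's parameters with per-device noise and checking whether task performance survives; the following is our own illustrative construction, not the paper's evaluation protocol:

```python
import numpy as np

def simulate_mismatch(params, rel_sigma=0.2, n_chips=100, seed=0):
    """Draw per-'chip' parameter sets with multiplicative log-normal
    noise, a common first-order model of transistor mismatch.
    rel_sigma is an illustrative mismatch level, not measured data."""
    rng = np.random.default_rng(seed)
    return [params * rng.lognormal(0.0, rel_sigma, size=params.shape)
            for _ in range(n_chips)]

# Evaluate a trained network once per simulated chip:
#   accs = [evaluate(snn_with(p)) for p in simulate_mismatch(trained_params)]
# A mismatch-robust training method keeps the spread of `accs` small.
nominal = np.ones(5)
print(simulate_mismatch(nominal, n_chips=2))
```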


2018 ◽  
Vol 16 ◽  
pp. 99-108
Author(s):  
Daniel Widmann ◽  
Markus Grözing ◽  
Manfred Berroth

Abstract. An attractive solution for providing several channels with very high data rates of tens of Gbit/s for digital-to-analog converters (DACs) in arbitrary waveform generators (AWGs) is to use a high-speed serializer in front of the DAC. On-chip memories, digital signal processors, or field-programmable gate arrays can serve as data sources. Here, we present a serializer consisting of a 19-channel 16:1 multiplexer (MUX) for output data rates up to 64 Gbit/s per channel, together with a low-skew (~8.8 ps) two-phase frequency divider and clock distribution network, realized entirely in static CMOS logic. The circuit is designed in a 28 nm Fully-Depleted Silicon-on-Insulator (FD-SOI) technology and will be used in an 8 bit 64 GS/s DAC between the on-chip memory and the DAC output stage. Due to a four-bit unary plus four-bit binary segmentation, a 19-channel MUX is required (a four-bit unary code needs 2^4 − 1 = 15 channels, plus four binary channels). Simulations at layout level reveal a data-dependent peak-to-peak jitter of less than 1.8 ps at the output of one MUX channel, with a total average power consumption of approximately 1.15 W for the whole MUX and clock network.
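The channel count and per-lane rates follow directly from the segmentation and the MUX ratio; a quick check of the arithmetic using the figures quoted above:

```python
# 8-bit DAC split into 4 unary + 4 binary bits:
unary_bits, binary_bits = 4, 4
channels = (2**unary_bits - 1) + binary_bits   # 15 + 4 = 19 MUX channels
print(channels)

# 16:1 serialization at 64 Gbit/s output implies the per-lane input rate:
print(64 / 16, "Gbit/s per input lane")        # 4.0
```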


2020 ◽  
Vol 12 (11) ◽  
pp. 1794
Author(s):  
Naisen Yang ◽  
Hong Tang

Modern convolutional neural networks (CNNs) are often trained on pre-set datasets of fixed size. In large-scale applications of satellite imagery, such as global or regional mapping, however, images are generally collected incrementally, in multiple stages. In other words, training datasets for mapping tasks may grow over time rather than being fixed beforehand. In this paper, we present a novel algorithm, called GeoBoost, for incremental-learning tasks in semantic segmentation via convolutional neural networks. Specifically, the GeoBoost algorithm is trained end-to-end on newly available data without degrading the performance of previously trained models. The effectiveness of GeoBoost is verified on the large-scale DREAM-B dataset. The method avoids retraining on the enlarged dataset from scratch and becomes more effective as more data becomes available.
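One way to picture boosting-style incremental learning is an ensemble that freezes earlier members and fits a new one on each data increment, so performance on old data cannot degrade. The sketch below is our schematic reading of that general idea, not the GeoBoost algorithm itself; `train_fn` is a hypothetical placeholder:

```python
class IncrementalEnsemble:
    """Grow an ensemble as data arrives in stages: earlier members are
    frozen, so performance on old data cannot degrade. A schematic of
    boosting-style incremental learning, not GeoBoost itself."""

    def __init__(self):
        self.members = []

    def add_stage(self, train_fn, new_data):
        # train_fn fits a fresh model, optionally conditioned on the
        # current ensemble's predictions (residual/boosting style)
        model = train_fn(new_data, frozen_ensemble=self.members)
        self.members.append(model)

    def predict(self, x):
        # e.g. average the per-member segmentation logits
        preds = [m(x) for m in self.members]
        return sum(preds) / len(preds)
```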


2021 ◽  
Author(s):  
Alpha Renner ◽  
Forrest Sheldon ◽  
Anatoly Zlotnik ◽  
Louis Tao ◽  
Andrew Sornborger

Abstract. The capabilities of natural neural systems have inspired new generations of machine learning algorithms as well as neuromorphic very large-scale integrated (VLSI) circuits capable of fast, low-power information processing. However, it has been argued that most modern machine learning algorithms are not neurophysiologically plausible. In particular, the workhorse of modern deep learning, the backpropagation algorithm, has proven difficult to translate to neuromorphic hardware. In this study, we present a neuromorphic, spiking backpropagation algorithm based on synfire-gated dynamical information coordination and processing, implemented on Intel's Loihi neuromorphic research processor. We demonstrate a proof-of-principle three-layer circuit that learns to classify digits from the MNIST dataset. To our knowledge, this is the first work to show a Spiking Neural Network (SNN) implementation of the backpropagation algorithm that is fully on-chip, without a computer in the loop. It is competitive in accuracy with off-chip trained SNNs and achieves an energy-delay product suitable for edge computing. This implementation shows a path for using in-memory, massively parallel neuromorphic processors for low-power, low-latency implementation of modern deep learning applications.
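The synfire-gating concept can be caricatured as a binary control signal that decides when plasticity is applied; the toy sketch below reflects that reading only and is not the paper's Loihi circuit:

```python
import numpy as np

def gated_update(W, pre, err, gate, lr=0.1):
    """Apply a backprop-style outer-product weight update only while a
    gating (synfire) signal is active. A caricature of synfire-gated
    information coordination, not the paper's on-chip implementation."""
    if gate:                          # plasticity enabled this phase
        W -= lr * np.outer(err, pre)
    return W

W = np.zeros((3, 4))
pre = np.array([1., 0., 1., 0.])      # presynaptic activity
err = np.array([0.2, -0.1, 0.05])     # backpropagated error signal
W = gated_update(W, pre, err, gate=True)
print(W)
```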


2015 ◽  
Author(s):  
Zhengping Ji ◽  
Ilia Ovsiannikov ◽  
Yibing Wang ◽  
Lilong Shi ◽  
Qiang Zhang

Author(s):  
Yaseer Arafat Durrani ◽  
Teresa Riesgo ◽  
Muhammad Imran Khan ◽  
Tariq Mahmood

Purpose: Low power consumption has become an important issue that cannot be ignored in System-on-Chip (SoC) design. The key challenge in system design is maintaining a balance between estimation accuracy and speed. This paper aims to demonstrate an accurate and fast power estimation technique.

Design/methodology/approach: The methodology uses input patterns with predefined statistical characteristics, which helps to analyze the average power consumption of the different intellectual-property (IP) cores and the interconnects/buses in an SoC design. The paper also implements a genetic algorithm (GA) to generate sequences of input signals during the power estimation procedure.

Findings: The GA concurrently optimizes the input-signal characteristics that influence the final solution of the pattern. In addition, a Monte-Carlo zero-delay simulation is performed for each individual IP core and bus at high level. By simple addition over these cores/buses, power is predicted by a novel macro-model function. In experiments, the average estimation error is 13.84%.

Research limitations/implications: To present the findings clearly and avoid complexity, the paper does not consider delay effects such as glitches and jitter in the power model.

Practical implications: The proposed methodology allows accurate power/energy analysis of practical applications mapped onto a Network-on-Chip (NoC) based multiprocessor SoC platform. It enables performance analysis of different design alternatives under the load imposed by complex applications.

Originality/value: This paper is an original contribution, and the results demonstrate that the proposed technique can achieve fast and accurate power estimation in the early stages of any SoC design.
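The GA-driven pattern search can be sketched as follows; the fitness function, signal statistics, and all names are hypothetical placeholders for the paper's macro-model evaluation:

```python
import random

def ga_optimize_patterns(power_of, pop_size=20, length=64,
                         generations=50, mut_rate=0.05, seed=1):
    """Evolve binary input sequences whose statistics stress the power
    macro-model. `power_of(seq)` is a placeholder for a per-core/bus
    Monte-Carlo or macro-model power evaluation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=power_of, reverse=True)    # keep high-power patterns
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]           # one-point crossover
            children.append([bit ^ (rng.random() < mut_rate)
                             for bit in child]) # bit-flip mutation
        pop = survivors + children
    return max(pop, key=power_of)

# Toy fitness: transition density as a proxy for switching power
toy_power = lambda s: sum(a != b for a, b in zip(s, s[1:]))
print(ga_optimize_patterns(toy_power, generations=10))
```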


Author(s):  
Hu Wang ◽  
Guansong Pang ◽  
Chunhua Shen ◽  
Congbo Ma

Deep neural networks have achieved great success in a broad range of tasks due to their remarkable capability to learn semantically rich features from high-dimensional data. However, they often require large-scale labelled data to learn such features successfully, which significantly hinders their adoption in unsupervised learning tasks, such as anomaly detection and clustering, and limits their application in critical domains where obtaining massive labelled data is prohibitively expensive. To enable unsupervised learning in those domains, in this work we propose to learn features without any labelled data by training neural networks to predict data distances in a randomly projected space. Random projection is a theoretically grounded way to obtain approximately distance-preserving mappings (cf. the Johnson-Lindenstrauss lemma). To predict these distances well, the representation learner is optimized to capture genuine class structures that are implicitly embedded in the randomly projected space. Empirical results on 19 real-world datasets show that our learned representations substantially outperform several state-of-the-art methods on both anomaly detection and clustering tasks. Code is available at: https://git.io/RDP
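The training signal is easy to reproduce: fix a random projection, then regress an encoder's pairwise feature distances onto the projected-space distances. A minimal sketch under those assumptions (our own rendering; the linked repository has the actual code):

```python
import torch
import torch.nn as nn

# Random projection approximately preserves pairwise distances
# (Johnson-Lindenstrauss), so projected-space distances serve as a
# cheap, label-free regression target for a feature learner.
d_in, d_feat, d_proj, n = 100, 16, 32, 256
R = torch.randn(d_in, d_proj) / d_proj ** 0.5   # fixed random map
encoder = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                        nn.Linear(64, d_feat))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

x = torch.randn(n, d_in)                        # stand-in for real data
for step in range(100):
    i, j = torch.randint(0, n, (2, 128))        # sample random pairs
    target = (x[i] @ R - x[j] @ R).norm(dim=1)  # projected distances
    pred = (encoder(x[i]) - encoder(x[j])).norm(dim=1)
    loss = ((pred - target) ** 2).mean()        # distance regression
    opt.zero_grad(); loss.backward(); opt.step()
```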

