FPGA Implementation of a Pyramidal Weightless Neural Networks Learning System

A hardware architecture of a Probabilistic Logic Neuron (PLN) is presented. The proposed model facilitates on-chip learning of pyramidal Weightless Neural Networks using a modified probabilistic search reward/penalty training algorithm. The penalization strategy of the training algorithm depends on a predefined parameter called the probabilistic search interval. A complete Weightless Neural Network (WNN) learning system is modeled and implemented on a Xilinx XC4005E Field Programmable Gate Array (FPGA), allowing its architecture to be configurable. Various experiments have been conducted to examine the feasibility and performance of the WNN learning system. Results show that the system has a fast convergence rate and good generalization ability.
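
The abstract does not spell out the modified training algorithm, so the following is only a minimal software sketch of the classic PLN reward/penalty scheme on a pyramid, not the authors' hardware design. It assumes three-state memory locations (0, 1, undefined), and it assumes the "probabilistic search interval" acts as a cap on the number of retries before the addressed locations are penalised. The names `PLNNode`, `Pyramid`, `train_pattern`, and `search_interval` are hypothetical.

```python
import random

U = None  # "undefined" memory state: the node outputs a random bit


class PLNNode:
    """RAM-based neuron with 2**fan_in locations holding 0, 1, or U."""

    def __init__(self, fan_in, rng):
        self.mem = [U] * (2 ** fan_in)
        self.rng = rng
        self.last_addr = 0
        self.last_out = 0

    def fire(self, bits):
        # The input bits form the address of one memory location.
        addr = 0
        for b in bits:
            addr = (addr << 1) | b
        stored = self.mem[addr]
        out = self.rng.randint(0, 1) if stored is U else stored
        self.last_addr, self.last_out = addr, out
        return out

    def reward(self):
        # Commit the bit that was just output to the addressed location.
        self.mem[self.last_addr] = self.last_out

    def penalise(self):
        # Return the addressed location to the undefined state.
        self.mem[self.last_addr] = U


class Pyramid:
    """Layers of PLN nodes that narrow down to a single output node."""

    def __init__(self, n_inputs, fan_in=2, seed=0):
        self.rng = random.Random(seed)
        self.fan_in = fan_in
        self.layers = []
        width = n_inputs
        while width > 1:
            assert width % fan_in == 0, "n_inputs must be a power of fan_in"
            width //= fan_in
            self.layers.append([PLNNode(fan_in, self.rng) for _ in range(width)])

    def forward(self, bits):
        signal = list(bits)
        f = self.fan_in
        for layer in self.layers:
            signal = [node.fire(signal[i * f:(i + 1) * f])
                      for i, node in enumerate(layer)]
        return signal[0]

    def nodes(self):
        return [node for layer in self.layers for node in layer]


def train_pattern(net, bits, target, search_interval):
    """Retry up to search_interval times (assumed meaning of the parameter)."""
    for _ in range(search_interval):
        if net.forward(bits) == target:
            for node in net.nodes():
                node.reward()
            return True
    # No successful trial within the interval: penalise the addressed locations.
    for node in net.nodes():
        node.penalise()
    return False
```

A training loop would call `train_pattern` repeatedly over the training set until every pattern is recalled correctly; how the original system schedules patterns is not stated in the abstract.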

Fifty years of Electronic Hardware Implementations of First and Higher Order Neural Networks

Artificial Higher Order Neural Networks for Computer Science and Engineering ◽

10.4018/978-1-61520-711-4.ch012 ◽

2010 ◽

pp. 269-285 ◽

Cited By ~ 3

Author(s):

David R. Selviah ◽

Janti Shawash

Keyword(s):

Neural Networks ◽

Real Time ◽

High Speed ◽

Higher Order ◽

Low Latency ◽

Real Time Control ◽

Practical Applications ◽

Field Programmable ◽

On Chip ◽

Electronic Hardware

This chapter celebrates 50 years of first and higher order neural network (HONN) implementations in terms of the physical layout and structure of electronic hardware, which offers high speed, low latency, compact, low cost, low power, mass produced systems. Low latency is essential for practical applications in real time control, for which software implementations running on CPUs are too slow. This literature review chapter traces the chronological development of electronic neural networks (ENN), discussing selected papers in detail, from analog electronic hardware, through probabilistic RAM, generalizing RAM, custom silicon Very Large Scale Integrated (VLSI) circuits, neuromorphic chips, and pulse stream interconnected neurons, to Application Specific Integrated Circuits (ASICs) and Zero Instruction Set Chips (ZISCs). Reconfigurable Field Programmable Gate Arrays (FPGAs) are given particular attention, as the most recent generation incorporates Digital Signal Processing (DSP) units to provide full System on Chip (SoC) capability, offering the possibility of real-time, on-line and on-chip learning.

xDNN: Inference for Deep Convolutional Neural Networks

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3473334 ◽

2022 ◽

Vol 15 (2) ◽

pp. 1-29

Author(s):

Paolo D'Alberto ◽

Victor Wu ◽

Aaron Ng ◽

Rahul Nimaiyar ◽

Elliott Delaye ◽

...

Keyword(s):

Neural Networks ◽

Power Efficiency ◽

Digital Signal ◽

Fpga Design ◽

Deep Convolutional Neural Networks ◽

Parametric Function ◽

Field Programmable ◽

Scale Down ◽

On Chip ◽

Numerical Precision

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs) and Convolutional Neural Networks (CNNs). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, on-chip memory hierarchy, and numerical precision. The design can produce a scaled-down processor for embedded devices, be replicated to produce more cores for larger devices, or be resized to optimize efficiency. On a Xilinx Virtex Ultrascale+ VU13P FPGA, we achieve 800 MHz, which is close to the Digital Signal Processing maximum frequency, and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment and optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools that partition a CNN into subgraphs to divide the work between CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks. In summary, we can achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solutions and comparable to any other top-of-the-line solutions.

Fast Convolutional Neural Networks in Low Density FPGAs Using Zero-Skipping and Weight Pruning

Electronics ◽

10.3390/electronics8111321 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1321 ◽

Cited By ~ 3

Author(s):

Mário P. Véstias ◽

Rui Policarpo Duarte ◽

José T. de Sousa ◽

Horácio C. Neto

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Low Density ◽

Embedded Computing ◽

System Architectures ◽

Field Programmable ◽

New Processing ◽

On Chip ◽

Weight Pruning

Edge devices are becoming smarter with the integration of machine learning methods, such as deep learning, and are therefore used in many application domains where decisions have to be made without human intervention. Deep learning and, in particular, convolutional neural networks (CNNs) are more efficient than previous algorithms for several computer vision applications, such as security and surveillance, where image and video analysis are required. This better efficiency comes at the cost of high computation and memory requirements. Hence, running CNNs on embedded computing devices is a challenge for both algorithm and hardware designers. New processing devices, dedicated system architectures and optimization of the networks have been researched to deal with these computation requirements. In this paper, we improve the inference execution times of CNNs in low density FPGAs (Field-Programmable Gate Arrays) using fixed-point arithmetic, zero-skipping and weight pruning. The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources. With the proposed architecture, it is possible to run AlexNet inference on an image in 2.9 ms on a ZYNQ7020 and 1.0 ms on a ZYNQ7045 with less than 1% accuracy degradation. These results improve on previous state-of-the-art architectures for CNN inference.
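
The three techniques named in the abstract can be illustrated in software. This is a generic sketch under stated assumptions, not the paper's FPGA architecture: it assumes simple magnitude pruning, a signed fixed-point format with 7 fractional bits, and an (index, value) encoding of nonzero weights. All function names and parameter values are hypothetical.

```python
def quantize(values, frac_bits=7):
    """Convert floats to signed fixed-point integers (frac_bits is an assumption)."""
    scale = 1 << frac_bits
    return [int(round(v * scale)) for v in values]


def prune(weights, threshold):
    """Magnitude pruning: zero every weight whose absolute value is below threshold."""
    return [0 if abs(w) < threshold else w for w in weights]


def compress(weights):
    """Store only nonzero weights as (index, value) pairs so zeros are never fetched."""
    return [(i, w) for i, w in enumerate(weights) if w != 0]


def sparse_dot(compressed_weights, activations):
    """Multiply-accumulate over nonzero weights, also skipping zero activations.

    Returns the integer accumulator and the number of multiplications performed.
    """
    acc = 0
    macs = 0
    for i, w in compressed_weights:
        a = activations[i]
        if a == 0:
            continue  # zero-skipping: no multiplication is issued
        acc += w * a
        macs += 1
    return acc, macs


# Four weights and activations; a dense dot product would need 4 multiplications.
w_fixed = compress(prune(quantize([0.5, 0.01, -0.25, 0.0]), threshold=4))
a_fixed = quantize([1.0, 1.0, 0.0, 0.5])
acc, macs = sparse_dot(w_fixed, a_fixed)
result = acc / float(1 << 14)  # product of two 7-fractional-bit values has 14
```

In this toy case only one multiplication remains after pruning and zero-skipping. The savings on a real network depend on its sparsity and on how the hardware schedules the surviving operations.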

Improving Performance-Power-Programmability in Space Avionics with Edge Devices: VBN on Myriad2 SoC

ACM Transactions on Embedded Computing Systems ◽

10.1145/3440885 ◽

2021 ◽

Vol 20 (3) ◽

pp. 1-23

Author(s):

Vasileios Leon ◽

George Lentaris ◽

Evangelos Petrongonas ◽

Dimitrios Soudris ◽

Gianluca Furano ◽

...

Keyword(s):

Digital Signal ◽

System On Chip ◽

Core System ◽

Proximity Operations ◽

Space Instruments ◽

Field Programmable ◽

Commercial Off The Shelf ◽

And Performance ◽

On Chip ◽

Key Enabling Technologies

The advent of powerful edge devices and AI algorithms has already revolutionized many terrestrial applications; however, for both technical and historical reasons, the space industry is still striving to adopt these key enabling technologies in new mission concepts. In this context, the current work evaluates a heterogeneous multi-core system-on-chip processor for use on board future spacecraft to support novel, computationally demanding digital signal processing and AI functionalities. Given the importance of low power consumption in satellites, we consider the Intel Movidius Myriad2 system-on-chip and focus on SW development and performance aspects. We design a methodology and framework to accommodate efficient partitioning, mapping, parallelization, code optimization, and tuning of complex algorithms. Furthermore, we propose an avionics architecture combining this commercial off-the-shelf chip with a field programmable gate array device to facilitate, among other things, interfacing with traditional space instruments via SpaceWire transcoding. We prototype our architecture in the lab, targeting vision-based navigation tasks. We implement a representative computer vision pipeline to track the 6D pose of ENVISAT using megapixel images during hypothetical spacecraft proximity operations. Overall, we achieve 2.6 to 4.9 FPS with only 0.8 to 1.1 W on Myriad2, i.e., 10-fold acceleration versus modern rad-hard processors. Based on the results, we assess various benefits of utilizing Myriad2 instead of conventional field programmable gate arrays and CPUs.

High-Performance Time Server Core for FPGA System-on-Chip

Electronics ◽

10.3390/electronics8050528 ◽

2019 ◽

Vol 8 (5) ◽

pp. 528 ◽

Cited By ~ 1

Author(s):

Julian Viejo ◽

Jorge Juan-Chico ◽

Manuel J. Bellido ◽

Paulino Ruiz-de-Clavijo ◽

David Guerrero ◽

...

Keyword(s):

High Performance ◽

System On Chip ◽

The Core ◽

Performance Time ◽

Network Time ◽

Wide Range ◽

Network Time Protocol ◽

Field Programmable ◽

And Performance ◽

On Chip

This paper presents the complete design and implementation of a low-cost, low-footprint network time protocol server core for field programmable gate arrays. The core uses a carefully designed modular architecture, which is fully implemented in hardware using digital circuits and systems. The most notable novelties introduced are a hardware-optimized implementation of the timekeeping algorithm, a full-hardware protocol stack, and automatic network configuration. As a result, the core is able to achieve similar accuracy and performance to typical high-performance network time protocol server equipment. The core uses a standard global positioning system receiver as its time reference, has a small footprint and can easily fit in a low-range field-programmable chip, greatly scaling down from previous system-on-chip time synchronization systems. Accuracy and performance results show that the core can serve hundreds of thousands of network time clients with negligible accuracy degradation, in contrast to state-of-the-art high-performance time server equipment. Therefore, this core provides a valuable time server solution for a wide range of emerging embedded and distributed network applications, such as the Internet of Things and the smart grid, at a fraction of the cost and footprint of current discrete and embedded solutions.

A Reconfigurable and Biologically Inspired Paradigm for Computation Using Network-On-Chip and Spiking Neural Networks

International Journal of Reconfigurable Computing ◽

10.1155/2009/908740 ◽

2009 ◽

Vol 2009 ◽

pp. 1-13 ◽

Cited By ~ 37

Author(s):

Jim Harkin ◽

Fearghal Morgan ◽

Liam McDaid ◽

Steve Hall ◽

Brian McGinley ◽

...

Keyword(s):

Neural Networks ◽

Network Architecture ◽

Fault Tolerant ◽

Network On Chip ◽

Spiking Neurons ◽

Spiking Neural Networks ◽

Neural Network Architecture ◽

Biologically Inspired ◽

Field Programmable ◽

On Chip

FPGA devices have emerged as a popular platform for the rapid prototyping of biological Spiking Neural Network (SNN) applications, offering the key requirement of reconfigurability. However, FPGAs do not efficiently realise the biologically plausible neuron and synaptic models of SNNs, and current FPGA routing structures cannot accommodate the high levels of interneuron connectivity inherent in complex SNNs. This paper highlights and discusses the current challenges of implementing scalable SNNs on reconfigurable FPGAs. The paper proposes a novel field programmable neural network architecture (EMBRACE), incorporating low-power analogue spiking neurons interconnected using a Network-on-Chip architecture. Results from the evaluation of the EMBRACE architecture using the XOR benchmark problem are presented, and the performance of the architecture is discussed. The paper also discusses the adaptability of the EMBRACE architecture in supporting fault tolerant computing.

Memory Requirement Reduction of Deep Neural Networks for Field Programmable Gate Arrays Using Low-Bit Quantization of Parameters

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287739 ◽

2021 ◽

Author(s):

Niccolo Nicodemo ◽

Gaurav Naithani ◽

Konstantinos Drossos ◽

Tuomas Virtanen ◽

Roberto Saletti

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Field Programmable Gate Arrays ◽

Memory Requirement ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

An Improved Training Algorithm for Quantum Neural Networks

Journal of Electronics Information Technology ◽

10.3724/sp.j.1146.2012.01417 ◽

2014 ◽

Vol 35 (7) ◽

pp. 1630-1635

Author(s):

Yi-peng Zhang ◽

Liang Chen ◽

Huan Hao

Keyword(s):

Neural Networks ◽

Training Algorithm

Reconfigurable field‐programmable gate array‐based on‐chip learning neuromorphic digital implementation for nonlinear function approximation

International Journal of Circuit Theory and Applications ◽

10.1002/cta.3075 ◽

2021 ◽

Author(s):

Morteza Gholami ◽

Edris Zaman Farsa ◽

Gholamreza Karimi

Keyword(s):

Field Programmable Gate Array ◽

Function Approximation ◽

Nonlinear Function ◽

Digital Implementation ◽

Field Programmable ◽

Gate Array ◽

On Chip ◽

Nonlinear Function Approximation

Mobility-Included DNN Partition Offloading from Mobile Devices to Edge Clouds

Sensors ◽

10.3390/s21010229 ◽

2021 ◽

Vol 21 (1) ◽

pp. 229

Author(s):

Xianzhong Tian ◽

Juan Zhu ◽

Ting Xu ◽

Yanjun Li

Keyword(s):

Neural Networks ◽

Energy Consumption ◽

Mobile Devices ◽

Wireless Network ◽

Deep Neural Networks ◽

Mobile User ◽

Computation Offloading ◽

Long Latency ◽

Total Latency ◽

And Performance

The latest results in Deep Neural Networks (DNNs) have greatly improved the accuracy and performance of a variety of intelligent applications. However, running such computation-intensive DNN-based applications on resource-constrained mobile devices leads to long latency and high energy consumption. The traditional approach is to run DNNs in the central cloud, but this requires significant amounts of data to be transferred to the cloud over the wireless network and also results in long latency. To solve this problem, offloading part of the DNN computation to edge clouds has been proposed, enabling collaborative execution between mobile devices and edge clouds. In addition, the mobility of mobile devices can easily cause computation offloading to fail. In this paper, we develop a mobility-included DNN partition offloading algorithm (MDPO) to adapt to the user's mobility. The objective of MDPO is to minimize the total latency of completing a DNN job while the mobile user is moving. The MDPO algorithm is suitable for DNNs with both chain topology and graphic topology. We evaluate the performance of our proposed MDPO against local-only execution and edge-only execution. Experiments show that MDPO significantly reduces the total latency, improves the performance of DNNs, and adjusts well to different network conditions.
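
The partitioning idea for a chain-topology DNN can be sketched as an exhaustive search over split points. This is not the MDPO algorithm, whose details the abstract does not give: the sketch ignores user mobility and result download time, and assumes fixed per-layer latencies and a constant uplink bandwidth. The function `best_split` and all numbers in the example are hypothetical.

```python
def best_split(device_ms, edge_ms, transfer_kb, bandwidth_kb_per_ms):
    """Pick the split point k that minimises end-to-end latency for a layer chain.

    Layers 0..k-1 run on the mobile device and layers k..n-1 run on the edge.
    transfer_kb[k] is the data uploaded at split k: transfer_kb[0] is the raw
    input, and transfer_kb[n] is 0 because nothing is offloaded.
    """
    n = len(device_ms)
    assert len(edge_ms) == n and len(transfer_kb) == n + 1

    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        local = sum(device_ms[:k])                        # on-device compute
        upload = transfer_kb[k] / bandwidth_kb_per_ms     # wireless transfer
        remote = sum(edge_ms[k:])                         # edge compute
        total = local + upload + remote
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency


# Three-layer chain: the first layer shrinks the data from 100 KB to 10 KB,
# so running it locally before uploading gives the lowest total latency here.
k, latency = best_split(
    device_ms=[10, 20, 30],
    edge_ms=[1, 2, 3],
    transfer_kb=[100, 10, 50, 0],
    bandwidth_kb_per_ms=1.0,
)
```

Here `k == 0` corresponds to edge-only execution and `k == 3` to local-only execution, the two baselines the abstract compares against. A mobility-aware method would additionally have to account for bandwidth changing, or the link dropping, while the job is in flight.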
