FPGA Implementation of a Pyramidal Weightless Neural Networks Learning System

2003 ◽  
Vol 13 (04) ◽  
pp. 225-237 ◽  
Author(s):  
Raida Al-Alawi

A hardware architecture of a Probabilistic Logic Neuron (PLN) is presented. The suggested model facilitates the on-chip learning of pyramidal Weightless Neural Networks using a modified probabilistic search reward/penalty training algorithm. The penalization strategy of the training algorithm depends on a predefined parameter called the probabilistic search interval. A complete Weightless Neural Network (WNN) learning system is modeled and implemented on Xilinx XC4005E Field Programmable Gate Array (FPGA), allowing its architecture to be configurable. Various experiments have been conducted to examine the feasibility and performance of the WNN learning system. Results show that the system has a fast convergence rate and good generalization ability.

Author(s):  
David R. Selviah ◽  
Janti Shawash

This chapter celebrates 50 years of first and higher order neural network (HONN) implementations in terms of the physical layout and structure of electronic hardware, which offers high speed, low latency, compact, low cost, low power, mass produced systems. Low latency is essential for practical applications in real time control for which software implementations running on CPUs are too slow. The literature review chapter traces the chronological development of electronic neural networks (ENN) discussing selected papers in detail from analog electronic hardware, through probabilistic RAM, generalizing RAM, custom silicon Very Large Scale Integrated (VLSI) circuit, Neuromorphic chips, pulse stream interconnected neurons to Application Specific Integrated circuits (ASICs) and Zero Instruction Set Chips (ZISCs). Reconfigurable Field Programmable Gate Arrays (FPGAs) are given particular attention as the most recent generation incorporate Digital Signal Processing (DSP) units to provide full System on Chip (SoC) capability offering the possibility of real-time, on-line and on-chip learning.


2022 ◽  
Vol 15 (2) ◽  
pp. 1-29
Author(s):  
Paolo D'Alberto ◽  
Victor Wu ◽  
Aaron Ng ◽  
Rahul Nimaiyar ◽  
Elliott Delaye ◽  
...  

We present xDNN, an end-to-end system for deep-learning inference based on a family of specialized hardware processors synthesized on Field-Programmable Gate Array (FPGAs) and Convolution Neural Networks (CNN). We present a design optimized for low latency, high throughput, and high compute efficiency with no batching. The design is scalable and a parametric function of the number of multiply-accumulate units, on-chip memory hierarchy, and numerical precision. The design can produce a scale-down processor for embedded devices, replicated to produce more cores for larger devices, or resized to optimize efficiency. On Xilinx Virtex Ultrascale+ VU13P FPGA, we achieve 800 MHz that is close to the Digital Signal Processing maximum frequency and above 80% efficiency of on-chip compute resources. On top of our processor family, we present a runtime system enabling the execution of different networks for different input sizes (i.e., from 224× 224 to 2048× 1024). We present a compiler that reads CNNs from native frameworks (i.e., MXNet, Caffe, Keras, and Tensorflow), optimizes them, generates codes, and provides performance estimates. The compiler combines quantization information from the native environment and optimizations to feed the runtime with code as efficient as any hardware expert could write. We present tools partitioning a CNN into subgraphs for the division of work to CPU cores and FPGAs. Notice that the software will not change when or if the FPGA design becomes an ASIC, making our work vertical and not just a proof-of-concept FPGA project. We show experimental results for accuracy, latency, and power for several networks: In summary, we can achieve up to 4 times higher throughput, 3 times better power efficiency than the GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, we provide solutions faster than any previous FPGA-based solutions and comparable to any other top-of-the-shelves solutions.


Electronics ◽  
2019 ◽  
Vol 8 (11) ◽  
pp. 1321 ◽  
Author(s):  
Mário P. Véstias ◽  
Rui Policarpo Duarte ◽  
José T. de Sousa ◽  
Horácio C. Neto

Edge devices are becoming smarter with the integration of machine learning methods, such as deep learning, and are therefore used in many application domains where decisions have to be made without human intervention. Deep learning and, in particular, convolutional neural networks (CNN) are more efficient than previous algorithms for several computer vision applications such as security and surveillance, where image and video analysis are required. This better efficiency comes with a cost of high computation and memory requirements. Hence, running CNNs in embedded computing devices is a challenge for both algorithm and hardware designers. New processing devices, dedicated system architectures and optimization of the networks have been researched to deal with these computation requirements. In this paper, we improve the inference execution times of CNNs in low density FPGAs (Field-Programmable Gate Arrays) using fixed-point arithmetic, zero-skipping and weight pruning. The developed architecture supports the execution of large CNNs in FPGA devices with reduced on-chip memory and computing resources. With the proposed architecture, it is possible to infer an image in AlexNet in 2.9 ms in a ZYNQ7020 and 1.0 ms in a ZYNQ7045 with less than 1% accuracy degradation. These results improve previous state-of-the-art architectures for CNN inference.


2021 ◽  
Vol 20 (3) ◽  
pp. 1-23
Author(s):  
Vasileios Leon ◽  
George Lentaris ◽  
Evangelos Petrongonas ◽  
Dimitrios Soudris ◽  
Gianluca Furano ◽  
...  

The advent of powerful edge devices and AI algorithms has already revolutionized many terrestrial applications; however, for both technical and historical reasons, the space industry is still striving to adopt these key enabling technologies in new mission concepts. In this context, the current work evaluates an heterogeneous multi-core system-on-chip processor for use on-board future spacecraft to support novel, computationally demanding digital signal processors and AI functionalities. Given the importance of low power consumption in satellites, we consider the Intel Movidius Myriad2 system-on-chip and focus on SW development and performance aspects. We design a methodology and framework to accommodate efficient partitioning, mapping, parallelization, code optimization, and tuning of complex algorithms. Furthermore, we propose an avionics architecture combining this commercial off-the-shelf chip with a field programmable gate array device to facilitate, among others, interfacing with traditional space instruments via SpaceWire transcoding. We prototype our architecture in the lab targeting vision-based navigation tasks. We implement a representative computer vision pipeline to track the 6D pose of ENVISAT using megapixel images during hypothetical spacecraft proximity operations. Overall, we achieve 2.6 to 4.9 FPS with only 0.8 to 1.1 W on Myriad2 , i.e., 10-fold acceleration versus modern rad-hard processors. Based on the results, we assess various benefits of utilizing Myriad2 instead of conventional field programmable gate arrays and CPUs.


Electronics ◽  
2019 ◽  
Vol 8 (5) ◽  
pp. 528 ◽  
Author(s):  
Julian Viejo ◽  
Jorge Juan-Chico ◽  
Manuel J. Bellido ◽  
Paulino Ruiz-de-Clavijo ◽  
David Guerrero ◽  
...  

This paper presents the complete design and implementation of a low-cost, low-footprint, network time protocol server core for field programmable gate arrays. The core uses a carefully designed modular architecture, which is fully implemented in hardware using digital circuits and systems. Most remarkable novelties introduced are a hardware-optimized timekeeping algorithm implementation, and a full-hardware protocol stack and automatic network configuration. As a result, the core is able to achieve similar accuracy and performance to typical high-performance network time protocol server equipment. The core uses a standard global positioning system receiver as time reference, has a small footprint and can easily fit in a low-range field-programmable chip, greatly scaling down from previous system-on-chip time synchronization systems. Accuracy and performance results show that the core can serve hundreds of thousands of network time clients with negligible accuracy degradation, in contrast to state-of-the-art high-performance time server equipment. Therefore, this core provides a valuable time server solution for a wide range of emerging embedded and distributed network applications such as the Internet of Things and the smart grid, at a fraction of the cost and footprint of current discrete and embedded solutions.


2009 ◽  
Vol 2009 ◽  
pp. 1-13 ◽  
Author(s):  
Jim Harkin ◽  
Fearghal Morgan ◽  
Liam McDaid ◽  
Steve Hall ◽  
Brian McGinley ◽  
...  

FPGA devices have emerged as a popular platform for the rapid prototyping of biological Spiking Neural Networks (SNNs) applications, offering the key requirement of reconfigurability. However, FPGAs do not efficiently realise the biologically plausible neuron and synaptic models of SNNs, and current FPGA routing structures cannot accommodate the high levels of interneuron connectivity inherent in complex SNNs. This paper highlights and discusses the current challenges of implementing scalable SNNs on reconfigurable FPGAs. The paper proposes a novel field programmable neural network architecture (EMBRACE), incorporating low-power analogue spiking neurons, interconnected using a Network-on-Chip architecture. Results on the evaluation of the EMBRACE architecture using the XOR benchmark problem are presented, and the performance of the architecture is discussed. The paper also discusses the adaptability of the EMBRACE architecture in supporting fault tolerant computing.


2014 ◽  
Vol 35 (7) ◽  
pp. 1630-1635
Author(s):  
Yi-peng Zhang ◽  
Liang Chen ◽  
Huan Hao

Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 229
Author(s):  
Xianzhong Tian ◽  
Juan Zhu ◽  
Ting Xu ◽  
Yanjun Li

The latest results in Deep Neural Networks (DNNs) have greatly improved the accuracy and performance of a variety of intelligent applications. However, running such computation-intensive DNN-based applications on resource-constrained mobile devices definitely leads to long latency and huge energy consumption. The traditional way is performing DNNs in the central cloud, but it requires significant amounts of data to be transferred to the cloud over the wireless network and also results in long latency. To solve this problem, offloading partial DNN computation to edge clouds has been proposed, to realize the collaborative execution between mobile devices and edge clouds. In addition, the mobility of mobile devices is easily to cause the computation offloading failure. In this paper, we develop a mobility-included DNN partition offloading algorithm (MDPO) to adapt to user’s mobility. The objective of MDPO is minimizing the total latency of completing a DNN job when the mobile user is moving. The MDPO algorithm is suitable for both DNNs with chain topology and graphic topology. We evaluate the performance of our proposed MDPO compared to local-only execution and edge-only execution, experiments show that MDPO significantly reduces the total latency and improves the performance of DNN, and MDPO can adjust well to different network conditions.


Sign in / Sign up

Export Citation Format

Share Document