Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

Tobias Alonso; Lucian Petrica; Mario Ruiz; Jakoba Petri-Koenig; Yaman Umuroglu; Ioannis Stamelos; Elias Koromilas; Michaela Blott; Kees Vissers

doi:10.1145/3470567

ACM Transactions on Reconfigurable Technology and Systems ◽

2022 ◽

Vol 15 (2) ◽

pp. 1-34

Keyword(s):

Neural Network ◽

Field Programmable Gate Array ◽

Resource Partitioning ◽

Deep Neural Network ◽

Peer To Peer ◽

Performance Difference ◽

Performance Portability ◽

Field Programmable ◽

Runtime Infrastructure ◽

Roll Out

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.
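
The layer-to-die allocation described above can be illustrated with a simplified stand-in. This sketch is not Elastic-DF's algorithm. It assumes that each layer has a single hypothetical scalar cost (for example, an estimated resource or cycle count), that layers must stay in their original order, and that the goal is to minimize the largest per-die load, because a pipelined dataflow accelerator runs at the pace of its slowest stage. It uses a classic linear-partitioning technique: binary search over the bottleneck value with a greedy feasibility check. All function names and costs are illustrative.

```python
from typing import List


def can_split(costs: List[int], k: int, limit: int) -> bool:
    """Greedy check: can the layers be cut into at most k contiguous
    segments whose summed cost each stays within `limit`?"""
    segments, current = 1, 0
    for c in costs:
        if c > limit:
            return False  # a single layer already exceeds the limit
        if current + c > limit:
            segments += 1  # start a new segment (i.e. the next die)
            current = c
        else:
            current += c
    return segments <= k


def min_bottleneck(costs: List[int], k: int) -> int:
    """Smallest achievable maximum per-segment cost with k segments."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_split(costs, k, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def partition(costs: List[int], k: int) -> List[List[int]]:
    """Assign layer indices to at most k contiguous segments (hypothetical dies)."""
    limit = min_bottleneck(costs, k)
    segments: List[List[int]] = [[]]
    current = 0
    for i, c in enumerate(costs):
        if current + c > limit:
            segments.append([])
            current = 0
        segments[-1].append(i)
        current += c
    return segments
```

For example, `partition([4, 8, 2, 6, 3, 5], 3)` cuts six hypothetical layer costs into three contiguous segments whose largest load is 12. A real tool must also respect per-die resource limits and decide how many accelerator instances to replicate, which this sketch ignores.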

Quantized deep neural network empowering an IM-DD link running in realtime on a field programmable gate array

45th European Conference on Optical Communication (ECOC 2019) ◽

10.1049/cp.2019.1190 ◽

2019 ◽

Author(s):

M. Chagnon ◽

J. Siirtola ◽

T. Rissa ◽

A. Verma

Keyword(s):

Neural Network ◽

Field Programmable Gate Array ◽

Deep Neural Network ◽

Field Programmable ◽

Gate Array

Hardware Implementation of Artificial Neural Network Using Field Programmable Gate Array

International Journal of Computer Theory and Engineering ◽

10.7763/ijcte.2013.v5.795 ◽

2013 ◽

pp. 780-783 ◽

Cited By ~ 15

Author(s):

Esraa Zeki Mohammed ◽

Haitham Kareem Ali

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Field Programmable Gate Array ◽

Hardware Implementation ◽

Field Programmable ◽

Artificial Neural ◽

Gate Array

Field Programmable Gate Array Implementation of a Neural Network-based Intelligent Sensor System

2006 9th International Conference on Control, Automation, Robotics and Vision ◽

10.1109/icarcv.2006.345341 ◽

2006 ◽

Cited By ~ 3

Author(s):

Jagdish C. Patra ◽

Han Yang Lee ◽

Pramod K. Meher ◽

Ee Luang Ang

Keyword(s):

Neural Network ◽

Field Programmable Gate Array ◽

Sensor System ◽

Intelligent Sensor ◽

Field Programmable ◽

Gate Array

A field-programmable gate array system for sonar image recognition based on convolutional neural network

Proceedings of the Institution of Mechanical Engineers Part I Journal of Systems and Control Engineering ◽

10.1177/0959651820939345 ◽

2020 ◽

pp. 095965182093934

Author(s):

Chong Wang ◽

Yu Jiang ◽

Kai Wang ◽

Fenglin Wei

Keyword(s):

Neural Network ◽

Energy Consumption ◽

Convolutional Neural Network ◽

Image Recognition ◽

Field Programmable Gate Array ◽

Oil And Gas ◽

Sonar Image ◽

Field Programmable ◽

Sonar Images ◽

Gate Array

Subsea pipelines are the safest, most reliable, and most economical way to transport oil and gas from an offshore platform to an onshore terminal. However, pipelines may rupture in the harsh working environment, causing oil and gas leakage. This calls for a suitable device and method to detect the state of subsea pipelines in a timely and precise manner. An autonomous underwater vehicle carrying side-scan sonar offers a desirable means of target detection in the complex undersea environment. Accordingly, this article combines the field-programmable gate array, which features high throughput, low energy consumption, and a high degree of parallelism, with the convolutional neural network in a sonar image recognition system. First, a training set was constructed by screening and splitting the sonar images collected by the sensors and labeling them one by one. Next, the convolutional neural network model was trained on this set on the workstation platform. The trained model was then integrated into the field-programmable gate array system and applied to recognize actual datasets, and the recognition results were compared with those of the workstation platform. The comparison shows that the computational precision of the designed field-programmable gate array system based on the convolutional neural network is equivalent to that of the workstation platform, while its recognition time is reduced by more than 77% and its energy consumption by more than 96.67%. Therefore, the system largely satisfies the demand for energy-efficient, real-time, and accurate recognition of sonar images.

Real-time object tracking system based on field-programmable gate array and convolution neural network

International Journal of Advanced Robotic Systems ◽

10.1177/1729881416682705 ◽

2016 ◽

Vol 14 (1) ◽

pp. 172988141668270 ◽

Cited By ~ 3

Author(s):

Congyi Lyu ◽

Haoyao Chen ◽

Xin Jiang ◽

Peng Li ◽

Yunhui Liu

Keyword(s):

Neural Network ◽

Image Processing ◽

Object Tracking ◽

Real Time ◽

Field Programmable Gate Array ◽

Tracking System ◽

Convolution Neural Network ◽

Processing Unit ◽

Field Programmable ◽

Gate Array

Vision-based object tracking has many applications in robotics, such as surveillance, navigation, and motion capture. However, existing object tracking systems still suffer from the high computational cost of their image processing algorithms. This can prevent current systems from being used in robotic applications with payload and power limitations, for example, micro air vehicles. In these applications, computers based on central processing units or graphics processing units are poor choices because of their weight and power consumption. To address this problem, this article proposes a real-time object tracking system based on a field-programmable gate array, a convolutional neural network, and visual servo technology. The time-consuming image processing algorithms, namely distortion correction, color space conversion, Sobel edge detection, Harris corner feature detection, and the convolutional neural network, were redesigned using the programmable gates of the field-programmable gate array. Based on this field-programmable gate array-based image processing, an image-based visual servo controller was designed to drive a two-degree-of-freedom manipulator to track the target in real time. Finally, experiments on the proposed system were performed to illustrate its effectiveness.
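
The abstract lists Sobel edge detection among the kernels moved into FPGA logic. As a minimal software reference, assuming a grayscale image stored as a list of integer rows, the sketch below applies the standard 3×3 Sobel kernels and combines the two gradients as |Gx| + |Gy|, an approximation of the gradient magnitude commonly used in hardware because it avoids a square root. It does not reproduce the authors' hardware design; setting border pixels to zero is an arbitrary choice made here.

```python
from typing import List

Image = List[List[int]]

# Standard Sobel kernels for horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Return |Gx| + |Gy| for interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            # Correlate the 3x3 neighbourhood with both kernels.
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * p
                    gy += GY[dy][dx] * p
            out[y][x] = abs(gx) + abs(gy)
    return out
```

In an FPGA pipeline the same 3×3 window is typically fed from line buffers so that one output pixel can be produced per clock; this Python version only shows the arithmetic.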

Field-programmable gate array implementation of a probabilistic neural network for motor cortical decoding in rats

Journal of Neuroscience Methods ◽

10.1016/j.jneumeth.2009.10.001 ◽

2010 ◽

Vol 185 (2) ◽

pp. 299-306 ◽

Cited By ~ 26

Author(s):

Fan Zhou ◽

Jun Liu ◽

Yi Yu ◽

Xiang Tian ◽

Hui Liu ◽

...

Keyword(s):

Neural Network ◽

Field Programmable Gate Array ◽

Probabilistic Neural Network ◽

Field Programmable ◽

Gate Array ◽

Motor Cortical

Field Programmable Gate Array implementation of Conic Section Function Neural Network: An alternative to analog CSFNN circuitry

2012 IEEE 16th International Conference on Intelligent Engineering Systems (INES) ◽

10.1109/ines.2012.6249818 ◽

2012 ◽

Author(s):

Metin Elitas ◽

Oguzhan Yavuz ◽

Burcu Erkmen

Keyword(s):

Neural Network ◽

Field Programmable Gate Array ◽

Conic Section ◽

Field Programmable ◽

Gate Array ◽

Section Function

FPGA Realization of Deep Neural Network for Hardware Trojan Detection

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i3.30946 ◽

2020 ◽

Vol 9 (3) ◽

pp. 764

Author(s):

Varun Reddy ◽

Nirmala Devi M

Keyword(s):

Neural Network ◽

Integrated Circuits ◽

Power Consumption ◽

Deep Neural Network ◽

Random Forest Classifier ◽

Third Party ◽

Hardware Trojan ◽

Hardware Trojan Detection ◽

Trojan Detection ◽

Field Programmable

With the increase in outsourced design and fabrication, malicious third-party vendors may insert hardware Trojans (HTs) into integrated circuits (ICs). These Trojans are difficult to identify because the nature and characteristics of each Trojan differ significantly, so any method developed for HT detection is limited in its capacity to deal with varied types of Trojans. The main purpose of this study is to show that deep learning (DL) can address this problem to some extent, and to evaluate the effect of realizing a deep neural network (DNN) on a field programmable gate array (FPGA). In this paper, we compare the accuracy of a random forest classifier and a DNN in finding faults on the ISCAS’85 benchmark circuits. Further, to reduce processing time and power consumption, the network is implemented on an FPGA. The results show that the DNN performs better when a large number of nets is used and executes the algorithm faster. In addition, the neuron achieves a 100x speedup when implemented on the FPGA, using 15.32% of the available resources, and consumes less power than on a GPU.
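
The abstract does not state the number format used for the FPGA neuron. FPGA neuron implementations commonly replace floating point with fixed-point integer arithmetic, so the sketch below shows one such neuron under that assumption: hypothetical Q-format values with `FRAC_BITS = 8` fractional bits, an integer multiply-accumulate, a rescaling shift, and a ReLU activation (the activation is also an assumption). It illustrates the arithmetic a hardware neuron performs, not the authors' design.

```python
from typing import List

FRAC_BITS = 8  # hypothetical fixed-point format: 8 fractional bits


def to_fixed(x: float) -> int:
    """Convert a real value to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))


def fixed_neuron(inputs: List[int], weights: List[int], bias: int) -> int:
    """Integer multiply-accumulate followed by ReLU; all values are fixed-point ints."""
    # The bias has FRAC_BITS fractional bits; shift it so it matches
    # the 2 * FRAC_BITS fractional bits carried by each product.
    acc = bias << FRAC_BITS
    for x, w in zip(inputs, weights):
        acc += x * w
    # Rescale back to FRAC_BITS fractional bits (arithmetic shift, floors negatives).
    acc >>= FRAC_BITS
    return max(acc, 0)  # ReLU activation
```

For instance, inputs 0.5 and 1.0 with weights 2.0 and -0.25 and bias 0.25 give 1.0, which `fixed_neuron` returns as `to_fixed(1.0)`, i.e. 256. In hardware, each `x * w` maps naturally onto a DSP multiplier and the shift costs no logic.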

Design and implementation of hydrogen economy using artificial neural network on field programmable gate array

International Journal of Hydrogen Energy ◽

10.1016/j.ijhydene.2020.05.181 ◽

2020 ◽

Vol 45 (41) ◽

pp. 20709-20720 ◽

Cited By ~ 1

Author(s):

Ismail Koyuncu ◽

Ceyhun Yilmaz ◽

Murat Alcin ◽

Murat Tuna

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Field Programmable Gate Array ◽

Hydrogen Economy ◽

Design And Implementation ◽

Field Programmable ◽

Artificial Neural ◽

Gate Array

A Real-Time Deep Learning OFDM Receiver

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3494049 ◽

2022 ◽

Vol 15 (3) ◽

pp. 1-25

Author(s):

Stefan Brennsteiner ◽

Tughrul Arslan ◽

John Thompson ◽

Andrew McCormick

Keyword(s):

Neural Network ◽

Neural Networks ◽

Real Time ◽

Field Programmable Gate Array ◽

Orthogonal Frequency Division Multiplexing ◽

Frequency Division Multiplexing ◽

Frequency Division ◽

Field Programmable ◽

Gate Array ◽

Fully Connected

Machine learning in the physical layer of communication systems holds the potential to improve performance and simplify design methodology. Many algorithms have been proposed; however, their model complexity is often too high for real-time deployment, and the real-time processing capability of these systems has not yet been proven. In this work, we propose a novel, less complex, fully connected neural network to perform channel estimation and signal detection in an orthogonal frequency division multiplexing system. The memory requirement, which is often the bottleneck for fully connected neural networks, is reduced approximately 27-fold by applying known compression techniques in a three-step training process. Extensive experiments were performed on pruning and quantizing the weights of the neural network detector. Additionally, Huffman encoding was applied to the weights to further reduce memory requirements. Based on this approach, we propose the first field-programmable gate array-based, real-time capable neural network accelerator specifically designed to accelerate the orthogonal frequency division multiplexing detector workload. The accelerator is synthesized for a Xilinx RFSoC field-programmable gate array, uses small-batch processing to increase throughput, efficiently supports branching neural networks, and implements superscalar Huffman decoders.
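
The compression steps named in the abstract (pruning, quantizing, and Huffman encoding the weights) can be sketched in a few lines. This version assumes simple magnitude pruning with a hypothetical threshold, uniform quantization with a hypothetical step size, and a plain Huffman code over the resulting integer levels. It reports the encoded size in bits rather than building the bitstream, and it does not reproduce the authors' three-step training process or any retraining between steps.

```python
import heapq
from collections import Counter
from typing import Dict, List


def prune(weights: List[float], threshold: float) -> List[float]:
    """Magnitude pruning: zero out weights whose magnitude is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize(weights: List[float], step: float) -> List[int]:
    """Uniform quantization to integer levels, so that w is roughly level * step."""
    return [int(round(w / step)) for w in weights]


def huffman_code_lengths(symbols: List[int]) -> Dict[int, int]:
    """Return the Huffman code length (in bits) of each distinct symbol."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}  # a lone symbol still needs one bit
    # Heap entries: (total frequency, unique tie-breaker, {symbol: depth}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(symbols: List[int]) -> int:
    """Total size of the Huffman-encoded symbol stream in bits."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)
```

Because pruning makes the zero level dominant, the Huffman code gives it the shortest codeword, which is where much of the memory saving in this kind of pipeline comes from; a fixed-width encoding of three levels would instead spend 2 bits on every weight.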
