Elastic-DF: Scaling Performance of DNN Inference in FPGA Clouds through Automatic Partitioning

2022 ◽  
Vol 15 (2) ◽  
pp. 1-34
Author(s):  
Tobias Alonso ◽  
Lucian Petrica ◽  
Mario Ruiz ◽  
Jakoba Petri-Koenig ◽  
Yaman Umuroglu ◽  
...  

Customized compute acceleration in the datacenter is key to the wider roll-out of applications based on deep neural network (DNN) inference. In this article, we investigate how to maximize the performance and scalability of field-programmable gate array (FPGA)-based pipeline dataflow DNN inference accelerators (DFAs) automatically on computing infrastructures consisting of multi-die, network-connected FPGAs. We present Elastic-DF, a novel resource partitioning tool and associated FPGA runtime infrastructure that integrates with the DNN compiler FINN. Elastic-DF allocates FPGA resources to DNN layers and layers to individual FPGA dies to maximize the total performance of the multi-FPGA system. In the resulting Elastic-DF mapping, the accelerator may be instantiated multiple times, and each instance may be segmented across multiple FPGAs transparently, whereby the segments communicate peer-to-peer through 100 Gbps Ethernet FPGA infrastructure, without host involvement. When applied to ResNet-50, Elastic-DF provides a 44% latency decrease on Alveo U280. For MobileNetV1 on Alveo U200 and U280, Elastic-DF enables a 78% throughput increase, eliminating the performance difference between these cards and the larger Alveo U250. Elastic-DF also increases operating frequency in all our experiments, on average by over 20%. Elastic-DF therefore increases performance portability between different sizes of FPGA and increases the critical throughput per cost metric of datacenter inference.

Author(s):  
Chong Wang ◽  
Yu Jiang ◽  
Kai Wang ◽  
Fenglin Wei

Subsea pipeline is the safest, most reliable, and most economical way to transport oil and gas from an offshore platform to an onshore terminal. However, the pipelines may rupture under the harsh working environment, causing oil and gas leakage. This calls for a proper device and method to detect the state of subsea pipelines in a timely and precise manner. The autonomous underwater vehicle carrying side-scan sonar offers a desirable way for target detection in the complex environment under the sea. As a result, this article combines the field-programmable gate array, featuring high throughput, low energy consumption and a high degree of parallelism, and the convolutional neural network into a sonar image recognition system. First, a training set was constructed by screening and splitting the sonar images collected by sensors, and labeled one by one. Next, the convolutional neural network model was trained by the set on the workstation platform. The trained model was integrated into the field-programmable gate array system and applied to recognize actual datasets. The recognition results were compared with those of the workstation platform. The comparison shows that the computational precision of the designed field-programmable gate array system based on convolutional neural network is equivalent to that of the workstation platform; however, the recognition time of the designed system can be saved by more than 77%, and its energy consumption can also be saved by more than 96.67%. Therefore, our system basically satisfies our demand for energy-efficient, real-time, and accurate recognition of sonar images.


2016 ◽  
Vol 14 (1) ◽  
pp. 172988141668270 ◽  
Author(s):  
Congyi Lyu ◽  
Haoyao Chen ◽  
Xin Jiang ◽  
Peng Li ◽  
Yunhui Liu

Vision-based object tracking has lots of applications in robotics, like surveillance, navigation, motion capturing, and so on. However, the existing object tracking systems still suffer from the challenging problem of high computation consumption in the image processing algorithms. The problem can prevent current systems from being used in many robotic applications which have limitations of payload and power, for example, micro air vehicles. In these applications, the central processing unit- or graphics processing unit-based computers are not good choices due to the high weight and power consumption. To address the problem, this article proposed a real-time object tracking system based on field-programmable gate array, convolution neural network, and visual servo technology. The time-consuming image processing algorithms, such as distortion correction, color space convertor, and Sobel edge, Harris corner features detector, and convolution neural network were redesigned using the programmable gates in field-programmable gate array. Based on the field-programmable gate array-based image processing, an image-based visual servo controller was designed to drive a two degree of freedom manipulator to track the target in real time. Finally, experiments on the proposed system were performed to illustrate the effectiveness of the real-time object tracking system.


2020 ◽  
Vol 9 (3) ◽  
pp. 764
Author(s):  
Varun Reddy ◽  
Nirmala Devi M

With the increase in outsourcing design and fabrication, malicious third-party vendors often insert hardware Trojan (HT) in the integrated Circuits(IC). It is difficult to identify these Trojans since the nature and characteristics of each Trojan differ significantly. Any method developed for HT detection is limited by its capacity on dealing with varied types of Trojans. The main purpose of this study is to show using deep learning (DL), this problem can be dealt with some extent and the effect of deep neural network (DNN) when it is realized on field programmable gate array (FPGA). In this paper, we propose a comparison of accuracy in finding faults on ISCAS’85 benchmark circuits between random forest classifier and DNN. Further for the faster processing time and less power consumption, the network is implemented on FPGA. The results show the performance of deep neural network gets better when a large number of nets are used and faster in the execution of the algorithm. Also, the speedup of the neuron is 100x times better when implemented on FPGA with 15.32% of resource utilization and provides less power consumption than GPU.


2022 ◽  
Vol 15 (3) ◽  
pp. 1-25
Author(s):  
Stefan Brennsteiner ◽  
Tughrul Arslan ◽  
John Thompson ◽  
Andrew McCormick

Machine learning in the physical layer of communication systems holds the potential to improve performance and simplify design methodology. Many algorithms have been proposed; however, the model complexity is often unfeasible for real-time deployment. The real-time processing capability of these systems has not been proven yet. In this work, we propose a novel, less complex, fully connected neural network to perform channel estimation and signal detection in an orthogonal frequency division multiplexing system. The memory requirement, which is often the bottleneck for fully connected neural networks, is reduced by ≈ 27 times by applying known compression techniques in a three-step training process. Extensive experiments were performed for pruning and quantizing the weights of the neural network detector. Additionally, Huffman encoding was used on the weights to further reduce memory requirements. Based on this approach, we propose the first field-programmable gate array based, real-time capable neural network accelerator, specifically designed to accelerate the orthogonal frequency division multiplexing detector workload. The accelerator is synthesized for a Xilinx RFSoC field-programmable gate array, uses small-batch processing to increase throughput, efficiently supports branching neural networks, and implements superscalar Huffman decoders.


Sign in / Sign up

Export Citation Format

Share Document