Value Iteration Architecture Based Deep Learning for Intelligent Routing Exploiting Heterogeneous Computing Platforms

2019 · Vol 68 (6) · pp. 939-950
Author(s): Zubair Md. Fadlullah, Bomin Mao, Fengxiao Tang, Nei Kato
2013 · Vol 18 · pp. 1891-1898
Author(s): Chetan Kumar N G, Sudhanshu Vyas, Ron K. Cytron, Christopher D. Gill, Joseph Zambreno, ...

2022 · Vol 15 (2) · pp. 1-27
Author(s): Andrea Damiani, Giorgia Fiscaletti, Marco Bacis, Rolando Brondolin, Marco D. Santambrogio

“Cloud-native” is the umbrella adjective for the standard approach to developing applications that best exploit the scalability and elasticity of cloud infrastructures. As application complexity and user bases grow, designing for performance becomes a first-class engineering concern. In answer to these needs, heterogeneous computing platforms have gained widespread attention as powerful tools for continuing to meet SLAs for compute-intensive cloud-native workloads. We propose BlastFunction, a full-stack FPGA-as-a-Service framework that eases the adoption of FPGAs for cloud-native workloads and integrates with the fundamental cloud service models. At the IaaS level, BlastFunction time-shares FPGA-based accelerators to provide multi-tenant access to accelerated resources without any code rewriting. At the PaaS level, BlastFunction accelerates functionalities through the serverless model and scales functions proactively based on the workload’s performance. To lower the adoption barrier further, a registry of accelerators hosts accelerated functions ready to be used within cloud-native applications, bringing the simplicity of a SaaS-like approach to developers. In an extensive experimental campaign against state-of-the-art cloud scenarios, BlastFunction achieves higher utilization and throughput than native execution, with minimal differences in latency and overhead. Moreover, the proposed scaling scheme outperforms the main serverless autoscaling algorithms in both workload performance and the number of scaling operations.
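The abstract describes the proactive scaling policy only at a high level. As a rough illustration of the idea (not the paper's actual algorithm), a throughput-driven autoscaler can provision replicas from the observed request rate rather than reacting to load after the fact; all names, the headroom factor, and the replica cap below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FunctionStats:
    request_rate: float            # observed incoming requests per second
    per_replica_throughput: float  # requests/s one accelerated replica sustains


def desired_replicas(stats: FunctionStats, headroom: float = 1.2,
                     max_replicas: int = 8) -> int:
    """Provision enough replicas for the observed request rate plus a safety
    headroom, instead of reacting to CPU load as generic autoscalers do."""
    needed = math.ceil(stats.request_rate * headroom /
                       stats.per_replica_throughput)
    return max(1, min(max_replicas, needed))


# e.g. 900 req/s against 250 req/s per replica with 20% headroom -> 5 replicas
print(desired_replicas(FunctionStats(900.0, 250.0)))
```

Scaling on throughput targets rather than CPU utilization is what lets such a scheme act before queues build up, consistent with the proactive behavior the authors report.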


Author(s): Yang Bai, Lixing Chen, Mohamed Abdel-Mottaleb, Jie Xu

2020 · Vol 245 · pp. 09014
Author(s): Chao Jiang, David Ojika, Sofia Vallecorsa, Thorsten Kurth, Prabhat, ...

AI and deep learning are experiencing explosive growth in almost every domain involving the analysis of big data. Deep learning using Deep Neural Networks (DNNs) has shown great promise for such scientific data analysis applications. However, traditional CPU-based sequential computing without special instructions can no longer meet the requirements of mission-critical applications, which are compute-intensive and require low latency and high throughput. Heterogeneous computing (HGC), with CPUs integrated with GPUs, FPGAs, and other science-targeted accelerators, offers unique capabilities to accelerate DNNs. Collaborating researchers at SHREC at the University of Florida, CERN Openlab, NERSC at Lawrence Berkeley National Lab, Dell EMC, and Intel are studying the application of HGC to scientific problems using DNN models. This paper focuses on the use of FPGAs to accelerate the inferencing stage of the HGC workflow. We present case studies and results from inferencing state-of-the-art DNN models for scientific data analysis, using the Intel Distribution of OpenVINO, running on an Intel Programmable Acceleration Card (PAC) equipped with an Arria 10 GX FPGA. Using the Intel Deep Learning Acceleration (DLA) development suite to optimize existing FPGA primitives and develop new ones, we were able to accelerate the scientific DNN models under study with speedups from 2.46x to 9.59x for a single Arria 10 FPGA relative to a single core (single thread) of a server-class Skylake CPU.
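The paper does not list code, but the inferencing flow it describes can be sketched with the 2020-era OpenVINO Inference Engine Python API; the model file names are placeholders, and the HETERO device string was the documented way of that era to offload supported layers to the FPGA while falling back to the CPU for the rest.

```python
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
# An IR model previously converted with the OpenVINO Model Optimizer
# (placeholder file names).
net = ie.read_network(model="dnn_model.xml", weights="dnn_model.bin")

input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))

# Run supported layers on the Arria 10 FPGA, everything else on the CPU.
exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

# Dummy batch shaped like the network input, just to exercise the pipeline.
shape = net.input_info[input_blob].input_data.shape
data = np.random.rand(*shape).astype(np.float32)
result = exec_net.infer(inputs={input_blob: data})
print(result[output_blob].shape)
```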


2021
Author(s): Abdullah Siddiqui

One of the most critical steps of embedded systems design is hardware-software partitioning: distributing the components of an application between hardware and software such that user-defined system constraints are satisfied. Heterogeneous computing platforms consisting of CPUs and GPUs have tremendous potential for enhancing the performance of embedded applications. The challenge of partitioning an application for CPU-GPU mapping is much greater on such platforms because of their unique and diverse characteristics. In this thesis, an optimization algorithm is devised and presented for partitioning and mapping computational tasks onto CPU-GPU platforms while keeping power consumption in check. Our methodology also exploits parallelism within applications and their tasks by utilizing the architectural capabilities of the GPU. The optimization algorithm was tested with an MJPEG decoder, several benchmarks, and synthetic graphs.
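The thesis's actual optimization algorithm is not reproduced in the abstract; as a generic illustration of the problem it solves, a greedy partitioner can offload the most profitable tasks to the GPU until a power budget is exhausted. All task names and numbers below are made up.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_time: float   # estimated runtime on the CPU (ms)
    gpu_time: float   # estimated runtime on the GPU (ms)
    gpu_power: float  # extra power drawn if mapped to the GPU (W)


def partition(tasks: list[Task], power_budget: float) -> dict[str, str]:
    """Map each task to "cpu" or "gpu": take the largest speedups first and
    stop offloading once the power budget is spent."""
    mapping = {t.name: "cpu" for t in tasks}
    budget = power_budget
    for t in sorted(tasks, key=lambda t: t.cpu_time - t.gpu_time, reverse=True):
        if t.gpu_time < t.cpu_time and t.gpu_power <= budget:
            mapping[t.name] = "gpu"
            budget -= t.gpu_power
    return mapping


# e.g. the IDCT stage of an MJPEG decoder is worth offloading, Huffman is not
tasks = [Task("idct", 40.0, 8.0, 5.0), Task("huffman", 12.0, 15.0, 4.0)]
print(partition(tasks, power_budget=6.0))  # {'idct': 'gpu', 'huffman': 'cpu'}
```

A real partitioner would also model data-transfer costs and task dependencies, which is where the diverse characteristics mentioned above make the CPU-GPU case harder than classic hardware-software partitioning.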

