BlastFunction: A Full-stack Framework Bringing FPGA Hardware Acceleration to Cloud-native Applications

2022, Vol. 15(2), pp. 1-27
Author(s):  
Andrea Damiani ◽  
Giorgia Fiscaletti ◽  
Marco Bacis ◽  
Rolando Brondolin ◽  
Marco D. Santambrogio

“Cloud-native” is the umbrella adjective describing the standard approach for developing applications that best exploit the scalability and elasticity of cloud infrastructures. As application complexity and user bases grow, designing for performance becomes a first-class engineering concern. In answer to these needs, heterogeneous computing platforms have gained widespread attention as powerful tools for continuing to meet SLAs for compute-intensive cloud-native workloads. We propose BlastFunction, an FPGA-as-a-Service full-stack framework that eases the adoption of FPGAs for cloud-native workloads by integrating with the fundamental cloud service models. At the IaaS level, BlastFunction time-shares FPGA-based accelerators to provide multi-tenant access to accelerated resources without any code rewriting. At the PaaS level, BlastFunction accelerates functionalities leveraging the serverless model and scales functions proactively, depending on the workload’s performance. To further lower the adoption barrier, an accelerator registry hosts accelerated functions ready to be used within cloud-native applications, bringing the simplicity of a SaaS-like approach to developers. In an extensive experimental campaign against state-of-the-art cloud scenarios, we show that BlastFunction achieves higher utilization and throughput than native execution, with minimal differences in latency and overhead. Moreover, the proposed scaling scheme outperforms the main serverless autoscaling algorithms in both workload performance and number of scaling operations.
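A proactive scaler of the kind described above can be sketched minimally. This is not BlastFunction's actual policy (the abstract does not specify one); the function name, the per-replica throughput target and the headroom parameter are all illustrative assumptions:

```python
import math

def proactive_replicas(observed_rps, target_rps_per_replica, headroom=0.2):
    """Toy proactive scaler: pick a replica count so that the expected
    per-replica load stays under the target, with headroom for bursts."""
    needed = observed_rps * (1 + headroom) / target_rps_per_replica
    return max(1, math.ceil(needed))  # never scale below one replica
```

For example, at 100 requests/s with a 60 requests/s per-replica target and 20% headroom, the sketch asks for two replicas.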

2020, Vol. 10(2), pp. 36-55
Author(s):  
Hamid A Jadad ◽  
Abderezak Touzene ◽  
Khaled Day

Recently, much research has focused on improving mobile app performance and power consumption by offloading computation from mobile devices to public cloud computing platforms. However, scaling these offloading services to large numbers of users remains a challenge. This article addresses the scalability problem by proposing a middleware that provides offloading as a service (OAS) to large-scale populations of mobile users and apps. The proposed OAS middleware uses adaptive VM allocation and deallocation algorithms based on a CPU-rate prediction model. Furthermore, it dynamically schedules requests using a load-balancing algorithm to meet QoS requirements at lower cost. The authors tested the proposed algorithm in multiple simulations and compared the results with state-of-the-art algorithms on various performance metrics under multiple load conditions. The results show that OAS achieves better response time with a minimum number of VMs and reduces cost by 50% compared with existing approaches.
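To make the prediction-plus-allocation loop concrete, here is a minimal sketch. The paper's actual prediction model and thresholds are not given here; the exponentially weighted average, the scale-up/scale-down thresholds and the one-VM-at-a-time step are all assumptions for illustration:

```python
def predict_cpu_rate(history, alpha=0.5):
    """Exponentially weighted moving average as a stand-in CPU-rate
    predictor (the paper's actual prediction model is not specified here)."""
    rate = history[0]
    for x in history[1:]:
        rate = alpha * x + (1 - alpha) * rate
    return rate

def adjust_vms(vms, predicted_rate, scale_up=0.8, scale_down=0.3):
    """Allocate or deallocate one VM based on the predicted CPU rate."""
    if predicted_rate > scale_up:
        return vms + 1
    if predicted_rate < scale_down and vms > 1:
        return vms - 1
    return vms
```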


Author(s):  
J. Lane Thames ◽  
Oliver Eck ◽  
Dirk Schaefer

Many modern products are complex systems composed of highly integrated mechanical, electrical, electronic, and software components, commonly known as mechatronic systems. Similarly, the product data and life-cycle management systems that support the engineering and design of mechatronic systems are becoming complex and need to store, retrieve, and process vast amounts of files associated with mechatronic products. For many years, software developers and computer architects have benefited from continuous increases in computational performance, as predicted by Moore's law. However, issues such as extreme power consumption have begun to limit certain types of performance increases, such as hardware clock rates. In an effort to find new ways to increase computational performance, engineers and computer scientists have been investigating techniques such as hardware acceleration systems, reconfigurable computing, and heterogeneous computing (HC). In light of these emerging computational paradigms, this paper introduces a semantic association hardware acceleration system for integrated product data management (PDM) based on semantic file systems. The concept of the semantic path merger (SPM) is described, along with a discussion of its realization as a hardware-based associative memory for accelerated semantic file retrieval. The energy and retrieval performance metrics of the proposed hardware system are given, along with a comparative analysis against industry-standard content-addressable memory (CAM). The goal of the proposed system is to advance the state of the art in heterogeneous computing within the scope of computational platforms for design and engineering applications.


2019, Vol. 35(21), pp. 4255-4263
Author(s):  
Mohammed Alser ◽  
Hasan Hassan ◽  
Akash Kumar ◽  
Onur Mutlu ◽  
Can Alkan

Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that adopts modern field-programmable gate array (FPGA) architectures to further boost the performance of our algorithm.
Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared with the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners designed for different computing platforms. Adding Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8×. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner's capabilities, as it does not modify or replace the alignment step.
Availability and implementation: https://github.com/CMU-SAFARI/Shouji
Supplementary information: Supplementary data are available at Bioinformatics online.
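The idea of cheaply rejecting dissimilar sequence pairs before alignment can be illustrated with a deliberately simplified sketch that counts shared short substrings (k-mers). This is not Shouji's algorithm (which detects all common subsequences and runs in FPGA hardware); the k-mer heuristic, the value of `k` and the threshold are illustrative stand-ins:

```python
def kmer_prefilter(seq1, seq2, k=4, threshold=0.5):
    """Toy pre-alignment filter: estimate similarity from shared k-mers.
    Pairs whose shared-k-mer fraction falls below `threshold` are rejected
    before the costly dynamic-programming alignment step."""
    kmers1 = {seq1[i:i + k] for i in range(len(seq1) - k + 1)}
    kmers2 = {seq2[i:i + k] for i in range(len(seq2) - k + 1)}
    if not kmers1 or not kmers2:
        return False
    shared = len(kmers1 & kmers2) / min(len(kmers1), len(kmers2))
    return shared >= threshold  # True -> pass the pair on to the aligner
```

The point such filters exploit is that the filter is far cheaper than alignment, so false positives only cost wasted alignment work, while filtering accuracy determines how much of that work is avoided.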


2013, Vol. 18, pp. 1891-1898
Author(s):  
Chetan Kumar N G ◽  
Sudhanshu Vyas ◽  
Ron K. Cytron ◽  
Christopher D. Gill ◽  
Joseph Zambreno ◽  
...  

Author(s):  
Michał R. Nowicki ◽  
Dominik Belter ◽  
Aleksander Kostusiak ◽  
Petr Cížek ◽  
Jan Faigl ◽  
...  

Purpose This paper aims to evaluate four different simultaneous localization and mapping (SLAM) systems in the context of localization of multi-legged walking robots equipped with compact RGB-D sensors. The paper identifies problems related to in-motion data acquisition in a legged robot and evaluates the particular building blocks and concepts applied in contemporary SLAM systems against these problems. The SLAM systems are evaluated on two independent experimental set-ups, applying a well-established methodology and performance metrics. Design/methodology/approach Four feature-based SLAM architectures are evaluated with respect to their suitability for localization of multi-legged walking robots. The evaluation methodology is based on the computation of the absolute trajectory error (ATE) and relative pose error (RPE), performance metrics well established in the robotics community. Four sequences of RGB-D frames, acquired in two independent experiments using two different six-legged walking robots, are used in the evaluation process. Findings The experiments revealed that the predominant problems of legged robots as platforms for SLAM are abrupt and unpredictable sensor motions, as well as oscillations and vibrations, which corrupt the images captured in motion. The tested adaptive gait allowed the evaluated SLAM systems to reconstruct proper trajectories. The bundle-adjustment-based SLAM systems produced the best results, thanks to the use of a map, which makes it possible to establish a large number of constraints on the estimated trajectory. Research limitations/implications The evaluation was performed using indoor mockups of terrain. Experiments in more natural and challenging environments are envisioned as part of future research. Practical implications The lack of accurate self-localization methods is considered one of the most important limitations of walking robots. Thus, the evaluation of state-of-the-art SLAM methods on legged platforms may be useful for all researchers working on walking robots’ autonomy and their use in various applications, such as search, security, agriculture and mining. Originality/value The main contribution lies in the integration of state-of-the-art SLAM methods on walking robots and their thorough experimental evaluation using a well-established methodology. Moreover, a SLAM system designed especially for RGB-D sensors and real-world applications is presented in detail.
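The ATE metric used in the evaluation is well defined in the SLAM literature and reduces to a translational root-mean-square error between the two trajectories. The sketch below assumes both trajectories are already expressed in a common frame; a full implementation would first align them (e.g. with Horn's method):

```python
import math

def ate_rmse(gt, est):
    """Absolute trajectory error: translational RMSE between a ground-truth
    and an estimated trajectory, each a list of (x, y, z) positions given
    in a common frame. (The trajectory-alignment step is omitted here.)"""
    assert len(gt) == len(est)
    # squared translational error at each timestamp
    sq = [sum((g - e) ** 2 for g, e in zip(p, q)) for p, q in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))
```

RPE is computed analogously, but on the error of relative motions over a fixed time interval rather than on absolute positions, which makes it insensitive to accumulated drift.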


2021, Vol. 14(5), pp. 785-798
Author(s):  
Daokun Hu ◽  
Zhiwen Chen ◽  
Jianbing Wu ◽  
Jianhua Sun ◽  
Hao Chen

Persistent memory (PM) is increasingly being leveraged to build hash-based indexing structures featuring cheap persistence, high performance, and instant recovery, especially with the recent release of Intel Optane DC Persistent Memory Modules. However, most of these structures are evaluated on DRAM-based emulators under unrealistic assumptions, or focus on specific metrics while sidestepping important properties. It is therefore essential to understand how well the proposed hash indexes perform on real PM and how they differ from each other when a wider range of performance metrics is considered. To this end, this paper provides a comprehensive evaluation of persistent hash tables. In particular, we evaluate six state-of-the-art hash tables, Level hashing, CCEH, Dash, PCLHT, Clevel, and SOFT, on real PM hardware. Our evaluation was conducted using a unified benchmarking framework and representative workloads. Besides characterizing common performance properties, we also explore how hardware configurations (such as PM bandwidth, CPU instructions, and NUMA) affect the performance of PM-based hash tables. With our in-depth analysis, we identify design trade-offs and good paradigms in prior art, and suggest desirable optimizations and directions for the future development of PM-based hash tables.


2021, Vol. ahead-of-print
Author(s):  
Minh Thanh Vo ◽  
Anh H. Vo ◽  
Tuong Le

Purpose Medical images are increasingly common; therefore, deep-learning-based analysis of these images, which helps diagnose diseases, is becoming more and more essential. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps select the correct equipment and procedures for the upcoming surgery. Design/methodology/approach This study proposes a robust model named X-Net to improve predictive performance for shoulder implant X-ray image classification on the SIXIC dataset. X-Net integrates Squeeze-and-Excitation (SE) blocks into a Residual Network (ResNet) module. The SE module weighs each feature map extracted by ResNet, which aids in improving performance. Feature extraction in X-Net is thus performed by both modules: ResNet and SE. The final feature is obtained by combining the features extracted in the above steps, which captures more of the important characteristics of the X-ray images in the input dataset. X-Net then uses this fine-grained feature to classify the input images into the four classes (Cofield, Depuy, Zimmer and Tornier) of the SIXIC dataset. Findings Experiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods on SIXIC. The experimental results indicate that the approach outperforms the various baseline methods on several performance metrics. In addition, the proposed approach sets new state-of-the-art results on all performance metrics, including accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset. Originality/value The proposed method, with its high predictive performance, can be used to assist in the treatment of injured shoulder joints.
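The squeeze-and-excitation operation itself is standard (squeeze by global average pooling, excite through two fully connected layers, then rescale the channels) and can be sketched in plain Python. The pure-Python list representation and tiny shapes are choices made for illustration; a real implementation would use a deep-learning framework:

```python
import math

def se_block(x, w1, b1, w2, b2):
    """Illustrative Squeeze-and-Excitation block. x is a list of C channels,
    each an H x W list of lists; w1 (C//r x C) and w2 (C x C//r) are the
    weights of the two fully connected layers (r is the reduction ratio)."""
    # Squeeze: global average pooling per channel -> C scalars
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid -> per-channel weights
    h = [max(0.0, sum(w * v for w, v in zip(ws, z)) + b)
         for ws, b in zip(w1, b1)]
    s = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, h)) + b)))
         for ws, b in zip(w2, b2)]
    # Scale: reweight each channel's feature map
    return [[[v * si for v in row] for row in ch] for ch, si in zip(x, s)]
```

The learned channel weights `s` let the network emphasize informative feature maps and suppress less useful ones, which is the effect the abstract attributes to the SE module.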


2021
Author(s):  
Oliver Sjögren ◽  
Carlos Xisto ◽  
Tomas Grönstedt

Abstract The aim of this study is to explore the possibility of matching a cycle performance model to public data on a state-of-the-art commercial aircraft engine (GEnx-1B). The study focuses on obtaining valuable information on figures of merit for the technology level of the low-pressure system and the associated uncertainties. It is therefore directed more specifically towards the fan and low-pressure turbine efficiencies, the Mach number at the fan face, the distribution of power between the core and the bypass stream, and the fan pressure ratio. Available cycle performance data have been extracted from the engine emission databank provided by the International Civil Aviation Organization (ICAO), type certificate data sheets from the European Union Aviation Safety Agency (EASA) and the Federal Aviation Administration (FAA), as well as publicly available data from the engine manufacturer. Uncertainties in the available source data are estimated and randomly sampled to generate inputs for a model-matching procedure. The results show that fuel performance can be estimated with some degree of confidence. However, the study also indicates that a high degree of uncertainty is expected in the prediction of key low-pressure system performance metrics when relying solely on publicly available data. This outcome highlights the importance of statistics-based methods as a support tool for inverse design procedures. It also provides a better understanding of the limitations of conventional thermodynamic matching procedures and the need to complement them with methods that take conceptual design, cost and fuel burn into account.
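The uncertainty-sampling step can be sketched as follows. The independent uniform uncertainty bands and the parameter names (`fpr` for fan pressure ratio, `opr` as a stand-in for another cycle parameter) are illustrative assumptions, not the paper's actual distributions or inputs:

```python
import random

def sample_inputs(nominal, band, n=1000, seed=0):
    """Draw n random input sets around nominal published values, assuming
    independent uniform uncertainty bands (a simplifying assumption).
    Each sample would then be fed to the cycle model-matching procedure."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [{k: v + rng.uniform(-band[k], band[k]) for k, v in nominal.items()}
            for _ in range(n)]
```

Running the matching procedure on each sampled input set yields a distribution, rather than a single value, for each inferred figure of merit, which is how the associated uncertainties can be quantified.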


2018, Vol. 4(9), pp. 107
Author(s):  
Mohib Ullah ◽  
Ahmed Mohammed ◽  
Faouzi Alaya Cheikh

Articulation modeling, feature extraction and classification are important components of pedestrian segmentation. Usually, these components are modeled independently of each other and then combined sequentially. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we propose a spatio-temporal convolutional neural network named PedNet, which exploits temporal information for spatial segmentation. The backbone of PedNet consists of an encoder–decoder network for downsampling and upsampling the feature maps, respectively. The input to the network is a set of three frames and the output is a binary mask of the segmented regions in the middle frame. Unlike classical deep models, where the convolution layers are followed by a fully connected layer for classification, PedNet is a Fully Convolutional Network (FCN). It is trained end-to-end, and segmentation is achieved without the need for any pre- or post-processing. The main characteristic of PedNet is its unique design: it performs segmentation on a frame-by-frame basis but uses temporal information from the previous and the following frame to segment the pedestrians in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we use long skip connections from the encoder to the decoder network and concatenate the output of the low-level layers with the higher-level layers. This approach helps to obtain segmentation maps with sharp boundaries. To show the potential benefits of temporal information, we also visualized different layers of the network. The visualization showed that the network learned different information from the consecutive frames and then combined it optimally to segment the middle frame. We evaluated our approach on eight challenging datasets in which humans are involved in different activities with severe articulation (football, road crossing, surveillance). On the widely used CamVid dataset, our approach is compared against seven state-of-the-art methods. Performance is reported in terms of precision/recall, F1, F2 and mIoU. The qualitative and quantitative results show that PedNet achieves promising results against state-of-the-art methods, with substantial improvement in all the performance metrics.
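The reported metrics are standard and can be computed as follows; the sketch assumes binary masks flattened to 0/1 lists, with mIoU obtained by averaging IoU over classes:

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta=1 gives F1; beta=2 gives F2, weighting recall
    more heavily than precision."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def iou(pred, target):
    """Intersection over union of two binary masks (flat 0/1 lists)."""
    inter = sum(p & t for p, t in zip(pred, target))
    union = sum(p | t for p, t in zip(pred, target))
    return inter / union if union else 1.0
```

For instance, a prediction with precision 0.5 and perfect recall scores F1 = 2/3, while F2 for the same prediction is higher because F2 rewards recall.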

