heterogeneous processing
Recently Published Documents


TOTAL DOCUMENTS: 79 (five years: 12)

H-INDEX: 13 (five years: 0)

Author(s):  
David Broneske ◽  
Anna Drewes ◽  
Bala Gurumurthy ◽  
Imad Hajjar ◽  
Thilo Pionteck ◽  
...  

Abstract Classical database systems now face the challenge of processing high-volume data feeds at unprecedented rates as efficiently as possible while also minimizing power consumption. Since CPU-only machines are hitting their limits, database system designers are investigating co-processors such as GPUs and FPGAs for their distinct capabilities. As a result, database systems over heterogeneous processing architectures are on the rise. To better understand their potential and limitations, in-depth performance analyses are vital. This paper provides performance data by benchmarking a portable operator set for column-based systems on CPU, GPU, and FPGA – all processing devices available within the same system. We consider TPC-H query Q6 and additionally a hash join to profile execution across the systems. We show that system memory access and/or buffer management remains the main bottleneck for device integration, and that architecture-specific execution engines and operators offer significantly higher performance.
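To illustrate the column-at-a-time operator style benchmarked in work like this, the following sketch composes a Q6-style selection-and-aggregation pipeline over plain Python lists acting as columns. The operator names, the row-id selection-vector design, and the predicate constants are illustrative assumptions, not the paper's actual operator set.

```python
def select_between(col, lo, hi, sel=None):
    # Selection operator: keep row ids where lo <= value <= hi.
    rows = sel if sel is not None else range(len(col))
    return [i for i in rows if lo <= col[i] <= hi]

def select_less(col, bound, sel):
    # Selection operator: keep row ids where value < bound.
    return [i for i in sel if col[i] < bound]

def dot_sum(col_a, col_b, sel):
    # Aggregation operator: sum of elementwise products over selected rows.
    return sum(col_a[i] * col_b[i] for i in sel)

def q6_style(shipdate, discount, quantity, extendedprice,
             date_lo, date_hi, disc_lo, disc_hi, qty_max):
    # Q6-style pipeline: three selections narrow a row-id vector,
    # then one aggregation runs over the survivors.
    sel = select_between(shipdate, date_lo, date_hi)
    sel = select_between(discount, disc_lo, disc_hi, sel)
    sel = select_less(quantity, qty_max, sel)
    return dot_sum(extendedprice, discount, sel)
```

Each operator consumes and produces a row-id vector, which is what makes the set portable: a GPU or FPGA backend can swap in its own implementation of each operator while keeping the same pipeline shape.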


2021 ◽  
Vol 20 (5) ◽  
pp. 1-31
Author(s):  
Sanjit Kumar Roy ◽  
Rajesh Devaraj ◽  
Arnab Sarkar ◽  
Debabrata Senapati

Continuous demands for higher performance and reliability within stringent resource budgets are driving a shift from homogeneous to heterogeneous processing platforms for the implementation of today's cyber-physical systems (CPSs). These CPSs are typically represented as directed acyclic task graphs (DTGs) due to the complex interactions between their functional components, which are often distributed in nature. In this article, we consider the problem of scheduling a real-time application modelled as a single DTG, where tasks may have multiple implementations designated as quality-levels, with higher quality-levels producing more accurate results and contributing higher rewards/Quality-of-Service to the system. First, we introduce an optimal solution using Integer Linear Programming (ILP) for a DTG with multiple quality-levels, to be executed on a heterogeneous distributed platform. However, this ILP-based optimal solution exhibits high computational complexity and does not scale to moderately large problem sizes. Hence, we propose two low-overhead heuristic algorithms, the Global Slack Aware Quality-level Allocator (G-SLAQA) and the Total Slack Aware Quality-level Allocator (T-SLAQA), which produce efficient solutions within reasonable time. G-SLAQA, the baseline heuristic, is greedier and faster than its counterpart T-SLAQA, which is at least as efficient as G-SLAQA. The efficiency of the proposed schemes has been extensively evaluated through simulation-based experiments using benchmark and randomly generated DTGs. Through a case study of a real-world automotive traction controller, we generate schedules using the proposed schemes to demonstrate their practical applicability.
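The core idea of slack-aware quality-level allocation can be sketched as a greedy loop: start every task at its lowest quality-level, then spend the remaining deadline slack on the upgrades that buy the most reward per unit of extra execution time. This is a hypothetical single-processor simplification for intuition only; it is not the G-SLAQA or T-SLAQA algorithm from the paper, which operate on DTGs over heterogeneous platforms.

```python
def greedy_quality_alloc(tasks, deadline):
    """Greedy slack-aware sketch. `tasks` is a list (executed back-to-back on
    one processor) where each entry lists quality-level options as
    (exec_time, reward) tuples in increasing quality order. Returns
    (chosen levels, total reward), or None if even the lowest levels miss
    the deadline."""
    level = [0] * len(tasks)
    used = sum(t[0][0] for t in tasks)
    if used > deadline:
        return None  # infeasible even at the lowest quality-levels
    while True:
        best, best_ratio = None, 0.0
        for i, opts in enumerate(tasks):
            if level[i] + 1 < len(opts):
                dt = opts[level[i] + 1][0] - opts[level[i]][0]
                dr = opts[level[i] + 1][1] - opts[level[i]][1]
                # Consider upgrades that fit in the remaining slack and
                # improve the reward-per-time ratio the most.
                if used + dt <= deadline and dt > 0 and dr / dt > best_ratio:
                    best, best_ratio = i, dr / dt
        if best is None:
            return level, sum(tasks[i][level[i]][1] for i in range(len(tasks)))
        used += tasks[best][level[best] + 1][0] - tasks[best][level[best]][0]
        level[best] += 1
```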


2021 ◽  
Author(s):  
Usman Ahmed

The hardware-software co-synthesis problem consists of finding an architecture, subject to certain constraints, for a given set of tasks related through data dependencies. The architecture comprises a set of heterogeneous processing elements and a communication structure connecting them. In this thesis, a new co-synthesis algorithm is presented that targets distributed memory architectures. The algorithm consists of four distinct phases: processing element selection, pipelined task allocation, scheduling, and best topology selection. The selected processing elements are finally mapped to a regular distributed memory architecture with a mesh, hypercube, or quad-tree topology. The co-synthesis method is demonstrated by applying it to an MPEG encoder application and to large random graphs of various sizes.
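The allocation-and-scheduling phases of a co-synthesis flow are commonly built on list scheduling over heterogeneous execution costs. The sketch below assigns each task, in topological order, to the processing element that finishes it earliest; it is a generic simplification (communication delays and topology mapping are omitted), not the thesis's actual four-phase algorithm.

```python
def map_tasks(order, deps, exec_time, n_pes):
    """List-scheduling sketch over heterogeneous PEs. `order` is a
    topological order of tasks, `deps[t]` lists t's predecessors, and
    exec_time[t][p] is t's cost on PE p. Returns (placement, makespan)."""
    pe_free = [0.0] * n_pes      # time each PE becomes available
    finish = {}                  # finish time per task
    placement = {}               # chosen PE per task
    for t in order:
        # A task is ready once all of its data dependencies have finished.
        ready = max((finish[d] for d in deps.get(t, [])), default=0.0)
        # Greedily pick the PE with the earliest finish time for this task.
        best_pe = min(range(n_pes),
                      key=lambda p: max(pe_free[p], ready) + exec_time[t][p])
        start = max(pe_free[best_pe], ready)
        finish[t] = start + exec_time[t][best_pe]
        pe_free[best_pe] = finish[t]
        placement[t] = best_pe
    return placement, max(finish.values())
```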


2021 ◽  
Author(s):  
Di Yang ◽  
Jinquan Ma ◽  
Chunsheng Yue ◽  
Zhichong Shen ◽  
Xiaolong Shen



2021 ◽  
Vol 18 (1) ◽  
pp. 1-27
Author(s):  
Sooraj Puthoor ◽  
Mikko H. Lipasti

Sequential consistency (SC) is the most intuitive memory consistency model and the easiest for programmers and hardware designers to reason about. However, the strict memory ordering restrictions imposed by SC make it less attractive from a performance standpoint. Additionally, prior high-performance SC implementations required complex hardware structures to support speculation and recovery. In this article, we introduce the lockstep SC consistency model (LSC), a new memory model based on SC but carefully defined to accommodate the data parallel lockstep execution paradigm of GPUs. We also describe an efficient LSC implementation for an APU system-on-chip (SoC) and show that our implementation performs close to the baseline relaxed model. Evaluation of our implementation shows that the geometric mean performance cost for lockstep SC is just 0.76% for GPU execution and 6.11% for the entire APU SoC compared to a baseline with a weaker memory consistency model. Adoption of LSC in future APU and SoC designs will reduce the burden on programmers trying to write correct parallel programs, while also simplifying the implementation and verification of systems with heterogeneous processing elements and complex memory hierarchies.
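What makes SC "easy to reason about" can be shown with the classic store-buffering litmus test: under SC, an execution is some interleaving of each thread's operations in program order, so the outcome where both threads read the old value is impossible (weaker models can permit it). The sketch below enumerates all SC interleavings of two two-operation threads; the encoding of operations is my own illustration, not from the article.

```python
from itertools import permutations

def sc_outcomes():
    """Enumerate store-buffering outcomes allowed under SC.
    T0: x = 1; r0 = y        T1: y = 1; r1 = x
    Every SC execution is an interleaving that preserves each thread's
    program order, so we permute the multiset [0, 0, 1, 1] of thread ids."""
    t0 = [('w', 'x', 1), ('r', 'y', 'r0')]
    t1 = [('w', 'y', 1), ('r', 'x', 'r1')]
    outcomes = set()
    for order in set(permutations([0, 0, 1, 1])):
        mem = {'x': 0, 'y': 0}
        regs, idx = {}, [0, 0]
        for t in order:
            kind, loc, arg = (t0, t1)[t][idx[t]]
            idx[t] += 1
            if kind == 'w':
                mem[loc] = arg          # store to shared memory
            else:
                regs[arg] = mem[loc]    # load into a register
        outcomes.add((regs['r0'], regs['r1']))
    return outcomes
```

Running this yields {(0, 1), (1, 0), (1, 1)}: the forbidden (0, 0) outcome never appears, which is exactly the guarantee SC (and LSC) gives that relaxed models do not.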


2020 ◽  
Author(s):  
Yuluan Wang ◽  
Charlotte Soneson ◽  
Anna L Malinowska ◽  
Artur Laski ◽  
Souvik Ghosh ◽  
...  

Abstract Many microRNAs regulate gene expression via atypical mechanisms, which are difficult to discern using native cross-linking methods. To ascertain the scope of non-canonical miRNA targeting, methods are needed that identify all targets of a given miRNA. We designed a new class of miR-CLIP probe, whereby psoralen is conjugated to the 3p arm of a pre-microRNA to capture targetomes of miR-124 and miR-132 in HEK293T cells. Processing of pre-miR-124 yields miR-124 and a 5′-extended isoform, iso-miR-124. Using miR-CLIP, we identified overlapping targetomes from both isoforms. From a set of 16 targets, 13 were differentially inhibited at mRNA/protein levels by the isoforms. Moreover, delivery of pre-miR-124 into cells repressed these targets more strongly than individual treatments with miR-124 and iso-miR-124, suggesting that isomirs from one pre-miRNA may function synergistically. By mining the miR-CLIP targetome, we identified nine G-bulged target-sites that are regulated at the protein level by miR-124 but not iso-miR-124. Using structural data, we propose a model involving AGO2 helix-7 that suggests why only miR-124 can engage these sites. In summary, access to the miR-124 targetome via miR-CLIP revealed for the first time how heterogeneous processing of miRNAs combined with non-canonical targeting mechanisms expand the regulatory range of a miRNA.
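For background, canonical miRNA targeting hinges on complementarity between the miRNA seed (nucleotides 2-8) and the target site, which is why a 5′-extended isomiR has a shifted seed and a different targetome. The sketch below finds canonical 7mer seed matches; the sequences in the test are made-up toys, not the real miR-124 sequence.

```python
def revcomp(rna):
    # Reverse complement of an RNA string (A-U, G-C pairing).
    comp = {'A': 'U', 'U': 'A', 'G': 'C', 'C': 'G'}
    return ''.join(comp[b] for b in reversed(rna))

def seed_sites(mirna, utr):
    """Return positions in a target sequence (5'->3') that are perfectly
    complementary to the miRNA seed, nucleotides 2-8 (a 7mer match).
    A 5'-extended isomiR shifts this window, changing which sites match."""
    seed_match = revcomp(mirna[1:8])
    return [i for i in range(len(utr) - 6) if utr[i:i + 7] == seed_match]
```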


2020 ◽  
Author(s):  
Marcelo Brandalero ◽  
Luigi Carro ◽  
Antonio Carlos Schneider Beck

With recent changes in transistor scaling trends, the design of all types of processing systems has become increasingly constrained by power consumption. At the same time, driven by the need for fast response times, many applications are migrating from the cloud to the edge, raising the challenge of increasing the performance of these already power-constrained devices. The key to addressing this problem is to design application-specific processors that match the application's requirements exactly and avoid unnecessary energy consumption. However, such dedicated platforms require significant design time and are thus unable to match the pace of the fast-evolving applications deployed in the Internet-of-Things (IoT) every day. Motivated by the need for high energy efficiency and high flexibility in hardware platforms, this thesis paves the way to a new class of low-power adaptive processors that achieve these goals by automatically modifying their structure at run time to match different applications' resource requirements. The proposed Multi-Target Adaptive Reconfigurable Architecture (MuTARe) is based upon a Coarse-Grained Reconfigurable Architecture (CGRA) that can transparently accelerate already-deployed applications, and incorporates novel compute paradigms such as Approximate Computing (AxC) and Near-Threshold Voltage Computing (NTC) to improve its efficiency. Compared to a traditional system of heterogeneous processing cores (similar to ARM's big.LITTLE), the base MuTARe architecture can (without any change to the existing software) improve execution time by up to 1.3×, adapt to the same task deadline with 1.6× lower energy consumption, or adapt to the same low energy budget with 2.3× better performance. When extended for AxC, MuTARe's power savings can be improved by up to a further 50% in error-tolerant applications, and when extended for NTC, MuTARe can save a further 30% of energy in memory-intensive workloads.
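The deadline-versus-energy adaptation described above boils down to a selection problem: among the operating modes available (for example, a nominal-voltage mode and a slower but more efficient near-threshold mode), pick the cheapest one that still meets the deadline. The mode names and numbers below are invented for illustration and do not come from the thesis.

```python
def pick_mode(modes, work, deadline):
    """Pick the operating mode that meets the deadline with the least energy.
    Each mode is (name, throughput in ops/s, power in watts); with fixed
    `work` in ops, energy = power * (work / throughput).
    Returns the mode name, or None if no mode meets the deadline."""
    feasible = [(watts * work / rate, name)
                for name, rate, watts in modes
                if work / rate <= deadline]
    return min(feasible)[1] if feasible else None
```

With slack to spare, the slow low-power mode wins on energy; as the deadline tightens, the selector falls back to the faster, hungrier mode: the same trade-off MuTARe automates at run time.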

