heterogeneous architectures
Recently Published Documents


TOTAL DOCUMENTS: 281 (FIVE YEARS 27)

H-INDEX: 20 (FIVE YEARS 0)

2021, Vol 20 (6), pp. 1-35
Author(s): Junio Cezar Ribeiro Da Silva, Lorena Leão, Vinicius Petrucci, Abdoulaye Gamatié, Fernando Magno Quintão Pereira

A hardware configuration is a set of processors and their frequency levels in a multicore heterogeneous system. This article presents a compiler-based technique to match functions with hardware configurations. The technique uses multivariate linear regression to associate function arguments with particular hardware configurations. By showing that this classification space tends to be convex in practice, this article demonstrates that linear regression is not only an efficient tool to map computations to heterogeneous hardware, but also an effective one. To demonstrate the viability of multivariate linear regression as a way to perform adaptive compilation for heterogeneous architectures, we have implemented our ideas in the Soot Java bytecode analyzer. The code that we produce can predict the best configuration for a large class of Java and Scala benchmarks running on an Odroid XU4 big.LITTLE board, thereby outperforming prior techniques such as ARM’s GTS and CHOAMP, a recently released static program scheduler.
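As a rough illustration of the idea in this abstract, the sketch below fits one linear cost model per hardware configuration and then picks the configuration with the lowest predicted cost for a given argument vector. This is only one possible reading of "associating function arguments with configurations", not the authors' implementation. The configuration names, the two numeric arguments, and the training costs are all hypothetical, synthetic values chosen for the example.

```python
from typing import Dict, List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares fit of y ~ w0 + w1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept term
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)

def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one argument vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def best_config(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the configuration whose model predicts the lowest cost."""
    return min(models, key=lambda name: predict(models[name], x))

# Synthetic training set: each sample is a pair of numeric function arguments.
samples = [(1, 1), (2, 5), (10, 3), (20, 8), (50, 40)]

# Hypothetical measured costs per configuration (made-up numbers):
# the "little" setting has low fixed overhead but a high per-element cost,
# the "big" setting has the opposite profile.
costs = {
    "little_low_freq": [1 + 2 * n + 1 * k for n, k in samples],
    "big_high_freq": [20 + 0.5 * n + 0.25 * k for n, k in samples],
}

models = {name: fit_linear(samples, ys) for name, ys in costs.items()}

print(best_config(models, (2, 2)))      # small input
print(best_config(models, (100, 100)))  # large input
```

With these synthetic costs, small inputs are routed to the low-overhead configuration and large inputs to the faster one. In a compiler setting, the fitted weights would presumably be embedded in the generated code so that the choice is made at call time from the actual argument values.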



Nano Research, 2021
Author(s): Chonghai Deng, Fan Ye, Tao Wang, Xiaohui Ling, Lulu Peng, ...


Author(s): Eleonora D'Arnese, Emanuele Del Sozzo, Davide Conficconi, Marco D. Santambrogio


2021
Author(s): Najmeh Nazari Bavarsad, Hosein Mohammadi Makrani, Hossein Sayadi, Lawrence Landis, Setareh Rafatirad, ...


2021
Author(s): Amanda Bienz, Luke N. Olson, William D. Gropp, Shelby Lockhart


Author(s): Tongsheng Geng, Marcos Amaris, Stéphane Zuckerman, Alfredo Goldman, Guang R. Gao, ...


2021, pp. 131716
Author(s): Huijun Li, Linhan Jian, Yan Chen, Guowen Wang, Jiahui Lyu, ...


Author(s): T. Nouioua, A. H. Belbachir

Cone-beam computed tomography has become part of routine medical imaging practice, where the Feldkamp-Davis-Kress (FDK) algorithm is used to reconstruct a 3D volume image. This approach can minimize the patient’s exposure time to X-rays. However, its implementation is very costly in computation time, which is a handicap in practice. For this reason, GPU acceleration becomes a real solution. To accelerate the FDK algorithm, we have used the GPU on heterogeneous platforms. To take full advantage of the GPU, we have selected useful GPU features and launched the accelerated reconstruction according to technical criteria, namely the work-groups and the work-items. We have found that the number of parallel cores and the memory bandwidth have no effect on runtime speedup unless the number of work-items is chosen carefully. Mastering this choice is a real challenge: the work-items must be divided efficiently into work-groups according to the device specifications, which is a principal difficulty if the GPU is not studied technically as a hardware device. After an optimized implementation using kernels launched optimally on the GPU, we have concluded that high-capacity devices must be paired with a careful optimization of the work-items, which are divided into several work-groups according to the hardware limitations.
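To make the work-item and work-group trade-off concrete, here is a minimal sketch that assumes an OpenCL-style launch, where the total number of work-items is rounded up to a multiple of the work-group size (a requirement in OpenCL 1.x). The function names, the device limit of 256, and the problem size of 1000 are hypothetical values for illustration and are not taken from the paper.

```python
from typing import List, Tuple

def candidate_local_sizes(max_work_group_size: int) -> List[int]:
    """List power-of-two work-group sizes up to the device limit."""
    sizes = []
    size = 1
    while size <= max_work_group_size:
        sizes.append(size)
        size *= 2
    return sizes

def launch_geometry(n_items: int, local_size: int,
                    max_work_group_size: int) -> Tuple[int, int, int]:
    """Return (global_size, n_groups, padding) for a 1D launch.

    global_size is n_items rounded up to a multiple of local_size;
    padding counts the extra work-items that must be masked out in the kernel.
    """
    if not 0 < local_size <= max_work_group_size:
        raise ValueError("local_size exceeds the device work-group limit")
    n_groups = (n_items + local_size - 1) // local_size  # ceiling division
    global_size = n_groups * local_size
    return global_size, n_groups, global_size - n_items

# Hypothetical problem: 1000 elements on a device that allows 256 work-items per group.
n_items = 1000
device_limit = 256
for local_size in candidate_local_sizes(device_limit):
    global_size, n_groups, padding = launch_geometry(n_items, local_size, device_limit)
    print(local_size, global_size, n_groups, padding)
```

The table this prints only shows the launch geometry for each candidate size. Which size actually runs fastest depends on the device and the kernel, so in practice each candidate would be timed on the target hardware.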



Mathematics, 2021, Vol 9 (14), pp. 1685
Author(s): Julian Miller, Lukas Trümper, Christian Terboven, Matthias S. Müller

With the quickly evolving hardware landscape of high-performance computing (HPC) and its increasing specialization, the implementation of efficient software applications becomes more challenging. This is especially true for domain scientists and may hinder advances in large-scale simulation software. One idea to overcome these challenges is software abstraction. We present a model of parallel algorithms that allows for global optimization of their synchronization and dataflow and for optimal mapping to complex and heterogeneous architectures. The presented model strictly separates the structure of an algorithm from its executed functions. It utilizes a hierarchical decomposition of parallel design patterns as well-established building blocks for algorithmic structures and captures them in an abstract pattern tree (APT). A data-centric flow graph is constructed based on the APT, which acts as an intermediate representation for rich and automated structural transformations. We demonstrate the applicability of this model to three representative algorithms and show runtime speedups between 1.83 and 2.45 on a typical heterogeneous CPU/GPU architecture.
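The separation of algorithmic structure from executed functions can be sketched with a small tree of pattern nodes plus a separate function table. This is an illustrative toy, not the APT defined in the paper: the node type `PatternNode`, the pattern names "serial", "map", and "reduction", and the sequential interpreter `run` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List

@dataclass
class PatternNode:
    kind: str                      # "serial", "map", or "reduction" (illustrative names)
    func: str = ""                 # key into a separate function table; unused for "serial"
    children: List["PatternNode"] = field(default_factory=list)

def describe(node: PatternNode, depth: int = 0) -> List[str]:
    """Render the structure of the tree without touching any executed function."""
    label = node.kind + (f"({node.func})" if node.func else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(describe(child, depth + 1))
    return lines

def run(node: PatternNode, data, funcs: Dict[str, Callable]):
    """Reference interpreter that executes the tree sequentially."""
    if node.kind == "serial":
        for child in node.children:   # children run one after another
            data = run(child, data, funcs)
        return data
    if node.kind == "map":
        return [funcs[node.func](x) for x in data]
    if node.kind == "reduction":
        return reduce(funcs[node.func], data)
    raise ValueError(f"unknown pattern kind: {node.kind}")

# Structure: a map followed by a reduction, with functions referenced only by name.
tree = PatternNode("serial", children=[
    PatternNode("map", func="square"),
    PatternNode("reduction", func="add"),
])

# Executed functions live in a separate table and can be swapped independently.
funcs = {"square": lambda x: x * x, "add": lambda a, b: a + b}

print("\n".join(describe(tree)))
print(run(tree, [1, 2, 3], funcs))  # 1 + 4 + 9 = 14
```

Because the tree only names the functions, a transformation pass could in principle rewrite or re-map the structure (for example, assigning the map node to a GPU) without inspecting the function bodies, which is the kind of separation the abstract describes.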


