Increasing the Efficiency of the DaCS Programming Model for Heterogeneous Systems

Author(s):  
Maciej Cytowski ◽  
Marek Niezgódka
2009 ◽  
Vol 17 (1-2) ◽  
pp. 59-76 ◽  
Author(s):  
Alejandro Rico ◽  
Alex Ramirez ◽  
Mateo Valero

There is a clear industrial trend towards chip multiprocessors (CMP) as the most power-efficient way of further increasing performance. Heterogeneous CMP architectures take one more step along this power-efficiency trend by using multiple types of processors, tailored to the workloads they execute. Programming these CMP architectures has been identified as one of the main challenges in the near future, and programming heterogeneous systems is even more challenging. High-level programming models that allow the programmer to identify parallel tasks, and whose runtimes manage the inter-task dependencies, have been identified as a suitable model for programming such heterogeneous CMP architectures. In this paper we analyze the performance of Cell Superscalar, a task-based programming model for the Cell Broadband Engine Architecture, in terms of its scalability to a higher number of on-chip processors. Our results show that the low performance of the PPE component limits the scalability of some applications to fewer than 16 processors. Since the PPE has been identified as the limiting element, we perform a set of simulation studies evaluating the impact of out-of-order execution, branch prediction and larger caches on the task management overhead. We conclude that out-of-order execution is a very desirable feature, since it increases task management performance by 50%. We also identify memory latency as a fundamental performance factor, even though the working set is not that large. We expect a significant performance improvement if task management were to run using a fast private memory to store the task dependency graph instead of relying on the cache hierarchy.
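Cell Superscalar's own annotations are not reproduced here; purely as a hedged illustration of the style the abstract describes (programmer-identified tasks with runtime-managed inter-task dependencies), the sketch below expresses the same idea using standard OpenMP 4.0 task depend clauses. The functions produce, scale and reduce are invented placeholders.

    #include <cstdio>

    static void produce(float *x, int n) { for (int i = 0; i < n; ++i) x[i] = static_cast<float>(i); }
    static void scale(const float *x, float *y, int n) { for (int i = 0; i < n; ++i) y[i] = 2.0f * x[i]; }
    static void reduce(const float *y, float *sum, int n) { *sum = 0.0f; for (int i = 0; i < n; ++i) *sum += y[i]; }

    int main() {
        const int N = 1024;
        float x[N], y[N];
        float sum = 0.0f;

        #pragma omp parallel
        #pragma omp single
        {
            // The runtime derives a dependency graph from the in/out annotations,
            // analogous to the task graph maintained by the Cell Superscalar runtime.
            #pragma omp task depend(out: x) shared(x)
            produce(x, N);

            #pragma omp task depend(in: x) depend(out: y) shared(x, y)
            scale(x, y, N);

            #pragma omp task depend(in: y) shared(y, sum)
            reduce(y, &sum, N);
        }
        std::printf("sum = %f\n", sum);  // all tasks have completed at the end of the parallel region
        return 0;
    }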


2015 ◽  
Vol 2015 ◽  
pp. 1-15 ◽  
Author(s):  
Rengan Xu ◽  
Xiaonan Tian ◽  
Sunita Chandrasekaran ◽  
Barbara Chapman

Existing studies show that using a single GPU can lead to significant performance gains. Further speedup should be achievable by using more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered a leading candidate for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires more effort than programming traditional multicore systems. Directive-based programming approaches are being widely adopted since they make it easy to use, port, and maintain application code. OpenMP and OpenACC are two popular models used to port applications to accelerators. However, neither of the models provides support for multiple GPUs. A plausible solution is to combine OpenMP and OpenACC into a hybrid model; however, building this model has its own limitations due to the lack of necessary compiler support. Moreover, the model also lacks support for direct device-to-device communication. To overcome these limitations, an alternate strategy is to extend OpenACC by proposing and developing extensions that follow a task-based implementation for supporting multiple GPUs. We critically analyze the applicability of the hybrid-model approach, evaluate the proposed strategy using several case studies, and demonstrate its effectiveness.
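As a hedged sketch of the hybrid OpenMP + OpenACC model the abstract critiques (one host thread per GPU, each driving an OpenACC region on its own device), the example below assumes an OpenACC-capable compiler such as NVHPC and uses the vendor-specific device type acc_device_nvidia; the block-wise work partitioning is a simple illustrative choice, not the authors' scheme.

    #include <openacc.h>
    #include <omp.h>
    #include <cstdio>

    int main() {
        const int N = 1 << 20;
        static float a[N], b[N];
        for (int i = 0; i < N; ++i) a[i] = static_cast<float>(i);

        int ngpus = acc_get_num_devices(acc_device_nvidia);   // vendor-specific device type
        if (ngpus < 1) ngpus = 1;                              // fall back gracefully if no GPU is found
        const int chunk = (N + ngpus - 1) / ngpus;

        #pragma omp parallel num_threads(ngpus)
        {
            const int tid   = omp_get_thread_num();
            const int begin = tid * chunk;
            const int end   = (begin + chunk < N) ? begin + chunk : N;

            if (begin < end) {
                acc_set_device_num(tid, acc_device_nvidia);    // bind this host thread to GPU number tid

                #pragma acc parallel loop copyin(a[begin:end-begin]) copyout(b[begin:end-begin])
                for (int i = begin; i < end; ++i)
                    b[i] = 2.0f * a[i];
            }
        }

        std::printf("b[N-1] = %f\n", b[N - 1]);
        return 0;
    }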


2012 ◽  
Vol 2012 ◽  
pp. 1-11 ◽  
Author(s):  
Ghislain Roquier ◽  
Endri Bezati ◽  
Marco Mattavelli

The new generation of multicore processors and reconfigurable hardware platforms provides a dramatic increase in the available parallelism and processing capabilities. However, one obstacle to exploiting the full promise of such platforms is deeply rooted in sequential thinking. The sequential programming model does not naturally expose the potential parallelism needed to build parallel applications that can be efficiently mapped onto different kinds of platforms. A paradigm shift is necessary at all levels of application development to yield portable and scalable implementations on the widest range of heterogeneous platforms. This paper presents a design flow for the hardware and software synthesis of heterogeneous systems: from a single high-level, dataflow-based description of the application, it automatically generates hardware and software components, as well as the appropriate interfaces, for heterogeneous architectures composed of reconfigurable hardware units and multicore processors. Experimental results based on the implementation of several video coding algorithms on heterogeneous platforms are also provided to show the effectiveness of the approach, both in terms of portability and scalability.
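The paper's own flow starts from a dataflow-language description of the application; the C++ sketch below is only an invented, minimal illustration of that programming style, in which independent actors communicate solely through FIFO channels, so the same network could in principle be mapped to software threads or synthesized to hardware. All class names are hypothetical.

    #include <cstdio>
    #include <queue>

    using Fifo = std::queue<int>;

    struct Actor {
        virtual bool fire() = 0;   // consume/produce tokens if enough are available
        virtual ~Actor() = default;
    };

    struct Source : Actor {
        Fifo &out; int next = 0, limit;
        Source(Fifo &o, int n) : out(o), limit(n) {}
        bool fire() override {
            if (next >= limit) return false;
            out.push(next++);                     // produce one token
            return true;
        }
    };

    struct Scale : Actor {
        Fifo &in, &out;
        Scale(Fifo &i, Fifo &o) : in(i), out(o) {}
        bool fire() override {
            if (in.empty()) return false;
            out.push(2 * in.front()); in.pop();   // one token in, one token out
            return true;
        }
    };

    struct Sink : Actor {
        Fifo &in;
        explicit Sink(Fifo &i) : in(i) {}
        bool fire() override {
            if (in.empty()) return false;
            std::printf("%d\n", in.front()); in.pop();
            return true;
        }
    };

    int main() {
        Fifo q1, q2;
        Source src(q1, 5); Scale sc(q1, q2); Sink snk(q2);
        Actor *network[] = { &src, &sc, &snk };
        bool progress = true;
        while (progress) {                        // naive round-robin scheduler
            progress = false;
            for (Actor *a : network) progress |= a->fire();
        }
        return 0;
    }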


Author(s):  
Vinoth Krishnan Elangovan ◽  
Rosa M. Badia ◽  
Eduard Ayguade Parra

2021 ◽  
Vol 18 (4) ◽  
pp. 1-25
Author(s):  
Paul Metzger ◽  
Volker Seeker ◽  
Christian Fensch ◽  
Murray Cole

Existing OS techniques for homogeneous many-core systems make it simple for single- and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-execution may lose potential performance. The situation is particularly challenging when a switch of language runtime would be desirable in conjunction with a migration. We present a case study in making heterogeneous CPU + GPU systems more flexible in this respect. Our technique for fine-grained application migration allows switches between OpenMP, OpenCL, and CUDA execution, in conjunction with migrations from GPU to CPU and from CPU to GPU. To achieve this, we subdivide iteration spaces into slices and consider migration on a slice-by-slice basis. We show that slice sizes can be learned offline by machine learning models. To further improve performance, memory transfers are made migration-aware. The complexity of the migration capability is hidden from programmers behind a high-level programming model. We present a detailed evaluation of our mid-kernel migration mechanism with the First Come, First Served scheduling policy. We compare our technique in a focused evaluation scenario against idealized kernel-by-kernel scheduling, which is typical for current systems and makes perfect kernel-to-device scheduling decisions, but cannot migrate kernels mid-execution. Models show that up to a 1.33× speedup can be achieved over these systems by adding fine-grained migration. Our experimental results with all nine applicable SHOC and Rodinia benchmarks achieve speedups of up to 1.30× (1.08× on average) over an implementation of a perfect but migration-incapable scheduler when kernels are migrated to a faster device. Our mechanism and slice size choices introduce an average slowdown of only 2.44% if kernels never migrate. Lastly, our programming model reduces the code size by at least 88% compared to manual implementations of migratable kernels.
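A minimal sketch of the slice-by-slice idea, not the authors' implementation: the iteration space is cut into slices, and between slices a scheduler may move the remaining work to another device. The functions run_on_cpu, run_on_gpu and device_hint are invented stand-ins for the real OpenMP/OpenCL/CUDA kernels and for the learned scheduling policy.

    #include <cstdio>
    #include <vector>

    enum class Device { CPU, GPU };

    // Stand-ins for the real kernels, each operating on the half-open slice [begin, end).
    static void run_on_cpu(std::vector<float> &v, size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i) v[i] = 2.0f * static_cast<float>(i);
    }
    static void run_on_gpu(std::vector<float> &v, size_t begin, size_t end) {
        // In the real system this would launch a CUDA/OpenCL kernel on the slice.
        for (size_t i = begin; i < end; ++i) v[i] = 2.0f * static_cast<float>(i);
    }

    // Hypothetical scheduler decision; here we simply pretend a migration to the
    // CPU happens after the fourth slice.
    static Device device_hint(size_t slice_index) {
        return slice_index < 4 ? Device::GPU : Device::CPU;
    }

    int main() {
        const size_t n = 1 << 20, slice = 1 << 17;   // slice size would be chosen by the offline model
        std::vector<float> data(n);

        for (size_t begin = 0, s = 0; begin < n; begin += slice, ++s) {
            const size_t end = (begin + slice < n) ? begin + slice : n;
            if (device_hint(s) == Device::GPU)
                run_on_gpu(data, begin, end);        // before the migration point
            else
                run_on_cpu(data, begin, end);        // remaining slices after mid-kernel migration
        }
        std::printf("data[n-1] = %f\n", data[n - 1]);
        return 0;
    }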


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2386
Author(s):  
Raúl Nozal ◽  
Jose Luis Bosque

Heterogeneous systems are the core architecture of most computing systems, from high-performance computing nodes to embedded devices, due to their excellent performance and energy efficiency. Efficiently programming these systems has become a major challenge due to the complexity of their architectures and the effort required to provide them with co-execution capabilities that applications can fully exploit. There are many proposals to simplify the programming and management of acceleration devices and multi-core CPUs. However, in many cases, portability and ease of use compromise the efficiency of the different devices, even more so when co-executing. Intel oneAPI, a new and powerful standards-based unified programming model built on top of SYCL, addresses these issues. In this paper, oneAPI is provided with co-execution strategies to run the same kernel across different devices, enabling the exploitation of static and dynamic policies. This work evaluates the performance and energy efficiency of a well-known set of regular and irregular HPC benchmarks, using two heterogeneous systems composed of an integrated GPU and a CPU. Static and dynamic load balancers are integrated and evaluated, highlighting single and co-execution strategies and the most significant key points of this promising technology. Experimental results show that co-execution is worthwhile when using dynamic algorithms, and that efficiency improves even further when using unified shared memory.
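As a hedged sketch of what a static co-execution split might look like in oneAPI (not the paper's load balancers), the SYCL code below divides one kernel's range between a CPU queue and a GPU queue. It assumes the SYCL runtime exposes both devices, and the 30/70 split ratio is an arbitrary placeholder for what a static policy might choose.

    #include <sycl/sycl.hpp>
    #include <cstdio>
    #include <vector>

    int main() {
        const size_t n = 1 << 20;
        std::vector<float> out(n, 0.0f);
        const size_t cpu_share = n * 3 / 10;      // placeholder static split: 30% CPU, 70% GPU

        sycl::queue cpu_q{sycl::cpu_selector_v};
        sycl::queue gpu_q{sycl::gpu_selector_v};

        {   // each buffer wraps its part of `out` and writes back on destruction
            sycl::buffer<float> cpu_buf(out.data(), sycl::range<1>(cpu_share));
            sycl::buffer<float> gpu_buf(out.data() + cpu_share, sycl::range<1>(n - cpu_share));

            cpu_q.submit([&](sycl::handler &cgh) {
                sycl::accessor acc{cpu_buf, cgh, sycl::write_only};
                cgh.parallel_for(sycl::range<1>(cpu_share), [=](sycl::id<1> i) {
                    acc[i] = 2.0f * static_cast<float>(i[0]);
                });
            });
            gpu_q.submit([&](sycl::handler &cgh) {
                sycl::accessor acc{gpu_buf, cgh, sycl::write_only};
                const size_t offset = cpu_share;
                cgh.parallel_for(sycl::range<1>(n - cpu_share), [=](sycl::id<1> i) {
                    acc[i] = 2.0f * static_cast<float>(i[0] + offset);
                });
            });
        }   // both queues are synchronized here through buffer destruction

        std::printf("out[0]=%f out[n-1]=%f\n", out[0], out[n - 1]);
        return 0;
    }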


1998 ◽  
Vol 37 (04/05) ◽  
pp. 518-526 ◽  
Author(s):  
D. Sauquet ◽  
M.-C. Jaulent ◽  
E. Zapletal ◽  
M. Lavril ◽  
P. Degoulet

Rapid development of community health information networks raises the issue of semantic interoperability between distributed and heterogeneous systems. Indeed, operational health information systems originate from heterogeneous teams of independent developers and have to cooperate in order to exchange data and services. Good cooperation is based on a good understanding of the messages exchanged between the systems. The main issue of semantic interoperability is to ensure that the exchange is not only possible but also meaningful. The main objective of this paper is to analyze semantic interoperability from a software engineering point of view. It describes the principles for the design of a semantic mediator (SM) in the framework of a distributed object manager (DOM). The mediator is itself a component that should allow the exchange of messages independently of languages and platforms. The functional architecture of such an SM is detailed. These principles have been partly applied in the context of the HELIOS object-oriented software engineering environment. The resulting service components are presented with their current state of achievement.
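The paper gives no code; purely as an invented illustration of the mediator idea, the sketch below models a semantic mediator as a component that maps message codes from a sender's coding scheme to a receiver's scheme, so that an exchange is rejected when no meaningful translation exists. All type, method and code names are hypothetical.

    #include <iostream>
    #include <map>
    #include <optional>
    #include <string>
    #include <tuple>

    // A message as exchanged between systems: a code, its coding scheme, and a payload.
    struct Message {
        std::string scheme;
        std::string code;
        std::string payload;
    };

    // The mediator translates codes between schemes so the exchange stays meaningful.
    class SemanticMediator {
      public:
        void addMapping(const std::string &fromScheme, const std::string &fromCode,
                        const std::string &toScheme, const std::string &toCode) {
            table_[{fromScheme, fromCode, toScheme}] = toCode;
        }

        std::optional<Message> translate(const Message &m, const std::string &targetScheme) const {
            auto it = table_.find({m.scheme, m.code, targetScheme});
            if (it == table_.end()) return std::nullopt;   // no meaningful translation known
            return Message{targetScheme, it->second, m.payload};
        }

      private:
        std::map<std::tuple<std::string, std::string, std::string>, std::string> table_;
    };

    int main() {
        SemanticMediator sm;
        sm.addMapping("SYSTEM-A", "GLUCOSE", "SYSTEM-B", "GLU-SER");   // invented codes
        Message in{"SYSTEM-A", "GLUCOSE", "5.4 mmol/L"};
        if (auto out = sm.translate(in, "SYSTEM-B"))
            std::cout << out->scheme << ":" << out->code << " " << out->payload << "\n";
        return 0;
    }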


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
S Mohd Baki ◽  
Jack Kie Cheng

Production planning is often challenging for small and medium enterprises (SMEs). Most SMEs have difficulty determining the optimal level of production output, which can affect their business performance. Product mix optimization is one of the main keys to production planning. Many companies have used linear programming models to determine the optimal combination of products to produce in order to maximize profit. Thus, this study aims at profit maximization for an SME company in Malaysia using a linear programming model. The purposes of this study are to identify the current process in the production line and to formulate a linear programming model that suggests a viable product mix to ensure optimum profitability for the company. ABC Sdn Bhd is selected as the case study company for the product mix profit maximization study. Some conclusive observations have been drawn and recommendations have been suggested. This study will provide the company and other companies, particularly in Malaysia, with exposure to the linear programming method for making decisions that determine the maximum profit for different product mixes.
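For readers unfamiliar with the method, a generic product-mix linear program can be written as follows (the symbols are generic placeholders, not the case study's data):

\[
\begin{aligned}
\max \quad & Z = \sum_{j=1}^{n} p_j x_j \\
\text{s.t.} \quad & \sum_{j=1}^{n} a_{ij} x_j \le b_i, \qquad i = 1,\dots,m, \\
& x_j \ge 0, \qquad j = 1,\dots,n,
\end{aligned}
\]

where x_j is the quantity of product j to produce, p_j its unit profit, a_{ij} the amount of resource i consumed per unit of product j, and b_i the available capacity of resource i.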

