PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared-memory model at the programmer, implementation, and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage-collection synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing (PAEAN) runtime-system design aims to provide a portable, high-level, shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends the existing, well-developed support for shared-memory parallelism provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PAEAN abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
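PAEAN itself is implemented inside the GHC Haskell runtime system and none of its code appears in this listing. As a language-neutral sketch of the shared-nothing model the abstract describes, where work is distributed explicitly and data crosses process boundaries only by serialisation, the following Python snippet (an illustrative assumption, not PAEAN code) uses multiprocessing, which pickles arguments and results between isolated worker processes.

```python
# Hedged sketch: shared-nothing parallelism via explicit work distribution and
# serialisation between isolated processes. This illustrates the model only;
# it is NOT the PAEAN runtime, which lives inside the GHC Haskell runtime.
from multiprocessing import Pool

def expensive_task(chunk):
    # Each worker runs in its own process with its own heap; 'chunk' arrives
    # via pickling (serialisation), not via shared memory.
    return sum(x * x for x in chunk)

def main():
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]          # explicit work distribution
    with Pool(processes=8) as pool:
        partials = pool.map(expensive_task, chunks)  # results serialised back
    print(sum(partials))

if __name__ == "__main__":
    main()
```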

Acta Numerica ◽  
2012 ◽  
Vol 21 ◽  
pp. 379-474 ◽  
Author(s):  
J. J. Dongarra ◽  
A. J. van der Steen

This article describes the current state of the art of high-performance computing systems, and attempts to shed light on near-future developments that might prolong the steady growth in speed of such systems, which has been one of their most remarkable characteristics. We review the different ways devised to speed them up, both with regard to components and their architecture. In addition, we discuss the requirements for software that can take advantage of existing and future architectures.


Symmetry ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 1029
Author(s):  
Anabi Hilary Kelechi ◽  
Mohammed H. Alsharif ◽  
Okpe Jonah Bameyi ◽  
Paul Joan Ezra ◽  
Iorshase Kator Joseph ◽  
...  

Power-consuming entities such as high-performance computing (HPC) sites and large data centers are growing with advances in information technology. In business, HPC is used to shorten product delivery times, reduce production costs, and decrease the time it takes to develop a new product. Today’s high level of computing power from supercomputers comes at the expense of consuming large amounts of electric power. To minimize the energy utilized by HPC entities, it is necessary to reduce both the energy required by the computing systems themselves and the resources needed to operate them. A database can improve system energy efficiency: the power consumption of all components is sampled at regular intervals and recorded, and the stored information then serves as input data for energy-efficiency optimization. In addition, device workload information and various usage metrics are stored in the database. There has been strong momentum in artificial intelligence (AI) as a tool for optimization and process automation that leverages already existing information. This paper discusses ideas for improving energy efficiency for HPC using AI.
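The abstract proposes logging component power readings into a database as the input for AI-driven energy optimisation. A minimal sketch of such a sampling loop is given below; read_power_watts() is a hypothetical stub for a platform-specific interface (for example RAPL or NVML), which the paper does not specify.

```python
# Minimal sketch of periodic power sampling into a database, as described in
# the abstract. read_power_watts() is a hypothetical stub; a real deployment
# would query interfaces such as RAPL (CPU) or NVML (GPU).
import sqlite3
import time

def read_power_watts(component: str) -> float:
    # Stub: replace with a platform-specific power reading for 'component'.
    return 0.0

def sample_loop(db_path, components, interval_s=1.0, samples=60):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS power_samples (
                        ts REAL, component TEXT, watts REAL)""")
    for _ in range(samples):
        now = time.time()
        for comp in components:
            conn.execute("INSERT INTO power_samples VALUES (?, ?, ?)",
                         (now, comp, read_power_watts(comp)))
        conn.commit()                     # stored samples become training data
        time.sleep(interval_s)
    conn.close()

if __name__ == "__main__":
    sample_loop("power.db", ["cpu0", "gpu0", "dram"], samples=5)
```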


Author(s):  
Al Geist ◽  
Daniel A Reed

Commodity clusters revolutionized high-performance computing when they first appeared two decades ago. As scale and complexity have grown, new challenges in reliability and systemic resilience, energy efficiency and optimization, and software complexity have emerged that suggest the need for re-evaluation of current approaches. This paper reviews the state of the art and reflects on some of the challenges likely to be faced when building trans-petascale computing systems, using insights and perspectives drawn from operational experience and community debates.


Author(s):  
Yaser Jararweh ◽  
Moath Jarrah ◽  
Abdelkader Bousselham

Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution for the ever-increasing complexities in applications and the growing demands for immense computational resources. In this paper the authors investigate different platforms of GPU-based systems, ranging from Personal Supercomputing (PSC) to cloud-based GPU systems. The authors explore and evaluate these GPU-based platforms and present a comparative discussion against conventional high-performance cluster-based computing systems. Their evaluation shows potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.
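The abstract's central claim is that offloading the most compute-intensive kernels to a GPU yields large speedups. A minimal illustration, not taken from the paper, is sketched below using the CuPy library; it assumes a CUDA-capable GPU and the cupy package are available.

```python
# Hedged sketch: offloading a compute-intensive kernel (dense matrix multiply)
# to the GPU with CuPy. Not from the paper; assumes cupy and a CUDA GPU.
import time
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.random((n, n)).astype(np.float32)
b_cpu = np.random.random((n, n)).astype(np.float32)

t0 = time.perf_counter()
c_cpu = a_cpu @ b_cpu                                 # CPU baseline
t_cpu = time.perf_counter() - t0

a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)   # host -> device copies
t0 = time.perf_counter()
c_gpu = a_gpu @ b_gpu                                 # GPU kernel
cp.cuda.Device().synchronize()                        # wait for completion
t_gpu = time.perf_counter() - t0

print(f"CPU: {t_cpu:.3f}s  GPU: {t_gpu:.3f}s")
assert np.allclose(c_cpu, cp.asnumpy(c_gpu), rtol=1e-2, atol=1e-2)
```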



Author(s):  
Е.Н. Головченко ◽  
М.В. Якобовский

The problem of rational mesh decomposition (load balancing) arises in the parallel mesh-based numerical solution of problems in continuum mechanics, pulsed power, electrodynamics, etc. on high-performance computing systems. The number of processors on which a computational problem will run is usually not known in advance. It therefore makes sense to partition the mesh once into a large number of microdomains and then form subdomains from them. The graph-partitioning methods implemented in the state-of-the-art parallel partitioning tools ParMETIS, Jostle, PT-Scotch, and Zoltan are based on multilevel (hierarchical) algorithms, which have the shortcoming of forming unconnected subdomains; another shortcoming of these tools is the generation of strongly imbalanced partitions. The GridSpiderPar program package for parallel decomposition of large meshes was developed. Computational experiments compared different partitions into microdomains, partitions of microdomain graphs into subdomains, and direct partitions into subdomains of several meshes (10^8 vertices, 10^9 elements) obtained with the GridSpiderPar tool and with the ParMETIS, Zoltan, and PT-Scotch packages. Partition quality was assessed by the imbalance of the number of vertices per subdomain, the number of unconnected subdomains, and the edge cut, as well as by the parallel efficiency of gas-dynamics simulations with meshes distributed over cores according to the different partitions. The results demonstrate the advantages of the developed algorithms.
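GridSpiderPar's algorithms are not reproduced here; the sketch below is only a toy, self-contained illustration of the two-stage decomposition idea from the abstract, using a structured grid and regular blocks so that the imbalance and edge-cut metrics it mentions can be computed.

```python
# Toy illustration (own construction, not GridSpiderPar's algorithms) of the
# two-stage idea: cut a mesh into many microdomains once, then aggregate the
# microdomains into the final subdomains, and check imbalance and edge cut.
from collections import Counter

NX, NY = 64, 64                  # structured 2D "mesh": NX*NY cells
MB = 16                          # microdomain blocks per dimension
N_MICRO, N_DOM = MB * MB, 8      # 256 microdomains aggregated into 8 subdomains

def cell(i, j):
    return i * NY + j

# Stage 1: assign each cell to a microdomain (regular blocks for simplicity).
micro = {cell(i, j): (i // (NX // MB)) * MB + (j // (NY // MB))
         for i in range(NX) for j in range(NY)}

# Stage 2: aggregate microdomains into subdomains (contiguous groups).
micro_to_dom = {m: (m * N_DOM) // N_MICRO for m in range(N_MICRO)}
dom = {c: micro_to_dom[m] for c, m in micro.items()}

# Quality metrics from the abstract: edge cut and vertex-count imbalance.
edges = [(cell(i, j), cell(i + 1, j)) for i in range(NX - 1) for j in range(NY)]
edges += [(cell(i, j), cell(i, j + 1)) for i in range(NX) for j in range(NY - 1)]
edge_cut = sum(dom[u] != dom[v] for u, v in edges)
sizes = Counter(dom.values())
imbalance = max(sizes.values()) / ((NX * NY) / N_DOM)
print(f"edge cut = {edge_cut}, imbalance = {imbalance:.3f}")
```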


2020 ◽  
Vol 2020 ◽  
pp. 1-19 ◽  
Author(s):  
Paweł Czarnul ◽  
Jerzy Proficz ◽  
Krzysztof Drypczewski

This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals, ease of programming, debugging, deployment, portability, level of parallelism, constructs enabling parallelism and synchronization, features introduced in recent versions indicating trends, support for hybridity in parallel execution, and disadvantages. Such detailed analysis has led us to the identification of trends in high-performance computing and of the challenges to be addressed in the near future. It can help to shape future versions of programming standards, select technologies best matching programmers’ needs, and avoid potential difficulties while using high-performance computing systems.
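As one concrete representative of the two-sided, distributed-memory message-passing model covered by the review, the sketch below exchanges data between two MPI ranks using mpi4py; it is an illustrative example rather than code from the paper and assumes an MPI installation plus the mpi4py package.

```python
# Minimal two-sided message-passing example with mpi4py. Run with:
#   mpiexec -n 2 python this_script.py
# Illustrates the distributed-memory, two-sided model discussed in the review.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"step": 1, "values": [0.5, 1.5, 2.5]}
    comm.send(payload, dest=1, tag=11)      # blocking two-sided send
    result = comm.recv(source=1, tag=22)    # matching receive
    print("rank 0 got:", result)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    comm.send(sum(data["values"]), dest=0, tag=22)
```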


Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses the challenge by treating the FPGA resource as a service and employing high-level multi-task processing, design-space exploration, and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute-intensive and mixed-intensity tasks, respectively, but 0.2× lower throughput for memory-intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed-intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
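The paper's runtime simulator and scheduling scheme are not reproduced here; as a loose illustration of scheduling a queue of heterogeneous tasks onto statically partitioned regions, the toy sketch below greedily assigns each task to the region that becomes free earliest. All task and partition parameters are invented for the example.

```python
# Toy greedy scheduler: assign queued tasks to statically partitioned FPGA
# regions, always picking the region that becomes free earliest. Parameters
# are invented; the paper's simulator and scheduling scheme are more elaborate.
import heapq

regions = [(0.0, r) for r in range(3)]   # (time region becomes free, region id)
heapq.heapify(regions)

# (task name, estimated runtime in ms) -- mixed compute/memory intensities
tasks = [("fft", 4.0), ("matmul", 9.0), ("stencil", 2.5),
         ("sort", 3.0), ("conv", 7.5), ("reduce", 1.0)]

finish_times = []
for name, runtime in tasks:
    free_at, region = heapq.heappop(regions)   # earliest-available region
    start, end = free_at, free_at + runtime
    print(f"{name:8s} -> region {region}: {start:5.1f} .. {end:5.1f} ms")
    heapq.heappush(regions, (end, region))
    finish_times.append(end)

print(f"makespan = {max(finish_times):.1f} ms")
```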

