PAEAN: Portable and scalable runtime support for parallel Haskell dialects

Author(s):  
JOST BERTHOLD ◽  
HANS-WOLFGANG LOIDL ◽  
KEVIN HAMMOND

Over time, several competing approaches to parallel Haskell programming have emerged. Different approaches support parallelism at various scales, ranging from small multicores to massively parallel high-performance computing systems. They also provide varying degrees of control, ranging from completely implicit approaches to ones providing full programmer control. Most current designs assume a shared-memory model at the programmer, implementation, and hardware levels. This is, however, becoming increasingly divorced from the reality at the hardware level. It also imposes significant unwanted runtime overheads in the form of garbage-collection synchronisation, etc. What is needed is an easy way to abstract over the implementation and hardware levels, while presenting a simple parallelism model to the programmer. The PArallEl shAred Nothing (PAEAN) runtime-system design aims to provide a portable, high-level, shared-nothing implementation platform for parallel Haskell dialects. It abstracts over major issues such as work distribution and data serialisation, consolidating existing, successful designs into a single framework. It also provides an optional virtual shared-memory programming abstraction for (possibly) shared-nothing parallel machines, such as modern multicore/manycore architectures or cluster/cloud computing systems. It builds on, unifies, and extends the existing, well-developed support for shared-memory parallelism provided by the widely used GHC Haskell compiler. This paper summarises the state of the art in shared-nothing parallel Haskell implementations, introduces the PAEAN abstractions, shows how they can be used to implement three distinct parallel Haskell dialects, and demonstrates that good scalability can be obtained on recent parallel machines.
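PAEAN itself is implemented inside the GHC Haskell runtime system and none of its code appears in this listing. As a language-neutral sketch of the shared-nothing model the abstract describes, where work is distributed explicitly and data crosses process boundaries only by serialisation, the following Python snippet (an illustrative assumption, not PAEAN code) uses multiprocessing, which pickles arguments and results between isolated worker processes.

```python
# Hedged sketch: shared-nothing parallelism via explicit work distribution and
# serialisation between isolated processes. This illustrates the model only;
# it is NOT the PAEAN runtime, which lives inside the GHC Haskell runtime.
from multiprocessing import Pool

def expensive_task(chunk):
    # Each worker runs in its own process with its own heap; 'chunk' arrives
    # via pickling (serialisation), not via shared memory.
    return sum(x * x for x in chunk)

def main():
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]          # explicit work distribution
    with Pool(processes=8) as pool:
        partials = pool.map(expensive_task, chunks)  # results serialised back
    print(sum(partials))

if __name__ == "__main__":
    main()
```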

Acta Numerica ◽  
2012 ◽  
Vol 21 ◽  
pp. 379-474 ◽  
Author(s):  
J. J. Dongarra ◽  
A. J. van der Steen

This article describes the current state of the art of high-performance computing systems, and attempts to shed light on near-future developments that might prolong the steady growth in speed of such systems, which has been one of their most remarkable characteristics. We review the different ways devised to speed them up, both with regard to components and their architecture. In addition, we discuss the requirements for software that can take advantage of existing and future architectures.


Symmetry ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 1029
Author(s):  
Anabi Hilary Kelechi ◽  
Mohammed H. Alsharif ◽  
Okpe Jonah Bameyi ◽  
Paul Joan Ezra ◽  
Iorshase Kator Joseph ◽  
...  

Power-consuming entities such as high-performance computing (HPC) sites and large data centers are growing with advances in information technology. In business, HPC is used to shorten product delivery times, reduce production costs, and decrease the time it takes to develop a new product. Today’s high level of computing power from supercomputers comes at the expense of consuming large amounts of electric power. To minimize the energy utilized by HPC entities, it is necessary to reduce both the energy required by the computing systems themselves and the resources needed to operate them. A database can improve system energy efficiency: the power consumption of all components is sampled at regular intervals and recorded, and the stored information then serves as input data for energy-efficiency optimization. In addition, device workload information and various usage metrics are stored in the database. There has been strong momentum in artificial intelligence (AI) as a tool for optimization and process automation that leverages already existing information. This paper discusses ideas for improving energy efficiency for HPC using AI.
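The abstract proposes logging component power readings into a database as the input for AI-driven energy optimisation. A minimal sketch of such a sampling loop is given below; read_power_watts() is a hypothetical stub for a platform-specific interface (for example RAPL or NVML), which the paper does not specify.

```python
# Minimal sketch of periodic power sampling into a database, as described in
# the abstract. read_power_watts() is a hypothetical stub; a real deployment
# would query interfaces such as RAPL (CPU) or NVML (GPU).
import sqlite3
import time

def read_power_watts(component: str) -> float:
    # Stub: replace with a platform-specific power reading for 'component'.
    return 0.0

def sample_loop(db_path, components, interval_s=1.0, samples=60):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS power_samples (
                        ts REAL, component TEXT, watts REAL)""")
    for _ in range(samples):
        now = time.time()
        for comp in components:
            conn.execute("INSERT INTO power_samples VALUES (?, ?, ?)",
                         (now, comp, read_power_watts(comp)))
        conn.commit()                     # stored samples become training data
        time.sleep(interval_s)
    conn.close()

if __name__ == "__main__":
    sample_loop("power.db", ["cpu0", "gpu0", "dram"], samples=5)
```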


Author(s):  
Al Geist ◽  
Daniel A Reed

Commodity clusters revolutionized high-performance computing when they first appeared two decades ago. As scale and complexity have grown, new challenges in reliability and systemic resilience, energy efficiency and optimization, and software complexity have emerged that suggest the need for re-evaluation of current approaches. This paper reviews the state of the art and reflects on some of the challenges likely to be faced when building trans-petascale computing systems, using insights and perspectives drawn from operational experience and community debates.


Author(s):  
Yaser Jararweh ◽  
Moath Jarrah ◽  
Abdelkader Bousselham

Current state-of-the-art GPU-based systems offer unprecedented performance advantages by accelerating the most compute-intensive portions of applications by an order of magnitude. GPU computing presents a viable solution for the ever-increasing complexities in applications and the growing demands for immense computational resources. In this paper the authors investigate different platforms of GPU-based systems, ranging from Personal Supercomputing (PSC) to cloud-based GPU systems. The authors explore and evaluate these GPU-based platforms and present a comparative discussion against conventional high-performance cluster-based computing systems. Their evaluation shows potential advantages of using GPU-based systems for high-performance computing applications while meeting different scaling granularities.
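The abstract's central claim is that offloading the most compute-intensive kernels to a GPU yields large speedups. A minimal illustration, not taken from the paper, is sketched below using the CuPy library; it assumes a CUDA-capable GPU and the cupy package are available.

```python
# Hedged sketch: offloading a compute-intensive kernel (dense matrix multiply)
# to the GPU with CuPy. Not from the paper; assumes cupy and a CUDA GPU.
import time
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.random((n, n)).astype(np.float32)
b_cpu = np.random.random((n, n)).astype(np.float32)

t0 = time.perf_counter()
c_cpu = a_cpu @ b_cpu                                 # CPU baseline
t_cpu = time.perf_counter() - t0

a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)   # host -> device copies
t0 = time.perf_counter()
c_gpu = a_gpu @ b_gpu                                 # GPU kernel
cp.cuda.Device().synchronize()                        # wait for completion
t_gpu = time.perf_counter() - t0

print(f"CPU: {t_cpu:.3f}s  GPU: {t_gpu:.3f}s")
assert np.allclose(c_cpu, cp.asnumpy(c_gpu), rtol=1e-2, atol=1e-2)
```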



Author(s):  
Е.Н. Головченко ◽  
М.В. Якобовский

The problem of rational mesh decomposition (load balancing) arises in the parallel mesh-based numerical solution of problems in continuum mechanics, pulsed power, electrodynamics, etc. on high-performance computing systems. The number of processors on which a computational problem will run is usually not known in advance. It therefore makes sense to partition the mesh once into a large number of microdomains and then form subdomains from them. The graph-partitioning methods implemented in the state-of-the-art parallel partitioning tools ParMETIS, Jostle, PT-Scotch, and Zoltan are based on multilevel (hierarchical) algorithms, which have the shortcoming of forming unconnected subdomains; another shortcoming of these tools is the generation of strongly imbalanced partitions. The GridSpiderPar program package for parallel decomposition of large meshes was developed. Computational experiments compared different partitions into microdomains, partitions of microdomain graphs into subdomains, and direct partitions into subdomains of several meshes (10^8 vertices, 10^9 elements) obtained with the GridSpiderPar tool and with the ParMETIS, Zoltan, and PT-Scotch packages. Partition quality was assessed by the imbalance of the number of vertices per subdomain, the number of unconnected subdomains, and the edge cut, as well as by the parallel efficiency of gas-dynamics simulations with meshes distributed over cores according to the different partitions. The results demonstrate the advantages of the developed algorithms.
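GridSpiderPar's algorithms are not reproduced here; the sketch below is only a toy, self-contained illustration of the two-stage decomposition idea from the abstract, using a structured grid and regular blocks so that the imbalance and edge-cut metrics it mentions can be computed.

```python
# Toy illustration (own construction, not GridSpiderPar's algorithms) of the
# two-stage idea: cut a mesh into many microdomains once, then aggregate the
# microdomains into the final subdomains, and check imbalance and edge cut.
from collections import Counter

NX, NY = 64, 64                  # structured 2D "mesh": NX*NY cells
MB = 16                          # microdomain blocks per dimension
N_MICRO, N_DOM = MB * MB, 8      # 256 microdomains aggregated into 8 subdomains

def cell(i, j):
    return i * NY + j

# Stage 1: assign each cell to a microdomain (regular blocks for simplicity).
micro = {cell(i, j): (i // (NX // MB)) * MB + (j // (NY // MB))
         for i in range(NX) for j in range(NY)}

# Stage 2: aggregate microdomains into subdomains (contiguous groups).
micro_to_dom = {m: (m * N_DOM) // N_MICRO for m in range(N_MICRO)}
dom = {c: micro_to_dom[m] for c, m in micro.items()}

# Quality metrics from the abstract: edge cut and vertex-count imbalance.
edges = [(cell(i, j), cell(i + 1, j)) for i in range(NX - 1) for j in range(NY)]
edges += [(cell(i, j), cell(i, j + 1)) for i in range(NX) for j in range(NY - 1)]
edge_cut = sum(dom[u] != dom[v] for u, v in edges)
sizes = Counter(dom.values())
imbalance = max(sizes.values()) / ((NX * NY) / N_DOM)
print(f"edge cut = {edge_cut}, imbalance = {imbalance:.3f}")
```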


2020 ◽  
Vol 2020 ◽  
pp. 1-19 ◽  
Author(s):  
Paweł Czarnul ◽  
Jerzy Proficz ◽  
Krzysztof Drypczewski

This paper provides a review of contemporary methodologies and APIs for parallel programming, with representative technologies selected in terms of target system type (shared memory, distributed, and hybrid), communication patterns (one-sided and two-sided), and programming abstraction level. We analyze representatives in terms of many aspects including programming model, languages, supported platforms, license, optimization goals, ease of programming, debugging, deployment, portability, level of parallelism, constructs enabling parallelism and synchronization, features introduced in recent versions indicating trends, support for hybridity in parallel execution, and disadvantages. Such detailed analysis has led us to the identification of trends in high-performance computing and of the challenges to be addressed in the near future. It can help to shape future versions of programming standards, select technologies best matching programmers’ needs, and avoid potential difficulties while using high-performance computing systems.
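As one concrete representative of the two-sided, distributed-memory message-passing model covered by the review, the sketch below exchanges data between two MPI ranks using mpi4py; it is an illustrative example rather than code from the paper and assumes an MPI installation plus the mpi4py package.

```python
# Minimal two-sided message-passing example with mpi4py. Run with:
#   mpiexec -n 2 python this_script.py
# Illustrates the distributed-memory, two-sided model discussed in the review.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"step": 1, "values": [0.5, 1.5, 2.5]}
    comm.send(payload, dest=1, tag=11)      # blocking two-sided send
    result = comm.recv(source=1, tag=22)    # matching receive
    print("rank 0 got:", result)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    comm.send(sum(data["values"]), dest=0, tag=22)
```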


Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses the challenge by treating the FPGA resource as a service and employing high-level multi-task processing, design-space exploration, and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high-performance computing tasks was implemented on a Nallatech 385 FPGA card; the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute-intensive and mixed-intensity tasks, respectively, but 0.2× lower throughput for memory-intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed-intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.
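The paper's runtime simulator and scheduling scheme are not reproduced here; as a loose illustration of scheduling a queue of heterogeneous tasks onto statically partitioned regions, the toy sketch below greedily assigns each task to the region that becomes free earliest. All task and partition parameters are invented for the example.

```python
# Toy greedy scheduler: assign queued tasks to statically partitioned FPGA
# regions, always picking the region that becomes free earliest. Parameters
# are invented; the paper's simulator and scheduling scheme are more elaborate.
import heapq

regions = [(0.0, r) for r in range(3)]   # (time region becomes free, region id)
heapq.heapify(regions)

# (task name, estimated runtime in ms) -- mixed compute/memory intensities
tasks = [("fft", 4.0), ("matmul", 9.0), ("stencil", 2.5),
         ("sort", 3.0), ("conv", 7.5), ("reduce", 1.0)]

finish_times = []
for name, runtime in tasks:
    free_at, region = heapq.heappop(regions)   # earliest-available region
    start, end = free_at, free_at + runtime
    print(f"{name:8s} -> region {region}: {start:5.1f} .. {end:5.1f} ms")
    heapq.heappush(regions, (end, region))
    finish_times.append(end)

print(f"makespan = {max(finish_times):.1f} ms")
```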

