PGHPF – An Optimizing High Performance Fortran Compiler for Distributed Memory Machines

Zeki Bozkus; Larry Meadows; Steven Nakamoto; Vincent Schuster; Mark Young

doi:10.1155/1997/705102

Scientific Programming ◽

10.1155/1997/705102 ◽

1997 ◽

Vol 6 (1) ◽

pp. 29-40 ◽

Cited By ~ 9

Author(s):

Zeki Bozkus ◽

Larry Meadows ◽

Steven Nakamoto ◽

Vincent Schuster ◽

Mark Young

Keyword(s):

High Performance ◽

Distributed Memory ◽

Parallel Machines ◽

High Efficiency ◽

Memory Systems ◽

Production Quality ◽

Distributed Memory Machines ◽

High Performance Fortran ◽

Application Developers ◽

Efficient Software

High Performance Fortran (HPF) is the first widely supported, efficient, and portable parallel programming language for shared and distributed memory systems. HPF is realized through a set of directive-based extensions to Fortran 90. It enables application developers and Fortran end-users to write compact, portable, and efficient software that will compile and execute on workstations, shared memory servers, clusters, traditional supercomputers, or massively parallel processors. This article describes a production-quality HPF compiler for a set of parallel machines. Compilation techniques such as data and computation distribution, communication generation, run-time support, and optimization issues are elaborated as the basis for an HPF compiler implementation on distributed memory machines. The performance of this compiler on benchmark programs demonstrates that high efficiency can be achieved executing HPF code on parallel architectures.
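The data and computation distribution mentioned in this abstract can be illustrated with a minimal sketch. It assumes an HPF-style BLOCK distribution (block size equal to the ceiling of n/p) and the owner-computes rule; the function names `block_owner`, `local_indices`, and `owner_computes` are hypothetical and are not taken from PGHPF, and the loop over ranks only simulates what separate processors would do in parallel.

```python
def block_size(n, p):
    """Block size for a BLOCK distribution of n elements over p processors: ceil(n / p)."""
    return -(-n // p)


def block_owner(i, n, p):
    """Return the 0-based processor that owns 0-based index i (hypothetical helper)."""
    return i // block_size(n, p)


def local_indices(rank, n, p):
    """Return the global indices stored on processor `rank`; may be empty for trailing ranks."""
    b = block_size(n, p)
    lo = rank * b
    hi = min(lo + b, n)
    return range(lo, max(lo, hi))


def owner_computes(a, p):
    """Simulate the owner-computes rule: each rank updates only the elements it owns."""
    n = len(a)
    result = list(a)
    for rank in range(p):  # stands in for p processors running concurrently
        for i in local_indices(rank, n, p):
            result[i] = 2 * a[i]
    return result


if __name__ == "__main__":
    # 10 elements over 4 processors: blocks of 3, 3, 3, and 1 elements.
    for rank in range(4):
        print(rank, list(local_indices(rank, 10, 4)))
```

In this sketch no communication is needed because each update reads only the element being written; a compiler would have to generate communication as soon as an iteration reads an element owned by another processor.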

High Performance Fortran: A Practical Analysis

Scientific Programming ◽

10.1155/1994/150306 ◽

1994 ◽

Vol 3 (3) ◽

pp. 187-199 ◽

Cited By ~ 7

Author(s):

Allan Knies ◽

Matthew O'Keefe ◽

Tom MacDonald

Keyword(s):

High Performance ◽

Parallel Machines ◽

Production Quality ◽

Efficient Production ◽

Data Parallel ◽

High Performance Fortran ◽

Multiple Data ◽

Computing Industry ◽

Application Developers ◽

Important Design

The recently released High Performance Fortran Forum (HPFF) proposal has stirred much interest in the high performance computing industry. HPFF's most important design goal is to create a language that has source code portability and that achieves high performance on single instruction multiple data (SIMD), distributed-memory multiple instruction multiple data (MIMD), and shared-memory MIMD architectures. The HPFF proposal brings to the forefront many questions about the design of portable and efficient languages for parallel machines. In this article, we discuss issues that need to be addressed before an efficient production-quality compiler will be available for any such language. We examine some specific issues that are related to HPF's model of computation and analyze several implementation issues. We also provide some results from another data parallel compiler to help gain insight into some of the implementation issues that are relevant to HPF. Finally, we provide a summary of options currently available for application developers in industry.

PROBLEMS WITH DATA PARALLELISM

Parallel Processing Letters ◽

10.1142/s0129626401000440 ◽

2001 ◽

Vol 11 (01) ◽

pp. 77-94 ◽

Cited By ~ 1

Author(s):

C. PHILLIPS ◽

R. PERROTT

Keyword(s):

High Performance ◽

Distributed Memory ◽

Decisive Factor ◽

Detailed Knowledge ◽

Data Parallelism ◽

Simple Test ◽

Evolution Of Language ◽

Distributed Memory Machines ◽

High Performance Fortran ◽

Gradual Evolution

The gradual evolution of language features and approaches used for the programming of distributed memory machines underwent substantial advances in the 1990s. One of the most promising and widely praised approaches was based on data parallelism and resulted in High Performance Fortran. This paper reports on an experiment using that approach based on a commercial distributed memory machine, available compilers, and simple test programs. The results are disappointing. The variety of components involved and the lack of detailed knowledge available for the compilers compound the difficulties of obtaining results and making comparisons. The results show great variation and call into question the premise that communication is the decisive factor in determining performance. The results are also a contribution towards the difficult task of predicting performance on a distributed memory computer.

ON MESSAGE PACKAGING IN TASK SCHEDULING FOR DISTRIBUTED MEMORY PARALLEL MACHINES

International Journal of Foundations of Computer Science ◽

10.1142/s0129054101000497 ◽

2001 ◽

Vol 12 (03) ◽

pp. 285-306 ◽

Cited By ~ 2

Author(s):

NORIYUKI FUJIMOTO ◽

TOMOKI BABA ◽

TAKASHI HASHIMOTO ◽

KENICHI HAGIHARA

Keyword(s):

Task Scheduling ◽

High Performance ◽

Distributed Memory ◽

Parallel Machines ◽

Scheduling Algorithm ◽

Parallel Programs ◽

Parallel Program ◽

Interprocessor Communication ◽

Task Scheduling Algorithm ◽

Software Overhead

In this paper, we report a performance gap between a schedule with small makespan on the task scheduling model and the corresponding parallel program on distributed memory parallel machines. The main reason for the gap is the software overhead in the interprocessor communication. Therefore, speedup ratios of schedules on the model do not closely approximate those of parallel programs on the machines. The purpose of the paper is to obtain a task scheduling algorithm that generates a schedule with small makespan that closely approximates the corresponding parallel program. For this purpose, we propose algorithm BCSH, which generates only bulk synchronous schedules. In those schedules, no-communication phases and communication phases appear alternately. All interprocessor communications are done only in the latter phases, and thus the corresponding parallel programs can easily make better use of the message packaging technique. It reduces the many software overheads of messages from a source processor to the same destination processor to almost a single software overhead, and improves the performance of a parallel program significantly. Finally, we show some experimental results of performance gaps on BCSH, Kruatrachue's algorithm DSH, and Ahmad et al.'s algorithm ECPFD. The schedules by DSH and ECPFD are famous for their small makespans, but message packaging cannot be effectively applied to the corresponding programs. The results show that a bulk synchronous schedule with small makespan has the advantages that the gap is small and the corresponding program is a high performance parallel one.
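The effect of message packaging described in this abstract can be sketched with a toy cost model. This is not the BCSH algorithm; it only assumes that every message pays a fixed software overhead plus a per-word cost, and that all messages of one communication phase are known together. The names `phase_cost` and `package`, and the overhead values, are illustrative assumptions.

```python
from collections import defaultdict


def phase_cost(messages, overhead, per_word):
    """Total cost of a communication phase under a simple additive model.

    messages: list of (src, dst, words) tuples.
    Each message pays `overhead` once plus `per_word` for every word it carries.
    """
    return sum(overhead + per_word * words for _, _, words in messages)


def package(messages):
    """Merge all messages that share the same (src, dst) pair into one message."""
    merged = defaultdict(int)
    for src, dst, words in messages:
        merged[(src, dst)] += words
    return [(src, dst, words) for (src, dst), words in merged.items()]


if __name__ == "__main__":
    # Three small messages from processor 0 to 1, and one from processor 2 to 1.
    phase = [(0, 1, 4), (0, 1, 6), (0, 1, 2), (2, 1, 5)]
    print(phase_cost(phase, overhead=100, per_word=1))           # 417
    print(phase_cost(package(phase), overhead=100, per_word=1))  # 217
```

Packaging leaves the number of words unchanged, so in this model the saving comes entirely from paying the software overhead once per source and destination pair instead of once per message.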

Extending OpenMP for NUMA Machines

Scientific Programming ◽

10.1155/2000/464182 ◽

2000 ◽

Vol 8 (3) ◽

pp. 163-181 ◽

Cited By ~ 16

Author(s):

John Bircsak ◽

Peter Craig ◽

RaeLyn Crowell ◽

Zarka Cvetanovic ◽

Jonathan Harris ◽

...

Keyword(s):

Shared Memory ◽

High Performance ◽

Distributed Memory ◽

Parallel Programs ◽

Compiler Optimizations ◽

High Performance Fortran ◽

Efficient Code ◽

Memory Architectures ◽

Shared Memory Architectures ◽

Fast Access

This paper describes extensions to OpenMP that implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data in memory and the placement of computations that operate on that data. Optimal performance is obtained when computations occur on processors that have fast access to the data needed by those computations. OpenMP -- designed for shared-memory architectures -- does not by itself address these issues. The extensions to OpenMP Fortran presented here have been mainly taken from High Performance Fortran. The paper describes some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions. It also describes some additional compiler optimizations, and concludes with some preliminary results.

Performance Issues in High Performance Fortran Implementations of Sensor-Based Applications

Scientific Programming ◽

10.1155/1997/372831 ◽

1997 ◽

Vol 6 (1) ◽

pp. 59-72 ◽

Cited By ~ 1

Author(s):

David R. O'Hallaron ◽

Jon Webb ◽

Jaspal Subhlok

Keyword(s):

High Performance ◽

Parallel Machines ◽

Radar Imaging ◽

Synthetic Aperture ◽

Application Domain ◽

Resonance Imaging ◽

High Performance Fortran ◽

Intel Paragon ◽

Independent Loops ◽

Tracking Radar

Applications that get their inputs from sensors are an important and often overlooked application domain for High Performance Fortran (HPF). Such sensor-based applications typically perform regular operations on dense arrays, and often have latency and throughput requirements that can only be achieved with parallel machines. This article describes a study of sensor-based applications, including the fast Fourier transform, synthetic aperture radar imaging, narrowband tracking radar processing, multibaseline stereo imaging, and medical magnetic resonance imaging. The applications are written in a dialect of HPF developed at Carnegie Mellon, and are compiled by the Fx compiler for the Intel Paragon. The main results of the study are that (1) it is possible to realize good performance for realistic sensor-based applications written in HPF and (2) the performance of the applications is determined by the performance of three core operations: independent loops (i.e., loops with no dependences between iterations), reductions, and index permutations. The article discusses the implications for HPF implementations and introduces some simple tests that implementers and users can use to measure the efficiency of the loops, reductions, and index permutations generated by an HPF compiler.
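The three core operations named in this abstract can be illustrated with a small sequential sketch. It only shows what each operation computes; it is not the article's test suite and says nothing about how the Fx compiler parallelizes them. The function names and the particular loop body are assumptions chosen for illustration.

```python
import operator
from functools import reduce


def independent_loop(a):
    """Each iteration reads and writes only its own element, so iterations can run in any order."""
    return [2 * x + 1 for x in a]


def reduction(a):
    """Combine all elements with an associative operator (here: addition)."""
    return reduce(operator.add, a, 0)


def index_permutation(matrix):
    """Permute the indices of a 2-D array: result[j][i] = matrix[i][j] (a transpose)."""
    return [list(column) for column in zip(*matrix)]


if __name__ == "__main__":
    print(independent_loop([1, 2, 3]))             # [3, 5, 7]
    print(reduction([1, 2, 3, 4]))                 # 10
    print(index_permutation([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```

On a distributed memory machine the independent loop needs no communication when data and iterations are aligned, while the reduction and the index permutation both require data to move between processors, which is why their efficiency matters so much.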

Implementation and Performance of DSMPI

Scientific Programming ◽

10.1155/1997/452521 ◽

1997 ◽

Vol 6 (2) ◽

pp. 201-214 ◽

Cited By ~ 2

Author(s):

Luis M. Silva ◽

João Gabriel Silva ◽

Simon Chapple

Keyword(s):

Shared Memory ◽

Message Passing ◽

Distributed Memory ◽

Programming Model ◽

Distributed Shared Memory ◽

Memory Systems ◽

Distributed Memory Machines ◽

Coherence Protocols ◽

And Performance ◽

Performance Results

Distributed shared memory (DSM) has been recognized as an alternative programming model to exploit the parallelism in distributed memory systems because it provides a higher level of abstraction than simple message passing. DSM combines the simple programming model of shared memory with the scalability of distributed memory machines. This article presents DSMPI, a parallel library that runs atop MPI and provides a DSM abstraction. It provides an easy-to-use programming interface, is fully portable, and supports heterogeneity. For the sake of flexibility, it supports different coherence protocols and models of consistency. We present some performance results taken on a network of workstations and on a Cray T3D, which show that DSMPI can be competitive with MPI for some applications.

Efficient parallel programming on scalable shared memory systems with High Performance Fortran

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.649 ◽

2002 ◽

Vol 14 (8-9) ◽

pp. 789-803 ◽

Cited By ~ 3

Author(s):

Siegfried Benkner ◽

Thomas Brandes

Keyword(s):

Parallel Programming ◽

Shared Memory ◽

High Performance ◽

Memory Systems ◽

High Performance Fortran

High Performance Polar Decomposition on Distributed Memory Systems

Euro-Par 2016: Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-43659-3_44 ◽

2016 ◽

pp. 605-616 ◽

Cited By ~ 2

Author(s):

Dalal Sukkari ◽

Hatem Ltaief ◽

David Keyes

Keyword(s):

High Performance ◽

Distributed Memory ◽

Polar Decomposition ◽

Memory Systems

APR's approach to High Performance Fortran for distributed memory multiprocessor systems

Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation ◽

10.1109/fmpc.1995.380465 ◽

2002 ◽

Author(s):

G. Wagenbreth

Keyword(s):

High Performance ◽

Distributed Memory ◽

Multiprocessor Systems ◽

High Performance Fortran

Compiling High Performance Fortran for distributed-memory architectures

Parallel Computing ◽

10.1016/s0167-8191(99)00074-5 ◽

1999 ◽

Vol 25 (13-14) ◽

pp. 1785-1825 ◽

Cited By ~ 9

Author(s):

Siegfried Benkner ◽

Hans Zima

Keyword(s):

High Performance ◽

Distributed Memory ◽

High Performance Fortran ◽

Memory Architectures