The physical structure of concurrent problems and concurrent computers

We introduce a physical analogy to describe problems and the high-performance concurrent computers on which they are run. We show that the spatial characteristics of problems lead to their parallelism, and we review the lessons from use of the early hypercubes and a natural particle-process analogy. We generalize this picture to include the temporal structure of problems and show how this allows us to unify distributed, shared and hierarchical memories as well as SIMD (single instruction multiple data) architectures. We also show how neural network methods can be used to analyse a general formalism based on interacting strings, and how these methods lead to possible real-time schedulers and decomposers for massively parallel machines.

The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

Scientific Programming ◽

10.1155/1995/278064 ◽

1995 ◽

Vol 4 (1) ◽

pp. 1-21 ◽

Cited By ~ 3

Author(s):

Matthew O'Keefe ◽

Terence Parr ◽

B. Kevin Edgar ◽

Steve Anderson ◽

Paul Woodward ◽

...

Keyword(s):

High Performance ◽

Parallel Machines ◽

Parallel Processors ◽

Massively Parallel ◽

Automatic Translation ◽

Efficient Code ◽

Self Similar ◽

User Friendly ◽

Application Codes ◽

Fortran 77

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish for MPPs what a vectorizable style has accomplished for vector machines: it allows the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code.
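The self-similar property described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Fortran-P, and the names `smooth` and `smooth_decomposed`, the 3-point averaging kernel, and the halo handling are all assumptions chosen for illustration; none of them is taken from the translator itself. The point is only that the same kernel runs unchanged on the whole domain and on each subdomain, provided each subdomain is given its neighbours' edge values.

```python
def smooth(cells, left, right):
    """Apply a 3-point average to every cell, given the two boundary values."""
    padded = [left] + list(cells) + [right]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]


def smooth_decomposed(cells, left, right, parts):
    """Split the domain into `parts` subdomains and run the same kernel on each."""
    n = len(cells)
    bounds = [n * p // parts for p in range(parts + 1)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Halo values come from the neighbouring subdomain, or from the
        # global boundary at the two ends of the domain.
        halo_left = cells[lo - 1] if lo > 0 else left
        halo_right = cells[hi] if hi < n else right
        result.extend(smooth(cells[lo:hi], halo_left, halo_right))
    return result


# Hypothetical test data: the decomposed run reproduces the global run.
data = [float(i * i % 7) for i in range(16)]
whole = smooth(data, 0.0, 0.0)
pieces = smooth_decomposed(data, 0.0, 0.0, parts=4)
```

On a real MPP each subdomain would live on a different processor and the halo values would arrive by communication; here they are simply read from the global list.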

A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

International Journal of Antennas and Propagation ◽

10.1155/2012/851465 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Dau-Chyrh Chang ◽

Lihong Zhang ◽

Xiaoling Yang ◽

Shao-Hsiang Yen ◽

Wenhua Yu

Keyword(s):

High Performance ◽

Fdtd Method ◽

Hardware Acceleration ◽

Single Instruction Multiple Data ◽

Instruction Set ◽

Computer Cluster ◽

Simulation Performance ◽

Acceleration Technique ◽

Multiple Data ◽

Difference Time

We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD) method using the SSE (streaming SIMD (single instruction multiple data) extensions) instruction set. Applying the SSE instruction set to the parallel FDTD method yields a significant improvement in simulation performance. Benchmarks of the SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are used to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.

Design and Performance Analysis of a Massively Parallel Atmospheric General Circulation Model

Scientific Programming ◽

10.1155/2000/371012 ◽

2000 ◽

Vol 8 (1) ◽

pp. 49-57 ◽

Cited By ~ 3

Author(s):

Daniel S. Schaffer ◽

Max J. Suárez

Keyword(s):

General Circulation Model ◽

General Circulation ◽

High Performance ◽

Degrees Of Freedom ◽

Parallel Machines ◽

Atmospheric General Circulation Model ◽

Circulation Model ◽

Massively Parallel ◽

Atmospheric General Circulation ◽

And Performance

In the 1990s, computer manufacturers are increasingly turning to the development of parallel processor machines to meet the high performance needs of their customers. Simultaneously, atmospheric scientists studying weather and climate phenomena ranging from hurricanes to El Niño to global warming require increasingly fine resolution models. Here, the implementation of a parallel atmospheric general circulation model (GCM) that exploits the power of massively parallel machines is described. Using the horizontal data domain decomposition methodology, this FORTRAN 90 model is able to integrate a 0.6° longitude by 0.5° latitude problem at a rate of 19 Gigaflops on 512 processors of a Cray T3E 600, corresponding to 280 seconds of wall-clock time per simulated model day. At this resolution, the model has 64 times as many degrees of freedom and performs 400 times as many floating point operations per simulated day as the model it replaces.
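The figures quoted in this abstract imply a few derived quantities, worked through in the short sketch below. The grid dimensions assume a full global grid with no extra pole rows, and the even split of columns over processors is likewise an assumption; the abstract gives neither the actual grid size nor the processor layout.

```python
# Figures quoted in the abstract.
RATE_FLOPS = 19e9             # sustained rate: 19 Gigaflops
SECONDS_PER_MODEL_DAY = 280   # wall-clock seconds per simulated day
PROCESSORS = 512
DLON_DEG, DLAT_DEG = 0.6, 0.5

# Work done per simulated day and sustained rate per processor.
flops_per_model_day = RATE_FLOPS * SECONDS_PER_MODEL_DAY
flops_per_processor = RATE_FLOPS / PROCESSORS

# Simulated days completed per wall-clock day (86400 seconds).
model_days_per_wall_day = 86400 / SECONDS_PER_MODEL_DAY

# Assumption: a full global grid, ignoring any extra pole rows.
n_lon = round(360 / DLON_DEG)
n_lat = round(180 / DLAT_DEG)

# Assumption: horizontal columns are spread evenly over the processors.
columns_per_processor = n_lon * n_lat / PROCESSORS

# 400x the operations over 64x the degrees of freedom.
work_growth_per_dof = 400 / 64
```

Under these assumptions the model performs about 5.3 × 10^12 floating point operations per simulated day, sustains roughly 37 Megaflops per processor, and completes about 309 simulated days per wall-clock day, with roughly 420 horizontal columns per processor.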

A high-performance and low-power 32-bit multiply-accumulate unit with single-instruction-multiple-data (SIMD) feature

IEEE Journal of Solid-State Circuits ◽

10.1109/jssc.2002.1015692 ◽

2002 ◽

Vol 37 (7) ◽

pp. 926-931 ◽

Cited By ~ 27

Author(s):

Yuyun Liao ◽

D.B. Roberts

Keyword(s):

Low Power ◽

High Performance ◽

Single Instruction Multiple Data ◽

Multiple Data

An Algebraic Machinery for Optimizing Data Motion for HPF

Scientific Programming ◽

10.1155/1997/790426 ◽

1997 ◽

Vol 6 (3) ◽

pp. 297-325

Author(s):

Jan-Jan Wu ◽

Marina C. Chen

Keyword(s):

High Performance ◽

Parallel Machines ◽

Compiler Optimization ◽

Optimization Technique ◽

Massively Parallel ◽

High Performance Fortran ◽

Fortran 90

This paper describes a general compiler optimization technique that reduces communication overhead for FORTRAN-90 (and High Performance FORTRAN) implementations on massively parallel machines.

A high-performance sorting algorithm for multicore single-instruction multiple-data processors

Software Practice and Experience ◽

10.1002/spe.1102 ◽

2011 ◽

Vol 42 (6) ◽

pp. 753-777 ◽

Cited By ~ 7

Author(s):

Hiroshi Inoue ◽

Takao Moriyama ◽

Hideaki Komatsu ◽

Toshio Nakatani

Keyword(s):

High Performance ◽

Single Instruction Multiple Data ◽

Sorting Algorithm ◽

Multiple Data

A high performance FFT library with single instruction multiple data (SIMD) architecture

2011 International Conference on Electronics, Communications and Control (ICECC) ◽

10.1109/icecc.2011.6066463 ◽

2011 ◽

Cited By ~ 5

Author(s):

Wang Xu ◽

Zhang Yan ◽

Ding Shunying

Keyword(s):

High Performance ◽

Single Instruction Multiple Data ◽

Multiple Data ◽

Simd Architecture

High Performance Fortran: A Practical Analysis

Scientific Programming ◽

10.1155/1994/150306 ◽

1994 ◽

Vol 3 (3) ◽

pp. 187-199 ◽

Cited By ~ 7

Author(s):

Allan Knies ◽

Matthew O'Keefe ◽

Tom Macdonald

Keyword(s):

High Performance ◽

Parallel Machines ◽

Production Quality ◽

Efficient Production ◽

Data Parallel ◽

High Performance Fortran ◽

Multiple Data ◽

Computing Industry ◽

Application Developers ◽

Important Design

The recently released High Performance Fortran Forum (HPFF) proposal has stirred much interest in the high performance computing industry. HPFF's most important design goal is to create a language that has source code portability and that achieves high performance on single instruction multiple data (SIMD), distributed-memory multiple instruction multiple data (MIMD), and shared-memory MIMD architectures. The HPFF proposal brings to the forefront many questions about the design of portable and efficient languages for parallel machines. In this article, we discuss issues that need to be addressed before an efficient production-quality compiler will be available for any such language. We examine some specific issues that are related to HPF's model of computation and analyze several implementation issues. We also provide some results from another data parallel compiler to help gain insight into some of the implementation issues that are relevant to HPF. Finally, we provide a summary of options currently available for application developers in industry.

REVERSIBLE SYSTOLIC ARRAYS: m-ARY BIJECTIVE SINGLE-INSTRUCTION MULTIPLE-DATA (SIMD) ARCHITECTURES AND THEIR QUANTUM CIRCUITS

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126608004472 ◽

2008 ◽

Vol 17 (04) ◽

pp. 729-771 ◽

Cited By ~ 4

Author(s):

ANAS N. AL-RABADI

Keyword(s):

High Performance ◽

Cost Effective ◽

Classical Case ◽

Single Instruction Multiple Data ◽

Systolic Arrays ◽

Quantum Superposition ◽

Multiple Data ◽

Wide Range ◽

New Type ◽

Future Technologies

A new type of m-ary systolic array, called the reversible systolic array, is introduced in this paper. The realizations and computations of the corresponding m-ary quantum systolic architectures are also introduced. A systolic array is an example of a single-instruction multiple-data (SIMD) machine in which each processing element (PE) performs a single simple operation. Systolic devices provide inexpensive but massive computation power, and are cost-effective, high-performance, special-purpose systems with a wide range of applications, such as solving regular, compute-bound problems that involve repetitive operations on large arrays of data. Similar to the classical case, information in a reversible and quantum systolic circuit flows between cells in a pipelined fashion, and communication with the outside world occurs only at the boundary cells. Since the basic PEs used in the construction of arithmetic systolic arrays are add–multiply cells, the results introduced in this paper are general and apply to a very wide range of add–multiply-based systolic arrays. Since the reduction of power consumption is a major requirement for circuit design in future technologies, such as quantum computing, the main features of several future technologies will include reversibility. Consequently, the new systolic circuits can play an important role in the design of future circuits that consume minimal power. It is also shown that the new systolic arrays maintain a high level of regularity while exhibiting the new fundamental bijectivity (reversibility) and quantum superposition properties. These new properties will be essential in performing super-fast arithmetic-intensive computations that are fundamental in several future applications, such as multi-dimensional quantum signal processing (QSP).
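The classical behaviour this abstract builds on (add-multiply cells, pipelined flow between neighbours, input only at a boundary cell) can be sketched with a small simulation. This is an illustration of an ordinary, irreversible linear systolic array computing a matrix-vector product; it is not the reversible or quantum design introduced in the paper. The function name `systolic_matvec` and the choice to store row i of the matrix locally in PE i are assumptions made for the example.

```python
def systolic_matvec(A, x):
    """Simulate a linear systolic array computing y = A x.

    PE i keeps the accumulator y[i] and is assumed to hold row i of A
    locally. Vector elements enter only at PE 0 (the boundary cell) and
    move one cell to the right on every clock cycle.
    """
    n_rows, n_cols = len(A), len(x)
    acc = [0] * n_rows        # one accumulator per processing element
    pipe = [None] * n_rows    # pipe[i] = (column index, value) currently in PE i
    for cycle in range(n_rows + n_cols - 1):
        # Shift: each value moves to the next cell; a new one enters at PE 0.
        incoming = (cycle, x[cycle]) if cycle < n_cols else None
        pipe = ([incoming] + pipe)[:n_rows]
        # Every PE performs the same add-multiply step on its local data.
        for i, item in enumerate(pipe):
            if item is not None:
                j, value = item
                acc[i] += A[i][j] * value
    return acc


# Hypothetical example: a 2 x 3 matrix takes 2 + 3 - 1 = 4 cycles.
y = systolic_matvec([[1, 2, 3], [4, 5, 6]], [1, 0, -1])
```

The loop runs for rows + columns - 1 cycles because the last vector element needs that long to reach the last cell, which is the pipelining cost of feeding data through a single boundary.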
