The physical structure of concurrent problems and concurrent computers

We introduce a physical analogy to describe problems and the high-performance concurrent computers on which they are run. We show that the spatial characteristics of problems lead to their parallelism and review the lessons from the use of the early hypercubes and a natural particle-process analogy. We generalize this picture to include the temporal structure of problems and show how this allows us to unify distributed, shared and hierarchical memories as well as SIMD (single instruction multiple data) architectures. We also show how neural network methods can be used to analyse a general formalism based on interacting strings, which leads to possible real-time schedulers and decomposers for massively parallel machines.

1995 ◽  
Vol 4 (1) ◽  
pp. 1-21 ◽  
Author(s):  
Matthew O'Keefe ◽  
Terence Parr ◽  
B. Kevin Edgar ◽  
Steve Anderson ◽  
Paul Woodward ◽  
...  

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
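The self-similar property described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than Fortran-P, and the function names `smooth` and `smooth_decomposed`, the three-point averaging stencil, and the one-cell halo are all assumptions chosen for illustration, not the authors' code or tools. The same routine is applied once to the global domain and once to each subdomain, and the two results agree.

```python
def smooth(cells):
    """Three-point average on interior cells.

    The first and last cells are treated as fixed boundary (halo) values.
    """
    interior = [
        (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
        for i in range(1, len(cells) - 1)
    ]
    return [cells[0]] + interior + [cells[-1]]


def smooth_decomposed(cells, parts):
    """Apply the same `smooth` routine to `parts` contiguous subdomains.

    Each subdomain carries one halo cell on either side, copied from its
    neighbours, so the routine needs no knowledge of the decomposition.
    """
    n = len(cells)
    interior_count = n - 2
    # Boundaries of the chunks that partition interior indices 1 .. n-2.
    bounds = [1 + (interior_count * p) // parts for p in range(parts + 1)]
    out = [cells[0]]
    for lo, hi in zip(bounds, bounds[1:]):
        if lo == hi:
            continue  # empty chunk when there are more parts than cells
        sub = cells[lo - 1:hi + 1]        # subdomain plus halo cells
        out.extend(smooth(sub)[1:-1])     # keep only the subdomain interior
    out.append(cells[-1])
    return out


data = [float(i * i) for i in range(10)]
assert smooth_decomposed(data, 3) == smooth(data)
```

On a real MPP each subdomain would live on a different processor and the halo cells would be filled by communication; here the halo is simply sliced from the global list.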


2012 ◽  
Vol 2012 ◽  
pp. 1-10 ◽  
Author(s):  
Dau-Chyrh Chang ◽  
Lihong Zhang ◽  
Xiaoling Yang ◽  
Shao-Hsiang Yen ◽  
Wenhua Yu

We introduce a hardware acceleration technique for the parallel finite-difference time-domain (FDTD) method using the SSE (Streaming SIMD Extensions, where SIMD stands for single instruction, multiple data) instruction set. Applying the SSE instruction set to the parallel FDTD method yields a significant improvement in simulation performance. Benchmarks of the SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are used to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.


2000 ◽  
Vol 8 (1) ◽  
pp. 49-57 ◽  
Author(s):  
Daniel S. Schaffer ◽  
Max J. Suárez

In the 1990s, computer manufacturers are increasingly turning to the development of parallel processor machines to meet the high-performance needs of their customers. Simultaneously, atmospheric scientists studying weather and climate phenomena ranging from hurricanes to El Niño to global warming require increasingly fine-resolution models. Here, the implementation of a parallel atmospheric general circulation model (GCM) that exploits the power of massively parallel machines is described. Using the horizontal data domain decomposition methodology, this FORTRAN 90 model is able to integrate a 0.6° longitude by 0.5° latitude problem at a rate of 19 Gigaflops on 512 processors of a Cray T3E 600, corresponding to 280 seconds of wall-clock time per simulated model day. At this resolution, the model has 64 times as many degrees of freedom and performs 400 times as many floating point operations per simulated day as the model it replaces.
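The performance figures quoted in this abstract imply a few further quantities, worked out in the short sketch below. It assumes that "19 Gigaflops" means a sustained 19 × 10^9 floating point operations per second across all 512 processors; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
SUSTAINED_FLOPS = 19e9          # assumed: 19 Gigaflops = 19e9 operations per second
WALL_SECONDS_PER_SIM_DAY = 280  # wall-clock seconds per simulated model day
PROCESSORS = 512

# Total floating point operations needed to simulate one model day.
ops_per_sim_day = SUSTAINED_FLOPS * WALL_SECONDS_PER_SIM_DAY   # 5.32e12

# Average sustained rate on each processor.
flops_per_processor = SUSTAINED_FLOPS / PROCESSORS             # about 37.1e6

# Model days simulated per day of wall-clock time (86400 seconds).
sim_days_per_wall_day = 86400 / WALL_SECONDS_PER_SIM_DAY       # about 308.6

print(f"{ops_per_sim_day:.3g} operations per simulated day")
print(f"{flops_per_processor / 1e6:.1f} Mflops per processor")
print(f"{sim_days_per_wall_day:.1f} simulated days per wall-clock day")
```

At roughly 309 simulated days per wall-clock day, a simulated year would take a little over one day of machine time at the stated rate.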


1997 ◽  
Vol 6 (3) ◽  
pp. 297-325
Author(s):  
Jan-Jan Wu ◽  
Marina C. Chen

This paper describes a general compiler optimization technique that reduces communication overhead for FORTRAN-90 (and High Performance FORTRAN) implementations on massively parallel machines.


2011 ◽  
Vol 42 (6) ◽  
pp. 753-777 ◽  
Author(s):  
Hiroshi Inoue ◽  
Takao Moriyama ◽  
Hideaki Komatsu ◽  
Toshio Nakatani

1994 ◽  
Vol 3 (3) ◽  
pp. 187-199 ◽  
Author(s):  
Allan Knies ◽  
Matthew O'Keefe ◽  
Tom Macdonald

The recently released High Performance Fortran Forum (HPFF) proposal has stirred much interest in the high performance computing industry. HPFF's most important design goal is to create a language that has source code portability and that achieves high performance on single instruction multiple data (SIMD), distributed-memory multiple instruction multiple data (MIMD), and shared-memory MIMD architectures. The HPFF proposal brings to the forefront many questions about the design of portable and efficient languages for parallel machines. In this article, we discuss issues that need to be addressed before an efficient production-quality compiler will be available for any such language. We examine some specific issues that are related to HPF's model of computation and analyze several implementation issues. We also provide some results from another data parallel compiler to help gain insight into some of the implementation issues that are relevant to HPF. Finally, we provide a summary of options currently available for application developers in industry.


2008 ◽  
Vol 17 (04) ◽  
pp. 729-771 ◽  
Author(s):  
ANAS N. AL-RABADI

A new type of m-ary systolic array, called the reversible systolic array, is introduced in this paper. The m-ary quantum systolic architectures' realizations and computations of the new type of systolic arrays are also introduced. A systolic array is an example of a single-instruction multiple-data (SIMD) machine in which each processing element (PE) performs a single simple operation. Systolic devices provide inexpensive but massive computation power, and are cost-effective, high-performance, special-purpose systems that have a wide range of applications, such as solving regular, compute-bound problems containing repetitive multiple operations on large arrays of data. Similar to the classical case, information in a reversible and quantum systolic circuit flows between cells in a pipelined fashion, and communication with the outside world occurs only at the boundary cells. Since the basic PEs used in the construction of arithmetic systolic arrays are the add–multiply cells, the results introduced in this paper are general and apply to a very wide range of add–multiply-based systolic arrays. Since the reduction of power consumption is a major requirement for circuit design in future technologies, such as quantum computing, the main features of several future technologies will include reversibility. Consequently, the new systolic circuits can play an important role in the design of future circuits that consume minimal power. It is also shown that the new systolic arrays maintain a high level of regularity while exhibiting the new fundamental bijectivity (reversibility) and quantum superposition properties. These new properties will be essential in performing super-fast arithmetic-intensive computations that are fundamental in several future applications, such as multi-dimensional quantum signal processing (QSP).
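The classical behaviour this abstract builds on (add-multiply cells, pipelined flow between neighbouring cells, and input only at the boundary) can be sketched with a small cycle-by-cycle simulation. This is a conventional, non-reversible, non-quantum array chosen for illustration; the function name `systolic_matmul`, the output-stationary layout, and the skewed input schedule are assumptions and are not taken from the paper.

```python
def systolic_matmul(a, b):
    """Simulate an n x m grid of add-multiply cells computing a @ b.

    `a` is n x k and `b` is k x m (lists of lists). Values of `a` enter at
    the left boundary and move one cell right per cycle; values of `b`
    enter at the top boundary and move one cell down per cycle. Each cell
    multiplies the two values it currently holds and adds the product to
    its own accumulator.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per cell
    a_reg = [[0] * m for _ in range(n)]  # value of `a` held in each cell
    b_reg = [[0] * m for _ in range(n)]  # value of `b` held in each cell

    for t in range(n + m + k - 2):       # cycles until the last product
        # Shift `a` values one cell to the right, then feed the left edge.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            s = t - i                    # row i is delayed by i cycles
            a_reg[i][0] = a[i][s] if 0 <= s < k else 0
        # Shift `b` values one cell down, then feed the top edge.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            s = t - j                    # column j is delayed by j cycles
            b_reg[0][j] = b[s][j] if 0 <= s < k else 0
        # Every cell performs one add-multiply step in the same cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
assert result == [[19, 22], [43, 50]]
```

The skewed feeding ensures that the cell at row i, column j sees matching elements `a[i][s]` and `b[s][j]` in the same cycle, which is what lets every cell run the identical single operation.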

