PERFORMANCE EVALUATION OF PRACTICAL PARALLEL COMPUTER MODEL LogPQ

2001 ◽  
Vol 12 (03) ◽  
pp. 325-340
Author(s):  
TAKAYOSHI TOUYAMA ◽  
SUSUMU HORIGUCHI

Today's supercomputers will be replaced by massively parallel computers consisting of large numbers of processing elements, in order to satisfy the continuously increasing demand for computing power. A practical parallel computation model is needed to develop efficient parallel algorithms for such machines. We have therefore proposed the practical parallel computation model LogPQ, which extends the LogP model by taking communication queues into account. This paper addresses the performance of a parallel matrix multiplication algorithm under both the LogPQ and LogP models. The algorithm is implemented on a Cray T3E, and its parallel performance is compared with that on the older CM-5. The comparison shows that the communication network of the T3E has better buffering behavior than that of the CM-5, so that no extra buffering needs to be provided on the T3E, although a small effect remains for both the send and receive buffering. On the other hand, the effect of message size persists, which shows that the overhead and gap parameters must be taken proportional to the message size.
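As a rough illustration (not the authors' code, and with purely hypothetical parameter values), the plain LogP model charges a pipelined message sequence as follows; LogPQ refines this by additionally modeling send and receive queues, and by letting the overhead and gap grow with message size:

```python
# Hypothetical LogP parameters (machine-dependent; illustrative only).
L_LAT = 6.0   # L: network latency for one message
O_OVH = 2.0   # o: processor overhead to send or receive one message
G_GAP = 4.0   # g: minimum gap between consecutive message injections

def logp_send_time(num_msgs: int) -> float:
    """Time for one PE to send num_msgs pipelined one-packet messages
    to another PE under plain LogP: the sender can inject at most one
    message per interval max(o, g); the last message then needs L to
    cross the network and o to be received."""
    if num_msgs == 0:
        return 0.0
    injection = O_OVH + (num_msgs - 1) * max(O_OVH, G_GAP)
    return injection + L_LAT + O_OVH

print(logp_send_time(1))  # o + L + o = 10.0
print(logp_send_time(4))  # o + 3g + L + o = 22.0
```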

2004 ◽  
Vol 12 (3) ◽  
pp. 169-183 ◽  
Author(s):  
Alexandros V. Gerbessiotis ◽  
Seung-Yeop Lee

In this work we make a strong case for remote memory access (RMA) as an effective way to program a parallel computer, by proposing a framework that supports RMA in a library-independent, simple, and intuitive way. With our approach, the parallel code one writes runs transparently under MPI-2-enabled libraries as well as bulk-synchronous parallel libraries. The advantages of using RMA are code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs, consisting of a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm, and examining their performance on a Linux-based PC cluster under three different RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the equivalent message-passing code, and in the case of radix-sort substantially more efficient. In addition, our work can be used as a comparative study of the relevant capabilities of the three libraries.
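For illustration, a minimal sequential LSD radix sort (a sketch, not the paper's implementation) shows the kernel that the parallel variants distribute across processors:

```python
def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """LSD radix sort on non-negative integers: each pass is a stable
    counting sort on one digit of bits_per_pass bits. This is the
    sequential kernel underlying parallel radix-sort variants."""
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(mask + 1)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for b in buckets for k in b]
    return keys

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# → [2, 24, 45, 66, 75, 90, 170, 802]
```

A parallel version would counting-sort local blocks per pass and then exchange bucket counts and keys, which is exactly where one-sided RMA operations (e.g. MPI-2 put/get) can replace explicit message passing.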


1992 ◽  
Vol 278 ◽  
Author(s):  
K. M. Nelson ◽  
S. T. Smith ◽  
L. T. Wille

Abstract: We report the results of computer simulations of phase transitions in noble-gas clusters. The calculations were performed on a MasPar MP-1 massively parallel computer with 8,192 processing elements (PEs). We discuss the efficient implementation of molecular dynamics algorithms for small clusters on this type of architecture. The simulations are based on a classical Lennard-Jones pair potential and follow the temporal evolution of the system by numerically integrating Newton's equations of motion with the Gear algorithm. Because the number of particles is much smaller than the number of PEs, optimal partitioning of the processing-element array is an essential and non-trivial task.
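As a hedged sketch (not the authors' MasPar code), the Lennard-Jones force kernel at the heart of such a simulation can be written as follows; the paper integrates the resulting equations of motion with the Gear predictor-corrector, which is omitted here:

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Pairwise Lennard-Jones forces F_i = -dV/dr_i for a small cluster,
    where V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    n = pos.shape[0]
    f = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            r2 = d @ d
            inv6 = (sigma * sigma / r2) ** 3     # (sigma/r)**6
            # force magnitude divided by r: 24*eps*(2*(s/r)^12 - (s/r)^6)/r^2
            fij = 24.0 * eps * (2.0 * inv6 * inv6 - inv6) / r2 * d
            f[i] += fij
            f[j] -= fij
    return f

# Two atoms at the potential minimum r = 2**(1/6)*sigma feel ~zero force.
pos = np.array([[0.0, 0.0, 0.0], [2.0 ** (1 / 6), 0.0, 0.0]])
print(lj_forces(pos))
```

On a SIMD array such as the MP-1, the double loop would instead be spread over the PE grid, which is why partitioning the array among far fewer particles than PEs matters.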


1997 ◽  
Vol 08 (02) ◽  
pp. 143-162 ◽  
Author(s):  
Pascal Berthomé ◽  
Afonso Ferreira

In classical massively parallel computers, the complexity of the interconnection networks is much higher than the complexity of the processing elements themselves. However, emerging optical technologies may provide a way to reconsider very large parallel architectures in which processors communicate by optical means. In this paper, we compare optically interconnected parallel multicomputer models with regard to their communication capabilities. We first establish a classification of such systems, based on the independence of the communication elements embedded in the processors (transmitters and receivers). Then, motivated by the fact that some communication operations in multicomputers must be performed very efficiently, we study two communication problems, namely broadcast and multi-broadcast, under the hypothesis of bounded fanout. Our results also take into account a bounded number of available wavelengths.
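A small back-of-the-envelope sketch (an illustration of the constraint, not the paper's model) of why bounded fanout limits broadcast speed: if each informed processor can forward the datum to at most k others per round, the informed set grows by at most a factor of k + 1 per round:

```python
def broadcast_rounds(n_procs: int, fanout: int) -> int:
    """Minimum number of rounds to broadcast one datum to n_procs
    processors when each informed processor can send to at most
    `fanout` others per round (store-and-forward, no wavelength
    constraints): informed count multiplies by fanout + 1."""
    informed, rounds = 1, 0
    while informed < n_procs:
        informed *= fanout + 1
        rounds += 1
    return rounds

print(broadcast_rounds(1024, 1))  # doubling each round: 10
print(broadcast_rounds(1024, 3))  # quadrupling each round: 5
```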


1995 ◽  
Vol 05 (01) ◽  
pp. 37-48 ◽  
Author(s):  
ARNOLD L. ROSENBERG ◽  
VITTORIO SCARANO ◽  
RAMESH K. SITARAMAN

We propose a design for, and investigate the computational power of, a dynamically reconfigurable parallel computer that we call the Reconfigurable Ring of Processors (RRP, for short). The RRP is a ring of identical processing elements (PEs) interconnected via a flexible multi-line reconfigurable bus, each of whose lines has one-packet width and can be configured, independently of the other lines, to establish an arbitrary PE-to-PE connection. A novel aspect of our design is a communication protocol we call COMET (Cooperative MEssage Transmission), which allows the PEs of an RRP to exchange one-packet messages with latency that is logarithmic in the number of PEs the message passes over in transit. The main contribution of this paper is an algorithm that allows an N-PE, N-line RRP to simulate an N-PE hypercube executing a normal algorithm, with slowdown less than 4 log log N, provided that the local state of a hypercube PE can be encoded and transmitted in a single packet. This simulation yields a rich class of efficient algorithms for the RRP, including algorithms for matrix multiplication, sorting, and the Fast Fourier Transform (often using fewer than N bus lines). The resulting algorithms for the RRP are often within a small constant factor of optimal.
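A toy calculator (illustrative only; the constant factors hidden in COMET's logarithmic latency are omitted) for the two quantitative claims above:

```python
import math

def comet_latency(hops: int) -> float:
    """COMET delivers a one-packet message over `hops` PEs with
    latency logarithmic in hops (constant factors omitted here)."""
    return math.log2(max(hops, 2))

def simulation_slowdown_bound(n_pes: int) -> float:
    """The paper's bound: an N-PE, N-line ring simulates an N-PE
    hypercube running a normal algorithm with slowdown < 4 log log N."""
    return 4.0 * math.log2(math.log2(n_pes))

print(simulation_slowdown_bound(1 << 16))  # 4 * log2(log2(65536)) = 16.0
```

The doubly logarithmic slowdown grows extremely slowly, which is why the simulation transfers a whole library of hypercube algorithms to the ring at nearly constant cost.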


Author(s):  
Robert E. Fulton ◽  
Philip S. Su

Abstract: New massively parallel computer architectures have revolutionized the design of computer algorithms and promise to have a significant influence on algorithms for engineering computations. The traditional global-model parallel method offers limited benefit on massively parallel computers. An alternative is the substructure approach. This paper explores the potential of the substructure strategy through actual examples. Each substructure is mapped onto some processors of a MIMD parallel computer. Within each substructure, the internal node variables are condensed onto the boundary node variables. All substructure computations can proceed in parallel until the global boundary system of equations is formed. A direct solution strategy then yields the global boundary displacements, after which the internal node displacements of each substructure can again be computed in parallel. Examples of two-dimensional static analysis are presented on a BBN Butterfly GP1000 parallel computer.
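A minimal NumPy sketch of the condensation step (illustrative, not the paper's implementation): eliminating internal degrees of freedom yields the boundary Schur complement, and internal displacements are recovered afterwards by back-substitution:

```python
import numpy as np

def condense(K, f, interior, boundary):
    """Condense the internal DOFs of one substructure onto its boundary:
    K_bb' = K_bb - K_bi inv(K_ii) K_ib   (the Schur complement)
    f_b'  = f_b  - K_bi inv(K_ii) f_i"""
    Kii = K[np.ix_(interior, interior)]
    Kib = K[np.ix_(interior, boundary)]
    Kbi = K[np.ix_(boundary, interior)]
    Kbb = K[np.ix_(boundary, boundary)]
    return (Kbb - Kbi @ np.linalg.solve(Kii, Kib),
            f[boundary] - Kbi @ np.linalg.solve(Kii, f[interior]))

def recover_interior(K, f, interior, boundary, u_b):
    """Back-substitute boundary displacements to get internal ones."""
    Kii = K[np.ix_(interior, interior)]
    Kib = K[np.ix_(interior, boundary)]
    return np.linalg.solve(Kii, f[interior] - Kib @ u_b)

# Verify on a small spring-chain stiffness matrix: the condensed
# solve must reproduce the direct solve of the full system.
K = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])
f = np.array([1., 0., 0., 1.])
interior, boundary = [1, 2], [0, 3]
Kc, fc = condense(K, f, interior, boundary)
u_b = np.linalg.solve(Kc, fc)
u_i = recover_interior(K, f, interior, boundary, u_b)
u_full = np.linalg.solve(K, f)
print(np.allclose(u_b, u_full[boundary]), np.allclose(u_i, u_full[interior]))
```

Because each substructure's `condense` touches only its own stiffness block, all condensations can run on different processors at once; only the small boundary system is solved globally.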


Author(s):  
Jose-Maria Carazo ◽  
I. Benavides ◽  
S. Marco ◽  
J.L. Carrascosa ◽  
E.L. Zapata

Obtaining the three-dimensional (3D) structure of negatively stained biological specimens at a resolution of, typically, 2-4 nm is becoming relatively common practice in an increasing number of laboratories. A combination of new conceptual approaches, new software tools, and faster computers has made this situation possible. However, all these 3D reconstruction processes are quite computer-intensive, and the medium-term future holds developments entailing an even greater need for computing power. Up to now, all published 3D reconstructions in this field have been performed on conventional (sequential) computers, but new parallel computer architectures offer the potential of order-of-magnitude increases in computing power and should therefore be considered for the most computing-intensive tasks.

We have studied both shared-memory computer architectures, like the BBN Butterfly, and local-memory architectures, mainly hypercubes implemented on transputers, where we have used the algorithmic mapping method proposed by Zapata et al. In this work we have developed the basic software tools needed to obtain a 3D reconstruction from non-crystalline specimens ("single particles") using the so-called Random Conical Tilt Series Method. We start from a pair of images presenting the same field, first tilted (by ≃55°) and then untilted. It is then assumed that we can supply the system with the image of the particle we are looking for (ideally, a 2D average from a previous study) and with a matrix describing the geometrical relationships between the tilted and untilted fields (this step is currently accomplished by interactively marking a few pairs of corresponding features in the two fields). From here on, the 3D reconstruction process may be run automatically.

