SnB version 2.2: an example of crystallographic multiprocessing

2002 ◽  
Vol 35 (3) ◽  
pp. 374-376 ◽  
Author(s):  
Jason Rappleye ◽  
Martins Innus ◽  
Charles M. Weeks ◽  
Russ Miller

The computer program SnB implements a direct-methods algorithm, known as Shake-and-Bake, which optimizes trial structures consisting of randomly positioned atoms. Although large Shake-and-Bake applications require significant amounts of computing time, the algorithm can be easily implemented in parallel in order to decrease the real time required to achieve a solution. By using a master–worker model, SnB version 2.2 is amenable to all of the prevalent modern parallel-computing platforms, including (i) shared-memory multiprocessor machines, such as the SGI Origin2000, (ii) distributed-memory multiprocessor machines, such as the IBM SP, and (iii) collections of workstations, including Beowulf clusters. A linear speedup in the processing of a fixed number of trial structures can be obtained on each of these platforms.
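The master–worker organization that makes this linear speedup possible can be conveyed with a minimal sketch: the trial structures are independent, so a master simply hands them out to workers and keeps the best result. The Python below is illustrative only, not SnB itself; score_trial is a hypothetical stand-in for one Shake-and-Bake refinement.

    # Minimal master-worker sketch (illustrative, not SnB itself): trial
    # structures are independent, so a pool of workers scales nearly linearly.
    import random
    from multiprocessing import Pool

    def score_trial(seed):
        """Hypothetical stand-in for one Shake-and-Bake refinement of a trial
        structure built from randomly positioned atoms; returns (score, seed)."""
        rng = random.Random(seed)
        atoms = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
        # ... the dual-space refinement cycles would go here ...
        figure_of_merit = sum(x + y + z for x, y, z in atoms)  # placeholder score
        return figure_of_merit, seed

    if __name__ == "__main__":
        n_trials = 1000
        with Pool() as pool:                 # the master hands trials to workers
            results = pool.map(score_trial, range(n_trials))
        best = min(results)                  # keep the best-scoring trial
        print("best figure of merit %.3f from trial %d" % best)

Because each trial is processed independently, adding workers divides the wall-clock time roughly proportionally, which is the linear speedup reported in the abstract.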

2000 ◽  
Vol 8 (1) ◽  
pp. 5-12 ◽  
Author(s):  
John Michalakes

Beginning with the March 1998 release of the Penn State University/NCAR Mesoscale Model (MM5), and continuing through eight subsequent releases up to the present, the official version has run on distributed-memory (DM) parallel computers. Source translation and runtime library support minimize the impact of parallelization on the original model source code, with the result that the majority of code is line-for-line identical with the original version. Parallel performance and scaling are equivalent to earlier, hand-parallelized versions; the modifications have no effect when the code is compiled and run without the DM option. Supported computers include the IBM SP, Cray T3E, Fujitsu VPP, Compaq Alpha clusters, and clusters of PCs (so-called Beowulf clusters). The approach is also compatible with shared-memory parallel directives, allowing distributed-memory/shared-memory hybrid parallelization on distributed-memory clusters of symmetric multiprocessors.
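The distributed-memory option of such a grid-point model rests on a domain decomposition in which each processor owns a patch of the grid and exchanges boundary (halo) values with its neighbors; in MM5 this is hidden behind the source translation and runtime library. The following is only a schematic Python/mpi4py sketch of a one-dimensional halo exchange, not MM5's actual mechanism, and all names are illustrative.

    # Schematic halo exchange for a one-dimensional domain decomposition.
    # Illustrative only: MM5's distributed-memory code is generated by source
    # translation and a runtime library, not hand-written like this.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local = np.full(10, float(rank))       # each rank owns one slab of the grid
    halo_lo = np.empty(1)                  # ghost cell from the lower neighbor
    halo_hi = np.empty(1)                  # ghost cell from the upper neighbor
    lo = rank - 1 if rank > 0 else MPI.PROC_NULL
    hi = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # Exchange boundary values with both neighbors before each time step.
    comm.Sendrecv(local[-1:], dest=hi, recvbuf=halo_lo, source=lo)
    comm.Sendrecv(local[:1], dest=lo, recvbuf=halo_hi, source=hi)

Run with, for example, mpiexec -n 4 python halo.py; a hybrid configuration would additionally thread the work inside each rank's slab.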


2007 ◽  
Vol 7 (6) ◽  
pp. 633-695
Author(s):  
ENRICO PONTELLI ◽  
KAREN VILLAVERDE ◽  
HAI-FENG GUO ◽  
GOPAL GUPTA

This paper describes the development of the PALS system, an implementation of Prolog capable of efficiently exploiting or-parallelism on distributed-memory platforms, specifically Beowulf clusters. PALS makes use of a novel technique, called incremental stack-splitting. The technique proposed builds on the stack-splitting approach, previously described by the authors and experimentally validated on shared-memory systems, which in turn is an evolution of the stack-copying method used in a variety of parallel logic and constraint systems (e.g., MUSE, YAP, and Penny). The PALS system is the first distributed or-parallel implementation of Prolog based on the stack-splitting method ever realized. The results presented confirm the superiority of this method as a simple yet effective technique to transition from shared-memory to distributed-memory systems. PALS extends stack-splitting by combining it with incremental copying; the paper provides a description of the implementation of PALS, including details of how distributed scheduling is handled. We also investigate methodologies to effectively support order-sensitive predicates (e.g., side-effects) in the context of the stack-splitting scheme. Experimental results obtained from running PALS on both shared-memory and Beowulf systems are presented and analyzed.
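The essence of stack-splitting, as opposed to full stack-copying, is that a busy worker gives an idle worker a share of the untried alternatives at each choice point, so both can explore disjoint parts of the or-tree without further communication until the next scheduling event. The toy Python sketch below illustrates only that splitting step; it is not the PALS implementation, and the data structures are purely illustrative.

    # Toy illustration of the stack-splitting idea behind or-parallel Prolog
    # (not PALS itself): untried alternatives at each choice point are divided
    # between the busy worker and an idle worker.
    from collections import deque

    def split_choicepoints(stack):
        """Give away roughly half of the untried alternatives at every choice
        point on the stack; the caller keeps the rest."""
        given_away = []
        for cp in stack:                    # cp: dict with a deque of alternatives
            alts = cp["alternatives"]
            share = deque()
            for _ in range(len(alts) // 2):
                share.append(alts.pop())    # hand off the later alternatives
            given_away.append({"goal": cp["goal"], "alternatives": share})
        return given_away

    busy_stack = [
        {"goal": "p(X)", "alternatives": deque(["clause2", "clause3", "clause4"])},
        {"goal": "q(Y)", "alternatives": deque(["clause2"])},
    ]
    idle_stack = split_choicepoints(busy_stack)
    print(idle_stack)   # the idle worker now explores its own share of the tree

The incremental variant described in the abstract further reduces communication by copying only the portions of the stacks that the receiving worker does not already hold.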


1993 ◽  
Vol 23 (1) ◽  
pp. 159-162
Author(s):  
David R. Cheriton ◽  
Hendrik A. Goosen ◽  
Hugh Holbrook ◽  
Philip Machanick

2021 ◽  
Vol 26 ◽  
pp. 1-67
Author(s):  
Patrick Dinklage ◽  
Jonas Ellert ◽  
Johannes Fischer ◽  
Florian Kurpicz ◽  
Marvin Löbel

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. This technique makes use of the structure of the wavelet tree (refining the characters represented in a node of the tree with increasing depth) in an opposite way, by first computing the leaves (most refined) and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and external memory. Based on these results, we adapt these algorithms to parallel computers, where we address both shared-memory and distributed-memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely based on the information obtained from computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
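A rough Python sketch of the bottom-up idea: the character histogram (the leaves) is computed once, node borders for every level are derived from it, and a single scan per level then fills the bit vectors. This is only meant to convey the flavor of the approach under simplifying assumptions; it is not the authors' exact algorithms, and all names are illustrative.

    # Level-wise wavelet tree construction driven by the leaf histogram
    # (illustrative sketch of the bottom-up idea, not the paper's algorithms).
    import math

    def wavelet_tree_levels(text, sigma):
        levels = max(1, math.ceil(math.log2(sigma)))
        hist = [0] * (1 << levels)          # leaves: histogram of the characters
        for c in text:
            hist[c] += 1
        bits = []
        for level in range(levels):
            shift = levels - level
            node_count = 1 << level
            # Node sizes on this level are sums of the leaf counts below each node.
            sizes = [0] * node_count
            for c, n in enumerate(hist):
                sizes[c >> shift] += n
            # Borders: starting offset of every node in the concatenated bit vector.
            borders = [0] * node_count
            for v in range(1, node_count):
                borders[v] = borders[v - 1] + sizes[v - 1]
            # One scan over the text writes each position's bit into its node.
            bv = [0] * len(text)
            for c in text:
                node = c >> shift
                bv[borders[node]] = (c >> (shift - 1)) & 1
                borders[node] += 1
            bits.append(bv)
        return bits

    print(wavelet_tree_levels([3, 0, 1, 2, 1, 0], sigma=4))

Because every level is filled independently from the precomputed borders, the per-level scans are straightforward to distribute across threads or processes.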


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1316
Author(s):  
Carlos-Ivan Paez-Rueda ◽  
Arturo Fajardo ◽  
Manuel Pérez ◽  
Gabriel Perilla

This paper proposes new closed-form expressions for self-impedance using the Method of Moments with the Point Matching Procedure and piecewise constant and linear basis functions in different configurations, which save computing time in the solution of wire antennas with complex geometries. The new expressions have complexity O(1) with well-defined theoretical error bounds. They were compared with an adaptive numerical integration, obtaining an accuracy of between 7 and 16 digits depending on the chosen basis function and segmentation. In addition, the computing time involved in the calculation of the self-impedance terms was evaluated and compared with the time required by the adaptive quadrature solution of the same problem. The new expressions run between 50 and 200 times faster than the adaptive numerical integration, assuming full computation of all constants in the expressions.
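The trade-off being measured, an O(1) closed form versus adaptive quadrature of the same kernel, can be illustrated generically. The Python sketch below uses a simple potential-type integral with a known antiderivative; it is not one of the paper's expressions, and the parameter values are arbitrary.

    # Generic illustration of closed form vs. adaptive quadrature (not the
    # paper's expressions): a potential-type integral of the kind that appears
    # in thin-wire MoM kernels, evaluated both ways and timed.
    import math
    import timeit
    from scipy.integrate import quad

    a, L = 1e-3, 0.05                      # illustrative wire radius and segment length (m)

    def kernel_quad():
        # Adaptive quadrature of 1/sqrt(x^2 + a^2) over one segment.
        val, _err = quad(lambda x: 1.0 / math.sqrt(x * x + a * a), 0.0, L)
        return val

    def kernel_closed():
        # Closed form of the same integral: asinh(L/a), an O(1) evaluation.
        return math.asinh(L / a)

    print(kernel_quad(), kernel_closed())  # agree to near machine precision
    print("quad   :", timeit.timeit(kernel_quad, number=2000))
    print("closed :", timeit.timeit(kernel_closed, number=2000))

The closed form costs a fixed handful of floating-point operations regardless of accuracy, whereas the adaptive routine repeatedly subdivides the interval, which is the source of the 50- to 200-fold speedups reported in the abstract.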

