Parallelizing RRT on Large-Scale Distributed-Memory Architectures

2013 ◽  
Vol 29 (2) ◽  
pp. 571-579 ◽  
Author(s):  
Didier Devaurs ◽  
Thierry Simeon ◽  
Juan Cortes
2005 ◽  
Vol 15 (3) ◽  
pp. 477-502 ◽  
Author(s):  
EDWARD A. LUKE ◽  
THOMAS GEORGE

We present a rule-based framework for the development of scalable, high-performance parallel simulations for a broad class of scientific applications, with particular emphasis on continuum mechanics. We take a pragmatic approach to our programming abstractions by implementing structures that are used frequently and have common high-performance implementations on distributed-memory architectures. The resulting framework borrows heavily from rule-based systems for relational database models, limiting the scope, however, to those parts that have an obvious high-performance implementation. Using our approach, we demonstrate predictable performance behavior and efficient utilization of large-scale distributed-memory architectures on problems of significant complexity involving multiple disciplines.
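
The rule-based style described in the abstract can be made concrete with a small sketch. The Python example below is not the authors' framework; it is a minimal, hypothetical illustration in which each rule declares the values it consumes and produces, and a scheduler orders the rules by their data dependencies, which is the essence of deriving a computation schedule from declarative rules.

```python
# Minimal, hypothetical sketch of rule-based scheduling: each rule declares
# the facts it consumes and produces, and a scheduler orders rules so that
# every input is computed before it is used. Illustrative only; this is not
# the framework described in the abstract.

def rule(inputs, outputs):
    """Decorator attaching declared inputs/outputs to a rule function."""
    def wrap(fn):
        fn.inputs, fn.outputs = set(inputs), set(outputs)
        return fn
    return wrap

@rule(inputs={"density", "velocity"}, outputs={"momentum"})
def momentum(facts):
    facts["momentum"] = facts["density"] * facts["velocity"]

@rule(inputs={"momentum", "area"}, outputs={"flux"})
def flux(facts):
    facts["flux"] = facts["momentum"] * facts["area"]

def schedule(rules, known):
    """Greedy topological ordering: run any rule whose inputs are known."""
    ordered, pending = [], list(rules)
    while pending:
        ready = [r for r in pending if r.inputs <= known]
        if not ready:
            raise ValueError("unsatisfiable rule dependencies")
        for r in ready:
            ordered.append(r)
            known |= r.outputs
            pending.remove(r)
    return ordered

facts = {"density": 1.2, "velocity": 3.0, "area": 0.5}
for r in schedule([flux, momentum], set(facts)):
    r(facts)
print(facts["flux"])  # 1.8
```

In a production framework the scheduler would also partition the facts across distributed-memory nodes and insert communication; the sketch only shows the dependency-resolution idea.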


1991 ◽  
Vol 2 (2) ◽  
pp. 45-49 ◽  
Author(s):  
Michele Di Santo ◽  
Giulio Iannello

2013 ◽  
Vol 55 (3) ◽  
pp. 571-596 ◽  
Author(s):  
Miles Lubin ◽  
J. A. Julian Hall ◽  
Cosmin G. Petra ◽  
Mihai Anitescu

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 342 ◽
Author(s):  
Alessandro Varsi ◽  
Simon Maskell ◽  
Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear, non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂ N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂ N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂ N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log₂ N)²) approach.
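
To make the bottleneck concrete, the Python sketch below (assumed names, not the paper's code) shows a sequential reference implementation: systematic resampling yields a survivor count per particle, and redistribution then copies each particle into that many output slots. This sequential copy is the step that is hard to parallelize; the paper's O(log₂ N) distributed-memory redistribution is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

def systematic_resample(weights):
    """Systematic resampling: return how many copies of each particle survive."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n   # one stratified draw per slot
    indices = np.searchsorted(np.cumsum(weights), positions)
    indices = np.minimum(indices, n - 1)            # guard against float round-off
    return np.bincount(indices, minlength=n)

def redistribute(particles, counts):
    """Redistribution: copy particle j into counts[j] consecutive output slots.
    Sequential O(N) reference; this is the bottleneck the paper parallelizes
    to O(log2 N) on distributed-memory architectures."""
    return np.repeat(particles, counts, axis=0)

# Toy usage with 8 one-dimensional particles.
particles = np.arange(8.0)
weights = rng.random(8)
weights /= weights.sum()            # normalize to a valid distribution
counts = systematic_resample(weights)
new_particles = redistribute(particles, counts)
assert counts.sum() == len(particles) == len(new_particles)
```

On shared memory, the `np.repeat` copy can be parallelized with a prefix sum over `counts` to compute each particle's output offset; reproducing that O(log₂ N) behavior on distributed memory is the contribution the abstract describes.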

