Parallelizing RRT on Large-Scale Distributed-Memory Architectures

2013 ◽  
Vol 29 (2) ◽  
pp. 571-579 ◽  
Author(s):  
Didier Devaurs ◽  
Thierry Simeon ◽  
Juan Cortes
2005 ◽  
Vol 15 (3) ◽  
pp. 477-502 ◽  
Author(s):  
EDWARD A. LUKE ◽  
THOMAS GEORGE

We present a rule-based framework for the development of scalable, high-performance parallel simulations for a broad class of scientific applications, with particular emphasis on continuum mechanics. We take a pragmatic approach to our programming abstractions by implementing structures that are used frequently and have common high-performance implementations on distributed-memory architectures. The resulting framework borrows heavily from rule-based systems for relational database models, limiting the scope, however, to those parts that have an obvious high-performance implementation. Using our approach, we demonstrate predictable performance behavior and efficient utilization of large-scale distributed-memory architectures on problems of significant complexity involving multiple disciplines.
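
The rule-based style described in the abstract can be made concrete with a small sketch. The Python example below is not the authors' framework; it is a minimal, hypothetical illustration in which each rule declares the values it consumes and produces, and a scheduler orders the rules by their data dependencies, which is the essence of deriving a computation schedule from declarative rules.

```python
# Minimal, hypothetical sketch of rule-based scheduling: each rule declares
# the facts it consumes and produces, and a scheduler orders rules so that
# every input is computed before it is used. Illustrative only; this is not
# the framework described in the abstract.

def rule(inputs, outputs):
    """Decorator attaching declared inputs/outputs to a rule function."""
    def wrap(fn):
        fn.inputs, fn.outputs = set(inputs), set(outputs)
        return fn
    return wrap

@rule(inputs={"density", "velocity"}, outputs={"momentum"})
def momentum(facts):
    facts["momentum"] = facts["density"] * facts["velocity"]

@rule(inputs={"momentum", "area"}, outputs={"flux"})
def flux(facts):
    facts["flux"] = facts["momentum"] * facts["area"]

def schedule(rules, known):
    """Greedy topological ordering: run any rule whose inputs are known."""
    ordered, pending = [], list(rules)
    while pending:
        ready = [r for r in pending if r.inputs <= known]
        if not ready:
            raise ValueError("unsatisfiable rule dependencies")
        for r in ready:
            ordered.append(r)
            known |= r.outputs
            pending.remove(r)
    return ordered

facts = {"density": 1.2, "velocity": 3.0, "area": 0.5}
for r in schedule([flux, momentum], set(facts)):
    r(facts)
print(facts["flux"])  # 1.8
```

In a production framework the scheduler would also partition the facts across distributed-memory nodes and insert communication; the sketch only shows the dependency-resolution idea.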


1991 ◽  
Vol 2 (2) ◽  
pp. 45-49 ◽  
Author(s):  
Michele Di Santo ◽  
Giulio Iannello

2013 ◽  
Vol 55 (3) ◽  
pp. 571-596 ◽  
Author(s):  
Miles Lubin ◽  
J. A. Julian Hall ◽  
Cosmin G. Petra ◽  
Mihai Anitescu

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 342 ◽
Author(s):  
Alessandro Varsi ◽  
Simon Maskell ◽  
Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear, non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂ N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂ N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂ N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log₂ N)²) approach.
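
To make the bottleneck concrete, the Python sketch below (assumed names, not the paper's code) shows a sequential reference implementation: systematic resampling yields a survivor count per particle, and redistribution then copies each particle into that many output slots. This sequential copy is the step that is hard to parallelize; the paper's O(log₂ N) distributed-memory redistribution is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

def systematic_resample(weights):
    """Systematic resampling: return how many copies of each particle survive."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n   # one stratified draw per slot
    indices = np.searchsorted(np.cumsum(weights), positions)
    indices = np.minimum(indices, n - 1)            # guard against float round-off
    return np.bincount(indices, minlength=n)

def redistribute(particles, counts):
    """Redistribution: copy particle j into counts[j] consecutive output slots.
    Sequential O(N) reference; this is the bottleneck the paper parallelizes
    to O(log2 N) on distributed-memory architectures."""
    return np.repeat(particles, counts, axis=0)

# Toy usage with 8 one-dimensional particles.
particles = np.arange(8.0)
weights = rng.random(8)
weights /= weights.sum()            # normalize to a valid distribution
counts = systematic_resample(weights)
new_particles = redistribute(particles, counts)
assert counts.sum() == len(particles) == len(new_particles)
```

On shared memory, the `np.repeat` copy can be parallelized with a prefix sum over `counts` to compute each particle's output offset; reproducing that O(log₂ N) behavior on distributed memory is the contribution the abstract describes.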

