Towards Structured Parallel Computing on Architecture-Independent Parallel Algorithm Design for Distributed-Memory Architectures

Resampling is a well-known statistical algorithm that is commonly applied in Particle Filters (PFs) to perform state estimation for non-linear, non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications increases. Parallel computing can help to address this. However, resampling (and hence PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂ N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂ N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves an O(log₂ N) time complexity. We also present empirical results that indicate that our novel approach outperforms the O((log₂ N)²) approach.
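To make the redistribution step concrete, here is a minimal sequential sketch in Python. It is not the parallel algorithm proposed in the paper: it assumes systematic resampling as the way to decide how many copies each particle receives, and the function names (`systematic_copies`, `redistribute`) are hypothetical. Redistribution then replicates each particle according to its copy count so that the population size stays at N.

```python
def systematic_copies(weights, u):
    """Return ncopies, where ncopies[i] is how often particle i is kept.

    Assumes systematic resampling: the N sample points (u + k) / N,
    k = 0..N-1, with a single offset u in [0, 1), are matched against
    the cumulative normalized weights. The counts always sum to N.
    """
    n = len(weights)
    total = sum(weights)
    ncopies = [0] * n
    i = 0
    cumulative = weights[0] / total
    for k in range(n):
        point = (u + k) / n
        # Advance to the particle whose cumulative weight covers this point.
        while point > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        ncopies[i] += 1
    return ncopies


def redistribute(particles, ncopies):
    """Sequential O(N) redistribution: copy particle i ncopies[i] times."""
    new_particles = []
    for particle, count in zip(particles, ncopies):
        new_particles.extend([particle] * count)
    return new_particles


if __name__ == "__main__":
    counts = systematic_copies([1, 2, 3, 4], u=0.5)
    print(counts)
    print(redistribute(["a", "b", "c", "d"], counts))
```

The sequential loop is trivial; the difficulty the abstract describes arises when the particles are spread over many nodes, because the copies of a heavily weighted particle have to be moved to whichever nodes have free slots.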


Architecture independent parallel algorithm design: theory vs practice

Future Generation Computer Systems ◽

10.1016/s0167-739x(01)00068-1 ◽

2002 ◽

Vol 18 (5) ◽

pp. 573-593 ◽

Cited By ~ 4

Author(s):

Alexandros V. Gerbessiotis

Keyword(s):

Parallel Algorithm ◽

Design Theory ◽

Algorithm Design ◽

Parallel Algorithm Design


A Structured Representation for Parallel Algorithm Design on Multicomputers

The Sixth Distributed Memory Computing Conference, 1991. Proceedings ◽

10.1109/dmcc.1991.633319 ◽

2005 ◽

Author(s):

Xian-He Sun ◽

L.M. Ni

Keyword(s):

Parallel Algorithm ◽

Algorithm Design ◽

Parallel Algorithm Design ◽

Structured Representation


Parallel Algorithm Design for Binary Tree Traversing Sequence Based on Coding

2009 5th International Conference on Wireless Communications, Networking and Mobile Computing ◽

10.1109/wicom.2009.5303612 ◽

2009 ◽

Author(s):

Gejun Zhu ◽

Yuwan Gu ◽

Yuqiang Sun

Keyword(s):

Parallel Algorithm ◽

Binary Tree ◽

Algorithm Design ◽

Parallel Algorithm Design


Parallel Algorithm Design and Performance Evaluation of FDTD on 3 Different Architectures: Cluster, Homogeneous Multicore and Cell/B.E.

2008 10th IEEE International Conference on High Performance Computing and Communications ◽

10.1109/hpcc.2008.85 ◽

2008 ◽

Cited By ~ 2

Author(s):

Meilian Xu ◽

Parimala Thulasiraman

Keyword(s):

Performance Evaluation ◽

Parallel Algorithm ◽

Algorithm Design ◽

And Performance ◽

Parallel Algorithm Design


A highly parallel algorithm to approximate MaxCut on distributed memory architectures

Proceedings of 9th International Parallel Processing Symposium ◽

10.1109/ipps.1995.395922 ◽

2002 ◽

Author(s):

S. Homer ◽

M. Peinado

Keyword(s):

Parallel Algorithm ◽

Distributed Memory ◽

Memory Architectures


Data Parallel and Scheduling Mechanism Based on Petri Nets

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.3264 ◽

2014 ◽

Vol 543-547 ◽

pp. 3264-3267

Author(s):

Wan Feng Dou ◽

Jing Zhao ◽

Kun Yang ◽

Min Xu

Keyword(s):

Parallel Computing ◽

Petri Nets ◽

Distributed Memory ◽

Algorithm Design ◽

Data Partition ◽

Parallel Methods ◽

Node Number ◽

Data Parallel ◽

Total Data ◽

Data Granularity

Data-parallel and task-parallel methods are the basic methods frequently used for algorithm design in parallel computing. The data-parallel method, as its name suggests, partitions the data to be processed into small blocks, taking into account storage and computing capacity: the memory size of a computation node, the number of nodes taking part in the parallel computation, and the total data size. The data dispensing strategy, in turn, must be considered carefully to increase the efficiency of computation. Based on the characteristics of digital terrain analysis, Petri nets are introduced to describe the parallel relationships among data partitions under a data granularity model that considers two computing modes, shared memory and distributed memory, and corresponding scheduling algorithms are proposed for load balancing. The experimental results show that our method is well suited to data partitioning and dispensation, particularly in distributed memory mode.
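The partitioning constraints named in the abstract (node memory size, node count, total data size) can be illustrated with a short Python sketch. This is not the authors' Petri-net model or their scheduling algorithm: the functions `plan_blocks` and `assign_round_robin` and their parameters are hypothetical, and round-robin assignment stands in for a real load-balancing scheduler.

```python
def plan_blocks(total_items, node_count, node_capacity):
    """Split range(total_items) into contiguous (start, stop) blocks.

    Each block holds at most node_capacity items, and there are at least
    node_count blocks when enough items exist, so every node gets work.
    """
    if total_items < 0 or node_count <= 0 or node_capacity <= 0:
        raise ValueError("sizes must be positive (total_items may be 0)")
    if total_items == 0:
        return []
    # Integer ceiling division: fewest blocks that respect the capacity.
    min_blocks = -(-total_items // node_capacity)
    n_blocks = min(max(node_count, min_blocks), total_items)
    base, extra = divmod(total_items, n_blocks)
    blocks = []
    start = 0
    for b in range(n_blocks):
        # The first `extra` blocks take one additional item each.
        size = base + (1 if b < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def assign_round_robin(blocks, node_count):
    """Dispense blocks to nodes in turn; returns {node_id: [blocks]}."""
    assignment = {node: [] for node in range(node_count)}
    for index, block in enumerate(blocks):
        assignment[index % node_count].append(block)
    return assignment


if __name__ == "__main__":
    blocks = plan_blocks(total_items=10, node_count=2, node_capacity=3)
    print(blocks)
    print(assign_round_robin(blocks, node_count=2))
```

Block sizes differ by at most one item, which keeps the per-block work roughly even when every item costs about the same to process.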
