The fast multipole method on parallel clusters, multicore processors, and graphics processing units

Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naïve pairwise summation has $O(N^2)$ computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to $O(N)$ for any specified precision. Here, we present a CUDA-accelerated C++ FMM implementation for multi-particle systems with a $1/r$ potential, as found, e.g., in biomolecular simulations. The algorithm involves several operators that exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime limits the overall performance. We propose, implement, and benchmark three different M2L parallelization approaches. Approach (1) uses Unified Memory to minimize programming and porting effort and achieves decent speedups with little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory accesses and the number of complex multiplications. The result is a compute-bound implementation, i.e., performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA-parallelized FMM is incorporated into the GROMACS molecular dynamics package as an alternative Coulomb solver.
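The quadratic cost of the naïve pairwise summation mentioned in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' implementation: the `Particle` type and the function names `direct_potentials` and `pair_count` are hypothetical, and physical prefactors (such as the Coulomb constant) are omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type for illustration: position and charge (or mass).
struct Particle {
    double x, y, z;
    double q;
};

// Direct (naive) evaluation of the 1/r potential at every particle.
// The nested loop visits each unordered pair exactly once, so the work
// grows as N(N-1)/2, i.e. O(N^2). Prefactors are omitted (assumption).
std::vector<double> direct_potentials(const std::vector<Particle>& p) {
    const std::size_t n = p.size();
    std::vector<double> phi(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            const double inv_r = 1.0 / std::sqrt(dx * dx + dy * dy + dz * dz);
            phi[i] += p[j].q * inv_r;  // contribution of j at i
            phi[j] += p[i].q * inv_r;  // contribution of i at j (same distance)
        }
    }
    return phi;
}

// Number of pair interactions the loop above evaluates.
std::size_t pair_count(std::size_t n) { return n * (n - 1) / 2; }
```

Doubling the particle count roughly quadruples `pair_count`, which is the scaling the FMM is designed to avoid.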


Numerical Synthesis of Translation Operators for the Multi-Level Fast Multipole Method

2020 14th European Conference on Antennas and Propagation (EuCAP), 2020. DOI: 10.23919/eucap48036.2020.9135193

Author(s): Arslan Azhar, Thomas F. Eibert

Keyword(s): Fast Multipole Method, Fast Multipole, Multipole Method, Translation Operators, Multi Level, Numerical Synthesis


Another preprocessing algorithm for generalized one-dimensional fast multipole method

Journal of Computational Physics, 2004, Vol 195 (2), pp. 790-803. DOI: 10.1016/j.jcp.2003.10.018. Cited By ~ 4

Author(s): Reiji Suda, Shingo Kuriyama

Keyword(s): Fast Multipole Method, Fast Multipole, One Dimensional, Multipole Method


Application of the Inverse Fast Multipole Method to Antenna Array Analysis

2020 IEEE International Symposium on Antennas and Propagation and North American Radio Science Meeting, 2020. DOI: 10.1109/ieeeconf35879.2020.9329965

Author(s): Keshav Sewraj, Matthys M. Botha

Keyword(s): Antenna Array, Fast Multipole Method, Fast Multipole, Array Analysis, Multipole Method


Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units

Concurrency and Computation Practice and Experience, 2021. DOI: 10.1002/cpe.6515

Author(s): José I. Aliaga, Hartwig Anzt, Thomas Grützmacher, Enrique S. Quintana‐Ortí, Andrés E. Tomás

Keyword(s): Load Balancing, Graphics Processing Units, Sparse Matrix, Multicore Processors, Vector Product, Graphics Processing, Matrix Vector


Compression of Far-Fields in the Fast Multipole Method via Tucker Decomposition

IEEE Transactions on Antennas and Propagation, 2021, pp. 1-1. DOI: 10.1109/tap.2021.3070113

Author(s): Cheng Qian, Mingyu Wang, Abdulkadir C. Yucel

Keyword(s): Fast Multipole Method, Tucker Decomposition, Fast Multipole, Far Fields, Multipole Method


Fast multipole method for 3-D Poisson-Boltzmann equation in layered electrolyte-dielectric media

Journal of Computational Physics, 2021, pp. 110379. DOI: 10.1016/j.jcp.2021.110379

Author(s): Bo Wang, Wenzhong Zhang, Wei Cai

Keyword(s): Boltzmann Equation, Fast Multipole Method, Fast Multipole, Dielectric Media, Multipole Method, Poisson Boltzmann, Poisson Boltzmann Equation


A Fourier-series-based kernel-independent fast multipole method

Journal of Computational Physics, 2011, Vol 230 (15), pp. 5807-5821. DOI: 10.1016/j.jcp.2011.03.049. Cited By ~ 6

Author(s): Bo Zhang, Jingfang Huang, Nikos P. Pitsianis, Xiaobai Sun

Keyword(s): Fourier Series, Fast Multipole Method, Fast Multipole, Multipole Method


The Fast Multipole Method in Canonical Ensemble Dynamics on Massively Parallel Computers

MRS Proceedings, 1992, Vol 278. DOI: 10.1557/proc-278-9. Cited By ~ 2

Author(s): Steven R. Lustig, J.J. Cristy, D.A. Pensak

Keyword(s): Canonical Ensemble, Yukawa Potential, Direct Method, Fast Multipole Method, Parallel Computers, Massively Parallel, Fast Multipole, Massively Parallel Computers, Multipole Method, Execution Times

Abstract: The fast multipole method (FMM) is implemented in canonical ensemble particle simulations to compute non-bonded interactions efficiently with explicit error control. Multipole and local expansions have been derived to implement the FMM efficiently in Cartesian coordinates for soft-sphere (inverse power law), Lennard-Jones, Morse and Yukawa potential functions. Significant reductions in execution times have been achieved with respect to the direct method. For a given number, N, of particles, the execution times of the direct method scale as $O(N^2)$. The FMM execution times scale as $O(N)$ on sequential workstations and vector processors and asymptotically as $O(\log N)$ on massively parallel computers. Connection Machine CM-2 and WAVETRACER-DTC parallel FMM implementations execute faster than the Cray-YMP vectorized FMM for ensemble sizes larger than 28k and 35k, respectively. For 256k particle ensembles, the CM-2 parallel FMM is 12 times faster than the Cray-YMP vectorized direct method and 2.2 times faster than the vectorized FMM. For 256k particle ensembles, the WAVETRACER-DTC parallel FMM is 33 times faster than the Cray-YMP vectorized direct method.
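The idea of a Cartesian multipole expansion referred to in this abstract can be sketched in C++ as follows. The sketch rests on assumptions that are not from the paper: it uses the plain $1/r$ kernel rather than the soft-sphere, Lennard-Jones, Morse, or Yukawa potentials, it truncates after the dipole term, and all type and function names are hypothetical. It replaces a cluster of sources by its total charge $Q$ and dipole moment $\mathbf{d}$ about a chosen center, so that a distant target at offset $\mathbf{R}$ sees approximately $Q/|\mathbf{R}| + (\mathbf{d}\cdot\mathbf{R})/|\mathbf{R}|^3$.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical source type: position and charge.
struct Charge {
    Vec3 r;
    double q;
};

// Truncated Cartesian multipole moments of a cluster about `center`.
struct Moments {
    Vec3 center;
    double monopole;  // Q = sum of q_i
    Vec3 dipole;      // d = sum of q_i * (r_i - center)
};

// One pass over the sources accumulates the monopole and dipole moments.
Moments build_moments(const std::vector<Charge>& src, const Vec3& center) {
    Moments m{center, 0.0, {0.0, 0.0, 0.0}};
    for (const Charge& c : src) {
        m.monopole += c.q;
        for (int k = 0; k < 3; ++k) m.dipole[k] += c.q * (c.r[k] - center[k]);
    }
    return m;
}

// Far-field 1/r potential from the moments: Q/R + (d . R)/R^3.
// Only meaningful when the target is well separated from the cluster.
double far_field_potential(const Moments& m, const Vec3& target) {
    double R2 = 0.0, dR = 0.0;
    for (int k = 0; k < 3; ++k) {
        const double Rk = target[k] - m.center[k];
        R2 += Rk * Rk;
        dR += m.dipole[k] * Rk;
    }
    const double R = std::sqrt(R2);
    return m.monopole / R + dR / (R2 * R);
}

// Reference: direct summation over all sources for the same target.
double direct_potential(const std::vector<Charge>& src, const Vec3& target) {
    double phi = 0.0;
    for (const Charge& c : src) {
        double r2 = 0.0;
        for (int k = 0; k < 3; ++k) {
            const double dk = target[k] - c.r[k];
            r2 += dk * dk;
        }
        phi += c.q / std::sqrt(r2);
    }
    return phi;
}
```

Once the moments are built, each distant target costs a constant amount of work regardless of how many sources the cluster contains. Comparing `far_field_potential` against `direct_potential` for a given separation shows the truncation error, which a full FMM controls by carrying the expansion to higher order.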
