The fast multipole method on parallel clusters, multicore processors, and graphics processing units

2011 ◽  
Vol 339 (2-3) ◽  
pp. 185-193 ◽  
Author(s):  
Eric Darve ◽  
Cris Cecka ◽  
Toru Takahashi
Author(s):  
Bartosz Kohnke ◽  
Carsten Kutzner ◽  
Andreas Beckmann ◽  
Gert Lube ◽  
Ivo Kabadshow ◽  
...  

Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naïve pairwise summation has O(N^2) computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to O(N) for any specified precision. Here, we present a CUDA-accelerated C++ FMM implementation for multi-particle systems with a 1/r potential, as found, e.g., in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime limits the overall performance. We propose, implement, and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups for little implementation effort. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e. performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA-parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.
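To make the cost of the naïve pairwise summation concrete, the following is a minimal CPU-side sketch in C++ of direct summation for a 1/r potential. It is not taken from the paper's code: the `Particle` struct and the `direct_potential` function are hypothetical names chosen for illustration, and units and prefactors are omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type for this sketch: position plus charge (or mass).
struct Particle {
    double x, y, z;
    double q;
};

// Direct pairwise summation of the 1/r potential (illustrative only).
// Returns phi[i] = sum over j != i of q_j / |r_i - r_j|.
// Every pair (i, j) is visited once, so the cost grows as O(N^2).
std::vector<double> direct_potential(const std::vector<Particle>& p) {
    std::vector<double> phi(p.size(), 0.0);
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = i + 1; j < p.size(); ++j) {
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            const double inv_r = 1.0 / std::sqrt(dx * dx + dy * dy + dz * dz);
            // The pair contributes symmetrically to both particles.
            phi[i] += p[j].q * inv_r;
            phi[j] += p[i].q * inv_r;
        }
    }
    return phi;
}
```

Doubling the number of particles roughly quadruples the number of pair evaluations in this loop, which is the scaling the FMM is designed to avoid.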


2011 ◽  
Vol 230 (15) ◽  
pp. 5807-5821 ◽  
Author(s):  
Bo Zhang ◽  
Jingfang Huang ◽  
Nikos P. Pitsianis ◽  
Xiaobai Sun

1992 ◽  
Vol 278 ◽  
Author(s):  
Steven R. Lustig ◽  
J.J. Cristy ◽  
D.A. Pensak

The fast multipole method (FMM) is implemented in canonical ensemble particle simulations to compute non-bonded interactions efficiently with explicit error control. Multipole and local expansions have been derived to implement the FMM efficiently in Cartesian coordinates for soft-sphere (inverse power law), Lennard-Jones, Morse and Yukawa potential functions. Significant reductions in execution times have been achieved with respect to the direct method. For a given number, N, of particles the execution times of the direct method scale as O(N^2). The FMM execution times scale as O(N) on sequential workstations and vector processors and asymptotically as O(log N) on massively parallel computers. Connection Machine CM-2 and WAVETRACER-DTC parallel FMM implementations execute faster than the Cray-YMP vectorized FMM for ensemble sizes larger than 28k and 35k, respectively. For 256k particle ensembles the CM-2 parallel FMM is 12 times faster than the Cray-YMP vectorized direct method and 2.2 times faster than the vectorized FMM. For 256k particle ensembles the WAVETRACER-DTC parallel FMM is 33 times faster than the Cray-YMP vectorized direct method.
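The idea of a Cartesian multipole expansion with controllable error can be sketched as follows. This is only an illustration under stated assumptions: it uses the 1/r potential rather than the soft-sphere, Lennard-Jones, Morse or Yukawa potentials treated in the abstract, it truncates after the dipole term, and the names `Source`, `Multipole`, `build_multipole`, `evaluate_far` and `evaluate_direct` are hypothetical. A full FMM adds higher-order terms, local expansions and a tree hierarchy.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical source type for this sketch: position and charge.
struct Source {
    Vec3 pos;
    double q;
};

// Cartesian multipole moments of a cluster, truncated after the dipole.
struct Multipole {
    Vec3 center;
    double monopole;  // Q = sum of q_j
    Vec3 dipole;      // d = sum of q_j * (s_j - center)
};

// Accumulate the moments of all sources about the chosen center.
Multipole build_multipole(const std::vector<Source>& src, const Vec3& center) {
    Multipole m{center, 0.0, Vec3{0.0, 0.0, 0.0}};
    for (const Source& s : src) {
        m.monopole += s.q;
        for (int k = 0; k < 3; ++k) {
            m.dipole[k] += s.q * (s.pos[k] - center[k]);
        }
    }
    return m;
}

// Far-field approximation: phi(t) ~ Q / R + (d . R) / R^3, with R = t - center.
// The neglected terms are of relative order (a / R)^2, where a is the cluster radius.
double evaluate_far(const Multipole& m, const Vec3& target) {
    double r2 = 0.0;
    double d_dot_r = 0.0;
    for (int k = 0; k < 3; ++k) {
        const double rk = target[k] - m.center[k];
        r2 += rk * rk;
        d_dot_r += m.dipole[k] * rk;
    }
    const double r = std::sqrt(r2);
    return m.monopole / r + d_dot_r / (r2 * r);
}

// Reference: exact sum over all sources, used to measure the truncation error.
double evaluate_direct(const std::vector<Source>& src, const Vec3& target) {
    double phi = 0.0;
    for (const Source& s : src) {
        double r2 = 0.0;
        for (int k = 0; k < 3; ++k) {
            const double rk = target[k] - s.pos[k];
            r2 += rk * rk;
        }
        phi += s.q / std::sqrt(r2);
    }
    return phi;
}
```

Once the moments are built, each distant target costs a fixed amount of work regardless of how many sources the cluster contains. Comparing `evaluate_far` against `evaluate_direct` for a given cluster radius and target distance shows how the truncation error shrinks as the target moves away or as more terms are retained.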

