Billion Degree-of-Freedom Granular Dynamics Simulation on Commodity Hardware

Author(s):  
Conlain Kelly ◽  
Nicholas Olsen ◽  
Dan Negrut

Abstract: This study describes the implementation of a granular dynamics solver designed to run on Graphics Processing Units (GPUs). The discussion concentrates on how the Discrete Element Method (DEM) has been mapped onto the GPU architecture, the software design decisions involved in the process, and the optimizations those decisions allow. The solver, called Chrono::Granular, has been developed as a standalone library that can interface with other dynamics engines via triangle-mesh co-simulation. A scaling analysis of the code presented herein demonstrates linear scaling for problem sizes of over two billion degrees of freedom, approaching one billion bodies. We conclude with a study of hourglass (hopper) mass discharge rates that compares the solver against experimental results and investigates a procedure for determining empirical flow-rate coefficients through simulation.
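The DEM time stepping the abstract refers to can be illustrated in miniature. The sketch below is a serial, single-precision-agnostic toy (a linear spring-dashpot contact model; it is not Chrono::Granular's GPU implementation, and all names and parameter values are illustrative):

```python
import numpy as np

def dem_step(pos, vel, radius, mass, k=1e4, c=5.0, dt=1e-4):
    """One explicit step of a minimal spring-dashpot DEM model.

    pos, vel: (N, 3) arrays; radius, mass: scalars (monodisperse particles).
    k: normal contact stiffness, c: normal damping coefficient (illustrative values).
    """
    n = len(pos)
    force = np.zeros_like(pos)
    # All-pairs contact detection; real solvers use spatial binning on the GPU.
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[j] - pos[i]
            dist = np.linalg.norm(d)
            overlap = 2.0 * radius - dist
            if overlap > 0.0:  # particles in contact
                normal = d / dist
                rel_vn = np.dot(vel[j] - vel[i], normal)
                # Linear spring pushes particles apart; dashpot damps approach.
                fn = (k * overlap - c * rel_vn) * normal
                force[i] -= fn
                force[j] += fn
    # Semi-implicit Euler: update velocity first, then position.
    vel = vel + dt * force / mass
    pos = pos + dt * vel
    return pos, vel
```

The O(N²) pair loop is exactly what a GPU solver replaces with spatial subdivision and one-thread-per-contact kernels to reach the billion-body scales discussed above.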

Author(s):  
Subhajit Sanfui ◽  
Deepak Sharma

Abstract: This paper presents an efficient strategy for performing the assembly stage of finite element analysis (FEA) on general-purpose graphics processing units (GPUs). The strategy divides the assembly task between symbolic and numeric kernels, thereby reducing the complexity of the standard single-kernel assembly approach. Two sparse storage formats based on the proposed strategy are also developed by modifying existing sparse storage formats to remove the degree-of-freedom-based redundancies in the global matrix. The inherent problem of race conditions is resolved through coloring and atomic operations. The proposed strategy is compared with state-of-the-art GPU-based and CPU-based assembly techniques. These comparisons reveal significant benefits in reduced storage requirements, shorter execution times, and higher performance (GFLOPS). Moreover, using the proposed strategy, the coloring method is found to be more effective than the atomics-based method for both the existing and the modified storage formats.
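The coloring approach mentioned above works by partitioning elements so that no two elements of the same color touch a shared degree of freedom; each color can then be assembled in parallel without atomics. A minimal greedy coloring sketch (illustrative only; the paper's actual GPU scheme and storage formats are not reproduced here):

```python
def color_elements(elements):
    """Greedy element coloring for race-free parallel FEA assembly.

    elements: list of tuples of node (DOF) indices per element.
    Returns one color per element such that elements sharing a node
    never share a color, so each color group can be assembled
    concurrently without write conflicts.
    """
    node_colors = {}  # node -> set of colors already used by incident elements
    colors = []
    for elem in elements:
        used = set()
        for node in elem:
            used |= node_colors.setdefault(node, set())
        c = 0
        while c in used:  # smallest color not used by any neighbor
            c += 1
        colors.append(c)
        for node in elem:
            node_colors[node].add(c)
    return colors
```

The trade-off the paper quantifies: coloring needs this preprocessing pass but avoids the serialization cost of atomic adds into the global matrix.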


2020 ◽  
Author(s):  
Samuel C. Gill ◽  
David Mobley

Sampling multiple binding modes of a ligand in a single molecular dynamics simulation is difficult. A given ligand may have many internal degrees of freedom, along with many different ways it might orient itself in a binding site or across several binding sites, all of which might be separated by large energy barriers. We have developed a novel Monte Carlo move called Molecular Darting (MolDarting) to reversibly sample between predefined binding modes of a ligand. Here, we couple this with nonequilibrium candidate Monte Carlo (NCMC) to improve the acceptance of moves.

We apply this technique to a simple dipeptide system, a ligand binding to T4 Lysozyme L99A, and a ligand binding to HIV integrase in order to test this new method. We observe significant increases in acceptance compared to uniformly sampling the internal and rotational/translational degrees of freedom in these systems.
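The NCMC acceptance criterion that underlies this approach can be stated compactly: a proposal driven by a nonequilibrium protocol is accepted with probability min(1, exp(-w/kT)), where w is the protocol work. A bare-bones sketch (generic Metropolis-style acceptance, not the authors' MolDarting implementation; the function name and interface are hypothetical):

```python
import math
import random

def accept_ncmc(work, kT=1.0, rng=random.random):
    """NCMC-style acceptance test.

    work: accumulated protocol work of the nonequilibrium proposal
          (in the same units as kT).
    Returns True if the move is accepted with probability
    min(1, exp(-work / kT)).
    """
    return rng() < min(1.0, math.exp(-work / kT))
```

Favorable moves (negative work) are always accepted; unfavorable ones are accepted with exponentially decaying probability, which is what lets the gradual NCMC switching outperform instantaneous darts between modes.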


Author(s):  
Steven J. Lind ◽  
Benedict D. Rogers ◽  
Peter K. Stansby

This paper presents a review of the progress of smoothed particle hydrodynamics (SPH) towards high-order converged simulations. As a mesh-free Lagrangian method suitable for complex flows with interfaces and multiple phases, SPH has developed considerably in the past decade. While original applications were in astrophysics, early engineering applications showed the versatility and robustness of the method without emphasis on accuracy and convergence. The early method was of weakly compressible form resulting in noisy pressures due to spurious pressure waves. This was effectively removed in the incompressible (divergence-free) form which followed; since then the weakly compressible form has been advanced, reducing pressure noise. Now numerical convergence studies are standard. While the method is computationally demanding on conventional processors, it is well suited to massively parallel computing and graphics processing units. Applications are diverse and encompass wave–structure interaction, geophysical flows due to landslides, nuclear sludge flows, welding, gearbox flows and many others. In the state of the art, convergence is typically between the first- and second-order theoretical limits. Recent advances are improving convergence to fourth order (and higher) and these will also be outlined. This can be necessary to resolve multi-scale aspects of turbulent flow.
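The core SPH operation behind all of the variants surveyed is kernel summation: field values are interpolated from particles via a compactly supported smoothing kernel. A minimal 1D density-summation sketch using the standard cubic spline kernel (illustrative; the review's methods are far more elaborate):

```python
import numpy as np

def cubic_spline_1d(q, h):
    """1D cubic spline kernel W(q) with q = |r|/h, support 2h,
    normalized so that the kernel integrates to 1 over the line."""
    sigma = 2.0 / (3.0 * h)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    elif q < 2.0:
        return sigma * 0.25 * (2.0 - q)**3
    return 0.0

def sph_density(x, mass, h):
    """Summation density: rho_i = sum_j m_j W(|x_i - x_j| / h).

    x: 1D particle positions; mass: particle mass (uniform here);
    h: smoothing length. O(N^2) for clarity; real codes use
    neighbor lists, which is where GPUs pay off.
    """
    rho = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(len(x)):
            rho[i] += mass * cubic_spline_1d(abs(x[i] - x[j]) / h, h)
    return rho
```

For a uniform particle spacing dx with mass rho0*dx, the summation recovers rho0 away from boundaries to within a few percent, which is the zeroth-order consistency that the higher-order schemes discussed in the review improve upon.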


2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aimed at carefully reducing the working precision in order to speed up computations. For algorithms whose performance is bound by memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm's output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner, which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard but also customized formats that adapt the lengths of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
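The per-block precision selection can be sketched in a few lines: invert each diagonal block, pick a storage format from the block's conditioning, and convert back to working precision on application. This is a NumPy toy limited to IEEE fp16/fp32/fp64 (the condition-number thresholds and function names are illustrative assumptions, not Ginkgo's actual criteria or API, and Ginkgo additionally supports non-IEEE custom formats):

```python
import numpy as np

def choose_precision(block, tau32=1e4, tau16=1e2):
    """Pick a storage format for an inverted diagonal block based on
    its condition number (thresholds are illustrative)."""
    kappa = np.linalg.cond(block)
    if kappa < tau16:
        return np.float16
    if kappa < tau32:
        return np.float32
    return np.float64

def adaptive_block_jacobi(A, block_size):
    """Invert each diagonal block of A (n divisible by block_size
    assumed) and store it in the per-block selected precision."""
    inv_blocks = []
    for s in range(0, A.shape[0], block_size):
        blk = A[s:s + block_size, s:s + block_size]
        inv_blocks.append(np.linalg.inv(blk).astype(choose_precision(blk)))
    return inv_blocks

def apply_preconditioner(inv_blocks, r, block_size):
    """z = M^{-1} r: each stored block is promoted back to the
    working precision (fp64) before the matrix-vector product."""
    z = np.empty_like(r)
    for k, blk in enumerate(inv_blocks):
        s = k * block_size
        z[s:s + block_size] = blk.astype(np.float64) @ r[s:s + block_size]
    return z
```

Memory traffic for reading the preconditioner shrinks by up to 4x for well-conditioned blocks, which is the source of the runtime savings in a bandwidth-bound iterative solve.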

