Addressing the challenges of standalone multi-core simulations in molecular dynamics

2017 · Vol 2 (7)
Author(s): R.O. Ocaya, J.J. Terblans

Computational modelling in materials science involves mathematical abstractions of force fields between particles, with the aim of postulating, developing and understanding materials by simulation. The aggregated pairwise interactions of the material’s particles lead to a deduction of its macroscopic behaviours. For practically meaningful macroscopic scales, a large amount of data is generated, leading to vast execution times. Simulation times of hours, days or weeks for moderately sized problems are not uncommon. The reduction of simulation times, improved result accuracy and the associated software and hardware engineering challenges are the main motivations for much of the ongoing research in the computational sciences. This contribution is concerned mainly with simulations that can be done on a “standalone” computer using Message Passing Interface (MPI) parallel code, running on hardware platforms with wide-ranging specifications, such as single- and multi-processor, multi-core machines, with minimal reconfiguration for upward scaling of computational power. The widely available, documented and standardized MPI library provides this functionality through functions such as MPI_Comm_size(), MPI_Comm_rank() and MPI_Reduce(). A survey of the literature shows that relatively little has been written on the efficient extraction of the inherent computational power in a cluster. In this work, we discuss the main avenues available to tap into this extra power without compromising computational accuracy. We also present methods to overcome the high inertia encountered in single-node-based computational molecular dynamics. We begin by surveying the current state of the art and discuss what it takes to achieve parallelism, efficiency and enhanced computational accuracy through program threads and message passing interfaces. Several code illustrations are given. The pros and cons of writing raw code, as opposed to using heuristic third-party code, are also discussed. The growing trend towards graphics processing units and virtual computing clouds for high-performance computing is also discussed. Finally, we present the comparative results of vacancy formation energy calculations using our own parallelized standalone code, called Verlet–Stormer velocity (VSV), operating on 30,000 copper atoms. The code is based on the Sutton–Chen implementation of the Finnis–Sinclair pairwise embedded-atom potential. A link to the code is also given.
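
To make the three MPI calls named above concrete, the following minimal C sketch (our illustration, not the authors' VSV code; pair_energy() is a hypothetical placeholder for a Sutton–Chen-type kernel) partitions a pairwise-interaction loop cyclically across ranks and combines the partial sums with MPI_Reduce():

/* Minimal MPI sketch: distribute a pairwise-sum workload across ranks
 * and combine the partial results on rank 0. Illustrative only. */
#include <mpi.h>
#include <stdio.h>

#define N_ATOMS 4096  /* kept small for a quick run; the paper uses 30,000 */

/* Hypothetical stand-in for a pair-potential evaluation. */
static double pair_energy(int i, int j) {
    return 1.0 / (1.0 + (double)(i + j));
}

int main(int argc, char **argv) {
    int rank, size;
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's index */

    /* Cyclic decomposition of the outer loop over atoms. */
    for (int i = rank; i < N_ATOMS; i += size)
        for (int j = i + 1; j < N_ATOMS; j++)
            local += pair_energy(i, j);

    /* Sum the per-rank partial energies onto rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total pair energy: %f\n", total);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with, e.g., mpirun -np 4, each rank handles every size-th atom of the outer loop, so the reduction recovers the full pairwise sum regardless of the process count.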

Author(s): Vladimir Stegailov, Ekaterina Dlinnova, Timur Ismagilov, Mikhail Khalilov, Nikolay Kondratyuk, ...

In this article, we describe the Desmos supercomputer, which consists of 32 hybrid nodes connected by a low-latency, high-bandwidth Angara interconnect with torus topology. This supercomputer is aimed at cost-effective classical molecular dynamics calculations. Desmos serves as a test bed for the Angara interconnect, which supports 3-D and 4-D torus network topologies, and verifies its ability to unite massively parallel programming systems, effectively speeding up Message Passing Interface (MPI)-based applications. We describe the Angara interconnect and present typical MPI benchmarks. Desmos benchmark results for GROMACS, LAMMPS, VASP and CP2K are compared with the data for other high-performance computing (HPC) systems. We also consider job scheduling statistics from several months of Desmos deployment.
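
A typical point-to-point measurement of the kind referred to above is a ping-pong test. The following minimal C sketch (our illustration, not the Desmos benchmark suite) bounces a small message between ranks 0 and 1 and reports the average one-way latency; it must be launched with at least two ranks:

/* Minimal MPI ping-pong latency sketch; run with >= 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_BYTES 8
#define REPS      1000

int main(int argc, char **argv) {
    int rank;
    char buf[MSG_BYTES];
    memset(buf, 0, sizeof buf);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)  /* one-way latency = round trip / 2 */
        printf("avg one-way latency: %g us\n",
               (t1 - t0) / (2.0 * REPS) * 1e6);

    MPI_Finalize();
    return 0;
}

Sweeping MSG_BYTES upward turns the same loop into a bandwidth measurement.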


Physchem · 2021 · Vol 1 (3) · pp. 243-249
Author(s): William A. Alexander

Contemplating what will unfold in this new decade and those after, it is not difficult to imagine the increasing importance of conservation and protection of clean water supplies. A worrying but predictable offshoot of humanity’s technological advances is the seemingly ever-increasing chemical load burdening our waterways. This perspective presents a few modest areas where computational chemistry modelling could benefit these efforts by harnessing the continually improving computational power available to the field. In the acute event of a chemical spill incident, true quantum-chemistry-based predictions of physicochemical properties and surface-binding behaviors can be used to support decision making in remediating the spill threat. The chronic burdens of microplastics and perfluorinated “forever chemicals” can also be addressed with computational modelling to fill the gap between feasible laboratory experiment timescales and the much longer material lifetimes. For all of these systems, field-level accuracy will become attainable as the model computational systems are able to incorporate more of the realistic features relevant to water quality issues.


2020 · Vol 245 · pp. 09016
Author(s): Maria Alandes Pradillo, Nils Høimyr, Pablo Llopis Sanmillan, Markus Tapani Jylhänkangas

The CERN IT department has been maintaining different High Performance Computing (HPC) services over the past five years. While the bulk of the computing facilities at CERN run under Linux, a Windows cluster was dedicated to engineering simulations and analysis related to accelerator technology development. The Windows cluster consisted of machines with powerful CPUs, large memory, and a low-latency interconnect. The Linux cluster resources are accessible through HTCondor and are used for general-purpose, parallel but single-node jobs, providing computing power to the CERN experiments and departments for tasks such as physics event reconstruction, data analysis, and simulation. For HPC workloads that require multi-node parallel environments for Message Passing Interface (MPI)-based programs, there is another Linux-based HPC service that comprises several clusters running under the Slurm batch system and consists of powerful hardware with low-latency interconnects. In 2018, it was decided to consolidate compute-intensive jobs on Linux to make better use of the existing resources. Moreover, this was also in line with the CERN IT strategy of reducing its dependencies on Microsoft products. This paper focuses on the migration of Ansys [1], COMSOL [2] and CST [3] users from Windows HPC to Linux clusters. Ansys, COMSOL and CST are three engineering applications used at CERN in different domains, such as multiphysics simulations and electromagnetic field problems. Users of these applications are in different departments, with different needs and levels of expertise. In most cases, the users have no prior knowledge of Linux. The paper presents the technical strategy that allows the engineering users to submit their simulations to the appropriate Linux cluster, depending on their simulation requirements. We also describe the technical solution that integrates their Windows workstations so that they can submit jobs to the Linux clusters. Finally, we discuss the challenges and lessons learnt during the migration.


Author(s): Héctor Martínez, Sergio Barrachina, Maribel Castillo, Joaquín Tárraga, Ignacio Medina, ...

The advances in genomic sequencing during the past few years have motivated the development of fast and reliable software for DNA/RNA sequencing on current high-performance architectures. Most of these efforts target multicore processors, only a few can also exploit graphics processing units, and a much smaller set runs on clusters equipped with any of these multi-threaded architecture technologies. Furthermore, the examples that can be used on clusters today are all strongly coupled to a particular aligner. In this paper we introduce an alignment framework that can be leveraged to run any “single-node” aligner in a coordinated fashion, taking advantage of the resources of a cluster without having to modify any portion of the original software. The key to this transparent migration lies in hiding the complexity associated with multi-node execution (such as coordinating the processes running on the cluster nodes) inside the generic-aligner framework. Moreover, following the design and operation of our Message Passing Interface (MPI) version of HPG Aligner RNA BWT, we organize the framework into two stages so that a different aligner can be executed in each of them. With this configuration, for example, the first stage can apply a fast aligner to accelerate the process, while the second can be tuned to act as a refinement stage that further improves the global alignment at little cost.
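
One simple way to carve a cluster into the two stages described above is MPI's communicator splitting; the sketch below (ours, and deliberately simplified relative to whatever coordination the framework actually performs) assigns the first half of the ranks to stage 0 and the rest to stage 1:

/* Sketch: split MPI_COMM_WORLD into two rank groups, one per pipeline
 * stage (e.g., fast first pass vs. refinement). Illustrative only. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int world_rank, world_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* First half of the ranks -> stage 0, second half -> stage 1. */
    int stage = (world_rank < world_size / 2) ? 0 : 1;

    MPI_Comm stage_comm;
    MPI_Comm_split(MPI_COMM_WORLD, stage, world_rank, &stage_comm);

    int stage_rank;
    MPI_Comm_rank(stage_comm, &stage_rank);
    printf("world rank %d -> stage %d, stage rank %d\n",
           world_rank, stage, stage_rank);

    /* Each group would now run its own aligner over its share of reads,
     * with stage 0 forwarding still-unaligned reads to stage 1. */

    MPI_Comm_free(&stage_comm);
    MPI_Finalize();
    return 0;
}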


2020 · Vol 15
Author(s): Weiwen Zhang, Long Wang, Theint Theint Aye, Juniarto Samsudin, Yongqing Zhu

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole-genome sequencing. However, genotype imputation is computation-intensive, and it therefore remains a challenge to satisfy the high-performance requirements of genome-wide association studies (GWAS).
Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance.
Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. Owing to this multi-level design, we can exploit multi-machine/multi-core architectures to improve the performance of genotype imputation.
Results: Experimental results show that our proposed method outperforms the Hadoop-based implementation of genotype imputation. Moreover, we conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance of genotype imputation.
Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will facilitate bioinformatics researchers in Singapore in conducting genotype imputation and enhance association studies.
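
The following minimal C sketch illustrates the combination of process-level (MPI) and thread-level (OpenMP) parallelism described under Method; impute_position() is a hypothetical stand-in for the real per-chunk imputation kernel, and the job-level layer (the batch scheduler) sits outside the program:

/* Hybrid MPI + OpenMP sketch: ranks take genome chunks, threads
 * parallelize the work within each chunk. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_CHUNKS    64
#define N_POSITIONS 10000

/* Hypothetical stand-in for per-position imputation work. */
static double impute_position(int chunk, int pos) {
    return (double)(chunk + pos) * 1e-6;
}

int main(int argc, char **argv) {
    int rank, size, provided;
    double local = 0.0, total = 0.0;

    /* FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Process level: each rank takes a cyclic subset of the chunks. */
    for (int c = rank; c < N_CHUNKS; c += size) {
        /* Thread level: positions within a chunk run in parallel. */
        #pragma omp parallel for reduction(+:local)
        for (int pos = 0; pos < N_POSITIONS; pos++)
            local += impute_position(c, pos);
    }

    /* Stand-in for the concatenation step: gather onto rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("aggregate result: %f\n", total);

    MPI_Finalize();
    return 0;
}

Built with mpicc -fopenmp, the process count is set by the MPI launcher and the thread count per rank by OMP_NUM_THREADS, mirroring the multi-machine/multi-core mapping described above.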


Author(s): Nikolay Kondratyuk, Vsevolod Nikolskiy, Daniil Pavlov, Vladimir Stegailov

Classical molecular dynamics (MD) calculations represent a significant part of the utilization time of high-performance computing systems. As a rule, the efficiency of such calculations rests on an interplay of software and hardware, which are nowadays moving towards hybrid GPU-based technologies. Several well-developed open-source MD codes focused on GPUs differ both in their data management capabilities and in performance. In this work, we analyze the performance of the LAMMPS, GROMACS and OpenMM MD packages with different GPU backends on Nvidia Volta and AMD Vega20 GPUs. We consider the efficiency of solving two identical MD models (generic for materials science and biomolecular studies, respectively) using different software and hardware combinations. We describe our experience in porting the CUDA backend of LAMMPS to ROCm HIP, which shows considerable benefits for AMD GPUs compared to the OpenCL backend.


1996 · Vol 22 (6) · pp. 789-828
Author(s): William Gropp, Ewing Lusk, Nathan Doss, Anthony Skjellum

2005 · Vol 502 · pp. 157-162
Author(s): A. Suzuki, Yuri M. Mishin

We present results of atomistic computer simulations of spontaneous and stress-induced grain boundary (GB) migration in copper. Several symmetrical tilt GBs have been studied using the embedded-atom method and molecular dynamics. The GBs are observed to migrate spontaneously in a random manner. This spontaneous GB motion is always accompanied by relative translations of the grains parallel to the GB plane. Furthermore, external shear stresses applied parallel to the GB and normal to the tilt axis induce GB migration. Strong coupling is observed between the normal GB velocity v_n and the grain translation rate v_∥. The mechanism of GB motion is established to be local lattice rotation within the GB core that does not involve any GB diffusion or sliding. The coupling constant between v_n and v_∥ predicted within a simple geometric model accurately matches the molecular dynamics observations.
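
For context, the geometric model referred to here is commonly written (our paraphrase; the abstract itself does not quote the formula) as fixing the coupling factor purely by boundary geometry. For the <100> coupling mode of [001] symmetric tilt boundaries with misorientation angle \theta,

\beta \equiv \frac{v_\parallel}{v_n} = 2\tan\!\left(\frac{\theta}{2}\right),

independent of temperature and driving force, which is what the molecular dynamics observations are reported to match.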

