distributed memory
Recently Published Documents


TOTAL DOCUMENTS

1955
(FIVE YEARS 140)

H-INDEX

49
(FIVE YEARS 4)

2022 ◽  
Vol 2022 ◽  
pp. 1-13
Author(s):  
Jianhua Li ◽  
Guanlong Liu ◽  
Zhiyuan Zhen ◽  
Zihao Shen ◽  
Shiliang Li ◽  
...  

Molecular docking aims to predict possible drug candidates for many diseases and is computationally intensive. In particular, when simulating the ligand-receptor binding process, the binding pocket of the receptor is divided into subcubes, and docking the ligand into every subcube generates many molecular docking tasks, which are extremely time-consuming. In this study, we propose a heterogeneous parallel scheme of molecular docking for the ligand-receptor binding process to accelerate the simulation. The scheme has two layers of parallelism: a coarse-grained layer implemented with the message-passing interface (MPI) and a fine-grained layer running on the graphics processing unit (GPU). At the coarse-grained layer, the docking task inside one subcube is assigned to one unique MPI process, and a grouped master-slave mode is used to allocate and schedule the tasks. At the fine-grained layer, GPU accelerators handle the computationally intensive evaluation of scoring functions and the related conformational spatial transformations within a single docking task. Experiments on the ligand-receptor binding process show that, on a multicore server with GPUs, the parallel program achieves speedups of up to 45 times for flexible docking and up to 54.5 times for semiflexible docking, and that on a distributed-memory system the docking time for both flexible and semiflexible docking decreases steadily as the number of nodes increases. The scalability of the parallel program, verified on multiple nodes of a distributed-memory system, is approximately linear.
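The grouped master-slave scheduling described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: Python threads and a shared task queue stand in for MPI ranks, and `score_subcube` is a hypothetical placeholder for the GPU-accelerated scoring of one subcube.

```python
# Sketch of the coarse-grained layer: a pool of workers pulls docking
# tasks (one per subcube of the binding pocket) from a shared queue
# until all subcubes are scored. A real implementation would use MPI
# point-to-point messages between a master rank and worker ranks.
import queue
import threading

def score_subcube(subcube_id):
    # Hypothetical stand-in for the GPU-accelerated scoring function.
    return subcube_id * 0.5

def master_slave_docking(num_subcubes, num_workers=4):
    tasks = queue.Queue()
    for i in range(num_subcubes):
        tasks.put(i)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = tasks.get_nowait()
            except queue.Empty:
                return  # no tasks left: this worker is done
            score = score_subcube(task)
            with lock:
                results[task] = score

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

scores = master_slave_docking(8)
```

The dynamic pull-based allocation is what makes the scheme load-balanced: fast workers simply take more subcubes, which matters when docking times vary between pocket regions.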


2022 ◽  
pp. 89-157
Author(s):  
Peter S. Pacheco ◽  
Matthew Malensek
Keyword(s):  

Author(s):  
Justin Grandinetti ◽  
Taylor Abrams-Rollinson

Introduced in July 2016, Pokémon GO is widely considered the killer app for contemporary augmented reality. Popular attention to the game has waned in recent years, but Pokémon GO remains enormously successful in terms of both player base and revenue generation. Whether individuals experienced the game for a short time or remain dedicated hardcore players, Pokémon GO exists as memories of time and place, imbuing familiar sites and routes with new meaning and temporal connection. Attending to these complex interrelationships of place, space, mobility, humans, technologies, infrastructures, environments, and memory, we situate Pokémon GO as what Hayles (2016) calls a cognitive assemblage: a sociotechnical system of interconnectivity in which cognition is an exteriorized process occurring across multiple levels, sites, and boundaries. In turn, we conceptualize cognition (and specifically memory) not as confined within a delimited hominid body, but as operating through contextual relations, at multiple sites, and in a constant state of becoming. By reflecting on our own experiences as part of the distributed memory of Pokémon GO, we situate memory as a momentary convergence of signals made possible by infrastructures, inscribed on servers and silicon, and made part of algorithmic suggestion and learning AI. Additionally, our own memories and experiences serve to highlight the experiential complexity of cognitive assemblages in relation to structures of feeling, as well as new temporal and spatial relations.


2021 ◽  
Author(s):  
Piotr Dziekan ◽  
Piotr Zmijewski

Abstract. A numerical cloud model with Lagrangian particles coupled to an Eulerian flow is adapted for distributed-memory systems. Eulerian and Lagrangian calculations can be done in parallel on CPUs and GPUs, respectively. Scaling efficiency and the fraction of CPU and GPU calculations performed in parallel both exceed 50 % for up to 40 nodes. Thanks to the use of GPUs, a sophisticated Lagrangian microphysics model slows the simulation down by only 50 % compared to a simplistic bulk microphysics model. The overhead of communications between cluster nodes is mostly related to the pressure solver. The presented method of adaptation to computing clusters can be used in any numerical model with Lagrangian particles coupled to an Eulerian fluid flow.
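The Eulerian-Lagrangian coupling at the heart of such models can be illustrated with a minimal sketch (not the paper's code): particles are advected using a velocity field stored on an Eulerian grid, here in 1D with linear interpolation and a forward-Euler step. All names and values are hypothetical.

```python
# Minimal 1D Eulerian-Lagrangian coupling: the Eulerian side provides
# a gridded velocity field; the Lagrangian side moves particles through
# it. In the paper's setup these two computations run concurrently on
# CPUs and GPUs, respectively.

def interp_velocity(u_grid, dx, x):
    """Linearly interpolate cell-edge velocities u_grid at position x."""
    i = int(x // dx)
    frac = x / dx - i
    return (1.0 - frac) * u_grid[i] + frac * u_grid[i + 1]

def advect_particles(particles, u_grid, dx, dt):
    """One forward-Euler advection step for all Lagrangian particles."""
    return [x + dt * interp_velocity(u_grid, dx, x) for x in particles]

# Flow that speeds up from 1.0 to 2.0 across a 3-cell grid with dx = 1.
u = [1.0, 1.0, 2.0, 2.0]
new_pos = advect_particles([0.5, 1.5], u, dx=1.0, dt=0.1)
```

In a distributed-memory setting, each node would own a slab of the grid plus the particles inside it, with particles handed to a neighbouring node when advection carries them across a subdomain boundary.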


2021 ◽  
Author(s):  
Ghada Dessouky ◽  
Mihailo Isakov ◽  
Michel A. Kinsy ◽  
Pouya Mahmoody ◽  
Miguel Mark ◽  
...  
Keyword(s):  

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 342
Author(s):  
Alessandro Varsi ◽  
Simon Maskell ◽  
Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications increases accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log2 N)^2) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log2 N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log2 N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log2 N)^2) approach.
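To make the bottleneck concrete, here is what the redistribution step computes, shown sequentially for clarity. The paper's contribution is performing this in O(log2 N) on distributed memory; this O(N) single-process sketch (not the authors' algorithm) only illustrates the operation itself.

```python
# Redistribution: after resampling decides how many copies ncopies[i]
# each particle i survives with, the particle set is rewritten so the
# output again holds N = sum(ncopies) particles. Parallelizing this
# duplication across distributed nodes is the hard part the paper
# addresses.

def redistribute(particles, ncopies):
    """Duplicate particles[i] exactly ncopies[i] times."""
    out = []
    for p, c in zip(particles, ncopies):
        out.extend([p] * c)
    return out

new_particles = redistribute(['a', 'b', 'c', 'd'], [2, 0, 1, 1])
```

The irregularity is visible even in this tiny example: particle 'a' is written twice while 'b' vanishes, so output positions depend on a prefix sum of the copy counts, which is why naive parallel versions need repeated communication rounds.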


Fluids ◽  
2021 ◽  
Vol 6 (11) ◽  
pp. 395
Author(s):  
Hui Liu ◽  
Zhangxin Chen ◽  
Xiaohu Guo ◽  
Lihua Shen

Reservoir simulation solves a set of fluid flow equations through porous media, which are partial differential equations from the petroleum engineering industry described by Darcy's law. This paper introduces the model, numerical methods, algorithms, and parallel implementation of a thermal reservoir simulator designed for numerical simulations of a thermal reservoir with multiple components in a three-dimensional domain using distributed-memory parallel computers. Its full mathematical model is introduced, with correlations for important properties and well modeling. Efficient numerical methods (discretization scheme, matrix decoupling methods, and preconditioners), parallel computing technologies, and implementation details are presented. The numerical methods applied in this paper are efficient and scalable, and are suitable for large-scale thermal reservoir simulations with tens of thousands of CPU cores (MPI processes). The simulator is designed for giant models with billions or even trillions of grid blocks using hundreds of thousands of CPUs, which is our main focus. For validation, the simulator is compared against CMG STARS, one of the most popular and mature commercial thermal simulators. Numerical experiments show that our results match those of the commercial simulator, which confirms the correctness of our methods and implementations. A SAGD simulation with 7406 well pairs is also presented to study the effectiveness of our numerical methods. Scalability tests demonstrate that our simulator can handle giant models with billions of grid blocks using 100,800 CPU cores with good scalability.
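Darcy's law, which underlies the flow equations mentioned above, relates volumetric flux to the pressure gradient: q = -(k/mu) dp/dx. A minimal 1D finite-difference illustration follows; the grid, permeability, and viscosity values are hypothetical and chosen only to show the discretization, not taken from the paper.

```python
# Discrete Darcy flux between adjacent cells of a 1D pressure field:
#   q_{i+1/2} = -(k / mu) * (p[i+1] - p[i]) / dx
# This face flux is the basic building block that a reservoir
# simulator assembles into its mass- and energy-balance equations.

def darcy_flux(p, k, mu, dx):
    """Face fluxes (m/s) between adjacent cells of pressure field p (Pa)."""
    return [-(k / mu) * (p[i + 1] - p[i]) / dx for i in range(len(p) - 1)]

# Linear pressure drop from 2.0e5 Pa to 1.0e5 Pa over 4 cells of 10 m,
# with permeability k = 1e-13 m^2 and viscosity mu = 1e-3 Pa*s.
q = darcy_flux([2.0e5, 1.75e5, 1.5e5, 1.25e5, 1.0e5], k=1e-13, mu=1e-3, dx=10.0)
```

Because the pressure drop here is linear, every face carries the same flux, which is the expected steady-state behaviour for incompressible single-phase flow.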


2021 ◽  
Author(s):  
Gaston Irrmann ◽  
Sébastien Masson ◽  
Éric Maisonnave ◽  
David Guibert ◽  
Erwan Raffin

Abstract. Communications in distributed-memory supercomputers still limit the scalability of geophysical models. Considering the recent trends of the semiconductor industry, we think this problem is here to stay. We present the optimisations that have been implemented in the current reference version of the ocean model, NEMO 4.0, to improve its scalability. Thanks to the collaboration of oceanographers and HPC experts, we identified and removed the unnecessary communications in two bottleneck routines: the computation of the free-surface pressure gradient and the forcing in the straits or unstructured open boundaries. Since a wrong parallel decomposition choice could undermine computing performance, we impose its automatic definition in all cases, including when subdomains containing land points only are excluded from the decomposition. For a smaller audience of developers and vendors, we propose a new benchmark configuration, easy to use while offering the full complexity of operational versions.
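The exclusion of land-only subdomains mentioned above can be sketched in a toy form: split a rectangular grid into nx-by-ny subdomains and keep only those containing at least one ocean point, so no MPI process is wasted on pure land. The mask and splitting logic below are illustrative, not NEMO's actual code.

```python
# Toy domain decomposition: mask[j][i] is True over ocean, False over
# land. Subdomains whose cells are all land are dropped from the
# decomposition, mirroring NEMO's removal of land-only subdomains.

def active_subdomains(mask, nx, ny):
    """Return (j, i) indices of subdomains containing any ocean point."""
    rows, cols = len(mask), len(mask[0])
    hj, hi = rows // ny, cols // nx   # subdomain height and width
    keep = []
    for j in range(ny):
        for i in range(nx):
            block = [mask[jj][ii]
                     for jj in range(j * hj, (j + 1) * hj)
                     for ii in range(i * hi, (i + 1) * hi)]
            if any(block):
                keep.append((j, i))
    return keep

# 4x4 grid whose right half is land: with a 2x1 split, only the
# left subdomain survives.
mask = [[True, True, False, False]] * 4
domains = active_subdomains(mask, nx=2, ny=1)
```

In practice the decomposition search also has to balance the ocean workload across the surviving subdomains, which is why NEMO automates the choice rather than leaving it to the user.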

