parallel scalability
Recently Published Documents

Total documents: 36 (last five years: 8)
H-index: 10 (last five years: 1)
We propose and investigate a mesh deformation technique for PDE-constrained shape optimization. By introducing a gradient penalization into the inner product for linearized shape spaces, mesh degeneration can be prevented during the optimization iterations, which preserves the scalability of the employed solvers. We illustrate the approach with a shape optimization of cellular composites with respect to linear elastic energy under tension. The influence of the gradient penalization is evaluated, and the parallel scalability of the approach is demonstrated using a geometric multigrid solver on hierarchically distributed meshes.
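The abstract does not spell out the form of the penalized inner product. The sketch below shows one plausible reading, assuming the penalization adds a weighted gradient term to the usual linear-elasticity bilinear form; the function space, the weight alpha, and all names are illustrative, and this is not the paper's implementation.

```python
# Rough sketch (assumed form) of a gradient-penalized inner product for the
# shape gradient: linear-elasticity bilinear form plus a penalty
# alpha * (grad(u), grad(v)) that limits large local deformations and hence
# mesh degeneration. Requires Firedrake.
from firedrake import (UnitSquareMesh, VectorFunctionSpace, TrialFunction,
                       TestFunction, Constant, sym, grad, inner, dx)

mesh = UnitSquareMesh(32, 32)
V = VectorFunctionSpace(mesh, "CG", 1)      # space of mesh deformation fields
u, v = TrialFunction(V), TestFunction(V)

alpha = Constant(10.0)                      # penalization weight (illustrative)
elasticity = inner(sym(grad(u)), sym(grad(v))) * dx
penalty = alpha * inner(grad(u), grad(v)) * dx
a = elasticity + penalty                    # penalized inner product on the linearized shape space
# Solving a(u, v) == dJ(v) (the shape derivative) for u yields the descent direction.
```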


2020 ◽  
Vol 14 (3) ◽  
pp. 351-363
Author(s):  
Yue Wang ◽  
Ruiqi Xu ◽  
Zonghao Feng ◽  
Yulin Che ◽  
Lei Chen ◽  
...  

Measuring similarities among different nodes is important in graph analysis. SimRank is one of the most popular similarity measures. Given a graph G(V, E) and a source node u, a single-source SimRank query returns the similarities between u and each node v ∈ V. This type of query is often used in link prediction, personalized recommendation, and spam detection. Since dealing with a large graph is beyond the ability of a single machine with limited memory and computational power, it is necessary to process single-source SimRank queries in a distributed environment, where the graph is partitioned and distributed across multiple machines. However, most current solutions are based on the shared-memory model, in which the whole graph is loaded into a shared memory that all processors can access randomly; such algorithms are difficult to deploy on a shared-nothing model. In this paper, we present DISK, a distributed framework for processing single-source SimRank queries. DISK follows the linearized formulation of SimRank and consists of an offline and an online phase. In the offline phase, a tree-based method is used to estimate the diagonal correction matrix of SimRank accurately; in the online phase, single-source similarities are computed iteratively. Under this framework, we propose different optimization techniques to boost indexing and querying. DISK guarantees both accuracy and parallel scalability, which distinguishes it from existing solutions. Its accuracy, efficiency, parallel scalability, and scalability to large graphs are verified by extensive experimental studies. The experiments show that DISK scales up to graphs with billions of nodes and edges and answers online queries within seconds, while ensuring the accuracy bounds.
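As a concrete illustration of the linearized formulation and the iterative online phase, the following minimal sketch computes approximate single-source SimRank scores on a small dense graph. The diagonal correction D is replaced here by the crude (1 - c)I approximation, whereas DISK estimates it accurately offline; the function name and parameters are illustrative, not the paper's code.

```python
import numpy as np

def single_source_simrank(A, u, c=0.6, T=10, D=None):
    """Approximate single-source SimRank via the linearized formulation
    S = c * W^T S W + D, where W is the column-normalized adjacency matrix
    and D is a diagonal correction matrix (crudely approximated here)."""
    n = A.shape[0]
    in_deg = A.sum(axis=0)
    W = A / np.where(in_deg == 0, 1, in_deg)      # column-normalize A
    if D is None:
        D = (1.0 - c) * np.ones(n)                # naive diagonal correction (assumption)
    # forward pass: v_k = W^k e_u for k = 0..T
    v = [np.zeros(n)]
    v[0][u] = 1.0
    for _ in range(T):
        v.append(W @ v[-1])
    # backward accumulation: s = sum_k c^k (W^T)^k diag(D) v_k
    s = D * v[T]
    for k in range(T - 1, -1, -1):
        s = c * (W.T @ s) + D * v[k]
    return s                                      # s[w] approximates SimRank(u, w)

# toy usage: a 4-node directed graph, A[i, j] = 1 means an edge i -> j
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
print(single_source_simrank(A, u=0))
```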


Author(s):  
Amanda Bienz ◽  
William D Gropp ◽  
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet, AMG lacks parallel scalability due to the increasingly large costs associated with communication, both in the initial construction of the multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. In the Message Passing Interface (MPI), particularly in the MPI-everywhere approach, it is common to arrange inter-process communication so that messages are transported in the same way regardless of where the sending and receiving processes are located. Performance tests show notable differences between the costs of intra-node and inter-node communication, motivating a restructuring of communication. Here, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. This node-centric communication is extended to a range of components in both the setup and solve phases of AMG, yielding an improvement in the weak and strong scaling of the entire method.
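To make the effect of this restructuring concrete, the small model below (purely illustrative; the inputs `sends` and `node_of` and the function name are assumptions, not the paper's code) counts inter-node messages for a point-to-point pattern before and after node-aware aggregation, where each node forwards at most one combined message to each other node.

```python
def internode_message_counts(sends, node_of):
    """Count inter-node messages before and after node-aware aggregation.
    `sends` maps each rank to its destination ranks; `node_of` maps each rank
    to the node it runs on. Simplified model of the restructuring idea."""
    # standard schedule: every off-node destination costs one inter-node message
    standard = sum(1 for src, dests in sends.items()
                   for d in dests if node_of[src] != node_of[d])
    # node-aware schedule: at most one aggregated message per (node, remote node) pair
    pairs = {(node_of[src], node_of[d])
             for src, dests in sends.items()
             for d in dests if node_of[src] != node_of[d]}
    return standard, len(pairs)

# example: 4 nodes with 4 ranks each, every rank talks to every other rank
ranks = range(16)
node_of = {r: r // 4 for r in ranks}
sends = {r: [d for d in ranks if d != r] for r in ranks}
print(internode_message_counts(sends, node_of))   # (192, 12)
```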


2020 ◽  
Author(s):  
Jemma Shipton ◽  
Colin Cotter ◽  
Tom Bendall ◽  
Thomas Gibson ◽  
Lawrence Mitchell ◽  
...  

I will describe Gusto, a dynamical core toolkit built on top of the Firedrake finite element library; present recent results from a range of test cases; and outline our plans for future code development.

Gusto uses compatible finite element methods, a form of mixed finite element methods (meaning that different finite element spaces are used for different fields) that allow the exact representation of the standard vector calculus identities div-curl=0 and curl-grad=0. The popularity of these methods for numerical weather prediction is due to the flexibility to run on non-orthogonal grids, thus avoiding the communication bottleneck at the poles, while retaining the convergence and wave propagation properties required for accuracy.

Although the flexibility of the compatible finite element spatial discretisation improves the parallel scalability of the model, it does not solve the parallel scalability problem inherent in spatial domain decomposition: we need to find a way to perform parallel calculations in the time domain. Exponential integrators, approximated by a near-optimal rational expansion, offer a way to take large timesteps and form the basis for parallel timestepping schemes based on wave averaging. I will describe the progress we have made towards implementing these schemes in Gusto.
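A minimal Firedrake sketch of what a compatible pair of spaces looks like, shown here only to illustrate the general idea and not Gusto's actual configuration: an H(div)-conforming velocity space is paired with a discontinuous scalar space, so that the divergence maps the velocity space exactly into the scalar space.

```python
# Minimal compatible finite element pair in Firedrake (illustrative only).
from firedrake import UnitSquareMesh, FunctionSpace, MixedFunctionSpace

mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "RT", 1)     # H(div)-conforming velocity space (Raviart-Thomas)
Q = FunctionSpace(mesh, "DG", 0)     # discontinuous scalar space (e.g. depth or pressure)
W = MixedFunctionSpace([V, Q])       # mixed space: div maps V exactly into Q
```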


2019 ◽  
Vol 12 (9) ◽  
pp. 3991-4012 ◽  
Author(s):  
Nikolay V. Koldunov ◽  
Vadym Aizinger ◽  
Natalja Rakowsky ◽  
Patrick Scholz ◽  
Dmitry Sidorenko ◽  
...  

Abstract. A study of the scalability of the Finite-volumE Sea ice–Ocean circulation Model, Version 2.0 (FESOM2), the first mature global model of its kind formulated on unstructured meshes, is presented. This study includes an analysis of the main computational kernels with a special focus on bottlenecks in parallel scalability. Several model enhancements improving this scalability for large numbers of processes are described and tested. Model grids at different resolutions are used on four high-performance computing (HPC) systems with differing computational and communication hardware to demonstrate the model's scalability and throughput. Furthermore, strategies for improvements in parallel performance are presented and assessed. We show that, in terms of throughput, FESOM2 is on a par with state-of-the-art structured ocean models and, in a realistic eddy-resolving configuration (1/10° resolution), can achieve about 16 years per day on 14 000 cores. This suggests that unstructured-mesh models are becoming very competitive tools in high-resolution climate modeling. We show that the main bottlenecks of FESOM2 parallel scalability are the two-dimensional components of the model, namely the computations of the external (barotropic) mode and the sea-ice model. It is argued that these bottlenecks are shared with other general ocean circulation models.


2019 ◽  
Author(s):  
Nikolay V. Koldunov ◽  
Vadym Aizinger ◽  
Natalja Rakowsky ◽  
Patrick Scholz ◽  
Dmitry Sidorenko ◽  
...  

Abstract. A study of the scalability of the Finite-volumE Sea ice-Ocean circulation Model, Version 2.0 (FESOM2), the first mature global model of its kind formulated on unstructured meshes, is presented. This study includes an analysis of the main computational kernels with a special focus on bottlenecks in parallel scalability. Several model enhancements improving this scalability for large numbers of processes are described and tested. Model grids at different resolutions are used on four HPC systems with differing computation and communication hardware to demonstrate the model's scalability and throughput. Furthermore, strategies for improvements in parallel performance are presented and assessed. We show that, in terms of throughput, FESOM2.0 is on a par with state-of-the-art structured ocean models and, in a realistic eddy-resolving configuration (1/10° resolution), can produce about 16 years per day on 14 000 cores. This suggests that unstructured-mesh models are becoming extremely competitive tools in high-resolution climate modelling. It is shown that the main bottlenecks of FESOM's parallel scalability are the two-dimensional components of the model, namely the computations of the external (barotropic) mode and the sea-ice model. It is argued that these bottlenecks are shared with other general ocean circulation models.


Author(s):  
Stefan Lemvig Glimberg ◽  
Allan Peter Engsig-Karup ◽  
Luke N Olson

The focus of this article is on the parallel scalability of a distributed multigrid framework, known as the DTU Compute GPUlab Library, for execution on graphics processing unit (GPU)-accelerated supercomputers. We demonstrate near-ideal weak scalability for a high-order fully nonlinear potential flow (FNPF) time domain model on the Oak Ridge Titan supercomputer, which is equipped with a large number of many-core CPU-GPU nodes. The high-order finite difference scheme for the solver is implemented to expose data locality and scalability, and the linear Laplace solver is based on an iterative multilevel preconditioned defect correction method designed for high-throughput processing and massive parallelism. In this work, the FNPF discretization is based on a multi-block discretization that allows for large-scale simulations. In this setup, each grid block is based on a logically structured mesh with support for curvilinear representation of horizontal block boundaries, allowing an accurate representation of geometric features such as surface-piercing bottom-mounted structures, for example the mono-pile foundations demonstrated here. Unprecedented performance and scalability results are presented for a system of equations historically known to be too expensive to solve in practical applications. A novel feature of the potential flow model is demonstrated: a modest number of multigrid restrictions is sufficient for fast convergence, improving overall parallel scalability as the coarse-grid problem diminishes in size. In the numerical benchmarks presented, we demonstrate the use of up to 8192 modern Nvidia GPUs, enabling large-scale and high-resolution nonlinear marine hydrodynamics applications.
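The core of such a linear solver is an iterative preconditioned defect-correction loop. The sketch below shows its generic structure with a simple Jacobi preconditioner standing in for the multilevel cycle; all names are illustrative, and a real multigrid preconditioner converges far faster than the crude stand-in used here.

```python
import numpy as np

def defect_correction(A, b, precond, x0=None, tol=1e-8, max_iter=500):
    """Generic preconditioned defect-correction iteration
        x_{k+1} = x_k + M^{-1} (b - A x_k),
    where `precond` applies an approximate inverse M^{-1} of A (in the paper's
    solver this role is played by a multigrid cycle; here it is left abstract)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                       # defect (residual)
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        x = x + precond(r)                  # correct with the preconditioned defect
    return x

# toy usage: 1D Poisson matrix with a Jacobi "preconditioner" (illustration only;
# convergence is slow compared with a proper multigrid cycle)
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = defect_correction(A, b, precond=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))
```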


Author(s):  
Anderson B. N. da Silva ◽  
Daniel A. M. Cunha ◽  
Vitor R. G. Silva ◽  
Alex F. de A. Furtunato ◽  
Samuel Xavier-de-Souza

Author(s):  
Carlo Fiorina ◽  
Andreas Pautz ◽  
Konstantin Mikityuk

The FRED code is an in-house tool developed at the Paul Scherrer Institut for so-called 1.5-D nuclear fuel performance analysis. In order to extend its field of application, the code has been re-implemented as a class of the OpenFOAM numerical library. A first objective of this re-implementation is to provide the tool with the parallel scalability necessary for full-core analyses. In addition, the use of OpenFOAM as the base library allows for a straightforward interface with the standard OpenFOAM CFD solvers, as well as with the several OpenFOAM-based applications developed by the nuclear engineering community. In this paper, the newly developed FRED-based OpenFOAM class is integrated into the GeN-Foam multi-physics code developed mainly at the École polytechnique fédérale de Lausanne and the Paul Scherrer Institut. The paper presents the details of both the re-implementation of the FRED code and its integration into GeN-Foam. The performance and parallel scalability of the tool are preliminarily investigated, and an example of application is provided by performing a full-core multi-physics analysis of the European Sodium Fast Reactor.

