parallel scalability
Recently Published Documents

Total documents: 36 (last five years: 8)
H-index: 10 (last five years: 1)
We propose and investigate a mesh deformation technique for PDE-constrained shape optimization. By introducing a gradient penalization into the inner product for linearized shape spaces, mesh degeneration can be prevented during the optimization iterations, which preserves the scalability of the employed solvers. We illustrate the approach with a shape optimization of cellular composites with respect to linear elastic energy under tension. The influence of the gradient penalization is evaluated, and the parallel scalability of the approach is demonstrated using a geometric multigrid solver on hierarchically distributed meshes.
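The abstract does not spell out the form of the penalized inner product. The sketch below shows one plausible reading, assuming the penalization adds a weighted gradient term to the usual linear-elasticity bilinear form; the function space, the weight alpha, and all names are illustrative, and this is not the paper's implementation.

```python
# Rough sketch (assumed form) of a gradient-penalized inner product for the
# shape gradient: linear-elasticity bilinear form plus a penalty
# alpha * (grad(u), grad(v)) that limits large local deformations and hence
# mesh degeneration. Requires Firedrake.
from firedrake import (UnitSquareMesh, VectorFunctionSpace, TrialFunction,
                       TestFunction, Constant, sym, grad, inner, dx)

mesh = UnitSquareMesh(32, 32)
V = VectorFunctionSpace(mesh, "CG", 1)      # space of mesh deformation fields
u, v = TrialFunction(V), TestFunction(V)

alpha = Constant(10.0)                      # penalization weight (illustrative)
elasticity = inner(sym(grad(u)), sym(grad(v))) * dx
penalty = alpha * inner(grad(u), grad(v)) * dx
a = elasticity + penalty                    # penalized inner product on the linearized shape space
# Solving a(u, v) == dJ(v) (the shape derivative) for u yields the descent direction.
```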


2020 ◽  
Vol 14 (3) ◽  
pp. 351-363
Author(s):  
Yue Wang ◽  
Ruiqi Xu ◽  
Zonghao Feng ◽  
Yulin Che ◽  
Lei Chen ◽  
...  

Measuring similarities among different nodes is important in graph analysis. SimRank is one of the most popular similarity measures. Given a graph G(V, E) and a source node u, a single-source SimRank query returns the similarities between u and each node v ∈ V. This type of query is often used in link prediction, personalized recommendation, and spam detection. Since dealing with a large graph is beyond the ability of a single machine with limited memory and computational power, it is necessary to process single-source SimRank queries in a distributed environment, where the graph is partitioned and distributed across multiple machines. However, most current solutions are based on the shared-memory model, in which the whole graph is loaded into a shared memory that all processors can access randomly; such algorithms are difficult to deploy on a shared-nothing model. In this paper, we present DISK, a distributed framework for processing single-source SimRank queries. DISK follows the linearized formulation of SimRank and consists of an offline and an online phase. In the offline phase, a tree-based method is used to estimate the diagonal correction matrix of SimRank accurately; in the online phase, single-source similarities are computed iteratively. Under this framework, we propose different optimization techniques to boost indexing and querying. DISK guarantees both accuracy and parallel scalability, which distinguishes it from existing solutions. Its accuracy, efficiency, parallel scalability, and scalability to large graphs are verified by extensive experimental studies. The experiments show that DISK scales up to graphs with billions of nodes and edges and answers online queries within seconds, while ensuring the accuracy bounds.
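As a concrete illustration of the linearized formulation and the iterative online phase, the following minimal sketch computes approximate single-source SimRank scores on a small dense graph. The diagonal correction D is replaced here by the crude (1 - c)I approximation, whereas DISK estimates it accurately offline; the function name and parameters are illustrative, not the paper's code.

```python
import numpy as np

def single_source_simrank(A, u, c=0.6, T=10, D=None):
    """Approximate single-source SimRank via the linearized formulation
    S = c * W^T S W + D, where W is the column-normalized adjacency matrix
    and D is a diagonal correction matrix (crudely approximated here)."""
    n = A.shape[0]
    in_deg = A.sum(axis=0)
    W = A / np.where(in_deg == 0, 1, in_deg)      # column-normalize A
    if D is None:
        D = (1.0 - c) * np.ones(n)                # naive diagonal correction (assumption)
    # forward pass: v_k = W^k e_u for k = 0..T
    v = [np.zeros(n)]
    v[0][u] = 1.0
    for _ in range(T):
        v.append(W @ v[-1])
    # backward accumulation: s = sum_k c^k (W^T)^k diag(D) v_k
    s = D * v[T]
    for k in range(T - 1, -1, -1):
        s = c * (W.T @ s) + D * v[k]
    return s                                      # s[w] approximates SimRank(u, w)

# toy usage: a 4-node directed graph, A[i, j] = 1 means an edge i -> j
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
print(single_source_simrank(A, u=0))
```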


Author(s):  
Amanda Bienz ◽  
William D Gropp ◽  
Luke N Olson

Algebraic multigrid (AMG) is often viewed as a scalable O(n) solver for sparse linear systems. Yet, AMG lacks parallel scalability due to the increasingly large costs associated with communication, both in the initial construction of the multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. In the Message Passing Interface (MPI), particularly in the MPI-everywhere approach, it is common to arrange inter-process communication so that messages are transported in the same way regardless of where the sending and receiving processes are located. Performance tests show notable differences between the costs of intra-node and inter-node communication, motivating a restructuring of communication. Here, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of inter-node messages. This node-centric communication is extended to a range of components in both the setup and solve phases of AMG, yielding an improvement in the weak and strong scaling of the entire method.
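To make the effect of this restructuring concrete, the small model below (purely illustrative; the inputs `sends` and `node_of` and the function name are assumptions, not the paper's code) counts inter-node messages for a point-to-point pattern before and after node-aware aggregation, where each node forwards at most one combined message to each other node.

```python
def internode_message_counts(sends, node_of):
    """Count inter-node messages before and after node-aware aggregation.
    `sends` maps each rank to its destination ranks; `node_of` maps each rank
    to the node it runs on. Simplified model of the restructuring idea."""
    # standard schedule: every off-node destination costs one inter-node message
    standard = sum(1 for src, dests in sends.items()
                   for d in dests if node_of[src] != node_of[d])
    # node-aware schedule: at most one aggregated message per (node, remote node) pair
    pairs = {(node_of[src], node_of[d])
             for src, dests in sends.items()
             for d in dests if node_of[src] != node_of[d]}
    return standard, len(pairs)

# example: 4 nodes with 4 ranks each, every rank talks to every other rank
ranks = range(16)
node_of = {r: r // 4 for r in ranks}
sends = {r: [d for d in ranks if d != r] for r in ranks}
print(internode_message_counts(sends, node_of))   # (192, 12)
```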


2020 ◽  
Author(s):  
Jemma Shipton ◽  
Colin Cotter ◽  
Tom Bendall ◽  
Thomas Gibson ◽  
Lawrence Mitchell ◽  
...  

I will describe Gusto, a dynamical core toolkit built on top of the Firedrake finite element library; present recent results from a range of test cases; and outline our plans for future code development.

Gusto uses compatible finite element methods, a form of mixed finite element methods (meaning that different finite element spaces are used for different fields) that allow the exact representation of the standard vector calculus identities div-curl=0 and curl-grad=0. The popularity of these methods for numerical weather prediction is due to the flexibility to run on non-orthogonal grids, thus avoiding the communication bottleneck at the poles, while retaining the convergence and wave propagation properties required for accuracy.

Although the flexibility of the compatible finite element spatial discretisation improves the parallel scalability of the model, it does not solve the parallel scalability problem inherent in spatial domain decomposition: we need to find a way to perform parallel calculations in the time domain. Exponential integrators, approximated by a near-optimal rational expansion, offer a way to take large timesteps and form the basis for parallel timestepping schemes based on wave averaging. I will describe the progress we have made towards implementing these schemes in Gusto.
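A minimal Firedrake sketch of what a compatible pair of spaces looks like, shown here only to illustrate the general idea and not Gusto's actual configuration: an H(div)-conforming velocity space is paired with a discontinuous scalar space, so that the divergence maps the velocity space exactly into the scalar space.

```python
# Minimal compatible finite element pair in Firedrake (illustrative only).
from firedrake import UnitSquareMesh, FunctionSpace, MixedFunctionSpace

mesh = UnitSquareMesh(16, 16)
V = FunctionSpace(mesh, "RT", 1)     # H(div)-conforming velocity space (Raviart-Thomas)
Q = FunctionSpace(mesh, "DG", 0)     # discontinuous scalar space (e.g. depth or pressure)
W = MixedFunctionSpace([V, Q])       # mixed space: div maps V exactly into Q
```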


2019 ◽  
Vol 12 (9) ◽  
pp. 3991-4012 ◽  
Author(s):  
Nikolay V. Koldunov ◽  
Vadym Aizinger ◽  
Natalja Rakowsky ◽  
Patrick Scholz ◽  
Dmitry Sidorenko ◽  
...  

Abstract. A study of the scalability of the Finite-volumE Sea ice–Ocean circulation Model, Version 2.0 (FESOM2), the first mature global model of its kind formulated on unstructured meshes, is presented. This study includes an analysis of the main computational kernels with a special focus on bottlenecks in parallel scalability. Several model enhancements improving this scalability for large numbers of processes are described and tested. Model grids at different resolutions are used on four high-performance computing (HPC) systems with differing computational and communication hardware to demonstrate the model's scalability and throughput. Furthermore, strategies for improvements in parallel performance are presented and assessed. We show that, in terms of throughput, FESOM2 is on a par with state-of-the-art structured ocean models and, in a realistic eddy-resolving configuration (1/10° resolution), can achieve about 16 years per day on 14 000 cores. This suggests that unstructured-mesh models are becoming very competitive tools in high-resolution climate modeling. We show that the main bottlenecks of FESOM2 parallel scalability are the two-dimensional components of the model, namely the computations of the external (barotropic) mode and the sea-ice model. It is argued that these bottlenecks are shared with other general ocean circulation models.


2019 ◽  
Author(s):  
Nikolay V. Koldunov ◽  
Vadym Aizinger ◽  
Natalja Rakowsky ◽  
Patrick Scholz ◽  
Dmitry Sidorenko ◽  
...  

Abstract. A study of the scalability of the Finite-volumE Sea ice-Ocean circulation Model, Version 2.0 (FESOM2), the first mature global model of its kind formulated on unstructured meshes, is presented. This study includes an analysis of the main computational kernels with a special focus on bottlenecks in parallel scalability. Several model enhancements improving this scalability for large numbers of processes are described and tested. Model grids at different resolutions are used on four HPC systems with differing computation and communication hardware to demonstrate the model's scalability and throughput. Furthermore, strategies for improvements in parallel performance are presented and assessed. We show that, in terms of throughput, FESOM2.0 is on a par with state-of-the-art structured ocean models and, in a realistic eddy-resolving configuration (1/10° resolution), can produce about 16 years per day on 14 000 cores. This suggests that unstructured-mesh models are becoming extremely competitive tools in high-resolution climate modelling. It is shown that the main bottlenecks of FESOM's parallel scalability are the two-dimensional components of the model, namely the computations of the external (barotropic) mode and the sea-ice model. It is argued that these bottlenecks are shared with other general ocean circulation models.


Author(s):  
Stefan Lemvig Glimberg ◽  
Allan Peter Engsig-Karup ◽  
Luke N Olson

The focus of this article is on the parallel scalability of a distributed multigrid framework, known as the DTU Compute GPUlab Library, for execution on graphics processing unit (GPU)-accelerated supercomputers. We demonstrate near-ideal weak scalability for a high-order fully nonlinear potential flow (FNPF) time domain model on the Oak Ridge Titan supercomputer, which is equipped with a large number of many-core CPU-GPU nodes. The high-order finite difference scheme for the solver is implemented to expose data locality and scalability, and the linear Laplace solver is based on an iterative multilevel preconditioned defect correction method designed for high-throughput processing and massive parallelism. In this work, the FNPF discretization is based on a multi-block discretization that allows for large-scale simulations. In this setup, each grid block is based on a logically structured mesh with support for curvilinear representation of horizontal block boundaries, allowing an accurate representation of geometric features such as surface-piercing bottom-mounted structures, for example the mono-pile foundations demonstrated here. Unprecedented performance and scalability results are presented for a system of equations historically known to be too expensive to solve in practical applications. A novel feature of the potential flow model is demonstrated: a modest number of multigrid restrictions is sufficient for fast convergence, improving overall parallel scalability as the coarse-grid problem diminishes in size. In the numerical benchmarks presented, we demonstrate the use of up to 8192 modern Nvidia GPUs, enabling large-scale and high-resolution nonlinear marine hydrodynamics applications.
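The core of such a linear solver is an iterative preconditioned defect-correction loop. The sketch below shows its generic structure with a simple Jacobi preconditioner standing in for the multilevel cycle; all names are illustrative, and a real multigrid preconditioner converges far faster than the crude stand-in used here.

```python
import numpy as np

def defect_correction(A, b, precond, x0=None, tol=1e-8, max_iter=500):
    """Generic preconditioned defect-correction iteration
        x_{k+1} = x_k + M^{-1} (b - A x_k),
    where `precond` applies an approximate inverse M^{-1} of A (in the paper's
    solver this role is played by a multigrid cycle; here it is left abstract)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                       # defect (residual)
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        x = x + precond(r)                  # correct with the preconditioned defect
    return x

# toy usage: 1D Poisson matrix with a Jacobi "preconditioner" (illustration only;
# convergence is slow compared with a proper multigrid cycle)
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = defect_correction(A, b, precond=lambda r: r / np.diag(A))
print(np.linalg.norm(A @ x - b))
```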


Author(s):  
Anderson B. N. da Silva ◽  
Daniel A. M. Cunha ◽  
Vitor R. G. Silva ◽  
Alex F. de A. Furtunato ◽  
Samuel Xavier-de-Souza

Author(s):  
Carlo Fiorina ◽  
Andreas Pautz ◽  
Konstantin Mikityuk

The FRED code is an in-house tool developed at the Paul Scherrer Institut for so-called 1.5-D nuclear fuel performance analysis. In order to extend its field of application, the code has been re-implemented as a class of the OpenFOAM numerical library. A first objective of this re-implementation is to provide the tool with the parallel scalability necessary for full-core analyses. In addition, the use of OpenFOAM as the base library allows for a straightforward interface with the standard OpenFOAM CFD solvers, as well as with the several OpenFOAM-based applications developed by the nuclear engineering community. In this paper, the newly developed FRED-based OpenFOAM class is integrated into the GeN-Foam multi-physics code developed mainly at the École polytechnique fédérale de Lausanne and the Paul Scherrer Institut. The paper presents the details of both the re-implementation of the FRED code and its integration into GeN-Foam. The performance and parallel scalability of the tool are preliminarily investigated, and an example of application is provided by performing a full-core multi-physics analysis of the European Sodium Fast Reactor.

