Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

2019 ◽  
Vol 29 (01) ◽  
pp. 1950001 ◽  
Author(s):  
Ambra Abdullahi Hassan ◽  
Valeria Cardellini ◽  
Pasqua D’Ambra ◽  
Daniela di Serafino ◽  
Salvatore Filippone

Many scientific applications require the solution of large and sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners, because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of efficiently implementing these methods on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers that use sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.
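The preference for approximate inverses over factorizations can be seen in miniature: applying the smoother then reduces to a sparse matrix-vector product, which parallelizes well on GPUs, whereas a triangular back-substitution is inherently sequential. A minimal pure-Python sketch of one such sweep in CSR format (the 2×2 matrix and the diagonal approximate inverse are illustrative, not MLD2P4's API):

```python
# One sweep of a Jacobi-type smoother where the approximate inverse of the
# diagonal block is applied as a sparse matrix-vector product (SpMV) rather
# than a triangular solve. CSR storage; pure Python for illustration only.

def spmv(n, indptr, indices, data, x):
    """y = A @ x for a CSR matrix A."""
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y

# A = [[4, 1], [1, 3]] in CSR form.
n = 2
indptr, indices, data = [0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0]

# Crude sparse approximate inverse: here just 1/diag(A), also stored in CSR.
m_indptr, m_indices, m_data = [0, 1, 2], [0, 1], [0.25, 1.0 / 3.0]

b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(25):  # smoother sweeps (normally only 1-2 inside AMG)
    r = [bi - yi for bi, yi in zip(b, spmv(n, indptr, indices, data, x))]
    dx = spmv(n, m_indptr, m_indices, m_data, r)
    x = [xi + di for xi, di in zip(x, dx)]
```

Each sweep costs two SpMVs, both of which map naturally onto GPU threads; no sequential dependency between rows is introduced.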

Author(s):  
Shashi Kant Ratnakar ◽  
Subhajit Sanfui ◽  
Deepak Sharma

Abstract Topology optimization has been successful in generating optimal topologies of various structures arising in real-world applications. Since these applications can have complex and large domains, topology optimization suffers from a high computational cost because of the use of unstructured meshes for the discretization of these domains and their finite element analysis (FEA). This paper addresses this challenge by developing three GPU-based element-by-element strategies targeting unstructured all-hexahedral meshes for the matrix-free preconditioned conjugate gradient (PCG) finite element solver. These strategies mainly perform the sparse matrix-vector multiplication (SpMV) arising in the FEA solver by allocating more GPU compute threads per element. Moreover, the strategies are designed to use the GPU's shared memory for efficient memory transactions. The proposed strategies are tested with the solid isotropic material with penalization (SIMP) method on four examples of 3D structural topology optimization. Results demonstrate that the proposed strategies achieve speedups of up to 8.2× over standard GPU-based SpMV strategies from the literature.
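The matrix-free, element-by-element idea can be sketched compactly: the global stiffness matrix is never assembled, and each A·x inside CG is computed by scattering element contributions (on a GPU, one or more threads per element). A toy pure-Python version with two 1D bar elements of unit stiffness (the mesh and boundary conditions are made up for illustration):

```python
# Matrix-free CG: A @ x is evaluated element by element instead of from an
# assembled global matrix. Two 1D bar elements, ke = [[1, -1], [-1, 1]],
# node 0 fixed. dof[] maps mesh nodes to free-dof indices (None = fixed).
dof = [None, 0, 1]
elements = [(0, 1), (1, 2)]

def matvec(x):
    y = [0.0] * len(x)
    for na, nb in elements:                 # scatter element contributions
        a, b = dof[na], dof[nb]
        ua = x[a] if a is not None else 0.0
        ub = x[b] if b is not None else 0.0
        if a is not None:
            y[a] += ua - ub
        if b is not None:
            y[b] += ub - ua
    return y

def cg(rhs, n, iters=50, tol=1e-12):
    x, r = [0.0] * n, rhs[:]
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

u = cg([0.0, 1.0], 2)   # unit load at the free end
```

The scatter loop is where the GPU strategies differ: assigning several threads per element and staging `ke` in shared memory changes the cost of exactly this step.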


2001 ◽  
Vol 105 (1046) ◽  
pp. 199-214 ◽  
Author(s):  
G. S. L. Goura ◽  
K. J. Badcock ◽  
M. A. Woodgate ◽  
B. E. Richards

Abstract This paper evaluates a time marching simulation method for flutter which is based on a solution of the Euler equations and a linear modal structural model. Jameson’s pseudo time method is used for the time stepping, allowing sequencing errors to be avoided without incurring additional computational cost. Transfinite interpolation of displacements is used for grid regeneration and a constant volume transformation for inter-grid interpolation. The flow pseudo steady state is calculated using an unfactored implicit method which features a Krylov subspace solution of an approximately linearised system. The spatial discretisation is made using Osher’s approximate Riemann solver with MUSCL interpolation. The method is evaluated against available results for the AGARD 445.6 wing. This wing, which is made of laminated mahogany, was tested at NASA Langley in the 1960s and has been the standard test case for simulation methods ever since. The structural model in the current work was built in NASTRAN using homogeneous plate elements. The comparisons show good agreement for the prediction of flutter boundaries. The solution method allows larger time steps to be taken than other methods.


Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the behavior of these methods varies so strongly with the matrix structure that identifying general rules to select the optimal method for a given matrix is extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
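A classifier of this kind boils down to mapping cheap structural features of the matrix to a kernel label learned from benchmarks. A toy pure-Python sketch with a nearest-neighbour rule (features, training data, and kernel names here are all synthetic illustrations, not the paper's model):

```python
# Toy version of the idea: extract simple structural features of a sparse
# matrix and predict the best SpMV kernel with a 1-nearest-neighbour rule.
import math

def features(row_nnz):
    """row_nnz: list of per-row nonzero counts."""
    n = len(row_nnz)
    mean = sum(row_nnz) / n
    var = sum((r - mean) ** 2 for r in row_nnz) / n
    return (mean, math.sqrt(var), max(row_nnz))

# (features of a matrix, best kernel observed in hypothetical benchmarks)
training = [
    (features([4, 4, 4, 4]),   "CSR-vector"),    # uniform rows
    (features([3, 3, 4, 3]),   "ELL"),           # near-uniform
    (features([1, 1, 1, 90]),  "COO"),           # one dense row
    (features([2, 50, 1, 40]), "CSR-adaptive"),  # irregular
]

def predict(row_nnz):
    f = features(row_nnz)
    best = min(training,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(f, t[0])))
    return best[1]

kernel = predict([5, 5, 5, 5])   # regular matrix -> uniform-row kernel
```

A production classifier would use richer features and a trained model, but inference stays this cheap relative to even a single SpMV benchmark run.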


2021 ◽  
Author(s):  
Samier Pierre ◽  
Raguenel Margaux ◽  
Darche Gilles

Abstract Solving the equations governing multiphase flow in geological formations involves the generation of a mesh that faithfully represents the structure of the porous medium. This challenging mesh generation task can be greatly simplified by the use of unstructured (tetrahedral) grids that conform to the complex geometric features present in the subsurface. However, running a million-cell simulation problem using an unstructured grid on a real, faulted field case remains a challenge for two main reasons. First, the workflow typically used to construct and run the simulation problems has been developed for structured grids and needs to be adapted to the unstructured case. Second, the use of unstructured grids that do not satisfy the K-orthogonality property may require advanced numerical schemes that preserve the accuracy of the results and reduce potential grid orientation effects. These two challenges are at the center of the present paper. We describe in detail the steps of our workflow to prepare and run a large-scale unstructured simulation of a real field case with faults. We perform the simulation using four different discretization schemes, including the cell-centered Two-Point and Multi-Point Flux Approximation (respectively, TPFA and MPFA) schemes, the cell- and vertex-centered Vertex Approximate Gradient (VAG) scheme, and the cell- and face-centered hybrid Mimetic Finite Difference (MFD) scheme. We compare the results in terms of accuracy, robustness, and computational cost to determine which scheme offers the best compromise for the test case considered here.
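Of the four schemes compared, TPFA is the simplest to state: the flux across a face depends only on the two adjacent cell pressures through a transmissibility built from the half-cell geometries, which is exactly why it degrades on grids that are not K-orthogonal. A minimal 1D illustration (all values arbitrary):

```python
# Two-Point Flux Approximation (TPFA) in its simplest form: the flux between
# two neighbouring cells uses only their two pressures and a transmissibility
# formed from the harmonic average of the half-cell contributions.

def half_transmissibility(k, dx, area):
    # permeability * face area / distance from cell centre to the face
    return k * area / (dx / 2.0)

def tpfa_flux(k1, k2, dx1, dx2, area, p1, p2):
    t1 = half_transmissibility(k1, dx1, area)
    t2 = half_transmissibility(k2, dx2, area)
    t = 1.0 / (1.0 / t1 + 1.0 / t2)   # harmonic combination
    return t * (p1 - p2)

# Equal unit cells, unit permeability: flux reduces to k * area / dx * dp.
q = tpfa_flux(1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0)
```

MPFA, VAG, and MFD enlarge the flux stencil precisely to remove the K-orthogonality assumption baked into this two-point formula.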


Author(s):  
Alessandra Cuneo ◽  
Alberto Traverso ◽  
Shahrokh Shahpar

In engineering design, uncertainty is inevitable and can cause a significant deviation in the performance of a system. Uncertainty in input parameters can be categorized into two groups: aleatory and epistemic uncertainty. The work presented here is focused on aleatory uncertainty, which can cause natural, unpredictable and uncontrollable variations in the performance of the system under study. Such uncertainty can be quantified using statistical methods, but the main obstacle is often the computational cost, because the representative model is typically highly non-linear and complex. Therefore, it is necessary to have a robust tool that can perform the uncertainty propagation with as few evaluations as possible. In the last few years, different methodologies for uncertainty propagation and quantification have been proposed. The focus of this study is to evaluate four different methods to demonstrate the strengths and weaknesses of each approach. The first method considered is Monte Carlo simulation, a sampling method that can give high accuracy but needs a relatively large computational effort. The second method is Polynomial Chaos, an approximation method in which the probabilistic parameters of the response function are modelled with orthogonal polynomials. The third method considered is the Mid-range Approximation Method, based on the assembly of multiple meta-models into one model to perform optimization under uncertainty. The fourth method applies the first two methods not to the model directly but to a response surface representing it, to decrease the computational cost. All these methods have been applied to a set of analytical test functions and engineering test cases. Relevant aspects of engineering design and analysis, such as a high number of stochastic variables and optimized design problems with and without stochastic design parameters, were assessed. Polynomial Chaos emerges as the most promising methodology, and was then applied to a turbomachinery test case based on a thermal analysis of a high-pressure turbine disk.
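The contrast between the first two methods can be shown on a toy model y = f(x) with x ~ N(0, 1): Monte Carlo needs many samples to beat its 1/√N error, while a Hermite polynomial chaos expansion recovers the mean and variance from a handful of quadrature evaluations. A pure-Python sketch (f and the sample sizes are illustrative, not the paper's test cases):

```python
# Monte Carlo vs. non-intrusive polynomial chaos for y = f(x), x ~ N(0, 1).
import math
import random

def f(x):
    return x * x          # stand-in for an expensive simulation; mean 1, var 2

# --- Monte Carlo: many model evaluations, ~1/sqrt(N) convergence ----------
random.seed(0)
N = 100_000
mc_mean = sum(f(random.gauss(0.0, 1.0)) for _ in range(N)) / N

# --- Polynomial chaos: project f onto Hermite polynomials He0, He1, He2 ---
# 3-point Gauss-Hermite rule (probabilists' convention, weight = N(0,1) pdf):
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
He = [lambda x: 1.0, lambda x: x, lambda x: x * x - 1.0]
norms = [1.0, 1.0, 2.0]   # E[He_k(x)^2] = k!

coeff = [sum(w * f(x) * He[k](x) for x, w in zip(nodes, weights)) / norms[k]
         for k in range(3)]

pc_mean = coeff[0]                                           # exact: 1
pc_var = sum(coeff[k] ** 2 * norms[k] for k in range(1, 3))  # exact: 2
```

Here three model evaluations give the exact moments, against 100 000 for a Monte Carlo estimate that is still only accurate to a few per mille.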


2015 ◽  
Vol 3 (1) ◽  
Author(s):  
Guoliang Xu ◽  
Xia Wang ◽  
Ming Li ◽  
Zhucui Jing

Abstract We present an efficient and reliable algorithm for determining the orientations of noisy images obtained from projections of a three-dimensional object. Based on the linear relationship among the common line vectors in one image plane, we construct a sparse matrix, and show that the coordinates of the common line vectors are the eigenvectors of the matrix with respect to the eigenvalue 1. The projection directions and in-plane rotation angles can be determined from these coordinates. A robust method for computing common lines in real space using a weighted cross-correlation function is proposed to increase the robustness of the algorithm against noise. A small number of good leading images, which have the maximal dissimilarity, are used to increase the reliability of the orientations and improve the efficiency of determining the orientations of all the images. Numerical experiments show that the proposed algorithm is effective and efficient.
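The linear-algebraic core of such an approach is extracting an eigenvector associated with eigenvalue 1 from a (sparse) matrix. A minimal pure-Python illustration using power iteration on a toy 2×2 matrix (the matrix here is not one built from images; it merely has the required eigenstructure):

```python
# Recovering the eigenvector for eigenvalue 1 by power iteration.
# A below has eigenvalue 1 with eigenvector (1, 1) and eigenvalue 0.5,
# so the iteration converges to the eigenvalue-1 eigenvector.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_iteration(A, x, iters=200):
    for _ in range(iters):
        y = matvec(A, x)
        nrm = max(abs(v) for v in y)      # max-norm scaling
        x = [v / nrm for v in y]
    return x

A = [[0.75, 0.25],
     [0.25, 0.75]]
v = power_iteration(A, [1.0, 0.0])        # converges to (1, 1)
```

For the large sparse matrices arising from many images, a sparse eigensolver would replace this dense loop, but the principle is the same.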


Geophysics ◽  
2016 ◽  
Vol 81 (3) ◽  
pp. S101-S117 ◽  
Author(s):  
Alba Ordoñez ◽  
Walter Söllner ◽  
Tilman Klüver ◽  
Leiv J. Gelius

Several studies have shown the benefits of including multiple reflections together with primaries in the structural imaging of subsurface reflectors. However, to characterize the reflector properties, there is a need to compensate for propagation effects due to multiple scattering and to properly combine the information from primaries and all orders of multiples. From this perspective and based on the wave equation and Rayleigh’s reciprocity theorem, recent works have suggested computing the subsurface image from the Green’s function reflection response (or reflectivity) by inverting a Fredholm integral equation in the frequency-space domain. By following Claerbout’s imaging principle and assuming locally reacting media, the integral equation may be reduced to a trace-by-trace deconvolution imaging condition. For a complex overburden, and considering that the structure of the subsurface is angle-dependent, this trace-by-trace deconvolution does not properly solve the Fredholm integral equation. We have inverted for the subsurface reflectivity by solving the matrix version of the Fredholm integral equation at every subsurface level, based on a multidimensional deconvolution of the receiver wavefields with the source wavefields. The total upgoing pressure and the total filtered downgoing vertical velocity were used as receiver and source wavefields, respectively. By selecting appropriate subsets of the inverted reflectivity matrix and performing an inverse Fourier transform over the frequencies, the process allowed us to obtain wavefields corresponding to virtual sources and receivers located in the subsurface at a given level. The method has been applied to two synthetic examples, showing that the computed reflectivity wavefields are free of propagation effects from the overburden and thus are suited to extracting information about the image point location in the angular and spatial domains. To reduce the computational cost, our approach is target-oriented; i.e., the reflectivity need only be computed in the area of greatest interest.
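The step that distinguishes multidimensional deconvolution from the trace-by-trace condition is a per-frequency matrix solve: the reflectivity R satisfies R D = U, with U the upgoing and D the downgoing wavefield matrices. A miniature pure-Python sketch with 2×2 complex matrices and Cramer's rule (illustrative only; the actual inversion uses regularized least squares on much larger matrices):

```python
# Per-frequency multidimensional deconvolution in miniature: solve R D = U
# for the reflectivity matrix R instead of dividing trace by trace.

def solve2(a, b):
    """Solve the 2x2 complex system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Downgoing (source-side) wavefield D and a known reflectivity at one frequency
D = [[2.0 + 1.0j, 0.5 + 0.0j],
     [0.3 + 0.0j, 1.5 - 0.5j]]
R_true = [[0.2 + 0.0j, 0.1 + 0.0j],
          [0.0 + 0.1j, 0.3 + 0.0j]]
U = matmul(R_true, D)      # "recorded" upgoing wavefield

# Recover R row by row: R[i] @ D = U[i]  <=>  D^T @ R[i]^T = U[i]^T
DT = [[D[0][0], D[1][0]], [D[0][1], D[1][1]]]
R = [solve2(DT, U[i]) for i in range(2)]
```

Repeating this solve for each frequency and each target level, then inverse Fourier transforming, yields the virtual-source wavefields described above.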


Author(s):  
A. D. Chowdhury ◽  
S. K. Bhattacharyya ◽  
C. P. Vendhan

The normal mode method is widely used in ocean acoustic propagation. Usually, finite difference and finite element methods are used in its solution. Recently, a method has been proposed for heterogeneous layered waveguides in which the depth eigenproblem is solved using the classical Rayleigh–Ritz approximation. The method has high accuracy for low- to high-frequency problems. However, the matrices that appear in the eigenvalue problem for the radial wavenumbers require numerical integration of the matrix elements, since the sound speed and density profiles are numerically defined. In this paper, a technique is proposed to reduce the computational cost of the Rayleigh–Ritz method by expanding the sound speed profile in a Fourier series using a nonlinear least squares fit, so that the integrals of the matrix elements can be computed in closed form. This technique is tested on a variety of problems and found to be sufficiently accurate in obtaining the radial wavenumbers as well as the transmission loss in a waveguide. The computational savings obtained by this approach are remarkable, with improvements of one to two orders of magnitude.
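The cost-saving idea is to replace the tabulated profile c(z) by a short cosine series whose products with the Ritz basis integrate in closed form. A minimal pure-Python illustration fitting a two-term series by linear least squares (the paper uses a nonlinear fit with more terms; depth, profile, and basis here are made-up numbers):

```python
# Fit c(z) ~ a0 + a1*cos(pi*z/H) to sampled sound speed data via the 2x2
# normal equations, so downstream matrix-element integrals become analytic.
import math

H = 100.0                                  # water depth (illustrative)
zs = [H * i / 50 for i in range(51)]       # sample depths
c_samples = [1500.0 + 20.0 * math.cos(math.pi * z / H) for z in zs]

def fit(zs, cs):
    """Least squares for the 2-term basis [1, cos(pi z/H)]."""
    phi = [[1.0, math.cos(math.pi * z / H)] for z in zs]
    A = [[sum(p[i] * p[j] for p in phi) for j in range(2)] for i in range(2)]
    b = [sum(p[i] * c for p, c in zip(phi, cs)) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

a0, a1 = fit(zs, c_samples)   # recovers 1500 and 20 up to rounding
```

Once the coefficients are known, integrals such as ∫cos(mπz/H)cos(nπz/H)dz over the fitted profile have textbook closed forms, eliminating per-element numerical quadrature.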


Author(s):  
Alexander Liefke ◽  
Peter Jaksch ◽  
Sebastian Schmitz ◽  
Vincent Marciniak ◽  
Uwe Janoske ◽  
...  

Abstract This paper shows how to use discrete CFD and FEM adjoint surface sensitivities to derive objective-based tolerances for turbine blades, instead of relying on geometric tolerances. For this purpose, a multidisciplinary adjoint evaluation tool chain is introduced to quantify the effect of real manufacturing imperfections on aerodynamic efficiency and probabilistic low cycle fatigue lifetime. Before the adjoint method is applied, a numerical validation of the CFD and FEM adjoint gradients is performed using 102 heavy-duty turbine vane scans. The results show that the relative error of the adjoint CFD gradients is below 0.5%, while the relative errors of the FEM lifetime gradients are below 5%. The adjoint assessment tool chain further reduces the computational cost by around 85% for the investigated test case compared to non-linear methods. Through the application of the presented tool chain, the definition of objective-based tolerances becomes available as a design assessment tool, making it possible to improve overall turbine efficiency and the accuracy of lifetime prediction.
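The reason adjoint sensitivities make objective-based tolerancing cheap is first-order perturbation: once the gradient g = dJ/dx of an objective J with respect to surface displacements is available, each scanned imperfection dx is assessed by a single dot product, with no additional CFD or FEM runs. A sketch with made-up numbers (the gradient values, deviations, and tolerance are purely illustrative):

```python
# Objective-based acceptance check from an adjoint gradient:
# dJ ~ g . dx turns a measured geometric deviation into a predicted
# objective shift, which is compared against a tolerance on J itself.

g = [0.8, -0.2, 0.05]      # adjoint sensitivity per surface node (made up)
dx = [0.01, 0.02, -0.10]   # scanned deviation from the nominal geometry

dJ = sum(gi * di for gi, di in zip(g, dx))   # first-order objective shift

tol = 0.01                 # allowed shift in the objective (made up)
accept = abs(dJ) <= tol    # blade passes if the predicted shift is in band
```

A purely geometric tolerance would reject any node outside a distance band; the objective-based check instead weights each deviation by how much it actually moves efficiency or lifetime.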

