Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

2019 ◽  
Vol 29 (01) ◽  
pp. 1950001 ◽  
Author(s):  
Ambra Abdullahi Hassan ◽  
Valeria Cardellini ◽  
Pasqua D’Ambra ◽  
Daniela di Serafino ◽  
Salvatore Filippone

Many scientific applications require the solution of large and sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners, because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of efficiently implementing these methods on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers that use sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.
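The preference for approximate inverses over factorizations can be seen in miniature: applying the smoother then reduces to a sparse matrix-vector product, which parallelizes well on GPUs, whereas a triangular back-substitution is inherently sequential. A minimal pure-Python sketch of one such sweep in CSR format (the 2×2 matrix and the diagonal approximate inverse are illustrative, not MLD2P4's API):

```python
# One sweep of a Jacobi-type smoother where the approximate inverse of the
# diagonal block is applied as a sparse matrix-vector product (SpMV) rather
# than a triangular solve. CSR storage; pure Python for illustration only.

def spmv(n, indptr, indices, data, x):
    """y = A @ x for a CSR matrix A."""
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y

# A = [[4, 1], [1, 3]] in CSR form.
n = 2
indptr, indices, data = [0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0]

# Crude sparse approximate inverse: here just 1/diag(A), also stored in CSR.
m_indptr, m_indices, m_data = [0, 1, 2], [0, 1], [0.25, 1.0 / 3.0]

b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(25):  # smoother sweeps (normally only 1-2 inside AMG)
    r = [bi - yi for bi, yi in zip(b, spmv(n, indptr, indices, data, x))]
    dx = spmv(n, m_indptr, m_indices, m_data, r)
    x = [xi + di for xi, di in zip(x, dx)]
```

Each sweep costs two SpMVs, both of which map naturally onto GPU threads; no sequential dependency between rows is introduced.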

Author(s):  
Shashi Kant Ratnakar ◽  
Subhajit Sanfui ◽  
Deepak Sharma

Abstract Topology optimization has been successful in generating optimal topologies of various structures arising in real-world applications. Since these applications can have complex and large domains, topology optimization suffers from a high computational cost because of the use of unstructured meshes for the discretization of these domains and their finite element analysis (FEA). This paper addresses this challenge by developing three GPU-based element-by-element strategies targeting unstructured all-hexahedral meshes for the matrix-free preconditioned conjugate gradient (PCG) finite element solver. These strategies mainly perform the sparse matrix-vector multiplication (SpMV) arising in the FEA solver by allocating more GPU compute threads per element. Moreover, the strategies are designed to use the GPU's shared memory for efficient memory transactions. The proposed strategies are tested with the solid isotropic material with penalization (SIMP) method on four examples of 3D structural topology optimization. Results demonstrate that the proposed strategies achieve speedups of up to 8.2× over standard GPU-based SpMV strategies from the literature.
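The matrix-free, element-by-element idea can be sketched compactly: the global stiffness matrix is never assembled, and each A·x inside CG is computed by scattering element contributions (on a GPU, one or more threads per element). A toy pure-Python version with two 1D bar elements of unit stiffness (the mesh and boundary conditions are made up for illustration):

```python
# Matrix-free CG: A @ x is evaluated element by element instead of from an
# assembled global matrix. Two 1D bar elements, ke = [[1, -1], [-1, 1]],
# node 0 fixed. dof[] maps mesh nodes to free-dof indices (None = fixed).
dof = [None, 0, 1]
elements = [(0, 1), (1, 2)]

def matvec(x):
    y = [0.0] * len(x)
    for na, nb in elements:                 # scatter element contributions
        a, b = dof[na], dof[nb]
        ua = x[a] if a is not None else 0.0
        ub = x[b] if b is not None else 0.0
        if a is not None:
            y[a] += ua - ub
        if b is not None:
            y[b] += ub - ua
    return y

def cg(rhs, n, iters=50, tol=1e-12):
    x, r = [0.0] * n, rhs[:]
    p = r[:]
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

u = cg([0.0, 1.0], 2)   # unit load at the free end
```

The scatter loop is where the GPU strategies differ: assigning several threads per element and staging `ke` in shared memory changes the cost of exactly this step.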


2001 ◽  
Vol 105 (1046) ◽  
pp. 199-214 ◽  
Author(s):  
G. S. L. Goura ◽  
K. J. Badcock ◽  
M. A. Woodgate ◽  
B. E. Richards

Abstract This paper evaluates a time marching simulation method for flutter which is based on a solution of the Euler equations and a linear modal structural model. Jameson’s pseudo time method is used for the time stepping, allowing sequencing errors to be avoided without incurring additional computational cost. Transfinite interpolation of displacements is used for grid regeneration and a constant volume transformation for inter-grid interpolation. The flow pseudo steady state is calculated using an unfactored implicit method which features a Krylov subspace solution of an approximately linearised system. The spatial discretisation is made using Osher’s approximate Riemann solver with MUSCL interpolation. The method is evaluated against available results for the AGARD 445.6 wing. This wing, which is made of laminated mahogany, was tested at NASA Langley in the 1960s and has been the standard test case for simulation methods ever since. The structural model in the current work was built in NASTRAN using homogeneous plate elements. The comparisons show good agreement for the prediction of flutter boundaries. The solution method allows larger time steps to be taken than other methods.


Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the behavior of these methods varies so strongly with the matrix structure that identifying general rules to select the optimal method for a given matrix is extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
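A classifier of this kind boils down to mapping cheap structural features of the matrix to a kernel label learned from benchmarks. A toy pure-Python sketch with a nearest-neighbour rule (features, training data, and kernel names here are all synthetic illustrations, not the paper's model):

```python
# Toy version of the idea: extract simple structural features of a sparse
# matrix and predict the best SpMV kernel with a 1-nearest-neighbour rule.
import math

def features(row_nnz):
    """row_nnz: list of per-row nonzero counts."""
    n = len(row_nnz)
    mean = sum(row_nnz) / n
    var = sum((r - mean) ** 2 for r in row_nnz) / n
    return (mean, math.sqrt(var), max(row_nnz))

# (features of a matrix, best kernel observed in hypothetical benchmarks)
training = [
    (features([4, 4, 4, 4]),   "CSR-vector"),    # uniform rows
    (features([3, 3, 4, 3]),   "ELL"),           # near-uniform
    (features([1, 1, 1, 90]),  "COO"),           # one dense row
    (features([2, 50, 1, 40]), "CSR-adaptive"),  # irregular
]

def predict(row_nnz):
    f = features(row_nnz)
    best = min(training,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(f, t[0])))
    return best[1]

kernel = predict([5, 5, 5, 5])   # regular matrix -> uniform-row kernel
```

A production classifier would use richer features and a trained model, but inference stays this cheap relative to even a single SpMV benchmark run.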


2021 ◽  
Author(s):  
Samier Pierre ◽  
Raguenel Margaux ◽  
Darche Gilles

Abstract Solving the equations governing multiphase flow in geological formations involves the generation of a mesh that faithfully represents the structure of the porous medium. This challenging mesh generation task can be greatly simplified by the use of unstructured (tetrahedral) grids that conform to the complex geometric features present in the subsurface. However, running a million-cell simulation problem using an unstructured grid on a real, faulted field case remains a challenge for two main reasons. First, the workflow typically used to construct and run the simulation problems has been developed for structured grids and needs to be adapted to the unstructured case. Second, the use of unstructured grids that do not satisfy the K-orthogonality property may require advanced numerical schemes that preserve the accuracy of the results and reduce potential grid orientation effects. These two challenges are at the center of the present paper. We describe in detail the steps of our workflow to prepare and run a large-scale unstructured simulation of a real field case with faults. We perform the simulation using four different discretization schemes, including the cell-centered Two-Point and Multi-Point Flux Approximation (respectively, TPFA and MPFA) schemes, the cell- and vertex-centered Vertex Approximate Gradient (VAG) scheme, and the cell- and face-centered hybrid Mimetic Finite Difference (MFD) scheme. We compare the results in terms of accuracy, robustness, and computational cost to determine which scheme offers the best compromise for the test case considered here.
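Of the four schemes compared, TPFA is the simplest to state: the flux across a face depends only on the two adjacent cell pressures through a transmissibility built from the half-cell geometries, which is exactly why it degrades on grids that are not K-orthogonal. A minimal 1D illustration (all values arbitrary):

```python
# Two-Point Flux Approximation (TPFA) in its simplest form: the flux between
# two neighbouring cells uses only their two pressures and a transmissibility
# formed from the harmonic average of the half-cell contributions.

def half_transmissibility(k, dx, area):
    # permeability * face area / distance from cell centre to the face
    return k * area / (dx / 2.0)

def tpfa_flux(k1, k2, dx1, dx2, area, p1, p2):
    t1 = half_transmissibility(k1, dx1, area)
    t2 = half_transmissibility(k2, dx2, area)
    t = 1.0 / (1.0 / t1 + 1.0 / t2)   # harmonic combination
    return t * (p1 - p2)

# Equal unit cells, unit permeability: flux reduces to k * area / dx * dp.
q = tpfa_flux(1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0)
```

MPFA, VAG, and MFD enlarge the flux stencil precisely to remove the K-orthogonality assumption baked into this two-point formula.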


Author(s):  
Alessandra Cuneo ◽  
Alberto Traverso ◽  
Shahrokh Shahpar

In engineering design, uncertainty is inevitable and can cause a significant deviation in the performance of a system. Uncertainty in input parameters can be categorized into two groups: aleatory and epistemic uncertainty. The work presented here is focused on aleatory uncertainty, which can cause natural, unpredictable and uncontrollable variations in the performance of the system under study. Such uncertainty can be quantified using statistical methods, but the main obstacle is often the computational cost, because the representative model is typically highly non-linear and complex. Therefore, it is necessary to have a robust tool that can perform the uncertainty propagation with as few evaluations as possible. In the last few years, different methodologies for uncertainty propagation and quantification have been proposed. The focus of this study is to evaluate four different methods to demonstrate the strengths and weaknesses of each approach. The first method considered is Monte Carlo simulation, a sampling method that can give high accuracy but needs a relatively large computational effort. The second method is Polynomial Chaos, an approximation method in which the probabilistic parameters of the response function are modelled with orthogonal polynomials. The third method considered is the Mid-range Approximation Method, based on the assembly of multiple meta-models into one model to perform optimization under uncertainty. The fourth method applies the first two methods not to the model directly but to a response surface representing it, to decrease the computational cost. All these methods have been applied to a set of analytical test functions and engineering test cases. Relevant aspects of engineering design and analysis, such as a high number of stochastic variables and optimized design problems with and without stochastic design parameters, were assessed. Polynomial Chaos emerges as the most promising methodology, and was then applied to a turbomachinery test case based on a thermal analysis of a high-pressure turbine disk.
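The contrast between the first two methods can be shown on a toy model y = f(x) with x ~ N(0, 1): Monte Carlo needs many samples to beat its 1/√N error, while a Hermite polynomial chaos expansion recovers the mean and variance from a handful of quadrature evaluations. A pure-Python sketch (f and the sample sizes are illustrative, not the paper's test cases):

```python
# Monte Carlo vs. non-intrusive polynomial chaos for y = f(x), x ~ N(0, 1).
import math
import random

def f(x):
    return x * x          # stand-in for an expensive simulation; mean 1, var 2

# --- Monte Carlo: many model evaluations, ~1/sqrt(N) convergence ----------
random.seed(0)
N = 100_000
mc_mean = sum(f(random.gauss(0.0, 1.0)) for _ in range(N)) / N

# --- Polynomial chaos: project f onto Hermite polynomials He0, He1, He2 ---
# 3-point Gauss-Hermite rule (probabilists' convention, weight = N(0,1) pdf):
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
weights = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]
He = [lambda x: 1.0, lambda x: x, lambda x: x * x - 1.0]
norms = [1.0, 1.0, 2.0]   # E[He_k(x)^2] = k!

coeff = [sum(w * f(x) * He[k](x) for x, w in zip(nodes, weights)) / norms[k]
         for k in range(3)]

pc_mean = coeff[0]                                           # exact: 1
pc_var = sum(coeff[k] ** 2 * norms[k] for k in range(1, 3))  # exact: 2
```

Here three model evaluations give the exact moments, against 100 000 for a Monte Carlo estimate that is still only accurate to a few per mille.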


2015 ◽  
Vol 3 (1) ◽  
Author(s):  
Guoliang Xu ◽  
Xia Wang ◽  
Ming Li ◽  
Zhucui Jing

Abstract We present an efficient and reliable algorithm for determining the orientations of noisy images obtained from projections of a three-dimensional object. Based on the linear relationship among the common line vectors in one image plane, we construct a sparse matrix, and show that the coordinates of the common line vectors are the eigenvectors of the matrix with respect to the eigenvalue 1. The projection directions and in-plane rotation angles can be determined from these coordinates. A robust method for computing common lines in real space using a weighted cross-correlation function is proposed to increase the robustness of the algorithm against noise. A small number of good leading images, which have the maximal dissimilarity, are used to increase the reliability of the orientations and improve the efficiency of determining the orientations of all the images. Numerical experiments show that the proposed algorithm is effective and efficient.
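The linear-algebraic core of such an approach is extracting an eigenvector associated with eigenvalue 1 from a (sparse) matrix. A minimal pure-Python illustration using power iteration on a toy 2×2 matrix (the matrix here is not one built from images; it merely has the required eigenstructure):

```python
# Recovering the eigenvector for eigenvalue 1 by power iteration.
# A below has eigenvalue 1 with eigenvector (1, 1) and eigenvalue 0.5,
# so the iteration converges to the eigenvalue-1 eigenvector.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def power_iteration(A, x, iters=200):
    for _ in range(iters):
        y = matvec(A, x)
        nrm = max(abs(v) for v in y)      # max-norm scaling
        x = [v / nrm for v in y]
    return x

A = [[0.75, 0.25],
     [0.25, 0.75]]
v = power_iteration(A, [1.0, 0.0])        # converges to (1, 1)
```

For the large sparse matrices arising from many images, a sparse eigensolver would replace this dense loop, but the principle is the same.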


Geophysics ◽  
2016 ◽  
Vol 81 (3) ◽  
pp. S101-S117 ◽  
Author(s):  
Alba Ordoñez ◽  
Walter Söllner ◽  
Tilman Klüver ◽  
Leiv J. Gelius

Several studies have shown the benefits of including multiple reflections together with primaries in the structural imaging of subsurface reflectors. However, to characterize the reflector properties, there is a need to compensate for propagation effects due to multiple scattering and to properly combine the information from primaries and all orders of multiples. From this perspective and based on the wave equation and Rayleigh’s reciprocity theorem, recent works have suggested computing the subsurface image from the Green’s function reflection response (or reflectivity) by inverting a Fredholm integral equation in the frequency-space domain. By following Claerbout’s imaging principle and assuming locally reacting media, the integral equation may be reduced to a trace-by-trace deconvolution imaging condition. For a complex overburden, and considering that the structure of the subsurface is angle-dependent, this trace-by-trace deconvolution does not properly solve the Fredholm integral equation. We have inverted for the subsurface reflectivity by solving the matrix version of the Fredholm integral equation at every subsurface level, based on a multidimensional deconvolution of the receiver wavefields with the source wavefields. The total upgoing pressure and the total filtered downgoing vertical velocity were used as receiver and source wavefields, respectively. By selecting appropriate subsets of the inverted reflectivity matrix and performing an inverse Fourier transform over the frequencies, the process allowed us to obtain wavefields corresponding to virtual sources and receivers located in the subsurface at a given level. The method has been applied to two synthetic examples, showing that the computed reflectivity wavefields are free of propagation effects from the overburden and thus are suited to extracting information about the image point location in the angular and spatial domains. To reduce the computational cost, our approach is target-oriented; i.e., the reflectivity need only be computed in the area of greatest interest.
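The step that distinguishes multidimensional deconvolution from the trace-by-trace condition is a per-frequency matrix solve: the reflectivity R satisfies R D = U, with U the upgoing and D the downgoing wavefield matrices. A miniature pure-Python sketch with 2×2 complex matrices and Cramer's rule (illustrative only; the actual inversion uses regularized least squares on much larger matrices):

```python
# Per-frequency multidimensional deconvolution in miniature: solve R D = U
# for the reflectivity matrix R instead of dividing trace by trace.

def solve2(a, b):
    """Solve the 2x2 complex system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Downgoing (source-side) wavefield D and a known reflectivity at one frequency
D = [[2.0 + 1.0j, 0.5 + 0.0j],
     [0.3 + 0.0j, 1.5 - 0.5j]]
R_true = [[0.2 + 0.0j, 0.1 + 0.0j],
          [0.0 + 0.1j, 0.3 + 0.0j]]
U = matmul(R_true, D)      # "recorded" upgoing wavefield

# Recover R row by row: R[i] @ D = U[i]  <=>  D^T @ R[i]^T = U[i]^T
DT = [[D[0][0], D[1][0]], [D[0][1], D[1][1]]]
R = [solve2(DT, U[i]) for i in range(2)]
```

Repeating this solve for each frequency and each target level, then inverse Fourier transforming, yields the virtual-source wavefields described above.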


Author(s):  
A. D. Chowdhury ◽  
S. K. Bhattacharyya ◽  
C. P. Vendhan

The normal mode method is widely used in ocean acoustic propagation. Usually, finite difference and finite element methods are used in its solution. Recently, a method has been proposed for heterogeneous layered waveguides in which the depth eigenproblem is solved using the classical Rayleigh–Ritz approximation. The method has high accuracy for low- to high-frequency problems. However, the matrices that appear in the eigenvalue problem for the radial wavenumbers require numerical integration of the matrix elements, since the sound speed and density profiles are numerically defined. In this paper, a technique is proposed to reduce the computational cost of the Rayleigh–Ritz method by expanding the sound speed profile in a Fourier series using a nonlinear least squares fit, so that the integrals of the matrix elements can be computed in closed form. This technique is tested on a variety of problems and found to be sufficiently accurate in obtaining the radial wavenumbers as well as the transmission loss in a waveguide. The computational savings obtained by this approach are remarkable, with improvements of one to two orders of magnitude.
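The cost-saving idea is to replace the tabulated profile c(z) by a short cosine series whose products with the Ritz basis integrate in closed form. A minimal pure-Python illustration fitting a two-term series by linear least squares (the paper uses a nonlinear fit with more terms; depth, profile, and basis here are made-up numbers):

```python
# Fit c(z) ~ a0 + a1*cos(pi*z/H) to sampled sound speed data via the 2x2
# normal equations, so downstream matrix-element integrals become analytic.
import math

H = 100.0                                  # water depth (illustrative)
zs = [H * i / 50 for i in range(51)]       # sample depths
c_samples = [1500.0 + 20.0 * math.cos(math.pi * z / H) for z in zs]

def fit(zs, cs):
    """Least squares for the 2-term basis [1, cos(pi z/H)]."""
    phi = [[1.0, math.cos(math.pi * z / H)] for z in zs]
    A = [[sum(p[i] * p[j] for p in phi) for j in range(2)] for i in range(2)]
    b = [sum(p[i] * c for p, c in zip(phi, cs)) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

a0, a1 = fit(zs, c_samples)   # recovers 1500 and 20 up to rounding
```

Once the coefficients are known, integrals such as ∫cos(mπz/H)cos(nπz/H)dz over the fitted profile have textbook closed forms, eliminating per-element numerical quadrature.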


Author(s):  
Alexander Liefke ◽  
Peter Jaksch ◽  
Sebastian Schmitz ◽  
Vincent Marciniak ◽  
Uwe Janoske ◽  
...  

Abstract This paper shows how to use discrete CFD and FEM adjoint surface sensitivities to derive objective-based tolerances for turbine blades, instead of relying on geometric tolerances. For this purpose, a multidisciplinary adjoint evaluation tool chain is introduced to quantify the effect of real manufacturing imperfections on aerodynamic efficiency and probabilistic low cycle fatigue lifetime. Before the adjoint method is applied, a numerical validation of the CFD and FEM adjoint gradients is performed using 102 heavy-duty turbine vane scans. The results show that the relative error of the adjoint CFD gradients is below 0.5%, while the relative errors of the FEM lifetime gradients are below 5%. The adjoint assessment tool chain further reduces the computational cost by around 85% for the investigated test case compared to non-linear methods. Through the application of the presented tool chain, the definition of objective-based tolerances becomes available as a design assessment tool, making it possible to improve overall turbine efficiency and the accuracy of lifetime prediction.
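The reason adjoint sensitivities make objective-based tolerancing cheap is first-order perturbation: once the gradient g = dJ/dx of an objective J with respect to surface displacements is available, each scanned imperfection dx is assessed by a single dot product, with no additional CFD or FEM runs. A sketch with made-up numbers (the gradient values, deviations, and tolerance are purely illustrative):

```python
# Objective-based acceptance check from an adjoint gradient:
# dJ ~ g . dx turns a measured geometric deviation into a predicted
# objective shift, which is compared against a tolerance on J itself.

g = [0.8, -0.2, 0.05]      # adjoint sensitivity per surface node (made up)
dx = [0.01, 0.02, -0.10]   # scanned deviation from the nominal geometry

dJ = sum(gi * di for gi, di in zip(g, dx))   # first-order objective shift

tol = 0.01                 # allowed shift in the objective (made up)
accept = abs(dJ) <= tol    # blade passes if the predicted shift is in band
```

A purely geometric tolerance would reject any node outside a distance band; the objective-based check instead weights each deviation by how much it actually moves efficiency or lifetime.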

