sparse linear system
Recently Published Documents

TOTAL DOCUMENTS: 52 (five years: 11)
H-INDEX: 9 (five years: 0)

2021
Author(s): Giorgio Micaletto, Ivano Barletta, Silvia Mocavero, Ivan Federico, Italo Epicoco, ...

Abstract. This paper presents the MPI-based parallelization of the three-dimensional hydrodynamic model SHYFEM (System of HydrodYnamic Finite Element Modules). The original sequential version of the code was parallelized to reduce the execution time of high-resolution configurations on state-of-the-art HPC systems. A distributed-memory approach based on the Message Passing Interface (MPI) was adopted. Optimized numerical libraries were used to partition the unstructured grid (with a focus on load balancing) and to solve the sparse linear system of equations in parallel in the case of semi-to-fully implicit time stepping. The parallel implementation of the model was validated by comparing its outputs with those of the sequential version. The performance assessment demonstrates a good level of scalability with a realistic configuration used as a benchmark.
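To illustrate why semi-to-fully implicit time stepping leads to a sparse linear solve at every step, here is a minimal sketch, assuming a 1D diffusion surrogate in place of SHYFEM's actual finite-element system (the grid size, time step, and initial state below are illustrative, not from the paper):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative 1D diffusion surrogate (not SHYFEM code): a
# semi-implicit step solves (I - dt*L) u^{n+1} = u^n, where L is a
# sparse discrete Laplacian, so each time step is a sparse solve.
n, dt = 100, 0.01
L = sp.diags([np.ones(n - 1), -2.0 * np.ones(n), np.ones(n - 1)],
             [-1, 0, 1], format="csr")
A = sp.eye(n, format="csr") - dt * L      # sparse system matrix
u = np.sin(np.linspace(0.0, np.pi, n))    # initial state

for _ in range(10):                       # a few implicit time steps
    u = spla.spsolve(A, u)                # the per-step sparse solve
```

In the parallel setting, this per-step solve is exactly the operation that the optimized numerical libraries distribute across MPI ranks after the grid partitioning.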


2021
Vol 54 (7), pp. 298-303
Author(s): S.M. Fosson, V. Cerone, D. Regruto, T. Abdalla

2020
Vol 26 (1), pp. 2
Author(s): Zhuo-Jia Fu, Lu-Feng Li, De-Shun Yin, Li-Li Yuan

In this paper, we introduce a novel localized collocation solver for two-dimensional (2D) phononic crystal analysis. In the proposed collocation solver, the displacement at each node is expressed as a linear combination of T-complete functions in each stencil support, and the sparse linear system is obtained by satisfying the governing equation at interior nodes and the boundary conditions at boundary nodes. By comparison with finite element method (FEM) results and analytical solutions, the efficiency and accuracy of the proposed localized collocation solver are verified on a benchmark example. The method is then applied to 2D phononic crystals with various lattice forms and scatterer shapes, for which the band structures, transmission spectra, and displacement amplitude distributions are calculated and compared with the FEM.
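The assembly pattern described above (governing-equation rows at interior nodes, boundary-condition rows at boundary nodes) can be sketched as follows. This is an illustration only: the paper uses T-complete basis functions per stencil, whereas here a 1D Helmholtz finite-difference stencil stands in, with illustrative boundary data chosen so the exact solution is sin(kx):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Governing-equation rows at interior nodes, boundary-condition rows
# at boundary nodes; a 1D Helmholtz stencil (u'' + k^2 u = 0) stands
# in for the paper's T-complete collocation basis.
n, k = 21, 2.0
h = 1.0 / (n - 1)
rows, cols, vals = [], [], []
rhs = np.zeros(n)

for i in range(n):
    if i in (0, n - 1):                         # boundary node: u = g
        rows.append(i); cols.append(i); vals.append(1.0)
        rhs[i] = 0.0 if i == 0 else np.sin(k)   # data for u = sin(k x)
    else:                                       # interior: u'' + k^2 u = 0
        for j, c in ((i - 1, 1.0 / h**2),
                     (i, -2.0 / h**2 + k**2),
                     (i + 1, 1.0 / h**2)):
            rows.append(i); cols.append(j); vals.append(c)

A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
u = spla.spsolve(A, rhs)                        # the sparse linear system
```

Each row of A touches only the nodes in one stencil, which is what keeps the collocation system sparse.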


Author(s): Tieqiang Mo, Renfa Li

With new architectures and new programming paradigms such as task-based scheduling emerging in parallel high-performance computing, it is important to exploit these features when tuning monolithic computing codes. In this article, the classical conjugate gradient algorithm for the sparse linear system Ax = b in a Krylov subspace is pipelined to execute interdependent tasks on the Parallel Runtime Scheduling and Execution Controller (PaRSEC) runtime. First, the sparse matrix A is split by rows to expose more coarse-grained parallelism. Second, the partitioned sub-vectors are not assembled into one full vector in RAM before running the sparse matrix–vector product (SpMV) operations, which eliminates that communication overhead. Moreover, in the SpMV computation, if all elements of one column of a split sub-matrix are zero, the corresponding product operations can be removed by reorganizing the sub-vectors. Finally, the latency of migrating sub-vectors is partially overlapped with the SpMV operations through a further column-wise split of the sparse matrix on GPUs. In experiments, a series of tests demonstrates that near-optimal speedup and higher pipelining efficiency are achieved for the pipelined task scheduling on the PaRSEC runtime. Fusing SpMV concurrency with dot-product pipelining achieves even higher speedup and efficiency.
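The row-wise split and the skipping of all-zero columns can be sketched serially as below; this is a minimal sketch of the data decomposition only, not the PaRSEC task implementation, and the matrix size, density, and block count are illustrative:

```python
import numpy as np
import scipy.sparse as sp

# Each "task" owns a block of rows of A and computes its slice of
# y = A @ x independently; within a block, columns that are entirely
# zero are skipped, mirroring the sub-vector reorganization.
rng = np.random.default_rng(0)
n, nblocks = 200, 4
A = sp.random(n, n, density=0.02, format="csr", random_state=0)
x = rng.standard_normal(n)
bounds = np.linspace(0, n, nblocks + 1, dtype=int)

y = np.empty(n)
for b in range(nblocks):                       # one task per row block
    lo, hi = bounds[b], bounds[b + 1]
    Ab = A[lo:hi].tocsc()
    used = np.flatnonzero(np.diff(Ab.indptr))  # columns holding nonzeros
    y[lo:hi] = Ab[:, used] @ x[used]           # skip all-zero columns
```

Because each block only reads `x[used]`, only those sub-vector entries need to be communicated to the task, which is the source of the savings described in the abstract.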


Author(s): Nur Afza Mat Ali, Rostang Rahman, Jumat Sulaiman, Khadizah Ghazali

The similarity method is used to find solutions of a partial differential equation (PDE) by reduction to a corresponding ordinary differential equation (ODE) that is not easily integrable in terms of elementary or tabulated functions. Then, the Half-Sweep Successive Over-Relaxation (HSSOR) iterative method is applied to solve the sparse linear system generated by discretizing the corresponding second-order ODEs with Dirichlet boundary conditions. These ODEs are constructed from one-dimensional reaction-diffusion equations by a wave-variable transformation. Given the resulting large and sparse linear system, we analyze the performance of three iterative methods, namely Full-Sweep Gauss-Seidel (FSGS), Full-Sweep Successive Over-Relaxation (FSSOR), and HSSOR, to compare their computational cost. Four example problems were tested to observe the performance of the proposed iterative methods. Throughout the numerical experiments, three measures were considered: number of iterations, execution time, and maximum absolute error. According to the numerical results, HSSOR is the most efficient iterative method for the proposed problem, requiring the fewest iterations and the least execution time, followed by FSSOR and FSGS.
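A minimal full-sweep SOR sketch (FSSOR-style) for the tridiagonal systems that arise from such second-order ODEs with Dirichlet boundary conditions is shown below; the half-sweep variant would update only every other node. The test problem -u'' = π² sin(πx), u(0) = u(1) = 0 (exact solution sin(πx)) and the relaxation factor ω = 1.8 are illustrative choices, not from the paper:

```python
import numpy as np

# Full-sweep SOR on the tridiagonal system from -u'' = f with
# Dirichlet boundary conditions: u_i = (h^2 f_i + u_{i-1} + u_{i+1})/2
# is the Gauss-Seidel value, relaxed by omega.
n, omega, sweeps = 50, 1.8, 500
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)            # interior nodes
f = np.pi**2 * np.sin(np.pi * x)
u = np.zeros(n)

for _ in range(sweeps):
    for i in range(n):                    # one full forward sweep
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        gs = 0.5 * (h * h * f[i] + left + right)
        u[i] = (1.0 - omega) * u[i] + omega * gs
```

With ω = 1 this reduces to Gauss-Seidel (FSGS); the half-sweep method gains by performing the same relaxation on roughly half the nodes per sweep.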


Author(s): Sébastien Cayrols, Iain S Duff, Florent Lopez

We describe the parallelization of the solve phase in the sparse Cholesky solver SpLLT when using a sequential task flow model. In the context of direct methods, the solution of a sparse linear system is achieved through three main phases: the analyse, the factorization and the solve phases. In the last two phases, which involve numerical computation, the factorization corresponds to the most computationally costly phase, and it is therefore crucial to parallelize this phase in order to reduce the time-to-solution on modern architectures. As a consequence, the solve phase is often not as optimized as the factorization in state-of-the-art solvers, and opportunities for parallelism are often not exploited in this phase. However, in some applications, the time spent in the solve phase is comparable to or even greater than the time for the factorization, and the user could dramatically benefit from a faster solve routine. This is the case, for example, for a conjugate gradient (CG) solver using a block Jacobi preconditioner. The diagonal blocks are factorized once only, but their factors are used to solve subsystems at each CG iteration. In this study, we design and implement a parallel version of a task-based solve routine for an OpenMP version of the SpLLT solver. We show that we can obtain good scalability on a multicore architecture enabling a dramatic reduction of the overall time-to-solution in some applications.
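The block Jacobi use case described above can be sketched as follows: the diagonal blocks are factorized once (dense Cholesky here for simplicity), and every CG iteration reuses those factors through triangular solves. The sizes and the SPD test matrix are illustrative, not taken from SpLLT:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla
from scipy.linalg import cho_factor, cho_solve

# Factorize the diagonal blocks once; apply them as a block Jacobi
# preconditioner at every CG iteration via triangular solves.
n, bs = 120, 30
A = sp.diags([-np.ones(n - 1), 2.5 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")
b = np.ones(n)

factors = [cho_factor(A[lo:lo + bs, lo:lo + bs].toarray())
           for lo in range(0, n, bs)]        # factorize once only

def apply_block_jacobi(r):
    # each CG iteration reuses the factors: two triangular solves
    z = np.empty_like(r)
    for k, lo in enumerate(range(0, n, bs)):
        z[lo:lo + bs] = cho_solve(factors[k], r[lo:lo + bs])
    return z

M = spla.LinearOperator((n, n), matvec=apply_block_jacobi)
u, info = spla.cg(A, b, M=M)
```

Since the factorization cost is paid once while the solve cost recurs every iteration, speeding up the solve phase, as the paper does for SpLLT, directly reduces the total time-to-solution of such preconditioned CG runs.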

