Parallelization of the solve phase in a task-based Cholesky solver using a sequential task flow model

We describe the parallelization of the solve phase in the sparse Cholesky solver SpLLT using a sequential task flow model. In direct methods, a sparse linear system is solved in three main phases: analysis, factorization, and solve. Of the last two phases, which involve numerical computation, the factorization is the most computationally costly, so parallelizing it is crucial to reducing the time-to-solution on modern architectures. As a consequence, the solve phase in state-of-the-art solvers is often less optimized than the factorization, and its opportunities for parallelism often go unexploited. However, in some applications the time spent in the solve phase is comparable to, or even greater than, the factorization time, and the user could benefit dramatically from a faster solve routine. This is the case, for example, for a conjugate gradient (CG) solver using a block Jacobi preconditioner: the diagonal blocks are factorized only once, but their factors are used to solve subsystems at every CG iteration. In this study, we design and implement a parallel, task-based solve routine for an OpenMP version of the SpLLT solver. We show that it achieves good scalability on a multicore architecture, enabling a dramatic reduction of the overall time-to-solution in some applications.
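The block Jacobi example in the abstract above can be made concrete with a small sketch. The following is a minimal, sequential, dense illustration in plain Python, not SpLLT's API or its task-based implementation; the function names (`cholesky`, `chol_solve`, `block_jacobi_pcg`) and the dense list-of-lists storage are assumptions made only for illustration. It shows why the solve phase can dominate: each diagonal block is factorized once, while the triangular solves are repeated at every CG iteration.

```python
import math

def cholesky(A):
    """Factorization phase: return lower-triangular L with A = L L^T (A must be SPD)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve phase: forward substitution (L y = b), then backward substitution (L^T x = y)."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def block_jacobi_pcg(A, b, block_size, tol=1e-12, max_iter=200):
    """Preconditioned CG; returns (x, number of blocks, preconditioner applications)."""
    n = len(A)
    # Each diagonal block is factorized exactly once, before the iteration starts.
    blocks = []
    for s in range(0, n, block_size):
        e = min(s + block_size, n)
        blocks.append((s, e, cholesky([row[s:e] for row in A[s:e]])))

    def precondition(r):
        # One solve per block; the block solves are independent of each other.
        z = [0.0] * n
        for s, e, L in blocks:
            z[s:e] = chol_solve(L, r[s:e])
        return z

    x, r = [0.0] * n, list(b)
    z = precondition(r)
    p, rz, applications = list(z), dot(r, z), 1
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol:
            break
        Ap = [dot(row, p) for row in A]
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = precondition(r)  # the factors are reused at every iteration
        applications += 1
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, len(blocks), applications
```

In this sketch the factorization cost is paid once per block, whereas `chol_solve` runs once per block per iteration, which is the situation the abstract describes as motivating a faster solve routine.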

Coupling multi-level component interfaces for parallel sparse linear system solvers

Proceedings of the 2009 Workshop on Component-Based High Performance Computing - CBHPC '09 ◽

10.1145/1687774.1687779 ◽

2009 ◽

Author(s):

Fang Liu ◽

Masha Sosonkina ◽

Dane Coffey

Keyword(s):

Linear System ◽

Sparse Linear System ◽

Multi Level

Numerical Evaluations of Parallelization Efficiencies of Communication Avoiding Krylov Subspace Method for Large Sparse Linear System

International Conference on Computational & Experimental Engineering and Sciences ◽

10.32604/icces.2019.05496 ◽

2019 ◽

Vol 21 (2) ◽

pp. 43-43

Author(s):

Akira Matsumoto ◽

Taku Itoh ◽

Soichiro Ikuno

Keyword(s):

Linear System ◽

Krylov Subspace ◽

Krylov Subspace Method ◽

Subspace Method ◽

Sparse Linear System

Solutions of reaction-diffusion equations using similarity reduction and HSSOR iteration

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i3.pp1430-1438 ◽

2019 ◽

Vol 16 (3) ◽

pp. 1430

Author(s):

Nur Afza Mat Ali ◽

Rostang Rahman ◽

Jumat Sulaiman ◽

Khadizah Ghazali

Keyword(s):

Differential Equation ◽

Linear System ◽

Iterative Method ◽

Iterative Methods ◽

Reaction Diffusion ◽

Reaction Diffusion Equations ◽

Diffusion Equations ◽

Sparse Linear System ◽

Successive Over Relaxation ◽

Number Of Iterations

The similarity method is used to reduce a partial differential equation (PDE) to a corresponding ordinary differential equation (ODE) that is not easily integrable in terms of elementary or tabulated functions. The Half-Sweep Successive Over-Relaxation (HSSOR) iterative method is then applied to solve the sparse linear system generated by discretizing the corresponding second-order ODEs with Dirichlet boundary conditions. These ODEs are constructed from one-dimensional reaction-diffusion equations by a wave variable transformation. Because the linear system is large and sparse, we analyze the performance of three iterative methods, Full-Sweep Gauss-Seidel (FSGS), Full-Sweep Successive Over-Relaxation (FSSOR), and HSSOR, to compare their computational cost. Four example problems were tested to observe the performance of the proposed iterative methods. In the numerical experiments, three measures were considered: number of iterations, execution time, and maximum absolute error. According to the numerical results, HSSOR is the most efficient of the three methods for the proposed problem, requiring the fewest iterations and the shortest execution time, followed by FSSOR and FSGS.
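To illustrate the kind of comparison the abstract reports, here is a minimal sketch of full-sweep Gauss-Seidel and full-sweep SOR on a model problem. Everything specific in it is an assumption chosen for illustration: the model ODE -u'' = f on (0, 1) with homogeneous Dirichlet conditions stands in for the paper's reaction-diffusion ODEs, the function name `sor_poisson_1d` is hypothetical, and the half-sweep variant (HSSOR) is not reproduced here.

```python
import math

def sor_poisson_1d(n, f, omega, tol=1e-10, max_iter=100000):
    """Solve -u'' = f on (0, 1), u(0) = u(1) = 0, with n interior grid points.

    Uses the standard second-order finite-difference stencil and a full sweep
    over all interior points. omega = 1.0 gives Gauss-Seidel; 1 < omega < 2
    gives successive over-relaxation.
    Returns (u, iterations), where u includes the two boundary values.
    """
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for iteration in range(1, max_iter + 1):
        largest_change = 0.0
        for i in range(1, n + 1):
            gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f(i * h))
            new_value = (1.0 - omega) * u[i] + omega * gauss_seidel
            largest_change = max(largest_change, abs(new_value - u[i]))
            u[i] = new_value
        if largest_change < tol:
            return u, iteration
    return u, max_iter
```

A usage sketch: with `f = lambda x: math.pi ** 2 * math.sin(math.pi * x)` the exact solution is sin(pi x), so the maximum absolute error can be measured directly, and calling the function once with `omega = 1.0` and once with `omega = 2 / (1 + math.sin(math.pi * h))` (the classical optimal value for this model problem) gives the iteration counts of the two methods.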

A Novel Non-Decreasing Temperature Based Simulated Annealing for Flow Shop Problems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.764-765.1390 ◽

2015 ◽

Vol 764-765 ◽

pp. 1390-1394

Author(s):

Ruey Maw Chen ◽

Frode Eika Sandnes

Keyword(s):

Simulated Annealing ◽

Flow Shop ◽

State Of The Art ◽

Computation Time ◽

Search Space ◽

Control Mechanisms ◽

Flow Shop Problem ◽

Simple Implementation ◽

Two Phases ◽

Permutation Schedule

The permutation flow shop problem (PFSP) is an NP-hard permutation sequencing and scheduling problem, and many metaheuristic-based schemes have been proposed for finding near-optimal solutions. A simple insertion simulated annealing (SISA) scheme consisting of two phases is proposed for solving the PFSP. First, to reduce complexity, a simple insertion local search is conducted to construct the solution. Second, to ensure continuous exploration of the search space, two non-decreasing temperature control mechanisms, named Heating SA and Steady SA, are introduced in a simulated annealing (SA) procedure. Heating SA increases the exploration ability of the search and Steady SA enhances its exploitation ability. The most important features of SISA are its simple implementation and low computation time complexity. Experimental results are compared with other state-of-the-art algorithms and show that SISA efficiently yields good permutation schedules.
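The ingredients named in the abstract can be sketched as follows. This is not the authors' SISA: the abstract does not give the Heating SA or Steady SA update rules, so the sketch assumes a constant temperature (trivially non-decreasing) and a random insertion move, and the names `makespan` and `insertion_sa` are hypothetical. Only the makespan recurrence is standard for the permutation flow shop.

```python
import math
import random

def makespan(p, perm):
    """Completion time of the last job on the last machine.

    p[j][m] is the processing time of job j on machine m; every job visits
    the machines in the same order, and perm is the common job sequence.
    """
    completion = [0] * len(p[0])
    for job in perm:
        completion[0] += p[job][0]
        for m in range(1, len(completion)):
            # A job starts on machine m once it has left machine m-1
            # and machine m has finished the previous job.
            completion[m] = max(completion[m], completion[m - 1]) + p[job][m]
    return completion[-1]

def insertion_sa(p, iters=500, temperature=1.0, seed=0):
    """Simulated annealing with insertion moves at a constant temperature (an assumption)."""
    rng = random.Random(seed)
    current = list(range(len(p)))
    current_cost = makespan(p, current)
    best, best_cost = current[:], current_cost
    for _ in range(iters):
        # Insertion move: remove one job and reinsert it at a random position.
        candidate = current[:]
        job = candidate.pop(rng.randrange(len(candidate)))
        candidate.insert(rng.randrange(len(candidate) + 1), job)
        cost = makespan(p, candidate)
        delta = cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost
```

For the three-job, two-machine instance `p = [[3, 2], [1, 4], [2, 1]]`, the sequence `[0, 1, 2]` has makespan 10 and `[1, 0, 2]` has makespan 8, which is optimal for this instance.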

Exploiting Task-Parallelism in Message-Passing Sparse Linear System Solvers Using OmpSs

Euro-Par 2016: Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-43659-3_46 ◽

2016 ◽

pp. 631-643 ◽

Cited By ~ 2

Author(s):

José I. Aliaga ◽

María Barreda ◽

Matthias Bollhöfer ◽

Enrique S. Quintana-Ortí

Keyword(s):

Linear System ◽

Message Passing ◽

Task Parallelism ◽

Sparse Linear System

Parallelization of Cycle-Based Logic Simulation

Parallel Processing Letters ◽

10.1142/s0129626417500037 ◽

2017 ◽

Vol 27 (02) ◽

pp. 1750003

Author(s):

Toni Mancini ◽

Annalisa Massini ◽

Enrico Tronci

Keyword(s):

Execution Time ◽

Digital Circuits ◽

Parallel Implementation ◽

Logic Simulation ◽

Effective Version ◽

Parallel Version ◽

Two Phases ◽

Number Of Cycles ◽

Time Required ◽

Gpu Architecture

Verification of digital circuits by cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, which sets up the data needed to execute the simulation, and the simulation phase, which consists of executing the parallel simulation of the considered circuit for a certain number of cycles. During the early phase of design, the compilation phase has to be repeated each time a bug is found. Thus, if the compilation phase takes too long, the advantages stemming from the parallel approach may be lost. In this work we propose an effective version of the compilation phase and compute the corresponding execution time. We also analyze the percentage of execution time required by the different steps of the compilation phase for a set of benchmarks from the literature. Further, we implemented the simulation phase exploiting the GPU architecture and computed the execution times for a set of benchmarks, obtaining values comparable with those in the literature. Finally, we implemented a sequential version of the cycle-based simulation in which the execution time is optimized. We used the sequential values to compute the speedup of the parallel version for the considered set of benchmarks.
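The two phases described above can be sketched in a few lines. This is a sequential toy under stated assumptions, not the authors' GPU implementation: the netlist format (a dict mapping each gate output to an operator and its input signals), the function names `compile_netlist` and `simulate`, and the absence of primary inputs are all choices made for illustration. The compilation phase orders the gates so each is evaluated after its inputs; the simulation phase evaluates them once per clock cycle and then updates the registers.

```python
from graphlib import TopologicalSorter

# Gate operators on single-bit values (0 or 1).
OPS = {
    "not": lambda a: 1 - a,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def compile_netlist(gates):
    """Compilation phase: return gate outputs in an order that respects dependencies.

    gates maps an output signal to (operator, [input signals]). Inputs that
    are not gate outputs (register values) impose no ordering.
    """
    dependencies = {
        out: [sig for sig in ins if sig in gates]
        for out, (op, ins) in gates.items()
    }
    return list(TopologicalSorter(dependencies).static_order())

def simulate(gates, order, registers, state, cycles):
    """Simulation phase: run the circuit for a number of clock cycles.

    registers maps each register name to the signal holding its next value.
    Returns the register state observed at the start of each cycle.
    """
    trace = []
    for _ in range(cycles):
        trace.append(dict(state))
        values = dict(state)
        for out in order:
            op, ins = gates[out]
            values[out] = OPS[op](*(values[sig] for sig in ins))
        # Clock edge: every register takes its next-state value.
        state = {reg: values[nxt] for reg, nxt in registers.items()}
    return trace
```

As a usage example, a 2-bit counter can be described with `gates = {"d0": ("not", ["q0"]), "d1": ("xor", ["q1", "q0"])}` and `registers = {"q0": "d0", "q1": "d1"}`. The compiled order is computed once and reused for every cycle, which is why a slow compilation phase matters mainly when the design changes often.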

Direct Methods and Powder Data: State of the Art and Perspectives

Acta Crystallographica Section A Foundations of Crystallography ◽

10.1107/s0108767395013651 ◽

1996 ◽

Vol 52 (3) ◽

pp. 331-339 ◽

Cited By ~ 27

Author(s):

C. Giacovazzo

Keyword(s):

State Of The Art ◽

Direct Methods

A parallel sparse linear system solver based on Hermitian/skew-Hermitian splitting

Computers & Mathematics with Applications ◽

10.1016/j.camwa.2016.08.016 ◽

2016 ◽

Vol 72 (8) ◽

pp. 2000-2007 ◽

Cited By ~ 2

Author(s):

Zhengyi Zhang ◽

Ahmed H. Sameh

Keyword(s):

Linear System ◽

Sparse Linear System

Solving a very large-scale sparse linear system with a parallel algorithm in the Gaia mission

2014 International Conference on High Performance Computing & Simulation (HPCS) ◽

10.1109/hpcsim.2014.6903675 ◽

2014 ◽

Cited By ~ 2

Author(s):

Ugo Becciani ◽

Eva Sciacca ◽

Marilena Bandieramonte ◽

Alberto Vecchiato ◽

Beatrice Bucciarelli ◽

...

Keyword(s):

Linear System ◽

Parallel Algorithm ◽

Large Scale ◽

Sparse Linear System

Trilinos Solvers Scalability on a MFiX-Trilinos Framework Applied to Fluidized Bed Simulations

Volume 2: Fluid Mechanics; Multiphase Flows ◽

10.1115/fedsm2020-20250 ◽

2020 ◽

Author(s):

Arturo Rodriguez ◽

V. M. Krushnarao Kotteda ◽

Luis F. Rodriguez ◽

Vinod Kumar ◽

Jorge A. Munoz

Keyword(s):

Linear System ◽

Fluidized Bed ◽

Large Scale ◽

Fossil Fuel ◽

State Of The Art ◽

Iterative Solvers ◽

Flow Solver ◽

Nonlinear Solvers ◽

Fuel Reactor ◽

Preconditioned Iterative

MFiX is an open-source multiphase suite developed at the National Energy Technology Laboratory. It is widely used by fossil fuel reactor communities to simulate flow in a fluidized bed reactor. It does not have advanced linear iterative solvers, even though it spends 70% of its run time solving the linear system. Trilinos contains algorithms and enabling technologies for the solution of large-scale, sophisticated multi-physics engineering and scientific problems. The library, developed at Sandia National Laboratories, has more than 60 packages and includes state-of-the-art preconditioners, nonlinear solvers, direct solvers, and iterative solvers. The packages are performant and portable across various hybrid computing architectures. To improve the capabilities of MFiX, we developed a framework, MFiX-Trilinos, that integrates the advanced linear solvers in Trilinos with the FORTRAN-based multiphase flow solver MFiX. The framework changes the semantics of the arrays in FORTRAN and C++, solves the linear system with packages in Trilinos, and returns the solution to MFiX. The preconditioned iterative solvers considered in the analysis are BiCGStab and GMRES. The framework is verified on various fluidized bed problems, and its performance is tested on the Stampede supercomputer. The wall time for multiple sizes of fluidized beds is compared.
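For readers unfamiliar with one of the two solvers named above, here is a minimal sketch of the textbook BiCGStab iteration. It does not use Trilinos or reproduce the MFiX-Trilinos framework: it is unpreconditioned, stores the matrix as a dense list of rows, omits breakdown safeguards, and the function name `bicgstab` is hypothetical. Its only purpose is to show the structure of the method, which applies to nonsymmetric systems.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bicgstab(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned BiCGStab starting from x = 0; returns (x, iterations).

    No breakdown safeguards: assumes b is nonzero and the inner products
    used as divisors stay nonzero.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)        # residual b - A x for the zero initial guess
    r_hat = list(r)    # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for iteration in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol:
            # The half step already converged; skip the stabilizing step.
            return [xi + alpha * pi for xi, pi in zip(x, p)], iteration
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol:
            return x, iteration
        rho = rho_new
    return x, max_iter
```

Each iteration costs two matrix-vector products, so in a production setting these products and the preconditioner application are the operations a library would distribute across the machine.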
