Parallelization of the solve phase in a task-based Cholesky solver using a sequential task flow model

Author(s):  
Sébastien Cayrols ◽  
Iain S Duff ◽  
Florent Lopez

We describe the parallelization of the solve phase in the sparse Cholesky solver SpLLT when using a sequential task flow model. In the context of direct methods, the solution of a sparse linear system is achieved through three main phases: analysis, factorization and solve. Of the last two phases, which involve numerical computation, the factorization is the more computationally costly, and it is therefore crucial to parallelize it in order to reduce the time-to-solution on modern architectures. As a consequence, the solve phase is often less optimized than the factorization in state-of-the-art solvers, and opportunities for parallelism in this phase often go unexploited. However, in some applications, the time spent in the solve phase is comparable to or even greater than the time for the factorization, and the user could benefit dramatically from a faster solve routine. This is the case, for example, for a conjugate gradient (CG) solver using a block Jacobi preconditioner: the diagonal blocks are factorized only once, but their factors are used to solve subsystems at each CG iteration. In this study, we design and implement a parallel version of a task-based solve routine for an OpenMP version of the SpLLT solver. We show that we can obtain good scalability on a multicore architecture, enabling a dramatic reduction of the overall time-to-solution in some applications.
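
The block Jacobi example in this abstract can be illustrated with a small, sequential sketch. It is not the SpLLT implementation and uses no task-based parallelism; the function names (`cholesky`, `solve_llt`, `block_jacobi_cg`) and the dense list-of-lists storage are assumptions made for illustration. It only shows why the solve phase can dominate: each diagonal block is factorized once, while the forward and backward substitutions are repeated at every CG iteration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T for a dense SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def solve_llt(l, b):
    """Solve phase: forward substitution L y = b, then backward L^T x = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def block_jacobi_cg(a, b, block, tol=1e-10, max_iter=200):
    """Preconditioned CG; returns (x, number of factorizations, number of block solves)."""
    n = len(b)
    starts = list(range(0, n, block))
    # Factorization phase: done once per diagonal block.
    factors = [cholesky([row[s:s + block] for row in a[s:s + block]]) for s in starts]
    solves = 0

    def precond(r):
        # Solve phase: repeated for every block at every CG iteration.
        nonlocal solves
        z = []
        for s, l in zip(starts, factors):
            z.extend(solve_llt(l, r[s:s + block]))
            solves += 1
        return z

    def matvec(v):
        return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, len(factors), solves
```

For a small tridiagonal test matrix split into 2x2 blocks, the returned solve count exceeds the factorization count after the first iteration, which is the imbalance the abstract describes.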

Author(s):  
Nur Afza Mat Ali ◽  
Rostang Rahman ◽  
Jumat Sulaiman ◽  
Khadizah Ghazali

The similarity method is used to find solutions of partial differential equations (PDEs) by reducing them to corresponding ordinary differential equations (ODEs), which are not easily integrable in terms of elementary or tabulated functions. The Half-Sweep Successive Over-Relaxation (HSSOR) iterative method is then applied to solve the sparse linear system generated by discretizing the corresponding second-order ODEs with Dirichlet boundary conditions. These ODEs are constructed from one-dimensional reaction-diffusion equations by using a wave variable transformation. Because the linear system is large and sparse, we analyse the performance of three iterative methods, namely Full-Sweep Gauss-Seidel (FSGS), Full-Sweep Successive Over-Relaxation (FSSOR) and HSSOR, to examine their computational cost. Four examples of these problems were tested to observe the performance of the proposed iterative methods. In the numerical experiments, three measures were considered: number of iterations, execution time and maximum absolute error. According to the numerical results, HSSOR is the most efficient of the three methods for the proposed problem, requiring the fewest iterations and the shortest execution time, followed by FSSOR and then FSGS.
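
To make the comparison of iteration counts concrete, here is a minimal sketch of full-sweep Gauss-Seidel and SOR on a model problem. The model problem (-u'' = f on (0, 1) with homogeneous Dirichlet conditions), the function name `sor_solve` and the stopping rule are assumptions chosen for illustration; the paper's reaction-diffusion ODEs and its half-sweep variant are not reproduced here.

```python
import math


def sor_solve(f, n, omega, tol=1e-10, max_iter=100000):
    """Solve -u'' = f(x) on (0, 1), u(0) = u(1) = 0, with n interior grid points.

    omega = 1.0 gives Gauss-Seidel; 1 < omega < 2 gives over-relaxation.
    Returns (u, iterations), where u includes both boundary values.
    """
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    rhs = [h * h * f(i * h) for i in range(n + 2)]
    for it in range(1, max_iter + 1):
        change = 0.0
        for i in range(1, n + 1):
            # Gauss-Seidel value from the central-difference stencil.
            gs = 0.5 * (u[i - 1] + u[i + 1] + rhs[i])
            new = (1.0 - omega) * u[i] + omega * gs
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, it
    return u, max_iter


def compare(n=31):
    """Run both methods on f = pi^2 sin(pi x), whose exact solution is sin(pi x)."""
    f = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    h = 1.0 / (n + 1)
    # Relaxation factor that is optimal for this particular model problem.
    omega = 2.0 / (1.0 + math.sin(math.pi * h))
    u_gs, it_gs = sor_solve(f, n, 1.0)
    u_sor, it_sor = sor_solve(f, n, omega)
    err = max(abs(u_sor[i] - math.sin(math.pi * i * h)) for i in range(n + 2))
    return it_gs, it_sor, err
```

Calling `compare()` returns the two iteration counts and the maximum absolute error of the SOR solution, the same kinds of measures the abstract reports.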


2015 ◽  
Vol 764-765 ◽  
pp. 1390-1394
Author(s):  
Ruey Maw Chen ◽  
Frode Eika Sandnes

The permutation flow shop problem (PFSP) is an NP-hard permutation sequencing and scheduling problem, and many metaheuristic-based schemes have been proposed for finding near-optimal solutions. A simple insertion simulated annealing (SISA) scheme consisting of two phases is proposed for solving the PFSP. First, to reduce the complexity, a simple insertion local search is conducted to construct the solution. Second, to ensure continuous exploration of the search space, two non-decreasing temperature control mechanisms, named Heating SA and Steady SA, are introduced in a simulated annealing (SA) procedure. Heating SA increases the exploration ability of the search and Steady SA enhances its exploitation ability. The most important features of SISA are its simple implementation and low computational time complexity. Experimental results are compared with those of other state-of-the-art algorithms and show that SISA efficiently yields good permutation schedules.
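
The two ingredients named in this abstract, a makespan evaluation and an insertion move inside simulated annealing, can be sketched as follows. This is not the authors' SISA: the constant temperature, the step count, the seed and the function names `makespan` and `insertion_sa` are all assumptions. A constant temperature is used only because it is the simplest non-decreasing schedule.

```python
import math
import random


def makespan(perm, p):
    """Completion time of the last job on the last machine.

    p[j][k] is the processing time of job j on machine k; every job
    visits the machines in the same order 0, 1, ..., m-1.
    """
    m = len(p[0])
    c = [0] * m  # c[k]: completion time of the latest scheduled job on machine k
    for j in perm:
        c[0] += p[j][0]
        for k in range(1, m):
            c[k] = max(c[k], c[k - 1]) + p[j][k]
    return c[-1]


def insertion_sa(p, temp=1.0, steps=2000, seed=0):
    """Simulated annealing with an insertion neighbourhood at a fixed temperature."""
    rng = random.Random(seed)
    cur = list(range(len(p)))
    cur_cost = makespan(cur, p)
    best, best_cost = cur[:], cur_cost
    for _ in range(steps):
        # Insertion move: remove one job and reinsert it at a random position.
        cand = cur[:]
        job = cand.pop(rng.randrange(len(cand)))
        cand.insert(rng.randrange(len(cand) + 1), job)
        cost = makespan(cand, p)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand[:], cost
    return best, best_cost
```

For instance, with `p = [[3, 2], [1, 4], [2, 1]]` the identity order `[0, 1, 2]` has makespan 10, while the order `[1, 0, 2]` has makespan 8.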


2017 ◽  
Vol 27 (02) ◽  
pp. 1750003
Author(s):  
Toni Mancini ◽  
Annalisa Massini ◽  
Enrico Tronci

Verification of digital circuits by cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, which sets up the data needed to execute the simulation, and the simulation phase, which executes the parallel simulation of the circuit under consideration for a given number of cycles. During the early phases of design, the compilation phase has to be repeated each time a bug is found. Thus, if the compilation phase takes too long, the advantages of the parallel approach may be lost. In this work, we propose an effective version of the compilation phase and measure its execution time. We also analyze the percentage of execution time required by the different steps of the compilation phase for a set of benchmarks from the literature. Further, we implemented the simulation phase on a GPU architecture and measured the execution times for a set of benchmarks, obtaining values comparable with those in the literature. Finally, we implemented a sequential version of the cycle-based simulation with optimized execution time, and used its timings to compute the speedup of the parallel version for the set of benchmarks considered.
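
The two phases described here can be sketched sequentially. The netlist format, the gate set and the function names `compile_netlist` and `simulate` are hypothetical and not taken from the paper, which targets a parallel GPU implementation. The compilation phase orders the combinational gates so that every gate is evaluated after its inputs; the simulation phase then evaluates all gates once per clock cycle and updates the flip-flops.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical gate set operating on 0/1 integer signal values.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOT": lambda a: 1 - a,
}


def compile_netlist(gates):
    """Compilation phase: return gate outputs in an order safe for evaluation.

    gates maps an output signal name to (op, input1, ...).
    """
    deps = {out: set(spec[1:]) for out, spec in gates.items()}
    # static_order() also yields primary inputs and flip-flop outputs; keep gates only.
    return [s for s in TopologicalSorter(deps).static_order() if s in gates]


def simulate(gates, flops, order, inputs_per_cycle, init_state):
    """Simulation phase: evaluate every gate once per cycle, then clock the flip-flops.

    flops maps a state signal q to the signal d that it captures at the end of a cycle.
    Returns the flip-flop state after each cycle.
    """
    state = dict(init_state)
    trace = []
    for inputs in inputs_per_cycle:
        values = {**state, **inputs}
        for out in order:
            op, *ins = gates[out]
            values[out] = OPS[op](*(values[i] for i in ins))
        state = {q: values[d] for q, d in flops.items()}
        trace.append(dict(state))
    return trace


# Example circuit: a 2-bit counter with an enable input "en".
COUNTER_GATES = {
    "d0": ("XOR", "q0", "en"),
    "c": ("AND", "q0", "en"),
    "d1": ("XOR", "q1", "c"),
}
COUNTER_FLOPS = {"q0": "d0", "q1": "d1"}
```

Because the gate order depends only on the netlist, it is computed once per design revision, which is why a slow compilation phase is costly when the design changes after every bug fix.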


Author(s):  
Arturo Rodriguez ◽  
V. M. Krushnarao Kotteda ◽  
Luis F. Rodriguez ◽  
Vinod Kumar ◽  
Jorge A. Munoz

MFiX is an open-source multiphase flow suite developed at the National Energy Technology Laboratory. It is widely used by fossil fuel reactor communities to simulate flow in fluidized bed reactors. It does not have advanced linear iterative solvers, even though it spends 70% of the run time solving the linear system. Trilinos contains algorithms and enabling technologies for the solution of large-scale, sophisticated multi-physics engineering and scientific problems. The library, developed at Sandia National Laboratories, has more than 60 packages. It consists of state-of-the-art preconditioners, nonlinear solvers, direct solvers, and iterative solvers. The packages are performant and portable on various hybrid computing architectures. To improve the capabilities of MFiX, we developed a framework, MFiX-Trilinos, to integrate the advanced linear solvers in Trilinos with the FORTRAN-based multiphase flow solver, MFiX. The framework changes the semantics of the array in FORTRAN and C++, solves the linear system with packages in Trilinos, and returns the solution to MFiX. The preconditioned iterative solvers considered for the analysis are BiCGStab and GMRES. The framework is verified on various fluidized bed problems. The performance of the framework is tested on the Stampede supercomputer. The wall time for multiple sizes of fluidized beds is compared.

