Parallel Solution of Hierarchical Symmetric Positive Definite Linear Systems

2017 ◽  
Vol 2 (1) ◽  
pp. 201-212 ◽  
Author(s):  
José I. Aliaga ◽  
Rocío Carratalá-Sáez ◽  
Enrique S. Quintana-Ortí

Abstract: We present a prototype task-parallel algorithm for the solution of hierarchical symmetric positive definite linear systems via the ℋ-Cholesky factorization that builds upon the parallel programming standards and associated runtimes for OpenMP and OmpSs. In contrast with previous efforts, our proposal decouples the numerical aspects of the linear algebra operation from the complexities associated with high performance computing. Our experiments provide an exhaustive analysis of the efficiency attained by different parallelization approaches that exploit either task-parallelism or loop-parallelism via a runtime. We also evaluate an alternative solution that leverages multi-threaded parallelism via the parallel implementation of the Basic Linear Algebra Subroutines (BLAS) in Intel MKL.
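As a rough illustration of how a Cholesky factorization can be broken into tasks for a runtime to schedule, the sketch below lists the tile kernels of a right-looking factorization and derives which earlier tasks each one must wait for. It assumes plain dense tiles rather than the hierarchical (ℋ-matrix) blocks used in the paper, and the function names `cholesky_tasks` and `dependencies` are hypothetical, introduced only for this example. The read/write sets play the role of the data-dependence annotations that task-based runtimes use.

```python
def cholesky_tasks(nt):
    """List the tile tasks of a right-looking Cholesky on an nt x nt tile grid.

    Each task is (kernel, reads, writes); tiles are named by (row, col).
    Written tiles are treated as read-and-write (in/out) operands.
    """
    tasks = []
    for k in range(nt):
        # Factor the diagonal tile.
        tasks.append(("POTRF", [], [(k, k)]))
        # Triangular solves for the tiles below the diagonal in column k.
        for i in range(k + 1, nt):
            tasks.append(("TRSM", [(k, k)], [(i, k)]))
        # Update the trailing lower-triangular part of the tile grid.
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                if i == j:
                    tasks.append(("SYRK", [(i, k)], [(i, j)]))
                else:
                    tasks.append(("GEMM", [(i, k), (j, k)], [(i, j)]))
    return tasks


def dependencies(tasks):
    """For each task index, the earlier task indices it must wait for.

    A task waits for the last writer of every tile it reads or updates.
    """
    last_writer = {}
    deps = []
    for idx, (_, reads, writes) in enumerate(tasks):
        waits = {last_writer[t] for t in reads + writes if t in last_writer}
        deps.append(sorted(waits))
        for t in writes:
            last_writer[t] = idx
    return deps


if __name__ == "__main__":
    tasks = cholesky_tasks(3)
    for (kernel, _, writes), waits in zip(tasks, dependencies(tasks)):
        print(kernel, writes[0], "waits for tasks", waits)
```

In this sketch the two triangular solves of the first step depend only on the first diagonal factorization, so a runtime would be free to run them concurrently. That kind of parallelism is what a task-based formulation exposes without the numerical code having to manage threads itself.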

Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 627
Author(s):  
David Marquez-Viloria ◽  
Luis Castano-Londono ◽  
Neil Guerrero-Gonzalez

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on the m-QAM demodulators using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used for image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction in operations of between 43 percent and 75 percent, depending on the symbol’s position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
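One possible reading of the 8-connected rule can be sketched in software as follows. This is an illustrative assumption, not the authors' FPGA design: the received symbol is first mapped to the nearest ideal 16-QAM point, and the KNN vote is then restricted to training symbols labeled with that cell or one of its 8-connected neighbors on the 4 × 4 grid. The names `hard_cell`, `neighbour_cells`, and `knn_classify`, the amplitude levels, and the choice of k are all hypothetical.

```python
from collections import Counter

LEVELS = (-3, -1, 1, 3)  # assumed ideal 16-QAM amplitude levels per axis


def hard_cell(symbol):
    """(row, col) index of the ideal 16-QAM point closest to a complex symbol."""
    def nearest(v):
        return min(range(4), key=lambda i: abs(v - LEVELS[i]))
    return nearest(symbol.imag), nearest(symbol.real)


def neighbour_cells(cell):
    """The cell itself plus its 8-connected neighbours that lie on the 4x4 grid."""
    r, c = cell
    return {(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < 4 and 0 <= c + dc < 4}


def knn_classify(symbol, training, k=3, restrict=True):
    """Majority vote among the k training symbols nearest to `symbol`.

    `training` is a list of (complex_symbol, cell_label) pairs and must
    contain at least one symbol for every candidate cell. With
    restrict=True, only labels in the 8-connected neighbourhood of the
    hard decision are compared.
    """
    if restrict:
        allowed = neighbour_cells(hard_cell(symbol))
        candidates = [(s, lab) for s, lab in training if lab in allowed]
    else:
        candidates = training
    nearest = sorted(candidates, key=lambda p: abs(symbol - p[0]))[:k]
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]
```

Under this reading, the candidate set shrinks from 16 clusters to 9 for an inner symbol, 6 for an edge symbol, and 4 for a corner symbol. That corresponds to between roughly 44 and 75 percent fewer clusters to compare, in line with the position-dependent range reported above, although the paper's exact procedure may differ.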


2010 ◽  
Vol 15 (3) ◽  
pp. 299-311 ◽  
Author(s):  
Zhuo-Hong Huang ◽  
Ting-Zhu Huang

In this paper, first, by using the diagonally compensated reduction and incomplete Cholesky factorization methods, we construct a constraint preconditioner for solving symmetric positive definite linear systems and then we apply the preconditioner to solve the Helmholtz equations and Poisson equations. Second, according to theoretical analysis, we prove that the preconditioned iteration method is convergent. Third, in numerical experiments, we plot the distribution of the spectrum of the preconditioned matrix M^{-1}A and give the solution time and number of iterations compared with the results of [5, 19].


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
José Colmenares ◽  
Antonella Galizia ◽  
Jesús Ortiz ◽  
Andrea Clematis ◽  
Walter Rocchia

The Poisson-Boltzmann equation models the electrostatic potential generated by fixed charges on a polarizable solute immersed in an ionic solution. This approach is often used in computational structural biology to estimate the electrostatic energetic component of the assembly of molecular biological systems. In the last decades, the amount of data concerning proteins and other biological macromolecules has increased remarkably. To exploit these data fruitfully, substantial computational power is needed, as well as software tools capable of exploiting it. It is therefore necessary to move towards high performance computing and to develop proper parallel implementations of both existing and novel algorithms. Nowadays, workstations can provide considerable computational power: up to 10 TFLOPS on a single machine equipped with multiple CPUs and accelerators such as Intel Xeon Phi or GPU devices. The real obstacle to the full exploitation of modern heterogeneous resources is efficient parallel coding and porting of software to such architectures. In this paper, we propose the implementation of a full Poisson-Boltzmann solver based on a finite-difference scheme using different and combined parallel schemes and in particular a mixed MPI-CUDA implementation. Results show great speedups when using the two schemes, achieving an 18.9x speedup using three GPUs.


2010 ◽  
Vol 32 (5) ◽  
pp. 2468-2484 ◽  
Author(s):  
Carlo Janna ◽  
Massimiliano Ferronato ◽  
Giuseppe Gambolati
