Parallel Solution of Hierarchical Symmetric Positive Definite Linear Systems

We present a prototype task-parallel algorithm for the solution of hierarchical symmetric positive definite linear systems via the ℋ-Cholesky factorization that builds upon the parallel programming standards and associated runtimes for OpenMP and OmpSs. In contrast with previous efforts, our proposal decouples the numerical aspects of the linear algebra operation from the complexities associated with high performance computing. Our experiments exhaustively analyze the efficiency attained by different parallelization approaches that exploit either task-parallelism or loop-parallelism via a runtime. We also evaluate an alternative solution that leverages multi-threaded parallelism via the parallel implementation of the Basic Linear Algebra Subprograms (BLAS) in Intel MKL.
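To make the task structure concrete, the following is a minimal sketch of a tiled Cholesky factorization on an ordinary dense matrix, written in Python with NumPy. It is not the ℋ-matrix algorithm of the paper and uses no OpenMP or OmpSs runtime; the function name `tiled_cholesky` and the fixed tile size are assumptions made for illustration. Each commented step (POTRF, TRSM, SYRK/GEMM) is the kind of unit a task-parallel runtime could schedule once the tiles it reads have been produced.

```python
import numpy as np

def tiled_cholesky(A, tile):
    """Right-looking tiled Cholesky. Returns lower-triangular L with A = L @ L.T.

    Hypothetical helper for illustration: dense tiles only, no low-rank blocks.
    """
    n = A.shape[0]
    if n % tile != 0:
        raise ValueError("this sketch assumes the tile size divides n")
    W = np.array(A, dtype=float)  # working copy; A is left untouched
    nt = n // tile

    def blk(i, j):
        # Index of tile (i, j) as a pair of slices.
        return (slice(i * tile, (i + 1) * tile), slice(j * tile, (j + 1) * tile))

    for k in range(nt):
        # POTRF task: factor the diagonal tile.
        W[blk(k, k)] = np.linalg.cholesky(W[blk(k, k)])
        Lkk = W[blk(k, k)]
        for i in range(k + 1, nt):
            # TRSM task: W[i,k] <- W[i,k] @ inv(Lkk).T; depends only on POTRF(k).
            W[blk(i, k)] = np.linalg.solve(Lkk, W[blk(i, k)].T).T
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                # SYRK (i == j) or GEMM (i > j) task: trailing update,
                # depends on the TRSM results for tiles (i, k) and (j, k).
                W[blk(i, j)] -= W[blk(i, k)] @ W[blk(j, k)].T
    return np.tril(W)  # discard the untouched upper tiles

# Small usage example on a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((12, 12))
A_demo = B @ B.T + 12 * np.eye(12)
L_demo = tiled_cholesky(A_demo, tile=4)
```

Expressing the factorization as tile-level tasks with explicit read and write sets is what lets a runtime handle the scheduling, so the numerical code stays separate from the parallelization concerns.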

A New Bidirectional Cholesky Factorization Algorithm for Parallel Solution of Sparse Symmetric Positive Definite Systems

International Journal of High Speed Computing ◽

10.1142/s0129053397000064 ◽

1997 ◽

Vol 09 (01) ◽

pp. 57-71

Author(s):

K. N. Balasubramanya Murthy ◽

C. Siva Ram Murthy

Keyword(s):

Positive Definite ◽

Cholesky Factorization ◽

Parallel Solution ◽

Factorization Algorithm ◽

Symmetric Positive Definite

A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators

Electronics ◽

10.3390/electronics10050627 ◽

2021 ◽

Vol 10 (5) ◽

pp. 627

Author(s):

David Marquez-Viloria ◽

Luis Castano-Londono ◽

Neil Guerrero-Gonzalez

Keyword(s):

Real Time ◽

High Performance ◽

Interference Mitigation ◽

Parallel Implementation ◽

Computational Time ◽

Successful Implementation ◽

Interchannel Interference ◽

The Difference ◽

High Level ◽

Performance Computing

A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on the m-QAM demodulators using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used for image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in Matlab. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN reduces the number of operations by 43 to 75 percent, depending on the symbol’s position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
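The idea of limiting comparisons to an 8-connected neighborhood can be illustrated with a small Python sketch. This is an assumption-laden simplification, not the authors' FPGA design: it classifies each received symbol against one centroid per 16-QAM cluster, and it only compares against the hard-decision cell and its 8-connected neighbors on the 4 × 4 constellation grid. The names `grid_index`, `candidates`, and `classify` are hypothetical.

```python
import numpy as np

LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])  # ideal 16-QAM amplitude levels

def grid_index(x):
    """Index (0..3) of the ideal level closest to one real coordinate."""
    return int(np.clip(np.round((x + 3.0) / 2.0), 0, 3))

def candidates(row, col):
    """The cell (row, col) plus its 8-connected neighbors inside the 4x4 grid."""
    return [(r, c)
            for r in range(max(row - 1, 0), min(row + 2, 4))
            for c in range(max(col - 1, 0), min(col + 2, 4))]

def classify(symbol, centroids):
    """Return the nearest candidate cell and the number of comparisons made.

    centroids is a 4x4 complex array; centroids[row, col] is the (possibly
    distorted) center of the cluster in that grid cell.
    """
    row, col = grid_index(symbol.imag), grid_index(symbol.real)
    cand = candidates(row, col)
    best = min(cand, key=lambda rc: abs(symbol - centroids[rc]))
    return best, len(cand)

# Ideal centroids: real part varies along columns, imaginary part along rows.
ideal = LEVELS[None, :] + 1j * LEVELS[:, None]
cell, n_compared = classify(0.9 + 1.2j, ideal)
```

In this sketch a corner cell is compared against 4 centroids, an edge cell against 6, and an interior cell against 9, instead of all 16. That corresponds to 75%, 62.5%, and 43.75% fewer comparisons, which falls in the same range as the reduction reported above, although the paper's exact operation count may be defined differently.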

Exploiting Lower Precision Arithmetic in Solving Symmetric Positive Definite Linear Systems and Least Squares Problems

SIAM Journal on Scientific Computing ◽

10.1137/19m1298263 ◽

2021 ◽

Vol 43 (1) ◽

pp. A258-A277

Author(s):

Nicholas J. Higham ◽

Srikara Pranesh

Keyword(s):

Least Squares ◽

Linear Systems ◽

Positive Definite ◽

Least Squares Problems ◽

Symmetric Positive Definite

A Constraint Preconditioner for Solving Symmetric Positive Definite Systems and Application to the Helmholtz Equations and Poisson Equations

Mathematical Modelling and Analysis ◽

10.3846/1392-6292.2010.15.299-311 ◽

2010 ◽

Vol 15 (3) ◽

pp. 299-311 ◽

Cited By ~ 4

Author(s):

Zhuo-Hong Huang ◽

Ting-Zhu Huang

Keyword(s):

Numerical Experiments ◽

Positive Definite ◽

Cholesky Factorization ◽

Helmholtz Equations ◽

Poisson Equations ◽

Symmetric Positive Definite ◽

Incomplete Cholesky Factorization ◽

Preconditioned Matrix ◽

Factorization Methods ◽

Number Of Iterations

In this paper, we first construct a constraint preconditioner for solving symmetric positive definite linear systems by using the diagonally compensated reduction and incomplete Cholesky factorization methods, and we apply the preconditioner to solve the Helmholtz equations and Poisson equations. Second, we prove by theoretical analysis that the preconditioned iteration method is convergent. Third, in numerical experiments, we plot the distribution of the spectrum of the preconditioned matrix M⁻¹A and report the solution time and number of iterations compared with the results of [5, 19].
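As a rough illustration of the incomplete Cholesky ingredient only, the Python sketch below builds a zero fill-in incomplete Cholesky factor L for a small 2D Poisson matrix and uses M = L Lᵀ as a preconditioner in conjugate gradients. It does not implement the paper's constraint preconditioner or the diagonally compensated reduction; the helper names `poisson_2d`, `ic0`, and `pcg`, the dense storage, and the grid size are all assumptions chosen to keep the example short.

```python
import numpy as np

def poisson_2d(m):
    """Dense 5-point finite-difference Laplacian on an m x m interior grid."""
    T = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    return np.kron(np.eye(m), T) + np.kron(T, np.eye(m))

def ic0(A):
    """Incomplete Cholesky with zero fill-in: L keeps the pattern of tril(A)."""
    n = A.shape[0]
    pattern = A != 0
    L = np.tril(A).astype(float)
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])
        rows = [i for i in range(k + 1, n) if pattern[i, k]]
        for i in rows:
            L[i, k] /= L[k, k]
        for j in rows:
            for i in rows:
                # Update only entries that are nonzero in A (no fill-in).
                if i >= j and pattern[i, j]:
                    L[i, j] -= L[i, k] * L[j, k]
    return L

def pcg(A, b, L, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradients with M = L @ L.T."""
    def apply_prec(r):
        # Solve M z = r with two solves (general dense solves, for brevity).
        return np.linalg.solve(L.T, np.linalg.solve(L, r))
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

A_demo = poisson_2d(8)
L_demo = ic0(A_demo)
x_demo, iters_demo = pcg(A_demo, np.ones(A_demo.shape[0]), L_demo)
# Spectrum of the preconditioned matrix M^{-1} A, the quantity one would plot.
spectrum = np.linalg.eigvals(np.linalg.solve(L_demo @ L_demo.T, A_demo)).real
```

A spectrum clustered near 1 with a small ratio between its largest and smallest eigenvalue is what typically lowers the iteration count relative to unpreconditioned conjugate gradients.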

Riemannian Gradient-Based Online Identification Method for Linear Systems with Symmetric Positive-Definite Matrix

2019 IEEE 58th Conference on Decision and Control (CDC) ◽

10.1109/cdc40024.2019.9030228 ◽

2019 ◽

Author(s):

Hiroyuki Sato ◽

Kazuhiro Sato

Keyword(s):

Linear Systems ◽

Positive Definite Matrix ◽

Positive Definite ◽

Symmetric Positive Definite Matrix ◽

Online Identification ◽

Symmetric Positive Definite ◽

Identification Method ◽

Gradient Based

A Combined MPI-CUDA Parallel Solution of Linear and Nonlinear Poisson-Boltzmann Equation

BioMed Research International ◽

10.1155/2014/560987 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12 ◽

Cited By ~ 6

Author(s):

José Colmenares ◽

Antonella Galizia ◽

Jesús Ortiz ◽

Andrea Clematis ◽

Walter Rocchia

Keyword(s):

Boltzmann Equation ◽

High Performance ◽

Biological Macromolecules ◽

Computational Power ◽

Parallel Solution ◽

Fixed Charges ◽

Poisson Boltzmann ◽

Poisson Boltzmann Equation ◽

Novel Algorithms ◽

Performance Computing

The Poisson-Boltzmann equation models the electrostatic potential generated by fixed charges on a polarizable solute immersed in an ionic solution. This approach is often used in computational structural biology to estimate the electrostatic energetic component of the assembly of molecular biological systems. In recent decades, the amount of data concerning proteins and other biological macromolecules has increased remarkably. Exploiting these data requires substantial computational power as well as software tools capable of using it. It is therefore necessary to move towards high performance computing and to develop proper parallel implementations of both existing and novel algorithms. Nowadays, workstations can provide considerable computational power: up to 10 TFLOPS on a single machine equipped with multiple CPUs and accelerators such as Intel Xeon Phi or GPU devices. The actual obstacle to the full exploitation of modern heterogeneous resources is efficient parallel coding and porting of software to such architectures. In this paper, we propose the implementation of a full Poisson-Boltzmann solver based on a finite-difference scheme using different and combined parallel schemes, in particular a mixed MPI-CUDA implementation. Results show great speedups when using the two schemes, achieving an 18.9x speedup using three GPUs.
