BALANCED DENSE POLYNOMIAL MULTIPLICATION ON MULTI-CORES

In symbolic computation, polynomial multiplication is a fundamental operation, akin to matrix multiplication in numerical computation. We present efficient implementation strategies for FFT-based dense polynomial multiplication targeting multi-cores. We show that balanced input data can maximize parallel speedup and minimize cache complexity for bivariate multiplication. However, unbalanced input data, which are common in symbolic computation, are challenging. We provide efficient techniques, which we call contraction and extension, to reduce multivariate (and univariate) multiplication to balanced bivariate multiplication. Our implementation in Cilk++ demonstrates good speedup on multi-cores.
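As a rough illustration of what FFT-based dense polynomial multiplication means, the following is a minimal sequential sketch in Python for the univariate case. It is not the authors' Cilk++ implementation and does not show contraction or extension. It assumes integer coefficients small enough that a floating-point complex FFT can be rounded back exactly; the function names `fft` and `poly_mul_fft` are hypothetical.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for this butterfly.
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_mul_fft(f, g):
    """Multiply dense polynomials given as coefficient lists (lowest degree first)."""
    result_len = len(f) + len(g) - 1
    size = 1
    while size < result_len:
        size *= 2
    # Zero-pad both inputs to a common power-of-two length.
    fa = fft([complex(c) for c in f] + [0j] * (size - len(f)))
    fb = fft([complex(c) for c in g] + [0j] * (size - len(g)))
    # Pointwise product in the evaluation domain, then interpolate back.
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform needs a 1/size scaling; round to recover integers.
    return [round((v / size).real) for v in prod[:result_len]]
```

For example, `poly_mul_fft([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x^2 by 4 + 5x. A bivariate version would apply the same transform along each variable in turn, which is where the balance of the two partial degrees starts to matter.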

High performance and memory efficient implementation of matrix multiplication on FPGAs

2010 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2010.5681769 ◽

2010 ◽

Cited By ~ 5

Author(s):

Guiming Wu ◽

Yong Dou ◽

Miao Wang

Keyword(s):

High Performance ◽

Matrix Multiplication ◽

Efficient Implementation ◽

Memory Efficient

Chemical Kinetic Model Reduction and Efficient Implementation Strategies for Hypersonic Propulsion Applications

50th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition ◽

10.2514/6.2012-118 ◽

2012 ◽

Cited By ~ 3

Author(s):

Gaetano Esposito ◽

Mohammad Rahimi ◽

Harsha Chelliah ◽

Varun Hiremath ◽

Steve Pope ◽

...

Keyword(s):

Kinetic Model ◽

Model Reduction ◽

Implementation Strategies ◽

Chemical Kinetic ◽

Efficient Implementation ◽

Chemical Kinetic Model ◽

Hypersonic Propulsion

Efficient implementation strategies for the DRB approach in fault-tolerant hypercubes

Proceedings Twenty-First Annual International Computer Software and Applications Conference (COMPSAC'97) ◽

10.1109/cmpsac.1997.625085 ◽

2002 ◽

Author(s):

T. Williams ◽

J. Tan ◽

Chungti Liang

Keyword(s):

Fault Tolerant ◽

Implementation Strategies ◽

Efficient Implementation

Novel Low-Complexity Polynomial Multiplication over Hybrid Fields for Efficient Implementation of Binary Ring-LWE Post-Quantum Cryptography

IEEE Journal on Emerging and Selected Topics in Circuits and Systems ◽

10.1109/jetcas.2021.3075456 ◽

2021 ◽

pp. 1-1

Author(s):

Pengzhou He ◽

Ujjwal Guin ◽

Jiafeng Xie

Keyword(s):

Quantum Cryptography ◽

Low Complexity ◽

Efficient Implementation ◽

Polynomial Multiplication ◽

Post Quantum Cryptography

A Criterion on Existence and Uniqueness of Behavior in Electric Circuit

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i4.10704 ◽

2016 ◽

Vol 6 (4) ◽

pp. 1529

Author(s):

Takuya Hirata ◽

Eko Setiawan ◽

Kazuya Yamaguchi ◽

Ichijo Hodaka

Keyword(s):

Numerical Computation ◽

Symbolic Computation ◽

Existence And Uniqueness ◽

Electric Circuit ◽

Computer Algorithm ◽

Electric Circuits ◽

Uniqueness Of The Solution ◽

Circuit Elements

The behavior of electric circuits can be observed by solving circuit equations symbolically as well as numerically. In general, symbolic computation for circuits with a given number of circuit elements takes much more time than numerical computation. It is therefore reasonable to check the existence and uniqueness of the solution to the circuit equations beforehand, to avoid computing when no solution exists. Indeed, some circuits have no solution; in that case, one should detect this and avoid waiting on a meaningless computation. This paper proposes a new theorem to check whether given circuit equations have a solution and whether the voltages and currents of all circuit elements are uniquely determined. The theorem is suitable for developing a computer algorithm and helps speed up symbolic computation for electric circuits.

Improving Accuracy for Matrix Multiplications on GPUs

Scientific Programming ◽

10.1155/2011/417569 ◽

2011 ◽

Vol 19 (1) ◽

pp. 3-11

Author(s):

Matthew Badin ◽

Lubomir Bic ◽

Michael Dillencourt ◽

Alexandru Nicolau

Keyword(s):

Numerical Computation ◽

Scientific Computing ◽

Matrix Multiplication ◽

Floating Point ◽

Double Precision ◽

Rounding Errors ◽

Current Generation ◽

Original Algorithm ◽

Improving Accuracy

Reproducibility of an experiment is a commonly used metric to determine its validity. Within scientific computing, this can become difficult due to the accumulation of floating-point rounding errors in the numerical computation, which greatly reduces the accuracy of the computation. Matrix multiplication is particularly susceptible to these rounding errors, which is why there exist so many solutions, ranging from simulating extra precision to compensated summation algorithms. These solutions, however, all suffer from the same problem: abysmal performance when compared against the performance of the original algorithm. Graphics cards are particularly susceptible due to a lack of double precision on all but the most recent generation of graphics cards; increasing the accuracy of the precision that is offered therefore becomes paramount. By using our method of selectively applying compensated summation algorithms, we are able to recover a whole digit of accuracy on current-generation graphics cards and potentially two digits of accuracy on the newly released “Fermi” architecture. This is all possible with only a 2% drop in performance.
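To make "compensated summation" concrete, here is a minimal Python sketch of Kahan summation, one well-known compensated summation algorithm, applied to the dot product that forms each entry of a matrix product. It is only an illustration on the CPU in double precision; it is not the authors' selective GPU method, and the names `naive_sum`, `kahan_sum` and `compensated_dot` are hypothetical.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carries the rounding error forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


def compensated_dot(row, col):
    """Dot product of one matrix row and one column with Kahan accumulation."""
    return kahan_sum(a * b for a, b in zip(row, col))
```

With `values = [1.0] + [1e-16] * 10000`, the naive loop never moves away from 1.0 because each small term is below half a unit in the last place, whereas the compensated loop tracks the accumulated small terms. The price is roughly four floating-point operations per term instead of one, which is the kind of overhead the abstract refers to.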

An efficient implementation of semi-numerical computation of the Hartree-Fock exchange on the Intel Phi processor

Chemical Physics Letters ◽

10.1016/j.cplett.2018.05.026 ◽

2018 ◽

Vol 703 ◽

pp. 106-111 ◽

Cited By ~ 1

Author(s):

Fenglai Liu ◽

Jing Kong

Keyword(s):

Numerical Computation ◽

Efficient Implementation ◽

Hartree Fock

Numerical Computation, Symbolic Computation and Result Analysis of Jordan Decomposition of Time-Varying Matrices

2020 IEEE International Conference on Mechatronics and Automation (ICMA) ◽

10.1109/icma49215.2020.9233728 ◽

2020 ◽

Author(s):

Zhuoheng Zhen ◽

Yunong Zhang ◽

Xiao Liu ◽

Yihong Ling ◽

Min Yang

Keyword(s):

Numerical Computation ◽

Symbolic Computation ◽

Time Varying ◽

Jordan Decomposition ◽

Result Analysis

Modular Mappings and Data Distribution Independent Computations

Parallel Processing Letters ◽

10.1142/s0129626497000188 ◽

1997 ◽

Vol 07 (02) ◽

pp. 169-180 ◽

Cited By ~ 4

Author(s):

Lee Hyuk-Jae ◽

José A.B. Fortes

Keyword(s):

Initial Data ◽

Input Data ◽

Systematic Approach ◽

Matrix Multiplication ◽

Data Distribution ◽

Parallel Computers ◽

Linearly Independent ◽

Zero Entry ◽

Pattern Distribution ◽

Data Redistribution

This paper considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers. The functionality and execution time of DDI programs are independent of initial data distributions. Modular mappings, which can be used to derive many equally optimal and functionally equivalent programs, are briefly reviewed. Relations between modular mappings and input data distributions are then established. These relations are the basis of a systematic approach to the derivation of DDI programs, which is illustrated for matrix-matrix multiplication (c = a × b). The conditions on data distributions under which it is possible to find a modular mapping that yields a program as efficient as Cannon's algorithm are: (1) the first row of the inverse of the pattern distribution of array 'a' should be equal to the second row of the inverse of the pattern distribution of array 'b', (2) the second row of the inverse of the pattern distribution of array 'a' should be linearly independent of the first row of the inverse of the pattern distribution of array 'b', and (3) each pattern distribution of arrays 'a', 'b', and 'c' should have at least one zero entry.
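For readers unfamiliar with the baseline named above, the following Python sketch simulates Cannon's algorithm for c = a × b on an n × n grid where each simulated processor holds one element of each matrix. It shows only the standard skew-then-shift data movement, not the paper's modular mappings or DDI derivation; the function name `cannon_matmul` and the one-element-per-processor layout are assumptions made for illustration.

```python
def cannon_matmul(a, b):
    """Simulate Cannon's algorithm on an n x n grid, one element per processor."""
    n = len(a)
    # Initial alignment: row i of 'a' is rotated left by i positions,
    # column j of 'b' is rotated up by j positions.
    a_loc = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Each processor multiplies the pair it currently holds.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Rotate 'a' one step left along rows and 'b' one step up along columns.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

After step s, processor (i, j) holds a[i][k] and b[k][j] with k = (i + j + s) mod n, so over n steps it accumulates the full inner product. The initial alignment is exactly the kind of data redistribution cost that a distribution-independent program aims to avoid.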

Efficient Implementation of Self-Organizing Map for Sparse Input Data

Proceedings of the 9th International Joint Conference on Computational Intelligence ◽

10.5220/0006499500540063 ◽

2017 ◽

Cited By ~ 2

Author(s):

Josué Melka ◽

Jean-Jacques Mariage

Keyword(s):

Input Data ◽

Efficient Implementation ◽

Self Organizing Map ◽

Self Organizing
