BALANCED DENSE POLYNOMIAL MULTIPLICATION ON MULTI-CORES

2011 ◽  
Vol 22 (05) ◽  
pp. 1035-1055 ◽  
Author(s):  
MARC MORENO MAZA ◽  
YUZHEN XIE

In symbolic computation, polynomial multiplication is a fundamental operation akin to matrix multiplication in numerical computation. We present efficient implementation strategies for FFT-based dense polynomial multiplication targeting multi-cores. We show that balanced input data can maximize parallel speedup and minimize cache complexity for bivariate multiplication. However, unbalanced input data, which are common in symbolic computation, are challenging. We provide efficient techniques, which we call contraction and extension, to reduce multivariate (and univariate) multiplication to balanced bivariate multiplication. Our implementation in Cilk++ demonstrates good speedup on multi-cores.
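
As a rough illustration of what FFT-based dense multiplication means, the following is a minimal serial sketch for univariate polynomials with integer coefficients. It is not the authors' Cilk++ implementation and does not show contraction or extension; the function names `fft` and `poly_mul_fft` are hypothetical, and the use of a floating-point complex FFT with rounding is an assumption made to keep the example short.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1  # the inverse transform uses the conjugate roots
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_mul_fft(f, g):
    """Multiply dense integer polynomials given as coefficient lists,
    lowest degree first (hypothetical helper, for illustration only)."""
    if not f or not g:
        return []
    result_len = len(f) + len(g) - 1
    n = 1
    while n < result_len:  # pad to the next power of two
        n *= 2
    fa = fft([complex(c) for c in f] + [0j] * (n - len(f)))
    fb = fft([complex(c) for c in g] + [0j] * (n - len(g)))
    # Pointwise product in the evaluation domain, then interpolate back.
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The unnormalized inverse needs a division by n; round to integers.
    return [round((c / n).real) for c in prod[:result_len]]
```

For example, `poly_mul_fft([1, 2, 3], [4, 5])` multiplies 1 + 2x + 3x² by 4 + 5x. Rounding is only safe while coefficients stay small relative to double precision, so an exact implementation would need a different transform.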

Author(s):  
Takuya Hirata ◽  
Eko Setiawan ◽  
Kazuya Yamaguchi ◽  
Ichijo Hodaka

The behavior of electric circuits can be observed by solving circuit equations symbolically as well as numerically. In general, symbolic computation for circuits with a certain number of circuit elements takes much more time than numerical computation. It is therefore reasonable to check the existence and uniqueness of the solution to the circuit equations beforehand, so as to avoid computation when no solution exists. Indeed, some circuits have no solution; in that case, one should detect this early rather than wait on a meaningless computation. This paper proposes a new theorem for checking whether given circuit equations have a solution and whether the voltages and currents of all circuit elements are uniquely determined. The theorem is suitable for developing a computer algorithm and helps speed up symbolic computation for electric circuits.
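
The abstract does not state the theorem itself, so the sketch below does not reproduce it. It only illustrates the general idea of a cheap pre-check on a linear system, using the standard rank criterion (compare the rank of the coefficient matrix with that of the augmented matrix) in exact rational arithmetic. The function names `rank` and `classify` and the toy circuit equations are assumptions made for illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows = len(m)
    cols = len(m[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


def classify(a, b):
    """Classify the linear system a @ x = b without solving it."""
    n_unknowns = len(a[0])
    augmented = [list(row) + [rhs] for row, rhs in zip(a, b)]
    rank_a, rank_aug = rank(a), rank(augmented)
    if rank_a < rank_aug:
        return "no solution"
    if rank_a == n_unknowns:
        return "unique solution"
    return "infinitely many solutions"
```

As a toy usage example, a 5 V source in series with a 2 Ω resistor gives the equations -2i + v = 0 and v = 5, so `classify([[-2, 1], [0, 1]], [0, 5])` reports a unique solution, while two ideal sources forcing v = 5 and v = 3 on the same node, `classify([[1], [1]], [5, 3])`, reports no solution.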


2011 ◽  
Vol 19 (1) ◽  
pp. 3-11
Author(s):  
Matthew Badin ◽  
Lubomir Bic ◽  
Michael Dillencourt ◽  
Alexandru Nicolau

Reproducibility of an experiment is a commonly used metric to determine its validity. Within scientific computing, reproducibility can become difficult due to the accumulation of floating-point rounding errors in numerical computation, which greatly reduces the accuracy of the result. Matrix multiplication is particularly susceptible to these rounding errors, which is why so many solutions exist, ranging from simulating extra precision to compensated summation algorithms. These solutions, however, all suffer from the same problem: abysmal performance compared with the original algorithm. Graphics cards are particularly susceptible because all but the most recent generation lack double precision, so increasing the accuracy of the precision that is offered becomes paramount. By selectively applying compensated summation algorithms, we are able to recover a whole digit of accuracy on current-generation graphics cards and potentially two digits of accuracy on the newly released “Fermi” architecture. This is all possible with only a 2% drop in performance.
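
To make the term concrete, here is a minimal sketch of plain Kahan compensated summation applied to a dot product, the inner operation of matrix multiplication. It is a CPU-only illustration in double precision, not the authors' selective GPU scheme, which the abstract does not detail; the helper names `naive_sum`, `kahan_sum`, `dot_compensated`, and `abs_error` are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry the rounding error of each
    addition in `comp` and feed it back into the next term."""
    total = 0.0
    comp = 0.0
    for v in values:
        y = v - comp            # apply the correction from the last step
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
    return total


def dot_compensated(xs, ys):
    """Dot product whose accumulation uses Kahan summation."""
    return kahan_sum(x * y for x, y in zip(xs, ys))


def abs_error(approx, values):
    """Absolute error against math.fsum, used here as the reference sum."""
    return abs(approx - math.fsum(values))
```

Summing `[0.1] * 100000` with both functions and comparing them through `abs_error` shows the compensated version staying much closer to the reference, at the cost of three extra floating-point operations per term.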


1997 ◽  
Vol 07 (02) ◽  
pp. 169-180 ◽  
Author(s):  
Lee Hyuk-Jae ◽  
José A.B. Fortes

This paper considers the problem of writing data distribution independent (DDI) programs in order to eliminate or reduce initial data redistribution overheads for distributed memory parallel computers. The functionality and execution time of DDI programs are independent of initial data distributions. Modular mappings, which can be used to derive many equally optimal and functionally equivalent programs, are briefly reviewed. Relations between modular mappings and input data distributions are then established. These relations are the basis of a systematic approach to the derivation of DDI programs, which is illustrated for matrix-matrix multiplication (c = a × b). The conditions on data distributions under which it is possible to find a modular mapping that yields a program as efficient as Cannon's algorithm are: (1) the first row of the inverse of the pattern distribution of array 'a' should be equal to the second row of the inverse of the pattern distribution of array 'b', (2) the second row of the inverse of the pattern distribution of array 'a' should be linearly independent of the first row of the inverse of the pattern distribution of array 'b', and (3) each pattern distribution of arrays 'a', 'b', and 'c' should have at least one zero entry.
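
For readers unfamiliar with the baseline, the following is a minimal serial simulation of Cannon's algorithm, assuming an n × n grid of processors that each hold a single matrix element. It does not implement modular mappings or DDI programs; the function name `cannon_matmul` and the one-element-per-processor layout are assumptions made for illustration.

```python
def cannon_matmul(a, b):
    """Simulate Cannon's algorithm for square matrices a and b.

    Processor (i, j) is modeled by position [i][j] of the local arrays.
    """
    n = len(a)
    # Initial alignment: shift row i of a left by i positions and
    # column j of b up by j positions.
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every processor multiplies its local pair and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Circular shift: a moves one step left, b moves one step up.
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

The initial alignment step is the kind of data redistribution that the abstract aims to eliminate or reduce: Cannon's algorithm assumes the inputs arrive in a particular skewed layout, whereas a DDI program is meant to run equally well from other initial distributions.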

