Parallel algorithm for the matrix chain product and the optimal triangulation problems (extended abstract)

Presented is a parallel algorithm based on the fast multipole method (FMM) for the Helmholtz equation. This variant of the FMM is useful for computing radar cross sections and antenna radiation patterns. The FMM decomposes the impedance matrix into sparse components, reducing the operation count of the matrix-vector multiplication in iterative solvers to $O(N^{3/2})$, where $N$ is the number of unknowns. The parallel algorithm divides the problem into groups and assigns the computation involved with each group to a processor node, with careful attention to communication costs. A time complexity analysis of the algorithm is presented and compared with empirical results from a Paragon XP/S running the lightweight Sandia/University of New Mexico operating system (SUNMOS). For a 90,000-unknown problem running on 60 nodes, the sparse representation fits in memory, and the algorithm computes the matrix-vector product in 1.26 seconds while sustaining an aggregate rate of 1.4 Gflop/s. The corresponding dense matrix would occupy over 100 Gbytes and, even assuming that I/O is free, would require on the order of 50 seconds to form the matrix-vector product.
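
The dense-matrix figures in the last sentence can be sanity-checked with a few lines of arithmetic. This sketch rests on assumptions the abstract does not state: entries are double-precision complex numbers (16 bytes each), one complex multiply-add costs 8 real flops, and the dense product would run at the same 1.4 Gflop/s aggregate rate.

```python
# Back-of-envelope check of the dense-matrix figures quoted above.
# Assumptions (not from the abstract): complex128 entries (16 bytes),
# 8 real flops per complex multiply-add (4 multiplies + 2 adds for the
# product, 2 adds to accumulate), and the same sustained 1.4 Gflop/s rate.

N = 90_000             # number of unknowns
BYTES_PER_ENTRY = 16   # double-precision complex
FLOPS_PER_ENTRY = 8    # real flops per complex multiply-add
RATE = 1.4e9           # sustained aggregate rate, flop/s


def dense_memory_gbytes(n: int) -> float:
    """Storage for an n-by-n dense complex matrix, in gigabytes (1e9 bytes)."""
    return n * n * BYTES_PER_ENTRY / 1e9


def dense_matvec_seconds(n: int, rate: float = RATE) -> float:
    """Time for one dense matrix-vector product at the given flop rate."""
    return n * n * FLOPS_PER_ENTRY / rate


if __name__ == "__main__":
    print(f"dense storage: {dense_memory_gbytes(N):.1f} GB")  # 129.6 GB
    print(f"dense mat-vec: {dense_matvec_seconds(N):.1f} s")  # 46.3 s
```

Under these assumptions the dense matrix needs about 130 GB and one product takes roughly 46 s, which is consistent with "over 100 Gbytes" and "on the order of 50 seconds".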

A Parallel Algorithm for the Matrix Sign Function

International Journal of High Speed Computing; 1990; Vol 02 (02); pp. 181-191
DOI: 10.1142/s0129053390000121
Cited By ~ 32

Author(s): PRADEEP PANDEY, CHARLES KENNEY, ALAN J. LAUB

Keyword(s): Parallel Algorithm, Matrix Sign Function, Sign Function, The Matrix
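
The listing gives no abstract for this paper. For orientation only: the sign function of a matrix $A$ with no eigenvalues on the imaginary axis is classically computed by the Newton iteration $X_{k+1} = (X_k + X_k^{-1})/2$ with $X_0 = A$. The sketch below implements that sequential iteration for small real matrices in plain Python; it is not necessarily the parallel algorithm of Pandey, Kenney and Laub, and the helper names `mat_inv` and `matrix_sign` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def mat_inv(A: Matrix) -> Matrix:
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity: [A | I] -> [I | A^{-1}].
    M = [[float(t) for t in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [t / p for t in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matrix_sign(A: Matrix, tol: float = 1e-12, max_iter: int = 100) -> Matrix:
    """Newton iteration X_{k+1} = (X_k + X_k^{-1}) / 2 with X_0 = A.

    Converges when A has no eigenvalues on the imaginary axis; the limit S
    satisfies S S = I and commutes with A.
    """
    n = len(A)
    X = [[float(t) for t in row] for row in A]
    for _ in range(max_iter):
        X_inv = mat_inv(X)
        X_next = [[0.5 * (X[i][j] + X_inv[i][j]) for j in range(n)] for i in range(n)]
        change = max(abs(X_next[i][j] - X[i][j]) for i in range(n) for j in range(n))
        X = X_next
        if change < tol:
            break
    return X
```

For example, for the upper-triangular $A$ with rows $(2, 1)$ and $(0, -3)$, the iteration converges to the matrix with rows $(1, 0.4)$ and $(0, -1)$. Each step needs a full matrix inverse, which is the expensive part any parallel variant has to address.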

Computing the All-Pairs Longest Chains in the Plane

International Journal of Computational Geometry & Applications; 1995; Vol 05 (03); pp. 257-271
DOI: 10.1142/s0218195995000155
Cited By ~ 4

Author(s): MIKHAIL J. ATALLAH, DANNY Z. CHEN

Keyword(s): Parallel Algorithm, Space Complexity, Sequential Algorithm, PRAM Model, The Matrix, CREW PRAM, Chain Lengths

Many problems on sequences and on special kinds of graphs involve computing longest chains passing through points in the plane. Given a set $S$ of $n$ points in the plane, we consider the problem of computing the matrix of longest chain lengths between all pairs of points in $S$, and the matrix of “parent” pointers that describes the $n$ longest chain trees. We present a simple sequential algorithm for computing these matrices. Our algorithm runs in $O(n^2)$ time and is therefore optimal, since the output matrices themselves have $n^2$ entries. We also present a rather involved parallel algorithm that computes these matrices in $O((\log n)^2)$ time using $O(n^2/\log n)$ processors in the CREW PRAM model. These matrices enable us to report, in $O(1)$ time, the length of a longest chain between any two points in $S$ using one processor, and the actual chain using $k$ processors, where $k$ is the number of points of $S$ on that chain. The space complexity of the algorithms is $O(n^2)$.
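
To make the two output matrices concrete, the sketch below computes them by brute force. It assumes (our reading, not stated in the abstract) that a chain is a sequence of points with strictly increasing $x$ and $y$ coordinates and that its length is the number of points on it. The triple loop runs in $O(n^3)$ time, so it is a naive baseline rather than the authors' optimal $O(n^2)$ algorithm; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def dominates(q: Point, p: Point) -> bool:
    """True if q lies strictly above and to the right of p."""
    return q[0] > p[0] and q[1] > p[1]


def all_pairs_longest_chains(pts: List[Point]):
    """Brute-force O(n^3) computation of the length and parent matrices.

    length[i][j]: number of points on a longest chain from pts[i] to pts[j]
                  (1 if i == j, 0 if no chain exists).
    parent[i][j]: predecessor of pts[j] on one such chain (None if none).
    """
    n = len(pts)
    order = sorted(range(n), key=lambda k: pts[k])  # by x, then by y
    length = [[0] * n for _ in range(n)]
    parent: List[List[Optional[int]]] = [[None] * n for _ in range(n)]
    for i in range(n):
        length[i][i] = 1
        # Any point dominated by pts[j] has a smaller x, so it precedes j in
        # `order` and its entry in row i is already final when j is reached.
        for j in order:
            if not dominates(pts[j], pts[i]):
                continue
            for k in range(n):
                if (length[i][k] and dominates(pts[j], pts[k])
                        and length[i][k] + 1 > length[i][j]):
                    length[i][j] = length[i][k] + 1
                    parent[i][j] = k
    return length, parent


def chain(parent, i: int, j: int) -> List[int]:
    """Indices of a longest chain from i to j, or [] if there is none."""
    if i != j and parent[i][j] is None:
        return []
    path = [j]
    while path[-1] != i:  # follow parent pointers back to the start point
        path.append(parent[i][path[-1]])
    return path[::-1]
```

`chain(parent, i, j)` walks the parent pointers backward from `j`; this is the sequential analogue of reporting the actual chain from the stored matrices.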

Parallel forming of preconditioners based on the approximation of the Sherman-Morrison inversion formula

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie); 2015; pp. 86-93
DOI: 10.26089/nummet.v16r109

Author(s): А.К. Новиков, С.П. Копысов, Н.С. Недожогин

Keyword(s): Parallel Algorithm, Conjugate Gradient, Graphics Processing Units, Inversion Formula, Matrix Approximation, The Matrix, New Form, Graphics Processing, Matrix Vector, Parallelization Efficiency

Acceleration of preconditioned bi-conjugate gradient stabilized (BiCGStab) methods is studied for preconditioners based on approximate inversion of the matrix by the Sherman–Morrison formula. A new form of the parallel algorithm is proposed that uses matrix-vector products to form the preconditioning matrices. The parallelization efficiency of the most resource-intensive operations of such preconditioners on multi-core central and graphics processing units (CPUs and GPUs) is demonstrated.
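
The identity behind this family of preconditioners is the Sherman–Morrison formula: for invertible $A$ and vectors $u$, $v$ with $1 + v^{T}A^{-1}u \neq 0$, $(A + uv^{T})^{-1} = A^{-1} - \frac{A^{-1}uv^{T}A^{-1}}{1 + v^{T}A^{-1}u}$. The sketch below applies one exact rank-one update in plain Python. Preconditioners of this kind typically build an approximate inverse from a sequence of such updates; that construction, and the paper's matrix-vector formulation of it, are not reproduced here. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def sherman_morrison_inverse(A_inv: Matrix, u: List[float], v: List[float]) -> Matrix:
    """Return (A + u v^T)^{-1}, given A^{-1}, by the Sherman-Morrison formula."""
    n = len(A_inv)
    A_inv_u = [sum(A_inv[i][k] * u[k] for k in range(n)) for i in range(n)]   # A^{-1} u
    vT_A_inv = [sum(v[k] * A_inv[k][j] for k in range(n)) for j in range(n)]  # v^T A^{-1}
    denom = 1.0 + sum(v[k] * A_inv_u[k] for k in range(n))                    # 1 + v^T A^{-1} u
    if abs(denom) < 1e-14:
        raise ValueError("A + u v^T is singular (or nearly so)")
    # Rank-one correction of the known inverse: O(n^2) work, no new factorization.
    return [[A_inv[i][j] - A_inv_u[i] * vT_A_inv[j] / denom for j in range(n)]
            for i in range(n)]
```

For example, with $A = \mathrm{diag}(2, 4, 5)$, $u = (1, 0, 2)^{T}$ and $v = (0, 3, 1)^{T}$, multiplying the result by $A + uv^{T}$ gives the identity up to rounding.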

A parallel algorithm for the sparse QR decomposition of a rectangular upper quasi-triangular matrix with ND-type sparsity

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie); 2015; pp. 566-577
DOI: 10.26089/nummet.v16r453

Author(s): С.А. Харченко

Keyword(s): Parallel Algorithm, Triangular Matrix, Parallel Computations, QR Decomposition, Nested Dissection, Rectangular Matrix, Computational Mesh, The Matrix, Householder Transformations, Initial Zero

A parallel algorithm is considered for computing the sparse $QR$ decomposition of a specially ordered rectangular matrix, based on sparse block Householder transformations. The required ordering can be obtained from a nested-dissection column ordering computed from the sparsity structure of $A^{T}A$, where $A$ is the original rectangular matrix. For mesh-based problems, the ordering can instead be built from a known volume partitioning of the computational mesh. The parallel computations are organized around a base algorithm: the $QR$ decomposition of sets of matrix rows augmented with an initial zero block.
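
The building block named in the abstract is the Householder reflector $H = I - 2vv^{T}/(v^{T}v)$, which zeroes every entry below the diagonal of one column. The sketch below is a plain dense Householder $QR$ of a small rectangular matrix ($m \ge n$); it deliberately ignores sparsity, blocking, the nested-dissection ordering and parallelism, which are the subject of the paper. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def householder_qr(A: Matrix) -> Tuple[Matrix, Matrix]:
    """Dense Householder QR of an m-by-n matrix with m >= n.

    Returns (Q, R) with Q orthogonal (m-by-m), R upper triangular (m-by-n), A = Q R.
    """
    m, n = len(A), len(A[0])
    R = [[float(t) for t in row] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # v = x + sign(x_0) ||x|| e_1; this sign choice avoids cancellation.
        v = x[:]
        v[0] += norm_x if x[0] >= 0 else -norm_x
        vv = sum(t * t for t in v)
        # R <- H R, where H = I - 2 v v^T / (v^T v) acts on rows k..m-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vv
            for i in range(len(v)):
                R[k + i][j] -= f * v[i]
        for i in range(k + 1, m):
            R[i][k] = 0.0  # store exact zeros instead of rounding residue
        # Q <- Q H, so that A = Q R holds at the end.
        for r in range(m):
            f = 2.0 * sum(Q[r][k + i] * v[i] for i in range(len(v))) / vv
            for i in range(len(v)):
                Q[r][k + i] -= f * v[i]
    return Q, R
```

In a sparse setting the same reflectors are applied only to the rows and columns that a good ordering keeps nonzero, which is why the column ordering matters.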

A Parallel Algorithm for Dividing Octonions

Algorithms; 2021; Vol 14 (11); pp. 309
DOI: 10.3390/a14110309

Author(s): Aleksandr Cariow, Janusz P. Paplinski

Keyword(s): Parallel Algorithm, Matrix Representation, Specific Structure, Matrix Product, Naive Method, The Matrix, Speed Up, Vector Matrix

The article presents a parallel, hardware-oriented algorithm designed to speed up the division of two octonions. Its advantage is that it needs half as many real multiplications as the naive method of implementing this operation. The algorithm is synthesized from a matrix representation of the operation, which expresses octonion division as a vector–matrix product. Exploiting the specific structure of the matrix multiplicand reduces the number of real multiplications needed to carry out the octonion division.
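
For reference, a naive baseline of the kind the abstract compares against can be sketched as follows. It assumes right division $x / y = x\,y^{-1}$ with $y^{-1} = \bar{y}/\lVert y\rVert^{2}$, and builds octonion multiplication by the Cayley–Dickson construction in one common sign convention; the paper's basis and multiplication table may differ. The paper's reduced-multiplication algorithm is not reproduced here, and the function names are hypothetical.

```python
from typing import List

Vec = List[float]


def add(x: Vec, y: Vec) -> Vec:
    return [a + b for a, b in zip(x, y)]


def sub(x: Vec, y: Vec) -> Vec:
    return [a - b for a, b in zip(x, y)]


def conj(x: Vec) -> Vec:
    """Cayley-Dickson conjugate: conj((a, b)) = (conj(a), -b)."""
    if len(x) == 1:
        return [x[0]]
    h = len(x) // 2
    return conj(x[:h]) + [-t for t in x[h:]]


def mul(x: Vec, y: Vec) -> Vec:
    """Cayley-Dickson product (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c)).

    For length-8 inputs (octonions) the recursion performs 4**3 = 64 real
    multiplications.
    """
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    return sub(mul(a, c), mul(conj(d), b)) + add(mul(d, a), mul(b, conj(c)))


def divide(x: Vec, y: Vec) -> Vec:
    """Naive right division x / y = x * y^{-1}, where y^{-1} = conj(y) / |y|^2."""
    norm2 = sum(t * t for t in y)
    if norm2 == 0.0:
        raise ZeroDivisionError("division by the zero octonion")
    return mul(x, [t / norm2 for t in conj(y)])
```

Because octonions are alternative, $(x\,y^{-1})\,y = x$ holds even though multiplication is not associative, which gives a simple correctness check for any faster division scheme.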

Crystalloid Inclusions in Hepatocyte Mitochondria and Their Similarity to Cytoplasmic Crystalloids

Proceedings, annual meeting, Electron Microscopy Society of America; 1974; Vol 32; pp. 198-199
DOI: 10.1017/s0424820100051001

Author(s): Odell T. Minick, Hidejiro Yokoo

Keyword(s): Liver Disease, Alcoholic Liver Disease, Special Interest, Structural Variations, Mitochondrial Alterations, The Matrix, Crystalloid Inclusions, Liver Biopsies

Mitochondrial alterations were studied in 25 liver biopsies from patients with alcoholic liver disease. Of special interest was the morphologic resemblance between certain fine structural variations in mitochondria and crystalloid inclusions. Four types of alterations within mitochondria were found that seemed to be related to cytoplasmic crystalloids. The type 1 alteration consisted of localized groups of cristae, usually oriented along the long axis of the organelle (Fig. 1A). In this plane they appeared serrated at the periphery, with blind endings in the matrix. Other sections revealed a system of equally spaced diagonal lines running lengthwise in the mitochondrion, with cristae protruding from both ends (Fig. 1B). Profiles of this inclusion were not unlike tangential cuts of a crystalloid structure frequently seen in enlarged mitochondria, described below.

Coherency loss in γ' precipitates in nickel-base superalloy

Proceedings, annual meeting, Electron Microscopy Society of America; 1983; Vol 41; pp. 244-245
DOI: 10.1017/s0424820100075014

Author(s): R. A. Ricks, Angus J. Porter

Keyword(s): Lattice Mismatch, Lattice Parameter, Nickel Base Superalloy, Large Size, The Matrix, Nimonic 80A, Interfacial Dislocations, Nickel Base, Coherency Loss, Nimonic 115

During a recent investigation of the growth of γ' precipitates in nickel-base superalloys, it was observed that the sign of the lattice mismatch between the coherent particles and the matrix (γ) was important in determining how easily matrix dislocations could be incorporated into the interface to relieve coherency strains. Thus, alloys with a negative misfit (i.e., the γ' lattice parameter was smaller than that of the matrix) could lose coherency easily, and their γ/γ' interfaces would exhibit regularly spaced networks of dislocations, as shown in figure 1 for Nimonic 115 (misfit = -0.15%). In contrast, γ' particles in alloys with a positive misfit could grow to a large size without showing any such dislocation arrangements in the interface, indicating that coherency had not been lost. Figure 2 depicts a large γ' precipitate in Nimonic 80A (misfit = +0.32%) showing few interfacial dislocations.
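
The abstract quotes misfit values without defining them. A commonly used definition, assumed here rather than taken from the paper, is

$$\delta = \frac{2\,(a_{\gamma'} - a_{\gamma})}{a_{\gamma'} + a_{\gamma}},$$

where $a_{\gamma'}$ and $a_{\gamma}$ are the lattice parameters of the precipitate and the matrix. It is negative when the γ' lattice parameter is the smaller one, which matches the sign convention used above: $\delta = -0.15\%$ for Nimonic 115 and $\delta = +0.32\%$ for Nimonic 80A.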

Influence of Heat Treatments on Microstructures in an Fe-Co-V Alloy

Proceedings, annual meeting, Electron Microscopy Society of America; 1974; Vol 32; pp. 512-513
DOI: 10.1017/s0424820100052572

Author(s): S. Mahajan, M. R. Pinnel, J. E. Bennett

Keyword(s): Single Phase, Alloy Composition, Supersaturated Solid Solution, Cell Formation, Heat Treatments, Air Cooling, Iced Brine, Transmission Electron, The Matrix, BCC Structure

The microstructural changes in an Fe-Co-V alloy (composition in wt.%: 2.97 V, 48.70 Co, 47.34 Fe, balance impurities such as C, P and Ni) resulting from different heat treatments have been evaluated by optical metallography and transmission electron microscopy. Results indicate that, on air cooling or quenching into iced brine from the high-temperature single-phase γ (fcc) field, vanadium can be retained in a supersaturated solid solution (α₂) with a bcc structure. For the range of cooling rates employed, a portion of the material appears to undergo the γ→α₂ transformation massively and the remainder martensitically. Figure 1 shows the dislocation topology in a region that may have transformed martensitically. Dislocations are homogeneously distributed throughout the matrix, and there is no evidence of cell formation. The majority of the dislocations lie along the projections of <111> vectors onto the (111) plane, implying that they are predominantly of screw character.