Efficient Algorithms for Some Common Applications on GHCC

2005 ◽  
Vol 06 (04) ◽  
pp. 417-433
Author(s):  
Srabani Mukhopadhyaya ◽  
Bhabani P. Sinha

Generalized Hypercube-Connected-Cycles (GHCC) is a challenging interconnection network proposed earlier in the literature. In this paper, we discuss how some important and useful algorithms, such as matrix transpose, matrix multiplication and sorting, can be efficiently implemented on GHCC. Matrix transpose and matrix-by-matrix multiplication of matrices of order n × n, [Formula: see text], take O(l) and [Formula: see text] time, respectively, on GHCC(l,m) with lm^l processors. Using the same number of processors, a list of m^l numbers can be sorted in O(l^2 log^3 m) time.
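For a sense of how these bounds scale, here is a minimal Python sketch (with hypothetical parameters l and m) that evaluates the processor count l·m^l and the stated O(l^2 log^3 m) sorting bound; it does not implement the GHCC algorithms themselves.

```python
import math

def ghcc_scaling(l: int, m: int) -> None:
    """Illustrate how the stated GHCC(l, m) bounds scale (not an implementation)."""
    processors = l * m ** l                        # number of processors in GHCC(l, m)
    sort_bound = l ** 2 * math.log2(m) ** 3        # O(l^2 log^3 m) sorting bound, up to constants
    print(f"GHCC({l},{m}): {processors} processors, "
          f"sorts {m ** l} keys in ~{sort_bound:.0f} parallel steps (up to constants)")

# Hypothetical example parameters
ghcc_scaling(3, 4)
ghcc_scaling(4, 8)
```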

Algorithms ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 347
Author(s):  
Anne Berry ◽  
Geneviève Simonet

The atom graph of a graph is a graph whose vertices are the atoms obtained by clique minimal separator decomposition of this graph, and whose edges are the edges of all possible atom trees of this graph. We provide two efficient algorithms for computing this atom graph, with a time complexity in O(min(n^ω log n, nm, n(n+m¯))), where n is the number of vertices of G, m is the number of its edges, m¯ is the number of edges of the complement of G, and ω, also denoted by α in the literature, is a real number such that O(n^ω) is the best known time complexity for matrix multiplication, whose current value is 2.3728596. This time complexity is no more than the time complexity of computing the atoms in the general case. We extend our results to α-acyclic hypergraphs, which are hypergraphs having at least one join tree, a join tree of a hypergraph being defined by its hyperedges in the same way as an atom tree of a graph is defined by its atoms. We introduce the notion of the union join graph, which is the union of all possible join trees; we apply our algorithms for atom graphs to efficiently compute union join graphs.
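To illustrate why taking the minimum of the three terms matters, the following sketch (with hypothetical values of n and m) evaluates n^ω log n, nm and n(n+m¯) and reports which is smallest; it only compares the bounds and does not compute atom graphs.

```python
import math

OMEGA = 2.3728596  # best known matrix-multiplication exponent cited above

def compare_bounds(n: int, m: int) -> None:
    """Compare the three terms inside O(min(n^w log n, nm, n(n + m_bar)))."""
    m_bar = n * (n - 1) // 2 - m     # number of edges of the complement graph
    terms = {
        "n^w log n": n ** OMEGA * math.log2(n),
        "n m": n * m,
        "n (n + m_bar)": n * (n + m_bar),
    }
    best = min(terms, key=terms.get)
    print(f"n={n}, m={m}: smallest term is {best}")

compare_bounds(10_000, 30_000)        # sparse graph: the nm term wins
compare_bounds(10_000, 49_000_000)    # dense graph: the n(n + m_bar) term wins
```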


2003 ◽  
Vol 14 (01) ◽  
pp. 59-78
Author(s):  
MARTIN SCHMOLLINGER ◽  
MICHAEL KAUFMANN

Clusters of symmetric multiprocessor nodes (SMP clusters) are one of the most important parallel architectures at the moment. The architecture consists of shared-memory nodes with multiple processors and a fast interconnection network between the nodes. New programming models try to exploit this architecture by using threads within the nodes and message-passing libraries for inter-node communication. In order to develop efficient algorithms, it is necessary to consider the hybrid nature of the architecture and of the programming models. We present the κNUMA-model and a methodology that together form a good basis for designing efficient algorithms for SMP clusters. The κNUMA-model is a computational model that extends the bulk-synchronous parallel (BSP) model with the characteristics of SMP clusters and new hybrid programming models. The κNUMA-methodology suggests developing efficient overall algorithms by developing efficient algorithms for each level of the hierarchy. We use the problems of personalized one-to-all broadcast and dense matrix-vector multiplication for the presentation. The theoretical results of the analysis of the dense matrix-vector multiplication are verified in practice; we show results of experiments performed on a Linux cluster of dual Pentium-III nodes.
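As a toy illustration of the two-level design idea, and not the paper's κNUMA algorithms, the sketch below splits a dense matrix-vector product into block rows per "node" and then into threads inside each node; a real SMP-cluster implementation would use a message-passing library between nodes and threads within them.

```python
from concurrent.futures import ThreadPoolExecutor

def matvec_block(A_rows, x):
    """Sequential matrix-vector product on one block of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A_rows]

def hybrid_matvec(A, x, nodes=2, threads_per_node=2):
    """Two-level decomposition: block rows per node, then per thread inside a node."""
    n = len(A)
    node_chunk = (n + nodes - 1) // nodes
    result = []
    for node in range(nodes):                      # in a real cluster: one MPI rank per node
        block = A[node * node_chunk:(node + 1) * node_chunk]
        thread_chunk = (len(block) + threads_per_node - 1) // threads_per_node
        with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
            parts = pool.map(matvec_block,
                             [block[i * thread_chunk:(i + 1) * thread_chunk]
                              for i in range(threads_per_node)],
                             [x] * threads_per_node)
        for part in parts:
            result.extend(part)
    return result

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
print(hybrid_matvec(A, x))   # [3, 7, 11, 15]
```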


2021 ◽  
Vol 3 (1) ◽  
pp. 12-23
Author(s):  
Agnes Ona Bliti Puka

The purpose of this study is to determine the ability of students of class XI Culture at SMAK St. Francis of Assisi Larantuka to understand mathematical concepts. The data were collected from the results of a test of students’ ability to understand mathematical concepts and from unstructured interviews. The test used to measure this ability consists of two essay questions on matrix transpose and matrix multiplication. The test results were analysed against three indicators of mathematical understanding: 1) the ability to explain a definition in one’s own words according to its essential traits/characteristics, 2) the ability to construct examples of mathematical concepts, and 3) the ability to use concepts in solving problems. The population in this study is the class XI Culture students of SMAK St. Francis of Assisi Larantuka, from which a sample of two students was drawn. The results show differences between students in their ability to understand mathematical concepts through Problem Based Learning.

Keywords: Mathematical concept understanding, Problem Based Learning, matrix.


Author(s):  
E. A. Ashcroft ◽  
A. A. Faustini ◽  
R. Jaggannathan ◽  
W. W. Wadge

In Chapter 1, we saw how Lucid could be used to express solutions to standard problems such as sorting and matrix multiplication. One of the unique characteristics of Lucid is that it can be used not only as a programming language but also as a “composition” language. That is, instead of using Lucid to specify computations, it can be used to express how computation components (expressed in some other language) can be “glued” together to form a coherent application. By doing so, the resulting application can enjoy some of the practical benefits attributable to Lucid, such as high performance through exploitation of implicit parallelism and robustness through software fault tolerance. In this chapter, we discuss one such use of Lucid: as part of a hybrid language to construct parallel applications to be executed on conventional parallel computers.

A conventional parallel computer consists either of a number of processors, each with local memory, interconnected by a network (as in distributed-memory architectures), or of a number of processors that share memory, possibly through an interconnection network (as in shared-memory architectures). The past decade has seen the advent of conventional parallel computers, starting with the Denelcor HEP, evolving to the CM-2 and Intel Hypercube, and further evolving to the CM-5, Intel Paragon, Cray T3D, and IBM SP-2. Even networks of workstations (or workstation clusters) are seen as low-cost (“poor man’s”) parallel computers.

Programming of conventional parallel computers has proven to be far more challenging than had been expected. Part of the reason is the continued use of low-level, explicitly parallel programming models such as PVM [42] and Linda [10]. Two factors have fueled the continuing use of such languages despite their limited success:

1. The need to reuse existing sequential code, because the cost of rewriting legacy applications from scratch is considered prohibitive in both economic and technical terms.

2. The need to run on conventional parallel computers that view a “parallel program” at a low level, as consisting of sequential processes that frequently synchronize and communicate with each other using some form of message passing.


2011 ◽  
Vol 22 (05) ◽  
pp. 1001-1018 ◽  
Author(s):  
YAMIN LI ◽  
SHIETUNG PENG ◽  
WANMING CHU

The recursive dual-net is a newly proposed interconnection network for massively parallel computers. The recursive dual-net is based on the recursive dual-construction of a symmetric base network. A k-level dual-construction for k > 0 creates a network containing (2n0)^(2^k)/2 nodes with node-degree d0 + k, where n0 and d0 are the number of nodes and the node-degree of the base network, respectively. The recursive dual-net is node- and edge-symmetric and can contain a huge number of nodes with small node-degree and short diameter. Disjoint-paths routing and fault-tolerant routing are fundamental and critical issues for the performance of an interconnection network. In this paper, we propose efficient algorithms for disjoint-paths routing and fault-tolerant routing on the recursive dual-net.
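A minimal sketch, assuming the node-count formula quoted above, that tabulates the size and degree of the recursive dual-net for a hypothetical base network (here a 3-dimensional hypercube with n0 = 8 nodes and degree d0 = 3):

```python
def rdn_size(n0: int, d0: int, k: int) -> tuple[int, int]:
    """Nodes and node-degree of a k-level recursive dual-net over a base network
    with n0 nodes and degree d0, using the formula (2*n0)^(2^k)/2 and d0 + k."""
    nodes = (2 * n0) ** (2 ** k) // 2
    degree = d0 + k
    return nodes, degree

# Hypothetical base network: the 3-cube (n0 = 8 nodes, degree d0 = 3)
for k in range(3):
    nodes, degree = rdn_size(8, 3, k)
    print(f"k={k}: {nodes} nodes, degree {degree}")
```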


1995 ◽  
Vol 05 (01) ◽  
pp. 37-48 ◽  
Author(s):  
ARNOLD L. ROSENBERG ◽  
VITTORIO SCARANO ◽  
RAMESH K. SITARAMAN

We propose a design for, and investigate the computational power of, a dynamically reconfigurable parallel computer that we call the Reconfigurable Ring of Processors ([Formula: see text], for short). The [Formula: see text] is a ring of identical processing elements (PEs) that are interconnected via a flexible multi-line reconfigurable bus, each of whose lines has one-packet width and can be configured, independently of the other lines, to establish an arbitrary PE-to-PE connection. A novel aspect of our design is a communication protocol we call COMET (for Cooperative MEssage Transmission), which allows the PEs of an [Formula: see text] to exchange one-packet messages with latency that is logarithmic in the number of PEs the message passes over in transit. The main contribution of this paper is an algorithm that allows an N-PE, N-line [Formula: see text] to simulate an N-PE hypercube executing a normal algorithm, with slowdown less than 4 log log N, provided that the local state of a hypercube PE can be encoded and transmitted using a single packet. This simulation provides a rich class of efficient algorithms for the [Formula: see text], including algorithms for matrix multiplication, sorting, and the Fast Fourier Transform (often using fewer than N bus lines). The resulting algorithms for the [Formula: see text] are often within a small constant factor of optimal.
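To put the simulation overhead in perspective, this small calculation (assuming base-2 logarithms) evaluates the 4 log log N slowdown bound for a few machine sizes, showing how slowly it grows as N increases:

```python
import math

for exp in (10, 16, 20):
    N = 2 ** exp                             # number of PEs in the simulated hypercube
    slowdown = 4 * math.log2(math.log2(N))   # the 4 log log N bound from the paper
    print(f"N=2^{exp}: 4 log log N = {slowdown:.1f}  (log N = {exp})")
```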


2018 ◽  
Vol 12 ◽  
pp. 25-41
Author(s):  
Matthew C. FONTAINE

Among the most interesting problems in competitive programming are those involving maximum flows. However, efficient algorithms for solving these problems are often difficult for students to understand at an intuitive level. One reason for this difficulty may be a lack of suitable metaphors relating these algorithms to concepts that the students already understand. This paper introduces a novel maximum-flow algorithm, Tidal Flow, that is designed to be intuitive to undergraduate and pre-university computer science students.
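The Tidal Flow algorithm itself is not reproduced here; for context, the sketch below is a minimal implementation of the classical BFS-based augmenting-path (Edmonds–Karp) maximum-flow method, the kind of standard algorithm a teaching-oriented alternative would be contrasted with.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds–Karp: repeatedly augment along a shortest residual path.
    `capacity` is a dict of dicts: capacity[u][v] = capacity of edge (u, v)."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Tiny example network
capacity = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(capacity, "s", "t"))  # 4
```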

