Parallel Computing Using the Prefix Problem

Published By Oxford University Press

9780195088496, 9780197560549

Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

This chapter presents a survey of parallel algorithms for computing prefixes using circuit models. The circuits considered in this chapter are constrained to a fixed fan-in of two but are allowed arbitrary (unbounded) fan-out. Prefix circuits with fixed fan-in and fan-out are described in Chapter 7, and those with unbounded fan-in in Chapter 8. The algorithms described in this chapter are judged according to a number of measures, including size, depth, and fan-out. (Refer to Chapter 2 for definitions.) The layout of the serial circuit, S(N), is given in Figure 1. Clearly, the size of this circuit is $s(N) = N - 1$, and the depth is $d(N) = N - 1$. The sum of the size and depth for this circuit is $s(N) + d(N) = 2N - 2$ (Equation 1). Each node has a constant fan-out. In Chapter 6, it is shown that the sum of the size and depth of any prefix circuit is lower-bounded by $2N - 2$. From this, it follows that the serial circuit is optimal. Thus, any circuit that computes all prefixes in less than serial time must have size larger than $N - 1$. Perhaps the easiest approach to the design of a parallel prefix circuit is to invoke the well-known divide-and-conquer strategy. Let $N = 2^n$. According to this strategy, if DC(N) is a circuit that computes the prefixes of N elements, then DC(N) can be designed according to the principle illustrated in Figure 1.
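
The size and depth figures quoted for the serial circuit can be checked with a small simulation. The following is only a sketch: the function name `serial_prefix_circuit` and the convention of counting one operation node per application of the operator are illustrative assumptions, not notation from the book.

```python
import operator


def serial_prefix_circuit(d, op):
    """Simulate the serial prefix circuit on a non-empty input list d.

    Returns (prefixes, size, depth), where size counts operation nodes and
    depth counts the longest chain of dependent operation nodes.
    """
    prefixes = [d[0]]          # x_1 = d_1 needs no operation node
    size = 0
    depth = 0
    for value in d[1:]:
        # Each new prefix depends on the previous one, so every node
        # adds one to both the size and the depth.
        prefixes.append(op(prefixes[-1], value))
        size += 1
        depth += 1
    return prefixes, size, depth


N = 8
result, size, depth = serial_prefix_circuit(list(range(1, N + 1)), operator.add)
```

For N = 8 the simulation reports size 7 and depth 7, so the sum equals 2N - 2 = 14, matching Equation (1).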


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

It is well known that among the three classes of PRAM models, namely CRCW, CREW, and EREW, the CRCW models are the least restrictive, and hence the most powerful, in the sense that they permit concurrent reads and writes by processors. Accordingly, algorithms on the CRCW model concentrate mainly on the core computations without much ado about data access. Consequently, this model, at least in principle, allows for the design of the fastest algorithm for a problem. It is intriguing to ask how fast prefixes can be computed on the CRCW models. Since CRCW models are equivalent to unbounded fan-in circuits (refer to Chapter 2), the task of developing the fastest algorithms for the prefix problem is pursued in the context of unbounded fan-in circuits. Recall from Chapter 2 that, while the standard measures such as size and depth are still used to quantify the goodness of unbounded fan-in circuits, the size of the circuit is measured by the total number of edges incident on all of its operation nodes, instead of by the number of operation nodes. It turns out that the size and depth of unbounded fan-in circuits for computing prefixes depend critically on the structure of the underlying semigroup from which the input elements are drawn. The principal result of this concluding chapter may be stated as follows: there exist unbounded fan-in parallel prefix circuits of constant depth and polynomial size if, and only if, the underlying semigroup is group free. The proof of this result involves a very clever synthesis of a number of ideas drawn from different directions: the structure of group-free semigroups; their relation to a special class of regular sets, called non-counting regular sets; the relation of this latter class to yet another class of regular sets defined by star-free regular expressions; and the design of a special class of finite-state deterministic automata, called RS machines, that accept star-free regular expressions.
In this context, it is convenient to define the notion of small circuits as the class of circuits with constant depth and polynomial size.


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

The depth-optimal and (size + depth)-optimal circuits described in Chapters 5 and 6 have a constant (fixed) fan-in but unbounded fan-out. From practical considerations, it is desirable to construct circuits with bounded fan-out as well. In this chapter, we first describe general methods for bounding fan-out in circuits, and then demonstrate an application of these techniques to obtain parallel prefix circuits with fixed fan-in and fixed fan-out. However, bounding fan-out increases the size and depth. The principal result of this chapter is a set of bounds on the increase in size and depth resulting from fixing the fan-out. A glance at the structure of the d-optimal and (s, d)-optimal circuits defined in Chapters 5 and 6 reveals that these circuits have unbounded fan-out (that is, the fan-out is a function of N, the number of inputs). In practice, however, as in VLSI implementation, considerations of power, noise, and reliability require the fan-out to be limited to a small, fixed number k (such as 2 or 3). We first describe an optimal method for bounding fan-out and then apply it to prefix circuits. To understand the effect of limiting the fan-out, consider the example of a node a in Figure 1, with fan-out 9. This can be thought of as a 9-ary tree with a as its root and $b_i$, $1 \le i \le 9$, as the leaves. Suppose that we have to limit the fan-out to 3. That is, we have to replace the 9-ary tree in Figure 1 by a 3-ary tree with a as its root and $b_i$, $1 \le i \le 9$, as the leaves, with the constraint that the values of the leaves remain unchanged. One such tree is given in Figure 2. Clearly, restricting the fan-out simultaneously increases the depth and the size of the circuit. In this example, the size is increased by 3 and the depth by 1.
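
The cost of the 9-to-3 example can be reproduced with a short calculation. The sketch below assumes one simple balanced construction (repeatedly grouping the nodes to be driven into bundles of at most k and adding a copy node per bundle); it is not claimed to be the optimal method the chapter describes, and the function name `bound_fan_out` is hypothetical.

```python
def bound_fan_out(fan_out, k):
    """Cost of replacing one node of the given fan-out by a k-ary tree.

    Returns (extra_nodes, extra_depth): the number of intermediate copy
    nodes inserted between the root and its original successors, and the
    number of additional levels they introduce.
    """
    if k < 2 or fan_out < 1:
        raise ValueError("need k >= 2 and fan_out >= 1")
    extra_nodes = 0
    extra_depth = 0
    to_drive = fan_out
    while to_drive > k:
        # Group the nodes to be driven into bundles of at most k and put
        # one copy node above each bundle (integer ceiling division).
        to_drive = (to_drive + k - 1) // k
        extra_nodes += to_drive
        extra_depth += 1
    return extra_nodes, extra_depth


example = bound_fan_out(9, 3)
```

With fan-out 9 and k = 3, a single layer of three copy nodes suffices, which agrees with the increase of 3 in size and 1 in depth stated for the example.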


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

Ladner and Fischer [1980] were the first to demonstrate a size versus depth trade-off in parallel prefix circuits. In this chapter, a lower bound from Snir [1986] on (size + depth) for prefix circuits is first derived. Several designs of parallel prefix circuits with optimal (size + depth) trade-offs are then described. In this section, we derive a lower bound from Snir [1986] on the sum of the depth and size of a class of prefix circuits. Let $x = \{x_1, x_2, \ldots, x_N\}$ be a set of N variables. Let D be the domain over which the elements of x take their values, and let $F_N = \{f_1, f_2, \ldots, f_N\}$ be a set of functions satisfying the following conditions: (SR1) $f_i$ depends only on the variables $x_1, x_2, \ldots, x_i$, and (SR2) for each variable $x_i$, $i = 1, 2, \ldots, N$, there exists a value $a_i$ in D such that the set of functions $f_j|_{x_i = a_i} = f_j(x_1, x_2, \ldots, x_{i-1}, x_i = a_i, x_{i+1}, \ldots, x_j)$, $j = 1, 2, \ldots, N$, contains a family of (N - 1) functions satisfying these conditions. A family $F_N$ satisfying these two conditions is called a self-reducible family of functions. Clearly, $f_i(x_1, x_2, \ldots, x_i) = x_1 \circ x_2 \circ \cdots \circ x_i$, $1 \le i \le N$, satisfies these conditions. The following is an example of a self-reducible family of functions. An associative binary operation θ is said to be non-trivial if the following conditions are satisfied: (i) if $z = x \,\theta\, y$, then z depends on both x and y, that is, θ is neither a projection nor a constant operation, and (ii) there exists a (right) unit element e of θ such that $x = x \,\theta\, e$.


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

This chapter describes algorithms for computing prefixes/suffixes in parallel when the input data is in the form of a linked list. Developments in this chapter complement those in Chapter 3. We begin by defining a version of the prefix problem called the list ranking problem. Let $\langle N \rangle = \{1, 2, \ldots, N\}$ and let L be a list of size N. For each $i \in \langle N \rangle$, the node i in L contains two types of information: the value v(i) of node i, and the successor s(i) of node i. Clearly, s(N) = 0. A linked list may conveniently be represented as a directed, labeled graph G(V, E), where $V = \langle N \rangle$ and $E = \{(i, j) \mid j = s(i),\ i, j \in V\}$, and v(i) denotes the value for node i.
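
The successor/value representation can be made concrete with a short sequential sketch. It only illustrates the data layout and what "prefixes along a list" means; it is not one of the parallel algorithms of the chapter. The dictionary encoding, the function name `list_prefixes`, and the choice of locating the head as the one node that is nobody's successor are assumptions made for illustration.

```python
import operator


def list_prefixes(succ, value, op):
    """Compute prefixes along a linked list by following successor links.

    succ[i] is the successor s(i) of node i, with 0 marking the end of the
    list; value[i] is v(i). Returns a dict mapping each node to the
    combination of the values from the head of the list up to that node.
    """
    # The head is the only node that never appears as a successor.
    head = (set(succ) - set(succ.values())).pop()
    result = {}
    running = None
    node = head
    while node != 0:
        running = value[node] if running is None else op(running, value[node])
        result[node] = running
        node = succ[node]
    return result


# A list of N = 4 nodes stored in the order 2 -> 1 -> 3 -> 4, so s(4) = 0.
succ = {2: 1, 1: 3, 3: 4, 4: 0}
value = {1: 10, 2: 20, 3: 30, 4: 40}
sums = list_prefixes(succ, value, operator.add)
# With every value set to 1, the prefix at a node is its position in the list.
positions = list_prefixes(succ, {i: 1 for i in succ}, operator.add)
```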


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

In the parlance of the design and analysis of algorithms, it is now common knowledge that the type of operations used and the overall efficiency of an algorithm critically depend on the organization of the input data for the given problem. Most of the parallel algorithms for prefix computations exploit one of two different types of data organization, namely, arrays and linked lists. This chapter examines parallel prefix algorithms based on the shared memory models when the input data is in an array. Corresponding algorithms for linked lists are described in Chapter 4. Let $N = 2^n$, for some $n \ge 1$. For definiteness, consider the semigroup of real numbers with the usual addition operation.
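
As a point of reference for array-based prefix computation, the sketch below simulates the widely known recursive-doubling scan sequentially. This technique is swapped in for illustration and is not necessarily the algorithm developed in the chapter; the function name `scan_recursive_doubling` and the round counter are assumptions. Each list comprehension stands for one parallel step in which every position reads values from the previous round.

```python
def scan_recursive_doubling(a):
    """Inclusive prefix sums by recursive doubling, simulated sequentially.

    Returns (prefix_sums, rounds). In each round, position i adds the value
    held `stride` places to its left; all positions update simultaneously.
    """
    x = list(a)
    n = len(x)
    stride = 1
    rounds = 0
    while stride < n:
        # Build the new array from the old one so that every read sees the
        # values of the previous round, as a synchronous parallel step would.
        x = [x[i - stride] + x[i] if i >= stride else x[i] for i in range(n)]
        stride *= 2
        rounds += 1
    return x, rounds


sums, rounds = scan_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
```

For N = 8 = 2^3 inputs the simulation finishes in 3 rounds, one per doubling of the stride.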


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

This chapter provides an overview of many issues related to the analysis of parallel algorithms in general. Starting with a discussion of the need for parallelism, a classification of parallel architectures, the need and the use of parallel models in algorithm development and various measures for quantifying the performance of parallel algorithms are presented. The definition, the role, and the properties of the key parallel complexity class called NC (Nick’s Class) are then described. This chapter concludes with a discussion of a basic result called Brent’s inequality and the derivation of a simple lower bound on the parallel time complexity. The conventional approach to engineering design critically depends on laboratory testing of scaled models — witness wind tunnel testing of models of aircraft and its parts. While such an approach has resulted in considerable success, it often involves destructive testing and is time consuming. It is said that it took nearly ten years and approximately three billion dollars to develop the now commercially successful Boeing 747 aircraft used in long-haul commercial flights. Computer simulation provides a convenient and economically attractive alternative. In simulation, the physical processes of interest are represented by a system of non-linear partial differential equations involving the three space dimensions and the time. These model equations are solved numerically using the finite difference or finite element methods. The behavior of the solution of these model equations provides insight on the effectiveness of the model in describing the physical phenomenon of interest, be it a model of an aircraft or a model of a weather phenomenon, etc. This approach, based on computer simulations has several inherent advantages. It is certainly non-destructive and easily allows for the analysis of the quality of the model for various combinations of initial and boundary conditions and the values of many key parameters of interest. 
The time required to complete a model run is a direct function of a number of factors: the number of grid points in space and time; the nature of the discretization scheme, such as finite differences or finite elements; and the non-linearity of, and the coupling between, the variables in the model.


Author(s):  
S. Lakshmivarahan ◽  
Sudarshan K. Dhall

With the emergence of parallel computing, the notion of prefix computation has gained considerable attention in the literature, and it plays a central role in parallel algorithm design. This introductory chapter begins with the definition of the prefix problem. The ubiquitous nature of this problem is then illustrated using a host of examples drawn from a variety of application areas. Readers unfamiliar with a particular application area may choose to consult the appropriate references given in Section 1.4, Notes and References. After gaining sufficient familiarity with the remainder of this book, the reader will profit by revisiting Chapter 1 to apply the parallel prefix algorithms to several of the problems introduced here. In fact, many interesting class projects can be developed by cleverly mixing the problems and the algorithms. Let A be a set and $\circ$ be a binary operation defined over the elements of A. It is assumed that (C1) A is closed under the binary operation $\circ$, that is, if a and b are in A, then so is $a \circ b$, and (C2) the operation $\circ$ is associative, that is, if a, b, and c are in A, then $(a \circ b) \circ c = a \circ (b \circ c) = a \circ b \circ c$. The system $(A, \circ)$ satisfying conditions C1 and C2 is called a semigroup (Birkhoff and Bartee [1970]). Examples include (a) A is the set of integers (or real or complex numbers) and $\circ$ denotes either addition or multiplication, and (b) A is the set of strings over a finite alphabet and $\circ$ denotes concatenation. To render our exposition self-contained, in Appendix A we discuss various properties of semigroups of interest to us in this book. Let $d = (d_1, d_2, \ldots, d_N)'$, where $d_i \in A$ for $1 \le i \le N$. Consider the problem of computing $x_i = x_{i-1} \circ d_i$ for $2 \le i \le N$, given that $x_1 = d_1$.
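
The recurrence defining the prefix problem can be written down directly. The following minimal sketch uses Python's standard `itertools.accumulate`; the wrapper name `prefixes` and the sample inputs are illustrative assumptions. It shows the same definition applied to two of the semigroups mentioned above, integers under addition and strings under concatenation.

```python
from itertools import accumulate
import operator


def prefixes(d, op):
    """Return [x_1, ..., x_N] with x_1 = d_1 and x_i = op(x_{i-1}, d_i)."""
    return list(accumulate(d, op))


# Integers under addition.
sums = prefixes([3, 1, 4, 1, 5], operator.add)
# Strings under concatenation (operator.add concatenates strings).
words = prefixes(["a", "b", "c"], operator.add)
```

Only closure and associativity of the operation are assumed by the problem statement; the sequential evaluation here does not itself exploit associativity, which is what parallel algorithms rely on to regroup the operations.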

