Limits to Parallel Computation
Published by Oxford University Press. ISBN 9780195085914, 9780197560518.

Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

We consider the selection of two basketball teams at a neighborhood playground to illustrate the greedy method. Usually the top two players are designated captains. All other players line up while the captains alternate choosing one player at a time. Usually, the players are picked using a greedy strategy: each captain chooses the best unclaimed player. This system of selection, taking the best, most obvious, or most convenient remaining candidate, is called the greedy method. Greedy algorithms often lead to easily implemented, efficient sequential solutions to problems. Unfortunately, it also seems to be the case that sequential greedy algorithms frequently lead to solutions that are inherently sequential: the solutions produced by these algorithms cannot be duplicated rapidly in parallel unless NC equals P. In the following subsections we will examine this phenomenon.

We illustrate some of the important aspects of greedy algorithms using one that constructs a maximal independent set in a graph. An independent set is a set of vertices of a graph that are pairwise nonadjacent. A maximum independent set is such a set of largest cardinality. It is well known that finding maximum independent sets is NP-hard. An independent set is maximal if no other vertex can be added while maintaining the independent set property. In contrast to the maximum case, finding maximal independent sets is very easy. Figure 7.1.1 depicts a simple polynomial time sequential algorithm computing a maximal independent set. The algorithm is a greedy algorithm: it processes the vertices in numerical order, always attempting to add the lowest numbered vertex that has not yet been tried. The sequential algorithm in Figure 7.1.1, having processed vertices 1, ..., j-1, can easily decide whether to include vertex j. Notice, however, that its decision about j potentially depends on its decisions about all earlier vertices: j will be included in the maximal independent set if and only if every j' less than j and adjacent to j was excluded.
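
The algorithm of Figure 7.1.1 is not reproduced in this excerpt, but the following short Python sketch is our own illustration of the greedy routine it describes (the function name and the adjacency-set representation are not from the book): vertices are processed in increasing numerical order, and a vertex is added exactly when none of its neighbors has already been added.

    def greedy_maximal_independent_set(adj):
        """Lexicographically first maximal independent set, greedily.

        adj maps each vertex to the set of its neighbors.  Vertex j is
        included iff none of its lower-numbered neighbors was included.
        """
        independent = set()
        for v in sorted(adj):               # numerical order
            if not (adj[v] & independent):  # no neighbor chosen so far
                independent.add(v)
        return independent

    # Example: on the path 1-2-3-4 the routine returns {1, 3}.
    print(greedy_maximal_independent_set({1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}))

Note how the decision about vertex 3 depends on vertex 2 having been rejected, which in turn depends on vertex 1 having been accepted; this chain of dependences is exactly what makes the algorithm appear inherently sequential.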


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

The goal of this chapter is to provide the formal basis for many key concepts that are used throughout the book. These include the notion of a problem, definitions of important complexity classes, reducibility, and completeness, among others. Thus far, we have used the term "problem" somewhat vaguely. In order to compare the difficulty of various problems we need to make this concept precise. Problems typically come in two flavors: search problems and decision problems. Consider the following search problem: find the value of the maximum flow in a network.

Example 3.1.1 Maximum Flow Value (MaxFlow-V)
Given: A directed graph G = (V, E) with each edge e labeled by an integer capacity c(e) ≥ 0, and two distinguished vertices, s and t.
Problem: Compute the value of the maximum flow from source s to sink t in G.

The problem requires us to compute a number, the value of the maximum flow. Note that in this case we are actually computing a function. Now consider a variant of this problem.

Example 3.1.2 Maximum Flow Bit (MaxFlow-B)
Given: A directed graph G = (V, E) with each edge e labeled by an integer capacity c(e) ≥ 0, two distinguished vertices, s and t, and an integer i.
Problem: Is the ith bit of the value of the maximum flow from source s to sink t in G a 1?

This is a decision problem version of the flow problem. Rather than asking for the computation of some value, the problem asks for a "yes" or "no" answer to a specific question. Yet the decision problem MaxFlow-B is equivalent to the search problem MaxFlow-V in the sense that if one can be solved efficiently in parallel, so can the other. Why is this? First consider how solving an instance of MaxFlow-B can be reduced to solving an instance of MaxFlow-V. Suppose that you are asked a question for MaxFlow-B, that is, "Is bit i of the maximum flow a 1?" It is easy to answer this question by solving MaxFlow-V and then looking at bit i of the flow value.
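
To make this direction of the equivalence concrete, here is a minimal Python sketch of the reduction from MaxFlow-B to MaxFlow-V. It assumes some routine max_flow_value(G, s, t) solving MaxFlow-V is available; that routine, and the graph representation it expects, are hypothetical placeholders rather than anything defined in the book.

    def maxflow_bit(max_flow_value, G, s, t, i):
        """Decide MaxFlow-B given a solver for MaxFlow-V.

        Returns True iff bit i (i = 0 is the least significant bit) of
        the maximum s-t flow value in G is 1.
        """
        value = max_flow_value(G, s, t)   # solve the search problem once
        return (value >> i) & 1 == 1      # inspect the requested bit

The other direction, recovering the entire flow value from yes/no answers, can be handled by asking the MaxFlow-B question once for each bit position of the (polynomially long) answer.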


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

Before we can discuss the difficulty of solving a problem, we must first choose a suitable machine model in which to describe our computations. A machine model is a parameterized description of a class of machines. Each machine in the class is obtained from the model by giving specific values for the parameters. For example, a Turing machine is specified by giving the number of work tapes, the symbol set, and the program. The choice of model that we make depends on how we wish to balance such factors as simplicity, generality, historical use, novelty, plausibility of actual implementation, and ease of programming. This flexibility inevitably leads to a proliferation of different models, and parallel computation is no exception to this tendency toward diversity. The menagerie of parallel models includes bit vector machines (Pratt and Stockmeyer [293]), Boolean circuits (Borodin [40]), parallel random access machines, or PRAMs (Fortune and Wyllie [109], Goldschlager [126]), k-PRAMs (Savitch and Stimson [323]), alternating Turing machines (Chandra, Kozen, and Stockmeyer [49]), parallel pointer machines (Cook and Dymond [68], Dymond [98], Dymond and Cook [99, 100]), aggregates ([98, 99, 100]), conglomerates (Goldschlager [126]), and a large variety of machines based on fixed interconnection networks, such as grids, hypercubes, and shuffle-exchange networks (see Leighton [228]). Such variety makes it difficult to compare competing models. At the qualitative level, models can be distinguished by their processor granularity and their interconnection pattern. One important distinction among models is the granularity with which they treat parallel operations. A model can be fine-grained and treat bit operations as the basic unit of parallel computation, or it can be coarse-grained and, for example, treat local subcomputations on processors as the fundamental unit. In addition, the model can be structured, in which case the machine can only manipulate atomic data objects and cannot access their representations (as bits, for example). Another important qualitative difference among models is the nature of the communication between processing elements. Some models allow unrestricted communication between processing elements at any time. Other models require a fixed communication pattern. In some models there is no charge for the communication pathway between elements; in others there is.


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

Suppose that finding the solution to a problem is P-complete. It is natural to ask whether it is any easier to obtain an approximate solution. For decision problems this might mean considering the corresponding combinatorial optimization problem, that is, a problem in which we try to minimize or maximize a given quantity. As one might expect from the theory of NP-completeness, the answer is both yes (for example, in the case of Bin Packing, Problem A.4.7) and no (for example, in the case of the Lexicographically First Maximal Independent Set Size Problem; see Lemma 10.2.2). There are several motivations for developing good NC approximation algorithms. First, in all likelihood P-complete problems cannot be solved fast in parallel. Therefore, it may be useful to approximate them quickly in parallel. Second, problems that are P-complete but that can be approximated well seem to be special boundary cases. Perhaps by examining these types of problems more closely we can improve our understanding of parallelism. Third, it is important to build a theoretical foundation for studying and classifying additional approximation problems. Finally, it may be possible to speed up sequential approximation algorithms for NP-complete problems using fast parallel approximations.

Our goal in this section is to develop the basic theory of parallel approximation algorithms. We begin by showing that certain P-complete problems are not amenable to NC approximation algorithms. Later we present examples of P-complete problems that can be approximated well in parallel. We start by considering the Lexicographically First Maximal Independent Set Problem, introduced in Definition 7.1.1 and proven P-complete in Problem A.2.1. As defined, LFMIS is not directly amenable to approximation. We can, however, phrase the problem in terms of computing the size of the independent set.

Definition 10.2.1 Lexicographically First Maximal Independent Set Size (LFMISsize)
Given: An undirected graph G = (V, E) with an ordering on the vertices and an integer k.
Problem: Is the size of the lexicographically first maximal independent set of G less than or equal to k?

The following lemma shows that computing just the size of the lexicographically first maximal independent set is P-complete.


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

The subject of this book is best illustrated by the following scenario. Suppose that you are employed in the halls of industry. More than a decade ago your company entered the highly competitive "bandersnatch" market. While other companies thought that bandersnatch design was an intractable problem, and spent millions on supercomputers to search for possible designs, your company had the foresight to employ a few theoretical computer scientists. They discovered that there was a feasible algorithm for designing a bandersnatch directly from its specification. Your company can take an n-word specification of a bandersnatch and, in about n^3 steps, can test whether the specification is reasonable and design an optimal bandersnatch that meets it. With your algorithm, a typical 15,000-word bandersnatch specification takes about one month to design. Construction takes only a week, so design dominates the bandersnatch building process. Your competitors, on the other hand, do not have a fast algorithm for producing optimal bandersnatch designs. The best that they can do is an exhaustive search through the approximately 2^(n/150) possible different designs that meet the specification, looking for the best one. Since this exhaustive search for the typical specification size would take a while (say, 10^16 years, assuming one design per microsecond), your competitors must be content with the suboptimal designs that they can produce in about the same time as you. At least, that was the case until yesterday. Seeing the value of research, your competition formed a consortium and also invested in computer science. They too have discovered the feasible optimal bandersnatch design algorithm. Since design dominates the production process, your bosses decide that the way to regain your company's competitive advantage is to reduce the design time. They give you the task of dramatically speeding up the design process. Like most modern computer scientists, you begin working on the problem by reading the news on the Net. One day an ad catches your eye. Sensing its corporate salvation, your company orders one of the machines. When it arrives, you unpack it and discover that its architecture is very simple.


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

The previous chapters have laid out the history, foundations, and mechanics of the theory of P-completeness. We have shown that this theory plays roughly the same role in the parallel complexity domain as NP-completeness does in the sequential domain. Having devoted much effort to establishing the notion of feasible highly parallel algorithms and arguing that P-completeness captures the notions of inherently sequential problems and algorithms, it is now appropriate to temper our case a bit with some additional observations. For some problems, depending on the relevant input size, it may not be worth the effort to search for a feasible highly parallel algorithm, assuming, for example, that you already have a √n time parallel algorithm. The following table shows the relationship between square roots and logarithms for various input sizes. Of course, for small input sizes the constants on the running times also play a major role. Although it is extremely risky to predict hardware trends, it seems safe to say that massively parallel computers containing billions of processors are not "just around the corner," and although potentially feasible, machines with millions of processors are not soon to become commodity personal computers. Thus, highly parallel algorithms will not be feasible if the processor requirements for an input of size n are much greater than n^2, and probably more like n log n. Even if you have sufficient numbers of processors for the problems that interest you, your algorithm may succumb to the tyranny of asymptotics. For example, a parallel algorithm that uses √n time is probably preferable to one that uses (log n)^4 time, at least for values of n less than 10^13. As Table 11.1 illustrates, the only really practical polylogarithmic parallel time algorithms are O((log n)^2). Perhaps the limits of feasible highly parallel algorithms are those that run in (log n)^2 time and use O(n^2) processors. However, the search for an NC algorithm often leads to new insights into how a problem can be effectively parallelized. That is, a problem frequently is found to exhibit unexpected parallelism when the limits of its parallelism are pushed.
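
The point about asymptotics is easy to check numerically. The short Python script below is our own illustration (Table 11.1 itself is not reproduced in this excerpt); it shows that √n remains smaller than (log2 n)^4 until n grows to a little more than 10^13.

    import math

    # Compare the two parallel running times at several input sizes.
    for n in (10**3, 10**6, 10**9, 10**12, 10**13, 10**14):
        sqrt_time = math.sqrt(n)
        polylog_time = math.log2(n) ** 4
        winner = "sqrt(n)" if sqrt_time < polylog_time else "(log n)^4"
        print(f"n = {n:>15,}:  sqrt(n) = {sqrt_time:9.3g}   "
              f"(log2 n)^4 = {polylog_time:9.3g}   smaller: {winner}")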


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

Our focus up to this point has been primarily on problems, whether decision, search, or function problems. In this chapter we shift directions and apply P-completeness theory to the study of algorithms. The theory, when extended properly, will allow us to make statements about whether or not certain sequential algorithms will parallelize well. The phrase "inherently sequential algorithm" is one that appears frequently in the research literature. The general intent of the phrase is obvious. However, if pressed for details, one might come up with several different possible formal meanings. In this chapter we describe one approach that gives the phrase a precise interpretation. The work on P-complete algorithms began with Anderson [12] and was continued by Greenlaw [133, 135]. Much of the discussion contained in this chapter is taken from these references. That a problem is P-complete is evidence that it is unlikely to have a small space sequential solution or a fast parallel solution using a polynomial amount of hardware. Of course, being P-complete also means that the problem does have a polynomial time algorithm. For many P-complete decision problems, this algorithm appears explicitly or implicitly in the statement of the problem. For example, asking whether a vertex is in the lexicographically first maximal clique is essentially asking whether the vertex is in the maximal clique found by the obvious greedy algorithm, the same greedy algorithm that shows the problem is in P. This specification of a particular polynomial time algorithm, in addition to the nonalgorithmic properties desired of a solution, occurs in most of the search problems in Section A.3, many of the problems of a greedy or lexicographically first nature, and numerous graph problems in Section A.2. These P-completeness results say more about the difficulty of parallelizing the associated sequential algorithm than they do about the intrinsic difficulty of the decision problem. In many cases the particular sequential algorithm does not seem to adapt well to parallelism. Yet it may be the case that a modified version of the problem that avoids mentioning a sequential algorithm does have a highly parallel solution.


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

In this chapter we return to the Circuit Value Problem, introduced in Section 4.2. First, we give the formal proof of Theorem 4.2.2 that CVP is P-complete, which we only sketched previously. Then we show that a number of useful variants and restricted versions of CVP are also P-complete. Recall the definition of the Circuit Value Problem (Definition 4.2.1): given an encoding ᾱ of a Boolean circuit α, a designated output y, and values for the inputs x_1, ..., x_n, we ask whether output y of α is TRUE. To show that CVP is P-complete under NC^1 many-one reducibility (≤_m^{NC^1}) requires showing that CVP is in P, and that each language L in P is ≤_m^{NC^1} reducible to CVP. It is easy to see that, given the encoding ᾱ of a circuit and the values of its inputs, one can compute the value of each gate in a number of steps that is polynomial in the size of α. On a random access machine this can be done in linear time by considering the gates in topological order (which can also be computed in linear time; see Cormen, Leiserson, and Rivest [70], for example). On a deterministic Turing machine the process is a bit more clumsy but can still be done in polynomial time. Pippenger shows that even time O(n log n) suffices, where n is the length of the encoding of α [284]. Thus, we have the following lemma. Lemma 6.1.1 The Circuit Value Problem is in P. The more difficult step in proving that CVP is P-complete under ≤_m^{NC^1} reducibility is showing that there is a ≤_m^{NC^1} reduction from each language in P to CVP. Ladner proved this by simulating Turing machines with circuits. The idea is as follows. First, recall that for each language L in P, there is a 1-tape Turing machine M that on input x = x_1, ..., x_n halts in time t(n) = n^O(1) with output equal to 1 if and only if x is in L. Note that, for each n, the machine M uses at most t(n) space on its tape.
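
The following Python sketch illustrates the easy half of Lemma 6.1.1; the gate-list encoding and the names are our own illustrative choices, not the book's encoding ᾱ. With the gates listed in topological order, a single linear scan assigns a value to every gate.

    def evaluate_circuit(inputs, gates, output):
        """Evaluate a Boolean circuit whose gates are in topological order.

        inputs: dict mapping input names to True/False.
        gates:  list of (name, op, args) with op in {'AND', 'OR', 'NOT'}
                and args naming inputs or earlier gates.
        output: name of the designated output gate.
        """
        value = dict(inputs)
        for name, op, args in gates:
            if op == 'AND':
                value[name] = all(value[a] for a in args)
            elif op == 'OR':
                value[name] = any(value[a] for a in args)
            elif op == 'NOT':
                value[name] = not value[args[0]]
            else:
                raise ValueError(f"unknown gate type {op!r}")
        return value[output]

    # Example: y = (x1 AND x2) OR (NOT x3) evaluates to True below.
    gates = [('g1', 'AND', ('x1', 'x2')),
             ('g2', 'NOT', ('x3',)),
             ('y',  'OR',  ('g1', 'g2'))]
    print(evaluate_circuit({'x1': True, 'x2': False, 'x3': False}, gates, 'y'))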


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

Why should we believe that NC does not equal P? One form of evidence is that many people have tried, but failed, to show them equal. More persuasive, perhaps, is the way they have failed, or rather, the character of the limited successes. Specifically, known approaches consistently leave a large gap between what we know how to solve by highly parallel algorithms and general problems in P. In outline, the state of the art is as follows.

- General simulations are not fast: The best known parallel simulations of general sequential models give very modest improvements, basically reducing sequential time T to parallel time T/log T or √T, depending on the parallel model. Furthermore, 2^(T^Ω(1)) processors are needed to achieve even these modest improvements.
- Fast simulations are not general: Rapid simulations of sequential models by highly parallel models are known only for rather weak sequential models.
- Natural approaches provably fail: Certain natural approaches to highly parallel simulation are provably insufficient. Equivalently, in certain natural structured models of computation (Borodin [41]), one can prove that the analogs of NC and P are not equal, and indeed are separated by a nearly exponential gap, as suggested by the two points above.

In this chapter we will present this evidence in more detail. The nature of the available evidence renders this chapter, especially Section 5.4, somewhat more technical than the rest of Part I. The reader may wish to skim or skip it, at least on first reading. First, consider the Generic Machine Simulation Problem introduced in Section 4.1. Intuitively, why should we expect this problem to be hard to parallelize? Notice that we defined the problem in terms of Turing machines as a technical convenience; they are not in any way fundamental to the result.


Author(s): Raymond Greenlaw, H. James Hoover, Walter L. Ruzzo

The usual conventions of algorithm analysis express the complexity of finding a solution in terms of the length of the problem input. This generally makes the complexity sensitive to the encoding scheme used to describe problem instances. As an extreme example, the complexity of factoring a positive integer encoded in binary is quite different from that of factoring a number encoded in unary, or encoded as a product of prime factors. Realistic complexity analysis assumes that the encoding conventions for a problem are reasonable, that is, they do not make it trivial to do something for which there is evidence of difficulty (see Section 3.2). This sensitivity to encodings is particularly significant for number problems. In such problems, the numerical values of the parameters of a problem instance can be much larger than the size of the problem description. For example, a description of size O(n^3) can represent a network of n vertices and n^2 edges, with edge capacities of size O(2^n). Thus, the flows in the network described by a problem instance I can be exponentially larger than the size of I. The following definition captures this concept.

Definition 9.1.1 For any instance I of a problem, let Max(I) denote the maximum magnitude of all numbers in I. For any encoding scheme for the problem, let |I| denote the length of the encoding of the instance. A problem is a number problem if and only if there exists an encoding scheme for the problem such that there is no polynomial p with Max(I) ≤ p(|I|) for every instance I.

A typical characteristic of a number problem is that when binary encodings of parameters in an instance I are replaced with unary encodings, the size of the encoding increases exponentially. Often, the unary encoded version is large enough to permit the problem to be solved efficiently in terms of the input length.
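
A small calculation makes the encoding sensitivity concrete; the snippet below is our own illustration, not part of the text. Writing a single integer capacity c takes about log2(c) symbols in binary but c symbols in unary, so replacing binary capacities by unary ones blows the instance up exponentially.

    # Encoding length of one capacity c under the two schemes.
    def binary_length(c: int) -> int:
        return max(1, c.bit_length())   # roughly log2(c) symbols

    def unary_length(c: int) -> int:
        return c                        # c symbols

    for c in (10, 1000, 2**20, 2**50):
        print(f"c = {c:>18}: binary {binary_length(c):>3} symbols, "
              f"unary {unary_length(c):>18} symbols")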

