On-line String Searching
In the previous two chapters, we examined various serial and parallel methods that perform exact string searching in a number of operations proportional to the total length of the input. Even though such performance is optimal, our treatment of exact searches cannot be considered complete yet: in many applications, searches for different patterns, not known in advance, are performed on the same text or group of texts. It seems natural to ask whether these cases can be handled better than by plain repetition of the procedures studied so far.
As an analogy, consider the classical problem of searching for a given item in a table with n entries. If the table is unsorted, n comparisons are both necessary and sufficient for this task in the worst case. If we wanted to perform k such searches, however, it is no longer clear that we need kn comparisons. The table can be sorted once and for all at a cost of O(n log n) comparisons, after which each search can be carried out by binary search in O(log n) comparisons. For sufficiently large k, this approach outperforms k independent searches.
In this chapter, we shall see that the philosophy underlying binary search can be fruitfully applied to string searching. Specifically, the text can be pre-processed once and for all in such a way that any query asking whether or not a pattern occurs in the text can be answered in time proportional to the length of the pattern. It will also be possible to locate all the occurrences of the pattern in the text at an additional cost proportional to the total number of such occurrences. We call this type of search on-line, to refer to the fact that as soon as we finish reading the pattern we can decide whether or not it occurs in our text. As it turns out, the auxiliary structures used to achieve this goal are well suited to a host of other applications. There are several essentially equivalent digital structures supporting efficient on-line string searching. Here, we base our discussion on a variant known as the suffix tree.
It is instructive to first discuss a simplified version of the suffix tree, which we call the expanded suffix tree.
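To make the idea of pre-processing the text once and then answering each query in time proportional to the pattern length concrete, here is a minimal Python sketch. It assumes an uncompacted trie holding every suffix of the text, which may differ in detail from the definition developed in this chapter. The names `Node`, `build_suffix_trie`, and `find` are hypothetical. The construction is deliberately naive, taking quadratic time and space, and it stores start positions at every node only to keep the reporting step trivial.

```python
from typing import Dict, List


class Node:
    """One trie node; `starts` lists the text positions whose suffix passes through it."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.starts: List[int] = []


def build_suffix_trie(text: str) -> Node:
    """Pre-process `text` once by inserting each suffix, one character at a time."""
    root = Node()
    for i in range(len(text)):
        node = root
        node.starts.append(i)  # the empty string occurs at every position
        for ch in text[i:]:
            child = node.children.get(ch)
            if child is None:
                child = Node()
                node.children[ch] = child
            node = child
            node.starts.append(i)  # the suffix starting at i passes through this node
    return root


def find(root: Node, pattern: str) -> List[int]:
    """Return all start positions of `pattern`, following one edge per pattern character."""
    node = root
    for ch in pattern:
        child = node.children.get(ch)
        if child is None:
            return []  # the pattern falls off the trie, so it does not occur
        node = child
    return node.starts


trie = build_suffix_trie("abracadabra")
print(find(trie, "abra"))  # start positions of "abra" in the text
print(find(trie, "xyz"))   # empty list: no occurrence
```

Once the trie is built, `find` never looks at the text again: the decision is available as soon as the last pattern character has been read, which is the on-line behaviour described above. The quadratic cost of this naive structure is the price of the simplification.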