2.2. Computing the longest common prefix (LCP)
Given
two suffixes of a string A, compute their longest common prefix. We have shown before
that with a suffix tree this can be achieved in O(1), with a corresponding pre-calculation. Let’s see if a
suffix array can reach the same performance.
Let two suffixes A
i
si A
j
.
Using matrix P, one can iterate descending from the biggest k down to0
and check whether A
i
k
= A
j
k
. If the two prefixes are equal, a common prefix of length 2
k
had been found.
We only
have left to update i and j, increasing them both by 2
k
and check again if there are any more
common prefixes.
The LCP computing function’s code is extremely simple:
int lcp(int x, int y)
{
int k, ret = 0;
if (x == y) return N - x;
for (k = stp - 1; k >= 0 && x < N && y < N; k --)
if (P[k][x] == P[k][y])
x += 1 << k, y += 1 << k, ret += 1 << k;
return ret;
}
The complexity is O(lg n) for computing one of these prefixes. Reducing this query to an O(1)
complexity, is based on the following observation: lcp(x, y) = minimum { lcp(x, x + 1), lcp(x + 1, x + 2),
… lcp(y – 1, y) }. The proof is immediate, if we look at the corresponding suffix tree. Therefore it is
enough to compute the longest common prefix for all the consecutive pairs of suffixes (O(n lg n) time)
and introduce an additional structure that allows minimum range queries in O(1).
The most efficient
structure is that of RMQ(range minimum query), that we won’t discuss in here, being studied in detail in
[3], [4] and [5]. With another O(n lg n) preprocessing
required by the new structure, we can now answer
to the lcp queries in O(1). The structure needed by RMQ is also using O(n lg n) memory, thus the final
time and memory are O(n lg n).
2.3. Searching
Since the suffix array offers the order of A’s suffixes, searching a string W into A is easily done with a
binary search. Since comparing is done in O(|W|), the search will have an O(|W| lg n) worst case
complexity. Paper [6] offers both the data structure and the searching algorithm some refinements that
allow reducing the time to O(|W| + lg n), but we do not find this very useful during
a programming contest
Dostları ilə paylaş: