property that the repetition contains X as a substring the (in the case where the repetition is (1, L) this is
not true, but it’s a trivial case) .
Now we see that in order to identify within U all the (K, L) repetitions with a fixed L, it is
sufficient to partition the string in n/L chunks and then extend them. It will not be possible to do this in
O(1) for every chunk, thus the final algorithm will have a final complexity of O(n/1 + n/2 + n/3 + .. + n/n)
(every chunk can be repeated partially or
totally only at left or right, and we will not extend every chunk
separately, but we will merge the adjacent chunks into a single one; if we
had p consecutive chunks of
same length, their maximum extensions would be found in O(p)). But we know that the sum 1 + ½ + 1/3 +
¼ + … + 1/n – ln n is convergent toward a known constant c, known as Euler’s constant, and c < 1, so we
can easily figure out that O(1 + ½ + … + 1/n) = O(ln n) so the algorithm would have an O(n lg n)
complexity if the extensions would have been computed easily.
Now we can use the suffix trees. To find out how much the sequence U[i..j]
can be extended to the
right, we need to find the longest common prefix of U[i..j] and U[j + 1..n]. To extend it to the left, it’s
sufficient to reverse U, that would lead to the same problem. We have seen
how to compute the longest
common prefix in O(1) using the suffix array, that is built in O(n lg n) time, then do the required RMQ
pre-calculation in O(n lg n) that allows the lcp queries to be answered in O(1). The final complexity is O(n
lg n).
Dostları ilə paylaş: