Suffix arrays – a contest approach



Yüklə 247,7 Kb.
Pdf görüntüsü
səhifə4/13
tarix20.06.2022
ölçüsü247,7 Kb.
#89807
1   2   3   4   5   6   7   8   9   ...   13
suffix-array

 
 
 
 
6


2.2. Computing the longest common prefix (LCP) 
Given two suffixes of a string A, compute their longest common prefix. We have shown before 
that with a suffix tree this can be achieved in O(1), with a corresponding pre-calculation. Let’s see if a 
suffix array can reach the same performance. 
Let two suffixes A
i
si A
j
. Using matrix P, one can iterate descending from the biggest k down to0 
and check whether A
i
k
= A
j
k
. If the two prefixes are equal, a common prefix of length 2
k
had been found. 
We only have left to update i and j, increasing them both by 2
k
and check again if there are any more 
common prefixes. 
The LCP computing function’s code is extremely simple: 
int lcp(int x, int y) 

int k, ret = 0; 
if (x == y) return N - x; 
for (k = stp - 1; k >= 0 && x < N && y < N; k --) 
if (P[k][x] == P[k][y]) 
x += 1 << k, y += 1 << k, ret += 1 << k; 
return ret; 

The complexity is O(lg n) for computing one of these prefixes. Reducing this query to an O(1) 
complexity, is based on the following observation: lcp(x, y) = minimum { lcp(x, x + 1), lcp(x + 1, x + 2), 
… lcp(y – 1, y) }. The proof is immediate, if we look at the corresponding suffix tree. Therefore it is 
enough to compute the longest common prefix for all the consecutive pairs of suffixes (O(n lg n) time) 
and introduce an additional structure that allows minimum range queries in O(1). The most efficient 
structure is that of RMQ(range minimum query), that we won’t discuss in here, being studied in detail in 
[3], [4] and [5]. With another O(n lg n) preprocessing required by the new structure, we can now answer 
to the lcp queries in O(1). The structure needed by RMQ is also using O(n lg n) memory, thus the final 
time and memory are O(n lg n). 
2.3. Searching 
Since the suffix array offers the order of A’s suffixes, searching a string W into A is easily done with a 
binary search. Since comparing is done in O(|W|), the search will have an O(|W| lg n) worst case 
complexity. Paper [6] offers both the data structure and the searching algorithm some refinements that 
allow reducing the time to O(|W| + lg n), but we do not find this very useful during a programming contest 

Yüklə 247,7 Kb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   ...   13




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə