Suffix arrays – a contest approach


  bobocel  step 1:  0405123



Yüklə 247,7 Kb.
Pdf görüntüsü
səhifə3/13
tarix20.06.2022
ölçüsü247,7 Kb.
#89807
1   2   3   4   5   6   7   8   9   ...   13
suffix-array

0404123 
bobocel 
step 1: 
0405123 
bobocel 
4


obocel$ 
step 2: 
0516234 
bobocel 
obocel$ 
bocel$$ 
ocel$$$ 
step 3: 
0516234 
bobocel 
obocel$ 
bocel$$ 
ocel$$$ 
cel$$$$ 
el$$$$$ 
l$$$$$$ 
$$$$$$$ 
Below is a pseudo-code showing the main steps that must be followed: 


length(
A

for i 

0

n – 1
P
(0, i)

position of 
A

in the ordered array of 
A
‘s characters
cnt 


for 
k

1

[log
2
n]
(ceil) 
for 
i

0

n – 1
L
(i)

(
P
(k – 1, i)

P
(k – 1, i + cnt)

i

sort 

compute 
P
(k, i)
 , i = 0, n - 1 
cnt

2

cnt 
 
5


To be noticed that a certain way of numbering the substrings is not necessarily needed, while a 
valid order relation among them is kept. In order to reach the O(n lg n) complexity, radix sort is 
recommended (two times count sort), getting an O(n) time complexity per sort operation. To make the 
implementation easier one may use the 
sort() 
function from STL(Standard Template Library, a library 
containing data structures and algorithms in C++). The complexity may be raised to O(n lg
2
n) worst case, 
but the implementation will become much easier, the differences being scarcely noticeable for strings with 
lengths smaller than 100 000. 
Here you can see an extremely short implementation for suffix arrays in O(n lg
2
n). 
#
include  
#include  
#include  
using namespace std
#define MAXN 65536 
#define MAXLG 17 
char A[MAXN]; 
struct entry { 
int nr[2], p; 
} L[MAXN]; 
int P[MAXLG][MAXN], N, i, stp, cnt; 
int cmp(struct entry a, struct entry b) 

return a.nr[0] == b.nr[0] ? (a.nr[1] < b.nr[1] ? 1 : 0) : (a.nr[0] < b.nr[0] ? 1 : 0); 

int main(void) 

gets(A); 
for (N = strlen(A), i = 0; i < N; i ++) 
P[0][i] = A[i] - 'a'; 
for (stp = 1, cnt = 1; cnt >> 1 < N; stp ++, cnt <<= 1) 

for (i = 0; i < N; i ++) 

L[i].nr[0] = P[stp - 1][i]; 
L[i].nr[1] = i + cnt < N ? P[stp - 1][i + cnt] : -1; 
L[i].p = i; 

sort(L, L + N, cmp); 
for (i = 0; i < N; i ++) 
P[stp][L[i].p] = i > 0 && L[i].nr[0] == L[i - 1].nr[0] && L[i].nr[1] == L[i - 1].nr[1] ? 
P[stp][L[i - 1].p] : i; 

return 0; 

The suffix array will be found on the last row of matrix P. Searching the k
th
suffix is now 
immediate, so we won’t return to this aspect. The quantity of memory used may be reduced, using only 
the last two lines of the matrix P. It is a trade-off, as in this case the structure will not be able any more to 
execute efficiently the following operation.

Yüklə 247,7 Kb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   ...   13




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə