obocel$
step 2:
0516234
bobocel
obocel$
bocel$$
ocel$$$
step 3:
0516234
bobocel
obocel$
bocel$$
ocel$$$
cel$$$$
el$$$$$
l$$$$$$
$$$$$$$
Below is a pseudo-code showing the main steps that must be followed:
n
←
length(
A
)
for i
←
0
,
n – 1
P
(0, i)
←
position of
A
i
in
the ordered array of
A
‘s characters
cnt
←
1
for
k
←
1
,
[log
2
n]
(ceil)
for
i
←
0
,
n – 1
L
(i)
←
(
P
(k – 1, i)
,
P
(k – 1, i + cnt)
,
i
)
sort
L
compute
P
(k, i)
, i = 0, n - 1
cnt
←
2
*
cnt
5
To be noticed that a certain way of numbering the substrings
is not necessarily needed, while a
valid order relation among them is kept. In order to reach the O(n lg n)
complexity, radix sort is
recommended (two times count sort), getting an O(n) time complexity per sort operation.
To make the
implementation easier one may use the
sort()
function from STL(Standard Template Library,
a library
containing data structures and algorithms in C++). The complexity may be raised to O(n lg
2
n) worst case,
but the implementation
will become much easier, the differences being scarcely noticeable for strings with
lengths smaller than 100 000.
Here you can see an extremely short implementation for suffix arrays in O(n lg
2
n).
#
include
#include
#include
using namespace std;
#define MAXN 65536
#define MAXLG 17
char A[MAXN];
struct entry {
int nr[2], p;
} L[MAXN];
int P[MAXLG][MAXN], N, i, stp, cnt;
int cmp(struct entry a, struct entry b)
{
return a.nr[0] == b.nr[0] ? (a.nr[1] < b.nr[1] ? 1 : 0) : (a.nr[0] < b.nr[0] ? 1 : 0);
}
int main(void)
{
gets(A);
for (N = strlen(A), i = 0; i < N; i ++)
P[0][i] = A[i] - 'a';
for (stp = 1, cnt = 1; cnt >> 1 < N; stp ++, cnt <<= 1)
{
for (i = 0; i < N; i ++)
{
L[i].nr[0] = P[stp - 1][i];
L[i].nr[1] = i + cnt < N ? P[stp - 1][i + cnt] : -1;
L[i].p = i;
}
sort(L, L + N, cmp);
for (i = 0; i < N; i ++)
P[stp][L[i].p] = i > 0 && L[i].nr[0] == L[i - 1].nr[0] && L[i].nr[1] == L[i - 1].nr[1] ?
P[stp][L[i - 1].p] : i;
}
return 0;
}
The suffix array will be found on the last row of matrix P. Searching the k
th
suffix is now
immediate, so we won’t return to this aspect. The quantity of memory used may be reduced, using only
the last two lines of the matrix P. It is a trade-off, as in this case the structure will not be able any more to
execute efficiently the following operation.
Dostları ilə paylaş: