Suffix arrays – a contest approach



Yüklə 247,7 Kb.
Pdf görüntüsü
səhifə7/13
tarix20.06.2022
ölçüsü247,7 Kb.
#89807
1   2   3   4   5   6   7   8   9   10   ...   13
suffix-array

Solution: 
This is actually asking to compute the number of nodes (without root) of a string’s corresponding 
suffix trie. Each distinct sequence of the string is determined by the unique path traversed in the suffix trie 
when searching it. As in the example above, „abac” has the substrings „a”, „ab”, „aba”, „abac”, „ac”, „b”, 
„ba”, „bac” and „c”, determined by the path starting from the root and going toward nodes 2, 3, 4, 5, 6, 7, 
8 and 9 in this order. Since building the suffix trie is not always a pleasant job and has a quadratic 
complexity, an approach using suffix arrays would be much more useful. Get the sorted array of suffixes 
in O(n lg n), then search the first position where the matching between every pair of consecutive suffixes 
fails (using the lcp function), and add the number of remaining characters to the solution. 
Problem 5: 
seti (ONI 2002 – abridged) 
Given a string of length N (1

N

131072) and M strings of length at most 64, count the number of 
matchings of every small string in the big one. 
Solution: 
Do as in the classical suffix arrays algorithm, only that it is sufficient to stop after step 6, where an 
order relation between all the strings of length 2
6
= 64 was established. Having sorted the substrings of 
length 64, each query is solved by two binary searches. The overall complexity is O(N lg 64 + M * 64 * lg 
N) = O(N + M lg N). 
9


Problem 6:
common subsequence (Polish Olympiad 2001 and Top Coder 2004 - abridged) 
There are given three strings S

, S
2
ş
i S
3
, of lengths m, n and p ( 1 <= m, n, p <= 10000). Determine their 
longest common substring. For example, if S

= abababca S

= aababc and S
3
= aaababca, their longest 
common substring would be ababc. 
Solution: 
If the strings were smaller in length, the problem could have been easily solved using dinamic 
programming, leading to a O(n
2
) complexity. 
Another idea is to take each suffix of S

and try finding its maximum matching in the other two strings. A 
naive maximum matching algorithm gives an O(n^2) complexity, but using KMP [8], we can achieve this 
in O(n), and using this method for each of S
1
’s suffixes we would gain an O(n^2) solution. 
Let’s see what happens if we sort the suffixes of the three strings: 
a$ 
abababca$ 
ababca$ 
abca$ 
bababca$ 
babca$ 
bca$ 
ca$ 
aababc# 
ababc# 
abc# 
babc# 
bc# 
c# 
a@ 
aaababca@ 
aababca@ 
ababca@ 
abca@ 
babca@ 
bca@ 
ca@ 
Now we merge them (consider $ < # < @ < a …): 
a$ 
a@ 
aaababca@ 
10


aababc# 
aababca@ 
abababca$ 
ababc# 
ababca$ 
ababca@ 
abc# 
abca$ 
abca@ 
bababca$ 
babc# 
babca$ 
babca@ 
bc# 
bca$ 
bca@ 
c# 
ca$ 
ca@ 
The maximal common substring corresponds to the longest common prefix of the three suffixes ababca$, 
ababc# and ababca@. Let’s see where they appear in the sorted array: 
a$ 
a@ 
aaababca@ 
aababc# 
aababca@ 
abababca$ 

Yüklə 247,7 Kb.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   10   ...   13




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə