Suffix arrays – a contest approach

Yüklə 247,7 Kb.

Pdf görüntüsü

səhifə	7/13
tarix	20.06.2022
ölçüsü	247,7 Kb.
	#89807

1 2 3 4 5 6 7 8 9 10 ... 13

suffix-array

Solution:
This is actually asking to compute the number of nodes (without root) of a string’s corresponding
suffix trie. Each distinct sequence of the string is determined by the unique path traversed in the suffix trie
when searching it. As in the example above, „abac” has the substrings „a”, „ab”, „aba”, „abac”, „ac”, „b”,
„ba”, „bac” and „c”, determined by the path starting from the root and going toward nodes 2, 3, 4, 5, 6, 7,
8 and 9 in this order. Since building the suffix trie is not always a pleasant job and has a quadratic
complexity, an approach using suffix arrays would be much more useful. Get the sorted array of suffixes
in O(n lg n), then search the first position where the matching between every pair of consecutive suffixes
fails (using the lcp function), and add the number of remaining characters to the solution.
Problem 5:
seti (ONI 2002 – abridged)
Given a string of length N (1
≤
N
≤
131072) and M strings of length at most 64, count the number of
matchings of every small string in the big one.
Solution:
Do as in the classical suffix arrays algorithm, only that it is sufficient to stop after step 6, where an
order relation between all the strings of length 2
6
= 64 was established. Having sorted the substrings of
length 64, each query is solved by two binary searches. The overall complexity is O(N lg 64 + M * 64 * lg
N) = O(N + M lg N).
9

Problem 6:
common subsequence (Polish Olympiad 2001 and Top Coder 2004 - abridged)
There are given three strings S
1
, S
2
ş
i S
3
, of lengths m, n and p ( 1 <= m, n, p <= 10000). Determine their
longest common substring. For example, if S
1
= abababca S
2
= aababc and S
3
= aaababca, their longest
common substring would be ababc.
Solution:
If the strings were smaller in length, the problem could have been easily solved using dinamic
programming, leading to a O(n
2
) complexity.
Another idea is to take each suffix of S
1
and try finding its maximum matching in the other two strings. A
naive maximum matching algorithm gives an O(n^2) complexity, but using KMP [8], we can achieve this
in O(n), and using this method for each of S
1
’s suffixes we would gain an O(n^2) solution.
Let’s see what happens if we sort the suffixes of the three strings:
a$
abababca$
ababca$
abca$
bababca$
babca$
bca$
ca$
aababc#
ababc#
abc#
babc#
bc#
c#
a@
aaababca@
aababca@
ababca@
abca@
babca@
bca@
ca@
Now we merge them (consider $ < # < @ < a …):
a$
a@
aaababca@
10

aababc#
aababca@
abababca$
ababc#
ababca$
ababca@
abc#
abca$
abca@
bababca$
babc#
babca$
babca@
bc#
bca$
bca@
c#
ca$
ca@
The maximal common substring corresponds to the longest common prefix of the three suffixes ababca$,
ababc# and ababca@. Let’s see where they appear in the sorted array:
a$
a@
aaababca@
aababc#
aababca@
abababca$

Yüklə 247,7 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8 9 10 ... 13