Suffix arrays – a contest approach



Yüklə 247,7 Kb.
Pdf görüntüsü
səhifə10/13
tarix20.06.2022
ölçüsü247,7 Kb.
#89807
1   ...   5   6   7   8   9   10   11   12   13
suffix-array

Solution2 (Mircea Pa
ş
oi): 
For the string S, compute for every i from 1 to n the length of the longest prefix of S with S[i..n]. 
This can be done using suffix arrays. 
For example, if S is the string and T is the array keeping the maximal matchings, then: 
S = a b b a a b b a a 
T = 9 0 0 1 5 0 0 1 1 
For every possible k in the pattern (1 <= k <= n) check whether the maximum distance d between the 
indexes of the two farthest elements with values greater than equal to k in the string T is greater than k. 
For example: 
k=9: 9 - - - - - - - - => d=9, good. 
k=8: 9 - - - - - - - - => d=9, not good. 
k=7: 9 - - - - - - - - => d=9, not good. 
k=6: 9 - - - - - - - - => d=9, not good
k=5: 9 - - - 5 - - - - => d=5, good. 
k=4: 9 - - - 5 - - - - => d=5, not good. 
k=3: 9 - - - 5 - - - - => d=5, not good. 
k=2: 9 - - - 5 - - - - => d=5, not good.. 
k=1: 9 - - 1 5 - - 1 1 => d=3, not good. 
13


The smallest k where the distance d is small enough represents the length of the searched pattern 
(in this case k = 5).
To get an algorithm of good complexity, this step must be done efficiently. We may use a 
segment tree, walk with k from 1 to n and delete from the tree the elements smaller than k, while updating 
the tree so that it will answer queries like “what is the greatest distance between two consecutive elements 
in the structure. The algorithm has a complexity of O(n log n). A detailed presentation of the segment tree 
can be found in [9] and [10]. 
Problem 9:
(Baltic Olympiad of Informatics 2004)
 
A string 
s
is called an 
(K,L)-repeat
if S is obtained by concatenating 
K

1 times some seed string 
T
with length
 L

1. For example, the string
S = abaabaabaaba 
is a (4,3)-repeat with
T = aba 
as its seed string. That is, the seed string 
T
is 3 characters long, and the whole string 
S
is obtained by 
repeating 
T
4 times.
Write a program for the following task: Your program is given a long string 
U
consisting of 
characters ‘a’ and/or ‘b’ as input. Your program must find some 
(K,L)
-repeat that occurs as substring 
within 
U
with 
K
as large as possible. For example, the input string 
U = babbabaabaabaabab 
contains the underlined (4,3)-repeat S starting at position 5. Since 
U
contains no other contiguous 
substring with more than 4 repeats, your program must output this underlined substring.
Solution: 
We want to find out for a fixed L how to get the greatest K so that in the string U there will be a 
substring S which is a (K, L) - repeat. Check this example: U = bab
aab
aabaabaaab L = 3 and a fixed 
substring X = aab that begins on position 4 of U. We can try to extend the sequence aab by repeating it 
from its both ends as much as possible, as can be seen below: 


Yüklə 247,7 Kb.

Dostları ilə paylaş:
1   ...   5   6   7   8   9   10   11   12   13




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə