Suffix arrays – a contest approach



Yüklə 247,7 Kb.
Pdf görüntüsü
səhifə1/13
tarix20.06.2022
ölçüsü247,7 Kb.
#89807
  1   2   3   4   5   6   7   8   9   ...   13
suffix-array



 
 
 
 
 
 
 
(appeared in GInfo 15/7, November 2005) 
Suffix arrays – a programming contest approach 
 
 
 
Adrian Vladu and Cosmin Negru
ş
eri 
Summary 
An important algorithmic area often used in practical tasks is that of algorithms on character 
strings. Many programming contests have used problems that could have been easily solved if one 
efficiently managed to determine if one string was a subsequence of another string, or had found an order 
relation within a string’s suffixes. We shall present a versatile structure that allows this along with other 
useful operations on a given string. 
Keywords
: suffix sorting, suffix arrays, suffix trees 
1


1 Introduction 
What are suffix arrays? 
 
In order to let the reader gain a better vista on suffix arrays, we shall make a short presentation of 
two data structures called 
trie
, respectively suffix tree [1] – which is a special case of a trie. A trie is a tree 
meant to store strings. Each of its nodes will have a number of sons equal to the size of the alphabet used 
by the strings that are needed to be stored. In our case, with strings containing only small letters of the 
English alphabet, each node will have at most 26 sons. Every edge going from the father toward its sons is 
labeled with a different letter of the alphabet. The labels on a path starting from the root and ending in a 
leaf will form a string stored in that tree. As it can be easily seen, finding whether a string is contained in 
this data structure is very efficient and is done in O(M) time, where M is the string’s length. Therefore, the 
searching time does not depend on the number of words stored in the structure, this making it an ideal 
structure for implementing dictionaries. 
Let’s see what a suffix trie is: 
Given a string A = a
0
a
1
…a
n – 1, 
denote by A
i
= a
i
a
i + 1
…a
n – 1 
the suffix of A that begins at position i. Let n = 
length of A. The suffix trie is made by compressing all the suffixes of A
1
…A
n – 1
into a trie, as in the 
figure below.
The suffix trie corresponding to the string “abac” is: 
2


Operations on this structure are very easily done: 
-
checking whether a string W is a substring of A – it is enough to traverse the nodes starting from 
the root and going through the edges labeled correspondingly to the characters in W (complexity 
O(|W|)) 
-
searching the longest common prefix for two suffixes of A – choose nodes u and v in the trie, 
corresponding to the ends of the two suffixes, then, with a LCA algorithm (least common 
ancestor), find the node corresponding to the end of the searched prefix. For example, for “abac” 
and “ac”, the corresponding nodes are 5 and 6. Their least common ancestor is 2, that gives the 
prefix “a”. The authors are strongly recommending [2] for an O(

n) solution, [3] for an accessible 
presentation of a solution in O(lg n) or O(1), and [4] for a state of the art algorithm. 
-
finding the k-th suffix in lexicographic order - (complexity O(n), with a corresponding 
preprocessing). For example, the 3
rd
suffix of “abac” is represented in the trie by the 3
rd
leaf. 
Even if the idea of a suffix trie would be very pleasing at first sight, the simplist implementation, 
where at every step one of the strings suffixes is inserted into the structure leads to an O(n
2
) complexity 
algorithm.There is a structure called suffix tree[1] that can be built in linear time, which is a suffix trie 
where the chains containing only nodes with the out-degree equal to 1 were compressed into a single edge 
(in the example above, these are represented by the chains 2 -3 – 4 – 5 and 1 – 7 – 8 – 9). Implementing 
the linear algorithm is scarcely possible in a short time, such as during a contest, this determining us to 
search another structure, easier to implement. 
Let’s see which are the suffixes of A, by a depth first traversal of the trie. Noting that during the depth 
first search we have to consider the nodes in the ascending lexicographic order of the edges linking them 
to their father, we gain the following suffix array: 


Yüklə 247,7 Kb.

Dostları ilə paylaş:
  1   2   3   4   5   6   7   8   9   ...   13




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə