Problem 6:
common subsequence (Polish Olympiad 2001 and Top Coder 2004 - abridged)
There
are given three strings S
1
, S
2
ş
i S
3
, of lengths m, n and p ( 1 <= m, n, p <= 10000). Determine their
longest common substring. For example, if S
1
= abababca S
2
= aababc and S
3
= aaababca,
their longest
common substring would be ababc.
Solution:
If the strings were smaller in length, the problem could have been easily solved using dinamic
programming, leading to a O(n
2
) complexity.
Another idea
is to take each suffix of S
1
and try finding its maximum matching in the other two strings. A
naive maximum matching algorithm gives an O(n^2) complexity, but using KMP [8], we can achieve this
in O(n), and using
this method for each of S
1
’s suffixes we would gain an O(n^2) solution.
Let’s see what happens if we sort the suffixes of the three strings:
a$
abababca$
ababca$
abca$
bababca$
babca$
bca$
ca$
aababc#
ababc#
abc#
babc#
bc#
c#
a@
aaababca@
aababca@
ababca@
abca@
babca@
bca@
ca@
Now we merge them (consider $ < # < @ < a …):
a$
a@
aaababca@
10
aababc#
aababca@
abababca$
ababc#
ababca$
ababca@
abc#
abca$
abca@
bababca$
babc#
babca$
babca@
bc#
bca$
bca@
c#
ca$
ca@
The maximal common substring corresponds to the longest common prefix of the three suffixes ababca$,
ababc# and ababca@. Let’s see where they appear in the sorted array:
a$
a@
aaababca@
aababc#
aababca@
abababca$
Dostları ilə paylaş: