Table 3.1. Sample of compression ratio data.

File Type    RAR, no password   RAR5, no password   RAR, password   RAR5, password
Text         .247               .247                .247            .248
Executable   .356               .356                .356            .356
Other        .490               .491                .490            .491

Hypothesis 2b. Given an encrypted archive, the compression ratio of the archive
and the compression ratios of the contained files are correlated.
Suppose that an adversary wants to discover whether a particular file is present
in an encrypted compressed archive. He chooses a string, S, that he knows to occur
frequently within the file. If Hypothesis 2a holds true, frequent appearances of
a string S from a file imply that the file is likely contained within the archive. If
Hypothesis 2b holds true, the correlation between the desired file and the archive
can suggest whether the file is present.
Kelsey presents a partial known input attack as follows [12]:
1. Given a string, S, and a known part of a set of messages, the attacker looks for
appearances of substrings of S in the known part of the message.
2. The appearance of substrings of S is correlated with the compressed length of
the message.
3. The attacker determines whether S appears frequently in the message.
The file FP.log is the file to be detected for this experiment. It contains many
repetitive strings, which makes it ideal to use for detection. From this file, the
following string is chosen:
compatible; MSIE 5.0; Windows 98
This string appears 9288 times throughout FP.log. This is an extremely high rate
of occurrence for a string of this length. Examination of other text files in the
compression corpora shows that repetitions of strings of length greater than five are rare.
Table 3.2 provides the greatest number of repetitions for strings of various lengths for
typical text files in the corpora.
Table 3.2. Number of repetitions of text strings of indicated length.

File       8-word   7-word   6-word   5-word
alice29    4        4        6        19
asyoulik   6        7        17       22
fields     4        4        4        4
grammar    2        2        2        4
lcet       5        7        10       10
plarbn     2        3        3        4
xargs      0        2        2        4

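A count of this kind can be approximated with a short script. The following is a minimal sketch, not the procedure used to produce Table 3.2: the function name max_repetitions, the whitespace tokenization, and the convention of reporting 0 when no n-word string occurs more than once are all assumptions made for illustration.

```python
from collections import Counter


def max_repetitions(text: str, n: int) -> int:
    """Return the highest occurrence count of any n-word string in text.

    Assumptions (not from the source): words are split on whitespace,
    and 0 is reported when no n-word string occurs more than once.
    """
    words = text.split()
    # Slide a window of n words over the text and count each distinct window.
    windows = (tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    counts = Counter(windows)
    best = max(counts.values(), default=0)
    return best if best > 1 else 0
```

Calling max_repetitions on the contents of a text file with n set to 5, 6, 7, and 8 would give one row in the style of Table 3.2 under these assumptions.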
Nine other text files of varying sizes and contents were selected from the collection.
The ten files were then used to construct 120 encrypted archives each containing three
files. In each archive, one file is assumed to be known. The appearances of substrings
of S are then counted for each known file. The number of substring appearances is
then compared to the compressed archive length using linear regression to determine
if a correlation exists.
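The first half of the experiment can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code used in the experiment: the helper names, the minimum substring length min_len, and the use of non-overlapping counts are hypothetical choices, and the compressed archive lengths are assumed to be measured separately and passed in by the caller.

```python
import math
from itertools import combinations


def archive_groupings(files):
    """All ways to pick three files; ten files give 120 groupings."""
    return list(combinations(files, 3))


def substrings(s, min_len):
    """Distinct contiguous substrings of s with at least min_len characters."""
    return {s[i:j] for i in range(len(s))
            for j in range(i + min_len, len(s) + 1)}


def count_appearances(known_text, s, min_len=8):
    """Total appearances of substrings of s in the known file.

    min_len is an assumed cutoff; str.count counts non-overlapping matches.
    """
    return sum(known_text.count(sub) for sub in substrings(s, min_len))


def linear_regression(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("regression is undefined for constant data")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

Here xs would hold the substring counts for the known file in each archive and ys the corresponding compressed archive lengths; a value of r close to zero would suggest no linear correlation.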
For the second half of the experiment, the compression ratio for the encrypted
archive is compared with the compression ratio of the file in question. The two-tailed
t-test is used to determine whether the archive’s compression ratio is equal to the
file’s compression ratio. The following formula is used to calculate the t-value:
\[
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \tag{3.2}
\]
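where $\bar{x}$ is the average archive compression ratio, $\mu_0$ is the file compression ratio,
$s$ is the sample standard deviation and $n$ is the sample size. The t-critical value, $t_{\alpha,df}$,
can be calculated using statistical software for comparison. Because the test is two-tailed,
the magnitude of the calculated t-value is compared with the critical value. If it is less than
the critical value, the null hypothesis $\bar{x} = \mu_0$ is not rejected, and the archive's
compression ratio is treated as equal to the file's compression ratio.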
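The calculation in (3.2) can be sketched in a few lines. This is a minimal illustration, assuming the archive compression ratios are available as a list; the function names are hypothetical, and the critical value is passed in as a parameter because it is obtained from statistical software or a t-table.

```python
import math
import statistics


def t_value(archive_ratios, file_ratio):
    """One-sample t statistic from (3.2): (mean - mu_0) / (s / sqrt(n))."""
    n = len(archive_ratios)
    mean = statistics.mean(archive_ratios)
    s = statistics.stdev(archive_ratios)  # sample standard deviation (n - 1)
    return (mean - file_ratio) / (s / math.sqrt(n))


def fails_to_reject(t, t_critical):
    """Two-tailed decision: the null hypothesis is kept when |t| < t_critical."""
    return abs(t) < t_critical
```

For example, with three hypothetical observations 1, 2, 3 and a file ratio of 1, the statistic equals the square root of 3, which is below the two-tailed critical value of 4.303 for two degrees of freedom at the 0.05 level, so the null hypothesis would not be rejected.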
3.3 Man-in-the-Middle attack
This attack exploits the independence between the encryption and compression
algorithms. It was first introduced by Kohno as an attack against WinZip and later
verified by Yeo and Phan [10], [9]. Assume that two users, Alice and Bob, wish to
send a secret message in an encrypted compressed archive. Eve is a third individual
who wants to discover the content of the secret archive. The attack as outlined by Yeo
and Phan proceeds as follows:
1. Alice compresses and encrypts Secret.txt into Secret.rar using compression
method 1 and shares the archive with Bob.
2. Eve intercepts Secret.rar and modifies the indicated compression method in
the file header to compression method 2. She sends this modified archive, say
Secret-prime.rar, on to Bob.
3. Bob, unaware of Eve’s actions, attempts to decompress Secret-prime.rar with
his secret password. This results in an incomprehensible file, Corrupted-Secret.txt.
He sends Corrupted-Secret.txt to Alice in an attempt to understand what is
wrong.
4. Eve again intercepts communication to obtain Corrupted-Secret.txt. She
then re-compresses Corrupted-Secret.txt using compression method 2 to
obtain Unencrypted-Secret.rar.
5. Finally, Eve modifies the compression method in Unencrypted-Secret.rar to
method 1. She then decompresses the archive to recover the original Secret.txt.
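The five steps above can be traced in a toy model. The sketch below does not use the RAR format or its cipher: the dictionary "archive", the XOR keystream built from SHA-256, and the choice of zlib for method 1 and plain storage for method 2 are all assumptions made for illustration, and integrity checks are ignored. It only shows why changing the method field lets Eve recover the plaintext without the password.

```python
import hashlib
import zlib

DEFLATE, STORE = 1, 2  # hypothetical identifiers for methods 1 and 2


def xor_cipher(data: bytes, password: str) -> bytes:
    """Toy stream cipher (illustration only); encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(password.encode() + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def compress(method: int, data: bytes) -> bytes:
    return zlib.compress(data) if method == DEFLATE else data


def decompress(method: int, data: bytes) -> bytes:
    return zlib.decompress(data) if method == DEFLATE else data


def open_archive(archive: dict, password: str = "") -> bytes:
    """Decrypt (if encrypted), then decompress with the method in the header."""
    body = archive["body"]
    if archive["encrypted"]:
        body = xor_cipher(body, password)
    return decompress(archive["method"], body)


def run_attack(secret: bytes, password: str) -> bytes:
    # 1. Alice compresses with method 1 and encrypts.
    secret_rar = {"method": DEFLATE, "encrypted": True,
                  "body": xor_cipher(compress(DEFLATE, secret), password)}
    # 2. Eve changes only the method field in the header.
    secret_prime = dict(secret_rar, method=STORE)
    # 3. Bob decrypts with the password; the output is the method 1 stream.
    corrupted = open_archive(secret_prime, password)
    # 4. Eve re-compresses the corrupted file with method 2, unencrypted.
    unencrypted = {"method": STORE, "encrypted": False,
                   "body": compress(STORE, corrupted)}
    # 5. Eve switches the header back to method 1 and decompresses.
    unencrypted["method"] = DEFLATE
    return open_archive(unencrypted)
```

In this model Eve never calls xor_cipher herself; Bob's decryption in step 3 removes the encryption layer for her, which is the independence between encryption and compression that the attack relies on.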