a compressed pointer references the compressed representation [24]. The references
can be either left- or right-pointing, and the scheme allows for recursion.
Storer and Szymanski’s scheme addresses a flaw in the original LZ77 algorithm.
LZ77 would occasionally generate a reference longer than the target string, resulting
in poor compression. To correct this, LZSS omits any reference that would be longer
than the string it replaces. The scheme also uses one-bit flags to indicate whether the following
string of data is the original source or a reference.
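The literal/reference flagging described above can be sketched in a toy form. This is a simplified illustration, not WinRAR's or Storer and Szymanski's actual implementation: the window size, minimum match length, and token format are all assumptions made for the example.

```python
# Toy LZSS sketch: flag 0 -> literal byte, flag 1 -> (offset, length)
# back-reference. Matches shorter than MIN_MATCH are emitted as literals,
# mirroring LZSS's fix for LZ77's counterproductively long references.

WINDOW = 4096   # sliding-window size (assumed for illustration)
MIN_MATCH = 3   # below this, a reference costs more than the literals

def lzss_compress(data: bytes):
    out, i = [], 0
    while i < len(data):
        best_len, best_off = 0, 0
        # naive search of the window for the longest match
        for j in range(max(0, i - WINDOW), i):
            length = 0
            while (i + length < len(data)
                   and j + length < i
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= MIN_MATCH:
            out.append((1, (best_off, best_len)))  # flag 1: reference
            i += best_len
        else:
            out.append((0, data[i]))               # flag 0: literal
            i += 1
    return out

def lzss_decompress(tokens) -> bytes:
    buf = bytearray()
    for flag, payload in tokens:
        if flag == 0:
            buf.append(payload)
        else:
            off, length = payload
            for _ in range(length):
                buf.append(buf[-off])
    return bytes(buf)

tokens = lzss_compress(b"abcabcabcabc")
assert lzss_decompress(tokens) == b"abcabcabcabc"
```

Repetitive input such as `b"abcabcabcabc"` collapses to a few literals plus references, while input with no repeats falls back to all literals.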
2.3.2 PPMII
PPMII was integrated into WinRAR as of version 2.9 to further reduce compression
ratios [21]. PPMII was developed by Dmitry Shkarin as an improvement
to the Prediction by Partial Matching model [25, 26]. Broadly, the n-th symbol of a
string is predicted based on the previous n − 1 symbols. The compression of a string
is driven by coding conditional probability distributions, based on the following
assumption [25]:
The larger the common (initial) part of contexts s, the larger (on the
average) the closeness of their conditional probability distributions.
This means that the more initial characters two contexts have in common, the
greater the probability of correctly predicting the n-th symbol. This is desirable, as a higher
probability requires fewer bits to encode. To efficiently store the contexts, an M-ary
tree is utilized. This is particularly efficient if a text consists of large numbers of short
strings.
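A minimal order-2 context model conveys the PPM idea. This is illustrative only: PPMII's escape estimation and information inheritance are not modeled, and a plain dictionary stands in for the M-ary context tree.

```python
from collections import defaultdict

K = 2  # context length: predict the n-th symbol from the previous K symbols

def build_model(text):
    """Count, for each K-symbol context, which symbols followed it."""
    contexts = defaultdict(lambda: defaultdict(int))
    for i in range(K, len(text)):
        contexts[text[i - K:i]][text[i]] += 1
    return contexts

def predict(contexts, context, symbol):
    """Conditional probability of `symbol` given `context` under the counts."""
    counts = contexts.get(context)
    if not counts:
        return 0.0
    return counts[symbol] / sum(counts.values())

model = build_model("abracadabra")
print(predict(model, "ab", "r"))  # 'ab' is always followed by 'r' -> 1.0
```

The sharper the distribution for a context, the fewer bits an entropy coder needs for the predicted symbol, which is exactly the assumption quoted above.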
2.3.3 Intel IA-32
Intel IA-32 is a compression scheme introduced in response to the observation that
database processing correlates with the hardware constraints of storage I/O [27]. It
provides lightweight compression and decompression using single instruction, multiple
data (SIMD) commands to optimize database queries. Data is compressed quickly by
reducing the dynamic range of the data. This is accomplished by applying a mask,
packed shift, and finally stitching the data together.
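The mask, shift, and stitch steps can be shown with a scalar sketch. This is illustrative only; real implementations apply these steps to many values at once with packed SIMD instructions, and the bit widths here are assumed for the example.

```python
# Scalar sketch of the mask -> shift -> stitch idea behind SIMD-style
# lightweight compression: values with a small dynamic range are packed
# into fewer bits than a full machine word.

def pack(values, bits):
    """Stitch each value, masked to `bits` bits, into one integer."""
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * bits)  # mask, then shift into position
    return word

def unpack(word, bits, count):
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(count)]

values = [3, 7, 1, 5]        # each value fits in 3 bits
packed = pack(values, 3)     # 12 bits instead of four machine words
assert unpack(packed, 3, 4) == values
```

Because pack and unpack are a handful of bitwise operations, decompression is cheap enough to run inline during query processing, which is the point of the scheme.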
2.3.4 Delta encoding
This is the second new technique introduced to optimize compression performance
in the newest version. Delta encoding encompasses several techniques that store data
as the difference between successive samples [28]. This is an alternative to directly
storing the samples themselves. Generally, the first value in the encoded file is equal
to the first value in the original data. The subsequent values are equal to the difference
between the current and previous value in the input. That is, for an encoded value
y_n with original inputs x_n:

y_n = x_n − x_{n−1}    (2.1)
This approach is best suited when the values in the original file have only small
changes between adjacent symbols. It is therefore ideal for file representation of a
signal, but performs poorly with text files and executable code.
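A minimal encoder and decoder implied by Equation 2.1 can be sketched as follows; the plain-list representation is assumed for illustration.

```python
# Delta encoder/decoder per Equation 2.1: the first value is stored
# as-is, each later value as the difference from its predecessor.

def delta_encode(xs):
    return xs[:1] + [xs[i] - xs[i - 1] for i in range(1, len(xs))]

def delta_decode(ys):
    xs = ys[:1]
    for d in ys[1:]:
        xs.append(xs[-1] + d)  # undo Equation 2.1 by cumulative sum
    return xs

# A slowly varying "signal" produces small residuals, which compress well;
# the same scheme applied to text-like data would not.
signal = [100, 101, 103, 102, 102]
print(delta_encode(signal))  # [100, 1, 2, -1, 0]
assert delta_decode(delta_encode(signal)) == signal
```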
3. METHODS
This section provides detailed descriptions of the experiments conducted to determine
information leakage through the compression side-channel. These experiments include
an examination of compression ratios for file type and string detection as well as
a man-in-the-middle attack exploiting the independence between the compression
and encryption algorithms. The experiments were chosen based on their focus on
the compression side-channel and the practicality of implementing the attacks with
limited knowledge of an archive’s contents. Unless otherwise indicated, experiments
are performed using WinRAR v5.10.
3.1 Compression ratios
This experiment is based on work by Kelsey and Polmirova-Nickolova [11, 12]. It
is run to test the hypothesis:
Hypothesis 1 The compression of different file types under RAR and RAR5 archives
will produce distinct compression ratios.
If this hypothesis holds true, an attacker can make an educated guess as to the
contents of an encrypted archived file. This knowledge is useful for identifying file
type even when a user applies obfuscating measures such as renaming a file. The
information needed to calculate the compression ratio can be obtained by inspecting
the file header. Once this information is obtained, the compression ratio can be
calculated as shown in Equation 3.1.
The files used in this test are retrieved from the Canterbury Corpus and Maximum
Compression benchmark [29,30]. The files included in these collections are selected to
give compression results typical of commonly used files. In particular, the Canterbury
Corpus is the main benchmark to test compression algorithms. Details describing the
contents of the collections can be found in Appendix A.
The files are categorized into four types as outlined in the Maximum Compression
collection: text, executable, graphic, and other. Text file formats include plain text
files in English. Executables are Windows executable files such as .exe extensions.
Graphic files are various image file types. Other types include any files not included
under the other categories such as Microsoft Office documents, Adobe PDF or help
files. The corpora include only two graphic files, of .jpeg and .bmp types. To increase
sample size and provide a wider range of file formats under the graphic file type,
additional .png and .gif formats are included.
All files in the collection were compressed with and without encryption using both
RAR and RAR5 file types. For testing purposes, the password “P4ssw0rd” was used for
all encrypted archives. The compression ratio, c, for each archive with packed archive
size x and unpacked size y is calculated as follows:
c = x / y    (3.1)
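Equation 3.1 amounts to a one-line computation. In the sketch below, the packed size x and unpacked size y are assumed to have already been read from the archive's file header.

```python
# Compression ratio per Equation 3.1: packed size x over unpacked size y.
# A smaller ratio means the data compressed more.

def compression_ratio(packed_size: int, unpacked_size: int) -> float:
    return packed_size / unpacked_size

print(compression_ratio(350, 1000))  # 0.35
```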
The experiment is set up as a block design with the file types as treatments
and encryption and archive format as blocks. This allows the compression ratio of
file types to be compared while controlling the variation due to different methods.
Analysis of Variance (ANOVA) is then employed to test the existence of a statistical
difference between treatments. These tests are carried out at the α = 5% significance
level.
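A hand-rolled one-way ANOVA F statistic illustrates the comparison between treatments. This is a simplification: the block design described above also partitions out variation due to encryption and archive format, which a plain one-way ANOVA does not, and the compression ratios below are hypothetical values chosen for the example.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # variation between group means vs. variation within each group
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical compression ratios for three file types
ratios = {
    "text":       [0.30, 0.32, 0.31],
    "executable": [0.45, 0.47, 0.44],
    "graphic":    [0.95, 0.97, 0.96],
}
f_stat = anova_f(list(ratios.values()))
print(f_stat > 100)  # a large F suggests the file-type means differ
```

If the resulting F exceeds the critical value at α = 5%, the hypothesis of equal mean compression ratios across file types is rejected.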
3.2 File detection
The file detection experiments are inspired by the String Presence Detection attacks
outlined by Kelsey [12]. This experiment will test the following two hypotheses:
Hypothesis 2a Given an uncompressed plaintext string S and a known file from
an encrypted archive, an attacker can determine whether S appears frequently
within the archive.