proceed. Subsequent attempts to sabotage the file header to force an error failed.
The fields altered included CRC32 fields, file flags, and attributes.
5. SUMMARY
The findings presented in Section 4 include several novel results. These are discussed
in detail below, followed by a brief suggestion of countermeasures to prevent
information leakage.
5.0.4 Discussion
In Section 4.1, statistical methods show that it is possible to distinguish different
file types based on an archive’s compression ratio. Therefore, the proposed
Hypothesis 1 holds true. It is important to note that, as illustrated in Table 4.3, the
compression ratios of Text, Executable and Other files are not distinct from one
another. However, graphic files consistently compress at a ratio considerably higher
than other file types. This is likely because many image formats already implement
some form of compression [33]. If the data in an archive has already been compressed,
WinRAR’s algorithms can do little to further reduce the archive’s size. This results in
a packed file size very close to the total file size.
This attack is most useful when an investigator needs to determine whether an
archive contains images. For example, in child pornography cases a forensic
investigator may need to identify archives holding large numbers of images.
Compression ratio inspection provides a simple method of identification for archives
with these types of contents. The information required is minimal and can be obtained
from any archive, which makes this attack easy to implement in a variety of
situations. Table 4.4 provides some intervals to be used for identifying file types.
Generally, an archive with a compression ratio greater than 0.64 can reasonably be
assumed to contain images. The ability to identify file types within an archive helps
save valuable time and effort that could otherwise be lost in attempting to crack
archives with irrelevant contents.
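The ratio check described above can be sketched in a few lines of Python. This is an illustration only: it assumes the ratio is packed size divided by total (unpacked) size, which is how the discussion above reads, and the names ArchiveSizes, compression_ratio and likely_contains_images are hypothetical. Only the 0.64 threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold quoted in the discussion above; ratios above it suggest images.
IMAGE_RATIO_THRESHOLD = 0.64


@dataclass
class ArchiveSizes:
    """Sizes read from an archive's header fields (hypothetical container)."""
    packed_size: int    # bytes after compression
    unpacked_size: int  # total bytes before compression


def compression_ratio(sizes: ArchiveSizes) -> float:
    """Packed size divided by unpacked size (assumed definition of the ratio)."""
    if sizes.unpacked_size <= 0:
        raise ValueError("unpacked size must be positive")
    return sizes.packed_size / sizes.unpacked_size


def likely_contains_images(sizes: ArchiveSizes,
                           threshold: float = IMAGE_RATIO_THRESHOLD) -> bool:
    """Flag an archive whose ratio exceeds the threshold."""
    return compression_ratio(sizes) > threshold


# Made-up sizes for two archives.
barely_compressed = ArchiveSizes(packed_size=900, unpacked_size=1000)
well_compressed = ArchiveSizes(packed_size=300, unpacked_size=1000)
```

In practice the intervals of Table 4.4 would replace the single threshold, and the flag should be treated as a triage hint rather than proof of an archive's contents.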
The appearance of substrings experiment in Section 4.2.1 supported the hypothesis
that substrings in the known part of an archive correlate with the compressed size of
the archive. This correlation is likely due to the general compression scheme utilized
by WinRAR. If a file is present in an archive, the repeated appearance of its
substrings allows both the LZSS and PPMII compression schemes to work more
efficiently. In turn, this results in a lower packed size for the archive. This experiment
provided the most surprising results, as the conclusions were not immediately obvious
from the raw data.
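The effect can be illustrated with a small Python sketch. It uses zlib (DEFLATE) from the standard library as a stand-in, since WinRAR's LZSS and PPMII implementations are not available there, so the numbers are indicative only. The file contents and names below are made up for the example.

```python
import random
import zlib


def packed_size(data: bytes) -> int:
    """Compressed size in bytes; zlib (DEFLATE) stands in for WinRAR here."""
    return len(zlib.compress(data, 9))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical known file with many repeated fields, similar to a log.
known_file = b"account=0001;name=alice;balance=100.00;status=active\n" * 40

# Hypothetical remaining contents that share the same field names.
other_contents = b"account=0002;name=bob;balance=250.00;status=active\n" * 200

# Unrelated filler of the same length as the known file.
filler = bytes(rng.getrandbits(8) for _ in range(len(known_file)))

# Same total input size in both cases; only the shared substrings differ.
size_with_file = packed_size(other_contents + known_file)
size_without_file = packed_size(other_contents + filler)
```

Here size_with_file comes out smaller than size_without_file because the compressor can reuse substrings it has already seen, which is the relationship the experiment measured on real archives.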
This attack does have several drawbacks. First, the file selected for examination
has an extremely high number of repeated strings, as stated in Section 3.2. Unless an
investigator is looking for similarly structured files, such as log files, it is unlikely that
typical text will include similar levels of repetition. This would result in a weaker
effect on the overall compressed size, which may cause the correlation to become too
weak to detect. However, the appearance of substrings attack is well suited to
identifying files containing profile or bank account information that include many
repeated fields. Secondly, a relatively large collection of archives was examined, which
strengthened the power of the statistical process. A collection of this size may not be
available for study. Finally, the archives containing the file were known ahead of time.
While this experimental design is sufficient to show correlation, it is not practical to
execute on completely unknown data. For future testing, a Monte Carlo experiment
may provide more accurate results for modeling the relationship between substrings
and archive size.
Section 4.2.2 showed that, with sufficiently significant levels of α, it is possible
to distinguish archives that contain a file from those that do not. Adequate evidence
is given to show that Hypothesis 2b is valid. It should be noted that in this
experiment the ratios of archives containing the file have a significantly different
average than those that do not. In the event that the averages are closer in value, the
author suggests that lower values of α will be capable of distinguishing between them.
This attack is best used on files that are highly compressible, as their compression
ratio will have a significant effect on that of the archive. The selection of files most
suited to this attack suffers from the same issues outlined for the appearance of
substrings attack.
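A comparison of this kind can be sketched as follows. The sketch uses a permutation test on the difference of mean ratios, chosen because it needs only the Python standard library; it is not necessarily the test applied in Section 4.2.2. The ratio values, the function name and the choice of α = 0.05 are all made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, trials=5000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)  # fixed seed for a repeatable estimate
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # reassign ratios to the two groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)


# Hypothetical compression ratios, not data from the experiments.
ratios_with_file = [0.21, 0.24, 0.19, 0.23, 0.22, 0.20]
ratios_without_file = [0.41, 0.38, 0.44, 0.40, 0.39, 0.43]

alpha = 0.05  # assumed significance level
p_value = permutation_p_value(ratios_with_file, ratios_without_file)
distinguishable = p_value < alpha
```

With groups this well separated the estimated p-value is far below α. As the group averages move closer together the p-value rises, and the choice of α decides whether the two groups are still reported as different.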
Both the appearance of substrings and difference of ratios attacks can also be
useful in exfiltration detection. For example, if numerous archives are detected
leaving an organization’s system, sensitive information such as client data can be
checked against the archives as outlined. This can provide a reasonable indication of
what information has been compromised.
Finally, experiments with the Man-in-the-Middle attack in Section 4.2.3 provided
suggestions for improvement. Despite claims that the original attack is capable of
obtaining the plain text of a file in an archive, it does not perform as suggested.
Following the attack as outlined in the literature will result in the removal of
encryption from an archive. However, the compressed file is still unintelligible.
To remedy this, the author suggests some variation in the final step of the attack.
First, the files tend to accumulate extra padding at the end. This is simple to identify
as it consists of a run of 0x00 bytes. The padding may be generated by the loss of the
password and salt after the encryption is removed. To avoid a conflict with the file
size, the packed file size needs to be reduced by the amount of padding removed.
Secondly, WinRAR uses standard CRC32 checksums, which can be computed with
off-the-shelf software and written into the relevant fields. Finally, the unpacking
version field should be updated to the value in the original archive’s field to avoid
compatibility issues. All of the extra information needed can be discovered using the
archives that an adversary has access to. These steps will ensure that the contents of
an encrypted compressed archive can be revealed. The attack has been verified on
RAR archives using both WinRAR v3.42 and v5.10.
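The padding and checksum steps above can be sketched in Python. This is a rough illustration under stated assumptions, not the procedure used in the experiments: the function names are hypothetical, the little-endian byte order of the CRC field is assumed, and stripping every trailing 0x00 byte would also remove zero bytes that legitimately end the compressed stream, so the result needs manual checking.

```python
import zlib
from typing import Tuple


def strip_zero_padding(packed: bytes) -> Tuple[bytes, int]:
    """Remove trailing 0x00 bytes; return the trimmed data and bytes removed."""
    trimmed = packed.rstrip(b"\x00")
    return trimmed, len(packed) - len(trimmed)


def adjusted_pack_size(original_pack_size: int, removed: int) -> int:
    """Packed size to write back once the padding has been removed."""
    return original_pack_size - removed


def crc32_field(data: bytes) -> bytes:
    """Standard CRC32 of data as 4 bytes (little-endian order is assumed)."""
    return (zlib.crc32(data) & 0xFFFFFFFF).to_bytes(4, "little")


# Made-up packed data followed by five padding bytes.
padded = b"\x1f\x8a\x03compressed-bytes" + b"\x00" * 5
trimmed, removed = strip_zero_padding(padded)
new_pack_size = adjusted_pack_size(len(padded), removed)
```

Which bytes each checksum field covers is defined by the archive format, so the input passed to crc32_field has to follow the technical note [34] for the field being repaired.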
Despite the success of the revised implementation, RAR5 formatted archives remain
robust against the attack. This is possibly due to the enhanced archive recovery
capabilities in v5.x. The software is more capable of detecting and mitigating changes
to file information. Another potential pitfall in attacking the newest file format is the
new checksum algorithms. The CRC32 and BLAKE2 checksums are now password
dependent. Without knowing the password, it is not feasible to calculate the values
necessary in the final step of the attack. However, the older file format is still the
default in the newest version and remains very widely used. The attack introduced
in this paper is therefore relevant to current information security needs.
5.0.5 Countermeasures
In response to the information discovered through the experiments, there are
suggestions to mitigate some of the attacks. Aside from the appearance of substrings
attack, all of the attacks rely on the assumption that the adversary is able to at least
view file header information. By default, WinRAR encrypts only a file’s contents and
the header information remains in plaintext. For this situation, the assumption holds.
However, users are able to select an option to encrypt file header information along
with the file contents. This would mask information such as total and packed file size,
compression method and any additional file attributes.
For further security of files, the author suggests using the RAR5 file format when
possible. It has the same weakness against the first three attacks as the older file
versions. However, it is resilient against the Man-in-the-Middle attack. Thus, it
provides slightly improved security over previous versions.
5.0.6 Conclusion and open questions
This paper shows that knowledge of information in an encrypted archive can be
leaked via the study of compression properties. These attacks require less time and
fewer computing resources than traditional attacks against the encryption of an
archive. Issues in an existing attack are addressed to create a successful method for
recovering archived files. This has been verified with two different versions of WinRAR,
but the effectiveness with other compression software remains to be evaluated. There
is also a possibility of using this attack against an archive containing multiple files.
All of the presented methods are efficient for investigators to implement as a first line
of inquiry to discover information about an unknown archive. These methods also
highlight an area that is lacking in security for the WinRAR software. It is a future
challenge to provide a good compression scheme with effectively implemented
encryption.
Some open questions remain in relation to the string detection attacks. The effect
of string frequency in the appearance of substrings attack requires further
investigation. As discussed in Section 5.0.4, the attack is effective on highly
compressible files such as logs or databases. However, many text files do not have a
high number of repetitive strings. The length of the string may also influence the
correlation with an archive’s size. The effects of both repetition and length remain
open.
The experimental design used emphasizes the use of statistics to establish the
validity of a hypothesis. During the literature review, very few papers were found to
have implemented rigorous statistical methods to reach conclusions. The meaning of
data can be counter-intuitive and it is possible to reach incorrect conclusions without
proper analysis. The author encourages future researchers to use experimental
methods to provide strong validity for information security research.
REFERENCES
[1] Symantec. Trojan.Dropper Technical Details. [Online]. Available:
http://www.symantec.com/security_response/writeup.jsp?docid=2002-082718-3007-99&tabid=2
[2] WinZip. What can I do if I forget the encryption password for my zip file?
[Online]. Available: http://kb.winzip.com/kb/entry/79/
[3] J. Chen, J. Zhou, K. Pan, S. Lin, C. Zhao, and X. Li, “The security of key
derivation functions in WINRAR,” Journal of Computers, vol. 8, no. 9, pp.
2262–2268, 2013.
[4] A. Biryukov, O. Dunkelman, N. Keller, D. Khovratovich, and A. Shamir, “Key
recovery attacks of practical complexity on AES variants with up to 10 rounds,”
IACR eprint server, vol. 374, 2009.
[5] A. Biryukov and D. Khovratovich, “Related-key cryptanalysis of the full AES-192
and AES-256,” in Advances in Cryptology–ASIACRYPT 2009. Springer, 2009,
pp. 1–18.
[6] H. Demirci and A. A. Selçuk, “A meet-in-the-middle attack on 8-round AES,”
in Fast Software Encryption. Springer, 2008, pp. 116–126.
[7] J. Lu, O. Dunkelman, N. Keller, and J. Kim, “New impossible differential attacks
on AES,” in Progress in Cryptology-INDOCRYPT 2008. Springer, 2008, pp.
279–293.
[8] A. Biryukov, D. Khovratovich, and I. Nikolić, “Distinguisher and related-key
attack on the full AES-256,” in Advances in Cryptology-CRYPTO 2009. Springer,
2009, pp. 231–249.
[9] G. S.-W. Yeo and R. C.-W. Phan, “On the security of the WinRAR encryption
feature,” International Journal of Information Security, vol. 5, no. 2, pp. 115–
123, 2006.
[10] T. Kohno, “Attacking and repairing the WinZip encryption scheme,” in
Proceedings of the 11th ACM conference on Computer and communications security.
ACM, 2004, pp. 72–81.
[11] D. Polimirova-Nickolova and E. Nickolov, “Examination of archived objects’ size
influence on the information security when compression methods are applied,”
in Third International Conference Information Research, Applications and
Education, 2005, p. 130.
[12] J. Kelsey, “Compression and information leakage of plaintext,” in Fast Software
Encryption. Springer, 2002, pp. 263–276.
[13] L. Ji-Zhong, J. Lie-Hui, Y. Qing, and X. Yao-Bin, “Hybrid method to analyze
cryptography in software,” in Multimedia Information Networking and Security
(MINES), 2012 Fourth International Conference on. IEEE, 2012, pp. 930–933.
[14] C. Maartmann-Moe, S. E. Thorkildsen, and A. Årnes, “The persistence of
memory: Forensic identification and extraction of cryptographic keys,” Digital
Investigation, vol. 6, pp. S132–S140, 2009.
[15] G. Fellows, “WinRAR temporary folder artefacts,” Digital Investigation, vol. 7,
no. 1, pp. 9–13, 2010.
[16] D. Gupta and B. M. Mehtre, “Recent trends in collection of software forensics
artifacts: Issues and challenges,” in Security in Computing and Communications.
Springer, 2013, pp. 303–312.
[17] RarLab. WinRAR at a glance. [Online]. Available:
http://www.win-rar.com/website/index.php?id=features
[18] ——. WinRAR - what’s new in the latest version. [Online]. Available:
http://www.rarlab.com/rarnew.htm
[19] WinRAR, “User’s manual: Rar 5.10 console version,” 2014.
[20] J. S. Plank, K. M. Greenan, and E. L. Miller, “Screaming fast Galois field
arithmetic using Intel SIMD instructions,” in FAST, 2013, pp. 299–306.
[21] WinRAR, “What’s new in the latest version - version 3.00,” 2002.
[22] N. Standard, “Announcing the advanced encryption standard (AES),” Federal
Information Processing Standards Publication, vol. 197, 2001.
[23] M. S. Turan, E. B. Barker, W. E. Burr, and L. Chen, “SP 800-132.
Recommendation for password-based key derivation: Part 1: Storage applications,”
National Institute of Standards & Technology, Gaithersburg, MD, United States, Tech.
Rep., 2010.
[24] J. A. Storer and T. G. Szymanski, “Data compression via textual substitution,”
Journal of the ACM (JACM), vol. 29, no. 4, pp. 928–951, 1982.
[25] D. Shkarin, “Improving the efficiency of the PPM algorithm,” Problems of
Information Transmission, vol. 37, no. 3, pp. 226–235, 2001.
[26] ——, “PPM: One step to practicality,” in Data Compression Conference, 2002.
Proceedings. DCC 2002, 2002, pp. 202–211.
[27] R. Intel, “Intel 64 and IA–32 architectures optimization reference manual,” Intel
Corporation, May, 2012.
[28] S. W. Smith et al., “The scientist and engineer’s guide to digital signal
processing,” 1997.
[29] M. Powell. The Canterbury corpus. [Online]. Available:
http://corpus.canterbury.ac.nz/
[30] MaximumCompression. Lossless data compression software
benchmarks/comparisons. [Online]. Available: http://www.maximumcompression.com/
[31] M. Hörz, “HxD–HexEditor,” http://mh-nexus.de/en/hxd/, 2002–2009.
[32] RarLab. Rar 5.0 archive format. [Online]. Available:
http://www.rarlab.com/technote.htm
[33] M. Prantl, “Image compression overview,” arPX.
[34] WinRAR, “RAR version 3.42 – technical information,” 2004.
APPENDICES
A. COMPRESSION CORPORA
The following is a description of the files included in the Canterbury Corpus [29] and
Maximum Compression [30] compression testing benchmarks.
Table A.1.: Details of compression testing files

File Name      Description
alice29.txt    English text of “Alice in Wonderland”
asyoulik.txt   English text of Shakespeare’s “As You Like It”
cp.html        HTML source code
fields.c       C source code
grammar.lsp    LISP source code
kennedy.xls    Microsoft Excel spreadsheet
lcet10.txt     English text of Workshop on Electronic Texts proceedings
plrabn12.txt   English text of “Paradise Lost”
ptt5           CCITT test set
sum            SPARC executable
xargs.1        GNU manual page
world95.txt    English text of 1995 CIA World Fact Book
FP.txt         Website traffic log file
english.txt    Alphabetically sorted English word list
AcroRd32.exe   Acrobat Reader 5.0 executable
MSO97.dll      Microsoft Office 97 Dynamic Link Library
rafale.bmp     Bitmap image
A10.jpg        JPEG image
vcfiu.hlp      Delphi First Impression OCX Help file
ohs.doc        Occupational Health and Safety Microsoft Word file
FlashMX.pdf    Macromedia Flash MX manual Adobe Acrobat file
tux.png        PNG image
Nature.gif     GIF image
B. RAR FILE HEADER
The information presented in this table is based on the WinRAR 3.42 technical note
[34].
Table B.1.: RAR file header fields

Field            Length
HEAD_CRC         2 bytes
HEAD_TYPE        1 byte
HEAD_FLAGS       2 bytes
HEAD_SIZE        2 bytes
HEAD_CRC         2 bytes
HEAD_TYPE        1 byte
HEAD_FLAGS       2 bytes
HEAD_SIZE        2 bytes
RESERVED1        2 bytes
RESERVED2        4 bytes
HEAD_CRC         2 bytes
HEAD_TYPE        1 byte
HEAD_FLAGS       2 bytes
HEAD_SIZE        2 bytes
PACK_SIZE        4 bytes
UNP_SIZE         4 bytes
HOST_OS          1 byte
FILE_CRC         4 bytes
FTIME            4 bytes
UNP_VER          1 byte
METHOD           1 byte
NAME_SIZE        2 bytes
ATTR             4 bytes
HIGH_PACK_SIZE   4 bytes
HIGH_UNP_SIZE    4 bytes
FILE_NAME        variable size
SALT             8 bytes
EXT_TIME         variable size
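As an illustration of how the fixed-size fields listed above might be read from an unencrypted file header, the following Python sketch unpacks the thirteen fields from HEAD_CRC through ATTR, using the lengths given in Table B.1. The little-endian byte order, the FileHeader name and the sample values are assumptions made for the example; the optional and variable-size fields (HIGH_PACK_SIZE, HIGH_UNP_SIZE, FILE_NAME, SALT, EXT_TIME) are not handled.

```python
import struct
from collections import namedtuple

# Field lengths follow Table B.1: H = 2 bytes, B = 1 byte, I = 4 bytes.
# "<" selects little-endian byte order, which is an assumption here.
FILE_HEADER = struct.Struct("<HBHHIIBIIBBHI")

FileHeader = namedtuple(
    "FileHeader",
    "head_crc head_type head_flags head_size pack_size unp_size host_os "
    "file_crc ftime unp_ver method name_size attr",
)


def parse_file_header(raw: bytes) -> FileHeader:
    """Decode the fixed-size start of a file header block."""
    return FileHeader(*FILE_HEADER.unpack_from(raw))


# Synthetic header built from made-up values, not taken from a real archive.
sample = FILE_HEADER.pack(
    0x1234, 0x74, 0x8000, 40, 900, 1000, 2, 0xCBF43926, 0, 29, 0x33, 8, 0x20
)
header = parse_file_header(sample)
ratio = header.pack_size / header.unp_size  # packed size over unpacked size
```

The fields used by the attacks discussed in Section 5.0.4, such as the packed size, unpacked size and unpacking version, are all available from this plaintext part of the header unless header encryption is enabled.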