Use the
decode
procedure
to decode the message, and give the result.
Exercise 2.68. The
encode
procedure takes as arguments a message and a tree and produces the list
of bits that gives the encoded message.
(define (encode message tree)
(if (null? message)
’()
(append (encode-symbol (car message) tree)
(encode (cdr message) tree))))
Encode-symbol
is a procedure, which you must write, that returns the list of bits that encodes a
given symbol according to a given tree. You should design
encode-symbol
so that it signals an
error if the symbol is not in the tree at all. Test your procedure by encoding the result you obtained in
exercise 2.67 with the sample tree and seeing whether it is the same as the original sample message.
Exercise 2.69. The following procedure takes as its argument a list of symbol-frequency pairs (where
no symbol appears in more than one pair) and generates a Huffman encoding tree according to the
Huffman algorithm.
(define (generate-huffman-tree pairs)
(successive-merge (make-leaf-set pairs)))
Make-leaf-set
is the procedure given above that transforms the list of pairs into an ordered set of
leaves.
Successive-merge
is the procedure you must write, using
make-code-tree
to
successively merge the smallest-weight elements of the set until there is only one element left, which
is the desired Huffman tree. (This procedure is slightly tricky, but not really complicated. If you find
yourself designing a complex procedure, then you are almost certainly doing something wrong. You
can take significant advantage of the fact that we are using an ordered set representation.)
Exercise 2.70. The following eight-symbol alphabet with associated relative frequencies was
designed to efficiently encode the lyrics of 1950s rock songs. (Note that the ‘‘symbols’’ of an
‘‘alphabet’’ need not be individual letters.)
A
2
NA
16
BOOM
1
SHA
3
GET
2
YIP
9
JOB
2
WAH
1
Use
generate-huffman-tree
(exercise 2.69) to generate a corresponding Huffman tree, and use
encode
(exercise 2.68) to encode the following message:
Get a job
Sha na na na na na na na na
Get a job
Sha na na na na na na na na
Wah yip yip yip yip yip yip yip yip yip
Sha boom
How many bits are required for the encoding? What is the smallest number of bits that would be
needed to encode this song if we used a fixed-length code for the eight-symbol alphabet?
Exercise 2.71. Suppose we have a Huffman tree for an alphabet of n symbols, and that the relative
frequencies of the symbols are 1, 2, 4,
...
, 2
n-1
. Sketch the tree for n=5; for n=10. In such a tree (for
general n) how may bits are required to encode the most frequent symbol? the least frequent symbol?
Exercise 2.72. Consider the encoding procedure that you designed in exercise 2.68. What is the order
of growth in the number of steps needed to encode a symbol? Be sure to include the number of steps
needed to search the symbol list at each node encountered. To answer this question in general is
difficult. Consider the special case where the relative frequencies of the n symbols are as described in
exercise 2.71, and give the order of growth (as a function of n) of the number of steps needed to
encode the most frequent and least frequent symbols in the alphabet.
32
Allowing quotation in a language wreaks havoc with the ability to reason about the language in
simple terms, because it destroys the notion that equals can be substituted for equals. For example,
three is one plus two, but the word ‘‘three’’ is not the phrase ‘‘one plus two.’’ Quotation is powerful
because it gives us a way to build expressions that manipulate other expressions (as we will see when
we write an interpreter in chapter 4). But allowing statements in a language that talk about other
statements in that language makes it very difficult to maintain any coherent principle of what ‘‘equals
can be substituted for equals’’ should mean. For example, if we know that the evening star is the
morning star, then from the statement ‘‘the evening star is Venus’’ we can deduce ‘‘the morning star is
Venus.’’ However, given that ‘‘John knows that the evening star is Venus’’ we cannot infer that ‘‘John
knows that the morning star is Venus.’’
33
The single quote is different from the double quote we have been using to enclose character strings
to be printed. Whereas the single quote can be used to denote lists or symbols, the double quote is used
only with character strings. In this book, the only use for character strings is as items to be printed.
34
Strictly, our use of the quotation mark violates the general rule that all compound expressions in
our language should be delimited by parentheses and look like lists. We can recover this consistency
by introducing a special form
quote
, which serves the same purpose as the quotation mark. Thus, we
would type
(quote a)
instead of
’a
, and we would type
(quote (a b c))
instead of
’(a b
c)
. This is precisely how the interpreter works. The quotation mark is just a single-character
abbreviation for wrapping the next complete expression with
quote
to form
(quote
<
expression>)
. This is important because it maintains the principle that any expression seen by
the interpreter can be manipulated as a data object. For instance, we could construct the expression
(car ’(a b c))
, which is the same as
(car (quote (a b c)))
, by evaluating
(list ’car (list ’quote ’(a b c)))
.
35
We can consider two symbols to be ‘‘the same’’ if they consist of the same characters in the same
order. Such a definition skirts a deep issue that we are not yet ready to address: the meaning of
‘‘sameness’’ in a programming language. We will return to this in chapter 3 (section 3.1.3).