constructed. We will not prove this optimality of Huffman codes here, but we will show how Huffman
trees are constructed.
42
The algorithm for generating a Huffman tree is very simple. The idea is to arrange the tree so that the
symbols with the lowest frequency appear farthest away from the root. Begin with the set of leaf
nodes, containing symbols and their frequencies, as determined by the initial data from which the code
is to be constructed. Now find two leaves with the lowest weights and merge them to produce a node
that has these two nodes as its left and right branches. The weight of the new node is the sum of the
two weights. Remove the two leaves from the original set and replace them by this new node. Now
continue this process. At each step, merge two nodes with the smallest weights, removing them from
the set and replacing them with a node that has these two as its left and right branches. The process
stops when there is only one node left, which is the root of the entire tree. Here is how the Huffman
tree of figure 2.18 was generated:
Initial leaves
{(A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (G 1) (H 1)}
Merge
{(A 8) (B 3) ({C D} 2) (E 1) (F 1) (G 1) (H 1)}
Merge
{(A 8) (B 3) ({C D} 2) ({E F} 2) (G 1) (H 1)}
Merge
{(A 8) (B 3) ({C D} 2) ({E F} 2) ({G H} 2)}
Merge
{(A 8) (B 3) ({C D} 2) ({E F G H} 4)}
Merge
{(A 8) ({B C D} 5) ({E F G H} 4)}
Merge
{(A 8) ({B C D E F G H} 9)}
Final merge
{({A B C D E F G H} 17)}
The algorithm does not always specify a unique tree, because there may not be unique smallest-weight
nodes at each step. Also, the choice of the order in which the two nodes are merged (i.e., which will be
the right branch and which will be the left branch) is arbitrary.
Representing Huffman trees
In the exercises below we will work with a system that uses Huffman trees to encode and decode
messages and generates Huffman trees according to the algorithm outlined above. We will begin by
discussing how trees are represented.
Leaves of the tree are represented by a list consisting of the symbol
leaf
, the symbol at the leaf, and
the weight:
(define (make-leaf symbol weight)
(list ’leaf symbol weight))
(define (leaf? object)
(eq? (car object) ’leaf))
(define (symbol-leaf x) (cadr x))
(define (weight-leaf x) (caddr x))
A general tree will be a list of a left branch, a right branch, a set of symbols, and a weight. The set of
symbols will be simply a list of the symbols, rather than some more sophisticated set representation.
When we make a tree by merging two nodes, we obtain the weight of the tree as the sum of the
weights of the nodes, and the set of symbols as the union of the sets of symbols for the nodes. Since
our symbol sets are represented as lists, we can form the union by using the
append
procedure we
defined in section 2.2.1:
(define (make-code-tree left right)
(list left
right
(append (symbols left) (symbols right))
(+ (weight left) (weight right))))
If we make a tree in this way, we have the following selectors:
(define (left-branch tree) (car tree))
(define (right-branch tree) (cadr tree))
(define (symbols tree)
(if (leaf? tree)
(list (symbol-leaf tree))
(caddr tree)))
(define (weight tree)
(if (leaf? tree)
(weight-leaf tree)
(cadddr tree)))
The procedures
symbols
and
weight
must do something slightly different depending on whether
they are called with a leaf or a general tree. These are simple examples of generic procedures
(procedures that can handle more than one kind of data), which we will have much more to say about
in sections 2.4 and 2.5.
The decoding procedure
The following procedure implements the decoding algorithm. It takes as arguments a list of zeros and
ones, together with a Huffman tree.
(define (decode bits tree)
(define (decode-1 bits current-branch)
(if (null? bits)
’()
(let ((next-branch
(choose-branch (car bits) current-branch)))
(if (leaf? next-branch)
(cons (symbol-leaf next-branch)
(decode-1 (cdr bits) tree))
(decode-1 (cdr bits) next-branch)))))
(decode-1 bits tree))
(define (choose-branch bit branch)
(cond ((= bit 0) (left-branch branch))
((= bit 1) (right-branch branch))
(else (error "bad bit -- CHOOSE-BRANCH" bit))))
The procedure
decode-1
takes two arguments: the list of remaining bits and the current position in
the tree. It keeps moving ‘‘down’’ the tree, choosing a left or a right branch according to whether the
next bit in the list is a zero or a one. (This is done with the procedure
choose-branch
.) When it
reaches a leaf, it returns the symbol at that leaf as the next symbol in the message by
cons
ing it onto
the result of decoding the rest of the message, starting at the root of the tree. Note the error check in
the final clause of
choose-branch
, which complains if the procedure finds something other than a
zero or a one in the input data.
Sets of weighted elements
In our representation of trees, each non-leaf node contains a set of symbols, which we have
represented as a simple list. However, the tree-generating algorithm discussed above requires that we
also work with sets of leaves and trees, successively merging the two smallest items. Since we will be
required to repeatedly find the smallest item in a set, it is convenient to use an ordered representation
for this kind of set.
We will represent a set of leaves and trees as a list of elements, arranged in increasing order of weight.
The following
adjoin-set
procedure for constructing sets is similar to the one described in
exercise 2.61; however, items are compared by their weights, and the element being added to the set is
never already in it.
(define (adjoin-set x set)
(cond ((null? set) (list x))
((< (weight x) (weight (car set))) (cons x set))
(else (cons (car set)
(adjoin-set x (cdr set))))))
The following procedure takes a list of symbol-frequency pairs such as
((A 4) (B 2) (C 1) (D
1))
and constructs an initial ordered set of leaves, ready to be merged according to the Huffman
algorithm:
(define (make-leaf-set pairs)
(if (null? pairs)
’()
(let ((pair (car pairs)))
(adjoin-set (make-leaf (car pair) ; symbol
(cadr pair)) ; frequency
(make-leaf-set (cdr pairs))))))
Exercise 2.67. Define an encoding tree and a sample message:
(define sample-tree
(make-code-tree (make-leaf ’A 4)
(make-code-tree
(make-leaf ’B 2)
(make-code-tree (make-leaf ’D 1)
(make-leaf ’C 1)))))
(define sample-message ’(0 1 1 0 0 1 0 1 0 1 1 1 0))
Dostları ilə paylaş: |