with
a balanced tree, adding elements with
adjoin-set
may produce an unbalanced result. Since
the position of a newly adjoined element depends on how the element compares with the items already
in the set, we can expect that if we add elements ‘‘randomly’’ the tree will tend to be balanced on the
average. But this is not a guarantee. For example, if we start with an empty set and adjoin the numbers
1 through 7 in sequence we end up with the highly unbalanced tree shown in figure 2.17. In this tree
all the left subtrees are empty, so it has no advantage over a simple ordered list. One way to solve this
problem is to define an operation that transforms an arbitrary tree into a balanced tree with the same
elements. Then we can perform this transformation after every few
adjoin-set
operations to keep
our set in balance. There are also other ways to solve this problem, most of which involve designing
new data structures for which searching and insertion both can be done in (
log
n) steps.
40
Figure 2.17: Unbalanced tree produced by adjoining 1 through 7 in sequence.
Figure 2.17: Unbalanced tree produced by adjoining 1 through 7 in sequence.
Exercise 2.63. Each of the following two procedures converts a binary tree to a list.
(define (tree->list-1 tree)
(if (null? tree)
’()
(append (tree->list-1 (left-branch tree))
(cons (entry tree)
(tree->list-1 (right-branch tree))))))
(define (tree->list-2 tree)
(define (copy-to-list tree result-list)
(if (null? tree)
result-list
(copy-to-list (left-branch tree)
(cons (entry tree)
(copy-to-list (right-branch tree)
result-list)))))
(copy-to-list tree ’()))
a. Do the two procedures produce the same result for every tree? If not, how do the results differ?
What lists do the two procedures produce for the trees in figure 2.16?
b. Do the two procedures have the same order of growth in the number of steps required to convert a
balanced tree with n elements to a list? If not, which one grows more slowly?
Exercise 2.64. The
following procedure
list->tree
converts an ordered list to a balanced binary
tree. The helper procedure
partial-tree
takes as arguments an integer n and list of at least n
elements and constructs a balanced tree containing the first n elements of the list. The result returned
by
partial-tree
is a pair (formed with
cons
) whose
car
is the constructed tree and whose
cdr
is the list of elements not included in the tree.
(define (list->tree elements)
(car (partial-tree elements (length elements))))
(define (partial-tree elts n)
(if (= n 0)
(cons ’() elts)
(let ((left-size (quotient (- n 1) 2)))
(let ((left-result (partial-tree elts left-size)))
(let ((left-tree (car left-result))
(non-left-elts (cdr left-result))
(right-size (- n (+ left-size 1))))
(let ((this-entry (car non-left-elts))
(right-result (partial-tree (cdr non-left-elts)
right-size)))
(let ((right-tree (car right-result))
(remaining-elts (cdr right-result)))
(cons (make-tree this-entry left-tree right-tree)
remaining-elts))))))))
a. Write a short paragraph explaining as clearly as you can how
partial-tree
works. Draw the
tree produced by
list->tree
for the list
(1 3 5 7 9 11)
.
b. What is the order of growth in the number of steps required by
list->tree
to convert a list of n
elements?
Exercise 2.65. Use the results of exercises 2.63 and 2.64 to give (n) implementations of
union-set
and
intersection-set
for sets implemented as (balanced) binary trees.
41
Sets and information retrieval
We have examined options for using lists to represent sets and have seen how the choice of
representation for a data object can have a large impact on the performance of the programs that use
the data. Another reason for concentrating on sets is that the techniques discussed here appear again
and again in applications involving information retrieval.
Consider a data base containing a large number of individual records, such as the personnel files for a
company or the transactions in an accounting system. A typical data-management system spends a
large amount of time accessing or modifying the data in the records and therefore requires an efficient
method for accessing records. This is done by identifying a part of each record to serve as an
identifying key. A key can be anything that uniquely identifies the record. For a personnel file, it might
be an employee’s ID number. For an accounting system, it might be a transaction number. Whatever
the key is, when we define the record as a data structure we should include a
key
selector procedure
that retrieves the key associated with a given record.