Informally, a set is simply a collection of distinct objects. To give a more
precise definition we can
employ the method of data abstraction. That is, we define ‘‘set’’ by specifying the operations that are
to be used on sets. These are
union-set
,
intersection-set
,
element-of-set?
, and
adjoin-set
.
Element-of-set?
is a predicate that determines whether a given element is a
member of a set.
Adjoin-set
takes an object and a set as arguments and returns a set that contains
the elements of the original set and also the adjoined element.
Union-set
computes the union of two
sets, which is the set containing each element that appears in either argument.
Intersection-set
computes the intersection of two sets, which is the set containing only elements that appear in both
arguments. From the viewpoint of data abstraction, we are free to design any representation that
implements these operations in a way consistent with the interpretations given above.
37
Sets as unordered lists
One way to represent a set is as a list of its elements in which no element appears more than once. The
empty set is represented by the empty list. In this representation,
element-of-set?
is similar to
the procedure
memq
of section 2.3.1. It uses
equal?
instead of
eq?
so that the set elements need not
be symbols:
(define (element-of-set? x set)
(cond ((null? set) false)
((equal? x (car set)) true)
(else (element-of-set? x (cdr set)))))
Using this, we can write
adjoin-set
. If the object to be adjoined is already in the set, we just return
the set. Otherwise, we use
cons
to add the object to the list that represents the set:
(define (adjoin-set x set)
(if (element-of-set? x set)
set
(cons x set)))
For
intersection-set
we can use a recursive strategy. If we know how to form the intersection
of
set2
and the
cdr
of
set1
, we only need to decide whether to include the
car
of
set1
in this.
But this depends on whether
(car set1)
is also in
set2
. Here is the resulting procedure:
(define (intersection-set set1 set2)
(cond ((or (null? set1) (null? set2)) ’())
((element-of-set? (car set1) set2)
(cons (car set1)
(intersection-set (cdr set1) set2)))
(else (intersection-set (cdr set1) set2))))
In designing a representation, one of the issues we should be concerned with is efficiency. Consider
the number of steps required by our set operations. Since they all use
element-of-set?
, the speed
of this operation has a major impact on the efficiency of the set implementation as a whole. Now, in
order to check whether an object is a member of a set,
element-of-set?
may have to scan the
entire set. (In the worst case, the object turns out not to be in the set.) Hence, if the set has n elements,
element-of-set?
might take up to n steps. Thus, the number of steps required grows as (n).
The number of steps required by
adjoin-set
, which uses this operation, also grows as (n). For
intersection-set
, which does an
element-of-set?
check for each element of
set1
, the
number of steps required grows as the product of the sizes of the sets involved, or (n
2
) for two sets
of size
n.
The same will be true of
union-set
.
Exercise 2.59. Implement the
union-set
operation for the unordered-list representation of sets.
Exercise 2.60. We specified that a set would be represented as a list with no duplicates. Now suppose
we allow duplicates. For instance, the set {1,2,3} could be represented as the list
(2 3 2 1 3 2
2)
. Design procedures
element-of-set?
,
adjoin-set
,
union-set
, and
intersection-set
that operate on this representation. How does the efficiency of each compare
with the corresponding procedure for the non-duplicate representation? Are there applications for
which you would use this representation in preference to the non-duplicate one?
Sets as ordered lists
One way to speed up our set operations is to change the representation so that the set elements are
listed in increasing order. To do this, we need some way to compare two objects so that we can say
which is bigger. For example, we could compare symbols lexicographically, or we could agree on
some method for assigning a unique number to an object and then compare the elements by comparing
the corresponding numbers. To keep our discussion simple, we will consider only the case where the
set elements are numbers, so that we can compare elements using
>
and
<
. We will represent a set of
numbers by listing its elements in increasing order. Whereas our first representation above allowed us
to represent the set {1,3,6,10} by listing the elements in any order, our new representation allows only
the list
(1 3 6 10)
.
One advantage of ordering shows up in
element-of-set?
: In checking for the presence of an
item, we no longer have to scan the entire set. If we reach a set element that is larger than the item we
are looking for, then we know that the item is not in the set:
(define (element-of-set? x set)
(cond ((null? set) false)
((= x (car set)) true)
((< x (car set)) false)
(else (element-of-set? x (cdr set)))))
How many steps does this save? In the worst case, the item we are looking for may be the largest one
in the set, so the number of steps is the same as for the unordered representation. On the other hand, if
we search for items of many different sizes we can expect that sometimes we will be able to stop
searching at a point near the beginning of the list and that other times we will still need to examine
most of the list. On the average we should expect to have to examine about half of the items in the set.
Thus, the average number of steps required will be about n/2. This is still (n) growth, but it does
save us, on the average, a factor of 2 in number of steps over the previous implementation.
We obtain a more impressive speedup with
intersection-set
. In the unordered representation
this operation required (n
2
) steps, because we performed a complete scan of
set2
for each element
of
set1
. But with the ordered representation, we can use a more clever method.
Begin by comparing
the initial elements,
x1
and
x2
, of the two sets. If
x1
equals
x2
, then that gives an element of the
intersection, and the rest of the intersection is the intersection of the
cdr
s of the two sets. Suppose,
however, that
x1
is less than
x2
. Since
x2
is the smallest element in
set2
, we can immediately
conclude that
x1
cannot appear anywhere in
set2
and hence is not in the intersection. Hence, the
intersection is equal to the intersection of
set2
with the
cdr
of
set1
. Similarly, if
x2
is less than
x1
, then the intersection is given by the intersection of
set1
with the
cdr
of
set2
. Here is the
procedure: