Then it is necessary to break the symmetry and choose
a single test for the root
node. If, for example, a is chosen, the second rule must, in effect, be repeated
twice in the tree, as shown in Figure 3.2. This is known as the replicated subtree
problem.
The replicated subtree problem is sufficiently important that it is worth
looking at a couple more examples. The diagram on the left of Figure 3.3 shows
an exclusive-or function for which the output is a if x
= 1 or y = 1 but not both.
To make this into a tree, you have to split on one attribute first, leading to a
structure like the one shown in the center. In contrast, rules can faithfully reflect
the true symmetry of the problem with respect to the attributes, as shown on
the right.
6 6
C H A P T E R 3
|
O U T P U T: K N OW L E D G E R E P R E S E N TAT I O N
a
b
y
c
n
x
y
c
n
d
y
n
x
y
n
d
y
n
x
y
n
Figure 3.2 Decision tree for a simple disjunction.
P088407-Ch003.qxd 4/30/05 11:09 AM Page 66
In this example the rules are not notably more compact than the tree. In fact,
they are just what you would get by reading rules off the tree in the obvious
way. But in other situations, rules are much more compact than trees, particu-
larly if it is possible to have a “default” rule that covers cases not specified by the
other rules. For example, to capture the effect of the rules in Figure 3.4—in
which there are four attributes, x, y, z, and w, that can each be 1, 2, or 3—requires
the tree shown on the right. Each of the three small gray triangles to the upper
right should actually contain the whole three-level subtree that is displayed in
gray, a rather extreme example of the replicated subtree problem. This is a dis-
tressingly complex description of a rather simple concept.
One reason why rules are popular is that each rule seems to represent an inde-
pendent “nugget” of knowledge. New rules can be added to an existing rule set
without disturbing ones already there, whereas to add to a tree structure may
require reshaping the whole tree. However, this independence is something of
an illusion, because it ignores the question of how the rule set is executed. We
explained earlier (on page 11) the fact that if rules are meant to be interpreted
in order as a “decision list,” some of them, taken individually
and out of context,
may be incorrect. On the other hand, if the order of interpretation is supposed
to be immaterial, then it is not clear what to do when different rules lead to dif-
ferent conclusions for the same instance. This situation cannot arise for rules
that are read directly off a decision tree because the redundancy included in the
structure of the rules prevents any ambiguity in interpretation. But it does arise
when rules are generated in other ways.
If a rule set gives multiple classifications for a particular example, one solu-
tion is to give no conclusion at all. Another is to count how often each rule fires
on the training data and go with the most popular one. These strategies can lead
3 . 3
C L A S S I F I C AT I O N RU L E S
6 7
a
b
1
0
b
a
0
1
x = 1 ?
y = 1 ?
no
y = 1 ?
yes
b
no
a
yes
a
no
b
yes
If x=1 and y=0 then class = a
If x=0 and y=1 then class = a
If x=0 and y=0 then class = b
If x=1 and y=1 then class = b
Figure 3.3 The exclusive-or problem.
P088407-Ch003.qxd 4/30/05 11:09 AM Page 67
to radically different results. A different problem
occurs when an instance is
encountered that the rules fail to classify at all. Again, this cannot occur with
decision trees, or with rules read directly off them, but it can easily happen with
general rule sets. One way of dealing with this situation is to fail to classify such
an example; another is to choose the most frequently occurring class as a default.
Again, radically different results may be obtained for these strategies. Individ-
ual rules are simple, and sets of rules seem deceptively simple—but given just
a set of rules with no additional information, it is not clear how it should be
interpreted.
A particularly straightforward situation occurs when rules lead to a class that
is Boolean (say, yes and no) and when only rules leading to one outcome (say,
yes) are expressed. The assumption is that if a particular instance is not in class
6 8
C H A P T E R 3
|
O U T P U T: K N OW L E D G E R E P R E S E N TAT I O N
x
y
1
2
3
a
1
z
2
3
w
1
b
2
b
3
a
1
b
2
b
3
If x=1 and y=1 then class = a
If z=1 and w=1 then class = a
Otherwise class = b
Figure 3.4 Decision tree with a replicated subtree.
P088407-Ch003.qxd 4/30/05 11:09 AM Page 68