Alternatively, a three-way split may be used, in which case there are several
different possibilities. If a missing value is treated as an attribute value in
its own right, that will create a third branch. An alternative for an
integer-valued attribute would be a three-way split into less than, equal to,
and greater than. An alternative for a real-valued attribute, for which equal
to is not such a meaningful option, would be to test against an interval rather
than a single constant, again giving a three-way split: below, within, and
above. A numeric attribute is often tested several times in any given path down
the tree from root to leaf, each test involving a different constant. We return
to this when describing the handling of numeric attributes in Section 6.1.
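In code, the interval form of the test might look something like the following
sketch. The class and method names here are invented for illustration; they are
not taken from Weka or any other system.

public class IntervalTest {

    public enum Branch { BELOW, WITHIN, ABOVE }

    private final double lo;   // lower end of the interval
    private final double hi;   // upper end of the interval

    public IntervalTest(double lo, double hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Route a real-valued attribute into one of three branches:
    // below the interval, within it, or above it.
    public Branch branch(double value) {
        if (value < lo) return Branch.BELOW;
        if (value > hi) return Branch.ABOVE;
        return Branch.WITHIN;
    }
}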
Missing values pose an obvious problem. It is not clear which branch should be
taken when a node tests an attribute whose value is missing. Sometimes, as
described in Section 2.4, a missing value is treated as an attribute value in
its own right. If this is not the case, missing values should be treated in a
special way rather than being considered as just another possible value that
the attribute might take. A simple solution is to record the number of elements
in the training set that go down each branch and to use the most popular branch
if the value for a test instance is missing.
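This strategy amounts to little more than keeping a count for each branch while
the tree is being built. The sketch below is a hypothetical illustration, not
code from Weka.

public class PopularBranch {

    private final int[] trainingCounts;   // training instances sent down each branch

    public PopularBranch(int[] trainingCounts) {
        this.trainingCounts = trainingCounts;
    }

    // When the tested attribute's value is missing, fall back to the branch
    // that received the most training instances.
    public int mostPopularBranch() {
        int best = 0;
        for (int b = 1; b < trainingCounts.length; b++) {
            if (trainingCounts[b] > trainingCounts[best]) {
                best = b;
            }
        }
        return best;
    }
}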
A more sophisticated solution is to notionally split the instance into pieces
and send part of it down each branch and from there right on down to the leaves
of the subtrees involved. The split is accomplished using a numeric weight
between zero and one, and the weight for a branch is chosen to be proportional
to the number of training instances going down that branch, all weights summing
to one. A weighted instance may be further split at a lower node. Eventually,
the various parts of the instance will each reach a leaf node, and the
decisions at these leaf nodes must be recombined using the weights that have
percolated down to the leaves. We return to this in Section 6.1.
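For a tree with two-way numeric splits, the weighted scheme can be sketched as
a recursive classification routine along the following lines. The node
structure, field names, and the use of a map from attribute names to values are
assumptions made for the example, not the representation of any particular
learner.

import java.util.Map;

abstract class TreeNode {
    // Return a class distribution for the instance, scaled by the given weight.
    abstract double[] distribution(Map<String, Double> instance, double weight);
}

class LeafNode extends TreeNode {
    private final double[] classProportions;   // class proportions observed at this leaf

    LeafNode(double[] classProportions) {
        this.classProportions = classProportions;
    }

    double[] distribution(Map<String, Double> instance, double weight) {
        double[] d = new double[classProportions.length];
        for (int c = 0; c < d.length; c++) {
            d[c] = weight * classProportions[c];
        }
        return d;
    }
}

class SplitNode extends TreeNode {
    private final String attribute;      // attribute tested at this node
    private final double threshold;      // values less than the threshold go left
    private final double leftFraction;   // share of training instances that went left
    private final TreeNode left, right;

    SplitNode(String attribute, double threshold, double leftFraction,
              TreeNode left, TreeNode right) {
        this.attribute = attribute;
        this.threshold = threshold;
        this.leftFraction = leftFraction;
        this.left = left;
        this.right = right;
    }

    double[] distribution(Map<String, Double> instance, double weight) {
        Double value = instance.get(attribute);
        if (value == null) {   // missing value: split the instance into weighted pieces
            double[] l = left.distribution(instance, weight * leftFraction);
            double[] r = right.distribution(instance, weight * (1.0 - leftFraction));
            for (int c = 0; c < l.length; c++) {
                l[c] += r[c];
            }
            return l;
        }
        return (value < threshold ? left : right).distribution(instance, weight);
    }
}

Calling distribution(instance, 1.0) on the root returns the recombined class
distribution, from which the most probable class can be read off.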
It is instructive and can even be entertaining to build a decision tree for a
dataset manually. To do so effectively, you need a good way of visualizing the
data so that you can decide which are likely to be the best attributes to test
and what an appropriate test might be. The Weka Explorer, described in Part II,
has a User Classifier facility that allows users to construct a decision tree
interactively. It presents you with a scatter plot of the data against two
attributes that you choose. When you find a pair of attributes that
discriminates the classes well, you can create a two-way split by drawing a
polygon around the appropriate data points on the scatter plot.
For example, in Figure 3.1(a) the user is operating on a dataset with three
classes, the iris dataset, and has found two attributes, petallength and
petalwidth, that do a good job of splitting up the classes. A rectangle has
been drawn manually to separate out one of the classes (Iris versicolor). Then
the user switches to the decision tree view in Figure 3.1(b) to see the tree so
far. The left-hand leaf node contains predominantly irises of one type (Iris
versicolor, contaminated by only two virginicas); the right-hand one contains
predominantly two types (Iris setosa and virginica, contaminated by only two
versicolors). The user will probably select the right-hand leaf and work on it
next, splitting it further with another rectangle, perhaps based on a different
pair of attributes (although, from Figure 3.1[a], these two look pretty good).

Figure 3.1 Constructing a decision tree interactively: (a) creating a
rectangular test involving petallength and petalwidth and (b) the resulting
(unfinished) decision tree.
Section 10.2 explains how to use Weka’s User Classifier facility. Most people
enjoy making the first few decisions but rapidly lose interest thereafter, and one
very useful option is to select a machine learning method and let it take over at
any point in the decision tree. Manual construction of decision trees is a good
way to get a feel for the tedious business of evaluating different combinations
of attributes to split on.
3.3 Classification rules
Classification rules are a popular alternative to decision trees, and we have
already seen examples for the weather (page 10), contact lens (page 13), iris
(page 15), and soybean (page 18) datasets. The antecedent, or precondition, of
a rule is a series of tests just like the tests at nodes in decision trees, and
the consequent, or conclusion, gives the class or classes that apply to
instances covered by that rule, or perhaps gives a probability distribution
over the classes. Generally, the preconditions are logically ANDed together,
and all the tests must succeed if the rule is to fire. However, in some rule
formulations the preconditions are general logical expressions rather than
simple conjunctions. We often think of the individual rules as being
effectively logically ORed together: if any one applies, the class (or
probability distribution) given in its conclusion is applied to the instance.
However, conflicts arise when several rules with different conclusions apply;
we will return to this shortly.
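A single rule of this kind can be represented as a list of tests plus a
conclusion, firing only when every test succeeds. The sketch below is
illustrative; the class name and the choice of string-valued attributes are
assumptions made for the example.

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class Rule {

    private final List<Predicate<Map<String, String>>> tests;   // ANDed preconditions
    private final String conclusion;                            // class assigned when the rule fires

    Rule(List<Predicate<Map<String, String>>> tests, String conclusion) {
        this.tests = tests;
        this.conclusion = conclusion;
    }

    // The rule fires only if the instance passes every test in the antecedent.
    boolean fires(Map<String, String> instance) {
        return tests.stream().allMatch(t -> t.test(instance));
    }

    String conclusion() {
        return conclusion;
    }
}

A rule set would then assign to an instance the conclusion of whichever rules
fire, which is exactly where the conflicts just mentioned can arise.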
It is easy to read a set of rules directly off a decision tree. One rule is
generated for each leaf. The antecedent of the rule includes a condition for
every node on the path from the root to that leaf, and the consequent of the
rule is the class assigned by the leaf. This procedure produces rules that are
unambiguous in that the order in which they are executed is irrelevant.
However, in general, rules that are read directly off a decision tree are far
more complex than necessary, and rules derived from trees are usually pruned
to remove redundant tests.
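The procedure itself is a straightforward traversal: walk from the root to each
leaf, collecting the test made on every branch along the way. The minimal tree
representation below is invented for the sketch.

import java.util.ArrayList;
import java.util.List;

class RuleExtractor {

    // Assumed tree structure: each node records the condition on the branch
    // leading into it (null at the root) and, at a leaf, the class it assigns.
    static class Node {
        String condition;                          // e.g. "outlook = sunny"
        String leafClass;                          // non-null only at leaves
        List<Node> children = new ArrayList<>();
    }

    // Emit one rule per leaf: the antecedent is the conjunction of the
    // conditions on the path from the root to that leaf.
    static void extract(Node node, List<String> pathTests, List<String> rules) {
        List<String> tests = new ArrayList<>(pathTests);
        if (node.condition != null) {
            tests.add(node.condition);
        }
        if (node.children.isEmpty()) {
            rules.add("IF " + String.join(" AND ", tests) + " THEN " + node.leafClass);
            return;
        }
        for (Node child : node.children) {
            extract(child, tests, rules);
        }
    }
}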
Because decision trees cannot easily express the disjunction implied among
the different rules in a set, transforming a general set of rules into a tree is not
quite so straightforward. A good illustration of this occurs when the rules have
the same structure but different attributes, like:
If a and b then x
If c and d then x