4.3 Divide-and-Conquer: Constructing Decision Trees
Before we created any of the nascent tree structures in Figure 4.2, the training examples at the root comprised nine yes and five no nodes, corresponding to an information value of
\[
\operatorname{info}([9,5]) = 0.940 \text{ bits.}
\]
Thus the tree in Figure 4.2(a) is responsible for an information gain of
\[
\operatorname{gain}(\mathit{outlook}) = \operatorname{info}([9,5]) - \operatorname{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits,}
\]
which can be interpreted as the informational value of creating a branch on the
outlook attribute.
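To make the arithmetic concrete, here is a minimal Python sketch of these calculations; the helper names info and gain are our own, chosen simply to mirror the notation above.

```python
from math import log2

def info(counts):
    """Information value (entropy) of a class distribution, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    """Information gain of a split: the parent's information value minus
    the size-weighted average information value of the daughter nodes."""
    total = sum(parent)
    return info(parent) - sum(sum(ch) / total * info(ch) for ch in children)

print(f"{info([9, 5]):.3f}")                            # 0.940
print(f"{gain([9, 5], [[2, 3], [4, 0], [3, 2]]):.3f}")  # 0.247
```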
The way forward is clear. We calculate the information gain for each attribute and choose the one that gains the most information to split on. In the situation of Figure 4.2,
\[
\begin{aligned}
\operatorname{gain}(\mathit{outlook}) &= 0.247 \text{ bits}\\
\operatorname{gain}(\mathit{temperature}) &= 0.029 \text{ bits}\\
\operatorname{gain}(\mathit{humidity}) &= 0.152 \text{ bits}\\
\operatorname{gain}(\mathit{windy}) &= 0.048 \text{ bits,}
\end{aligned}
\]
so we select outlook as the splitting attribute at the root of the tree. Hopefully
this accords with your intuition as the best one to select. It is the only choice
for which one daughter node is completely pure, and this gives it a considerable
advantage over the other attributes. Humidity is the next best choice because it
produces a larger daughter node that is almost completely pure.
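Reusing info and gain from the sketch above, selecting the splitting attribute is a straightforward maximization. The class-count tables below are read off the weather data; they reproduce the four gains just quoted.

```python
# Class counts [yes, no] in each daughter node of the candidate splits.
splits = {
    "outlook":     [[2, 3], [4, 0], [3, 2]],  # sunny, overcast, rainy
    "temperature": [[2, 2], [4, 2], [3, 1]],  # hot, mild, cool
    "humidity":    [[3, 4], [6, 1]],          # high, normal
    "windy":       [[6, 2], [3, 3]],          # false, true
}
for attr, children in splits.items():
    print(f"gain({attr}) = {gain([9, 5], children):.3f} bits")
best = max(splits, key=lambda a: gain([9, 5], splits[a]))  # -> "outlook"
```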
Then we continue, recursively. Figure 4.3 shows the possibilities for a further
branch at the node reached when outlook is sunny. Clearly, a further split on
outlook will produce nothing new, so we only consider the other three attributes.
The information gain for each turns out to be
\[
\begin{aligned}
\operatorname{gain}(\mathit{temperature}) &= 0.571 \text{ bits}\\
\operatorname{gain}(\mathit{humidity}) &= 0.971 \text{ bits}\\
\operatorname{gain}(\mathit{windy}) &= 0.020 \text{ bits,}
\end{aligned}
\]
so we select humidity as the splitting attribute at this point. Both daughter nodes of this split are pure, as Figure 4.3(b) shows, so there is no need to split them any further, and this branch is finished.
Continued application of the same idea leads to the decision tree of Figure
4.4 for the weather data. Ideally, the process terminates when all leaf nodes are
pure, that is, when they contain instances that all have the same classification.
However, it might not be possible to reach this happy situation because there is
nothing to stop the training set containing two examples with identical sets of
attributes but different classes. Consequently, we stop when the data cannot be
split any further.
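The complete recursive procedure can be captured in a short sketch. This is a minimal illustration under assumed data structures (each example a pair of an attribute dictionary and a class label; a tree either a class label or a pair of attribute name and branch dictionary), not an implementation from this book.

```python
from math import log2
from collections import Counter

def info_of(labels):
    """Information value (entropy), in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def build_tree(examples, attributes):
    """examples: list of (attribute_dict, class_label) pairs."""
    labels = [label for _, label in examples]
    # Stop when the node is pure, or when no attributes remain (identical
    # attribute values but different classes): leaf with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain_of(attr):
        groups = {}
        for ex, label in examples:
            groups.setdefault(ex[attr], []).append(label)
        split_info = sum(len(g) / len(labels) * info_of(g) for g in groups.values())
        return info_of(labels) - split_info

    best = max(attributes, key=gain_of)  # split on the largest information gain
    remaining = [a for a in attributes if a != best]
    branches = {}
    for ex, label in examples:
        branches.setdefault(ex[best], []).append((ex, label))
    return (best, {value: build_tree(subset, remaining)
                   for value, subset in branches.items()})
```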
Calculating information
Now it is time to explain how to calculate the information measure that is used
as a basis for evaluating different splits. We describe the basic idea in this section,
then in the next we examine a correction that is usually made to counter a bias
toward selecting splits on attributes with large numbers of possible values.
Before examining the detailed formula for calculating the amount of infor-
mation required to specify the class of an example given that it reaches a tree
node with a certain number of yes’s and no’s, consider first the kind of proper-
ties we would expect this quantity to have:
Figure 4.3 Expanded tree stumps for the weather data: beneath outlook = sunny, (a) a further split on temperature, (b) on humidity, and (c) on windy.
1. When the number of either yes’s or no’s is zero, the information is
zero.
2. When the number of yes’s and no’s is equal, the information reaches a
maximum.
Moreover, the measure should be applicable to multiclass situations, not just to
two-class ones.
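As a quick numerical check of both properties, and of the multiclass case, here is a small illustrative sketch assuming the entropy formula that the measure turns out to be (the text develops it next):

```python
from math import log2

def info(counts):
    """Information value (entropy) of a class distribution, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(info([0, 4]))     # property 1: a pure node yields 0.0 bits
print(info([2, 2]))     # property 2: an even split yields the maximum, 1.0 bit
print(info([2, 3, 4]))  # multiclass distributions are handled the same way
```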
The information measure relates to the amount of information obtained by
making a decision, and a more subtle property of information can be derived
by considering the nature of decisions. Decisions can be made in a single stage,
or they can be made in several stages, and the amount of information involved
is the same in both cases. For example, the decision involved in
\[
\operatorname{info}([2,3,4])
\]
can be made in two stages. First decide whether it's the first case or one of the other two cases:
\[
\operatorname{info}([2,7])
\]
and then decide which of the other two cases it is:
\[
\operatorname{info}([3,4]).
\]
In some cases the second decision will not need to be made, namely, when the decision turns out to be the first one. Taking this into account leads to the equation
\[
\operatorname{info}([2,3,4]) = \operatorname{info}([2,7]) + \tfrac{7}{9} \times \operatorname{info}([3,4]).
\]
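Continuing the check above (reusing the info helper), the identity can be verified numerically:

```python
print(info([2, 3, 4]))                      # 1.5305 bits, deciding in one stage
print(info([2, 7]) + 7 / 9 * info([3, 4]))  # 1.5305 bits, deciding in two stages
```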
Figure 4.4 Decision tree for the weather data: outlook = sunny leads to a humidity test (high → no, normal → yes), outlook = overcast leads directly to yes, and outlook = rainy leads to a windy test (false → yes, true → no).