makes real-life datasets interesting is that the attributes are certainly not equally
important or independent. But it leads to a simple scheme that again works
surprisingly well in practice.
Table 4.2 shows a summary of the weather data obtained by counting how
many times each attribute–value pair occurs with each value (yes and no) for
play. For example, you can see from Table 1.2 that outlook is sunny for five
examples, two of which have play = yes and three of which have play = no. The cells
in the first row of the new table simply count these occurrences for all possible
values of each attribute, and the play figure in the final column counts the total
number of occurrences of yes and no. In the lower part of the table, we rewrote
the same information in the form of fractions, or observed probabilities. For
example, of the nine days that play is yes, outlook is sunny for two, yielding a
fraction of 2/9. For play the fractions are different: they are the proportion of
days that play is yes and no, respectively.
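The step from counts to observed probabilities can be sketched in Python. This is a minimal illustration, not a definitive implementation: the counts are copied from the upper part of Table 4.2, while the dictionary layout and the function names `observed_probabilities` and `class_priors` are assumptions made for this example. Note that `Fraction` reduces to lowest terms, so 3/9 is stored as 1/3.

```python
from fractions import Fraction

# Counts from the upper part of Table 4.2: attribute -> value -> (yes, no).
# The nested-dictionary layout is an assumption made for this sketch.
counts = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "windy": {"false": (6, 2), "true": (3, 3)},
}
class_counts = {"yes": 9, "no": 5}  # the Play column


def observed_probabilities(counts, class_counts):
    """Divide each attribute-value count by the total for its class."""
    probs = {}
    for attribute, values in counts.items():
        probs[attribute] = {
            value: {
                "yes": Fraction(n_yes, class_counts["yes"]),
                "no": Fraction(n_no, class_counts["no"]),
            }
            for value, (n_yes, n_no) in values.items()
        }
    return probs


def class_priors(class_counts):
    """Proportion of days on which play is yes and no, respectively."""
    total = sum(class_counts.values())
    return {label: Fraction(n, total) for label, n in class_counts.items()}


probs = observed_probabilities(counts, class_counts)
priors = class_priors(class_counts)
print(probs["outlook"]["sunny"]["yes"])  # 2/9
print(priors["yes"])                     # 9/14
```

Exact fractions are used here only so that the output can be compared directly with the lower part of the table; floating-point values would serve equally well.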
Now suppose we encounter a new example with the values that are shown in
Table 4.3. We treat the five features in Table 4.2 (outlook, temperature,
humidity, windy, and the overall likelihood that play is yes or no) as equally
important, independent pieces of evidence and multiply the corresponding fractions.
Looking at the outcome yes gives:

$$\text{likelihood of yes} = \frac{2}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14} = 0.0053.$$

The fractions are taken from the yes entries in the table according to the values
of the attributes for the new day, and the final 9/14 is the overall fraction
Table 4.2  The weather data with counts and probabilities.

Outlook              Temperature          Humidity             Windy                Play
          yes   no             yes   no             yes   no             yes   no    yes    no
sunny       2    3   hot         2    2   high        3    4   false       6    2      9     5
overcast    4    0   mild        4    2   normal      6    1   true        3    3
rainy       3    2   cool        3    1

sunny     2/9  3/5   hot       2/9  2/5   high      3/9  4/5   false     6/9  2/5   9/14  5/14
overcast  4/9  0/5   mild      4/9  2/5   normal    6/9  1/5   true      3/9  3/5
rainy     3/9  2/5   cool      3/9  1/5
Table 4.3  A new day.

Outlook   Temperature   Humidity   Windy   Play
sunny     cool          high       true    ?
representing the proportion of days on which play is yes. A similar calculation
for the outcome no leads to

$$\text{likelihood of no} = \frac{3}{5} \times \frac{1}{5} \times \frac{4}{5} \times \frac{3}{5} \times \frac{5}{14} = 0.0206.$$

This indicates that for the new day, no is more likely than yes (four times more
likely). The numbers can be turned into probabilities by normalizing them so
that they sum to 1:

$$\text{Probability of yes} = \frac{0.0053}{0.0053 + 0.0206} = 20.5\%,$$

$$\text{Probability of no} = \frac{0.0206}{0.0053 + 0.0206} = 79.5\%.$$

This simple and intuitive method is based on Bayes’s rule of conditional
probability. Bayes’s rule says that if you have a hypothesis H and evidence E that bears
on that hypothesis, then

$$\Pr[H \mid E] = \frac{\Pr[E \mid H]\,\Pr[H]}{\Pr[E]}.$$

We use the notation that Pr[A] denotes the probability of an event A and that
Pr[A | B] denotes the probability of A conditional on another event B. The
hypothesis H is that play will be, say, yes, and Pr[H | E] is going to turn out to be
20.5%, just as determined previously. The evidence E is the particular
combination of attribute values for the new day, outlook = sunny, temperature = cool,
humidity = high, and windy = true. Let’s call these four pieces of evidence
$E_1$, $E_2$, $E_3$, and $E_4$, respectively. Assuming that these pieces of evidence are independent
(given the class), their combined probability is obtained by multiplying the
probabilities:

$$\Pr[\mathit{yes} \mid E] = \frac{\Pr[E_1 \mid \mathit{yes}] \times \Pr[E_2 \mid \mathit{yes}] \times \Pr[E_3 \mid \mathit{yes}] \times \Pr[E_4 \mid \mathit{yes}] \times \Pr[\mathit{yes}]}{\Pr[E]}.$$

Don’t worry about the denominator: we will ignore it and eliminate it in the
final normalizing step when we make the probabilities of yes and no sum to 1,
just as we did previously. The Pr[yes] at the end is the probability of a yes
outcome without knowing any of the evidence E, that is, without knowing
anything about the particular day referenced; it’s called the prior probability of the
hypothesis H. In this case, it’s just 9/14, because 9 of the 14 training examples
had a yes value for play. Substituting the fractions in Table 4.2 for the
appropriate evidence probabilities leads to

$$\Pr[\mathit{yes} \mid E] = \frac{\frac{2}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14}}{\Pr[E]},$$
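The whole calculation for the new day of Table 4.3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the fractions are copied from the lower part of Table 4.2, and the names `evidence`, `likelihoods`, and `normalize` are introduced only for this example. Because both classes share the same denominator Pr[E], dividing each product by their sum yields the probabilities without ever computing Pr[E].

```python
from fractions import Fraction
from math import prod

# Fractions from Table 4.2 for the new day of Table 4.3, in the order
# outlook = sunny, temperature = cool, humidity = high, windy = true.
evidence = {
    "yes": [Fraction(2, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9)],
    "no": [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(3, 5)],
}
# Prior probabilities: the Play column of Table 4.2.
priors = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}


def likelihoods(evidence, priors):
    """Multiply the evidence fractions and the prior for each class."""
    return {label: prod(evidence[label]) * priors[label] for label in priors}


def normalize(scores):
    """Rescale the scores so that they sum to 1; Pr[E] cancels out."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


scores = likelihoods(evidence, priors)
posterior = normalize(scores)
for label in ("yes", "no"):
    print(f"{label}: likelihood = {float(scores[label]):.4f}, "
          f"probability = {float(posterior[label]):.1%}")
# yes: likelihood = 0.0053, probability = 20.5%
# no: likelihood = 0.0206, probability = 79.5%
```

Exact fractions avoid rounding the intermediate likelihoods; the printed values agree with the figures quoted in the text.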