set of numbers (the “one less than” is to do with
the number of degrees of
freedom in the sample, a statistical notion that we don’t want to get into here).
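
As a quick check on the "one less than" denominator, the following minimal Python sketch recomputes the mean and standard deviation of the temperature values for the yes days listed in Table 4.4. The helper name `sample_std_dev` is an illustrative choice, not something taken from the text.

```python
import math
import statistics

# Temperature readings on the days when play = yes, as listed in Table 4.4.
temperature_yes = [83, 70, 68, 64, 69, 75, 75, 72, 81]

def sample_std_dev(values):
    """Standard deviation using the 'one less than' (n - 1) denominator."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))

mean_temperature = sum(temperature_yes) / len(temperature_yes)
std_temperature = sample_std_dev(temperature_yes)

# statistics.stdev also divides by n - 1, so it should agree with the helper.
library_std = statistics.stdev(temperature_yes)

print(round(mean_temperature, 1), round(std_temperature, 1))  # prints 73.0 6.2
```

Dividing by n instead of n - 1 would give a slightly smaller figure (about 5.8 here), which is why the choice of denominator matters for small samples.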
The probability density function for a normal distribution with mean μ and
standard deviation σ is given by the rather formidable expression:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

But fear not! All this means is that if we are considering a yes outcome when
temperature has a value, say, of 66, we just need to plug x = 66, μ = 73, and
σ = 6.2 into the formula. So the value of the probability density function is

$$f(\text{temperature} = 66 \mid \text{yes}) = \frac{1}{\sqrt{2\pi}\cdot 6.2}\, e^{-\frac{(66-73)^2}{2\cdot 6.2^2}} = 0.0340.$$

By the same token, the probability density of a yes outcome when humidity has
a value, say, of 90 is calculated in the same way:

$$f(\text{humidity} = 90 \mid \text{yes}) = 0.0221.$$

The probability density function for an event is very closely related to its
probability. However, it is not quite the same thing. If temperature is a
continuous scale, the probability of the temperature being exactly 66 (or
exactly any other value, such as 63.14159262) is zero. The real meaning of the
density function f(x) is that the probability that the quantity lies within a
small region around x, say, between x − ε/2 and x + ε/2, is approximately
ε f(x). What we have written above is correct
Table 4.4  The numeric weather data with summary statistics.

Outlook               Temperature             Humidity                Windy              Play
          yes   no               yes   no                yes   no            yes   no    yes   no
sunny     2     3                83    85                86    85     false  6     2     9     5
overcast  4     0                70    80                96    90     true   3     3
rainy     3     2                68    65                80    70
                                 64    72                65    95
                                 69    71                70    91
                                 75                      80
                                 75                      70
                                 72                      90
                                 81                      75
sunny     2/9   3/5   mean       73    74.6   mean       79.1  86.2   false  6/9   2/5   9/14  5/14
overcast  4/9   0/5   std. dev.  6.2   7.9    std. dev.  10.2  9.7    true   3/9   3/5
rainy     3/9   2/5
if temperature is measured to the nearest degree and humidity is measured to
the nearest percentage point. You might think we ought to factor in the
accuracy figure ε when using these probabilities, but that’s not necessary.
The same ε would appear in both the yes and no likelihoods that follow and
cancel out when the probabilities are calculated.
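
The density evaluations and the cancellation argument can be sketched in a few lines of Python. This is only an illustration: the function names and the value chosen for the accuracy figure are assumptions, and the means and standard deviations are the rounded figures from Table 4.4.

```python
import math

def normal_density(x, mean, std_dev):
    """Normal probability density f(x) for the given mean and standard deviation."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * std_dev)
    exponent = -((x - mean) ** 2) / (2.0 * std_dev ** 2)
    return coefficient * math.exp(exponent)

# Densities for the yes class, using the rounded statistics from Table 4.4.
f_temperature_yes = normal_density(66, 73, 6.2)     # about 0.0340
f_humidity_yes = normal_density(90, 79.1, 10.2)     # about 0.0221

# The same temperature value evaluated under the no class.
f_temperature_no = normal_density(66, 74.6, 7.9)

def normalize(scores):
    """Scale a list of non-negative scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical accuracy figure; any positive value behaves the same way.
epsilon = 0.25

# Multiplying every class score by the same epsilon does not change the
# normalized values, which is why the accuracy figure can be ignored.
without_epsilon = normalize([f_temperature_yes, f_temperature_no])
with_epsilon = normalize([epsilon * f_temperature_yes, epsilon * f_temperature_no])
```

Because ε multiplies every class score equally, it divides out in the normalization step, so the two lists agree up to floating-point rounding.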
Using these probabilities for the new day in Table 4.5 yields

$$\text{likelihood of yes} = 2/9 \times 0.0340 \times 0.0221 \times 3/9 \times 9/14 = 0.000036,$$

$$\text{likelihood of no} = 3/5 \times 0.0279 \times 0.0381 \times 3/5 \times 5/14 = 0.000137;$$

which leads to probabilities

$$\text{Probability of yes} = \frac{0.000036}{0.000036 + 0.000137} = 20.8\%,$$

$$\text{Probability of no} = \frac{0.000137}{0.000036 + 0.000137} = 79.2\%.$$

These figures are very close to the probabilities calculated earlier for the new
day in Table 4.3, because the temperature and humidity values of 66 and 90 yield
similar probabilities to the cool and high values used before.
The normal-distribution assumption makes it easy to extend the Naïve Bayes
classifier to deal with numeric attributes. If the values of any numeric attributes
are missing, the mean and standard deviation calculations are based only on the
ones that are present.

Bayesian models for document classification

One important domain for machine learning is document classification, in
which each instance represents a document and the instance’s class is the
document’s topic. Documents might be news items and the classes might be
domestic news, overseas news, financial news, and sport. Documents are
characterized by the words that appear in them, and one way to apply machine
learning to document classification is to treat the presence or absence of each
word as a Boolean attribute. Naïve Bayes is a popular technique for this
application because it is very fast and quite accurate.
However, this Boolean representation does not take into account the number of
occurrences of each word, which is potentially useful information when
determining the category
Table 4.5  Another new day.

Outlook   Temperature   Humidity   Windy   Play
sunny     66            90         true    ?
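
The calculation for the new day in Table 4.5 can be sketched end to end in Python. This is a rough illustration under stated assumptions rather than a reference implementation: the names `fit`, `normal_density`, and `classify` are hypothetical, the numeric values and fractions are copied from Table 4.4, and `None` is assumed to mark a missing numeric value so that statistics are based only on the values that are present. Because the sketch works from the raw values rather than the rounded summary statistics, its results can differ from hand calculation in the last digit. Evaluated this way, the density of temperature = 66 given no comes out near 0.028, and the sketch gives roughly 21% for yes and 79% for no.

```python
import math
from statistics import mean, stdev

# Numeric columns of Table 4.4, grouped by class. None would mark a missing value.
temperature = {"yes": [83, 70, 68, 64, 69, 75, 75, 72, 81], "no": [85, 80, 65, 72, 71]}
humidity = {"yes": [86, 96, 80, 65, 70, 80, 70, 90, 75], "no": [85, 90, 70, 95, 91]}

# Fractions for the nominal attribute values of the new day, from Table 4.4.
outlook_sunny = {"yes": 2 / 9, "no": 3 / 5}
windy_true = {"yes": 3 / 9, "no": 3 / 5}
prior = {"yes": 9 / 14, "no": 5 / 14}

def fit(values):
    """Mean and sample standard deviation of the values that are present."""
    present = [v for v in values if v is not None]
    return mean(present), stdev(present)

def normal_density(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def classify(temp, hum):
    """Normalized class probabilities for a sunny, windy day with the given readings."""
    likelihood = {}
    for cls in ("yes", "no"):
        t_mu, t_sigma = fit(temperature[cls])
        h_mu, h_sigma = fit(humidity[cls])
        likelihood[cls] = (
            outlook_sunny[cls]
            * normal_density(temp, t_mu, t_sigma)
            * normal_density(hum, h_mu, h_sigma)
            * windy_true[cls]
            * prior[cls]
        )
    total = sum(likelihood.values())
    return {cls: value / total for cls, value in likelihood.items()}

probabilities = classify(temp=66, hum=90)
print({cls: round(p, 3) for cls, p in probabilities.items()})
```

Fitting the statistics inside `classify` keeps the sketch short; a practical implementation would presumably compute them once during training.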