Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




makes real-life datasets interesting is that the attributes are certainly not equally important or independent. But it leads to a simple scheme that again works surprisingly well in practice.

Table 4.2 shows a summary of the weather data obtained by counting how many times each attribute–value pair occurs with each value (yes and no) for play. For example, you can see from Table 1.2 that outlook is sunny for five examples, two of which have play = yes and three of which have play = no. The cells in the upper part of the new table simply count these occurrences for all possible values of each attribute, and the play figure in the final column counts the total number of occurrences of yes and no. In the lower part of the table, we rewrote the same information in the form of fractions, or observed probabilities. For example, of the nine days that play is yes, outlook is sunny for two, yielding a fraction of 2/9. For play the fractions are different: they are the proportion of days that play is yes and no, respectively.
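As a rough illustration of how the lower part of Table 4.2 follows from the upper part, the sketch below divides each count by the number of yes or no days. The dictionary layout and the names `counts`, `play_counts`, `observed_probabilities`, and `class_priors` are assumptions made for this example, not part of the original text. Note that Python's `Fraction` reduces values automatically, so 3/9 is stored as 1/3.

```python
from fractions import Fraction

# Counts copied from the upper part of Table 4.2:
# counts[attribute][value] = (count with play = yes, count with play = no).
counts = {
    "outlook": {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)},
    "temperature": {"hot": (2, 2), "mild": (4, 2), "cool": (3, 1)},
    "humidity": {"high": (3, 4), "normal": (6, 1)},
    "windy": {"false": (6, 2), "true": (3, 3)},
}
play_counts = {"yes": 9, "no": 5}


def observed_probabilities(counts, play_counts):
    """Divide each count by the number of days with the same value of play."""
    n_yes, n_no = play_counts["yes"], play_counts["no"]
    return {
        attribute: {
            value: (Fraction(yes, n_yes), Fraction(no, n_no))
            for value, (yes, no) in values.items()
        }
        for attribute, values in counts.items()
    }


def class_priors(play_counts):
    """Proportion of all days on which play is yes and no."""
    total = sum(play_counts.values())
    return {label: Fraction(n, total) for label, n in play_counts.items()}


fractions_table = observed_probabilities(counts, play_counts)
priors = class_priors(play_counts)
print(fractions_table["outlook"]["sunny"])  # (Fraction(2, 9), Fraction(3, 5))
print(priors["yes"])  # 9/14
```

Exact fractions are used here only so that the printed values can be compared directly with the table; floating-point division would work equally well.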

4.2 STATISTICAL MODELING

Table 4.2  The weather data with counts and probabilities.

Outlook   yes  no     Temperature  yes  no     Humidity  yes  no     Windy  yes  no     Play  yes   no
sunny     2    3      hot          2    2      high      3    4      false  6    2            9     5
overcast  4    0      mild         4    2      normal    6    1      true   3    3
rainy     3    2      cool         3    1

sunny     2/9  3/5    hot          2/9  2/5    high      3/9  4/5    false  6/9  2/5          9/14  5/14
overcast  4/9  0/5    mild         4/9  2/5    normal    6/9  1/5    true   3/9  3/5
rainy     3/9  2/5    cool         3/9  1/5

Table 4.3  A new day.

Outlook  Temperature  Humidity  Windy  Play
sunny    cool         high      true   ?

Now suppose we encounter a new example with the values that are shown in Table 4.3. We treat the five features in Table 4.2 (outlook, temperature, humidity, windy, and the overall likelihood that play is yes or no) as equally important, independent pieces of evidence and multiply the corresponding fractions. Looking at the outcome yes gives:

$$\text{likelihood of } yes = 2/9 \times 3/9 \times 3/9 \times 3/9 \times 9/14 = 0.0053.$$

The fractions are taken from the yes entries in the table according to the values of the attributes for the new day, and the final 9/14 is the overall fraction representing the proportion of days on which play is yes. A similar calculation for the outcome no leads to

$$\text{likelihood of } no = 3/5 \times 1/5 \times 4/5 \times 3/5 \times 5/14 = 0.0206.$$

This indicates that for the new day, no is more likely than yes: about four times more likely. The numbers can be turned into probabilities by normalizing them so that they sum to 1:

$$\text{Probability of } yes = \frac{0.0053}{0.0053 + 0.0206} = 20.5\%,$$

$$\text{Probability of } no = \frac{0.0206}{0.0053 + 0.0206} = 79.5\%.$$

This simple and intuitive method is based on Bayes’s rule of conditional probability. Bayes’s rule says that if you have a hypothesis H and evidence E that bears on that hypothesis, then

$$\Pr[H \mid E] = \frac{\Pr[E \mid H] \, \Pr[H]}{\Pr[E]}.$$

We use the notation that Pr[A] denotes the probability of an event A and that Pr[A | B] denotes the probability of A conditional on another event B. The hypothesis H is that play will be, say, yes, and Pr[H | E] is going to turn out to be 20.5%, just as determined previously. The evidence E is the particular combination of attribute values for the new day: outlook = sunny, temperature = cool, humidity = high, and windy = true. Let’s call these four pieces of evidence $E_1$, $E_2$, $E_3$, and $E_4$, respectively. Assuming that these pieces of evidence are independent (given the class), their combined probability is obtained by multiplying the probabilities:

$$\Pr[yes \mid E] = \frac{\Pr[E_1 \mid yes] \times \Pr[E_2 \mid yes] \times \Pr[E_3 \mid yes] \times \Pr[E_4 \mid yes] \times \Pr[yes]}{\Pr[E]}.$$

Don’t worry about the denominator: we will ignore it and eliminate it in the final normalizing step when we make the probabilities of yes and no sum to 1, just as we did previously. The Pr[yes] at the end is the probability of a yes outcome without knowing any of the evidence E, that is, without knowing anything about the particular day referenced. It is called the prior probability of the hypothesis H. In this case, it’s just 9/14, because 9 of the 14 training examples had a yes value for play. Substituting the fractions in Table 4.2 for the appropriate evidence probabilities leads to

$$\Pr[yes \mid E] = \frac{2/9 \times 3/9 \times 3/9 \times 3/9 \times 9/14}{\Pr[E]},$$
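The calculation for the new day can be reproduced in a few lines. The following is a minimal sketch, assuming the fractions are copied by hand from Table 4.2 for the attribute values in Table 4.3. The names `evidence`, `prior`, `likelihoods`, and `normalize` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import prod

# Conditional fractions from the lower part of Table 4.2 for the new day in
# Table 4.3: outlook = sunny, temperature = cool, humidity = high, windy = true.
evidence = {
    "yes": [Fraction(2, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9)],
    "no": [Fraction(3, 5), Fraction(1, 5), Fraction(4, 5), Fraction(3, 5)],
}
# Prior probabilities: the overall proportion of yes and no days.
prior = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}


def likelihoods(evidence, prior):
    """Numerator of Bayes's rule per class: product of Pr[E_i | class] times Pr[class]."""
    return {label: prod(evidence[label]) * prior[label] for label in prior}


def normalize(scores):
    """Rescale the scores so they sum to 1, which cancels the shared denominator Pr[E]."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


scores = likelihoods(evidence, prior)
posterior = normalize(scores)
for label in ("yes", "no"):
    print(
        f"{label}: likelihood {float(scores[label]):.4f}, "
        f"probability {float(posterior[label]):.1%}"
    )
# yes: likelihood 0.0053, probability 20.5%
# no: likelihood 0.0206, probability 79.5%
```

Because both classes share the same denominator Pr[E], dividing each score by their sum gives the same result as dividing by Pr[E], which is why the denominator never has to be computed explicitly.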

CHAPTER 4 | ALGORITHMS: THE BASIC METHODS


