Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




4.5 Mining Association Rules

accuracy (the same number expressed as a proportion of the number of
instances to which the rule applies). This approach is quite infeasible. (Note
that, as we mentioned in Section 3.4, what we are calling coverage is often
called support and what we are calling accuracy is often called confidence.)

Instead, we capitalize on the fact that we are only interested in association
rules with high coverage. We ignore, for the moment, the distinction between
the left- and right-hand sides of a rule and seek combinations of
attribute–value pairs that have a prespecified minimum coverage. These are
called item sets: an attribute–value pair is an item. The terminology derives
from market basket analysis, in which the items are articles in your shopping
cart and the supermarket manager is looking for associations among these
purchases.



Item sets

The first column of Table 4.10 shows the individual items for the weather data
of Table 1.2, with the number of times each item appears in the dataset given
at the right. These are the one-item sets. The next step is to generate the
two-item sets by making pairs of one-item ones. Of course, there is no point in
generating a set containing two different values of the same attribute (such as
outlook = sunny and outlook = overcast), because that cannot occur in any
actual instance.
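The pairing step can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the 14 instances are transcribed on the assumption that they match the nominal weather data of Table 1.2, and the names `WEATHER`, `one_item_sets`, and `candidate_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")

# Assumption: the standard 14-instance nominal weather data (taken to match Table 1.2).
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
WEATHER = [dict(zip(ATTRIBUTES, line.split())) for line in RAW.strip().splitlines()]

# One-item sets: every (attribute, value) pair with the number of times it occurs.
one_item_sets = Counter(item for row in WEATHER for item in row.items())

# Candidate two-item sets: pairs of items, skipping pairs that give two
# different values to the same attribute (no instance can contain both).
candidate_pairs = [
    (a, b)
    for a, b in combinations(sorted(one_item_sets), 2)
    if a[0] != b[0]
]

print(len(one_item_sets), "one-item sets")
print(len(candidate_pairs), "candidate two-item sets")
```

Under these assumptions there are 12 one-item sets. Of the 66 possible pairs, 9 combine two values of the same attribute, which leaves 57 candidates to be checked against the data.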

Assume that we seek association rules with minimum coverage 2: thus we
discard any item sets that cover fewer than two instances. This leaves 47
two-item sets, some of which are shown in the second column along with the
number of times they appear. The next step is to generate the three-item sets,
of which 39 have a coverage of 2 or greater. There are 6 four-item sets, and no
five-item sets: for this data, a five-item set with coverage 2 or greater could
only correspond to a repeated instance. The first row of the table, for example,
shows that there are five days when outlook = sunny, two of which have
temperature = mild. It also shows that two of the sunny days have
temperature = hot and, in fact, on both of those days humidity = high and
play = no as well.
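The counts quoted above can be checked by brute force. The sketch below is not the efficient generation method the text goes on to describe; it simply counts every subset of every instance. It assumes the 14 instances match the nominal weather data of Table 1.2, and `frequent_item_sets` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")

# Assumption: the standard 14-instance nominal weather data (taken to match Table 1.2).
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
WEATHER = [dict(zip(ATTRIBUTES, line.split())) for line in RAW.strip().splitlines()]


def frequent_item_sets(rows, min_coverage=2):
    """Map each item set (a tuple of (attribute, value) pairs) to its coverage,
    keeping only the sets that cover at least min_coverage instances."""
    counts = Counter()
    for row in rows:
        items = sorted(row.items())  # fixed attribute order, so equal sets compare equal
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_coverage}


frequent = frequent_item_sets(WEATHER, min_coverage=2)
sets_per_size = Counter(len(item_set) for item_set in frequent)
for size in range(1, len(ATTRIBUTES) + 1):
    print(f"{size}-item sets with coverage >= 2: {sets_per_size[size]}")
```

Because each instance carries exactly one value per attribute, no counted subset can contain two values of the same attribute. With the data as transcribed here, the tallies are 12, 47, 39, 6, and 0 for sizes one to five, in agreement with the figures in the text.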



Association rules

Shortly we will explain how to generate these item sets efficiently. But first
let us finish the story. Once all item sets with the required coverage have
been generated, the next step is to turn each into a rule, or set of rules,
with at least the specified minimum accuracy. Some item sets will produce more
than one rule; others will produce none. For example, there is one three-item
set with a coverage of 4 (row 38 of Table 4.10):

humidity = normal, windy = false, play = yes

This set leads to seven potential rules:
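The seven candidates come from choosing every nonempty subset of the three items as the right-hand side and leaving the remaining items on the left. The sketch below enumerates them and computes each rule's accuracy as the coverage of the whole item set divided by the coverage of the left-hand side. It assumes the 14 instances match the nominal weather data of Table 1.2; the names `coverage`, `ITEM_SET`, and `rules` are hypothetical, and no minimum-accuracy threshold is applied.

```python
from fractions import Fraction
from itertools import combinations

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")

# Assumption: the standard 14-instance nominal weather data (taken to match Table 1.2).
RAW = """
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
"""
WEATHER = [dict(zip(ATTRIBUTES, line.split())) for line in RAW.strip().splitlines()]


def coverage(items):
    """Number of instances that match every (attribute, value) pair in items.
    An empty collection of items matches all instances."""
    return sum(all(row[a] == v for a, v in items) for row in WEATHER)


ITEM_SET = (("humidity", "normal"), ("windy", "false"), ("play", "yes"))

rules = []
for size in range(1, len(ITEM_SET) + 1):
    for consequent in combinations(ITEM_SET, size):
        antecedent = tuple(item for item in ITEM_SET if item not in consequent)
        accuracy = Fraction(coverage(ITEM_SET), coverage(antecedent))
        rules.append((antecedent, consequent, accuracy))

for antecedent, consequent, accuracy in rules:
    lhs = " and ".join(f"{a} = {v}" for a, v in antecedent) or "(no condition)"
    rhs = " and ".join(f"{a} = {v}" for a, v in consequent)
    print(f"If {lhs} then {rhs}  (accuracy {accuracy})")
```

`Fraction` reduces each ratio, so an accuracy of 4/14 is printed as 2/7. A rule would be kept only if its accuracy reaches the specified minimum.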




Chapter 4 | Algorithms: The Basic Methods



Table 4.10  Item sets for the weather data with coverage 2 or greater.

    One-item sets           Two-item sets           Three-item sets         Four-item sets

1   outlook = sunny (5)     outlook = sunny         outlook = sunny         outlook = sunny
                            temperature = mild (2)  temperature = hot       temperature = hot
                                                    humidity = high (2)     humidity = high
                                                                            play = no (2)

2   outlook = overcast (4)  outlook = sunny         outlook = sunny         outlook = sunny
                            temperature = hot (2)   temperature = hot       humidity = high
                                                    play = no (2)           windy = false
                                                                            play = no (2)

3   outlook = rainy (5)     outlook = sunny         outlook = sunny         outlook = overcast
                            humidity = normal (2)   humidity = normal       temperature = hot
                                                    play = yes (2)          windy = false
                                                                            play = yes (2)

4   temperature = cool (4)  outlook = sunny         outlook = sunny         outlook = rainy
                            humidity = high (3)     humidity = high         temperature = mild
                                                    windy = false (2)       windy = false
                                                                            play = yes (2)

5   temperature = mild (6)  outlook = sunny         outlook = sunny         outlook = rainy
                            windy = true (2)        humidity = high         humidity = normal
                                                    play = no (3)           windy = false
                                                                            play = yes (2)

6   temperature = hot (4)   outlook = sunny         outlook = sunny         temperature = cool
                            windy = false (3)       windy = false           humidity = normal
                                                    play = no (2)           windy = false
                                                                            play = yes (2)

7   humidity = normal (7)   outlook = sunny         outlook = overcast
                            play = yes (2)          temperature = hot
                                                    windy = false (2)

8   humidity = high (7)     outlook = sunny         outlook = overcast
                            play = no (3)           temperature = hot
                                                    play = yes (2)

9   windy = true (6)        outlook = overcast      outlook = overcast
                            temperature = hot (2)   humidity = normal
                                                    play = yes (2)

10  windy = false (8)       outlook = overcast      outlook = overcast
                            humidity = normal (2)   humidity = high
                                                    play = yes (2)

11  play = yes (9)          outlook = overcast      outlook = overcast
                            humidity = high (2)     windy = true
                                                    play = yes (2)

12  play = no (5)           outlook = overcast      outlook = overcast
                            windy = true (2)        windy = false
                                                    play = yes (2)

13                          outlook = overcast      outlook = rainy
                            windy = false (2)       temperature = cool
                                                    humidity = normal (2)
