Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

Yüklə 4,3 Mb.

Pdf görüntüsü

səhifə	56/219
tarix	08.10.2017
ölçüsü	4,3 Mb.
	#3816

1 ... 52 53 54 55 56 57 58 59 ... 219

Rules versus trees
A simple covering algorithm

the ﬁrst test in the rule, split the space vertically as shown in the center picture.

This gives the beginnings of a rule:

If x

> 1.2 then class = a

However, the rule covers many b’s as well as a’s, so a new test is added to the

rule by further splitting the space horizontally as shown in the third diagram:

If x

> 1.2 and y > 2.6 then class = a

This gives a rule covering all but one of the a’s. It’s probably appropriate to leave

it at that, but if it were felt necessary to cover the ﬁnal a, another rule would be

necessary—perhaps

If x

> 1.4 and y < 2.4 then class = a

The same procedure leads to two rules covering the b’s:

If x

£ 1.2 then class = b

If x

> 1.2 and y £ 2.6 then class = b

1 0 6

C H A P T E R 4

A LG O R I T H M S : T H E BA S I C M E T H O D S

Figure 4.6 Covering algorithm: (a) covering the instances and (b) the decision tree for

the same problem.

x > 1.2 ?

y > 2.6 ?

yes

(b)

P088407-Ch004.qxd 4/30/05 11:13 AM Page 106

4 . 4

C OV E R I N G A LG O R I T H M S : C O N S T RU C T I N G RU L E S

1 0 7

Again, one a is erroneously covered by these rules. If it were necessary to exclude

it, more tests would have to be added to the second rule, and additional rules

would need to be introduced to cover the b’s that these new tests exclude.

Rules versus trees

A top-down divide-and-conquer algorithm operates on the same data in a

manner that is, at least superﬁcially, quite similar to a covering algorithm. It

might ﬁrst split the dataset using the x attribute and would probably end up

splitting it at the same place, x

= 1.2. However, whereas the covering algorithm

is concerned only with covering a single class, the division would take both

classes into account, because divide-and-conquer algorithms create a single

concept description that applies to all classes. The second split might also be at

the same place, y

= 2.6, leading to the decision tree in Figure 4.6(b). This tree

corresponds exactly to the set of rules, and in this case there is no difference in

effect between the covering and the divide-and-conquer algorithms.

But in many situations there is a difference between rules and trees in terms

of the perspicuity of the representation. For example, when we described the

replicated subtree problem in Section 3.3, we noted that rules can be symmet-

ric whereas trees must select one attribute to split on ﬁrst, and this can lead to

trees that are much larger than an equivalent set of rules. Another difference is

that, in the multiclass case, a decision tree split takes all classes into account,

trying to maximize the purity of the split, whereas the rule-generating method

concentrates on one class at a time, disregarding what happens to the other

classes.

A simple covering algorithm

Covering algorithms operate by adding tests to the rule that is under construc-

tion, always striving to create a rule with maximum accuracy. In contrast, divide-

and-conquer algorithms operate by adding tests to the tree that is under

construction, always striving to maximize the separation among the classes.

Each of these involves ﬁnding an attribute to split on. But the criterion for the

best attribute is different in each case. Whereas divide-and-conquer algorithms

such as ID3 choose an attribute to maximize the information gain, the cover-

ing algorithm we will describe chooses an attribute–value pair to maximize the

probability of the desired classiﬁcation.

Figure 4.7 gives a picture of the situation, showing the space containing all

the instances, a partially constructed rule, and the same rule after a new term

has been added. The new term restricts the coverage of the rule: the idea is to

include as many instances of the desired class as possible and exclude as many

instances of other classes as possible. Suppose the new rule will cover a total of

t instances, of which p are positive examples of the class and t

- p are in other

P088407-Ch004.qxd 4/30/05 11:13 AM Page 107

classes—that is, they are errors made by the rule. Then choose the new term to

maximize the ratio p/t.

An example will help. For a change, we use the contact lens problem of Table

1.1. We will form rules that cover each of the three classes, hard, soft, and none,

in turn. To begin, we seek a rule:

If ? then recommendation

= hard

For the unknown term ?, we have nine choices:

age

= young

2/8

age

= pre-presbyopic

1/8

age

= presbyopic

1/8

spectacle prescription

= myope

3/12

spectacle prescription

= hypermetrope

1/12

astigmatism

= no

0/12

astigmatism

= yes

4/12

tear production rate

= reduced

0/12

tear production rate

= normal

4/12

The numbers on the right show the fraction of “correct” instances in the set

singled out by that choice. In this case, correct means that the recommendation is

hard. For instance, age

= young selects eight instances, two of which recommend

hard contact lenses, so the ﬁrst fraction is 2/8. (To follow this, you will need to

look back at the contact lens data in Table 1.1 on page 6 and count up the entries

in the table.) We select the largest fraction, 4/12, arbitrarily choosing between

the seventh and the last choice in the preceding list, and create the rule:

If astigmatism

= yes then recommendation = hard

This rule is an inaccurate one, getting only 4 instances correct out of the 12

that it covers, shown in Table 4.8. So we reﬁne it further:

If astigmatism

= yes and ? then recommendation = hard

1 0 8

C H A P T E R 4

A LG O R I T H M S : T H E BA S I C M E T H O D S

Figure 4.7 The instance space during operation of a covering algorithm.

P088407-Ch004.qxd 4/30/05 11:13 AM Page 108

Yüklə 4,3 Mb.

Dostları ilə paylaş:

1 ... 52 53 54 55 56 57 58 59 ... 219