Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




4.4 Covering algorithms: Constructing rules

A rule is imperfect to the extent that it assigns cases to the class in question that actually do not have that class.



PRISM continues adding clauses to each rule until it is perfect: its accuracy is 100%. Figure 4.8 gives a summary of the algorithm. The outer loop iterates over the classes, generating rules for each class in turn; note that we reinitialize to the full set of examples each time round. Then we create rules for that class and remove the examples from the set until there are none of that class left. Whenever we create a rule, we start with an empty rule (which covers all the examples) and then restrict it by adding tests until it covers only examples of the desired class. At each stage we choose the most promising test, that is, the one that maximizes the accuracy of the rule, and break ties by selecting the test with the greatest coverage.



Rules versus decision lists

Consider the rules produced for a particular class, that is, the algorithm in Figure 4.8 with the outer loop removed. It seems clear from the way that these rules are produced that they are intended to be interpreted in order, that is, as a decision list, testing the rules in turn until one applies and then using that. This is because the instances covered by a new rule are removed from the instance set as soon as the rule is completed (in the third line from the end of the code in Figure 4.8): thus subsequent rules are designed for instances that are not covered by the rule. However, although it appears that we are supposed to check the rules in turn, we do not have to do so. Any subsequent rules generated for this class will have the same effect: they all predict the same class. This means that it does not matter what order they are executed in: either a rule will be found that covers this instance, in which case the class in question is predicted, or no such rule is found, in which case the class is not predicted.

For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        Consider adding the condition A=v to the LHS of R
        Select A and v to maximize the accuracy p/t
          (break ties by choosing the condition with the largest p)
      Add A=v to R
    Remove the instances covered by R from E

Figure 4.8 Pseudocode for a basic rule learner.
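
To make the pseudocode concrete, here is a minimal Python sketch of the Figure 4.8 learner. It assumes instances are represented as dictionaries mapping attribute names to values, with the class label stored under class_attr; the function name and data representation are illustrative choices, not code from the book.

# Minimal sketch of the Figure 4.8 rule learner (PRISM-style).
# Instances are dicts of attribute -> value; the class label is
# stored under class_attr. Names here are illustrative.
def learn_rules(instances, class_attr):
    rules = []
    for c in {inst[class_attr] for inst in instances}:  # outer loop over classes
        E = list(instances)              # reinitialize each time round
        while any(i[class_attr] == c for i in E):
            conditions = {}              # empty left-hand side covers everything
            covered = list(E)
            attrs = [a for a in instances[0] if a != class_attr]
            # restrict the rule until it is perfect or attributes run out
            while (any(i[class_attr] != c for i in covered)
                   and len(conditions) < len(attrs)):
                best = None              # (accuracy p/t, coverage p, attr, value)
                for a in attrs:
                    if a in conditions:
                        continue
                    for v in {i[a] for i in covered}:
                        t = [i for i in covered if i[a] == v]
                        p = sum(1 for i in t if i[class_attr] == c)
                        cand = (p / len(t), p, a, v)
                        if best is None or cand[:2] > best[:2]:  # ties: larger p
                            best = cand
                _, _, a, v = best
                conditions[a] = v        # add A=v to R
                covered = [i for i in covered if i[a] == v]
            rules.append((conditions, c))
            # separate out the instances covered by R
            E = [i for i in E
                 if not all(i[a] == v for a, v in conditions.items())]
    return rules

Called on a dataset such as the weather data, learn_rules(instances, "play") returns (conditions, class) pairs for each class value in turn, shrinking the working set exactly as the pseudocode's last line prescribes.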


Now return to the overall algorithm. Each class is considered in turn, and rules are generated that distinguish instances in that class from the others. No ordering is implied between the rules for one class and those for another. Consequently, the rules that are produced can be executed independently of order.

As described in Section 3.3, order-independent rules seem to provide more modularity by each acting as an independent nugget of "knowledge," but they suffer from the disadvantage that it is not clear what to do when conflicting rules apply. With rules generated in this way, a test example may receive multiple classifications, that is, rules that apply to different classes may accept it. Other test examples may receive no classification at all. A simple strategy to force a decision in these ambiguous cases is to choose, from the classifications that are predicted, the one with the most training examples or, if no classification is predicted, to choose the category with the most training examples overall (both fallbacks are sketched below). These difficulties do not occur with decision lists because they are meant to be interpreted in order and execution stops as soon as one rule applies: the addition of a default rule at the end ensures that any test instance receives a classification. It is possible to generate good decision lists for the multiclass case using a slightly different method, as we shall see in Section 6.2.
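
To make the two reading disciplines concrete, here is a small Python sketch, under the same illustrative instance representation as before, of classifying with an unordered rule set (using the majority-class fallbacks just described) and with a decision list. The helper names are assumptions for illustration, not code from the book.

from collections import Counter

def matches(instance, conditions):
    return all(instance.get(a) == v for a, v in conditions.items())

def classify_unordered(instance, rules, training_labels):
    counts = Counter(training_labels)    # class frequencies in the training data
    fired = {c for conds, c in rules if matches(instance, conds)}
    if fired:
        # conflicting rules: prefer the predicted class with more training examples
        return max(fired, key=lambda c: counts[c])
    return counts.most_common(1)[0][0]   # no rule fired: overall majority class

def classify_decision_list(instance, rules, default):
    for conds, c in rules:               # test the rules in turn
        if matches(instance, conds):
            return c                     # execution stops at the first rule that applies
    return default                       # the default rule at the end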

Methods such as PRISM can be described as separate-and-conquer algorithms: you identify a rule that covers many instances in the class (and excludes ones not in the class), separate out the covered instances because they are already taken care of by the rule, and continue the process on those that are left. This contrasts nicely with the divide-and-conquer approach of decision trees. The separate step greatly increases the efficiency of the method because the instance set continually shrinks as the operation proceeds.

4.5 Mining association rules

Association rules are like classification rules. You could find them in the same way, by executing a divide-and-conquer rule-induction procedure for each possible expression that could occur on the right-hand side of the rule. But not only might any attribute occur on the right-hand side with any possible value; a single association rule often predicts the value of more than one attribute. To find such rules, you would have to execute the rule-induction procedure once for every possible combination of attributes, with every possible combination of values, on the right-hand side. That would result in an enormous number of association rules, which would then have to be pruned down on the basis of their coverage (the number of instances that they predict correctly) and their accuracy (the proportion of instances to which they apply that they predict correctly).
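
As a concrete illustration of these two measures, here is a Python sketch, again over the illustrative dictionary representation, that scores a candidate rule antecedent => consequent; the helper is hypothetical rather than something defined in the book.

def coverage_and_accuracy(instances, antecedent, consequent):
    def holds(inst, conds):
        return all(inst.get(a) == v for a, v in conds.items())
    applies = [i for i in instances if holds(i, antecedent)]  # rule applies
    correct = [i for i in applies if holds(i, consequent)]    # and predicts correctly
    coverage = len(correct)                                   # also called support
    accuracy = len(correct) / len(applies) if applies else 0.0
    return coverage, accuracy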
