Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




If petal length ≥ 2.45 and petal length < 4.45 then Iris versicolor
If petal length ≥ 2.45 and petal length < 4.95 and petal width < 1.55 then Iris versicolor

These rules require modification so that the new instance can be treated correctly. However, simply changing the bounds for the attribute-value tests in these rules may not suffice because the instances used to create the rule set may then be misclassified. Fixing up a rule set is not as simple as it sounds.


Instead of changing the tests in the existing rules, an expert might be consulted to explain why the new flower violates them, receiving explanations that could be used to extend the relevant rules only. For example, the first of these two rules misclassifies the new Iris setosa as an instance of the species Iris versicolor. Instead of altering the bounds on any of the inequalities in the rule, an exception can be made based on some other attribute:

If petal length ≥ 2.45 and petal length < 4.45 then Iris versicolor
   EXCEPT if petal width < 1.0 then Iris setosa

This rule says that a flower is Iris versicolor if its petal length is between 2.45 cm and 4.45 cm except when its petal width is less than 1.0 cm, in which case it is Iris setosa.
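
As a minimal sketch, the rule above can be written as a small Python function. The function name, the argument names, and the choice to return None when the rule does not fire are assumptions made for illustration; they are not part of the original rule notation.

```python
from typing import Optional


def classify_with_exception(petal_length: float, petal_width: float) -> Optional[str]:
    """Apply the single rule with its exception.

    Returns None when the rule's condition does not hold (an assumption of
    this sketch: the rule simply does not fire for such instances).
    """
    if 2.45 <= petal_length < 4.45:
        # The exception is tested only inside the context of the rule it qualifies.
        if petal_width < 1.0:
            return "Iris setosa"
        return "Iris versicolor"
    return None


# Hypothetical measurements (cm), chosen only to exercise each branch.
print(classify_with_exception(3.0, 0.5))
print(classify_with_exception(4.0, 1.3))
print(classify_with_exception(1.4, 0.2))
```

Note that the bounds on petal length are left untouched; only the extra test on petal width changes the outcome for the new instance.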

Of course, we might have exceptions to the exceptions, exceptions to these, and so on, giving the rule set something of the character of a tree. As well as being used to make incremental changes to existing rule sets, rules with exceptions can be used to represent the entire concept description in the first place.
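
The tree-like character of nested exceptions can be sketched with a small recursive data structure. This is one possible representation under stated assumptions, not a method prescribed by the text: the class name ExceptionRule, its fields, and the dictionary keys petal_length and petal_width are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Instance = Dict[str, float]


@dataclass
class ExceptionRule:
    """A rule whose conclusion can be overridden by nested exceptions."""
    condition: Callable[[Instance], bool]
    conclusion: str
    exceptions: List["ExceptionRule"] = field(default_factory=list)

    def classify(self, x: Instance) -> Optional[str]:
        # The rule does not fire at all if its own condition fails.
        if not self.condition(x):
            return None
        # The first exception that fires overrides this rule's conclusion;
        # each exception may carry exceptions of its own.
        for exc in self.exceptions:
            result = exc.classify(x)
            if result is not None:
                return result
        return self.conclusion


# The rule with one exception discussed in the text.
rule = ExceptionRule(
    condition=lambda x: 2.45 <= x["petal_length"] < 4.45,
    conclusion="Iris versicolor",
    exceptions=[
        ExceptionRule(lambda x: x["petal_width"] < 1.0, "Iris setosa"),
    ],
)

# Hypothetical instance: satisfies the rule and its exception.
print(rule.classify({"petal_length": 3.0, "petal_width": 0.5}))
```

Because each exception is itself an ExceptionRule, exceptions to exceptions need no extra machinery: they are simply appended to the exceptions list of the rule they qualify.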


Figure 3.5 shows a set of rules that correctly classify all examples in the Iris dataset given earlier (pages 15–16). These rules are quite difficult to comprehend at first. Let's follow them through. A default outcome has been chosen, Iris setosa, and is shown in the first line. For this dataset, the choice of default is rather arbitrary because there are 50 examples of each type. Normally, the most frequent outcome is chosen as the default.

Subsequent rules give exceptions to this default. The first if . . . then, on lines 2 through 4, gives a condition that leads to the classification Iris versicolor. However, there are two exceptions to this rule (lines 5 through 8), which we will deal with in a moment. If the conditions on lines 2 and 3 fail, the else clause on line 9 is reached, which essentially specifies a second exception to the original default. If the condition on line 9 holds, the classification is Iris virginica (line 10). Again, there is an exception to this rule (on lines 11 and 12).

Now return to the exception on lines 5 through 8. This overrides the Iris versicolor conclusion on line 4 if either of the tests on lines 5 and 7 holds. As it happens, these two exceptions both lead to the same conclusion, Iris virginica (lines 6 and 8). The final exception is the one on lines 11 and 12, which overrides the Iris virginica conclusion on line 10 when the condition on line 11 is met, and leads to the classification Iris versicolor.

You will probably need to ponder these rules for some minutes before it becomes clear how they are intended to be read. Although it takes some time to get used to reading them, sorting out the excepts and if . . . then . . . elses becomes easier with familiarity. People often think of real problems in terms of rules, exceptions, and exceptions to the exceptions, so it is often a good way to express a complex rule set. But the main point in favor of this way of representing rules is that it scales up well. Although the whole rule set is a little hard to comprehend, each individual conclusion, each individual then statement, can be considered just in the context of the rules and exceptions that lead to it; whereas with decision lists, all prior rules need to be reviewed to determine the precise effect of an individual rule. This locality property is crucial when trying to understand large rule sets. Psychologically, people familiar with the data think of a particular set of cases, or kind of case, when looking at any one conclusion in the exception structure, and when one of these cases turns out to be an exception to the conclusion, it is easy to add an except clause to cater for it.

It is worth pointing out that the default . . . except if . . . then . . . structure is logically equivalent to if . . . then . . . else . . ., where the else is unconditional and specifies exactly what the default did. An unconditional else is, of course, a default. (Note that there are no unconditional elses in the preceding rules.) Log-




Default: Iris-setosa                                                     1
except if petal-length ≥ 2.45 and petal-length < 5.355                   2
          and petal-width < 1.75                                         3
       then Iris-versicolor                                              4
            except if petal-length ≥ 4.95 and petal-width < 1.55         5
                   then Iris-virginica                                   6
                   else if sepal-length < 4.95 and sepal-width ≥ 2.45    7
                        then Iris-virginica                              8
       else if petal-length ≥ 3.35                                       9
            then Iris-virginica                                         10
                 except if petal-length < 4.85 and sepal-length < 5.95  11
                        then Iris-versicolor                            12

Figure 3.5 Rules for the Iris data.
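
One way to check a reading of Figure 3.5 is to transcribe it into nested conditionals. The following Python sketch does so, with comments giving the corresponding figure line numbers. The function name and argument order are assumptions for illustration, and the translation reflects the reading given in the text: an except clause is consulted only after the rule it qualifies has fired.

```python
def classify_iris(sepal_length: float, sepal_width: float,
                  petal_length: float, petal_width: float) -> str:
    """Transcription of the rules with exceptions in Figure 3.5 (a sketch)."""
    label = "Iris-setosa"                                        # line 1: default
    if 2.45 <= petal_length < 5.355 and petal_width < 1.75:      # lines 2-3
        label = "Iris-versicolor"                                # line 4
        if petal_length >= 4.95 and petal_width < 1.55:          # line 5
            label = "Iris-virginica"                             # line 6
        elif sepal_length < 4.95 and sepal_width >= 2.45:        # line 7
            label = "Iris-virginica"                             # line 8
    elif petal_length >= 3.35:                                   # line 9
        label = "Iris-virginica"                                 # line 10
        if petal_length < 4.85 and sepal_length < 5.95:          # line 11
            label = "Iris-versicolor"                            # line 12
    return label


# Illustrative measurements in cm (sepal length, sepal width,
# petal length, petal width), chosen to reach different branches.
print(classify_iris(5.1, 3.5, 1.4, 0.2))  # no rule fires: default applies
print(classify_iris(7.0, 3.2, 4.7, 1.4))  # lines 2-4 fire, no exception
print(classify_iris(6.0, 2.5, 5.0, 1.5))  # exception on line 5 fires
print(classify_iris(5.9, 3.0, 4.8, 1.8))  # line 9 fires, then exception on line 11
```

Each label assignment can be read in the context of only the conditions that enclose it, which is the locality property discussed in the text.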
