Alternatively, a three-way split may be used, in which case there are several
different possibilities. If a missing value is treated as an attribute value in
its own right, that will create a third branch. An alternative for an
integer-valued attribute would be a three-way split into less than, equal to,
and greater than. An alternative for a real-valued attribute, for which equal
to is not such a meaningful option, would be to test against an interval rather
than a single constant, again giving a three-way split: below, within, and
above. A numeric attribute is often tested several times in any given path down
the tree from root to leaf, each test involving a different constant. We return
to this when describing the handling of numeric attributes in Section 6.1.
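In code, the interval form of the test might look something like the following
sketch. The class and method names here are invented for illustration; they are
not taken from Weka or any other system.

public class IntervalTest {

    public enum Branch { BELOW, WITHIN, ABOVE }

    private final double lo;   // lower end of the interval
    private final double hi;   // upper end of the interval

    public IntervalTest(double lo, double hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Route a real-valued attribute into one of three branches:
    // below the interval, within it, or above it.
    public Branch branch(double value) {
        if (value < lo) return Branch.BELOW;
        if (value > hi) return Branch.ABOVE;
        return Branch.WITHIN;
    }
}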
Missing values pose an obvious problem. It is not clear which branch should be
taken when a node tests an attribute whose value is missing. Sometimes, as
described in Section 2.4, a missing value is treated as an attribute value in
its own right. If this is not the case, missing values should be treated in a
special way rather than being considered as just another possible value that
the attribute might take. A simple solution is to record the number of elements
in the training set that go down each branch and to use the most popular branch
if the value for a test instance is missing.
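This strategy amounts to little more than keeping a count for each branch while
the tree is being built. The sketch below is a hypothetical illustration, not
code from Weka.

public class PopularBranch {

    private final int[] trainingCounts;   // training instances sent down each branch

    public PopularBranch(int[] trainingCounts) {
        this.trainingCounts = trainingCounts;
    }

    // When the tested attribute's value is missing, fall back to the branch
    // that received the most training instances.
    public int mostPopularBranch() {
        int best = 0;
        for (int b = 1; b < trainingCounts.length; b++) {
            if (trainingCounts[b] > trainingCounts[best]) {
                best = b;
            }
        }
        return best;
    }
}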
A more sophisticated solution is to notionally split the instance into pieces
and send part of it down each branch and from there right on down to the leaves
of the subtrees involved. The split is accomplished using a numeric weight
between zero and one, and the weight for a branch is chosen to be proportional
to the number of training instances going down that branch, all weights summing
to one. A weighted instance may be further split at a lower node. Eventually,
the various parts of the instance will each reach a leaf node, and the
decisions at these leaf nodes must be recombined using the weights that have
percolated down to the leaves. We return to this in Section 6.1.
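For a tree with two-way numeric splits, the weighted scheme can be sketched as
a recursive classification routine along the following lines. The node
structure, field names, and the use of a map from attribute names to values are
assumptions made for the example, not the representation of any particular
learner.

import java.util.Map;

abstract class TreeNode {
    // Return a class distribution for the instance, scaled by the given weight.
    abstract double[] distribution(Map<String, Double> instance, double weight);
}

class LeafNode extends TreeNode {
    private final double[] classProportions;   // class proportions observed at this leaf

    LeafNode(double[] classProportions) {
        this.classProportions = classProportions;
    }

    double[] distribution(Map<String, Double> instance, double weight) {
        double[] d = new double[classProportions.length];
        for (int c = 0; c < d.length; c++) {
            d[c] = weight * classProportions[c];
        }
        return d;
    }
}

class SplitNode extends TreeNode {
    private final String attribute;      // attribute tested at this node
    private final double threshold;      // values less than the threshold go left
    private final double leftFraction;   // share of training instances that went left
    private final TreeNode left, right;

    SplitNode(String attribute, double threshold, double leftFraction,
              TreeNode left, TreeNode right) {
        this.attribute = attribute;
        this.threshold = threshold;
        this.leftFraction = leftFraction;
        this.left = left;
        this.right = right;
    }

    double[] distribution(Map<String, Double> instance, double weight) {
        Double value = instance.get(attribute);
        if (value == null) {   // missing value: split the instance into weighted pieces
            double[] l = left.distribution(instance, weight * leftFraction);
            double[] r = right.distribution(instance, weight * (1.0 - leftFraction));
            for (int c = 0; c < l.length; c++) {
                l[c] += r[c];
            }
            return l;
        }
        return (value < threshold ? left : right).distribution(instance, weight);
    }
}

Calling distribution(instance, 1.0) on the root returns the recombined class
distribution, from which the most probable class can be read off.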
It is instructive and can even be entertaining to build a decision tree for a
dataset manually. To do so effectively, you need a good way of visualizing the
data so that you can decide which are likely to be the best attributes to test
and what an appropriate test might be. The Weka Explorer, described in Part II,
has a User Classifier facility that allows users to construct a decision tree
interactively. It presents you with a scatter plot of the data against two
attributes that you choose. When you find a pair of attributes that
discriminates the classes well, you can create a two-way split by drawing a
polygon around the appropriate data points on the scatter plot.
For example, in Figure 3.1(a) the user is operating on a dataset with three
classes, the iris dataset, and has found two attributes, petallength and
petalwidth, that do a good job of splitting up the classes. A rectangle has
been drawn manually to separate out one of the classes (Iris versicolor). Then
the user switches to the decision tree view in Figure 3.1(b) to see the tree so
far. The left-hand leaf node contains predominantly irises of one type (Iris
versicolor, contaminated by only two virginicas); the right-hand one contains
predominantly two types (Iris setosa and virginica, contaminated by only two
versicolors). The user will probably select the right-hand leaf and work on it
next, splitting it further with another rectangle, perhaps based on a different
pair of attributes (although, from Figure 3.1[a], these two look pretty good).

Figure 3.1 Constructing a decision tree interactively: (a) creating a
rectangular test involving petallength and petalwidth and (b) the resulting
(unfinished) decision tree.
Section 10.2 explains how to use Weka’s User Classifier facility. Most people
enjoy making the first few decisions but rapidly lose interest thereafter, and one
very useful option is to select a machine learning method and let it take over at
any point in the decision tree. Manual construction of decision trees is a good
way to get a feel for the tedious business of evaluating different combinations
of attributes to split on.
3.3 Classification rules
Classification rules are a popular alternative to decision trees, and we have
already seen examples for the weather (page 10), contact lens (page 13), iris
(page 15), and soybean (page 18) datasets. The antecedent, or precondition, of
a rule is a series of tests just like the tests at nodes in decision trees, and
the consequent, or conclusion, gives the class or classes that apply to
instances covered by that rule, or perhaps gives a probability distribution
over the classes. Generally, the preconditions are logically ANDed together,
and all the tests must succeed if the rule is to fire. However, in some rule
formulations the preconditions are general logical expressions rather than
simple conjunctions. We often think of the individual rules as being
effectively logically ORed together: if any one applies, the class (or
probability distribution) given in its conclusion is applied to the instance.
However, conflicts arise when several rules with different conclusions apply;
we will return to this shortly.
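A single rule of this kind can be represented as a list of tests plus a
conclusion, firing only when every test succeeds. The sketch below is
illustrative; the class name and the choice of string-valued attributes are
assumptions made for the example.

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class Rule {

    private final List<Predicate<Map<String, String>>> tests;   // ANDed preconditions
    private final String conclusion;                            // class assigned when the rule fires

    Rule(List<Predicate<Map<String, String>>> tests, String conclusion) {
        this.tests = tests;
        this.conclusion = conclusion;
    }

    // The rule fires only if the instance passes every test in the antecedent.
    boolean fires(Map<String, String> instance) {
        return tests.stream().allMatch(t -> t.test(instance));
    }

    String conclusion() {
        return conclusion;
    }
}

A rule set would then assign to an instance the conclusion of whichever rules
fire, which is exactly where the conflicts just mentioned can arise.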
It is easy to read a set of rules directly off a decision tree. One rule is
generated for each leaf. The antecedent of the rule includes a condition for
every node on the path from the root to that leaf, and the consequent of the
rule is the class assigned by the leaf. This procedure produces rules that are
unambiguous in that the order in which they are executed is irrelevant.
However, in general, rules that are read directly off a decision tree are far
more complex than necessary, and rules derived from trees are usually pruned
to remove redundant tests.
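The procedure itself is a straightforward traversal: walk from the root to each
leaf, collecting the test made on every branch along the way. The minimal tree
representation below is invented for the sketch.

import java.util.ArrayList;
import java.util.List;

class RuleExtractor {

    // Assumed tree structure: each node records the condition on the branch
    // leading into it (null at the root) and, at a leaf, the class it assigns.
    static class Node {
        String condition;                          // e.g. "outlook = sunny"
        String leafClass;                          // non-null only at leaves
        List<Node> children = new ArrayList<>();
    }

    // Emit one rule per leaf: the antecedent is the conjunction of the
    // conditions on the path from the root to that leaf.
    static void extract(Node node, List<String> pathTests, List<String> rules) {
        List<String> tests = new ArrayList<>(pathTests);
        if (node.condition != null) {
            tests.add(node.condition);
        }
        if (node.children.isEmpty()) {
            rules.add("IF " + String.join(" AND ", tests) + " THEN " + node.leafClass);
            return;
        }
        for (Node child : node.children) {
            extract(child, tests, rules);
        }
    }
}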
Because decision trees cannot easily express the disjunction implied among
the different rules in a set, transforming a general set of rules into a tree is not
quite so straightforward. A good illustration of this occurs when the rules have
the same structure but different attributes, like:
If a and b then x
If c and d then x