eliminate the correct description from the space. The
result is that the set of
remaining descriptions becomes empty. This situation is very likely to happen
if the examples contain any noise at all, which inevitably they do except in
artificial situations.
Another way of looking at generalization as search is to imagine it not as a
process of enumerating descriptions and striking out those that don’t apply but
as a kind of hill-climbing in description space to find the description that best
matches the set of examples according to some prespecified matching criterion.
This is the way that most practical machine learning methods work. However,
except in the most trivial cases, it is impractical to search the whole space
exhaustively; most practical algorithms involve heuristic search and cannot
guarantee to find the optimal description.
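The hill-climbing view can be made concrete with a short sketch. This is not any particular learning algorithm, just an illustration of the idea: `neighbours` and `score` are assumed to be supplied by the learner, where `neighbours(d)` yields descriptions one small edit away from `d` and `score(d)` measures how well `d` matches the training examples.

```python
def hill_climb(initial, neighbours, score, max_steps=1000):
    """Greedy hill-climbing in a space of concept descriptions.

    Repeatedly moves to the best-scoring neighbouring description.
    Returns a local optimum -- not necessarily the best description
    in the whole space, which is exactly why such heuristic search
    cannot guarantee optimality.
    """
    current = initial
    for _ in range(max_steps):
        best = max(neighbours(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current  # no neighbour improves the match: local optimum
        current = best
    return current
```

As a toy illustration, take descriptions to be integers with a matching criterion that peaks at 7: starting from 0 with neighbours `x - 1` and `x + 1`, the search climbs step by step and stops at 7.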
Bias
Viewing generalization as a search in a space of possible concepts makes it clear
that the most important decisions in a machine learning system are as follows:
The concept description language
The order in which the space is searched
The way that overfitting to the particular training data is avoided
These three properties are generally referred to as the
bias of the search and are
called language bias, search bias, and overfitting-avoidance bias. You bias the
learning scheme by choosing a language in which to express concepts, by search-
ing in a particular way for an acceptable description, and by deciding when the
concept has become so complex that it needs to be simplified.
Language bias
The most important question for language bias is whether the concept descrip-
tion language is universal or whether it imposes constraints on what concepts can
be learned. If you consider the set of all possible examples, a concept is really just
a division of it into subsets. In the weather example, if you were to enumerate all
possible weather conditions, the play concept is a subset of possible weather con-
ditions. A “universal” language is one that is capable of expressing every possible
subset of examples. In practice, the set of possible examples is generally huge, and
in this respect our perspective is a theoretical, not a practical, one.
If the concept description language permits statements involving logical or,
that is, disjunctions, then any subset can be represented. If the description lan-
guage is rule based, disjunction can be achieved by using separate rules. For
example, one possible concept representation is just to enumerate the examples:
If outlook = overcast and temperature = hot and humidity = high
   and windy = false then play = yes
If outlook = rainy and temperature = mild and humidity = high
   and windy = false then play = yes
If outlook = rainy and temperature = cool and humidity = normal
   and windy = false then play = yes
If outlook = overcast and temperature = cool and humidity = normal
   and windy = true then play = yes
. . .
If none of the above then play = no
This is not a particularly enlightening concept description: it simply records the
positive examples that have been observed and assumes that all the rest are neg-
ative. Each positive example is given its own rule, and the concept is the dis-
junction of the rules. Alternatively, you could imagine having individual rules
for each of the negative examples, too—an equally uninteresting concept. In
either case the concept description does not perform any generalization; it
simply records the original data.
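This rote-learning representation, one rule per positive example with a default of "no", amounts to the following trivial classifier (a sketch; the attribute dictionaries mirror the weather data):

```python
def learn_rote(examples):
    """Memorize the positive examples; everything else is negative.

    `examples` is a list of (attributes, label) pairs, where attributes
    is a dict such as {'outlook': 'overcast', 'windy': 'false'}.
    This performs no generalization at all: it simply records the
    training data as a disjunction of per-example rules.
    """
    positives = [frozenset(attrs.items())
                 for attrs, label in examples if label == 'yes']

    def classify(attrs):
        # "If none of the above then play = no"
        return 'yes' if frozenset(attrs.items()) in positives else 'no'
    return classify
```

Any example not seen verbatim during training is classified "no", which makes the lack of generalization plain.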
On the other hand, if disjunction is not allowed, some possible concepts (that
is, some sets of examples) may not be representable at all. In that case, a
machine learning scheme may simply be unable to achieve good performance.
Another kind of language bias is that obtained from knowledge of the par-
ticular domain being used. For example, it may be that some combinations of
attribute values can never happen. This would be the case if one attribute
implied another. We saw an example of this when considering the rules for the
soybean problem described on page 20. Then, it would be pointless to even con-
sider concepts that involved redundant or impossible combinations of attribute
values. Domain knowledge can be used to cut down the search space. Knowl-
edge is power: a little goes a long way, and even a small hint can reduce the
search space dramatically.
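One way to exploit such knowledge is to prune impossible attribute-value combinations before the search even begins. The sketch below assumes a hypothetical constraint function `impossible` encoding the domain knowledge (the example constraint used afterwards is invented for illustration, not taken from the weather or soybean data):

```python
from itertools import product

def candidate_conditions(attributes, impossible):
    """Enumerate attribute-value combinations, pruning those that
    domain knowledge rules out.

    `attributes` maps each attribute name to its possible values;
    `impossible(combo)` returns True for combinations that can never
    occur (for instance, because one attribute value implies another).
    Filtering here shrinks the space the learner must search.
    """
    names = sorted(attributes)
    for values in product(*(attributes[n] for n in names)):
        combo = dict(zip(names, values))
        if not impossible(combo):
            yield combo
```

With two binary attributes and a (made-up) constraint that overcast outlooks are never windy, one of the four combinations is pruned and only three candidates remain.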
Search bias
In realistic data mining problems, there are many alternative concept descrip-
tions that fit the data, and the problem is to find the “best” one according to
some criterion—usually simplicity. We use the term fit in a statistical sense; we
seek the best description that fits the data reasonably well. Moreover, it is often
computationally infeasible to search the whole space and guarantee that the
description found really is the best. Consequently, the search procedure is
heuristic, and no guarantees can be made about the optimality of the final result.
This leaves plenty of room for bias: different search heuristics bias the search in
different ways.
For example, a learning algorithm might adopt a “greedy” search for rules by
trying to find the best rule at each stage and adding it in to the rule set. However,
it may be that the best pair of rules is not just the two rules that are individu-
ally found to be the best. Or when building a decision tree, a commitment to