Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




eliminate the correct description from the space. The result is that the set of remaining descriptions becomes empty. This situation is very likely to happen if the examples contain any noise at all, which inevitably they do except in artificial situations.

Another way of looking at generalization as search is to imagine it not as a process of enumerating descriptions and striking out those that don't apply but as a kind of hill-climbing in description space to find the description that best matches the set of examples according to some prespecified matching criterion. This is the way that most practical machine learning methods work. However, except in the most trivial cases, it is impractical to search the whole space exhaustively; most practical algorithms involve heuristic search and cannot guarantee to find the optimal description.
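The hill-climbing view can be made concrete with a small sketch. Here a description is a set of attribute constraints, the matching criterion is training-set accuracy, and the search repeatedly moves to the best neighbouring description until no neighbour improves. The toy dataset, the two attributes, and the scoring rule are invented for this illustration; they are not from the book's weather data.

```python
# Hill-climbing in description space: a minimal, illustrative sketch.
# A "description" is a dict mapping attributes to required values;
# attributes absent from the dict are unconstrained ("don't care").

ATTRS = {"outlook": ["sunny", "overcast", "rainy"],
         "windy": [True, False]}

# Invented training examples: (attribute values, class).
DATA = [({"outlook": "overcast", "windy": False}, "yes"),
        ({"outlook": "overcast", "windy": True},  "yes"),
        ({"outlook": "rainy",    "windy": True},  "no"),
        ({"outlook": "sunny",    "windy": False}, "no")]

def matches(desc, example):
    """True if the example satisfies every constraint in the description."""
    return all(example.get(a) == v for a, v in desc.items())

def score(desc):
    """Matching criterion: number of correctly classified training examples,
    where the description predicts 'yes' exactly when it matches."""
    return sum((matches(desc, ex) and cls == "yes") or
               (not matches(desc, ex) and cls == "no")
               for ex, cls in DATA)

def neighbours(desc):
    """Descriptions reachable by adding, changing, or dropping one constraint."""
    for a, values in ATTRS.items():
        if a in desc:
            d = dict(desc)
            del d[a]
            yield d                      # drop the constraint on a
        for v in values:
            if desc.get(a) != v:
                yield {**desc, a: v}     # add or change the constraint on a

def hill_climb(desc):
    while True:
        best = max(neighbours(desc), key=score)
        if score(best) <= score(desc):
            return desc                  # local optimum: no neighbour improves
        desc = best

result = hill_climb({})   # start from the empty "always yes" description
```

On this toy data the climb stops at the description `outlook = overcast`, which classifies all four examples correctly. The sketch also shows the weakness the text warns about: the search only guarantees a local optimum under its particular neighbourhood, not the best description in the whole space.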



Bias

Viewing generalization as a search in a space of possible concepts makes it clear that the most important decisions in a machine learning system are as follows:

■ The concept description language
■ The order in which the space is searched
■ The way that overfitting to the particular training data is avoided



These three properties are generally referred to as the bias of the search and are called language bias, search bias, and overfitting-avoidance bias. You bias the learning scheme by choosing a language in which to express concepts, by searching in a particular way for an acceptable description, and by deciding when the concept has become so complex that it needs to be simplified.

Language bias

The most important question for language bias is whether the concept description language is universal or whether it imposes constraints on what concepts can be learned. If you consider the set of all possible examples, a concept is really just a division of it into subsets. In the weather example, if you were to enumerate all possible weather conditions, the play concept is a subset of possible weather conditions. A "universal" language is one that is capable of expressing every possible subset of examples. In practice, the set of possible examples is generally huge, and in this respect our perspective is a theoretical, not a practical, one.

If the concept description language permits statements involving logical or, that is, disjunctions, then any subset can be represented. If the description language is rule based, disjunction can be achieved by using separate rules. For example, one possible concept representation is just to enumerate the examples:

If outlook = overcast and temperature = hot and humidity = high
   and windy = false then play = yes
If outlook = rainy and temperature = mild and humidity = high
   and windy = false then play = yes
If outlook = rainy and temperature = cool and humidity = normal
   and windy = false then play = yes
If outlook = overcast and temperature = cool and humidity = normal
   and windy = true then play = yes
. . .
If none of the above then play = no

This is not a particularly enlightening concept description: it simply records the positive examples that have been observed and assumes that all the rest are negative. Each positive example is given its own rule, and the concept is the disjunction of the rules. Alternatively, you could imagine having individual rules for each of the negative examples, too: an equally uninteresting concept. In either case the concept description does not perform any generalization; it simply records the original data.
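A rote-learning "classifier" of this kind can be sketched in a few lines: each rule tests every attribute of one stored positive example, and the default class is no. The instances below follow the rules listed above, but the sketch itself is only an illustration of the idea, not an implementation from the book.

```python
# "Enumerate the positives" as a disjunction of one-rule-per-example tests.
# Each rule constrains every attribute, so the concept is simply the set
# of memorized positive examples; everything else defaults to "no".

positives = [
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": False},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": False},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": False},
    {"outlook": "overcast", "temperature": "cool", "humidity": "normal", "windy": True},
]

def classify(instance):
    """Predict 'yes' iff the instance exactly matches a memorized positive."""
    return "yes" if any(instance == p for p in positives) else "no"
```

The sketch makes the text's point tangible: a memorized example is classified correctly, but any unseen instance, however similar to a stored positive, falls through to the default — the description performs no generalization at all.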

On the other hand, if disjunction is not allowed, some possible concepts (sets of examples) may be impossible to represent at all. In that case, a machine learning scheme may simply be unable to achieve good performance.

Another kind of language bias is that obtained from knowledge of the particular domain being used. For example, it may be that some combinations of attribute values can never happen. This would be the case if one attribute implied another. We saw an example of this when considering the rules for the soybean problem described on page 20. Then, it would be pointless to even consider concepts that involved redundant or impossible combinations of attribute values. Domain knowledge can be used to cut down the search space. Knowledge is power: a little goes a long way, and even a small hint can reduce the search space dramatically.
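The effect of such a hint is easy to demonstrate: a known implication between attribute values rules out candidate descriptions before they are ever scored. The implication used below (overcast skies implying high humidity) is invented purely for this sketch; the point is the pruning mechanism, not the meteorology.

```python
# Pruning the description space with a (hypothetical) domain rule.
from itertools import product

outlooks = ["sunny", "overcast", "rainy", None]   # None = unconstrained
humidities = ["high", "normal", None]

# Invented domain knowledge: outlook = overcast implies humidity = high,
# so a description requiring overcast together with humidity = normal
# can never match a real example and need not be searched at all.
def possible(desc):
    outlook, humidity = desc
    return not (outlook == "overcast" and humidity == "normal")

all_descs = list(product(outlooks, humidities))
pruned = [d for d in all_descs if possible(d)]

print(len(all_descs), len(pruned))   # 12 candidate descriptions before, 11 after
```

Here a single implication removes only one of twelve candidates, but with many attributes the impossible combinations multiply, so even a small hint can eliminate a large fraction of the space before any search begins.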



Search bias

In realistic data mining problems, there are many alternative concept descriptions that fit the data, and the problem is to find the "best" one according to some criterion, usually simplicity. We use the term fit in a statistical sense; we seek the best description that fits the data reasonably well. Moreover, it is often computationally infeasible to search the whole space and guarantee that the description found really is the best. Consequently, the search procedure is heuristic, and no guarantees can be made about the optimality of the final result. This leaves plenty of room for bias: different search heuristics bias the search in different ways.

For example, a learning algorithm might adopt a "greedy" search for rules by trying to find the best rule at each stage and adding it in to the rule set. However, it may be that the best pair of rules is not just the two rules that are individually found to be the best. Or when building a decision tree, a commitment to



