4.3 Divide-and-Conquer: Constructing Decision Trees
Before we created any of the nascent tree structures in Figure 4.2, the training examples at the root comprised nine yes and five no nodes, corresponding to an information value of
\[
\operatorname{info}([9,5]) = 0.940 \text{ bits.}
\]
Thus the tree in Figure 4.2(a) is responsible for an information gain of
\[
\operatorname{gain}(\mathit{outlook}) = \operatorname{info}([9,5]) - \operatorname{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits,}
\]
which can be interpreted as the informational value of creating a branch on the
outlook attribute.
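To make the arithmetic concrete, here is a minimal Python sketch of these calculations; the helper names info and gain are our own, chosen simply to mirror the notation above.

```python
from math import log2

def info(counts):
    """Information value (entropy) of a class distribution, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(parent, children):
    """Information gain of a split: the parent's information value minus
    the size-weighted average information value of the daughter nodes."""
    total = sum(parent)
    return info(parent) - sum(sum(ch) / total * info(ch) for ch in children)

print(f"{info([9, 5]):.3f}")                            # 0.940
print(f"{gain([9, 5], [[2, 3], [4, 0], [3, 2]]):.3f}")  # 0.247
```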
The way forward is clear. We calculate the information gain for each attribute and choose the one that gains the most information to split on. In the situation of Figure 4.2,
\[
\begin{aligned}
\operatorname{gain}(\mathit{outlook}) &= 0.247 \text{ bits}\\
\operatorname{gain}(\mathit{temperature}) &= 0.029 \text{ bits}\\
\operatorname{gain}(\mathit{humidity}) &= 0.152 \text{ bits}\\
\operatorname{gain}(\mathit{windy}) &= 0.048 \text{ bits,}
\end{aligned}
\]
so we select outlook as the splitting attribute at the root of the tree. Hopefully
this accords with your intuition as the best one to select. It is the only choice
for which one daughter node is completely pure, and this gives it a considerable
advantage over the other attributes. Humidity is the next best choice because it
produces a larger daughter node that is almost completely pure.
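Reusing info and gain from the sketch above, selecting the splitting attribute is a straightforward maximization. The class-count tables below are read off the weather data; they reproduce the four gains just quoted.

```python
# Class counts [yes, no] in each daughter node of the candidate splits.
splits = {
    "outlook":     [[2, 3], [4, 0], [3, 2]],  # sunny, overcast, rainy
    "temperature": [[2, 2], [4, 2], [3, 1]],  # hot, mild, cool
    "humidity":    [[3, 4], [6, 1]],          # high, normal
    "windy":       [[6, 2], [3, 3]],          # false, true
}
for attr, children in splits.items():
    print(f"gain({attr}) = {gain([9, 5], children):.3f} bits")
best = max(splits, key=lambda a: gain([9, 5], splits[a]))  # -> "outlook"
```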
Then we continue, recursively. Figure 4.3 shows the possibilities for a further
branch at the node reached when outlook is sunny. Clearly, a further split on
outlook will produce nothing new, so we only consider the other three attributes.
The information gain for each turns out to be
\[
\begin{aligned}
\operatorname{gain}(\mathit{temperature}) &= 0.571 \text{ bits}\\
\operatorname{gain}(\mathit{humidity}) &= 0.971 \text{ bits}\\
\operatorname{gain}(\mathit{windy}) &= 0.020 \text{ bits,}
\end{aligned}
\]
so we select humidity as the splitting attribute at this point. Both daughter nodes of this split are pure, as Figure 4.3(b) shows, so there is no need to split them any further, and this branch is finished.
Continued application of the same idea leads to the decision tree of Figure
4.4 for the weather data. Ideally, the process terminates when all leaf nodes are
pure, that is, when they contain instances that all have the same classification.
However, it might not be possible to reach this happy situation because there is
nothing to stop the training set containing two examples with identical sets of
attributes but different classes. Consequently, we stop when the data cannot be
split any further.
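The complete recursive procedure can be captured in a short sketch. This is a minimal illustration under assumed data structures (each example a pair of an attribute dictionary and a class label; a tree either a class label or a pair of attribute name and branch dictionary), not an implementation from this book.

```python
from math import log2
from collections import Counter

def info_of(labels):
    """Information value (entropy), in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def build_tree(examples, attributes):
    """examples: list of (attribute_dict, class_label) pairs."""
    labels = [label for _, label in examples]
    # Stop when the node is pure, or when no attributes remain (identical
    # attribute values but different classes): leaf with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain_of(attr):
        groups = {}
        for ex, label in examples:
            groups.setdefault(ex[attr], []).append(label)
        split_info = sum(len(g) / len(labels) * info_of(g) for g in groups.values())
        return info_of(labels) - split_info

    best = max(attributes, key=gain_of)  # split on the largest information gain
    remaining = [a for a in attributes if a != best]
    branches = {}
    for ex, label in examples:
        branches.setdefault(ex[best], []).append((ex, label))
    return (best, {value: build_tree(subset, remaining)
                   for value, subset in branches.items()})
```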
Calculating information
Now it is time to explain how to calculate the information measure that is used
as a basis for evaluating different splits. We describe the basic idea in this section,
then in the next we examine a correction that is usually made to counter a bias
toward selecting splits on attributes with large numbers of possible values.
Before examining the detailed formula for calculating the amount of infor-
mation required to specify the class of an example given that it reaches a tree
node with a certain number of yes’s and no’s, consider first the kind of proper-
ties we would expect this quantity to have:
Figure 4.3 Expanded tree stumps for the weather data: beneath outlook = sunny, (a) a further split on temperature, (b) on humidity, and (c) on windy.
1. When the number of either yes’s or no’s is zero, the information is
zero.
2. When the number of yes’s and no’s is equal, the information reaches a
maximum.
Moreover, the measure should be applicable to multiclass situations, not just to
two-class ones.
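As a quick numerical check of both properties, and of the multiclass case, here is a small illustrative sketch assuming the entropy formula that the measure turns out to be (the text develops it next):

```python
from math import log2

def info(counts):
    """Information value (entropy) of a class distribution, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(info([0, 4]))     # property 1: a pure node yields 0.0 bits
print(info([2, 2]))     # property 2: an even split yields the maximum, 1.0 bit
print(info([2, 3, 4]))  # multiclass distributions are handled the same way
```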
The information measure relates to the amount of information obtained by
making a decision, and a more subtle property of information can be derived
by considering the nature of decisions. Decisions can be made in a single stage,
or they can be made in several stages, and the amount of information involved
is the same in both cases. For example, the decision involved in
\[
\operatorname{info}([2,3,4])
\]
can be made in two stages. First decide whether it's the first case or one of the other two cases:
\[
\operatorname{info}([2,7])
\]
and then decide which of the other two cases it is:
\[
\operatorname{info}([3,4]).
\]
In some cases the second decision will not need to be made, namely, when the decision turns out to be the first one. Taking this into account leads to the equation
\[
\operatorname{info}([2,3,4]) = \operatorname{info}([2,7]) + \tfrac{7}{9} \times \operatorname{info}([3,4]).
\]
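Continuing the check above (reusing the info helper), the identity can be verified numerically:

```python
print(info([2, 3, 4]))                      # 1.5305 bits, deciding in one stage
print(info([2, 7]) + 7 / 9 * info([3, 4]))  # 1.5305 bits, deciding in two stages
```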
Figure 4.4 Decision tree for the weather data: outlook = sunny leads to a humidity test (high → no, normal → yes), outlook = overcast leads directly to yes, and outlook = rainy leads to a windy test (false → yes, true → no).