Data Mining: Practical Machine Learning Tools and Techniques, Second Edition




4.3 Divide-and-Conquer: Constructing Decision Trees

Before we created any of the nascent tree structures in Figure 4.2, the training examples at the root comprised nine yes and five no nodes, corresponding to an information value of

info([9,5]) = 0.940 bits.

Thus the tree in Figure 4.2(a) is responsible for an information gain of

gain(outlook) = info([9,5]) − info([2,3],[4,0],[3,2]) = 0.940 − 0.693 = 0.247 bits,

which can be interpreted as the informational value of creating a branch on the outlook attribute.

The way forward is clear. We calculate the information gain for each attribute and choose the one that gains the most information to split on. In the situation of Figure 4.2,

gain(outlook) = 0.247 bits
gain(temperature) = 0.029 bits
gain(humidity) = 0.152 bits
gain(windy) = 0.048 bits,

so we select outlook as the splitting attribute at the root of the tree. Hopefully this accords with your intuition as the best one to select. It is the only choice for which one daughter node is completely pure, and this gives it a considerable advantage over the other attributes. Humidity is the next best choice because it produces a larger daughter node that is almost completely pure.

Then we continue, recursively. Figure 4.3 shows the possibilities for a further branch at the node reached when outlook is sunny. Clearly, a further split on outlook will produce nothing new, so we only consider the other three attributes. The information gain for each turns out to be

gain(temperature) = 0.571 bits
gain(humidity) = 0.971 bits
gain(windy) = 0.020 bits,

so we select humidity as the splitting attribute at this point. There is no need to split these nodes any further, so this branch is finished.

Continued application of the same idea leads to the decision tree of Figure 4.4 for the weather data. Ideally, the process terminates when all leaf nodes are pure, that is, when they contain instances that all have the same classification. However, it might not be possible to reach this happy situation because there is nothing to stop the training set containing two examples with identical sets of attributes but different classes. Consequently, we stop when the data cannot be split any further.
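To make the arithmetic concrete, here is a minimal Python sketch (not the book's Weka implementation) that reproduces these numbers. It assumes info(...) is the usual entropy of the class distribution and hard-codes the yes/no counts read off the tree stumps of Figure 4.2; the names info, gain, and splits are illustrative only:

    from math import log2

    def info(counts):
        # Information value (entropy, in bits) of a class distribution
        # given as class counts, e.g. [9, 5] for nine yes and five no.
        total = sum(counts)
        return sum(-c / total * log2(c / total) for c in counts if c > 0)

    def gain(parent, branches):
        # Information gain of a split: info at the node minus the average
        # info of the daughter nodes, weighted by how many instances reach each.
        total = sum(parent)
        remainder = sum(sum(b) / total * info(b) for b in branches)
        return info(parent) - remainder

    # [yes, no] counts down each branch of the four candidate splits at the
    # root of the weather data, as shown in the tree stumps of Figure 4.2.
    splits = {
        "outlook":     [[2, 3], [4, 0], [3, 2]],  # sunny, overcast, rainy
        "temperature": [[2, 2], [4, 2], [3, 1]],  # hot, mild, cool
        "humidity":    [[3, 4], [6, 1]],          # high, normal
        "windy":       [[6, 2], [3, 3]],          # false, true
    }

    print(f"info([9,5]) = {info([9, 5]):.3f} bits")            # 0.940
    for name, branches in splits.items():
        print(f"gain({name}) = {gain([9, 5], branches):.3f} bits")
    # gain(outlook) = 0.247, gain(temperature) = 0.029,
    # gain(humidity) = 0.152, gain(windy) = 0.048

Plugging in the class counts for the five outlook = sunny instances instead (for example, gain([2, 3], [[0, 2], [1, 1], [1, 0]]) for temperature) reproduces the 0.571, 0.971, and 0.020 bits quoted for the next level of the tree.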



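Taken together, the steps above amount to a simple recursive procedure: compute the gain of every remaining attribute, split on the best one, and recurse into each subset until a node is pure or cannot be split any further. The sketch below is illustrative only and is not the book's code; it assumes instances are (attribute-value dictionary, class label) pairs, and all names are hypothetical:

    from collections import Counter
    from math import log2

    def info(counts):
        # Entropy (bits) of a class distribution given as counts.
        total = sum(counts)
        return sum(-c / total * log2(c / total) for c in counts if c > 0)

    def class_counts(instances):
        # Instances are (attribute-value dict, class label) pairs.
        return list(Counter(cls for _, cls in instances).values())

    def split(instances, attribute):
        # Group the instances by their value for the given attribute.
        subsets = {}
        for values, cls in instances:
            subsets.setdefault(values[attribute], []).append((values, cls))
        return subsets

    def gain(instances, attribute):
        # Information gain of splitting these instances on the attribute.
        total = len(instances)
        subsets = split(instances, attribute)
        remainder = sum(len(s) / total * info(class_counts(s))
                        for s in subsets.values())
        return info(class_counts(instances)) - remainder

    def build_tree(instances, attributes):
        classes = [cls for _, cls in instances]
        # Stop when the node is pure or there is nothing left to split on;
        # label the leaf with the majority class.
        if len(set(classes)) == 1 or not attributes:
            return Counter(classes).most_common(1)[0][0]
        # Otherwise split on the attribute with the largest information gain
        # and recurse into each daughter node.
        best = max(attributes, key=lambda a: gain(instances, a))
        rest = [a for a in attributes if a != best]
        return {best: {value: build_tree(subset, rest)
                       for value, subset in split(instances, best).items()}}

    # With the 14 weather instances loaded as (dict, class) pairs, calling
    # build_tree(weather_data, ["outlook", "temperature", "humidity", "windy"])
    # would be expected to produce the tree of Figure 4.4.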

Calculating information

Now it is time to explain how to calculate the information measure that is used as a basis for evaluating different splits. We describe the basic idea in this section, then in the next we examine a correction that is usually made to counter a bias toward selecting splits on attributes with large numbers of possible values.

Before examining the detailed formula for calculating the amount of information required to specify the class of an example given that it reaches a tree node with a certain number of yes’s and no’s, consider first the kind of properties we would expect this quantity to have:



...

...


no

no

yes



sunny

hot


mild

cool


outlook

temperature

yes

no

(a)



...

...


no

no

no



yes

yes


sunny

high


normal

outlook


humidity

(b)


...

...


yes

yes


no

no

yes



no

sunny


false

true


outlook

windy


(c)

Figure 4.3 Expanded tree stumps for the weather data.






1. When the number of either yes’s or no’s is zero, the information is zero.

2. When the number of yes’s and no’s is equal, the information reaches a maximum.

Moreover, the measure should be applicable to multiclass situations, not just to two-class ones.
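Both properties, and the multiclass requirement, are easy to check numerically if we take info(...) to be, as the quoted values suggest, the entropy of the class distribution. The snippet below is a quick illustration, not part of the book:

    from math import log2

    def info(counts):
        # Entropy (bits) of a class distribution given as counts.
        total = sum(counts)
        return sum(-c / total * log2(c / total) for c in counts if c > 0)

    print(info([4, 0]))     # 0.0   -- property 1: a pure node needs no information
    print(info([7, 7]))     # 1.0   -- property 2: an even split needs a full bit
    print(info([2, 3, 4]))  # 1.53... -- the same formula handles three classes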

The information measure relates to the amount of information obtained by making a decision, and a more subtle property of information can be derived by considering the nature of decisions. Decisions can be made in a single stage, or they can be made in several stages, and the amount of information involved is the same in both cases. For example, the decision involved in

info([2,3,4])

can be made in two stages. First decide whether it’s the first case or one of the other two cases:

info([2,7])

and then decide which of the other two cases it is:

info([3,4])

In some cases the second decision will not need to be made, namely, when the decision turns out to be the first one. Taking this into account leads to the equation

info([2,3,4]) = info([2,7]) + (7/9) × info([3,4]).
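This identity can be verified numerically with the same entropy-based info(...) used above; the check below is illustrative only:

    from math import log2

    def info(counts):
        # Entropy (bits) of a class distribution given as counts.
        total = sum(counts)
        return sum(-c / total * log2(c / total) for c in counts if c > 0)

    one_stage = info([2, 3, 4])                        # decide among all three at once
    two_stage = info([2, 7]) + (7 / 9) * info([3, 4])  # 2-vs-rest, then 3-vs-4
    print(f"{one_stage:.3f}  {two_stage:.3f}")         # both print 1.530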

[Figure 4.4 Decision tree for the weather data: outlook at the root with branches sunny, overcast, and rainy; the sunny branch splits on humidity (high → no, normal → yes), the overcast branch is a pure yes leaf, and the rainy branch splits on windy (false → yes, true → no).]




