through the dataset for each different size of item set. Sometimes the dataset is
too large to read into main memory and must be kept on disk; then it may be
worth reducing the number of passes by checking item sets of two consecutive
sizes in one go. For example, once sets with two items have been generated, all
sets of three items could be generated from them before going through the
instance set to count how often each candidate set actually occurs. More three-item
sets than necessary would be considered, but the number of passes through the
entire dataset would be reduced.
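That candidate-generation step can be sketched as follows; the item names are invented for illustration, and a real implementation would also prune candidates containing any non-frequent two-item subset:

```python
from itertools import combinations

def candidate_three_item_sets(two_item_sets):
    """Join pairs of frequent two-item sets to propose three-item
    candidates.  Some will turn out to have too little coverage, but
    they can all be counted in a single pass over the instances."""
    two = [frozenset(s) for s in two_item_sets]
    candidates = set()
    for a, b in combinations(two, 2):
        union = a | b
        # Two two-item sets yield a three-item set only if they overlap
        # in exactly one item.
        if len(union) == 3:
            candidates.add(union)
    return candidates

pairs = [{"bread", "milk"}, {"bread", "eggs"}, {"milk", "eggs"}]
cands = candidate_three_item_sets(pairs)
# cands contains the single set {bread, milk, eggs}
```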
In practice, the amount of computation needed to generate association rules
depends critically on the minimum coverage specified. The accuracy has less
influence because it does not affect the number of passes that we must make
through the dataset. In many situations we will want to obtain a certain num-
ber of rules—say 50—with the greatest possible coverage at a prespecified
minimum accuracy level. One way to do this is to begin with a rather high
minimum coverage and then reduce it successively, reexecuting the entire
rule-finding algorithm for each coverage value until the desired number of
rules has been generated.
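In outline, this repeated reexecution might look as follows; find_rules and its thresholds are hypothetical stand-ins for whatever rule finder is actually used:

```python
def rules_with_best_coverage(find_rules, wanted=50, min_accuracy=0.8,
                             start_coverage=0.5, step=0.05):
    """Repeatedly lower the minimum coverage until enough rules appear.

    find_rules(min_coverage, min_accuracy) is assumed to return the
    list of association rules meeting both thresholds; the whole
    rule-finding algorithm is re-run for each coverage value.
    """
    coverage = start_coverage
    while coverage > 0:
        rules = find_rules(coverage, min_accuracy)
        if len(rules) >= wanted:
            return rules, coverage
        coverage -= step
    return find_rules(0.0, min_accuracy), 0.0

# Toy stand-in: pretend that lowering the coverage reveals more rules.
fake = lambda cov, acc: ["rule"] * int((1.0 - cov) * 100)
rules, cov = rules_with_best_coverage(fake, wanted=50)
```

The cost of the rerun-from-scratch approach is that all the early passes are repeated for each coverage value, which is why the minimum coverage dominates the running time.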
The tabular input format that we use throughout this book, and in particu-
lar a standard ARFF file based on it, is very inefficient for many association-rule
problems. Association rules are often used when attributes are binary—either
present or absent—and most of the attribute values associated with a given
instance are absent. This is a case for the sparse data representation described
in Section 2.4; the same algorithm for finding association rules applies.
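As a sketch of that representation, a sparse file in ARFF's brace notation lists only the attributes that are present; the item names here are invented. Each entry gives a zero-based attribute index and its value, and omitted attributes default to the first value in their definition:

```
@relation market-basket
@attribute bread {f, t}
@attribute milk  {f, t}
@attribute eggs  {f, t}
@attribute beer  {f, t}
@data
{0 t, 1 t}
{2 t, 3 t}
{1 t}
```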
4.6 Linear models
The methods we have been looking at for decision trees and rules work most
naturally with nominal attributes. They can be extended to numeric attributes
either by incorporating numeric-value tests directly into the decision tree or rule
induction scheme, or by prediscretizing numeric attributes into nominal ones.
We will see how in Chapters 6 and 7, respectively. However, there are methods
that work most naturally with numeric attributes. We look at simple ones here,
ones that form components of more complex learning methods, which we will
examine later.
Numeric prediction: Linear regression
When the outcome, or class, is numeric, and all the attributes are numeric, linear
regression is a natural technique to consider. This is a staple method in statis-
tics. The idea is to express the class as a linear combination of the attributes,
with predetermined weights:
x = w_0 + w_1 a_1 + w_2 a_2 + . . . + w_k a_k
where x is the class; a_1, a_2, . . ., a_k are the attribute values; and w_0, w_1,
. . ., w_k are weights.
The weights are calculated from the training data. Here the notation gets a
little heavy, because we need a way of expressing the attribute values for each
training instance. The first instance will have a class, say x^{(1)}, and attribute
values a_1^{(1)}, a_2^{(1)}, . . ., a_k^{(1)}, where the superscript denotes that it is the
first example. Moreover, it is notationally convenient to assume an extra
attribute a_0 whose value is always 1.
The predicted value for the first instance’s class can be written as

w_0 a_0^{(1)} + w_1 a_1^{(1)} + w_2 a_2^{(1)} + . . . + w_k a_k^{(1)} = Σ_{j=0}^{k} w_j a_j^{(1)}.
This is the predicted, not the actual, value for the first instance’s class. Of inter-
est is the difference between the predicted and the actual values. The method of
linear regression is to choose the coefficients w_j (there are k + 1 of them) to
minimize the sum of the squares of these differences over all the training
instances. Suppose there are n training instances; denote the ith one with a
superscript (i). Then the sum of the squares of the differences is

Σ_{i=1}^{n} ( x^{(i)} - Σ_{j=0}^{k} w_j a_j^{(i)} )^2
where the expression inside the parentheses is the difference between the ith
instance’s actual class and its predicted class. This sum of squares is what we
have to minimize by choosing the coefficients appropriately.
This is all starting to look rather formidable. However, the minimization
technique is straightforward if you have the appropriate math background.
Suffice it to say that given enough examples—roughly speaking, more examples
than attributes—choosing weights to minimize the sum of the squared differ-
ences is really not difficult. It does involve a matrix inversion operation, but this
is readily available as prepackaged software.
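As an illustration of the calculation, here is a minimal sketch using NumPy's linear algebra routines; the data are invented, and for ill-conditioned problems a library routine such as numpy.linalg.lstsq is preferable to forming the normal equations explicitly:

```python
import numpy as np

def linear_regression_weights(A, x):
    """Least-squares weights for class values x given attribute matrix A.

    A column of 1s is prepended to play the role of the extra attribute
    a_0 = 1, so the returned vector is (w_0, w_1, ..., w_k).
    """
    A = np.column_stack([np.ones(len(A)), np.asarray(A, dtype=float)])
    # Normal equations: (A^T A) w = A^T x.  Solving them involves the
    # matrix inversion mentioned in the text; np.linalg.solve performs
    # the equivalent factorization.
    return np.linalg.solve(A.T @ A, A.T @ np.asarray(x, dtype=float))

# Tiny example: data generated from the exactly linear rule x = 1 + 2*a1.
A = [[0.0], [1.0], [2.0], [3.0]]
x = [1.0, 3.0, 5.0, 7.0]
w = linear_regression_weights(A, x)          # close to [1.0, 2.0]

# Predicting a new instance with a1 = 4 means taking the weighted sum.
pred = float(np.dot([1.0, 4.0], w))
```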
Once the math has been accomplished, the result is a set of numeric weights,
based on the training data, which we can use to predict the class of new
instances. We saw an example of this when looking at the CPU performance
data, and the actual numeric weights are given in Figure 3.7(a). This formula
can be used to predict the CPU performance of new test instances.
Linear regression is an excellent, simple method for numeric prediction, and
it has been widely used in statistical applications for decades. Of course, linear
models suffer from the disadvantage of, well, linearity. If the data exhibits a non-
linear dependency, the best-fitting straight line will be found, where “best” is
interpreted as the least mean-squared difference. This line may not fit very well.
CHAPTER 4 | ALGORITHMS: THE BASIC METHODS