of some of the other projects mentioned in Section 1.3 (including the figures
of dollars saved and related literature references) appear at the Web sites of the
Alberta Ingenuity Centre for Machine Learning and MLnet, a European
network for machine learning.
The book Classification and regression trees mentioned in Section 1.4 is by
Breiman et al. (1984), and the independently derived but similar scheme of
Quinlan was described in a series of papers that eventually led to a book
(Quinlan 1993).
The first book on data mining appeared in 1991 (Piatetsky-Shapiro and
Frawley 1991)—a collection of papers presented at a workshop on knowledge
discovery in databases in the late 1980s. Another book from the same stable, based
on a 1994 workshop, has since appeared (Fayyad et al. 1996). There followed a
rash of business-oriented books on data mining, focusing mainly on how it can
be put into practice, with only rather superficial descriptions of the technology
that underlies the methods used. They are valuable
sources of applications and inspiration. For example, Adriaans and Zantinge
(1996) from Syllogic, a European systems and database consultancy, provide an
early introduction to data mining. Berry and Linoff (1997), from a Pennsylva-
nia-based company specializing in data warehousing and data mining, give an
excellent and example-studded review of data mining techniques for market-
ing, sales, and customer support. Cabena et al. (1998), written by people from
five international IBM laboratories, overview the data mining process with
many examples of real-world applications. Dhar and Stein (1997) give a busi-
ness perspective on data mining and include broad-brush, popularized reviews
of many of the technologies involved. Groth (1998), working for a provider of
data mining software, gives a brief introduction to data mining and then a
fairly extensive review of data mining software products; the book includes a
CD-ROM containing a demo version of his company’s product. Weiss and
Indurkhya (1998) look at a wide variety of statistical techniques for making
predictions from what they call “big data.” Han and Kamber (2001) cover data
mining from a database perspective, focusing on the discovery of knowledge in
large corporate databases. Finally, Hand et al. (2001) produced an interdiscipli-
nary book on data mining from an international group of authors who are well
respected in the field.
Books on machine learning, on the other hand, tend to be academic texts
suited for use in university courses rather than practical guides. Mitchell (1997)
wrote an excellent book that covers many techniques of machine learning,
including some—notably genetic algorithms and reinforcement learning—that
are not covered here. Langley (1996) offers another good text. Although the pre-
viously mentioned book by Quinlan (1993) concentrates on a particular learn-
ing algorithm, C4.5, which we will cover in detail in Chapters 4 and 6, it is a
good introduction to some of the problems and techniques of machine learning.
An excellent book on machine learning from a statistical perspective is by
Hastie et al. (2001). This is quite a theoretically oriented work, and is
beautifully produced with apt and telling illustrations.
Pattern recognition is a topic that is closely related to machine learning, and
many of the same techniques apply. Duda et al. (2001) offer the second edition
of a classic and successful book on pattern recognition (Duda and Hart 1973).
Ripley (1996) and Bishop (1995) describe the use of neural networks for pattern
recognition. Data mining with neural networks is the subject of a book by Bigus
(1996) of IBM, which features the IBM Neural Network Utility Product that he
developed.
There is a great deal of current interest in support vector machines, which
we return to in Chapter 6. Cristianini and Shawe-Taylor (2000) give a nice intro-
duction, and a follow-up work generalizes this to cover additional algorithms,
kernels, and solutions with applications to pattern discovery problems in fields
such as bioinformatics, text analysis, and image analysis (Shawe-Taylor and
Cristianini 2004). Schölkopf and Smola (2002), two young researchers who did
their PhD research in this rapidly developing area, provide a comprehensive
introduction to support vector machines and related kernel methods.
chapter 2
Input: Concepts, Instances, and Attributes

Before delving into the question of how machine learning methods operate, we
begin by looking at the different forms the input might take and, in the next
chapter, the different kinds of output that might be produced. With any soft-
ware system, understanding what the inputs and outputs are is far more impor-
tant than knowing what goes on in between, and machine learning is no
exception.
The input takes the form of concepts, instances, and attributes. We call the
thing that is to be learned a concept description. The idea of a concept, like
the very idea of learning in the first place, is hard to pin down precisely, and
we won’t spend time philosophizing about just what it is and isn’t. In a
sense, what we are trying to find—the result of the learning process—is a
description of the concept that is intelligible in that it can be understood, dis-
cussed, and disputed, and operational in that it can be applied to actual exam-
ples. The next section explains some distinctions among different kinds of
learning problems, distinctions that are very concrete and very important in
practical data mining.
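To make these terms concrete, here is a minimal sketch in Python, using a hypothetical toy dataset of our own devising (not an example from the text): each instance is one row, its attributes are the named fields, and the "play" label stands in for the concept whose description is to be learned.

```python
# Hypothetical toy data: each dict is one instance; its keys are attributes.
# The "play" field is the concept to be learned from the other attributes.
instances = [
    {"outlook": "sunny",    "temperature": "hot",  "windy": False, "play": "no"},
    {"outlook": "overcast", "temperature": "mild", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "windy": True,  "play": "no"},
]

# The input vocabulary is everything except the concept label.
attributes = [name for name in instances[0] if name != "play"]

print(attributes)       # -> ['outlook', 'temperature', 'windy']
print(len(instances))   # -> 3 training examples
```

A learned concept description would then be operational in exactly the sense above: a rule or model that, given the attribute values of a new instance, produces a value for "play".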