of some of the other projects mentioned in Section 1.3 (including the figures
of dollars saved and related literature references) appear at the Web sites of the
Alberta Ingenuity Centre for Machine Learning and MLnet, a European
network for machine learning.
The book Classification and regression trees mentioned in Section 1.4 is by
Breiman et al. (1984), and the independently derived but similar scheme of
Quinlan was described in a series of papers that eventually led to a book
(Quinlan 1993).
The first book on data mining appeared in 1991 (Piatetsky-Shapiro and
Frawley 1991)—a collection of papers presented at a workshop on knowledge
discovery in databases in the late 1980s. Another book from the same stable, based
on a 1994 workshop, has since appeared (Fayyad et al. 1996). There followed a
rash of business-oriented books on data mining, focusing mainly on how it can
be put into practice, with only rather superficial descriptions of the technology
that underlies the methods used. They are valuable
sources of applications and inspiration. For example, Adriaans and Zantinge
(1996) from Syllogic, a European systems and database consultancy, provide an
early introduction to data mining. Berry and Linoff (1997), from a Pennsylva-
nia-based company specializing in data warehousing and data mining, give an
excellent and example-studded review of data mining techniques for market-
ing, sales, and customer support. Cabena et al. (1998), written by people from
five international IBM laboratories, overview the data mining process with
many examples of real-world applications. Dhar and Stein (1997) give a busi-
ness perspective on data mining and include broad-brush, popularized reviews
of many of the technologies involved. Groth (1998), working for a provider of
data mining software, gives a brief introduction to data mining and then a
fairly extensive review of data mining software products; the book includes a
CD-ROM containing a demo version of his company’s product. Weiss and
Indurkhya (1998) look at a wide variety of statistical techniques for making
predictions from what they call “big data.” Han and Kamber (2001) cover data
mining from a database perspective, focusing on the discovery of knowledge in
large corporate databases. Finally, Hand et al. (2001) produced an interdiscipli-
nary book on data mining from an international group of authors who are well
respected in the field.
Books on machine learning, on the other hand, tend to be academic texts
suited for use in university courses rather than practical guides. Mitchell (1997)
wrote an excellent book that covers many techniques of machine learning,
including some—notably genetic algorithms and reinforcement learning—that
are not covered here. Langley (1996) offers another good text. Although the pre-
viously mentioned book by Quinlan (1993) concentrates on a particular learn-
ing algorithm, C4.5, which we will cover in detail in Chapters 4 and 6, it is a
good introduction to some of the problems and techniques of machine learning.
An excellent book on machine learning from a statistical perspective is by
Hastie et al. (2001). This is quite a theoretically oriented work, and is
beautifully produced with apt and telling illustrations.
Pattern recognition is a topic that is closely related to machine learning, and
many of the same techniques apply. Duda et al. (2001) offer the second edition
of a classic and successful book on pattern recognition (Duda and Hart 1973).
Ripley (1996) and Bishop (1995) describe the use of neural networks for pattern
recognition. Data mining with neural networks is the subject of a book by Bigus
(1996) of IBM, which features the IBM Neural Network Utility Product that he
developed.
There is a great deal of current interest in support vector machines, which
we return to in Chapter 6. Cristianini and Shawe-Taylor (2000) give a nice intro-
duction, and a follow-up work generalizes this to cover additional algorithms,
kernels, and solutions with applications to pattern discovery problems in fields
such as bioinformatics, text analysis, and image analysis (Shawe-Taylor and
Cristianini 2004). Schölkopf and Smola (2002), two young researchers who did
their PhD research in this rapidly developing area, provide a comprehensive
introduction to support vector machines and related kernel methods.
chapter 2
Input: Concepts, Instances, and Attributes

Before delving into the question of how machine learning methods operate, we
begin by looking at the different forms the input might take and, in the next
chapter, the different kinds of output that might be produced. With any soft-
ware system, understanding what the inputs and outputs are is far more impor-
tant than knowing what goes on in between, and machine learning is no
exception.
The input takes the form of concepts, instances, and attributes. We call the
thing that is to be learned a concept description. The idea of a concept, like
the very idea of learning in the first place, is hard to pin down precisely, and
we won’t spend time philosophizing about just what it is and isn’t. In a
sense, what we are trying to find—the result of the learning process—is a
description of the concept that is intelligible in that it can be understood, dis-
cussed, and disputed, and operational in that it can be applied to actual exam-
ples. The next section explains some distinctions among different kinds of
learning problems, distinctions that are very concrete and very important in
practical data mining.
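To make these terms concrete, here is a minimal sketch in Python, using a hypothetical toy dataset of our own devising (not an example from the text): each instance is one row, its attributes are the named fields, and the "play" label stands in for the concept whose description is to be learned.

```python
# Hypothetical toy data: each dict is one instance; its keys are attributes.
# The "play" field is the concept to be learned from the other attributes.
instances = [
    {"outlook": "sunny",    "temperature": "hot",  "windy": False, "play": "no"},
    {"outlook": "overcast", "temperature": "mild", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "windy": True,  "play": "no"},
]

# The input vocabulary is everything except the concept label.
attributes = [name for name in instances[0] if name != "play"]

print(attributes)       # -> ['outlook', 'temperature', 'windy']
print(len(instances))   # -> 3 training examples
```

A learned concept description would then be operational in exactly the sense above: a rule or model that, given the attribute values of a new instance, produces a value for "play".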