Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

sending offers only to those likely to want the product. Machine learning can help companies to find the targets.

Other applications

There are countless other applications of machine learning. We briefly mention a few more areas to illustrate the breadth of what has been done.

Sophisticated manufacturing processes often involve tweaking control parameters. Separating crude oil from natural gas is an essential prerequisite to oil refinement, and controlling the separation process is a tricky job. British Petroleum used machine learning to create rules for setting the parameters. Setting them now takes just 10 minutes, whereas human experts previously took more than a day. Westinghouse faced problems in its process for manufacturing nuclear fuel pellets and used machine learning to create rules to control the process. This was reported to save the company more than $10 million per year (in 1984). The Tennessee printing company R.R. Donnelley applied the same idea to control rotogravure printing presses, reducing the number of artifacts caused by inappropriate parameter settings from more than 500 each year to fewer than 30.

In the realm of customer support and service, we have already described adjudicating loans, and marketing and sales applications. Another example arises when a customer reports a telephone problem and the company must decide what kind of technician to assign to the job. An expert system developed by Bell Atlantic in 1991 to make this decision was replaced in 1999 by a set of rules learned using machine learning, which saved more than $10 million per year by making fewer incorrect decisions.

There are many scientific applications. In biology, machine learning is used to help identify the thousands of genes within each new genome. In biomedicine, it is used to predict drug activity by analyzing not just the chemical properties of drugs but also their three-dimensional structure. This accelerates drug discovery and reduces its cost. In astronomy, machine learning has been used to develop a fully automatic cataloguing system for celestial objects that are too faint to be seen by visual inspection. In chemistry, it has been used to predict the structure of certain organic compounds from magnetic resonance spectra. In all these applications, machine learning techniques have attained levels of performance (or should we say skill?) that rival or surpass those of human experts.

Automation is especially welcome in situations involving continuous monitoring, a job that is time consuming and exceptionally tedious for humans. Ecological applications include the oil spill monitoring described earlier. Some other applications are rather less consequential: for example, machine learning is being used to predict preferences for TV programs based on past choices and advise viewers about the available channels. Still others may save lives. Intensive care patients may be monitored to detect changes in variables that cannot be explained by circadian rhythm, medication, and so on, raising an alarm when appropriate. Finally, in a world that relies on vulnerable networked computer systems and is increasingly concerned about cybersecurity, machine learning is used to detect intrusion by recognizing unusual patterns of operation.
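To make "recognizing unusual patterns" concrete, here is a minimal sketch in Python. It assumes a very simple scheme that the text does not describe: a reading is flagged when it lies more than a chosen number of standard deviations from the mean of a baseline sample. The function name `flag_unusual`, the threshold of 3.0, and the sample values are all hypothetical.

```python
from statistics import mean, stdev

def flag_unusual(baseline, readings, threshold=3.0):
    """Return the indices of readings that deviate from the baseline mean
    by more than `threshold` baseline standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline: anything different from it counts as unusual.
        return [i for i, x in enumerate(readings) if x != mu]
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]

# Hypothetical monitored variable: a stable baseline, then new readings.
baseline = [72, 75, 71, 74, 73, 76, 72, 74]
readings = [73, 75, 120, 74, 40]
alarms = flag_unusual(baseline, readings)
print(alarms)
```

A real monitoring system would also have to account for explainable variation, such as the circadian rhythm and medication effects mentioned above, which this sketch ignores.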

1.4 Machine learning and statistics

What's the difference between machine learning and statistics? Cynics, looking wryly at the explosion of commercial interest (and hype) in this area, equate data mining to statistics plus marketing. In truth, you should not look for a dividing line between machine learning and statistics because there is a continuum (and a multidimensional one at that) of data analysis techniques. Some derive from the skills taught in standard statistics courses, and others are more closely associated with the kind of machine learning that has arisen out of computer science. Historically, the two sides have had rather different traditions. If forced to point to a single difference of emphasis, it might be that statistics has been more concerned with testing hypotheses, whereas machine learning has been more concerned with formulating the process of generalization as a search through possible hypotheses. But this is a gross oversimplification: statistics is far more than hypothesis testing, and many machine learning techniques do not involve any searching at all.
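The idea of generalization as a search through possible hypotheses can be illustrated with a toy sketch in Python. It assumes a deliberately tiny hypothesis space that is not taken from the text: rules of the form "predict positive when x >= t", with one candidate threshold t per observed value. The function name and the data are hypothetical.

```python
def search_threshold_rule(examples):
    """Exhaustively search rules of the form 'x >= t means positive' and
    return the threshold with the fewest training errors, plus that count."""
    candidates = sorted({x for x, _ in examples})
    best_t, best_errors = None, len(examples) + 1
    for t in candidates:
        # Count examples whose predicted label disagrees with the true label.
        errors = sum((x >= t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

# Hypothetical labeled examples: (attribute value, is_positive).
examples = [(1.0, False), (2.0, False), (3.5, True),
            (4.0, True), (2.5, False), (5.0, True)]
best_t, best_errors = search_threshold_rule(examples)
print(best_t, best_errors)
```

Practical learners search far larger hypothesis spaces and cannot enumerate them exhaustively, so they rely on heuristics; the sketch only shows the shape of the search.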

In the past, very similar methods have developed in parallel in machine learning and statistics. One is decision tree induction. Four statisticians (Breiman et al. 1984) published a book on Classification and regression trees in the mid-1980s, and throughout the 1970s and early 1980s a prominent machine learning researcher, J. Ross Quinlan, was developing a system for inferring classification trees from examples. These two independent projects produced quite similar methods for generating trees from examples, and the researchers only became aware of one another's work much later. A second area in which similar methods have arisen involves the use of nearest-neighbor methods for classification. These are standard statistical techniques that have been extensively adapted by machine learning researchers, both to improve classification performance and to make the procedure more efficient computationally. We will examine both decision tree induction and nearest-neighbor methods in Chapter 4.
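As a rough preview of the nearest-neighbor idea, the following Python sketch classifies a query point by copying the label of the closest training instance. It assumes numeric attributes and Euclidean distance, and it is the plain textbook form rather than any of the improved or more efficient variants alluded to above. The function name and the data are hypothetical.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training instance closest to `query`,
    using Euclidean distance over numeric attribute tuples."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label

# Hypothetical training set: (attribute vector, class label).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((5.0, 5.0), "b"), ((4.8, 5.3), "b")]

print(nearest_neighbor(train, (1.1, 0.9)))
print(nearest_neighbor(train, (4.9, 5.0)))
```

This version scans every training instance for each query, which is the kind of computational cost that the adaptations mentioned in the text aim to reduce.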

But now the two perspectives have converged. The techniques we will examine in this book incorporate a great deal of statistical thinking. From the beginning, when constructing and refining the initial example set, standard statistical methods apply: visualization of data, selection of attributes, discarding
