Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

Date: 08.10.2017

the learning algorithm find rules that outperformed those of the expert collaborator, but the same expert was so impressed that he allegedly adopted the discovered rules in place of his own!

1.3 Fielded applications

The examples that we opened with are speculative research projects, not production systems. And the preceding illustrations are toy problems: they are deliberately chosen to be small so that we can use them to work through algorithms later in the book. Where’s the beef? Here are some applications of machine learning that have actually been put into use.

Being fielded applications, the illustrations that follow tend to stress the use of learning in performance situations, in which the emphasis is on ability to perform well on new examples. This book also describes the use of learning systems to gain knowledge from decision structures that are inferred from the data. We believe that this is as important (probably even more important in the long run) a use of the technology as merely making high-performance predictions. Still, it will tend to be underrepresented in fielded applications because when learning techniques are used to gain insight, the result is not normally a system that is put to work as an application in its own right. Nevertheless, in three of the examples that follow, the fact that the decision structure is comprehensible is a key feature in the successful adoption of the application.

Decisions involving judgment

When you apply for a loan, you have to fill out a questionnaire that asks for relevant financial and personal information. This information is used by the loan company as the basis for its decision as to whether to lend you money. Such decisions are typically made in two stages. First, statistical methods are used to determine clear “accept” and “reject” cases. The remaining borderline cases are more difficult and call for human judgment. For example, one loan company uses a statistical decision procedure to calculate a numeric parameter based on the information supplied in the questionnaire. Applicants are accepted if this parameter exceeds a preset threshold and rejected if it falls below a second threshold. This accounts for 90% of cases, and the remaining 10% are referred to loan officers for a decision. On examining historical data on whether applicants did indeed repay their loans, however, it turned out that half of the borderline applicants who were granted loans actually defaulted. Although it would be tempting simply to deny credit to borderline customers, credit industry professionals pointed out that if only their repayment future could be reliably determined, it is precisely these customers whose business should be wooed; they tend to be active customers of a credit institution because their finances remain in a chronically volatile condition. A suitable compromise must be reached between the viewpoint of a company accountant, who dislikes bad debt, and that of a sales executive, who dislikes turning business away.
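
The two-threshold procedure described above can be sketched in a few lines of Python. This is only an illustration: the source does not say how the numeric parameter is computed or what the thresholds are, so the 0 to 100 score scale, the threshold values, and the function name `triage` are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical thresholds on a hypothetical 0-100 score scale.
# The source gives no actual values; these are placeholders.
ACCEPT_THRESHOLD = 70.0
REJECT_THRESHOLD = 40.0


def triage(score, accept_at=ACCEPT_THRESHOLD, reject_at=REJECT_THRESHOLD):
    """Return 'accept', 'reject', or 'refer' for a numeric score."""
    if reject_at > accept_at:
        raise ValueError("reject threshold must not exceed accept threshold")
    if score > accept_at:
        return "accept"   # clearly above the first threshold
    if score < reject_at:
        return "reject"   # clearly below the second threshold
    return "refer"        # borderline: passed to a loan officer


# Made-up scores, used only to exercise the three outcomes.
scores = [82.5, 35.0, 55.0, 70.0, 91.0, 40.0, 12.0]
decisions = [triage(s) for s in scores]
counts = Counter(decisions)
print(counts)
```

Scores that land exactly on a threshold are treated as borderline here, which is one possible reading of "exceeds" and "falls below" in the text.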

Enter machine learning. The input was 1000 training examples of borderline cases for which a loan had been made, each specifying whether the borrower had finally paid off or defaulted. For each training example, about 20 attributes were extracted from the questionnaire, such as age, years with current employer, years at current address, years with the bank, and other credit cards possessed. A machine learning procedure was used to produce a small set of classification rules that made correct predictions on two-thirds of the borderline cases in an independently chosen test set. Not only did these rules improve the success rate of the loan decisions, but the company also found them attractive because they could be used to explain to applicants the reasons behind the decision. Although the project was an exploratory one that took only a small development effort, the loan company was apparently so pleased with the result that the rules were put into use immediately.
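
To make the workflow concrete (learn rules from labeled cases, then measure accuracy on a separate test set), here is a minimal sketch. The source does not name the learning procedure that was used, so this example substitutes a very simple one-attribute rule learner (often called 1R). The two discretized attributes, their values, and all rows are invented toy data, not the loan company's.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, label_key):
    """Pick the single attribute whose value -> majority-class rules
    make the fewest errors on the training rows (the 1R idea)."""
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[label_key]] += 1
        # One rule per attribute value: predict its most frequent class.
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best[0], best[1]


def accuracy(model, rows, label_key, default):
    """Fraction of rows whose label the rule set predicts correctly."""
    attr, rules = model
    hits = sum(rules.get(r[attr], default) == r[label_key] for r in rows)
    return hits / len(rows)


def make_rows(triples):
    # Hypothetical attribute names, loosely inspired by the text.
    keys = ("years_with_employer", "has_other_cards", "outcome")
    return [dict(zip(keys, t)) for t in triples]


# Invented toy data: (years_with_employer, has_other_cards, outcome).
train = make_rows([
    ("long", "yes", "repaid"), ("long", "no", "repaid"),
    ("long", "yes", "repaid"), ("long", "no", "defaulted"),
    ("short", "yes", "defaulted"), ("short", "no", "defaulted"),
    ("short", "yes", "defaulted"), ("short", "no", "defaulted"),
])
test = make_rows([
    ("long", "no", "repaid"), ("short", "yes", "defaulted"),
    ("long", "yes", "defaulted"), ("short", "no", "defaulted"),
])

model = one_r(train, ["years_with_employer", "has_other_cards"], "outcome")
test_accuracy = accuracy(model, test, "outcome", default="defaulted")
print(model, test_accuracy)
```

The learned rules can be read aloud ("long tenure with employer: predict repaid"), which is the kind of explainability the text credits for the rules' appeal. Real rule learners produce richer rule sets than this single-attribute sketch.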

Screening images

Since the early days of satellite technology, environmental scientists have been trying to detect oil slicks from satellite images to give early warning of ecological disasters and deter illegal dumping. Radar satellites provide an opportunity for monitoring coastal waters day and night, regardless of weather conditions. Oil slicks appear as dark regions in the image whose size and shape evolve depending on weather and sea conditions. However, other look-alike dark regions can be caused by local weather conditions such as high wind. Detecting oil slicks is an expensive manual process requiring highly trained personnel who assess each region in the image.

A hazard detection system has been developed to screen images for subsequent manual processing. Intended to be marketed worldwide to a wide variety of users (government agencies and companies) with different objectives, applications, and geographic areas, it needs to be highly customizable to individual circumstances. Machine learning allows the system to be trained on examples of spills and nonspills supplied by the user and lets the user control the tradeoff between undetected spills and false alarms. Unlike other machine learning applications, which generate a classifier that is then deployed in the field, here it is the learning method itself that will be deployed.
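
One common way to expose a tradeoff between undetected spills and false alarms is to have the classifier output a score per region and let the user choose the cutoff above which a region is flagged. The source does not say that the system works this way, so the sketch below is an assumption for illustration only; the scores, labels, and the function name `confusion_at` are made up.

```python
def confusion_at(threshold, scored):
    """Count (missed_spills, false_alarms) when every region whose
    score is >= threshold is flagged as a suspected spill."""
    missed = sum(1 for score, is_spill in scored
                 if is_spill and score < threshold)
    false_alarms = sum(1 for score, is_spill in scored
                       if not is_spill and score >= threshold)
    return missed, false_alarms


# Invented (score, is_spill) pairs for eight dark regions.
scored = [
    (0.95, True), (0.80, True), (0.65, False), (0.60, True),
    (0.40, False), (0.35, True), (0.20, False), (0.10, False),
]

# A lower threshold misses fewer spills but raises more false alarms.
tradeoff = {t: confusion_at(t, scored) for t in (0.3, 0.5, 0.7)}
for t, (missed, alarms) in tradeoff.items():
    print(f"threshold={t}: missed spills={missed}, false alarms={alarms}")
```

A user who cares most about catching every spill would pick the low cutoff and accept more manual checking; a user with limited inspection staff would pick a higher one.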

The input is a set of raw pixel images from a radar satellite, and the output is a much smaller set of images with putative oil slicks marked by a colored border. First, standard image processing operations are applied to normalize the image. Then, suspicious dark regions are identified. Several dozen attributes are extracted from each region, characterizing its size, shape, area, intensity,
