the learning algorithm find rules that outperformed
those of the expert collaborator,
but the same expert was so impressed that he allegedly adopted the discovered
rules in place of his own!
1.3 Fielded applications
The examples that we opened with are speculative research projects, not production
systems. And the preceding illustrations are toy problems: they are
deliberately chosen to be small so that we can use them to work through algorithms
later in the book. Where’s the beef? Here are some applications of
machine learning that have actually been put into use.
Being fielded applications, the illustrations that follow tend to stress the use
of learning in performance situations, in which the emphasis is on ability to
perform well on new examples. This book also describes the use of learning
systems to gain knowledge from decision structures that are inferred from the
data. We believe that this use of the technology is as important as merely
making high-performance predictions, and probably even more important in
the long run. Still, it will tend to be underrepresented in fielded applications because
when learning techniques are used to gain insight, the result is not normally a
system that is put to work as an application in its own right. Nevertheless, in
three of the examples that follow, the fact that the decision structure is comprehensible
is a key feature in the successful adoption of the application.
Decisions involving judgment
When you apply for a loan, you have to fill out a questionnaire that asks for
relevant financial and personal information. This information is used by the
loan company as the basis for its decision as to whether to lend you money. Such
decisions are typically made in two stages. First, statistical methods are used to
determine clear “accept” and “reject” cases. The remaining borderline cases are
more difficult and call for human judgment. For example, one loan company
uses a statistical decision procedure to calculate a numeric parameter based on
the information supplied in the questionnaire. Applicants are accepted if this
parameter exceeds a preset threshold and rejected if it falls below a second
threshold. This accounts for 90% of cases, and the remaining 10% are referred
to loan officers for a decision. On examining historical data on whether applicants
did indeed repay their loans, however, it turned out that half of the borderline
applicants who were granted loans actually defaulted. Although it would
be tempting simply to deny credit to borderline customers, credit industry professionals
pointed out that if only their repayment future could be reliably determined,
it is precisely these customers whose business should be wooed; they tend
to be active customers of a credit institution because their finances remain in a
chronically volatile condition. A suitable compromise must be reached between
the viewpoint of a company accountant, who dislikes bad debt, and that of a
sales executive, who dislikes turning business away.
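The two-stage procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `triage`, the `Decision` labels, and the threshold values are hypothetical, because the text does not say how the loan company computes its parameter or where it sets its thresholds.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REFER = "refer to loan officer"


def triage(score: float, accept_above: float, reject_below: float) -> Decision:
    """Sort an application by its statistical score.

    Both thresholds are hypothetical; the text does not give their values.
    """
    if reject_below > accept_above:
        raise ValueError("reject_below must not exceed accept_above")
    if score > accept_above:
        return Decision.ACCEPT   # clear "accept" case
    if score < reject_below:
        return Decision.REJECT   # clear "reject" case
    return Decision.REFER        # borderline: needs human judgment


if __name__ == "__main__":
    # Made-up scores on an arbitrary 0-to-1 scale.
    for s in (0.9, 0.5, 0.1):
        print(s, triage(s, accept_above=0.7, reject_below=0.3).value)
```

Moving the two thresholds closer together shrinks the borderline band that must be referred to loan officers, at the cost of more automatic decisions on uncertain cases.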
Enter machine learning. The input was 1000 training examples of borderline
cases for which a loan had been made, each recording whether the borrower had
eventually paid off the loan or defaulted. For each training example, about 20 attributes were
extracted from the questionnaire, such as age, years with current employer, years
at current address, years with the bank, and other credit cards possessed. A
machine learning procedure was used to produce a small set of classification
rules that made correct predictions on two-thirds of the borderline cases in an
independently chosen test set. Not only did these rules improve the success rate
of the loan decisions, but the company also found them attractive because they
could be used to explain to applicants the reasons behind the decision. Although
the project was an exploratory one that took only a small development effort,
the loan company was apparently so pleased with the result that the rules were
put into use immediately.
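To make the idea of learning a rule from labeled cases and checking it on an independent test set concrete, here is a small Python sketch. It is not the method used in the project, which the text does not name: it learns a single threshold rule on one attribute, and the data are synthetic. The attribute name `years_with_employer` is borrowed from the list above, but the generating process, the 700/300 split, and all function names are assumptions made for illustration.

```python
import random


def make_example(rng):
    """Generate one synthetic borderline case (not real loan data)."""
    years = rng.uniform(0, 10)
    # Assumption: longer tenure makes repayment somewhat more likely.
    repaid = rng.random() < 0.3 + 0.05 * years
    return {"years_with_employer": years}, repaid


def learn_threshold_rule(train, attribute):
    """Find t so that 'attribute >= t implies repaid' fits the training data best."""
    best_t, best_correct = None, -1
    for t in sorted({x[attribute] for x, _ in train}):
        correct = sum((x[attribute] >= t) == y for x, y in train)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def accuracy(data, attribute, t):
    """Fraction of cases on which the threshold rule predicts correctly."""
    return sum((x[attribute] >= t) == y for x, y in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [make_example(rng) for _ in range(1000)]
    train, test = data[:700], data[700:]   # hold out an independent test set
    t = learn_threshold_rule(train, "years_with_employer")
    print(f"rule: repaid if years_with_employer >= {t:.2f}")
    print(f"test accuracy: {accuracy(test, 'years_with_employer', t):.2f}")
```

The learned rule can be read aloud to an applicant, which is the property the loan company valued. The accuracy that matters is the one measured on the held-out cases, not on the cases used to choose the threshold.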
Screening images
Since the early days of satellite technology, environmental scientists have been
trying to detect oil slicks from satellite images to give early warning of ecological
disasters and deter illegal dumping. Radar satellites provide an opportunity
for monitoring coastal waters day and night, regardless of weather conditions.
Oil slicks appear as dark regions in the image whose size and shape evolve
depending on weather and sea conditions. However, other look-alike dark
regions can be caused by local weather conditions such as high wind. Detecting
oil slicks is an expensive manual process requiring highly trained personnel who
assess each region in the image.
A hazard detection system has been developed to screen images for subsequent
manual processing. Intended to be marketed worldwide to a wide variety
of users (government agencies and companies) with different objectives,
applications, and geographic areas, it needs to be highly customizable to individual
circumstances. Machine learning allows the system to be trained on
examples of spills and nonspills supplied by the user and lets the user control
the tradeoff between undetected spills and false alarms. Unlike other machine
learning applications, which generate a classifier that is then deployed in the
field, here it is the learning method itself that will be deployed.
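One simple way to picture the tradeoff between undetected spills and false alarms is a cutoff on a per-region score. The Python sketch below assumes that each dark region receives a score between 0 and 1 and is flagged when the score reaches a user-chosen threshold. The text does not say how the system exposes this control, so the scoring scheme, the function name `count_errors`, and the numbers are all hypothetical.

```python
def count_errors(scores, is_spill, threshold):
    """Return (missed_spills, false_alarms) when regions scoring >= threshold are flagged."""
    missed = sum(1 for s, y in zip(scores, is_spill) if y and s < threshold)
    false_alarms = sum(1 for s, y in zip(scores, is_spill) if not y and s >= threshold)
    return missed, false_alarms


if __name__ == "__main__":
    # Made-up scores for six dark regions and whether each is truly a spill.
    scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
    is_spill = [True, True, False, True, False, False]
    for threshold in (0.2, 0.5, 0.9):
        missed, false_alarms = count_errors(scores, is_spill, threshold)
        print(f"threshold={threshold}: missed={missed}, false alarms={false_alarms}")
```

On this toy data a low threshold misses no spills but raises two false alarms, and a high threshold does the reverse. A user who fears missed spills more than wasted inspections would choose the lower setting.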
The input is a set of raw pixel images from a radar satellite, and the output
is a much smaller set of images with putative oil slicks marked by a colored
border. First, standard image processing operations are applied to normalize the
image. Then, suspicious dark regions are identified. Several dozen attributes
are extracted from each region, characterizing its size, shape, area, intensity,