Data Mining: Practical Machine Learning Tools and Techniques, Second Edition



Date: 08.10.2017

status of such a “discovery”? What information is it based on? Under what conditions was that information collected? In what ways is it ethical to use it? Clearly, insurance companies are in the business of discriminating among people based on stereotypes (young males pay heavily for automobile insurance), but such stereotypes are not based solely on statistical correlations; they also involve common-sense knowledge about the world. Whether the preceding finding says something about the kind of person who chooses a red car, or whether it should be discarded as an irrelevancy, is a matter for human judgment based on knowledge of the world rather than on purely statistical criteria.

When presented with data, you need to ask who is permitted to have access to it, for what purpose it was collected, and what kinds of conclusions it is legitimate to draw from it. The ethical dimension raises tough questions for those involved in practical data mining. It is necessary to consider the norms of the community that is used to dealing with the kind of data involved, standards that may have evolved over decades or centuries but that may not be known to the information specialist. For example, did you know that in the library community, it is taken for granted that the privacy of readers is a right that is jealously protected? If you call your university library and ask who has such-and-such a textbook out on loan, they will not tell you. This prevents a student from being pressured by an irate professor to give up a book that the professor desperately needs for her latest grant application. It also prohibits enquiry into the dubious recreational reading tastes of the university ethics committee chairman. Those who build, say, digital libraries may not be aware of these sensitivities and might incorporate data mining systems that analyze and compare individuals’ reading habits to recommend new books, perhaps even selling the results to publishers!

In addition to community standards for the use of data, logical and scientific standards must be adhered to when drawing conclusions from it. If you do come up with conclusions (such as red car owners being greater credit risks), you need to attach caveats to them and back them up with arguments other than purely statistical ones. The point is that data mining is just a tool in the whole process: it is people who take the results, along with other knowledge, and decide what action to take.

Data mining prompts another question, which is really a political one: to what use are society’s resources being put? We mentioned previously the application of data mining to basket analysis, where supermarket checkout records are analyzed to detect associations among items that people purchase. What use should be made of the resulting information? Should the supermarket manager place the beer and chips together, to make it easier for shoppers, or farther apart, making it less convenient for them, maximizing their time in the store, and therefore increasing their likelihood of being drawn into unplanned further purchases? Should the manager move the most expensive, most profitable diapers near the beer, increasing sales of a high-margin item to harried fathers, and add further luxury baby products nearby?
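To make the idea of basket analysis concrete, here is a minimal sketch of how associations between pairs of items might be counted in checkout records. It is an illustration only, not the method used by any particular system: the function name `pair_associations`, the `min_support` threshold, and the four checkout records are all hypothetical. Support is the fraction of baskets that contain both items, and confidence is the fraction of baskets containing the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations


def pair_associations(baskets, min_support=0.5):
    """Return sorted (a, b, support, confidence of a -> b) for frequent item pairs.

    Hypothetical helper for illustration; real association-rule miners
    also handle larger item sets and prune the search far more carefully.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))                  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))   # every unordered pair, counted once

    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Report the rule in both directions; confidence differs by direction.
            results.append((a, b, support, count / item_counts[a]))
            results.append((b, a, support, count / item_counts[b]))
    return sorted(results)


# Made-up checkout records, purely for illustration.
baskets = [
    ["beer", "chips", "diapers"],
    ["beer", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
]

rules = pair_associations(baskets, min_support=0.5)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A table of counts like this says nothing about what should be done with it; whether to shelve the associated items together or apart remains the kind of human decision the text describes.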

Of course, anyone who uses advanced technologies should consider the wisdom of what they are doing. If data is characterized as recorded facts, then information is the set of patterns, or expectations, that underlie the data. You could go on to define knowledge as the accumulation of your set of expectations and wisdom as the value attached to knowledge. Although we will not pursue it further here, this issue is worth pondering.

As we saw at the very beginning of this chapter, the techniques described in this book may be called upon to help make some of the most profound and intimate decisions that life presents. Data mining is a technology that we need to take seriously.



1.7 Further reading

To avoid breaking up the flow of the main text, all references are collected in a section at the end of each chapter. This first Further reading section describes papers, books, and other resources relevant to the material covered in Chapter 1. The human in vitro fertilization research mentioned in the opening to this chapter was undertaken by the Oxford University Computing Laboratory, and the research on cow culling was performed in the Computer Science Department at the University of Waikato, New Zealand.

The example of the weather problem is from Quinlan (1986) and has been widely used to explain machine learning schemes. The corpus of example problems mentioned in the introduction to Section 1.2 is available from Blake et al. (1998). The contact lens example is from Cendrowska (1987), who introduced the PRISM rule-learning algorithm that we will encounter in Chapter 4. The iris dataset was described in a classic early paper on statistical inference (Fisher 1936). The labor negotiations data is from the Collective bargaining review, a publication of Labour Canada issued by the Industrial Relations Information Service (BLI 1988), and the soybean problem was first described by Michalski and Chilausky (1980).

Some of the applications in Section 1.3 are covered in an excellent paper that gives plenty of other applications of machine learning and rule induction (Langley and Simon 1995); another source of fielded applications is a special issue of the Machine Learning Journal (Kohavi and Provost 1998). The loan company application is described in more detail by Michie (1989), the oil slick detector is from Kubat et al. (1998), the electric load forecasting work is by Jabbour et al. (1988), and the application to preventative maintenance of electromechanical devices is from Saitta and Neri (1998). Fuller descriptions
