Chapter 1: Introduction to Data Mining and CRISP-DM
something no pollster or election insider considered likely, or even possible. In fact, most ‘experts’
expected Stevenson to win by a narrow margin, with some acknowledging that because they
expected it to be close, Eisenhower might also prevail in a tight vote. It was only late that night,
when human vote counts confirmed that Eisenhower was running away with the election, that
CBS went on the air to acknowledge first that Eisenhower had won, and second, that UNIVAC
had predicted this very outcome hours earlier, but network brass had refused to trust the
computer’s prediction. UNIVAC was further vindicated later, when its prediction was found to
be within 1% of the eventual tally. New technology is often unsettling to people,
and it can be hard to trust what computers show. Be patient and specific as you explain how
a new data mining model works, what the results mean, and how they can be used.
While the UNIVAC example illustrates the power and utility of predictive computer modeling
(despite inherent mistrust), it should not be construed as a reason for blind trust either. In the days of
UNIVAC, the biggest problem was the newness of the technology. It was doing something no
one really expected or could explain, and because few people understood how the computer
worked, it was hard to trust it. Today we face a different but equally troubling problem: computers
have become ubiquitous, and too often we don’t question whether their results are
accurate and meaningful. For data mining models to be deployed effectively, a balance between
mistrust and blind trust must be struck. By clearly communicating a model’s function and utility to
stakeholders, thoroughly testing and validating the model, and then planning for and monitoring its
implementation, an organization can effectively introduce data mining models into its workflow.
However, failure to manage deployment carefully and effectively can sink even the best and most
effective models.
DATA MINING AND YOU
Because data mining can be applied to such a wide array of professional fields, this book has been
written with the intent of explaining data mining in plain English, using software tools that are
accessible and intuitive to everyone. You may not have studied algorithms, data structures, or
programming, but you may have questions that can be answered through data mining. We
hope that by writing in an informal tone and illustrating data mining concepts with accessible,
logical examples, we can make data mining a useful tool for you regardless of your previous level of
data analysis or computing expertise. Let’s start digging!
CHAPTER TWO:
ORGANIZATIONAL UNDERSTANDING AND DATA UNDERSTANDING
CONTEXT AND PERSPECTIVE
Consider some of the activities you’ve been involved with in the past three or four days. Have you
purchased groceries or gasoline? Attended a concert, movie or other public event? Perhaps you
went out to eat at a restaurant, stopped by your local post office to mail a package, made a
purchase online, or placed a phone call to a utility company. Every day, our lives are filled with
interactions – encounters with companies, other individuals, the government, and various other
organizations.
In today’s technology-driven society, many of those encounters involve the transfer of information
electronically. That information is recorded and passed across networks in order to complete
financial transactions, reassign ownership or responsibility, and enable delivery of goods and
services. Think about the amount of data collected each time even one of these activities occurs.
Take the grocery store, for example. When you take items off the shelf, those items will have to be
replenished for future shoppers, perhaps even for yourself. After all, you’ll need to make similar
purchases again when that case of cereal runs out in a few weeks. The grocery store must
constantly replenish its inventory, keeping the items people want in stock while
maintaining the freshness of the products it sells. It makes sense that large databases are running
behind the scenes, recording data about what you bought and how much of it, as you check out
and pay your grocery bill. All of that data must be recorded and then reported to someone whose
job it is to reorder items for the store’s inventory.
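To make that behind-the-scenes bookkeeping a little more concrete, here is a minimal Python sketch, assuming each checkout is logged as simple (item, quantity) pairs and that the store tracks an on-hand count and a reorder point for every item. The item names, quantities, reorder points, and the helper names `units_sold` and `reorder_report` are all invented for illustration; a real store would keep this data in a database rather than in Python lists.

```python
from collections import Counter

# Hypothetical checkout records captured at the register: (item, quantity).
checkout_lines = [
    ("cereal", 2),
    ("milk", 1),
    ("cereal", 1),
    ("bread", 3),
    ("milk", 2),
]

# Hypothetical on-hand stock before today's sales, and the level at which to reorder.
on_hand = {"cereal": 10, "milk": 5, "bread": 10}
reorder_point = {"cereal": 8, "milk": 3, "bread": 2}


def units_sold(lines):
    """Total the quantity sold for each item across all checkout lines."""
    totals = Counter()
    for item, qty in lines:
        totals[item] += qty
    return totals


def reorder_report(lines, stock, points):
    """Return items whose remaining stock is at or below their reorder point."""
    sold = units_sold(lines)
    report = {}
    for item, start in stock.items():
        remaining = start - sold[item]  # Counter returns 0 for unsold items
        if remaining <= points[item]:
            report[item] = remaining
    return report


print(reorder_report(checkout_lines, on_hand, reorder_point))
# {'cereal': 7, 'milk': 2}
```

The report is what would be passed along to the person responsible for reordering; bread is left off because its remaining stock is still comfortably above its reorder point.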
However, in the world of data mining, simply keeping inventory up-to-date is only the beginning.
Does your grocery store require you to carry a frequent shopper card or similar device that,
when scanned at checkout time, gives you the best price on each item you’re buying? If so, the store
can now keep track not only of store-wide purchasing trends, but of individual purchasing
trends as well. The store can then target its marketing to you by sending mailers with coupons for products
you tend to purchase most frequently.
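Once purchases are tied to a shopper card, finding each customer’s favorite products is essentially a grouping and counting exercise. The Python sketch below assumes a purchase log of (card number, product) pairs; the card numbers, products, and the `top_products` helper are hypothetical and only illustrate the idea of per-customer purchasing trends.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log keyed by frequent shopper card: (card_id, product).
purchases = [
    ("card-001", "coffee"),
    ("card-001", "bagels"),
    ("card-001", "coffee"),
    ("card-002", "diapers"),
    ("card-002", "baby wipes"),
    ("card-002", "diapers"),
    ("card-001", "coffee"),
]


def top_products(log, n=1):
    """Return each card holder's n most frequently purchased products."""
    by_card = defaultdict(Counter)
    for card_id, product in log:
        by_card[card_id][product] += 1
    return {card: [p for p, _ in counts.most_common(n)]
            for card, counts in by_card.items()}


# Each customer's top product is a candidate for a targeted coupon mailer.
for card, products in top_products(purchases).items():
    print(card, "->", products)
```

Counting per card rather than store-wide is exactly the shift the card makes possible: the same log summed across all cards would only show store-wide trends.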
Now let’s take it one step further. Remember, if you can, what types of information you provided
when you filled out the form to receive your frequent shopper card. You probably indicated your
address, date of birth (or at least birth year), whether you’re male or female, and perhaps the size of
your family, annual household income range, or other such information. Think about the range of
possibilities now open to your grocery store as it analyzes the vast amount of data it collects at
the cash register each day:
Using ZIP codes, the store can locate the areas of greatest customer density, perhaps
aiding its decision about where to build its next store.
Using information regarding customer gender, the store may be able to tailor marketing
displays or promotions to the preferences of male or female customers.
With age and household information, the store can avoid mailing coupons for baby food to
elderly customers, or promotions for feminine hygiene products to households with a single
male occupant.
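The first and last of these ideas can be sketched in a few lines of Python, assuming the store’s shopper-card records hold a ZIP code and a birth year for each customer. The `Customer` record, the sample values, the helper names, and the 45-year age cutoff are hypothetical choices made for illustration; birth year only yields an approximate age, and a real cutoff would be a business decision.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Customer:
    # Fields mirror details a shopper-card form might collect (hypothetical).
    card_id: str
    zip_code: str
    birth_year: int


customers = [
    Customer("card-001", "80202", 1985),
    Customer("card-002", "80202", 1992),
    Customer("card-003", "80211", 1940),
    Customer("card-004", "80202", 1978),
    Customer("card-005", "80211", 1995),
]


def densest_zip_codes(people, n=1):
    """Rank ZIP codes by how many card holders live in each one."""
    return Counter(c.zip_code for c in people).most_common(n)


def mailing_list(people, current_year, max_age):
    """Keep card holders whose approximate age is at or under max_age."""
    return [c.card_id for c in people if current_year - c.birth_year <= max_age]


print(densest_zip_codes(customers))           # [('80202', 3)]
print(mailing_list(customers, 2024, 45))      # ['card-001', 'card-002', 'card-005']
```

Here the ZIP code count points to where customers are concentrated, and the age filter trims a hypothetical baby food mailing so it skips older card holders.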
These are only a few of the many examples of potential uses for data mining. Perhaps as you read
through this introduction, some other potential uses for data mining came to your mind. You may
have also wondered how ethical some of these applications might be. This text has been designed
to help you understand not only the possibilities brought about through data mining, but also the
techniques involved in making those possibilities a reality while accepting the responsibility that
accompanies the collection and use of such vast amounts of personal information.
LEARNING OBJECTIVES
After completing the reading and exercises in this chapter, you should be able to:
Define the discipline of Data Mining
List and define various types of data
List and define various sources of data
Explain the fundamental differences between databases, data warehouses and data sets