Data Mining for the Masses


Date: 08.10.2017


Chapter 10: Decision Trees

CHAPTER TEN:
DECISION TREES

CONTEXT AND PERSPECTIVE

Richard works for a large online retailer. His company is launching a next-generation eReader
soon, and they want to maximize the effectiveness of their marketing. They have many customers,
some of whom purchased one of the company’s previous-generation digital readers. Richard has
noticed that certain types of people were the most eager to get the previous-generation device,
while other folks seemed content to wait to buy the electronic gadget later. He’s wondering
what makes some people motivated to buy something as soon as it comes out, while others are less
driven to have the product.

Richard’s employer helps to drive the sales of its new eReader by offering specific products and
services for the eReader through its massive web site. For example, eReader owners can use the
company’s web site to buy digital magazines, newspapers, books, music, and so forth. The
company also sells thousands of other products, from traditional printed books to
electronics of every kind. Richard believes that by mining the customers’ data regarding general
consumer behaviors on the web site, he’ll be able to figure out which customers will buy the new
eReader early, which ones will buy next, and which ones will buy later on. He hopes that by
predicting when a customer will be ready to buy the next-gen eReader, he’ll be able to time his
targeted marketing to the people most ready to respond to advertisements and promotions.

LEARNING OBJECTIVES

After completing the reading and exercises in this chapter, you should be able to:


Explain what decision trees are, how they are used, and the benefits of using them.

Recognize the necessary format for data in order to perform predictive decision tree mining.

Develop a decision tree data mining model in RapidMiner using a training data set.

Interpret the visual tree’s nodes and leaves, and apply them to a scoring data set in order to deploy the model.

Use different tree algorithms in order to increase the granularity of the tree’s detail.

ORGANIZATIONAL UNDERSTANDING

Richard wants to be able to predict the timing of buying behaviors, but he also wants to
understand how his customers’ behaviors on his company’s web site indicate the timing of their
purchase of the new eReader. Richard has studied the classic diffusion theories that noted scholar
and sociologist Everett Rogers first published in the 1960s. Rogers surmised that the adoption of
a new technology or innovation tends to follow an S-shaped curve, with a smaller group of the
most enterprising and innovative customers adopting the technology first, followed by larger
groups of middle majority adopters, followed by smaller groups of late adopters (Figure 10-1).

Figure 10-1. Everett Rogers’ theory of adoption of new innovations. (Curves shown: number of adopters by group; cumulative number of adopters over time.)

Those at the front of the blue curve are the smaller group that is first to want and buy the
technology. Most of us, the masses, fall within the middle 70-80% of people who eventually
acquire the technology. The low-end tail on the right side of the blue curve represents the laggards, the
ones who adopt only eventually. Consider how DVD players and cell phones have followed this curve.
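
To make the relationship between the per-group curve and the cumulative curve concrete, the following minimal Python sketch adds up adopters group by group. The counts are made-up, illustrative numbers (an assumption, not data from Richard's company), and the function name is hypothetical. The running total rises slowly at first, quickly through the middle groups, and slowly again at the end, which is what gives the cumulative curve its 'S' shape.

```python
from itertools import accumulate

# Hypothetical adopter counts per group, listed in order of adoption.
# These numbers are illustrative only; they do not come from the source data.
adopters_by_group = [
    ("Innovators", 25),
    ("Early Adopters", 135),
    ("Early Majority", 340),
    ("Late Majority", 340),
    ("Laggards", 160),
]


def cumulative_adopters(groups):
    """Return (group, running total) pairs, i.e. the cumulative adoption curve."""
    names = [name for name, _ in groups]
    totals = accumulate(count for _, count in groups)
    return list(zip(names, totals))


for name, total in cumulative_adopters(adopters_by_group):
    print(f"{name:15s} {total:5d}")
```

The per-group counts form the bell-like curve, while the running totals form the cumulative curve.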

Understanding Rogers’ theory, Richard believes that he can categorize each of his company’s customers
into one of four groups that will eventually buy the new eReader: Innovators, Early Adopters,
Early Majority, or Late Majority. These groups track with Rogers’ social adoption theories on the
diffusion of technological innovations, and also with Richard’s informal observations about the
speed of adoption of his company’s previous-generation product. He hopes that by watching the
customers’ activity on the company’s web site, he can anticipate approximately when each person
will be most likely to buy an eReader. He feels that data mining can help him figure out which
activities are the best predictors of which category a customer will fall into. Knowing this, he can
time his marketing to each customer to coincide with their likelihood of buying.

DATA UNDERSTANDING

Richard has engaged us to help him with his project. We have decided to use a decision tree
model in order to find good early predictors of buying behavior. Because Richard’s company does
all of its business through its web site, there is a rich data set of information for each customer,
including items they have only browsed and those they have actually purchased. He has
prepared two data sets for us to use. The training data set contains the web site activities of
customers who bought the company’s previous-generation reader, and the timing with which they
bought their reader. The second, the scoring data set, comprises attributes of current customers who Richard
hopes will buy the new eReader. He hopes to figure out which category of adopter each person in
the scoring data set will fall into based on the profiles and buying timing of those people in the
training data set.
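
The division of labor between a training data set (adopter category already known) and a scoring data set (category still to be predicted) can be sketched in a few lines of Python. This is not RapidMiner's Decision Tree operator. It is a deliberately tiny one-level rule, sometimes called a decision stump, that predicts the most common category seen for each value of a single attribute. The rows, the `Adopter_Category` label name, and the helper function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: the adopter category is already known.
training = [
    {"User_ID": 1, "Gender": "F", "Adopter_Category": "Innovator"},
    {"User_ID": 2, "Gender": "F", "Adopter_Category": "Innovator"},
    {"User_ID": 3, "Gender": "M", "Adopter_Category": "Late Majority"},
    {"User_ID": 4, "Gender": "M", "Adopter_Category": "Early Majority"},
    {"User_ID": 5, "Gender": "M", "Adopter_Category": "Late Majority"},
]

# Hypothetical scoring rows: same attributes, but no category yet.
scoring = [
    {"User_ID": 101, "Gender": "M"},
    {"User_ID": 102, "Gender": "F"},
]


def train_stump(rows, attribute, label):
    """Learn the most common label for each value of one attribute."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[label]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def score(rows, attribute, rule):
    """Apply the learned rule to rows whose label is unknown."""
    return {row["User_ID"]: rule.get(row[attribute]) for row in rows}


rule = train_stump(training, "Gender", "Adopter_Category")
predictions = score(scoring, "Gender", rule)
print(predictions)
```

A real decision tree repeats this kind of split across several attributes, but the workflow is the same: learn from rows with known outcomes, then apply what was learned to rows without them.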

In analyzing his data set, Richard has found that customers’ activity in the areas of digital media
and books, and their general activity with electronics for sale on his company’s site, seem to be
closely related to when a person buys an eReader. With this in mind, we have worked with
Richard to compile data sets composed of the following attributes:


User_ID: A numeric, unique identifier assigned to each person who has an account on
the company’s web site.

Gender: The customer’s gender, as identified in their customer account. In this data set, it
is recorded as ‘M’ for male and ‘F’ for female. The Decision Tree operator can handle non-numeric
data types.

Age: The person’s age at the time the data were extracted from the web site’s database.
This is calculated to the nearest year by taking the difference between the system date and
the person’s birthdate as recorded in their account.

Marital_Status: The person’s marital status as recorded in their account. People who
indicated on their account that they are married are entered in the data set as ‘M’. Since the
