Data Mining for the Masses

Chapter 5: Association Rules
87

If you would like, you may return to design perspective and experiment. If you click on
the FP-Growth operator, you can modify the min support value. Note that while the Create
Association Rules operator calculates and displays support as a percentage, the min support
parameter in FP-Growth is entered as a proportion between 0 and 1, in the same form as a
confidence level. The default of .95, a level very common in much data analysis, requires an
itemset to appear in at least 95% of the examples, so you may want to lower it a bit and re-run
your model to see what happens. Lowering min support to .5 does yield additional rules,
including some with more than two attributes in the association rules. As you experiment,
you can see that a data miner might need to go back and forth a number of times between
modeling and evaluating before moving on to…

DEPLOYMENT

We have been able to help Roger with his question. Do linkages between types of community
groups exist? Yes, they do. We have found that the community's churches, family, and hobby
organizations have some common members. It may be a bit surprising that the political and
professional groups do not appear to be interconnected, but these groups may also be more
specialized (e.g. a local chapter of the bar association) and thus may not have tremendous
cross-organizational appeal or need. It seems that Roger will have the most luck finding groups
that will collaborate on projects around town by engaging churches, hobbyists, and
family-related organizations. Using his contacts among local pastors and other clergy, he might
ask for volunteers from their congregations to spearhead projects to clean up city parks used
for youth sports (family organization association rule) or to improve a local biking trail
(hobby organization association rule).

CHAPTER SUMMARY

This chapter's fictional scenario, in which Roger wants to use community groups to improve his
city, has shown how association rule data mining can identify linkages in data that have a
practical application. In addition to learning the process of creating association rule models
in RapidMiner, we introduced a new operator that enabled us to change attributes' data types. We
also used CRISP-DM's cyclical nature to see that data mining sometimes involves some back and
forth 'digging' before moving on to the next step. You learned how support and confidence
percentages are calculated and why these two metrics matter for identifying rules and
determining their strength in a data set.
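The two calculations can be sketched in a few lines of Python. This is a minimal illustration, not RapidMiner's implementation: the five residents and the attribute names `Church`, `Family`, `Hobby`, and `Political` are hypothetical, invented here rather than taken from the chapter's data set. Support is the fraction of all rows in which every attribute of an itemset is true; confidence of a rule is the number of rows matching both premise and conclusion divided by the number of rows matching the premise.

```python
# Hypothetical membership flags for five residents (illustration only,
# not the chapter's data set). Each attribute is True/False, like the
# binominal attributes that FP-Growth expects.
members = [
    {"Church": True,  "Family": True,  "Hobby": True,  "Political": False},
    {"Church": True,  "Family": True,  "Hobby": False, "Political": False},
    {"Church": True,  "Family": False, "Hobby": True,  "Political": True},
    {"Church": False, "Family": True,  "Hobby": True,  "Political": False},
    {"Church": True,  "Family": True,  "Hobby": True,  "Political": False},
]


def count(rows, items):
    """Number of rows in which every attribute in `items` is True."""
    return sum(1 for row in rows if all(row[item] for item in items))


def support(rows, items):
    """Fraction of all rows in which every attribute in `items` is True."""
    return count(rows, items) / len(rows)


def confidence(rows, premise, conclusion):
    """Of the rows matching the premise, the fraction that also match the conclusion."""
    return count(rows, set(premise) | set(conclusion)) / count(rows, premise)


if __name__ == "__main__":
    print(support(members, {"Church", "Family"}))       # 3 of 5 rows
    print(confidence(members, {"Church"}, {"Family"}))  # 3 of the 4 Church rows
```

Swapping premise and conclusion changes confidence but not support: with these made-up rows, Political → Church has confidence 1.0, while Church → Political has only 0.25.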

REVIEW QUESTIONS

1) What are association rules? What are they good for?

2) What are the two main metrics that are calculated in association rules and how are they
calculated?

3) What data type must a data set's attributes be in order to use Frequent Pattern operators in
RapidMiner?

4) How are rule results interpreted? In this chapter's example, what was our strongest rule?
How do we know?

EXERCISE

In explaining support and confidence percentages in this chapter, the classic example of shopping
basket analysis was used.  For this exercise, you will do a shopping basket association rule analysis.
Complete the following steps:

1) Using the Internet, locate a sample shopping basket data set. Search terms such as
'association rule data set' or 'shopping basket data set' will yield a number of downloadable
examples. With a little effort, you will be able to find a suitable example.

2) If necessary, convert your data set to CSV format and import it into your RapidMiner
repository. Give it a descriptive name and drag it into a new process window.

3) As necessary, conduct your Data Understanding and Data Preparation activities on your
data set. Ensure that all of your variables have consistent data and that their data types are
appropriate for the FP-Growth operator.

4) Generate association rules for your data set. Adjust your confidence and support values
until you find levels that give you some interesting rules with reasonable confidence and
support. Also examine other measures of rule strength, such as LaPlace or Conviction.

5) Document your findings. What rules did you find? What attributes are most strongly
associated with one another? Are there products that are frequently connected that
surprise you? Why do you think this might be? How much did you have to test different
support and confidence values before you found some association rules? Were any of your
association rules good enough that you would base decisions on them? Why or why not?

Challenge Step!
6) Build a new association rule model using your same data set, but this time, use the
W-FPGrowth operator. (Hints for using the W-FPGrowth operator: (1) This operator creates
its own rules without help from other operators; and (2) This operator's support and
confidence parameters are labeled U and C, respectively.)

Exploration!

7) The Apriori algorithm is often used in data mining for associations. Search the
RapidMiner Operators tree for Apriori operators and apply them to your data set in a new
process. Use the Help tab in RapidMiner's lower right-hand corner to learn about these
operators' parameters and functions (be sure you have the operator selected in your main
process window in order to see its help content).
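For step 4, it may help to see how the extra strength measures relate to support and confidence. The sketch below assumes common textbook definitions rather than RapidMiner's exact formulas: lift is confidence divided by the conclusion's support, a Laplace-corrected confidence adds 1 to the joint count and k (an assumed default of 2) to the premise count, and conviction is (1 - support of conclusion) / (1 - confidence). RapidMiner's own calculations and parameters may differ, so check the Create Association Rules operator's Help tab. The baskets are hypothetical.

```python
import math

# Hypothetical shopping baskets (illustration only, not a real data set).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "chips"},
]


def count(transactions, items):
    """Number of transactions that contain every product in `items`."""
    return sum(1 for t in transactions if items <= t)


def rule_measures(transactions, premise, conclusion, k=2):
    """Strength measures for the rule premise -> conclusion.

    Assumed textbook definitions (RapidMiner's formulas may differ):
      laplace    = (n(premise and conclusion) + 1) / (n(premise) + k)
      conviction = (1 - support(conclusion)) / (1 - confidence)
    The premise must occur at least once, or confidence is undefined.
    """
    n = len(transactions)
    n_premise = count(transactions, premise)
    n_both = count(transactions, premise | conclusion)
    supp_conclusion = count(transactions, conclusion) / n
    conf = n_both / n_premise
    return {
        "support": n_both / n,
        "confidence": conf,
        "lift": conf / supp_conclusion,
        "laplace": (n_both + 1) / (n_premise + k),
        # A rule that never fails (confidence 1) has infinite conviction.
        "conviction": math.inf if conf == 1 else (1 - supp_conclusion) / (1 - conf),
    }


if __name__ == "__main__":
    print(rule_measures(baskets, {"bread"}, {"milk"}))
    print(rule_measures(baskets, {"butter"}, {"bread"}))
```

With these made-up baskets, butter → bread never fails, so its conviction is infinite, while bread → milk has confidence 0.75 and conviction 1.6, meaning the rule would be wrong 1.6 times as often if the two products were bought independently.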
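For step 7, the core idea behind Apriori can be sketched in plain Python: frequent itemsets are built level by level, and a candidate is discarded as soon as any of its smaller subsets fails the minimum support threshold. This is a simplified, unoptimized illustration of the algorithm, not the code behind RapidMiner's or Weka's Apriori operators, and the baskets are hypothetical. Running it at two thresholds also mirrors the earlier observation that lowering min support yields more, and larger, itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: every single item that meets the threshold.
    level = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Join step: combine frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: drop candidates with any infrequent k-item subset
        # (every subset of a frequent itemset must itself be frequent).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        # Count step: keep the candidates that meet the support threshold.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        k += 1
    return frequent


# Hypothetical shopping baskets (illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "chips"},
]

if __name__ == "__main__":
    for threshold in (0.6, 0.4):
        itemsets = apriori(baskets, threshold)
        print(threshold, len(itemsets), "frequent itemsets")
```

With these five baskets, a threshold of 0.6 yields five frequent itemsets, while 0.4 yields eight, including the three-item set of bread, butter, and milk.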
