Chapter 5: Association Rules
If you would like, you may return to design perspective and experiment. If you click on
the FP-Growth operator, you can modify the min support value. Note that while support
percent is the metric calculated and displayed by the Create Association Rules operator, the
min support parameter in FP-Growth is entered as a proportion between 0 and 1: an item set
must appear in at least that share of the observations to be treated as frequent. The default of
.95 is quite strict, so you may want to lower it a bit and re-run your model to see what
happens. Lowering min support to .5 does yield additional rules, including some with more
than two attributes in the association rules. As you experiment, you can see that a data miner
might need to go back and forth a number of times between modeling and evaluating before
moving on to…
DEPLOYMENT
We have been able to help Roger with his question: do linkages exist between types of
community groups? Yes, they do. We have found that the community’s church, family, and
hobby organizations have some common members. It may be a bit surprising that the political
and professional groups do not appear to be interconnected, but these groups may also be more
specialized (e.g. a local chapter of the bar association) and thus may not have much cross-
organizational appeal or need. It seems that Roger will have the most luck finding groups that will
collaborate on projects around town by engaging churches, hobbyists and family-related
organizations. Using his contacts among local pastors and other clergy, he might ask for
volunteers from their congregations to spearhead projects to clean up city parks used for youth
sports (family organization association rule) or to improve a local biking trail (hobby organization
association rule).
CHAPTER SUMMARY
This chapter’s fictional scenario with Roger’s desire to use community groups to improve his city
has shown how association rule data mining can identify linkages in data that can have a practical
application. In addition to learning about the process of creating association rule models in
RapidMiner, we introduced a new operator that enabled us to change attributes’ data types. We
also used CRISP-DM’s cyclical nature to understand that sometimes data mining involves some
back and forth ‘digging’ before moving on to the next step. You learned how support and
confidence percentages are calculated and about the importance of these two metrics in identifying
rules and determining their strength in a data set.
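As a brief illustration of how these two metrics are calculated, the following Python sketch works through a small, made-up set of shopping baskets. The basket contents and the function names support and confidence are assumptions chosen for this example; they are not part of RapidMiner.

```python
# Hypothetical toy data: each set is one shopper's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(premise, conclusion, baskets):
    """Support of premise and conclusion together, divided by support of the premise."""
    both = set(premise) | set(conclusion)
    return support(both, baskets) / support(premise, baskets)


rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(f"support(bread, milk) = {rule_support:.2f}")
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")
```

In this toy data, bread and milk appear together in three of the five baskets, so the rule's support is 60%. Bread appears in four baskets, so the confidence of the rule bread → milk is three out of four, or 75%.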
REVIEW QUESTIONS
1) What are association rules? What are they good for?
2) What are the two main metrics that are calculated in association rules and how are they
calculated?
3) What data type must a data set’s attributes be in order to use Frequent Pattern operators in
RapidMiner?
4) How are rule results interpreted? In this chapter’s example, what was our strongest rule?
How do we know?
EXERCISE
In explaining support and confidence percentages in this chapter, the classic example of shopping
basket analysis was used. For this exercise, you will do a shopping basket association rule analysis.
Complete the following steps:
1) Using the Internet, locate a sample shopping basket data set. Search terms such as
‘association rule data set’ or ‘shopping basket data set’ will yield a number of downloadable
examples. With a little effort, you will be able to find a suitable example.
2) If necessary, convert your data set to CSV format and import it into your RapidMiner
repository. Give it a descriptive name and drag it into a new process window.
3) As necessary, conduct your Data Understanding and Data Preparation activities on your
data set. Ensure that all of your variables have consistent data and that their data types are
appropriate for the FP-Growth operator.
4) Generate association rules for your data set. Adjust your confidence and support values
until you find levels that give you some interesting rules with reasonable confidence and
support. Look at the other measures of rule strength such as LaPlace or Conviction.
5) Document your findings. What rules did you find? What attributes are most strongly
associated with one another? Are there products that are frequently connected that
surprise you? Why do you think this might be? How much did you have to test different
support and confidence values before you found some association rules? Were any of your
association rules good enough that you would base decisions on them? Why or why not?
Challenge Step!
6) Build a new association rule model using your same data set, but this time, use the
W-FPGrowth operator. (Hints for using the W-FPGrowth operator: (1) This operator creates
its own rules without help from other operators; and (2) This operator’s support and
confidence parameters are labeled U and C, respectively.)
Exploration!
7) The Apriori algorithm is often used in data mining for associations. Search the
RapidMiner Operators tree for Apriori operators and add them to your data set in a new
process. Use the Help tab in RapidMiner’s lower right-hand corner to learn about these
operators’ parameters and functions (be sure you have the operator selected in your main
process window in order to see its help content).
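If you would like to see the logic of this exercise outside of RapidMiner, the Python sketch below mimics the back-and-forth of adjusting support and confidence thresholds. It is only an illustration under stated assumptions: the baskets are made up, the function names frequent_itemsets and find_rules are hypothetical, and the search is simple brute force rather than the FP-Growth or Apriori algorithms. Conviction is calculated with its commonly used definition, (1 − support of the conclusion) ÷ (1 − confidence), which may not match RapidMiner's output in every detail.

```python
from itertools import combinations

# Hypothetical baskets; a real data set would be imported from a CSV file.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def frequent_itemsets(baskets, min_support):
    """Brute force: test every combination of items (practical for tiny data only)."""
    items = sorted(set().union(*baskets))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            share = support(frozenset(combo), baskets)
            if share >= min_support:
                found[frozenset(combo)] = share
    return found


def find_rules(baskets, min_support, min_confidence):
    """Return (premise, conclusion, support, confidence, conviction) tuples."""
    itemsets = frequent_itemsets(baskets, min_support)
    kept = []
    for itemset, share in itemsets.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                premise = frozenset(combo)
                conclusion = itemset - premise
                # Every subset of a frequent item set is itself frequent,
                # so both lookups below are guaranteed to succeed.
                conf = share / itemsets[premise]
                if conf < min_confidence:
                    continue
                # Conviction is unbounded when a rule never fails.
                if conf < 1:
                    conviction = (1 - itemsets[conclusion]) / (1 - conf)
                else:
                    conviction = float("inf")
                kept.append((set(premise), set(conclusion), share, conf, conviction))
    return kept


for min_support in (0.5, 0.3):
    found = find_rules(baskets, min_support, min_confidence=0.7)
    print(f"min support {min_support}: {len(found)} rules")
    for premise, conclusion, share, conf, conviction in found:
        print(f"  {sorted(premise)} -> {sorted(conclusion)} "
              f"support={share:.2f} confidence={conf:.2f} conviction={conviction:.2f}")
```

With these made-up baskets, a min support of .5 leaves only the two rules linking bread and milk. Lowering it to .3 adds a third rule, butter → bread, which shows on a small scale how a looser support threshold surfaces additional rules.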