Chapter 5: Association Rules
CHAPTER FIVE:
ASSOCIATION RULES
CONTEXT AND PERSPECTIVE
Roger is a city manager for a medium-sized, but steadily growing, city. The city has limited
resources, and like most municipalities, there are more needs than there are resources. He feels
like the citizens in the community are fairly active in various community organizations, and
believes that he may be able to get a number of groups to work together to meet some of the
needs in the community. He knows there are churches, social clubs, hobby enthusiasts and other
types of groups in the community. What he doesn’t know is if there are connections between the
groups that might enable natural collaborations between two or more groups that could work
together on projects around town. He decides that before he can ask community
organizations to work together and to accept responsibility for projects, he needs to find
out if there are any existing associations between the different types of groups in the area.
LEARNING OBJECTIVES
After completing the reading and exercises in this chapter, you should be able to:
Explain what association rules are, how they are found and the benefits of using them.
Recognize the necessary format for data in order to create association rules.
Develop an association rule model in RapidMiner.
Interpret the rules generated by an association rule model and explain their significance, if
any.
ORGANIZATIONAL UNDERSTANDING
Roger’s goal is to identify and then try to take advantage of existing connections in his local
community to get some work done that will benefit the entire community. He knows of many of
Data Mining for the Masses
the organizations in town, has contact information for them and is even involved in some of them
himself. His family is involved in an even broader group of organizations, so he understands on a
personal level the diversity of groups and their interests. Because people he and his family know
are involved in other groups around town, he is aware in a more general sense of many different
types of organizations, their interests, objectives and potential contributions. He knows that to
start, his main concern is finding types of organizations that seem to be connected with one
another. Identifying individuals to work with at each church, social club or political organization
will be overwhelming without first categorizing the organizations into groups and looking for
associations between the groups. Only once he’s checked for existing connections will he feel
ready to begin contacting people and asking them to use their cross-organizational contacts and
take on project ownership. His first need is to find where such associations exist.
DATA UNDERSTANDING
In order to answer his question, Roger has enlisted our help in creating an association rules data
mining model. Association rules are a data mining methodology that seeks to find frequent
connections between attributes in a data set. Association rules are very common when doing
shopping basket analysis. Marketers and vendors in many sectors use this data mining approach to
try to find which products are most frequently purchased together. If you have ever purchased
items on an e-Commerce retail site like Amazon.com, you have probably seen the fruits of
association rule data mining. These are most commonly found in the recommendations sections
of such web sites. You might notice that when you search for a smartphone, screen protectors,
protective cases, and other accessories such as charging cords or data cables are
often recommended to you. The items being recommended are identified by mining for items that
previous customers bought in conjunction with the item you searched for. In other words, those
items are found to be associated with the item you are looking for, and that association is so frequent
in the web site’s data set that it might be considered a rule. Thus is born the name of
this data mining approach: “association rules”. While association rules are most common in
shopping basket analysis, this modeling technique can be applied to a broad range of questions.
We will help Roger by creating an association rule model to try to find linkages across types of
community organizations.
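To make the shopping basket idea concrete, here is a minimal Python sketch of how co-purchases might be counted and turned into recommendations. It is an illustration only, not the method any particular retailer uses: the baskets are invented, and the function names co_purchase_counts and recommend, along with the min_share threshold, are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories; each set is one customer's basket.
baskets = [
    {"smartphone", "screen protector", "case"},
    {"smartphone", "case", "charging cord"},
    {"smartphone", "screen protector"},
    {"laptop", "charging cord"},
    {"smartphone", "case"},
]


def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def recommend(item, baskets, min_share=0.5):
    """Return items found in at least min_share of the baskets containing item."""
    with_item = [b for b in baskets if item in b]
    if not with_item:
        return []
    tally = Counter(other for b in with_item for other in b if other != item)
    frequent = [o for o, n in tally.items() if n / len(with_item) >= min_share]
    # Most frequent companions first; ties are broken alphabetically.
    return sorted(frequent, key=lambda o: (-tally[o], o))


pair_counts = co_purchase_counts(baskets)
recommendations = recommend("smartphone", baskets)
print(recommendations)
```

In this invented data, an accessory is recommended only if it appears in at least half of the baskets that contain a smartphone. That threshold is the informal sense in which a frequent association starts to look like a rule.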
Working together, we use Roger’s knowledge of the local community to create a short survey
which we will administer online via a web site. In order to ensure a measure of data integrity and
to try to protect against possible abuse, our web survey is password protected. Each organization
invited to participate in the survey is given a unique password. The leader of that organization is
asked to share the password with his or her membership and to encourage participation in the
survey. Community members are given a month to respond, and each time an individual logs on
to complete the survey, the password used is recorded so that we can determine how many people
from each organization responded. After the month ends, we have a data set composed of the
following attributes:
Elapsed_Time: This is the amount of time each respondent spent completing our survey.
It is expressed in decimal minutes (e.g. 4.5 in this attribute would be four minutes, thirty
seconds).
Time_in_Community: This survey question asked whether the person has lived in
the area for 0-2 years, 3-9 years, or 10+ years; answers are recorded in the data set as Short,
Medium, or Long, respectively.
Gender: The survey respondent’s gender.
Working: A yes/no column indicating whether or not the respondent currently has a paid
job.
Age: The survey respondent’s age in years.
Family: A yes/no column indicating whether or not the respondent is currently a member
of a family-oriented community organization, such as Big Brothers/Big Sisters, children’s
recreation or sports leagues, genealogy groups, etc.
Hobbies: A yes/no column indicating whether or not the respondent is currently a
member of a hobby-oriented community organization, such as amateur radio, outdoor
recreation, motorcycle or bicycle riding, etc.
Social_Club: A yes/no column indicating whether or not the respondent is currently a
member of a community social organization, such as Rotary International, Lions Club, etc.
Political: A yes/no column indicating whether or not the respondent is currently a
member of a political organization with regular meetings in the community, such as a
political party, a grass-roots action group, a lobbying effort, etc.
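As a rough illustration of how one survey response with the attributes described so far might be represented, consider the Python sketch below. It is not part of the RapidMiner workflow. The class name SurveyResponse, the helper functions, and the choice to store yes/no answers as booleans and gender as free text are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    """One respondent's answers; field types are assumptions for illustration."""
    elapsed_time: float      # decimal minutes, e.g. 4.5
    time_in_community: str   # "Short", "Medium", or "Long"
    gender: str              # coding is not specified in the text
    working: bool            # yes/no answers stored as True/False
    age: int                 # years
    family: bool
    hobbies: bool
    social_club: bool
    political: bool


def residency_category(years):
    """Map years lived in the area to the survey's three categories."""
    if years <= 2:
        return "Short"       # 0-2 years
    if years <= 9:
        return "Medium"      # 3-9 years
    return "Long"            # 10+ years


def minutes_and_seconds(elapsed):
    """Convert decimal minutes to a (minutes, seconds) pair."""
    return divmod(round(elapsed * 60), 60)


def memberships(response):
    """List the organization types the respondent belongs to."""
    flags = {
        "Family": response.family,
        "Hobbies": response.hobbies,
        "Social_Club": response.social_club,
        "Political": response.political,
    }
    return [name for name, is_member in flags.items() if is_member]


# A made-up respondent, not a row from the actual survey data.
example = SurveyResponse(4.5, residency_category(12), "F", True, 41,
                         family=True, hobbies=False,
                         social_club=True, political=False)
print(minutes_and_seconds(example.elapsed_time), memberships(example))
```

The yes/no membership columns are the ones an association rule model examines for co-occurrence, which is why the sketch collects them into a single list per respondent.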