The ASSOC Procedure
PROC ASSOC Statement
Invokes the ASSOC procedure.
PROC ASSOC <option(s)>;
Required Argument
OUT=SAS-data-set
Specifies the output data set, which contains the following variables: SET_SIZE, COUNT, ITEM1, ITEM2, ..., ITEMn (where n is the maximum number of items in an item set). See Details for more information.
SET_SIZE:
Contains the number of items in the item set. The first observation has SET_SIZE equal to 0; its COUNT value is the total number of transactions in the data set. SET_SIZE is labeled as Relations in the Results Browser.
COUNT:
Contains the number of transactions that contain all the items in the set (rule).
ITEM1, ITEM2,...ITEMn:
Contains the individual items forming the rule including the arrow.
Tip:
The OUT= data set created by PROC ASSOC is the input to the RULEGEN and SEQ procedures. Run PROC ASSOC and PROC RULEGEN to perform association discovery. Run PROC ASSOC and PROC SEQ to perform sequence discovery.
Options
DATA=SAS-data-set
Identifies the input data source. To perform association discovery, the input data set must have a
separate observation for each product purchased by each customer. You must also assign the ID
model role to a variable and the TARGET model role to another variable in the Input Data Source.
DMDBCAT=SAS-catalog
Identifies the data mining database (DMDB) catalog that describes the input data source.
ITEMS=integer
Specifies the maximum number of items (events) to chain, or associate, together; that is, the largest item-set size that PROC ASSOC generates.
SUPPORT=integer
Specifies the minimum number of transactions in which an item set (rule) must occur for it to be accepted. Item sets that do not meet this support level are rejected. The level of support represents how frequently the combination occurs in the market basket (input data source).
Default:
5% of the largest item frequency count
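To make the default concrete, the following Python sketch (not SAS code, and not PROC ASSOC's implementation) computes 5% of the largest single-item transaction count and keeps the items that reach it. The item counts are hypothetical. The sketch assumes that an item "meets" the support level when its count is greater than or equal to the threshold, and it does not round the threshold; how PROC ASSOC handles fractional values is not stated here.

```python
from collections import Counter

def default_support(item_counts):
    """Return the default SUPPORT threshold: 5% of the largest single-item count."""
    if not item_counts:
        return 0.0
    # Multiply before dividing so that whole-number results stay exact floats.
    return max(item_counts.values()) * 5 / 100

# Hypothetical transaction counts per item, as produced by the first pass.
counts = Counter({"milk": 200, "bread": 140, "caviar": 4})
threshold = default_support(counts)
# Assumption: an item meets the support level when count >= threshold.
kept = {item for item, n in counts.items() if n >= threshold}
print(threshold, sorted(kept))  # 10.0 ['bread', 'milk']
```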
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
CUSTOMER Statement
Specifies the ID variable(s) that identify the customer(s) to be analyzed.
Alias: CUST
CUSTOMER variable-list;
Required Argument
variable-list
Specifies one or more variables whose values identify the customers (transactions) to be analyzed.
TARGET Statement
Specifies the target variable to be analyzed.
TARGET variable;
Required Argument
variable
Specifies the nominal variable that contains the items, typically the products ordered by customers.
Details
The input to the ASSOC procedure has the following role variables: ID and TARGET. All records with
the same ID values form a transaction. Every transaction has a unique ID value and one or more
TARGET values.
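The grouping rule above (records that share an ID value form one transaction) can be sketched in Python as follows. This is an illustration only, not PROC ASSOC itself; the records are hypothetical, and the sketch assumes that an item repeated within one transaction counts once, which is why each transaction is stored as a set.

```python
from collections import defaultdict

# Hypothetical input: one (ID value, TARGET value) record per item purchased.
records = [
    ("C1", "milk"), ("C1", "bread"),
    ("C2", "milk"),
    ("C3", "bread"), ("C3", "eggs"), ("C3", "milk"),
]

def build_transactions(rows):
    """Group records that share an ID value into one transaction (a set of items)."""
    transactions = defaultdict(set)
    for id_value, target in rows:
        transactions[id_value].add(target)
    return dict(transactions)

transactions = build_transactions(records)
for tid, items in sorted(transactions.items()):
    print(tid, sorted(items))
# C1 ['bread', 'milk']
# C2 ['milk']
# C3 ['bread', 'eggs', 'milk']
```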
You can have more than one ID variable. However, association analysis can be performed on only one target variable at a time. When there are multiple ID variables, PROC ASSOC concatenates them into a single identifier value during computation.
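As a rough illustration of combining several ID variables into one identifier, the Python sketch below joins a store code and a receipt number. The function name combined_id and the "|" separator are hypothetical choices for this sketch; the document does not say how PROC ASSOC joins the values. A separator is used here so that, for example, ("1", "23") and ("12", "3") do not collapse into the same identifier.

```python
def combined_id(*id_values, sep="|"):
    """Join several ID variable values into one identifier string.

    Hypothetical helper: the separator is an assumption of this sketch,
    not documented PROC ASSOC behavior.
    """
    return sep.join(str(v) for v in id_values)

# Hypothetical records: (store, receipt number, item).
rows = [
    ("store1", 1001, "milk"),
    ("store1", 1001, "bread"),
    ("store2", 1001, "milk"),  # same receipt number, different store
]

transactions = {}
for store, receipt, item in rows:
    transactions.setdefault(combined_id(store, receipt), set()).add(item)

for key in sorted(transactions):
    print(key, sorted(transactions[key]))
# store1|1001 ['bread', 'milk']
# store2|1001 ['milk']
```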
For numeric target variables, missing values constitute a separate item or target level and show up in the
rules as a period (.). For character target variables, completely blank values constitute a separate item
(target level) and show up in the rules as a period (.). All records with missing ID values are considered a
single valid transaction.
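The missing-value rules can be sketched in Python as shown next. The sketch assumes that None, a float NaN, and an all-blank string stand in for SAS missing values; those representations, and the helper names is_missing and build_transactions, are assumptions for illustration, not part of PROC ASSOC.

```python
import math

MISSING_ITEM = "."  # label that missing target values receive in the rules

def is_missing(value):
    """Treat None, NaN, and all-blank strings as missing (an assumption of this sketch)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""

def build_transactions(rows):
    """Group (ID, TARGET) rows; missing IDs share one transaction, missing targets become '.'."""
    transactions = {}
    for id_value, target in rows:
        key = None if is_missing(id_value) else id_value  # all missing IDs pool together
        item = MISSING_ITEM if is_missing(target) else target
        transactions.setdefault(key, set()).add(item)
    return transactions

# Character target: a completely blank value becomes a separate item '.'.
transactions = build_transactions([
    ("C1", "milk"), ("C1", "   "),
    ("", "bread"), ("  ", "eggs"),  # both IDs are missing -> one transaction
])
# Numeric target: a missing (NaN) value also becomes '.'.
numeric_tx = build_transactions([(1, 2.5), (1, float("nan"))])
print(transactions)
print(numeric_tx)
```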
Output Processing
PROC ASSOC makes a pass through the data and obtains transaction counts for each item. It outputs
these counts with a SET_SIZE of 1 and the items listed under ITEM1. Items that do not meet the support
level are discarded. By default, the support level is set to 5% of the largest item count.
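The first pass can be sketched in Python as below: count the transactions that contain each item, discard items below the support level, and emit rows shaped like the OUT= observations with SET_SIZE 1. The transactions and the support value of 2 are hypothetical, the dictionary rows only mimic the OUT= variables, and the sketch assumes that "meeting" support means count >= support.

```python
from collections import Counter

def frequent_items(transactions, support):
    """Pass 1: count transactions containing each item; keep items meeting support."""
    counts = Counter()
    for items in transactions.values():
        counts.update(set(items))  # an item counts at most once per transaction
    return {item: n for item, n in counts.items() if n >= support}

# Hypothetical transactions keyed by ID value.
transactions = {
    "C1": {"milk", "bread"},
    "C2": {"milk"},
    "C3": {"bread", "eggs", "milk"},
    "C4": {"milk", "bread"},
}
level1 = frequent_items(transactions, support=2)

# Rows in the spirit of the OUT= data set: SET_SIZE = 1, item in ITEM1.
for item, count in sorted(level1.items()):
    print({"SET_SIZE": 1, "COUNT": count, "ITEM1": item})
# {'SET_SIZE': 1, 'COUNT': 3, 'ITEM1': 'bread'}
# {'SET_SIZE': 1, 'COUNT': 4, 'ITEM1': 'milk'}
```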
PROC ASSOC then generates all potential 2-item sets, makes a pass through the data, and obtains transaction counts for each of the 2-item sets. The sets that meet the support level are output with a SET_SIZE of 2 and the items listed under ITEM1 and ITEM2.
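The second pass might look like the following Python sketch: pair up the items that survived the first pass, count the transactions that contain both items of each pair, and keep the pairs that meet the support level. The transactions, the surviving items, and the support value are hypothetical, and listing ITEM1 and ITEM2 in alphabetical order is only a presentation choice of this sketch.

```python
from itertools import combinations

def frequent_pairs(transactions, frequent, support):
    """Pass 2: build every 2-item set from surviving items and count transactions containing both."""
    candidates = [frozenset(p) for p in combinations(sorted(frequent), 2)]
    counts = {c: 0 for c in candidates}
    for items in transactions.values():
        for c in candidates:
            if c <= items:  # subset test: both items occur in this transaction
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= support}

# Hypothetical transactions and the items that met support in pass 1.
transactions = {
    "C1": {"milk", "bread"},
    "C2": {"milk"},
    "C3": {"bread", "eggs", "milk"},
    "C4": {"milk", "bread"},
    "C5": {"bread", "eggs"},
}
level2 = frequent_pairs(transactions, {"bread", "eggs", "milk"}, support=2)

for pair, count in sorted(level2.items(), key=lambda kv: sorted(kv[0])):
    item1, item2 = sorted(pair)
    print({"SET_SIZE": 2, "COUNT": count, "ITEM1": item1, "ITEM2": item2})
# {'SET_SIZE': 2, 'COUNT': 2, 'ITEM1': 'bread', 'ITEM2': 'eggs'}
# {'SET_SIZE': 2, 'COUNT': 3, 'ITEM1': 'bread', 'ITEM2': 'milk'}
```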
The process is repeated for successively larger item sets, up to n-item sets, where n is the value of the ITEMS= option. The output from PROC ASSOC is saved as SAS data sets, which enable you to define your own evaluation criteria and reports.
The order of the items within an n-item set does not matter. Any transaction in which all n items occur, in any order, counts toward that set. The support level, once set, remains constant throughout the process.
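The whole level-wise process can be summarized in one Python sketch that counts k-item sets for k = 1 up to a maximum set size with a fixed support level. Item sets are stored as frozensets, so the order in which items appear within a transaction is irrelevant. The document does not say how PROC ASSOC builds candidates beyond the 2-item stage; this sketch assumes the standard Apriori rule (a k-item set is a candidate only if all of its (k-1)-item subsets met the support level). Because a set can occur in no more transactions than any of its subsets, that pruning never drops a set that would have met the support level. The parameter max_items plays the role of ITEMS=, and the data are hypothetical.

```python
from itertools import combinations

def assoc_levels(transactions, support, max_items):
    """Level-wise counting of k-item sets, k = 1..max_items, with a constant support level."""
    baskets = [frozenset(t) for t in transactions]  # item order within a basket is ignored
    frequent = {}  # frozenset(item set) -> number of transactions containing it

    # Level 1: single items.
    level = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            level[key] = level.get(key, 0) + 1
    level = {s: n for s, n in level.items() if n >= support}

    k = 1
    while level and k < max_items:
        frequent.update(level)
        k += 1
        items = sorted(set().union(*level))
        # Assumed Apriori rule: every (k-1)-subset of a candidate must already be frequent.
        candidates = [
            frozenset(c) for c in combinations(items, k)
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
        level = {}
        for c in candidates:
            n = sum(1 for basket in baskets if c <= basket)
            if n >= support:
                level[c] = n
    frequent.update(level)
    return frequent

# Hypothetical transactions; the same items appear in different orders.
data = [
    ["milk", "bread"], ["milk"], ["bread", "eggs", "milk"],
    ["bread", "milk"], ["bread", "eggs"], ["eggs", "milk", "bread"],
]
result = assoc_levels(data, support=3, max_items=3)
for item_set, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(len(item_set), count, sorted(item_set))
```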
Caution: The theoretical number of potential item sets can grow very quickly. For example, with 50 different items, there are 1,225 potential 2-item sets and 19,600 potential 3-item sets. With 5,000 items, there are over 12 million potential 2-item sets and over 20 billion potential 3-item sets.
Processing an extremely large number of sets could cause your system to run out of disk or memory resources. However, by using a higher support level, you can reduce the item sets to a more manageable number.
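The figures in the caution are binomial coefficients: with m distinct items there are C(m, k) possible k-item sets. The short Python sketch below reproduces them with math.comb. The final line uses a hypothetical case in which a higher support level leaves only 300 items, to show how sharply the count of candidate pairs drops.

```python
from math import comb

# Number of possible k-item sets drawn from m distinct items is C(m, k).
for m in (50, 5000):
    print(m, comb(m, 2), comb(m, 3))
# 50 1225 19600
# 5000 12497500 20820835000

# Hypothetical: a higher support level leaves only 300 frequent items.
print(comb(300, 2))  # 44850
```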