Contains the number of items in the rule.
SUPPORT=COUNT/total
Contains the percent of support, that is, the percent of the total number of transactions that
qualify for the rule.
Definition:
total is the total number of transactions in the data set.
Options
IN=SAS-data-set
Specifies the input data source. The input to PROC RULEGEN is the OUT= data set created in
PROC ASSOC.
Default:
_LAST_
MINCONF=integer
Specifies the minimum confidence level needed in order to generate a rule. This parameter can be
adjusted so that only high confidence rules are retained.
Default:
10%
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The RULEGEN Procedure
Details
Output Processing
The output data set created by PROC RULEGEN has the following variables:
SET_SIZE
Contains the number of items in the rule.
RULE
Contains the rule text, for example, A & B ==> C & D.
COUNT
Contains the count of transactions meeting the rule.
CONF
Contains the percent of confidence.
EXP_CONF
Contains the percent of expected confidence.
LIFT
Contains the lift ratio.
SUPPORT
Contains the percent of support.
_LHAND
Contains the left side of the rule.
_RHAND
Contains the right side of the rule.
ITEM1, ITEM2, ..., ITEMn+1
Contains the individual items forming the rule, including the arrow.
Only the rules that meet the minimum confidence value (MINCONF=) are output. Raise MINCONF= to
retain only the high-confidence rules.
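Because CONF is stored in the output data set as a percent, a stricter threshold can also be applied afterward without rerunning the procedure. The following is a minimal sketch, not part of the procedure itself. The data set names DATRULE and HIGHCONF are hypothetical, and the two stand-in rows are made-up values rather than real PROC RULEGEN output.

```sas
/* Stand-in for a PROC RULEGEN output data set.          */
/* The values are illustrative only (assumption).        */
data datrule;
   input conf 1-5 rule $ 7-30;   /* column input keeps the blanks inside RULE */
   datalines;
92.50 A & B ==> C
78.00 A ==> C
;
run;

/* Keep only the rules whose confidence is at least 90 percent */
data highconf;
   set datrule;
   where conf >= 90;
run;

proc print data=highconf noobs;
run;
```

With the stand-in rows, only the rule A & B ==> C remains in HIGHCONF.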
The statistical computation is based on conditional probability, the quantity at the heart of Bayes'
theorem: the probability of event A given that event B occurred is the probability of both events A
and B occurring divided by the probability of event B.
PROC RULEGEN automatically discovers complex rules with multiple events on either side such as A
& B ==> C, implying event C occurred, given that both events A and B occurred.
Consider the rule lhs ==> rhs.
In terms of the output data set variables, the statistics are computed as follows:
CONF=COUNT/lhs_count
EXP_CONF=rhs_count/total
LIFT=CONF/EXP_CONF
SUPPORT=COUNT/total
where total is the number of transactions in the data set.
As you can see, positioning of items on the left or right side does impact statistical calculations, that is,
A ==> B and B ==> A are entirely different rules.
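The effect of swapping the two sides can be sketched with a small DATA step. The counts below are hypothetical and not taken from any real data set, and the data set name RULE_STATS is made up for the illustration. The ratios are multiplied by 100 on the assumption that CONF, EXP_CONF, and SUPPORT are reported as percents, as their descriptions state.

```sas
/* Hypothetical counts (assumption), not taken from any real data set */
data rule_stats;
   total   = 1000;   /* transactions in the data set         */
   a_count = 200;    /* transactions that contain A          */
   b_count = 100;    /* transactions that contain B          */
   count   = 80;     /* transactions that contain both A, B  */

   length rule $8;

   /* Rule A ==> B: the left side is A, the right side is B */
   rule     = 'A ==> B';
   conf     = 100 * count / a_count;
   exp_conf = 100 * b_count / total;
   lift     = conf / exp_conf;
   support  = 100 * count / total;
   output;

   /* Rule B ==> A: the left side is B, the right side is A */
   rule     = 'B ==> A';
   conf     = 100 * count / b_count;
   exp_conf = 100 * a_count / total;
   lift     = conf / exp_conf;
   support  = 100 * count / total;
   output;

   keep rule conf exp_conf lift support;
run;

proc print data=rule_stats noobs;
run;
```

With these counts, A ==> B has CONF=40 and EXP_CONF=10, while B ==> A has CONF=80 and EXP_CONF=20. SUPPORT (8) and LIFT (4) are the same for both rules, so the two directions differ in confidence and expected confidence only.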
Example
The following example was executed using the HP-UX version 10.20 operating system and the SAS
software release 6.12TS045.
Example 1: Performing an Association Discovery
Featured Tasks ASSOCIATION and RULEGEN Procedures
Specifying the maximum item-set size
Setting the support level
Specifying the minimum confidence level
The following example demonstrates how to perform an association discovery using the ASSOCIATION and RULEGEN
procedures. The example data set SAMPSIO.ASSOCS (stored in the sample library) contains 7,007 separate customer
transactions. The variable CUSTOMER is the ID variable that identifies the customers. The variable PRODUCT is the
nominal target variable that identifies the items. As a marketing analyst for a grocery chain, you want to identify the top 10
item sets in which the purchase of one item has a high impact on the purchase of one or more other items.
Program
proc dmdb batch data=sampsio.assocs out=dmassoc dmdbcat=catassoc;
   id customer;
   class product(desc);
run;

proc assoc data=dmassoc dmdbcat=catassoc
           out=datassoc(label='Output from Proc Assoc')
           items=5 support=20;
   cust customer;
   target product;
run;

proc rulegen in=datassoc
             out=datrule(label='Output from Proc Rulegen')
             minconf=75;
run;

proc sort data=datrule;
   by descending lift;
run;

proc print data=datrule(obs=5) label;
   var set_size exp_conf conf support lift count
       rule _lhand _rhand;
   title 'Top Ten Rules based on Lift';
run;
Output
Top Ten Rules based on Lift                              1
OBS  SET_SIZE  EXP_CONF    CONF  SUPPORT   LIFT  COUNT
  1         1      7.39  100.00     7.39  13.53  74.00
  2         5     12.59   94.74     8.99   7.53  90.00
  3         5     10.79   78.99     9.39   7.32  94.00
  4         5     11.89   87.04     9.39   7.32  94.00
  5         5     12.69   92.78     8.99   7.31  90.00
OBS  RULE
  1  bordeaux
  2  sardines & baguette & apples ==> peppers & avocado
  3  turkey & coke ==> olives & ice_cream & bourbon
  4  olives & ice_crea & bourbon ==> turkey & coke
  5  peppers & baguette & apples ==> sardines & avocado
OBS  _LHAND                        _RHAND
  1
  2  sardines & baguette & apple   peppers & avocado
  3  turkey & coke                 olives & ice_crea & bourbon
  4  olives & ice_crea & bourbon   turkey & coke
  5  peppers & baguette & apple    sardines & avocado
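The relationship LIFT=CONF/EXP_CONF can be checked against the five printed rows. The sketch below copies EXP_CONF, CONF, and LIFT from the output above into a small DATA step. The data set name CHECK and the variable LIFT_CHECK are hypothetical names introduced for this illustration.

```sas
/* EXP_CONF, CONF, and LIFT copied from the printed output */
data check;
   input obs exp_conf conf lift;
   /* Recompute the lift and round it to two decimals, as printed */
   lift_check = round(conf / exp_conf, 0.01);
   datalines;
1  7.39 100.00 13.53
2 12.59  94.74  7.53
3 10.79  78.99  7.32
4 11.89  87.04  7.32
5 12.69  92.78  7.31
;
run;

proc print data=check noobs;
run;
```

In each row, LIFT_CHECK agrees with the printed LIFT to two decimal places.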
Log
1 proc dmdb batch data=sampsio.assocs out=dmassoc dmdbcat=catassoc;
2 id customer;
3 class product(desc);
4 run;
Records processed= 7007 Mem used = 511K.
NOTE: The PROCEDURE DMDB used 0:00:01.68 real 0:00:00.83 cpu.
5
6 proc assoc data=dmassoc dmdbcat=catassoc
7 out=datassoc(label='Output from Proc Assoc')
8