The arboretum procedure




Contains the number of items in the rule.

SUPPORT=COUNT/total

Contains the percent of support, that is, the percent of the total number of transactions that qualify for the rule.

Definition: total is the total number of transactions in the data set.

Options

IN=SAS-data-set

Specifies the input data source. The input to PROC RULEGEN is the OUT= data set created in PROC ASSOC.

Default: _LAST_


MINCONF=integer

Specifies the minimum confidence level, as a percentage, that a rule must reach in order to be generated. Raise this value to retain only high-confidence rules.

Default:

10%
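As a minimal sketch of how this option might be set, the step below asks for rules with at least 90% confidence. The data set names DATASSOC and DATRULE are borrowed from the example later in this section, and the value 90 is an arbitrary choice for illustration only.

```sas
/* Assumes DATASSOC is the OUT= data set from an earlier PROC ASSOC run. */
/* MINCONF=90 is an illustrative value; the documented default is 10.    */
proc rulegen in=datassoc out=datrule minconf=90;
run;
```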


Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.


The RULEGEN Procedure

Details

Output Processing

The output data set created by PROC RULEGEN has the following variables:

SET_SIZE

Contains the number of items in the rule.

RULE

Contains the rule text, for example, A & B ==> C & D.



COUNT

Contains the count of transactions meeting the rule.

CONF

Contains the percent of confidence.



EXP_CONF

Contains the percent of expected confidence.

LIFT

Contains the lift ratio.



SUPPORT

Contains the percent of support.

_LHAND

Contains the left side of the rule.



_RHAND

Contains the right side of the rule.

ITEM1, ITEM2, ..., ITEMn+1

Contains the individual items forming the rule, including the arrow.

Only rules that meet the minimum confidence value (the MINCONF= option) are written to the output data set. Raise that value to retain only the high-confidence rules.

The statistical computation is based on conditional probability (the relation that underlies Bayes' theorem): the probability of event A conditional on event B occurring is calculated as the probability of both events A and B occurring divided by the probability of event B.
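In symbols, the conditional probability described above can be written as follows, where A and B are the two events. This is only a restatement of the sentence above in standard notation.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
```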

PROC RULEGEN automatically discovers complex rules with multiple events on either side, such as A & B ==> C, implying event C occurred, given that both events A and B occurred.

Consider the rule lhs ==> rhs. In terms of the output data set variables, the statistics are computed as follows:

CONF=COUNT/lhs_count

EXP_CONF=rhs_count/total

LIFT=CONF/EXP_CONF

SUPPORT=COUNT/total

where lhs_count and rhs_count are the numbers of transactions that contain the left-side items and the right-side items, respectively, and total is the number of transactions in the data set.
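The formulas can be checked by hand with a short DATA step. This is a sketch, not part of the original program. COUNT=90 comes from the second row of the sample output shown later in this section. The values lhs_count=95, rhs_count=126, and total=1001 are assumptions back-calculated from that row's printed percents; they are not stated in the source. The factor of 100 is added because the output variables are reported as percents.

```sas
data rule_stats;
   count     = 90;    /* transactions meeting the rule (sample output, row 2) */
   lhs_count = 95;    /* assumed: back-calculated from CONF = 94.74           */
   rhs_count = 126;   /* assumed: back-calculated from EXP_CONF = 12.59       */
   total     = 1001;  /* assumed: back-calculated from SUPPORT = 8.99         */

   conf     = 100 * count / lhs_count;    /* percent of confidence          */
   exp_conf = 100 * rhs_count / total;    /* percent of expected confidence */
   lift     = conf / exp_conf;            /* lift ratio                     */
   support  = 100 * count / total;        /* percent of support             */
run;

proc print data=rule_stats;
   format conf exp_conf lift support 8.2;
run;
```

Under these assumed counts the computed values round to CONF=94.74, EXP_CONF=12.59, LIFT=7.53, and SUPPORT=8.99, which agrees with the printed row.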

Because CONF is computed from lhs_count and EXP_CONF from rhs_count, the side on which an item appears affects the statistical calculations; that is, A ==> B and B ==> A are different rules.
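A small DATA step can make the asymmetry concrete. In this sketch the labels A and B are placeholders, and the counts are assumptions back-calculated from the third and fourth rows of the sample output later in this section (the turkey & coke rule and its reverse); they are not values given in the source.

```sas
data direction;
   length rule $ 12;
   input rule $ count lhs_count rhs_count total;
   conf     = 100 * count / lhs_count;   /* depends on the left side  */
   exp_conf = 100 * rhs_count / total;   /* depends on the right side */
   lift     = conf / exp_conf;           /* same in both directions   */
   datalines;
A==>B 94 119 108 1001
B==>A 94 108 119 1001
;
run;

proc print data=direction;
   format conf exp_conf lift 8.2;
run;
```

With these counts, swapping the sides changes CONF from 78.99 to 87.04 and EXP_CONF from 10.79 to 11.89, while COUNT and LIFT (7.32) stay the same.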


Example

The following example was executed using the HP-UX version 10.20 operating system and the SAS software release 6.12TS045.

Example 1: Performing an Association Discovery


Featured Tasks ASSOCIATION and RULEGEN Procedures

Specifying the maximum item-set size

Setting the support level

Specifying the minimum confidence level

The following example demonstrates how to perform an association discovery using the ASSOCIATION and RULEGEN procedures. The example data set SAMPSIO.ASSOCS (stored in the sample library) contains 7,007 separate customer transactions. The variable CUSTOMER is the ID variable that identifies the customers. The variable PRODUCT is the nominal target variable that identifies the items. As a marketing analyst for a grocery chain, you want to identify the top 10 item sets in which the purchase of one item has a high impact on the purchase of one or more other items.



Program

 

proc dmdb batch data=sampsio.assocs out=dmassoc dmdbcat=catassoc;
   id customer;
   class product(desc);
run;

proc assoc data=dmassoc dmdbcat=catassoc
   out=datassoc(label='Output from Proc Assoc')
   items=5 support=20;
   cust customer;
   target product;
run;

proc rulegen in=datassoc
   out=datrule(label='Output from Proc Rulegen')
   minconf=75;
run;

proc sort data=datrule;
   by descending lift;
run;

proc print data=datrule(obs=5) label;
   var set_size exp_conf conf support lift count
       rule _lhand _rhand;
   title 'Top Ten Rules based on Lift';
run;
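Note that the PRINT step in the program lists only five observations (obs=5), although the title and the stated goal refer to ten. A possible variant, assuming the DATRULE data set has already been sorted by descending lift as in the program, simply raises the OBS= value. This variant is a suggestion and was not used to produce the output that follows.

```sas
/* Variant of the PRINT step: list ten rows instead of five. */
proc print data=datrule(obs=10) label;
   var set_size exp_conf conf support lift count
       rule _lhand _rhand;
   title 'Top Ten Rules based on Lift';
run;
```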



Output

 

      Top Ten Rules based on Lift                          1

      OBS    SET_SIZE    EXP_CONF      CONF    SUPPORT      LIFT     COUNT
        1          1        7.39     100.00      7.39      13.53     74.00
        2          5       12.59      94.74      8.99       7.53     90.00
        3          5       10.79      78.99      9.39       7.32     94.00
        4          5       11.89      87.04      9.39       7.32     94.00
        5          5       12.69      92.78      8.99       7.31     90.00

      OBS    RULE
        1    bordeaux
        2    sardines & baguette & apples ==> peppers & avocado
        3    turkey & coke ==> olives & ice_cream & bourbon
        4    olives & ice_crea & bourbon ==> turkey & coke
        5    peppers & baguette & apples ==> sardines & avocado

      OBS    _LHAND                         _RHAND
        1
        2    sardines & baguette & apple    peppers & avocado
        3    turkey & coke                  olives & ice_crea & bourbon
        4    olives & ice_crea & bourbon    turkey & coke
        5    peppers & baguette & apple     sardines & avocado



Log

 

1    proc dmdb batch data=sampsio.assocs out=dmassoc dmdbcat=catassoc;
2       id customer;
3       class product(desc);
4    run;
Records processed=    7007  Mem used = 511K.
NOTE: The PROCEDURE DMDB used 0:00:01.68 real 0:00:00.83 cpu.
6   proc assoc data=dmassoc dmdbcat=catassoc
7      out=datassoc(label='Output from Proc Assoc')
8




