



Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.


 

The PRINT procedure lists the first 10 observations in the SAMPSIO.ASSOCS

data set.

proc print data=sampsio.assocs(obs=10);

   title 'Partial Listing of the ASSOCS Data Set';

run;



 

Before you can run the ASSOCIATION and SEQUENCE procedures, you must

create the DMDB data set and the DMDB catalog by using a PROC DMDB step.

proc dmdb batch data=sampsio.assocs out=dmseq dmdbcat=catseq;

   id customer time;

   class product(desc);

run;



 

The ASSOCIATION procedure determines the products that are related.

The DATA= and DMDBCAT= options identify the DMDB data set and catalog, respectively.

PROC ASSOC writes the related products to the OUT= data set; this data set

is used as input by the SEQUENCE procedure.

proc assoc data=dmseq dmdbcat=catseq 

   out=aout(label='Output from Proc Assoc')



 

The ITEMS= option specifies the maximum size of the item set to be considered

(default=4). The SUPPORT= option specifies the minimum support level that

is required for a rule to be accepted (default=5% of the largest frequency).

   

    items=5 support=20;




 

The CUST statement (alias = CUSTOMER) specifies the ID variable. The

TARGET statement specifies the nominal target variable.

   cust customer;

   target product;

run;
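
Before AOUT is passed to PROC SEQUENCE, it can help to look at what PROC ASSOC actually wrote. The following is a minimal sketch that uses only Base SAS procedures. It assumes that the PROC ASSOC step above ran successfully and created WORK.AOUT. No variable names are assumed, so the listing shows whatever columns the procedure produced.

```sas
/* Describe the structure of the PROC ASSOC output data set */
proc contents data=aout;
run;

/* List the first 10 observations; all variables are printed
   because no VAR statement is given */
proc print data=aout(obs=10);
   title 'Partial Listing of the PROC ASSOC Output';
run;
```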



 

The DATA= and DMDBCAT= options identify the DMDB data set and catalog,

respectively.  The ASSOC= option identifies the name of the input data set

from the previous PROC ASSOC run.

proc sequence data=dmseq dmdbcat=catseq

              assoc=aout

              out=sout(label='Output from Proc Sequence')                 



 

The NITEMS= option specifies the maximum number of events for which

rules, or chains, are generated. By default, the SEQUENCE procedure computes

binary sequences (NITEMS=2). 

             nitems=2;



 

The CUST statement (alias = CUSTOMER) specifies the ID variable. The

TARGET statement specifies the nominal target variable.

   cust customer;

   target product;



 

The VISIT statement names the timing or sequence variable. 

   visit time;

run;



 

The SORT procedure sorts the observations in descending order by the

values of support.

proc sort data=sout;

   by descending support;

run;



 

The PRINT procedure lists the first 10 observations in the sorted sequence

data set.

proc print data=sout(obs=10);

   var count support conf rule;

   title 'Partial Listing of the 2-Item Sequences';

run;
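
Sorting by support is only one way to rank the sequences. As a sketch of an alternative, the step below ranks the same SOUT data set by confidence and keeps only rules at or above a threshold. The data set name SOUT_CONF and the threshold of 50 are hypothetical choices made for illustration; the variables COUNT, SUPPORT, CONF, and RULE are the ones listed in the VAR statement above.

```sas
/* Rank by confidence first, then by support (SOUT_CONF is a hypothetical name) */
proc sort data=sout out=sout_conf;
   by descending conf descending support;
run;

/* Print up to 10 rules whose confidence is at least 50 (illustrative threshold).
   The WHERE condition is applied before the OBS= limit. */
proc print data=sout_conf(obs=10);
   where conf >= 50;
   var count support conf rule;
   title 'Sequences with Confidence of at Least 50';
run;
```

Writing the sorted copy to a separate data set leaves SOUT in its support order for the listing above.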



The SEQUENCE Procedure

Example 2: Specifying the Maximum Number of Item Events and Setting the Lower Timing Limit

SEQUENCE Procedure

   Using the NITEMS= option to Specify the Maximum Number of Event Items

   Using the SAME= option to Set the Lower Timing Limit


This example demonstrates how to specify the maximum number of item events and how to set the lower timing

limit of a sequence rule. Before you run the example program, you should submit the PROC DMDB and PROC

ASSOC steps from Example 1.

proc sequence data=dmseq

              dmdbcat=catseq

              assoc=aout

              out=s4out(label = 'Output from Proc Sequence')

 

             nitems=4;



   cust customer;

   target product;

 

   visit time / same=2;



run;

 

proc sort data=s4out;



   by descending support;

run;


 

proc print data=s4out(obs=10);

   var count support conf rule;

   title 'Partial Listing of the 4-Item Sequences';

   title2 'Lower Timing Limit Set to 2';

run;


Output


Partial PROC PRINT Listing of the 4-Item Sequence Data Set, Lower Timing Limit Set to 2

When the lower timing limit is set to 2, the rule with the highest support is now hering ==> heineken, that is, a
herring purchase followed by a Heineken purchase. About 23% of the customer population (235 of 1,001 customers)
supports it, with a confidence of about 48%.

           Partial Listing of the 4-Item Sequences

                  Lower Timing Limit Set to 2

   OBS       COUNT     SUPPORT        CONF    RULE

     1         235     23.4765     48.3539    hering ==> heineken           

     2         225     22.4775     57.3980    baguette ==> heineken         

     3         220     21.9780     69.1824    soda ==> cracker              

     4         218     21.7782     68.5535    soda ==> heineken & cracker   

     5         218     21.7782     68.5535    soda ==> heineken             

     6         215     21.4785     45.4545    olives ==> turkey             

     7         213     21.2787     52.8536    bourbon ==> cracker           

     8         209     20.8791    100.0000    hering & baguette ==> heineken

     9         201     20.0799     55.3719    avocado ==> heineken          

    10         150     14.9850     30.8642    hering ==> cracker          
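
The SUPPORT and CONF columns appear to be percentages derived from COUNT. As a rough check, the DATA step below recomputes the support of the first rule from its count and from the customer count of 1001 reported in the log, and backs out the number of customers implied for the left-hand side of the rule. The formulas are an assumption (support = count / customers, confidence = count / left-hand-side count) that is consistent with the listed values; they are not taken from the procedure's documentation.

```sas
/* Values copied from row 1 of the listing and from the log */
data _null_;
   count     = 235;       /* COUNT for hering ==> heineken         */
   customers = 1001;      /* "Customer count" reported in the log   */
   conf      = 48.3539;   /* CONF column, in percent                */

   /* Assumed definition: percent of customers that support the rule */
   support   = 100 * count / customers;

   /* Assumed definition: conf = 100 * count / lhs_count, solved for lhs_count */
   lhs_count = count / (conf / 100);

   put support= 8.4 lhs_count= 8.1;
run;
```

The recomputed support should agree with the listed 23.4765, and the implied left-hand-side count comes out at roughly 486 customers.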



Partial Log Listing

1  proc sequence data=dmseq

2                dmdbcat=catseq

3                assoc=aout

4                out=s4out(label = 'Output from Proc Sequence')

5  


6               nitems=4;

7     cust customer;

8     target product;

9  


10     visit time / same=2;

11  run;


Large itemsets:            1206

Total records read:        7007

Customer count:            1001

Support set to:              20

Total Litem Sequences:     5641

Number >= support           466

--- Number Items:         3 ---

Total records read:        7007

Customer count:            1001

Total Litem Sequences:     5086

Number >= support            12

--- Number Items:         4 ---

Total records read:        7007

Customer count:            1001

Total Litem Sequences:        0
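
The record and customer counts in the log can be cross-checked against the input data. The query below is a sketch that counts the rows and the distinct values of CUSTOMER in SAMPSIO.ASSOCS. It assumes that the DMDB data set DMSEQ holds the same records as SAMPSIO.ASSOCS, in which case the results should presumably match the 7007 records and 1001 customers reported above.

```sas
/* Count input records and distinct customers in the source data set */
proc sql;
   title 'Record and Customer Counts in SAMPSIO.ASSOCS';
   select count(*)                 as n_records,
          count(distinct customer) as n_customers
      from sampsio.assocs;
quit;
```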




