The arboretum procedure




9   

10       items=5 support=20;

11   

12      cust customer;



13      target product;

14   run;

----- Potential 1 item sets = 20 -----

Counting items, records read:     7007

Number of customers:              1001

Support level for item sets:        20

Maximum count for a set:           600

Sets meeting support level:         20

Megs of memory used:              0.51

----- Potential 2 item sets = 190 -----

Counting items, records read:     7007

Maximum count for a set:           366

Sets meeting support level:        183

Megs of memory used:              0.51

----- Potential 3 item sets = 1035 -----

Counting items, records read:     7007

Maximum count for a set:           234

Sets meeting support level:        615

Megs of memory used:              0.51

----- Potential 4 item sets = 1071 -----

Counting items, records read:     7007

Maximum count for a set:           137

Sets meeting support level:        317

Megs of memory used:              0.51

----- Potential 5 item sets = 85 -----

Counting items, records read:     7007

Maximum count for a set:           116

Sets meeting support level:         71

Megs of memory used:              0.51

NOTE: The PROCEDURE ASSOC used 0:00:07.86 real 0:00:03.45 cpu.

15 

16   proc rulegen in=datassoc



17      out=datrule(label='Output from Proc Rulegen')

18   


19      minconf=75;

20   run;

write set 1

write set 2

write set 3

write set 4

write set 5

NOTE: The PROCEDURE RULEGEN used 0:00:06.07 real 0:00:02.69 cpu.

21 

22   proc sort data=datrule;




23      by descending lift;

24   run;

NOTE: The data set WORK.DATRULE has 939 observations and 15 variables.

NOTE: The PROCEDURE SORT used 0:00:00.92 real 0:00:00.31 cpu.

25   proc print data=datrule(obs=5) label;

26      var set_size exp_conf conf support lift count

27          rule _lhand _rhand;

28      title 'Top Ten Rules based on Lift';

29   run;

NOTE: The PROCEDURE PRINT used 0:00:00.18 real 0:00:00.11 cpu.

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.



 

Before you can run PROC ASSOC, you must create the DMDB data set and

the DMDB catalog by using a PROC DMDB step.

proc dmdb batch data=sampsio.assocs out=dmassoc dmdbcat=catassoc;

   id customer;

   class product(desc);

run;
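
To see what the DMDB step is summarizing, it can help to count the records and distinct customers in the input table first. The query below is a minimal sketch and not part of the original example. It assumes that the SAMPSIO library is assigned and that SAMPSIO.ASSOCS contains the CUSTOMER variable named in the ID statement above.

```sas
/* Count transaction records and distinct customers in the sample data. */
/* Assumes the SAMPSIO libref is available in this SAS session.         */
proc sql;
   select count(*)                 as n_records,
          count(distinct customer) as n_customers
   from sampsio.assocs;
quit;
```

If the data match the log shown earlier, the two counts should correspond to the 7007 records read and the 1001 customers that PROC ASSOC reports.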



 

The ASSOC procedure determines which products are related.

The DATA= and DMDBCAT= options identify the DMDB data set and catalog, respectively.

PROC ASSOC writes the related products to the OUT= data set, which is used

as input by the RULEGEN procedure.

proc assoc data=dmassoc dmdbcat=catassoc 

   out=datassoc(label='Output from Proc Assoc')



 

The ITEMS= option specifies the maximum size of the item set to be considered

(default = 4). The SUPPORT= option specifies the minimum support level that

is required for a rule to be accepted (default = 5% of the largest frequency).

   

    items=5 support=20;




 

The CUST statement (alias = CUSTOMER) specifies the ID variable. The

TARGET statement specifies the nominal target variable.

   cust customer;

   target product;

run;
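
As a rough check on the log, the arithmetic behind SUPPORT=20 can be worked out in a DATA step. This sketch assumes that SUPPORT=20 is a count of customers rather than a percentage, which is how the log line "Support level for item sets: 20" reads. The variable names are illustrative only.

```sas
/* Relate the SUPPORT=20 threshold to figures reported in the PROC ASSOC log. */
data _null_;
   n_customers = 1001;   /* "Number of customers" in the log                 */
   min_count   = 20;     /* SUPPORT=20, read here as a count (assumption)    */
   n_items     = 20;     /* one-item sets that met the support level         */

   support_pct = 100 * min_count / n_customers;  /* threshold as a percentage */
   n_pairs     = comb(n_items, 2);               /* candidate two-item sets   */

   put support_pct= n_pairs=;
run;
```

With 1001 customers, a count of 20 is just under 2 percent, and 20 single items can form 190 distinct pairs, which matches the "Potential 2 item sets = 190" line in the log.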



 

The RULEGEN procedure uses the output from PROC ASSOC to generate the

rules. The rules are written to the OUT= data set.

proc rulegen in=datassoc 

   out=datrule(label='Output from Proc Rulegen')



 

The MINCONF= option specifies the minimum confidence required in order

to generate a rule (default = 10).

   minconf=75;

run;



 

Because neither PROC ASSOC nor PROC RULEGEN generates printed output, the

remaining code sorts the rules by descending LIFT and then generates a simple

list report of the rules that have the highest LIFT values. Although the report

title reads 'Top Ten', the OBS=5 data set option limits the listing to the first

five rules. This is done primarily to limit the amount of output displayed in this example.

proc sort data=datrule;

   by descending lift;

run;


proc print data=datrule(obs=5) label;

   var set_size exp_conf conf support lift count 

       rule _lhand _rhand;

   title 'Top Ten Rules based on Lift';

run;
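
The step above prints five observations under a "Top Ten" title, and the first row in the resulting report is a one-item set (bordeaux) with empty _LHAND and _RHAND values. The variant below is a sketch of one way to list ten rules that actually have a left and right side. It assumes DATRULE has already been sorted by descending LIFT as shown, and the title text is illustrative.

```sas
/* Print the ten highest-lift rules, skipping one-item sets.             */
/* WHERE= is applied first; OBS=10 then limits the rows that satisfy it. */
proc print data=datrule(where=(set_size > 1) obs=10) label;
   var set_size exp_conf conf support lift count rule;
   title 'Top Ten Multi-Item Rules based on Lift';
run;
```

Filtering on SET_SIZE is a design choice made here for readability and is not something the original example does.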




The PROC PRINT list report shows the top rules based on the LIFT value. The output data set from PROC

RULEGEN contains the following variables:

SET_SIZE - the number of items in the rule.

EXP_CONF - the expected confidence (right side count / total).

CONF - the confidence (count / left side count).

SUPPORT - the support level (count / total).

LIFT - the lift ratio (confidence / expected confidence).

COUNT - the number of transactions meeting the rule.

RULE - the text of the rule, for example, Left side ==> Right side.

_LHAND - the left side of the rule.

_RHAND - the right side of the rule.

ITEM1, ITEM2, ..., ITEMn+1 - the individual items forming the rule, including the arrow. For this

example, the individual items have been omitted from the list report.


      Top Ten Rules based on Lift                          1

      OBS    SET_SIZE    EXP_CONF      CONF    SUPPORT      LIFT     COUNT
        1          1        7.39     100.00      7.39      13.53     74.00
        2          5       12.59      94.74      8.99       7.53     90.00
        3          5       10.79      78.99      9.39       7.32     94.00
        4          5       11.89      87.04      9.39       7.32     94.00
        5          5       12.69      92.78      8.99       7.31     90.00

      OBS    RULE
        1    bordeaux
        2    sardines & baguette & apples ==> peppers & avocado
        3    turkey & coke ==> olives & ice_cream & bourbon
        4    olives & ice_crea & bourbon ==> turkey & coke
        5    peppers & baguette & apples ==> sardines & avocado

      OBS    _LHAND                         _RHAND
        1
        2    sardines & baguette & apple    peppers & avocado
        3    turkey & coke                  olives & ice_crea & bourbon
        4    olives & ice_crea & bourbon    turkey & coke
        5    peppers & baguette & apple     sardines & avocado
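
The variable definitions can be checked against the five rows printed above. The DATA step below is a sketch that re-enters the printed values with DATALINES and recomputes LIFT and SUPPORT from them. The data set name and the LIFT_CALC and SUPPORT_CALC variables are made up for illustration, and the total of 1001 is taken from the "Number of customers" line in the PROC ASSOC log.

```sas
/* Recompute LIFT and SUPPORT from the printed report values. */
data check_rules;
   input exp_conf conf support lift count;
   n_customers  = 1001;                                     /* from the PROC ASSOC log             */
   lift_calc    = round(conf / exp_conf, 0.01);             /* confidence / expected confidence    */
   support_calc = round(100 * count / n_customers, 0.01);   /* count / total, as a percentage      */
   datalines;
7.39 100.00 7.39 13.53 74
12.59 94.74 8.99 7.53 90
10.79 78.99 9.39 7.32 94
11.89 87.04 9.39 7.32 94
12.69 92.78 8.99 7.31 90
;
run;

/* Show reported and recomputed values side by side. */
proc print data=check_rules;
   var lift lift_calc support support_calc;
run;
```

Because the inputs are already rounded to two decimals in the report, the recomputed values are only expected to agree with the printed LIFT and SUPPORT columns to about that precision.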




