The arboretum procedure




The ASSOC Procedure

PROC ASSOC Statement

Invokes the ASSOC procedure.

PROC ASSOC <option(s)>;

Required Argument

OUT=SAS-data-set

Specifies the output data set that contains the following variables: SET_SIZE, COUNT, ITEM1, ITEM2,...ITEMn (where n is the maximum number of items in a set). See Details for more information.

SET_SIZE:

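Contains the number of items in the item set: 1 for single items, 2 for 2-item sets, and so on (see Details). The first observation has SET_SIZE equal to 0; it appears to describe the data set as a whole, that is, the total number of transactions. SET_SIZE is labeled as Relations in the Results Browser.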

COUNT:


Contains the number of transactions that meet the rule, that is, transactions that contain every item in the set.

ITEM1, ITEM2,...ITEMn:

Contains the individual items forming the rule including the arrow.

Tip:

The OUT= data set created by PROC ASSOC is input to the RULEGEN and SEQ procedures. Run PROC ASSOC and PROC RULEGEN to perform association discovery. Run PROC ASSOC and PROC SEQ to perform sequence discovery.

Options

DATA=SAS-data-set

Identifies the input data source. To perform association discovery, the input data set must have a separate observation for each product purchased by each customer. You must also assign the ID model role to a variable and the TARGET model role to another variable in the Input Data Source.



DMDBCAT=SAS-catalog

Identifies the data catalog of the input data source.



ITEMS=integer

Specifies the maximum number of events or transactions to chain (or associate) together.



SUPPORT=integer

Specifies the minimum number of transactions in which a combination of items must occur in order for a rule to be accepted. Rules that do not meet this support level are rejected. The level of support represents how frequently the combination occurs in the market basket (input data source).



Default:

5% of the largest item frequency count

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.




CUSTOMER Statement

Specifies the customer(s) to be analyzed.

Alias: CUST

CUSTOMER variable-list;

Required Argument

variable-list

Specifies the names of one or more variables that identify the customers to be analyzed.


TARGET Statement

Specifies the target to be analyzed.

TARGET variable;

Required Argument

variable

Specifies the NOMINAL variable that contains the items, typically the products ordered by customers.


Details

The input to the ASSOC procedure has the following role variables: ID and TARGET. All records with the same ID values form a transaction. Every transaction has a unique ID value and one or more TARGET values.

You may have more than one ID variable. However, association analysis can be performed on only one target variable at a time. When there are multiple ID variables, PROC ASSOC concatenates them into a single identifier value during computation.
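
The grouping of records into transactions can be illustrated outside of SAS. The following Python sketch is not PROC ASSOC itself; it assumes records arrive as dictionaries and uses a tuple of ID values as a stand-in for the concatenated identifier (the exact way PROC ASSOC concatenates IDs is not stated here). The field names store, cust, and product are hypothetical.

```python
from collections import defaultdict

def build_transactions(records, id_fields, target_field):
    """Group one-record-per-item input into transactions keyed by the ID value(s)."""
    transactions = defaultdict(set)
    for rec in records:
        # Assumption: a tuple of the ID values plays the role of the
        # single concatenated identifier described in the text.
        key = tuple(rec[f] for f in id_fields)
        transactions[key].add(rec[target_field])
    return dict(transactions)

# Hypothetical input: one record per product purchased by each customer.
records = [
    {"store": "A", "cust": 1, "product": "bread"},
    {"store": "A", "cust": 1, "product": "milk"},
    {"store": "B", "cust": 1, "product": "bread"},
    {"store": "A", "cust": 2, "product": "milk"},
]
tx = build_transactions(records, ["store", "cust"], "product")
```

With two ID fields, customer 1 at store A and customer 1 at store B are treated as separate transactions, which is the practical effect of combining several ID variables into one identifier.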

For numeric target variables, missing values constitute a separate item or target level and show up in the rules as a period (.). For character target variables, completely blank values constitute a separate item (target level) and show up in the rules as a period (.). All records with missing ID values are considered a single valid transaction.
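
The missing-value rules above can be mimicked in a few lines of Python. This is only a sketch of the described behavior, not the procedure's implementation; it assumes that a missing numeric value or a missing ID is represented by None and that records are (ID, target) pairs.

```python
from collections import defaultdict

MISSING_ITEM = "."  # how a missing target level is displayed in the rules

def normalize_item(value):
    """Map a missing (None) or completely blank character value to '.'."""
    if value is None:
        return MISSING_ITEM
    if isinstance(value, str) and value.strip() == "":
        return MISSING_ITEM
    return value

def group_with_missing(rows):
    """rows: iterable of (id_value, target_value) pairs."""
    transactions = defaultdict(set)
    for id_value, target in rows:
        # All records whose ID is missing (None) share one key,
        # so together they form a single transaction.
        transactions[id_value].add(normalize_item(target))
    return dict(transactions)

rows = [(1, "tea"), (1, "   "), (None, "jam"), (None, None), (2, None)]
grouped = group_with_missing(rows)
```

Because missing IDs collapse into one transaction, a data set with many missing IDs can produce one unusually large transaction, which is worth checking before interpreting the counts.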

Output Processing

PROC ASSOC makes a pass through the data and obtains transaction counts for each item. It outputs these counts with a SET_SIZE of 1 and the items listed under ITEM1. Items that do not meet the support level are discarded. By default, the support level is set to 5% of the largest item count.
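
As a rough illustration of this first pass, the Python sketch below counts the transactions that contain each item and applies the default support level of 5% of the largest item count. It assumes that transactions are already available as sets of items and that an item is kept when its count is greater than or equal to the support level; the exact comparison used by PROC ASSOC is not stated here.

```python
from collections import Counter

def single_item_counts(transactions, support=None):
    """Return (kept_items, support_level) for the 1-item pass."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # each item counts once per transaction
    if not counts:
        return {}, 0
    if support is None:
        # Default described in the text: 5% of the largest item count.
        support = max(counts.values()) * 5 / 100
    # Assumption: "meets the support level" means count >= support.
    kept = {item: n for item, n in counts.items() if n >= support}
    return kept, support

# Hypothetical baskets: bread in 40, milk in 30, caviar in 1.
baskets = [{"bread", "milk"}] * 30 + [{"bread"}] * 10 + [{"caviar"}]
kept, level = single_item_counts(baskets)
```

Here the largest item count is 40, so the default support level is 2 transactions, and the item that occurs only once is discarded.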

PROC ASSOC then generates all potential 2-item sets, makes a pass through the data, and obtains transaction counts for each of the 2-item sets. The sets that meet the support level are output with SET_SIZE of 2 and items listed under ITEM1 and ITEM2.

The entire process is repeated for sets of up to n items. The output from PROC ASSOC is saved as a SAS data set, which enables you to define your own evaluation criteria, reports, or both.

Note that the ordering of the items within an n-item set is not important. Any individual transaction in which each of the n items occurs, in any order, counts toward that particular set. The support level, once set, remains constant throughout the process.
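
The level-by-level process described above can be sketched in Python as follows. This is an illustrative approximation rather than the procedure's actual implementation: it assumes that candidate sets at each size are built only from items that survived the previous level, that a set is kept when its count is greater than or equal to the support level, and that each result row loosely mirrors the SET_SIZE, COUNT, and ITEM1...ITEMn layout.

```python
from itertools import combinations

def frequent_item_sets(transactions, support, max_items):
    """Return rows of (set_size, count, items) for sets meeting the support level."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count the transactions that contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    survivors = sorted(i for i, n in counts.items() if n >= support)
    results = [(1, counts[i], (i,)) for i in survivors]
    # Later passes: the same constant support level applies at every size.
    for size in range(2, max_items + 1):
        level = []
        for candidate in combinations(survivors, size):
            cset = frozenset(candidate)
            # Subset test: the order of items in a transaction is irrelevant.
            n = sum(1 for t in transactions if cset <= t)
            if n >= support:
                level.append((size, n, candidate))
        if not level:
            break
        results.extend(level)
        survivors = sorted({i for _, _, items in level for i in items})
    return results

baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread"},
]
rows = frequent_item_sets(baskets, support=2, max_items=3)
```

Using sets for both candidates and transactions makes the order independence explicit: a transaction counts toward a set whenever it contains all of that set's items.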

Caution: The theoretical potential number of item sets can grow very quickly. For example, with 50 different items, you have 1,225 potential 2-item sets and 19,600 potential 3-item sets. With 5,000 items, you have over 12 million potential 2-item sets, and a correspondingly large number of 3-item sets.

Processing an extremely large number of sets could cause your system to run out of disk and/or memory resources. However, by using a higher support level, you can reduce the item sets to a more manageable number.
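
The figures in the caution are binomial coefficients: the number of ways to choose k items out of n without regard to order. A short Python check (requires Python 3.8 or later for math.comb; the function name potential_item_sets is only illustrative):

```python
from math import comb

def potential_item_sets(n_items, set_size):
    """Number of distinct unordered item sets of a given size."""
    return comb(n_items, set_size)

print(potential_item_sets(50, 2))    # 1225
print(potential_item_sets(50, 3))    # 19600
print(potential_item_sets(5000, 2))  # 12497500
```

These are upper bounds on the candidates at each size. Raising the support level removes items early, which shrinks n and therefore the number of sets that must be counted.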

