The arboretum procedure

Default:

0; even the smallest Chi-Square values are eligible.

OUTVARS=SAS-data-set

Specifies an optional output data set containing most of the output table information for the splits.

PASSES=integer

Specifies an upper bound for the number of passes through the input data set that are used for performing the binary splits.

Range:

Integer > 0

Default:

12

PRINT | NOPRINT

Specifies whether or not to suppress all output printed in the Output window.

Default:

NOPRINT

The DMSPLIT Procedure

FREQ Statement

Alias: FREQUENCY

Tip: Specify the FREQ variable in PROC DMDB so that the information is saved in the catalog and so that the variable is automatically used as a FREQ variable in PROC DMSPLIT. This also ensures that the FREQ variable is automatically used by all other Enterprise Miner procedures in the project.

FREQ variable;

Required Argument

variable

Specifies one numeric (interval scaled) FREQUENCY variable.

Range:

Any integer. A fractional value is truncated to an integer.

CAUTION:

If the FREQ variable is specified in PROC DMDB, it must not be specified in PROC DMSPLIT.
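
As a rough illustration of what a frequency variable does, the following Python sketch (not SAS, and not PROC DMSPLIT's actual implementation) counts each row as if it occurred as many times as its frequency value when tallying a class variable. It assumes that fractional frequencies are truncated toward zero, which is only one reading of the Range note above. The function name and the sample data are hypothetical.

```python
from collections import Counter
import math


def freq_weighted_counts(values, freqs):
    """Tally class levels, counting each row `freq` times.

    Assumption: a fractional frequency is truncated toward zero.
    """
    counts = Counter()
    for value, freq in zip(values, freqs):
        counts[value] += math.trunc(freq)
    return dict(counts)


# Hypothetical data: three rows standing in for seven observations.
purchase = ["Yes", "No", "Yes"]
freq = [2, 3.9, 2.5]
counts = freq_weighted_counts(purchase, freq)
print(counts)
```

Under this assumption the frequencies 3.9 and 2.5 contribute 3 and 2 observations, respectively.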

TARGET Statement

Tip: One or more variables may already be specified in PROC DMDB.

TARGET variable;

Required Argument

variable

Specifies the target variable. One variable name can be specified; it identifies the target (response) variable for the least squares and logistic regressions.

CAUTION:

If a target is specified in PROC DMDB, it must not be specified in PROC DMSPLIT.

VARIABLE Statement

Alias: VAR

VARIABLE variable-list;

Required Argument

variable-list

Specifies all the variables (numeric and categorical, that is, INTERVAL and CLASS) that can be used as independent variables in the prediction or modeling of the target variable.

WEIGHT Statement

Alias: WEIGHTS

Tip: Specify the WEIGHT variable in PROC DMDB so that the information is saved in the catalog and so that the variable is used automatically as a WEIGHT variable in PROC DMSPLIT.

WEIGHT variable;

Required Argument

variable

Specifies one numeric (interval scaled) variable that is used to weight the input variables.

CAUTION:

If the WEIGHT variable is specified in PROC DMDB, it must not be specified in PROC DMSPLIT.

Details

Missing Values

For numeric variables, missing values are replaced by the (weighted) mean of the variable. For categorical (CLASS) variables, missing values are treated as an additional category.
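
The two rules above can be sketched in Python (not SAS) as follows. This is a minimal illustration, not PROC DMSPLIT's own code: it assumes that the weighted mean is computed over the non-missing rows only, and the function names, the "MISSING" label, and the sample data are all hypothetical.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_weighted_mean(values, weights=None):
    """Replace missing numeric values with the weighted mean.

    Assumption: only non-missing rows contribute to the mean.
    """
    if weights is None:
        weights = [1.0] * len(values)
    present = [(v, w) for v, w in zip(values, weights) if not _is_missing(v)]
    total_weight = sum(w for _, w in present)
    if total_weight == 0:
        raise ValueError("no non-missing values with positive weight")
    mean = sum(v * w for v, w in present) / total_weight
    return [mean if _is_missing(v) else v for v in values]


def missing_as_category(values, label="MISSING"):
    """Map missing class values to one additional category."""
    return [label if _is_missing(v) else v for v in values]


# Hypothetical data.
income = [40.0, None, 60.0]
weight = [1.0, 5.0, 3.0]
imputed_income = impute_weighted_mean(income, weight)

gender = ["F", None, "M"]
recoded_gender = missing_as_category(gender)
print(imputed_income, recoded_gender)
```

Here the weighted mean is (40 × 1 + 60 × 3) / 4 = 55, so the missing income becomes 55.0, and the missing gender becomes a third level alongside "F" and "M".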

Examples

The following examples were executed on the Windows NT operating system; the version of the SAS System was 6.12TS045.

Example 1: Creating a Decision Tree for a Binary Target with the DMSPLIT Procedure

Features:

Specifying the target and input variables.

Setting the number of categories into which the range of each interval variable is divided for splits.

Setting the number of passes the procedure makes to determine the optimum number of splits.

Setting the chi-square lower bound for evaluating the splits.

Importing the DMSPLIT tree to the SPLIT procedure.

Producing summary statistics for the training data.

Saving the decision tree from within PROC SPLIT.

Scoring/validating with a test data set.
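
To make the chi-square lower bound concrete, the Python sketch below (not SAS) computes a Pearson chi-square statistic for a candidate binary split against a binary target and compares it with a threshold. This is an illustration under assumptions, not a description of PROC DMSPLIT internals: the exact statistic the procedure uses, and whether the comparison is strict, are not stated here. The function names and the counts are hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]].

    Rows are the two branches of a split; columns are the target levels.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def is_eligible(table, chisq_min=0.0):
    """Assumption: a split is eligible when its statistic reaches the bound."""
    return chi_square_2x2(table) >= chisq_min


# Hypothetical counts of (Yes, No) purchases in each branch.
strong_split = [[30, 10], [10, 30]]
weak_split = [[21, 19], [19, 21]]

strong_stat = chi_square_2x2(strong_split)
weak_stat = chi_square_2x2(weak_split)
print(strong_stat, is_eligible(strong_split, 2.0))
print(weak_stat, is_eligible(weak_split, 2.0))
```

With a lower bound of 2.00, the first split (statistic 20) would be kept and the second (statistic about 0.2) would be rejected; with a bound of 0, both remain eligible.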

As a marketing analyst at a catalog company, you want to determine the inputs that best predict whether or not a customer will make a purchase from your new fall outerwear catalog. The fictitious catalog mailing data set is named SAMPSIO.DMEXA1 (stored in the sample library). The data set contains 1,966 customer cases. The binary target (PURCHASE) contains a formatted value of "Yes" if a purchase was made and a formatted value of "No" if a purchase was not made.

Although there are 48 input variables available for predicting the target, only 17 inputs are used to construct the tree. Note that AMOUNT is an interval target and ACCTNUM is an ID variable; these variables are not suitable model inputs. To demonstrate how to score a data set, a sample of customers is selected from the SAMPSIO.DMEXA1 training data set.

Program

proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
   id acctnum;
   var amount income homeval frequent recency age
       domestic apparel;
   class purchase(desc) marital ntitle gender telind
         origin job statecod numcars edlevel;
run;

proc dmsplit data=dmbexa1 dmdbcat=catexa1
   bins=30
   chisq=2.00
