Default:
0; even the smallest Chi-Square values are eligible.
OUTVARS=SAS-data-set
Specifies an optional output data set containing most of the output table information for the splits.
PASSES=integer
Specifies an upper bound for the number of passes through the input data set that are used for
performing the binary splits.
Range:
Integer > 0
Default:
12
PRINT | NOPRINT
Specifies whether or not to suppress all output printed in the Output window.
Default:
NOPRINT
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The DMSPLIT Procedure
FREQ Statement
Alias: FREQUENCY
Tip: Specify the FREQ variable in PROC DMDB so that the information is saved in the catalog and
so that the variable is automatically used as a FREQ variable in PROC DMSPLIT. This also
ensures that the FREQ variable is automatically used by all other Enterprise Miner procedures
in the project.
FREQ variable;
Required Argument
variable
Specifies one numeric (interval scaled) FREQUENCY variable.
Range:
Any integer. A noninteger value is truncated to an integer.
CAUTION:
If the FREQ variable is specified in PROC DMDB, it must not be specified in PROC
DMSPLIT.
TARGET Statement
Tip: One or more target variables may already have been specified in PROC DMDB.
TARGET variable;
Required Argument
variable
Specifies the target variable. Only one variable name can be specified; it identifies the target
(response) variable for the least squares and logistic regressions.
CAUTION:
If a target is specified in PROC DMDB, it must not be specified in PROC DMSPLIT.
VARIABLE Statement
Alias: VAR
VARIABLE variable-list;
Required Argument
variable-list
Specifies all the variables, both numeric (INTERVAL) and categorical (CLASS), that can be
used as independent variables in the prediction or modeling of the target variable.
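As a hedged illustration of how the TARGET and VARIABLE statements fit together, the following sketch reuses the data set, catalog, and variable names from the catalog example in this chapter (dmbexa1, catexa1, PURCHASE, and so on). The choice of inputs and the OUTVARS= data set name work.splitvars are assumptions made for illustration only. The sketch also assumes that PURCHASE was not declared as a target in PROC DMDB, because a target declared there must not be repeated in PROC DMSPLIT.

```sas
/* Sketch only: the input selection and option values are illustrative. */
proc dmsplit data=dmbexa1 dmdbcat=catexa1
             passes=12                 /* the documented default upper bound */
             outvars=work.splitvars    /* hypothetical output data set name */
             print;                    /* show output in the Output window */
   variable income homeval age         /* interval inputs */
            marital gender;            /* class inputs */
   target purchase;                    /* binary target, assumed not declared in PROC DMDB */
run;
```

VAR can be used in place of VARIABLE, since it is the documented alias.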
WEIGHT Statement
Alias: WEIGHTS
Tip: Specify the WEIGHT variable in PROC DMDB so that the information is saved in the catalog
and so that the variable is used automatically as a WEIGHT variable in PROC DMSPLIT.
WEIGHT variable;
Required Argument
variable
Specifies one numeric (interval scaled) variable that is used to weight the input variables.
CAUTION:
If the WEIGHT variable is specified in PROC DMDB, it must not be specified in
PROC DMSPLIT.
Details
Missing Values
For numeric variables, missing values are replaced by the (weighted) mean of the variable. For
categorical (CLASS) variables, missing values are treated as an additional category.
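The weighted-mean replacement for a numeric input can be pictured with ordinary Base SAS steps. This is a sketch of the idea, not the procedure's internal code. The data set work.train, the input INCOME, and the weight variable W are hypothetical names chosen for illustration.

```sas
/* Hypothetical data: work.train with numeric input INCOME and weight W. */
proc means data=work.train noprint;
   var income;
   weight w;                                /* weighted mean; missing INCOME values are skipped */
   output out=work.stats mean=income_mean;
run;

data work.train_filled;
   /* Read the single-row statistics data set once; its value is retained. */
   if _n_ = 1 then set work.stats(keep=income_mean);
   set work.train;
   if missing(income) then income = income_mean;   /* replace missing with the weighted mean */
   drop income_mean;
run;
```

For a CLASS input, no replacement is needed in this picture: the missing level simply counts as one more category.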
Examples
The following examples were executed on the Windows NT operating system; the version of the SAS
System was 6.12TS045.
Example 1: Creating a Decision Tree for a Binary Target with the DMSPLIT Procedure
Features:
Specifying the target and input variables.
Setting the number of categories in which the range of each interval variable is divided for splits.
Setting the number of passes the procedure makes to determine the optimum number of splits.
Setting the chi-square lower bound for evaluating the splits.
Importing the DMSPLIT tree to the SPLIT procedure.
Producing summary statistics for the training data.
Saving the decision tree from within PROC SPLIT.
Scoring/validating with a test data set.
As a marketing analyst at a catalog company, you want to determine the inputs that best predict whether or not a
customer will make a purchase from your new fall outerwear catalog. The fictitious catalog mailing data set is named
SAMPSIO.DMEXA1 (stored in the sample library). The data set contains 1,966 customer cases. The binary target
(PURCHASE) contains a formatted value of "Yes" if a purchase was made and a formatted value of "No" if a purchase
was not made.
Although there are 48 input variables available for predicting the target, only 17 inputs are used to construct the tree.
Note that AMOUNT is an interval target and ACCTNUM is an ID variable; neither is a suitable model input.
To demonstrate how to score a data set, a sample of customers is selected from the SAMPSIO.DMEXA1 training data
set.
Program
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
   id acctnum;
   var amount income homeval frequent recency age
       domestic apparel;
   class purchase(desc) marital ntitle gender telind
         origin job statecod numcars edlevel;
run;
proc dmsplit data=dmbexa1 dmdbcat=catexa1
bins=30
chisq=2.00