0; even the smallest Chi-Square values are eligible.
Specifies an optional output data set containing most of the output table information for the splits.
Specifies an upper bound for the number of passes through the input data set that are used for
performing the binary splits.
Integer > 0
PRINT | NOPRINT
Specifies whether or not to suppress all output printed in the Output window.
so that the variable is automatically used as a FREQ variable in PROC DMSPLIT. This also
ensures that the FREQ variable is automatically used by all other Enterprise Miner procedures
in the project.
Specifies one numeric (interval scaled) FREQUENCY variable.
Any integer. A noninteger value is truncated to an integer.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
Specifies the target variable. Specify one variable name that identifies the target (response)
variable for the least squares and logistic regressions.
If a target is specified in PROC DMDB, it must not be specified in PROC DMSPLIT.
Specifies all the variables (numeric and categorical, that is, INTERVAL and CLASS) that can be
used as independent variables in predicting or modeling the target variable.
and so that the variable is used automatically as a WEIGHT variable in PROC DMSPLIT.
Specifies one numeric (interval scaled) variable that is used to weight the input variables.
For numeric variables, missing values are replaced by the (weighted) mean of the variable. For
categorical (CLASS) variables, missing values are treated as an additional category.
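The missing-value treatment described above can be illustrated with a short Python sketch. This is not PROC DMSPLIT code and does not claim to reproduce the procedure's internals; the function names, the use of None to mark a missing value, and the "_MISSING_" label are all assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean of the non-missing values (None marks a missing value)."""
    num = 0.0
    den = 0.0
    for v, w in zip(values, weights):
        if v is not None:
            num += w * v
            den += w
    if den == 0:
        raise ValueError("no non-missing values with nonzero weight")
    return num / den


def impute_numeric(values, weights):
    """Replace each missing numeric value with the weighted mean."""
    m = weighted_mean(values, weights)
    return [m if v is None else v for v in values]


def impute_class(values, missing_label="_MISSING_"):
    """Treat a missing class value as one additional category (hypothetical label)."""
    return [missing_label if v is None else v for v in values]


# Hypothetical data: one interval input, one class input, and a weight column.
income = [40.0, None, 60.0, 80.0]
weight = [1.0, 2.0, 1.0, 2.0]
marital = ["M", None, "S", "M"]

print(impute_numeric(income, weight))  # missing value becomes (40 + 60 + 2*80) / 4
print(impute_class(marital))
```

With all weights equal to 1, the replacement value reduces to the ordinary mean of the non-missing values.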
The following examples were executed on the Windows NT operating system; the version of the SAS
System was 6.12TS045.
Example 1: Creating a Decision Tree for a Binary Target with the DMSPLIT Procedure
Specifying the target and input variables.
Setting the number of categories in which the range of each interval variable is divided for splits.
Setting the number of passes the procedure makes to determine the optimum number of splits.
Setting the chi-square lower bound for evaluating the splits.
Importing the DMSPLIT tree to the SPLIT procedure.
Producing summary statistics for the training data.
Saving the decision tree from within PROC SPLIT.
Scoring/validating with a test data set.
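To make the bin-count and chi-square settings in the list above more concrete, the following Python sketch divides the range of one interval input into equal-width categories and keeps only the candidate binary splits whose Pearson chi-square statistic meets a lower bound. It is a simplified illustration under stated assumptions, not the algorithm that PROC DMSPLIT implements: the equal-width binning, the 2x2 chi-square without continuity correction, and all function and parameter names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def candidate_splits(x, y, bins, chisq_min):
    """Return (cut_point, chi_square) pairs that meet the lower bound.

    x: values of one interval input; y: binary target coded 0/1.
    bins: number of equal-width categories dividing the range of x.
    chisq_min: lower bound; 0 makes every candidate split eligible.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins
    kept = []
    for k in range(1, bins):
        cut = lo + k * width
        a = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi < cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi >= cut and yi == 0)
        stat = chi_square_2x2(a, b, c, d)
        if stat >= chisq_min:
            kept.append((cut, stat))
    return kept


# Hypothetical data: the target is 1 only for the larger values of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(candidate_splits(x, y, bins=2, chisq_min=2.0))
```

Raising the lower bound removes weaker candidate splits, while a bound of 0 keeps every candidate.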
As a marketing analyst at a catalog company, you want to determine the inputs that best predict whether or not a
customer will make a purchase from your new fall outerwear catalog. The fictitious catalog mailing data set is named
SAMPSIO.DMEXA1 (stored in the sample library). The data set contains 1,966 customer cases. The binary target
(PURCHASE) contains a formatted value of "Yes" if a purchase was made and a formatted value of "No" if a purchase
was not made.
Although there are 48 input variables available for predicting the target, only 17 inputs are used to construct the tree.
Note that AMOUNT is an interval target and ACCTNUM is an ID variable; these variables are not suitable model inputs.
To demonstrate how to score a data set, a sample of customers is selected from the SAMPSIO.DMEXA1 training data
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
var amount income homeval frequent recency age
class purchase(desc) marital ntitle gender telind
origin job statecod numcars edlevel;
proc dmsplit data=dmbexa1 dmdbcat=catexa1