WORTH=
Specifies the worth required of a splitting rule.
Required Arguments
DATA=SAS-data-set
Names the input training data set if constructing a tree. Variables named in the FREQ, INPUT,
and TARGET statements refer to variables in the DATA= SAS data set.
Default:
None
DMDBCAT=SAS-catalog
Names the SAS catalog describing the DMDB metabase. The DMDB metabase contains the
formatted values of all NOMINAL variables, and how they are coded in the DATA= SAS data set.
Required with the DATA= option.
Default:
None
To learn how to create the DMDB encoded data set and catalog, see the PROC DMDB chapter.
Options
ASSESS=
Specifies how to evaluate a tree. The construction of the sequence of sub-trees uses the assessment
measure. Possible measures are:
IMPURITY
Total leaf impurity (Gini index or Average Squared Error).
LIFT
Average assessment in highest ranked observations.
PROFIT
Average profit or loss from the decision function.
STATISTIC
Nominal Classification Rate or Average Squared Error.
Default:
PROFIT
If no DECISION statement is specified, the default changes from PROFIT
to STATISTIC.
LIFT restricts the default PROFIT or STATISTIC measure to those
observations predicted to have the best assessment. The LIFTDEPTH=
option specifies the proportion of observations to use.
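As a rough illustration of the LIFT idea, the sketch below averages an outcome over only the top-ranked proportion of observations. It is not the procedure's implementation: the function name `lift_assessment`, the `depth` parameter (standing in for the LIFTDEPTH= proportion), the rounding rule, and the sample data are all assumptions made for the example.

```python
def lift_assessment(scores, outcomes, depth):
    """Average outcome among the top `depth` proportion of observations.

    `scores` are predicted assessments (higher is better) and `outcomes`
    are the realized values. How the procedure rounds the cutoff or
    breaks ties is not stated in the text; this sketch rounds to the
    nearest count and keeps the input order for ties.
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be a proportion in (0, 1]")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(depth * len(ranked)))
    top = ranked[:n_top]
    return sum(outcome for _, outcome in top) / n_top


# Hypothetical data: eight observations, three of which are events.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]

top_quarter = lift_assessment(scores, outcomes, depth=0.25)
everyone = lift_assessment(scores, outcomes, depth=1.0)
print(top_quarter, everyone)
```

With `depth=1.0` the measure reduces to the plain average over all observations, which is why a smaller proportion is what distinguishes LIFT from the unrestricted measure.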
If ASSESS=IMPURITY, then the assessment of the tree is measured as the
total impurity of all its leaves. For interval targets, this is the same as using
Average Squared Error (ASSESS=STATISTIC).
For categorical targets, the impurity of each leaf is evaluated using the Gini
index. The impurity measure produces a finer separation of leaves than a
classification rate and is, therefore, preferable for lift charts. ASSESS=LIFT
generates the sequence of sub-trees using ASSESS=IMPURITY and then
prunes using the LIFT measure.
ASSESS=IMPURITY implements class probability trees as described in
Breiman et al. (1984), section 4.6.
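The claim that impurity separates leaves more finely than a classification rate can be checked on a small example. The sketch below compares two hypothetical two-leaf trees with identical misclassification rates. Weighting each leaf's Gini index by its share of the observations is an assumption; the text says only that the leaf impurities are totaled.

```python
def gini(counts):
    """Gini index of one leaf, given its per-class observation counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def total_leaf_impurity(leaves):
    """Leaf Gini indexes combined, weighted by leaf size (assumed weighting)."""
    total = sum(sum(leaf) for leaf in leaves)
    return sum(sum(leaf) / total * gini(leaf) for leaf in leaves)


def misclassification_rate(leaves):
    """Share of observations outside the majority class of their leaf."""
    total = sum(sum(leaf) for leaf in leaves)
    return sum(sum(leaf) - max(leaf) for leaf in leaves) / total


# Hypothetical trees; each leaf is [count of class 0, count of class 1].
tree_a = [[40, 10], [10, 40]]
tree_b = [[50, 20], [0, 30]]

for name, tree in (("A", tree_a), ("B", tree_b)):
    print(name, misclassification_rate(tree), total_leaf_impurity(tree))
```

Both trees misclassify 20 of 100 observations, yet tree B has the lower total impurity because one of its leaves is pure. A classification rate cannot tell the two apart, while the impurity measure can.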
COSTSPLIT
Requests that the split search criterion incorporate the decision matrix. To use COSTSPLIT,
CRITERION must equal ENTROPY or GINI, and the type of the DECDATA data set must be
PROFIT or LOSS. For ordinal targets, COSTSPLIT is superfluous because the decision matrix is
always incorporated into the criterion.
CRITERION=method
Specifies the method of searching for and evaluating candidate splitting rules. Possible methods
depend on the level of measurement appropriate for the target variable, as follows:
BINARY or NOMINAL TARGETS:
Method=CHISQ
Pearson Chi-square statistic for target vs. segments.
Method=PROBCHISQ
p-value of Pearson Chi-square statistic for target vs. segments. Default for
NOMINAL.
Method=ENTROPY
Reduction in entropy measure of node impurity.
Method=ERATIO
Reduction in entropy of split.
Method=GINI
Reduction in Gini measure of node impurity.
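To make the CHISQ, ENTROPY, and GINI criteria concrete, the sketch below evaluates one candidate split using their textbook definitions. It is an illustration only: the function names and the sample counts are hypothetical, the base-2 logarithm is an assumption, and the procedure's exact computations (including any adjustments and the ERATIO and PROBCHISQ variants) are not reproduced here.

```python
import math


def gini(counts):
    """Gini impurity of a node from its per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts):
    """Entropy of a node in bits (base-2 logarithm is an assumption)."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)


def impurity_reduction(branches, impurity):
    """Parent impurity minus the size-weighted impurity of the branches."""
    parent = [sum(column) for column in zip(*branches)]
    n = sum(parent)
    children = sum(sum(b) / n * impurity(b) for b in branches)
    return impurity(parent) - children


def pearson_chisq(branches):
    """Pearson chi-square statistic for the target-by-branch table.

    Assumes every branch and every target class has at least one
    observation, so no expected count is zero.
    """
    row_totals = [sum(b) for b in branches]
    col_totals = [sum(column) for column in zip(*branches)]
    n = sum(row_totals)
    stat = 0.0
    for branch, row_total in zip(branches, row_totals):
        for observed, col_total in zip(branch, col_totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical binary split; each branch is [count of class 0, count of class 1].
split = [[40, 10], [10, 40]]

print("chi-square:", pearson_chisq(split))
print("Gini reduction:", impurity_reduction(split, gini))
print("entropy reduction:", impurity_reduction(split, entropy))
```

All three measures grow as the branches become more homogeneous in the target, but they are on different scales, so values from different criteria are not directly comparable.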
INTERVAL TARGETS: