WORTH=
Specifies worth required of splitting rule.

**Required Arguments**
**DATA=***SAS-data-set*
Names the input training data set if constructing a tree. Variables named in the FREQ, INPUT, and TARGET statements refer to variables in the DATA= SAS data set.

**Default:**
None

**DMDBCAT=***SAS-catalog*
Names the SAS catalog describing the DMDB metabase. The DMDB metabase contains the formatted values of all NOMINAL variables and how they are coded in the DATA= SAS data set. Required with the DATA= option.

**Default:**
None

To learn how to create the DMDB encoded data set and catalog, see the PROC DMDB chapter.

**Options**
**ASSESS=**
Specifies how to evaluate a tree. The construction of the sequence of sub-trees uses the assessment measure. Possible measures are:

IMPURITY
Total leaf impurity (Gini index or Average Squared Error).

LIFT
Average assessment in highest ranked observations.

PROFIT
Average profit or loss from the decision function.

STATISTIC
Nominal Classification Rate or Average Squared Error.

**Default:**
PROFIT

The default PROFIT measure is set to STATISTIC if no DECISION statement is specified.

LIFT restricts the default PROFIT or STATISTIC measure to those observations predicted to have the best assessment. The LIFTDEPTH= option specifies the proportion of observations to use.
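As a rough illustration of the LIFT idea, the following Python sketch ranks observations by their predicted value and averages the actual outcome over the top-ranked proportion only. The function name `lift_assessment`, the `depth` argument (standing in for the LIFTDEPTH= proportion), the rounding rule, and the toy data are all assumptions made for illustration; the procedure's own computation may differ.

```python
import math

def lift_assessment(predicted, actual, depth):
    """Average actual outcome over the top `depth` share of observations,
    ranked by predicted value (hypothetical helper, not a SAS routine)."""
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    # Rank observations from best to worst predicted assessment.
    ranked = sorted(zip(predicted, actual), key=lambda pair: pair[0], reverse=True)
    # Assumption: round the cutoff up so at least one observation is kept.
    n_top = max(1, math.ceil(depth * len(ranked)))
    top = ranked[:n_top]
    return sum(outcome for _, outcome in top) / n_top

# Toy data: predicted probabilities and observed binary outcomes.
predicted = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0]

overall = sum(actual) / len(actual)
top_quarter = lift_assessment(predicted, actual, depth=0.25)
top_half = lift_assessment(predicted, actual, depth=0.5)
print(overall, top_quarter, top_half)
```

With `depth=1.0` the sketch reduces to the plain average over all observations, which mirrors how LIFT narrows the underlying measure rather than replacing it.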

If ASSESS=IMPURITY, then the assessment of the tree is measured as the total impurity of all its leaves. For interval targets, this is the same as using Average Squared Error (ASSESS=STATISTIC).

For categorical targets, the impurity of each leaf is evaluated using the Gini index. The impurity measure produces a finer separation of leaves than a classification rate and is, therefore, preferable for lift charts. ASSESS=LIFT generates the sequence of sub-trees using ASSESS=IMPURITY and then prunes using the LIFT measure.

ASSESS=IMPURITY implements class probability trees as described in Breiman et al. (1984), section 4.6.
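The remark that impurity separates leaves more finely than a classification rate can be seen in a small Python sketch. It compares two hypothetical two-leaf trees built from the same 800 binary-target observations. The leaf counts, the helper names, and the choice to weight each leaf by its share of observations are assumptions for illustration, not the procedure's documented formula.

```python
def gini(counts):
    """Gini index of a leaf from its per-class observation counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def misclassification(counts):
    """Share of a leaf's observations outside its majority class."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def total_leaf_measure(leaves, measure):
    """Sum of per-leaf measures, each weighted by the leaf's share of
    observations (an assumed weighting)."""
    total = sum(sum(leaf) for leaf in leaves)
    return sum(sum(leaf) / total * measure(leaf) for leaf in leaves)

# Hypothetical trees: each tuple is (class 0 count, class 1 count) in a leaf.
tree_a = [(300, 100), (100, 300)]
tree_b = [(200, 400), (200, 0)]

rate_a = total_leaf_measure(tree_a, misclassification)
rate_b = total_leaf_measure(tree_b, misclassification)
gini_a = total_leaf_measure(tree_a, gini)
gini_b = total_leaf_measure(tree_b, gini)
print(round(rate_a, 4), round(rate_b, 4), round(gini_a, 4), round(gini_b, 4))
```

Both trees misclassify a quarter of the observations, so a classification rate cannot tell them apart. The Gini totals differ (0.375 for tree A versus about 0.333 for tree B) because tree B contains a pure leaf.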

**COSTSPLIT**
Requests that the split search criterion incorporate the decision matrix. To use COSTSPLIT, CRITERION must equal ENTROPY or GINI, and the type of the DECDATA data set must be PROFIT or LOSS. For ordinal targets, COSTSPLIT is superfluous because the decision matrix is always incorporated into the criterion.

**CRITERION=***method*
Specifies the method of searching for and evaluating candidate splitting rules. Possible methods depend on the level of measurement appropriate for the target variable, as follows:

BINARY or NOMINAL TARGETS:

Method=CHISQ
Pearson Chi-square statistic for target vs. segments.

Method=PROBCHISQ
*p*-value of Pearson Chi-square statistic for target vs. segments. Default for NOMINAL.

Method=ENTROPY
Reduction in entropy measure of node impurity.

Method=ERATIO
Reduction in entropy of split.

Method=GINI
Reduction in Gini measure of node impurity.
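To make the criteria for binary and nominal targets concrete, the Python sketch below scores one hypothetical candidate split using textbook versions of the Pearson chi-square statistic, its *p*-value, and the entropy and Gini reductions. The table of counts and all function names are assumptions for illustration. The procedure's exact computations (for example, any adjustments applied to the *p*-value) are not reproduced here, and ERATIO is omitted because its formula is not given above.

```python
import math

def pearson_chisq(table):
    """Pearson chi-square for a branches-by-classes table of counts.
    Assumes every row and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def entropy(counts):
    """Entropy (base 2) of a node from its per-class counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini index of a node from its per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(table, impurity):
    """Parent impurity minus the size-weighted impurity of the branches."""
    parent = [sum(col) for col in zip(*table)]
    n = sum(parent)
    children = sum(sum(row) / n * impurity(row) for row in table)
    return impurity(parent) - children

# Hypothetical split: rows are branches, columns are target classes.
split = [[30, 10], [10, 30]]

chisq = pearson_chisq(split)
# Upper-tail probability; this closed form holds only for 1 degree of
# freedom, which is the case for a 2x2 table.
p_value = math.erfc(math.sqrt(chisq / 2))
entropy_gain = impurity_reduction(split, entropy)
gini_gain = impurity_reduction(split, gini)
print(chisq, p_value, entropy_gain, gini_gain)
```

For CHISQ, ENTROPY, and GINI a larger value indicates a stronger split, while for the *p*-value a smaller value does.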

INTERVAL TARGETS