statistics and predictions in the saved tree.
DECSEARCH
specifies that the split search should incorporate the profit or loss function specified
in the DECISION statement. See the “Incorporating Decisions, Profit, and Loss”
section on page 38 for more information. The DECSEARCH option works only with
a categorical target.
INMODEL=
names a data set created from the SAVE MODEL= option, or saved from the
Enterprise Miner Tree Desktop Application. When using the INMODEL option, the
INPUT, TARGET, FREQ, and DECISION statements are prohibited.
Beginning with SAS 9.1, the MODEL= data set contains the names of the training and
validation data. The DATA= option is therefore unnecessary to resume training with
the same data as was used to create the saved tree (assuming the saved name of the
training data is still valid).
MISSING=
specifies how a splitting rule handles an observation with missing values. The
following table lists the available policies.

Missing Value Policies

Policy          Action
BIGBRANCH       assign the observation to the largest branch
DISTRIBUTE      assign the observation to each branch with a fractional frequency
                proportional to the number of training observations in the branch
SMALLRESIDUAL   assign to the branch minimizing SSE among observations with missing
                values
USEINSEARCH     use missing values during the split search (default)
The default policy is USEINSEARCH. The MISSING= option in the INPUT statement
assigns a policy to the variables listed in that statement, and supersedes
the MISSING= option in the PROC ARBORETUM statement. See the
section on page 25.
If a surrogate rule can assign an observation to a branch, then it does, and the missing
value policy is ignored for that observation. Using the CODE statement for
a tree containing a rule with MISSING=DISTRIBUTE is an error. See the
section on page 45 for a complete description of the missing value options.
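
The two frequency-based policies described above can be illustrated with a short Python sketch. It is not SAS syntax and not the procedure's implementation; the function names and the branch counts are hypothetical, chosen only to show the arithmetic of assigning an observation to the largest branch versus distributing it with fractional frequencies.

```python
def distribute_missing(branch_counts, freq=1.0):
    """Split one observation's frequency across branches in proportion
    to the number of training observations in each branch."""
    total = sum(branch_counts)
    if total <= 0:
        raise ValueError("branch_counts must contain at least one observation")
    return [freq * count / total for count in branch_counts]


def largest_branch(branch_counts):
    """Return the index of the branch with the most training observations."""
    return max(range(len(branch_counts)), key=lambda i: branch_counts[i])


# Hypothetical training counts for a three-way split.
counts = [60, 30, 10]
fractions = distribute_missing(counts)  # fractional frequency per branch
biggest = largest_branch(counts)        # index of the largest branch
```

With these counts, the observation contributes 60%, 30%, and 10% of its frequency to the three branches under the distribute policy, and goes entirely to the first branch under the largest-branch policy.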
PADJUST=
names one or more methods for adjusting the p-values used with the PROBCHISQ
and PROBF criteria. The following methods are available.

CHAIDAFTER    applies a Bonferroni adjustment after the split is chosen.
CHAIDBEFORE   applies a Bonferroni adjustment before the split is chosen.
DEPTH         adjusts for the number of ancestor splits.
NOGABRIEL     suppresses an adjustment that sometimes overrides CHAID.
NONE          suppresses all adjustments.
Specifying NONE with any other method is an error. If the PADJUST= option is not
specified, the CHAIDBEFORE and DEPTH methods are used. The PADJUST= option is
ignored unless CRITERION=PROBCHISQ or CRITERION=PROBF is specified. See the
“Adjusting p-Values for the Number of Input Values and Branches”
section on page 43 for more information.
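
The difference between adjusting before and after the split is chosen can be sketched in Python. This is an illustration only, not the procedure's algorithm: it assumes the textbook Bonferroni form (multiply the p-value by a count of comparisons and cap at 1), and the multipliers, input names, and p-values are hypothetical.

```python
def bonferroni(p_value, multiplier):
    """Bonferroni-adjust a p-value, capping the result at 1."""
    return min(1.0, p_value * multiplier)


def choose_split(candidates, adjust_before):
    """candidates: list of (name, raw_p, multiplier) tuples.
    Returns (name, adjusted_p) for the chosen split."""
    if adjust_before:
        # Adjust every candidate first, then pick the smallest adjusted p-value.
        adjusted = [(name, bonferroni(p, m)) for name, p, m in candidates]
        return min(adjusted, key=lambda item: item[1])
    # Pick the smallest raw p-value first, then adjust only the winner.
    name, p, m = min(candidates, key=lambda item: item[1])
    return name, bonferroni(p, m)


# Hypothetical candidates: x1 has the smaller raw p-value but many more
# possible splits (a larger multiplier) than x2.
candidates = [("x1", 0.001, 50), ("x2", 0.004, 2)]
before = choose_split(candidates, adjust_before=True)
after = choose_split(candidates, adjust_before=False)
```

In this example, adjusting before the choice selects x2 (adjusted p-value about 0.008), whereas adjusting after the choice keeps x1 and reports an adjusted p-value of about 0.05. The order of adjustment can therefore change which split is selected.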
requests that the prior probabilities defined in the DECISION statement be incorporated
in the split search criterion for a categorical target. See the
section on page 37 for more information.
PVARS=
specifies the number of input variables n to regard as independent when adjusting
p-values for the number of inputs. PVARS=ALL specifies all the input variables as
independent. The procedure ignores variables whose values are constant in the node
being split, and ignores categorical variables unless at least two values occur in more
observations than specified in the MINCATSIZE= option in the TRAIN statement.
Consequently, the ARBORETUM procedure may search for rules using only m ≤ N of
the original N input variables. The procedure will regard max((n/N)m, 1) of the m
variables as independent. See the
“Adjusting p-Values for the Number of Input Variables”
section for more detail. The default number n is 0, requesting no adjustment for the
number of inputs.
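
The expression max((n/N)m, 1) can be checked with a small Python helper. The function name is hypothetical and the sketch only evaluates the formula stated above; it applies when an adjustment has been requested, since the default n = 0 requests no adjustment at all.

```python
def independent_inputs(n, N, m):
    """Number of the m searched variables regarded as independent,
    following max((n/N) * m, 1).

    n: number of inputs to regard as independent (the PVARS= value)
    N: number of original input variables
    m: number of variables actually searched in the node (m <= N)
    """
    if N <= 0 or not 0 <= m <= N:
        raise ValueError("require N > 0 and 0 <= m <= N")
    return max((n / N) * m, 1)


# Hypothetical node: 20 original inputs, 8 of them usable in this node.
half = independent_inputs(n=10, N=20, m=8)   # (10/20) * 8 = 4.0
floor = independent_inputs(n=1, N=20, m=8)   # (1/20) * 8 = 0.4, raised to 1
```

The lower bound of 1 matters in small nodes, where few variables remain usable and the proportional count would otherwise fall below one.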
SPLITATDATUM
requests that a split on an interval input equal the value of the observation if the
value is an integer, or be slightly less than the value if the value is not an integer. The
alternative is to split halfway between two data values. The SPLITBETWEEN option
requests the alternative.
SPLITBETWEEN
requests that a split on an interval input be halfway between two data values. The
SPLITBETWEEN option is the default. The SPLITATDATUM option is the alternative.
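
The two placements of a split value can be compared in a short Python sketch. It rests on assumptions that the text does not state: that "the value of the observation" means the upper of two adjacent data values, and that "slightly less" can be modeled as the next representable floating-point number below that value. The function names are hypothetical.

```python
import math


def split_between(lower, upper):
    """Split value halfway between two adjacent data values."""
    return (lower + upper) / 2.0


def split_at_datum(upper):
    """Split value at the datum itself if it is an integer, otherwise
    slightly below it (assumed: the next float toward -infinity)."""
    if float(upper).is_integer():
        return float(upper)
    return math.nextafter(upper, -math.inf)


halfway = split_between(2.0, 3.0)     # 2.5
at_integer = split_at_datum(3.0)      # 3.0
below_datum = split_at_datum(2.7)     # just under 2.7
```

Splitting at the datum keeps split values equal to values that occur in the data, which can make rules easier to read when inputs are integer valued. Splitting halfway leaves an equal margin on both sides of the split.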
ASSESS < options > ;
The ASSESS statement specifies a measure for evaluating trees, evaluates all subtrees
(with the original root), chooses a best one for each possible number of leaves, and
organizes the chosen ones in a sequence, beginning with the subtree consisting of the
root only, and ending with the largest tree consisting of all the nodes. (For assessment
measures LIFT and LIFTPROFIT, the subtrees are evaluated with measures ASE and
PROFIT, respectively. See the section
“Tree Assessment and the Subtree Sequence” on page 49.)
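
The construction of the subtree sequence can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure's algorithm: the candidate subtrees, their identifiers, and their assessment values are hypothetical, and whether a larger or smaller value is better depends on the measure (for example, smaller is better for ASE).

```python
def subtree_sequence(subtrees, larger_is_better=True):
    """subtrees: iterable of (subtree_id, n_leaves, assessment) tuples.
    Returns the best subtree for each leaf count, ordered from the
    fewest leaves to the most."""
    best = {}
    for subtree_id, n_leaves, value in subtrees:
        current = best.get(n_leaves)
        if current is None:
            is_better = True
        elif larger_is_better:
            is_better = value > current[2]
        else:
            is_better = value < current[2]
        if is_better:
            best[n_leaves] = (subtree_id, n_leaves, value)
    return [best[k] for k in sorted(best)]


# Hypothetical candidates: two competing two-leaf subtrees.
candidates = [
    ("root", 1, 0.60),
    ("a", 2, 0.70),
    ("b", 2, 0.74),
    ("full", 3, 0.73),
]
sequence = subtree_sequence(candidates)
```

Here the sequence contains the root-only subtree, then subtree "b" as the best two-leaf subtree, then the full tree. Each entry is the best candidate for its number of leaves.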
The ARBORETUM procedure selects the best subtree in the sequence consistent with
the options in the ASSESS statement. A subsequent SUBTREE statement can change