Specifies the nodes that will have no children.
Integer > 0
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
Specifies input data that contains inputs and, optionally, targets.
Output data set with outputs.
Includes a dummy variable for each node. For each observation, the value of a node's dummy variable
is 1 if the observation appears in that node and 0 if it does not.
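As an illustration of the dummy-variable encoding described above, here is a minimal Python sketch. It is not SAS code and not the procedure's implementation; the function name node_dummies and the node numbers are hypothetical, and each observation is assumed to come with the set of nodes it appears in.

```python
def node_dummies(obs_nodes, node_ids):
    """Return one row of 0/1 indicators per observation.

    obs_nodes: for each observation, the set of node ids it appears in
               (assumed to be known already).
    node_ids:  the nodes for which dummy variables are wanted.
    """
    return [[1 if node in nodes else 0 for node in node_ids]
            for nodes in obs_nodes]


# Two observations; the first passes through nodes 1, 2 and 4,
# the second through nodes 1 and 3 (hypothetical node numbers).
rows = node_dummies([{1, 2, 4}, {1, 3}], node_ids=[2, 3, 4])
```

Each row holds one indicator per requested node, in the order given by node_ids.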
Specifies a list of nodes used to score the observations. If an observation does not fall into any
node in the list, it does not contribute to the statistics and is not output. If an observation occurs in more
than one node, it contributes multiple times to the statistics and is output once for each node it
occurs in.
The NODES= option requires the INTREE= or INDMSPLIT procedure
The default is the list of leaf nodes. Omitting the NODES= option results in
the decisions, utilities, and leaf assignment being output for each observation
in the DATA= data set.
Does not include leaf identifiers or node numbers.
Output data set with fit statistics.
Specifies the role of the DATA= data set. The ROLE= option primarily affects what fit statistics
are computed and what their names and labels are. Role-value can be:
name in the SCORE statement.
VALID | VALIDATION
The default when the DATA= data set name in the SCORE statement is the same as the
data set name in the VALIDATA= option in the PROC statement.
data set name in the DATA= or VALIDATA= option in the PROC statement.
Residuals, computed profit, and fit statistics are not produced.
Specifies the variable that the model-fitting tries to predict.
Specifies the measurement level, where measurement can be:
Observations in which the target value is missing are ignored when training or validating the tree.
If EXCLUDEMISS is specified, then observations with missing values are excluded during the search
for a splitting rule. A search uses only one variable, and so only the observations missing on the single
candidate input are excluded. An observation missing input x but not missing input y is used in the
search for a split on y but not x. After a split is chosen, the rule is amended to assign missing values to
the largest branch.
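The EXCLUDEMISS behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure's algorithm: missing values are represented as None, the split is a simple binary threshold, and the names usable_for_search and assign_branches are hypothetical.

```python
from collections import Counter


def usable_for_search(observations, candidate):
    """Keep only the observations that are not missing on the candidate input."""
    return [o for o in observations if o.get(candidate) is not None]


def assign_branches(values, threshold):
    """Apply the split `value < threshold`; missing values go to the largest branch.

    Assumes at least one value is non-missing. Ties between branch sizes
    are broken arbitrarily (first branch seen).
    """
    branches = [None if v is None else ("left" if v < threshold else "right")
                for v in values]
    # Branch sizes are counted over the non-missing observations only.
    counts = Counter(b for b in branches if b is not None)
    largest = counts.most_common(1)[0][0]
    return [largest if b is None else b for b in branches]


obs = [{"x": 1.0, "y": None}, {"x": None, "y": 5.0}, {"x": 3.0, "y": 2.0}]
search_x = usable_for_search(obs, "x")  # drops the observation missing x
search_y = usable_for_search(obs, "y")  # drops the observation missing y
assigned = assign_branches([1.0, None, 3.0, 4.0], threshold=2.0)
```

The observation that is missing x but not y still takes part in the search on y, and the missing value in the last call is sent to the branch that received the most non-missing observations.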
If EXCLUDEMISS is not specified, the search for a split on an input treats missing values as a special,
acceptable value, and includes them in the search. All observations with missing values are assigned to
the same branch.
The branch may or may not contain other observations. The branch chosen is the one that maximizes the
For splits on a categorical variable, this amounts to treating a missing value as a separate category. For
numerical variables, it amounts to treating missing values as having the same unknown non-missing
One advantage of using missing data during the search is that the worth of a split is computed with the
values with the target values can contribute to the predictive ability of the split. One disadvantage is that
missing values could unjustifiably dominate the choice of split.
When a split is applied to an observation in which the required input value is missing, surrogate splitting
rules are considered before assigning the observation to the branch for missing values.
A surrogate splitting rule is a backup to the main splitting rule. For example, the main splitting rule
might use county as input and the surrogate might use region. If the county is unknown and the region is
known, the surrogate is used.
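The county and region example above can be sketched in Python. This is a hedged illustration, not the procedure's implementation: the route function, the rule functions, the branch labels, and the county and region values are all hypothetical, and a rule is assumed to return None when its input is missing.

```python
def route(observation, main_rule, surrogates, missing_branch):
    """Assign an observation to a branch, falling back to surrogate rules."""
    branch = main_rule(observation)
    if branch is not None:
        return branch
    # Try each surrogate in the given order until one can be applied.
    for rule in surrogates:
        branch = rule(observation)
        if branch is not None:
            return branch
    # No rule applies: use the branch designated for missing values.
    return missing_branch


def by_county(obs):
    """Main rule (hypothetical county names)."""
    county = obs.get("county")
    if county is None:
        return None
    return "left" if county in {"Wake", "Durham"} else "right"


def by_region(obs):
    """Surrogate rule (hypothetical region names)."""
    region = obs.get("region")
    if region is None:
        return None
    return "left" if region == "East" else "right"


known_county = route({"county": "Wake", "region": "West"}, by_county, [by_region], "right")
unknown_county = route({"county": None, "region": "East"}, by_county, [by_region], "right")
nothing_known = route({}, by_county, [by_region], "right")
```

When the county is known, the main rule decides and the region is never consulted. The surrogate is used only when the county is missing, and the branch for missing values is used only when no rule can be applied.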
If several surrogate rules exist, each surrogate is considered in sequence until one can be applied to the
observation. If none can be applied, the main rule assigns the observation to the branch designated for
missing values.
The surrogates are considered in the order of their agreement with the main splitting rule. The agreement
is measured as the proportion of training observations that the surrogate and the main rule assign to the same branch. The
measure excludes the observations that the main rule cannot be applied to. Among the remaining
observations, those on which the surrogate rule cannot be applied count as observations not assigned to
the same branch. Thus, an observation with a missing value on the input used in the surrogate rule but