The ARBORETUM Procedure
specifies the format to use in the DATA step code for numeric values that do not have
a format from the input data set. The default format is BEST20.
LINESIZE | LS= n
specifies the line size for the generated code. The default is 72. The permissible range is
64 to 254.
suppresses the creation of the variables _NODE_ and _LEAF_ containing the node
and leaf identification numbers of the leaf to which the observation is assigned. The
variables are created by default.
suppresses the code for computing predicted variables, such as P_:. The default is
PREDICTION, requesting such code.
requests XML output instead of SAS DATA step code.
requests the DATA step code to create variables, such as residuals, that require the
target variable. These variables are the ones with a “yes” in the “Target” column of
the table in the section “Variable Names and Conditions for Their Creation”.
Running the code on a data set that does not contain the target variable produces
confusing notes and warnings. The default is NORESIDUAL, suppressing the
generation of the DATA step code for these variables.
The DECISION statement specifies decision functions and prior probabilities for
categorical targets. The ARBORETUM procedure uses the term decision in the sense
of decision theory: a decision is one of a set of alternatives, each associated with a
function of posterior probabilities. For an observation i, a tree determines the
decision $d_i$ whose associated function evaluates to the best value, $E_i(d)$. The
interpretation of best depends on whether the type of the DECDATA= data set is
profit, revenue, or loss. The SAS DATA step TYPE= option specifies the data set
type. If the DECDATA= data set has no type, the ARBORETUM procedure assumes a
type of profit.
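The following formulas define $E_i(d)$ and $d_i$. The sum is over the J categorical target
values, and $p_{ij}$ denotes the posterior probability of target value j for observation i.
The coefficient $A_{jd}$, for target value j, decision d, is specified in the DECDATA=
data set. $C_{id}$ is the cost of decision d for observation i, specified in the COST= option.

PROFIT:  $E_i(d) = \sum_{j=1}^{J} A_{jd}\, p_{ij}$,  $d_i = \arg\max_d E_i(d)$
REVENUE: $E_i(d) = \sum_{j=1}^{J} A_{jd}\, p_{ij} - C_{id}$,  $d_i = \arg\max_d E_i(d)$
LOSS:    $E_i(d) = \sum_{j=1}^{J} A_{jd}\, p_{ij}$,  $d_i = \arg\min_d E_i(d)$
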
option is speciﬁed in the PROC ARBORETUM statement. However, the decision
functions determine a profit or loss measure for assessing trees, and consequently
may greatly affect what nodes are pruned and omitted from the final subtree. See the
“Tree Assessment and the Subtree Sequence” section on page 49 for more
information about retrospective pruning.
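The decision rule described in this section can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation. It assumes the usual decision-theoretic forms: the expected value of decision d is the sum over target values of coefficient times posterior probability, the revenue type subtracts the cost of the decision, and the best value is the maximum for profit and revenue and the minimum for loss. The function names, the list-based layout of the coefficients, and the tie handling (first alternative wins) are assumptions made for the example.

```python
def expected_values(posteriors, coeffs, costs=None):
    """Return E(d) for every decision d of a single observation.

    posteriors: p[j], posterior probability of target value j.
    coeffs:     A[j][d], coefficient for target value j and decision d.
    costs:      optional C[d], cost of decision d (revenue type only).
    """
    n_decisions = len(coeffs[0])
    values = []
    for d in range(n_decisions):
        e = sum(coeffs[j][d] * posteriors[j] for j in range(len(posteriors)))
        if costs is not None:
            e -= costs[d]  # assumed revenue form: expected revenue minus cost
        values.append(e)
    return values


def best_decision(posteriors, coeffs, kind="profit", costs=None):
    """Return (index of the best decision, its expected value)."""
    kind = kind.lower()
    if kind not in ("profit", "revenue", "loss"):
        raise ValueError("kind must be 'profit', 'revenue', or 'loss'")
    if costs is not None and kind != "revenue":
        raise ValueError("costs apply only to the revenue type")
    values = expected_values(posteriors, coeffs, costs)
    pick = min if kind == "loss" else max  # loss is minimized, others maximized
    d = pick(range(len(values)), key=lambda k: values[k])
    return d, values[d]


# Hypothetical example: two target values, two decision alternatives.
posteriors = [0.25, 0.75]
coeffs = [[20, 0],   # coefficients for target value 0
          [0, 0]]    # coefficients for target value 1
print(best_decision(posteriors, coeffs))
print(best_decision(posteriors, coeffs, kind="revenue", costs=[6, 0]))
```

In the example, the cost of the first alternative is large enough to change the chosen decision under the revenue type, which is the kind of shift that can alter the profit or loss measure used to assess trees.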
FREQ, INPUT, and TARGET statements must appear before the DECISION statement.
The DECISION statement is optional. When the DECISION statement is
omitted, neither decision alternatives nor prior probabilities are defined. Specifying
the DECISION statement and the INMODEL= option in the PROC statement is an
specifies a list of cost constants and cost variables associated with the decision
alternatives specified in the DECVARS= option. The first cost in the list corresponds to
the first alternative in the DECVARS= list, the second cost to the second alternative,
and so on. The number of costs must equal the number of alternatives specified
in the DECVARS= list.
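The positional pairing of costs with decision alternatives can be illustrated with a small Python sketch. It is not SAS code and does not reproduce the procedure's parsing. It assumes, for illustration only, that a cost constant is given as a number and a cost variable as the name of a field in the observation. The function name and the alternative and variable names (`mail`, `skip`, `ship_cost`) are hypothetical.

```python
def resolve_costs(cost_spec, decvars, observation):
    """Pair each cost with its decision alternative, by position.

    cost_spec:   list of numbers (cost constants) or strings (cost variables).
    decvars:     list of decision alternative names, same length as cost_spec.
    observation: mapping from variable name to value for one observation.
    """
    if len(cost_spec) != len(decvars):
        raise ValueError("number of costs must equal number of alternatives")
    resolved = {}
    for alternative, cost in zip(decvars, cost_spec):
        if isinstance(cost, str):
            # Cost variable: the value is read from the observation.
            resolved[alternative] = float(observation[cost])
        else:
            # Cost constant: the same value for every observation.
            resolved[alternative] = float(cost)
    return resolved


# Hypothetical example: a constant cost for "mail", a per-observation cost for "skip".
print(resolve_costs([1.5, "ship_cost"], ["mail", "skip"], {"ship_cost": 4}))
```

A length mismatch raises an error in the sketch, mirroring the requirement that the number of costs equal the number of alternatives.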
The costs specify the terms $C_{id}$ in the REVENUE formula for $E_i(d)$, and
consequently the COST= option requires a DECDATA= data set of type REVENUE.
A cost constant is a number specifying the same value of $C_{id}$ for all observations i. A
cost variable is a variable in the data set specified in the
DATA= option in the PROC ARBORETUM statement. The value of this variable for
observation i is assigned to $C_{id}$. The ARBORETUM procedure does not recognize
XYZ, and PQR: are invalid representations of lists of variables.
specifies the input data set containing the decision coefficients $A_{jd}$ and prior
probabilities.
must appear for each target value in the training data set speciﬁed in the DATA=
option of the PROC ARBORETUM statement.
specifies the variables in the DECDATA= data set defining the coefficients, $A_{jd}$.
without a label, the name of the decision alternative is the name of the variable.
If the DECVARS= option is omitted, no decision functions are deﬁned.
specifies the variable pvar in the DECDATA= data set that contains the prior
probabilities of categorical target values. The
section on page 6 defines
prior probabilities. Pvar must have nonnegative numeric values. The ARBORETUM
procedure rescales the values to sum to one, and ignores training observations with a
target value for which pvar equals zero.
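The two effects described above, rescaling and dropping observations with a zero prior, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure's code: the function names are hypothetical, priors are held in a dictionary keyed by target value, and every target value in the data is assumed to have an entry.

```python
def rescale_priors(raw):
    """Rescale nonnegative prior values so that they sum to one."""
    if any(value < 0 for value in raw.values()):
        raise ValueError("prior values must be nonnegative")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one prior value must be positive")
    return {target: value / total for target, value in raw.items()}


def keep_observations(targets, priors):
    """Drop observations whose target value has a prior of zero."""
    return [target for target in targets if priors[target] > 0]


# Hypothetical example: three target values, one with a zero prior.
priors = rescale_priors({"a": 2, "b": 6, "c": 0})
print(priors)
print(keep_observations(["a", "c", "b", "c"], priors))
```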
Prior probabilities do not affect the creation of the tree unless the PRIORSSPLIT
option to the PROC ARBORETUM statement is specified. Prior probabilities affect
the posterior probabilities, and consequently affect the model predictions and
assessment.
The DESCRIBE statement causes the ARBORETUM procedure to output a simple
description of the rules that define each leaf, along with a few statistics. The
description is much easier to understand than the equivalent information output using the
The options to the DESCRIBE statement have the same form and function as those
in the CODE statement.
CATALOG= catname | FILE= filename
specifies where to output the description. See the section on page 21 for more
information.
specifies the format to use in the description for numeric values that do not have a
format from the input data set. The default format is BEST20.
LINESIZE | LS= n
specifies the line size for the description. The default is 72. The permissible range is 64