The ARBORETUM Procedure
The SAVE statement specifies the output data sets. The SUMMARY= option outputs
summary statistics to the data set SUM1.
Printing the SUM1 data set shows that the sum of squared errors corresponds to
an R-square of 0.57.
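For reference, the conventional relationship between the sum of squared errors and R-square can be written as follows. This is the textbook definition, assumed here rather than taken from the procedure's own documentation. The symbols are introduced only for illustration: y_i is the observed target, \hat{y}_i is the tree's prediction, \bar{y} is the overall mean of the target, and N is the number of observations.

```latex
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{SSE} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{N} (y_i - \bar{y})^2, \qquad
\mathrm{ASE} = \frac{\mathrm{SSE}}{N}
```

Under this definition, an R-square of 0.57 means the tree's sum of squared errors is 43% of the error obtained by predicting the overall mean for every observation.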
The SEQUENCE= option in the SAVE statement creates a SAS data set with statistics
for subtrees of every possible size. The number of leaves is stored in the
variable _NW_. The variable _NW_ is also created in other SAS data mining
procedures to represent the complexity of a model. _NW_ is an abbreviation for
the number of weights in a neural network.
Printing the number of leaves and the assessment measure from the SEQUENCE=
data set shows how the error changes with subtree size. The default assessment
measure for an interval target is the average square error. The first
observation shows the average square error in the training data before the tree
is applied. The first five observations show that the average square error
decreases quickly as the number of leaves in the subtree increases from 1 to 5.
The error decreases more slowly as the number of leaves increases from 6 to 16.
The following PROC ARBORETUM code selects the subtree with 5 leaves and saves
the node statistics and splitting rules in SAS data sets:
The INMODEL= option imports the information saved from the previous execution
of the ARBORETUM procedure, eliminating the need to respecify the training data
set or the variables or to re-create the tree. The SUBTREE statement selects the
subtree with five leaves. The NODESTATS= option in the SAVE statement saves
information about each node into data set NODES2. The RULES= option in the
SAVE statement saves all the splitting rules into data set RULES2.
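A minimal sketch of a step like the one described above. It assumes that the earlier run saved its model to a data set named TREE1 and that the SUBTREE statement accepts an NLEAVES= option. TREE1 and SUM2 are illustrative names that do not come from the text. NODES2 and RULES2 are the data sets named above.

```sas
/* Sketch only: TREE1 and SUM2 are hypothetical data set names, and the
   NLEAVES= spelling of the SUBTREE option is an assumption. */
proc arboretum inmodel=tree1;      /* reuse the previously saved tree */
   subtree nleaves=5;              /* select the subtree with 5 leaves */
   save summary=sum2               /* summary statistics for the subtree */
        nodestats=nodes2           /* one observation per node */
        rules=rules2;              /* splitting rules */
run;
quit;
```

Because INMODEL= supplies the saved tree, the step contains no training data set, TARGET statement, or INPUT statement.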
Summary Statistics for Subtree 5
The R-square for the subtree with 5 leaves is 0.50, compared to 0.57 for the full tree.
The NODES2 data set contains information about each node.
P_SALES contains the predicted sales amount for observations in the node. In this
example, P_SALES equals the average sales among observations in the SHOES
data set assigned to the node. The WHERE statement in the following code selects
the leaf nodes for printing, excluding the nonterminal nodes.
proc print data=nodes2;
   var node leaf n p_sales;
   where leaf ne .;   /* LEAF is missing for nonterminal nodes */
run;
Information about Each Leaf
The RULES2 data set contains all the splitting rules in the tree, including the unused
competing rules, and the candidate rules in the leaves. The WHERE statement in the
following code selects the primary rules for printing, and