The ARBORETUM Procedure
Changing a Splitting Rule
The following PROC ARBORETUM code changes the rule for node 7 to assign
‘MEN’S DRESS’ to the first branch along with ‘MEN’S CASUAL’:
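proc arboretum data=sashelp.shoes inmodel=tree2;
   interact pruned;     /* start interactive training on the pruned (five-leaf) subtree */
   prune node=7;        /* remove the branches below node 7, making it a leaf */
   split node=7 var=product /
         "Men's casual" "Men's dress",       /* branch 1 */
         "Women's casual" "Women's dress"    /* branch 2 */
         ;
   save summary=sum3
        node=7          /* save rules for node 7 only */
        rules=rules3
        ;
run;

proc print data=rules3;
   where role = 'PRIMARY';
run;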
The INTERACT statement declares the start of interactive training statements that
modify the tree. The PRUNED option instructs the ARBORETUM procedure to use the
five-leaf subtree and to delete the nodes that occur only in larger subtrees.
The PRUNE statement removes the branches from node 7, converting node 7 to a
leaf. The SPLIT statement specifies the new splitting rule for node 7. The values
of Product appear after the slash ('/') character, and commas separate the values
assigned to different branches. The NODE= option in the SAVE statement restricts
the RULES= output to the rules for node 7.
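The branch-assignment convention of the SPLIT statement (values listed after the slash, commas separating branches) can be mimicked outside SAS to check which branch a category lands in. The Python sketch below is a hypothetical illustration, not part of PROC ARBORETUM: the helper names `parse_value_list` and `route` are invented, and the case-insensitive matching is an assumption suggested by the uppercase values in the saved rules.

```python
from __future__ import annotations

import shlex


def parse_value_list(spec: str) -> dict[str, int]:
    """Map each quoted category in a SPLIT-style value list to a branch number.

    Groups separated by commas go to branches 1, 2, ... in order.
    Assumes no commas appear inside the quoted values themselves.
    Keys are uppercased (an assumption based on the saved rules).
    """
    mapping: dict[str, int] = {}
    for branch, group in enumerate(spec.split(","), start=1):
        for value in shlex.split(group):  # honors the double quotes
            mapping[value.upper()] = branch
    return mapping


def route(value: str, mapping: dict[str, int]) -> int | None:
    """Return the branch for a category, or None if the list omits it."""
    return mapping.get(value.upper())


# The value list from the SPLIT statement in the example above.
SPEC = '"Men\'s casual" "Men\'s dress", "Women\'s casual" "Women\'s dress"'
rules = parse_value_list(SPEC)
print(rules)
print(route("Men's Dress", rules))  # 1: grouped with MEN'S CASUAL
```

A product that the value list does not mention (for example, "Boot") returns None in this sketch; how PROC ARBORETUM itself handles unlisted or missing values is not modeled here.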
Figure 6 shows the primary splitting rule saved using the RULES= option in the
SAVE statement. The SUM3 data set (not shown) contains the R-square for the
modified tree. It is 0.46, less than the R-square for the unmodified tree.
Obs    NODE    ROLE       RANK    STAT        NUMERIC_VALUE    CHARACTER_VALUE

  1      7     PRIMARY      1     VARIABLE          .          Product
  2      7     PRIMARY      1     MISSING           1
  3      7     PRIMARY      1     BRANCHES          2
  4      7     PRIMARY      1     NOMINAL           1          MEN'S CASUAL
  5      7     PRIMARY      1     NOMINAL           1          MEN'S DRESS
  6      7     PRIMARY      1     NOMINAL           2          WOMEN'S CASUAL
  7      7     PRIMARY      1     NOMINAL           2          WOMEN'S DRESS

Figure 6. Splitting Rule for Node 7
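The R-square comparison for the modified tree can be made concrete with a small calculation. The Python sketch below assumes the conventional definition, R-square = 1 - SSE/SST, where every case is predicted by the mean target value of its leaf; whether SUM3 computes it exactly this way is an assumption. The function name `tree_r_square` and the four target values are made up for illustration and are not taken from SASHELP.SHOES.

```python
from __future__ import annotations

from statistics import fmean


def tree_r_square(leaves: list[list[float]]) -> float:
    """R-square of a tree that predicts every case by its leaf mean.

    `leaves` holds the target values of the cases that fall in each leaf.
    """
    values = [y for leaf in leaves for y in leaf]
    grand_mean = fmean(values)
    sst = sum((y - grand_mean) ** 2 for y in values)  # total sum of squares
    sse = 0.0                                          # within-leaf squared error
    for leaf in leaves:
        leaf_mean = fmean(leaf)
        sse += sum((y - leaf_mean) ** 2 for y in leaf)
    return 1.0 - sse / sst


# The same four hypothetical cases, grouped by two different splitting rules.
print(round(tree_r_square([[1.0, 2.0], [9.0, 10.0]]), 3))  # 0.985
print(round(tree_r_square([[1.0, 9.0], [2.0, 10.0]]), 3))  # 0.015
```

The second grouping mixes low and high values within each leaf, so the within-leaf squared error grows and R-square falls.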
Syntax
The following statements are available in PROC ARBORETUM:
PROC ARBORETUM < options > ;
DECISION DECDATA=SAS-data-set < options > ;
FREQ variable ;
INPUT variables < / options > ;
TARGET variable < / options > ;
PERFORMANCE < options > ;
INTERACT < subtree > ;
BRANCH < options > ;
PRUNE NODES=list < options > ;
SEARCH < options > ;
SETRULE NODE=id VAR=variable < options > ;
SPLIT < options > ;
TRAIN < options > ;
UNDO ;
REDO ;
ASSESS < options > ;
CODE < options > ;
DESCRIBE < options > ;
MAKEMACRO NLEAVES=macname ;
SAVE < options > ;
SCORE < options > ;
SUBTREE subtree ;
The following table summarizes the function of each statement (other than the PROC
statement) in the ARBORETUM procedure:
Table 1. Statements in the ARBORETUM Procedure

Statement      Description
DECISION       specify profits and prior probabilities
FREQ           specify a frequency variable
INPUT          specify input variables with common options
TARGET         specify the target variable
PERFORMANCE    specify memory size and where to locate data
INTERACT       declare the start of interactive training
BRANCH         create branches from candidate splitting rules
PRUNE          prune the descendants of a node
SEARCH         search for candidate splitting rules
SETRULE        specify a candidate splitting rule
SPLIT          search for a splitting rule and create branches
TRAIN          find splitting rules and branch recursively
UNDO           undo the previous interactive training operation
REDO           redo the operation that the previous UNDO statement undid
ASSESS         evaluate subtrees and declare the beginning of results
CODE           generate SAS DATA step code for scoring new cases
DESCRIBE       print a description of the rules defining each leaf
MAKEMACRO      define a macro variable
SAVE           output data sets containing model results
SCORE          use the model to make predictions on new data
SUBTREE        specify which subtree to use
The rest of this section gives detailed syntax information for each of these statements,
beginning with the PROC ARBORETUM statement. The remaining statements are
covered in alphabetical order.
PROC ARBORETUM Statement
PROC ARBORETUM < options > ;
The PROC ARBORETUM statement starts the ARBORETUM procedure. Either
the DATA= option or the INMODEL= option must appear. The DATA= option must
appear to begin or resume training a model. The INMODEL= option specifies a
previously saved model. Any option available in the TRAIN statement is available
here also.
CRITERION=name
specifies the criterion for evaluating candidate splitting rules. Table
2
summarizes the
criteria available for each level of measurement of the target variable.
Table 2. Split Search Criteria

Criterion      Measure of Split Worth

Criteria for Interval Targets
VARIANCE       Reduction in squared error from node means
PROBF          p-value of F test associated with node variances (default)

Criteria for Nominal Targets
ENTROPY        Reduction in entropy
GINI           Reduction in Gini index
PROBCHISQ      p-value of Pearson chi-square test for target vs. branches (default)

Criteria for Ordinal Targets
ENTROPY        Reduction in entropy, adjusted with ordinal distances
GINI           Reduction in Gini index, adjusted with ordinal distances (default)
The default criterion is PROBF for an interval target, PROBCHISQ for a nominal
target, and GINI for an ordinal target. See the “Splitting Criteria” section
beginning on page 38 for more information.
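To make the measures in Table 2 concrete, the Python sketch below computes textbook, unadjusted versions of several of them for a single candidate split: squared-error reduction for an interval target, Gini and entropy reduction for a nominal target, and the Pearson chi-square p-value for the two-branch, two-class case. The function names and toy data are hypothetical. The sketch assumes base-2 logarithms for entropy and omits the PROBF test, the ordinal-distance adjustments, and any further adjustments PROC ARBORETUM may apply; the procedure's exact definitions are in the "Splitting Criteria" section.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_reduction(parent, branches, impurity):
    """Parent impurity minus the size-weighted impurity of the branches."""
    n = len(parent)
    return impurity(parent) - sum(len(b) / n * impurity(b) for b in branches)


def variance_reduction(parent, branches):
    """Drop in squared error when each branch predicts its own mean."""
    def sse(ys):
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)
    return sse(parent) - sum(sse(b) for b in branches)


def chi_square_pvalue_2x2(table):
    """Unadjusted Pearson chi-square p-value for a 2 x 2 branch-by-class table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


# Toy nominal target: 8 cases split into two branches of 4.
parent = ["A"] * 4 + ["B"] * 4
branches = [["A", "A", "A", "B"], ["A", "B", "B", "B"]]
print(impurity_reduction(parent, branches, gini))  # 0.125
print(impurity_reduction(parent, branches, entropy))
print(chi_square_pvalue_2x2([[3, 1], [1, 3]]))     # rows = branches, columns = classes

# Toy interval target split into two branches.
print(variance_reduction([1.0, 2.0, 9.0, 10.0], [[1.0, 2.0], [9.0, 10.0]]))  # 64.0
```

Larger reductions, or smaller p-values, indicate a more worthwhile split. The toy nominal split separates the classes only partially, so its p-value (about 0.157) is not small.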
DATA= SAS-data-set
specifies training data. If the INMODEL= option is specified to input a saved tree,
the DATA= option causes the ARBORETUM procedure to recompute all the node