ASSESS Statement
the selection. The output statements, CODE, DESCRIBE, SAVE, and SCORE, use
the subtree selected in the most recent ASSESS or SUBTREE statement, and ignore
nodes not in the selected subtree.
The ASSESS, SUBTREE, and output statements end initialization and interactive
training. If the SUBTREE statement or an output statement is the first statement after
initialization or interactive training, an ASSESS statement is implied. The procedure
computes a new subtree sequence using the most recently specified ASSESS statement
options, and selects the best subtree before executing the SUBTREE or output
statement.
If the ASSESS, SUBTREE, or output statement immediately follows initialization
so that no interactive training statements appear, and the tree contains no more than
the root node, then these statements create a tree. Otherwise, if the root node is
already split using information imported with the INMODEL= option in the PROC
statement, then no further split searches occur unless explicitly requested with
interactive training statements.
Table 4 summarizes the options available in the ASSESS statement. An option
remains in effect in subsequent occurrences of the ASSESS statement unless explicitly
specified differently.
Table 4. ASSESS Statement Options

Option         Description
EVENT=         specifies categorical target value for LIFT
MEASURE=       specifies the assessment measure
NOPRIORS       ignores prior probabilities in subtree search
PROPORTION=    specifies proportion of observations for LIFT
PRUNEDATA=     specifies training or validation data for choosing subtrees
PRIORS         incorporates prior probabilities in subtree search
VALIDATA=      specifies validation data set
NOVALIDATA     terminates a previous VALIDATA= option

The following list describes these options. See the “Tree Assessment and the Subtree
Sequence” section beginning on page 49 for more detail.
EVENT= category
specifies a formatted value of a categorical target to use with the LIFT assessment
measure. If the EVENT= option is absent in one ASSESS statement, the last value
specified in any ASSESS statement is used. If the EVENT= option has never been
specified, the least frequent target value in the training data is used. The EVENT=
option is ignored with an interval target and with other assessment measures.
MEASURE=PROFIT | ASE | MISC | LIFT | LIFTPROFIT
specifies the assessment measure. Table 5 summarizes the available measures.
The ARBORETUM Procedure
Table 5. Assessment Measures

Measure        Description
ASE            Average square error
LIFT           Average or proportion among the highest ranked observations
LIFTPROFIT     Average profit or loss among the highest ranked observations
MISC           Proportion misclassified
PROFIT         Average profit or loss from the decision function

The default measure is PROFIT if the DECISION statement specifies a profit or loss
function or if the target variable is ordinal. Otherwise the default measure for a
nominal target is MISC, and the default for an interval target is ASE. MISC is
applicable to nominal and ordinal targets. ASE is applicable to any kind of target.
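
The default rules above can be restated as a small decision function. This is only an illustrative Python sketch, not part of the procedure: the function name, the level labels ("nominal", "ordinal", "interval"), and the boolean flag are hypothetical names chosen for the example.

```python
def default_measure(target_level, has_profit_or_loss=False):
    """Return the default MEASURE= value described in the text.

    target_level: "nominal", "ordinal", or "interval" (hypothetical labels).
    has_profit_or_loss: True when a DECISION statement specifies a profit
        or loss function (hypothetical flag).
    """
    # PROFIT wins whenever a profit/loss function exists or the target is ordinal.
    if has_profit_or_loss or target_level == "ordinal":
        return "PROFIT"
    if target_level == "nominal":
        return "MISC"
    if target_level == "interval":
        return "ASE"
    raise ValueError(f"unknown target level: {target_level!r}")
```

For example, `default_measure("interval")` gives "ASE", while `default_measure("interval", has_profit_or_loss=True)` gives "PROFIT".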
For an interval target, the LIFT measure is the average target value among
observations predicted to have the highest average. The PROPORTION= option specifies
the proportion of observations to use. For a categorical target, the LIFT measure is
the proportion of observations with the target value specified in the EVENT= option
among observations with the highest posterior probability of the EVENT= target
value.
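
To make the two LIFT definitions concrete, the following Python sketch computes both from plain lists. It is an illustration under stated assumptions, not the procedure's implementation: the function names are hypothetical, the number of top-ranked observations is taken as the rounded product of the proportion and the sample size (at least 1), and ties are broken by original order.

```python
def _top_indices(scores, proportion):
    """Indices of the highest-scoring observations (assumed rounding rule)."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    n_top = max(1, round(proportion * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_top]


def lift_interval(predicted, actual, proportion):
    """Average actual target among observations with the highest predictions."""
    top = _top_indices(predicted, proportion)
    return sum(actual[i] for i in top) / len(top)


def lift_categorical(event_posterior, actual, event, proportion):
    """Share of EVENT= values among observations with the highest posterior
    probability of the event."""
    top = _top_indices(event_posterior, proportion)
    return sum(1 for i in top if actual[i] == event) / len(top)


# Interval target: the two highest predictions are 30 and 20.
interval_lift = lift_interval([10, 30, 20, 5], [12, 28, 25, 4], 0.5)

# Categorical target: the two highest posteriors are 0.9 and 0.8.
event_lift = lift_categorical([0.9, 0.8, 0.4, 0.2], ["1", "0", "1", "0"], "1", 0.5)
```

Here `interval_lift` is (28 + 25) / 2 = 26.5, and `event_lift` is 0.5 because one of the two top-ranked observations has the event value.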
NOPRIORS | PRIORS
specifies whether to ignore prior probabilities when creating the sequence of subtrees.
The default is NOPRIORS, ignoring prior probabilities. The section “Formulas for
Assessment Measures” on page 50 describes how prior probabilities enter into the
formulae for evaluating subtrees.
PROPORTION=value
The PROPORTION= option specifies the proportion of observations to use with the
LIFT and LIFTPROFIT assessment measures. The PROPORTION= option is ignored
unless LIFT or LIFTPROFIT is specified. The value must be between 0 and
1. If absent, the most recent value specified with LIFT or LIFTPROFIT is used.
Requesting LIFT or LIFTPROFIT without ever specifying the PROPORTION= option
is an error.
PRUNEDATA= VALID | TRAIN
specifies whether to use training or validation data when evaluating subtrees for
inclusion in the subtree sequence. The default is VALID. If PRUNEDATA=VALID and
validation data exists, then the subtree chosen for a given number of leaves is one
with the best assessment value using the validation data.
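
The selection rule can be pictured as choosing, for each leaf count, the candidate subtree with the best assessment on the chosen data. The Python sketch below is illustrative only. The dictionary keys, the function name, and the `higher_is_better` flag are hypothetical, and falling back to training data when no validation data exists is an assumption that the text implies but does not state.

```python
def choose_subtrees(candidates, prunedata="VALID", has_validation=True,
                    higher_is_better=True):
    """Pick one subtree per number of leaves.

    candidates: list of dicts with hypothetical keys "name", "leaves",
        "train" (assessment on training data), "valid" (on validation data).
    """
    # Assumption: without validation data, training assessments are used.
    use_valid = prunedata.upper() == "VALID" and has_validation
    key = "valid" if use_valid else "train"
    best = {}
    for cand in candidates:
        current = best.get(cand["leaves"])
        if current is None:
            best[cand["leaves"]] = cand
        elif higher_is_better and cand[key] > current[key]:
            best[cand["leaves"]] = cand
        elif not higher_is_better and cand[key] < current[key]:
            best[cand["leaves"]] = cand
    return best


candidates = [
    {"name": "A", "leaves": 2, "train": 0.9, "valid": 0.6},
    {"name": "B", "leaves": 2, "train": 0.8, "valid": 0.7},
    {"name": "C", "leaves": 3, "train": 0.95, "valid": 0.65},
]
by_valid = choose_subtrees(candidates, prunedata="VALID")
by_train = choose_subtrees(candidates, prunedata="TRAIN")
```

With these made-up numbers, the two-leaf subtree chosen with validation data is B, while training data would favor A. Whether larger or smaller values count as best depends on the measure (for example, smaller is better for MISC and ASE), which is why the sketch takes a flag.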
NOVALIDATA | VALIDATA= SAS-data-set
specifies the validation data set. The NOVALIDATA option nullifies any
VALIDATA= option appearing in a previous ASSESS statement.
BRANCH Statement
BRANCH < options > ;
The BRANCH statement is an interactive training statement that splits leaves into
branches using the primary candidate splitting rule defined in the leaves.
The
SETRULE and SEARCH statements create candidate rules. The PRUNE statement
converts primary and competing rules to candidate rules when converting a node to a
leaf. The BRANCH statement will not split a leaf without a candidate rule.
NODES=nodeids
restricts the creation of branches to leaves that descend from nodes in the list of
node identifiers, nodeids.
ONE
restricts branching to the one leaf with the best candidate splitting rule. If a list of
nodes is specified in the NODES= option, the ONE option considers only the leaves
that descend from nodes in the list.
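
The interaction of the NODES= and ONE options can be sketched as a filter followed by an optional arg-max. The Python below is a hypothetical illustration, not the procedure's algorithm. It assumes each leaf's candidate rule carries a numeric worth where larger is better, that a leaf without a candidate rule is marked with None, and that a leaf named directly in the list counts as eligible.

```python
def leaves_to_branch(leaves, parent, nodes=None, one=False):
    """Return the leaf ids that a BRANCH statement would split.

    leaves: dict leaf id -> worth of its candidate rule, or None if the leaf
        has no candidate rule (hypothetical representation).
    parent: dict node id -> parent node id (the root maps to None).
    nodes: optional iterable of node ids, playing the role of NODES=.
    one: True to mimic the ONE option.
    """
    allowed = set(nodes) if nodes is not None else None

    def under_allowed(leaf):
        # Walk up from the leaf to the root looking for a listed node.
        node = leaf
        while node is not None:
            if node in allowed:
                return True
            node = parent.get(node)
        return False

    eligible = sorted(
        leaf for leaf, worth in leaves.items()
        if worth is not None and (allowed is None or under_allowed(leaf))
    )
    if one and eligible:
        return [max(eligible, key=lambda leaf: leaves[leaf])]
    return eligible


# Node 1 is the root; nodes 4 and 5 are children of node 2.
parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
leaves = {3: 0.2, 4: 0.5, 5: None}  # leaf 5 has no candidate rule
```

In this example, `leaves_to_branch(leaves, parent)` returns `[3, 4]` because leaf 5 has no candidate rule, and `leaves_to_branch(leaves, parent, nodes=[3], one=True)` returns `[3]`.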
CODE Statement
CODE < options > ;
The CODE statement generates SAS DATA step code that mimics the computations
done by the SCORE statement. The DATA step code creates the same variables
described in the section “SCORE Statement OUT= Output Data Set” on page 59. Using
the CODE statement for a tree containing a rule with MISSING=DISTRIBUTE is an
error.
CATALOG= catname | FILE= filename
specifies where to output the code. Catname specifies a catalog entry by providing a
compound name with one to four of the levels in the form, library.catalog.entry.type.
The default library is determined by the SAS system option USER=, usually WORK.
The default entry is SASCODE, and the default type is SOURCE. Filename specifies
the name of the file to contain the code. Filename can be either:
1. A quoted string, the value of which is the name (including the extension, if
any) of the file to be opened.
2. An unquoted SAS name of no more than eight characters. If this name has
been assigned as a fileref in a FILENAME statement, the file specified in the
FILENAME statement is opened. The special filerefs LOG and PRINT are
always assigned. If the specified name is not an assigned fileref, the specified
value is concatenated with the extension .txt before opening. For example, if
FOO is not an assigned fileref, FILE=FOO would cause FOO.txt to be opened.
If the name has more than eight characters, an error message is printed.
If no catalog or file is specified, then the code is output to the SAS log.
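
The filename rules above amount to a short resolution procedure. The following Python sketch restates them for illustration only. The function name, the `quoted` flag, and the `filerefs` mapping are hypothetical, and treating fileref names as case-insensitive is an assumption based on ordinary SAS naming behavior.

```python
def resolve_code_file(value, quoted, filerefs):
    """Return the file (or special fileref) that FILE=value would open.

    value: the text given to FILE=.
    quoted: True if the value was written as a quoted string.
    filerefs: dict mapping assigned fileref names (uppercase) to file names.
    """
    if quoted:
        # Rule 1: a quoted string is the file name itself, extension included.
        return value
    if len(value) > 8:
        # Rule 2: unquoted names longer than eight characters are an error.
        raise ValueError("unquoted FILE= name exceeds eight characters")
    name = value.upper()  # assumption: fileref lookup ignores case
    if name in ("LOG", "PRINT"):
        return name  # special filerefs are always assigned
    if name in filerefs:
        return filerefs[name]
    # Not an assigned fileref: append the .txt extension.
    return value + ".txt"
```

For example, `resolve_code_file("FOO", False, {})` returns "FOO.txt", matching the FOO example in the text.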
DUMMY | NODUMMY
requests creation of a dummy variable for each leaf node. The variables have names
_i_, for i = 1, 2, ..., L, where L is the number of leaves. The value of the dummy
variable _i_ is 1 for observations assigned exclusively to leaf i, and 0 for
observations not in leaf i. For observations distributed over more than one leaf, _i_
equals the proportion of the observation assigned to leaf i. The default is NODUMMY,
suppressing the creation of dummy variables.
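
As an illustration of the dummy-variable values, the Python sketch below builds them for a single observation. It is not the generated DATA step code. The function name and the input format are hypothetical, and the variable names are assumed to take the form _i_ with underscores.

```python
def leaf_dummies(leaf_weights, n_leaves):
    """Return the dummy variables for one observation.

    leaf_weights: dict mapping a 1-based leaf index to the proportion of the
        observation assigned to that leaf (hypothetical input format).
    n_leaves: L, the number of leaves in the tree.
    """
    # Leaves that receive no part of the observation get 0.
    return {f"_{i}_": leaf_weights.get(i, 0) for i in range(1, n_leaves + 1)}


# An observation assigned exclusively to leaf 2 of 3.
exclusive = leaf_dummies({2: 1}, 3)

# An observation distributed over leaves 1 and 3.
distributed = leaf_dummies({1: 0.25, 3: 0.75}, 3)
```

Here `exclusive` is `{"_1_": 0, "_2_": 1, "_3_": 0}` and `distributed` is `{"_1_": 0.25, "_2_": 0, "_3_": 0.75}`.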