The SPLIT Procedure
DESCRIBE Statement
Outputs a simple description of the rules that define each leaf, along with a few statistics. The
description is easier to understand than the equivalent information output using the CODE
statement.
DESCRIBE <option(s)>;
Options
FILE=quoted-filename
Specifies the name of the file that receives the description.
Default: LOG
FORMAT= format
Specifies the format to be used in the DATA step code for numeric values that don't have a format
from the input data set.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
FREQ Statement
Specifies the frequency variable.
FREQ variable;
Options
variable
Names a variable that provides frequencies for each observation in the DATA= data set. If n is the
value of the FREQ variable for a given observation, then that observation is used n times.
Default: If the value of the FREQ variable is missing or less than 0, then the observation is not
used in the analysis. The values for FREQ variables are never truncated.
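The frequency rule above can be sketched in Python. This is only an illustration of the stated
semantics, not the procedure's implementation; the function name weighted_counts and the sample
rows are hypothetical. Because frequencies are never truncated, fractional values contribute in
full.

```python
import math
from collections import defaultdict

def weighted_counts(rows):
    """Sum the frequency of each target value.

    rows: iterable of (target_value, freq) pairs (hypothetical layout).
    An observation whose frequency is missing (None or NaN) or less than 0
    is not used. Frequencies are kept as-is, never truncated to integers.
    """
    totals = defaultdict(float)
    for target, freq in rows:
        if freq is None or math.isnan(freq) or freq < 0:
            continue  # observation is not used in the analysis
        totals[target] += freq
    return dict(totals)

# Illustrative data only.
rows = [("yes", 2), ("no", 1.5), ("yes", None), ("no", -1), ("yes", 0.5)]
counts = weighted_counts(rows)
```

Here the observation with a missing frequency and the one with a frequency of -1 are dropped, and
the fractional frequencies 1.5 and 0.5 are counted without rounding.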
INPUT Statement
Names input variables with common options.
Tip: Multiple INPUT statements can be used to specify input variables of different types and orders.
INPUT | IN variable-list </ option(s)>;
Options
The following options are available:
Input Statement Options

OPTION    VALUES                                                             DEFAULT
LEVEL=    NOMINAL | ORDINAL | INTERVAL                                       INTERVAL
ORDER=    ASCENDING | DESCENDING | ASCFORMATTED | DESFORMATTED | DSORDER     ASCENDING

Note: Interval variables have numeric values, so an average of two values is another meaningful value.
Values of an ordinal variable represent an ordering, but, unlike interval variables, an average of ordinal
values is not meaningful. For example, taking an average of ages 15 and 20 is another meaningful age;
but taking an average of "TEENAGER" and "YOUNG ADULT" is not meaningful.
Values of an ordinal variable can be ordered by their unformatted values (ORDER= ASCENDING |
DESCENDING), by their formatted values (ORDER= ASCFORMATTED | DESFORMATTED), or by their
order of appearance in the training data (ORDER=DSORDER). The unformatted values can be either
numeric or character. When the unformatted value determines the order, the smallest unformatted
value for a given formatted value represents that formatted value.
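The three ways of ordering ordinal levels can be sketched in Python. This is a minimal illustration
under stated assumptions, not the procedure's code: the function ordinal_levels, its "by" argument,
and the sample data are hypothetical, and each observation is assumed to be an
(unformatted value, formatted value) pair.

```python
def ordinal_levels(observations, by="unformatted", descending=False):
    """Return the distinct formatted levels in the requested order.

    observations: list of (unformatted_value, formatted_value) pairs,
    listed in the order they appear in the data (hypothetical layout).
    """
    if by == "dsorder":
        # Order of first appearance in the data.
        seen = []
        for _, fmt in observations:
            if fmt not in seen:
                seen.append(fmt)
        return seen
    if by == "formatted":
        # Sort the formatted values themselves.
        return sorted({fmt for _, fmt in observations}, reverse=descending)
    if by == "unformatted":
        # The smallest unformatted value represents each formatted value.
        smallest = {}
        for raw, fmt in observations:
            if fmt not in smallest or raw < smallest[fmt]:
                smallest[fmt] = raw
        return sorted(smallest, key=smallest.get, reverse=descending)
    raise ValueError(f"unknown ordering: {by!r}")

# Illustrative data only: several raw values share one formatted level.
obs = [(5, "MEDIUM"), (9, "HIGH"), (1, "LOW"), (4, "MEDIUM"), (2, "LOW")]
by_raw = ordinal_levels(obs, by="unformatted")
by_fmt = ordinal_levels(obs, by="formatted")
by_data = ordinal_levels(obs, by="dsorder")
```

With this data, ordering by unformatted value gives LOW, MEDIUM, HIGH; ordering by formatted value
is alphabetical (HIGH, LOW, MEDIUM); and data order gives MEDIUM, HIGH, LOW.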
The ORDER= option is only allowed for ordinal variables. Values of a nominal variable have no implicit
ordering. Typical nominal inputs are GENDER, GROUP, and JOBCLASS.
A splitting rule based on a nominal input is usually free to assign any subset of categories to any subset
of the node. The number of ways to assign the categories becomes very large if there are many
categories compared to the number of node subsets. For LEVEL=NOMINAL, values are defined by the
formatted value.
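To give a sense of how quickly the number of assignments grows, the following Python sketch counts
the ways to divide the categories of a nominal input among node subsets. It assumes that every
subset must receive at least one category and that the subsets are interchangeable, which makes the
count a Stirling number of the second kind; these assumptions and the name count_assignments are
illustrative and are not taken from the procedure.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_assignments(categories, branches):
    """Number of ways to split `categories` distinct values into
    `branches` non-empty, unlabeled subsets (Stirling number, second kind)."""
    if categories == branches:
        return 1
    if branches == 0 or branches > categories:
        return 0
    # The last category either joins one of the existing subsets
    # or forms a new subset on its own.
    return (branches * count_assignments(categories - 1, branches)
            + count_assignments(categories - 1, branches - 1))

two_way_4 = count_assignments(4, 2)
two_way_20 = count_assignments(20, 2)
three_way_10 = count_assignments(10, 3)
```

Under these assumptions, 4 categories allow 7 two-way splits, 10 categories allow 9,330 three-way
splits, and 20 categories allow 524,287 two-way splits.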
PRIORS Statement
Specifies the prior probabilities of the values of a nominal or ordinal target.
Tip: A prior probability for a value of a target represents the proportion in which that value
appears in the data to which the tree model is intended to apply.
Caution: The PRIORS statement is not allowed if a DECISION statement is used; instead use the
PRIORVAR= option to specify prior probabilities. The PRIORS statement is not valid for
an interval target and will result in an error if used.
PRIORS probabilities;
Required Arguments
Probabilities can be one of the following:
PROPORTIONAL | PROP
Specifies that the proportions are the same as in the training data.
EQUAL
Specifies equal proportions.
'value-1'=probability-1 <...'value-n'=probability-n>
Specifies explicit probabilities.
value-1 ... value-n
Specifies each formatted value of the target; each value listed is followed by an equal sign.
Formatted values are enclosed in single quotes. All non-missing values of the target should
be included.
probability-1 ... probability-n
Specifies the prior probability of the corresponding value as a numeric constant between 0 and 1.
Default: PROPORTIONAL
Example:
PRIORS '-1'=0.4 '0'=0.2 '1'=0.4;
This example specifies probabilities of 0.4, 0.2, and 0.4 for the target values -1, 0, and 1,
respectively. An error occurs if the training data contains other non-missing values of the target.
The formatted values depend on the format you choose. If the target uses a format of 5.2, then use:
PRIORS '-1.00'=0.4 '0.00'=0.2 '1.00'=0.4;
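The three forms of the probabilities argument can be sketched in Python. This is a hedged
illustration of the rules stated above, not the procedure's implementation: the function
resolve_priors and the sample target values are hypothetical, target values are assumed to be
formatted strings with None standing for a missing value, and no check is made that explicit
probabilities sum to 1 because the text does not state that requirement.

```python
from collections import Counter

def resolve_priors(targets, spec="PROPORTIONAL"):
    """Return a dict mapping each formatted target value to its prior.

    targets: formatted target values from the training data (None = missing).
    spec: "PROPORTIONAL" (or "PROP"), "EQUAL", or a dict of explicit priors.
    """
    values = [t for t in targets if t is not None]
    counts = Counter(values)
    if spec in ("PROPORTIONAL", "PROP"):
        # Same proportions as in the training data.
        return {v: c / len(values) for v, c in counts.items()}
    if spec == "EQUAL":
        return {v: 1 / len(counts) for v in counts}
    # Explicit probabilities: each must lie between 0 and 1.
    for value, p in spec.items():
        if not 0 <= p <= 1:
            raise ValueError(f"probability for {value!r} is outside [0, 1]")
    # Every non-missing target value in the training data must be listed.
    unlisted = set(counts) - set(spec)
    if unlisted:
        raise ValueError(f"no prior given for target values: {sorted(unlisted)}")
    return dict(spec)

# Illustrative training targets only.
targets = ["-1", "0", "1", "1", None, "-1", "1"]
proportional = resolve_priors(targets)
equal = resolve_priors(targets, "EQUAL")
explicit = resolve_priors(targets, {"-1": 0.4, "0": 0.2, "1": 0.4})
```

Passing an explicit specification that omits one of the target values found in the data, such as
{"-1": 0.5, "1": 0.5} here, raises an error in this sketch, mirroring the error described above.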