The arboretum procedure


The SPLIT Procedure

DESCRIBE Statement

Writes a simple description of the rules that define each leaf, along with a few statistics. The description is easier to understand than the equivalent information produced by the CODE statement.

DESCRIBE <option(s)>;



Specifies the file name that contains the description.



FORMAT= format

Specifies the format to be used in the DATA step code for numeric values that do not have a format from the input data set.

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.


FREQ Statement

Specifies the frequency variable.

FREQ variable;



Names a variable that provides frequencies for each observation in the DATA= data set. If n is the value of the FREQ variable for a given observation, then that observation is used n times.

If the value of the FREQ variable is missing or less than 0, then the observation is not used in the analysis. The values for FREQ variables are never truncated.
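
The frequency rules above can be illustrated with a short Python sketch. It is not SAS code and not the procedure's implementation; the function name `weighted_class_counts` and the sample rows are hypothetical. It only shows the stated behavior: an observation counts n times, a missing or negative frequency drops the observation, and fractional frequencies are kept rather than truncated.

```python
from collections import defaultdict


def weighted_class_counts(rows):
    """Sum frequencies per target value, skipping unusable observations.

    rows: iterable of (target_value, freq) pairs; freq may be None (missing).
    """
    counts = defaultdict(float)
    for target, freq in rows:
        # A missing or negative frequency excludes the observation.
        if freq is None or freq < 0:
            continue
        # The frequency is used as given; it is not truncated to an integer.
        counts[target] += freq
    return dict(counts)


# Hypothetical data: (target value, FREQ value)
rows = [("yes", 2), ("no", 1.5), ("yes", None), ("no", -1), ("yes", 0.5)]
counts = weighted_class_counts(rows)
print(counts)
```

Here the third and fourth rows are ignored, and the fractional frequencies 1.5 and 0.5 contribute in full.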


INPUT Statement

Names input variables with common options.

Tip: Multiple INPUT statements can be used to specify input variables of a different type and order.

INPUT | IN variable-list </ option(s)>;


The following options are available:

Input Statement Options
















Note: Interval variables have numeric values, so an average of two values is another meaningful value. Values of an ordinal variable represent an ordering, but, unlike interval variables, an average of ordinal values is not meaningful. For example, the average of ages 15 and 20 is another meaningful age, but the average of "TEENAGER" and "YOUNG ADULT" is not meaningful.

Values of an ordinal variable can be ordered by their unformatted values (ORDER= ASCENDING | DESCENDING), by their formatted values (ORDER= ASCFORMATTED | DESFORMATTED), or by their order of appearance in the training data (ORDER=DSORDER). The unformatted values can be either numeric or character. When the unformatted value determines the order, the smallest unformatted value for a given formatted value represents that formatted value.
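
As a rough illustration of ordering by unformatted values, the Python sketch below groups raw values under their formatted labels, keeps the smallest raw value as each label's representative, and sorts the labels by that representative. The format function `age_group`, its cutoffs, and the sample ages are hypothetical; this is a sketch of the rule as described, not the procedure's code.

```python
def ordinal_levels(unformatted_values, fmt):
    """Return formatted levels ordered by their smallest unformatted value."""
    representative = {}
    for raw in unformatted_values:
        label = fmt(raw)
        # Keep the smallest unformatted value seen for each formatted value.
        if label not in representative or raw < representative[label]:
            representative[label] = raw
    return sorted(representative, key=representative.get)


def age_group(age):
    """Hypothetical format that maps an age to a label."""
    if age < 20:
        return "TEENAGER"
    if age < 30:
        return "YOUNG ADULT"
    return "ADULT"


ages = [22, 15, 41, 19, 35, 27]

# Order driven by the unformatted (raw) values.
by_unformatted = ordinal_levels(ages, age_group)

# Order driven by the formatted labels themselves (plain string sort).
by_formatted = sorted({age_group(a) for a in ages})

print(by_unformatted)
print(by_formatted)
```

The two orderings differ for this data: sorting on raw values puts TEENAGER first, while sorting on the labels puts ADULT first.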

The ORDER= option is allowed only for ordinal variables.

Values of a nominal variable have no implicit ordering. Typical nominal inputs are GENDER, GROUP, and JOBCLASS. A splitting rule based on a nominal input is usually free to assign any subset of categories to any subset of the node. The number of ways to assign the categories becomes very large if there are many categories compared to the number of node subsets. For LEVEL=NOMINAL, values are defined by the formatted value.
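
To give a feel for how quickly the number of possible nominal splits grows, the sketch below counts the ways to divide k categories among b node subsets. It assumes every subset receives at least one category and that the subsets are interchangeable, which makes the count a Stirling number of the second kind; the procedure's own search may count or restrict candidates differently.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Ways to partition n distinct categories into k non-empty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Either the n-th category joins one of the k existing groups,
    # or it forms a new group on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


for categories, branches in [(4, 2), (10, 2), (10, 3), (20, 2)]:
    print(categories, branches, stirling2(categories, branches))
```

Under these assumptions, 4 categories allow 7 two-way splits, while 20 categories allow 524287.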


PRIORS Statement

Specifies the prior probabilities of the values of a nominal or ordinal target.

Tip: A prior probability for a value of a target represents the proportion in which that value appears in the data to which the tree model is intended to apply.

Caution: The PRIORS statement is not allowed if a DECISION statement is used; instead use the PRIORVAR= option to specify prior probabilities. The PRIORS statement is not valid for an interval target and will result in an error if used.

PRIORS probabilities;

Required Arguments

Probabilities can be one of the following:


Specifies that the proportions are the same as in the training data.


Specifies equal proportions.

'value-1'=probability-1 <...'value-n'=probability-n>

Specifies explicit probabilities.

value-1 ... value-n

Specifies each formatted value of the target; each value listed is followed by an equal sign. Formatted values are enclosed in single quotes. All non-missing values of the target should be included.

probability-1 ... probability-n

Specifies the probability for the corresponding value as a numeric constant between 0 and 1.




PRIORS '-1'=0.4 '0'=0.2 '1'=0.4;

This example specifies probabilities of 0.4, 0.2, and 0.4 for the target values -1, 0, and 1, respectively. An error occurs if the training data contains other non-missing values of the target. The formatted values depend on the format you choose. If the target uses a format of 5.2, then use:

PRIORS '-1.00'=0.4 '0.00'=0.2 '1.00'=0.4;
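
The checks described above can be sketched in Python. This is an illustration only, not SAS code: `format_w_d` approximates a w.d numeric format with Python string formatting, and `check_priors` is a hypothetical helper that applies the two stated rules (each probability lies between 0 and 1, and every non-missing target value in the training data is listed).

```python
def format_w_d(value, width=5, decimals=2):
    """Approximate a w.d numeric format, then strip the leading padding."""
    return f"{value:{width}.{decimals}f}".strip()


def check_priors(priors, training_values):
    """Raise ValueError if the priors break either documented rule."""
    for value, p in priors.items():
        if not 0 <= p <= 1:
            raise ValueError(f"probability for {value!r} is outside [0, 1]")
    # Every non-missing formatted target value must have a prior.
    unlisted = set(training_values) - set(priors)
    if unlisted:
        raise ValueError(f"target values without a prior: {sorted(unlisted)}")
    return True


# Formatted keys as they would appear with a 5.2 format.
priors = {format_w_d(v): p for v, p in [(-1, 0.4), (0, 0.2), (1, 0.4)]}
print(priors)

training = [format_w_d(v) for v in [1, -1, 0, 1, 1]]
print(check_priors(priors, training))
```

Passing training data that contains an unlisted value, such as '2.00', makes `check_priors` raise an error, mirroring the behavior the text describes.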
