The arboretum procedure



possible:

Sub-Tree Selection Methods

Method        Description
ASSESSMENT    Best assessment value
LARGEST       The largest tree in the sequence
n             Largest sub-tree with no more than n leaves

Default: ASSESSMENT
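The three selection rules can be illustrated with a small Python sketch. It is not how the procedure is implemented. It assumes a hypothetical `SubTree` record holding a leaf count and an assessment value where larger is better (for an error-type assessment you would minimize instead), and it breaks ASSESSMENT ties toward the smaller tree, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class SubTree:
    """Hypothetical summary of one sub-tree in the pruning sequence."""
    n_leaves: int
    assessment: float  # assumption: larger is better (e.g., profit or accuracy)


def select_subtree(sequence: Sequence[SubTree],
                   method: Union[str, int] = "ASSESSMENT") -> SubTree:
    """Pick a sub-tree using an ASSESSMENT, LARGEST, or n-leaves rule."""
    if method == "ASSESSMENT":
        # Best assessment value; ties go to the smaller tree (an assumption).
        return max(sequence, key=lambda t: (t.assessment, -t.n_leaves))
    if method == "LARGEST":
        # The largest tree in the sequence.
        return max(sequence, key=lambda t: t.n_leaves)
    if isinstance(method, int):
        # Largest sub-tree with no more than n leaves.
        fits = [t for t in sequence if t.n_leaves <= method]
        if not fits:
            raise ValueError(f"no sub-tree has at most {method} leaves")
        return max(fits, key=lambda t: t.n_leaves)
    raise ValueError(f"unknown method: {method!r}")


sequence = [SubTree(1, 0.50), SubTree(3, 0.71), SubTree(5, 0.74), SubTree(8, 0.72)]
print(select_subtree(sequence))             # SubTree(n_leaves=5, assessment=0.74)
print(select_subtree(sequence, "LARGEST"))  # SubTree(n_leaves=8, assessment=0.72)
print(select_subtree(sequence, 4))          # SubTree(n_leaves=3, assessment=0.71)
```

With these made-up values, ASSESSMENT picks the 5-leaf tree even though a larger 8-leaf tree exists, which is the usual reason to prefer it over LARGEST.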

USEVARONCE

Specifies that no node is split on an input that any of its ancestor nodes is split on.
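A minimal Python sketch of the USEVARONCE restriction, assuming a hypothetical `Node` class that records the input a node splits on and a link to its parent. The input names `AGE`, `INCOME`, and `REGION` are invented for illustration. Only ancestors are checked; the node itself is the one being split.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical tree node: the input it splits on and a link to its parent."""
    split_var: Optional[str] = None
    parent: Optional["Node"] = None


def ancestor_split_vars(node: Node) -> Set[str]:
    """Collect the inputs used by every ancestor of `node` (not the node itself)."""
    used: Set[str] = set()
    current = node.parent
    while current is not None:
        if current.split_var is not None:
            used.add(current.split_var)
        current = current.parent
    return used


def candidate_inputs(node: Node, inputs: List[str], use_var_once: bool = True) -> List[str]:
    """Inputs eligible for splitting `node`; with USEVARONCE, ancestors' inputs are dropped."""
    if not use_var_once:
        return list(inputs)
    used = ancestor_split_vars(node)
    return [name for name in inputs if name not in used]


root = Node("AGE")
child = Node("INCOME", parent=root)
grandchild = Node(parent=child)
print(candidate_inputs(grandchild, ["AGE", "INCOME", "REGION"]))         # ['REGION']
print(candidate_inputs(grandchild, ["AGE", "INCOME", "REGION"], False))  # ['AGE', 'INCOME', 'REGION']
```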

VALIDATA= SAS-data-set

Names the input SAS data set for validation.

WORTH=threshold

Specifies a threshold for the worth of a candidate splitting rule. The measure of worth depends on the CRITERION= method.

Range: For a method based on p-values, the threshold is the maximum acceptable p-value; for other criteria, the threshold is the minimum acceptable increase in the measure of worth.

Default: For a method based on p-values, the default is 0.20; for other criteria, the default is 0.
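The two readings of the threshold can be sketched in Python. The function name and the `uses_p_values` flag are hypothetical, and treating a value exactly at the threshold as acceptable is an assumption; the text above does not say whether the comparison is strict.

```python
from typing import Optional


def split_passes_worth(worth: float, uses_p_values: bool,
                       threshold: Optional[float] = None) -> bool:
    """Apply a WORTH=-style threshold to one candidate splitting rule.

    For p-value criteria, `worth` is the split's p-value and the threshold is the
    maximum acceptable p-value (default 0.20). For other criteria, `worth` is the
    increase in the worth measure and the threshold is the minimum acceptable
    increase (default 0). Accepting values equal to the threshold is an assumption.
    """
    if threshold is None:
        threshold = 0.20 if uses_p_values else 0.0
    if uses_p_values:
        return worth <= threshold
    return worth >= threshold


print(split_passes_worth(0.03, uses_p_values=True))                   # True
print(split_passes_worth(0.35, uses_p_values=True))                   # False
print(split_passes_worth(0.12, uses_p_values=False, threshold=0.05))  # True
```

Note the opposite directions: a smaller p-value means a more significant split, whereas a larger increase in a non-p-value worth measure means a better split.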

The SPLIT Procedure

CODE Statement

Generates SAS DATA step code that generally mimics the computations done by the SCORE statement.

CODE <option(s)>;

Options

DUMMY

Requests creation of a dummy variable for each leaf node. The value of the dummy variable is 1 for observations in the leaf and 0 for all other observations.

FILE=quoted-filename

Specifies the name of the file to which the generated code is written.

Default: LOG

FORMAT=format

Specifies the format to be used in the DATA step code for numeric values that do not have a format from the input data set.

NOLEAFID

Suppresses the creation of the _NODE_ variable, which contains the numeric ID of the leaf to which the observation is assigned.

NOPRED

Suppresses the creation of predicted variables, such as P_*.

RESIDUAL

Requests code that assumes the existence of the target variable.

Default: By default, the code contains no reference to the target variable (to avoid confusing notes or warnings). The code computes values that depend on the target variable (such as the R_*, E_*, F_*, CL_*, CP_*, BL_*, BP_*, or ROI_* variables) only if the RESIDUAL option is specified.
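The effect of these CODE options can be pictured with a Python analogue of generated scoring logic. The real statement emits SAS DATA step code, not Python. Everything specific here is hypothetical: the three-leaf tree and its split rules, the binary target BAD, the predicted-variable names P_BAD1/P_BAD0, and the dummy names `_LEAFn_`. The residual formula (event indicator minus predicted probability) is an assumption.

```python
from typing import Dict

# Hypothetical tree: leaf IDs mapped to the leaf's probability of BAD=1.
LEAF_P_BAD1 = {2: 0.40, 4: 0.15, 5: 0.05}


def score(row: Dict[str, float], dummy: bool = False, noleafid: bool = False,
          nopred: bool = False, residual: bool = False) -> Dict[str, float]:
    """Python analogue of generated scoring code for a hypothetical 3-leaf tree."""
    # Route the observation to a leaf (split rules invented for illustration).
    if row["AGE"] < 30:
        leaf = 2
    elif row["INCOME"] < 50000:
        leaf = 4
    else:
        leaf = 5

    out: Dict[str, float] = {}
    if not noleafid:
        out["_NODE_"] = leaf                      # numeric ID of the assigned leaf
    if not nopred:
        out["P_BAD1"] = LEAF_P_BAD1[leaf]         # predicted probability of BAD=1
        out["P_BAD0"] = 1.0 - LEAF_P_BAD1[leaf]
    if dummy:
        for node_id in LEAF_P_BAD1:               # 1 for the observation's leaf, else 0
            out[f"_LEAF{node_id}_"] = 1 if node_id == leaf else 0
    if residual:
        # Only this branch reads the target, so it fails if BAD is absent,
        # mirroring code that "assumes the existence of the target variable".
        out["R_BAD1"] = (1.0 if row["BAD"] == 1 else 0.0) - LEAF_P_BAD1[leaf]
    return out


print(score({"AGE": 45, "INCOME": 32000}))
print(score({"AGE": 25, "INCOME": 80000, "BAD": 1}, dummy=True, residual=True))
```

Without `residual=True` the function never touches BAD, so it can score new data that lacks the target, which is the reason the real option is off by default.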


DECISION Statement

Specifies information used for decision processing in the DECIDE, DMREG, NEURAL, and SPLIT procedures. This documentation applies to all four procedures.

Tip: The DECISION statement is required for the DECIDE and NEURAL procedures. It is optional for the DMREG and SPLIT procedures.

DECISION DECDATA=SAS-data-set <DECVARS=decision-variable(s)> <option(s)>;

DECDATA= SAS-data-set

Specifies the input data set that contains the decision matrix. The DECDATA= data set must contain the target variable.

Note: The DECDATA= data set may also contain decision variables specified by means of the DECVARS= option, and prior probability variable(s) specified by means of the PRIORVAR= option, the OLDPRIORVAR= option, or both.

The target variable is specified by means of the TARGET statement in the DECIDE, NEURAL, and SPLIT procedures or the MODEL statement in the DMREG procedure. If the target variable in the DATA= data set is categorical, then the target variable of the DECDATA= data set should contain the category values, and the decision variables will contain the common consequences of making those decisions for the corresponding target level. If the target variable is interval, then each decision variable will contain the value of the consequence for that decision at the point given by the target variable in the same observation. The unspecified regions of the decision function are interpolated by a piecewise linear spline.

Tip: The DECDATA= data set may be of TYPE=LOSS, PROFIT, or REVENUE. If TYPE= is not specified, PROFIT is assumed. TYPE= is a data set option that should be specified when the data set is created.
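The two layouts of the decision matrix can be sketched in Python. For a categorical target, each decision's expected consequence is the posterior-weighted sum of its column; for an interval target, each decision's consequence is interpolated linearly between the target points listed in DECDATA=. The dictionary layout, the levels "1"/"0", the decision names ACCEPT/REJECT, and the clamping of values outside the listed points are all assumptions made for this sketch, not the procedures' documented internals.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def expected_consequences(posterior: Dict[str, float],
                          matrix: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Categorical target: weight each target level's row of the matrix by P(level)."""
    decisions = next(iter(matrix.values())).keys()   # assumes every row has the same decisions
    return {d: sum(p * matrix[level][d] for level, p in posterior.items())
            for d in decisions}


def piecewise_linear(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Interval target: linear interpolation of one decision's consequence at x.

    xs must be sorted ascending. Outside [xs[0], xs[-1]] the end value is returned
    (clamping), which is an assumption of this sketch.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)                          # xs[i-1] <= x < xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Rows are target levels, columns are decision variables (hypothetical values).
matrix = {"1": {"ACCEPT": -500.0, "REJECT": 0.0},
          "0": {"ACCEPT": 100.0, "REJECT": 0.0}}
print(expected_consequences({"1": 0.2, "0": 0.8}, matrix))
print(piecewise_linear([0.0, 100.0, 200.0], [0.0, 50.0, 80.0], 150.0))  # 65.0
```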

DECVARS=decision-variable(s)

Specifies the decision variables in the DECDATA= data set that contain the target-specific

consequences for each decision.

Default: None

COST=cost-option(s)

Specifies numeric constants giving the cost of a decision, variables in the DATA= data set that contain case-specific costs, or any combination of constants and variables. The total number of cost constants and variables must equal the number of decision variables in the DECVARS= option. In the COST= option, you may not use abbreviated variable lists such as D1-D3, ABC--XYZ, or PQR:.

Default: All costs are assumed to be 0.

CAUTION: The COST= option can be specified only when the DECDATA= data set is of TYPE=REVENUE.
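How TYPE= and COST= might combine when a decision is chosen for one observation is sketched below. The rules encoded here are assumptions consistent with the usual reading of the matrix types: PROFIT maximizes expected profit, LOSS minimizes expected loss, and REVENUE maximizes expected revenue minus the case's cost. The function name and the decision names MAIL/NO_MAIL are hypothetical; the one-cost-per-decision check and the all-zero default mirror the COST= rules above.

```python
from typing import Dict, Optional


def best_decision(expected: Dict[str, float], matrix_type: str = "PROFIT",
                  cost: Optional[Dict[str, float]] = None) -> str:
    """Pick one decision for an observation from its expected consequences.

    expected    -- expected value of each decision (one entry per DECVARS= variable)
    matrix_type -- TYPE= of the decision data set: "PROFIT", "REVENUE", or "LOSS"
    cost        -- this observation's cost for each decision (REVENUE only)
    """
    matrix_type = matrix_type.upper()
    if cost is not None:
        if matrix_type != "REVENUE":
            raise ValueError("costs apply only to a TYPE=REVENUE decision matrix")
        if set(cost) != set(expected):
            raise ValueError("need exactly one cost per decision")
    if matrix_type == "LOSS":
        # Smallest expected loss wins.
        return min(expected, key=lambda d: expected[d])
    if matrix_type == "PROFIT":
        # Largest expected profit wins.
        return max(expected, key=lambda d: expected[d])
    if matrix_type == "REVENUE":
        # Assumed: profit = expected revenue minus cost; no costs means all costs are 0.
        costs = cost if cost is not None else {d: 0.0 for d in expected}
        return max(expected, key=lambda d: expected[d] - costs[d])
    raise ValueError(f"unsupported TYPE=: {matrix_type}")


expected = {"MAIL": 12.0, "NO_MAIL": 0.0}
print(best_decision(expected))                                              # MAIL
print(best_decision(expected, "REVENUE", {"MAIL": 15.0, "NO_MAIL": 0.0}))   # NO_MAIL
print(best_decision({"ACCEPT": 20.0, "REJECT": 35.0}, "LOSS"))              # ACCEPT
```

The REVENUE example shows why case-specific costs matter: a decision with the higher expected revenue can still lose once its cost is subtracted.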

PRIORVAR=variable

Specifies the variable in the DECDATA= data set that contains the prior probabilities to use for making decisions. In the DECIDE procedure, if PRIORVAR= is specified, OLDPRIORVAR= must also be specified.

Default: None

OLDPRIORVAR=variable

Specifies the variable in the DECDATA= data set that contains the prior probabilities that were used when originally fitting the model. If OLDPRIORVAR= is specified, PRIORVAR= must also be specified.

CAUTION: OLDPRIORVAR= is not allowed in PROC SPLIT.

Default: None
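When a model was fitted under one set of priors (OLDPRIORVAR=) and decisions should reflect another (PRIORVAR=), a common approach is the Bayes-rule correction: multiply each posterior by the ratio of new to old prior and renormalize. The text above does not give the formula, so treat this Python sketch as an assumption about how the two prior variables are combined; the function name and the example priors are hypothetical.

```python
from typing import Dict


def adjust_posteriors(posterior: Dict[str, float], old_prior: Dict[str, float],
                      new_prior: Dict[str, float]) -> Dict[str, float]:
    """Re-weight posteriors fitted under old_prior so they reflect new_prior.

    adjusted(k) is proportional to posterior(k) * new_prior(k) / old_prior(k),
    normalized so the adjusted probabilities sum to 1.
    """
    weighted = {k: p * new_prior[k] / old_prior[k] for k, p in posterior.items()}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


# Model trained on a balanced sample, applied to a population with 10% events.
adjusted = adjust_posteriors({"1": 0.6, "0": 0.4},
                             old_prior={"1": 0.5, "0": 0.5},
                             new_prior={"1": 0.1, "0": 0.9})
print(adjusted)
```

Here the event probability drops from 0.6 to about 0.143 (exactly 1/7), because events were over-represented when the model was fitted.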
