The ARBORETUM Procedure



Table 8. (continued)

STAT        NUMERIC_VALUE               CHARACTER_VALUE
LABEL                                   Variable label
MISSING     Branch                      'MISSING VALUES ONLY', or blank
WORTH       worth, or −log(p)           blank
AGREEMENT   agreement
BRANCHES    number of branches
CUTPOINT    split value of interval
BRANCH      branch                      formatted category value
ORDER       branch                      branch in interval surrogate

SCORE Statement OUT= Output Data Set

The OUT= option in the SCORE statement creates a data set by appending new variables to the data set specified in the DATA= option. Which new variables appear depends on other options in the SCORE statement, the level of measurement of the target variable, and whether a profit or loss function is specified in the DECISION statement.

Variable Names and Conditions for Their Creation

The names of all the possible new variables are listed in Table 9.

Table 9. New Variables in the OUT= Data Set

Variable      Description                                     Target   Other

Variables for Prediction
F_name        actual, formatted category                      yes
I_name        predicted, formatted category
P_namevalue   predicted value
R_namevalue   residual from the prediction                    yes
U_name        predicted, unformatted category
V_namevalue   predicted value computed with validation data
_WARN_        indications of problems with the prediction

Variables for Decisions                                                DECDATA= type
BL_name_      best possible loss from any decision            yes      LOSS
BP_name_      best possible profit from any decision          yes      PROFIT, REVENUE
CL_name_      loss computed from the target value             yes      LOSS
CP_name_      profit computed from the target value           yes      PROFIT, REVENUE
D_name_       label of the chosen decision alternative                 any
EL_name_      expected loss from the chosen decision                   LOSS
EP_name_      expected profit from the chosen decision                 PROFIT, REVENUE
IC_name_      investment cost                                          REVENUE
ROI_name_     return on investment                            yes      REVENUE

Variables for Leaf Assignment                                          Option
_i_           proportion of the observation in leaf i                  DUMMY
_LEAF_        leaf identification number                               LEAF
_NODE_        node identification number                               LEAF


The names of most of these variables incorporate the name of the target variable. For a categorical target variable, namevalue represents the name of the target concatenated with a formatted target value. For example, a categorical target variable named Response, with values '0' and '1', will generate new variables, P_Response0 and P_Response1. For an interval target, namevalue simply represents the name of the target. For example, an interval target variable, Sales, will generate the variable P_Sales.
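The naming rule above can be sketched in a few lines of Python. This is only an illustration of the concatenation rule, not part of the procedure: the function name predicted_variable_names and its arguments are hypothetical, and the sketch assumes that the separator in the generated names is an underscore, as in any valid SAS variable name.

```python
def predicted_variable_names(target, formatted_values=None):
    """Build the P_ variable names described in the text (illustrative only).

    target           -- name of the target variable
    formatted_values -- formatted category values for a categorical target,
                        or None for an interval target (hypothetical argument)
    """
    if formatted_values is None:
        # Interval target: namevalue is simply the target name.
        return [f"P_{target}"]
    # Categorical target: namevalue is the target name concatenated
    # with each formatted target value.
    return [f"P_{target}{value}" for value in formatted_values]


print(predicted_variable_names("Response", ["0", "1"]))  # ['P_Response0', 'P_Response1']
print(predicted_variable_names("Sales"))                 # ['P_Sales']
```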

The NOPREDICTION option to the SCORE statement suppresses the creation of the prediction and decision variables. Otherwise, the conditions necessary for creating these variables are as follows. Variables P_namevalue and _WARN_ are always created. Variables I_name and U_name appear when the target is categorical. When ROLE=TRAIN, VALID, or TEST, the DATA= data set must contain the target variable, and the OUT= data set will contain R_namevalue and, for a categorical target, F_name. The V_namevalue variable is created if validation data was used during the creation of the tree.
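The conditions in the preceding paragraph can be restated as a small decision function. The sketch below is a reading aid under stated assumptions: the function prediction_variables and its parameters are hypothetical, names use underscores as separators, and role=None stands for any role other than TRAIN, VALID, or TEST.

```python
def prediction_variables(categorical, role=None, validation_used=False,
                         noprediction=False):
    """List the prediction variables the text says appear in OUT= (sketch)."""
    if noprediction:
        # NOPREDICTION suppresses the prediction (and decision) variables.
        return []
    created = ["P_namevalue", "_WARN_"]        # always created
    if categorical:
        created += ["I_name", "U_name"]        # categorical target only
    if role in {"TRAIN", "VALID", "TEST"}:
        created.append("R_namevalue")          # needs the target in DATA=
        if categorical:
            created.append("F_name")
    if validation_used:
        created.append("V_namevalue")          # tree was built with validation data
    return created


print(prediction_variables(categorical=True, role="VALID"))
print(prediction_variables(categorical=False, validation_used=True))
```

For an interval target scored without a target role, only P_namevalue and _WARN_ remain in the list.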

When decision alternatives are specified in the DECVARS= option in the DECISION statement, the variable D_name_ is created, as is either EL_name_ or EP_name_, depending on whether the type of the DECDATA= data set is LOSS or PROFIT, respectively. If the type is REVENUE, then the variables IC_name_ and ROI_name_ are also created. When ROLE=TRAIN, VALID, or TEST, either the variables BL_name_ and CL_name_, or the variables BP_name_ and CP_name_, are created.
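The same rules for decision variables can be sketched as follows. The function decision_variables is hypothetical and assumes that decision alternatives have been specified. It follows the prose above, and it treats REVENUE like PROFIT for the expected, best, and computed values because Table 9 lists PROFIT and REVENUE together for those variables. Table 9 also marks ROI_name_ in its Target column, which this sketch does not model.

```python
def decision_variables(decdata_type, role=None):
    """List decision variables created for a DECDATA= type (sketch)."""
    kind = decdata_type.upper()
    if kind not in {"LOSS", "PROFIT", "REVENUE"}:
        raise ValueError(f"unexpected DECDATA= type: {decdata_type!r}")
    created = ["D_name_"]                          # chosen decision alternative
    # Expected loss for LOSS; expected profit for PROFIT or REVENUE.
    created.append("EL_name_" if kind == "LOSS" else "EP_name_")
    if kind == "REVENUE":
        created += ["IC_name_", "ROI_name_"]       # investment cost, return on investment
    if role in {"TRAIN", "VALID", "TEST"}:
        # Best possible and computed values need the target value.
        if kind == "LOSS":
            created += ["BL_name_", "CL_name_"]
        else:
            created += ["BP_name_", "CP_name_"]
    return created


print(decision_variables("LOSS", role="TEST"))
print(decision_variables("REVENUE"))
```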

Decision Variables

The labels of the variables specified in the DECVARS= option in the DECISION statement are the names of the decision alternatives. For a variable without a label, the name of the decision alternative is the name of the variable. The variable D_name_ in the OUT= data set contains the name of the decision alternative assigned to the observation.

Leaf Assignment Variables

Each node is uniquely identified with a positive integer. Once an identification number is assigned to a node, the number is never reassigned to another node, even after the node is pruned. Consequently, most subtrees in the subtree sequence will not have consecutive node identifiers.

Each leaf has a leaf identification number in addition to the node identifier. The leaf identifiers range from 1 to the number of leaves. The leaf numbers are reassigned whenever a new subtree is selected from the subtree sequence.
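The difference between the two kinds of identifiers can be illustrated with a small sketch. The function number_leaves is hypothetical, and numbering the leaves in ascending order of node identifier is an assumption made only for the example; the text does not state the order in which leaf numbers are assigned.

```python
def number_leaves(leaf_node_ids):
    """Map each leaf's permanent node id to a leaf id in 1..number of leaves.

    Assumption for illustration: leaves are numbered in ascending node-id order.
    """
    ordered = sorted(leaf_node_ids)
    return {node_id: leaf_id for leaf_id, node_id in enumerate(ordered, start=1)}


# Full tree: root 1, children 2 and 3, grandchildren 4, 5, 6, 7.
full_tree = number_leaves([4, 5, 6, 7])
# Subtree after pruning nodes 4 and 5: node 2 becomes a leaf.
pruned = number_leaves([2, 6, 7])

print(full_tree)  # {4: 1, 5: 2, 6: 3, 7: 4}
print(pruned)     # {2: 1, 6: 2, 7: 3}
```

Node 6 keeps its node identifier in both subtrees, while its leaf number changes from 3 to 2. The pruned subtree contains nodes 1, 2, 3, 6, and 7, so its node identifiers are not consecutive.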

For an observation in the OUT= data set assigned to a single leaf, the variables _NODE_ and _LEAF_ contain the node and leaf identification numbers, respectively. For an observation assigned to more than one leaf, the variables _NODE_ and _LEAF_ contain missing values. An observation is assigned to more than one leaf when the observation is missing a value required by one of the splitting rules, and the MISSING=DISTRIBUTE option in the INPUT statement for the required variable dictates that the observation be distributed among the branches.
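The rule for _NODE_ and _LEAF_ can be sketched as below. The function leaf_assignment and its inputs are hypothetical: proportions maps each leaf number to the fraction of the observation in that leaf (the quantity Table 9 describes for the _i_ variables), leaf_to_node maps leaf numbers to node identifiers, and None stands in for a SAS missing value.

```python
def leaf_assignment(proportions, leaf_to_node):
    """Return _NODE_ and _LEAF_ for one observation (illustrative sketch).

    proportions  -- dict: leaf id -> proportion of the observation in that leaf
    leaf_to_node -- dict: leaf id -> node id
    """
    occupied = [leaf for leaf, share in proportions.items() if share > 0]
    if len(occupied) == 1:
        # Assigned to a single leaf: report its node and leaf identifiers.
        leaf = occupied[0]
        return {"_NODE_": leaf_to_node[leaf], "_LEAF_": leaf}
    # Distributed among several leaves: both identifiers are missing.
    return {"_NODE_": None, "_LEAF_": None}


leaf_to_node = {1: 4, 2: 5}
print(leaf_assignment({1: 1.0, 2: 0.0}, leaf_to_node))  # {'_NODE_': 4, '_LEAF_': 1}
print(leaf_assignment({1: 0.6, 2: 0.4}, leaf_to_node))  # {'_NODE_': None, '_LEAF_': None}
```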
