The arboretum procedure

§ PROC DMNEURL: Approximation to PROC NEURAL


(b) The _TYPE_=_PARMS_ observations contain, for each activation function, the […] parameter estimates. Here, the _MEAN_ variable contains the value of the optimization criterion and the _STDEV_ variable contains the accuracy value of the prediction.

OUT=SASdataset :

specifies an output data set generated by PROC DMNEURL which contains the predicted values (posteriors) and residuals for all observations in the DATA= input data set.

Variables of the output data set:

values of all ID variables

_TARGET_ (character) name of the target

_STAGE_ number of the stage

_P_ predicted value

_R_ residual

The following variables are added if a DECISION statement is used:

_BSTDEC_

_CONSEQ_

_EVALUE_ expected profit or cost value

expected values for all decision variables

The number of observations in the OUT= data set agrees with that of the DATA= input data set.

TESTOUT=SASdataset :

specifies an output data set which is identical in structure to the OUT= output data set but relates to the TESTDATA= input data set rather than to the DATA= input data set.

The number of observations in the TESTOUT= data set agrees with that of the TESTDATA= input data set.

OUTFIT=SASdataset :

specifies an output data set generated by PROC DMNEURL which contains a number of fit indices for each stage and for the final model estimates. For a binary target (response variable) it also contains the frequencies of the 2 × 2 accuracy table of the best fit at the final stage. The same information is additionally provided if a TESTDATA= input data set is specified.

Variables of the output data set:

_TARGET_ (character) name of the target

_DATA_ (character) specifies the data set to which the fit criteria correspond:

=TRAINING: fit criteria belong to the DATA= input data set

=TESTDATA: fit criteria belong to the TESTDATA= input data set

_TYPE_ (character) describes the type of observation:

_TYPE_=_FITIND_ for fit indices;

_TYPE_=_ACCTAB_ for frequencies of the accuracy table (only for binary target)


_STAGE_ number of stages in the estimation process

_SSE_ sum of squared errors of the solution

_RMSE_ root mean squared error of the solution

_ACCU_ percentage of accuracy of the prediction (only for categorical target)

_AIC_ Akaike information criterion

_SBC_ Schwarz information criterion

The following variables are added if a DECISION statement is used:

_PROF_

_APROF_

_LOSS_

_ALOSS_

_IC_

_ROI_

OUTSTAT=SASdataset :

specifies an output data set generated by PROC DMNEURL which contains all eigenvalues and eigenvectors of the cross-product matrix $X^{T}X$. When this option is specified, no other computations are performed and the procedure terminates after writing this data set.

Variables of the OUTSTAT= output data set:

_TYPE_ (character) type of observation

_EIGVAL_ contains different numeric information

[…] variables in the model; the first variables correspond to CLASS (categorical) variables, the remaining variables are continuously (interval or ratio) scaled. Note that for nonbinary CLASS (nominal or ordinal categorical) variables a set of binary dummy variables is created. In those cases the prefix of the variable names used for a group of variables in the data set may be the same for a successive group of variables, which differ only by a numeric suffix.

Observations of the OUTSTAT= output data set:

1. The first three observations, _TYPE_=_V_MAP_ and _TYPE_=_C_MAP_, contain the mapping indices between the variables used in the model and the number of the variables in the data set. The _EIGVAL_ variable contains the number of index mappings. This is the same information as in the first observation of the OUTEST= data set, except that here the _EIGVAL_ variable replaces the _MEAN_ variable of the OUTEST= data set.

2. The _TYPE_=_EIGVAL_ observation contains the sorted eigenvalues of the $X^{T}X$ matrix.

3. The _TYPE_=_EIGVEC_ observations contain a set of […] eigenvectors of the $X^{T}X$ matrix. Here, the _EIGVAL_ variable contains the eigenvalue to which the eigenvector corresponds.


ABSGCONV, ABSGTOL :

specifies an absolute gradient convergence criterion for the default (OPTCRIT=SSE) optimization process. See the documentation of PROC NLP in SAS/OR for more details. The default is ABSGCONV=5e-4 in general and ABSGCONV=1e-3 for FUNCTION=EXP.

CORRDF :

specifies that the correct number of degrees of freedom is used for the values of RMSE, AIC, and SBC. Without CORRDF, the error degrees of freedom are computed as $W - p$, where $W$ is the sum of weights (if the WEIGHT statement is not used, each observation is assigned a weight of 1, and $W$ is the total number of observations) and $p$ is the number of parameters. When CORRDF is specified, the value $p$ is replaced by the rank of the joint Jacobian.
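The degrees-of-freedom rule described for CORRDF can be illustrated with a small Python sketch. It is not PROC DMNEURL code: the function names, the sample numbers, and the formula RMSE = sqrt(SSE / error df) are assumptions made for illustration, and the rank of the Jacobian is passed in as a given number rather than computed.

```python
import math


def error_df(weights, n_params, jacobian_rank=None):
    """Error degrees of freedom: sum of weights minus a parameter count.

    Without a rank (CORRDF not specified) the number of parameters is
    subtracted; with a rank (CORRDF specified) the rank of the joint
    Jacobian is subtracted instead.
    """
    total_weight = sum(weights)
    subtracted = n_params if jacobian_rank is None else jacobian_rank
    return total_weight - subtracted


def rmse(sse, df):
    """Assumed form of the RMSE: square root of SSE over the error df."""
    return math.sqrt(sse / df)


# Hypothetical data: 100 unweighted observations (weight 1 each),
# 7 parameters, and a Jacobian of rank 5.
weights = [1.0] * 100
df_default = error_df(weights, n_params=7)
df_corrected = error_df(weights, n_params=7, jacobian_rank=5)
```

When the Jacobian is rank deficient, the corrected error degrees of freedom are larger than the default ones, so a statistic that divides by them becomes smaller.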

COV, CORR :

specifies that a covariance or correlation matrix is used for computing eigenvalues and eigenvectors, compatible with the PRINCOMP procedure. The COV and CORR options are valid only if an OUTSTAT= data set is specified. If neither COV nor CORR is specified, the eigenvalues and eigenvectors of the cross-product matrix $X^{T}X$ are computed and written to the OUTSTAT= data set.
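The three matrices that can be decomposed (cross-product, covariance, correlation) differ only in centering and scaling. The following Python sketch computes each of them for a small, made-up data matrix. It is an illustration, not the procedure's implementation: the function names are hypothetical, the covariance divisor n - 1 is an assumption, and the eigenvalue decomposition itself is left to a numerical library.

```python
import math


def cross_product(X):
    """Cross-product matrix X'X of a row-major data matrix."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def center(X):
    """Subtract the column means from every row."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def covariance(X):
    """Covariance matrix: centered cross-products divided by n - 1 (assumed)."""
    n = len(X)
    return [[v / (n - 1) for v in row] for row in cross_product(center(X))]


def correlation(X):
    """Correlation matrix: covariance scaled by the standard deviations."""
    C = covariance(X)
    sd = [math.sqrt(C[i][i]) for i in range(len(C))]
    return [[C[i][j] / (sd[i] * sd[j]) for j in range(len(C))]
            for i in range(len(C))]


# Hypothetical data: 3 observations of 2 interval variables.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
xtx = cross_product(X)
cov = covariance(X)
corr = correlation(X)
```

Because the cross-product matrix is neither centered nor scaled, its eigenvectors generally differ from those of the covariance and correlation matrices, which is why the choice between the default, COV, and CORR matters.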

CRITWGT=r :

specifies a positive weight for a weighted least squares fit. Currently this option is valid only for a binary target. Values of r > 1 will enforce a better fit of the (1,1) entry in the accuracy table, which may be useful for fitting rare events. Values of r < 1 will enforce a better fit of the (0,0) entry in the accuracy table. Note that values of r which are far away from 1 will reduce the fit quality of the remaining entries in the frequency table. At this time values of either […] are preferred.

CUTOFF=r :

specifies a cutoff threshold for deciding when a predicted value of a binary response is classified as 0 or 1. The default is […]. If the value of the posterior for an observation is smaller than the specified cutoff value, the observation is counted in the first column of the accuracy table (i.e. as 0), otherwise it is counted in the second column (i.e. as 1). For a nonbinary target the CUTOFF= value is not used.
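The cutoff rule can be sketched in a few lines of Python. This is only an illustration of the classification rule as described: the function names and sample data are made up, the cutoff of 0.5 is chosen for the example only, and the layout with observed classes in the rows is an assumption (the text specifies only the columns).

```python
def accuracy_table(targets, posteriors, cutoff):
    """Build a 2 x 2 table: rows = observed class, columns = predicted class."""
    table = [[0, 0], [0, 0]]
    for y, p in zip(targets, posteriors):
        # A posterior smaller than the cutoff is counted in the first
        # column (predicted 0), anything else in the second (predicted 1).
        predicted = 0 if p < cutoff else 1
        table[y][predicted] += 1
    return table


def percent_accuracy(table):
    """Percentage of observations on the diagonal of the table."""
    total = sum(sum(row) for row in table)
    correct = table[0][0] + table[1][1]
    return 100.0 * correct / total


# Hypothetical binary targets and posteriors.
targets = [0, 0, 1, 1, 0, 1]
posteriors = [0.10, 0.60, 0.80, 0.40, 0.30, 0.50]
table = accuracy_table(targets, posteriors, cutoff=0.5)
accuracy = percent_accuracy(table)
```

Note that a posterior exactly equal to the cutoff is classified as 1, since only values strictly smaller than the cutoff fall in the first column.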

GCONV, GTOL :

specifies a relative gradient convergence criterion for the optimization process. See the documentation of PROC NLP in SAS/OR for more details. The default is GCONV=1e-8.

FCRIT :

specifies that the probability of the F test is used for the selection of principal components rather than the default […] criterion.

MAXCOMP=i :

specifies an upper bound for the number of components selected for predicting the target in each stage. Good values for MAXCOMP are between 3 and 5. Note that the computer time and core memory will increase superlinearly for
