Summary of the Stepwise Selection Process
The Summary of Stepwise Procedure section lists the step number, the explanatory input or inputs entered or removed at each step,
the F statistic, and the corresponding p-value on which the entry or removal of the input is based. For this example, 8 of the 19 original
inputs met the 0.25 entry and stay significance levels.
List Report of Selected Variables in the OUTEST= data set
The example PROC PRINT report of the OUTEST= data set lists selected fit statistics for the training and test data sets. By default, the
OUTEST= data set contains two observations for each step number. These observations are distinguished by the value of the _TYPE_ variable:
_TYPE_ = "PARMS" - contains parameter estimate statistics
_TYPE_ = "T" - contains the t-value for the parameter estimate
Because a WHERE statement was used to select only observations with _TYPE_ = "PARMS", this report contains one observation per step number.
An additional observation identifies the model chosen by the SBC criterion (CHOOSE=SBC).
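As a cross-check on the CHOOSE=SBC selection, the step with the smallest SBC can be located directly in the OUTEST= data set. The following is a minimal sketch, assuming the REGEST data set created by the OUTEST=regest option in this example and the _STEP_, _SBC_, and _TYPE_ variables shown in the report. The data set name REGEST_SORTED is a placeholder introduced for illustration.

```sas
/* Sketch: locate the step with the smallest SBC in the OUTEST= data set. */
/* REGEST_SORTED is a hypothetical work data set name.                    */
proc sort data=regest out=regest_sorted;
   where _type_ = 'PARMS' and _sbc_ is not missing;
   by _sbc_;
run;

/* After sorting in ascending order, the first observation has the smallest SBC. */
proc print data=regest_sorted(obs=1) noobs;
   var _step_ _sbc_;
   title 'Step with the Smallest SBC';
run;
```

Because the OUTEST= data set also carries an extra observation for the chosen model, that observation may tie with the step it duplicates; either one identifies the same SBC value.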
GPLOT Diagnostic Plots for the Scored Baseball Test Data
Plot of the log of salary versus the predicted log of salary.
Plot of the residual values versus the predicted log of salary.
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The PROC DMREG statement invokes the procedure. The DATA= option identifies
the training data set that is used to fit the model. The DMDBCAT= option identifies
the training data catalog.
proc dmreg data=sampsio.dmdbase dmdbcat=sampsio.dmdbase
The TESTDATA= option identifies the test data set. The OUTEST= option
creates the output data set containing estimates and fit statistics.
testdata=sampsio.dmtbase outest=regest;
The CLASS statement specifies the categorical variables to be used in
the regression analysis.
class league division position;
The MODEL statement specifies the linear model. The ERROR=NORMAL model
option requests the normal error distribution. The CHOOSE=SBC model
option selects the model subset with the smallest Schwarz Bayesian
criterion (SBC).
model logsalar = no_atbat no_hits no_home no_runs no_rbi no_bb
yr_major cr_atbat cr_hits cr_home cr_runs
cr_rbi cr_bb league division position no_outs
no_assts no_error
/ error=normal
choose=sbc
The MODEL option SELECTION=STEPWISE requests the stepwise variable
selection method. Stepwise selection systematically adds and deletes inputs
from the model based on the SLENTRY= and SLSTAY= significance levels. These
significance levels determine only which subset models are created; the
final model is the subset model that has the smallest SBC, as requested by
CHOOSE=SBC.
selection=stepwise
slentry=0.25 slstay=0.25;
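For reference, the Schwarz Bayesian criterion for a least-squares model with normal errors is commonly written as shown below, where n is the number of training observations, p is the number of estimated parameters, and SSE is the error sum of squares. This is the textbook form and is stated here as an assumption; this example does not give the exact expression that PROC DMREG evaluates.

```latex
\mathrm{SBC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + p \ln(n)
```

The penalty term p ln(n) grows with every input that enters the model, so a larger subset model is preferred only when it lowers the SSE enough to offset that penalty.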
The SCORE statement specifies the data set that you want to score in
conjunction with training. The DATA= option identifies the score data set
(for this example, the test data set).
score data=sampsio.dmtbase nodmdb
The OUT= option identifies the output data set that contains estimates
and fit statistics for the scored data set. The RENAME= data set option enables you
to rename variables in the OUT= data set; here P_LOGSAL and R_LOGSAL are
renamed to PREDICT and RESIDUAL.
out=regout(rename=(p_logsal=predict r_logsal=residual));
title 'Output from the DMREG Procedure';
run;
PROC PRINT produces a report of selected variables from the OUTEST=
data set.
proc print data=regest noobs label;
var _step_ _chosen_ _sbc_ _mse_ _averr_ _tmse_ _taverr_;
where _type_ = 'PARMS';
title 'Partial Listing of the OUTEST= Data Set';
run;
PROC GPLOT produces diagnostic plots of the scored test data. The first
PLOT statement plots the response versus the predicted values.
proc gplot data=regout;
plot logsalar*predict / haxis=axis1 vaxis=axis2 frame;
symbol c=black i=none v=dot h=3 pct;
axis1 c=black width=2.5;
axis2 c=black width=2.5;
title 'Diagnostic Plots for the Scored Baseball Data';
The second PLOT statement plots the residuals versus the predicted values.
plot residual*predict / haxis=axis1 vaxis=axis2;
run;
quit;
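The residual plot can be complemented with a numeric summary of the residuals. The following is a minimal sketch, assuming the REGOUT data set and the renamed RESIDUAL variable created by the SCORE statement in this example. The title text is illustrative only.

```sas
/* Sketch: numeric summary of the residuals in the scored test data. */
/* Assumes REGOUT and RESIDUAL from the SCORE statement above.       */
proc means data=regout n mean std min max;
   var residual;
   title 'Residual Summary for the Scored Baseball Test Data';
run;
```

A mean close to zero and no extreme minimum or maximum would be consistent with the pattern expected in the residual plot, although the summary does not replace a visual check.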