The arboretum procedure





Summary of the Stepwise Selection Process

The Summary of Stepwise Procedure section lists the step number, the explanatory input or inputs entered or removed at each step, the F statistic, and the corresponding p-value on which the entry or removal of the input is based. For this example, 8 of the 19 original inputs met the 0.25 entry and stay significance levels.
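
The entry and stay rule behind this summary can be sketched outside SAS. The Python fragment below is an illustrative sketch only, not the DMREG implementation. The function name stepwise_action and all p-values are hypothetical. It assumes a common textbook form of the rule: an input already in the model is removed when its p-value exceeds the stay level; otherwise the most significant candidate enters if its p-value is below the entry level.

```python
def stepwise_action(in_model, candidates, slentry=0.25, slstay=0.25):
    """Return the next stepwise action as (action, input_name).

    in_model:   dict of input name -> p-value of its F test in the current model
    candidates: dict of input name -> p-value of the F test for adding that input

    Assumed rule (a common textbook form, not taken from DMREG itself):
    removal is checked first, then entry.
    """
    if in_model:
        # Least significant input currently in the model.
        worst = max(in_model, key=in_model.get)
        if in_model[worst] > slstay:
            return ("remove", worst)
    if candidates:
        # Most significant input not yet in the model.
        best = min(candidates, key=candidates.get)
        if candidates[best] < slentry:
            return ("enter", best)
    return ("stop", None)


# Hypothetical p-values, for illustration only.
print(stepwise_action({"cr_hits": 0.01}, {"no_hits": 0.03, "no_error": 0.60}))
print(stepwise_action({"cr_hits": 0.01, "no_bb": 0.40}, {"no_hits": 0.03}))
print(stepwise_action({"cr_hits": 0.01}, {"no_error": 0.60}))
```

With both levels set to 0.25, as in this example, an input with a p-value of 0.03 would enter, one with 0.40 would be removed, and the process stops when no input qualifies for either action.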




List Report of Selected Variables in the OUTEST= data set

The example PROC PRINT report of the OUTEST= data set lists selected fit statistics for the training and test data sets. By default, the OUTEST= data set contains two observations for each step number. These observations are distinguished by the value of the _TYPE_ variable:

   _TYPE_ = "PARMS" contains the parameter estimates

   _TYPE_ = "T" contains the t-values for the parameter estimates

Because a WHERE statement selects only observations with _TYPE_ = "PARMS", this report contains one observation per step number. An additional observation identifies the model chosen based on the SBC criterion (CHOOSE=SBC).
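
To make the SBC-based choice concrete, the following Python sketch picks the step with the smallest SBC from a stepwise history. It is a hedged illustration, not the DMREG computation: the formula used, n*ln(SSE/n) + p*ln(n), is a common least-squares form of the Schwarz Bayesian criterion and is assumed here, and the function names, error sums of squares, parameter counts, and sample size are all hypothetical.

```python
import math


def sbc(sse, n_obs, n_params):
    """Schwarz Bayesian criterion in a common least-squares form (assumed):
    n * ln(SSE / n) + p * ln(n). Smaller is better."""
    return n_obs * math.log(sse / n_obs) + n_params * math.log(n_obs)


def choose_step(steps, n_obs):
    """Return the step record with the smallest SBC."""
    return min(steps, key=lambda s: sbc(s["sse"], n_obs, s["n_params"]))


# Hypothetical stepwise history: each added input lowers the SSE,
# but the penalty term p * ln(n) grows with every parameter.
steps = [
    {"step": 1, "sse": 60.0, "n_params": 2},
    {"step": 2, "sse": 45.0, "n_params": 3},
    {"step": 3, "sse": 44.5, "n_params": 4},
]

chosen = choose_step(steps, n_obs=100)
print("chosen step:", chosen["step"])
```

In this made-up history the third step reduces the SSE only slightly, so the penalty for the extra parameter outweighs the gain and the second step has the smallest SBC. This is why the chosen model need not be the last model in the stepwise sequence.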

GPLOT Diagnostic Plots for the Scored Baseball Test Data

Plot of the log of salary versus the predicted log of salary.




Plot of the residual values versus the predicted log of salary.
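
The quantities on the axes of the residual plot can be sketched as follows. This Python fragment is illustrative only: the data values and the function name residuals are hypothetical, and the sign convention (observed minus predicted) is an assumption about how the residual is defined.

```python
def residuals(observed, predicted):
    """Return observed minus predicted for each pair (sign convention assumed)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - p for y, p in zip(observed, predicted)]


# Hypothetical log-salary values and predictions, for illustration only.
log_salary = [6.2, 5.1, 6.9, 4.8]
predicted = [6.0, 5.4, 6.5, 5.0]

res = residuals(log_salary, predicted)

# (x, y) pairs for a residual-versus-predicted plot.
points = list(zip(predicted, res))

# For a well-behaved fit the residuals scatter around zero with no pattern.
mean_res = sum(res) / len(res)
print(points)
print(round(mean_res, 3))
```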


Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.


 

The PROC DMREG statement invokes the procedure. The DATA= option identifies the training data set that is used to fit the model. The DMDBCAT= option identifies the training data catalog.

proc dmreg data=sampsio.dmdbase dmdbcat=sampsio.dmdbase



 

The TESTDATA= option identifies the test data set. The OUTEST= option creates the output data set containing estimates and fit statistics.

               testdata=sampsio.dmtbase outest=regest;




 

The CLASS statement specifies the categorical variables to be used in the regression analysis.

   class league division position;




 

The MODEL statement specifies the linear model. The ERROR=NORMAL option specifies the normal error distribution. The CHOOSE=SBC option selects the subset model with the smallest Schwarz Bayesian criterion (SBC).

   


   model logsalar = no_atbat no_hits no_home no_runs no_rbi no_bb
                    yr_major cr_atbat cr_hits cr_home cr_runs
                    cr_rbi cr_bb league division position no_outs
                    no_assts no_error
                    / error=normal
                      choose=sbc



 

The SELECTION=STEPWISE option of the MODEL statement requests the stepwise variable selection method. Stepwise selection systematically adds and deletes inputs from the model based on the SLENTRY= and SLSTAY= significance levels. These levels determine only which subset models are built; the model that is finally chosen is the subset model with the smallest SBC.

                      selection=stepwise
                      slentry=0.25 slstay=0.25;



 

The SCORE statement specifies the data set that you want to score in conjunction with training. The DATA= option identifies the score data set (for this example, the test data set).

   score data=sampsio.dmtbase nodmdb



 

The OUT= option identifies the output data set that contains estimates and fit statistics for the scored data set. The RENAME= data set option enables you to rename variables in the OUT= data set.

    out=regout(rename=(p_logsal=predict r_logsal=residual));
    title 'Output from the DMREG Procedure';
run;



 

PROC PRINT produces a report of selected variables from the OUTEST= data set.

proc print data=regest noobs label;
   var _step_ _chosen_ _sbc_ _mse_ _averr_ _tmse_ _taverr_;
   where _type_ = 'PARMS';
   title 'Partial Listing of the OUTEST= Data Set';
run;



 

PROC GPLOT produces diagnostic plots of the scored test data. The first PLOT statement plots the response versus the predicted values.

proc gplot data=regout;
   plot logsalar*predict / haxis=axis1 vaxis=axis2 frame;
   symbol c=black i=none v=dot h=3 pct;
   axis1 c=black width=2.5;
   axis2 c=black width=2.5;
   title 'Diagnostic Plots for the Scored Baseball Data';



 

The second PLOT statement plots the residuals versus the predicted values.

   

   plot residual*predict / haxis=axis1 vaxis=axis2;
run;
quit;


