The arboretum procedure




 

The SCORE statement scores the training data set and outputs fit statistics to the OUTFIT= data set. Because the DATA= option is omitted, the training data set is scored, and a note to that effect is printed in the log.

   score out=out outfit=fit;



 

The second SCORE statement scores the SAMPSIO.DMSRING data set. The NODMDB option specifies that the score data set contains raw values instead of DMDB-encoded data.

   score data=sampsio.dmsring nodmdb out=gridout;
   title 'Linear-Logistic Regression with Ordinal Target';
run;



 

PROC PRINT produces a report of selected fit statistics for the training data.

proc print data=fit noobs label;
   var _aic_ _max_ _rfpe_ _misc_;
   title2 'Fit Statistics for the Training Data Set';
run;



 

PROC FREQ produces a misclassification table for the training data set. The F_C variable is the actual target value for each case, and the I_C variable is the target value into which the case is classified.

proc freq data=out;
   tables f_c*i_c;
   title2 'Misclassification Table: Training Data';
run;
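The overall misclassification rate can also be computed directly from the scored data set, because a case is misclassified whenever F_C and I_C differ. The DATA step below is a minimal sketch and not part of the original example. The counter names n and wrong are illustrative, and the sketch assumes that F_C and I_C hold the same formatted target values so that a direct comparison is meaningful.

```sas
/* Sketch: overall misclassification rate from the scored training data. */
/* Assumes OUT contains F_C (actual) and I_C (classified) as described.  */
data _null_;
   set out end=last;
   n + 1;                       /* running count of scored cases         */
   wrong + (f_c ne i_c);        /* adds 1 when actual and classified differ */
   if last then do;
      rate = wrong / n;
      put 'Cases scored:           ' n;
      put 'Misclassified cases:    ' wrong;
      put 'Misclassification rate: ' rate 6.4;
   end;
run;
```

The rate written to the log should equal the share of off-diagonal cells in the PROC FREQ table.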



 

PROC GPLOT produces a plot of the classification results for the training data.

proc gplot data=out;
   plot y*x=i_c / haxis=axis1 vaxis=axis2;
   symbol  c=black i=none v=dot;
   symbol2 c=red i=none v=square;
   symbol3 c=green i=none v=triangle;
   axis1 c=black width=2.5 order=(0 to 30 by 5);
   axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
   title2 'Classification Results';
run;



 

PROC GCONTOUR produces plots of the posterior probabilities.

proc gcontour data=gridout;
   plot y*x=p_c1 / pattern ctext=black coutline=gray;
   plot y*x=p_c2 / pattern ctext=black coutline=gray;
   plot y*x=p_c3 / pattern ctext=black coutline=gray;
   title2 'Posterior Probabilities';
   pattern v=msolid;
   legend frame;
run;



 

The MODEL statement specifies the quadratic-logistic model. The vertical bars indicate that interactions of the specified inputs should be generated. "@2" limits the generated terms to interactions of at most second order.

proc dmreg data=sampsio.dmdring dmdbcat=sampsio.dmdring;
   class c;
   model c=x|x|y|y @2;
   score out=qout outfit=qfit;
   score data=sampsio.dmsring nodmdb out=qgridout;
   title1 'Quadratic-Logistic Regression with Ordinal Target';
run;
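Under the usual SAS bar-operator rules, x|x|y|y @2 expands to the main effects x and y plus the second-order terms x*x, x*y, and y*y. Third-order terms such as x*x*y are dropped by @2. The sketch below writes that expansion out explicitly. It assumes that PROC DMREG accepts explicitly crossed effects in the same way it accepts the bar notation, and the output data set names qout2 and qfit2 are illustrative only.

```sas
/* Sketch: the quadratic model with its terms written out.             */
/* Assumption: explicit crossed effects (x*x, x*y, y*y) are accepted   */
/* by DMREG just as the bar notation is.                               */
proc dmreg data=sampsio.dmdring dmdbcat=sampsio.dmdring;
   class c;
   model c = x y x*x x*y y*y;          /* same terms as x|x|y|y @2     */
   score out=qout2 outfit=qfit2;       /* illustrative data set names  */
run;
```

If the assumption holds, the fit statistics in qfit2 should match those in QFIT.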



 

PROC PRINT produces a report of selected fit statistics for the training data.

proc print data=qfit noobs label;
   var _aic_ _max_ _rfpe_ _misc_;
   title2 'Fit Statistics for the Training Data Set';
run;



 

PROC FREQ creates a report of the misclassification matrix for the training data set.

proc freq data=qout;
   tables f_c*i_c;
   title2 'Misclassification Table: Training Data';
run;



 

PROC GPLOT plots the classification results for the training data set.

proc gplot data=qout;
   plot y*x=i_c / haxis=axis1 vaxis=axis2;
   symbol  c=black i=none v=dot;
   symbol2 c=red i=none v=square;
   symbol3 c=green i=none v=triangle;
   axis1 c=black width=2.5 order=(0 to 30 by 5);
   axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
   title2 'Classification Results';
run;



 

PROC GCONTOUR plots the posterior probabilities.

proc gcontour data=qgridout;
   plot y*x=p_c1 / pattern ctext=black coutline=gray;
   plot y*x=p_c2 / pattern ctext=black coutline=gray;
   plot y*x=p_c3 / pattern ctext=black coutline=gray;
   title2 'Posterior Probabilities';
   pattern v=msolid;
   legend frame;
run;



The DMREG Procedure

Example 2: Performing a Stepwise OLS Regression (DMREG Baseball Data)

Features

- Stepwise Regression using the SBC selection criterion
- Scoring a Test Data Set with the SCORE statement
- Outputting Fit Statistics
- Creating Diagnostic Plots

This example demonstrates how to perform a stepwise OLS regression using the DMREG procedure. The example DMDB training data set SAMPSIO.DMBASE (baseball data set) contains performance measures and salary levels for regular hitters and leading substitute hitters in major league baseball for the year 1986 (Collier 1987). There is one observation per hitter. The continuous response variable is the log of the player's salary (logsalar). The SAMPSIO.DMTBASE data set is a test data set that is scored using the scoring formula from the trained model. The SAMPSIO.DMBASE and SAMPSIO.DMTBASE data sets and the SAMPSIO.DMDBASE data mining catalog are stored in the sample library.

Program

proc dmreg data=sampsio.dmdbase dmdbcat=sampsio.dmdbase
           testdata=sampsio.dmtbase outest=regest;
   class league division position;
   model logsalar = no_atbat no_hits no_home no_runs no_rbi no_bb
                    yr_major cr_atbat cr_hits cr_home cr_runs
                    cr_rbi cr_bb league division position no_outs
                    no_assts no_error
                    / error=normal
                      choose=sbc
                      selection=stepwise
                      slentry=0.25 slstay=0.25;
   score data=sampsio.dmtbase nodmdb
         out=regout(rename=(p_logsal=predict r_logsal=residual));
   title 'Output from the DMREG Procedure';
run;
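Once the test data set has been scored, the renamed RESIDUAL variable in REGOUT can be summarized to give a quick check of test-set accuracy. The DATA step below is a hedged sketch and not part of the original program. It assumes REGOUT contains the RESIDUAL variable produced by the RENAME= option above, and the accumulator names n, sse, and rmse are illustrative.

```sas
/* Sketch: root mean squared error of the test-set residuals.          */
/* Assumes REGOUT holds RESIDUAL (renamed from R_LOGSAL) as above.     */
data _null_;
   set regout end=last;
   if not missing(residual) then do;
      n + 1;                    /* cases with a usable residual        */
      sse + residual**2;        /* running sum of squared residuals    */
   end;
   if last and n > 0 then do;
      rmse = sqrt(sse / n);
      put 'Test cases with residuals: ' n;
      put 'Root mean squared error:   ' rmse 8.4;
   end;
run;
```

Cases with a missing residual, for example those with no observed logsalar value, are skipped rather than counted as zero error.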




