The first SCORE statement scores the training data set, writes the scored observations to the OUT= data set, and writes fit statistics
to the OUTFIT= data set. Because the DATA= option is omitted, the training data set is scored, and a note saying so is printed in the log.
score out=out outfit=fit;
The second SCORE statement scores the SAMPSIO.DMSRING data set. The
NODMDB option specifies that the score data set contains raw values rather
than DMDB-encoded data.
score data=sampsio.dmsring nodmdb out=gridout;
title 'Linear-Logistic Regression with Ordinal Target';
run;
PROC PRINT produces a report of selected fit statistics for the training data.
proc print data=fit noobs label;
var _aic_ _max_ _rfpe_ _misc_ ;
title2 'Fit Statistics for the Training Data Set';
run;
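For reference, _AIC_ is Akaike's information criterion. Going by the variable names, the other columns are the maximum absolute error (_MAX_), the root final prediction error (_RFPE_), and the misclassification rate (_MISC_). The standard definition of AIC is given below, on the assumption that DMREG follows it. Here \hat{L} is the maximized likelihood of the fitted model and p is the number of estimated parameters:

$$\mathrm{AIC} = -2\ln\hat{L} + 2p$$

Lower values indicate a better balance between goodness of fit and model complexity. The statistic is therefore mainly useful for comparing this model with the quadratic model fitted later in the example.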
PROC FREQ produces a misclassification table (actual versus classified target levels) for the training data
set. The F_C variable contains the actual target value for each case, and the I_C
variable contains the target value into which the case is classified.
proc freq data=out;
tables f_c*i_c;
title2 'Misclassification Table: Training Data';
run;
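The misclassification rate can be read from this table. It is the proportion of cases that lie off the diagonal, that is, cases whose classified level I_C differs from the actual level F_C. Let n_{jk} be the number of cases with F_C = j and I_C = k, and let n be the total number of cases:

$$\text{misclassification rate} = \frac{1}{n}\sum_{j \neq k} n_{jk} = 1 - \frac{1}{n}\sum_{j} n_{jj}$$

This value should agree with _MISC_ in the fit statistics data set, assuming both are computed on the same training cases.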
PROC GPLOT produces a plot of the classification results for the training
data.
proc gplot data=out;
plot y*x=i_c / haxis=axis1 vaxis=axis2;
symbol c=black i=none v=dot;
symbol2 c=red i=none v=square;
symbol3 c=green i=none v=triangle;
axis1 c=black width=2.5 order=(0 to 30 by 5);
axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
title2 'Classification Results';
run;
PROC GCONTOUR produces plots of the posterior probabilities.
proc gcontour data=gridout;
plot y*x=p_c1 / pattern ctext=black coutline=gray;
plot y*x=p_c2 / pattern ctext=black coutline=gray;
plot y*x=p_c3 / pattern ctext=black coutline=gray;
title2 'Posterior Probabilities';
pattern v=msolid;
legend frame;
run;
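The following sketch shows how the posterior probabilities P_C1, P_C2, and P_C3 in the contour plots relate to the model. It assumes a cumulative-logit (proportional-odds) parameterization for the ordinal target C, with levels 1 < 2 < 3 and inputs x and y. The linear MODEL statement is not shown in this excerpt. The intercepts \alpha_1, \alpha_2 and slopes \beta_1, \beta_2 are illustrative symbols, and \sigma(t) = 1/(1 + e^{-t}):

$$\Pr(C \le j \mid x, y) = \sigma(\alpha_j + \beta_1 x + \beta_2 y), \qquad j = 1, 2$$

$$\pi_1 = \Pr(C \le 1), \qquad \pi_2 = \Pr(C \le 2) - \Pr(C \le 1), \qquad \pi_3 = 1 - \Pr(C \le 2)$$

Here \pi_j corresponds to P_Cj, and the three probabilities sum to 1 at every grid point. Without a decision matrix, I_C is presumably the level with the largest posterior probability.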
The MODEL statement specifies the quadratic-logistic model. The vertical
bars request all interactions among the specified inputs, and @2
restricts the generated effects to those of second order or lower.
proc dmreg data=sampsio.dmdring dmdbcat=sampsio.dmdring;
class c;
model c=x|x|y|y @2;
score out=qout outfit=qfit;
score data=sampsio.dmsring nodmdb out=qgridout;
title1 'Quadratic-Logistic Regression with Ordinal Target';
run;
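The following worked expansion assumes that duplicate effects are dropped. Crossing x|x|y|y generates every product of the listed inputs, and @2 keeps only the products of at most two factors. That leaves the effects x, x*x, y, x*y, and y*y. The model is again assumed to use a cumulative-logit parameterization for the ordinal target C (levels 1 < 2 < 3), with illustrative coefficients \beta_1 through \beta_5. Under that assumption, each cumulative logit has the quadratic form

$$\operatorname{logit}\Pr(C \le j \mid x, y) = \alpha_j + \beta_1 x + \beta_2 x^2 + \beta_3 y + \beta_4 xy + \beta_5 y^2, \qquad j = 1, 2$$

The squared and cross-product terms allow the class boundaries to curve. A model that is linear in x and y can only produce straight-line boundaries.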
PROC PRINT produces a report of selected fit statistics for the training
data.
proc print data=qfit noobs label;
var _aic_ _max_ _rfpe_ _misc_;
title2 'Fit Statistics for the Training Data Set';
run;
PROC FREQ creates a report of the misclassification matrix for the training
data set.
proc freq data=qout;
tables f_c*i_c;
title2 'Misclassification Table: Training Data';
run;
PROC GPLOT plots the classification results for the training data set.
proc gplot data=qout;
plot y*x=i_c / haxis=axis1 vaxis=axis2;
symbol c=black i=none v=dot;
symbol2 c=red i=none v=square;
symbol3 c=green i=none v=triangle;
axis1 c=black width=2.5 order=(0 to 30 by 5);
axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
title2 'Classification Results';
run;
PROC GCONTOUR plots the posterior probabilities.
proc gcontour data=qgridout;
plot y*x=p_c1 / pattern ctext=black coutline=gray;
plot y*x=p_c2 / pattern ctext=black coutline=gray;
plot y*x=p_c3 / pattern ctext=black coutline=gray;
title2 'Posterior Probabilities';
pattern v=msolid;
legend frame;
run;
The DMREG Procedure
Example 2: Performing a Stepwise OLS Regression (DMREG Baseball Data)
Features
- Stepwise regression using the SBC selection criterion
- Scoring a test data set with the SCORE statement
- Outputting fit statistics
- Creating diagnostic plots
This example demonstrates how to perform a stepwise OLS regression using the DMREG procedure. The example DMDB training data
set SAMPSIO.DMDBASE (baseball data set) contains performance measures and salary levels for regular hitters and leading substitute
hitters in major league baseball for the year 1986 (Collier 1987). There is one observation per hitter. The continuous response variable is
the log of the player's salary (logsalar). The SAMPSIO.DMTBASE data set is a test data set that is scored using the scoring formula
from the trained model. The SAMPSIO.DMBASE and SAMPSIO.DMTBASE data sets and the SAMPSIO.DMDBASE data mining
catalog are stored in the sample library.
Program
proc dmreg data=sampsio.dmdbase dmdbcat=sampsio.dmdbase
testdata=sampsio.dmtbase outest=regest;
class league division position;
model logsalar = no_atbat no_hits no_home no_runs no_rbi no_bb
yr_major cr_atbat cr_hits cr_home cr_runs
cr_rbi cr_bb league division position no_outs
no_assts no_error
/ error=normal
choose=sbc
selection=stepwise
slentry=0.25 slstay=0.25;
score data=sampsio.dmtbase nodmdb
out=regout(rename=(p_logsal=predict r_logsal=residual));
title 'Output from the DMREG Procedure';
run;
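This is a sketch of the SBC criterion requested by CHOOSE=SBC. As we understand the option, the stepwise search produces a sequence of candidate models, and the one with the smallest SBC is kept. Effects enter when their significance level is below SLENTRY=0.25 and are removed when it rises above SLSTAY=0.25. For a least-squares fit with normal errors, one common form is the one PROC REG reports. Here n is the number of observations, p the number of estimated parameters, and SSE the error sum of squares:

$$\mathrm{SBC} = n\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + p\ln n$$

The per-parameter penalty is \ln n for SBC and 2 for AIC. SBC therefore penalizes each parameter more heavily than AIC whenever \ln n > 2, that is, for n \ge 8, and it tends to favor smaller models.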