The TRAIN statement trains the network; that is, it searches for the weights
(parameter estimates) that best fit the training data.
train;
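Conceptually, training adjusts the weights to minimize an error function over the training cases. The Python sketch below illustrates that idea with plain batch gradient descent on the mean squared error of a single linear unit. The data, learning rate, and iteration count are made-up assumptions. PROC NEURAL uses its own optimization techniques, not this loop.

```python
def train_linear_unit(xs, ys, lr=0.05, iters=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error.

    Illustrative only: not the optimizer PROC NEURAL uses.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_unit(xs, ys)
print(round(w, 3), round(b, 3))
```

A real network has many weights and a nonlinear error surface, which is why the starting values and the choice of optimizer matter more there than in this one-unit case.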
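The first SCORE statement scores the training data. The OUT= option
names the output data set that contains the scored observations (for example,
the actual target value, the classification, and the posterior probabilities
for each case). The OUTFIT= option names the output data set that contains
the fit statistics.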
score out=out outfit=fit;
The second SCORE statement scores a separate data set, SAMPSIO.DMSRING, in
the same PROC NEURAL step as the training and writes the results to GRIDOUT.
score data=sampsio.dmsring out=gridout;
title 'MLP with 3 Hidden Units';
run;
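Target C is nominal, with levels 1, 2, and 3. Each scored observation therefore gets one posterior probability per level (P_C1, P_C2, P_C3), and the case is classified into the level with the largest probability (I_C). The Python sketch below shows that last step under two assumptions: the output layer applies a softmax, and the activation values are made up for illustration.

```python
import math

def softmax(activations):
    """Convert raw output-unit activations into probabilities that sum to 1."""
    m = max(activations)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def classify(activations, levels):
    """Return the posterior probabilities and the level with the largest one."""
    probs = softmax(activations)
    best = max(range(len(levels)), key=lambda i: probs[i])
    return probs, levels[best]

# Hypothetical output activations for one case and the three levels of target C
probs, predicted = classify([0.2, 2.1, -1.0], levels=["1", "2", "3"])
print([round(p, 3) for p in probs], predicted)
```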
PROC PRINT lists selected training fit statistics.
proc print data=fit noobs label;
var _aic_ _ase_ _max_ _rfpe_ _misc_ _wrong_;
where _name_ = 'OVERALL';
title2 'Fit Statistics for the Training Data Set';
run;
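Four of the listed statistics can be computed directly from the actual levels and the posterior probabilities: _ASE_ (average squared error), _MAX_ (maximum absolute error), _WRONG_ (number of misclassified cases), and _MISC_ (misclassification rate). The Python sketch below makes one assumption about _ASE_ and _MAX_: they compare the 0/1 indicator of each target level against its posterior probability, and _ASE_ averages over every case-level pair. The exact divisors SAS uses may differ. The data are hypothetical, and _AIC_ and _RFPE_ are not shown.

```python
def fit_statistics(actual, posteriors, levels):
    """Compute ASE, MAX, WRONG, and MISC for a nominal target.

    actual:     true level for each case
    posteriors: one list of probabilities per case, ordered like `levels`
    """
    sq_errors, abs_errors, wrong = [], [], 0
    for y, probs in zip(actual, posteriors):
        for level, p in zip(levels, probs):
            indicator = 1.0 if y == level else 0.0  # 1 for the true level, else 0
            sq_errors.append((indicator - p) ** 2)
            abs_errors.append(abs(indicator - p))
        # Classify into the level with the largest posterior probability
        predicted = levels[max(range(len(levels)), key=lambda i: probs[i])]
        if predicted != y:
            wrong += 1
    return {
        "ASE": sum(sq_errors) / len(sq_errors),  # assumed divisor: cases * levels
        "MAX": max(abs_errors),
        "WRONG": wrong,
        "MISC": wrong / len(actual),
    }

levels = ["1", "2", "3"]
actual = ["1", "2", "3", "1"]  # hypothetical actual levels
posteriors = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.3, 0.6, 0.1]]
stats = fit_statistics(actual, posteriors, levels)
print(stats)
```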
PROC FREQ creates a misclassification table for the training data. The
F_C variable is the actual target value for each case, and the I_C variable
is the target level into which the network classifies the case.
proc freq data=out;
tables f_c*i_c;
title2 'Misclassification Table';
run;
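The F_C by I_C table counts the cases for each (actual, classified) pair. The diagonal cells hold the correctly classified cases and the off-diagonal cells hold the misclassified ones. The Python sketch below builds the same cross-tabulation from hypothetical values.

```python
from collections import Counter

def misclassification_table(actual, predicted):
    """Count cases for each (actual, classified) pair, like TABLES F_C*I_C."""
    return Counter(zip(actual, predicted))

f_c = ["1", "2", "3", "1", "2"]  # hypothetical actual levels
i_c = ["1", "2", "2", "2", "2"]  # hypothetical classified levels
table = misclassification_table(f_c, i_c)

levels = sorted(set(f_c) | set(i_c))
for a in levels:
    # One row per actual level; columns are the classified levels
    print(a, [table[(a, p)] for p in levels])

# Off-diagonal cells are the misclassified cases
wrong = sum(n for (a, p), n in table.items() if a != p)
```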
PROC GPLOT plots the classification results for the training data.
proc gplot data=out;
plot y*x=i_c /haxis=axis1 vaxis=axis2;
symbol c=black i=none v=dot;
symbol2 c=black i=none v=square;
symbol3 c=black i=none v=triangle;
axis1 c=black width=2.5 order=(0 to 30 by 5);
axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
title2 'Classification Results';
run;
PROC GCONTOUR produces a contour plot of the posterior probabilities
for the scored data set.
proc gcontour data=gridout;
plot y*x=p_c1 / pattern ctext=black coutline=gray;
plot y*x=p_c2 / pattern ctext=black coutline=gray;
plot y*x=p_c3 / pattern ctext=black coutline=gray;
pattern v=msolid;
legend frame;
title2 'Posterior Probabilities';
run;
The NEURAL Procedure
Example 2: Developing a Neural Network for a Continuous Target
Features
- Specifying Input, Hidden, and Output Layers
- Defining Direct Connections
- Scoring Data with the SCORE Statement
- Outputting Fit Statistics
This example demonstrates how to develop a neural network model for a continuous target. It uses a simple multilayer perceptron
architecture with one hidden unit and direct connections from the inputs to the target. The example DMDB training data set SAMPSIO.DMDBASE
(baseball data set) contains performance measures and salary levels for regular hitters and leading substitute hitters in major
league baseball for the year 1986 (Collier 1987). There is one observation per hitter. The continuous target variable is the log of salary
(logsalar).
Before the neural network model was fit, the original set of model inputs was reduced with a preliminary stepwise PROC
DMREG run. The input set from the model with the smallest SBC (Schwarz's Bayesian Criterion) was used as input to the network.
The output from the PROC DMREG analysis appears in the PROC DMREG chapter, "Example 2. Performing a Stepwise
OLS Regression".
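For a least-squares model with n observations, p parameters, and error sum of squares SSE, the usual form of SBC is n·ln(SSE/n) + p·ln(n), and smaller values are better. The p·ln(n) penalty is what lets a smaller input set win even when its SSE is slightly larger. The Python sketch below applies that formula to hypothetical candidate models. The step names, SSE values, and n are made up and do not come from the DMREG run.

```python
import math

def sbc(sse, n, p):
    """Schwarz's Bayesian Criterion for a least-squares model: n*ln(SSE/n) + p*ln(n)."""
    return n * math.log(sse / n) + p * math.log(n)

# Hypothetical stepwise candidates: (label, SSE, number of parameters)
n = 200
candidates = [("step 3", 52.0, 4), ("step 5", 47.5, 6), ("step 8", 46.9, 9)]
scores = {label: sbc(sse, n, p) for label, sse, p in candidates}
best = min(scores, key=scores.get)  # keep the input set with the smallest SBC
print(best)
```

Here the step 8 model has the lowest SSE, but its extra parameters cost more in penalty than they gain in fit.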
The SAMPSIO.DMTBASE data set is a test data set that is scored using the scoring formula from the trained model. The
SAMPSIO.DMBASE and SAMPSIO.DMTBASE data sets and the SAMPSIO.DMDBASE catalog are stored in the sample library.
Program
proc dmreg data=sampsio.dmdbase dmdbcat=sampsio.dmdbase
testdata=sampsio.dmtbase outest=regest;
class league division position;
model logsalar = no_atbat no_hits no_home no_runs no_rbi no_bb
yr_major cr_atbat cr_hits cr_home cr_runs
cr_rbi cr_bb league division position no_outs
no_assts no_error /
error=normal selection=stepwise
slentry=0.25 slstay=0.25 choose=sbc;
title1 'Preliminary DMREG Stepwise Selection';
run;
proc neural data=sampsio.dmdbase
dmdbcat=sampsio.dmdbase
random=12345;
input cr_hits no_hits no_outs no_error no_bb
/ level=interval id=int;
input division / level=nominal id=nom;
hidden 1 / id=hu;
target logsalar /
level=interval
id=tar ;
connect int tar;
connect nom tar;
connect int hu;
connect nom hu;
connect hu tar;
prelim 10;
train;
score data=sampsio.dmtbase outfit=netfit
out=netout(rename=(p_logsal=predict r_logsal=residual));
title 'NN:1 Hidden Unit, Direct Connections,
and Reduced Input Set';
run;
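The CONNECT statements wire every input (interval and nominal) both to the hidden unit and, through the direct connections, straight to the target. The prediction is therefore a linear model in the inputs plus one nonlinear term from the hidden unit. The Python sketch below writes out that forward pass under several assumptions: a tanh hidden activation, an identity output activation, DIVISION encoded as a single 0/1 dummy, and made-up weights.

```python
import math

def predict(x, hidden_bias, hidden_weights, out_bias, direct_weights, hidden_to_out):
    """One-hidden-unit MLP with direct input-to-output connections.

    x: interval inputs followed by the encoded nominal input(s)
    """
    # Hidden unit: tanh of a weighted sum of all inputs (CONNECT INT HU, CONNECT NOM HU)
    h = math.tanh(hidden_bias + sum(u * xi for u, xi in zip(hidden_weights, x)))
    # Output: direct linear term (CONNECT INT TAR, CONNECT NOM TAR)
    # plus the hidden unit's contribution (CONNECT HU TAR); identity activation assumed
    return out_bias + sum(w * xi for w, xi in zip(direct_weights, x)) + hidden_to_out * h

# Hypothetical weights: 5 standardized interval inputs plus 1 DIVISION dummy
x = [0.5, -1.0, 0.2, 0.0, 1.5, 1.0]
yhat = predict(x, hidden_bias=0.1, hidden_weights=[0.3, -0.2, 0.1, 0.0, 0.4, -0.5],
               out_bias=5.9, direct_weights=[0.6, 0.1, 0.05, -0.02, 0.2, 0.1],
               hidden_to_out=0.7)
print(yhat)
```

With the hidden-to-output weight set to zero, the network reduces to an ordinary linear regression on the same inputs. This is one way to see what the direct connections contribute.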
proc print data=netfit noobs label;
where _name_ = 'LOGSALAR';
var _iter_ _pname_ _tmse_ _trmse_ _tmax_;
title 'Partial Listing of the Score OUTFIT= Data Set';
run;
proc gplot data=netout;
plot logsalar*predict / haxis=axis1 vaxis=axis2;
symbol c=black i=none v=dot h=3 pct;
axis1 c=black width=2.5;
axis2 c=black width=2.5;
title 'Diagnostic Plots for the Scored Test Baseball Data';
plot residual*predict / haxis=axis1 vaxis=axis2;
run;
quit;
Output