The arboretum procedure




 

The TRAIN statement trains the network to find the weights (parameter estimates) that best fit the training data.

   


    train;


 

The first SCORE statement scores the training data. The OUT= option identifies the output data set that contains the outputs. The OUTFIT= option identifies the output data set that contains the fit statistics.

    

   score out=out outfit=fit;




 

The second SCORE statement specifies the score data set that you want to score in conjunction with training.

   score data=sampsio.dmsring out=gridout;
   title 'MLP with 3 Hidden Units';
run;



 

PROC PRINT lists selected training fit statistics.

proc print data=fit noobs label;
   var _aic_ _ase_ _max_ _rfpe_ _misc_ _wrong_;
   where _name_ = 'OVERALL';
   title2 'Fit Statistics for the Training Data Set';
run;



 

PROC FREQ creates a misclassification table for the training data. The F_C variable is the actual target value for each case and the I_C variable is the target value into which the case is classified.

proc freq data=out;
   tables f_c*i_c;
   title2 'Misclassification Table';
run;
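
To make the layout of the misclassification table concrete, here is a minimal Python sketch (not SAS) that cross-tabulates actual against predicted classes and derives a misclassification rate. The label lists are made-up illustrative values, not rows from the example's OUT= data set, and `crosstab` and `misclassification_rate` are hypothetical helper names.

```python
from collections import Counter


def crosstab(actual, predicted):
    """Count cases for each (actual, predicted) pair, similar to TABLES f_c*i_c."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def misclassification_rate(table):
    """Fraction of cases whose predicted class differs from the actual class."""
    total = sum(table.values())
    wrong = sum(n for (a, p), n in table.items() if a != p)
    return wrong / total


# Hypothetical labels for illustration only (assumption, not the ring data).
f_c = [1, 1, 2, 2, 3, 3, 3, 1]  # actual classes
i_c = [1, 2, 2, 2, 3, 1, 3, 1]  # classes into which the cases were classified

table = crosstab(f_c, i_c)
rate = misclassification_rate(table)
```

The off-diagonal cells of the table (actual class differs from predicted class) are the misclassified cases; the diagonal cells are the correct classifications.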



 

PROC GPLOT plots the classification results for the training data.

proc gplot data=out;
   plot y*x=i_c /haxis=axis1 vaxis=axis2;
   symbol c=black i=none v=dot;
   symbol2 c=black i=none v=square;
   symbol3 c=black i=none v=triangle;
   axis1 c=black width=2.5 order=(0 to 30 by 5);
   axis2 c=black width=2.5 minor=none order=(0 to 20 by 2);
   title2 'Classification Results';
run;



 

PROC GCONTOUR produces a contour plot of the posterior probabilities for the scored data set.

proc gcontour data=gridout;
   plot y*x=p_c1 / pattern ctext=black coutline=gray;
   plot y*x=p_c2 / pattern ctext=black coutline=gray;
   plot y*x=p_c3 / pattern ctext=black coutline=gray;
   pattern v=msolid;
   legend frame;
   title2 'Posterior Probabilities';
run;



The NEURAL Procedure

Example 2: Developing a Neural Network for a Continuous Target

Features

• Specifying Input, Hidden, and Output Layers
• Defining Direct Connections
• Scoring Data with the SCORE Statement
• Outputting Fit Statistics

This example demonstrates how to develop a neural network model for a continuous target. It uses a simple multilayer perceptron architecture with one hidden unit and direct connections. The example DMDB training data set SAMPSIO.DMBASE (baseball data set) contains performance measures and salary levels for regular hitters and leading substitute hitters in major league baseball for the year 1986 (Collier 1987). There is one observation per hitter. The continuous target variable is the log of salary (logsalar).
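
The architecture named above, one hidden unit plus direct input-to-target connections, can be sketched as a forward pass. This is a minimal Python illustration (not SAS). It assumes a tanh hidden activation and an identity output activation, and every weight and input value is hypothetical rather than an estimate from the baseball data.

```python
import math


def predict(x, w_direct, w_hidden, b_hidden, v_hidden, b_out):
    """Forward pass for an MLP with one hidden unit and direct connections.

    x        : list of input values
    w_direct : weights on the direct input -> target connections
    w_hidden : weights on the input -> hidden unit connections
    b_hidden : bias of the hidden unit
    v_hidden : weight on the hidden unit -> target connection
    b_out    : bias of the target (output) unit
    """
    # Assumption: tanh hidden activation, identity output activation.
    h = math.tanh(b_hidden + sum(w * xi for w, xi in zip(w_hidden, x)))
    linear_part = sum(w * xi for w, xi in zip(w_direct, x))
    return b_out + linear_part + v_hidden * h


# Hypothetical weights and inputs for illustration only.
x = [1.0, 2.0]
y_hat = predict(x, w_direct=[0.5, -0.25], w_hidden=[0.0, 0.0],
                b_hidden=0.0, v_hidden=2.0, b_out=1.0)
```

In this sketch, setting `v_hidden` to zero leaves only the bias and the direct connections, which is an ordinary linear model in the inputs; the hidden unit adds a single nonlinear term on top of it.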

Before the neural network model was fit, the number of original model inputs was reduced based on a preliminary stepwise PROC DMREG run. The input set from the model with the smallest SBC (Schwarz's Bayesian Criterion) is used as input to the network. The output from the PROC DMREG analysis can be found in the PROC DMREG chapter, "Example 2. Performing a Stepwise OLS Regression".
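
Choosing the model with the smallest SBC can be illustrated with a short Python sketch (not SAS). It assumes one common least-squares form of the criterion, n·ln(SSE/n) + p·ln(n); the exact computation that PROC DMREG performs is not given in this section. The candidate models, their SSE values, and the observation count are hypothetical.

```python
import math


def sbc(sse, n, p):
    """Schwarz's Bayesian Criterion for a least-squares fit.

    Assumption: the common form n*ln(SSE/n) + p*ln(n), where p counts the
    estimated parameters including the intercept.
    """
    if n <= 0 or sse <= 0:
        raise ValueError("n and sse must be positive")
    return n * math.log(sse / n) + p * math.log(n)


# Hypothetical candidate models: (name, SSE, number of parameters).
candidates = [("small", 120.0, 3), ("medium", 100.0, 6), ("large", 98.0, 12)]
n_obs = 200  # hypothetical number of observations

scores = {name: sbc(sse, n_obs, p) for name, sse, p in candidates}
best = min(scores, key=scores.get)  # smallest SBC wins
```

The ln(n) penalty per parameter is why the "large" candidate loses here: its slightly lower SSE does not pay for six additional parameters.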

The SAMPSIO.DMTBASE data set is a test data set that is scored using the scoring formula from the trained model. The SAMPSIO.DMBASE and SAMPSIO.DMTBASE data sets and the SAMPSIO.DMDBASE catalog are stored in the sample library.



Program

 

proc dmreg data=sampsio.dmdbase dmdbcat=sampsio.dmdbase
   testdata=sampsio.dmtbase outest=regest;
   class league division position;
   model logsalar = no_atbat no_hits no_home no_runs no_rbi no_bb
                    yr_major cr_atbat cr_hits cr_home cr_runs
                    cr_rbi cr_bb league division position no_outs
                    no_assts no_error /
                    error=normal selection=stepwise
                    slentry=0.25 slstay=0.25 choose=sbc;
   title1 'Preliminary DMREG Stepwise Selection';
run;


 

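proc neural data=sampsio.dmdbase
            dmdbcat=sampsio.dmdbase
            random=12345;

   /* Input layers: interval inputs (id=int) and one nominal input (id=nom) */
   input cr_hits no_hits no_outs no_error no_bb
         / level=interval id=int;
   input division / level=nominal id=nom;

   /* Hidden layer with a single unit */
   hidden 1 / id=hu;

   /* Output layer for the continuous target */
   target logsalar / level=interval id=tar;

   /* Direct connections from the inputs to the target */
   connect int tar;
   connect nom tar;

   /* Connections through the hidden unit */
   connect int hu;
   connect nom hu;
   connect hu tar;

   /* Preliminary training, followed by full training */
   prelim 10;
   train;

   /* Score the test data; save fit statistics and predictions */
   score data=sampsio.dmtbase outfit=netfit
         out=netout(rename=(p_logsal=predict r_logsal=residual));

   title 'NN:1 Hidden Unit, Direct Connections, and Reduced Input Set';
run;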

 

proc print data=netfit noobs label;
   where _name_ = 'LOGSALAR';
   var _iter_ _pname_ _tmse_  _trmse_ _tmax_;
   title 'Partial Listing of the Score OUTFIT= Data Set';
run;
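
The error statistics in this listing summarize the residuals from scoring the test data. As a rough Python illustration (not SAS), the sketch below computes a mean squared error, its square root, and the maximum absolute error from a few hypothetical residuals. It divides by the number of cases; the exact divisor behind `_tmse_` is not stated in this section, so the correspondence to the listed variables is an assumption.

```python
import math


def error_summary(residuals):
    """Summarize residuals (actual minus predicted) with three error measures."""
    if not residuals:
        raise ValueError("residuals must not be empty")
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n  # assumption: divisor is n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "max_abs": max(abs(r) for r in residuals),
    }


# Hypothetical residuals for illustration only.
residuals = [0.5, -0.5, 1.0, -1.0]
stats = error_summary(residuals)
```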


 

proc gplot data=netout;
   plot logsalar*predict / haxis=axis1 vaxis=axis2;
   symbol c=black i=none v=dot h=3 pct;
   axis1 c=black width=2.5;
   axis2 c=black width=2.5;
   title 'Diagnostic Plots for the Scored Test Baseball Data';
   plot residual*predict / haxis=axis1 vaxis=axis2;
run;
quit;


Output

