PROC DMNEURL: Approximation to PROC NEURAL
set of about 8 activation functions and select the best result. Since the optimization
processes for different activation functions do not depend on each other, the computer
time could be reduced greatly by parallel processing.
Except for applications where PROC NEURAL would hit a local solution much
worse than the global solution, PROC DMNEURL is not expected to beat
PROC NEURAL in the precision of the prediction. However, in the applications we
have run so far, the results of PROC DMNEURL were very close to those of
PROC NEURAL. PROC DMNEURL will be faster than PROC NEURAL only for
very large data sets. For small data sets, PROC NEURAL could be much faster than
PROC DMNEURL, especially for an interval target. The most efficient application
of PROC DMNEURL is the analysis of a binary target variable without FREQ and
WEIGHT statements and without COST variables in the input data set.
Application: HMEQ Data Set: Binary Target BAD
To illustrate the use of PROC DMNEURL we choose the HMEQ data set:
   libname sampsio '/sas/a612/dmine/sampsio';
   proc dmdb batch data=sampsio.hmeq out=dmdbout dmdbcat=outcat;
      var LOAN MORTDUE VALUE YOJ DELINQ CLAGE NINQ CLNO DEBTINC;
      class BAD(DESC) REASON(ASC) JOB(ASC) DEROG(ASC);
      target BAD;
   run;
When the binary target variable BAD is selected, a typical run of PROC DMNEURL
would be the following:
   proc dmneurl data=dmdbout dmdbcat=outcat
                outclass=oclass outest=estout out=dsout outfit=ofit
                ptable maxcomp=3 maxstage=5;
      var LOAN MORTDUE VALUE REASON JOB YOJ DEROG DELINQ
          CLAGE NINQ CLNO DEBTINC;
      target BAD;
   run;
The number of parameters p estimated in each stage of the optimization is
p = 2 * c + 1, where c is the number of components that is selected at the stage.
Since here c = 3 is specified with the MAXCOMP= option, each optimization process
estimates only p = 7 parameters.
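The parameter count can be checked with a few lines of arithmetic. The following is only a sketch of the formula p = 2 * c + 1 stated above; the function name params_per_stage is a hypothetical helper introduced for illustration and is not part of PROC DMNEURL.

```python
def params_per_stage(n_components: int) -> int:
    """Parameters estimated in one stage: p = 2 * c + 1 for c selected components."""
    if n_components < 1:
        raise ValueError("at least one component is required")
    return 2 * n_components + 1


# MAXCOMP=3 in the run above, so each stage estimates 2 * 3 + 1 parameters.
print(params_per_stage(3))
```

Because the count depends only on the number of selected components, raising MAXCOMP= by one adds two parameters to every stage.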
First, some general information is printed, followed by the first four moments of
the numeric data set variables involved in the analysis:
           The DMNEURL Procedure

   Binary Target                      BAD
   Number Observations               5960
   NOBS w/o Missing Target           5960
   Link Function                   LOGIST
   Selection Criterion                SSE
   Optimization Criterion             SSE
   Estimation Stages                    5
   Max. Number Components               3
   Minimum R2 Value              0.000050
   Number Grid Points                  17
      Response Profile for Target: BAD

   Level      Nobs   Frequency          Weight
       1      1189        1189     1189.000000
       0      4771        4771     4771.000000
Variable          Mean     Std Dev    Skewness    Kurtosis
LOAN             18608       11207     2.02378     6.93259
MORTDUE          67350       44458     1.81448     6.48187
VALUE            99863       57386     3.05334    24.36280
YOJ            8.15130     7.57398     0.98846     0.37207
DELINQ         0.40570     1.12727     4.02315    23.56545
CLAGE        170.47634    85.81009     1.34341     7.59955
NINQ           1.08456     1.72867     2.62198     9.78651
CLNO          20.50285    10.13893     0.77505     1.15767
DEBTINC       26.59885     8.60175     2.85235    50.50404
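For readers who want to reproduce such a moments table outside of SAS, the sketch below computes the same four statistics for one variable. It assumes the usual bias-corrected sample formulas (divisor n - 1 for the variance, and excess kurtosis, so that a normal distribution gives 0); whether PROC DMNEURL uses exactly these estimators is an assumption, and sample_moments is a hypothetical helper name.

```python
import math


def sample_moments(values):
    """Return (mean, std_dev, skewness, excess_kurtosis) of a list of numbers.

    Uses the bias-corrected sample formulas; needs at least 4 observations
    and a nonzero standard deviation.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    std = math.sqrt(sum(d * d for d in dev) / (n - 1))
    if std == 0.0:
        raise ValueError("all values are identical")
    z3 = sum((d / std) ** 3 for d in dev)  # sum of cubed standardized deviations
    z4 = sum((d / std) ** 4 for d in dev)  # sum of fourth powers
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return mean, std, skew, kurt


# Toy data, not taken from HMEQ: a symmetric sample has zero skewness.
print(sample_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Missing values, which do occur in HMEQ, would have to be dropped before calling the function; the sketch does not handle them.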
For the first stage we select three eigenvectors corresponding to the 4th, 11th, and
2nd largest eigenvalues. Obviously, there is no relationship between
   • the R^2 value, which measures the prediction of the response (target) variable
     by each eigenvector,
   • and the eigenvalue corresponding to each eigenvector, which measures the
     variance explained in the X^T X data matrix.
Therefore, the eigenvalues are not used in the analysis of PROC DMNEURL and are
printed only for curiosity.
    Component Selection: SS(y) and R2 (SS_total=4771)

Comp        Eigval   R-Square      F Value  p-Value           SSE
   4   9397.769045   0.017419   105.640645   <.0001   4687.893424
  11   6327.041282   0.006317    38.550835   <.0001   4657.755732
   2         13164   0.005931    36.408247   <.0001   4629.461194
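The printed numbers can be cross-checked with a short script. The sketch below rests on an assumption read off the table rather than stated in the text: each SSE value appears to be the previous SSE minus R-Square times SS_total, starting from SS_total = 4771. It also compares the ordering of the three components by R-Square with their ordering by eigenvalue. The names running_sse and ROWS are hypothetical.

```python
SS_TOTAL = 4771.0

# (component, eigenvalue, R-square, printed SSE), copied from the table above
ROWS = [
    (4, 9397.769045, 0.017419, 4687.893424),
    (11, 6327.041282, 0.006317, 4657.755732),
    (2, 13164.0, 0.005931, 4629.461194),
]


def running_sse(ss_total, r_squares):
    """Subtract each component's share R2 * SS_total from the running SSE."""
    sse = ss_total
    result = []
    for r2 in r_squares:
        sse -= r2 * ss_total
        result.append(sse)
    return result


reconstructed = running_sse(SS_TOTAL, [row[2] for row in ROWS])
for (comp, _, _, printed), value in zip(ROWS, reconstructed):
    print(f"comp {comp:2d}: printed SSE {printed:.4f}, reconstructed {value:.4f}")

# Selection order (by R-square) versus order by eigenvalue
by_r2 = [row[0] for row in sorted(ROWS, key=lambda r: r[2], reverse=True)]
by_eigval = [row[0] for row in sorted(ROWS, key=lambda r: r[1], reverse=True)]
print("by R-square:  ", by_r2)
print("by eigenvalue:", by_eigval)
```

The reconstructed values agree with the printed SSE column up to the rounding of R-Square to six decimals, and the two orderings differ, which is consistent with the remark that the eigenvalue says little about how well a component predicts the target.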
The optimization history indicates a maximum of 11 iterations for the activation
function LOGIST: