contains the following columns:
- N0: number of observations that have an observed value of PURCHASE = 'No'
- N1: number of observations that have an observed value of PURCHASE = 'Yes'
- Nmiss: number of missing values
- X: center value of the corresponding subinterval
- P: predicted value for the center value X of the subinterval
Estimating logistic
Iter Alpha Beta
0 -2.0520 4.0998
1 -3.1242 6.1777
2 -3.6975 7.2727
3 -3.8236 7.5127
4 -3.8284 7.5219
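The Alpha and Beta columns above are the intercept and slope of a logistic fit, refined by an iterative Newton-type update until the estimates stabilize. As an illustration only (the actual data and update rule used by the procedure are not shown here), a minimal Newton-Raphson fit of a one-variable logistic model on synthetic data converges in a handful of iterations, matching the quick convergence seen in this table:

```python
import numpy as np

def fit_logistic_1d(x, y, tol=1e-4, max_iter=25):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-(alpha + beta*x)))."""
    A = np.column_stack([np.ones_like(x), x])  # design matrix: intercept, x
    theta = np.zeros(2)                        # (alpha, beta), start at zero
    for it in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-A @ theta))   # current predicted probabilities
        grad = A.T @ (y - p)                   # gradient of the log-likelihood
        w = p * (1.0 - p)                      # IRLS weights
        hess = A.T @ (A * w[:, None])          # observed information matrix
        step = np.linalg.solve(hess, grad)     # Newton step
        theta += step
        if np.max(np.abs(step)) < tol:         # stop when estimates stabilize
            break
    return theta, it
```

With well-conditioned data, each Newton step roughly doubles the number of correct digits, which is why only four or five iterations are needed above.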
N0 N1 Nmiss X P
----------------------------------------------------
46 1 0 -0.135 0.0078
2 0 0 -0.130 0.0081
1 0 0 -0.125 0.0084
1 0 0 -0.120 0.0087
4 0 0 -0.115 0.0091
4 0 0 -0.110 0.0094
1 0 0 -0.105 0.0098
1 0 0 -0.100 0.0101
2 0 0 -0.095 0.0105
3 0 0 -0.090 0.0109
3 0 0 -0.085 0.0113
4 0 0 -0.080 0.0118
1 0 0 -0.075 0.0122
2 0 0 -0.070 0.0127
Additional subintervals are not listed.
0 4 0 1.085 0.9870
0 3 0 1.090 0.9875
0 2 0 1.095 0.9880
0 1 0 1.100 0.9884
0 4 0 1.105 0.9888
0 2 0 1.110 0.9892
0 1 0 1.115 0.9896
0 1 0 1.125 0.9904
0 4 0 1.130 0.9907
0 23 0 1.140 0.9914
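As a consistency check (not part of the procedure's output), the P column can be reproduced from the converged estimates in the iteration history (Alpha = -3.8284, Beta = 7.5219) with the standard logistic function:

```python
import math

ALPHA, BETA = -3.8284, 7.5219  # converged estimates from the iteration history

def predicted(x):
    """Logistic prediction P = 1 / (1 + exp(-(ALPHA + BETA*x)))."""
    return 1.0 / (1.0 + math.exp(-(ALPHA + BETA * x)))

# Subinterval centers from the first and last rows of the table above
print(round(predicted(-0.135), 4))  # 0.0078
print(round(predicted(1.140), 4))   # 0.9914
```

These match the first and last P values in the listing, confirming that P is the fitted logistic curve evaluated at each subinterval center X.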
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
Before you analyze the data using the DMINE procedure, you must create
the DMDB encoded data set and catalog. For more information about how to do
this, see "Example 1: Getting Started with the DMDB Procedure"
in the DMDB procedure documentation. Because the DESCENDING ORDER option is
specified for the target PURCHASE in the CLASS statement, the DMINE procedure
reads this encoded information from the metadata and models the probability
that a customer will make a purchase (PURCHASE = 'Yes'). By default, ORDER
is set to ASCENDING for all class variables.
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
id acctnum;
var amount income homeval frequent recency age
domestic apparel leisure promo7 promo13 dpm12
county return mensware flatware homeacc lamps
linens blankets towels outdoor coats wcoat
wappar hhappar jewelry custdate numkids travtime job;
class purchase(desc) marital ntitle gender telind
aprtmnt snglmom mobile kitchen luxury dishes tmktord
statecod race origin heat numcars edlevel;
run;
The PROC DMINE statement invokes the procedure. The DATA= option identifies
the DMDB encoded training data set that is used to fit the model. The DMDBCAT=
option identifies the DMDB training data catalog.
proc dmine data=WORK.dmbexa1 dmdbcat=catexa1
The MINR2= option specifies a lower bound on the individual R-square
value that a variable must meet to be eligible for the model selection process.
Variables with R-square values less than the MINR2= cutoff are not entered into
the model. The STOPR2= option specifies the cutoff for the incremental model
R-square value at which the forward selection process stops.
minr2=0.020 stopr2=0.0050;
The VAR statement specifies the numeric and categorical inputs (independent
variables).
var income homeval frequent recency age
domestic apparel leisure promo7 promo13 dpm12
county return mensware flatware homeacc lamps
linens blankets towels outdoor coats wcoat
wappar hhappar jewelry custdate numkids travtime job
marital ntitle gender telind aprtmnt snglmom mobile
kitchen luxury dishes tmktord statecod race origin heat
numcars edlevel;
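The interplay of the MINR2= and STOPR2= cutoffs can be sketched as a greedy forward-selection loop. The following is an illustrative approximation in Python, not PROC DMINE's actual algorithm (which also handles categorical inputs and binned variables, omitted here):

```python
import numpy as np

def r_square(X, y, cols):
    """Model R-square for a least-squares fit with intercept plus the listed columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, minr2=0.020, stopr2=0.0050):
    """Greedy forward selection with MINR2=/STOPR2=-style cutoffs."""
    # Screening: variables whose individual R-square is below MINR2 never enter.
    eligible = [j for j in range(X.shape[1]) if r_square(X, y, [j]) >= minr2]
    selected, model_r2 = [], 0.0
    while eligible:
        best = max(eligible, key=lambda j: r_square(X, y, selected + [j]))
        gain = r_square(X, y, selected + [best]) - model_r2
        if gain < stopr2:  # incremental model R-square below STOPR2: stop
            break
        selected.append(best)
        model_r2 += gain
        eligible.remove(best)
    return selected, model_r2
```

With the defaults shown (matching the MINR2=0.020 and STOPR2=0.0050 values above), a weak input is screened out before selection begins, and selection halts as soon as the best remaining variable improves the model R-square by less than 0.005.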
PROC DMNEURL: Approximation to PROC NEURAL
Purpose of PROC DMNEURL
In its current form, PROC DMNEURL tries to establish a nonlinear model for the
prediction of a binary or interval-scaled response variable (called a target in data
mining terminology). The approach will soon be extended to nominal- and
ordinal-scaled response variables.
The algorithm used in DMNEURL was developed to overcome some problems of
PROC NEURAL for data mining purposes, especially when the data set contains
many highly collinear variables:
1. The nonlinear estimation problem in common neural networks is seriously
underdetermined, which leads to highly rank-deficient Hessian matrices and
results in extremely slow (close to linear) convergence of nonlinear
optimization algorithms.
=> Full-rank estimation.
2. Each function call in PROC NEURAL corresponds to a single run through
the entire (training) data set, and many function calls are normally needed for
convergent nonlinear optimization with rank-deficient Hessians.
=> Optimization of a discrete problem with all data in core.
3. Because the zero eigenvalues in a Hessian matrix correspond to long and very
flat valleys in the shape of the objective function, the traditional neural net
approach has serious difficulty deciding when an estimate is close to an
appropriate solution and the optimization process can be terminated.
=> Quadratic convergence.
4. For the same reasons, the common neural net algorithms suffer from a high
sensitivity toward finding local rather than global optimal solutions, and the
optimization result is often very sensitive to the starting point of the
optimization.
=> Good starting point.
With PROC DMNEURL we deal with specified optimization problems (with full-rank
Hessian matrices) that have few parameters and for which good starting
points can be obtained. The convergence of the nonlinear optimizer is normally very
fast, mostly requiring fewer than 10 iterations per optimization. The function and
derivative calls during the optimization do not need any passes through the data set;
however, the search for good starting points and the final evaluation of
the solutions (scoring of all observations) require passes through the data, as do
a number of preliminary tasks. In PROC DMNEURL we fit separately an entire