The arboretum procedure


Purpose of PROC DMNEURL

contains the following columns:

N0 - number of observations that have an observed value of PURCHASE = 'No'.

N1 - number of observations that have an observed value of PURCHASE = 'Yes'.

Nmiss - number of missing values.

X - center value of the corresponding interval.

P - predicted value for the center value X of the sub-interval.

Estimating logistic

Iter     Alpha      Beta
   0   -2.0520    4.0998
   1   -3.1242    6.1777
   2   -3.6975    7.2727
   3   -3.8236    7.5127
   4   -3.8284    7.5219

  N0    N1   Nmiss        X        P
----------------------------------------------------
  46     1       0   -0.135   0.0078
   2     0       0   -0.130   0.0081
   1     0       0   -0.125   0.0084
   1     0       0   -0.120   0.0087
   4     0       0   -0.115   0.0091
   4     0       0   -0.110   0.0094
   1     0       0   -0.105   0.0098
   1     0       0   -0.100   0.0101
   2     0       0   -0.095   0.0105
   3     0       0   -0.090   0.0109
   3     0       0   -0.085   0.0113
   4     0       0   -0.080   0.0118
   1     0       0   -0.075   0.0122
   2     0       0   -0.070   0.0127

Additional subintervals are not listed.

   0     4       0    1.085   0.9870
   0     3       0    1.090   0.9875
   0     2       0    1.095   0.9880
   0     1       0    1.100   0.9884
   0     4       0    1.105   0.9888
   0     2       0    1.110   0.9892
   0     1       0    1.115   0.9896
   0     1       0    1.125   0.9904
   0     4       0    1.130   0.9907
   0    23       0    1.140   0.9914
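
The P column can be checked against the last row of the iteration history. The sketch below is in Python rather than SAS and rests on an assumption the text does not state: that P is the logistic function of X, with Alpha as the intercept and Beta as the slope. Under that assumption, the final estimates (Alpha = -3.8284, Beta = 7.5219) give the listed P to four decimals for the rows checked by hand (X = -0.135, -0.070, and 1.140). The function name is made up for this illustration.

```python
import math

# Final estimates from the last row (Iter 4) of the iteration history.
ALPHA = -3.8284
BETA = 7.5219


def predicted_probability(x, alpha=ALPHA, beta=BETA):
    """Logistic curve P = 1 / (1 + exp(-(alpha + beta * x))).

    Assumption: this is the form behind the P column; the source only
    calls P the "predicted value for the center value X".
    """
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


if __name__ == "__main__":
    # Center values taken from the table above.
    for x in (-0.135, -0.070, 1.140):
        print(f"X = {x:7.3f}   P = {predicted_probability(x):.4f}")
```
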

Before you analyze the data with the DMINE procedure, you must create
the DMDB encoded data set and catalog. For more information about how to do
this, see "Example 1: Getting Started with the DMDB Procedure"
in the DMDB procedure documentation. Because the DESCENDING order option is
specified for the target PURCHASE in the CLASS statement of PROC DMDB, the
DMINE procedure reads this encoded information from the metadata and models
the probability that a customer will make a purchase (PURCHASE = 'Yes').
By default, ORDER is set to ASCENDING for all class variables.

proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
   id acctnum;
   var amount income homeval frequent recency age
       domestic apparel leisure promo7 promo13 dpm12
       county return mensware flatware homeacc lamps
       linens blankets towels outdoor coats wcoat
       wappar hhappar jewelry custdate numkids travtime job;
   class purchase(desc) marital ntitle gender telind
         aprtmnt snglmom mobile kitchen luxury dishes tmktord
         statecod race origin heat numcars edlevel;
run;

The PROC DMINE statement invokes the procedure. The DATA= option identifies

the DMDB encoded training data set that is used to fit the model. The DMDBCAT=

option identifies the DMDB training data catalog.

proc dmine data=WORK.dmbexa1 dmdbcat=catexa1

The MINR2= option specifies the lower bound that an input's individual
R-square value must reach to be eligible for the model selection process.
Variables with R-square values less than the MINR2= cutoff are not entered
into the model. The STOPR2= option specifies the lower bound for the
incremental model R-square value; the forward selection process stops
when the increment falls below it.

minr2=0.020 stopr2=0.0050;

The VAR statement specifies the numeric and categorical inputs (independent

variables).

   var income homeval frequent recency age
       domestic apparel leisure promo7 promo13 dpm12
       county return mensware flatware homeacc lamps
       linens blankets towels outdoor coats wcoat
       wappar hhappar jewelry custdate numkids travtime job
       marital ntitle gender telind aprtmnt snglmom mobile
       kitchen luxury dishes tmktord statecod race origin heat
       numcars edlevel;
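
To make the MINR2= and STOPR2= cutoffs described earlier concrete, here is a minimal sketch in Python (not SAS) of how two such thresholds could act. It is not the DMINE algorithm: the R-square values are invented for illustration, the function names are hypothetical, and the exact comparison at the boundary (strictly below versus at or below the cutoff) is an assumption.

```python
def screen_inputs(individual_r2, minr2):
    """Keep inputs whose individual R-square is not below the minr2 cutoff."""
    return [name for name, r2 in individual_r2.items() if r2 >= minr2]


def forward_select(incremental_r2, stopr2):
    """Add inputs in the given order until the R-square gain falls below stopr2.

    incremental_r2 is a list of (name, gain) pairs, assumed to be already
    ordered by the selection procedure.
    """
    chosen, model_r2 = [], 0.0
    for name, gain in incremental_r2:
        if gain < stopr2:
            break  # improvement too small: stop forward selection
        chosen.append(name)
        model_r2 += gain
    return chosen, model_r2


if __name__ == "__main__":
    # Invented R-square values; only the cutoffs come from the example above.
    individual = {"income": 0.045, "age": 0.012, "recency": 0.031}
    eligible = screen_inputs(individual, minr2=0.020)
    steps = [("income", 0.045), ("recency", 0.011), ("frequent", 0.003)]
    chosen, model_r2 = forward_select(steps, stopr2=0.0050)
    print(eligible, chosen, round(model_r2, 3))
```

With these invented numbers, `age` is screened out by the MINR2= cutoff, and selection stops before `frequent` because its gain is below the STOPR2= cutoff.
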

PROC DMNEURL: Approximation to PROC NEURAL

Purpose of PROC DMNEURL

In its current form, PROC DMNEURL tries to establish a nonlinear model for the
prediction of a binary or interval-scaled response variable (called the target
in data mining terminology). The approach will soon be extended to nominal and
ordinal-scaled response variables.

The algorithm used in DMNEURL was developed to overcome some problems of

PROC NEURAL for data mining purposes, especially when the data set contains

many highly collinear variables:

1. The nonlinear estimation problem in common neural networks is seriously
underdetermined, leading to highly rank-deficient Hessian matrices and
resulting in extremely slow (close to linear) convergence of nonlinear
optimization algorithms.

Remedy in DMNEURL: full-rank estimation.

2. Each function call in PROC NEURAL corresponds to a single pass through
the entire training data set, and many function calls are normally needed for
convergent nonlinear optimization with rank-deficient Hessians.

Remedy in DMNEURL: optimization of a discrete problem with all data in core.

3. Because the zero eigenvalues of a Hessian matrix correspond to long and very
flat valleys in the shape of the objective function, the traditional neural net
approach has serious difficulty deciding when an estimate is close to an
appropriate solution and the optimization process can be terminated.

Remedy in DMNEURL: quadratic convergence.

4. For the same reasons, common neural net algorithms are highly prone to
finding local rather than global optimal solutions, and the optimization
result is often very sensitive to the starting point of the optimization.

Remedy in DMNEURL: good starting point.
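
The contrast drawn in the list above, between rank-deficient problems that converge slowly and small full-rank problems that converge quickly, can be illustrated with a two-parameter logistic fit. The following Python sketch is an assumption-laden illustration, not the DMNEURL implementation: it applies plain Newton-Raphson to eight made-up observations, and the function name and data are hypothetical. Because the 2x2 Hessian has full rank here, the update is well defined and the estimates settle in a handful of iterations.

```python
import math


def fit_logistic_newton(xs, ys, max_iter=25, tol=1e-10):
    """Newton-Raphson for P(y=1 | x) = 1 / (1 + exp(-(alpha + beta * x)))."""
    alpha, beta = 0.0, 0.0
    history = [(alpha, beta)]
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood w.r.t. alpha
            g1 += (y - p) * x      # gradient w.r.t. beta
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            raise ValueError("Hessian is (numerically) rank deficient")
        # Solve the 2x2 Newton system explicitly.
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        history.append((alpha, beta))
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta, history


# Made-up data: 1 success out of 4 at x = 0, 3 successes out of 4 at x = 1.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
alpha, beta, history = fit_logistic_newton(xs, ys)

if __name__ == "__main__":
    for i, (a, b) in enumerate(history):
        print(f"{i:2d}  {a: .4f}  {b: .4f}")
```

For this toy data the maximum likelihood solution is known in closed form (alpha = log(1/3), beta = 2 log 3), which makes the result easy to check.
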

With PROC DMNEURL we deal with specified optimization problems (with
full-rank Hessian matrices) that do not have many parameters and for which
good starting points can be obtained. The convergence of the nonlinear
optimizer is normally very fast, mostly taking fewer than 10 iterations per
optimization. The function and derivative calls during the optimization do
not need any passes through the data set. However, the search for good
starting points and the final evaluations of the solutions (scoring of all
observations) do need passes through the data, as do a number of
preliminary tasks. In PROC DMNEURL we fit separately an entire
