The arboretum procedure




The DMINE Procedure

WEIGHT Statement

Alias: WEIGHTS

Tip: Specify the WEIGHT variable in PROC DMDB so that the information is saved in the catalog and so that the variable is used automatically as a WEIGHT variable in PROC DMINE.



WEIGHT variable;

Required Argument

variable

Specifies one numeric (interval-scaled) variable that is used to weight the input variables.

Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.




Details

PROC DMINE performs the following two tasks:

1. PROC DMINE first computes a forward stepwise least-squares regression. In each step, an independent variable is selected, which contributes maximally to the model R-square value. Two parameters, MINR2 and STOPR2, can be specified to guide the variable selection process.

   MINR2
      If a variable has an individual R-square value smaller than MINR2, the variable is not considered for selection into the model.

   STOPR2
      A second test is performed using the STOPR2 value: the remaining independent variable with the largest contribution to the model R-square is added to the model. If the resulting global R-square value changes from its former value by less than the STOPR2 value, then the stepwise regression is terminated.

2. For a binary target (CLASS response variable), a fast algorithm for (approximate) logistic regression is computed in the second part of PROC DMINE. The independent variable is the prediction from the former least squares regression. Since only one regression variable is used in the logistic regression, only two parameters are estimated, the intercept and slope. The range of predicted values is divided into a number of equidistant intervals (knots), on which the logistic function is interpolated.

   If NOPRINT is not specified, a table is printed indicating the accuracy of the prediction of the target.
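The MINR2 and STOPR2 rules in the first task can be illustrated outside SAS. The following Python sketch is an illustration only, not the PROC DMINE implementation. It assumes NumPy, ordinary least squares with an intercept, and numeric inputs only, and it assumes that a variable whose contribution falls below STOPR2 is not kept in the model (the text above does not state this explicitly). The function names r_square and forward_select and the synthetic data are hypothetical.

```python
import numpy as np


def r_square(X, y):
    """R-square of a least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_total


def forward_select(X, y, minr2, stopr2):
    """Forward stepwise selection guided by MINR2- and STOPR2-style cutoffs."""
    # MINR2 rule: drop inputs whose individual R-square is below the cutoff.
    candidates = [j for j in range(X.shape[1])
                  if r_square(X[:, [j]], y) >= minr2]
    selected = []
    current_r2 = 0.0
    while candidates:
        # Pick the remaining input that yields the largest model R-square.
        best_r2, best_j = max(
            (r_square(X[:, selected + [j]], y), j) for j in candidates
        )
        # STOPR2 rule: stop when the improvement is below the cutoff.
        # Assumption: the variable that triggers the stop is not retained.
        if best_r2 - current_r2 < stopr2:
            break
        selected.append(best_j)
        candidates.remove(best_j)
        current_r2 = best_r2
    return selected, current_r2


# Synthetic data: only columns 0 and 2 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=500)

selected, r2 = forward_select(X, y, minr2=0.02, stopr2=0.005)
print("selected columns:", selected)
print("model R-square:", round(float(r2), 4))
```

Refitting the model for every candidate at every step keeps the sketch short; it is not meant to reflect how the procedure computes R-square contributions internally.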

Missing Values

Missing values are handled differently, depending on the type of variable.

- Missing values in categorical variables are replaced with a new category that represents missing values.
- Missing values in noncategorical variables are replaced with the mean.
- Observations with missing target values are dropped from the data.
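The three rules above can be sketched in a few lines of Python. This is an illustration only, not SAS code or the procedure's implementation. It assumes that missing values are represented as None and that the mean is computed from the nonmissing values of the observations that remain after rows with a missing target are dropped (the text does not say which observations enter the mean). The function name prepare, the label "_MISSING_", and the sample rows are hypothetical.

```python
from statistics import mean

MISSING_LEVEL = "_MISSING_"  # hypothetical label for the added category


def prepare(rows, target, categorical, numeric):
    """Apply the three missing-value rules to a list of dict records."""
    # Rule 3: drop observations whose target value is missing.
    kept = [dict(r) for r in rows if r[target] is not None]

    # Rule 2: replace missing noncategorical values with the mean.
    for name in numeric:
        present = [r[name] for r in kept if r[name] is not None]
        if not present:
            continue  # nothing to average; leave the column unchanged
        fill = mean(present)
        for r in kept:
            if r[name] is None:
                r[name] = fill

    # Rule 1: replace missing categorical values with a new category.
    for name in categorical:
        for r in kept:
            if r[name] is None:
                r[name] = MISSING_LEVEL
    return kept


# Made-up records; the variable names are borrowed from the example data set.
rows = [
    {"amount": 120.0, "income": 40.0, "gender": "F"},
    {"amount": None, "income": 55.0, "gender": "M"},
    {"amount": 80.0, "income": None, "gender": None},
    {"amount": 60.0, "income": 60.0, "gender": "M"},
]

cleaned = prepare(rows, target="amount",
                  categorical=["gender"], numeric=["income"])
for record in cleaned:
    print(record)
```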


Examples

The following examples were executed on the Windows NT operating system; the version of the SAS System was 6.12TS045.

Example 1: Modeling a Continuous Target with the DMINE Procedure (Simple Selection Settings)

Example 2: Including the AOV16 and Grouping Variables into the Analysis (Detailed Selection Settings)

Example 3: Modeling a Binary Target with the DMINE Procedure


Example 1: Modeling a Continuous Target with the DMINE Procedure (Simple Selection Settings)

Features:

- Setting the MINR2= and STOPR2= cutoff values.
- Specifying the target and input variables.
- Excluding the AOV16 variables by specifying the NOAOV16 option.
- Excluding the two-way class interactions by specifying the NOINTER option.

As a marketing analyst at a catalog company, you want to quickly identify the inputs that best predict the dollar amount that customers will purchase from your new fall outerwear catalog. The fictitious catalog mailing data set is named SAMPSIO.DMEXA1 (stored in the sample library). The data set contains 1,966 customer cases. The interval target AMOUNT contains the purchase amount in dollars.

There are 48 input variables available for predicting the target. Note that PURCHASE is a binary target that is modeled in "Example 3: Modeling a Binary Target with the DMINE Procedure". ACCTNUM is an ID variable, which is not a suitable input variable.



Program

 

/* Create the DMDB-encoded data set and catalog that PROC DMINE reads. */
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
   id  acctnum;
   var  amount income homeval frequent recency age
        domestic apparel leisure promo7 promo13 dpm12
        county return mensware flatware homeacc lamps
        linens blankets towels outdoor coats wcoat
        wappar hhappar jewelry custdate numkids travtime job;
   class purchase(desc) marital ntitle gender telind
         aprtmnt snglmom mobile kitchen luxury dishes tmktord
         statecod race origin heat numcars edlevel;
run;

/* Select inputs for the interval target AMOUNT. */
proc dmine data=dmbexa1 dmdbcat=catexa1
           minr2=0.020 stopr2=0.0050   /* R-square cutoff values */
           noaov16                     /* exclude the AOV16 variables */
           nointer;                    /* exclude the two-way class interactions */

   /* Input variables; the ID variable ACCTNUM is not listed. */
   var  income homeval frequent recency age
        domestic apparel leisure promo7 promo13 dpm12
        county return mensware flatware homeacc lamps
        linens blankets towels outdoor coats wcoat
        wappar hhappar jewelry custdate numkids travtime job
        marital ntitle gender telind aprtmnt snglmom mobile
        kitchen luxury dishes tmktord statecod race origin heat
        numcars edlevel;

   /* Interval target: purchase amount in dollars. */
   target amount;

   title 'DMINE: Continuous Target';
run;


Output

DMINE Status Monitor

When you invoke the DMINE procedure, the DMINE Status Monitor window appears. This window monitors the execution time of the procedure. To suppress the display of this window, specify the NOMONITOR option on the PROC DMINE statement.




