The arboretum procedure



The NOMONITOR option suppresses the monitor that displays the execution status of the DMINE procedure.


The DMINE Procedure

Example 3: Modeling a Binary Target with the DMINE Procedure



Setting the MINR2= and STOPR2= cutoff values.


Specifying the target and input variables.


As a marketing analyst at a catalog company, you want to determine the inputs that best predict whether or not a customer will make a purchase from your new fall outerwear catalog. The fictitious catalog mailing data set is named SAMPSIO.DMEXA1 (stored in the sample library). The data set contains 1,966 customer cases. The binary target (PURCHASE) contains a formatted value of "Yes" if a purchase was made and a formatted value of "No" if a purchase was not made.

There are 48 input variables available for predicting the target. Note that AMOUNT is an interval target that is modeled in Examples 1 and 2 of this chapter, so it is not used as an input here. ACCTNUM is an ID variable and is therefore not a suitable input.



/* Create the DMDB-encoded data set and metadata catalog used by PROC DMINE */
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
   id  acctnum;
   var  amount income homeval frequent recency age
        domestic apparel leisure promo7 promo13 dpm12
        county return mensware flatware homeacc lamps
        linens blankets towels outdoor coats wcoat
        wappar hhappar jewelry custdate numkids travtime job;
   class purchase(desc) marital ntitle gender telind
         aprtmnt snglmom mobile kitchen luxury dishes tmktord
         statecod race origin heat numcars edlevel;
run;



/* Screen the inputs for the binary target PURCHASE */
proc dmine data=WORK.dmbexa1 dmdbcat=catexa1
     minr2=0.020 stopr2=0.0050;
   var income homeval frequent recency age
        domestic apparel leisure promo7 promo13 dpm12
        county return mensware flatware homeacc lamps
        linens blankets towels outdoor coats wcoat
        wappar hhappar jewelry custdate numkids travtime job
        marital ntitle gender telind aprtmnt snglmom mobile
        kitchen luxury dishes tmktord statecod race origin heat
        numcars edlevel;
   target purchase;
   title 'DMINE: Binary Target';
run;



DMINE Status Monitor

When you invoke the DMINE procedure, the DMINE Status Monitor window appears. This window monitors the execution time of the procedure.
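If you do not want the monitor window, for example in a batch job, the NOMONITOR option in the PROC DMINE statement suppresses it. The following is a minimal sketch that assumes WORK.DMBEXA1 and CATEXA1 from this example's PROC DMDB step already exist. The shortened VAR list is a hypothetical choice for illustration and is not part of the original example.

```sas
/* Same screening settings as the example, with the status monitor
   suppressed by NOMONITOR. The reduced VAR list is illustrative only. */
proc dmine data=WORK.dmbexa1 dmdbcat=catexa1
     minr2=0.020 stopr2=0.0050 nomonitor;
   var income homeval frequent recency age
       marital kitchen statecod;
   target purchase;
run;
```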

Partial Listing of the R-Squares for the Target Variable

This section of the output ranks all model effects by their R-square values and lists the degrees of freedom (DF) associated with each effect. Variables whose R2 value is below the MINR2= cutoff are labeled non-significant and are not chosen for the final model; the significant variables are analyzed in a subsequent forward stepwise regression.

There are four types of model effects:

Class effects are estimated for each class variable and for all possible two-factor class interactions. The R-square statistic for each class effect is calculated from a one-way analysis of variance. Two-factor interaction effects are constructed by combining all possible levels of each class variable into one term. The degrees of freedom for a class effect equal the number of unique factor levels minus 1. For a two-factor interaction, the degrees of freedom equal the number of levels in factor A multiplied by the number of levels in factor B, minus 1. You can omit the two-factor interaction effects from the final stepwise analysis by specifying the NOINTER option in the PROC DMINE statement.
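To see what the one-way analysis of variance behind a class effect computes, you can fit the same kind of model with PROC GLM, whose R-Square is the model sum of squares divided by the corrected total sum of squares. This is an illustrative sketch, not how PROC DMINE is implemented. The 0/1 recoding of PURCHASE and the data set name WORK.PUR01 are assumptions. Because PROC GLM drops observations with missing class values and its model DF counts only the level combinations that actually occur, its R-Square and DF need not match the DMINE listing exactly. Specifying only the crossed term KITCHEN*STATECOD in the MODEL statement treats every observed combination of levels as one factor level, which mirrors how the two-factor class interaction is built.

```sas
/* Hypothetical helper data set: recode the binary target as 0/1.
   VVALUE returns the formatted value, so this works whether PURCHASE
   is stored as character or as a formatted numeric variable. */
data work.pur01;
   set sampsio.dmexa1;
   if not missing(purchase);
   y = (strip(vvalue(purchase)) = 'Yes');
run;

/* Class effect: one-way ANOVA of the 0/1 target on KITCHEN.
   R-Square = SS(Model) / SS(Corrected Total). */
proc glm data=work.pur01;
   class kitchen;
   model y = kitchen;
run;
quit;

/* Two-factor class interaction: only the crossed term is specified,
   so each observed KITCHEN*STATECOD combination acts as one level. */
proc glm data=work.pur01;
   class kitchen statecod;
   model y = kitchen*statecod;
run;
quit;
```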



Group effects are created by reducing each class effect through an analysis of means. The degrees of freedom for each group effect is equal to the number of levels. Since the USEGROUPS option was not specified in the PROC DMINE statement, the group effects and the original class effects can be used in the final model.


VAR effects are estimated from interval variables as standard regression inputs. A simple linear regression is performed to determine the R2 statistic for interval inputs. The degrees of freedom is always equal to 1.
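As an illustrative check of a VAR effect, the following sketch regresses a 0/1 version of PURCHASE on FREQUENT with PROC REG, which reports R-Square for a model with one slope parameter (1 degree of freedom). The recoding step and the data set name WORK.PUR01 are assumptions, and PROC REG drops observations with a missing FREQUENT value, so the result need not match the DMINE listing exactly.

```sas
/* Hypothetical 0/1 recoding of the binary target (an assumption) */
data work.pur01;
   set sampsio.dmexa1;
   if not missing(purchase);
   y = (strip(vvalue(purchase)) = 'Yes');
run;

/* VAR effect: simple linear regression of the 0/1 target on FREQUENT.
   One slope parameter gives 1 degree of freedom; R-Square appears in
   the fit statistics table. */
proc reg data=work.pur01;
   model y = frequent;
run;
quit;
```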


AOV16 effects are calculated by grouping numeric variables into a maximum of 16 equally spaced buckets. AOV16 effects can account for a possibly nonlinear relationship between an input and the target variable PURCHASE. The degrees of freedom are calculated as the number of groups. AOV16 variables can be expensive to compute; you can prevent them from being evaluated in the forward stepwise regression by specifying the NOAOV16 option in the PROC DMINE statement.
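The bucketing idea can be sketched as follows: compute the range of FREQUENT, assign each value to one of 16 equal-width buckets, and run a one-way ANOVA on the bucket index. This is an approximation written for illustration, not PROC DMINE's internal algorithm. The 0/1 recoding, the data set names, and the handling of missing values (such observations are simply dropped) are assumptions. Buckets that contain no observations do not appear, which is one way the number of groups can fall below 16. PROC GLM reports model DF as the number of nonempty buckets minus 1, which may differ from the DF convention in the DMINE listing.

```sas
/* Hypothetical 0/1 recoding of the binary target (an assumption) */
data work.pur01;
   set sampsio.dmexa1;
   if not missing(purchase);
   y = (strip(vvalue(purchase)) = 'Yes');
run;

/* Range of FREQUENT, used to define 16 equal-width buckets */
proc means data=work.pur01 noprint;
   var frequent;
   output out=work.range min=fmin max=fmax;
run;

/* Assign each nonmissing FREQUENT value to a bucket numbered 1 to 16;
   the maximum value is placed in bucket 16 */
data work.aov16;
   if _n_ = 1 then set work.range(keep=fmin fmax);
   set work.pur01;
   if not missing(frequent) then do;
      width = (fmax - fmin) / 16;
      if width > 0 then bucket = min(16, floor((frequent - fmin) / width) + 1);
      else bucket = 1;
   end;
   drop fmin fmax width;
run;

/* One-way ANOVA of the 0/1 target on the bucket index */
proc glm data=work.aov16;
   class bucket;
   model y = bucket;
run;
quit;
```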


For this example, the class KITCHEN*STATECOD interaction has the largest R2 with the target PURCHASE. Class and group interactions composed of the same terms have very similar R2 values. Of all the AOV16 variables, FREQUENT has the largest R2 value with the target.


                        DMINE: Binary Target
                    R-Squares for Target variable: PURCHASE

       Effect                                     DF          R2

       Class: KITCHEN*STATECOD                    197      0.1045
       Group: KITCHEN*STATECOD                      8      0.1026
       Class: NTITLE*STATECOD                     191      0.0955
       Group: NTITLE*STATECOD                       9      0.0940
       Class: STATECOD*ORIGIN                     166      0.0917
       Group: STATECOD*ORIGIN                       8      0.0899
       Class: DISHES*STATECOD                     169      0.0875
       Group: DISHES*STATECOD                       8      0.0862
       Class: STATECOD*EDLEVEL                    146      0.0736
       Group: STATECOD*EDLEVEL                      9      0.0723
       Class: STATECOD*HEAT                       141      0.0643
       Group: STATECOD*HEAT                         9      0.0633
       Class: LUXURY*STATECOD                     101      0.0607
       Class: STATECOD*NUMCARS                    121      0.0597
       Group: LUXURY*STATECOD                      10      0.0597
       AOV16: FREQUENT                             11      0.0592
       Group: STATECOD*NUMCARS                      9      0.0585
       Class: STATECOD*RACE                       105      0.0575
       Class: TMKTORD*STATECOD                    110      0.0571
       Group: STATECOD*RACE                         9      0.0566
       Group: TMKTORD*STATECOD                      8      0.0562
       Class: MARITAL*STATECOD                    107      0.0537
       Group: MARITAL*STATECOD                     10      0.0526
       Class: GENDER*STATECOD                     104      0.0514
       Group: GENDER*STATECOD                      10      0.0505
       Var:   FREQUENT                              1      0.0498

       Additional Effects Are Not Listed

       Group: KITCHEN*EDLEVEL                       7      0.0216
       Class: KITCHEN*RACE                         25      0.0214
       Class: TELIND*KITCHEN                       14      0.0213
       Group: TELIND*KITCHEN                        5      0.0211
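Because the largest R2 values in this listing come from two-factor interactions, you may want to compare a screening run without them. The following is a sketch of such a rerun that uses the NOINTER and NOAOV16 options described earlier in this example. It assumes WORK.DMBEXA1 and CATEXA1 from the PROC DMDB step still exist, and the modified title is a hypothetical label. The text above only states that these options keep the affected effects out of the forward stepwise regression, so whether they also disappear from the R-square listing is not established here.

```sas
/* Rerun the screening without two-factor class interactions (NOINTER)
   and without AOV16 effects (NOAOV16) in the stepwise analysis */
proc dmine data=WORK.dmbexa1 dmdbcat=catexa1
     minr2=0.020 stopr2=0.0050 nointer noaov16;
   var income homeval frequent recency age
        domestic apparel leisure promo7 promo13 dpm12
        county return mensware flatware homeacc lamps
        linens blankets towels outdoor coats wcoat
        wappar hhappar jewelry custdate numkids travtime job
        marital ntitle gender telind aprtmnt snglmom mobile
        kitchen luxury dishes tmktord statecod race origin heat
        numcars edlevel;
   target purchase;
   title 'DMINE: Binary Target (NOINTER NOAOV16)';
run;
```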
