The DMINE Procedure
Example 3: Modeling a Binary Target with the DMINE Procedure
Features:
- Setting the MINR2= and STOPR2= cutoff values.
- Specifying the target and input variables.
As a marketing analyst at a catalog company, you want to determine the inputs that best predict whether a customer
will make a purchase from your new fall outerwear catalog. The fictitious catalog mailing data set is named
SAMPSIO.DMEXA1 (stored in the sample library). The data set contains 1,966 customer cases. The binary target
(PURCHASE) contains a formatted value of "Yes" if a purchase was made and a formatted value of "No" if a purchase was
not made.
There are 48 input variables available for predicting the target. Note that AMOUNT is an interval target that is modeled in
Examples 1 and 2 of this chapter. ACCTNUM is an ID variable, so it is not a suitable input variable.
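Before modeling, it can help to confirm the data description above. The following is a minimal sketch, assuming that the SAMPSIO library is assigned in your session; it uses only the data set and target names given in this example.

```sas
/* List the variables, their types, and the number of observations */
proc contents data=sampsio.dmexa1;
run;

/* Tabulate the binary target to see how many cases fall in each level */
proc freq data=sampsio.dmexa1;
   tables purchase;
run;
```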
Program
/* Build the data mining database (DMDB) and catalog used by PROC DMINE */
proc dmdb batch data=sampsio.dmexa1 out=dmbexa1 dmdbcat=catexa1;
   id acctnum;
   /* Interval variables */
   var amount income homeval frequent recency age
       domestic apparel leisure promo7 promo13 dpm12
       county return mensware flatware homeacc lamps
       linens blankets towels outdoor coats wcoat
       wappar hhappar jewelry custdate numkids travtime job;
   /* Class variables, including the binary target PURCHASE */
   class purchase(desc) marital ntitle gender telind
         aprtmnt snglmom mobile kitchen luxury dishes tmktord
         statecod race origin heat numcars edlevel;
run;

/* Screen the inputs against the binary target with the R-square cutoffs */
proc dmine data=WORK.dmbexa1 dmdbcat=catexa1
           minr2=0.020 stopr2=0.0050;
   var income homeval frequent recency age
       domestic apparel leisure promo7 promo13 dpm12
       county return mensware flatware homeacc lamps
       linens blankets towels outdoor coats wcoat
       wappar hhappar jewelry custdate numkids travtime job
       marital ntitle gender telind aprtmnt snglmom mobile
       kitchen luxury dishes tmktord statecod race origin heat
       numcars edlevel;
   target purchase;
   title 'DMINE: Binary Target';
run;
Output
DMINE Status Monitor
When you invoke the DMINE procedure, the DMINE Status Monitor window appears. This window monitors the execution
time of the procedure.
Partial Listing of the R-Squares for the Target Variable
This section of the output ranks all model effects by their R-square (R2) values. The degrees of freedom (DF) associated
with each effect are also listed. The significant effects are analyzed in a subsequent forward stepwise regression.
Non-significant effects are labeled as having an R2 value less than the MINR2 cutoff; these effects are not chosen in the
final model.
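The screening rule described above can be sketched with a short DATA step. This is an illustration only, not the procedure's internal logic: the first three rows are copied from the listing in this example, the last row (HYPOTHETICAL_X) is invented to show an effect that falls below the cutoff, and the data set names EFFECTS and SCREENED and the macro variable MINR2 are assumptions made for the sketch.

```sas
/* Same cutoff as the MINR2= value in this example */
%let minr2 = 0.020;

/* A few effects with their DF and R2 values (last row is hypothetical) */
data effects;
   infile datalines dlm=',' truncover;
   length effect $40;
   input effect $ df r2;
   datalines;
Class: KITCHEN*STATECOD,197,0.1045
AOV16: FREQUENT,11,0.0592
Var: FREQUENT,1,0.0498
Var: HYPOTHETICAL_X,1,0.0150
;
run;

/* Flag each effect by comparing its R2 with the cutoff */
data screened;
   set effects;
   length status $8;
   if r2 >= &minr2 then status = 'retained';
   else status = 'dropped';
run;

proc print data=screened noobs;
run;
```

Only the hypothetical row has an R2 value below 0.020, so it is the only effect flagged as dropped.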
There are four types of model effects:
- Class effects are estimated for each class variable and for all possible two-factor class interactions. The R-square
  statistic is calculated for each class effect using a one-way analysis of variance. Two-factor interaction effects are
  constructed by combining all possible levels of each class variable into one term. The degrees of freedom for a class
  effect equal the number of unique factor levels minus 1. For a two-factor interaction, the degrees of freedom equal the
  product of the number of levels in factor A and the number of levels in factor B, minus 1. You can omit the two-factor
  interaction effects from the final stepwise analysis by specifying the NOINTER option in the PROC DMINE statement.
- Group effects are created by reducing each class effect through an analysis of means. The degrees of freedom for each
  group effect equal the number of levels. Because the USEGROUPS option was not specified in the PROC DMINE
  statement, both the group effects and the original class effects can be used in the final model.
- VAR effects are estimated from interval variables as standard regression inputs. A simple linear regression is
  performed to determine the R2 statistic for interval inputs. The degrees of freedom always equal 1.
- AOV16 effects are calculated by grouping numeric variables into a maximum of 16 equally spaced buckets.
  AOV16 effects may account for possible non-linearity in the target variable PURCHASE. The degrees of freedom are
  calculated as the number of groups. The AOV16 variables can be expensive to compute. You can prevent these
  variables from being evaluated in the forward stepwise regression by specifying the NOAOV16 option in the PROC
  DMINE statement.
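The R-square and degrees-of-freedom rules above can be written out as follows. This is a sketch under stated assumptions: it uses the standard one-way analysis of variance definition of R-square, assumes a 0/1 coding of the binary target, and the level counts in the worked lines (5, 3, and 4) are hypothetical rather than taken from SAMPSIO.DMEXA1.

```latex
% R-square of a class effect with levels k = 1, ..., K (one-way ANOVA).
% y_i is the target value for case i (assumed 0/1 for a binary target),
% \bar{y}_k is the mean of the target in level k, n_k the number of cases
% in level k, and \bar{y} the overall mean over all N cases.
\[
R^2 = \frac{\sum_{k=1}^{K} n_k \,(\bar{y}_k - \bar{y})^2}
           {\sum_{i=1}^{N} (y_i - \bar{y})^2}
\]

% Degrees of freedom, using hypothetical level counts
\[
\begin{aligned}
\text{class effect with 5 levels:} \quad & 5 - 1 = 4 \\
\text{interaction of a 3-level and a 4-level factor:} \quad & (3 \times 4) - 1 = 11 \\
\text{VAR effect:} \quad & 1
\end{aligned}
\]
```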
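As a sketch of how the options named above might be applied, the following variant of this example's PROC DMINE step adds NOINTER and NOAOV16 to the PROC DMINE statement. It assumes that the PROC DMDB step from the Program section has already created WORK.DMBEXA1 and the CATEXA1 catalog; the title text is an illustrative choice, and the output of this variant is not shown in this example.

```sas
/* Same inputs and cutoffs as the program above, but without        */
/* two-factor class interactions (NOINTER) and without AOV16        */
/* effects (NOAOV16).                                                */
proc dmine data=WORK.dmbexa1 dmdbcat=catexa1
           minr2=0.020 stopr2=0.0050
           nointer
           noaov16;
   var income homeval frequent recency age
       domestic apparel leisure promo7 promo13 dpm12
       county return mensware flatware homeacc lamps
       linens blankets towels outdoor coats wcoat
       wappar hhappar jewelry custdate numkids travtime job
       marital ntitle gender telind aprtmnt snglmom mobile
       kitchen luxury dishes tmktord statecod race origin heat
       numcars edlevel;
   target purchase;
   title 'DMINE: Binary Target Without Interactions or AOV16 Effects';
run;
```

With both options in effect, the listing would be expected to contain only single-variable class, group, and VAR effects, which can shorten run time when many class inputs have a large number of levels.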
For this example, the KITCHEN*STATECOD class interaction has the largest R2 with the target PURCHASE. Class and
group interactions composed of the same terms have very similar R2 values. Of all the AOV16 variables, FREQUENT has
the largest R2 value with the target.
DMINE: Binary Target
R-Squares for Target variable: PURCHASE

Effect                      DF        R2
------------------------------------------------------------
Class: KITCHEN*STATECOD    197    0.1045
Group: KITCHEN*STATECOD      8    0.1026
Class: NTITLE*STATECOD     191    0.0955
Group: NTITLE*STATECOD       9    0.0940
Class: STATECOD*ORIGIN     166    0.0917
Group: STATECOD*ORIGIN       8    0.0899
Class: DISHES*STATECOD     169    0.0875
Group: DISHES*STATECOD       8    0.0862
Class: STATECOD*EDLEVEL    146    0.0736
Group: STATECOD*EDLEVEL      9    0.0723
Class: STATECOD*HEAT       141    0.0643
Group: STATECOD*HEAT         9    0.0633
Class: LUXURY*STATECOD     101    0.0607
Class: STATECOD*NUMCARS    121    0.0597
Group: LUXURY*STATECOD      10    0.0597
AOV16: FREQUENT             11    0.0592
Group: STATECOD*NUMCARS      9    0.0585
Class: STATECOD*RACE       105    0.0575
Class: TMKTORD*STATECOD    110    0.0571
Group: STATECOD*RACE         9    0.0566
Group: TMKTORD*STATECOD      8    0.0562
Class: MARITAL*STATECOD    107    0.0537
Group: MARITAL*STATECOD     10    0.0526
Class: GENDER*STATECOD     104    0.0514
Group: GENDER*STATECOD      10    0.0505
Var: FREQUENT                1    0.0498

Additional Effects Are Not Listed

Group: KITCHEN*EDLEVEL       7    0.0216
Class: KITCHEN*RACE         25    0.0214
Class: TELIND*KITCHEN       14    0.0213
Group: TELIND*KITCHEN        5    0.0211