PROC DMDB step to create the DMDB data set and catalog that are required
as input to DMREG.
proc dmdb batch data=hmeq
          out=dm_data dmdbcat=dm_cat;
   var loan mortdue value yoj derog
       clage ninq clno debtinc;
   class bad(desc)
         job(asc);
   target bad;
run;
Because the order of the target BAD was set to descending in the DMDB
data set, DMREG models the probability that BAD=1 (bad applicants). By
default, DMREG uses deviation-from-the-mean coding to create the design
matrix for the class variables.
proc dmreg data=dm_data
           dmdbcat=dm_cat;
   class bad job;
   model bad = job loan mortdue value yoj derog
               clage ninq clno debtinc;
   title1 'DMREG Home Equity Data: Default Deviations from the Mean Coding';
run;
DATA step program to code the class variable JOB using GLM non-full
rank (0, 1) coding.
data dumyhmeq;
   set hmeq;
   j_mgr=(job='Mgr');
   j_off=(job='Office');
   j_other=(job='Other');
   j_prof=(job='ProfExe');
   j_sales=(job='Sales');
   j_self=(job='Self');
run;
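For contrast with the 0/1 coding above, the following is a minimal sketch of how deviation-from-the-mean coding could be built by hand in a DATA step. It is not taken from the DMREG documentation: the data set name devhmeq and the d_ column names are hypothetical, and the choice of 'Self' (the last level in ascending order) as the level coded -1 is an assumption about which level serves as the reference.

```sas
/* Hand-built deviation-from-the-mean coding for JOB (illustrative only). */
/* Assumption: 'Self' is the reference level and is coded -1 in every     */
/* column; each other level is coded 1 in its own column and 0 elsewhere. */
data devhmeq;
   set hmeq;
   /* Leave the coded columns missing when JOB itself is missing */
   if not missing(job) then do;
      d_mgr   = (job='Mgr')     - (job='Self');
      d_off   = (job='Office')  - (job='Self');
      d_other = (job='Other')   - (job='Self');
      d_prof  = (job='ProfExe') - (job='Self');
      d_sales = (job='Sales')   - (job='Self');
   end;
run;
```

Six levels yield five columns under this coding, so the design matrix stays full rank, whereas the 0/1 coding above produces six columns that sum to the intercept.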
PROC LOGISTIC step to model the binary target BAD.
proc logistic data=dumyhmeq descending noprint;
   model bad = j_mgr j_off j_other j_prof j_sales j_self
               loan mortdue value yoj derog
               clage ninq clno debtinc;
   output out=logfit(keep=bad p_bad1) p=p_bad1;
   title 'LOGISTIC Home Equity Data: GLM coding';
run;
The NOPRINT option suppresses the printing of the DMREG output. PROC
COMPARE is used to compare the predicted values from the LOGISTIC and DMREG
models. The CODING=GLM option creates the design matrix for the class variables
using GLM non-full rank coding.
proc dmreg data=dm_data
           dmdbcat=dm_cat
           noprint;
   class bad job;
   model bad = job loan mortdue value yoj derog
               clage ninq clno debtinc / coding=glm;
   score out=dmscore;
   title1 'DMREG Home Equity Data: GLM coding';
run;
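The PROC COMPARE step itself is not shown in this listing. A minimal sketch of such a comparison might look like the following. It assumes that the DMSCORE data set stores the predicted probability of BAD=1 in a variable named P_BAD1 (a hypothetical name chosen to match the LOGFIT output) and that both data sets hold the observations in the same order. The tolerance value is also only illustrative.

```sas
/* Compare predicted probabilities from LOGISTIC (LOGFIT) and DMREG (DMSCORE). */
/* Assumption: DMSCORE contains a variable named P_BAD1; adjust the VAR       */
/* statement if the scored data set uses a different name.                    */
proc compare base=logfit compare=dmscore
             method=relative criterion=0.0001;
   var p_bad1;
   title 'Compare LOGISTIC and DMREG Predicted Probabilities';
run;
```

With METHOD=RELATIVE, values are judged unequal only when their relative difference exceeds the CRITERION= value, so small numerical differences between the two fitting routines are not reported as mismatches.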
The DMREG Procedure
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
The DMSPLIT Procedure
Overview
Procedure Syntax
PROC DMSPLIT Statement
FREQ Statement
TARGET Statement
VARIABLE Statement
WEIGHT Statement
Details
Examples
Example 1: Creating a Decision Tree for a Binary Target with the DMSPLIT Procedure
Overview
The DMSPLIT procedure performs variable selection by using binary variable splits that maximize the
Chi-Square value of a 2 x 2 frequency table. The cutoff threshold of each split is chosen so that the
Chi-Square value of the resulting table is maximized.
PROC DMINE and PROC DMSPLIT are underlying procedures for the Variable Selection node.
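To make the 2 x 2 table concrete, the sketch below evaluates a single candidate split by hand with base SAS. It is only an illustration of the statistic involved, not a description of how DMSPLIT is implemented: the variable DEBTINC, the cutoff of 40, and the names SPLIT and HIGH_DTI are hypothetical choices, and DMSPLIT searches over many cutoffs rather than testing one.

```sas
/* Evaluate one candidate binary split of an interval input against BAD. */
/* Assumption: the cutoff 40 is arbitrary and chosen only for the demo.  */
data split;
   set hmeq;
   /* Leave the split indicator missing when DEBTINC is missing */
   if not missing(debtinc) then high_dti = (debtinc >= 40);
run;

/* The CHISQ option reports the Chi-Square statistic for the 2 x 2 table */
proc freq data=split;
   tables high_dti*bad / chisq;
run;
```

Repeating this for several cutoffs and keeping the one with the largest Chi-Square value mimics, in spirit, the threshold selection described above.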
Procedure Syntax
PROC DMSPLIT <option(s)>;
FREQ variable;
TARGET variable;
VARIABLE variable-list;
WEIGHT variable;
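As a rough illustration of how these statements fit together, the following sketch runs DMSPLIT on the DMDB data set and catalog that were created for the DMREG example (dm_data and dm_cat). Reusing those names, listing only the interval inputs, and the particular BINS= and CHISQ= values are all assumptions made for illustration. Only options and statements documented in this chapter are used.

```sas
/* Illustrative DMSPLIT call; option values are arbitrary examples. */
proc dmsplit data=dm_data dmdbcat=dm_cat
             bins=30 chisq=2;
   /* Candidate inputs for the binary splits (interval variables only here) */
   variable loan mortdue value yoj derog
            clage ninq clno debtinc;
   /* Binary target declared in the earlier PROC DMDB step */
   target bad;
run;
```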
PROC DMSPLIT Statement
Invokes the DMSPLIT procedure.
PROC DMSPLIT <option(s)>;
Required Arguments
DATA=SAS-data-set
Specifies an input data set generated by PROC DMDB. The data set is associated with a valid
catalog specified by the DMDBCAT= option. This data set must contain interval-scaled variables
and CLASS variables in the specific form that PROC DMDB writes.
Default: None.
DMDBCAT=SAS-catalog
Identifies an input metadata catalog generated by PROC DMDB. The metadata catalog is
associated with a valid data set specified by the DATA= option. The catalog contains important
information (for example, the range of variables, number of missing values of each variable,
moments of variables) that is used by many other Enterprise Miner procedures that require a
DMDB data set. The DMDBCAT= catalog and the DATA= data set must be appropriately related
to each other in order to obtain proper results.
Default: None.
Options
BINS=integer
Specifies the number of categories into which the range of a numeric (interval) variable is divided
for splits.
Range: integer > 0
Default: 100
CHISQ=number
Specifies a lower bound for the Chi-Square value that a split must attain to remain eligible. The
value of CHISQ governs the number of splits that are performed: the higher the value of CHISQ, the
fewer splits and passes of the input data are performed.
Range: real number > 0