Program: GLM Non-Full Rank (0, 1) Coding
data dumyhmeq;
   set hmeq;
   j_mgr=(job='Mgr');
   j_off=(job='Office');
   j_other=(job='Other');
   j_prof=(job='ProfExe');
   j_sales=(job='Sales');
   j_self=(job='Self');
run;
proc logistic data=dumyhmeq descending noprint;
   model bad = j_mgr j_off j_other j_prof j_sales j_self
               loan mortdue value yoj derog
               clage ninq clno debtinc;
   output out=logfit(keep=bad p_bad1) p=p_bad1;
   title 'LOGISTIC Home Equity Data: GLM coding';
run;
proc dmdb batch data=hmeq
   out=dm_data dmdbcat=dm_cat;
   var loan mortdue value yoj derog
       clage ninq clno debtinc;
   class bad(desc)
         reason(asc)
         job(asc);
   target bad;
run;
proc dmreg data=dm_data
   dmdbcat=dm_cat
   noprint;
   class bad job;
   model bad = job loan mortdue value yoj derog
               clage ninq clno debtinc / coding=glm;
   score out=dmscore;
   title1 'DMREG Home Equity Data: GLM coding';
run;
proc compare data=dmscore compare=logfit note
   method=absolute
   criterion=1e-7;
   var p_bad1;
run;
Output: GLM Non-Full Rank (0, 1) Coding
PROC COMPARE results.
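The comparison can also be checked programmatically. The following is a minimal sketch, assuming the DMSCORE and LOGFIT data sets created above; the macro variable name cmp_rc is hypothetical. PROC COMPARE stores a return code in the automatic macro variable SYSINFO, and the bit with value 4096 is set when a compared value is judged unequal under the specified criterion.

```sas
proc compare data=dmscore compare=logfit
   method=absolute criterion=1e-7 noprint;
   var p_bad1;
run;

/* Capture SYSINFO right after the step, before another step can reset it. */
%let cmp_rc = &sysinfo;

/* Bit 4096 flags unequal values. A result of 0 means that every
   compared p_bad1 value agreed within the absolute criterion of 1e-7. */
%put NOTE: Unequal-value flag = %sysfunc(band(&cmp_rc, 4096));
```

Other bits of SYSINFO report differences such as data set labels or variable attributes, so testing only the 4096 bit isolates the value comparison.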
Copyright 2000 by SAS Institute Inc., Cary, NC, USA. All rights reserved.
PROC FREQ step to create a classification table for the categorical
input JOB.
proc freq data=sampsio.hmeq;
   tables job / missing;
   title 'JOB Classification Table';
run;
SAS DATA step to replace the missing JOB values with the variable's
mode ('Other'). Imputing before modeling is optional: DMREG and LOGISTIC
produce the same results either way, provided that you use the same method
to code the class variables. Some of the continuous inputs also have missing
values. Neither DMREG nor LOGISTIC uses observations that have missing
values in the analysis. You can impute the missing values for the
continuous inputs by using the STDIZE procedure.
data hmeq;
   set sampsio.hmeq;
   if job = ' ' then job='Other';
run;
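As a minimal sketch of the imputation of continuous inputs described above, the following step replaces each missing value with the variable's median. The output data set name hmeq_imp and the choice of METHOD=MEDIAN are assumptions for illustration. The REPONLY option replaces missing values without standardizing the data.

```sas
/* Replace missing values only (REPONLY); nonmissing values are unchanged. */
proc stdize data=hmeq out=hmeq_imp
   reponly method=median;
   var loan mortdue value yoj derog
       clage ninq clno debtinc;
run;
```

The programs in this example continue to read HMEQ, so this step does not change their results unless you point them at the imputed data set.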
PROC TRANSREG step to create the design matrix for the classification
input JOB. The DESIGN option specifies that the goal is design matrix creation,
not analysis.
proc transreg data=hmeq design;
The MODEL statement specifies the class variable JOB. The DEVIATIONS
(or EFFECTS) t-option requests deviations-from-means coding.
   model class (job/deviations);
The ID statement copies the target and the continuous inputs to the
temporary design matrix data set. PROC TRANSREG automatically creates the
macro variable &_TRGIND, which contains the list of independent variables.
This macro variable is used in the MODEL statement in PROC LOGISTIC.
   id bad loan mortdue value yoj derog clage ninq clno debtinc;
   output;
run;
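To see what PROC TRANSREG produced, you can write the macro variable to the log and print a few rows of the design matrix. This is a minimal sketch: it assumes that the TRANSREG step above has just run, so that the automatic macro variable SYSLAST names its output data set. The label text in the %PUT statement and the OBS=6 limit are arbitrary choices.

```sas
/* Write the generated list of design-matrix variable names to the log. */
%put Independent variables: &_trgind;

/* Print the first rows of the most recently created data set,
   which at this point is the TRANSREG output. */
proc print data=&syslast (obs=6);
run;
```

PROC PRINT does not create a data set, so a later step that omits the DATA= option still reads the TRANSREG output.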
You can also create the design matrix for the classification variable(s)
in a SAS DATA step, although this task is too time-consuming for databases
that contain several class variables. The DATA step is commented out, but
it demonstrates how to manually code a categorical variable using the
deviations-from-the-means method.
/*
data dumyhmeq;
   set hmeq;
   * Self is the reference level. It has no column of its own
     and is coded -1 in each of the five columns;
   select (job);
      when ('Mgr')
         do;
            j_mgr=1;
            j_off=0;
            j_other=0;
            j_prof=0;
            j_sales=0;
         end;
      when ('Office')
         do;
            j_mgr=0;
            j_off=1;
            j_other=0;
            j_prof=0;
            j_sales=0;
         end;
      when ('Other')
         do;
            j_mgr=0;
            j_off=0;
            j_other=1;
            j_prof=0;
            j_sales=0;
         end;
      when ('ProfExe')
         do;
            j_mgr=0;
            j_off=0;
            j_other=0;
            j_prof=1;
            j_sales=0;
         end;
      when ('Sales')
         do;
            j_mgr=0;
            j_off=0;
            j_other=0;
            j_prof=0;
            j_sales=1;
         end;
      when ('Self')
         do;
            j_mgr=-1;
            j_off=-1;
            j_other=-1;
            j_prof=-1;
            j_sales=-1;
         end;
      otherwise;
   end;
run;
*/
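The same deviations-from-the-means coding can be written more compactly with logical expressions. This is a minimal sketch, not part of the original program; the data set name devhmeq is hypothetical. It creates five columns, one for each level of JOB except the reference level Self, which is coded -1 in every column.

```sas
data devhmeq;
   set hmeq;
   /* A comparison evaluates to 1 (true) or 0 (false). Subtracting the
      indicator for the reference level Self yields 1, 0, or -1. */
   j_mgr   = (job='Mgr')     - (job='Self');
   j_off   = (job='Office')  - (job='Self');
   j_other = (job='Other')   - (job='Self');
   j_prof  = (job='ProfExe') - (job='Self');
   j_sales = (job='Sales')   - (job='Self');
run;
```

If you use a hand-coded data set such as this one, specify it in the DATA= option of PROC LOGISTIC and list the five variables in the MODEL statement in place of &_TRGIND.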
PROC LOGISTIC step to model the binary target BAD. The macro variable
&_TRGIND supplies the design matrix variables that the preceding
PROC TRANSREG run created. Because no DATA= option is specified, the
procedure reads the most recently created data set, which is the TRANSREG
output. The DESCENDING option causes the procedure to model the
probability that BAD = 1 (bad applicants).
proc logistic descending;
   model bad = &_trgind loan mortdue value yoj
               derog clage ninq clno debtinc;
   title 'LOGISTIC Home Equity Data: Deviations from the Mean Coding';
run;
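As an alternative sketch that skips the separate design matrix, PROC LOGISTIC can build the same kind of coding itself. This assumes a SAS release in which PROC LOGISTIC supports the CLASS statement. PARAM=EFFECT requests deviations-from-the-means coding, and REF=LAST makes the last level in sort order (Self for JOB) the reference level.

```sas
proc logistic data=hmeq descending;
   /* Effect coding: the reference level is coded -1 in every column. */
   class job / param=effect ref=last;
   model bad = job loan mortdue value yoj
               derog clage ninq clno debtinc;
run;
```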